{"id":1227,"date":"2012-07-05T18:00:18","date_gmt":"2012-07-05T16:00:18","guid":{"rendered":"http:\/\/doanduyhai.wordpress.com\/?p=1227"},"modified":"2012-07-05T18:00:18","modified_gmt":"2012-07-05T16:00:18","slug":"apache-cassandra-tricks-and-traps","status":"publish","type":"post","link":"https:\/\/www.doanduyhai.com\/blog\/?p=1227","title":{"rendered":"Apache Cassandra tricks and traps"},"content":{"rendered":"<p>This post is the start of a new topic on NoSQL\/Big Data.<\/p>\n<p>Today I will share some tricks I discovered with Apache <strong>Cassandra<\/strong> during my work on <a title=\"Tatami\" href=\"https:\/\/github.com\/doanduyhai\/tatami\" target=\"_blank\">Tatami<\/a> and point out some gotchas to avoid.<\/p>\n<p>If you are not familiar with the Cassandra data model, you can have a look here:<\/p>\n<ul>\n<li><a href=\"http:\/\/www.slideshare.net\/ebenhewitt\/cassandra-datamodel-4985524\" title=\"http:\/\/www.slideshare.net\/ebenhewitt\/cassandra-datamodel-4985524\" target=\"_blank\">http:\/\/www.slideshare.net\/ebenhewitt\/cassandra-datamodel-4985524<\/a><\/li>\n<li><a href=\"http:\/\/www.datastax.com\/docs\/0.8\/ddl\/index\" title=\"http:\/\/www.datastax.com\/docs\/0.8\/ddl\/index\" target=\"_blank\">http:\/\/www.datastax.com\/docs\/0.8\/ddl\/index<\/a><\/li>\n<\/ul>\n<p><!--more--><\/p>\n<h1>I Tricks<\/h1>\n<p> First we start with the good things, the tricks that will help you be more efficient with <strong>Cassandra<\/strong>.<br \/>\n&nbsp;<\/p>\n<h3>A Simple text-based searching<\/h3>\n<p> The first disappointment when people discover Cassandra is the lack of support for searching. We are so used to the SQL &#8220;<strong>like %xxx%<\/strong>&#8221; syntax that we take it for granted.<\/p>\n<p> With <strong>Cassandra<\/strong>, there is no such syntax for text search. So are we done ?<\/p>\n<p> Of course not. 
There is a small trick that relies on the clever use of the column sort provided natively by Cassandra.<\/p>\n<p> Since we know that all columns are sorted according to their type (string, number, date, UUID &#8230;) we can take advantage of this sorting for text search.<\/p>\n<p> Let&#8217;s suppose we have a list of user logins and we put them in a <strong>wide row<\/strong> as string-ordered column names:<\/p>\n<ul>\n<li>jduboi\n\t<\/li>\n<li>jdubois\n\t<\/li>\n<li>jduboiseaux\n\t<\/li>\n<li>jduboisier\n        <\/li>\n<li>jduboit\n\t<\/li>\n<li>jduboita\n\t<\/li>\n<li>xxx\n<\/li>\n<\/ul>\n<p> Naturally, <strong>Cassandra<\/strong> will use lexicographic ordering for these column names.<\/p>\n<p> Now we want to search for <strong>all logins starting with &#8220;jdubois&#8221;<\/strong>. With a quick <strong>SliceQuery<\/strong>, setting <strong>start<\/strong> = &#8220;<strong>jdubois<\/strong>&#8221;, <strong>end<\/strong> = <strong>&#8220;&#8221;<\/strong>, we will get:<\/p>\n<ul>\n<li>jdubois\n\t<\/li>\n<li>jduboiseaux\n\t<\/li>\n<li>jduboisier\n        <\/li>\n<li>jduboit\n\t<\/li>\n<li>jduboita\n\t<\/li>\n<li>xxx\n<\/li>\n<\/ul>\n<p> That&#8217;s not exactly what we want. Indeed <strong>Cassandra<\/strong> filters out the first login (&#8220;<strong>jduboi<\/strong>&#8221;) because it is sorted lexicographically before the start limit &#8220;<strong>jdubois<\/strong>&#8221;. Since the end limit was set to the empty string, <strong>Cassandra<\/strong> will take all entries following the first exact match (if there is a match). 
<strong>If there is no exact match, Cassandra still takes all logins that lexicographically follow the start limit<\/strong>.<\/p>\n<p>If we had set <strong>start<\/strong> = &#8220;<strong>jduboisa<\/strong>&#8221;, <strong>end<\/strong> = <strong>&#8220;&#8221;<\/strong>, we would have got:<\/p>\n<ul>\n<li><strong>jduboiseaux<\/strong> : because it is the first login that follows &#8220;<strong>jduboisa<\/strong>&#8221; in lexicographic order\n\t<\/li>\n<li>jduboisier\n        <\/li>\n<li>jduboit\n\t<\/li>\n<li>jduboita\n\t<\/li>\n<li>xxx\n<\/li>\n<\/ul>\n<p>Now if we want to implement the &#8220;<strong>start with jdubois<\/strong>&#8221; search semantic, we need to find a way to make <strong>Cassandra<\/strong> stop scanning once the entries no longer match the prefix. <strong>The trick is to set the end limit as the start limit, with the last letter being advanced one notch<\/strong>.<\/p>\n<p>For <strong>start<\/strong> = &#8220;<strong>jdubois<\/strong>&#8221;, <strong>end<\/strong> = <strong>&#8220;jduboit&#8221;<\/strong>, we will get:<\/p>\n<ul>\n<li>jdubois\n\t<\/li>\n<li>jduboiseaux\n\t<\/li>\n<li>jduboisier\n        <\/li>\n<li>jduboit : to be filtered out\n<\/li>\n<\/ul>\n<p> This time, the &#8220;starts with&#8221; semantic is respected. Of course we need to get rid of the last entry &#8220;<strong>jduboit<\/strong>&#8221; because it does not match our input. <strong>Cassandra<\/strong> selected it because all the SliceQuery range limits are included in the search by default (it&#8217;s not possible to exclude the limit bounds currently).<\/p>\n<p> This technique only addresses the requirement for the &#8220;<strong>startWith<\/strong>&#8221; search semantic. 
For the &#8220;<strong>endWith<\/strong>&#8221; semantic, you can reverse all logins and proceed similarly.<\/p>\n<p> For the &#8220;<strong>contains<\/strong>&#8221; search semantic I have not found any solution so far apart from in-memory filtering&#8230; But honestly if you have complex search requirements it&#8217;d be better to look at dedicated solutions like <a href=\"http:\/\/www.elasticsearch.org\/\" title=\"ElasticSearch\" target=\"_blank\">ElasticSearch<\/a>.<\/p>\n<p>&nbsp;<\/p>\n<h3>B Lexicographic TimeUUID ordering<\/h3>\n<p>Cassandra provides, among all the primitive types, support for UUID values of type 1 (time and server based) and type 4 (random).<\/p>\n<p>The primary use of a <strong>UUID<\/strong> (<strong>U<\/strong>niversally <strong>U<\/strong>nique <strong>ID<\/strong>entifier) is to obtain a really unique identifier in a potentially distributed environment.<\/p>\n<p>One naive idea when working with unique identifiers is to rely on <strong>System<\/strong>.<em>currentTimeMillis()<\/em>. It should work fine for most &#8220;basic&#8221; applications but for the use case of Big Data where we need to deal with thousands or millions of objects per second, it is clearly not sufficient. In one millisecond your server may create more than one message. <strong>System<\/strong>.<em>currentTimeMillis()<\/em> is not fine-grained enough to guarantee uniqueness.<\/p>\n<p>The JDK also provides <strong>System<\/strong>.<em>nanoTime()<\/em> but the Javadocs clearly state that it should be used only to measure elapsed time, not as a time reference.<\/p>\n<p>So we end up with the good old UUID.<\/p>\n<p><strong>Cassandra<\/strong> does support version 1 UUIDs. A version 1 UUID gives you a unique identifier by combining the computer\u2019s MAC address and the number of 100-nanosecond intervals since the beginning of the Gregorian calendar.<\/p>\n<p>As you can see the precision is only 100 nanoseconds, but fortunately it is mixed with a <strong>clock sequence<\/strong> to add randomness. 
Furthermore the MAC address is also used to compute the UUID so it&#8217;s very unlikely that you will face a collision on one cluster of machines, unless you need to process a really really huge volume of data (don&#8217;t forget, not everyone is Twitter or Facebook).<\/p>\n<p>One of the most relevant use cases for UUID, and especially TimeUUID, is to use it as a <strong>column key<\/strong>. Since Cassandra column keys are sorted, we can take advantage of this feature to have a <strong>natural ordering<\/strong> for our column families.<\/p>\n<p>The problem with the default <strong>com.eaio.uuid.UUID<\/strong> provided by the Hector client is that it&#8217;s not easy to work with. As an ID you may need to bring this value from the server up to the view layer, and that&#8217;s the gotcha.<\/p>\n<p>Basically, <strong>com.eaio.uuid.UUID<\/strong> overrides <em>toString()<\/em> to give a String representation of the UUID. However this String format cannot be sorted lexicographically&#8230;<\/p>\n<p>Below are two TimeUUIDs generated consecutively:<\/p>\n<ul>\n<li>8e4cab00-c481-11e1-983b-20cf309ff6dc at some <strong>t1<\/strong><\/li>\n<li>2b6e3160-c482-11e1-addf-20cf309ff6dc at some <strong>t2<\/strong> with <strong>t2 &gt; t1<\/strong><\/li>\n<\/ul>\n<p><strong>&#8220;2b6e3160-c482-11e1-addf-20cf309ff6dc&#8221;.compareTo(&#8220;8e4cab00-c481-11e1-983b-20cf309ff6dc&#8221;) gives -6<\/strong> meaning that &#8220;<strong>2b6e3160-c482-11e1-addf-20cf309ff6dc<\/strong>&#8221; sorts before &#8220;<strong>8e4cab00-c481-11e1-983b-20cf309ff6dc<\/strong>&#8221;, which is chronologically incorrect.<\/p>\n<p> The current textual display of a TimeUUID is split as follows:<\/p>\n<p><strong>time_low &#8211; time_mid &#8211; time_high_and_version &#8211; variant_and_sequence &#8211; node<\/strong><\/p>\n<p>If we re-order it starting with time_high_and_version, we can then sort it lexicographically:<\/p>\n<p><strong>time_high_and_version &#8211; time_mid &#8211; time_low &#8211; variant_and_sequence 
&#8211; node<\/strong><\/p>\n<p>The utility method is given below (it relies on <strong>java.util.StringTokenizer<\/strong>):<\/p>\n<pre class=\"brush: java; highlight: [12]; title: ; notranslate\" title=\"\">\npublic static String reorderTimeUUId(String originalTimeUUID)\n\t{\n\t\tStringTokenizer tokens = new StringTokenizer(originalTimeUUID, &quot;-&quot;);\n\t\tif (tokens.countTokens() == 5)\n\t\t{\n\t\t\tString time_low = tokens.nextToken();\n\t\t\tString time_mid = tokens.nextToken();\n\t\t\tString time_high_and_version = tokens.nextToken();\n\t\t\tString variant_and_sequence = tokens.nextToken();\n\t\t\tString node = tokens.nextToken();\n\t\t\t\/\/ Most significant time bits first, so that string order follows time order\n\t\t\treturn time_high_and_version + '-' + time_mid + '-' + time_low + '-' + variant_and_sequence + '-' + node;\n\n\t\t}\n\t\t\/\/ Not a 5-part UUID string: return it unchanged\n\t\treturn originalTimeUUID;\n\t}\n<\/pre>\n<p>The TimeUUIDs become:<\/p>\n<ul>\n<li>11e1-c481-8e4cab00-983b-20cf309ff6dc<\/li>\n<li>11e1-c482-2b6e3160-addf-20cf309ff6dc<\/li>\n<\/ul>\n<p>Now we get &#8220;<strong>11e1-c481-8e4cab00-983b-20cf309ff6dc<\/strong>&#8221;.compareTo(&#8220;<strong>11e1-c482-2b6e3160-addf-20cf309ff6dc<\/strong>&#8221;) = -1, which matches the chronological order.<\/p>\n<p>&nbsp;<\/p>\n<h3>C Paging data with sorted column names<\/h3>\n<p>If you are using Cassandra, it means that your application is dealing with a huge volume of data. If it&#8217;s not, then there is something wrong with your data model and you should probably look at conventional SQL databases instead of NoSQL.<\/p>\n<p> With high volume comes the need to page data. For this purpose, the <strong>SliceQuery<\/strong> comes to the rescue again!<\/p>\n<p> Let&#8217;s say that your mail application manages a list of messages in the inbox. The messages are reverse-ordered with respect to their reception time.<\/p>\n<p> The inbox will only display the latest N messages. When the user clicks on &#8220;Next&#8221; it will fetch the following N messages and so on.<\/p>\n<p> To build a paging system, first we should create an index for the message ID. 
It is merely a column family with column names of type <strong>TimeUUID<\/strong> (see above) for time ordering and column values pointing to a message ID.<\/p>\n<p> To fetch the first N messages, just use <strong>SliceQuery<\/strong> with:<\/p>\n<ul>\n<li>start = null<\/li>\n<li>end = null<\/li>\n<li><strong>reverse = true<\/strong> for descending sort order (latest messages first)<\/li>\n<li>limit = N<\/li>\n<\/ul>\n<p>Once you have the list of the first N messages, it should be pretty easy to get the message ID of the last element in the list. To fetch the next N messages:<\/p>\n<ul>\n<li><strong>start = last messageID of previous list<\/strong><\/li>\n<li>end = null<\/li>\n<li>reverse = true<\/li>\n<li>limit = N<\/li>\n<\/ul>\n<p> Since the range limits are inclusive, the first element returned is the last element of the previous page, so in practice you may want to fetch N + 1 elements and drop the first one.<\/p>\n<p> And that&#8217;s all !<\/p>\n<p>&nbsp;<\/p>\n<h1>II Traps<\/h1>\n<p> And now the traps that you should absolutely avoid if you don&#8217;t want to suffer endless hours of debugging \ud83d\ude42<\/p>\n<p>&nbsp;<\/p>\n<h3>A The real semantic of composite column keys<\/h3>\n<p> In the latest version of <strong>Cassandra<\/strong>, <strong>composite column keys<\/strong> have been added. What are they ?<\/p>\n<p> Basically instead of having a simple column key (column name) of one type, you can aggregate several values (also called <strong>components<\/strong>) of several types to form one unique column key.<\/p>\n<p> Examples: <strong>login:location<\/strong>, <strong>surname:firstname:age<\/strong> &#8230;<\/p>\n<p> In the earlier days, people used to concatenate their fields into a string value with an arbitrarily chosen separator to form a kind of composite key. But this approach is rather limited because the sort order is lexicographical (string). It works well when all the components are of string type because concatenating them does not modify the sort order. If the components are of different types, you&#8217;re screwed.<\/p>\n<p> Now with the composite type, it is possible to mix components of different types!<\/p>\n<p> Cool right ? 
<\/p>\n<p> Not really, in fact &#8230;<\/p>\n<p> Let&#8217;s look at a simple example. We have a list of users and want to find them by their login, age or city. Usually we would have to create 3 indexes, one for the login, one for the age and one for the city. Now the naive developer will create a composite column key as <strong>login : age : city<\/strong>.<\/p>\n<p> Let&#8217;s consider the following data set:<\/p>\n<ul>\n<li>alice:27:New York<\/li>\n<li>bob:32:New York<\/li>\n<li>bob:35:Seattle<\/li>\n<li>boby:25:Atlanta<\/li>\n<li>jack:27:Los Angeles<\/li>\n<\/ul>\n<p>Now let&#8217;s perform a SliceQuery with the composite type. If we want to retrieve all people whose login is exactly &#8220;bob&#8221;:<\/p>\n<pre class=\"brush: java; highlight: [5]; title: ; notranslate\" title=\"\">\nComposite start = new Composite();\nstart.addComponent(0, &quot;bob&quot;, Composite.ComponentEquality.EQUAL);\n\nComposite end = new Composite();\nend.addComponent(0, &quot;bob&quot;, Composite.ComponentEquality.GREATER_THAN_EQUAL);\n\/\/ se, ce, oe: serializers for the row key, column name and column value (defined elsewhere)\nList&lt;HColumn&lt;Composite, Object&gt;&gt; columns = HFactory.createSliceQuery(keyspace, se, ce, oe)\n\t.setColumnFamily(&quot;composite&quot;).setKey(&quot;test&quot;)\n\t.setRange(start, end, false, 100).execute().get().getColumns();\n<\/pre>\n<p>The result is:<\/p>\n<ul>\n<li>bob:32:New York<\/li>\n<li>bob:35:Seattle<\/li>\n<\/ul>\n<p> Please notice <strong>line 5<\/strong> in the above code. Intuitively we would set <strong>Composite.ComponentEquality.EQUAL<\/strong> for the end value but paradoxically Cassandra will return no result if it is set to <strong>EQUAL<\/strong>. 
So I set it to <strong>GREATER_THAN_EQUAL<\/strong>.<\/p>\n<p> Now what if we want to get all users whose <strong>login starts with &#8220;bob&#8221;<\/strong> ?<\/p>\n<pre class=\"brush: java; highlight: [5]; title: ; notranslate\" title=\"\">\nComposite start = new Composite();\nstart.addComponent(0, &quot;bob&quot;, Composite.ComponentEquality.EQUAL);\n\nComposite end = new Composite();\nend.addComponent(0, &quot;boc&quot;, Composite.ComponentEquality.LESS_THAN_EQUAL);\n\nList&lt;HColumn&lt;Composite, Object&gt;&gt; columns = HFactory.createSliceQuery(keyspace, se, ce, oe)\n\t.setColumnFamily(&quot;composite&quot;).setKey(&quot;test&quot;)\n\t.setRange(start, end, false, 100).execute().get().getColumns();\n<\/pre>\n<p>The result is:<\/p>\n<ul>\n<li>bob:32:New York<\/li>\n<li>bob:35:Seattle<\/li>\n<li><strong>boby:25:Atlanta<\/strong><\/li>\n<\/ul>\n<p> Please notice the trick at <strong>line 5<\/strong>. We shift the last letter of &#8220;<strong>bob<\/strong>&#8221; up one notch and we use <strong>LESS_THAN_EQUAL<\/strong>. 
This time the result includes &#8220;<strong>boby<\/strong>&#8221;.<\/p>\n<p> Now we want to get users whose login = &#8220;bob&#8221; and who are 32 years old.<\/p>\n<pre class=\"brush: java; highlight: [6,7]; title: ; notranslate\" title=\"\">\nComposite start = new Composite();\nstart.addComponent(0, &quot;bob&quot;, Composite.ComponentEquality.EQUAL);\nstart.addComponent(1, 32, Composite.ComponentEquality.EQUAL);\n\nComposite end = new Composite();\nend.addComponent(0, &quot;bob&quot;, Composite.ComponentEquality.EQUAL);\nend.addComponent(1, 32, Composite.ComponentEquality.GREATER_THAN_EQUAL);\n\nList&lt;HColumn&lt;Composite, Object&gt;&gt; columns = HFactory.createSliceQuery(keyspace, se, ce, oe)\n\t.setColumnFamily(&quot;composite&quot;).setKey(&quot;test&quot;)\n\t.setRange(start, end, false, 100).execute().get().getColumns();\n<\/pre>\n<p>The result is:<\/p>\n<ul>\n<li>bob:32:New York<\/li>\n<\/ul>\n<p> Please notice that now for the first component (login) we can use EQUAL for both the start and end limits (line 6). 
For the second component (age), we again use EQUAL for the start limit and GREATER_THAN_EQUAL for the end limit if we want an exact match.<\/p>\n<p> What if we want users whose login = &#8220;bob&#8221; and age between 32 and 35 years old inclusive ?<\/p>\n<pre class=\"brush: java; highlight: [7]; title: ; notranslate\" title=\"\">\nComposite start = new Composite();\nstart.addComponent(0, &quot;bob&quot;, Composite.ComponentEquality.EQUAL);\nstart.addComponent(1, 32, Composite.ComponentEquality.EQUAL);\n\nComposite end = new Composite();\nend.addComponent(0, &quot;bob&quot;, Composite.ComponentEquality.EQUAL);\nend.addComponent(1, 36, Composite.ComponentEquality.LESS_THAN_EQUAL);\n\nList&lt;HColumn&lt;Composite, Object&gt;&gt; columns = HFactory.createSliceQuery(keyspace, se, ce, oe)\n\t.setColumnFamily(&quot;composite&quot;).setKey(&quot;test&quot;)\n\t.setRange(start, end, false, 100).execute().get().getColumns();\n<\/pre>\n<p>The result is:<\/p>\n<ul>\n<li>bob:32:New York<\/li>\n<li>bob:35:Seattle<\/li>\n<\/ul>\n<p> Again, we need to set <strong>36<\/strong> and LESS_THAN_EQUAL on the end limit to retrieve <strong>bob:35:Seattle<\/strong> because the LESS_THAN_EQUAL operator actually behaves as a strict <strong>&lt;<\/strong>.<\/p>\n<p> Now, we want to get all users with age = 27:<\/p>\n<pre class=\"brush: java; highlight: [2,6]; title: ; notranslate\" title=\"\">\nComposite start = new Composite();\nstart.addComponent(0, &quot;&quot;, Composite.ComponentEquality.EQUAL);\nstart.addComponent(1, 27, Composite.ComponentEquality.EQUAL);\n\nComposite end = new Composite();\nend.addComponent(0, Character.MAX_VALUE + &quot;&quot;, Composite.ComponentEquality.EQUAL);\nend.addComponent(1, 27, Composite.ComponentEquality.GREATER_THAN_EQUAL);\n\nList&lt;HColumn&lt;Composite, Object&gt;&gt; columns = HFactory.createSliceQuery(keyspace, se, ce, 
oe)\n\t.setColumnFamily(&quot;composite&quot;).setKey(&quot;test&quot;)\n\t.setRange(start, end, false, 100).execute().get().getColumns();\n<\/pre>\n<p> The result:<\/p>\n<ul>\n<li>alice:27:New York<\/li>\n<li>bob:32:New York<\/li>\n<li>bob:35:Seattle<\/li>\n<li>boby:25:Atlanta<\/li>\n<li>jack:27:Los Angeles<\/li>\n<\/ul>\n<p> It is mandatory to provide a value for the first component. Since we do not want to filter by login, I set &#8220;&#8221; for the start limit and Character.MAX_VALUE (<strong>line 6<\/strong>) for the end limit.<\/p>\n<p> We expect <strong>Cassandra<\/strong> to return only <strong>alice:27:New York<\/strong> and <strong>jack:27:Los Angeles<\/strong> but it returns all the users! Why ?<\/p>\n<p> <strong>Simply because of the way Cassandra orders composite columns. Composite columns are ordered first by their first component, then by their second component, etc&#8230;<br \/>\n<\/strong><\/p>\n<p> To get all users who are 27 years old, <strong>Cassandra<\/strong> needs to scan all possible values of the first composite component (login) before it can access the second one (age). In the end it has to retrieve all the columns because there is no other way to proceed.<\/p>\n<p> Most developers naively think that composite columns provide them with multi-dimensional axes for data filtering but it&#8217;s just an illusion. The way <strong>Cassandra<\/strong> sorts columns is the real limitation.<\/p>\n<p> The only way to use composite columns effectively is to filter:<\/p>\n<ul>\n<li>only by the first component<\/li>\n<li>or by fixing the first component, then filtering by the second component<\/li>\n<li>or by fixing the first and second components, then filtering by the third component<\/li>\n<li>etc&#8230;<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<h3>B Row or column indexing ? 
The (not so) difficult choice<\/h3>\n<p> With Cassandra you can filter data either using the row key as search key (exact match or range match with <strong>RangeQuery<\/strong>) or using the column key as described previously.<\/p>\n<p> This leads to 2 main designs: <strong>wide row<\/strong> and <strong>skinny row<\/strong>.<\/p>\n<p> A <strong>wide row<\/strong> pattern consists of a column family structure with very few rows and, for each row, a great many columns.<\/p>\n<p> A <strong>skinny row<\/strong> pattern consists of a column family structure with many rows and, for each row, very few columns.<\/p>\n<p> Each of those designs matches different technical needs. However for indexing data, the <strong>skinny row<\/strong> pattern should be avoided.<\/p>\n<p> Basically you build indexes to accelerate data access and retrieval. Instead of scanning and reading values from various column families we only need to read data from one column family, the index.<\/p>\n<p> Now what if the index data is spread over many Cassandra nodes ? Well, the cluster needs to fetch it from the different nodes and merge the results. This has a cost.<\/p>\n<p> Why does it occur ? <strong>Because when you index data using the row key as search key, with a RandomPartitioner (default configuration) a hash is computed from the row key and the data is sent to a node depending on this hashed value.<\/strong><\/p>\n<p> Of course you can choose an <strong>OrderPreservingPartitioner<\/strong> but it has many drawbacks, an unbalanced cluster to name just one (complete details <a href=\"http:\/\/www.datastax.com\/docs\/0.8\/cluster_architecture\/partitioning\" title=\"Data Partitioning\" target=\"_blank\">here<\/a>). Officially the use of <strong>OrderPreservingPartitioner<\/strong> is strongly discouraged unless you do not have any choice.<\/p>\n<p> So this leaves us with the <strong>wide row<\/strong> pattern as the indexing solution, and this is the right choice. 
All columns of a row are stored on the same node, in the same data block and sorted on disk so accessing and scanning these columns is extremely fast. And that&#8217;s what you want for your index !<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This post is a start of a new topic on NoSQL\/Big Data Today I will give some tricks I found out with Apache Cassandra during my work on Tatami and point out some gotchas to avoid. Those who are not&#8230;<br \/><a class=\"read-more-button\" href=\"https:\/\/www.doanduyhai.com\/blog\/?p=1227\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[10],"tags":[],"_links":{"self":[{"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/1227"}],"collection":[{"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1227"}],"version-history":[{"count":0,"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=\/wp\/v2\/posts\/1227\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1227"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1227"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.doanduyhai.com\/blog\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1227"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}