Feedback from SF Cassandra Summit

After 2 days of a fast-paced Cassandra Summit in San Francisco, it’s time to sit down and give a little feedback on the event.

The first impression is that the conference was quite well organised. There were enough staff at the registration desk for the process to go smoothly. It was a little crowded at rush hour, around 7:30AM – 8AM, but it never looked like a zoo with long queues and people pushing.

For the content, the summit was split into 2 days, the first one dedicated to training and the second to conferences. There were training sessions for Cassandra Fresh Starters, for Data Modeling and for Performance Tuning. The 3 sessions covered a wide range of attendee needs. The conference day was split into three themes: real production use cases for Cassandra, Cassandra-related technical talks and, again, performance tuning in production.

I The training day

Since I have some knowledge of Cassandra, I attended the Advanced Performance Tuning session held by Aaron Morton, a Cassandra veteran. The training was quite interesting. He re-used some material from the previous year but added new chapters to cover new features. The slides were very well structured, starting with a definition of what performance is and what goals we want to achieve, before looking into the metrics in detail. Then he showed how closely latency and throughput are related.

Aaron is a good storyteller; every figure and metric was explained alongside a real production issue he had encountered. It made the training more interactive, with people asking questions to dig into particular topics. At the end of the day, he introduced a checklist on Cassandra Performance Tuning Methodology, a very detailed document about what to look for, in which order, and which metrics to collect in order to troubleshoot Cassandra performance issues. This list is definitely a must-have.

II The conference day

The first keynote

The conference started at 8:15AM sharp with Billy Bosworth, Datastax CEO, opening the show. He explained why, in this new digital era of always-online businesses, Cassandra’s high availability and quick response time are the perfect match. He then displayed a Gaussian curve with 20% of legacy systems using traditional SQL solutions on the left, 20% of bleeding-edge NoSQL technologies used by early adopters and techies on the right, and a big belly of 60% in the middle consisting of running applications shifting to the NoSQL paradigm, and explained how Cassandra can and should address those.

Interestingly enough, he gave some examples of real production use cases powered by Cassandra. He invited Jef Ludwig, VP Engineering of Sony Network Entertainment, on stage to explain how Datastax helps them power the whole Sony Entertainment data platform. That was a very strong reference for Cassandra as a rock-solid OLTP solution that scales.

The next special guest invited on stage was even more surprising: Yi Li, CEO of Orbeus. This small Californian start-up is developing an awesome digital image recognition service. Yi did a quick live demo of her application for the audience, recognising the sex, age and mood of any face captured by her iPad camera, among other features. Truly amazing. The demo was quite lengthy though.

I think Datastax made a great move by showcasing such start-up companies. The message is clear: “You are a small start-up. If you enroll in our start-up program, you may get a free showcase at one of the biggest NoSQL conferences in the US”. Nice, isn’t it?

The tech keynote

The second keynote was presented by Jonathan Ellis, co-founder of Datastax and chairman of the Apache Cassandra project. He announced the fresh release of the long-awaited Cassandra 2.1 and went through the new features of this version.

First, for developers, there is the new “User Defined Type” (UDT). It’s basically a custom type you declare in the schema to group fields of arbitrary types. You can even nest a UDT inside another UDT. The most commonly given example is a user having an address UDT comprising street name, street number, state and zip code. With this new feature it’ll be dead easy to save JSON messages inside Cassandra.
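
To make the nesting idea concrete, here is a minimal sketch in plain Python, not Cassandra code. The `Address` and `User` classes and their field names are hypothetical stand-ins for the address UDT and the user table mentioned above; the point is only that a nested, typed structure maps naturally to and from a JSON document.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Address:
    # Hypothetical stand-in for the address UDT from the keynote example
    street_name: str
    street_number: int
    state: str
    zip_code: str


@dataclass
class User:
    login: str
    address: Address  # a typed structure nested inside another one


user = User("jdoe", Address("Market Street", 1, "CA", "94105"))

# asdict() recurses into nested dataclasses, giving a JSON-ready dict
as_json = json.dumps(asdict(user), sort_keys=True)

# Rebuilding the typed objects from the JSON document
data = json.loads(as_json)
restored = User(data["login"], Address(**data["address"]))
```

The JSON document and the typed structure carry the same information, which is what makes a UDT a convenient target for JSON-shaped messages.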

The second interesting feature for developers is static columns. They are called static because they are defined on a clustered table (a table having a compound primary key) and they relate to the partition key only. Each static column holds a single value shared by all the clustered rows of the partition. To clarify the concept, think about a blog post with comments. The blog post title, author, creation date and content would be static columns, with the post id as partition key. The clustered data are people’s comments on the post, with the date of the comment as clustering column.
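
The blog post example can be sketched with a small in-memory model in Python. This is only an illustration of the semantics described above, not how Cassandra stores data; the class name, field names and sample values are all made up for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class BlogPostPartition:
    """One partition: static values stored once, plus clustered comment rows."""
    post_id: str   # partition key
    title: str     # "static": a single value for the whole partition
    author: str    # "static"
    comments: dict = field(default_factory=dict)  # comment date -> comment text

    def add_comment(self, when: datetime, text: str) -> None:
        self.comments[when] = text

    def rows(self):
        """Yield rows ordered by comment date, static values repeated on each."""
        for when in sorted(self.comments):
            yield (self.post_id, self.title, self.author, when, self.comments[when])


post = BlogPostPartition("post-1", "Summit feedback", "me")
post.add_comment(datetime(2014, 9, 12, 10, 0), "Nice write-up")
post.add_comment(datetime(2014, 9, 12, 9, 0), "First!")

# A single update of the static value is visible on every clustered row
post.title = "Feedback from SF Cassandra Summit"
result = list(post.rows())
```

Every row in `result` carries the new title even though it was written only once, which is the behaviour static columns give you per partition.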

On the performance side, Cassandra 2.1 is up to 50% faster than the previous version, thanks to some internal optimisations on memtables. A blog post explains those perf improvements in detail. He also explained how Cassandra manages to deliver not only fast response times overall but also consistently fast response times, even at the 99th percentile, which is quite an amazing optimisation. Last but not least, dynamic re-sampling of the partition keys used by the binary-search index will help optimise memory usage.

But the 2.1 release did not forget the ops. For them, there is a new optimisation of the repair process. Cassandra now marks SSTables that have already been repaired so they can be skipped by the next repair. Before that, repair time grew linearly with your data set. Now repair time is proportional to the data creation rate, which is far more scalable. Another interesting improvement is the loading of new SSTable chunks created during compaction into the OS page cache. This helps avoid dips in read latency at the end of the compaction, thus helping to achieve the aforementioned consistently fast response time.
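
The difference between the two repair behaviours can be shown with a toy simulation in Python. It only models the bookkeeping described above (a "repaired" mark per SSTable) and uses made-up sizes; the real repair process does far more than sum sizes.

```python
from dataclasses import dataclass


@dataclass
class SSTable:
    size_mb: int
    repaired: bool = False


def full_repair_cost(sstables):
    """Old behaviour as described above: every SSTable is examined."""
    return sum(t.size_mb for t in sstables)


def incremental_repair(sstables):
    """Examine only unmarked SSTables, then mark them. Returns MB examined."""
    cost = 0
    for t in sstables:
        if not t.repaired:
            cost += t.size_mb
            t.repaired = True
    return cost


tables = [SSTable(100) for _ in range(10)]  # 1000 MB of existing data
first = incremental_repair(tables)          # first run has to touch everything
tables += [SSTable(100), SSTable(100)]      # 200 MB written since the last repair
second = incremental_repair(tables)         # only the new data is examined
full = full_repair_cost(tables)             # what a full repair would examine
```

In this toy run the second incremental repair examines 200 MB while a full repair would examine 1200 MB, so the cost follows the amount of new data rather than the total data set.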

Counters have been redesigned in 2.1 to be more reliable and resilient to node failures. You no longer get counter over-counts when a node comes back after a failure and replays its commit log. More details can be found in this counter blog post.

The conferences

I mostly targeted the talks on performance tuning because I’m interested in the subject, so my feedback may be biased. Anyway, let’s have a look at some of them:

TitanDB, Scaling Relationship Data and Analysis with Cassandra (speaker Matthias Broechler): Matthias is the lead developer of TitanDB and presented the framework. The idea of TitanDB is to implement the Tinkerpop specs and use various data stores (Cassandra/HBase/BerkeleyDB…) underneath. What makes TitanDB stand out from the crowd is its scalability compared to other graph datastores, said Matthias. He then presented some of the graph traversal API used in TitanDB to query data. It looks nice but is kind of complex to handle if you’re not familiar with the Tinkerpop stack. The talk then got into the details of how vertices and attributes are mapped to Cassandra. For complex graphs with high cardinality vertices, TitanDB can partition them to make queries faster. Note that there is even a query engine inside TitanDB that will optimise the query plan for you.

Although the ideas and architecture behind TitanDB are very nice and appealing, I felt that it is somewhat too complex, and that the framework kind of “hides” this complexity and tries to optimise performance on behalf of the users. It resembles Hibernate’s attempt to hide the complexity of SQL from developers. I’m not sure it’s the right path to choose. Nevertheless, having a framework that can scale on graphs (under which conditions scalability is guaranteed is another debate…) is nice.

Lesser Known Features of Cassandra 2.0 and 2.1 (speaker Aaron Morton again!): this talk was more like a catalog of numerous small features not sufficiently highlighted during public talks. Aaron listed the most interesting ones:

new logging framework (Logback) for Cassandra to make the logging configuration easier and more dynamic (no need to restart the server!). See CASSANDRA-5883
new join_ring toggle to make a node hibernate when set to “false”. It is useful when bringing a dead node back into the cluster after a long time, to avoid it serving stale data. Nice trick to know for ops. See CASSANDRA-6961
pluggable configuration loader. You no longer have to configure Cassandra using the cassandra.yaml file, you can now plug in your own config manager. I doubt the utility of this feature, although it’s always nice to have the choice. See CASSANDRA-5045
CQL3 now supports column aliases, a nice-to-have feature that is especially useful when you need to give an alias to a function call like `writetime()` or `ttl()`. See CASSANDRA-5075
new min & max column names stored in SSTable metadata to accelerate slice queries. It’s an “old” feature of 2.0.x but it’s nice to know that slice queries can now use it to skip hitting unnecessary SSTables on disk. See CASSANDRA-5514
new tool “sstablelevelreset” to force LeveledCompaction tables to reset the level to 0 and so re-compact all SSTables. Before, the trick was to remove the JSON manifest file. See CASSANDRA-5271

This talk was nice and technical, ideal for folks like me who know Cassandra internals quite well. I’m not sure that beginners would appreciate it for what it is really worth.

Real Data Models of Silicon Valley (speaker Patrick McFadin): this time, Patrick McFadin, Cassandra chief evangelist at Datastax, was on stage. As always he was very comfortable with a big audience and the presentation was very fluid. Patrick went into detail about the new “User Defined Type” (UDT) and also mentioned the new tuple type, which is just a variant of UDT. He demonstrated what UDT can bring to Cassandra modelling with a simple example of how one can model documents represented as JSON in Cassandra. It’s amazing to see how a big document hierarchy can be neatly mapped into a CQL3 table using UDT. He then explained the meaning of the “frozen” keyword and the practical reason for it to be there: backward compatibility. Patrick gave a glimpse of what UDT will be in Cassandra 3.0, when it becomes fully functional. You will be able to modify each field of a UDT atomically, whereas the 2.1 version of UDT is basically just a blob.

As always with Patrick McFadin, the talk was a show, with a lot of fun (a big troll on the frozen keyword) but still quite interesting and very technical.

CQL Under the Hood (speaker: Robbie Strickland): I did not have the opportunity to attend this talk because it took place at the same time as Patrick McFadin’s talk, but I stumbled upon the slides. They are definitely worth reading and should be at the top of your reading list for effective data modelling in Cassandra.

Performance Tuning Cassandra in AWS (speaker Randy Bliss): this talk illustrated how FamilySearch, a Mormon-backed website that offers a genealogy service, leverages Cassandra on AWS. First Randy introduced the business use cases of FamilySearch, then he highlighted some of the tuning done on AWS to increase cluster performance. Among others, switching to the TokenAware load balancing strategy on the driver reduced inter-node latency and offered better throughput. Similarly, they increased the number of threads dedicated to reads & writes, up to 128, to achieve the desired performance. 128 seems a pretty high number compared to the default of 32, but for their use case it was a winning move.
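
Why token awareness helps can be sketched in a few lines of Python: if the client can work out which node owns a key, it can send the request straight there instead of going through another node that then has to forward it. This is a simplified illustration under stated assumptions, not the driver's implementation: the `Ring` class, the node names and the 32-bit ring are hypothetical, and MD5 is used only as a convenient stand-in for the cluster's partitioner.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # hypothetical ring size, chosen for readability


def token(key: str) -> int:
    """Map a partition key to a position on the ring (MD5 as a stand-in hash)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class Ring:
    def __init__(self, node_tokens):
        # node_tokens: {node name: token owned by that node}
        self._entries = sorted((t, n) for n, t in node_tokens.items())
        self._tokens = [t for t, _ in self._entries]

    def owner(self, key: str) -> str:
        """First node whose token is >= the key's token, wrapping around."""
        i = bisect.bisect_left(self._tokens, token(key))
        return self._entries[i % len(self._entries)][1]


ring = Ring({"node-a": 0, "node-b": RING_SIZE // 3, "node-c": 2 * RING_SIZE // 3})

# A token-aware client would pick this node as coordinator for the request
coordinator = ring.owner("user:42")
```

Because the lookup is deterministic, every request for the same key lands on the same owner, which removes one network hop from the request path.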

The talk was too focused on the FamilySearch business, with not enough perf tuning content for me, so I was a little bit disappointed.

Common Cassandra Performance Patterns Seen Through Histograms (speaker Christ Eniry): this tech talk is definitely a must-see for any Cassandra ops. Even though I already knew about Cassandra histograms, Christ shed light on new aspects of those figures, especially the fact that the displayed metrics are only a point-in-time snapshot of the server status. To get a high level overview, he recommended building a heat map by taking regular snapshots over a long period of time. He then showed the difference between cfhistograms, which expose metrics at table level, and proxyhistograms, which are more related to the cluster. Analyzing both histograms is interesting and gives you some hints about what could be wrong inside the cluster. Since histograms are temporal views, you should not rely on them alone for performance investigation.
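
The heat map idea can be sketched in Python as follows. The snapshots here are made-up dictionaries mapping a latency bucket (in microseconds) to a count; in practice they would come from whatever you collect at regular intervals. The function names and the ASCII rendering are my own choices for the illustration.

```python
def build_heat_map(snapshots, buckets):
    """Rows are latency buckets, columns are snapshots in time order."""
    return [[snap.get(b, 0) for snap in snapshots] for b in buckets]


def render(heat_map, buckets, shades=" .:*#"):
    """Draw each cell with a darker character as the count grows."""
    peak = max(max(row) for row in heat_map) or 1
    lines = []
    for b, row in zip(buckets, heat_map):
        cells = "".join(shades[(c * (len(shades) - 1)) // peak] for c in row)
        lines.append(f"{b:>6} us |{cells}|")
    return "\n".join(lines)


buckets = [100, 500, 1000, 5000]
snapshots = [  # hypothetical data: three snapshots taken at regular intervals
    {100: 90, 500: 10},
    {100: 80, 500: 15, 1000: 5},
    {100: 40, 500: 30, 1000: 20, 5000: 10},
]

heat_map = build_heat_map(snapshots, buckets)
print(render(heat_map, buckets))
```

Reading the columns left to right shows the distribution drifting towards the slower buckets over time, something a single snapshot cannot reveal.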

A very good tech talk for perf tuning. Watch it once the Summit videos are online

Cassandra Doctor at Apple (speaker: Richard Low): Richard Low is a well known Cassandra speaker, having been named MVP last year at the Cassandra EU Summit. He now works at Apple and presented some performance issues he dealt with recently. He took a detective approach, first presenting the symptoms and consequences before digging further into performance and code analysis. The first issue was a client having high read/write latency. After looking at the perf metrics, it appeared that the latency resembled typical cross-DC latency. Richard found out that a bug in the Java driver set the local data center to “null” in a system table, making the client select the wrong DC as local DC. Another highlighted bug was a suspicious sstable state after compaction. He noticed that some very old sstables (having a lower generation ID in their name) were still hanging around after several compactions. Looking into the source code of Cassandra, he caught a nasty bug about counters and filed a JIRA. Interestingly the bug was fixed almost the same day. Open source power! Note that this talk, along with the Cassandra at Apple for Massive Scale talk I didn’t attend, showed how extensively Apple uses Cassandra in their infrastructure. Go get the videos once online!

This talk is an incentive to learn more about Cassandra internals. You don’t need to know in detail how every piece of the database works together, but having the big picture in mind helps a lot in narrowing down root causes. He urged people not to hesitate to look into the source code when they have a Java stacktrace at hand, as it can save your day. Personally that’s what I did yesterday when helping a customer, and looking into the source code definitely helped find the root cause.

The lightning talks

The lightning talks session is a kind of tradition to wrap up the conference. The idea is dead simple: you have 5 minutes to pitch your idea/talk. The timer is displayed on the big screen in the background. After 5 minutes, the bell rings and you’re out. Having had a chance to do a lightning talk last November at the Cassandra EU Summit in London, I must admit that time runs really, really fast when you’re on stage.

This year, Christian Hasker, responsible for Apache Cassandra Community development, led the show. There were some interesting talks. One was about Instaclustr, a Cassandra-in-the-cloud provider, proving that Apache Cassandra is now at the core of a lot of new startup businesses. Patricia Gorla, this year’s MVP, did a quick presentation of SSTable generator, a worthy tool to generate raw SSTables from CQL3, really nice. Brian Lynch then introduced new support for Cassandra in GCE. They experienced some issues configuring Cassandra on GCE but finally made it. Now you can select and deploy a Cassandra cluster on GCE in one click, with all the perf tuning integrated. The last lightning talk I liked was the one by Apple, presenting how they architected Cassandra to serve as a distributed cache. Though technically interesting, I am afraid it could give bad ideas to some folks, who would use Cassandra as a caching solution without proper tuning.

All the extra

Apart from the main conference and training sessions, there were some interesting spots to look at during this Summit. The first one was the Cassandra LIVE room, where you could drop by and hear people from Netflix, Sony, Instagram … talk about their experience using Cassandra in production. Another sweet spot was the Meet the Experts room, where you could just drop by and grab any expert there to answer any of your questions. The room was really crowded at rush hours. This was the place to be for techies.

Wrap up

This was my first Cassandra Summit in SF and it was intense: 2 days of a non-stop stream of technical info to ingest. The organisation was excellent, with no chaotic long lines for food or registration. I met a lot of interesting people and put a face to some of the folks I used to exchange with online. The Cassandra Summit is definitely an event worth attending. I can’t wait for the next Cassandra Summit Europe happening in London this fall.