Distributed Data Show Episode 67
Jonathan Ellis recounts several attempts in the history of Apache Cassandra to coerce developers into specific behaviors around issues such as sequential scans, tombstones, and joins. We discuss why these attempts didn’t work, and how to respond to developer feedback more effectively.
0:15 - Welcoming Jonathan back to the show and summarizing the great happenings at the Distributed Data Summit (http://distributeddatasummit.com/).
1:30 - Attempts Jonathan was a part of making in the Cassandra community to guide developer behavior.
2:25 - Example 1: the “ALLOW FILTERING” keyword. This keyword was added to the CQL SELECT statement to prevent developers from accidentally asking for too much data and suffering timeouts or running nodes out of memory. (This wasn’t a problem with the original Thrift API.)
6:21 - The initial solution to this problem was to reject sequential scans and provide a warning message, which the client could override with the ALLOW FILTERING keyword. This didn’t work because developers just added the keyword and kept going.
9:07 - The better solution was to introduce paging. Another proposed option of trying to predict which queries would result in reading too much data would not have scaled well. However, an estimation of work was key to solving another challenge...
10:46 - Example 2: tombstone warning threshold. Tombstones are a mechanism for representing what data has been deleted, which is required in a system in which nodes can go online/online, so that data doesn’t get accidentally resurrected. For tables with clustering columns, it’s possible that a large number of tombstones will be included in the result set provided from a given node to the coordinator node for a given query.
12:41 - Paging the tombstone records would have made things too slow. Instead, a warning threshold was configured on the server, and a log statement would be generated when the threshold was triggered. The problem with this solution was that we were burying the warning in the server log for a client side request, which just confused people.
14:57 - Most often, the root cause turned out to be the data model. There was a workaround you could use: if you were using a time type as a clustering column your data and knew where the tombstones fell in that time range, you could use a WHERE clause to skip over the tombstones.
16:30 - This isn’t a simple problem to fix. We really need a major design change of the whole way deletes are handled including the gc_grace_seconds option. The real goal is throw away the tombstones once we know all the nodes are aware.
18:06 - The key is recognizing that the tombstone issue is ultimately a repair problem. Some of the challenges with repair have been addressed in DataStax Enterprise via node sync, the remaining work is to make repair much faster - on the order of hours vs. days.
20:19 - Example 3: supporting joins. When presented with Cassandra recommended design pattern of supporting entire queries in denormalized tables, developers often choose to implement application-side joins instead of creating denormalized tables.
23:39 - The future of joins in Cassandra - it may happen in a post 4.0 release. Single partition joins seem like a very likely case that won’t impact consistency guarantees.
Joins within a partition seem like a realistic option to support. The key will be to provide transparency and predictability to developers about their queries.
26:47 - How we can do better at listening to developers?
Product at DataStax
Technology at DataStax