Streaming in Cassandra 2.0
date: September 4, 2013
“Streaming” is a component which handles data (part of SSTable file) exchange among nodes in the cluster.
When you bootstrap a new node, it gets data from existing nodes using streaming. When you run
nodetool repair, nodes exchange out-of-sync data using streaming.
If you want to bulk load data from your backup set,
sstableloader uses streaming to complete task.
Why we updated the streaming protocol?
As you can see, streaming is one of the core components inside Cassandra. If you have experience in operating Cassandra clusters, you have probably wondered why streaming sometimes gets stuck or is slow, and it’s hard to track down what went wrong.
Over time streaming has been improved, but to make it more reliable, traceable, and faster, we had to re-design it from the ground up.
C* version 2.0 was the best timing for re-designing streaming protocol and API.
What's changed in Streaming 2.0?
Streaming 2.0 is designed to achieve the following goals:
- Better control
- Better traceability
- Better performance
For better control
Up to 1.2, there are two kinds of streaming.
Stream Out, which a node sends data to another node and
Stream In, which a node receives data from another node, and those are separate stream sessions even though operations like
repair do both send and receive data.
Stream sessions are also separated from operations, so it was hard to tell what operation those streaming you saw on
nodetool netstats belong.
In streaming 2.0, all streaming sessions related to certain operation(bulkload, move, bootstrap, etc.) are associated to the same
As you can see in improved
nodetool netstats output, you are able to see what operation is performing streams in one place.
$ bin/nodetool -h 192.168.1.141 netstats Mode: NORMAL Bulk Load fdf4cc70-10e9-11e3-bed0-27ba85b87bf8 /192.168.1.163 Receiving 3 files, 28437084 bytes total /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-ja-4-Data.db 9244384/9244384 bytes(100%) received from /192.168.1.163 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-ja-5-Data.db 9249617/9249617 bytes(100%) received from /192.168.1.163 /var/lib/cassandra/data/Keyspace1/Standard1/Keyspace1-Standard1-tmp-ja-6-Data.db 5635715/9943083 bytes(56%) received from /192.168.1.163 Read Repair Statistics: Attempted: 0 Mismatch (Blocking): 0 Mismatch (Background): 0 Pool Name Active Pending Completed Commands n/a 0 0 Responses n/a 0 0
Here, you can see the node is receiving 3 files from
/192.168.1.163 for "Bulk Load" and has Stream Plan ID of fdf4cc70-10e9-11e3-bed0-27ba85b87bf8.
For better traceability
As stated above, each
Stream Plan has its own Stream Plan ID, and all the events happen during the operation are associated with this ID.
You can search through the log file with this ID to trace what is happening:
INFO [STREAM-INIT-/192.168.1.163:58320] ...snip... [Stream #fdf4cc70-10e9-11e3-bed0-27ba85b87bf8] Received streaming plan for Bulk Load INFO [STREAM-IN-/192.168.1.163] ...snip... [Stream #fdf4cc70-10e9-11e3-bed0-27ba85b87bf8] Prepare completed. Receiving 3 files(28437084 bytes) INFO [STREAM-IN-/192.168.1.163] ...snip... [Stream #fdf4cc70-10e9-11e3-bed0-27ba85b87bf8] Session with /192.168.1.163 is complete INFO [STREAM-IN-/192.168.1.163] ...snip... [Stream #fdf4cc70-10e9-11e3-bed0-27ba85b87bf8] All sessions completed
Through JMX interface, you can get status of all currently running
Stream Plans. Streaming 2.0 also added feature to export stream events through JMX Notification, so it is easy to build streaming monitoring application of your own.
For better performance
In the previous versions, file transfer between two nodes happens as described below:
When the file is sent, the sender waits for ACK, then proceed sending next files. The sender disconnects every time it receives ACK.
Streaming 2.0 turns above into pipeline on the same connection, so the sender does not have to wait for ACK to transfer next files.
Still we have several points that can be improved in the code for better performance, so at this point performance gain is limited, but I hope we will see more performance gain in the future.
Since the protocol is re-designed, we can add new feature to support streaming of older version of SSTable files. So in the future upgrade of C*, you don't need to perform
upgradesstables before streaming occurs.