ColumnFamilyInputFormat - read and write consistency levels
When running a Hadoop job that uses a Cassandra column family as input, how does setting the read consistency affect the range scan used to get the input? I.e., if we want to guarantee that a Hadoop job runs on the most up-to-date information in a column family, is it better to write at a consistency level of "ALL" so the job can read at a consistency level of "ONE", or to both read and write at a consistency level of "QUORUM"?
(11 posts) (2 voices)-
Posted 1 year ago #
-
This is best summed up by the following: "Note that if W + R > ReplicationFactor, where W is the number of nodes to block for on write, and R the number to block for on reads, you will have strongly consistent behavior"
ALL should only be used in situations where high availability is not required.
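To make the rule concrete, here is a minimal sketch of the W + R > ReplicationFactor check in plain Java. It does not use any Cassandra API; the class and method names (`ConsistencyCheck`, `quorum`, `isStronglyConsistent`) are hypothetical, and the replication factor of 3 is the value reported for this cluster later in the thread.

```java
public class ConsistencyCheck {
    // Number of replicas a QUORUM operation blocks for: floor(RF / 2) + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // True when W + R > RF, so every read overlaps at least one replica
    // that acknowledged the latest write.
    static boolean isStronglyConsistent(int w, int r, int replicationFactor) {
        return w + r > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;          // assumed replication factor
        int q = quorum(rf);  // 2 when rf is 3

        // QUORUM write + QUORUM read: 2 + 2 > 3
        System.out.println("QUORUM/QUORUM: " + isStronglyConsistent(q, q, rf));
        // ALL write + ONE read: 3 + 1 > 3
        System.out.println("ALL/ONE:       " + isStronglyConsistent(rf, 1, rf));
        // ONE write + ONE read: 1 + 1 > 3 is false
        System.out.println("ONE/ONE:       " + isStronglyConsistent(1, 1, rf));
    }
}
```

Both combinations from the question satisfy the inequality. The difference is availability: a write at ALL fails as soon as one replica is down, while QUORUM on both sides still succeeds with one of three replicas unavailable.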
Posted 1 year ago # -
When we run our Hadoop job at read consistency "QUORUM" we get a timeout in the ColumnFamilyInputFormat. Our rows really aren't that big (a few dozen columns at a few KB each). We have a 4-node cluster running Cassandra 1.0.5. We have tried setting the timeout to a minute, as well as decreasing the range batch size, but we still end up with the following error:
java.lang.RuntimeException: TimedOutException()
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.maybeInit(ColumnFamilyRecordReader.java:281)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:295)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$RowIterator.computeNext(ColumnFamilyRecordReader.java:183)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.nextKeyValue(ColumnFamilyRecordReader.java:139)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:456)
at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:647)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:32
Is there something we can check, try, or change to keep from timing out?
Posted 1 year ago # -
What is the replication factor and range batch size? Also, do you have any other metrics for performance of individual nodes? Like what is the OpsCenter output for the nodes when the timeouts occur?
Posted 1 year ago # -
The replication factor is set to 3 and we have dropped the range batch size as low as 32. What metrics would you like, and is there a way to print a report of the metrics?
Posted 1 year ago # -
When we run the job and read with consistency level "QUORUM", the cluster's read latency jumps to over a minute and disk utilization hits 100%.
When we run the same job against a keyspace with a replication factor of 1, reading at CL "ONE", the read latency gets as high as 2 or 3 seconds and disk utilization doesn't get above 55%.
Are there settings we can change to help bring down the disk utilization on replicated keyspaces?
Posted 1 year ago # -
What version of Cassandra is this? This behavior certainly does not sound correct.
Posted 1 year ago # -
1.0.5
Posted 1 year ago # -
We are currently looking into this and we'll update here as soon as we find something. Thanks for the extra details and patience.
Posted 1 year ago # -
Ok, looks like you are getting hit by https://issues.apache.org/jira/browse/CASSANDRA-3551
If you would like to apply the patch to the source and give it a try, that would be super valuable as an external test of the fix.
Posted 1 year ago # -
We applied the patch yesterday and were able to run the Hadoop job reading at a CL of QUORUM. Thanks for the help.
Posted 1 year ago #
