Is BackPressureThreshold configurable via dse.yaml?

Or can it only be done at runtime via JMX?
It is not currently possible to configure backpressure via dse.yaml, but we're considering allowing it in one of the next versions.
Follow-up question: is it possible that the current indexing system sleeps when the queue drops to zero? (Sorry, but your stuff isn't open source, so I can't self-serve here.) I'm seeing backpressure kick in when a test run starts up: in the previous 5 minutes there might have been zero writes, and then it starts putting a couple thousand writes per second into Cassandra. It quickly drops to single digits in each queue... seems suspiciously like polling.
The indexing system is implemented via blocking queues for task distribution and latches for producer/consumer synchronization on commit/flush: it is not pull-based, and there are no sleep calls.
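To illustrate the pattern (a minimal sketch of blocking queues plus latches, not DSE's actual implementation, which isn't public): consumers park on take() when the queue is empty, so there is nothing to poll and nothing to sleep on.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.LinkedBlockingQueue;

    public class IndexWorkerSketch {
        // Tasks are handed to workers through a blocking queue; take()
        // parks the thread until a task arrives -- no sleep/poll loop.
        private final BlockingQueue<Runnable> tasks = new LinkedBlockingQueue<>();

        public void submit(Runnable task) throws InterruptedException {
            tasks.put(task);
        }

        // On commit/flush, a latch lets the producer wait until the
        // workers have processed marker tasks (a hypothetical scheme,
        // just to show where latches fit in).
        public void flush(int workers) throws InterruptedException {
            CountDownLatch latch = new CountDownLatch(workers);
            for (int i = 0; i < workers; i++) {
                tasks.put(latch::countDown);
            }
            latch.await();
        }

        public void startWorker() {
            Thread worker = new Thread(() -> {
                try {
                    while (true) {
                        tasks.take().run(); // blocks, never busy-waits
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            worker.setDaemon(true);
            worker.start();
        }
    }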
I'm not sure I follow the problem, though: could you elaborate with a step-by-step description or, even better, produce a test case I can run?
What I'm seeing is mutation messages dropped on one or two nodes, which results in Solr queries being inconsistent from node to node. The backpressure also seems to happen most frequently when the cluster goes from very few or no writes to ~1000 writes/second: the index queue size spikes, backpressure kicks in, and mutations are dropped. Within seconds, at the same data rate, there are no issues with mutations dropping, and tpstats for MutationStage shows very low numbers, usually 0, with some in the 10s or 20s.
I'm not sure how to generate an easy-to-reproduce test case. Is there any way to use the Cassandra stress tool and have a Solr index associated with the data?
Here's a snippet of the logs for one test run where we went from 0 pending MutationStage messages to 43k in under 16 seconds. At the same time, other nodes don't see anywhere near that number of pending messages, and I don't see anything in the logs on that cluster that indicates the source of the spike in data.
tpstats across the cluster for MutationStage:
Associated logs from 131.227
I can't tell why you're getting those spikes, but the dropped messages happen because backpressure pauses the Cassandra write stage while indexing catches up; if that pause lasts too long, mutations waiting further back in the queue expire and get dropped.
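A minimal sketch of that kind of gate (illustrative only, not DSE's actual code): bounding the indexing queue makes the write path block whenever indexing falls behind, and writes stalled long enough at a higher level would then time out.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    public class BackPressureGateSketch {
        // A bounded queue is the simplest backpressure gate: put() blocks
        // the caller (the "paused" write stage) once the bound is reached.
        private final BlockingQueue<String> indexQueue;

        public BackPressureGateSketch(int threshold) {
            this.indexQueue = new ArrayBlockingQueue<>(threshold);
        }

        // Called from the write path; stalls while indexing lags behind.
        public void onWrite(String doc) throws InterruptedException {
            indexQueue.put(doc);
        }
    }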
The bottom line: Lucene indexing can't keep up with your insert rate, so here are a few things you could try to speed it up (see the config snippet after the list):
1) Increase the IndexWriter max RAM buffer size and/or max buffered docs.
2) Increase the soft commit interval.
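Both knobs live in the core's solrconfig.xml; a sketch with illustrative values only (tune them to your workload):

    <!-- solrconfig.xml -- example values, not recommendations -->
    <indexConfig>
      <ramBufferSizeMB>512</ramBufferSizeMB>      <!-- IndexWriter max RAM buffer -->
      <maxBufferedDocs>100000</maxBufferedDocs>   <!-- flush after this many docs -->
    </indexConfig>

    <updateHandler>
      <autoSoftCommit>
        <maxTime>10000</maxTime>  <!-- soft commit interval, in milliseconds -->
      </autoSoftCommit>
    </updateHandler>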
On our side, we'll work on improving the backpressure mechanism to make it smarter and less drastic.
Hope that helps, keep me posted.
Our soft commit is already 10 seconds, so pretty large. Our max buffered docs is also 100k -- our docs are only ~1600 bytes.
We can chew through hundreds of docs a sec, so the 1000 setting on backpressure kicks in way too easily for any spike.
Strong request to make this configurable via dse.yaml in a future version (the next one?).
For now we're going to have to put a process in place to poll and set it via JMX.
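For anyone doing the same until a dse.yaml option lands, the javax.management plumbing looks roughly like the sketch below. The ObjectName and attribute name here are placeholders, not the real ones: browse a node in jconsole to find the MBean your DSE version actually exposes.

    import javax.management.Attribute;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class SetBackPressureThreshold {
        public static void main(String[] args) throws Exception {
            // 7199 is the default Cassandra JMX port; adjust host/port as needed.
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                // HYPOTHETICAL names -- replace with the MBean and attribute
                // shown in jconsole for your DSE version.
                ObjectName bean = new ObjectName("com.datastax.bdp:type=SearchIndexing");
                mbs.setAttribute(bean, new Attribute("BackPressureThreshold", 100000));
            }
        }
    }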
The 1000 backpressure limit is actually computed over the mean of all thread queues; in other words, you can roughly have 1000 * num_threads queued indexing tasks (with, say, 8 indexing threads, about 8000 tasks).
Anyway, what I'm missing here is: does increasing the backpressure threshold actually help?
Absolutely it helps... I haven't had a single dropped message since increasing it to 100k (I was making sure the two issues were related), and my average MutationStage pending count sits at < 1 in tpstats... nominal "peaks" are in the 20s to 30s.
I'm glad to hear that: that's why we made backpressure configurable at runtime :)
Just be aware that very large queues can create real memory pressure: at ~1600 bytes per doc, a 100k threshold across, say, 8 indexing threads could hold 100,000 * 8 * 1600 bytes, roughly 1.2 GB, in queued documents.