<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="bbPress/1.0.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>DataStax Support Forums &#187; Topic: OpsCenter agent stuck on maxed thrift queue</title>
		<link>http://www.datastax.com/support-forums/topic/opscenter-agent-stuck-on-maxed-thrift-queue</link>
		<description>Software, Support, and Training for Apache Cassandra</description>
		<language>en-US</language>
		<pubDate>Wed, 22 May 2013 03:11:41 +0000</pubDate>
		<generator>http://bbpress.org/?v=1.0.3</generator>
		<textInput>
			<title><![CDATA[Search]]></title>
			<description><![CDATA[Search all topics from these forums.]]></description>
			<name>q</name>
			<link>http://www.datastax.com/support-forums/search.php</link>
		</textInput>
		<atom:link href="http://www.datastax.com/support-forums/rss/topic/opscenter-agent-stuck-on-maxed-thrift-queue" rel="self" type="application/rss+xml" />

		<item>
			<title>cooptron on "OpsCenter agent stuck on maxed thrift queue"</title>
			<link>http://www.datastax.com/support-forums/topic/opscenter-agent-stuck-on-maxed-thrift-queue#post-8000</link>
			<pubDate>Mon, 17 Dec 2012 21:26:13 +0000</pubDate>
			<dc:creator>cooptron</dc:creator>
			<guid isPermaLink="false">8000@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;We do have secondary indexes. I assume that would affect the repair details as well, since repairs depend on compactions? We look forward to the new version!&#60;/p&#62;
&#60;p&#62;Thanks,&#60;br /&#62;
Andrew
&#60;/p&#62;</description>
		</item>
		<item>
			<title>nickmbailey on "OpsCenter agent stuck on maxed thrift queue"</title>
			<link>http://www.datastax.com/support-forums/topic/opscenter-agent-stuck-on-maxed-thrift-queue#post-7999</link>
			<pubDate>Mon, 17 Dec 2012 19:44:37 +0000</pubDate>
			<dc:creator>nickmbailey</dc:creator>
			<guid isPermaLink="false">7999@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Andrew,&#60;/p&#62;
&#60;p&#62;Do any of your CFs have secondary indexes? There is a known bug where secondary index compaction tasks can cause the compaction/streaming details to stall indefinitely. That will be fixed in the upcoming 2.1.3 release.&#60;/p&#62;
&#60;p&#62;-Nick
&#60;/p&#62;</description>
		</item>
		<item>
			<title>cooptron on "OpsCenter agent stuck on maxed thrift queue"</title>
			<link>http://www.datastax.com/support-forums/topic/opscenter-agent-stuck-on-maxed-thrift-queue#post-7998</link>
			<pubDate>Mon, 17 Dec 2012 19:33:03 +0000</pubDate>
			<dc:creator>cooptron</dc:creator>
			<guid isPermaLink="false">7998@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;The symptom is that the repair/compaction/stream information in the cluster views gets &#34;stuck&#34;. The percentages stop moving, and existing repairs do not disappear from OpsCenter even though the Cassandra node is no longer repairing or compacting. No new information shows up in OpsCenter for nodes that start repairs. Basically, that part of the agent appears to stall indefinitely, while the OS stats and basic ring information still work.&#60;/p&#62;
&#60;p&#62;The only ERROR lines I see in the agent log come from the initial configuration. They appear on every restart and seem to come from the auto-discovery of the Thrift port. The agent does connect to JMX on localhost.&#60;/p&#62;
&#60;p&#62;ERROR [Initialization] 2012-12-17 12:24:01,871 MARK HOST AS DOWN TRIGGERED for host 10.1.1.43(10.1.1.43):9160&#60;br /&#62;
ERROR [Initialization] 2012-12-17 12:24:01,872 Pool state on shutdown: &#38;lt;ConcurrentCassandraClientPoolByHost&#38;gt;:{10.1.1.43(10.1.1.43):9160}; IsActive?: true; Active: 0; Blocked: 0; Idle: 0; NumBeforeExhausted: 1&#60;br /&#62;
ERROR [Initialization] 2012-12-17 12:24:01,878 Error when performing thrift operation: #&#38;lt;HectorException me.prettyprint.hector.api.exceptions.HectorException: All host pools marked down. Retry burden pushed out to client.&#38;gt;&#60;br /&#62;
ERROR [Thread-5] 2012-12-17 12:24:01,879 Unable to connect to Cassandra #&#38;lt;HectorException me.prettyprint.hector.api.exceptions.HectorException: All host pools marked down. Retry burden pushed out to client.&#38;gt;&#60;/p&#62;
&#60;p&#62;I turned logging up to DEBUG on our test cluster (we see the same situation there, with far fewer CFs). I can see it collecting metrics on a regular basis, but then, at random, it starts logging the following several times a second. Debug-level logging did not provide any additional information about the cause. I can send full logs if you are interested.&#60;/p&#62;
&#60;p&#62; WARN [Thread-2] 2012-12-17 13:16:13,062 Thrift operation queue is full, discarding thrift operation&#60;br /&#62;
 WARN [Thread-2] 2012-12-17 13:16:13,062 271315 operations dropped so far.&#60;/p&#62;
&#60;p&#62;It doesn't appear that the issue depends on the number of metrics. The &#34;stall&#34; happens in our test cluster with roughly 100 CFs, and I cut down the metrics in production (using ignore_keyspaces in the OpsCenter server config), but the issue persists.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>nickmbailey on "OpsCenter agent stuck on maxed thrift queue"</title>
			<link>http://www.datastax.com/support-forums/topic/opscenter-agent-stuck-on-maxed-thrift-queue#post-7996</link>
			<pubDate>Mon, 17 Dec 2012 17:50:58 +0000</pubDate>
			<dc:creator>nickmbailey</dc:creator>
			<guid isPermaLink="false">7996@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Andrew,&#60;/p&#62;
&#60;p&#62;You are correct that reducing the number of column families you collect metrics for will help with Thrift operations being discarded. However, node information such as compactions and streams is already collected separately from metrics, so those discarded operations shouldn't be affecting that data. Are you seeing compactions/streaming in nodetool that aren't showing up in OpsCenter?&#60;/p&#62;
&#60;p&#62;Are there any other errors in the agent log when you see this issue?&#60;/p&#62;
&#60;p&#62;-Nick
&#60;/p&#62;</description>
		</item>
		<item>
			<title>cooptron on "OpsCenter agent stuck on maxed thrift queue"</title>
			<link>http://www.datastax.com/support-forums/topic/opscenter-agent-stuck-on-maxed-thrift-queue#post-7995</link>
			<pubDate>Mon, 17 Dec 2012 16:58:14 +0000</pubDate>
			<dc:creator>cooptron</dc:creator>
			<guid isPermaLink="false">7995@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;We are seeing an issue with the OpsCenter agent: we stop receiving information about Cassandra activity (repairs, streams, compactions) in the OpsCenter server view, although we still see IO and load information. This correlates with agent log messages stating that the Thrift operation queue is full and Thrift requests are being dropped. If we restart the agent, it sends all information again for a limited time and then starts dropping Thrift operations again. We have quite a few column families (roughly 3000 across all keyspaces).&#60;/p&#62;
&#60;p&#62;I know we could likely fix this by reducing the number of column families we collect metrics for, but are there any tuning knobs to either increase the polling interval between metric collections or increase the size of the Thrift queue? Is there a way to put the operations information (repairs, compactions, streams) into a separate queue so it is not affected by metric collection?&#60;/p&#62;
&#60;p&#62;Thanks,&#60;br /&#62;
Andrew
&#60;/p&#62;</description>
		</item>

	</channel>
</rss>
