<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="bbPress/1.0.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>DataStax Support Forums &#187; Topic: Unbalanced ring on Cassandra side of Brisk cluster</title>
		<link>http://www.datastax.com/support-forums/topic/unbalanced-ring-on-cassandra-side-of-brisk-cluster</link>
		<description>Software, Support, and Training for Apache Cassandra</description>
		<language>en-US</language>
		<pubDate>Wed, 22 May 2013 20:08:09 +0000</pubDate>
		<generator>http://bbpress.org/?v=1.0.3</generator>
		<textInput>
			<title><![CDATA[Search]]></title>
			<description><![CDATA[Search all topics from these forums.]]></description>
			<name>q</name>
			<link>http://www.datastax.com/support-forums/search.php</link>
		</textInput>
		<atom:link href="http://www.datastax.com/support-forums/rss/topic/unbalanced-ring-on-cassandra-side-of-brisk-cluster" rel="self" type="application/rss+xml" />

		<item>
			<title>joaquin on "Unbalanced ring on Cassandra side of Brisk cluster"</title>
			<link>http://www.datastax.com/support-forums/topic/unbalanced-ring-on-cassandra-side-of-brisk-cluster#post-393</link>
			<pubDate>Thu, 28 Jul 2011 20:55:10 +0000</pubDate>
			<dc:creator>joaquin</dc:creator>
			<guid isPermaLink="false">393@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;After investigating this further, this is a common side effect of not having memory overcommit enabled for Brisk. Since Brisk bundles many components under the same JVM, overcommit is a requirement.&#60;/p&#62;
&#60;p&#62;Could you run:&#60;br /&#62;
echo 1 &#124; sudo tee /proc/sys/vm/overcommit_memory&#60;/p&#62;
&#60;p&#62;and let us know if this problem returns?&#60;/p&#62;
&#60;p&#62;After making this change, do your nodetool operations now complete successfully?&#60;/p&#62;
&#60;p&#62;Thanks,&#60;br /&#62;
Joaquin
&#60;/p&#62;</description>
		</item>
		<item>
			<title>joaquin on "Unbalanced ring on Cassandra side of Brisk cluster"</title>
			<link>http://www.datastax.com/support-forums/topic/unbalanced-ring-on-cassandra-side-of-brisk-cluster#post-391</link>
			<pubDate>Thu, 28 Jul 2011 20:36:33 +0000</pubDate>
			<dc:creator>joaquin</dc:creator>
			<guid isPermaLink="false">391@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Could you add the line:&#60;br /&#62;
JVM_OPTS=&#34;$JVM_OPTS -XX:ErrorFile=/path/to/file&#34;&#60;/p&#62;
&#60;p&#62;at some point in cassandra-env.sh. You should find this under /etc/brisk/cassandra/cassandra-env.sh if you installed it via the packages or $BRISK_HOME/resources/cassandra/conf/cassandra-env.sh.&#60;/p&#62;
&#60;p&#62;This way we will be able to easily find your log. If this is not set, then typically the log will be in the folder that you called the process from. I'm having others look into this and will let you know what we find.&#60;/p&#62;
&#60;p&#62;Thanks,&#60;br /&#62;
joaquin
&#60;/p&#62;</description>
		</item>
		<item>
			<title>blueplastic on "Unbalanced ring on Cassandra side of Brisk cluster"</title>
			<link>http://www.datastax.com/support-forums/topic/unbalanced-ring-on-cassandra-side-of-brisk-cluster#post-386</link>
			<pubDate>Thu, 28 Jul 2011 19:00:38 +0000</pubDate>
			<dc:creator>blueplastic</dc:creator>
			<guid isPermaLink="false">386@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;You're right Joaquin. That EOFException error was not critical. It actually happened during the Cassandra start, not during the crash. Jonathan Ellis also said it was harmless on the Cassandra mailing list.&#60;/p&#62;
&#60;p&#62;However, we have tried running repair and compaction on the unbalanced node with no luck. Both repair and compaction seem to be crashing the Cassandra java process on the node. I eventually got repair to complete successfully on the node.&#60;/p&#62;
&#60;p&#62;Then yesterday, while running compaction, I got the following error in OpsCenter in the evening:&#60;br /&#62;
7/27/2011 11:30pm Alert Node reported as being down 10.2.206.x&#60;/p&#62;
&#60;p&#62;However, the system.log file for that node shows nothing but informational messages at that time:&#60;br /&#62;
 INFO [ScheduledTasks:1] 2011-07-28 06:29:21,429 GCInspector.java (line 128) GC for ParNew: 213 ms, 147241608 reclaimed leaving 2152005360 used; max is 4030726144&#60;br /&#62;
 INFO [ScheduledTasks:1] 2011-07-28 06:29:28,622 GCInspector.java (line 128) GC for ParNew: 219 ms, 147301832 reclaimed leaving 2188187360 used; max is 4030726144&#60;br /&#62;
 INFO [ScheduledTasks:1] 2011-07-28 06:29:39,666 GCInspector.java (line 128) GC for ParNew: 240 ms, 148102840 reclaimed leaving 2222749344 used; max is 4030726144&#60;/p&#62;
&#60;p&#62;After that, there is a long gap in the log until I restarted Cassandra this morning. Note that the timestamp in the log differs from what OpsCenter reports, maybe because of EC2 or OpsCenter time zone settings, but both refer to the same moment.&#60;/p&#62;
&#60;p&#62;There was definitely something quirky with the Cassandra process on the node this morning. I couldn't run nodetool ring on it:&#60;br /&#62;
Caused by: java.net.ConnectException: Connection refused&#60;br /&#62;
        at java.net.PlainSocketImpl.socketConnect(Native Method)&#60;br /&#62;
        at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)&#60;/p&#62;
&#60;p&#62;I restarted the Cassandra process this morning and now OpsCenter has it marked as up, but I think Compaction is incomplete now.&#60;/p&#62;
&#60;p&#62;This shows me restarting the Cassandra process this morning:&#60;/p&#62;
&#60;p&#62;INFO [ScheduledTasks:1] 2011-07-28 06:29:21,429 GCInspector.java (line 128) GC for ParNew: 213 ms, 147241608 reclaimed leaving 2152005360 used; max is 4030726144&#60;br /&#62;
 INFO [ScheduledTasks:1] 2011-07-28 06:29:28,622 GCInspector.java (line 128) GC for ParNew: 219 ms, 147301832 reclaimed leaving 2188187360 used; max is 4030726144&#60;br /&#62;
 INFO [ScheduledTasks:1] 2011-07-28 06:29:39,666 GCInspector.java (line 128) GC for ParNew: 240 ms, 148102840 reclaimed leaving 2222749344 used; max is 4030726144&#60;br /&#62;
 INFO [main] 2011-07-28 17:48:28,584 AbstractCassandraDaemon.java (line 78) Logging initialized&#60;br /&#62;
 INFO [main] 2011-07-28 17:48:28,617 AbstractCassandraDaemon.java (line 96) Heap size: 3894411264/3894411264&#60;br /&#62;
 INFO [main] 2011-07-28 17:48:34,199 CLibrary.java (line 106) JNA mlockall successful&#60;/p&#62;
&#60;p&#62;I also checked the Linux syslog around that time (July 28 6:29am) and it looks like the OS dumped the java process?&#60;/p&#62;
&#60;p&#62;Jul 28 06:25:01 ip-10-2-206-127 CRON[15405]: (root) CMD (command -v debian-sa1 &#38;gt; /dev/null &#38;amp;&#38;amp; debian-sa1 1 1)&#60;br /&#62;
Jul 28 06:25:01 ip-10-2-206-127 CRON[15406]: (root) CMD (test -x /usr/sbin/anacron &#124;&#124; ( cd / &#38;amp;&#38;amp; run-parts --report /etc/cron.daily ))&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627014] apt-get invoked oom-killer: gfp_mask=0x201da, order=0, oom_adj=0&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627020] apt-get cpuset=/ mems_allowed=0&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627025] Pid: 15420, comm: apt-get Not tainted 2.6.35-30-virtual #54-Ubuntu&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627027] Call Trace:&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627041]  [&#38;lt;ffffffff810aefbd&#38;gt;] ? cpuset_print_task_mems_allowed+0x9d/0xb0&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627048]  [&#38;lt;ffffffff81104451&#38;gt;] dump_header+0x81/0xc0&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627052]  [&#38;lt;ffffffff81104511&#38;gt;] oom_kill_process+0x81/0x180&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627056]  [&#38;lt;ffffffff81104a48&#38;gt;] __out_of_memory+0x58/0xd0&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627059]  [&#38;lt;ffffffff81104b46&#38;gt;] out_of_memory+0x86/0x1c0&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627063]  [&#38;lt;ffffffff811085ae&#38;gt;] __alloc_pages_slowpath+0x58e/0x5a0&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627068]  [&#38;lt;ffffffff8110872c&#38;gt;] __alloc_pages_nodemask+0x16c/0x1d0&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627074]  [&#38;lt;ffffffff8113aaaa&#38;gt;] alloc_pages_current+0x9a/0x100&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627078]  [&#38;lt;ffffffff81101b57&#38;gt;] __page_cache_alloc+0x87/0x90&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627081]  [&#38;lt;ffffffff8110168e&#38;gt;] ? find_get_page+0x1e/0x90&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627085]  [&#38;lt;ffffffff81103113&#38;gt;] filemap_fault+0x1b3/0x450&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627091]  [&#38;lt;ffffffff8111e3d4&#38;gt;] __do_fault+0x54/0x560&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627095]  [&#38;lt;ffffffff81121379&#38;gt;] handle_mm_fault+0x1b9/0x440&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627101]  [&#38;lt;ffffffff810072f2&#38;gt;] ? check_events+0x12/0x20&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627107]  [&#38;lt;ffffffff815aaa35&#38;gt;] do_page_fault+0x125/0x350&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627111]  [&#38;lt;ffffffff815a75b5&#38;gt;] page_fault+0x25/0x30&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627114] Mem-Info:&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627116] Node 0 DMA per-cpu:&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627119] CPU    0: hi:    0, btch:   1 usd:   0&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627122] CPU    1: hi:    0, btch:   1 usd:   0&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627124] Node 0 DMA32 per-cpu:&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627127] CPU    0: hi:  186, btch:  31 usd:  41&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627129] CPU    1: hi:  186, btch:  31 usd: 172&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627131] Node 0 Normal per-cpu:&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627134] CPU    0: hi:  186, btch:  31 usd:   0&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627136] CPU    1: hi:  186, btch:  31 usd:  30&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627141] active_anon:665806 inactive_anon:133214 isolated_anon:0&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627142]  active_file:259 inactive_file:1039 isolated_file:0&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627143]  unevictable:1039172 dirty:0 writeback:259 unstable:0&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627144]  free:8797 slab_reclaimable:1978 slab_unreclaimable:3119&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627146]  mapped:4552 shmem:49 pagetables:18095 bounce:0&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627148] Node 0 DMA free:7808kB min:20kB low:24kB high:28kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15712kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627159] lowmem_reserve[]: 0 4024 7559 7559&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627165] Node 0 DMA32 free:22208kB min:5916kB low:7392kB high:8872kB active_anon:2135320kB inactive_anon:427136kB active_file:888kB inactive_file:3612kB unevictable:1425952kB isolated(anon):0kB isolated(file):0kB present:4120800kB mlocked:1425952kB dirty:0kB writeback:1036kB mapped:2532kB shmem:4kB slab_reclaimable:756kB slab_unreclaimable:4656kB kernel_stack:1240kB pagetables:50796kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:6976 all_unreclaimable? no&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627176] lowmem_reserve[]: 0 0 3535 3535&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627182] Node 0 Normal free:5172kB min:5196kB low:6492kB high:7792kB active_anon:527904kB inactive_anon:105720kB active_file:148kB inactive_file:544kB unevictable:2730736kB isolated(anon):0kB isolated(file):0kB present:3619840kB mlocked:2730736kB dirty:16kB writeback:0kB mapped:15676kB shmem:192kB slab_reclaimable:7156kB slab_unreclaimable:7820kB kernel_stack:2208kB pagetables:21584kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1088 all_unreclaimable? yes&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627192] lowmem_reserve[]: 0 0 0 0&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627197] Node 0 DMA: 2*4kB 1*8kB 1*16kB 1*32kB 1*64kB 2*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 1*4096kB = 7808kB&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627212] Node 0 DMA32: 672*4kB 662*8kB 501*16kB 4*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 1*4096kB = 22208kB&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627226] Node 0 Normal: 221*4kB 28*8kB 0*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 5172kB&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627257] 5640 total pagecache pages&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627260] 0 pages in swap cache&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627263] Swap cache stats: add 0, delete 0, find 0/0&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627264] Free swap  = 0kB&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.627266] Total swap = 0kB&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.652382] 1966064 pages RAM&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.652387] 54111 pages reserved&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.652388] 14176 pages shared&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.652390] 1897463 pages non-shared&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.652394] Out of memory: kill process 2126 (java) score 20305 or a child&#60;br /&#62;
Jul 28 06:29:44 ip-10-2-206-127 kernel: [576488.652414] Killed process 2126 (java) vsz:951250956kB, anon-rss:4554448kB, file-rss:15276kB&#60;/p&#62;
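&#60;p&#62;To put those last lines in perspective, here is a small Python sketch that converts the oom-killer figures into GiB. It is only an illustration: the 4 kB page size is an assumption (the usual x86-64 default), and the helper name field_kb is made up for this example.&#60;/p&#62;

```python
import re

# Lines copied from the syslog excerpt above.
killed_line = "Killed process 2126 (java) vsz:951250956kB, anon-rss:4554448kB, file-rss:15276kB"
ram_line = "1966064 pages RAM"

PAGE_KB = 4  # assumption: 4 kB pages, the usual x86-64 default
KB_PER_GIB = 1024 * 1024


def field_kb(line, name):
    """Extract a 'name:NNNkB' field from the oom-killer line as an int."""
    match = re.search(name + r":(\d+)kB", line)
    if match is None:
        raise ValueError(f"field {name!r} not found")
    return int(match.group(1))


# Resident memory of the killed java process: anonymous plus file-backed pages.
rss_kb = field_kb(killed_line, "anon-rss") + field_kb(killed_line, "file-rss")
# Total RAM as reported by the kernel, converted from pages to kB.
ram_kb = int(ram_line.split()[0]) * PAGE_KB

rss_gib = rss_kb / KB_PER_GIB
ram_gib = ram_kb / KB_PER_GIB
print(f"java resident memory: {rss_gib:.2f} GiB of {ram_gib:.2f} GiB RAM")
```

&#60;p&#62;The same dump reports Total swap = 0kB, so the kernel had no swap to fall back on once RAM ran out.&#60;/p&#62;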
&#60;p&#62;I don't see a JVM crash log (hs_err_pid[pid].log) in ~/brisk/resources/cassandra/bin or /tmp.&#60;/p&#62;
&#60;p&#62;So, I guess my question is: Why did the OS kill Java/Cassandra during compaction? Any more thoughts on why the ring would be so unbalanced?&#60;/p&#62;
&#60;p&#62;Also, I'm starting to suspect a bug in the random partitioner in 0.8.1? Has anyone loaded a large amount of data into 0.8.1?
&#60;/p&#62;</description>
		</item>
		<item>
			<title>joaquin on "Unbalanced ring on Cassandra side of Brisk cluster"</title>
			<link>http://www.datastax.com/support-forums/topic/unbalanced-ring-on-cassandra-side-of-brisk-cluster#post-352</link>
			<pubDate>Tue, 26 Jul 2011 17:56:26 +0000</pubDate>
			<dc:creator>joaquin</dc:creator>
			<guid isPermaLink="false">352@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;That above error just means that a message either came in corrupt, got cut off, or another application pinged the Cassandra port.&#60;/p&#62;
&#60;p&#62;If you could use OpsCenter, MBeans, or nodetool netstats to watch the repair that would be great. Also if there are any other errors, that's something to look into, but this one is benign.&#60;/p&#62;
&#60;p&#62;Thanks,&#60;br /&#62;
Joaquin
&#60;/p&#62;</description>
		</item>
		<item>
			<title>blueplastic on "Unbalanced ring on Cassandra side of Brisk cluster"</title>
			<link>http://www.datastax.com/support-forums/topic/unbalanced-ring-on-cassandra-side-of-brisk-cluster#post-319</link>
			<pubDate>Thu, 21 Jul 2011 18:32:09 +0000</pubDate>
			<dc:creator>blueplastic</dc:creator>
			<guid isPermaLink="false">319@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Looks like the Repair failed with this error on the 1st node (with 900+GB of data):&#60;/p&#62;
&#60;p&#62;ERROR [Thread-23] 2011-07-21 15:48:43,868 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[Thread-23,5,main]&#60;br /&#62;
java.io.IOError: java.io.EOFException&#60;br /&#62;
	at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:78)&#60;br /&#62;
Caused by: java.io.EOFException&#60;br /&#62;
	at java.io.DataInputStream.readInt(DataInputStream.java:375)&#60;br /&#62;
	at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:66)&#60;/p&#62;
&#60;p&#62;There's just a bunch of informational messages about Gossip before this.&#60;/p&#62;
&#60;p&#62;Any ideas about what could have caused this?
&#60;/p&#62;</description>
		</item>
		<item>
			<title>joaquin on "Unbalanced ring on Cassandra side of Brisk cluster"</title>
			<link>http://www.datastax.com/support-forums/topic/unbalanced-ring-on-cassandra-side-of-brisk-cluster#post-318</link>
			<pubDate>Wed, 20 Jul 2011 18:38:54 +0000</pubDate>
			<dc:creator>joaquin</dc:creator>
			<guid isPermaLink="false">318@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Were you running at QUORUM or ANY writes? This can also affect it, probably not as much as you're seeing, but if the writes never made it across to the other two nodes there would be log errors on machines dropping messages to those two nodes.&#60;/p&#62;
&#60;p&#62;Let us know when the repairs are done!
&#60;/p&#62;</description>
		</item>
		<item>
			<title>tamalex on "Unbalanced ring on Cassandra side of Brisk cluster"</title>
			<link>http://www.datastax.com/support-forums/topic/unbalanced-ring-on-cassandra-side-of-brisk-cluster#post-317</link>
			<pubDate>Tue, 19 Jul 2011 22:59:29 +0000</pubDate>
			<dc:creator>tamalex</dc:creator>
			<guid isPermaLink="false">317@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Thanks, Joaquin. I'll let you know how it turns out.&#60;/p&#62;
&#60;p&#62;I also want to let you know that I saw this load imbalance during the writing process. We had very very few deletes, and no nodes down during the writing process, so I'm not sure why we should even have to be running repair.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>joaquin on "Unbalanced ring on Cassandra side of Brisk cluster"</title>
			<link>http://www.datastax.com/support-forums/topic/unbalanced-ring-on-cassandra-side-of-brisk-cluster#post-316</link>
			<pubDate>Tue, 19 Jul 2011 22:42:55 +0000</pubDate>
			<dc:creator>joaquin</dc:creator>
			<guid isPermaLink="false">316@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Hello,&#60;/p&#62;
&#60;p&#62;Yes, Node 0 will create more data than it needs when doing repairs, but once it's done, it should shrink down appropriately. However, 900 GB does seem a bit high, but hopefully it will shrink back down momentarily. Do keep us posted on the size of this node.&#60;/p&#62;
&#60;p&#62;As for the nodes with just 128 GB of data, could you run repair on those nodes? Repair is IO intensive, so running staggered repairs is definitely advised.&#60;/p&#62;
&#60;p&#62;Once these operations are complete, then let us know if you still see the same issue. That will give us better ground to determine where the problem would lie.&#60;/p&#62;
&#60;p&#62;Thanks,&#60;br /&#62;
Joaquin
&#60;/p&#62;</description>
		</item>
		<item>
			<title>tamalex on "Unbalanced ring on Cassandra side of Brisk cluster"</title>
			<link>http://www.datastax.com/support-forums/topic/unbalanced-ring-on-cassandra-side-of-brisk-cluster#post-315</link>
			<pubDate>Tue, 19 Jul 2011 21:05:28 +0000</pubDate>
			<dc:creator>tamalex</dc:creator>
			<guid isPermaLink="false">315@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;My team is running Brisk v1 beta2 on 12 nodes in EC2: 8 Cassandra nodes in DC1 and 4 Brisk nodes in DC2. We wrote a few TB of data to the cluster via Hector, as we have done in the past, and unfortunately the load is very unbalanced. Every key is the same size and we are using RandomPartitioner.&#60;/p&#62;
&#60;p&#62;There are two replicas of data in DC1 and one replica in DC2. The load amount in DC2 makes sense (about 250GB per node). DC1 should also have about 250GB per node (since there is twice the data and twice the number of nodes), but as can be seen below two nodes have an inordinate amount of data and the other 6 have only about 128GB:&#60;/p&#62;
&#60;p&#62;Address         DC          Rack        Status State   Load            Owns    Token&#60;br /&#62;
                                                                               148873535527910577765226390751398592512&#60;br /&#62;
10.2.206.x    DC1         RAC1        Up     Normal  901.6 GB        12.50%  0&#60;br /&#62;
10.116.230.x  DC2         RAC1        Up     Normal  258.23 GB       6.25%   10633823966279326983230456482242756608&#60;br /&#62;
10.110.6.x    DC1         RAC1        Up     Normal  129.08 GB       6.25%   21267647932558653966460912964485513216&#60;br /&#62;
10.2.38.x      DC1         RAC1        Up     Normal  128.51 GB       12.50%  42535295865117307932921825928971026432&#60;br /&#62;
10.114.39.x   DC2         RAC1        Up     Normal  257.32 GB       6.25%   53169119831396634916152282411213783040&#60;br /&#62;
10.210.27.x   DC1         RAC1        Up     Normal  128.67 GB       6.25%   63802943797675961899382738893456539648&#60;br /&#62;
10.207.39.x   DC1         RAC2        Up     Normal  643.14 GB       12.50%  85070591730234615865843651857942052864&#60;br /&#62;
10.85.157.x    DC2         RAC1        Up     Normal  256.78 GB       6.25%   95704415696513942849074108340184809472&#60;br /&#62;
10.2.209.x    DC1         RAC2        Up     Normal  128.96 GB       6.25%   106338239662793269832304564822427566080&#60;br /&#62;
10.96.74.x    DC1         RAC2        Up     Normal  128.3 GB        12.50%  127605887595351923798765477786913079296&#60;br /&#62;
10.194.205.x  DC2         RAC1        Up     Normal  257.15 GB       6.25%   138239711561631250781995934269155835904&#60;br /&#62;
10.201.194.x   DC1         RAC2        Up     Normal  129.46 GB       6.25%   148873535527910577765226390751398592512  &#60;/p&#62;
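&#60;p&#62;As a rough sanity check on what I expected, here is a minimal Python sketch that derives the even per-node load for DC1 from the DC2 total and compares each DC1 node against it. The loads are copied from the ring output above; the function name is made up for this example, and the replica counts are just the ones I described (two in DC1, one in DC2).&#60;/p&#62;

```python
# Loads in GB copied from the nodetool ring output above.
dc1_loads = [901.6, 129.08, 128.51, 128.67, 643.14, 128.96, 128.3, 129.46]
dc2_loads = [258.23, 257.32, 256.78, 257.15]

# Assumption from the post: DC2 holds one replica, DC1 holds two.
DC1_REPLICAS = 2
DC2_REPLICAS = 1


def expected_dc1_load_per_node(dc1, dc2):
    """Estimate the even per-node load in DC1 from the DC2 total."""
    one_replica_gb = sum(dc2) / DC2_REPLICAS
    return one_replica_gb * DC1_REPLICAS / len(dc1)


expected = expected_dc1_load_per_node(dc1_loads, dc2_loads)
# Ratio of each node's actual load to the even share.
ratios = [load / expected for load in dc1_loads]

print(f"expected DC1 load per node: {expected:.1f} GB")
for load, ratio in zip(dc1_loads, ratios):
    print(f"{load:8.2f} GB is {ratio:.2f}x the expected load")
```

&#60;p&#62;A ratio close to 1.0 would mean a node carries its even share; anything far from that is where the imbalance sits.&#60;/p&#62;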
&#60;p&#62;I should also note that the first node used to have 640GB of load until the instance went down and we needed to run repair on a new instance in its place. Repair still hasn't finished running on it, and we're hoping it will get back down to 640GB when it does.&#60;/p&#62;
&#60;p&#62;Any ideas why this may have happened?&#60;/p&#62;
&#60;p&#62;Here is Sameer's note on this as well:&#60;br /&#62;
FYI - This manual reordering of the DCs and RACs might make it easier to see how the tokens are arranged. Pretty sure that the token ranges are picked correctly. Ignore the Owns column, b/c it is not multi-datacenter aware (so it thinks all of the nodes are in one ring as opposed to two (DC1 &#38;amp; DC2)).&#60;/p&#62;
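&#60;p&#62;To double-check that the tokens are evenly spaced, here is a minimal Python sketch. It assumes the usual RandomPartitioner token space of 0 to 2**127; the helper name even_tokens is made up, and the DC2 offset of 2**127 // 16 is simply what our ring output shows, not a documented rule.&#60;/p&#62;

```python
# RandomPartitioner tokens span 0 .. 2**127.
RING_SIZE = 2**127


def even_tokens(node_count, offset=0):
    """Return node_count evenly spaced tokens, shifted by offset."""
    step = RING_SIZE // node_count
    return [offset + i * step for i in range(node_count)]


# DC1: 8 nodes starting at token 0.
dc1_tokens = even_tokens(8)
# DC2: 4 nodes; the offset is read off the ring output (an assumption).
dc2_tokens = even_tokens(4, offset=RING_SIZE // 16)

# Print the combined ring in token order for comparison with nodetool ring.
for token in sorted(dc1_tokens + dc2_tokens):
    print(token)
```

&#60;p&#62;The printed values can be compared line by line with the Token column of the ring output.&#60;/p&#62;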
&#60;p&#62;Here is what the nodetool ring output looked like before we replaced the 1st node (643 GB) with new hardware. After running repair on it, for some reason, to our dismay, it re-spawned as a 900+ GB node.&#60;/p&#62;
&#60;p&#62;Address         DC          Rack        Status State   Load            Owns    Token&#60;br /&#62;
                                                                               148873535527910577765226390751398592512&#60;br /&#62;
10.192.143.x       DC1         RAC1        Up     Normal  643.42 GB       12.50%  0&#60;br /&#62;
10.192.171.x    DC1         RAC1        Up     Normal  128.96 GB       6.25%   21267647932558653966460912964485513216&#60;br /&#62;
10.210.95.x       DC1         RAC1        Up     Normal  128.34 GB       12.50%  42535295865117307932921825928971026432&#60;br /&#62;
10.211.19.x        DC1         RAC1        Up     Normal  128.55 GB       6.25%   63802943797675961899382738893456539648&#60;br /&#62;
10.68.58.x         DC1         RAC2        Up     Normal  643.05 GB       12.50%  85070591730234615865843651857942052864&#60;br /&#62;
10.110.31.x        DC1         RAC2        Up     Normal  128.84 GB       6.25%   106338239662793269832304564822427566080&#60;br /&#62;
10.96.58.x        DC1         RAC2        Up     Normal  128.11 GB       12.50%  127605887595351923798765477786913079296&#60;br /&#62;
10.210.195.x       DC1         RAC2        Up     Normal  129.33 GB       6.25%   148873535527910577765226390751398592512&#60;br /&#62;
10.114.138.x      DC2         RAC1        Up     Normal  258.04 GB       6.25%   10633823966279326983230456482242756608&#60;br /&#62;
10.203.79.x       DC2         RAC1        Up     Normal  257.14 GB       6.25%   53169119831396634916152282411213783040&#60;br /&#62;
10.242.209.x      DC2         RAC1        Up     Normal  256.58 GB       6.25%   95704415696513942849074108340184809472&#60;br /&#62;
10.38.25.x        DC2         RAC1        Up     Normal  257.08 GB       6.25%   138239711561631250781995934269155835904
&#60;/p&#62;</description>
		</item>

	</channel>
</rss>
