<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="bbPress/1.0.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>DataStax Support Forums &#187; Topic: A lot of disk space used and unbalanced nodes</title>
		<link>http://www.datastax.com/support-forums/topic/a-lot-of-disk-space-used-and-unbalanced-nodes</link>
		<description>Software, Support, and Training for Apache Cassandra</description>
		<language>en-US</language>
		<pubDate>Wed, 22 May 2013 14:19:24 +0000</pubDate>
		<generator>http://bbpress.org/?v=1.0.3</generator>
		<textInput>
			<title><![CDATA[Search]]></title>
			<description><![CDATA[Search all topics from these forums.]]></description>
			<name>q</name>
			<link>http://www.datastax.com/support-forums/search.php</link>
		</textInput>
		<atom:link href="http://www.datastax.com/support-forums/rss/topic/a-lot-of-disk-space-used-and-unbalanced-nodes" rel="self" type="application/rss+xml" />

		<item>
			<title>ithkuil on "A lot of disk space used and unbalanced nodes"</title>
			<link>http://www.datastax.com/support-forums/topic/a-lot-of-disk-space-used-and-unbalanced-nodes#post-151</link>
			<pubDate>Fri, 03 Jun 2011 11:23:22 +0000</pubDate>
			<dc:creator>ithkuil</dc:creator>
			<guid isPermaLink="false">151@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;I was using Cassandra 0.7.6 with replication factor 3, feeding data with Hadoop.&#60;/p&#62;
&#60;p&#62;I wrote a Java client to feed the data without Hadoop and never ran into this issue again. I'll retry later when I upgrade to Cassandra 0.8.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>joaquin on "A lot of disk space used and unbalanced nodes"</title>
			<link>http://www.datastax.com/support-forums/topic/a-lot-of-disk-space-used-and-unbalanced-nodes#post-118</link>
			<pubDate>Wed, 25 May 2011 22:22:24 +0000</pubDate>
			<dc:creator>joaquin</dc:creator>
			<guid isPermaLink="false">118@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;What versions of Cassandra and OpsCenter are you running?&#60;/p&#62;
&#60;p&#62;That much disk usage relative to the actual data is not normal.&#60;/p&#62;
&#60;p&#62;Having one node bigger than the others doesn't appear to be normal behavior either. Because the random partitioner distributes the data evenly, I don't see how this is happening.&#60;/p&#62;
&#60;p&#62;What replication factor are you using? Are there any errors in any of your logs? How are you sending the data to Cassandra?
&#60;/p&#62;</description>
		</item>
		<item>
			<title>ithkuil on "A lot of disk space used and unbalanced nodes"</title>
			<link>http://www.datastax.com/support-forums/topic/a-lot-of-disk-space-used-and-unbalanced-nodes#post-77</link>
			<pubDate>Wed, 18 May 2011 17:21:27 +0000</pubDate>
			<dc:creator>ithkuil</dc:creator>
			<guid isPermaLink="false">77@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;I have a Cassandra installation with 3 nodes + OpsCenter.&#60;/p&#62;
&#60;p&#62;I loaded about 800k rows, each containing a single value of about 2.6 KB.&#60;/p&#62;
&#60;p&#62;The content was loaded using a map-only Hadoop job; the input was a 2 GB file.&#60;/p&#62;
&#60;p&#62;I started with an empty Cassandra cluster (except for OpsCenter data, about 50 MB).&#60;br /&#62;
After the import, the Cassandra nodes each hold between 20 and 30 GB of data.&#60;/p&#62;
&#60;p&#62;I tried compacting multiple times, draining, shutting down, restarting, repairing, etc.&#60;/p&#62;
&#60;p&#62;Usage is still high, far above what is needed to store the data. There are no deleted&#60;br /&#62;
rows, and the commit logs are drained and everything compacted.&#60;/p&#62;
&#60;p&#62;Why on earth does Cassandra require so much disk space?&#60;/p&#62;
&#60;p&#62;(I need to store at most 10 GB of real data, yet I now have to allocate 100 GB just to hold 2-3 GB and keep free space for compactions, repairs, etc.)&#60;/p&#62;
&#60;p&#62;Furthermore, why is one node so much bigger than the others? I use the random partitioner.&#60;/p&#62;
&#60;p&#62;&#60;a href=&#34;http://awesomescreenshot.com/05cd8igb1&#34; rel=&#34;nofollow&#34;&#62;http://awesomescreenshot.com/05cd8igb1&#60;/a&#62;&#60;/p&#62;
&#60;p&#62;Is there a way to verify that the 30 GB on the biggest node also includes data&#60;br /&#62;
replicated from the other peers, and that the data belonging to this biggest node is actually replicated elsewhere? I somehow have the feeling that one node gets more data and doesn't replicate it. (My payloads are quite homogeneous.)&#60;/p&#62;
&#60;p&#62;Any help is appreciated!&#60;/p&#62;
&#60;p&#62;Marko
&#60;/p&#62;</description>
		</item>

	</channel>
</rss>
