<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="bbPress/1.0.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>DataStax Support Forums &#187; Topic: Scalability Strategy</title>
		<link>http://www.datastax.com/support-forums/topic/scalability-strategy</link>
		<description>Software, Support, and Training for Apache Cassandra</description>
		<language>en-US</language>
		<pubDate>Mon, 20 May 2013 07:46:26 +0000</pubDate>
		<generator>http://bbpress.org/?v=1.0.3</generator>
		<textInput>
			<title><![CDATA[Search]]></title>
			<description><![CDATA[Search all topics from these forums.]]></description>
			<name>q</name>
			<link>http://www.datastax.com/support-forums/search.php</link>
		</textInput>
		<atom:link href="http://www.datastax.com/support-forums/rss/topic/scalability-strategy" rel="self" type="application/rss+xml" />

		<item>
			<title>ZFabrik on "Scalability Strategy"</title>
			<link>http://www.datastax.com/support-forums/topic/scalability-strategy#post-213</link>
			<pubDate>Tue, 21 Jun 2011 17:37:28 +0000</pubDate>
			<dc:creator>ZFabrik</dc:creator>
			<guid isPermaLink="false">213@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Hi blueplastic, &#60;/p&#62;
&#60;p&#62;Thanks a lot for this fast and formidable response :-)&#60;/p&#62;
&#60;p&#62;Udo
&#60;/p&#62;</description>
		</item>
		<item>
			<title>blueplastic on "Scalability Strategy"</title>
			<link>http://www.datastax.com/support-forums/topic/scalability-strategy#post-212</link>
			<pubDate>Tue, 21 Jun 2011 17:03:10 +0000</pubDate>
			<dc:creator>blueplastic</dc:creator>
			<guid isPermaLink="false">212@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Yeah, depending on your partitioner and data model, there are different strategies you can consider.&#60;/p&#62;
&#60;p&#62;If you're using the random partitioner (an MD5 hash of the key determines where it is placed on the node ring), then you can assume that your data is spread roughly evenly across all nodes. So, when all of the nodes are filling up evenly, you can easily double the cluster size by bisecting each node's range in the ring. If you want, you could also just add, say, 3 additional nodes to an existing 20-node cluster. But then you would either have to pick just 3 nodes to bisect, which leaves the ring unbalanced, or, if you want an even spread of the data, change the tokens on all 20 nodes and have them redistribute data, which could be heavy on the network.&#60;/p&#62;
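&#60;p&#62;To make the bisecting idea concrete, here's a rough Python sketch. It is not taken from the DataStax docs: it assumes random partitioner tokens are plain integers from 0 up to 2**127, and the function names are made up for illustration.&#60;/p&#62;

```python
# Sketch only: assumes RandomPartitioner tokens are integers in [0, 2**127).
RING_SIZE = 2 ** 127


def balanced_tokens(node_count):
    """Evenly spaced tokens for node_count nodes (hypothetical helper)."""
    return [i * RING_SIZE // node_count for i in range(node_count)]


def bisecting_tokens(existing_tokens):
    """One new token at the midpoint of each existing range (doubles the cluster)."""
    ordered = sorted(existing_tokens)
    midpoints = []
    for index, start in enumerate(ordered):
        # The last range wraps around the ring back to the first token.
        end = ordered[(index + 1) % len(ordered)]
        width = (end - start) % RING_SIZE or RING_SIZE
        midpoints.append((start + width // 2) % RING_SIZE)
    return midpoints


old = balanced_tokens(20)
# Doubling: every existing node keeps its token, each new node takes a midpoint.
doubled = sorted(old + bisecting_tokens(old))
# Growing from 20 to 23 nodes with an even spread: which old tokens survive?
unchanged = set(old).intersection(balanced_tokens(23))
print(len(doubled), len(unchanged))
```

&#60;p&#62;Under these assumptions, doubling leaves all 20 existing tokens where they are, while an evenly spread 23-node ring shares only token 0 with the 20-node layout, so 19 of the 20 existing nodes would need a new token.&#60;/p&#62;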
&#60;p&#62;If you're using an order-preserving partitioner (OPP), then adding 3 nodes to a 20-node cluster makes more sense. As you may know, with OPP rows are stored in key order, aligning the physical structure of the data with the sort order (giving you the ability to perform ordered range slices). So, if you're down with OPP, your ring could potentially be very lopsided. When one node develops a hot spot, you can bisect its range.&#60;/p&#62;
&#60;p&#62;Also, keep in mind that you should only plan on using 50% of the space on the Cassandra data volume. The other half needs to remain free to accommodate compaction. And on the Apache mailing list, the general consensus is that each Cassandra node should store no more than roughly 500 GB of data. That's just a rule of thumb, so depending on your use case, you may be able to store more.&#60;/p&#62;
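&#60;p&#62;As a back-of-the-envelope check, those two rules of thumb could be combined roughly like this. The helper names are hypothetical, and total_data_gb is assumed to already include all replicas.&#60;/p&#62;

```python
# Sketch only: both limits are rules of thumb from this post, not hard Cassandra limits.
COMPACTION_HEADROOM = 0.5   # keep half of the data volume free for compaction
MAX_DATA_PER_NODE_GB = 500  # mailing-list rule of thumb


def planned_capacity_gb(volume_size_gb):
    """Data a single node should be planned to hold, in GB."""
    return min(volume_size_gb * (1 - COMPACTION_HEADROOM), MAX_DATA_PER_NODE_GB)


def nodes_needed(total_data_gb, volume_size_gb):
    """Smallest node count that keeps every node within the planned capacity."""
    per_node = planned_capacity_gb(volume_size_gb)
    return int(-(-total_data_gb // per_node))  # ceiling division


print(planned_capacity_gb(600), nodes_needed(6000, 600))
```

&#60;p&#62;For example, with 600 GB data volumes each node is planned for 300 GB, so 6000 GB of data (replicas included) would call for 20 nodes.&#60;/p&#62;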
&#60;p&#62;Someone from DataStax might be able to provide some more guidance...&#60;/p&#62;
&#60;p&#62;I assume you've seen the DataStax post on adding nodes?&#60;br /&#62;
&#60;a href=&#34;http://www.datastax.com/docs/0.8/install/adding_nodes&#34; rel=&#34;nofollow&#34;&#62;http://www.datastax.com/docs/0.8/install/adding_nodes&#60;/a&#62;
&#60;/p&#62;</description>
		</item>
		<item>
			<title>ZFabrik on "Scalability Strategy"</title>
			<link>http://www.datastax.com/support-forums/topic/scalability-strategy#post-211</link>
			<pubDate>Tue, 21 Jun 2011 15:41:46 +0000</pubDate>
			<dc:creator>ZFabrik</dc:creator>
			<guid isPermaLink="false">211@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Hi, &#60;/p&#62;
&#60;p&#62;On &#60;a href=&#34;http://www.datastax.com/solutions/scaleable-elastic-datacenter&#34; rel=&#34;nofollow&#34;&#62;http://www.datastax.com/solutions/scaleable-elastic-datacenter&#60;/a&#62; you write that it is very easy to add new hardware on demand to a Cassandra cluster.&#60;br /&#62;
I wonder about the best strategy here. Given a cluster with n nodes, does it make sense to just add 1, 2, 3 or so, or should one always add exactly n nodes, i.e. always double the cluster size?&#60;br /&#62;
Since Cassandra arranges the data in a logical ring, new nodes can always be placed between two existing nodes, right?&#60;br /&#62;
This means, however, that the token range of the successor neighbor is bisected.&#60;br /&#62;
Thus, unless exactly n nodes are added, the token ranges will become unbalanced across the ring, leading to different load factors for different nodes.&#60;/p&#62;
&#60;p&#62;Thanks in advance and best regards&#60;/p&#62;
&#60;p&#62;Udo
&#60;/p&#62;</description>
		</item>

	</channel>
</rss>
