<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="bbPress/1.0.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>DataStax Support Forums &#187; User Favorites: ethrbunny</title>
		<link>http://www.datastax.com/support-forums/profile/ethrbunny</link>
		<description>Software, Support, and Training for Apache Cassandra</description>
		<language>en-US</language>
		<pubDate>Sun, 19 May 2013 08:35:54 +0000</pubDate>
		<generator>http://bbpress.org/?v=1.0.3</generator>
		<textInput>
			<title><![CDATA[Search]]></title>
			<description><![CDATA[Search all topics from these forums.]]></description>
			<name>q</name>
			<link>http://www.datastax.com/support-forums/search.php</link>
		</textInput>
		<atom:link href="http://www.datastax.com/support-forums/rss/profile/" rel="self" type="application/rss+xml" />

		<item>
			<title>ethrbunny on "Sequencing - how to replace auto_increment?"</title>
			<link>http://www.datastax.com/support-forums/topic/sequencing-how-to-replace-auto_increment#post-8670</link>
			<pubDate>Fri, 25 Jan 2013 17:06:38 +0000</pubDate>
			<dc:creator>ethrbunny</dc:creator>
			<guid isPermaLink="false">8670@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;One sticking point in adopting Cassandra for our research is how to set 'milestones' or 'markers' in the data and then 'retrieve everything since the last marker'. In MySQL I use an auto_increment column and simply select everything greater than a known point.&#60;/p&#62;
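&#60;p&#62;The MySQL approach above is a 'high-water mark': remember the largest auto_increment id already processed, then fetch only rows above it. A minimal Python sketch, with an in-memory list standing in for the table (Reading and fetch_since are hypothetical names, not a library API):&#60;/p&#62;

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    row_id: int   # plays the role of the auto_increment column
    sensor: str
    value: float


def fetch_since(table: list[Reading], marker: int) -> tuple[list[Reading], int]:
    """Return rows whose row_id is greater than marker, plus the new marker."""
    new_rows = [r for r in table if r.row_id > marker]
    # Advance the marker to the highest id seen; keep it if nothing is new.
    new_marker = max((r.row_id for r in new_rows), default=marker)
    return new_rows, new_marker


table = [Reading(1, "s1", 0.5), Reading(2, "s2", 0.7), Reading(3, "s1", 0.6)]
batch, marker = fetch_since(table, 0)      # first pass sees all three rows
table.append(Reading(4, "s2", 0.9))        # new data arrives
batch, marker = fetch_since(table, marker)
print([r.row_id for r in batch], marker)   # prints [4] 4
```

&#60;p&#62;The sketch depends on one central counter handing out increasing ids. Cassandra has no built-in auto_increment equivalent, which is why the question needs a different kind of marker.&#60;/p&#62;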
&#60;p&#62;Situation: we collect and store sensor data. It's hundreds of millions of records times hundreds of locations. The data comes in 'raw' and has to be processed several times over, so I keep pointers recording 'what have I already looked at'. &#60;/p&#62;
&#60;p&#62;How could I implement this in Cassandra given the 'eventual consistency' model? I've pondered date/time/mSec/sensor keys and even putting a 'seen'/'not seen' bit on each piece of sensor data. &#60;/p&#62;
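&#60;p&#62;One way to read the date/time/sensor-key idea: partition readings by (sensor, day), order them by timestamp within each partition, and use the last processed timestamp per sensor as the marker. A minimal sketch, assuming that layout; a plain dict stands in for the Cassandra table, and write and read_since are hypothetical helpers, not driver calls:&#60;/p&#62;

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

# Stand-in for a table partitioned by (sensor, day) and clustered by timestamp:
# (sensor, day) -> list of (timestamp, value), kept sorted by timestamp.
table: dict[tuple[str, date], list[tuple[datetime, float]]] = defaultdict(list)


def write(sensor: str, ts: datetime, value: float) -> None:
    rows = table[(sensor, ts.date())]   # partition key: (sensor, day bucket)
    rows.append((ts, value))
    rows.sort()                         # keep clustering order by timestamp


def read_since(sensor: str, marker: datetime, now: datetime) -> list[tuple[datetime, float]]:
    """Readings strictly newer than marker, scanning one day bucket at a time."""
    result = []
    day = marker.date()
    while now.date() >= day:
        for ts, value in table.get((sensor, day), []):
            if ts > marker:
                result.append((ts, value))
        day += timedelta(days=1)
    return result


write("s1", datetime(2013, 1, 24, 23, 59), 1.0)
write("s1", datetime(2013, 1, 25, 0, 1), 2.0)
write("s1", datetime(2013, 1, 25, 9, 30), 3.0)

marker = datetime(2013, 1, 24, 23, 59)  # last reading already processed
new = read_since("s1", marker, now=datetime(2013, 1, 25, 12, 0))
print([v for _, v in new])  # prints [2.0, 3.0]
```

&#60;p&#62;One caveat under eventual consistency: a reading that becomes visible late, with a timestamp older than the marker, would be skipped. One option is to hold the marker back by a safety margin and tolerate reprocessing a few rows.&#60;/p&#62;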
&#60;p&#62;Any suggestions on how this could be implemented?
&#60;/p&#62;</description>
		</item>
		<item>
			<title>ethrbunny on "Diskless nodes?"</title>
			<link>http://www.datastax.com/support-forums/topic/diskless-nodes#post-8457</link>
			<pubDate>Wed, 16 Jan 2013 16:07:45 +0000</pubDate>
			<dc:creator>ethrbunny</dc:creator>
			<guid isPermaLink="false">8457@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;(Bear with me.) I'm wondering about the feasibility of using diskless nodes in a cluster. Assume that each node is PXE booted, has bonded 10G NICs, and reads and writes its data from and to a DFS. Nodes would be maxed out on RAM to reduce swapping. &#60;/p&#62;
&#60;p&#62;Seems like this would have the advantages of:&#60;br /&#62;
1) less power consumption&#60;br /&#62;
2) easy to replace failed nodes&#60;br /&#62;
3) easy to add new nodes / grow array&#60;br /&#62;
4) lower cost / node&#60;/p&#62;
&#60;p&#62;Disadvantages:&#60;br /&#62;
1) slower disk IO speed&#60;br /&#62;
2) ??&#60;/p&#62;
&#60;p&#62;What *huge* *horrible* *are you crazy* things am I missing?
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Sven on "Recommended disk / node?"</title>
			<link>http://www.datastax.com/support-forums/topic/recommended-disk-node#post-8162</link>
			<pubDate>Thu, 27 Dec 2012 17:18:36 +0000</pubDate>
			<dc:creator>Sven</dc:creator>
			<guid isPermaLink="false">8162@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;There are several factors impacting the node sizing. Tyler gives a good overview here: &#60;a href=&#34;http://stackoverflow.com/questions/4775388/how-much-data-per-node-in-cassandra-cluster&#34; rel=&#34;nofollow&#34;&#62;http://stackoverflow.com/questions/4775388/how-much-data-per-node-in-cassandra-cluster&#60;/a&#62;
&#60;/p&#62;</description>
		</item>
		<item>
			<title>ethrbunny on "Recommended disk / node?"</title>
			<link>http://www.datastax.com/support-forums/topic/recommended-disk-node#post-8093</link>
			<pubDate>Mon, 24 Dec 2012 13:27:18 +0000</pubDate>
			<dc:creator>ethrbunny</dc:creator>
			<guid isPermaLink="false">8093@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;I've been following a discussion about Cassandra capacity planning on serverfault.com where it was suggested that nodes should hold no more than 0.5 TB of data, i.e. 5 TB of data would need at least 10 nodes. I haven't been able to verify this against any documentation, though. Has anyone seen a recommendation like this?  &#60;/p&#62;
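&#60;p&#62;The 10-node figure is a ceiling division. A small Python sketch; the replication_factor parameter is an assumption added here, since the quoted rule doesn't say whether 0.5 TB means unique data or data including replicas (each replica is a full copy stored on another node):&#60;/p&#62;

```python
import math


def nodes_needed(total_data_tb: float, per_node_limit_tb: float,
                 replication_factor: int = 1) -> int:
    """Smallest node count keeping each node at or under per_node_limit_tb.

    Assumes data is spread evenly. replication_factor (an assumption) counts
    full copies stored on different nodes, so it multiplies the stored volume.
    """
    stored_tb = total_data_tb * replication_factor
    return math.ceil(stored_tb / per_node_limit_tb)


print(nodes_needed(5, 0.5))                        # prints 10
print(nodes_needed(5, 0.5, replication_factor=3))  # prints 30
```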
&#60;p&#62;Note: this is independent of how much RAM each node has. My understanding there is 'more is better, but don't give the Java heap too much'.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Sven on "Using Lustre as a backend?"</title>
			<link>http://www.datastax.com/support-forums/topic/using-lustre-as-a-backend#post-7961</link>
			<pubDate>Wed, 12 Dec 2012 21:25:10 +0000</pubDate>
			<dc:creator>Sven</dc:creator>
			<guid isPermaLink="false">7961@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;There are too many factors influencing the behavior to know up front how this will perform. In general we have found that local disks are the way to go. You can tune Cassandra for a slower disk backend (for example, increase the commit log fsync interval so more data is batched per write, at the risk of losing data in a catastrophic event). &#60;/p&#62;
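&#60;p&#62;To illustrate the trade-off: in periodic mode (commitlog_sync: periodic, with commitlog_sync_period_in_ms in cassandra.yaml) writes are acknowledged before they are fsynced, and the log is synced on a timer. The toy Python model below is a simplification, not Cassandra's implementation; PeriodicCommitLog and simulate are hypothetical. It counts fsync calls and the peak number of unsynced writes for two periods:&#60;/p&#62;

```python
import os
import tempfile


class PeriodicCommitLog:
    """Toy append-only log that fsyncs at most once per sync_period_ms.

    Simplification: the sync check runs on each append instead of on a
    background timer. Writes since the last fsync may sit in OS buffers and
    could be lost on a crash; a longer period means fewer fsync calls but a
    larger window of unsynced writes.
    """

    def __init__(self, path: str, sync_period_ms: int) -> None:
        self.f = open(path, "ab")
        self.sync_period_ms = sync_period_ms
        self.last_sync_ms = 0
        self.unsynced = 0        # writes not yet covered by an fsync
        self.peak_unsynced = 0   # worst-case exposure observed
        self.fsync_calls = 0

    def append(self, record: bytes, now_ms: int) -> None:
        self.f.write(record + b"\n")
        self.unsynced += 1
        self.peak_unsynced = max(self.peak_unsynced, self.unsynced)
        if now_ms - self.last_sync_ms >= self.sync_period_ms:
            self.f.flush()             # hand Python's buffer to the OS
            os.fsync(self.f.fileno())  # ask the OS to persist it to the device
            self.last_sync_ms = now_ms
            self.unsynced = 0
            self.fsync_calls += 1

    def close(self) -> None:
        self.f.close()


def simulate(period_ms: int, writes: int = 105) -> tuple[int, int]:
    """One write per simulated millisecond; returns (fsync calls, peak unsynced)."""
    with tempfile.TemporaryDirectory() as d:
        log = PeriodicCommitLog(os.path.join(d, "commitlog"), period_ms)
        for t in range(1, writes + 1):
            log.append(b"reading", now_ms=t)
        log.close()
        return log.fsync_calls, log.peak_unsynced


print(simulate(10))  # prints (10, 10)
print(simulate(50))  # prints (2, 50)
```

&#60;p&#62;With one write per millisecond, a 10 ms period costs 10 fsyncs with at most 10 writes at risk, while a 50 ms period needs only 2 fsyncs but leaves up to 50 writes unsynced.&#60;/p&#62;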
&#60;p&#62;My concern would be that you turn a local disk seek and write into an extra network transport plus a remote seek and write. Depending on your data volume and response time requirements, that may not be an issue, though. Technically you can certainly use a mounted filesystem (see the link in my previous post for some suggestions on setup), but the best thing to do might be to run an experiment.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>ethrbunny on "Using Lustre as a backend?"</title>
			<link>http://www.datastax.com/support-forums/topic/using-lustre-as-a-backend#post-7942</link>
			<pubDate>Tue, 11 Dec 2012 14:45:04 +0000</pubDate>
			<dc:creator>ethrbunny</dc:creator>
			<guid isPermaLink="false">7942@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;It appears that Cassandra uses write-behind for the commit log (config dependent). The reason I'm considering the DFS path is that it seems a server with many cores could handle this without slowing down processing.&#60;/p&#62;
&#60;p&#62;Given my (likely) investment in a robust DFS system, I'm wondering if I can buy cheaper machines for Cassandra (many cores, lots of RAM, minimal disk) instead.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Sven on "Using Lustre as a backend?"</title>
			<link>http://www.datastax.com/support-forums/topic/using-lustre-as-a-backend#post-7934</link>
			<pubDate>Tue, 11 Dec 2012 05:35:59 +0000</pubDate>
			<dc:creator>Sven</dc:creator>
			<guid isPermaLink="false">7934@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;I would say that using a DFS to store local Cassandra data files is probably not the best approach (at the very least you would probably want to keep the commit log on a local drive). Cassandra itself already moves data between nodes to balance access and provide replication. That said, I don't think there is a technical reason this would not work.&#60;/p&#62;
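&#60;p&#62;For context on how Cassandra distributes data itself: nodes sit on a token ring, each key hashes to a token, and replicas go to the owning node plus the next nodes clockwise. A simplified SimpleStrategy-style sketch in Python; the MD5 token function and the 0..99 ring are illustrative assumptions, not Cassandra's partitioner:&#60;/p&#62;

```python
import bisect
import hashlib

RING_SIZE = 100  # illustrative token space; real partitioners use far larger ranges


def token(key: str) -> int:
    # Hypothetical stand-in for a partitioner: hash the key onto the ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING_SIZE


def replicas_for_token(tok: int, ring: list[tuple[int, str]], rf: int) -> list[str]:
    """SimpleStrategy-style placement on a ring sorted by token.

    A node owns the range from the previous node's token (exclusive) up to
    its own token (inclusive). The primary replica is the first node at or
    after tok, wrapping around; the next rf - 1 nodes clockwise hold copies.
    """
    if rf > len(ring):
        raise ValueError("replication factor exceeds node count")
    tokens = [t for t, _ in ring]
    start = bisect.bisect_left(tokens, tok) % len(ring)
    return [ring[(start + i) % len(ring)][1] for i in range(rf)]


def replicas(key: str, ring: list[tuple[int, str]], rf: int) -> list[str]:
    return replicas_for_token(token(key), ring, rf)


ring = [(0, "A"), (25, "B"), (50, "C"), (75, "D")]
print(replicas_for_token(30, ring, rf=3))  # prints ['C', 'D', 'A']
print(replicas_for_token(80, ring, rf=2))  # prints ['A', 'B']

# Node E joins at token 40.
ring2 = sorted(ring + [(40, "E")])
print(replicas_for_token(30, ring2, rf=3))  # prints ['E', 'C', 'D']
```

&#60;p&#62;When E joins at token 40, primary ownership of the range (25, 40] moves from C to E; every other range keeps its primary owner.&#60;/p&#62;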
&#60;p&#62;You can find some discussion of using iSCSI here: &#60;a href=&#34;http://www.mail-archive.com/user@cassandra.apache.org/msg09020.html&#34; rel=&#34;nofollow&#34;&#62;http://www.mail-archive.com/user@cassandra.apache.org/msg09020.html&#60;/a&#62;. It's not exactly what you want to do, but it is another approach that avoids a local drive.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>ethrbunny on "Using Lustre as a backend?"</title>
			<link>http://www.datastax.com/support-forums/topic/using-lustre-as-a-backend#post-7907</link>
			<pubDate>Sun, 09 Dec 2012 13:09:14 +0000</pubDate>
			<dc:creator>ethrbunny</dc:creator>
			<guid isPermaLink="false">7907@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;I'm likely to migrate our data to Cassandra in the coming months. We're about to enter a phase of 10x growth followed by another 10x soon after, and it's more than our current setup can handle. &#60;/p&#62;
&#60;p&#62;One aspect of this is file storage. We collect sensor data from many sources, and flat files are typically included. This data will be put into Lustre, which leads me to my question: how viable would it be to use Lustre as the file store for Cassandra? I.e., build servers with lots of RAM but minimal disk space, and use a DFS to store the database logs and such. &#60;/p&#62;
&#60;p&#62;It sounds like Cassandra uses disk for compaction, backup and logging. Would it suffer if the files were on the network rather than local? I'm thinking of multiple bonded 1Gb links, maybe 10Gb if necessary. &#60;/p&#62;
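&#60;p&#62;For a rough sense of the network side, the line-rate arithmetic for bonded links is easy to sketch. This Python helper (transfer_hours is a hypothetical name) assumes decimal units, perfect spreading across the bond and no protocol overhead, so it gives a best case only:&#60;/p&#62;

```python
def transfer_hours(data_tb: float, links: int, gbit_per_link: float = 1.0) -> float:
    """Best-case hours to move data_tb terabytes over bonded links.

    Assumes decimal units (1 TB = 8,000 Gb), traffic spread perfectly across
    the bond and no protocol overhead, so real transfers will be slower.
    """
    total_gbit = data_tb * 8_000
    seconds = total_gbit / (links * gbit_per_link)
    return seconds / 3600


print(round(transfer_hours(1, links=4), 2))                    # prints 0.56
print(round(transfer_hours(1, links=1, gbit_per_link=10), 2))  # prints 0.22
```

&#60;p&#62;Under these assumptions 1 TB takes about 0.56 h over four bonded 1 Gb links and about 0.22 h over a single 10 Gb link. In practice a single TCP stream is typically limited to one member link under 802.3ad bonding, so per-flow throughput can be well below the aggregate.&#60;/p&#62;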
&#60;p&#62;Thoughts? Comments?
&#60;/p&#62;</description>
		</item>

	</channel>
</rss>
