<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="bbPress/1.0.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>DataStax Support Forums &#187; Topic: Why a 3rd column family when creating a custom index on Cassandra?</title>
		<link>http://www.datastax.com/support-forums/topic/why-a-3rd-column-family-when-creating-a-custom-index-on-cassandra</link>
		<description>Software, Support, and Training for Apache Cassandra</description>
		<language>en-US</language>
		<pubDate>Sat, 25 May 2013 15:34:51 +0000</pubDate>
		<generator>http://bbpress.org/?v=1.0.3</generator>
		<textInput>
			<title><![CDATA[Search]]></title>
			<description><![CDATA[Search all topics from these forums.]]></description>
			<name>q</name>
			<link>http://www.datastax.com/support-forums/search.php</link>
		</textInput>
		<atom:link href="http://www.datastax.com/support-forums/rss/topic/why-a-3rd-column-family-when-creating-a-custom-index-on-cassandra" rel="self" type="application/rss+xml" />

		<item>
			<title>Janbalik on "Why a 3rd column family when creating a custom index on Cassandra?"</title>
			<link>http://www.datastax.com/support-forums/topic/why-a-3rd-column-family-when-creating-a-custom-index-on-cassandra#post-8850</link>
			<pubDate>Fri, 01 Feb 2013 17:00:12 +0000</pubDate>
			<dc:creator>Janbalik</dc:creator>
			<guid isPermaLink="false">8850@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Since someone else may have the same doubt, I'm going to answer my own question with what I have found:&#60;/p&#62;
&#60;p&#62;As I supposed, the main problem is concurrency. Suppose many users can change the same indexed value at the same time. Because you have to read before you update, another user may change the value again between the moment you read it and the moment you update the index. In addition, the system could crash after the value is updated but before the index is. After a few concurrent changes, the index can therefore hold old values that point to rows that no longer have them. &#60;/p&#62;
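&#60;p&#62;A minimal sketch of that race, using plain Python dictionaries in place of the Users and index column families. Nothing here is a Cassandra API: the names users, index, read_old and finish_update, and the example addresses, are assumptions made only for illustration.&#60;/p&#62;

```python
# In-memory stand-ins for the two column families (hypothetical names).
users = {"u1": {"email": "a@example.com"}}
index = {("email", "a@example.com"): {"u1"}}


def read_old(user_id, column):
    """Step 1 of the naive update: read the current value from Users."""
    return users[user_id][column]


def finish_update(user_id, column, old_value, new_value):
    """Steps 2-4: write the new value, drop the old index entry, add the new one."""
    users[user_id][column] = new_value
    index.get((column, old_value), set()).discard(user_id)
    index.setdefault((column, new_value), set()).add(user_id)


# Two writers interleave: both read before either one writes.
old_seen_by_1 = read_old("u1", "email")
old_seen_by_2 = read_old("u1", "email")
finish_update("u1", "email", old_seen_by_1, "b@example.com")
finish_update("u1", "email", old_seen_by_2, "c@example.com")

# Index entries that still point at u1 although the row no longer has that value.
stale = sorted(
    value
    for (column, value), ids in index.items()
    if "u1" in ids and users["u1"][column] != value
)
print(stale)
```

&#60;p&#62;In this interleaving the second writer never learns about b@example.com, so it cannot delete that index entry, and the Users row no longer contains it either.&#60;/p&#62;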
&#60;p&#62;Adding the third column family makes this process safer, but NOT 100% SAFE. &#60;/p&#62;
&#60;p&#62;One last thing: as I understand it, if there is no concurrency when updating the values, there should be no problem. Suppose you are indexing some user data. If only the owner of the data is allowed to modify it, there is no concurrency at all. The only risk is that the system crashes before you finish aligning the index with the value, but this operation is idempotent, so you can repeat it until it succeeds. &#60;/p&#62;
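&#60;p&#62;A minimal sketch of why that retry is safe, assuming a single writer that still knows the old and the new value after a crash (for example because it logged them). The dictionaries, the align function and its crash_after parameter are hypothetical stand-ins, not a Cassandra API.&#60;/p&#62;

```python
# In-memory stand-ins for the Users and index column families (hypothetical).
users = {"u1": {"email": "a@example.com"}}
index = {("email", "a@example.com"): {"u1"}}


class Crash(Exception):
    """Stand-in for the process dying part-way through an update."""


def align(user_id, column, old_value, new_value, crash_after=None):
    """Set the value and bring the index in line; every step is safe to repeat."""
    users[user_id][column] = new_value
    if crash_after == "write":
        raise Crash
    index.get((column, old_value), set()).discard(user_id)
    if crash_after == "delete":
        raise Crash
    index.setdefault((column, new_value), set()).add(user_id)


args = ("u1", "email", "a@example.com", "b@example.com")
try:
    align(*args, crash_after="write")
except Crash:
    pass  # the index still points at the old value at this moment

align(*args)  # retry with the same arguments
align(*args)  # a further retry changes nothing
print(users, index)
```

&#60;p&#62;Each step either sets a value or adds or removes a set member, so running the whole sequence again gives the same end state. This only holds while nobody else changes the same value in between.&#60;/p&#62;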
&#60;p&#62;I hope this explains what I have understood and helps others.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Janbalik on "Why a 3rd column family when creating a custom index on Cassandra?"</title>
			<link>http://www.datastax.com/support-forums/topic/why-a-3rd-column-family-when-creating-a-custom-index-on-cassandra#post-8702</link>
			<pubDate>Sun, 27 Jan 2013 10:54:22 +0000</pubDate>
			<dc:creator>Janbalik</dc:creator>
			<guid isPermaLink="false">8702@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Hi Khahn, &#60;/p&#62;
&#60;p&#62;Don't say sorry, friend ;) You are helping me! Thank you very much. I'm still trying to understand this. I think one key point is this: &#34;Updates to the index and data are not atomic&#34;. Another is &#34;a node might index data held by another node&#34;. This matches what I thought: you can have an index entry for data that does not exist, or data that is not indexed. But why does a third CF holding the indexed values avoid this problem? Updates to the index, the data and the indexed values are not atomic either. I mean, as I understand it, there could also be concurrency and consistency problems between the third CF (indexed values) and the data. Or am I wrong? If so, why? Can you (or anyone) explain to me why I'm wrong? &#60;/p&#62;
&#60;p&#62;Thanks!!
&#60;/p&#62;</description>
		</item>
		<item>
			<title>khahn on "Why a 3rd column family when creating a custom index on Cassandra?"</title>
			<link>http://www.datastax.com/support-forums/topic/why-a-3rd-column-family-when-creating-a-custom-index-on-cassandra#post-8617</link>
			<pubDate>Thu, 24 Jan 2013 02:09:59 +0000</pubDate>
			<dc:creator>khahn</dc:creator>
			<guid isPermaLink="false">8617@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Sorry the link didn't answer your question. I learned a little more after researching your question, but I'm afraid it's not a complete answer. I hope an expert gives you a better explanation soon. &#60;/p&#62;
&#60;p&#62;The third column family is needed because simply reading the previous value from the Users CF before updating it, and then removing the index entry for that value from the index CF, will not work reliably, given Cassandra's eventual consistency model and lack of transactions. When creating your own index, a node might index data held by another node. Updates to the index and data are not atomic. To overcome these problems, you maintain a third column family that lists the previous values for the property of an item, and you use that list to remove these values from the index before adding the new value to the index.
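&#60;/p&#62;
&#60;p&#62;A rough sketch of that procedure, using in-memory Python dictionaries in place of the three column families. This is only one reading of the description above, not code from Cassandra or any client library: the names index_entries, read_entries, finish_update and indexed_values, the integer clock used as a timestamp, and the example addresses are all assumptions made for illustration.&#60;/p&#62;

```python
import itertools

# In-memory stand-ins for the three column families (hypothetical layout).
users = {}          # user_id: {column: value}
index = {}          # (column, value): set of user_ids
index_entries = {}  # (user_id, column): {timestamp: value}
clock = itertools.count(1)  # stand-in for column timestamps


def read_entries(user_id, column):
    """Read every value previously indexed for this user and column."""
    return dict(index_entries.get((user_id, column), {}))


def finish_update(user_id, column, seen, new_value):
    """Remove the listed values from the index, then record and index the new one."""
    entries = index_entries.setdefault((user_id, column), {})
    for ts, old_value in seen.items():
        index.get((column, old_value), set()).discard(user_id)
        entries.pop(ts, None)
    users.setdefault(user_id, {})[column] = new_value
    entries[next(clock)] = new_value
    index.setdefault((column, new_value), set()).add(user_id)


def indexed_values(user_id, column):
    """List the values under which this user currently appears in the index."""
    return sorted(v for (c, v), ids in index.items() if c == column and user_id in ids)


finish_update("u1", "email", read_entries("u1", "email"), "a@example.com")

# Two writers interleave: both read the entry list before either one writes.
seen_1 = read_entries("u1", "email")
seen_2 = read_entries("u1", "email")
finish_update("u1", "email", seen_1, "b@example.com")
finish_update("u1", "email", seen_2, "c@example.com")
after_race = indexed_values("u1", "email")

# The next update sees both leftover entries and cleans them up.
finish_update("u1", "email", read_entries("u1", "email"), "d@example.com")
after_next_update = indexed_values("u1", "email")
print(after_race, after_next_update)
```

&#60;p&#62;In this sketch the two interleaved writers still leave an entry for b@example.com in the index for a while, but that value is also listed in the third column family, so the next update finds it and removes it. With only the Users row to read, the overwritten value would no longer be visible anywhere.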
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Janbalik on "Why a 3rd column family when creating a custom index on Cassandra?"</title>
			<link>http://www.datastax.com/support-forums/topic/why-a-3rd-column-family-when-creating-a-custom-index-on-cassandra#post-8609</link>
			<pubDate>Wed, 23 Jan 2013 18:10:30 +0000</pubDate>
			<dc:creator>Janbalik</dc:creator>
			<guid isPermaLink="false">8609@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Hi Khahn, &#60;/p&#62;
&#60;p&#62;Thanks for your answer. I had already read the links you recommend, and I have read them again (a few times), but I still don't understand the problem. It is probably obvious, but I don't see it. Why could there be a concurrency issue if I read from the Users CF instead of reading from Users_Index_Entries? Both CFs must have the most recent value because of the timestamp that is part of each column. What is the difference? Following the process, I don't see how the value in the Users CF could get out of step with the index CF if I use only these 2 CFs. Do you (or anyone) know what problem could happen? Do you have an example of the issue?&#60;/p&#62;
&#60;p&#62;Thank you very much!
&#60;/p&#62;</description>
		</item>
		<item>
			<title>khahn on "Why a 3rd column family when creating a custom index on Cassandra?"</title>
			<link>http://www.datastax.com/support-forums/topic/why-a-3rd-column-family-when-creating-a-custom-index-on-cassandra#post-8582</link>
			<pubDate>Tue, 22 Jan 2013 19:34:12 +0000</pubDate>
			<dc:creator>khahn</dc:creator>
			<guid isPermaLink="false">8582@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Does the last paragraph (below) of this InfoQ article help? &#60;a href=&#34;http://www.infoq.com/news/2011/07/cassandraindexing&#34; rel=&#34;nofollow&#34;&#62;http://www.infoq.com/news/2011/07/cassandraindexing&#60;/a&#62; This article also contains a link to sample code: &#60;a href=&#34;https://github.com/edanuff/CassandraIndexedCollections&#34; rel=&#34;nofollow&#34;&#62;https://github.com/edanuff/CassandraIndexedCollections&#60;/a&#62;&#60;/p&#62;
&#60;p&#62;&#34;In general, he (Anuff) said that reading before writing can be an issue with Cassandra. Rather than doing locking (e.g., with ZooKeeper), Anuff presented a technique that uses three Column Families. For example, in a table with a users Column Family and an indexes Column Family, there will be a third Column Family Users_Index_Entries. Updates first read the previous index values from this column family to avoid concurrency issues and both it and Users use timestamped columns to avoid the need for locking. ...&#34;
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Janbalik on "Why a 3rd column family when creating a custom index on Cassandra?"</title>
			<link>http://www.datastax.com/support-forums/topic/why-a-3rd-column-family-when-creating-a-custom-index-on-cassandra#post-8559</link>
			<pubDate>Mon, 21 Jan 2013 19:24:51 +0000</pubDate>
			<dc:creator>Janbalik</dc:creator>
			<guid isPermaLink="false">8559@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;As I always say, sorry for my English. I'm working on creating some manual indexes for some column families in Cassandra. I have read everything I could about this but I have found something I'm not able to understand properly.&#60;/p&#62;
&#60;p&#62;In this presentation by Ed Anuff (&#60;a href=&#34;http://www.slideshare.net/edanuff/indexing-in-cassandra&#34; rel=&#34;nofollow&#34;&#62;http://www.slideshare.net/edanuff/indexing-in-cassandra&#60;/a&#62;), pages 36 to 45, I have seen his simple example of creating an index for a Users column family. He uses the 2 obvious CFs and another one to deal with concurrency. This third CF is &#34;my problem&#34;. If I'm not wrong, Cassandra will always store the most recent value for each column. If this value is indexed, I have to update it in the Index CF (delete the old index entry and create the new one), but why is the third CF necessary? When I think about that and about concurrency, my understanding is: OK, many people are updating a value that is indexed. That means a lot of work updating the index, but in the end the last value will be in the Users CF and also in the Index CF, which is why there is a timestamp per column. So what is the problem with concurrency? Moreover, if the value can be updated by only one user (the owner of the data), there will be no concurrency...&#60;/p&#62;
&#60;p&#62;I know I am quite ignorant about Cassandra, but I don't see the reason behind the third CF. Ed Anuff explains that with this third column family you can restore the indexes to a consistent state (&#60;a href=&#34;http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html&#34; rel=&#34;nofollow&#34;&#62;http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html&#60;/a&#62;), but why would they fall into an inconsistent state in the first place? And if this happens, shouldn't the Users CF be enough to restore the index, or am I wrong?&#60;/p&#62;
&#60;p&#62;Could someone please explain this to me? Where am I going wrong?
&#60;/p&#62;</description>
		</item>

	</channel>
</rss>
