<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="bbPress/1.0.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>DataStax Support Forums &#187; User Favorites: janbalik</title>
		<link>http://www.datastax.com/support-forums/profile/janbalik</link>
		<description>Software, Support, and Training for Apache Cassandra</description>
		<language>en-US</language>
		<pubDate>Sun, 26 May 2013 03:30:20 +0000</pubDate>
		<generator>http://bbpress.org/?v=1.0.3</generator>
		<textInput>
			<title><![CDATA[Search]]></title>
			<description><![CDATA[Search all topics from these forums.]]></description>
			<name>q</name>
			<link>http://www.datastax.com/support-forums/search.php</link>
		</textInput>
		<atom:link href="http://www.datastax.com/support-forums/rss/profile/" rel="self" type="application/rss+xml" />

		<item>
			<title>arviarya on "Cassandra performance: split or not to split CF?"</title>
			<link>http://www.datastax.com/support-forums/topic/cassandra-performance-split-or-not-to-split-cf#post-9139</link>
			<pubDate>Fri, 22 Feb 2013 19:35:04 +0000</pubDate>
			<dc:creator>arviarya</dc:creator>
			<guid isPermaLink="false">9139@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;You can split a column family based on your requirements. It depends on how you organize and fetch your data. Generally, huge column families may return a lot of data to your program that you may not need.&#60;br /&#62;
It is better to organise your data in single or multiple CFs as your requirements dictate. Split CFs are an advantage if you do not need to read both/all CFs for a fetch.&#60;br /&#62;
Writes are fast in Cassandra, so you can store some duplicate data in both CFs to avoid joins.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Janbalik on "Why a 3rd column family when creating a custom index on Cassandra?"</title>
			<link>http://www.datastax.com/support-forums/topic/why-a-3rd-column-family-when-creating-a-custom-index-on-cassandra#post-8850</link>
			<pubDate>Fri, 01 Feb 2013 17:00:12 +0000</pubDate>
			<dc:creator>Janbalik</dc:creator>
			<guid isPermaLink="false">8850@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Since I think someone else could have the same doubt as me, I'm going to answer my own question with what I have found:&#60;/p&#62;
&#60;p&#62;As I supposed, the main problem is concurrency. Assume many users can change the same indexed value at the same time. Because you have to read the index before updating it, another user could change that value again between the time you read it and the time you update the index. Also, the system could crash between the moment the value is updated and the moment you update the index. Then, after a few concurrent changes, the index could contain old values that point to rows that no longer have that value.&#60;/p&#62;
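&#60;p&#62;To make that race concrete, here is a minimal sketch in Python. It assumes plain in-memory dictionaries as stand-ins for the two column families. The names users, index, read_current and apply_update are hypothetical and are not part of any Cassandra API. It only replays one possible interleaving of two writers.&#60;/p&#62;

```python
# Toy in-memory stand-ins for the two column families (hypothetical names,
# not a Cassandra client).
users = {"u1": "red"}        # Users CF: row key maps to the indexed value
index = {"red": {"u1"}}      # Index CF: indexed value maps to row keys


def read_current(key):
    """Step 1 of a read-before-write update: read the current value."""
    return users[key]


def apply_update(key, old_value, new_value):
    """Step 2: write the new value, then move the index entry."""
    users[key] = new_value
    index.get(old_value, set()).discard(key)
    index.setdefault(new_value, set()).add(key)


# Writers A and B both read before either of them writes.
old_seen_by_a = read_current("u1")
old_seen_by_b = read_current("u1")
apply_update("u1", old_seen_by_a, "green")
apply_update("u1", old_seen_by_b, "blue")   # B never learns about "green"

# Index entries that still point at u1 although u1 no longer has that value.
stale = sorted(v for v, keys in index.items() if "u1" in keys and users["u1"] != v)
print(users["u1"], stale)
```

&#60;p&#62;In this interleaving the row ends up with the value blue, but the index still lists u1 under green, because the second writer only removed the old value it had read itself.&#60;/p&#62;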
&#60;p&#62;By adding the third column family this process is safer but NOT 100% SAFE. &#60;/p&#62;
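&#60;p&#62;A rough sketch of the three column family update, as I understand it from the descriptions in this thread. This is not Ed Anuff's code. The dictionaries, the counter used as a timestamp and the names update and lookup are assumptions made for illustration. It runs in a single process, so it shows only the bookkeeping, not real concurrent writers.&#60;/p&#62;

```python
import itertools

# Hypothetical in-memory stand-ins for the three column families.
users = {}     # Users CF: row key maps to the current value
index = {}     # Index CF: indexed value maps to a set of row keys
entries = {}   # Users_Index_Entries CF: row key maps to (timestamp, value) pairs

_clock = itertools.count(1)   # stand-in for column timestamps


def update(key, new_value):
    ts = next(_clock)
    # 1. Read every value previously indexed for this key from the third CF,
    #    instead of relying on a single read of the Users CF.
    previous = list(entries.get(key, []))
    # 2. Write the data, record the new entry, and add the new index entry.
    users[key] = new_value
    entries.setdefault(key, []).append((ts, new_value))
    index.setdefault(new_value, set()).add(key)
    # 3. Remove the older values from the index and from the entries list.
    for old in previous:
        old_value = old[1]
        if old_value != new_value:
            index.get(old_value, set()).discard(key)
        entries[key].remove(old)


def lookup(value):
    """Return the row keys currently indexed under the given value."""
    return sorted(index.get(value, set()))


update("u1", "red")
update("u1", "green")
update("u1", "blue")
print(lookup("blue"), lookup("green"), lookup("red"))
```

&#60;p&#62;Because the third CF keeps a list of every value that was indexed, a later update can clean up entries left behind by an earlier writer. A crash between steps 2 and 3 would still leave old entries until the next update, which is one reason I say it is not 100% safe.&#60;/p&#62;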
&#60;p&#62;And one last thing: from my understanding, if there is no concurrency when updating the values, then there should be no problem. Let's suppose you are indexing some user data. If only the owner of the data is allowed to modify it, there is no concurrency at all. The only risk is that the system crashes before you finish aligning the index with the value, but this operation is idempotent, so you can repeat it until it succeeds.&#60;/p&#62;
&#60;p&#62;Hope this explains what I have understood and helps others.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Janbalik on "Cassandra performance: split or not to split CF?"</title>
			<link>http://www.datastax.com/support-forums/topic/cassandra-performance-split-or-not-to-split-cf#post-8849</link>
			<pubDate>Fri, 01 Feb 2013 16:40:07 +0000</pubDate>
			<dc:creator>Janbalik</dc:creator>
			<guid isPermaLink="false">8849@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Hi @ll,&#60;/p&#62;
&#60;p&#62;I'm working on the design of a Cassandra database to learn about it. But I have a question I would like an expert to help me clarify:&#60;/p&#62;
&#60;p&#62;I have read that the rows of each column family are distributed across the nodes, so each node holds part of the rows of a given column family. Does that mean it is not a good idea to divide a column family into many column families, even when that column family has millions of rows?&#60;/p&#62;
&#60;p&#62;My experience with RDBMSs says that it is better to split very big tables into smaller tables to get better performance, but it seems that in Cassandra there is no need for this and, moreover, if I have many column families I would need more memory. Am I right? Is it better for performance to keep many rows in one column family than to split it into many?&#60;/p&#62;
&#60;p&#62;Thanks!
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Janbalik on "Why a 3rd column family when creating a custom index on Cassandra?"</title>
			<link>http://www.datastax.com/support-forums/topic/why-a-3rd-column-family-when-creating-a-custom-index-on-cassandra#post-8702</link>
			<pubDate>Sun, 27 Jan 2013 10:54:22 +0000</pubDate>
			<dc:creator>Janbalik</dc:creator>
			<guid isPermaLink="false">8702@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Hi Khahn, &#60;/p&#62;
&#60;p&#62;Don't say sorry, friend ;) You are helping me! Thank you very much. I'm still trying to understand this. I think one key is this: &#34;Updates to the index and data are not atomic&#34;. Another one is &#34;a node might index data held by another node&#34;. This is in line with what I thought: you can have an index entry for data which does not exist, or data which is not indexed. But why does using a third CF with the indexed values avoid this problem? We could equally say that updates to the index, the data and the indexed values are not atomic. I mean, from my understanding, there could also be concurrency and consistency problems between the third CF (indexed values) and the data. Or am I wrong? If so, why? Can you (or anyone) explain to me why I'm wrong?&#60;/p&#62;
&#60;p&#62;Thanks!!
&#60;/p&#62;</description>
		</item>
		<item>
			<title>khahn on "Why a 3rd column family when creating a custom index on Cassandra?"</title>
			<link>http://www.datastax.com/support-forums/topic/why-a-3rd-column-family-when-creating-a-custom-index-on-cassandra#post-8617</link>
			<pubDate>Thu, 24 Jan 2013 02:09:59 +0000</pubDate>
			<dc:creator>khahn</dc:creator>
			<guid isPermaLink="false">8617@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Sorry the link didn't answer your question. I learned a little more after researching your question, but I'm afraid it's not a complete answer. I hope an expert gives you a better explanation soon. &#60;/p&#62;
&#60;p&#62;The third column family is needed because simply reading the previous value from the users CF before updating it and then removing the index entry for that value from the Users_Index_Entries will not reliably work due to Cassandra's model of eventual consistency and lack of transactions. When creating your own index, a node might index data held by another node. Updates to the index and data are not atomic. To overcome these problems, you maintain a third column family to list previous values for the property of an item and use the list to remove these values from the index before adding the new value to the index.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Janbalik on "Why a 3rd column family when creating a custom index on Cassandra?"</title>
			<link>http://www.datastax.com/support-forums/topic/why-a-3rd-column-family-when-creating-a-custom-index-on-cassandra#post-8609</link>
			<pubDate>Wed, 23 Jan 2013 18:10:30 +0000</pubDate>
			<dc:creator>Janbalik</dc:creator>
			<guid isPermaLink="false">8609@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Hi Khahn, &#60;/p&#62;
&#60;p&#62;Thanks for your answer. I had read the links you recommend. And I have read them again (a few times), but I still don't understand the problem. Probably it is obvious, but I don't see it. Why could there be a concurrency issue if I read from the Users CF instead of reading from Users_Index_Entries? Both CFs must have the most recent value due to the timestamp which is part of each column. What is the difference? Following the process, I don't see why the value from the Users CF could be out of line with the index CF if I use only these 2 CFs. Do you (or anyone) know what problem could happen? Do you have an example of the issue?&#60;/p&#62;
&#60;p&#62;Thank you very much!
&#60;/p&#62;</description>
		</item>
		<item>
			<title>khahn on "Why a 3rd column family when creating a custom index on Cassandra?"</title>
			<link>http://www.datastax.com/support-forums/topic/why-a-3rd-column-family-when-creating-a-custom-index-on-cassandra#post-8582</link>
			<pubDate>Tue, 22 Jan 2013 19:34:12 +0000</pubDate>
			<dc:creator>khahn</dc:creator>
			<guid isPermaLink="false">8582@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Does the last paragraph (below) of this InfoQ article help: &#60;a href=&#34;http://www.infoq.com/news/2011/07/cassandraindexing&#34; rel=&#34;nofollow&#34;&#62;http://www.infoq.com/news/2011/07/cassandraindexing&#60;/a&#62; This article also contains a link to sample code: &#60;a href=&#34;https://github.com/edanuff/CassandraIndexedCollections&#34; rel=&#34;nofollow&#34;&#62;https://github.com/edanuff/CassandraIndexedCollections&#60;/a&#62;&#60;/p&#62;
&#60;p&#62;&#34;In general, he (Anuff) said that reading before writing can be an issue with Cassandra. Rather than doing locking (e.g., with ZooKeeper), Anuff presented a technique that uses three Column Families. For example, in a table with a users Column Family and an indexes Column Family, there will be a third Column Family Users_Index_Entries. Updates first read the previous index values from this column family to avoid concurrency issues and both it and Users use timestamped columns to avoid the need for locking. ...&#34;
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Janbalik on "Why a 3rd column family when creating a custom index on Cassandra?"</title>
			<link>http://www.datastax.com/support-forums/topic/why-a-3rd-column-family-when-creating-a-custom-index-on-cassandra#post-8559</link>
			<pubDate>Mon, 21 Jan 2013 19:24:51 +0000</pubDate>
			<dc:creator>Janbalik</dc:creator>
			<guid isPermaLink="false">8559@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;As I always say, sorry for my English. I'm working on creating some manual indexes for some column families in Cassandra. I have read everything I could about this but I have found something I'm not able to understand properly.&#60;/p&#62;
&#60;p&#62;In this presentation by Ed Anuff (&#60;a href=&#34;http://www.slideshare.net/edanuff/indexing-in-cassandra&#34; rel=&#34;nofollow&#34;&#62;http://www.slideshare.net/edanuff/indexing-in-cassandra&#60;/a&#62;), pages 36 to 45, I have seen his simple example of creating an index for a Users column family. He uses the 2 obvious CFs and another one to deal with concurrency. This third CF is &#34;my problem&#34;. If I'm not wrong, Cassandra will always store the most recent value for each column. If this value is indexed, I have to update it in the Index CF (delete the old index entry and create the new one), but why is the third CF necessary? When I think about that and concurrency, my understanding says: OK, many people are updating a value which is indexed. It will mean a lot of work updating the index, but in the end the last value will be in the Users CF and also in the Index CF; that's why there is a timestamp per column. So what's the matter with concurrency? What's more, if the value can be updated only by one user (the owner of the data), there will be no concurrency...&#60;/p&#62;
&#60;p&#62;I know I am quite ignorant in Cassandra matters, but I don't see the reason behind the third CF. Ed Anuff explains that by using this third column family you can restore the indexes to a consistent state (&#60;a href=&#34;http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html&#34; rel=&#34;nofollow&#34;&#62;http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html&#60;/a&#62;), but why would they fall into an inconsistent state? And, if this happens, the Users CF could be enough to restore the index, or am I wrong?&#60;/p&#62;
&#60;p&#62;Please, could someone explain this to me? What is/are my error/s?
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Janbalik on "Better choice on manual indexes"</title>
			<link>http://www.datastax.com/support-forums/topic/better-choice-on-manual-indexes#post-8402</link>
			<pubDate>Mon, 14 Jan 2013 13:42:09 +0000</pubDate>
			<dc:creator>Janbalik</dc:creator>
			<guid isPermaLink="false">8402@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Hi khahn! I have checked the links, and I had already read them before writing my post, except the last one. Unfortunately, I think secondary indexes are not the best solution, and I think the example in the last link is not suitable for my needs, but I appreciate your help a lot ;). Thank you very much.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Janbalik on "Better choice on manual indexes"</title>
			<link>http://www.datastax.com/support-forums/topic/better-choice-on-manual-indexes#post-8390</link>
			<pubDate>Sun, 13 Jan 2013 19:24:52 +0000</pubDate>
			<dc:creator>Janbalik</dc:creator>
			<guid isPermaLink="false">8390@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Thank you very much, khahn ;) I'm going to check all the links with a lot of interest!
&#60;/p&#62;</description>
		</item>
		<item>
			<title>khahn on "Better choice on manual indexes"</title>
			<link>http://www.datastax.com/support-forums/topic/better-choice-on-manual-indexes#post-8389</link>
			<pubDate>Sun, 13 Jan 2013 17:45:46 +0000</pubDate>
			<dc:creator>khahn</dc:creator>
			<guid isPermaLink="false">8389@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Thanks for the interesting question. I cannot provide a complete answer, but hope to give you a clue until an expert answers.  Please see:&#60;/p&#62;
&#60;p&#62;&#60;a href=&#34;http://www.datastax.com/docs/1.2/ddl/indexes#when-to-use-secondary-indexes&#34; rel=&#34;nofollow&#34;&#62;http://www.datastax.com/docs/1.2/ddl/indexes#when-to-use-secondary-indexes&#60;/a&#62;&#60;br /&#62;
&#60;a href=&#34;http://www.datastax.com/docs/1.2/ddl/indexes#when-not-to-use-secondary-indexes&#34; rel=&#34;nofollow&#34;&#62;http://www.datastax.com/docs/1.2/ddl/indexes#when-not-to-use-secondary-indexes&#60;/a&#62;&#60;br /&#62;
&#60;a href=&#34;http://www.datastax.com/docs/1.2/ddl/indexes#building-and-using-secondary-indexes&#34; rel=&#34;nofollow&#34;&#62;http://www.datastax.com/docs/1.2/ddl/indexes#building-and-using-secondary-indexes&#60;/a&#62;, which describes the &#34;tables as indexes&#34; (same as your second option)&#60;br /&#62;
&#60;a href=&#34;http://www.datastax.com/docs/1.2/ddl/table#example-a-music-service&#34; rel=&#34;nofollow&#34;&#62;http://www.datastax.com/docs/1.2/ddl/table#example-a-music-service&#60;/a&#62;, which covers using a secondary index in the songs example.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Janbalik on "Better choice on manual indexes"</title>
			<link>http://www.datastax.com/support-forums/topic/better-choice-on-manual-indexes#post-8383</link>
			<pubDate>Sun, 13 Jan 2013 13:04:23 +0000</pubDate>
			<dc:creator>Janbalik</dc:creator>
			<guid isPermaLink="false">8383@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Another detail: I have calculated that using the second option (storing all the attributes in the CF which works as &#34;indexes&#34;) I need about 80% more memory than using the first option (CFs really work as indexes to find the right data in the &#34;main&#34; CF of songs).
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Janbalik on "Better choice on manual indexes"</title>
			<link>http://www.datastax.com/support-forums/topic/better-choice-on-manual-indexes#post-8382</link>
			<pubDate>Sun, 13 Jan 2013 11:58:19 +0000</pubDate>
			<dc:creator>Janbalik</dc:creator>
			<guid isPermaLink="false">8382@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Hi @ll experts,&#60;/p&#62;
&#60;p&#62;First, excuse my English. It is not my native language. I'm working on moving a SQL database to Cassandra but I have a question I'm not able to solve. Let's say I have a SQL table where I store songs. Each song has an ID as primary key, which gives access to all its related data, stored in the fields of the row identified by the key. I also have some indexes to search by different criteria such as author, genre, title...&#60;/p&#62;
&#60;p&#62;When I think about moving this to a Cassandra schema, my idea is to create an equivalent column family, where the song ID is the row key and the song attributes are the columns. Then I can create 5 or 6 manual indexes to search by author, title, genre and more. The author, title... will be the column key (with some extra data added to keep it unique, using a composite column name) and the value will be the song ID, used to look up the static column family where each row is identified by the song ID.&#60;/p&#62;
&#60;p&#62;But here is my doubt. Which is better: each index CF storing only the ID, or storing all the attributes? The first option lets me reduce the amount of memory needed, but I need (at least) 2 reads to get each song's attributes. With the second option I need more memory because I repeat the same information once per index, but in one read I can get all the attributes I need. I think I can accept the extra memory if this schema is faster, but will it really be faster? Won't a bigger database make it slower? Or is the slower operation looking up each row given by the index CF, due to the way Cassandra stores rows and the 2 reads?&#60;/p&#62;
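&#60;p&#62;To show what I mean by the two options, here is a small sketch in Python with in-memory dictionaries standing in for the column families. The sample songs and the names songs, by_author_ids, by_author_full, find_option1 and find_option2 are made up for illustration. It counts logical row reads only and says nothing about real Cassandra latency.&#60;/p&#62;

```python
# Main CF: song ID maps to the song attributes (sample data is invented).
songs = {
    "s1": {"title": "Song A", "author": "Ann"},
    "s2": {"title": "Song B", "author": "Ann"},
}

# Option 1: the index CF stores only the song IDs.
by_author_ids = {"Ann": ["s1", "s2"]}

# Option 2: the index CF stores a full copy of the attributes.
by_author_full = {"Ann": [dict(songs["s1"]), dict(songs["s2"])]}


def find_option1(author):
    """Read the index row, then read each song row it points to."""
    reads = 1
    result = []
    for song_id in by_author_ids.get(author, []):
        result.append(songs[song_id])
        reads += 1
    return result, reads


def find_option2(author):
    """Read the index row only; the attributes are already there."""
    return by_author_full.get(author, []), 1


found1, reads1 = find_option1("Ann")
found2, reads2 = find_option2("Ann")
print(reads1, reads2)
```

&#60;p&#62;Both options return the same attributes. The first needs one read for the index row plus one per matching song, while the second needs one read but keeps a copy of every attribute in each index.&#60;/p&#62;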
&#60;p&#62;Any help will be greatly appreciated.&#60;/p&#62;
&#60;p&#62;Thanks in advance!
&#60;/p&#62;</description>
		</item>

	</channel>
</rss>
