Sorry in advance for my English. I'm working on building manual indexes for some column families in Cassandra. I have read everything I could find on the subject, but there is one thing I can't fully understand.
In this presentation by Ed Anuff (http://www.slideshare.net/edanuff/indexing-in-cassandra), slides 36 to 45, there is a simple example of creating an index for a Users column family. He uses the two obvious CFs (Users and the index itself) plus a third one to deal with concurrency. This third CF is "my problem". If I'm not wrong, Cassandra always keeps the most recent value for each column. If that value is indexed, I have to update it in the Index CF (delete the old index entry and create the new one), but why is the third CF necessary? When I think about concurrency, my reasoning is this: many clients update a value that is indexed. That means a lot of work updating the index, but in the end the last value written will be in the Users CF and also in the Index CF, because every column carries a timestamp. So where is the concurrency problem? What's more, if the value can only be updated by one user (the owner of the data), there is no concurrency at all...
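To make my understanding concrete, here is a minimal sketch of the two-CF update as I picture it. It is only an in-memory model: plain Python dicts stand in for the column families, a counter stands in for the client timestamps, and all the names (`users_cf`, `index_cf`, `update_indexed`, `lookup`) are mine, not taken from the presentation or from any Cassandra client API.

```python
import itertools

# Hypothetical in-memory stand-ins for the two column families.
# Each cell in users_cf stores (value, timestamp); the highest timestamp wins.
users_cf = {}   # user_id -> {column_name: (value, timestamp)}
index_cf = {}   # (column_name, value) -> {user_id: timestamp}

_clock = itertools.count(1)  # stand-in for client-supplied timestamps


def write_column(user_id, column, value, ts):
    """Last-write-wins: keep the cell with the highest timestamp."""
    row = users_cf.setdefault(user_id, {})
    current = row.get(column)
    if current is None or ts >= current[1]:
        row[column] = (value, ts)


def update_indexed(user_id, column, new_value):
    """Naive update: read the old value, drop its index entry, add the new one."""
    ts = next(_clock)
    old = users_cf.get(user_id, {}).get(column)
    if old is not None:
        # Remove the index entry that pointed at the previous value.
        index_cf.get((column, old[0]), {}).pop(user_id, None)
    index_cf.setdefault((column, new_value), {})[user_id] = ts
    write_column(user_id, column, new_value, ts)


def lookup(column, value):
    """Return the user ids the index lists for this column value."""
    return sorted(index_cf.get((column, value), {}))


update_indexed("u1", "email", "a@example.com")
update_indexed("u1", "email", "b@example.com")
```

When the updates run one after another like this, the index and the Users CF end up in agreement, which is why I don't see where the inconsistency would come from.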
I know I'm quite ignorant about Cassandra, but I don't see the reason behind the third CF. Ed Anuff explains that this third column family lets you restore the indexes to a consistent state (http://www.anuff.com/2010/07/secondary-indexes-in-cassandra.html), but why would they become inconsistent in the first place? And if that does happen, wouldn't the Users CF alone be enough to rebuild the index, or am I wrong?
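What I mean by rebuilding the index from the Users CF is roughly the following. Again, this is just an in-memory sketch under my own assumptions: a dict stands in for the Users CF, and `rebuild_index` is a hypothetical helper, not something from the blog post.

```python
def rebuild_index(users, column):
    """Scan every user row and regroup user ids by the current column value."""
    index = {}
    for user_id, row in users.items():
        if column in row:
            value, _timestamp = row[column]
            index.setdefault(value, set()).add(user_id)
    return index


# Hypothetical sample data: user_id -> {column_name: (value, timestamp)}
sample_users = {
    "u1": {"email": ("b@example.com", 2)},
    "u2": {"email": ("c@example.com", 1)},
    "u3": {"name": ("Carol", 1)},  # no indexed column, so it is skipped
}

rebuilt = rebuild_index(sample_users, "email")
```

I realise this means a full scan of the Users CF, so it would be slow, but as far as I can tell it would give a correct index.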
Could someone please explain this to me? Where am I going wrong?
