Thank you so much for answering, Nate. To be more specific about what I'm trying to do:
The use case is a location CF with predefined columns for county, municipality, etc., and an areaid. There are about 5,300 rows in this CF.
In addition to serving as a lookup CF, it is also used to verify that the areaids are valid when a log import job imports log lines. To do this, we have so far read all the IDs by fetching every row and returning the id column (done once, putting the IDs in a List).
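To make the current validation step concrete, here is a rough sketch in Python. The function names and the sample rows and log lines are made up for illustration, and plain dicts stand in for what the Cassandra client returns. We use a List today; a set is shown only because membership checks on it are cheaper.

```python
def load_valid_ids(rows):
    """Collect the areaid column from every location row into a set."""
    return {row["areaid"] for row in rows}


def split_log_lines(log_lines, valid_ids):
    """Partition parsed log lines into (accepted, rejected) by areaid."""
    accepted, rejected = [], []
    for line in log_lines:
        target = accepted if line["areaid"] in valid_ids else rejected
        target.append(line)
    return accepted, rejected


# Hypothetical sample data standing in for the location CF and a log file.
rows = [
    {"areaid": "20012", "county": "A"},
    {"areaid": "20063", "county": "B"},
]
logs = [
    {"areaid": "20012", "msg": "known area"},
    {"areaid": "99999", "msg": "unknown area"},
]

valid_ids = load_valid_ids(rows)
accepted, rejected = split_log_lines(logs, valid_ids)
```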
However, I thought it would be more efficient (and more Cassandra-ish) to include one additional row in the location CF holding all the IDs, so that when importing we would just read the valid IDs from the columns in this row. This is where the CQL dilemma shows up: writing 5300+ columns in that gnarly CQL statement :-)
But we also thought about another option: concatenating all the IDs into one long delimited String, writing that string to a single column in the new lookup row, and splitting it again during the import job.
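The concatenation idea would look roughly like this. It is only a sketch: the comma delimiter and the names join_ids and split_ids are my own assumptions, and the actual write to and read from the column is left out.

```python
DELIMITER = ","  # assumed delimiter; safe because the IDs are digits only


def join_ids(area_ids):
    """Concatenate the IDs into one delimited string for a single column."""
    if any(DELIMITER in area_id for area_id in area_ids):
        raise ValueError("an ID contains the delimiter")
    return DELIMITER.join(area_ids)


def split_ids(value):
    """Recover the list of IDs from the stored column value."""
    # "".split(",") would return [""], so handle the empty value explicitly.
    return value.split(DELIMITER) if value else []


stored = join_ids(["20012", "20063"])
restored = split_ids(stored)
```

The import job would then call split_ids once on the column value and keep the result in memory, just as it does with the List today.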
This approach feels hacky, but whatever is more efficient is good. It should also mean less overhead than using thousands of columns, because the location CF is rebuilt periodically, and a rebuild may remove areaids. An update query against the row with thousands of columns will not remove the obsolete IDs, so we would first have to delete the row and then write it again with the new columns. With all IDs concatenated in one column, the update only needs to overwrite that single column.
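A toy model of the difference, with a plain Python dict standing in for the lookup row (column name to value). This is not Cassandra client code, and the column name "ids" and the sample IDs are invented; it only illustrates the upsert behaviour described above.

```python
# Existing lookup row with one column per ID; values are unused.
old_row = {"20012": "", "20063": "", "20101": ""}
new_ids = ["20012", "20063"]  # 20101 disappeared in the rebuild

# Many-columns variant: writing the new columns only adds or overwrites,
# so the column for the removed ID is still there afterwards.
many_columns = dict(old_row)
many_columns.update({area_id: "" for area_id in new_ids})
stale = sorted(set(many_columns) - set(new_ids))

# Single-column variant: the one value is overwritten, nothing is left over.
single_column = {"ids": "20012,20063,20101"}
single_column["ids"] = ",".join(new_ids)
```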
However, could this result in the query being too large for a single Thrift message?
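A quick back-of-the-envelope estimate of the string size, assuming ASCII IDs of at most 5 digits and a one-character delimiter (both assumptions, as is the function name). As far as I understand, the Thrift frame limit is configured in megabytes, but I have not checked the setting on our cluster.

```python
def payload_size(id_count, max_digits, delimiter=","):
    """Upper bound in bytes for the concatenated ASCII string."""
    if id_count == 0:
        return 0
    # One delimiter between each pair of neighbouring IDs.
    return id_count * max_digits + (id_count - 1) * len(delimiter)


size_bytes = payload_size(5300, 5)  # 5300 * 5 + 5299 = 31799 bytes
size_kib = size_bytes / 1024        # roughly 31 KiB
```

So the value itself would be around 31 KB, before whatever overhead the query and the message framing add.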
That was a lot of rambling. What do you think?
1 column?
Or thousands of columns with a row deletion and a new insert, in batches of e.g. 500 columns?
The columns will be String columns with a maximum of 5 digits each, e.g. 20012, 20063.
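For the many-columns variant, the batching I have in mind is just slicing the ID list into chunks of 500 and issuing one insert per chunk. A sketch, with a hypothetical helper name and generated stand-in IDs; the insert statement itself is omitted.

```python
def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Made-up 5-digit IDs standing in for the real areaids.
ids = [str(20000 + i) for i in range(5300)]
batch_sizes = [len(batch) for batch in batches(ids, 500)]
```

With 5300 IDs that gives ten full batches of 500 and a final one of 300, so eleven inserts after the row deletion.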
Marius