I have a ColumnFamily in Cassandra with only 4 rows (keys). Each row has a very large number of columns (and corresponding values), literally millions per row. The column names are numbers, but they are stored as UTF-8 strings. I want to do a selective get on each row to fetch the values of a very small number of columns, something like this (using pycassa in Python):
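from pycassa.pool import ConnectionPool
from pycassa.columnfamily import ColumnFamily
# Connect to the keyspace 'BIGDATA_1' on the local node
pool = ConnectionPool('BIGDATA_1', server_list=['127.0.0.1'])
# Open the column family named 'row1'
col_fam = ColumnFamily(pool, 'row1')
# Fetch three specific columns from the row with key 'IDTOEVENT'
print col_fam.get('IDTOEVENT', ['12', '135', '1234'])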
The output I get is strange:
OrderedDict([(u'12', 'Value of 12'), (u'1234', 'Value of 1234')])
Notice that the value for column '135' is missing.
If I do a get() on each column name individually ('12', '135' and '1234'), I see the correct results.
My guess is that, since the column names are UTF-8 strings, they are sorted alphabetically (as strings, not as numbers) inside Cassandra. So when producing the output, Cassandra returns '12', followed by '1234', and then perhaps hits some kind of 'out of range' or 'out of buffer' limit, which is why '135' is missed.
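To illustrate the ordering I have in mind, here is a small sketch in plain Python (no Cassandra involved). It assumes Cassandra orders UTF-8 column names the same way Python orders ASCII strings, which is my assumption and not something I have verified:

```python
# Column names as I store them: numeric values encoded as strings.
names = ['12', '135', '1234']

# String comparison goes character by character, so '1234' sorts before '135'
# because '2' < '3' at the second character.
as_strings = sorted(names)

# Comparing the same names as integers gives the numeric order instead.
as_numbers = sorted(names, key=int)

print(as_strings)
print(as_numbers)
```

Under that assumption, '135' would be the last of the three columns in the row's sort order, which matches the one that goes missing.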
I am still new to Cassandra, so there is a high chance that I have missed something.
Could someone please explain what is happening under the hood?