We are trying to decide whether to use HBase or Cassandra for our use case.
We are facing a performance issue while inserting data into Cassandra.
We need to load one CSV file (a structured file with 14,000 columns) into Cassandra. We have written code that parses the CSV file and inserts the data into Cassandra using the Hector interface. Since the column count is high, we call
"mutator.addInsertion(key, columnFamilyName, HFactory.createStringColumn(columnName, value));"
for every column, so that all the columns are added to the mutator in one go, and then we call mutator.execute() for the batch insertion.
Problem: the above method (mutator.addInsertion()) takes almost 700-900 milliseconds to add all 14,000 columns to the mutator.
However, the HBase client API method
"put.add(column.getFamily().getBytes("UTF-8"), column.getQualifier().getBytes(), vals[col].getBytes());"
takes only 260-280 milliseconds.
As per our understanding, Cassandra insertion should be faster than HBase, but we are not observing that here. It seems the mutator is taking too much time while preparing the batch of data.
We tried inserting 100 and 200 columns, and the preparation time stayed within permissible limits; but as we increase the column count, the batch preparation time increases.
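Since small batches stay within limits, one pattern we are considering is flushing the mutator in fixed-size chunks instead of accumulating all 14,000 insertions in a single batch. Below is a minimal, self-contained sketch of that chunking pattern; note that Hector's real Mutator is replaced by a hypothetical `BatchSink` interface here, purely so the chunking logic can be shown in isolation (the names `BatchSink` and `insertChunked` are ours, not Hector's):

```java
import java.util.List;

// Hypothetical stand-in for Hector's Mutator; in real code the add/execute
// calls would go to me.prettyprint.hector.api.mutation.Mutator instead.
interface BatchSink {
    void add(String columnName, String value);
    void execute();
}

public class ChunkedInsert {
    // Flushes the sink every `batchSize` columns rather than building one
    // giant 14,000-column batch. Returns the number of flushes performed.
    static int insertChunked(List<String[]> columns, BatchSink sink, int batchSize) {
        int flushes = 0;
        int pending = 0;
        for (String[] col : columns) {
            sink.add(col[0], col[1]);   // col[0] = column name, col[1] = value
            pending++;
            if (pending == batchSize) { // batch is full: send it and start fresh
                sink.execute();
                flushes++;
                pending = 0;
            }
        }
        if (pending > 0) {              // flush any trailing partial batch
            sink.execute();
            flushes++;
        }
        return flushes;
    }
}
```

With 14,000 columns and a batch size of 500, this performs 28 executes of 500 insertions each, which keeps each individual batch in the size range where we measured acceptable preparation times.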
Any idea?