I use Brisk to process many small text files, but the data in Cassandra grows too fast. Any ideas? Thanks.
Each hour I upload 10+ files, each under 300 KB and under 800 KB in total, then run some Hive queries to write output files. After running like this for 30 days, the data in Cassandra has grown to more than 21 GB.
I expected roughly 0.8 MB * 24 * 30 = 576 MB.
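To make the gap concrete, here is a minimal sketch of the same back-of-the-envelope estimate. The variable names are only for illustration, it assumes decimal units (1 MB = 1000 KB, 1 GB = 1000 MB), and it counts only the uploaded input, not the Hive output files or any replication.

```python
# Rough estimate of raw input volume versus what Cassandra reports.
# Assumptions: decimal units, upper bound of 800 KB uploaded per hour,
# and only the uploaded files are counted (no Hive output, no replicas).
KB_PER_HOUR = 800        # total upload per hour (upper bound from above)
HOURS_PER_DAY = 24
DAYS = 30
OBSERVED_MB = 21 * 1000  # "more than 21 GB" observed in Cassandra

expected_kb = KB_PER_HOUR * HOURS_PER_DAY * DAYS
expected_mb = expected_kb / 1000
ratio = OBSERVED_MB / expected_mb

print(f"expected raw input: {expected_mb:.0f} MB")
print(f"observed / expected: about {ratio:.1f}x")
```

So the data on disk is at least roughly 36 times larger than the raw input I uploaded.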