I have a design question. Suppose you have Order and OrderDetails tables as in RDBMS.But Suppose I have a big text or varchar column OrderDetailDesc which is say like xml column in OrderDetail and it contains details like ProductName,ProductDescriptions,OrderAmount,rate etc. Thus the size of the this column is quite large say on average 10-15KB and some are even 200KB in size. There are millions of rows in OrderDetails and thus causing the size of the table to be in 100's of GB's. It is causing unnecessarily large Database on RDBMS. Thus I have decided that I will move the OrderDetailDesc column to Cassandra. I will be using version 1.2. Now I have decided two options for this.
1. Create table with OrderId,OrderDetailId and OrderDetailDesc columns with composite primary key on OrderId and OrderDetailId. Thus sharding happens on the OrderId column and index order will be based on OrderDetailId.
2. Create table with OrderId and map column where OrderdetailId will map with OrderDetailDesc.
Now to give you more information. There will be around 3 million insert\update in this table. However, Inserts will be far too much than update.
There will be around 500 or say 1000 select based on OrderId which brings all the orderDetails.
There will be around 100 select based on OrderId and OrderDetailId.
I would like to know which approach would be better. I am leaning towards approach 1. However, I want to know about map columns. Do the whole map column will be read even if a single orderdetailid is retrieved or updated?
In approach 1 OrderId will be repeated for each OrderDetailId. Is the space used in approach 1 will be more than approach 2?
Please let me know if I have not made myself clear and you need any other information.
Also, please note that this scenario I have just fabricated and in my case the OrderDetailDesc column can not be normalized further as some of you may point to that.