I am configuring a new cluster consisting of two machines.
(Some small but possibly important background: until now I had a single-node cluster, configured with only one folder in data_file_directories. Then I bought a second server, configured it with two folders in data_file_directories (and data in those folders was balanced really well!), added it to the ring and assigned tokens. Something went wrong and some data was lost, so I decided to simply re-create the cluster.)
Now each server has three HDDs: one for the system and two for Cassandra (not RAID; they are two separate disks).
I stopped Cassandra on both machines, removed all data from the folders, and added the corresponding paths to data_file_directories in cassandra.yaml.
Then I started both machines, waited a little, and looked at the output of nodetool ring:
root@jopa:~# nodetool -h localhost ring
Address        DC           Rack   Status  State   Load      Owns    Token
192.168.1.102  datacenter1  rack1  Up      Normal  11.26 KB  50.00%  0
192.168.1.103  datacenter1  rack1  Up      Normal  18.31 KB  50.00%  85070591730234615865843651857942052864
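The two tokens in this output are what you get by spacing tokens evenly around the ring. A minimal sketch, assuming the cluster uses RandomPartitioner with a token space of 0 to 2^127 (the partitioner is not stated above), reproduces them:

```python
# Evenly spaced initial tokens, ASSUMING RandomPartitioner's
# token space of [0, 2**127).
RING_SIZE = 2 ** 127


def initial_tokens(node_count: int) -> list:
    """Return one token per node, spaced evenly around the ring."""
    if node_count < 1:
        raise ValueError("node_count must be positive")
    # Python integers are arbitrary precision, so 127-bit values stay exact.
    return [i * RING_SIZE // node_count for i in range(node_count)]


if __name__ == "__main__":
    for token in initial_tokens(2):
        print(token)
```

For N nodes this is token_i = i * 2^127 / N, so each node owns 1/N of the ring, which matches the 50.00% in the Owns column.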
After Cassandra started, subfolders "system" and "stil" (the name of my keyspace) were created in all three folders on each server.
But when I ran the script that fills the database, I noticed that data was being distributed exactly as in the first version of the cluster. On the first server, data is written only to the first folder (as before). On the second server, data is written to the two folders that were used previously, and the third directory is ignored: the "system" and "stil" subfolders were created there, but they stay empty.
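To confirm which directories actually receive data, a small script can total the bytes under each configured folder. This is a sketch; the paths in it are hypothetical placeholders for the real entries in data_file_directories:

```python
import os


def directory_size(path: str) -> int:
    """Total size in bytes of the regular files under path (0 if missing)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            try:
                # Skip symlinks so nothing is counted twice.
                if os.path.isfile(full) and not os.path.islink(full):
                    total += os.path.getsize(full)
            except OSError:
                # Files can vanish mid-walk (e.g. during compaction).
                continue
    return total


def report(data_dirs):
    """Map each data directory to the bytes currently stored under it."""
    return {d: directory_size(d) for d in data_dirs}


if __name__ == "__main__":
    # HYPOTHETICAL paths: replace with your data_file_directories entries.
    dirs = ["/var/lib/cassandra/data", "/mnt/disk2/cassandra/data"]
    for d, size in report(dirs).items():
        print(f"{d}: {size} bytes")
```

Running it a few times while the fill script is running shows whether the third directory ever grows.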
I thought that adding more hard drives to data_file_directories increases performance. Am I right?
Does it depend on the free space on each disk?
And how can I make Cassandra use all the folders I list in data_file_directories?
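One explanation worth testing, stated as an assumption rather than as Cassandra's documented behaviour: suppose each new data file goes to whichever configured directory currently has the most free space. Under that policy, a directory on a larger or emptier disk receives every write until its free space drops to the level of the others. The sketch below models that hypothetical policy. pick_directory and simulate are illustrative names, not Cassandra APIs, and the real selection logic depends on the Cassandra version:

```python
import shutil
from typing import Dict, List, Optional


def free_space(dirs: List[str]) -> Dict[str, int]:
    """Free bytes on the filesystem that holds each directory."""
    return {d: shutil.disk_usage(d).free for d in dirs}


def pick_directory(free: Dict[str, int], expected_size: int) -> Optional[str]:
    """HYPOTHETICAL policy: among directories with room for the new file,
    choose the one with the most free space (first one wins on ties)."""
    candidates = {d: f for d, f in free.items() if f >= expected_size}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def simulate(free: Dict[str, int], sizes: List[int]) -> List[str]:
    """Apply the policy to a sequence of writes and record where each lands."""
    free = dict(free)  # work on a copy
    chosen = []
    for size in sizes:
        d = pick_directory(free, size)
        if d is None:
            break  # no directory has room left
        free[d] -= size
        chosen.append(d)
    return chosen


if __name__ == "__main__":
    print(simulate({"disk1": 900, "disk2": 400, "disk3": 400}, [100] * 7))
```

In this simulation the directory that starts with 900 units absorbs the first six writes, and only the seventh lands elsewhere. That pattern looks like "one folder used, the others ignored" until the free space evens out.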