I have a DSE (1.x) cluster on Amazon, which I set up using the DataStax AMI.
I've loaded data and experimented with both Hive and Pig, but I'm trying to understand the configuration (which options are used, where they come from, etc.) and I'm really stumped on a few things.
Here's the output of nodetool ring:
x.x.x.x  Cassandra  rack1  Up  Normal  22.29 GB  16.67%  0
x.x.x.x  Analytics  rack1  Up  Normal  2.05 GB   16.67%  28356863910078205288614550619314017621
x.x.x.x  Cassandra  rack1  Up  Normal  16.81 GB  16.67%  56713727820156410577229101238628035242
x.x.x.x  Analytics  rack1  Up  Normal  2.16 GB   16.67%  85070591730234615865843651857942052863
x.x.x.x  Cassandra  rack1  Up  Normal  15.61 GB  16.67%  113427455640312821154458202477256070485
x.x.x.x  Analytics  rack1  Up  Normal  2.16 GB   16.67%  141784319550391026443072753096570088106
1) Where do the datacenter and rack names come from? From the docs, it seems like they should come from cassandra-topology.properties, but mine contains only the default sample entries, and certainly nothing that matches Cassandra or Analytics.
2) What configuration parameters indicate that the Analytics nodes won't hold much data? From reading the docs, I would have thought the initial tokens would be different (e.g. a minimal gap for the Analytics nodes).
3) According to the docs, I added the following strategy_options for any CF:
Why Brisk? Shouldn't this be Analytics? Hmm, just had a thought: is that the reason for the lack of data on the Analytics nodes? I'll have to explore that. But I didn't get any errors when using the schema...
4) How do the Analytics nodes find each other for Hadoop jobs? I assume the DSE Hadoop runtime could figure it out via the Cassandra API, but I'm just curious.
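For question 2, a quick arithmetic check on the tokens in the ring output may help. This is only a sketch: it assumes RandomPartitioner's 0..2^127 token range, the function names are made up for illustration, and it only shows what the numbers are consistent with, not how the AMI actually assigns tokens. Under those assumptions, the three Cassandra tokens look evenly spaced around the ring, and each Analytics token looks like the midpoint between a Cassandra token and the next one.

```python
# Assumption: RandomPartitioner, whose token space is taken here as 0 .. 2**127.
RING = 2 ** 127


def evenly_spaced(n):
    """Hypothetical helper: n tokens spread evenly over the ring (floored)."""
    return [i * RING // n for i in range(n)]


def midpoints(tokens):
    """Hypothetical helper: midpoint between each token and the next one,
    treating the end of the ring (RING) as the successor of the last token."""
    successors = tokens[1:] + [RING]
    return [(a + b) // 2 for a, b in zip(tokens, successors)]


cassandra_tokens = evenly_spaced(3)
analytics_tokens = midpoints(cassandra_tokens)

for name, tokens in (("Cassandra", cassandra_tokens),
                     ("Analytics", analytics_tokens)):
    for token in tokens:
        print(name, token)
```

If that reading is right, the six nodes sit at equal intervals, which matches the 16.67% ownership shown for every node, so token placement alone would not seem to explain the smaller load on the Analytics nodes.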
I'm guessing I'll have a few head slaps when I get answers, but that's OK.