I am new to Cassandra and am trying to determine whether I can use the latest release of Cassandra (1.2.1) to achieve the following:
(1) I have many remote sites where "raw" data is gathered and persisted.
(2) At each remote site, the gathered raw data needs to be aggregated/smoothed.
(3) This aggregated/smoothed data (not the raw data) must be sent to a central site.
(4) The aggregated/smoothed data is also persisted at the central site.
(5) Further aggregation is performed at the central site and the results are persisted at the central site.
In essence, the sites form a star topology.
(6) The raw data at each remote site should never leave that remote site.
(7) The aggregated data from each remote site should be persisted only at that remote site and at the central site. It should not be replicated to other remote sites.
(8) When a client reads data at the central site in order to perform further aggregation, the reads should be local to the central site and should not go out to any of the remote sites.
(9) When a client writes aggregated data to the central site, the writes should stay local to the central site and should not go out to any of the remote sites.
(10) A WAN connects the remote sites to the central site.
(a) If the WAN link between a remote site and the central site goes down, raw and aggregated data should continue to be persisted at the remote site.
(b) Once the WAN link is restored, the aggregated data should be replicated from the remote site to the central site.
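To pin down the placement rules in (5) to (9) and the outage behaviour in (10), here is a small Java model of what I am after. It is only a sketch of the intended behaviour under my own assumptions, not Cassandra code or configuration: the site names (`siteA`, `siteB`, `central`), the data-set names (`raw_*`, `agg_*`, `central_agg`) and the `RemoteSite` outbound queue are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of requirements (1)-(10); not Cassandra API. */
public class StarTopologyPlan {
    static final String CENTRAL = "central";

    /** Data set name -> sites allowed to store it, for the given remote sites. */
    static Map<String, Set<String>> plan(List<String> remoteSites) {
        Map<String, Set<String>> placement = new LinkedHashMap<>();
        for (String site : remoteSites) {
            placement.put("raw_" + site, Set.of(site));          // (6) never leaves the site
            placement.put("agg_" + site, Set.of(site, CENTRAL)); // (7) own site + central only
        }
        placement.put("central_agg", Set.of(CENTRAL));           // (5), (9) central only
        return placement;
    }

    /** Data sets a client at `site` can read without crossing the WAN, cf. (8). */
    static List<String> readableLocally(Map<String, Set<String>> placement, String site) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : placement.entrySet()) {
            if (e.getValue().contains(site)) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    /** (10a)/(10b): local writes always succeed; rows for the central site wait out a WAN outage. */
    static final class RemoteSite {
        final List<String> localStore = new ArrayList<>();
        final Deque<String> pendingForCentral = new ArrayDeque<>();
        boolean wanUp = true;

        void writeAggregate(String row, List<String> centralStore) {
            localStore.add(row);          // (a) persisted locally even if the WAN is down
            pendingForCentral.add(row);   // queued for the central site
            flush(centralStore);
        }

        void flush(List<String> centralStore) {
            // (b) drain the backlog in arrival order while the link is up
            while (wanUp && !pendingForCentral.isEmpty()) {
                centralStore.add(pendingForCentral.poll());
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Set<String>> p = plan(List.of("siteA", "siteB"));
        System.out.println(readableLocally(p, CENTRAL)); // [agg_siteA, agg_siteB, central_agg]
        System.out.println(readableLocally(p, "siteA")); // [raw_siteA, agg_siteA]

        List<String> centralStore = new ArrayList<>();
        RemoteSite siteA = new RemoteSite();
        siteA.writeAggregate("r1", centralStore);
        siteA.wanUp = false;
        siteA.writeAggregate("r2", centralStore); // held at siteA only for now
        siteA.wanUp = true;
        siteA.flush(centralStore);
        System.out.println(centralStore);         // [r1, r2]
    }
}
```

Each data set maps to exactly the sites allowed to hold it, so any layout that, for example, puts `agg_siteA` on `siteB` would violate (7).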
Can Cassandra 1.2.1 achieve the above using only configuration settings? If so, which settings matter most? (I am not looking for a complete configuration file, just the settings I should focus on.)
If this is not possible with configuration alone, is it possible to write custom code that "plugs into" Cassandra? My guess is that I would need to create a custom partitioner, a custom replication strategy, and a custom snitch.