<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="bbPress/1.0.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>DataStax Support Forums &#187; Topic: Does this time series data sharding approach make sense?</title>
		<link>http://www.datastax.com/support-forums/topic/does-this-ime-series-data-sharding-approach-make-sense</link>
		<description>Software, Support, and Training for Apache Cassandra</description>
		<language>en-US</language>
		<pubDate>Tue, 21 May 2013 11:48:36 +0000</pubDate>
		<generator>http://bbpress.org/?v=1.0.3</generator>
		<textInput>
			<title><![CDATA[Search]]></title>
			<description><![CDATA[Search all topics from these forums.]]></description>
			<name>q</name>
			<link>http://www.datastax.com/support-forums/search.php</link>
		</textInput>
		<atom:link href="http://www.datastax.com/support-forums/rss/topic/does-this-ime-series-data-sharding-approach-make-sense" rel="self" type="application/rss+xml" />

		<item>
			<title>jas on "Does this time series data sharding approach make sense?"</title>
			<link>http://www.datastax.com/support-forums/topic/does-this-ime-series-data-sharding-approach-make-sense#post-2541</link>
			<pubDate>Sat, 30 Jun 2012 16:51:45 +0000</pubDate>
			<dc:creator>jas</dc:creator>
			<guid isPermaLink="false">2541@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Thanks xedin.  I don't think the leap second will be an issue in my scenario.  I need to spend more time coming up to speed on the analytics side of things, and I don't want to end up with a bunch of data I cannot use.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>xedin on "Does this time series data sharding approach make sense?"</title>
			<link>http://www.datastax.com/support-forums/topic/does-this-ime-series-data-sharding-approach-make-sense#post-2439</link>
			<pubDate>Wed, 27 Jun 2012 22:47:25 +0000</pubDate>
			<dc:creator>xedin</dc:creator>
			<guid isPermaLink="false">2439@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;That should work just fine, although there is one thing to consider about CLIENT_METRICS_GRANULARITY_MS: would a &#34;leap second&#34; affect the correctness of your results, and is that important to you?
&#60;/p&#62;</description>
		</item>
		<item>
			<title>jas on "Does this time series data sharding approach make sense?"</title>
			<link>http://www.datastax.com/support-forums/topic/does-this-ime-series-data-sharding-approach-make-sense#post-2438</link>
			<pubDate>Wed, 27 Jun 2012 22:29:33 +0000</pubDate>
			<dc:creator>jas</dc:creator>
			<guid isPermaLink="false">2438@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Hello:&#60;/p&#62;
&#60;p&#62;I recently read a couple of articles regarding Cassandra and time series data:&#60;/p&#62;
&#60;p&#62;&#60;a href=&#34;http://rubyscale.com/blog/2011/03/06/basic-time-series-with-cassandra/&#34; rel=&#34;nofollow&#34;&#62;http://rubyscale.com/blog/2011/03/06/basic-time-series-with-cassandra/&#60;/a&#62;&#60;br /&#62;
&#60;a href=&#34;http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra&#34; rel=&#34;nofollow&#34;&#62;http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra&#60;/a&#62;&#60;/p&#62;
&#60;p&#62;I'm already recording activity performed for a given OAuth token (row key). The token lifetime is 24 hours, so I don't think a given row will get too long; thousands of columns perhaps.  But I also want to track what a given client (source IP address) does. What I'm doing is closer to the &#34;Index Column Family&#34; approach in the advanced time series article.  This data will not be needed in real time, so I don't want to denormalize it all again. Referring to the tokens the client used allows for processing the actions of each token. There is an anti-hijacking feature in the tokens, so there is a one-to-many relationship of client to tokens.&#60;/p&#62;
&#60;p&#62;But, unlike the token, an IP address does not expire, so the number of columns will continue to grow without bound over time, making for some very long rows. I think 'sharding' by day is sensible, so I want the client tracking row key to consist of IpAddress-day, for example 1.2.3.4-20120627.  I'm doing this in Java, and I can use Date and SimpleDateFormat to get the day suffix.&#60;/p&#62;
&#60;p&#62;This is running in a Tomcat app server, and the SimpleDateFormat javadocs say instances are not thread-safe. So, I could use a ThreadLocal, synchronize access, or construct a new instance each time. As the number of clients grows, I expect this to be pretty high volume, and I wonder about the overhead of constructing Date objects and using SimpleDateFormat.  So, my thought is to do this:&#60;/p&#62;
&#60;pre&#62;&#60;code&#62;public static final long CLIENT_METRICS_GRANULARITY_MS = 24 * 60 * 60 * 1000;	// One day in ms
...
final long interval = System.currentTimeMillis() / CLIENT_METRICS_GRANULARITY_MS;
final String rowKey = String.format(&#38;quot;%s-%d&#38;quot;, clientAddress, interval);&#60;/code&#62;&#60;/pre&#62;
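&#60;p&#62;The bucketing above can be sketched as a small helper that takes the timestamp as a parameter, which makes the day boundaries easy to check. This is only an illustration: the class name ClientMetricsKey and its method names are hypothetical and not part of the original code.&#60;/p&#62;

```java
// Hypothetical helper; the class and method names are illustrative assumptions.
final class ClientMetricsKey {
    // One day in milliseconds, matching CLIENT_METRICS_GRANULARITY_MS above.
    static final long GRANULARITY_MS = 24L * 60 * 60 * 1000;

    // Number of whole days since the Unix epoch (UTC) for the given instant.
    static long interval(long epochMillis) {
        return epochMillis / GRANULARITY_MS;
    }

    // Row key made of the client address, a dash, and the day interval.
    static String rowKey(String clientAddress, long epochMillis) {
        return clientAddress + "-" + interval(epochMillis);
    }

    // First millisecond (inclusive) covered by the given interval.
    static long intervalStartMillis(long interval) {
        return interval * GRANULARITY_MS;
    }
}
```

&#60;p&#62;For instance, ClientMetricsKey.rowKey with the address 1.2.3.4 and the timestamp 1340834058593L gives 1.2.3.4-15518, and intervalStartMillis(15518L) gives 1340755200000L, the start of that day in UTC.&#60;/p&#62;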
&#60;p&#62;My time series CF is defined as:&#60;/p&#62;
&#60;pre&#62;&#60;code&#62;CREATE COLUMN FAMILY ClientMetrics
WITH comparator=TimeUUIDType
AND key_validation_class=UTF8Type
AND default_validation_class=UTF8Type;&#60;/code&#62;&#60;/pre&#62;
&#60;p&#62;Example content looks like (for testing for today):&#60;/p&#62;
&#60;pre&#62;&#60;code&#62;[default@IsecMetrics] list ClientMetrics;
Using default limit of 100
-------------------
RowKey: 1.2.3.4-15518
=&#38;gt; (column=a5281d10-c0a2-11e1-8239-c8bcc88a2db5, value={&#38;quot;category&#38;quot;:&#38;quot;security&#38;quot;,&#38;quot;name&#38;quot;:&#38;quot;bindToken&#38;quot;,&#38;quot;params&#38;quot;:{&#38;quot;tokenValue&#38;quot;:&#38;quot;269e16d1-9b1f-48ac-8431-6fa54f42e2e0&#38;quot;}}, timestamp=1340834058593000)
=&#38;gt; (column=aa6c3a40-c0a2-11e1-8239-c8bcc88a2db5, value={&#38;quot;category&#38;quot;:&#38;quot;security&#38;quot;,&#38;quot;name&#38;quot;:&#38;quot;bindToken&#38;quot;,&#38;quot;params&#38;quot;:{&#38;quot;tokenValue&#38;quot;:&#38;quot;eedf7da3-3a99-4746-a27d-2e82afcf21bc&#38;quot;}}, timestamp=1340834067428000)
=&#38;gt; (column=bb06de50-c0a2-11e1-8239-c8bcc88a2db5, value={&#38;quot;category&#38;quot;:&#38;quot;security&#38;quot;,&#38;quot;name&#38;quot;:&#38;quot;blacklistedClientAddress&#38;quot;,&#38;quot;params&#38;quot;:{&#38;quot;tokenValue&#38;quot;:&#38;quot;f2ca9749-1391-4cd8-9b82-e15acbc91d07&#38;quot;,&#38;quot;duration&#38;quot;:&#38;quot;120&#38;quot;}}, timestamp=1340834095285001)
=&#38;gt; (column=d76040f0-c0a2-11e1-8239-c8bcc88a2db5, value={&#38;quot;category&#38;quot;:&#38;quot;security&#38;quot;,&#38;quot;name&#38;quot;:&#38;quot;blacklistedBindingAttempt&#38;quot;,&#38;quot;params&#38;quot;:{&#38;quot;tokenValue&#38;quot;:&#38;quot;ceeb67a8-6a94-4c2b-8e09-6a899c864689&#38;quot;}}, timestamp=1340834142847000)
=&#38;gt; (column=6d8d3b00-c0a3-11e1-8239-c8bcc88a2db5, value={&#38;quot;category&#38;quot;:&#38;quot;security&#38;quot;,&#38;quot;name&#38;quot;:&#38;quot;blacklistedClientAddress&#38;quot;,&#38;quot;params&#38;quot;:{&#38;quot;tokenValue&#38;quot;:&#38;quot;c5fd3021-f431-405c-b8e4-7fd1c0bcaedd&#38;quot;,&#38;quot;duration&#38;quot;:&#38;quot;0&#38;quot;}}, timestamp=1340834394800000)

1 Row Returned.
Elapsed time: 2 msec(s).
[default@IsecMetrics]&#60;/code&#62;&#60;/pre&#62;
&#60;p&#62;So, today is 15518, yesterday is 15517 and tomorrow is 15519 etc.  Then, later when performing analytics, I can reconstitute the data.  E.g.:&#60;/p&#62;
&#60;pre&#62;&#60;code&#62;def long granularity = 24 * 60 * 60 * 1000L
def long interval = 15518L
def theDate = new Date(interval * granularity)
println &#38;quot;interval date: ${theDate}&#38;quot;&#60;/code&#62;&#60;/pre&#62;
&#60;p&#62;And that seems to work (considering the time zone).  I don't think there's much overhead in System.currentTimeMillis(), and I'm sure if I make use of Date when I record the event, I'll be incurring that anyway. No SimpleDateFormat instances to deal with etc.&#60;/p&#62;
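&#60;p&#62;Because the interval is a count of whole days since the Unix epoch in UTC, it can also be turned into a calendar date directly, without depending on the JVM's default time zone the way printing a Date does. A minimal sketch, assuming Java 8 or later for java.time; the class name IntervalDecoder is hypothetical:&#60;/p&#62;

```java
import java.time.LocalDate;

// Hypothetical helper; assumes Java 8+ (java.time) is available.
final class IntervalDecoder {
    // Converts a day interval (whole days since 1970-01-01 UTC) to its UTC calendar date.
    static LocalDate utcDate(long interval) {
        return LocalDate.ofEpochDay(interval);
    }
}
```

&#60;p&#62;With the interval from the example row, IntervalDecoder.utcDate(15518L) is 2012-06-27.&#60;/p&#62;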
&#60;p&#62;Does this make sense, or is there something that's going to bite me later?&#60;/p&#62;
&#60;p&#62;Thanks,&#60;/p&#62;
&#60;p&#62;Jeff
&#60;/p&#62;</description>
		</item>

	</channel>
</rss>
