<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="bbPress/1.0.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>DataStax Support Forums &#187; Tag: data model - Recent Posts</title>
		<link>http://www.datastax.com/support-forums/tags/data-model</link>
		<description>Software, Support, and Training for Apache Cassandra</description>
		<language>en-US</language>
		<pubDate>Thu, 23 May 2013 07:38:37 +0000</pubDate>
		<generator>http://bbpress.org/?v=1.0.3</generator>
		<textInput>
			<title><![CDATA[Search]]></title>
			<description><![CDATA[Search all topics from these forums.]]></description>
			<name>q</name>
			<link>http://www.datastax.com/support-forums/search.php</link>
		</textInput>
		<atom:link href="http://www.datastax.com/support-forums/rss/tags/data-model" rel="self" type="application/rss+xml" />

		<item>
			<title>Anonymous on "Start with Queries - data model"</title>
			<link>http://www.datastax.com/support-forums/topic/start-with-queries-data-model#post-1063</link>
			<pubDate>Wed, 25 Jan 2012 18:58:17 +0000</pubDate>
			<dc:creator>Anonymous</dc:creator>
			<guid isPermaLink="false">1063@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;There are some answers from Quora:&#60;/p&#62;
&#60;p&#62;&#60;a href=&#34;http://www.quora.com/Cassandra-database/What-are-good-ways-to-design-data-model-in-Cassandra-for-historical-data&#34; rel=&#34;nofollow&#34;&#62;http://www.quora.com/Cassandra-database/What-are-good-ways-to-design-data-model-in-Cassandra-for-historical-data&#60;/a&#62;&#60;/p&#62;
&#60;p&#62;It looks like Quora is more active.&#60;/p&#62;
&#60;p&#62;Thanks,&#60;br /&#62;
Charlie&#124;DBA
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Anonymous on "Start with Queries - data model"</title>
			<link>http://www.datastax.com/support-forums/topic/start-with-queries-data-model#post-1060</link>
			<pubDate>Mon, 23 Jan 2012 19:15:47 +0000</pubDate>
			<dc:creator>Anonymous</dc:creator>
			<guid isPermaLink="false">1060@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Howdy,&#60;/p&#62;
&#60;p&#62;I'm wondering whether you could add some examples to this document:&#60;br /&#62;
&#60;a href=&#34;http://www.datastax.com/docs/1.0/ddl/data_model_planning#start-with-queries&#34; rel=&#34;nofollow&#34;&#62;http://www.datastax.com/docs/1.0/ddl/data_model_planning#start-with-queries&#60;/a&#62;&#60;/p&#62;
&#60;p&#62;E.g.&#60;/p&#62;
&#60;p&#62;Data model examples in Cassandra for popular and common use cases:&#60;/p&#62;
&#60;p&#62;- Historical data&#60;br /&#62;
 -- Time Series Data&#60;br /&#62;
- ordering queries, e.g. get order data ordered by customer_id and order_date.&#60;br /&#62;
- only caring about the last 6 months' worth of data&#60;br /&#62;
 -- should I use expiring columns (TTL)?&#60;br /&#62;
- data archiving and purging, from OLTP to DW&#60;br /&#62;
...&#60;br /&#62;
- and lots of others I don't know about yet.&#60;/p&#62;
&#60;p&#62;Thanks,&#60;br /&#62;
Charlie
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Anonymous on "Denormalize to Optimize data model"</title>
			<link>http://www.datastax.com/support-forums/topic/denormalize-to-optimize-data-model#post-1059</link>
			<pubDate>Mon, 23 Jan 2012 19:04:44 +0000</pubDate>
			<dc:creator>Anonymous</dc:creator>
			<guid isPermaLink="false">1059@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Howdy,&#60;/p&#62;
&#60;p&#62;I have some concerns about Denormalize to Optimize.&#60;br /&#62;
&#60;a href=&#34;http://www.datastax.com/docs/1.0/ddl/data_model_planning#denormalize-to-optimize&#34; rel=&#34;nofollow&#34;&#62;http://www.datastax.com/docs/1.0/ddl/data_model_planning#denormalize-to-optimize&#60;/a&#62;&#60;/p&#62;
&#60;p&#62;&#34;a single column family are used to answer each query. This sacrifices disk space (one of the cheapest resources for a server) in order to reduce the number of disk seeks and the amount of network traffic.&#34;&#60;/p&#62;
&#60;p&#62;First, it will NOT reduce the amount of network traffic compared with normalized data when you fetch the same number of columns and rows.&#60;br /&#62;
Second, the number of disk seeks depends on the workload. Sometimes normalized data increases the cache hit ratio because of its smaller memory footprint, which reduces the number of disk seeks.&#60;/p&#62;
&#60;p&#62;And last, could you add some data model examples to explain how to Denormalize to Optimize? It would help us learn and understand data modeling in Cassandra.&#60;/p&#62;
&#60;p&#62;Thanks and happy Chinese New Year,&#60;br /&#62;
Charlie
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Anonymous on "Help me Learn"</title>
			<link>http://www.datastax.com/support-forums/topic/help-me-learn#post-898</link>
			<pubDate>Wed, 21 Dec 2011 04:17:27 +0000</pubDate>
			<dc:creator>Anonymous</dc:creator>
			<guid isPermaLink="false">898@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Ok... So you can have a single &#34;row&#34; with a bunch of &#34;columns&#34; and no &#34;values&#34;. I can see how this might be useful... But please elaborate on the contents of the third paragraph (Both of these column fam...). In this example, why would you want to create a setup with just a single row and a ton of columns? Why not make it a single column with a ton of rows?
&#60;/p&#62;</description>
		</item>
		<item>
			<title>zznate on "Help me Learn"</title>
			<link>http://www.datastax.com/support-forums/topic/help-me-learn#post-884</link>
			<pubDate>Tue, 20 Dec 2011 18:29:51 +0000</pubDate>
			<dc:creator>zznate</dc:creator>
			<guid isPermaLink="false">884@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;To satisfy those queries you are looking at 2 column families:&#60;/p&#62;
&#60;p&#62;TagsByDriver&#60;br /&#62;
- the comparator would be UTF8Type, to keep the columns sorted alphabetically by name&#60;br /&#62;
- column name is the driver name, value could be a serialized list of 1 or more tags&#60;/p&#62;
&#60;p&#62;TagIndex&#60;br /&#62;
- using, in this trivial example, LongType comparator to order the columns by tag&#60;br /&#62;
- column name is the tag, column value the driver name&#60;/p&#62;
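&#60;p&#62;A rough sketch of why the comparator choice matters, using plain Python lists as a stand-in for sorted column names. This is not Cassandra client code; the data comes from the table in the original question, and the variable names are made up for illustration.&#60;/p&#62;

```python
# Hypothetical in-memory stand-in for comparator ordering; not Cassandra client code.
tags = [211, 112, 313, 114, 227, 472, 552, 102]
owners = ["Jack", "David", "Suzanne", "Mike", "Adam", "Jason"]

# LongType-style ordering: column names are compared as numbers.
numeric_order = sorted(tags)

# UTF8Type-style ordering: column names are compared as strings.
alphabetical_order = sorted(owners)

# The same numbers sort differently when they are compared as text,
# which is why a numeric comparator is used for the tag columns.
mixed = [9, 102, 11]
as_numbers = sorted(mixed)
as_text = sorted(str(n) for n in mixed)

print(numeric_order)
print(alphabetical_order)
print(as_numbers, as_text)
```

&#60;p&#62;Here as_numbers comes out as 9, 11, 102, while as_text comes out as 102, 11, 9.&#60;/p&#62;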
&#60;p&#62;Both of these column families still need row keys. In the above example we are treating these as indexes, taking advantage of the comparator sort order to minimize disk seeks. Cassandra can store 2 billion columns in a single row, but because a row is the unit of distribution in the cluster, this would create hot spots where a single node would hold all the results.&#60;/p&#62;
&#60;p&#62;To avoid this we would have to 'bucket' results in such a way that makes them easier to break up. In the above, the row keys could be the first two characters of the name and tag number respectively. We still have some hotspot potential here, so you would want to play with this bucketing a bit depending on distribution of results. &#60;/p&#62;
&#60;p&#62;If you need a higher-level search, say all drivers starting with N, you would create all 26 potential two-character row keys (for example Na through Nz) and issue a multiget_slice instead of a get_slice (a multiget_slice retrieves several row keys in one call). The same goes for tags: for 102 to 104 you would issue a multiget_slice for 102, 103 and 104.&#60;/p&#62;
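&#60;p&#62;The bucketing and multi-row read described above can be sketched in memory as follows. Everything here is an assumption for illustration: the nested dicts stand in for column families, and the multiget_slice function below is a local helper that only mimics the idea of the Thrift call, not its real signature. The lowercase second letter in the bucket keys is also an assumption.&#60;/p&#62;

```python
import string
from collections import defaultdict

# Hypothetical in-memory model: {row_key: {column_name: value}}.
# Data comes from the tag/owner table in the original question.
owners_to_tags = {
    "Jack": [114, 211], "David": [112], "Suzanne": [313],
    "Mike": [227, 552], "Adam": [472], "Jason": [102],
}

def bucket_key(driver_name):
    """Row key for TagsByDriver: first two characters of the driver name."""
    return driver_name[:2]

# TagsByDriver: bucketed row key, column name = driver, value = list of tags.
tags_by_driver = defaultdict(dict)
for driver, tags in owners_to_tags.items():
    tags_by_driver[bucket_key(driver)][driver] = tags

# TagIndex: row key = tag number, column name = tag, value = driver.
tag_index = {tag: {tag: driver}
             for driver, tags in owners_to_tags.items() for tag in tags}

def multiget_slice(column_family, row_keys):
    """Local stand-in for a multi-row read; columns come back in sorted order."""
    return {key: sorted(column_family[key].items())
            for key in row_keys if key in column_family}

def drivers_starting_with(letter):
    """Build the 26 candidate row keys for one initial and read them together."""
    row_keys = [letter + second for second in string.ascii_lowercase]
    rows = multiget_slice(tags_by_driver, row_keys)
    return sorted(name for columns in rows.values() for name, _ in columns)

print(drivers_starting_with("J"))
print(multiget_slice(tag_index, range(102, 105)))
```

&#60;p&#62;With the sample data, drivers_starting_with("J") finds Jack and Jason in the single bucket "Ja", and the tag lookup for 102 to 104 only finds tag 102, owned by Jason.&#60;/p&#62;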
&#60;p&#62;There are several other ways to do this, but this example is the most approachable, IMO. Because we are relying on the comparator to keep the columns indexed in sorted order, lookups, even for a large number of users in order, consist of only two disk seeks: one to locate the row, the next to seek to the column position. The read is then sequential regardless of the number of results.&#60;/p&#62;
&#60;p&#62;To &#34;page&#34; over results, use the following approach:&#60;br /&#62;
1. issue an initial get_slice with a SlicePredicate containing null for start and end, and a limit of 2 (N)&#60;br /&#62;
2. use the last result of the above get_slice as the &#34;start&#34; for the next get_slice and set the limit to 3 (N+1)&#60;br /&#62;
3. skip the first result (since you displayed it last time [though seemingly dumb, this is sort of fundamental to a distributed columnar datastore where you 'don't know what you have until you have it])&#60;br /&#62;
4. go to step 2 with results from 3
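&#60;/p&#62;
&#60;p&#62;The paging steps above can be sketched in memory like this. The get_slice function here is a made-up local helper over a sorted Python list, assumed to behave like a slice read with an inclusive start; it is not the real client API.&#60;/p&#62;

```python
from bisect import bisect_left

# Hypothetical in-memory row: column names already sorted by the comparator.
columns = ["Adam", "David", "Jack", "Jason", "Mike", "Suzanne"]

def get_slice(start, count):
    """Local stand-in for a slice read: up to `count` columns from `start`
    (inclusive). A start of None means the beginning of the row."""
    begin = 0 if start is None else bisect_left(columns, start)
    return columns[begin:begin + count]

def pages(page_size):
    """Yield pages by re-reading the last column seen and then skipping it."""
    page = get_slice(None, page_size)                 # step 1: no start, limit N
    while page:
        yield page
        fetched = get_slice(page[-1], page_size + 1)  # step 2: limit N+1 from last result
        page = fetched[1:]                            # step 3: skip the repeated column

for page in pages(2):
    print(page)
```

&#60;p&#62;With a page size of 2 this prints three pages: Adam and David, then Jack and Jason, then Mike and Suzanne. The loop stops when the extra read returns only the column that was already shown.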
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Anonymous on "Help me Learn"</title>
			<link>http://www.datastax.com/support-forums/topic/help-me-learn#post-882</link>
			<pubDate>Tue, 20 Dec 2011 17:53:09 +0000</pubDate>
			<dc:creator>Anonymous</dc:creator>
			<guid isPermaLink="false">882@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;I read that already... And it gives a nice overview of the structure itself and such, but it is by no means clear. For instance, though there are secondary indexes defined in this example, it doesn't explain if those indexes let me then request sorted data in a response based on those columns. And time_ordered_blogs_by_user??? I understand that this column family has something to do with getting information sorted, but it is not explained just sort of dropped in there, and there are no code examples.&#60;/p&#62;
&#60;p&#62;Given that, I believe all my original questions are still valid and I still need them answered.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>zznate on "Help me Learn"</title>
			<link>http://www.datastax.com/support-forums/topic/help-me-learn#post-879</link>
			<pubDate>Tue, 20 Dec 2011 16:09:21 +0000</pubDate>
			<dc:creator>zznate</dc:creator>
			<guid isPermaLink="false">879@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Read this first: &#60;a href=&#34;http://www.datastax.com/docs/1.0/ddl/index&#34; rel=&#34;nofollow&#34;&#62;http://www.datastax.com/docs/1.0/ddl/index&#60;/a&#62;&#60;/p&#62;
&#60;p&#62;Not intended to be an RTFM, but the above should answer all your questions and give you an overview of how Cassandra stores data on disk (and how to take advantage of this in your data model to minimize disk seeks).&#60;/p&#62;
&#60;p&#62;In short, a very good place to start. &#60;/p&#62;
&#60;p&#62;If you have specific questions after reading the above, I will be happy to answer them.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Anonymous on "Help me Learn"</title>
			<link>http://www.datastax.com/support-forums/topic/help-me-learn#post-875</link>
			<pubDate>Tue, 20 Dec 2011 06:37:06 +0000</pubDate>
			<dc:creator>Anonymous</dc:creator>
			<guid isPermaLink="false">875@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Ok... Very new to Cassandra. Still trying to wrap my head around how it works. Let me start with a super simple table.&#60;/p&#62;
&#60;p&#62;This is a table matching car license plates to vehicle owners. Tag numbers are unique and will be the most looked-up value so that will be my primary key. For simplicity's sake, let's also assume people's names are a single word and unique as well. I get this:&#60;/p&#62;
&#60;p&#62;Tag &#124; Owner&#60;br /&#62;
-----------&#60;br /&#62;
211 &#124; Jack&#60;br /&#62;
112 &#124; David&#60;br /&#62;
313 &#124; Suzanne&#60;br /&#62;
114 &#124; Jack&#60;br /&#62;
227 &#124; Mike&#60;br /&#62;
472 &#124; Adam&#60;br /&#62;
552 &#124; Mike&#60;br /&#62;
102 &#124; Jason&#60;/p&#62;
&#60;p&#62;As you can see, Jack and Mike both have two cars, and everyone else only has one.&#60;/p&#62;
&#60;p&#62;Now, here are my questions:&#60;/p&#62;
&#60;p&#62;* How would I get a list of all tag numbers in ascending order?&#60;br /&#62;
* How would I get a list of all tag numbers held by Jack? In ascending order?&#60;br /&#62;
* How would I get a list of all the owners in alphabetical order?&#60;br /&#62;
* What if this was on a web page showing 2 names at a time. How do I get the next two and so on? In order?&#60;br /&#62;
* Knowing in advance that I need the above information in the above forms, how do I set this ColumnFamily (???) up in Cassandra in the best way?&#60;/p&#62;
&#60;p&#62;Thanks in advance for helping out a Cassandra noob!!!
&#60;/p&#62;</description>
		</item>

	</channel>
</rss>
