<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="bbPress/1.0.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>DataStax Support Forums &#187; Topic: Writing loads of columns using CQL vs Hector</title>
		<link>http://www.datastax.com/support-forums/topic/writing-loads-of-columns-using-cql-vs-hector</link>
		<description>Software, Support, and Training for Apache Cassandra</description>
		<language>en-US</language>
		<pubDate>Thu, 20 Jun 2013 09:37:41 +0000</pubDate>
		<generator>http://bbpress.org/?v=1.0.3</generator>
		<atom:link href="http://www.datastax.com/support-forums/rss/topic/writing-loads-of-columns-using-cql-vs-hector" rel="self" type="application/rss+xml" />

		<item>
			<title>Marius Waldal on "Writing loads of columns using CQL vs Hector"</title>
			<link>http://www.datastax.com/support-forums/topic/writing-loads-of-columns-using-cql-vs-hector#post-1161</link>
			<pubDate>Thu, 16 Feb 2012 13:15:46 +0000</pubDate>
			<dc:creator>Marius Waldal</dc:creator>
			<guid isPermaLink="false">1161@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Thank you so much for answering, Nate. To be more specific about what I'm trying to do:&#60;/p&#62;
&#60;p&#62;The use case is a location CF with predefined columns for county, municipality, etc., and an areaid. There are about 5,300 rows in this CF. &#60;/p&#62;
&#60;p&#62;Besides serving as a lookup CF, it is also used to verify that areaids from a log import job are valid when importing log lines. So far we have done this by reading all rows once and collecting the id column of each into a List. &#60;/p&#62;
&#60;p&#62;However, I thought it would be more efficient (and more Cassandra-ish) to add one extra row to the location CF holding all the IDs, so that the import job would just read the valid IDs from the columns of this row. And this is where the CQL dilemma shows up: writing 5,300+ columns in that gnarly CQL statement :-)&#60;/p&#62;
&#60;p&#62;But we also thought about another option: concatenating all the IDs into one long delimited String, writing that string to a single column in the new lookup row, and splitting it again in the import job.&#60;/p&#62;
&#60;p&#62;This approach feels hacky, but whatever is more efficient is good. It should also mean less overhead than thousands of columns, because the location CF is rebuilt periodically and areaids may be removed in the process. An update against the row with thousands of columns will not remove obsolete IDs, so we would first have to delete the row and then write it again with the new columns. With all IDs concatenated in one column, the update only needs to overwrite that single column.&#60;br /&#62;
However, this could perhaps also result in the query being too large for a single Thrift message?&#60;/p&#62;
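&#60;p&#62;To make the single-column option concrete, here is a minimal Python sketch of the join and split round trip. The function names, the comma delimiter and the sample IDs are illustrative assumptions, not part of our actual import job; the size estimate assumes 5,300 IDs of 5 ASCII digits each.&#60;/p&#62;

```python
# Hypothetical sketch: pack area IDs into one delimited string and unpack them.
DELIMITER = ","  # assumed delimiter; safe because the IDs contain only digits


def pack_ids(area_ids):
    """Join the IDs into the single string value stored in one column."""
    return DELIMITER.join(area_ids)


def unpack_ids(packed):
    """Split the stored value back into a set for fast membership checks."""
    if not packed:
        return set()
    return set(packed.split(DELIMITER))


def packed_size_bytes(id_count, digits_per_id=5):
    """Size of the packed ASCII value: all digits plus one delimiter between IDs."""
    if id_count == 0:
        return 0
    return id_count * digits_per_id + (id_count - 1)


packed = pack_ids(["20012", "20063", "20104"])
valid_ids = unpack_ids(packed)
print(packed)                    # 20012,20063,20104
print("20063" in valid_ids)      # True
print(packed_size_bytes(5300))   # 31799
```

&#60;p&#62;Under these assumptions the whole column value comes to roughly 31 KB.&#60;/p&#62;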
&#60;p&#62;That was a lot of rambling. What do you think?&#60;br /&#62;
1 column?&#60;br /&#62;
Or thousands of columns with row deletion and new insert, in batches of e.g. 500 columns?&#60;/p&#62;
&#60;p&#62;The columns will be String columns with a maximum of 5 digits each: 20012,20063 etc&#60;/p&#62;
&#60;p&#62;Marius
&#60;/p&#62;</description>
		</item>
		<item>
			<title>zznate on "Writing loads of columns using CQL vs Hector"</title>
			<link>http://www.datastax.com/support-forums/topic/writing-loads-of-columns-using-cql-vs-hector#post-1150</link>
			<pubDate>Wed, 15 Feb 2012 20:47:05 +0000</pubDate>
			<dc:creator>zznate</dc:creator>
			<guid isPermaLink="false">1150@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Yeah, that insert would end up being some gnarly CQL. &#60;/p&#62;
&#60;p&#62;Using Hector's Mutator class would be simple and most likely more efficient. If your workload will be doing this insert frequently, take some time to play with the batch size. Bigger batches are more efficient, but a batch that is too big will cause transport errors and burn memory. To avoid continuous array resizing overhead, Thrift will keep expanding buffer sizes of the underlying connection until the max is reached (16MB by default). &#60;/p&#62;
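&#60;p&#62;As a rough illustration of tuning the batch size, the Python sketch below splits a list of column names into fixed-size batches and hands each batch to a writer callback. It is not Hector code: write_batch is a hypothetical stand-in for whatever executes one batch mutation, and the 1,200 sample columns are made up for the example.&#60;/p&#62;

```python
# Hypothetical sketch: send columns in fixed-size batches instead of one huge write.

def iter_batches(columns, batch_size):
    """Yield consecutive slices of `columns`, each with at most `batch_size` items."""
    if 1 > batch_size:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(columns), batch_size):
        yield columns[start:start + batch_size]


def write_in_batches(columns, batch_size, write_batch):
    """Call `write_batch` once per batch and return how many batches were sent."""
    sent = 0
    for batch in iter_batches(columns, batch_size):
        write_batch(batch)
        sent += 1
    return sent


# Example: 1200 made-up column names written with a batch size of 500.
column_names = [str(20000 + i) for i in range(1200)]
batch_sizes = []
sent = write_in_batches(column_names, 500, lambda batch: batch_sizes.append(len(batch)))
print(sent)         # 3
print(batch_sizes)  # [500, 500, 200]
```

&#60;p&#62;Keeping the batch size as a single parameter makes it cheap to try different values and watch for transport errors or memory pressure.&#60;/p&#62;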
&#60;p&#62;Start with about 500 columns and go up or down as needed.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Marius Waldal on "Writing loads of columns using CQL vs Hector"</title>
			<link>http://www.datastax.com/support-forums/topic/writing-loads-of-columns-using-cql-vs-hector#post-1143</link>
			<pubDate>Wed, 15 Feb 2012 09:43:57 +0000</pubDate>
			<dc:creator>Marius Waldal</dc:creator>
			<guid isPermaLink="false">1143@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;We are trying to use CQL for all Cassandra operations, but in cases that are typically Cassandra-ish rather than relational-ish (like a single row with thousands of columns), this seems like bad practice. Because a relational DB would never have a very high number of columns, the SQL style of inserting by listing first the column names and then the column values, or updating with name=value pairs, is practical there. &#60;/p&#62;
&#60;p&#62;But creating a CQL statement with thousands of name=value pairs does not sound right to me. Of course, one could do batch inserts instead, but this still seems like trying to force relational DB pants on NoSQL legs... &#60;/p&#62;
&#60;p&#62;Is it in any way better to use e.g. Hector mutators with addInsertion for each column? Will it be more efficient? &#60;/p&#62;
&#60;p&#62;Marius
&#60;/p&#62;</description>
		</item>

	</channel>
</rss>
