<?xml version="1.0" encoding="UTF-8"?>
<!-- generator="bbPress/1.0.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title>DataStax Support Forums &#187; Topic: Hive Query Problems</title>
		<link>http://www.datastax.com/support-forums/topic/hive-query-problems</link>
		<description>Software, Support, and Training for Apache Cassandra</description>
		<language>en-US</language>
		<pubDate>Thu, 20 Jun 2013 11:27:40 +0000</pubDate>
		<generator>http://bbpress.org/?v=1.0.3</generator>
		<textInput>
			<title><![CDATA[Search]]></title>
			<description><![CDATA[Search all topics from these forums.]]></description>
			<name>q</name>
			<link>http://www.datastax.com/support-forums/search.php</link>
		</textInput>
		<atom:link href="http://www.datastax.com/support-forums/rss/topic/hive-query-problems" rel="self" type="application/rss+xml" />

		<item>
			<title>Anonymous on "Hive Query Problems"</title>
			<link>http://www.datastax.com/support-forums/topic/hive-query-problems#post-1855</link>
			<pubDate>Mon, 07 May 2012 00:02:04 +0000</pubDate>
			<dc:creator>Anonymous</dc:creator>
			<guid isPermaLink="false">1855@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Hi..&#60;br /&#62;
Can anybody tell me whether I can aggregate all the values of a column into an array using a Hive UDAF?
&#60;/p&#62;</description>
		</item>
		<item>
			<title>zznate on "Hive Query Problems"</title>
			<link>http://www.datastax.com/support-forums/topic/hive-query-problems#post-797</link>
			<pubDate>Fri, 02 Dec 2011 16:00:47 +0000</pubDate>
			<dc:creator>zznate</dc:creator>
			<guid isPermaLink="false">797@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;We actually use the ColumnFamily[Input&#124;Output]Format from the Cassandra source tree. The examples directory there might be of some use as well:&#60;br /&#62;
&#60;a href=&#34;http://svn.apache.org/viewvc/cassandra/branches/cassandra-1.0/examples/hadoop_word_count/src/&#34; rel=&#34;nofollow&#34;&#62;http://svn.apache.org/viewvc/cassandra/branches/cassandra-1.0/examples/hadoop_word_count/src/&#60;/a&#62;
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Anonymous on "Hive Query Problems"</title>
			<link>http://www.datastax.com/support-forums/topic/hive-query-problems#post-796</link>
			<pubDate>Fri, 02 Dec 2011 14:49:17 +0000</pubDate>
			<dc:creator>Anonymous</dc:creator>
			<guid isPermaLink="false">796@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Nate,&#60;br /&#62;
Since it doesn't look like there is a workaround for this issue forthcoming, I've been trying to build custom map-reduce jobs to do what I need to do. Unfortunately, I'm having some issues using the ColumnFamilyInputFormat (mainly class cast exceptions on the output key format... no matter what I do, it always says it received a HeapByteBuffer instead of Text or LongWritable when I write the output in the mapper). Is there any documentation out there for writing custom map-reduce code when using DSE? I couldn't find anything on the site.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>zznate on "Hive Query Problems"</title>
			<link>http://www.datastax.com/support-forums/topic/hive-query-problems#post-682</link>
			<pubDate>Wed, 16 Nov 2011 18:52:54 +0000</pubDate>
			<dc:creator>zznate</dc:creator>
			<guid isPermaLink="false">682@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;If you want to post (or send directly) a sample schema and some data, I'll add them to the ticket.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>fmeyer on "Hive Query Problems"</title>
			<link>http://www.datastax.com/support-forums/topic/hive-query-problems#post-681</link>
			<pubDate>Wed, 16 Nov 2011 18:48:34 +0000</pubDate>
			<dc:creator>fmeyer</dc:creator>
			<guid isPermaLink="false">681@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;@zznate, &#60;/p&#62;
&#60;p&#62;I'm testing the same structure against 0.8 and whenever I try to use the &#34;group by&#34; statement I get the same error. &#60;/p&#62;
&#60;pre&#62;&#60;code&#62;select bd.f1, count (DISTINCT bd.f2) from bd group by bd.f1;
FAILED: Hive Internal Error: java.lang.NullPointerException(null)
java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.exec.Utilities.getColumnNamesFromSortCols(Utilities.java:1314)
        at org.apache.hadoop.hive.ql.optimizer.GroupByOptimizer$BucketGroupByProcessor.checkBucketGroupBy(GroupByOptimizer.java:196)
        at org.apache.hadoop.hive.ql.optimizer.GroupByOptimizer$BucketGroupByProcessor.process(GroupByOptimizer.java:128)
        at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:128)
        at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102)
        at org.apache.hadoop.hive.ql.optimizer.GroupByOptimizer.transform(GroupByOptimizer.java:92)
        at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:85)
        at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:6625)
        at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:238)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:340)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:736)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:164)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:241)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:456)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)&#60;/code&#62;&#60;/pre&#62;</description>
		</item>
		<item>
			<title>zznate on "Hive Query Problems"</title>
			<link>http://www.datastax.com/support-forums/topic/hive-query-problems#post-628</link>
			<pubDate>Tue, 08 Nov 2011 17:09:07 +0000</pubDate>
			<dc:creator>zznate</dc:creator>
			<guid isPermaLink="false">628@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;The numBuckets property should not make much of a difference there. &#60;/p&#62;
&#60;p&#62;Thanks for the sample data - I've added it to our internal issue tracking system. I'll update this post as soon as we have more information. Thanks for your patience - if there is anything else we can do to help, let us know.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Anonymous on "Hive Query Problems"</title>
			<link>http://www.datastax.com/support-forums/topic/hive-query-problems#post-626</link>
			<pubDate>Tue, 08 Nov 2011 13:59:20 +0000</pubDate>
			<dc:creator>Anonymous</dc:creator>
			<guid isPermaLink="false">626@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;I get the same group by error when I group by a numeric column. As far as I can tell, it happens for all columns. I looked more closely at the describe extended output for both my auto-created table and my manually created table. The only discernible difference I can see is that the numBuckets property is set to 0 in the auto-created table, whereas it is -1 in the manual table. Could this make a difference? If so, how can I modify it? I couldn't find anything in the documentation about that specifically.&#60;/p&#62;
&#60;p&#62;Here are a few rows of data from the table:&#60;br /&#62;
&#60;pre&#62;&#60;code&#62;01341453-9154-30d7-7e0e-d2f85f8e5970    A1 ABBCD28554   DESKTOP 419     1304929690000   0       ABABB74926   BFFCC73322   DESKTOP NULL    NULL    NULL    S1     0       1304928983000   -7       01341453-9154-30D7-7E0E-D2F85F8E5970    ABBCA702221
01461557-695d-40cf-a418-0ae5d2562dc7    A2 AAAFE31920  DESKTOP NULL    NULL    NULL    MBFFEE82655  MBFFEE69677  DESKTOP 300     1305910710000   305     S2     0       1305910386000   -4       01461557-695D-40CF-A418-0AE5D2562DC7    ABBCA243802
01427859-4097-2a60-9cbb-8e351d451feb    A3     FDEEA31765   DESKTOP 0       1299483991000   0       BFFEE115573  BFFEE73159   DESKTOP NULL    NULL    NULL    S1     0       1299483990000   -7       01427859-4097-2A60-9CBB-8E351D451FEB    ABBCA306426
0170f2e2-efb9-fa56-a3a9-fc5a395d9c59    A1 CCEEEE32078   DESKTOP NULL    NULL    NULL    BFFEE135412  BFFEE74397   DESKTOP 300     1305623569000   232     S1     0       1305623262000   -7       0170F2E2-EFB9-FA56-A3A9-FC5A395D9C59    ABBCA755429
00d64cc6-202f-314f-67dc-f53d40cb0169    A1 EFFGGA25558   DESKTOP 2388    1305522068000   4103    BFFEE99154   BFFEE71523   DESKTOP 300     1305461033000   2014    S2     0       1305460722000    0       -4      00D64CC6-202F-314F-67DC-F53D40CB0169    ABBCA922718&#60;/code&#62;&#60;/pre&#62;</description>
		</item>
		<item>
			<title>zznate on "Hive Query Problems"</title>
			<link>http://www.datastax.com/support-forums/topic/hive-query-problems#post-618</link>
			<pubDate>Mon, 07 Nov 2011 23:06:13 +0000</pubDate>
			<dc:creator>zznate</dc:creator>
			<guid isPermaLink="false">618@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;We've opened an issue internally for investigating this further. I'll post an update here as soon as I know more. &#60;/p&#62;
&#60;p&#62;Does groupBy work on any of the other types? Long or Integer?&#60;/p&#62;
&#60;p&#62;Also, if you have a few rows of sample data, that would help. Contact me directly if you don't feel comfortable posting it here. &#60;/p&#62;
&#60;p&#62;Thanks!
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Anonymous on "Hive Query Problems"</title>
			<link>http://www.datastax.com/support-forums/topic/hive-query-problems#post-609</link>
			<pubDate>Fri, 04 Nov 2011 14:50:18 +0000</pubDate>
			<dc:creator>Anonymous</dc:creator>
			<guid isPermaLink="false">609@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Here's the Hive table create script&#60;/p&#62;
&#60;pre&#62;&#60;code&#62;create external table myhivetable(row_key string, myfield string, myfield2 string, myfield3 string,
  myfield4 int, myfield5 int,myfield6 bigint,
   myfield7 string, myfield8 string, myfield9 string,
   myfield10 int, myfield11 int, myfield12 bigint,
   myfield13 string, myfield14 int, myfield15 int,myfield16 bigint,
   myfield17 string, myfield18 string, myfield19 string)
  stored  by &#38;#039;org.apache.hadoop.hive.cassandra.CassandraStorageHandler&#38;#039;
  with
  serdeproperties  (&#38;quot;cassandra.columns.mapping&#38;quot; = &#38;quot;:key,myfield,myfield1,myfield2,myfield3,myfield4,myfield5,myfield6,myfield7,myfield8,myfield9,myfield10,myfield11,myfield21,myfield13,myfield14,myfield15,myfield16,myfield17,myfield18&#38;quot;,
  &#38;quot;cassandra.cf.validatorType&#38;quot; = &#38;quot;org.apache.cassandra.db.marshal.UUIDType,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.LongType,org.apache.cassandra.db.marshal.LongType,org.apache.cassandra.db.marshal.IntegerType,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.LongType,org.apache.cassandra.db.marshal.LongType,org.apache.cassandra.db.marshal.IntegerType,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.LongType,org.apache.cassandra.db.marshal.LongType,org.apache.cassandra.db.marshal.IntegerType,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type&#38;quot;)
  tblproperties (&#38;quot;cassandra.range.size&#38;quot; = &#38;quot;100&#38;quot;,&#38;quot;cassandra.slice.predicate.size&#38;quot; = &#38;quot;100&#38;quot;,&#38;quot;cassandra.ks.name&#38;quot; = &#38;quot;MyKeyspace&#38;quot;, &#38;quot;cassandra.cf.name&#38;quot; = &#38;quot;MyTable&#38;quot;);&#60;/code&#62;&#60;/pre&#62;</description>
		</item>
		<item>
			<title>zznate on "Hive Query Problems"</title>
			<link>http://www.datastax.com/support-forums/topic/hive-query-problems#post-607</link>
			<pubDate>Thu, 03 Nov 2011 22:19:44 +0000</pubDate>
			<dc:creator>zznate</dc:creator>
			<guid isPermaLink="false">607@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Can you post the Hive table create script you used against the above schema, as well as the query? That would help us try to reproduce.
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Anonymous on "Hive Query Problems"</title>
			<link>http://www.datastax.com/support-forums/topic/hive-query-problems#post-605</link>
			<pubDate>Thu, 03 Nov 2011 16:23:25 +0000</pubDate>
			<dc:creator>Anonymous</dc:creator>
			<guid isPermaLink="false">605@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;That seemed to do the trick. Now I can see non-null values for the numeric columns, and I can also execute the groupBy query. I have a new problem, though: the groupBy doesn't produce the expected results. I'm doing &#60;code&#62;select myfield2, count(*) from mytable group by myfield2&#60;/code&#62; but all I ever get as output is the tuple NULL, XX (where XX is the total number of rows in my table). If I do a select on the someStringcol column, I see the actual values. It looks like whatever is responsible for comparing the strings in the group by returns NULL.&#60;/p&#62;
&#60;p&#62;In response to your question about the Cassandra CF columns and types, here's the output of the describe on the column family:&#60;br /&#62;
ColumnFamily: MyTable&#60;br /&#62;
      Key Validation Class: org.apache.cassandra.db.marshal.UUIDType&#60;br /&#62;
      Default column value validator: org.apache.cassandra.db.marshal.BytesType&#60;br /&#62;
      Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type&#60;br /&#62;
      Row cache size / save period in seconds / keys to save : 0.0/0/all&#60;br /&#62;
      Key cache size / save period in seconds: 200000.0/14400&#60;br /&#62;
      GC grace seconds: 864000&#60;br /&#62;
      Compaction min/max thresholds: 4/32&#60;br /&#62;
      Read repair chance: 1.0&#60;br /&#62;
      Replicate on write: true&#60;br /&#62;
      Built indexes: []&#60;br /&#62;
      Column Metadata:&#60;br /&#62;
        Column Name: myfield&#60;br /&#62;
          Validation Class: org.apache.cassandra.db.marshal.UTF8Type&#60;br /&#62;
        Column Name: myfield2&#60;br /&#62;
          Validation Class: org.apache.cassandra.db.marshal.UTF8Type&#60;br /&#62;
        Column Name: myfield3&#60;br /&#62;
          Validation Class: org.apache.cassandra.db.marshal.UTF8Type&#60;br /&#62;
        Column Name: myfield4&#60;br /&#62;
          Validation Class: org.apache.cassandra.db.marshal.LongType&#60;br /&#62;
        Column Name: myfield5&#60;br /&#62;
          Validation Class: org.apache.cassandra.db.marshal.LongType&#60;br /&#62;
        Column Name: myfield6&#60;br /&#62;
          Validation Class: org.apache.cassandra.db.marshal.IntegerType&#60;br /&#62;
        Column Name: myfield7&#60;br /&#62;
          Validation Class: org.apache.cassandra.db.marshal.UTF8Type&#60;br /&#62;
        Column Name: myfield8&#60;br /&#62;
          Validation Class: org.apache.cassandra.db.marshal.UTF8Type&#60;br /&#62;
        Column Name: myfield9&#60;br /&#62;
          Validation Class: org.apache.cassandra.db.marshal.UTF8Type&#60;br /&#62;
        Column Name: myfield10&#60;br /&#62;
          Validation Class: org.apache.cassandra.db.marshal.LongType&#60;br /&#62;
        Column Name: myfield11&#60;br /&#62;
          Validation Class: org.apache.cassandra.db.marshal.LongType&#60;br /&#62;
        Column Name: myfield12&#60;br /&#62;
          Validation Class: org.apache.cassandra.db.marshal.IntegerType&#60;br /&#62;
        Column Name: myfield13&#60;br /&#62;
          Validation Class: org.apache.cassandra.db.marshal.UTF8Type&#60;br /&#62;
        Column Name: myfield14&#60;br /&#62;
          Validation Class: org.apache.cassandra.db.marshal.LongType&#60;br /&#62;
        Column Name: myfield15&#60;br /&#62;
          Validation Class: org.apache.cassandra.db.marshal.LongType&#60;br /&#62;
        Column Name: myfield16&#60;br /&#62;
          Validation Class: org.apache.cassandra.db.marshal.IntegerType&#60;br /&#62;
        Column Name: myfield17&#60;br /&#62;
          Validation Class: org.apache.cassandra.db.marshal.UTF8Type&#60;br /&#62;
        Column Name: myfield18&#60;br /&#62;
          Validation Class: org.apache.cassandra.db.marshal.UTF8Type&#60;br /&#62;
        Column Name: myfield19&#60;br /&#62;
          Validation Class: org.apache.cassandra.db.marshal.UTF8Type&#60;br /&#62;
      Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrateg&#60;br /&#62;
1
&#60;/p&#62;</description>
		</item>
		<item>
			<title>tjake on "Hive Query Problems"</title>
			<link>http://www.datastax.com/support-forums/topic/hive-query-problems#post-604</link>
			<pubDate>Thu, 03 Nov 2011 13:59:42 +0000</pubDate>
			<dc:creator>tjake</dc:creator>
			<guid isPermaLink="false">604@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;Hi,&#60;/p&#62;
&#60;p&#62;There is an example of how to map numeric fields in the demo Hive query.&#60;/p&#62;
&#60;p&#62;There is a SERDE setting called cassandra.cf.validatorType.&#60;/p&#62;
&#60;p&#62;create external table StockHist(row_key string, column_name string, value double)&#60;br /&#62;
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'&#60;br /&#62;
WITH SERDEPROPERTIES (&#34;cassandra.ks.name&#34; = &#34;PortfolioDemo&#34;,&#60;br /&#62;
  &#34;cassandra.cf.validatorType&#34; = &#34;UTF8Type,UTF8Type,DoubleType&#34;&#60;br /&#62;
);&#60;/p&#62;
&#60;p&#62;So specifying this along with the name mapping may fix your issue.&#60;/p&#62;
&#60;p&#62;Can you give us more info on the Cassandra CF columns and types so we can try to reproduce and see if there is a bug in the automapping?  &#60;/p&#62;
&#60;p&#62;Thanks
&#60;/p&#62;</description>
		</item>
		<item>
			<title>Anonymous on "Hive Query Problems"</title>
			<link>http://www.datastax.com/support-forums/topic/hive-query-problems#post-603</link>
			<pubDate>Thu, 03 Nov 2011 13:30:56 +0000</pubDate>
			<dc:creator>Anonymous</dc:creator>
			<guid isPermaLink="false">603@http://www.datastax.com/support-forums/</guid>
			<description>&#60;p&#62;I'm having a strange problem running a Hive query against my DataStax Enterprise 1.0 cluster:&#60;/p&#62;
&#60;p&#62;If I use the automatic Cassandra-to-Hive mapping, then I can't execute a query with a group by:&#60;/p&#62;
&#60;p&#62;&#60;code&#62;select myfield, count(*) from mytable group by myfield;&#60;/code&#62;&#60;/p&#62;
&#60;p&#62;it fails with the following error:&#60;br /&#62;
&#60;pre&#62;&#60;code&#62;FAILED: Hive Internal Error: java.lang.NullPointerException(null)
java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.exec.Utilities.getColumnNamesFromSortCols(Utilities.java:1314)&#60;/code&#62;&#60;/pre&#62;
&#60;p&#62;Just doing a select on myfield works (so the column name is valid); it only fails when the column is used in a group by.&#60;/p&#62;
&#60;p&#62;I tried manually mapping the column family to a Hive table. In that case, I can execute the query above, BUT all my numeric columns (ints and bigints) show up in Hive as null despite showing up correctly in Cassandra and in the automapped table. I did a &#60;code&#62;describe extended mytable&#60;/code&#62; on both the auto-created table and the manually created table and verified that the datatypes match for all the columns. &#60;/p&#62;
&#60;p&#62;Any thoughts would be greatly appreciated.
&#60;/p&#62;</description>
		</item>

	</channel>
</rss>
