I've had the portfolio demo running before, and I've now set up a 4-node cluster (2 Cassandra, 2 Brisk) inside EC2. Things seem to be working well, but I'm running into problems with simple Hive queries. Here is an example session:
ubuntu@ip-10-88-197-147:/tmp/ubuntu$ brisk hive
Hive history file=/tmp/ubuntu/hive_job_log_ubuntu_201106281921_318487041.txt
hive> drop table Timelines;
OK
Time taken: 0.714 seconds
hive> CREATE EXTERNAL TABLE Timelines
> (row_key string, column_name string, value string)
> STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
> WITH SERDEPROPERTIES ("cassandra.ks.name" = "TRProd");
OK
Time taken: 0.855 seconds
hive> select * from Timelines limit=10;
FAILED: Parse Error: line 1:29 mismatched input '=' expecting Number in limit clause
hive> select * from Timelines limit 10;
OK
s:1116:0 <@�~�5��@Ɩ6� 85546918847070208
s:1116:0 gXuB�7��YKr�O�> 85549591134617601
s:1116:0 �����7��}���a 85550274655174656
s:1116:0 �H��7����JQ�ç 85550611818496000
s:1116:0 6ν�8��:�e�q� 85552055397253120
s:1116:0 Hc���:�����O��A 85555894108176384
s:1116:0 �i�:�������� 85557110934474752
s:1116:0 ���<��l�^��� 85559776351748096
s:1116:0 p4�=���~���] 85561406514135040
s:1116:0 }K�.�>��m^��@ 85563216570228736
Time taken: 4.166 seconds
hive> select * from Timelines limit 10;
OK
Failed with exception java.io.IOException:java.lang.NullPointerException
Time taken: 0.19 seconds
What's amusing is that the second, identical query fails with a NullPointerException (hrm). Also, any query that causes a MapReduce job to be created fails. Here is a backtrace I see on the JobTracker page:
java.util.NoSuchElementException
at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1115)
at java.util.TreeMap$EntryIterator.next(TreeMap.java:1153)
at java.util.TreeMap$EntryIterator.next(TreeMap.java:1148)
at org.apache.hadoop.hive.cassandra.input.HiveCassandraStandardColumnInputFormat$2.next(HiveCassandraStandardColumnInputFormat.java:168)
at org.apache.hadoop.hive.cassandra.input.HiveCassandraStandardColumnInputFormat$2.next(HiveCassandraStandardColumnInputFormat.java:111)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:66)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:32)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:67)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:236)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:216)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:253)
Also, I'm curious how best to represent Cassandra UUIDs when doing Hive jobs. That's a vague question, but yeah, just wondering. I've tried making it a bigint and it just comes back as NULL.
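
To make the UUID question a bit more concrete, here is a minimal sketch in plain Python (not Hive or Brisk code) of what I would like to end up with: the 16 raw bytes rendered as the usual text form. The sample value and the helper name `uuid_bytes_to_text` are made up for illustration, and I'm only assuming that the binary-looking column in the output above is a 16-byte UUID. It also shows that a UUID is 128 bits wide, so it cannot fit in a 64-bit bigint. Whether that is actually why I get NULL is a guess on my part.

```python
import uuid

# Largest value a signed 64-bit integer (Hive bigint) can hold.
BIGINT_MAX = 2**63 - 1


def uuid_bytes_to_text(raw: bytes) -> str:
    """Render 16 raw UUID bytes as canonical hyphenated text."""
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    return str(uuid.UUID(bytes=raw))


# Hypothetical sample value, not taken from my cluster.
sample = uuid.UUID("c4a3f1e0-a1bd-11e0-8000-000000000000")
raw = sample.bytes  # the 16 bytes as they would be stored

text = uuid_bytes_to_text(raw)

# The full 128-bit value is far outside the bigint range.
fits_in_bigint = sample.int <= BIGINT_MAX

print(text)
print("fits in bigint:", fits_in_bigint)
```

If something like this is the right idea, then declaring the column as a string and getting the hyphenated form back would be ideal, but I don't know whether the storage handler can do that.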
