I have been trying to run a query and it is exploding in fantastic fashion. Below is my test case:
Hive history file=/tmp/vachon/hive_job_log_vachon_201110211052_1405555749.txt
hive> CREATE EXTERNAL TABLE TESTTABLE
> (row_key string, column_name string, value string)
> STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
> WITH SERDEPROPERTIES ("cassandra.ks.name" = "Greyhound");
OK
Time taken: 1.458 seconds
>> OK, so Hive works, and it seems it can write.
hive> SELECT * FROM greyhound.PlayerEvents limit 10;
OK
[10 Records worth of output]
Time taken: 2.365 seconds
>> OK, so Hive can read as well.
Now that I know I can read and write, I want to run:
SELECT count(*) FROM greyhound.PlayerEvents;
I get this output in the map task's syslog:
n/jobcache/job_201110210214_0001/jars/, file:/tmp/hadoop-cassandra/mapred/local/taskTracker/vachon/jobcache/job_201110210214_0001/attempt_201110210214_0001_m_000000_0/work/]
2011-10-21 02:15:29,463 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Adding alias greyhound.playerevents to work list for file cfs://null/user/hive/warehouse/greyhound.db/playerevents
2011-10-21 02:15:29,465 INFO org.apache.hadoop.hive.ql.exec.MapOperator: dump TS struct<player_id:string,application_event_id:string,id:string,event_data:string>
2011-10-21 02:15:29,466 INFO ExecMapper:
<MAP>Id =7
<Children>
<TS>Id =0
<Children>
<SEL>Id =1
<Children>
<GBY>Id =2
<Children>
<RS>Id =3
<Parent>Id = 2 null<\Parent>
<\RS>
<\Children>
<Parent>Id = 1 null<\Parent>
<\GBY>
<\Children>
<Parent>Id = 0 null<\Parent>
<\SEL>
<\Children>
<Parent>Id = 7 null<\Parent>
<\TS>
<\Children>
<\MAP>
2011-10-21 02:15:29,466 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Initializing Self 7 MAP
2011-10-21 02:15:29,466 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: Initializing Self 0 TS
2011-10-21 02:15:29,466 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: Operator 0 TS initialized
2011-10-21 02:15:29,466 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: Initializing children of 0 TS
2011-10-21 02:15:29,466 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing child 1 SEL
2011-10-21 02:15:29,466 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing Self 1 SEL
2011-10-21 02:15:29,466 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: SELECT struct<player_id:string,application_event_id:string,id:string,event_data:string>
2011-10-21 02:15:29,469 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Operator 1 SEL initialized
2011-10-21 02:15:29,469 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing children of 1 SEL
2011-10-21 02:15:29,469 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: Initializing child 2 GBY
2011-10-21 02:15:29,469 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: Initializing Self 2 GBY
2011-10-21 02:15:29,484 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: Operator 2 GBY initialized
2011-10-21 02:15:29,484 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: Initializing children of 2 GBY
2011-10-21 02:15:29,484 INFO org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: Initializing child 3 RS
2011-10-21 02:15:29,484 INFO org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: Initializing Self 3 RS
2011-10-21 02:15:29,488 INFO org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: Using tag = -1
2011-10-21 02:15:29,499 INFO org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: Operator 3 RS initialized
2011-10-21 02:15:29,499 INFO org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: Initialization Done 3 RS
2011-10-21 02:15:29,499 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: Initialization Done 2 GBY
2011-10-21 02:15:29,499 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initialization Done 1 SEL
2011-10-21 02:15:29,499 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: Initialization Done 0 TS
2011-10-21 02:15:29,499 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Initialization Done 7 MAP
2011-10-21 02:15:29,532 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Processing path cfs://null/user/hive/warehouse/greyhound.db/playerevents
2011-10-21 02:15:29,532 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias greyhound.playerevents for file cfs://null/user/hive/warehouse/greyhound.db/playerevents
2011-10-21 02:15:29,532 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 forwarding 1 rows
2011-10-21 02:15:29,533 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 1 rows
2011-10-21 02:15:29,533 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 forwarding 1 rows
2011-10-21 02:15:29,533 INFO ExecMapper: ExecMapper: processing 1 rows: used memory = 129923416
2011-10-21 02:15:29,536 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 forwarding 10 rows
2011-10-21 02:15:29,536 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 10 rows
2011-10-21 02:15:29,536 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 forwarding 10 rows
2011-10-21 02:15:29,536 INFO ExecMapper: ExecMapper: processing 10 rows: used memory = 129923416
2011-10-21 02:15:29,559 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 forwarding 100 rows
2011-10-21 02:15:29,559 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarding 100 rows
2011-10-21 02:15:29,559 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 forwarding 100 rows
2011-10-21 02:15:29,559 INFO ExecMapper: ExecMapper: processing 100 rows: used memory = 130469504
2011-10-21 02:15:29,691 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 finished. closing...
2011-10-21 02:15:29,691 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 forwarded 415 rows
2011-10-21 02:15:29,692 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
2011-10-21 02:15:29,692 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 finished. closing...
2011-10-21 02:15:29,692 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 forwarded 415 rows
2011-10-21 02:15:29,692 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 finished. closing...
2011-10-21 02:15:29,692 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 forwarded 415 rows
2011-10-21 02:15:29,692 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 2 finished. closing...
2011-10-21 02:15:29,692 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 2 forwarded 0 rows
2011-10-21 02:15:29,692 WARN org.apache.hadoop.hive.ql.exec.GroupByOperator: Begin Hash Table flush at close: size = 1
2011-10-21 02:15:29,692 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 2 forwarding 1 rows
2011-10-21 02:15:29,693 INFO org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 3 finished. closing...
2011-10-21 02:15:29,693 INFO org.apache.hadoop.hive.ql.exec.ReduceSinkOperator: 3 forwarded 0 rows
2011-10-21 02:15:29,693 INFO org.apache.hadoop.hive.ql.exec.GroupByOperator: 2 Close done
2011-10-21 02:15:29,693 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 Close done
2011-10-21 02:15:29,693 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 0 Close done
2011-10-21 02:15:29,693 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 7 Close done
2011-10-21 02:15:29,694 INFO ExecMapper: ExecMapper: processed 415 rows: used memory = 108402288
2011-10-21 02:15:29,700 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2011-10-21 02:15:29,715 WARN org.apache.hadoop.mapred.Child: Error running child
java.lang.NullPointerException
at org.apache.hadoop.hive.cassandra.input.HiveCassandraStandardColumnInputFormat$2.next(HiveCassandraStandardColumnInputFormat.java:173)
at org.apache.hadoop.hive.cassandra.input.HiveCassandraStandardColumnInputFormat$2.next(HiveCassandraStandardColumnInputFormat.java:111)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:66)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:32)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:67)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:236)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:216)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:253)
2011-10-21 02:15:29,721 INFO org.apache.hadoop.mapred.Task: Runnning cleanup for the task
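In the log above, every operator reports closing and ExecMapper reports 415 processed rows before the NullPointerException is logged. A simplified model of that ordering follows. It assumes the old-API MapRunner loop calls next() repeatedly and closes the mapper in a finally block, which matches the timestamps: the call to next() after the 415th row throws, the finally block closes the mapper (flushing the single partial-count row, like the GBY "flush at close" line), and only then does the exception reach Child. SimpleRecordReader, CountingMapper and MapRunnerSketch are hypothetical names, not Hadoop or Hive classes.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins: NOT Hadoop's or Hive's real classes.
// They model only the control flow of an old-API map task.
interface SimpleRecordReader {
    // Returns the next row, or null when the input is exhausted.
    String next() throws IOException;
}

class CountingMapper {
    long processed = 0;
    boolean closed = false;
    final List<Long> emitted = new ArrayList<>();

    void map(String row) {
        processed++; // map-side partial count; row content is irrelevant for count(*)
    }

    void close() {
        closed = true;
        emitted.add(processed); // the partial count is flushed only at close
    }
}

class MapRunnerSketch {
    // Loop over next() and close the mapper in finally (assumed MapRunner shape).
    static void run(SimpleRecordReader input, CountingMapper mapper) throws IOException {
        try {
            String row;
            while ((row = input.next()) != null) {
                mapper.map(row);
            }
        } finally {
            mapper.close(); // runs even when next() throws
        }
    }

    // A reader that serves goodRows rows and then fails inside next(),
    // mimicking the NullPointerException after 415 rows in the log.
    static SimpleRecordReader failingAfter(int goodRows) {
        return new SimpleRecordReader() {
            private int served = 0;

            @Override
            public String next() {
                if (served < goodRows) {
                    return "row-" + served++;
                }
                throw new NullPointerException("simulated failure in next()");
            }
        };
    }

    public static void main(String[] args) throws IOException {
        CountingMapper mapper = new CountingMapper();
        try {
            run(failingAfter(415), mapper);
        } catch (NullPointerException e) {
            System.out.println("task failed: " + e.getMessage());
        }
        System.out.println("processed=" + mapper.processed + " closed=" + mapper.closed);
    }
}
```

Running main prints the simulated failure message followed by processed=415 closed=true: the close-time flush happens, but the task still fails, so the partial count never reaches the reducer.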
>> OK, so that explodes. When I try it on another Brisk cluster, I get:
java.lang.UnsupportedOperationException: This operation is not supported for Super Columns.
at org.apache.cassandra.db.SuperColumn.value(SuperColumn.java:174)
at org.apache.hadoop.hive.cassandra.input.HiveCassandraStandardColumnInputFormat$2.next(HiveCassandraStandardColumnInputFormat.java:242)
at org.apache.hadoop.hive.cassandra.input.HiveCassandraStandardColumnInputFormat$2.next(HiveCassandraStandardColumnInputFormat.java:111)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:66)
at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:32)
at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:67)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:236)
at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:216)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
at org.apache.hadoop.mapred.Child$4.run(Child.java:259)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1059)
at org.apache.hadoop.mapred.Child.main(Child.java:253)
java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
Caused by: java.io.IOException: Task process exit with nonzero status of 65.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
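The second trace fails inside SuperColumn.value(), called from HiveCassandraStandardColumnInputFormat. A minimal sketch of that failure mode, assuming the column family being scanned contains super columns and that a standard-column reader flattens each row into (row_key, column_name, value) triples by calling value() on every column. Col, StandardCol, SuperCol and TripleReader are hypothetical stand-ins, not Cassandra or Brisk classes, and the row data is made up; only the exception message is copied from the trace.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-ins; NOT the real org.apache.cassandra.db API.
interface Col {
    String name();
    String value();
}

final class StandardCol implements Col {
    private final String name;
    private final String value;

    StandardCol(String name, String value) {
        this.name = name;
        this.value = value;
    }

    public String name() { return name; }
    public String value() { return value; }
}

final class SuperCol implements Col {
    private final String name;
    private final List<Col> subColumns;

    SuperCol(String name, List<Col> subColumns) {
        this.name = name;
        this.subColumns = subColumns;
    }

    public String name() { return name; }

    List<Col> subColumns() { return subColumns; }

    // A super column only groups sub-columns; it has no single value of its own.
    public String value() {
        throw new UnsupportedOperationException(
            "This operation is not supported for Super Columns.");
    }
}

final class TripleReader {
    // Flattens one row into (row_key, column_name, value) triples by calling
    // value() on every column, the way a standard-column reader is assumed to.
    static List<String[]> flattenStandard(String rowKey, List<Col> columns) {
        List<String[]> out = new ArrayList<>();
        for (Col c : columns) {
            out.add(new String[] {rowKey, c.name(), c.value()}); // throws for SuperCol
        }
        return out;
    }

    public static void main(String[] args) {
        List<Col> standardRow = Arrays.<Col>asList(new StandardCol("event_data", "login"));
        System.out.println(flattenStandard("player1", standardRow).size());

        List<Col> superRow = Arrays.<Col>asList(
            new SuperCol("event1", Arrays.<Col>asList(new StandardCol("event_data", "login"))));
        try {
            flattenStandard("player1", superRow);
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Running main prints 1 for the standard row, then the same "not supported for Super Columns" message for the super row: the flat three-column view has nowhere to put a column whose value is itself a set of sub-columns.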
>> So my overall question is: what in particular isn't supported, and why are the two error messages different?
Also, when I run the SELECT * ... LIMIT 10 query on our test environment, I get "Failed with exception java.io.IOException:java.lang.UnsupportedOperationException: This operation is not supported for Super Columns." again, but the same query works on our production cluster.
