All the statements passes except statement:"INSERT OVERWRITE TABLE portfolio_returns". Below is the trail:
hive> DROP TABLE IF EXISTS Portfolios;
OK
Time taken: 1.105 seconds
hive> create external table Portfolios(row_key string, column_name string, value string)
> STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
> WITH SERDEPROPERTIES ("cassandra.columns.mapping" = ":key,:column,:value",
> "cassandra.ks.name" = "PortfolioDemo",
> "cassandra.ks.repfactor" = "1",
> "cassandra.ks.strategy" = "org.apache.cassandra.locator.SimpleStrategy",
> "cassandra.cf.name" = "Portfolios" ,
> "cassandra.host" = "127.0.0.1" ,
> "cassandra.port" = "9160",
> "cassandra.partitioner" = "org.apache.cassandra.dht.RandomPartitioner")
> TBLPROPERTIES (
> "cassandra.input.split.size" = "64000",
> "cassandra.range.size" = "1000",
> "cassandra.slice.predicate.size" = "1000");
OK
Time taken: 0.302 seconds
hive> DROP TABLE IF EXISTS StockHist;
OK
Time taken: 0.0060 seconds
hive> create external table StockHist(row_key string, column_name string, value double)
> STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
> WITH SERDEPROPERTIES ("cassandra.ks.name" = "PortfolioDemo",
> "cassandra.cf.validatorType" = "UTF8Type,UTF8Type,DoubleType"
> );
OK
Time taken: 0.018 seconds
hive> --first calculate returns
> DROP TABLE IF EXISTS 10dayreturns;
OK
Time taken: 0.0060 seconds
hive> CREATE TABLE 10dayreturns(ticker string, rdate string, return double)
> STORED AS SEQUENCEFILE;
OK
Time taken: 0.011 seconds
hive> INSERT OVERWRITE TABLE 10dayreturns
> select a.row_key ticker, b.column_name rdate, (b.value - a.value) ret
> from StockHist a JOIN StockHist b on
> (a.row_key = b.row_key AND date_add(a.column_name,10) = b.column_name);
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201203021013_0001, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201203021013_0001
Kill Command = /Users/srandhawa/datastax/dse-1.0.2/bin/dse hadoop job -Dmapred.job.tracker=127.0.0.1:8012 -kill job_201203021013_0001
2012-03-02 10:25:24,752 Stage-1 map = 0%, reduce = 0%
2012-03-02 10:25:55,284 Stage-1 map = 100%, reduce = 0%
2012-03-02 10:25:56,522 Stage-1 map = 0%, reduce = 0%
2012-03-02 10:26:36,284 Stage-1 map = 0%, reduce = 100%
Ended Job = job_201203021013_0001
Loading data to table default.10dayreturns
Deleted cfs://null/user/hive/warehouse/10dayreturns
Table default.10dayreturns stats: [num_partitions: 0, num_files: 1, num_rows: 0, total_size: 4158622]
90000 Rows loaded to 10dayreturns
OK
Time taken: 93.094 seconds
hive> INSERT OVERWRITE TABLE portfolio_returns
> select row_key portfolio, rdate, SUM(b.return)
> from Portfolios a JOIN 10dayreturns b ON
> (a.column_name = b.ticker)
> group by row_key, rdate;
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201203021013_0002, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201203021013_0002
Kill Command = /Users/srandhawa/datastax/dse-1.0.2/bin/dse hadoop job -Dmapred.job.tracker=127.0.0.1:8012 -kill job_201203021013_0002
2012-03-02 10:27:57,851 Stage-1 map = 0%, reduce = 0%
2012-03-02 10:28:04,941 Stage-1 map = 100%, reduce = 0%
2012-03-02 10:28:06,191 Stage-1 map = 0%, reduce = 0%
2012-03-02 10:28:14,677 Stage-1 map = 100%, reduce = 0%
2012-03-02 10:28:22,915 Stage-1 map = 100%, reduce = 17%
2012-03-02 10:28:29,138 Stage-1 map = 100%, reduce = 67%
2012-03-02 10:28:32,288 Stage-1 map = 100%, reduce = 68%
2012-03-02 10:28:38,316 Stage-1 map = 100%, reduce = 73%
2012-03-02 10:28:41,468 Stage-1 map = 100%, reduce = 80%
2012-03-02 10:28:46,135 Stage-1 map = 100%, reduce = 90%
2012-03-02 10:28:47,140 Stage-1 map = 100%, reduce = 95%
2012-03-02 10:28:48,146 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201203021013_0002
Launching Job 2 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201203021013_0003, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201203021013_0003
Kill Command = /Users/srandhawa/datastax/dse-1.0.2/bin/dse hadoop job -Dmapred.job.tracker=127.0.0.1:8012 -kill job_201203021013_0003
2012-03-02 10:28:57,248 Stage-2 map = 0%, reduce = 0%
2012-03-02 10:29:17,341 Stage-2 map = 100%, reduce = 100%
Ended Job = job_201203021013_0003 with errors
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
hive> DROP TABLE IF EXISTS HistLoss;
OK
Time taken: 0.0060 seconds
hive> create external table HistLoss(row_key string, worst_date string, loss string)
> STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
> WITH SERDEPROPERTIES ("cassandra.ks.name" = "PortfolioDemo");
OK
Time taken: 0.08 seconds
hive> INSERT OVERWRITE TABLE HistLoss
> select a.portfolio, rdate, cast(minp as STRING)
> FROM (
> select portfolio, MIN(preturn) as minp
> FROM portfolio_returns
> group by portfolio
> ) a JOIN portfolio_returns b ON (a.portfolio = b.portfolio and a.minp = b.preturn);
Total MapReduce jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201203021013_0004, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201203021013_0004
Kill Command = /Users/srandhawa/datastax/dse-1.0.2/bin/dse hadoop job -Dmapred.job.tracker=127.0.0.1:8012 -kill job_201203021013_0004
2012-03-02 10:34:50,648 Stage-1 map = 0%, reduce = 0%
2012-03-02 10:34:51,655 Stage-1 map = 100%, reduce = 0%
2012-03-02 10:34:59,686 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201203021013_0004
Launching Job 2 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapred.reduce.tasks=<number>
Starting Job = job_201203021013_0005, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201203021013_0005
Kill Command = /Users/srandhawa/datastax/dse-1.0.2/bin/dse hadoop job -Dmapred.job.tracker=127.0.0.1:8012 -kill job_201203021013_0005
2012-03-02 10:35:05,026 Stage-0 map = 0%, reduce = 0%
2012-03-02 10:35:07,039 Stage-0 map = 50%, reduce = 0%
2012-03-02 10:35:12,057 Stage-0 map = 100%, reduce = 0%
2012-03-02 10:35:20,090 Stage-0 map = 100%, reduce = 17%
2012-03-02 10:35:21,111 Stage-0 map = 100%, reduce = 100%
Ended Job = job_201203021013_0005
OK
Time taken: 38.27 seconds