DataStax Enterprise 4.0

Example: Handle a compound primary key

This example, like the previous one, shows you how to work with CQL tables in Pig. The previous example used tables having a simple primary key. The tables in this example use compound primary keys. You create the tables in cqlsh and merge them using Pig.



Procedure

  1. Create a four-column (a, b, c, d) Cassandra table named table1 and another five-column (id, x, y, z, data) table named table2.
    cqlsh:cql3ks> CREATE TABLE table1 (
                    a int,
                    b int,
                    c text,
                    d text,
                    PRIMARY KEY (a,b,c)
                  );
    cqlsh:cql3ks> CREATE TABLE table2 (
                    id int PRIMARY KEY, 
                    x int,
                    y int,
                    z text, 
                    data text
                  );
  2. Insert data into the tables.
    cqlsh:cql3ks> INSERT INTO table1 (a, b , c , d )
                    VALUES ( 1,1,'One','match');
    cqlsh:cql3ks> INSERT INTO table1 (a, b , c , d )
                    VALUES ( 2,2,'Two','match');
    cqlsh:cql3ks> INSERT INTO table1 (a, b , c , d )
                    VALUES ( 3,3,'Three','match');
    cqlsh:cql3ks> INSERT INTO table1 (a, b , c , d )
                    VALUES ( 4,4,'Four','match');
    cqlsh:cql3ks> INSERT INTO table2 (id, x, y, z,data)
                    VALUES (1,5,6,'Fix','nomatch');
    cqlsh:cql3ks> INSERT INTO table2 (id, x, y, z,data)
                    VALUES (2,6,5,'Sive','nomatch');
    cqlsh:cql3ks> INSERT INTO table2 (id, x, y, z,data)
                    VALUES (3,7,7,'Seven','match');
    cqlsh:cql3ks> INSERT INTO table2 (id, x, y, z,data)
                    VALUES (4,8,8,'Eight','match');
    cqlsh:cql3ks> INSERT INTO table2 (id, x, y, z,data)
                    VALUES (5,9,10,'Ninen','nomatch');
  3. Using Pig, add logic to load the data from the Cassandra table2 to a Pig relation.
    grunt> moredata = load 'cql://cql3ks/table2' USING CqlStorage;
  4. Convert the data to a tuple.
    grunt> insertformat = FOREACH moredata GENERATE TOTUPLE
             (TOTUPLE('a',x),TOTUPLE('b',y),
             TOTUPLE('c',z)),TOTUPLE(data);

    During the actual data processing, the data is formatted as follows:

    ((PartitionKey_Name,Value),(ClusteringKey_1_name,Value)...)(ArgValue1,ArgValue2,ArgValue3,...)
  5. Save the Pig relation to the Cassandra table1 table. The data from table 1 and table 2 will be merged.
    grunt> STORE insertformat INTO 'cql://cql3ks/table1?output_query=UPDATE%20cql3ks.table1%20SET%20d%20%3D%20%3F' USING CqlStorage;

    The cql:// URL includes a prepared statement, described later, that needs to be copied/pasted as a continuous string (no spaces or line breaks).

  6. In cqlsh, query table1 to check that the data from table1 and table2 have been merged.
    cqlsh:cql3ks> SELECT * FROM table1;
    
     a | b  | c     | d
    ---+----+-------+---------
     5 |  6 |   Fix | nomatch
     1 |  1 |   One |   match
     8 |  8 | Eight |   match
     2 |  2 |   Two |   match
     4 |  4 |  Four |   match
     7 |  7 | Seven |   match
     6 |  5 |  Sive | nomatch
     9 | 10 | Ninen | nomatch
     3 |  3 | Three |   match
Show/hide