DataStax Enterprise 4.0

CQL data access

In DataStax Enterprise 4.0.4, to access data in CQL tables, use the CqlNativeStorage handler with the new input_cql statement or use the output_query statement that was available in earlier releases.

In DataStax Enterprise 4.0-4.0.3, to access data in CQL tables, use the CqlStorage() handler. To access data in the CassandraFS, the target keyspace and table must already exist. Data in a Pig relation can be stored in a Cassandra table, but Pig will not create the table.

The Pig LOAD function pulls Cassandra data into a Pig relation through the storage handler as shown in these examples:
  • DataStax Enterprise 4.0.4
    <pig_relation_name> = LOAD 'cql://<keyspace>/<table>' 
        USING CqlNativeStorage(); -- DataStax Enterprise 4.0.4
  • DataStax Enterprise 4.0 - 4.0.3
    <pig_relation_name> = LOAD 'cql://<keyspace>/<table>' 
        USING CqlStorage(); -- DataStax Enterprise 4.0 - 4.0.3
DataStax Enterprise supports these Pig data types:
  • int
  • long
  • float
  • double
  • boolean
  • chararray
The Pig demo examples show how to use the LOAD command.

LOAD schema

The LOAD schema is:

(colname:colvalue, colname:colvalue, … )

where each colvalue is referenced by the Cassandra column name.

Accessing data using input_cql and CqlNativeStorage handler

The input_cql statement contains the following components:
  • A SELECT statement that includes the partition key columns
  • A WHERE clause that specifies the token range of the partition key columns, in an order consistent with the cluster, in the following format:
    WHERE token(partitionkey) > ? and token(partitionkey) <= ?
  • The value of the native_port

For example, the input_cql statement before encoding might look like this:

'SELECT * FROM <keyspace>.<table> where token(key) > ? and token(key) <= ?' USING CqlNativeStorage();
Append the encoded statement as an argument to the Pig LOAD command using the ?input_cql= syntax.
x = LOAD 'cql://ks/tab?input_cql=SELECT%20*' USING CqlNativeStorage();
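One possible way to produce the encoded statement is sketched here in Python, which is not part of the DataStax tooling. The helper names encode_cql and load_location are hypothetical, the statement reuses the ks, tab, and key names from the examples in this section, and leaving *, ( and ) unencoded is an assumption made to match the SELECT%20* style shown above.

```python
from urllib.parse import quote


def encode_cql(statement: str) -> str:
    """Percent-encode a CQL statement for use in a cql:// location.

    Assumption: '*', '(' and ')' are left as-is to mirror the documented
    'SELECT%20*' example; every other reserved character is escaped.
    """
    return quote(statement, safe="*()")


def load_location(keyspace: str, table: str, input_cql: str) -> str:
    """Build the location string passed to the Pig LOAD command (hypothetical helper)."""
    return f"cql://{keyspace}/{table}?input_cql={encode_cql(input_cql)}"


statement = "SELECT * FROM ks.tab WHERE token(key) > ? AND token(key) <= ?"
location = load_location("ks", "tab", statement)

# Print a complete Pig statement that can be pasted into a script.
print(f"x = LOAD '{location}' USING CqlNativeStorage();")
```

The printed line is a complete LOAD statement in which spaces appear as %20 and the >, <, = and ? characters appear as %3E, %3C, %3D and %3F.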
Use an ampersand to append additional parameters. For example, to change the port used by the Java Driver, append the native_port parameter followed by the port number.

The entire migrated Pig command would look like this:

x = LOAD 'cql://ks/tab?input_cql=SELECT%20*&native_port=9042' USING CqlNativeStorage();

Optional input_cql parameters

You can use the following parameters with input_cql in DataStax Enterprise 4.0.4 and later, as shown in the example in the previous section. An ampersand must precede each parameter.
  • &native_port=<native_port>
  • &core_conns=<core_conns>
  • &max_conns=<max_conns>
  • &min_simult_reqs=<min_simult_reqs>
  • &max_simult_reqs=<max_simult_reqs>
  • &native_timeout=<native_timeout>
  • &native_read_timeout=<native_read_timeout>
  • &rec_buff_size=<rec_buff_size>
  • &send_buff_size=<send_buff_size>
  • &solinger=<solinger>
  • &tcp_nodelay=<tcp_nodelay>
  • &reuse_address=<reuse_address>
  • &keep_alive=<keep_alive>
  • &auth_provider=<auth_provider>
  • &trust_store_path=<trust_store_path>
  • &key_store_path=<key_store_path>
  • &trust_store_password=<trust_store_password>
  • &key_store_password=<key_store_password>
  • &cipher_suites=<cipher_suites>
  • &input_cql=<input_cql>
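
The ampersand rule can be illustrated with a small Python sketch that joins optional parameters onto the location string. The function build_location and the ALLOWED_PARAMS set are hypothetical; the parameter names are copied from the list above, and rejecting unknown names is a design choice of this sketch rather than documented handler behavior.

```python
from urllib.parse import quote

# Optional parameter names taken from the list above (input_cql is passed separately).
ALLOWED_PARAMS = {
    "native_port", "core_conns", "max_conns", "min_simult_reqs",
    "max_simult_reqs", "native_timeout", "native_read_timeout",
    "rec_buff_size", "send_buff_size", "solinger", "tcp_nodelay",
    "reuse_address", "keep_alive", "auth_provider", "trust_store_path",
    "key_store_path", "trust_store_password", "key_store_password",
    "cipher_suites",
}


def build_location(keyspace: str, table: str, input_cql: str, **params) -> str:
    """Return a cql:// location with input_cql and '&'-separated parameters."""
    unknown = set(params) - ALLOWED_PARAMS
    if unknown:
        raise ValueError(f"unsupported parameter(s): {sorted(unknown)}")
    # Assumption: '*', '(' and ')' stay unencoded, matching 'SELECT%20*' above.
    parts = [f"input_cql={quote(input_cql, safe='*()')}"]
    # Each additional parameter is URL-encoded and joined with an ampersand.
    parts += [f"{name}={quote(str(value), safe='')}" for name, value in params.items()]
    return f"cql://{keyspace}/{table}?" + "&".join(parts)


print(build_location("ks", "tab", "SELECT *", native_port=9042))
# prints: cql://ks/tab?input_cql=SELECT%20*&native_port=9042
```

Validating names up front surfaces a misspelled parameter before the Pig job is submitted.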

Handling special characters in the CQL

If the input_cql or output_query statement passed to a Pig function contains special characters, URL-encode the prepared statement so that Pig can read the special characters.
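
To see which characters in a statement count as special, a quick check along the following lines can help. This is only a sketch in Python: the helper chars_needing_encoding is hypothetical, the UPDATE statement and its column b are made-up illustrations of an output_query, and encoding every reserved character (safe="") is an assumption chosen as the most conservative option.

```python
from urllib.parse import quote, unquote


def chars_needing_encoding(statement: str) -> list:
    """List the distinct characters that percent-encoding would rewrite."""
    return sorted({c for c in statement if quote(c, safe="") != c})


# Hypothetical prepared statement for output_query; ks, tab and b are examples.
statement = "UPDATE ks.tab SET b = ?"
encoded = quote(statement, safe="")

print(chars_needing_encoding(statement))  # [' ', '=', '?']
print(encoded)                            # UPDATE%20ks.tab%20SET%20b%20%3D%20%3F

# Decoding must give back the original statement unchanged.
assert unquote(encoded) == statement
```

The round-trip assertion is a cheap safeguard against double encoding or a statement that was truncated before encoding.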