DataStax Enterprise 4.5

Using Pig

DataStax Enterprise (DSE) includes an Apache Pig client that is enabled for the Cassandra File System (CFS). Pig is a high-level programming environment for MapReduce coding. You can explore big data sets using Pig Latin, a data flow language for programmers. Pig Latin operates on relations, which are similar to tables and are constructed of tuples, which correspond to the rows in a table. Unlike a relational database table, a Pig relation does not require every tuple to contain the same number of fields, and fields in the same position (column) need not be of the same type. Using Pig, you can devise logic for data transformations, such as filtering data and grouping relations. The transformations occur during the MapReduce phase.
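The relation model described above can be illustrated outside Pig. The following is a minimal Python sketch, not Pig Latin and not part of DataStax Enterprise: it models a relation as a list of tuples that differ in length and field type, then applies a filter and a grouping in the spirit of Pig's transformations. The sample data and the helper names filter_relation and group_relation are hypothetical.

```python
from collections import defaultdict

# A relation modeled as a list of tuples. As in Pig, tuples may differ in
# the number of fields, and a field position may hold different types.
relation = [
    ("alice", 34, "nyc"),
    ("bob", "unknown"),         # fewer fields; a string where others hold an int
    ("carol", 29, "sf", True),  # more fields
]

def filter_relation(rel, predicate):
    """Keep only the tuples for which predicate(tuple) is true."""
    return [t for t in rel if predicate(t)]

def group_relation(rel, key):
    """Collect tuples into a dict keyed by key(tuple)."""
    groups = defaultdict(list)
    for t in rel:
        groups[key(t)].append(t)
    return dict(groups)

# Filtering must tolerate missing or differently typed fields.
thirty_plus = filter_relation(
    relation,
    lambda t: len(t) > 1 and isinstance(t[1], int) and t[1] >= 30,
)

# Group the tuples by how many fields each one carries.
by_field_count = group_relation(relation, len)

print(thirty_plus)
print(by_field_count)
```

Because no schema is enforced, the predicate checks the field count and type before comparing values. Logic written against loosely structured relations generally needs the same kind of guard.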

Configure the job tracker node for the node running Pig as you would for any analytics (Hadoop) node. Use the dsetool commands to manage the job tracker. After configuration, Pig clients automatically select the correct job tracker node on startup. Pig programs are compiled into MapReduce jobs, executed in parallel by Hadoop, and run in a distributed fashion on a local or remote cluster.

Support for TTL

In DataStax Enterprise 4.5, you can set the TTL (time to live) on Pig data. To do so, use a cql:// URL that includes a prepared statement, as shown in step 10 of the library demo.
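As a rough illustration of how a prepared statement can be carried inside a cql:// URL, the Python sketch below percent-encodes a CQL UPDATE statement that sets a TTL and appends it as a query parameter. It rests on several assumptions: the keyspace and table names (demo_ks, demo_table), the 86400-second TTL, and the helper cql_store_url are hypothetical, and the output_query parameter name is assumed from the Cassandra CQL storage handler for Pig. Treat step 10 of the library demo as the authoritative form.

```python
from urllib.parse import quote

def cql_store_url(keyspace, table, prepared_statement):
    """Build a cql:// URL carrying a percent-encoded prepared statement.

    Assumption: the statement is passed in an 'output_query' parameter.
    """
    # safe="" forces encoding of every reserved character, so spaces,
    # '=' and '?' in the statement cannot be mistaken for URL syntax.
    encoded = quote(prepared_statement, safe="")
    return f"cql://{keyspace}/{table}?output_query={encoded}"

# Hypothetical statement: written values expire after 86400 seconds (one day).
statement = "UPDATE demo_ks.demo_table USING TTL 86400 SET title = ?"

url = cql_store_url("demo_ks", "demo_table", statement)
print(url)
```

Encoding the statement keeps the bind marker (?) and the equals sign from being read as part of the URL's own query syntax.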

Support for CQL collections

Pig in DataStax Enterprise supports CQL collections. Collections must use Pig-supported types.
