DataStax Enterprise 4.0

Using Pig

DataStax Enterprise (DSE) includes a CassandraFS-enabled Apache Pig client. Pig is a higher-level programming environment for MapReduce coding. You can explore big data sets using Pig Latin, a data flow language for programmers. Pig Latin operates on relations, which are similar to tables. A relation is made up of tuples, which correspond to the rows in a table. Unlike a relational database table, a Pig relation does not require every tuple to contain the same number of fields, and fields in the same position (column) need not be of the same type. Using Pig, you can devise logic for data transformations, such as filtering data and grouping relations. The transformations occur during the MapReduce phase.
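The relation model described above can be sketched in plain Python. This is only an analogy, not Pig Latin and not how Pig is implemented: the sample data and the helper names `filter_relation` and `group_relation` are hypothetical, chosen to show tuples of differing lengths and field types being filtered and grouped.

```python
from collections import defaultdict

# A relation modeled as a list of tuples. As in Pig, the tuples need not
# have the same number of fields, and a given position may hold mixed types.
relation = [
    ("alice", 34, "NY"),
    ("bob", 27),
    ("carol", 41, "CA", "extra"),
    ("dave", "unknown"),
]


def filter_relation(rel, predicate):
    """Keep only the tuples for which predicate(tuple) is true."""
    return [t for t in rel if predicate(t)]


def group_relation(rel, key):
    """Collect tuples into groups keyed by key(tuple)."""
    groups = defaultdict(list)
    for t in rel:
        groups[key(t)].append(t)
    return dict(groups)


# Filter: keep tuples whose second field is an integer greater than 30.
# The type check is needed because the second field is not always an int.
over_30 = filter_relation(
    relation,
    lambda t: len(t) > 1 and isinstance(t[1], int) and t[1] > 30,
)

# Group: bucket the tuples by how many fields each one has.
by_field_count = group_relation(relation, key=len)
```

In Pig itself, the corresponding operations are expressed with the Pig Latin `FILTER` and `GROUP` operators, and Pig compiles them into MapReduce jobs rather than running them in a single process.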

On the node that runs Pig, configure the job tracker node as you would for any analytics (Hadoop) node. Use the dsetool commands to manage the job tracker. After configuration, Pig clients automatically select the correct job tracker node on startup. Pig programs are compiled into MapReduce jobs, which Hadoop executes in parallel and in a distributed fashion on a local or remote cluster.

Support for CQL collections

Pig in DataStax Enterprise supports CQL collections. The collections must use Pig-supported types.
