Getting started with the DataStax C/C++ driver

By Michael Penick - December 30, 2014

The DataStax C/C++ driver is one of the newest members of the DataStax driver family, and it recently had its first release candidate. Up to now, our work has focused on reaching feature parity with the other drivers and on finalizing the API, so introductory documentation has been sparse. The goal of this post is to provide some of that introductory documentation. More in-depth documentation can be found in the fully documented header file as well as in the examples provided with the driver. In addition to this post, we are working to include more documentation as part of the final 1.0 release.

This post will not cover building the driver or setting up a Cassandra cluster. If you haven't built the driver before, the instructions for doing so can be found in the README. We have near-term plans for making this process easier and providing binary releases for major platforms. Documentation for setting up a Cassandra cluster can be found on datastax.com or planetcassandra.com. Let's get started using the driver!

Configuring the driver

The cluster object

The first step to using the driver is to create a CassCluster object that describes your Cassandra cluster's configuration. The default cluster object is good for most clusters; only a list of contact points needs to be configured. The list doesn't need to contain every host in your cluster. A small subset is enough because the rest of the cluster is discovered automatically through the control connection. It's a good idea to change the order of your contact points for each of your client hosts to prevent a single Cassandra host from becoming the control connection on every client machine in your cluster. The plan is to do this automatically in a future release. The control connection also monitors changes in your cluster's topology (automatically handling node outages, the addition of new nodes, and the removal of old nodes) and tracks schema changes.

CassCluster* cluster = cass_cluster_new();

/* Contact points can be added as a comma-delimited list */
cass_cluster_set_contact_points(cluster, "127.0.0.1,127.0.0.2");

/* Or individually */
cass_cluster_set_contact_points(cluster, "127.0.0.3");
cass_cluster_set_contact_points(cluster, "127.0.0.4");

/* DNS can also be used */
cass_cluster_set_contact_points(cluster, "node1.datastax.com,node2.datastax.com");
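Until the driver varies the contact point order on its own, each client host could rotate the list before handing it to the driver. The following is a minimal sketch in plain C, not part of the driver: rotate_contact_points() is a hypothetical helper, and how each client picks its offset (for example, an index assigned at deployment) is an assumption.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Rotate a comma-delimited contact point list so that it starts at the
 * entry with index (offset % count). Returns a newly allocated string that
 * the caller must free(), or NULL if allocation fails.
 */
char* rotate_contact_points(const char* list, size_t offset) {
  assert(list != NULL);

  size_t len = strlen(list);
  char* out = malloc(len + 1);
  if (out == NULL) return NULL;

  /* Count the entries: one more than the number of commas */
  size_t count = 1;
  for (size_t i = 0; i < len; ++i) {
    if (list[i] == ',') ++count;
  }

  /* Find where the entry chosen to go first begins */
  size_t skip = offset % count;
  const char* start = list;
  while (skip > 0) {
    start = strchr(start, ',') + 1;
    --skip;
  }

  if (start == list) {
    /* Nothing to rotate, return a plain copy */
    memcpy(out, list, len + 1);
    return out;
  }

  /* Copy the tail first, then a comma, then the head without its trailing comma */
  size_t tail_len = len - (size_t)(start - list);
  size_t head_len = (size_t)(start - list) - 1;
  memcpy(out, start, tail_len);
  out[tail_len] = ',';
  memcpy(out + tail_len + 1, list, head_len);
  out[len] = '\0';
  return out;
}
```

The rotated string can then be passed to cass_cluster_set_contact_points() in place of the original list and freed afterwards.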

Other cluster settings

The cluster object can also be used to configure SSL, set authentication credentials, and tune driver performance. The full list and explanation of all the driver's cluster object settings can be found in the driver's header file.

Connecting a session and executing queries

The session object

The session object is used to execute queries. Internally, it also manages a pool of client connections to Cassandra and uses a load balancing policy to distribute requests across those connections. It's recommended that your application create only a single session object per keyspace, because a session object is designed to be created once, reused, and shared by multiple application threads. The throughput of a session can be scaled by increasing the number of I/O threads. An I/O thread is used to handle reading and writing query request data to and from Cassandra. The number of I/O threads defaults to one per CPU core, but it can be configured using cass_cluster_set_num_threads_io(). It's generally better to create a single session with more I/O threads than multiple sessions with a smaller number of I/O threads. More DataStax driver best practices can be found in this post.

Connecting a session

The C/C++ driver's API is designed so that no operation will force your application to block. Operations that would normally cause your application to block, such as connecting to a cluster or running a query, instead return a CassFuture object that can be waited on, polled or used to register a callback. The API can also be used synchronously by immediately attempting to get the result from a future. To demonstrate the use of CassFuture let's create and connect a CassSession using the cluster object we created earlier.

CassSession* session = cass_session_new();

CassFuture* connect_future = cass_session_connect(session, cluster);

/* This operation will block until the result is ready */
CassError rc = cass_future_error_code(connect_future);

printf("Connect result: %s\n", cass_error_desc(rc));

cass_future_free(connect_future);
cass_session_free(session);

In that example the future is waited on synchronously. It's also possible to receive notification of the connection asynchronously through a callback.

void on_connect(CassFuture* future, void* data) {
  /* This operation will now return immediately */
  CassError rc = cass_future_error_code(future);
  printf("%s\n", cass_error_desc(rc));
}

CassSession* session = cass_session_new();

CassFuture* connect_future = cass_session_connect(session, cluster);

/* Set a callback instead of waiting for the result to be returned */
cass_future_set_callback(connect_future, on_connect, NULL);

/* The application's reference to the future can be freed immediately */
cass_future_free(connect_future);

/* Run other application logic */

cass_session_free(session);

It should be noted that the driver may run the callback on a thread that's different from the application's calling thread. Any data accessed in the callback must be immutable or synchronized with a mutex, semaphore, etc. A full example using callbacks can be found here.
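As a rough illustration of the synchronization requirement, the sketch below guards a shared counter with a POSIX mutex. It assumes a pthreads platform; SharedState, on_complete(), and completed_count() are hypothetical names, and CassFuture is declared here only as an opaque stand-in so the snippet compiles without the driver's header.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Opaque stand-in so the sketch compiles without the driver's header */
typedef struct CassFuture_ CassFuture;

/* Hypothetical application state shared between callbacks and other threads */
typedef struct {
  pthread_mutex_t lock;
  int completed; /* number of callbacks that have run so far */
} SharedState;

/* Same shape as the on_connect() callback: (future, user data) */
void on_complete(CassFuture* future, void* data) {
  SharedState* state = (SharedState*)data;
  (void)future; /* a real callback would read the error code here */
  assert(state != NULL);

  /* The driver may call this from its own thread, so lock before writing */
  pthread_mutex_lock(&state->lock);
  state->completed++;
  pthread_mutex_unlock(&state->lock);
}

/* Read the counter from any thread, taking the same lock */
int completed_count(SharedState* state) {
  int n;
  assert(state != NULL);
  pthread_mutex_lock(&state->lock);
  n = state->completed;
  pthread_mutex_unlock(&state->lock);
  return n;
}
```

A pointer to the shared state would be passed as the last argument when registering the callback, and it must stay alive until the callback has run.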

Running queries

The connected session can now be used to run queries. Queries are constructed using CassStatement objects. There are two types of statement objects, regular and prepared. Regular statements are most useful for ad hoc queries and applications where the query string will change often. A prepared statement caches the query on the Cassandra server and requires the extra step of preparing the query server-side first.

CassStatement objects can also be used to bind variables. The '?' marker is used to denote the bind variables in a query string. In addition to adding the bind markers to your query string, your application must also provide the number of bind variables to cass_statement_new() when constructing a new statement. If a query doesn't require any bind variables then 0 can be used. The cass_statement_bind_*() functions are then used to bind values to the statement's variables. Bind variables can be bound by the marker's position (index) or by name. Variables can only be bound by name for prepared statements (see the prepared statement example below). This limitation exists because query metadata provided by Cassandra is required to map the variable name to the variable's marker index.

CassString insert_query = cass_string_init("INSERT INTO example (key, value) VALUES (?, ?);");

/* There are two bind variables in the query string */
CassStatement* statement = cass_statement_new(insert_query, 2);

/* Bind the values using the indices of the bind variables */
cass_statement_bind_string(statement, 0, cass_string_init("abc"));
cass_statement_bind_int32(statement, 1, 123);

CassFuture* query_future = cass_session_execute(session, statement);

/* Statement objects can be freed immediately after being executed */
cass_statement_free(statement);

/* This will block until the query has finished */
CassError rc = cass_future_error_code(query_future);

printf("Query result: %s\n", cass_error_desc(rc));

cass_future_free(query_future);

Prepared statements

A prepared statement should be used to improve the performance of frequently executed queries. Preparing the query caches it on the Cassandra nodes and only needs to be done once. Once created, prepared statements should be reused with different bind variables.

CassString insert_query = cass_string_init("INSERT INTO example (key, value) VALUES (?, ?);");

/* Prepare the statement on the Cassandra cluster */
CassFuture* prepare_future = cass_session_prepare(session, insert_query);

/* Wait for the statement to prepare and get the result */
CassError rc = cass_future_error_code(prepare_future);

printf("Prepare result: %s\n", cass_error_desc(rc));

if (rc != CASS_OK) {
  /* Handle error */
  cass_future_free(prepare_future);
  return -1;
}

/* Get the prepared object from the future */
const CassPrepared* prepared = cass_future_get_prepared(prepare_future);

/* The future can be freed immediately after getting the prepared object */
cass_future_free(prepare_future);

/* The prepared object can now be used to create statements that can be executed */
CassStatement* statement = cass_prepared_bind(prepared);

/* Bind variables by name this time (this can only be done with prepared statements) */
cass_statement_bind_string_by_name(statement, "key", cass_string_init("abc"));
cass_statement_bind_int32_by_name(statement, "value", 123);

/* Execute statement (same as the non-prepared code) */

/* The prepared object must be freed */
cass_prepared_free(prepared);

Notice that this example also uses the cass_statement_bind_*_by_name() functions instead of binding by index. Also, the CassPrepared object is immutable and can be used to create statements on multiple threads concurrently.

Handling results

Before, when inserting a new row, the future object didn't have any meaningful result other than an error code. Now that data has been inserted into the "example" table we can use a SELECT statement to retrieve the results. The code to do this looks similar to the INSERT example except now a CassResult object can be retrieved from the query's future object.

CassString query = cass_string_init("SELECT * FROM example WHERE key = ?;");

/* There's only a single variable to bind this time */
CassStatement* statement = cass_statement_new(query, 1);

/* Bind the value using the index of the bind variable */
cass_statement_bind_string(statement, 0, cass_string_init("abc"));

CassFuture* query_future = cass_session_execute(session, statement);

/* Statement objects can be freed immediately after being executed */
cass_statement_free(statement);

/* This will also block until the query returns */
const CassResult* result = cass_future_get_result(query_future);

/* If there was an error then the result won't be available */
if (result == NULL) {
  /* Handle error */
  cass_future_free(query_future);
  return -1;
}

/* The future can be freed immediately after getting the result object */
cass_future_free(query_future);

/* This can be used to retrieve only the first row of the result */
const CassRow* row = cass_result_first_row(result);

/* Now we can retrieve the column values from the row */
CassString key;
/* Get the column value of "key" by name */
cass_value_get_string(cass_row_get_column_by_name(row, "key"), &key);

cass_int32_t value;
/* Get the column value of "value" by name */
cass_value_get_int32(cass_row_get_column_by_name(row, "value"), &value);


/* This will free the result as well as the string pointed to by the CassString 'key' */
cass_result_free(result);

In this example, only a single row is retrieved from Cassandra so the convenience function cass_result_first_row() can be used to get the first and only row. If multiple rows are returned a CassIterator object can be used to iterate over the returned rows (see the example below). Column values, of type const CassValue*, are then retrieved from the row using either cass_row_get_column() or cass_row_get_column_by_name().

Values such as CassString and CassBytes point to memory held by the result object. They remain valid only as long as the result object hasn't been freed, so they need to be copied into application memory if they must outlive it. Primitive types such as cass_int32_t are copied by the driver because it can be done cheaply without incurring extra allocations.
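Copying such a value could look like the sketch below. It is plain C and does not use the driver: StringView is a stand-in for a pointer-plus-length string like CassString (the exact field layout is an assumption), and copy_to_owned() is a hypothetical helper. The sketch relies only on the length, never on a terminating NUL in the source buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for a driver string view: a pointer plus a length (assumed layout) */
typedef struct {
  const char* data;
  size_t length;
} StringView;

/*
 * Copy the viewed bytes into a freshly allocated, NUL-terminated buffer
 * owned by the caller. Returns NULL if allocation fails.
 */
char* copy_to_owned(StringView view) {
  assert(view.data != NULL || view.length == 0);

  char* owned = malloc(view.length + 1);
  if (owned == NULL) return NULL;

  if (view.length > 0) memcpy(owned, view.data, view.length);
  owned[view.length] = '\0';
  return owned;
}
```

The copy has to be made before the result object is freed; afterwards the application owns the buffer and releases it with free().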

The returned result object can be read and iterated on by multiple threads concurrently because the iterator object itself contains the position state allowing the result object to remain immutable.

Iterators

The queries in the previous examples returned a single row result, but queries often return many rows. An iterator object is used to access all the rows of a result.

/* Create a new row iterator from the result */
CassIterator* row_iterator = cass_iterator_from_result(result);

while (cass_iterator_next(row_iterator)) {
  const CassRow* row = cass_iterator_get_row(row_iterator);
  /* Copy data from the row */
}

cass_iterator_free(row_iterator);

Code inside the iteration loop should make a copy of the row values (or process them immediately) because cass_iterator_next() invalidates the previous row returned by cass_iterator_get_row(). In addition to iterating a result with multiple rows, there are iterators that can be used to iterate over columns and collections. The column and collection iterators have a very similar API and the same semantics as shown in the row iterator example.

Paging

Large result sets can be divided into multiple pages automatically using the driver's paging API. To do this the result object keeps track of the pagination state for the sequence of paging queries. When paging through the result set the result object is checked to see if more pages exist and then attached to the statement before re-executing the query to get the next page.

CassString query = cass_string_init("SELECT * FROM example");
CassStatement* statement = cass_statement_new(query, 0);

/* Return 100 rows every time this statement is executed */
cass_statement_set_paging_size(statement, 100);

cass_bool_t has_more_pages = cass_true;

while (has_more_pages) {
  CassFuture* query_future = cass_session_execute(session, statement);

  /* This will block until the page has been returned */
  const CassResult* result = cass_future_get_result(query_future);

  if (result == NULL) {
    /* Handle error */
    cass_future_free(query_future);
    break;
  }

  /* The future can be freed immediately after getting the result object */
  cass_future_free(query_future);

  /* Get values from result... */

  /* Check to see if there are more pages remaining for this result */
  has_more_pages = cass_result_has_more_pages(result);

  if (has_more_pages) {
    /* If there are more pages we need to set the position for the next execute */
    cass_statement_set_paging_state(statement, result);
  }

  cass_result_free(result);
}

/* The statement is reused for every page, so free it only after the loop */
cass_statement_free(statement);

A more complete example of paging can be found here.
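The control flow of the paging loop can also be tried without a cluster. The sketch below is a plain C simulation and does not use the driver: FakeTable, FakePage, fetch_page(), and sum_all_pages() are hypothetical, and a simple offset stands in for the paging state that the result object carries.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory "table" standing in for a Cassandra result set */
typedef struct {
  const int* rows;
  size_t total;
} FakeTable;

/* Hypothetical page: a window into the table plus the state for the next page */
typedef struct {
  const int* rows;
  size_t count;
  size_t next_offset; /* plays the role of the paging state */
  int has_more_pages;
} FakePage;

/* Return at most page_size rows starting at offset */
FakePage fetch_page(const FakeTable* table, size_t offset, size_t page_size) {
  assert(table != NULL && page_size > 0 && offset <= table->total);

  FakePage page;
  size_t remaining = table->total - offset;
  page.rows = table->rows + offset;
  page.count = remaining < page_size ? remaining : page_size;
  page.next_offset = offset + page.count;
  page.has_more_pages = page.next_offset < table->total;
  return page;
}

/* Walk every page, mirroring the execute / check / set-state loop */
long sum_all_pages(const FakeTable* table, size_t page_size, size_t* pages_out) {
  long sum = 0;
  size_t offset = 0;
  size_t pages = 0;
  int has_more_pages = 1;

  while (has_more_pages) {
    /* "Execute" the query for the current position */
    FakePage page = fetch_page(table, offset, page_size);

    for (size_t i = 0; i < page.count; ++i) sum += page.rows[i];
    ++pages;

    /* Check for more pages, then carry the state into the next execute */
    has_more_pages = page.has_more_pages;
    if (has_more_pages) offset = page.next_offset;
  }

  if (pages_out != NULL) *pages_out = pages;
  return sum;
}
```

As in the driver loop, the query is always executed at least once, and the position is only advanced when the previous page reports that more pages remain.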

Batches

Batches can be used to group multiple mutations (UPDATE, INSERT, DELETE) together into a single statement. CASS_BATCH_TYPE_LOGGED can be used to make sure that multiple mutations across multiple partitions happen atomically, that is, all the included mutations will eventually succeed. However, there is some overhead associated with using logged batches in Cassandra. Batches can also be used to group mutations for a single partition key by setting CASS_BATCH_TYPE_UNLOGGED and for counters via CASS_BATCH_TYPE_COUNTER. Unlogged batches should NOT be used as a performance optimization. More information on the use cases of batch statements can be found in this excellent post. Here's how to use batches:

/* This logged batch makes sure that all the mutations eventually succeed */
CassBatch* batch = cass_batch_new(CASS_BATCH_TYPE_LOGGED);

/* Statements can be immediately freed after being added to the batch */

{
  CassStatement* statement = cass_statement_new(cass_string_init("INSERT INTO example1(key, value) VALUES ('a', '1')"), 0);
  cass_batch_add_statement(batch, statement);
  cass_statement_free(statement);
}

{
  CassStatement* statement = cass_statement_new(cass_string_init("UPDATE example2 set value = '2' WHERE key = 'b'"), 0);
  cass_batch_add_statement(batch, statement);
  cass_statement_free(statement);
}

{
  CassStatement* statement = cass_statement_new(cass_string_init("DELETE FROM example3 WHERE key = 'c'"), 0);
  cass_batch_add_statement(batch, statement);
  cass_statement_free(statement);
}

CassFuture* batch_future = cass_session_execute_batch(session, batch);

/* Batch objects can be freed immediately after being executed */
cass_batch_free(batch);

/* This will block until the query has finished */
CassError rc = cass_future_error_code(batch_future);

printf("Batch result: %s\n", cass_error_desc(rc));

cass_future_free(batch_future);

A full example using batches can be found here.

Additional resources

This post covered the basic functionality provided by the DataStax C/C++ driver with the goal of helping you get started. More in-depth API documentation and example code can be found in the C/C++ driver's GitHub repository. In addition to this, we are working on substantially improving the formal documentation for the C/C++ driver over the next few releases. If you need help or have questions, please use the mailing list or IRC.








