DataStax Developer Blog

Python DataStax Enterprise Driver 1.0 and Driver 3.5.0 with Execution Profiles

By Adam Holmberg - July 5, 2016

Last week we released a new Python DSE Driver 1.0.0 in conjunction with DataStax Enterprise 5.0. The DSE driver builds on the existing DataStax Python Driver for Apache Cassandra, adding support for DSE-specific data types, authentication mechanisms, and graph query execution. In this post I will introduce the new DSE driver features, and discuss a new Execution Profiles API introduced in the core driver 3.5.0, which we also released last week.

DataStax Enterprise Python Driver

The DSE Python Driver is a new package that depends on the core driver. The source repository is on GitHub, and the source distribution is published as cassandra-driver-dse.

The driver documentation contains information and examples for all of the features added on top of the core driver. Please refer to those pages for DSE features, including authentication, geometric types, and graph request execution. There is also an installation page, and an upgrade guide for moving to the DSE driver where the core driver was used previously. Most applications can upgrade by simply changing a package import. Applications using custom load balancing configuration, timeouts, or certain other execution parameters will need to know about Execution Profiles, a new feature in the core 3.5.0 release, introduced below. More detail about upgrading can be found in the upgrade guide.

Execution Profiles

Execution Profiles is introduced as follows in the documentation:

Execution profiles are an experimental API aimed at making it easier to execute requests in different ways within a single connected Session. Execution profiles are being introduced to deal with the exploding number of configuration options, especially as the database platform evolves more complex workloads.

The Execution Profile API is being introduced now, in an experimental capacity, so that existing projects can take advantage of it and so we can gauge interest and gather feedback from the community. For now, the legacy configuration remains intact, but the legacy and Execution Profile APIs cannot be used simultaneously on the same client Cluster.

Execution profiles provide more flexibility for configuring request execution parameters without creating multiple Clusters and Sessions. For a very simple example, we could define an alternate profile that has a longer timeout and returns dicts instead of the default namedtuples.

from cassandra.cluster import Cluster, ExecutionProfile
from cassandra.query import dict_factory

cluster = Cluster(execution_profiles={
    'df': ExecutionProfile(request_timeout=30.0, row_factory=dict_factory)})
session = cluster.connect()

session.execute('SELECT rpc_address FROM system.local')[0]  # uses default profile
#    Row(rpc_address='127.0.0.1')

session.execute('SELECT rpc_address FROM system.local', execution_profile='df')[0]  # uses named profile
#    {u'rpc_address': '127.0.0.1'}

Another more interesting pattern is to define different load balancing, for example to target statements to different datacenters — possibly serving different workloads. Here is an example configuring the default profile to target one datacenter (with other parameters defaulted), and another profile to target a separate datacenter with some alternative parameters:

from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy
from cassandra.query import tuple_factory

ep1 = ExecutionProfile(load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc='dc1')))
ep2 = ExecutionProfile(load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc='dc2')),
                       row_factory=tuple_factory, request_timeout=None)  # target dc2, return tuples, never timeout
session = Cluster(execution_profiles={EXEC_PROFILE_DEFAULT: ep1, 'other-dc': ep2}).connect()

# cluster topology
{(h.address, h.datacenter) for h in session.cluster.metadata.all_hosts()}
#    {('127.0.0.1', u'dc1'), ('127.0.0.2', u'dc1'), ('127.0.0.3', u'dc2')}

# default profile cycles between nodes in 'dc1'
session.execute('SELECT rpc_address, data_center FROM system.local')[0]  
#    Row(rpc_address='127.0.0.1', data_center=u'dc1')

session.execute('SELECT rpc_address, data_center FROM system.local')[0]
#    Row(rpc_address='127.0.0.2', data_center=u'dc1')

session.execute('SELECT rpc_address, data_center FROM system.local')[0]
#    Row(rpc_address='127.0.0.1', data_center=u'dc1')

session.execute('SELECT rpc_address, data_center FROM system.local')[0]
#    Row(rpc_address='127.0.0.2', data_center=u'dc1')

# other profile is pinned to 'dc2'
session.execute('SELECT rpc_address, data_center FROM system.local', execution_profile='other-dc')[0]
#    ('127.0.0.3', u'dc2')

session.execute('SELECT rpc_address, data_center FROM system.local', execution_profile='other-dc')[0]
#    ('127.0.0.3', u'dc2')

session.execute('SELECT rpc_address, data_center FROM system.local', execution_profile='other-dc')[0]
#    ('127.0.0.3', u'dc2')

There are plenty more details and usage patterns in the overview document.

We are eager to hear any feedback from community members who elect to try this API. If it proves useful, the ultimate goal would be to retire the legacy execution parameters and commit to Execution Profiles.

Wrap-up

As always, thanks to all who provided contributions and bug reports. The continued involvement of the community is appreciated.
