DataStax Developer Blog

Python Driver 2.1 Released

By Tyler Hobbs - August 7, 2014

We are happy to release version 2.1 of the DataStax Python driver for Apache Cassandra. This release brings support for Cassandra 2.1, while remaining compatible with 1.2 and 2.0. In addition to several new features, there are many bug fixes that improve the stability and performance of the driver. For a full list of changes, please see the changelog.

We’ll briefly cover some of the new features here.

User Defined Types

Cassandra 2.1 introduced User Defined Types (UDTs), which are named groups of related properties:

CREATE TYPE address (
    street text,
    city text,
    zip int
);

CREATE TABLE user_profiles (
    email text PRIMARY KEY,
    address address
);

From the driver’s perspective, UDT values can be retrieved like any other type. By default, they are returned as namedtuple instances:

row = session.execute("SELECT * FROM user_profiles")[0]
address = row.address

street = address.street  # by field name
zip_code = address[2]  # by index (named zip_code to avoid shadowing the built-in zip)
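
Both access styles are ordinary namedtuple behavior, so they can be tried without a running cluster. The sketch below builds a stand-in for the value the driver returns; the Address namedtuple here is constructed by hand purely for illustration and is not produced by the driver.

```python
from collections import namedtuple

# Hand-built stand-in for the namedtuple the driver returns for an "address" UDT.
# Field order follows the CREATE TYPE statement: street, city, zip.
Address = namedtuple("Address", ["street", "city", "zip"])
address = Address(street="123 Main St.", city="Austin", zip=78723)

street = address.street        # access by field name
zip_code = address[2]          # access by index
street_, city, zip_ = address  # tuple unpacking also works
as_dict = address._asdict()    # field-name-to-value mapping, if that is more convenient
```

Because the result is a plain namedtuple, it is immutable; registering a class, as described next, is one way to get a mutable object instead.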

You can also map a specific class to a particular UDT. Query results will return instances of that class instead of namedtuples, and instances of that class may be used for inserting data:

from cassandra.cluster import Cluster

# create a class with 'street', 'city', and 'zip' attributes
class Address(object):

    def __init__(self, street, city, zip):
        self.street = street
        self.city = city
        self.zip = zip

cluster = Cluster()
session = cluster.connect("mykeyspace")

# register the Address class for "address" UDTs in keyspace "mykeyspace"
cluster.register_user_type("mykeyspace", "address", Address)

# insert an Address instance
address = Address("123 Main St.", "Austin", 78723)
session.execute(
    "INSERT INTO user_profiles (email, address) VALUES (%s, %s)",
    ("joe@example.com", address))

# queries return Address instances
row = session.execute("SELECT * FROM user_profiles LIMIT 1")[0]
address = row.address  # an Address instance
street = address.street

Although you aren’t required to register a class for UDTs when working with prepared statements, you are required to register a class when inserting UDTs through non-prepared statements.

Tuples and Customizable CQL Literal Encoding

Also new in Cassandra 2.1 is the tuple type, a fixed-length group of typed values:

CREATE TABLE points_of_interest (
    id int PRIMARY KEY,
    name text,
    coordinates tuple<float,float>
);

As expected, you’ll get a tuple back in queries:

row = session.execute("SELECT * FROM points_of_interest LIMIT 1")[0]
latitude, longitude = row.coordinates

When working with prepared statements, you can insert tuples directly:

insert_statement = session.prepare("INSERT INTO points_of_interest (id, name, coordinates) VALUES (?, ?, ?)")
coordinates = (1.123, 3.345)
session.execute(insert_statement, (0, "City Center", coordinates))

For backwards-compatibility reasons, tuples are encoded as list collection literals (e.g. [1.123, 3.345]) by default when working with non-prepared statements. To change this, you will need to take advantage of a new feature in version 2.1 of the driver: customizable CQL literal encoding.

Customizable CQL Literal Encoding

When using non-prepared statements, the driver must convert native Python types into CQL literal strings. To accomplish this, each Session holds a map of Python types to encoder functions. You can now customize this mapping by adding new types or by changing the way an existing type is encoded. For example, to encode tuple objects as CQL tuple literals, we can do the following:

session.encoder.mapping[tuple] = session.encoder.cql_encode_tuple

This assignment means that when a tuple is passed as a query parameter, the Encoder.cql_encode_tuple() method will be used to convert it to a CQL literal, like (1.123, 3.345).

When updating the mapping, you can use any type or class that you want as a key. Generally, you’ll want to use one of the Encoder methods (such as cql_encode_tuple()) as values in this map.
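
To make the mechanism concrete, here is a minimal, driver-free sketch of a type-to-encoder map. It is not the driver's implementation: encode_list_literal, encode_tuple_literal, encode_param, and the mapping dict are hypothetical names that only mimic the idea of looking up an encoder function by the parameter's Python type.

```python
# Illustrative only: a toy version of a "Python type -> encoder function" map.
# None of these names come from the driver.

def encode_list_literal(values):
    # Render a sequence as a CQL list literal, e.g. [1.123, 3.345]
    return "[" + ", ".join(repr(v) for v in values) + "]"

def encode_tuple_literal(values):
    # Render a sequence as a CQL tuple literal, e.g. (1.123, 3.345)
    return "(" + ", ".join(repr(v) for v in values) + ")"

# Default behavior in this toy: tuples are written as list literals.
mapping = {tuple: encode_list_literal, float: repr, int: repr}

def encode_param(value):
    # Look up the encoder by the value's exact type and apply it.
    return mapping[type(value)](value)

before = encode_param((1.123, 3.345))  # list-style literal
mapping[tuple] = encode_tuple_literal  # swap the encoder for tuples
after = encode_param((1.123, 3.345))   # tuple-style literal
```

Swapping a single entry in the map changes how every tuple parameter is rendered afterwards, which is the same effect the session.encoder.mapping assignment above has on a Session.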

Fewer Connections and Client-Side Timestamps

When using protocol version 3, the driver will only open a single connection to each host. This reduces resource consumption and improves throughput.

Additionally, when using the v3 protocol, the driver will supply protocol-level client-side timestamps by default. This timestamp can be overridden by explicitly specifying a TIMESTAMP in the CQL query itself.
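
As a sketch of the explicit override, the snippet below builds an INSERT with a USING TIMESTAMP clause. CQL write timestamps are conventionally expressed in microseconds since the Unix epoch; the helper name now_micros and the reuse of the points_of_interest table are choices made for this illustration.

```python
import time

def now_micros():
    # Current time as microseconds since the Unix epoch (hypothetical helper).
    return int(time.time() * 1000000)

ts = now_micros()

# The timestamp is written into the CQL text; the %s placeholders are left
# for the driver to fill in from the parameters passed to execute().
query = (
    "INSERT INTO points_of_interest (id, name) "
    "VALUES (%s, %s) USING TIMESTAMP {0}".format(ts)
)

# With a connected session, this could then be run as:
# session.execute(query, (0, "City Center"))
```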

Upgrading to 2.1

Version 2.1 of the Python driver is available on PyPI and is tagged on GitHub.

This version is backwards compatible with 2.0 versions of the driver, and supports Cassandra 1.2 through 2.1. For more details on upgrading, see the upgrade guide. If you discover any issues, please report them on the JIRA bug tracker.

Thanks, and enjoy!


