Tyler Hobbs

<p>One of the&nbsp;<a href="https://www.datastax.com/dev/blog/bootstrapping-performance-improvements-for-leveled-compaction">improvements in Cassandra 2.2</a>&nbsp;is an extension of CQL that makes it easier to work with JSON documents. The&nbsp;<tt>SELECT</tt>&nbsp;and&nbsp;<tt>INSERT</tt>&nbsp;statements now include a JSON-focused variant, and two new native functions have been added to convert to and from JSON.</p>

<h2>JSON != Schemaless</h2>

<p>When designing this feature, we wanted to ensure that users would continue to work with data in a type-safe,&nbsp;<a href="http://rustyrazorblade.com/2014/07/the-myth-of-schema-less/">schema-enforced way</a>. For that reason, working through JSON does not allow you to bypass Cassandra's schema. All data is still validated against the same types, and the schema must be manually defined up-front with a normal&nbsp;<tt>CREATE TABLE</tt>&nbsp;statement.</p>

<h2>INSERT JSON</h2>

<p>The&nbsp;<tt>INSERT</tt>&nbsp;statement now accepts a JSON variant. Suppose we have a table defined like this:</p>

<table border="0" cellpadding="0" cellspacing="0">
	<tbody>
		<tr>
			<td>
			<p><code>CREATE</code> <code>TABLE</code> <code>users (</code></p>

			<p><code>&nbsp;&nbsp;&nbsp;&nbsp;</code><code>id text </code><code>PRIMARY</code> <code>KEY</code><code>,</code></p>

			<p><code>&nbsp;&nbsp;&nbsp;&nbsp;</code><code>age </code><code>int</code><code>,</code></p>

			<p><code>&nbsp;&nbsp;&nbsp;&nbsp;</code><code>state text</code></p>

			<p><code>);</code></p>
			</td>
		</tr>
	</tbody>
</table>

<p>Normally we would insert a row like this:</p>

<table border="0" cellpadding="0" cellspacing="0">
	<tbody>
		<tr>
			<td>
			<p><code>INSERT</code> <code>INTO</code> <code>users (id, age, state) </code><code>VALUES</code> <code>(</code><code>'user123'</code><code>, 42, </code><code>'TX'</code><code>);</code></p>
			</td>
		</tr>
	</tbody>
</table>

<p>The JSON version looks like this:</p>

<table border="0" cellpadding="0" cellspacing="0">
	<tbody>
		<tr>
			<td>
			<p><code>INSERT</code> <code>INTO</code> <code>users JSON </code><code>'{"id": "user123", "age": 42, "state": "TX"}'</code><code>;</code></p>
			</td>
		</tr>
	</tbody>
</table>

<p>The JSON-encoded map is simply a CQL string literal that is a JSON encoding of a map where keys are column names and values are column values. This means that drivers don't need to do anything special to support&nbsp;<tt>INSERT JSON</tt>. For example, with the&nbsp;<a href="https://github.com/datastax/python-driver">python driver</a>, you could prepare and execute the statement like so:</p>

<table border="0" cellpadding="0" cellspacing="0">
	<tbody>
		<tr>
			<td>
			<p><code>insert_statement </code><code>=</code> <code>session.prepare(</code><code>'INSERT INTO users JSON ?'</code><code>)</code></p>

			<p><code>json_values </code><code>=</code> <code>'{"id": "user123", "age": 42, "state": "TX"}'</code></p>

			<p><code>session.execute(insert_statement, [json_values])</code></p>
			</td>
		</tr>
	</tbody>
</table>

<p>This should work nicely with existing JSON libraries, making it easy to load documents:</p>

<table border="0" cellpadding="0" cellspacing="0">
	<tbody>
		<tr>
			<td>
			<p><code>import</code> <code>json</code></p>

			<p>&nbsp;</p>

			<p><code>prepared </code><code>=</code> <code>session.prepare(</code><code>'INSERT INTO users JSON ?'</code><code>)</code></p>

			<p>&nbsp;</p>

			<p><code>while</code> <code>True</code><code>:</code></p>

			<p><code>&nbsp;&nbsp;&nbsp;&nbsp;</code><code>user </code><code>=</code> <code>{</code><code>'id'</code><code>: get_username(user_input),</code></p>

			<p><code>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code>'age'</code><code>: get_age(user_input),</code></p>

			<p><code>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</code><code>'state'</code><code>: get_state(user_input) </code><code>or</code> <code>'TX'</code><code>}</code></p>

			<p>&nbsp;</p>

			<p><code>&nbsp;&nbsp;&nbsp;&nbsp;</code><code>session.execute(prepared, [json.dumps(user)])</code></p>
			</td>
		</tr>
	</tbody>
</table>

<h3>Type Interpretation</h3>

<p>When Cassandra types have a sensible native JSON equivalent, such as ints, floats, booleans, and lists, those native types are accepted. For Cassandra types that don't have a clear JSON equivalent, such as UUIDs, a string representation matching the normal CQL literal format should be used.</p>

<p>For example, in CQL you can represent a UUID with a literal like this:</p>

<table border="0" cellpadding="0" cellspacing="0">
	<tbody>
		<tr>
			<td>
			<p><code>INSERT</code> <code>INTO</code> <code>uuid_map (id, theuuid) </code><code>VALUES</code> <code>(10, 994FF312-111E-11E5-9FDE-E0B9A54A6D93);</code></p>
			</td>
		</tr>
	</tbody>
</table>

<p>When using&nbsp;<tt>INSERT JSON</tt>, you should use a string with the same format to represent the UUID:</p>

<table border="0" cellpadding="0" cellspacing="0">
	<tbody>
		<tr>
			<td>
			<p><code>INSERT</code> <code>INTO</code> <code>uuid_map JSON </code><code>'{"id": 10, "theuuid": "994FF312-111E-11E5-9FDE-E0B9A54A6D93"}'</code><code>;</code></p>
			</td>
		</tr>
	</tbody>
</table>
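
<p>From Python, the same payload can be built with the standard library. This is a minimal sketch, not part of the driver: the helper name&nbsp;<tt>uuid_row_to_json</tt>&nbsp;is hypothetical, and it assumes the&nbsp;<tt>uuid_map</tt>&nbsp;table from the example above. Because&nbsp;<tt>json.dumps</tt>&nbsp;cannot serialize a&nbsp;<tt>uuid.UUID</tt>&nbsp;directly, the value is converted to its string form first:</p>

```python
import json
import uuid


def uuid_row_to_json(row_id, value):
    """Build an INSERT JSON payload for the uuid_map table (hypothetical helper)."""
    # str() on a uuid.UUID yields the canonical hyphenated hex form, which is
    # the same format used for CQL UUID literals.
    return json.dumps({"id": row_id, "theuuid": str(value)})


payload = uuid_row_to_json(10, uuid.UUID("994FF312-111E-11E5-9FDE-E0B9A54A6D93"))
print(payload)
```

<p>The resulting string can be bound to a prepared&nbsp;<tt>INSERT INTO uuid_map JSON ?</tt>&nbsp;statement in the same way as the earlier examples.</p>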

<p>Lists, sets, and tuples can all be represented by JSON lists:</p>

<table border="0" cellpadding="0" cellspacing="0">
	<tbody>
		<tr>
			<td>
			<p><code>CREATE</code> <code>TABLE</code> <code>example (</code></p>

			<p><code>&nbsp;&nbsp;&nbsp;&nbsp;</code><code>id </code><code>int</code> <code>PRIMARY</code> <code>KEY</code><code>,</code></p>

			<p><code>&nbsp;&nbsp;&nbsp;&nbsp;</code><code>tupleval tuple&lt;</code><code>int</code><code>, text&gt;,</code></p>

			<p><code>&nbsp;&nbsp;&nbsp;&nbsp;</code><code>numbers </code><code>set</code><code>&lt;</code><code>int</code><code>&gt;,</code></p>

			<p><code>&nbsp;&nbsp;&nbsp;&nbsp;</code><code>words list&lt;text&gt;</code></p>

			<p><code>);</code></p>

			<p>&nbsp;</p>

			<p><code>INSERT</code> <code>INTO</code> <code>example JSON </code><code>'{"id": 0, "tupleval": [1, "abc"], "numbers": [1, 2, 3], "words": ["a", "b", "c"]}'</code><code>;</code></p>
			</td>
		</tr>
	</tbody>
</table>
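
<p>When the row starts out as Python objects, tuples already serialize as JSON lists, but sets need a small hook. The sketch below assumes the&nbsp;<tt>example</tt>&nbsp;table above and that set elements can be sorted; the&nbsp;<tt>_encode_set</tt>&nbsp;function is a hypothetical helper, not a driver API:</p>

```python
import json


def _encode_set(obj):
    """json.dumps 'default' hook: emit sets as sorted JSON lists."""
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)
    raise TypeError("Cannot serialize {} to JSON".format(type(obj).__name__))


row = {
    "id": 0,
    "tupleval": (1, "abc"),    # tuples become JSON lists automatically
    "numbers": {3, 1, 2},      # sets go through the default hook
    "words": ["a", "b", "c"],
}

payload = json.dumps(row, default=_encode_set)
print(payload)
```

<p>Sorting is only there to make the output deterministic; it is not something Cassandra requires of the JSON list.</p>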

<p>User-defined types are represented by JSON maps:</p>

<table border="0" cellpadding="0" cellspacing="0">
	<tbody>
		<tr>
			<td>
			<p><code>CREATE</code> <code>TYPE address (number </code><code>int</code><code>, street text);</code></p>

			<p>&nbsp;</p>

			<p><code>CREATE</code> <code>TABLE</code> <code>users (</code></p>

			<p><code>&nbsp;&nbsp;&nbsp;&nbsp;</code><code>id </code><code>int</code> <code>PRIMARY</code> <code>KEY</code><code>,</code></p>

			<p><code>&nbsp;&nbsp;&nbsp;&nbsp;</code><code>street_address frozen&lt;address&gt;</code></p>

			<p><code>);</code></p>

			<p>&nbsp;</p>

			<p><code>INSERT</code> <code>INTO</code> <code>users JSON </code><code>'{"id": 0, "street_address": {"number": 123, "street": "Main St."}}'</code><code>;</code></p>
			</td>
		</tr>
	</tbody>
</table>

<p>Please refer to the CQL documentation for full details on&nbsp;<a href="http://cassandra.apache.org/doc/cql3/CQL-2.2.html#insertJson">the accepted JSON formats for each Cassandra type</a>.</p>

<h3>Omitted Columns</h3>

<p>Any column that is omitted from the JSON value map is treated as a&nbsp;<tt>null</tt>&nbsp;insert, which deletes the existing value for that column if one is present.</p>
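
<p>Since a missing key overwrites existing data, it can be worth checking a document against the table's columns before inserting it. A minimal sketch, assuming lowercase, unquoted column names; the&nbsp;<tt>omitted_columns</tt>&nbsp;helper is hypothetical and the column list is supplied by the caller:</p>

```python
def omitted_columns(document, table_columns):
    """Return the columns that would be written as null for this document."""
    # Unquoted CQL column names are case-insensitive, so compare in lowercase.
    present = {key.lower() for key in document}
    return sorted(col for col in table_columns if col.lower() not in present)


users_columns = ["id", "age", "state"]
doc = {"id": "user123", "age": 42}

# 'state' is missing, so inserting this document would null out that column.
print(omitted_columns(doc, users_columns))
```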

<h3>Non-text Map Keys</h3>

<p>The JSON specification does not allow for non-text map keys. However, Cassandra's map type does support non-text keys. In order to support non-text keys, Cassandra will accept JSON-encoded string representations of any type as a map key<sup>[1]</sup>.</p>

<p>For example, suppose we have a table like this:</p>

<table border="0" cellpadding="0" cellspacing="0">
	<tbody>
		<tr>
			<td>
			<p><code>CREATE</code> <code>TABLE</code> <code>comments (</code></p>

			<p><code>&nbsp;&nbsp;&nbsp;&nbsp;</code><code>id </code><code>int</code> <code>PRIMARY</code> <code>KEY</code><code>,</code></p>

			<p><code>&nbsp;&nbsp;&nbsp;&nbsp;</code><code>int_map map&lt;</code><code>int</code><code>, text&gt;</code></p>

			<p><code>);</code></p>
			</td>
		</tr>
	</tbody>
</table>

<p>The map keys are ints, so we need to JSON encode them:</p>

<table border="0" cellpadding="0" cellspacing="0">
	<tbody>
		<tr>
			<td>
			<p><code>INSERT</code> <code>INTO</code> <code>comments JSON </code><code>'{"id": 10, "int_map": {"1": "foo", "2": "bar"}}'</code><code>;</code></p>
			</td>
		</tr>
	</tbody>
</table>

<p>We do the same thing for more complex key types, such as&nbsp;<tt>set&lt;text&gt;</tt>:</p>

<table border="0" cellpadding="0" cellspacing="0">
	<tbody>
		<tr>
			<td>
			<p><code>CREATE</code> <code>TABLE</code> <code>tags (</code></p>

			<p><code>&nbsp;&nbsp;&nbsp;&nbsp;</code><code>id </code><code>int</code> <code>PRIMARY</code> <code>KEY</code><code>,</code></p>

			<p><code>&nbsp;&nbsp;&nbsp;&nbsp;</code><code>tags map&lt;frozen&lt;</code><code>set</code><code>&lt;text&gt;&gt;, text&gt;</code></p>

			<p><code>);</code></p>
			</td>
		</tr>
	</tbody>
</table>

<table border="0" cellpadding="0" cellspacing="0">
	<tbody>
		<tr>
			<td>
			<p><code>INSERT</code> <code>INTO</code> <code>tags JSON </code><code>'{"id": 10, "tags": {"[\"tag1\", \"tag2\"]": "details"}}'</code><code>;</code></p>
			</td>
		</tr>
	</tbody>
</table>

<p>Note that the map key is itself a JSON encoding of the set (written as a JSON list), not a normal CQL string literal, so double-quotes are used to surround the text items (and need to be escaped).</p>
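
<p>Doing this escaping by hand is error-prone, so it may be easier to let a JSON library encode the keys. The following sketch uses a hypothetical&nbsp;<tt>encode_map_keys</tt>&nbsp;helper and assumes the&nbsp;<tt>tags</tt>&nbsp;table above; text keys are passed through unchanged, and every other key is JSON-encoded into a string:</p>

```python
import json


def encode_map_keys(mapping):
    """JSON-encode each non-text key so the map can be written as a JSON object."""
    encoded = {}
    for key, value in mapping.items():
        if isinstance(key, str):
            json_key = key                          # text keys need no extra encoding
        elif isinstance(key, (set, frozenset)):
            json_key = json.dumps(sorted(key))      # sets are written as JSON lists
        else:
            json_key = json.dumps(key)              # ints, tuples, and so on
        encoded[json_key] = value
    return encoded


int_map = encode_map_keys({1: "foo", 2: "bar"})
tag_map = encode_map_keys({frozenset(["tag1", "tag2"]): "details"})

# The outer json.dumps escapes the double-quotes inside the encoded key.
payload = json.dumps({"id": 10, "tags": tag_map})
print(payload)
```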

<h3>Case-sensitive Column Names</h3>

<p>The&nbsp;<tt>INSERT JSON</tt>&nbsp;value map uses column names for the top-level keys. As with normal CQL, these column names are case-insensitive. So, for example, if you have a table like this:</p>

<table border="0" cellpadding="0" cellspacing="0">
	<tbody>
		<tr>
			<td>
			<p><code>CREATE</code> <code>TABLE</code> <code>users (</code></p>

			<p><code>&nbsp;&nbsp;&nbsp;&nbsp;</code><code>id text </code><code>PRIMARY</code> <code>KEY</code><code>,</code></p>

			<p><code>&nbsp;&nbsp;&nbsp;&nbsp;</code><code>age </code><code>int</code><code>,</code></p>

			<p><code>&nbsp;&nbsp;&nbsp;&nbsp;</code><code>state text</code></p>

			<p><code>);</code></p>
			</td>
		</tr>
	</tbody>
</table>

<p>The following&nbsp;<tt>INSERT</tt>&nbsp;would work just fine:</p>

<table border="0" cellpadding="0" cellspacing="0">
	<tbody>
		<tr>
			<td>
			<p><code>INSERT</code> <code>INTO</code> <code>users JSON </code><code>'{"ID": "user123", "Age": 42, "StAtE": "TX"}'</code><code>;</code></p>
			</td>
		</tr>
	</tbody>
</table>

<p>On the other hand, if your table is declared with case-sensitive column names, you will need to use slightly special column names in your JSON value map. Suppose our table is instead defined like this:</p>

<table border="0" cellpadding="0" cellspacing="0">
	<tbody>
		<tr>
			<td>
			<p><code>CREATE</code> <code>TABLE</code> <code>users (</code></p>

			<p><code>&nbsp;&nbsp;&nbsp;&nbsp;</code><code>id text </code><code>PRIMARY</code> <code>KEY</code><code>,</code></p>

			<p><code>&nbsp;&nbsp;&nbsp;&nbsp;</code><code>"Age"</code> <code>int</code><code>,</code></p>

			<p><code>&nbsp;&nbsp;&nbsp;&nbsp;</code><code>"State"</code> <code>text</code></p>

			<p><code>);</code></p>
			</td>
		</tr>
	</tbody>
</table>

<p>The&nbsp;<tt>"Age"</tt>&nbsp;and&nbsp;<tt>"State"</tt>&nbsp;columns are case-sensitive. In the JSON value map, you must match the capitalization and add an extra set of double-quotes to the column names:</p>

<table border="0" cellpadding="0" cellspacing="0">
	<tbody>
		<tr>
			<td>
			<p><code>INSERT</code> <code>INTO</code> <code>users JSON </code><code>'{"id": "user123", "\"Age\"": 42, "\"State\"": "TX"}'</code><code>;</code></p>
			</td>
		</tr>
	</tbody>
</table>
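
<p>In application code, the extra quotes can be added when the value map is built. This sketch makes a simplifying assumption: any column name containing an uppercase character was declared with double-quotes. Quoted names that are all lowercase but contain special characters are not handled, and&nbsp;<tt>json_column_key</tt>&nbsp;is a hypothetical helper:</p>

```python
import json


def json_column_key(column_name):
    """Wrap a case-sensitive column name in an extra pair of double-quotes."""
    if column_name != column_name.lower():
        return '"{}"'.format(column_name)
    return column_name


row = {"id": "user123", "Age": 42, "State": "TX"}

# json.dumps escapes the embedded double-quotes in the keys.
payload = json.dumps({json_column_key(name): value for name, value in row.items()})
print(payload)
```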

<h2>SELECT JSON</h2>

<p>The&nbsp;<tt>SELECT</tt>&nbsp;statement has also been extended to support retrieval of rows in a JSON-encoded map format. The results for&nbsp;<tt>SELECT JSON</tt>&nbsp;will only include a single column named&nbsp;<tt>[json]</tt>. This column will contain the same JSON-encoded map representation of a row that is used for&nbsp;<tt>INSERT JSON</tt>. For example, if we have a table like the following:</p>

<table border="0" cellpadding="0" cellspacing="0">
	<tbody>
		<tr>
			<td>
			<p><code>CREATE</code> <code>TABLE</code> <code>users (</code></p>

			<p><code>&nbsp;&nbsp;&nbsp;&nbsp;</code><code>id text </code><code>PRIMARY</code> <code>KEY</code><code>,</code></p>

			<p><code>&nbsp;&nbsp;&nbsp;&nbsp;</code><code>age </code><code>int</code><code>,</code></p>

			<p><code>&nbsp;&nbsp;&nbsp;&nbsp;</code><code>state text</code></p>

			<p><code>);</code></p>
			</td>
		</tr>
	</tbody>
</table>

<p>And we execute the following query:</p>

<table border="0" cellpadding="0" cellspacing="0">
	<tbody>
		<tr>
			<td>
			<p><code>SELECT</code> <code>JSON * </code><code>FROM</code> <code>users;</code></p>
			</td>
		</tr>
	</tbody>
</table>

<p>The results will look like this in cqlsh:</p>

<table border="0" cellpadding="0" cellspacing="0">
	<tbody>
		<tr>
			<td>
			<p><code>[json]</code></p>

			<p><code>-------------------------------------------</code></p>

			<p><code>{"id": "user123", "age": 42, "state": "TX"}</code></p>
			</td>
		</tr>
	</tbody>
</table>

<p>It's also fine to use any normal selection clause. The map keys will match what the result column names would be for an equivalent non-JSON SELECT statement. For example:</p>

<table border="0" cellpadding="0" cellspacing="0">
	<tbody>
		<tr>
			<td>
			<p><code>SELECT</code> <code>JSON id, writetime(age), ttl(state) </code><code>as</code> <code>ttl </code><code>FROM</code> <code>users;</code></p>
			</td>
		</tr>
	</tbody>
</table>

<p>This will return:</p>

<table border="0" cellpadding="0" cellspacing="0">
	<tbody>
		<tr>
			<td>
			<p><code>[json]</code></p>

			<p><code>------------------------------------------------------------------</code></p>

			<p><code>{"id": "user123", "writetime(age)": 1434135381782986, "ttl": null}</code></p>
			</td>
		</tr>
	</tbody>
</table>

<p>The results of SELECT JSON are designed to be usable in an INSERT JSON statement without any modifications, so all of the same rules about non-text map keys and case-sensitive column names apply.</p>
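
<p>On the client side, each result row is just a string that can be handed to a JSON parser. The sketch below assumes the driver returns the&nbsp;<tt>[json]</tt>&nbsp;column as a plain string; to stay self-contained it parses the sample value shown above rather than querying a cluster:</p>

```python
import json

# The value of the [json] column for one row, as shown in the cqlsh output above.
json_column = '{"id": "user123", "writetime(age)": 1434135381782986, "ttl": null}'

row = json.loads(json_column)

print(row["id"])               # the text column comes back as a str
print(row["writetime(age)"])   # keys match the non-JSON result column names
print(row["ttl"])              # JSON null becomes None
```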

<h2><tt>fromJson()</tt>&nbsp;and&nbsp;<tt>toJson()</tt></h2>

<p><tt>INSERT JSON</tt>&nbsp;and&nbsp;<tt>SELECT JSON</tt>&nbsp;are designed to work with entire rows. When you only need to use JSON for a single column, the new&nbsp;<tt>toJson()</tt>&nbsp;and&nbsp;<tt>fromJson()</tt>&nbsp;functions can be used. These behave the same as&nbsp;<tt>INSERT JSON</tt>&nbsp;and&nbsp;<tt>SELECT JSON</tt>, but are limited to a single value or column.</p>

<h3>fromJson()</h3>

<p>The&nbsp;<tt>fromJson()</tt>&nbsp;function converts a single JSON-encoded string to a normal Cassandra value. For example, this can be used when performing an update:</p>

<table border="0" cellpadding="0" cellspacing="0">
	<tbody>
		<tr>
			<td>
			<p><code>UPDATE</code> <code>users </code><code>SET</code> <code>age = fromJson(</code><code>'42'</code><code>) </code><code>WHERE</code> <code>id = fromJson(</code><code>'"user123"'</code><code>);</code></p>
			</td>
		</tr>
	</tbody>
</table>

<p>The only place where&nbsp;<tt>fromJson()</tt>&nbsp;cannot be used is the selection clause of SELECT statements. (This is because Cassandra can't know in advance what type the result will be.)</p>
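
<p>Each argument to&nbsp;<tt>fromJson()</tt>&nbsp;is a JSON document in its own right, so a text value needs its own pair of double-quotes inside the string. A minimal sketch of producing such arguments with the standard library; the helper name and the use of bind markers inside&nbsp;<tt>fromJson()</tt>&nbsp;are assumptions here, and the statement is only shown as a string, not executed:</p>

```python
import json

# Assumed statement shape; each ? would be bound to one JSON-encoded string.
UPDATE_CQL = "UPDATE users SET age = fromJson(?) WHERE id = fromJson(?)"


def from_json_args(age, user_id):
    """JSON-encode each value separately, one string per fromJson() call."""
    return [json.dumps(age), json.dumps(user_id)]


params = from_json_args(42, "user123")
print(params)
```

<p>Note that the encoded id is&nbsp;<tt>"user123"</tt>&nbsp;including the double-quotes, matching the literal in the&nbsp;<tt>UPDATE</tt>&nbsp;example above.</p>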

<h3>toJson()</h3>

<p>The&nbsp;<tt>toJson()</tt>&nbsp;function is the inverse of&nbsp;<tt>fromJson()</tt>. It can be used to convert any column to a JSON representation. For example:</p>

<table border="0" cellpadding="0" cellspacing="0">
	<tbody>
		<tr>
			<td>
			<p><code>SELECT</code> <code>id, toJson(tags) </code><code>as</code> <code>tags </code><code>FROM</code> <code>tags;</code></p>
			</td>
		</tr>
	</tbody>
</table>

<p>This will return:</p>

<table border="0" cellpadding="0" cellspacing="0">
	<tbody>
		<tr>
			<td>
			<p><code>id | tags</code></p>

			<p><code>---+------------------------------------</code></p>

			<p><code>10 | {"[\"tag1\", \"tag2\"]": "details"}</code></p>
			</td>
		</tr>
	</tbody>
</table>

<p>The&nbsp;<tt>toJson()</tt>&nbsp;function can only be used in the selection clause of SELECT statements.</p>
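
<p>Because non-text map keys come back JSON-encoded, reading the&nbsp;<tt>tags</tt>&nbsp;column takes two decoding passes: one for the map and one for each key. The sketch below parses the sample value from the output above and assumes the column arrives as a plain string:</p>

```python
import json

# The toJson(tags) value from the example output, as a raw string.
tags_column = r'{"[\"tag1\", \"tag2\"]": "details"}'

# First pass decodes the map; second pass decodes each JSON-encoded key.
# frozenset is used because the CQL key type is a set and dict keys must be hashable.
decoded = {
    frozenset(json.loads(key)): value
    for key, value in json.loads(tags_column).items()
}

print(decoded)
```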

<h2>Summary</h2>

<p>Cassandra 2.2 makes it easier to work with JSON documents without sacrificing the benefits of schema enforcement.&nbsp;<a href="http://cassandra.apache.org/download/">Try it out</a>&nbsp;and let us know what you think!</p>

<h4 id="footnote">Footnotes</h4>

<p>[1]: For the sake of consistency, it will accept string representations of types anywhere, not just in map keys. However, for clarity and performance reasons I don't suggest using this unless you need to.</p>


What’s New in Cassandra 2.2: JSON Support
