Victor Coustenoble

This article is a simple tutorial explaining how to connect&nbsp;<a href="http://www.tableau.com/" target="_blank">Tableau Software</a>&nbsp;to&nbsp;<a href="http://cassandra.apache.org/" target="_blank">Apache Cassandra™</a>&nbsp;via&nbsp;<a href="https://spark.apache.org/" target="_blank">Apache Spark™</a>.&nbsp;

Note:&nbsp;This blog post targets DSE 4.8, which included Apache Spark™ 1.4.1. If you use a different version of DSE, please refer to the&nbsp;<a href="http://docs.datastax.com/en/" title="DataStax Documentation">DataStax documentation</a>&nbsp;for that version.

<img alt="logos" data-entity-type="file" data-entity-uuid="a5b22176-a929-483d-b88f-b00d0f29fc1c" src="https://www.datastax.com/sites/default/files/inline-images/logos-250x144.png" /> 
This tutorial explains how to create a simple Tableau Software dashboard based on Cassandra data. The tutorial uses the Spark ODBC driver to connect Tableau to Apache Spark, which in turn reads the Cassandra data. Data and step-by-step instructions for&nbsp;installation and setup of the demo are provided.

1/ Apache Cassandra and DataStax Enterprise

First you need to install a Cassandra cluster and an Apache Spark™ cluster connected with the&nbsp;<a href="https://github.com/datastax/spark-cassandra-connector" target="_blank">DataStax Spark Cassandra connector</a>. A very simple way to do that is to use&nbsp;<a href="https://www.datastax.com/products/datastax-enterprise" target="_blank">DataStax Enterprise (DSE)</a>. It is free for development and testing, and it ships with Apache Cassandra and Apache Spark already linked.

You can download DataStax Enterprise from&nbsp;<a href="https://downloads.datastax.com/" target="_blank">https://academy.datastax.com/downloads</a>&nbsp;and find installation instructions here&nbsp;<a href="http://docs.datastax.com/en/getting_started/doc/getting_started/installDSE.html" target="_blank">http://docs.datastax.com/en/getting_started/doc/getting_started/installDSE.html</a>.

After the installation is complete, start your DSE Cassandra cluster (it can be a single node) with Apache Spark™ enabled&nbsp;by running the command&nbsp;"dse cassandra -k".

2/ Spark Thrift JDBC/ODBC Server

The Spark SQL Thrift server is a JDBC/ODBC server that exposes&nbsp;<a href="https://spark.apache.org/docs/1.4.1/sql-programming-guide.html#running-the-thrift-jdbcodbc-server" target="_blank">JDBC and ODBC interfaces</a>&nbsp;so that clients such as Tableau can connect to Spark (and through it to Cassandra).&nbsp;See here for more details:&nbsp;<a href="http://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/spark/sparkSqlThriftServer.html" target="_blank">http://docs.datastax.com/en/datastax_enterprise/4.8/datastax_enterprise/spark/sparkSqlThriftServer.html</a>.

Start the Spark Thrift JDBC/ODBC server with the command line "dse start-spark-sql-thriftserver".

You should see a new SparkSQL application running at&nbsp;<a href="http://127.0.0.1:4040/" target="_blank">http://127.0.0.1:4040/</a>, which you can also reach from the Spark Master UI at&nbsp;<a href="http://127.0.0.1:7080/" target="_blank">http://127.0.0.1:7080/</a>.

The IP address is the address of your Spark Master node. You may need to replace 127.0.0.1 with your instance IP address if you are not running a Spark cluster or DSE locally. With DSE, you can run the command "dsetool sparkmaster" to find your Spark Master node IP.

<img alt="spark" data-entity-type="file" data-entity-uuid="4a89b435-24d3-4a53-bc24-51b6718eb433" src="https://www.datastax.com/sites/default/files/inline-images/sparkui-250x216.png" />
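
If your cluster is not local, the UI addresses above change with the host. As a minimal sketch, the helper below rebuilds the three URLs used in this tutorial for a given host. The function name spark_ui_urls is hypothetical, the ports 7080 and 4040 are simply the ones shown in this article, and the sketch assumes (as stated above) that the UIs are served from the Spark Master node.

```python
from urllib.parse import urlunsplit


def spark_ui_urls(master_host="127.0.0.1"):
    """Return the Spark UI URLs used in this tutorial for a given host.

    Hypothetical helper: ports 7080 and 4040 are the ones shown in the
    article and may differ in your installation.
    """
    def url(port, path="/"):
        # Assemble scheme, host:port and path into a URL string.
        return urlunsplit(("http", f"{master_host}:{port}", path, "", ""))

    return {
        "master_ui": url(7080),           # Spark Master UI
        "application_ui": url(4040),      # SparkSQL application UI
        "sql_tab": url(4040, "/sql/"),    # SQL tab of the application UI
    }


if __name__ == "__main__":
    for name, address in spark_ui_urls().items():
        print(name, address)
```

Pass the IP returned by "dsetool sparkmaster" as master_host to get the addresses for a remote node.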

Note that Tableau Software could also connect to Apache Cassandra directly&nbsp;<a href="https://www.datastax.com/dev/blog/datastax-odbc-cql-connector-apache-cassandra-datastax-enterprise" target="_blank">via the DataStax ODBC driver</a>. In that case, however, all computations, joins, and aggregates are done on the client side, which is inefficient and risky for large datasets. With Spark jobs, by contrast, everything is done on the server side and in a distributed manner.
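
To make the difference concrete, here is a small, purely illustrative Python sketch. It does not talk to Cassandra or Spark: the rows are made up, and the two functions (client_side and server_side, both hypothetical names) only count how many rows would have to reach the client in each approach.

```python
from collections import Counter

# Made-up rows standing in for two Cassandra tables (not the real demo data).
albums = [
    {"title": f"Album {i}", "year": 1990 + i % 3, "performer": f"P{i % 4}"}
    for i in range(12)
]
performers = [
    {"name": f"P{i}", "gender": "Female" if i % 2 == 0 else "Male"}
    for i in range(4)
]


def client_side(albums, performers):
    """Every row is shipped to the client, which joins and counts itself."""
    gender_by_name = {p["name"]: p["gender"] for p in performers}
    counts = Counter(
        (a["year"], gender_by_name[a["performer"]])
        for a in albums
        if a["performer"] in gender_by_name
    )
    rows_transferred = len(albums) + len(performers)
    return counts, rows_transferred


def server_side(albums, performers):
    """The server joins and counts; only one row per group is returned."""
    counts, _ = client_side(albums, performers)  # same result, computed remotely
    return counts, len(counts)


if __name__ == "__main__":
    _, pulled = client_side(albums, performers)
    _, returned = server_side(albums, performers)
    print(f"client-side: {pulled} rows transferred")
    print(f"server-side: {returned} rows transferred")
```

Both approaches produce the same counts, but the client-side one moves every row of both tables, and that cost grows with the size of the tables rather than with the size of the result.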

3/ Demo Data

Create the 3 demo tables. You can find all the data, along with the script that creates the CQL schemas and loads the tables, here:&nbsp;<a href="https://drive.google.com/drive/u/1/folders/0BwpBQmtj50DFaU5jWTJtM1pleUU" target="_blank">https://drive.google.com/drive/u/1/folders/0BwpBQmtj50DFaU5jWTJtM1pleUU</a>.

When you have downloaded everything, run the script “ScriptCQL.sh” to create the schemas and load the data (cqlsh must be in your path; alternatively, download everything into the cqlsh directory). The script creates a keyspace named&nbsp;ks_music&nbsp;with 3 tables:&nbsp;albums,&nbsp;performers, and&nbsp;countries.
<img alt="schema" data-entity-type="file" data-entity-uuid="9a94713c-0748-4d32-9a54-685eda514d03" src="https://www.datastax.com/sites/default/files/inline-images/schema1-250x265.png" />&nbsp;<img alt="devcenter" data-entity-type="file" data-entity-uuid="67523f0a-fb10-42dd-8d4a-2426b67efc28" src="https://www.datastax.com/sites/default/files/inline-images/devcenter-250x179.png" />

4/ ODBC Driver

Download and install the Databricks ODBC driver for Spark from&nbsp;<a href="https://databricks.com/spark/odbc-driver-download" target="_blank">https://databricks.com/spark/odbc-driver-download</a>&nbsp;or from&nbsp;<a href="https://academy.datastax.com/download-drivers" target="_blank">https://academy.datastax.com/download-drivers</a>.

No specific parameter is needed; the default installation is fine. The Mac version is available only on the Databricks website.

5/ Tableau Software

Open Tableau and connect to the Apache Spark server with the following settings from the Connect panel:

<img alt="1" data-entity-type="file" data-entity-uuid="6d320c7a-c99c-423d-accf-48d737d18926" src="https://www.datastax.com/sites/default/files/inline-images/1-250x193_0.png" />

The server IP is the IP address of the Spark SQL Thrift server, which may differ depending on your installation. You may also need to change the authentication settings depending on your configuration.
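
If Tableau cannot connect, it can help to first check that the Thrift server port is reachable from your machine. The sketch below uses only the Python standard library. The helper name is_port_open is hypothetical, and port 10000 is an assumption (it is the usual default for the Spark Thrift server, but the article does not state it), so use whatever port you entered in the Connect panel.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumed values: replace with your Thrift server address and port.
    host, port = "127.0.0.1", 10000
    state = "reachable" if is_port_open(host, port) else "not reachable"
    print(f"{host}:{port} is {state}")
```

A successful check only shows that something is listening on that port; it does not validate the ODBC driver or the authentication settings.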

6/ Cassandra Connection

You should then be able to see all Apache Cassandra keyspaces (named Schema in the Tableau interface) and tables (press Enter in the Schema and Table inputs to list all available Cassandra keyspaces and tables).

Drag and drop&nbsp;albums&nbsp;and&nbsp;performers&nbsp;tables from the&nbsp;ks_music&nbsp;keyspace.

Edit the inner join clause so that it uses the right columns from the 2 tables:&nbsp;Performer&nbsp;from the albums table and&nbsp;Name&nbsp;from the performers table (click on the blue part of the link between the 2 tables to edit this inner join).

<img alt="2" data-entity-type="file" data-entity-uuid="bf8b9ccf-6da8-44fe-b74e-6e2551e28dec" src="https://www.datastax.com/sites/default/files/inline-images/2-250x107.png" /> 
Keep a “Live” connection! Don’t use “Extract”, because otherwise all your data will be loaded into Tableau.

<img alt="3" data-entity-type="file" data-entity-uuid="08126fd1-e631-4498-8f89-34a3315bf9de" src="https://www.datastax.com/sites/default/files/inline-images/3-250x162.png" />

Click “Update Now” to see a sample of the returned data.

7/ Tableau Dashboard

Go to the Tableau worksheet “Sheet 1” and start a simple dashboard.

Convert the&nbsp;Year&nbsp;column (from the albums table) to the&nbsp;Discrete&nbsp;type (click to the right of the Year column to do that from a menu).

<img alt="3.5" data-entity-type="file" data-entity-uuid="5a89a856-ffc0-45e4-9670-59d47845d4a4" src="https://www.datastax.com/sites/default/files/inline-images/3.5-250x205.png" /> 
Add Year (from albums table) as Rows, Gender (from performers table) as Columns and Number of Records as the measure.

<img alt="4" data-entity-type="file" data-entity-uuid="97bdad8b-0878-4495-a773-0df8670eca8e" src="https://www.datastax.com/sites/default/files/inline-images/4-250x148_0.png" /> 
Then, with the “Show Me” option, convert your table into a stacked bar chart.

<img alt="5" data-entity-type="file" data-entity-uuid="f92a925e-eb2f-441e-bffd-0146a3ac020c" src="https://www.datastax.com/sites/default/files/inline-images/5-250x161.png" /> 
Done: you have created your first Tableau dashboard on live Cassandra data!

8/ SparkSQL and SQL Queries

Finally, you can check the SQL queries generated on the fly and passed to SparkSQL from the Spark UI at&nbsp;<a href="http://127.0.0.1:4040/sql/" target="_blank">http://127.0.0.1:4040/sql/</a>&nbsp;(SQL tab of the SparkSQL UI).

This shows SparkSQL processes and all SQL queries generated by Tableau Software and executed on top of Apache Cassandra data through the Spark Cassandra connector.

<img alt="6" data-entity-type="file" data-entity-uuid="37257cdb-82a4-4788-a17a-f5db2f3b9a29" src="https://www.datastax.com/sites/default/files/inline-images/6-250x151_0.png" />
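
To give an idea of the kind of query behind the dashboard built above, the sketch below runs a join and a count per Year and Gender on a few made-up rows. It is only an approximation: it uses Python's built-in sqlite3 module as a stand-in for SparkSQL, the sample rows are invented, only the columns mentioned in this tutorial are modelled, and the SQL that Tableau actually generates will look different.

```python
import sqlite3

# In-memory stand-in for the ks_music tables; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE performers (name TEXT PRIMARY KEY, gender TEXT);
    CREATE TABLE albums (title TEXT, year INTEGER, performer TEXT);
    INSERT INTO performers VALUES ('Artist A', 'Female'), ('Artist B', 'Male');
    INSERT INTO albums VALUES
        ('First',  1990, 'Artist A'),
        ('Second', 1990, 'Artist B'),
        ('Third',  1991, 'Artist A'),
        ('Fourth', 1991, 'Artist A');
""")

# Inner join on albums.performer = performers.name, then one count
# per (year, gender) pair, mirroring the Rows/Columns/measure setup.
QUERY = """
    SELECT a.year, p.gender, COUNT(*) AS number_of_records
    FROM albums a
    INNER JOIN performers p ON a.performer = p.name
    GROUP BY a.year, p.gender
    ORDER BY a.year, p.gender
"""


def records_per_year_and_gender(connection):
    """Return (year, gender, count) rows for the joined tables."""
    return connection.execute(QUERY).fetchall()


rows = records_per_year_and_gender(conn)
for row in rows:
    print(row)
```

With a Live connection, a query of this shape is executed by Spark on the server side, and only the aggregated rows come back to Tableau.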

Additional links

<ul>
	<li>Related article on Tableau and SparkSQL&nbsp;<a href="http://www.tableau.com/fr-fr/about/blog/2014/10/tableau-spark-sql-big-data-just-got-even-more-supercharged-33799" target="_blank">http://www.tableau.com/fr-fr/about/blog/2014/10/tableau-spark-sql-big-data-just-got-even-more-supercharged-33799</a></li>
	<li>Apache Spark Drivers for ODBC and JDBC with SQL Connector&nbsp;<a href="http://www.simba.com/connectors/apache-spark-driver" target="_blank">http://www.simba.com/connectors/apache-spark-driver</a></li>
	<li>DataStax Spark Cassandra Connector&nbsp;<a href="https://github.com/datastax/spark-cassandra-connector" target="_blank">https://github.com/datastax/spark-cassandra-connector</a></li>
</ul>

Tableau + Spark + Cassandra
