Technology • August 28, 2018

Work with DSE Graph in Your Web Applications Part (1/2) : CRUD Operations

Cedrick Lunven, Software Engineer

Graph databases are really effective when it comes to working with highly connected data and extracting value from relationships, as we detailed in a previous blog post. This article focuses on integrating graph databases with web applications to implement CRUD operations, pattern detection, and visualization in the user interface. Part 1 covers environment setup and CRUD operations; Part 2 will dig into the user interface. Let's get our hands dirty.

Getting DataStax Enterprise Running

Start DSE Docker image

There are multiple ways to install DataStax Enterprise, for example from a tarball, with the installer, or with OpsCenter Lifecycle Manager. There are also very convenient Docker images on Docker Hub that let us run quick tests without installing anything; this is the approach we chose here. Let's run the image datastax/dse-server (defaulting to the latest version) with the -g option, which enables the Graph workload we will work with today. Notice two additional options: -s enables the Search workload, and the environment variable DS_LICENSE is required to accept the license terms.

docker run -e "DS_LICENSE=accept" -it -d -p 9042:9042 --name dse datastax/dse-server -s -g

Start DataStax Studio Docker image

In this blog post we illustrate samples with screenshots from DataStax Studio. This user interface helps us create the schema and import our first data without coding or using the command line, which is again pretty convenient. As with DataStax Enterprise, you can download and install the tool, or simply run the datastax/dse-studio Docker image. Note that we link the Studio container to the DSE container by naming the DSE container with --name dse and passing the --link option to the Studio container.

docker run -e "DS_LICENSE=accept" -it -d -p 9091:9091 --link dse:dse datastax/dse-studio

If you don't want to lose your data or notebooks, I recommend defining external volumes for each container. You can use a docker-compose.yaml file like the one Jeff Carpenter shared in his blog post on Medium.com, reproduced here:

version: '2'
services:
  # DataStax Enterprise
  dse:
    image: datastax/dse-server:6.0.2
    command: [ "-s", "-g" ]
    ports:
      - "9042:9042" # cassandra
    environment:
      DS_LICENSE: accept
    volumes:
      - "./data:/var/lib/cassandra"
    # Allow DSE to lock memory with mlock
    cap_add:
      - IPC_LOCK
    ulimits:
      memlock: -1

  # One instance of DataStax Studio
  studio:
    image: datastax/dse-studio:6.0.0
    ports:
      # The Web UI exposed to our host
      - "9091:9091"
    depends_on:
      - dse
    environment:
      DS_LICENSE: accept
    volumes:
      - "./notebooks:/var/lib/datastax-studio"

To start both containers, run:

docker-compose up -d

Once the containers are started, you should be able to access Studio at http://localhost:9091. Set up a connection pointing to dse, the hostname of the DSE container. If you run into trouble, the official documentation walks you step by step through creating a connection and your first notebook.

Create the Graph

As with most content on DataStax Academy, we will use the KillrVideo reference application and its recommendation engine. Users can upload, rate and tag videos. We use the graph schema shown in the following picture:

To create this schema, first create a notebook, then create a Gremlin code block and execute the following:

// Create Graph
system.graph("killrvideo_video_recommendations").replication("{'class' : 'SimpleStrategy', 'replication_factor': '1' }").ifNotExists().create();

// Create property keys
schema.propertyKey("tag").Text().ifNotExists().create();
schema.propertyKey("tagged_date").Timestamp().ifNotExists().create();
schema.propertyKey("userId").Uuid().ifNotExists().create();
schema.propertyKey("email").Text().ifNotExists().create();
schema.propertyKey("added_date").Timestamp().ifNotExists().create();
schema.propertyKey("videoId").Uuid().ifNotExists().create();
schema.propertyKey("name").Text().ifNotExists().create();
schema.propertyKey("description").Text().ifNotExists().create();
schema.propertyKey("preview_image_location").Text().ifNotExists().create();
schema.propertyKey("rating").Int().ifNotExists().create();

// Create vertex labels
schema.vertexLabel("user").partitionKey('userId').properties("userId", "email", "added_date").ifNotExists().create();
schema.vertexLabel("video").partitionKey('videoId').properties("videoId", "name", "description", "added_date", "preview_image_location").ifNotExists().create();
schema.vertexLabel("tag").partitionKey('name').properties("name", "tagged_date").ifNotExists().create();

// Create edge labels
schema.edgeLabel("rated").multiple().properties("rating").connection("user","video").ifNotExists().create();
schema.edgeLabel("uploaded").single().properties("added_date").connection("user","video").ifNotExists().create();
schema.edgeLabel("taggedWith").single().connection("video","tag").ifNotExists().create();

// Help development and allow scans
schema.config().option('graph.allow_scan').set('true')
schema.config().option('graph.schema_mode').set('Development')
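The edge labels above differ in cardinality: rated is multiple() while uploaded and taggedWith are single(). As we understand it, single() allows at most one edge of that label between the same two vertices, whereas multiple() allows several (a user can rate the same video more than once). The sketch below is only an in-memory Java illustration of that distinction, not the DSE schema API; the class and method names are hypothetical, and in this model a second single() edge simply replaces the first. Check the DSE Graph documentation for how the server actually enforces cardinality.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory illustration of edge-label cardinality; NOT DSE code.
class EdgeCardinalitySketch {

    enum Cardinality { SINGLE, MULTIPLE }

    // Minimal edge: label, endpoints and properties.
    static final class Edge {
        final String label;
        final String outId;
        final String inId;
        final Map<String, Object> props;

        Edge(String label, String outId, String inId, Map<String, Object> props) {
            this.label = label;
            this.outId = outId;
            this.inId = inId;
            this.props = props;
        }

        boolean connects(String label, String outId, String inId) {
            return this.label.equals(label) && this.outId.equals(outId) && this.inId.equals(inId);
        }
    }

    private final Map<String, Cardinality> labels = new HashMap<>();
    private final List<Edge> edges = new ArrayList<>();

    void edgeLabel(String name, Cardinality cardinality) {
        labels.put(name, cardinality);
    }

    // SINGLE: at most one edge of this label between the same two vertices,
    // so a second add replaces the first. MULTIPLE: every add is kept.
    void addEdge(String label, String outId, String inId, Map<String, Object> props) {
        Cardinality c = labels.get(label);
        if (c == null) {
            throw new IllegalArgumentException("Undeclared edge label: " + label);
        }
        if (c == Cardinality.SINGLE) {
            edges.removeIf(e -> e.connects(label, outId, inId));
        }
        edges.add(new Edge(label, outId, inId, props));
    }

    long count(String label, String outId, String inId) {
        return edges.stream().filter(e -> e.connects(label, outId, inId)).count();
    }

    public static void main(String[] args) {
        EdgeCardinalitySketch g = new EdgeCardinalitySketch();
        g.edgeLabel("rated", Cardinality.MULTIPLE);
        g.edgeLabel("uploaded", Cardinality.SINGLE);

        g.addEdge("rated", "user2", "video1", Map.of("rating", 4));
        g.addEdge("rated", "user2", "video1", Map.of("rating", 5));
        g.addEdge("uploaded", "user1", "video1", Map.of());
        g.addEdge("uploaded", "user1", "video1", Map.of());

        System.out.println(g.count("rated", "user2", "video1"));    // 2
        System.out.println(g.count("uploaded", "user1", "video1")); // 1
    }
}
```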

Import Data

Create a new Gremlin code block in the notebook and execute the following script to populate the graph with sample data. It creates two users: user1 uploads a video, Video1, tagged with tag1Video and tag2Video, and user2 rates that video 4.

// Create 2 Users
user1 = graph.addVertex(T.label, 'user', 'userId', 'd0de3100-fc20-4079-bef5-2fb3b8ff51f3', 'email', 'user1@gmail.com', 'added_date', java.time.Instant.now());
user2 = graph.addVertex(T.label, 'user', 'userId', 'b1ff6d5f-80da-4ada-a852-165ce07e90d5', 'email', 'user2@gmail.com', 'added_date', java.time.Instant.now());

// user1 uploads video1 with 2 tags, tag1Video and tag2Video
def insertTimeVideo1 = java.time.Instant.now();
video1 = graph.addVertex(T.label, 'video', 'videoId', '6c30089c-25d2-434d-b685-f1b6073d8e16', 'name', 'Video1', 'added_date', insertTimeVideo1);
tag1video1 = graph.addVertex(T.label, 'tag', 'name', 'tag1Video', 'tagged_date', insertTimeVideo1);
tag2video1 = graph.addVertex(T.label, 'tag', 'name', 'tag2Video', 'tagged_date', insertTimeVideo1);
video1.addEdge('taggedWith', tag1video1);
video1.addEdge('taggedWith', tag2video1);
user1.addEdge('uploaded', video1);

// user2 rates video1
user2.addEdge('rated', video1, 'rating', 4);
'Success'

We use static UUIDs here to ease readability, but you can also generate a random UUID with def myUserId = UUID.randomUUID(); and substitute it wherever needed. Now add a new Gremlin code block and enter the following:

g.V().has('video', 'videoId', '6c30089c-25d2-434d-b685-f1b6073d8e16').bothE();

In the result panel, click the icon that displays the results as a graph, and you should see something like this:

Congratulations, you are now set up. You can play a bit more with the Gremlin language and see how the graph evolves as you make changes. Try the autocompletion mechanism in Studio to get some additional ideas on extending your query.
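To reason about what the bothE() query returns, the sample data can be modeled as a plain edge list. This hypothetical Java sketch (not DSE code) shows that bothE() on Video1 is the union of its outgoing edges (the two taggedWith edges) and its incoming edges (uploaded from user1 and rated from user2), so four edges in total.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical plain-Java model of the sample data (NOT DSE code), used only to
// reason about what outE(), inE() and bothE() return for a given vertex.
class BothESketch {

    static final class Edge {
        final String label;
        final String outV;
        final String inV;

        Edge(String label, String outV, String inV) {
            this.label = label;
            this.outV = outV;
            this.inV = inV;
        }

        @Override
        public String toString() {
            return outV + " -" + label + "-> " + inV;
        }
    }

    // The four edges created by the import script.
    static final List<Edge> EDGES = List.of(
            new Edge("taggedWith", "Video1", "tag1Video"),
            new Edge("taggedWith", "Video1", "tag2Video"),
            new Edge("uploaded", "user1", "Video1"),
            new Edge("rated", "user2", "Video1"));

    private static List<Edge> select(Predicate<Edge> p) {
        return EDGES.stream().filter(p).collect(Collectors.toList());
    }

    // Edges leaving v
    static List<Edge> outE(String v) { return select(e -> e.outV.equals(v)); }

    // Edges arriving at v
    static List<Edge> inE(String v) { return select(e -> e.inV.equals(v)); }

    // Edges in either direction
    static List<Edge> bothE(String v) { return select(e -> e.outV.equals(v) || e.inV.equals(v)); }

    public static void main(String[] args) {
        bothE("Video1").forEach(System.out::println);
    }
}
```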

DataStax Studio is a very convenient environment to learn and browse graph data, but what about implementing a real web application? Enough with the shenanigans, let's get serious.

CRUD Operations in web applications

Configuring your application

First, add the DataStax driver dependency to the pom.xml file of your Java application:

<dependency>
  <groupId>com.datastax.dse</groupId>
  <artifactId>dse-java-driver-graph</artifactId>
  <version>1.6.8</version>
</dependency>

You can now create a JUnit test class that checks you can connect to DSE Graph as expected:

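import com.datastax.driver.dse.DseCluster;
import com.datastax.driver.dse.DseSession;
import com.datastax.driver.dse.graph.GraphNode;
import com.datastax.driver.dse.graph.GraphOptions;
import com.datastax.driver.dse.graph.SimpleGraphStatement;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

public class StandAloneGraphTest {

    DseSession dseSession;

    @Before
    public void createSession() {
        DseCluster.Builder clusterConfig = DseCluster.builder();
        clusterConfig.withPort(9042);
        clusterConfig.addContactPoint("localhost");
        GraphOptions graphOption = new GraphOptions();
        graphOption.setReadTimeoutMillis(100000);
        graphOption.setGraphName("killrvideo_video_recommendations");
        clusterConfig.withGraphOptions(graphOption);
        dseSession = clusterConfig.build().connect();
    }

    @Test
    public void listAvailableGraphs() {
        // system.graphs() is a system query: it is not executed against a specific graph
        dseSession.executeGraph(new SimpleGraphStatement("system.graphs()").setSystemQuery())
                  .all().stream().map(GraphNode::asString)
                  .forEach(System.out::println);
    }

    @After
    public void closeSession() {
        // Even though Cassandra sessions are stateless, the driver keeps sockets open that must be closed
        dseSession.getCluster().close();
    }
}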

This shows the basic pattern for using the Java driver: wrap a Gremlin query in a GraphStatement and execute it with executeGraph(). From there, let's create methods to populate the graph as we did in Studio, initializing the DseSession object as in the previous test.

public UUID createUser(String email) {
    UUID userUuid = UUID.randomUUID();
    dseSession.executeGraph(
        DseGraph.statementFromTraversal(
            DseGraph.traversal(dseSession).addV("user")
                .property("userId", userUuid.toString())
                .property("email", email)
                .property("added_date", new Date())));
    return userUuid;
}
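createUser() generates the ID on the client with UUID.randomUUID() and writes its string form, and the lookups later in this post also query with uuid.toString(). If IDs reach your application as strings (from a URL path, for instance), parsing them through java.util.UUID first rejects malformed values early and yields the canonical lowercase form. canonicalUserId is a hypothetical helper for illustration, not part of the driver.

```java
import java.util.UUID;

// Hypothetical helper (not part of the DSE driver): normalize an externally
// supplied ID string before using it in a graph query.
class UserIdSketch {

    // UUID.fromString throws IllegalArgumentException on malformed input;
    // UUID.toString always produces the lowercase canonical form.
    static String canonicalUserId(String raw) {
        return UUID.fromString(raw.trim()).toString();
    }

    public static void main(String[] args) {
        String stored = UUID.randomUUID().toString();
        System.out.println(canonicalUserId(stored.toUpperCase()).equals(stored)); // true
        System.out.println(canonicalUserId("D0DE3100-FC20-4079-BEF5-2FB3B8FF51F3"));
    }
}
```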

public UUID userUploadVideo(UUID userId, String videoName, String... tagNames) {
    UUID videoUuid = UUID.randomUUID();

    // Batch operations (addV and addE are statically imported from org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__)
    TraversalBatch batch = DseGraph.batch();
    // Create Vertex Video
    batch.add(addV("video").property("videoId", videoUuid)
        .property("name", videoName)
        .property("added_date", new Date()));
    // Create Edge 'uploaded' from User to Video
    batch.add(addE("uploaded")
        .from(DseGraph.traversal(dseSession).V().has("user", "userId", userId.toString()))
        .to(DseGraph.traversal(dseSession).V().has("video", "videoId", videoUuid)));
    // Create Vertices Tag and Edges from Video to Tags
    for (String videoTag : tagNames) {
        batch.add(addV("tag").property("name", videoTag).property("tagged_date", new Date()));
        batch.add(addE("taggedWith")
            .from(DseGraph.traversal(dseSession).V().has("video", "videoId", videoUuid))
            .to(DseGraph.traversal(dseSession).V().has("tag", "name", videoTag)));
    }

    // Execute statements
    dseSession.executeGraph(batch.asGraphStatement());
    return videoUuid;
}

You might notice several points here:

First, we are not writing the full query as a String but using the DSE Graph fluent API, which lets your IDE autocomplete the query.
We issue several statements, creating the video vertex, the tag vertices and the related edges, and group them in a single batch.
It is also possible to group everything as a single Gremlin traversal; here's what that looks like:

GraphTraversal<Vertex, Vertex> traversal = DseGraph.traversal(dseSession)
    // Get the video vertex if it exists, otherwise create it
    .V().has("video", "videoId", videoUuid).fold()
    .coalesce(__.unfold(),
              __.addV("video").property("videoId", videoUuid)
                              .property("name", videoName)
                              .property("added_date", new Date()))
    // Create the 'uploaded' edge from the user unless it already exists
    .sideEffect(__.as("^video").coalesce(
        __.in("uploaded").hasLabel("user").has("userId", userId.toString()),
        __.V().has("user", "userId", userId.toString()).addE("uploaded").to("^video").inV()));
// Get or create each tag and link it to the video
for (String videoTag : tagNames) {
    traversal.sideEffect(__.as("^video").coalesce(
        __.out("taggedWith").hasLabel("tag").has("name", videoTag),
        __.coalesce(__.V().has("tag", "name", videoTag),
                    __.addV("tag").property("name", videoTag))
          .addE("taggedWith").from("^video").inV()));
}
// The traversal is only built above; execute it as a single statement
dseSession.executeGraph(DseGraph.statementFromTraversal(traversal));
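The fold().coalesce(unfold(), addV(...)) pattern reads as "return the vertex if it exists, otherwise create it". As a rough in-memory analogue (plain Java, not DSE code, hypothetical names), it behaves like Map.computeIfAbsent keyed by the vertex's partition key. This only illustrates the logic of the idiom; it says nothing about how DSE handles concurrent traversals performing the same get-or-create.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory analogue (NOT DSE code) of the "get or create" idiom
// fold().coalesce(unfold(), addV(...)) used in the traversal above.
class GetOrCreateSketch {

    // Tag "vertices" keyed by their partition key, the tag name.
    private final Map<String, Map<String, Object>> tagsByName = new HashMap<>();
    // How many vertices were actually created (for illustration only).
    int creations = 0;

    // Mirrors coalesce(V().has("tag", "name", n), addV("tag").property("name", n)):
    // return the existing vertex if there is one, otherwise create it.
    Map<String, Object> getOrCreateTag(String name) {
        return tagsByName.computeIfAbsent(name, n -> {
            creations++;
            Map<String, Object> vertex = new HashMap<>();
            vertex.put("name", n);
            return vertex;
        });
    }

    public static void main(String[] args) {
        GetOrCreateSketch g = new GetOrCreateSketch();
        g.getOrCreateTag("tag1Video");
        g.getOrCreateTag("tag1Video"); // found, not created again
        g.getOrCreateTag("tag2Video");
        System.out.println(g.creations); // 2
    }
}
```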

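Run the query g.V() in Studio again to see the updated graph.

Note that all creation operations are upserts: writing a vertex whose partition key already exists updates that vertex rather than creating a duplicate.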
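By contrast with the coalesce() version, the batch version of userUploadVideo() calls addV directly, so it relies on this upsert behavior. The sketch below models it with a map keyed by videoId. It assumes Cassandra-style upserts where only the properties being written change and the others are left in place; that assumption, and all names, are ours rather than a description of DSE internals.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model (NOT DSE code) of "creation is an upsert": vertices are
// stored by partition key, so writing an existing key updates its properties
// instead of adding a second vertex.
class UpsertSketch {

    // partition key (videoId) -> vertex properties
    final Map<String, Map<String, Object>> videos = new HashMap<>();

    // Assumption: only the written properties change, modeled on Cassandra's
    // column-level upserts; other properties keep their previous values.
    void addVideo(String videoId, Map<String, Object> props) {
        videos.computeIfAbsent(videoId, k -> new HashMap<>()).putAll(props);
    }

    public static void main(String[] args) {
        UpsertSketch g = new UpsertSketch();
        String id = "6c30089c-25d2-434d-b685-f1b6073d8e16";
        g.addVideo(id, Map.of("name", "Video1", "description", "first upload"));
        g.addVideo(id, Map.of("name", "Video1 (renamed)"));
        System.out.println(g.videos.size()); // 1
        System.out.println(g.videos.get(id)); // name updated, description kept
    }
}
```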

Next, we use the driver to implement a few more operations on the User plain old Java object (POJO):

public class User {
    private UUID uuid;
    private String email;
    private Date addedDate;
    // Constructors...
    // Getters and setters...
}

To find a user by ID (if it exists):

public Optional<User> findUserById(UUID uuid) {
    GraphResultSet gras = dseSession.executeGraph(
        DseGraph.statementFromTraversal(
            DseGraph.traversal(dseSession).V().hasLabel("user").has("userId", uuid.toString())));
    if (!gras.isExhausted()) {
        GraphNode record = gras.one();
        com.datastax.driver.dse.graph.Vertex userVertex = record.asVertex();
        String userEmail = userVertex.getProperty("email").getValue().asString();
        Date userDate = Date.from(userVertex.getProperty("added_date").getValue().as(Instant.class));
        return Optional.of(new User(uuid, userEmail, userDate));
    }
    return Optional.empty();
}
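findUserById() reads each property from the returned vertex and converts added_date from Instant to java.util.Date. Here is a slightly more defensive variant of that mapping step, as a sketch: a plain Map<String, Object> stands in for the driver's Vertex (an assumption made so the example runs without DSE), and a missing or mistyped property yields Optional.empty() instead of an exception.

```java
import java.time.Instant;
import java.util.Date;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Sketch of the mapping step in findUserById(). A plain Map stands in for the
// driver's Vertex (assumption): property name -> already-decoded value.
class UserMappingSketch {

    static final class User {
        final UUID uuid;
        final String email;
        final Date addedDate;

        User(UUID uuid, String email, Date addedDate) {
            this.uuid = uuid;
            this.email = email;
            this.addedDate = addedDate;
        }
    }

    // Returns Optional.empty() when a required property is missing or has an
    // unexpected type, instead of throwing.
    static Optional<User> toUser(UUID uuid, Map<String, Object> props) {
        Object email = props.get("email");
        Object addedDate = props.get("added_date");
        if (!(email instanceof String) || !(addedDate instanceof Instant)) {
            return Optional.empty();
        }
        // Same Instant -> java.util.Date conversion as in findUserById()
        return Optional.of(new User(uuid, (String) email, Date.from((Instant) addedDate)));
    }

    public static void main(String[] args) {
        UUID id = UUID.fromString("d0de3100-fc20-4079-bef5-2fb3b8ff51f3");
        Map<String, Object> props = Map.of(
                "email", "user1@gmail.com",
                "added_date", Instant.parse("2018-08-28T10:15:30Z"));
        toUser(id, props).ifPresent(u -> System.out.println(u.email + " " + u.addedDate.toInstant()));
        System.out.println(toUser(id, Map.of("email", "user1@gmail.com")).isPresent()); // false
    }
}
```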

To delete a user by ID (if it exists): note that dropping a vertex also drops every edge connecting it to other vertices. More information is available in the documentation.

public boolean deleteUserById(UUID uuid) {
    if (findUserById(uuid).isPresent()) {
        dseSession.executeGraph(
            DseGraph.statementFromTraversal(
                DseGraph.traversal(dseSession).V().hasLabel("user").has("userId", uuid.toString()).drop()));
        return true;
    }
    return false;
}
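To make the edge-dropping behavior concrete, here is a small in-memory illustration (plain Java, hypothetical names, not the DSE implementation): removing a vertex also removes every edge where it is the out-vertex or the in-vertex, and dropping a vertex that does not exist returns false, as deleteUserById() does.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// In-memory illustration (NOT DSE code) of drop(): removing a vertex also
// removes its incident edges, so no dangling edges remain.
class DropVertexSketch {

    static final class Edge {
        final String label;
        final String outV;
        final String inV;

        Edge(String label, String outV, String inV) {
            this.label = label;
            this.outV = outV;
            this.inV = inV;
        }
    }

    final Set<String> vertices = new HashSet<>();
    final List<Edge> edges = new ArrayList<>();

    void addEdge(String label, String outV, String inV) {
        edges.add(new Edge(label, outV, inV));
    }

    // Returns false when the vertex does not exist, like deleteUserById().
    boolean dropVertex(String id) {
        if (!vertices.remove(id)) {
            return false;
        }
        // Remove every edge that starts or ends at the dropped vertex.
        edges.removeIf(e -> e.outV.equals(id) || e.inV.equals(id));
        return true;
    }

    public static void main(String[] args) {
        DropVertexSketch g = new DropVertexSketch();
        g.vertices.addAll(List.of("user1", "user2", "Video1"));
        g.addEdge("uploaded", "user1", "Video1");
        g.addEdge("rated", "user2", "Video1");
        System.out.println(g.dropVertex("user2")); // true
        System.out.println(g.edges.size());        // 1
        System.out.println(g.dropVertex("user2")); // false
    }
}
```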

To list the names of the videos uploaded by a given user:

public Set<String> findlistOfVideoUploadedByUser(UUID uuid) {
    GraphResultSet res = dseSession.executeGraph(
        DseGraph.statementFromTraversal(
            DseGraph.traversal(dseSession)
                .V().hasLabel("user").has("userId", uuid.toString())
                .out("uploaded").values("name")));
    if (!res.isExhausted()) {
        // all() fetches every result at once: make sure you don't need pagination here
        return res.all().stream().map(GraphNode::asString).collect(Collectors.toSet());
    }
    return new HashSet<>();
}

As with Cassandra, selecting all records is a bad practice because it requires a full table scan, which is not what we want with large graphs and real-time queries.
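Unlike g.V(), the query in findlistOfVideoUploadedByUser() starts from a single user vertex identified by its partition key and then follows its uploaded edges. The sketch below mirrors that shape in memory (not DSE code; the maxResults cap is a hypothetical addition). In Gremlin itself, a limit(n) step on the traversal would be the natural way to bound the result before calling all().

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.stream.Collectors;

// In-memory sketch (NOT DSE code) of
// V().has("user", "userId", id).out("uploaded").values("name"):
// start from ONE user instead of scanning every vertex, and cap the result
// size with a hypothetical maxResults parameter.
class UploadedVideosSketch {

    // userId -> ids of videos reached through outgoing "uploaded" edges
    final Map<String, List<String>> uploaded = new HashMap<>();
    // videoId -> value of the video's "name" property
    final Map<String, String> videoName = new HashMap<>();

    Set<String> videoNamesUploadedBy(String userId, int maxResults) {
        return uploaded.getOrDefault(userId, List.of()).stream() // out("uploaded")
                .map(videoName::get)                              // values("name")
                .filter(Objects::nonNull)
                .limit(maxResults)                                // bound the result size
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        UploadedVideosSketch g = new UploadedVideosSketch();
        g.uploaded.put("user1", List.of("v1", "v2"));
        g.videoName.put("v1", "Video1");
        g.videoName.put("v2", "Video2");
        System.out.println(g.videoNamesUploadedBy("user1", 10)); // [Video1, Video2]
        System.out.println(g.videoNamesUploadedBy("user2", 10)); // []
    }
}
```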

Conclusion and Takeaways

Getting started with DSE Graph is super easy using Docker. DataStax provides not only the database runtime but also DataStax Studio, a powerful notebook-based user interface for working with data in Gremlin and CQL.

Working with DSE Graph in Java applications is easy: import the driver and execute Gremlin statements exactly the same way as Cassandra queries. The driver also provides a fluent API that lets you build complex queries with the help of IDE autocompletion.

That's it for Part 1. In Part 2 we will build on the work done here to visualize the very same graph in your own web application's user interface. You can download the source code presented here from GitHub.
