
Discover the Relationships Between Your Data

Optimized for storing billions of items and their relationships, DSE Graph incorporates all the enterprise-class capabilities of DataStax Enterprise, including continuous availability, linear scalability, advanced security, analytics and full-text search, visual management, visual monitoring and a tools suite for developers.

In short, DSE Graph enables you to identify and analyze hidden relationships between connected data and build powerful modern applications.


Graph Powered Insights

Fully integrated real-time search and analytics that allow you to derive powerful insights from graph data.


Graph Your Way

A true multi-model platform: the real-time Gremlin query language or CQL, together with offline Apache Spark, let you create graph applications easily.


Graph Available Always

Proven distributed graph platform provides continuous availability and unmatched scalability spanning multiple data centers.

“MissionGraph™ is a powerful and innovative open-architecture platform for making sense of messy data landscapes and finding unknown connections among an enterprise’s entities. Key to our architecture are DataStax Enterprise and DataStax Enterprise Graph which enable us to fuse two of the most powerful technologies for handling the disparate forms of information we analyze. With them, we deliver contextual, distributed, always-on data management and cutting-edge analysis at scale for our clients’ most critical business questions.”

Adam Judelson

Vice President of Product, Marketing and Sales, Deloitte

Graph Developers

DataStax has many offerings for the graph developer community.

We have programs, tools, and initiatives you can explore including our online education platform DataStax Academy, our work with the Apache Software Foundation and the Apache TinkerPop™ project, graph developer tools like DataStax Studio, and DataStax Developer community initiatives you can subscribe to and check out: bootcamps, meetups, podcasts, and more.

The Gremlin Graph Traversal Machine and Language

TinkerPop is an open-source graph computing framework for graph databases and graph analytic systems. It is centered around the Gremlin graph query language, which provides users the ability to express complex graph traversals over their property graphs for both real-time (OLTP) and analytic (OLAP) workloads. Gremlin has support for a variety of programming languages, including Java, Groovy, .NET, JavaScript, and Python, giving users the comfort of working with graphs in their native programming language.


The Practitioner’s Guide to Graph Data

Authors Denise Koessler Gosnell and Matthias Broecheler show data engineers, data scientists, and data analysts how to solve complex problems with graph databases. You’ll learn to:

  • Build an example application architecture with relational and graph technologies
  • Use graph technology to build a Customer 360 application, the most popular graph data pattern today
  • Dive into hierarchical data and troubleshoot a new paradigm that comes from working with graph data
  • Find paths in graph data and learn why your trust in different paths motivates and informs your preferences
  • Use collaborative filtering to design a Netflix-inspired recommendation system


Technical Guide
DataStax Accelerate Guide for Graph Data Designers and Developers

In our increasingly data-driven world, organizations have more data to manage than ever before. More and more companies are moving to graph databases in order to make sense of the many-to-many relationships of their data. Graph technology enables you to make better and more efficient decisions in real time from the connectedness of your data to create and deliver more intelligent, richer experiences through your modern applications. As the world’s premier Apache Cassandra™ conference, DataStax Accelerate is chock-full of engaging talks and sessions graph database enthusiasts like yourself won’t want to miss.

Three Common Mistakes When Building Graph Applications

At DataStax, we help customers build some of the largest production applications on graph databases in the world. From these experiences, we’ve collected a set of three common pitfalls where teams frequently misstep when getting started with graph technology.

These themes happen to parallel one of my favorite video games of all time: SimCity 2000. SimCity 2000 is a game created in the early 1990s that requires a significant amount of trial and error to be successful. As a new player of SimCity 2000, you are naive to the effects of your decisions, and you only learn the consequences through multiple iterations of game play.

This blog will draw parallels. Following are the three most common mistakes my graph team sees with building graph applications, plus advice on how to avoid them so that you can save time, skip iterations, and get a head start on building amazing applications using graph data.

1. Not understanding branching factor

To effectively build graph applications, you need to understand what branching factor is and how it affects query runtime. The introduction of graph data into your application brings a new paradigm of data modeling known as “relationship-first design” (as opposed to “entity-first design”). The transition to relationship-first design principles introduces a new set of rules to consider when thinking about your application’s performance.

If, like me, you are a fan of SimCity 2000, then this should look familiar:

[Image: SimCity 2000 screenshot]

When you start a new game in SimCity 2000, you are introduced to the different game modes and tools available. Just like graph data modeling, the first step to being successful is to examine the options and determine what are the biggest and most important dials you can tweak. Even though there are many tricks to graph data modeling, your graph’s branching factor is the most commonly overlooked dial of your graph’s schema.
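
To see why this dial matters so much, here is a minimal Python sketch. It assumes the simplest possible case: every vertex fans out to the same number of edges (the branching factor) and no vertex is visited twice. The function names are made up for this illustration and are not part of Gremlin or DSE Graph. Real graphs have uneven fan-out, so treat the numbers as a rough guide only.

```python
from typing import List


def traversers_per_level(branching_factor: int, depth: int) -> List[int]:
    """Traversers alive at each level of a walk that starts at one vertex.

    Assumption: every vertex has exactly `branching_factor` outgoing edges,
    so each traverser splits into that many traversers at every hop.
    """
    return [branching_factor ** level for level in range(depth + 1)]


def total_traversers(branching_factor: int, depth: int) -> int:
    """Total traversers created across all levels, a rough proxy for query cost."""
    return sum(traversers_per_level(branching_factor, depth))


# Compare a few hypothetical branching factors over a four-hop walk.
for factor in (2, 10, 50):
    print(factor, traversers_per_level(factor, 4), total_traversers(factor, 4))
```

With a branching factor of 2, four hops produce 1, 2, 4, 8, and 16 traversers per level. With a branching factor of 10, the same four hops end with 10,000 traversers at the last level, which is why a small change in the data model can have a large effect on query runtime.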

A graph’s branching factor is the expected or average number of edges that are traversed when you walk from one vertex to another. Unsure what I mean by this? Consider the animation below.

[Animation: walking a graph from the green start vertex on the left to the red destination vertex on the right]

In this animation, the goal is to walk the graph until you find your destination. We start at the left-most vertex (the one in green) and walk until we reach the destination vertex on the far right (the one in red). For every vertex you walk (traverse) through from left to right, you have to explore two more edges to get to the next level. Moving from one vertex to two edges creates two new paths to explore. This causes the number of traversers to double between each level. Each vertex’s branching factor causes the split from one to two traversers during your query.

A graph’s branching factor creates exponential growth in the number of traversers required to walk from one vertex to another. The growth in the number of traversers directly correlates to the computational overhead required to process a graph traversal.

Frequently, when we troubleshoot slow-running queries, the root cause is a graph data model that creates a higher-than-anticipated branching factor. A high branching factor is one of the main contributing factors to a poorly performing Gremlin query.

2. Not planning for or monitoring the growth of supernodes

Relationship-first data modeling can create a sleeping time bomb in your graph data: supernodes. A supernode is a vertex that has a disproportionately high number of incident edges. Just like in SimCity 2000, high volumes of progress without proper planning will eventually introduce a catastrophe.

For the advanced SimCity 2000 player, these catastrophes show up in your game as disasters and monsters. These are unplanned events that decimate your metropolis and bring your city’s advancement to a grinding halt. In graph data, the time bombs appear as you are traversing your graph, and a bad data model brings your traversal to a grinding halt.

These are your supernodes. A supernode is any vertex in your graph that has approximately 100,000 or more edges. You will need to track, mitigate, and eliminate the potential for supernodes within your applications.

To find the most likely supernodes in your graph database, you should use your analytics engine to look for the top 10 vertices with the most connections in your graph. You can do this with DataStax via:

[query omitted]

The results of that query will let you know if you currently have a supernode in your graph. You should monitor the results of that job periodically to detect and mitigate any potential supernode problems as your graph grows over the lifetime of your application.

Monitoring your graph’s health is a great way to reactively handle supernodes in your graph, but the best way to handle them is to be proactive and model your data in a way that mitigates any chance of the supernodes forming.

3. Not fitting the technology to the problem

Just as when you are learning the tools and rules for building a successful city in SimCity 2000, you will inevitably make some mistakes. Consider this example below:

[Image: SimCity 2000 screenshot of broken roads and disconnected bridges]

Naively, I was trying to connect the city in the upper right to the open land in the lower right and lower left. If you brute-force your infrastructure at this point in the game, you don’t end up building what you intended. Instead of connected land, I built broken roads and disconnected bridges. From this I learned, a bit too late, that the ground leveling tool is the place to start before hammering roads over my map.

When playing SimCity 2000, it is really easy to spot your errors: they look like broken roads or bridges to nowhere. When starting to integrate graph technology into your stack, it is also inevitable that you will make mistakes. But unlike in SimCity 2000, there are no broken roads or bridges to easily locate these mistakes.
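
Returning to the supernode check from mistake 2: the idea of ranking vertices by their number of connections can be sketched in plain Python as below. This is only an illustration on an in-memory edge list, not the DataStax analytics query. The function names and the (source, target) tuple format are assumptions, and the 100,000 threshold is the approximate figure given in the text.

```python
from collections import Counter

# Approximate supernode threshold quoted in the text.
SUPERNODE_THRESHOLD = 100_000


def degree_counts(edges):
    """Count incident edges per vertex from an iterable of (source, target) pairs.

    A self-loop counts twice for its vertex, once for each endpoint.
    """
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree


def top_vertices_by_degree(edges, k=10):
    """Return the k most connected vertices as (vertex, degree) pairs."""
    return degree_counts(edges).most_common(k)


def supernodes(edges, threshold=SUPERNODE_THRESHOLD):
    """Return the vertices whose degree meets or exceeds the threshold, sorted by name."""
    return sorted(v for v, d in degree_counts(edges).items() if d >= threshold)
```

Running a check like this on a schedule, and watching how the top entries grow over time, matches the periodic monitoring recommended above. On a real cluster the counting would run on the analytics engine instead of in a single Python process.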

I’ve found that people often try to use graph to solve non-graph problems, and typically when my team dives in to troubleshoot a customer’s graph cluster, these are the three most common red herrings that we encounter:

  • Indexes on every property on every vertex
  • Comparing properties to match vertices
  • Putting a graph database behind a BI dashboard

Having search indexes on every property of a vertex is an indication that the end use of the application is more about searching data than leveraging the relationships within the data. This type of usage pattern lends itself to using a dedicated search technology, such as DSE Search, which is better optimized to handle these types of questions.

The first step in saving your data in a graph is figuring out what determines a unique person, place, or thing within your data. Commonly, teams do not determine this before they load their raw data. Then, in their Gremlin query, they are left with trying to decide which vertices represent the same unique item. This is detrimental to your queries due to the additional computational overhead required and the resulting branching factor explosion that it creates. Instead of determining this uniqueness each time you run a query, we recommend using DSE Analytics to match and merge your data before you load it into DSE Graph.

Lastly, we know that insights and metrics drive the business decisions that you make on a daily basis. If the end goal of your application is to create a BI dashboard for tracking these metrics, there are a myriad of tools and technologies out there that are well-suited for business intelligence or business analytics. It isn’t very often that a graph database is the right tool for serving up global insights into your company’s analytics. What I mean is: the data from a graph might feed into that insight, but a graph is rarely the best way to serve out the business dashboard. If you have one, please let me know. I am still looking for a good example in this area.
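
As a rough illustration of the match-and-merge step recommended for the second red herring above, the following Python sketch collapses raw records into one entity per unique key before anything is loaded into a graph. It is not DSE Analytics code. The `email` uniqueness rule, the field names, and the function names are all hypothetical, and real entity resolution usually needs more careful matching.

```python
def entity_key(record):
    """Hypothetical uniqueness rule: a normalized email identifies one person."""
    return record["email"].strip().lower()


def match_and_merge(records):
    """Merge raw records that share an entity key into a single entity each.

    For every field, the first non-empty value seen is kept.
    Returns a dict mapping entity key to the merged record.
    """
    merged = {}
    for record in records:
        key = entity_key(record)
        # Create the entity on first sight, storing the normalized key as its email.
        entity = merged.setdefault(key, {"email": key})
        for field, value in record.items():
            if value and not entity.get(field):
                entity[field] = value
    return merged
```

Each merged entity then becomes exactly one vertex at load time, so queries never have to compare properties to decide whether two vertices are the same person, place, or thing.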

Where to go for more graph database insights

The seasoned graph experts at DataStax help our customers create graph models that mitigate this risk. If you are looking for more on supernodes, we recommend this great talk by Jonathan Lacefield, our Sr. Director of Product Management.

DataStax continues to lead the charge with the most innovative enterprises in building production applications backed by distributed technologies. To accelerate innovation, DataStax formed their Graph Practice, a team of experts focused on growing, advocating, and enabling our customers and the graph industry. As the practice grows, keep watching for our posts, videos, and content on the lessons we learn from the frontlines.

We want to hear from and collaborate with you on your graph problem and interesting uses of graph technology. Reach out to us through this blog, check out our contributions to DataStax Academy, or come find us when we are at an event near you.
