StepZen CEO Anant Jhingran and DataStax Chief Product Officer Ed Anuff were colleagues at Apigee and Google. Here, they share their thoughts on how GraphQL and graph databases are connected.
Facebook’s social graph
Perhaps the best-known example of graph data is Facebook, which had everyone from teenagers to octogenarians spending a significant part of their lives dealing with the data representations of graph relationships—all without any previous exposure to graph theory. Since its humble beginning in 2004, the social network’s social graph has brought graph data into the mainstream and made a whole generation of data scientists very hot commodities.
Graph databases work well for Facebook in part because they bridge the gap between the way people and computers view the world. Computers rely on rows and columns of data, while people navigate and reason about life through relationships.
Here’s an example: Alice has three children, Bob, Cindy, and Dorothy; she’s married to Ethan; and she lives in San Jose, Calif. Graph databases facilitate the efficient storage and retrieval of this kind of data, along with its relationships.
GraphQL: Move fast, break less
There are a host of languages that allow for an expressive way of traversing the graph (fundamentally, extracting a subgraph from the larger graph): SPARQL, SQL-SPARQL, and Cypher, to name a few.
Then there’s GraphQL, which is an answer to the question: In the age of “move fast and break things,” how do you help an app developer move faster while breaking fewer things? Initially, GraphQL was developed as a query language and runtime aimed at minimizing network hops to clients, but the fact that it collects related information and returns it in one query is a welcome help to those building apps. With GraphQL, related information can sometimes be connected through explicit relationships (as it is in a graph database), but it can also be connected implicitly.
So, in a retail example, a GraphQL query can return a customer’s last order placed and the closest return location to her, without there being any explicit “closest return store” relationship between a customer and stores where she could return a purchase. Such an explicit relationship would be impossible to maintain, and besides, Google Maps can give that information, so why would one want to store that kind of data in a database?
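The retail example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample customer, orders, stores, and the function names (last_order, closest_return_store, customer_view) are all hypothetical, and a straight-line distance on raw coordinates stands in for a real maps lookup. The point is that the closest store is computed at query time and never stored as a relationship.

```python
import math
from datetime import date

# Hypothetical sample data. Note that nothing links a customer to a store.
CUSTOMERS = {"c1": {"name": "Dana", "home": (37.33, -121.89)}}
ORDERS = [
    {"id": "o1", "customer": "c1", "placed": date(2023, 1, 5)},
    {"id": "o2", "customer": "c1", "placed": date(2023, 2, 9)},
]
STORES = [
    {"name": "Downtown", "location": (37.34, -121.89)},
    {"name": "Airport", "location": (37.36, -121.93)},
]


def last_order(customer_id):
    """Return the customer's most recent order, or None if there is none."""
    orders = [o for o in ORDERS if o["customer"] == customer_id]
    return max(orders, key=lambda o: o["placed"]) if orders else None


def closest_return_store(customer_id):
    """Stand-in for a maps lookup: straight-line distance on raw coordinates."""
    home = CUSTOMERS[customer_id]["home"]
    return min(STORES, key=lambda s: math.dist(home, s["location"]))


def customer_view(customer_id):
    """Assemble the related fields that one query would return together."""
    order = last_order(customer_id)
    return {
        "name": CUSTOMERS[customer_id]["name"],
        "lastOrder": order["id"] if order else None,
        "closestReturnStore": closest_return_store(customer_id)["name"],
    }


result = customer_view("c1")
print(result)
```

In a real deployment the distance function would be replaced by a call to a mapping service, and the two lookups would be wired up as field resolvers in a GraphQL server.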
So how do we describe the connection between GraphQL and graph databases?
GraphQL is a natural query language for graph databases where relationships are explicitly created (albeit lacking some of the semantic reasoning that relationship-based data requires). When relationships are implicit and not in a central place, GraphQL is an awesome language because …
In a graph database, the graph is in the data. To go from Alice to her children, the graph database needs to follow all of Alice's links that are of type “has child.” Three such links lead to Bob, Cindy, and Dorothy, who are Alice’s three children.
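A minimal Python sketch of "the graph is in the data," using the Alice example from earlier. The edge list, the index, and the helper names (build_index, follow) are assumptions made for illustration; a real graph database has its own storage format and query language. What matters is that every link is itself a stored record, and a traversal follows links of a given type.

```python
from collections import defaultdict

# Each link is stored as data: (source, link type, target).
EDGES = [
    ("Alice", "has child", "Bob"),
    ("Alice", "has child", "Cindy"),
    ("Alice", "has child", "Dorothy"),
    ("Alice", "married to", "Ethan"),
    ("Alice", "lives in", "San Jose, Calif."),
]


def build_index(edges):
    """Index edges by (source, link type) so a traversal is a lookup."""
    index = defaultdict(list)
    for source, link_type, target in edges:
        index[(source, link_type)].append(target)
    return index


def follow(index, source, link_type):
    """Return every node reachable from `source` over links of `link_type`."""
    return list(index.get((source, link_type), []))


INDEX = build_index(EDGES)
children = follow(INDEX, "Alice", "has child")
print(children)
```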
In GraphQL, the graph is in the metadata. Individual data elements do not carry explicit links; only the metadata does. The system is told: “to go from a person to their children, issue a query of the form: ‘select name from children where parent = $’.” Or, in the retail example, to find the closest returns store, issue a query to Google Maps using the person’s home address and pick the closest result. The structure is in the schema/metadata.
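By contrast, here is a rough sketch of "the graph is in the metadata," using Python's built-in sqlite3 module. The SCHEMA dictionary, the table layout, and the resolve helper are hypothetical and are not how any particular GraphQL server is implemented. The link from a person to their children is described once, as a query template, and the rows themselves carry no links. The ORDER BY clause is added only to make the output deterministic.

```python
import sqlite3

# Hypothetical schema metadata: one entry describes how to get from a
# Person to their children. No individual row stores a link.
SCHEMA = {
    ("Person", "children"):
        "SELECT name FROM children WHERE parent = ? ORDER BY name",
}


def make_db():
    """Create an in-memory table of plain rows for the Alice example."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE children (name TEXT, parent TEXT)")
    conn.executemany(
        "INSERT INTO children (name, parent) VALUES (?, ?)",
        [("Bob", "Alice"), ("Cindy", "Alice"), ("Dorothy", "Alice")],
    )
    return conn


def resolve(conn, type_name, field, parent_value):
    """Look up the query for (type, field) in the metadata and run it."""
    sql = SCHEMA[(type_name, field)]
    return [row[0] for row in conn.execute(sql, (parent_value,))]


conn = make_db()
alice_children = resolve(conn, "Person", "children", "Alice")
print(alice_children)
```

Adding a new relationship here means adding one entry to the metadata rather than writing a link for every pair of records.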
So should you invest in GraphQL? Definitely. It’s the right way to access related information, and access it efficiently. As with any technology, it comes with some caveats: for example, you need to be more careful about access control, and you shouldn’t replicate your business logic in this layer.
Should you invest in a graph database? Quite likely. If the structure of links and connections cannot be expressed just in metadata, then a graph database is the only way to go. It also comes with some caveats: if it’s a copy of data that sits somewhere else, then it needs to be cared for and fed, otherwise the data can become quite out of sync.
For more on these topics, check out Anant and Ed’s recent webcast replay. Learn more about StepZen, a leading provider of GraphQL, and DataStax, which offers a distributed graph database built on Cassandra and optimized for enterprise applications.
This content originally appeared on The New Stack.