Gremlin’s Time Machine
date: September 25, 2016
In the TinkerPop, time and space coexist within the graph. The graph's vertices, edges, and properties are the atomic elements of its structure. Without these elements, there are no "things" and without "things" there is no space as there is no way to measure distance. The graph defines space because a vertex may be closer or farther from another vertex depending on the number of edges that must be traversed to reach it. The longer the path separating two vertices, the greater the distance between them and thus, the larger the space. If the graph creates space, does it also create time?
The ticking of time within the TinkerPop is experienced by humans as a continuous progression of structural modifications to the graph. For instance, at some point in time, a person may be a friend to another person. However, after a heart wrenching fallout, that friendship (edge) may wither and die. What was, is no longer and what wasn't, now is. To humans, vertices (people, places, things) and their various relationships to one another (edges) come and go from their conscious experience and that procession is demarcated by "time." Humans witness the graph in which they live as a dynamic structure, forever mutating itself according to the logic of their collectively conceived physics and their subjectively constructed psyches. However, their perception of a world changing through time is simply an epiphenomena of their graph traversal. Human consciousness traverses a time (and space) constrained subgraph within the larger, universal graph of the TinkerPop. Outside the boundaries of the human's subgraph exists a time-full, changeless world, where time and space are structural features/motifs of the universal graph. It is simply the human's self-imposed constraints that creates, for them, the very real illusion of time.
The Nature of Time in the TinkerPop
Let us begin our journey by creating a
TraversalSource to an empty
TinkerGraph. Note that all of the presented code snippets build on each other. The more precocious reader may want to follow along by cutting and pasting each snippet into the Gremlin Console.
Time and Edges
There are two people named Marko and Stephen. Marko was born in 1979 and Stephen in 1975. In 2010, they met each other in the TinkerPop and quickly developed a mutual respect for each others contributions to the science and engineering of the TinkerPop. This friendship continues to this day.
- A Marko vertex is created with a
nameproperty has a
startTimeproperty as well.
- A Stephen vertex is created with similar properties to Marko.
- Marko projects a
knows(friendship) edge to Stephen with a
startTimeproperty on it.
|NOTE: All vertex properties that are not time-based will have time-based meta-properties. For instance, the
There is another person in our world named Bob. Yes, Bob. Not Balthazar, not Ragnar nor Melchizedek. Just Bob. Bob was born in 1983. Marko befriended Bob in 2012 when he saw his work on Wolf Spider (a TinkerPop visualization tool). This work ultimately led Bob on a path to developing DSE Studio.
After a few years of collaboration, Marko came to realize that Bob was rotted through to his core -- a truly evil person, shameless in his dissolutely deviant degenerate depravity; an impure man unworthy of breathing the life giving air afforded to herculeanly high-minded honorable hombres. In 2015, against Bob's pleads for a "second chance" at moral integrity, Marko announced the verdict of his deliberation to all the town's people: De-friend him!
- A Bob vertex is created with a
nameproperty also has a
- Marko has an outgoing
knowsedge to Bob with both a
endTimeproperty on it.
When Marko reflects on the friendships that he has made (in "social" space) over his life (in time), his mind traverses the
knows-edges incident to his vertex-self.
When Marko wants to think only about his current friends, he constrains his thoughts to only those
knows-edges in his current point in time. If the year is 2016, then the edges must have a
startTime less than or equal to 2016 and an
endTime greater than 2016 (or no
endTime at all -- i.e. the end has not been determined). In the 2016-based example below, Marko only experiences his friendship with Stephen, not Bob as Marko and Bob de-friended in 2015. In this way, this sense of friendship is bound to the "the moment." When such constraints are applied to conscious thought, the notion of a dynamic graph changing with time emerges. While "time" exists as a static structure endogenous to the graph, it also exists as a dynamic structure exogenous to the human.
|The human's constrained traversal(s) cuts him off from the universe as a whole and yields the very real illusions of space, time, self, other, etc. In an unconstrained traversal, there is no space as every vertex is adjacent to every other vertex. Likewise, there is no time as time properties do not impose a "street sign"-like influence, dictating what is and is not possible to experience.|
There must be an easier way to express this traversal. Alas, there is! Welcome
SubgraphStrategy. Before a traversal is executed against a graph, any number of traversal strategies can be applied to that traversal. Typically, traversal strategies optimize the traversal by rewriting it to a more efficient, semantically equivalent form. However, other types of traversal strategies exist such as
SubgraphStrategy which is a
DecorationStrategy. Decoration strategies, unlike optimization strategies, augment the traversal with application-specific logic which yields a traversal that need not be semantically equivalent to the original form. When traversing a time-encoded graph,
SubgraphStrategy allows the user to write their Gremlin traversals without having to explicitly add time constraints throughout. Instead,
SubgraphStrategy will insert all such constraints at compile time. For more information on Gremlin's compiler, please see The Benefits of the Gremlin Graph Traversal Machine.
- Create a
SubgraphStrategywhere all traversed edges must satisfy the specified constraint.
- Add the strategy to the
TraversalSourceand submit a time-agnostic traversal.
To make things a bit easier to use, let's create three helper functions.
timeWindow: A bi-function that yields a
SubgraphStrategythat filters out edges within a provided start time and end time.
timePoint: A function that yields a
SubgraphStrategywith a time window at a specified year.
currentPoint: A supplier that yields a
SubgraphStrategyfor the current point in time.
With these helpers in place, it's easy to create a traversal that is bound to a time in the graph.
- Marko was friends with both Stephen and Bob from 2012 to 2014.
- Marko was only friends with Stephen in 2015.
- Marko is currently only friends with Stephen (as of 2016).
While the graph's structure hasn't changed, the results of the "marko's friends"-traversal has. The traversal's results are determined by both the "space" (social) and "time" structures therein. It is worth emphasizing again that the graph is static. It is the time-relative traversal that yields the dynamic, evolving nature of the world experienced by humans. This is why time is a very real illusion.
|Consciousness is the Traverser Guided by a Traversal through the Graph|
The human is a traversal machine. Contrary to modern cognitive theory, the brain does not store episodic, life-history information (data). It doesn't need to, that information is already encoded in the graph. Instead the brain is composed of traversals (queries/algorithms) for accessing (experiencing) that information within the the universal graph of the TinkerPop. Classic human traversals include:
The human, like all "things," is encoded in the graph. The human's cortex is composed of yet more vertices and edges. Cortical subgraphs represent traversals as tree-like structures supporting function chaining (vertex adjacency) and function nesting (subgraph groupings). Consciousness (the traverser) exists at the union of the cortical subgraph being intersected with other subgraphs of the universal graph. These cortical subgraphs can be rewritten via psychotherapy (i.e. introspection), pharmacology (i.e entheogens), neurosurgery (i.e. prosthetics), etc. Such compile-time mutations to the traversal are
A Note on Traversal Sources and their Traversals
In Gremlin, a
TraversalSource defines a "connection" to a graph in a manner analogous to a database connection. An instance is typically named
g and is defined as
g = graph.traversal(). A traversal source can be configured using
withXXX() source steps. These
withXXX()-steps add or remove traversal strategies. Once a traversal source has been created and configured with strategies, it can be used repeatedly to spawn
Traversals -- i.e. "queries." Thus, a new
SubgraphStrategy does not need to be created each and every time a traversal is spawned. Instead, it may be created once for the life of
SubgraphStrategythat will inject "current point in time"-semantics into the traversal.
- As of 2016, Marko is currently only friends with Stephen.
- As of 2016, Stephen is currently only friends with Marko.
- As of 2016, Bob is friendless.
Time and Vertices
It is possible to also use
SubgraphStrategy to constrain a traversal to only those vertices within a particular
endTime-range. To generalize the previous helper methods for the remaining examples in this article, let's define them again, but this time, allow the functions to take a list of element classes on which to apply the time-based filter -- e.g.
Without Marko's friendship, Bob drifted as a lost soul flailing around in a sea of vertices looking for anything to form a connection with. Something to relate him to this world. Little did he know that he is and always will be bound to Marko. However, his consciousness felt empty in its shortsighted, time-constrained interpretation of the TinkerPop. His eigen settled into an ergotic, steady state human's like to call hell. And then, in 2015, at his darkest moment, with his sadness too much to bear, Bob passed away... Bob no longer made reference (edges) to anything (vertices) that would allow his contemporaries to be with him in physical reality. (Un)fortunately, due to traversals unconstrained by time, Bob will always be in their memories.
Marko attended Bob's funeral to pay his (dis)respects and was surprised to find Stephen there. Unbeknownst to Marko, Stephen had been friends with Bob since 2012 and he remained his friend till the bitter end. Unfortunately, Stephen's friendship was not enough to keep Bob grounded in the(ir) world. Even though Bob was less than the skin he was printed on, no one would find fault in his spiraled denouement and ultimate razor sharp finale as a tight needle point of nothing against the backdrop of an eternity bounding with greatness. Losing Marko's friendship can be so traumatic to one's psyche that nothing in this world will ever take away thœse blues... nothing compares.
- The traversal source is redefined to not include a time-based
endTime-property is added to Bob's vertex in order to represent his time of death.
- Stephen's friendship with Bob started in 2012 and never ended.
With the amendments above, it is possible to traverse the graph of friends where not only are friendship dates respected, but also only those friendships that are with people whom are still alive. When only considering edges, Stephen is friends with both Bob and Marko. However, when considering vertices as well, Stephen's current friends no longer include Bob as Bob is no longer "current."
- As of 2016, Stephen is friends with both Bob and Marko as only the timestamp on edges are respected.
- As of 2016, Stephen is only friends with Marko as both vertex and edge timestamps are respected and Bob died in 2015.
Time and Vertex Properties
The last element to consider in TinkerPop's property graph data model is the
VertexProperty. Vertex properties are unique to vertices, can have duplicate keys (multi-properties), and can themselves have properties (meta-properties). What is convenient about this data structure is that vertex properties, like vertices and edges, can be timestamped. Thus, let's assume that each person vertex has a
spirit property with a value of either
sad. Now, given that a person's spirit changes through (human) time, it will be necessary to add start and end time (meta)properties to each
spirit vertex property.
- Marko's spirit has changed over the years.
- Stephen's spirit has been consistently happy over the years.
- Bob's spirit plummeted in 2014 do to his nefariously wrought debauchery.
- In 2012, everyone was happy.
- In 2014, Bob became sad -- and this led to Marko cutting his ties in 2015.
- In 2015, Bob died and thus, he no longer exists.
- Currently, only Marko and Stephen exist and they are happy.
SubgraphStrategy is registered with a traversal source, all traversals spawned from that source will be constrained by its respective vertex, edge, and vertex property filters. The question becomes, how does
SubgraphStrategy actually work? To study the mechanics of
explain()-step should be used. The
explain()-step shows the evolution of the traversal from submission to evaluation as each registered traversal strategy applies its rewrite logic.
SubgraphStrategy inspects the traversal and whenever a vertex, edge, or vertex property is traversed, the specified filter is inserted appropriately. The table below shows the various rewrite patterns in
x.y.z is completely filter-based, then the
x.y.z constraint is not wrapped in a
filter()-step, but instead, is simply inlined as is.
|filter||original traversal||non-filter rewrite||pure filter rewrite|
explain()-ation above, note how
ProviderOptimizationStrategy used specifically with
GraphStep and any subsequent chain of
HasSteps into a new step called
TinkerGraphStep. The purpose of this is that if there exists a graph index for
TinkerGraphStep will not blindly iterate all the vertices in the graph and then filter those that don't meet the
startTime-constraint, but instead, will do an
O(log(|V|)) index lookup for those vertices within the specified time window. This demonstrates the importance of traversal strategy execution order and how strategies can play off one another during the compilation process. The next section will drive this point home as it will show how locally sorted/indexed edges and vertex properties can be exploited by traversals over time-encoded graphs.
The two examples above filter edges and vertex properties, respectively. Note that when there is an edge filter, the step
outE('knows').x.y.z.inV() as the edge connecting both vertices needs its time frame analyzed. Likewise,
Optimizing for Time in DSE Graph
DSE Graph is an Apache TinkerPop-enabled, distributed graph database led by Matthias Bröcheler and sold by DataStax. One of the differentiating aspects of DSE Graph is its indexing system. A DSE Graph user can not only define global indices over all vertices in the graph (e.g. "get the person vertex named 'marko'"), but they can also define local indices over all edges (and vertex properties) incident to a vertex (e.g. "get the
knows-edges incident to Marko within a particular window of time"). These local indices are called vertex-centric indices and when used with time-based
SubgraphStrategy constraints, it is possible to efficiently retrieve edges (and vertex properties) within a particular window of time without having to iterate and filter all incident edges in memory (higher up in the Gremlin traversal machine). Said another way,
has()-chains in Gremlin are compiled to DSE Graph specific push down predicates.
- Create a new graph in the DSE Graph cluster.
- Configure the Gremlin Console to communicate with that graph's traversal source.
In order to take advantage of DSE Graph's unique indexing system, a graph schema must be defined.
namevertex property is defined as having a text value and
spiritvertex property is a defined as a multi-property with
personvertex is defined with specified vertex properties.
knowsedge is defined as connecting people and having
- A global graph-centric vertex index is created to index all people vertices by their name.
- A local vertex-centric property index is created to index
spiritproperties by their
- A local vertex-centric edge index is created to index
knowsedges by their
With the schema defined, the graph data is then loaded into DSE Graph.
- Create the Marko vertex with all its vertex properties.
- Create the Stephen vertex with all its vertex properties.
- Create the Bob vertex with all its vertex properties.
- Create the Marko/Stephen friendship edge with all its edge properties.
- Create the Marko/Bob friendship edge with all its edge properties.
- Create the Stephen/Bob friendship edge with all its edge properties.
The traversal above simply returns all people vertices named 'marko'. DSE Graph's
GraphStepStrategy is able to fold the respective
HasStep predicates into
DsegGraphStep. Given that a "personByName"-index was created in the schema,
DsegGraphStep is able to do a global graph-centric index lookup instead of linearly scanning all vertices in the graph and filtering out those that do not have the name 'marko'. Note that global graph indices are a standard feature across all OLTP graph database systems.
LocalEdgeQueryOptimizerStrategy is a DSE Graph specific
ProviderOptimizationStrategy that converts Apache TinkerPop
DsegVertexStep is a one-to-many step mapping a vertex to a set of edges. However, notice how
DsegVertexStep encapsulates the
HasStep-constraint previously inserted by
SubgraphStrategy. Given that we defined a vertex-centric "knowsByStartTime"-index in our schema, DSE Graph will leverage that index to only materialize (from disk and from memory) those incident
knows edges that satisfy that constraint. In this way, the
SubgraphStrategy filters are turned into push-down predicates by
DsegVertexStep prior to evaluation against the underlying DSE Graph database. When a graph's data is never deleted and only continues to grow, vertex-centric indices become indispensable as the number of edges incident to a vertex may grow into the millions. Linearly scanning all incident edges in memory is simply not a scalable solution to this problem. This is what makes vertex-centric indices one of the most important, uniquely defining features of DSE Graph.
|Learn more about vertex-centric indices from one of history's classic footnotes in the annals of graph-literature: A Solution to the Supernode Problem.|
DsegPropertiesStep is also able to "roll up" appropriate
HasStep filters and leverage them as push down predicates against DSE Graph's storage engine. However, instead of the index being with respect to a vertex's incident edges, the index is over incident vertex properties. More specifically, using the "spiritByStartTime"-index defined in the schema.
Note that at the most inopportune time, during the final review of this article, it was realized that a better way to represent an element that hasn't been "deleted" is to use an
Integer.MAX_VALUE. In this way, every element would always have both a
startTime and an
endTime property. The benefit of this is that both graph-centric and vertex-centric indices would be able to them leverage
endTime in their push down predication constructions. In the example below, when there is no need to check if the
endTime property exists, then both constraints are pulled into the
Graph Mutations through Time
SubgraphStrategy is useful in read-only situations. In a time-based application, typically the graph will be modified and thus, write-based traversals will need to account for
endTime-semantics. There are three types of modifications to consider.
- Element creation: When a new element is added, a
startTime-property needs to be added to it.
- Element deletion: When an element is removed, an
endTime-property needs to be added to it.
- Edge/Vertex Property property update: When an edge or vertex property's properties are altered, an
endTime-property needs to be added and a clone needs to be made with a new
When a vertex, edge, or vertex property is created (or "deleted"), simply appending a
endTime) property is sufficient to represent the mutation. However, when an edge or vertex property's properties are updated, then a "clone" is required. Assume that a
strength property of 0.8 is added to the
knows-edge between Marko and Stephen. To represent this, the
knows-edge has an
endTime property added to it and then a new
knows-edge is created with all the same properties (minus the
endTime) as the original edge with an added
strength property and a
startTime being the current time. In this way, a property mutation on an edge (or vertex property) requires a "deletion," creation, and a clone.
In the examples presented in this article, traversals that modified the graph maintained the aforementioned timestamp logic. Such bookkeeping logic should really be encapsulated in a
SubgraphStrategy does not support introspection into the Apache TinkerPop mutation steps. Perhaps it will in the future. Until then, the application developer can create a strategy to perform the aforementioned logic on writes and then, when that strategy (along with
SubgraphStrategy) is registered with the application's traversal source, then both read- and write-based traversals against the application's underlying time-based graph will be sound -- allowing users to write time-agnostic traversals.
Gremlin's Time Machine (
Ode to the Death of Bob) has run out of time! The time-based graph model presented makes it possible to create graphs that never lose data and only grow as the world being modeled changes over (human) time. In order to make the spaghetti ball of undying vertices and edges manageable, timestamps can be applied as specified. From there,
SubgraphStrategy can be registered with a traversal source so that users can write time-agnostic traversals that are ultimately compiled into a time-prescient form prior to evaluation. Graph database providers, like DSE Graph, will be able to use the injected time-filters as push-down predicates to both graph-centric and vertex-centric indices. The future is yours.
With that -- did we learn about the fundamental nature of time? Yes, time and space exist in the graph (like pink elephants, square circles, and a loving angelic Bob), but it is only the fact that your conscious traversal accounts for both space and time that you experience the very real illusion that the TinkerPop is some mind-bogglingly massive, ever-changing universe. Silly wabbit trix r 4 kidz.
Numerous DSE Graph customers (and graph users in general) have asked have to look at the state of their graph at any particular point of time. This article was inspired by such customer use cases. Discussions about this topic took place at Cassandra Summit 2016 with Jonathan Lacefield and Vincent Poncet. Stephen Mallette was the original author of
SubgraphStrategy and helped with technical preparation for the DSE Graph code examples. Finally, Bob Briody was a good sport -- may he rest in peace.