Gremlin’s Time Machine

By Marko A. Rodriguez -  September 25, 2016 | 0 Comments

Salvador Gremli In the TinkerPop, time and space coexist within the graph. The graph’s vertices, edges, and properties are the atomic elements of its structure. Without these elements, there are no “things” and without “things” there is no space as there is no way to measure distance. The graph defines space because a vertex may be closer or farther from another vertex depending on the number of edges that must be traversed to reach it. The longer the path separating two vertices, the greater the distance between them and thus, the larger the space. If the graph creates space, does it also create time?

The ticking of time within the TinkerPop is experienced by humans as a continuous progression of structural modifications to the graph. For instance, at some point in time, a person may be a friend to another person. However, after a heart wrenching fallout, that friendship (edge) may wither and die. What was, is no longer and what wasn’t, now is. To humans, vertices (people, places, things) and their various relationships to one another (edges) come and go from their conscious experience and that procession is demarcated by “time.” Humans witness the graph in which they live as a dynamic structure, forever mutating itself according to the logic of their collectively conceived physics and their subjectively constructed psyches. However, their perception of a world changing through time is simply an epiphenomena of their graph traversal. Human consciousness traverses a time (and space) constrained subgraph within the larger, universal graph of the TinkerPop. Outside the boundaries of the human’s subgraph exists a time-full, changeless world, where time and space are structural features/motifs of the universal graph. It is simply the human’s self-imposed constraints that creates, for them, the very real illusion of time.

The Nature of Time in the TinkerPop

WARNING: This article presents some functionality that is currently not in an official release of Apache TinkerPop and thus, DSE Graph.

 
Let us begin our journey by creating a TraversalSource to an empty TinkerGraph. Note that all of the presented code snippets build on each other. The more precocious reader may want to follow along by cutting and pasting each snippet into the Gremlin Console.

$ bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
gremlin> g = TinkerGraph.open().traversal()
==>graphtraversalsource[tinkergraph[vertices:0 edges:0], standard]

Time and Edges

There are two people named Marko and Stephen. Marko was born in 1979 and Stephen in 1975. In 2010, they met each other in the TinkerPop and quickly developed a mutual respect for each others contributions to the science and engineering of the TinkerPop. This friendship continues to this day.

Marko and Stephen

gremlin> g.addV('person').                              // <1>
......1>   property('startTime',1979).
......2>   property('name','marko','startTime',1979)
==>v[0]
gremlin> g.addV('person').                              // <2>
......1>   property('startTime',1975).
......2>   property('name','stephen','startTime',1975)
==>v[3]
gremlin> g.V().has('name','marko').as('a').             // <3>
......1>   V().has('name','stephen').as('b').
......2>     addE('knows').from('a').to('b').
......3>       property('startTime',2010)
==>e[6][0-knows->3]
  1. A Marko vertex is created with a startTime and name property. The name property has a startTime property as well.
  2. A Stephen vertex is created with similar properties to Marko.
  3. Marko projects a knows (friendship) edge to Stephen with a startTime property on it.
NOTE: All vertex properties that are not time-based will have time-based meta-properties. For instance, the name property on the two vertices above have a startTime denoting when their name was given to them. The importance of this will become clearer as the article progresses.

 

DSE Studio Screenshot There is another person in our world named Bob. Yes, Bob. Not Balthazar, not Ragnar nor Melchizedek. Just Bob. Bob was born in 1983. Marko befriended Bob in 2012 when he saw his work on Wolf Spider (a TinkerPop visualization tool). This work ultimately led Bob on a path to developing DSE Studio. After a few years of collaboration, Marko came to realize that Bob was rotted through to his core — a truly evil person, shameless in his dissolutely deviant degenerate depravity; an impure man unworthy of breathing the life giving air afforded to herculeanly high-minded honorable hombres. In 2015, against Bob’s pleads for a “second chance” at moral integrity, Marko announced the verdict of his deliberation to all the town’s people: De-friend him!

Marko, Stephen, and Bob

gremlin> g.addV('person').                              // <1>
......1>   property('startTime',1983).
......2>   property('name','bob','startTime',1983)
==>v[7]
gremlin> g.V().has('name','marko').as('a').             // <2>
......1>   V().has('name','bob').as('b').
......2>     addE('knows').from('a').to('b').
......3>       property('startTime',2012).
......4>       property('endTime',2015)
==>e[10][0-knows->7]
  1. A Bob vertex is created with a name and startTime property. His name property also has a startTime property.
  2. Marko has an outgoing knows edge to Bob with both a startTime and endTime property on it.

When Marko reflects on the friendships that he has made (in “social” space) over his life (in time), his mind traverses the knows-edges incident to his vertex-self.

gremlin> g.V().has('name','marko').out('knows').values('name')
==>stephen
==>bob

Window of Time When Marko wants to think only about his current friends, he constrains his thoughts to only those knows-edges in his current point in time. If the year is 2016, then the edges must have a startTime less than or equal to 2016 and an endTime greater than 2016 (or no endTime at all — i.e. the end has not been determined). In the 2016-based example below, Marko only experiences his friendship with Stephen, not Bob as Marko and Bob de-friended in 2015. In this way, this sense of friendship is bound to the “the moment.” When such constraints are applied to conscious thought, the notion of a dynamic graph changing with time emerges. While “time” exists as a static structure endogenous to the graph, it also exists as a dynamic structure exogenous to the human.

The human’s constrained traversal(s) cuts him off from the universe as a whole and yields the very real illusions of space, time, self, other, etc. In an unconstrained traversal, there is no space as every vertex is adjacent to every other vertex. Likewise, there is no time as time properties do not impose a “street sign”-like influence, dictating what is and is not possible to experience.

 

gremlin> g.V().has('name','marko').outE('knows').
......1>     has('startTime',lte(2016)).
......2>       and(hasNot('endTime').
......3>             or().
......4>           has('endTime',gt(2016))).
......5>   inV().values('name')
==>stephen

There must be an easier way to express this traversal. Alas, there is! Welcome SubgraphStrategy. Before a traversal is executed against a graph, any number of traversal strategies can be applied to that traversal. Typically, traversal strategies optimize the traversal by rewriting it to a more efficient, semantically equivalent form. However, other types of traversal strategies exist such as SubgraphStrategy which is a DecorationStrategy. Decoration strategies, unlike optimization strategies, augment the traversal with application-specific logic which yields a traversal that need not be semantically equivalent to the original form. When traversing a time-encoded graph, SubgraphStrategy allows the user to write their Gremlin traversals without having to explicitly add time constraints throughout. Instead, SubgraphStrategy will insert all such constraints at compile time. For more information on Gremlin’s compiler, please see The Benefits of the Gremlin Graph Traversal Machine.

gremlin> strategy = SubgraphStrategy.build().                                            // <1>
......1>              edges(has('startTime',lte(2016)).
......2>                    or(hasNot('endTime'),
......3>                       has('endTime',gt(2016)))).create()
==>SubgraphStrategy
gremlin> g.withStrategies(strategy).V().has('name','marko').out('knows').values('name')  // <2>
==>stephen
  1. Create a SubgraphStrategy where all traversed edges must satisfy the specified constraint.
  2. Add the strategy to the TraversalSource and submit a time-agnostic traversal.

To make things a bit easier to use, let’s create three helper functions.

timeWindow = { startTime, endTime ->                                    // <1>
  assert startTime <= endTime
  SubgraphStrategy.build().
    edges(has('startTime',lte(startTime)).
          or(hasNot('endTime'),has('endTime',gt(endTime)))).create()  
}

timePoint = { timePoint -> timeWindow(timePoint,timePoint) }            // <2>

currentPoint = { timePoint(Calendar.getInstance().get(Calendar.YEAR)) } // <3>
  1. timeWindow: A bi-function that yields a SubgraphStrategy that filters out edges within a provided start time and end time.
  2. timePoint: A function that yields a SubgraphStrategy with a time window at a specified year.
  3. currentPoint: A supplier that yields a SubgraphStrategy for the current point in time.

With these helpers in place, it’s easy to create a traversal that is bound to a time in the graph.

gremlin> g.withStrategies(timeWindow(2012,2014)).V().has('name','marko').out('knows').values('name') // <1>
==>stephen
==>bob
gremlin> g.withStrategies(timePoint(2015)).V().has('name','marko').out('knows').values('name')       // <2>
==>stephen
gremlin> g.withStrategies(currentPoint()).V().has('name','marko').out('knows').values('name')        // <3>
==>stephen
  1. Marko was friends with both Stephen and Bob from 2012 to 2014.
  2. Marko was only friends with Stephen in 2015.
  3. Marko is currently only friends with Stephen (as of 2016).

While the graph’s structure hasn’t changed, the results of the “marko’s friends”-traversal has. The traversal’s results are determined by both the “space” (social) and “time” structures therein. It is worth emphasizing again that the graph is static. It is the time-relative traversal that yields the dynamic, evolving nature of the world experienced by humans. This is why time is a very real illusion.

Consciousness is the Traverser Guided by a Traversal through the Graph
 
 
The human is a traversal machine. Contrary to modern cognitive theory, the brain does not store episodic, life-history information (data). It doesn’t need to, that information is already encoded in the graph. Instead the brain is composed of traversals (queries/algorithms) for accessing (experiencing) that information within the the universal graph of the TinkerPop. Classic human traversals include:

  • Physically moving in space: The motor cortex is a traversal that is able to connect to a “location”-vertex (addE()) that is time/space-adjacent to the current “location”-vertex being referenced.
  • Looking out at the world: The visual cortex is constrained to traverse “physical”-vertices with nearby “time/space”-properties that have incident, time-equivalent “photon”-vertices.
  • Remembering the past: The associative cortex is constrained by the individual’s ego-centric subgraph that, unlike the motor cortex, is not restricted by any time/space motifs.
  • Abstract reasoning: The cerebral cortex is an Aristotelian-constrained logic traversal that systematically moves through a conceptual graph in order to infer and deduce (i.e. be exposed to) more areas of the universal graph.
  • Religious experiences: The temporal lobe’s traversal has no time/space/ego-constraints and as such, is able to experience everything as being adjacent to everything (i.e. a fully connected graph).

The TinkerPop The human, like all “things,” is encoded in the graph. The human’s cortex is composed of yet more vertices and edges. Cortical subgraphs represent traversals as tree-like structures supporting function chaining (vertex adjacency) and function nesting (subgraph groupings). Consciousness (the traverser) exists at the union of the cortical subgraph being intersected with other subgraphs of the universal graph. These cortical subgraphs can be rewritten via psychotherapy (i.e. introspection), pharmacology (i.e entheogens), neurosurgery (i.e. prosthetics), etc. Such compile-time mutations to the traversal are DecorationStrategies which themselves are simply traversals (traversals that traverse traversals). Like monkeys on a keyboard, there are an infinite number of traversals that consciousness could leverage, but due to the “human’s” conditioning (education, culture, science), the human believes the repertoire to be small and within that tiny library only a sliver are deemed “okay” to experience. The liberated consciousness leaves the repetition of the constrained human condition in search of new traversal flows with logics unfathomed where left is right, up near south, and heaven just a tad bit north of the Arctic’s ice — in search of the unreachable. A place where death only happens because you believe people die. Remember this when you read of Bob’s tragedy. Bob’s fate isn’t his end. He simply no longer references the old school, played out, “have we done this enough yet?” cortical traversals of the human condition(ing). The space/time-“uni”verse that science holds so dear is a mere blip within that which is, has always been, and will always-never be The TinkerPop.

 

A Note on Traversal Sources and their Traversals

In Gremlin, a TraversalSource defines a “connection” to a graph in a manner analogous to a database connection. An instance is typically named g and is defined as g = graph.traversal(). A traversal source can be configured using withXXX() source steps. These withXXX()-steps add or remove traversal strategies. Once a traversal source has been created and configured with strategies, it can be used repeatedly to spawn Traversals — i.e. “queries.” Thus, a new SubgraphStrategy does not need to be created each and every time a traversal is spawned. Instead, it may be created once for the life of g.

gremlin> g = g.withStrategies(currentPoint())                     // <1>
==>graphtraversalsource[tinkergraph[vertices:3 edges:2], standard]
gremlin> g.V().has('name','marko').both('knows').values('name')   // <2>
==>stephen
gremlin> g.V().has('name','stephen').both('knows').values('name') // <3>
==>marko
gremlin> g.V().has('name','bob').both('knows').values('name')     // <4>
gremlin>
  1. A SubgraphStrategy that will inject “current point in time”-semantics into the traversal.
  2. As of 2016, Marko is currently only friends with Stephen.
  3. As of 2016, Stephen is currently only friends with Marko.
  4. As of 2016, Bob is friendless.

Time and Vertices

It is possible to also use SubgraphStrategy to constrain a traversal to only those vertices within a particular startTime/endTime-range. To generalize the previous helper methods for the remaining examples in this article, let’s define them again, but this time, allow the functions to take a list of element classes on which to apply the time-based filter — e.g. Vertex, Edge, and/or VertexProperty.

timeWindow = { startTime, endTime, elementClasses ->
  assert startTime <= endTime
  timeTraversal = has('startTime',lte(startTime)).
                  or(hasNot('endTime'),has('endTime',gt(endTime)))
  builder = SubgraphStrategy.build()
  if(Vertex.class in elementClasses)
    builder = builder.vertices(timeTraversal.clone())
  if(Edge.class in elementClasses)
    builder = builder.edges(timeTraversal.clone())
  if(VertexProperty.class in elementClasses)
    builder = builder.vertexProperties(timeTraversal.clone())
  return builder.create()
}

timePoint = { timePoint, elementClasses -> timeWindow(timePoint, timePoint, elementClasses) }

currentPoint = { elementClasses -> timePoint(Calendar.getInstance().get(Calendar.YEAR), elementClasses) }

Bob Drifting AwayWithout Marko’s friendship, Bob drifted as a lost soul flailing around in a sea of vertices looking for anything to form a connection with. Something to relate him to this world. Little did he know that he is and always will be bound to Marko. However, his consciousness felt empty in its shortsighted, time-constrained interpretation of the TinkerPop. His eigen settled into an ergotic, steady state human’s like to call hell. And then, in 2015, at his darkest moment, with his sadness too much to bear, Bob passed away… Bob no longer made reference (edges) to anything (vertices) that would allow his contemporaries to be with him in physical reality. (Un)fortunately, due to traversals unconstrained by time, Bob will always be in their memories.

Marko attended Bob’s funeral to pay his (dis)respects and was surprised to find Stephen there. Unbeknownst to Marko, Stephen had been friends with Bob since 2012 and he remained his friend till the bitter end. Unfortunately, Stephen’s friendship was not enough to keep Bob grounded in the(ir) world. Even though Bob was less than the skin he was printed on, no one would find fault in his spiraled denouement and ultimate razor sharp finale as a tight needle point of nothing against the backdrop of an eternity bounding with greatness. Losing Marko’s friendship can be so traumatic to one’s psyche that nothing in this world will ever take away thœse blues… nothing compares.

Marko, Stephen, and Dead Bob

gremlin> g = g.withoutStrategies(SubgraphStrategy)                          // <1>
==>graphtraversalsource[tinkergraph[vertices:3 edges:2], standard]
gremlin> g.V().has('name','bob').property('endTime',2015)                   // <2>
==>v[7]
gremlin> g.V().has('name','stephen').as('a').                               // <3>
......1>   V().has('name','bob').as('b').
......2>     addE('knows').from('a').to('b').property('startTime',2012)
==>e[12][3-knows->7]
  1. The traversal source is redefined to not include a time-based SubgraphStrategy.
  2. An endTime-property is added to Bob’s vertex in order to represent his time of death.
  3. Stephen’s friendship with Bob started in 2012 and never ended.

With the amendments above, it is possible to traverse the graph of friends where not only are friendship dates respected, but also only those friendships that are with people whom are still alive. When only considering edges, Stephen is friends with both Bob and Marko. However, when considering vertices as well, Stephen’s current friends no longer include Bob as Bob is no longer “current.”

gremlin> g.withStrategies(currentPoint([Edge])).V().has('name','stephen').both('knows').values('name')        // <1>
==>bob
==>marko
gremlin> g.withStrategies(currentPoint([Vertex,Edge])).V().has('name','stephen').both('knows').values('name') // <2>
==>marko
  1. As of 2016, Stephen is friends with both Bob and Marko as only the timestamp on edges are respected.
  2. As of 2016, Stephen is only friends with Marko as both vertex and edge timestamps are respected and Bob died in 2015.

Time and Vertex Properties

The last element to consider in TinkerPop’s property graph data model is the VertexProperty. Vertex properties are unique to vertices, can have duplicate keys (multi-properties), and can themselves have properties (meta-properties). What is convenient about this data structure is that vertex properties, like vertices and edges, can be timestamped. Thus, let’s assume that each person vertex has a spirit property with a value of either happy or sad. Now, given that a person’s spirit changes through (human) time, it will be necessary to add start and end time (meta)properties to each spirit vertex property.

Marko, Stephen, and Dead Bob's Spirit

gremlin> g.V().has('name','marko').
......1>   property(list,'spirit','happy','startTime',1979,'endTime',2013).  // <1>
......2>   property(list,'spirit','sad','startTime',2013,'endTime',2014).
......3>   property(list,'spirit','happy','startTime',2014)
==>v[0]
gremlin>
gremlin> g.V().has('name','stephen').                                        // <2>
......1>   property(list,'spirit','happy','startTime',1975)
==>v[3]
gremlin>
gremlin> g.V().has('name','bob').                                            // <3>
......1>   property(list,'spirit','happy','startTime',1983,'endTime',2014).
......2>   property(list,'spirit','sad','startTime',2014)
==>v[7]                 
  1. Marko’s spirit has changed over the years.
  2. Stephen’s spirit has been consistently happy over the years.
  3. Bob’s spirit plummeted in 2014 do to his nefariously wrought debauchery.
gremlin> g.withStrategies(timePoint(2012,[Vertex,Edge,VertexProperty])).V().valueMap('name','spirit') // <1>
==>[name:[marko],spirit:[happy]]
==>[name:[stephen],spirit:[happy]]
==>[name:[bob],spirit:[happy]]
gremlin> g.withStrategies(timePoint(2014,[Vertex,Edge,VertexProperty])).V().valueMap('name','spirit') // <2>
==>[name:[marko],spirit:[happy]]
==>[name:[stephen],spirit:[happy]]
==>[name:[bob],spirit:[sad]]
gremlin> g.withStrategies(timePoint(2015,[Vertex,Edge,VertexProperty])).V().valueMap('name','spirit') // <3>
==>[name:[marko],spirit:[happy]]
==>[name:[stephen],spirit:[happy]]
gremlin> g.withStrategies(currentPoint([Vertex,Edge,VertexProperty])).V().valueMap('name','spirit')   // <4>
==>[name:[marko],spirit:[happy]]
==>[name:[stephen],spirit:[happy]]
  1. In 2012, everyone was happy.
  2. In 2014, Bob became sad — and this led to Marko cutting his ties in 2015.
  3. In 2015, Bob died and thus, he no longer exists.
  4. Currently, only Marko and Stephen exist and they are happy.

Understanding SubgraphStrategy

Once a SubgraphStrategy is registered with a traversal source, all traversals spawned from that source will be constrained by its respective vertex, edge, and vertex property filters. The question becomes, how does SubgraphStrategy actually work? To study the mechanics of SubgraphStrategy the explain()-step should be used. The explain()-step shows the evolution of the traversal from submission to evaluation as each registered traversal strategy applies its rewrite logic.

gremlin> g.withStrategies(timeWindow(2012,2014,[Vertex])).V().explain()
==>Traversal Explanation
====================================================================================================================================================================================
Original Traversal                 [GraphStep(vertex,[])]

SubgraphStrategy             [D]   [GraphStep(vertex,[]), HasStep([startTime.lte(2012)]), OrStep([[NotStep(![PropertiesStep([endTime],value)])], [HasStep([endTime.gt(2014)])]])]
...
TinkerGraphStepStrategy      [P]   [TinkerGraphStep(vertex,[startTime.lte(2012)]), OrStep([[NotStep(![PropertiesStep([endTime],property)])], [HasStep([endTime.gt(2014)])]])]
...
Final Traversal                    [TinkerGraphStep(vertex,[startTime.lte(2012)]), OrStep([[NotStep(![PropertiesStep([endTime],property)])], [HasStep([endTime.gt(2014)])]])]
// BEFORE SubgraphStrategy
[GraphStep(vertex,[])]

// AFTER SubgraphStrategy
[GraphStep(vertex,[]), 
  HasStep([startTime.lte(2012)]), 
  OrStep(
    [[NotStep(![PropertiesStep([endTime],value)])], 
    [HasStep([endTime.gt(2014)])]])]

SubgraphStrategy inspects the traversal and whenever a vertex, edge, or vertex property is traversed, the specified filter is inserted appropriately. The table below shows the various rewrite patterns in SubgraphStrategy. If x.y.z is completely filter-based, then the x.y.z constraint is not wrapped in a filter()-step, but instead, is simply inlined as is.

filter original traversal non-filter rewrite pure filter rewrite
vertex V() V().filter(x.y.z) V().x.y.z
vertex out('knows') out('knows').filter(x.y.z) out('knows').x.y.z
edge outE('knows') outE('knows').filter(x.y.z) outE().x.y.z
edge out('knows') outE('knows').filter(x.y.z).inV() outE().x.y.z.inV()
vertex property properties('spirit') properties('spirit').filter(x.y.z) properties('spirit').x.y.z
vertex property values('spirit') properties('spirit').filter(x.y.z).value() properties('spirit').x.y.z.value()

 
In the explain()-ation above, note how TinkerGraphStepStrategy (a ProviderOptimizationStrategy used specifically with TinkerGraph) wraps GraphStep and any subsequent chain of HasSteps into a new step called TinkerGraphStep. The purpose of this is that if there exists a graph index for startTime, then TinkerGraphStep will not blindly iterate all the vertices in the graph and then filter those that don’t meet the startTime-constraint, but instead, will do an O(log(|V|)) index lookup for those vertices within the specified time window. This demonstrates the importance of traversal strategy execution order and how strategies can play off one another during the compilation process. The next section will drive this point home as it will show how locally sorted/indexed edges and vertex properties can be exploited by traversals over time-encoded graphs.

gremlin> g.withStrategies(timeWindow(2012,2014,[Edge])).V().out('knows').explain()
==>Traversal Explanation
============================================================================================================================================================================================================================================
Original Traversal                 [GraphStep(vertex,[]), VertexStep(OUT,[knows],vertex)]

SubgraphStrategy             [D]   [GraphStep(vertex,[]), VertexStep(OUT,[knows],edge), HasStep([startTime.lte(2012)]), OrStep([[NotStep(![PropertiesStep([endTime],value)])], [HasStep([endTime.gt(2014)])]]), EdgeVertexStep(IN)]
...
gremlin> g.withStrategies(timeWindow(2012,2014,[VertexProperty])).V().values('spirit').explain()
==>Traversal Explanation
================================================================================================================================================================================================================================================
Original Traversal                 [GraphStep(vertex,[]), PropertiesStep([spirit],value)]

SubgraphStrategy             [D]   [GraphStep(vertex,[]), PropertiesStep([spirit],property), HasStep([startTime.lte(2012)]), OrStep([[NotStep(![PropertiesStep([endTime],value)])], [HasStep([endTime.gt(2014)])]]), PropertyValueStep]
...

The two examples above filter edges and vertex properties, respectively. Note that when there is an edge filter, the step out('knows') becomes outE('knows').x.y.z.inV() as the edge connecting both vertices needs its time frame analyzed. Likewise, values('spirit') becomes properties('spirit').x.y.z.value().

Optimizing for Time in DSE Graph

DSE GraphDSE Graph is an Apache TinkerPop-enabled, distributed graph database led by Matthias Bröcheler and sold by DataStax. One of the differentiating aspects of DSE Graph is its indexing system. A DSE Graph user can not only define global indices over all vertices in the graph (e.g. “get the person vertex named ‘marko'”), but they can also define local indices over all edges (and vertex properties) incident to a vertex (e.g. “get the knows-edges incident to Marko within a particular window of time”). These local indices are called vertex-centric indices and when used with time-based SubgraphStrategy constraints, it is possible to efficiently retrieve edges (and vertex properties) within a particular window of time without having to iterate and filter all incident edges in memory (higher up in the Gremlin traversal machine). Said another way, has()-chains in Gremlin are compiled to DSE Graph specific push down predicates.

gremlin> system.graph("gremlins-time-machine").create()   // <1>
gremlin> :remote config alias g gremlins-time-machine.g   // <2>
==>g=gremlins-time-machine.g
  1. Create a new graph in the DSE Graph cluster.
  2. Configure the Gremlin Console to communicate with that graph’s traversal source.

In order to take advantage of DSE Graph’s unique indexing system, a graph schema must be defined.

// property definitions 
schema.propertyKey('startTime').Int().create()
schema.propertyKey('endTime').Int().create()
schema.propertyKey('name').Text().properties('startTime','endTime').create()                        // <1>
schema.propertyKey('spirit').Text().multiple().properties('startTime','endTime').create()           // <2>

// vertex and edge definitions
schema.vertexLabel('person').properties('startTime','endTime','name','spirit').create()             // <3>
schema.edgeLabel('knows').properties('startTime','endTime').connection('person','person').create()  // <4>

// graph- and vertex-centric index definitions
schema.vertexLabel('person').index('personByName').materialized().by('name').add()                  // <5>
schema.vertexLabel('person').index('spiritByStartTime').property('spirit').by('startTime').add()    // <6>
schema.vertexLabel('person').index('knowsByStartTime').outE('knows').by('startTime').add()          // <7>
  1. The name vertex property is defined as having a text value and startTime/endTime meta-properties.
  2. The spirit vertex property is a defined as a multi-property with startTime/endTime meta-properties.
  3. The person vertex is defined with specified vertex properties.
  4. The knows edge is defined as connecting people and having startTime/endTime properties.
  5. A global graph-centric vertex index is created to index all people vertices by their name.
  6. A local vertex-centric property index is created to index spirit properties by their startTimes.
  7. A local vertex-centric edge index is created to index knows edges by their startTimes.

With the schema defined, the graph data is then loaded into DSE Graph.

g.addV('person').                                                   // <1>
  property('startTime',1979).  
  property('name','marko','startTime',1979).
  property(list,'spirit','happy','startTime',1979,'endTime',2013).
  property(list,'spirit','sad','startTime',2013,'endTime',2014).
  property(list,'spirit','happy','startTime',2014)                              
g.addV('person').                                                    // <2>
  property('startTime',1975).  
  property('name','stephen','startTime',1975).
  property(list,'spirit','happy','startTime',1975)          
g.addV('person').                                                    // <3>
  property('startTime',1983).
  property('endTime',2015).
  property('name','bob','startTime',1983).
  property(list,'spirit','happy','startTime',1983,'endTime',2014).
  property(list,'spirit','sad','startTime',2014)  

g.V().has('person','name','marko').as('a').                          // <4>
  V().has('person','name','stephen').as('b').
    addE('knows').from('a').to('b').
      property('startTime',2010)
g.V().has('person','name','marko').as('a').                          // <5>
  V().has('person','name','bob').as('b').
    addE('knows').from('a').to('b').
      property('startTime',2012).
      property('endTime',2015)
g.V().has('person','name','stephen').as('a').                        // <6>
  V().has('person','name','bob').as('b').
    addE('knows').from('a').to('b').
      property('startTime',2012)
  1. Create the Marko vertex with all its vertex properties.
  2. Create the Stephen vertex with all its vertex properties.
  3. Create the Bob vertex with all its vertex properties.
  4. Create the Marko/Stephen friendship edge with all its edge properties.
  5. Create the Marko/Bob friendship edge with all its edge properties.
  6. Create the Stephen/Bob friendship edge with all its edge properties.
gremlin> g.withStrategies(currentPoint([Vertex])).V().has('person','name','marko').explain()
==>Traversal Explanation
======================================================================================================================================================================================================================================================
Original Traversal                          [GraphStep(vertex,[]), HasStep([~label.eq(person)]), HasStep([name.eq(marko)])]

SubgraphStrategy                      [D]   [GraphStep(vertex,[]), HasStep([startTime.lte(2016)]), OrStep([[NotStep(![PropertiesStep([endTime],value)])], [HasStep([endTime.gt(2016)])]]), HasStep([~label.eq(person)]), HasStep([name.eq(marko)])]
...
GraphStepStrategy                     [P]   [DsegGraphStep(vertex,[~label.=(person), name.=(marko), startTime.<=(2016)]), OrStep([[NotStep(![DsegPropertiesStep([endTime],property)])], [HasStep([endTime.>(2016)])]])]
...

Graph Centric Index The traversal above simply returns all people vertices named ‘marko’. DSE Graph’s GraphStepStrategy is able to fold the respective HasStep predicates into DsegGraphStep. Given that a “personByName”-index was created in the schema, DsegGraphStep is able to do a global graph-centric index lookup instead of linearly scanning all vertices in the graph and filtering out those that do not have the name ‘marko’. Note that global graph indices are a standard feature across all OLTP graph database systems.

gremlin> g.withStrategies(currentPoint([Edge])).V().outE('knows').explain()
==>Traversal Explanation
===========================================================================================================================================================================================================================
Original Traversal                          [GraphStep(vertex,[]), VertexStep(OUT,[knows],edge)]

SubgraphStrategy                      [D]   [GraphStep(vertex,[]), VertexStep(OUT,[knows],edge), HasStep([startTime.lte(2016)]), OrStep([[NotStep(![PropertiesStep([endTime],value)])], [HasStep([endTime.gt(2016)])]])]
...
LocalEdgeQueryOptimizerStrategy       [P]   [GraphStep(vertex,[]), DsegVertexStep(OUT,[knows],[startTime.<=(2016)],edge), OrStep([[NotStep(![DsegPropertiesStep([endTime],property)])], [HasStep([endTime.>(2016)])]])]
...

Vertex Centric Edge Index LocalEdgeQueryOptimizerStrategy is a DSE Graph specific ProviderOptimizationStrategy that converts Apache TinkerPop VertexSteps to DsegVertexSteps. Like VertexStep, a DsegVertexStep is a one-to-many step mapping a vertex to a set of edges. However, notice how DsegVertexStep encapsulates the startTime HasStep-constraint previously inserted by SubgraphStrategy. Given that we defined a vertex-centric “knowsByStartTime”-index in our schema, DSE Graph will leverage that index to only materialize (from disk and from memory) those incident knows edges that satisfy that constraint. In this way, the SubgraphStrategy filters are turned into push-down predicates by DsegVertexStep prior to evaluation against the underlying DSE Graph database. When a graph’s data is never deleted and only continues to grow, vertex-centric indices become indispensable as the number of edges incident to a vertex may grow into the millions. Linearly scanning all incident edges in memory is simply not a scalable solution to this problem. This is what makes vertex-centric indices one of the most important, uniquely defining features of DSE Graph.

Learn more about vertex-centric indices from one of history’s classic footnotes in the annals of graph-literature: A Solution to the Supernode Problem.
gremlin> g.withStrategies(currentPoint([VertexProperty])).V().properties('spirit').explain()
==>Traversal Explanation
================================================================================================================================================================================================================================
Original Traversal                          [GraphStep(vertex,[]), PropertiesStep([spirit],property)]

SubgraphStrategy                      [D]   [GraphStep(vertex,[]), PropertiesStep([spirit],property), HasStep([startTime.lte(2016)]), OrStep([[NotStep(![PropertiesStep([endTime],value)])], [HasStep([endTime.gt(2016)])]])]
...
LocalEdgeQueryOptimizerStrategy       [P]   [GraphStep(vertex,[]), DsegPropertiesStep([spirit],[startTime.<=(2016)],property), OrStep([[NotStep(![DsegPropertiesStep([endTime],property)])], [HasStep([endTime.>(2016)])]])]
...

Vertex Centric Property Index Similar to DsegVertexStep, DsegPropertiesStep is also able to “roll up” appropriate HasStep filters and leverage them as push down predicates against DSE Graph’s storage engine. However, instead of the index being with respect to a vertex’s incident edges, the index is over incident vertex properties. More specifically, using the “spiritByStartTime”-index defined in the schema.

Note that at the most inopportune time, during the final review of this article, it was realized that a better way to represent an element that hasn’t been “deleted” is to use an endTime of Integer.MAX_VALUE. In this way, every element would always have both a startTime and an endTime property. The benefit of this is that both graph-centric and vertex-centric indices would be able to them leverage endTime in their push down predication constructions. In the example below, when there is no need to check if the endTime property exists, then both constraints are pulled into the DsegPropertiesStep accordingly.

gremlin> g.V().properties('spirit').has('startTime',lte(2016)).has('endTime',gt(2016)).explain()
==>Traversal Explanation
==================================================================================================================================================================
Original Traversal                          [GraphStep(vertex,[]), PropertiesStep([spirit],property), HasStep([startTime.lte(2016)]), HasStep([endTime.gt(2016)])]

...
LocalEdgeQueryOptimizerStrategy       [P]   [GraphStep(vertex,[]), DsegPropertiesStep([spirit],[startTime.<=(2016), endTime.>(2016)],property)]
...

Graph Mutations through Time

SubgraphStrategy is useful in read-only situations. In a time-based application, typically the graph will be modified and thus, write-based traversals will need to account for startTime/endTime-semantics. There are three types of modifications to consider.

  1. Element creation: When a new element is added, a startTime-property needs to be added to it.
  2. Element deletion: When an element is removed, an endTime-property needs to be added to it.
  3. Edge/Vertex Property property update: When an edge or vertex property’s properties are altered, an endTime-property needs to be added and a clone needs to be made with a new startTime property.

Marko-Stephen Mutation When a vertex, edge, or vertex property is created (or “deleted”), simply appending a startTime (or endTime) property is sufficient to represent the mutation. However, when an edge or vertex property’s properties are updated, then a “clone” is required. Assume that a strength property of 0.8 is added to the knows-edge between Marko and Stephen. To represent this, the knows-edge has an endTime property added to it and then a new knows-edge is created with all the same properties (minus the endTime) as the original edge with an added strength property and a startTime being the current time. In this way, a property mutation on an edge (or vertex property) requires a “deletion,” creation, and a clone.

In the examples presented in this article, traversals that modified the graph maintained the aforementioned timestamp logic. Such bookkeeping logic should really be encapsulated in a DecorationStrategy. Unfortunately, SubgraphStrategy does not support introspection into the Apache TinkerPop mutation steps. Perhaps it will in the future. Until then, the application developer can create a strategy to perform the aforementioned logic on writes and then, when that strategy (along with SubgraphStrategy) is registered with the application’s traversal source, then both read- and write-based traversals against the application’s underlying time-based graph will be sound — allowing users to write time-agnostic traversals.

Conclusion

Melting ClockGremlin’s Time Machine (Ode to the Death of Bob) has run out of time! The time-based graph model presented makes it possible to create graphs that never lose data and only grow as the world being modeled changes over (human) time. In order to make the spaghetti ball of undying vertices and edges manageable, timestamps can be applied as specified. From there, SubgraphStrategy can be registered with a traversal source so that users can write time-agnostic traversals that are ultimately compiled into a time-prescient form prior to evaluation. Graph database providers, like DSE Graph, will be able to use the injected time-filters as push-down predicates to both graph-centric and vertex-centric indices. The future is yours.

With that — did we learn about the fundamental nature of time? Yes, time and space exist in the graph (like pink elephants, square circles, and a loving angelic Bob), but it is only the fact that your conscious traversal accounts for both space and time that you experience the very real illusion that the TinkerPop is some mind-bogglingly massive, ever-changing universe. Silly wabbit trix r 4 kidz.

Acknowledgements

Numerous DSE Graph customers (and graph users in general) have asked have to look at the state of their graph at any particular point of time. This article was inspired by such customer use cases. Discussions about this topic took place at Cassandra Summit 2016 with Jonathan Lacefield and Vincent Poncet. Stephen Mallette was the original author of SubgraphStrategy and helped with technical preparation for the DSE Graph code examples. Finally, Bob Briody was a good sport — may he rest in peace.



Comments

Your email address will not be published. Required fields are marked *




Subscribe for newsletter:

© 2017 DataStax, All rights reserved. Tel. +1 (408) 933-3120 sales@datastax.com Offices

DataStax is a registered trademark of DataStax, Inc. and its subsidiaries in the United States and/or other countries.
Apache Cassandra, Apache, Tomcat, Lucene, Solr, Hadoop, Spark, TinkerPop, and Cassandra are trademarks of the Apache Software Foundation or its subsidiaries in Canada, the United States and/or other countries.