Property Graph Algorithms
The term property graph has come to denote an attributed, multi-relational graph. That is, a graph where the edges are labeled and both vertices and edges can have any number of key/value properties associated with them. An example of a property graph with two vertices and one edge is diagrammed below.
Property graphs are more complex than the standard single-relational graphs that most people are familiar with. This is because there are different types of vertices (e.g. people, companies, software) and different types of edges (e.g. knows, works_for, imports). The complexities added by this data structure (and by multi-relational graphs in general, e.g. RDF graphs) affect how graph algorithms are defined and evaluated.
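To make the data structure concrete, here is a minimal in-memory sketch in Python. The class names (Vertex, Edge, PropertyGraph) and the sample vertices are hypothetical and not part of any graph library's API. The sketch only shows that vertices carry arbitrary key/value properties and that every edge carries a label plus its own properties.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Vertex:
    vid: str
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    out_v: str  # id of the tail vertex
    label: str  # relation type, e.g. "knows" or "works_for"
    in_v: str   # id of the head vertex
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    vertices: dict[str, Vertex] = field(default_factory=dict)
    edges: list[Edge] = field(default_factory=list)

    def add_vertex(self, vid: str, **props: Any) -> Vertex:
        self.vertices[vid] = Vertex(vid, props)
        return self.vertices[vid]

    def add_edge(self, out_v: str, label: str, in_v: str, **props: Any) -> Edge:
        edge = Edge(out_v, label, in_v, props)
        self.edges.append(edge)
        return edge

    def out(self, vid: str, label: Optional[str] = None) -> list[str]:
        """Heads of vid's outgoing edges, optionally filtered by label."""
        return [e.in_v for e in self.edges
                if e.out_v == vid and (label is None or e.label == label)]


# Hypothetical sample data mixing vertex types and edge labels.
g = PropertyGraph()
g.add_vertex("alice", type="person")
g.add_vertex("bob", type="person")
g.add_vertex("acme", type="company", name="ACME")
g.add_vertex("proj", type="software")
g.add_vertex("blueprints", type="software", name="Blueprints")
g.add_edge("alice", "knows", "bob", since=2009)
g.add_edge("alice", "works_for", "acme")
g.add_edge("bob", "works_for", "acme")
g.add_edge("bob", "develops", "proj")
g.add_edge("proj", "imports", "blueprints")
```

Because labels live on the edges, g.out("alice") and g.out("alice", "works_for") already return different neighborhoods for the same vertex.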
Standard graph theory textbooks typically present common algorithms such as various centralities, geodesics, and assortative mixing. These algorithms usually come pre-packaged with single-relational graph toolkits and frameworks (e.g. NetworkX, iGraph). People commonly want the same algorithms when they begin to work with property graph software. I have been asked many times:
My answer to this question is always:
"What do you mean by centrality in a property graph?"
When a heterogeneous set of vertices can be related by a heterogeneous set of edges, there are numerous ways in which to calculate centrality (or any other standard graph algorithm for that matter):
1. Ignore edge labels and use standard single-relational graph centrality algorithms.
2. Isolate a particular "slice" of the graph (e.g. the knows subgraph) and use standard single-relational graph centrality algorithms.
3. Make use of abstract adjacencies to compute centrality with higher-order semantics.
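The three approaches can be contrasted on a toy edge list. In this Python sketch the edges, vertex names, and function names are all hypothetical. Each function turns the same labeled edges into a different single-relational adjacency that a standard algorithm could then consume. Adjacencies are kept as sets, so edge multiplicities are dropped.

```python
from collections import defaultdict

# Hypothetical labeled edges: (out_vertex, label, in_vertex).
EDGES = [
    ("alice", "knows", "bob"),
    ("alice", "works_for", "acme"),
    ("bob", "works_for", "acme"),
    ("carol", "works_for", "acme"),
    ("carol", "knows", "dave"),
    ("dave", "works_for", "initech"),
]


def ignore_labels(edges):
    """Approach 1: treat every edge as the same relation."""
    adj = defaultdict(set)
    for out_v, _label, in_v in edges:
        adj[out_v].add(in_v)
    return dict(adj)


def label_slice(edges, label):
    """Approach 2: keep only the subgraph of edges with one label."""
    return ignore_labels([e for e in edges if e[1] == label])


def path_adjacency(edges, out_label, in_label):
    """Approach 3: an abstract adjacency v -> w whenever the path
    v -out_label-> x <-in_label- w exists (out(...).in(...) in Gremlin)."""
    forward = label_slice(edges, out_label)
    backward = defaultdict(set)  # x -> {w : w -in_label-> x}
    for w, lbl, x in edges:
        if lbl == in_label:
            backward[x].add(w)
    return {v: set().union(*(backward[x] for x in xs))
            for v, xs in forward.items()}


print(ignore_labels(EDGES)["alice"])
print(label_slice(EDGES, "knows"))
print(path_adjacency(EDGES, "works_for", "works_for"))
```

With works_for on both sides, approach 3 yields a coworker relation that is never stored explicitly in the graph. Each person also appears among their own coworkers, because the path can return to its starting vertex.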
The purpose of this blog post is to stress point #3 and the power of property graph algorithms. In Gremlin, you can calculate numerous eigenvector centralities for the same property graph instance. At this point, you might ask: "How can a graph have more than one principal eigenvector?" The answer lies in seeing all the graphs that exist within the graph, that is, all the higher-order, derived, implicit, virtual, abstract adjacencies. Each line below exemplifies points #1, #2, and #3 of the list above, respectively. The code examples use a variant of the power method. On each of the 10 iterations every traverser takes one step and groupCount tallies the vertices reached. The accumulated counts in the map m give the vertex centrality ranking.
g.V().repeat(out().groupCount('m')).times(10).cap('m')            // point #1 above
g.V().repeat(out('knows').groupCount('m')).times(10).cap('m')     // point #2 above
g.V().repeat(???.groupCount('m')).times(10).cap('m')              // point #3 above
The ??? on line 3 stands for an arbitrary traversal. For example, ??? can be:
out('works_for').in('works_for')                                     // point #1 below
out('works_for').has('name','ACME').in('works_for')                  // point #2 below
where(out('develops').out('imports').has('name','Blueprints')).
  out('works_for').in('works_for').
  where(out('develops').out('imports').has('name','Blueprints'))     // point #3 below
The above expressions have the following meanings:
1. Coworker centrality.
2. ACME Corporation coworker centrality.
3. Centrality among coworkers who develop software that imports Blueprints.
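To see how each traversal yields its own ranking, the following Python sketch mimics g.V().repeat(step.groupCount('m')).times(10) on a hypothetical toy graph. Traversers are bulked into counts, and each iteration pushes every traverser through step. The tally of arrivals plays the role of the map m. The three step functions mirror the three expressions above. One simplifying assumption is that vertex ids double as the name property. As in Gremlin, out('works_for').in('works_for') returns a person as one of their own coworkers.

```python
from collections import Counter
from collections.abc import Callable, Iterable

# Hypothetical toy graph as (out_vertex, label, in_vertex) triples.
# Vertex ids double as the 'name' property to keep the sketch short.
EDGES = [
    ("alice", "works_for", "ACME"),
    ("bob", "works_for", "ACME"),
    ("carol", "works_for", "ACME"),
    ("carol", "works_for", "Initech"),
    ("dave", "works_for", "Initech"),
    ("alice", "develops", "proj1"),
    ("proj1", "imports", "Blueprints"),
    ("carol", "develops", "proj2"),
    ("proj2", "imports", "Blueprints"),
]
VERTICES = sorted({o for o, _, _ in EDGES} | {i for _, _, i in EDGES})


def out(v: str, label: str) -> list[str]:
    return [i for o, lbl, i in EDGES if o == v and lbl == label]


def in_(v: str, label: str) -> list[str]:  # 'in' is a Python keyword
    return [o for o, lbl, i in EDGES if i == v and lbl == label]


def coworkers(v: str) -> list[str]:
    # out('works_for').in('works_for')
    return [p for c in out(v, "works_for") for p in in_(c, "works_for")]


def acme_coworkers(v: str) -> list[str]:
    # out('works_for').has('name','ACME').in('works_for')
    return [p for c in out(v, "works_for") if c == "ACME"
            for p in in_(c, "works_for")]


def uses_blueprints(v: str) -> bool:
    # where(out('develops').out('imports').has('name','Blueprints'))
    return any(s2 == "Blueprints"
               for s in out(v, "develops") for s2 in out(s, "imports"))


def blueprints_coworkers(v: str) -> list[str]:
    # where(...).out('works_for').in('works_for').where(...)
    if not uses_blueprints(v):
        return []
    return [p for p in coworkers(v) if uses_blueprints(p)]


def group_count(start: Iterable[str], step: Callable[[str], Iterable[str]],
                times: int = 10) -> Counter:
    """Bulked simulation of repeat(step.groupCount('m')).times(times)."""
    traversers = Counter(start)  # one traverser per start vertex
    m: Counter = Counter()
    for _ in range(times):
        nxt: Counter = Counter()
        for v, bulk in traversers.items():
            for w in step(v):
                nxt[w] += bulk  # every path to w carries the source's bulk
        m.update(nxt)  # Counter.update adds counts rather than replacing them
        traversers = nxt
    return m


for step in (coworkers, acme_coworkers, blueprints_coworkers):
    print(step.__name__, group_count(VERTICES, step).most_common(3))
```

On this toy data, plain coworker centrality ranks carol highest because she works for both companies. The ACME-only traversal gives alice, bob, and carol equal weight and never reaches dave. The Blueprints traversal only ever visits alice and carol.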
There are numerous graphs within the graph. As such, "what do you mean by centrality?"
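The "graphs within the graph" can also be written as matrices. The notation here is introduced for illustration and is not from the original post. Restrict attention to person vertices and let W be the person-by-company works_for matrix. Let e_ACME be the indicator vector of the ACME vertex, and let D_B be the diagonal 0/1 matrix marking people who develop software that imports Blueprints. Each of the three expressions above then induces its own adjacency matrix A. The repeat/groupCount loop accumulates traverser counts m, starting from the all-ones vector 1:

```latex
\begin{aligned}
A_{\text{coworker}}   &= W W^{\top} \\
A_{\text{ACME}}       &= W\,\mathbf{e}_{\text{ACME}}\,\mathbf{e}_{\text{ACME}}^{\top} W^{\top} \\
A_{\text{Blueprints}} &= D_B\, W W^{\top} D_B \\
\mathbf{m}            &= \sum_{k=1}^{10} \left(A^{\top}\right)^{k} \mathbf{1}
\end{aligned}
```

Under the usual power-method conditions (for instance, a unique dominant eigenvalue), the direction of (A^T)^k 1 approaches the principal eigenvector of A^T as k grows. Each derived matrix therefore carries its own eigenvector centrality.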
These ideas are explored in more detail in the following article and slideshow.
Rodriguez, M.A., Shinavier, J., “Exposing Multi-Relational Networks to Single-Relational Network Analysis Algorithms,” Journal of Informetrics, 4(1), pp. 29-41, Elsevier, doi:10.1016/j.joi.2009.06.004, 2009.