Scalable Graph Computing: Der Gekrümmte Graph
Scalable Graph Computing is a multi-part treatment on that for which the series is named.
A graph describes which “things” are adjacent to which other “things.” A logical graph forms a topology separate from the topology of the physical, material world. When computing, the logical structure must be placed in 3D, geometric space. A scalable graph data structure demonstrates a non-random morphism between logical and physical adjacency. This post discusses scalable graph data structures from the following perspective: It is important to co-locate data in space according to the metric of co-retrieval in time. Respecting space/time is the difference between sequential and random memory access. It is the difference between scalable and unscalable (graph) computing.
X tends to be retrieved with
Y should reside side-by-side. If groceries and then the laundromat are on the todo list then to save time, it is best that they are near one another. With respect to the memory hierarchy,
Y should be in the same cloud region, same data center, the same server rack, the same machine, on the same page of disk, in the same block of memory, together in L* cache, and — to the limit of identity — in the CPU’s registers. It is ludicrously slower to move data across the Internet to a CPU than to move that same data from the CPU’s L1 cache. The difference is ~50 milliseconds versus ~1 nanosecond, where 50 milliseconds is 50,000,000 nanoseconds. In human terms, suppose 1 second is a nanosecond, then 50 million seconds is 1.58 years. If
X is on the CPU and
Y is in Connecticut, then a relative eternity must unfold while
Y travels its way to
X‘s side for comparison/computation (a wait state). Fortunately, real-world computer systems move data in “chunks” and scalable systems use this phenomena to their advantage.
All the bits in a word are necessary to define the word:
?001 could be
0001 (1) or
1001 (9). When one bit of a word is accessed, all
bits must be accessed else, the meaning of the word is unclear —
?at could be
hat. Local machine buses (bit co-retrieval systems) move a word’s bits together. Bus-inspired co-retrieval exists everywhere. When data is fetched from disk, pages are accessed (i.e. sequential blocks of data). If the page of data with
X on it is requested from the OS by the executing program, then
Y should have also been on that page, else another trip to disk must be made (seek time is ~5ms). When a human is interested in a webpage, the entire document is downloaded as it is assumed that the sentence after the first will most likely be required next. Similar assumptions are made for streaming audio and video. Data moves towards the CPU from one memory structure to the next in proportion to the size of that memory structure. With any luck (aka a cache hit), the next piece of data required from the CPU will be nearby — not in memory, the disk, nor in Windham, Connecticut.
Localized Computing and Random Access Structures
f(x,y) brings two pieces of data together to yield more data. In laymen physics, atoms are data and physical laws are the functions, where the “CPU” is everywhere. An interesting aspect of man-driven computing is that logical adjacency (a graph) does not imply spatial adjacency. Software-based computing, as we know it, is about stitching together “random” areas of memory (space) so that two logically adjacent data can be pulled together for comparison at a spatially localized CPU. Random access computing is both a curse and a blessing. The curse is that physics will not do our job for us — the created graph, while embedded in space, requires a machine that is not physics to evolve it (i.e. a CPU — however see molecular computing). The blessing is that new “things” (new definitions of adjacency) emerge out of 3D reality. In the emergent referential structure, a “thing” can be a “thing” without its parts 3D-near each other. Unfortunately, this level of abstraction comes at the cost of time. For speed, the best the clever engineer can do is try their damnedest to make 3D reality and their data structure isomorphic in the literal sense — the same structure. However, as we believe it,
[P]rogramming is basically planning and detailing the enormous traffic of words through the Von Neumann bottleneck, and much of that traffic concerns not significant data itself, but where to find it.
— John Backus (1977)
Three-dimensional space, the things within it, and the physics that evolves it are not expressive enough for the likes of mind — for man has eaten from the tree of knowledge. No longer within the grace of a 1-to-1 correspondence, neurons go about constructing new maps of space/time that unite disparate areas via the construction of concepts in the hopes of realization (i.e. “full comprehension of the situation”). When the brain runs out of storage and compute resources, it goes about its way linking concepts in a computer using data structures (graphs). When a single computer is no longer sufficient, clusters of computers are leveraged. As larger graphs and more complex problems are met, graph algorithms must become more advanced (
O(n1/c)) and/or the graph data structures must be represented more efficiently according to the constraints of modern hardware (see the Von Neumann bottleneck which is also a constraint of mind). In the end, both CPUs and brains are not efficient architectures as the data and the process are not one in the same and the data must ultimately be moved to a single point in space (e.g. a CPU’s ALU, which implements the function
f(x,y), and the world must go through the senses to the brain).
Co-Locating the Structures of a Graph
The structures of a property graph are key/value properties, vertices, edges, incidences, adjacencies, cliques, clusters, super clusters, and graph filaments. Unfortunately, the entire world can not exist on the head of a pin (the CPU’s registers). While a graph is ultimately bits/bytes/words, the aforementioned larger structures hint at the probability of co-retrieval and allow the clever engineer to use that information to place the structures in space.
When a vertex is accessed, its long identifier and key/values properties are typically accessed as well. If a vertex’s incident liked-edges are accessed according to their stars-property, then it is important to have those edges sequentially ordered by 5 star, then 4 star, then 3, and so on with an offset index to where each group can be found (see vertex-centric indices). One of the major benefits of the graph data structure is that adjacency is made explicit and is not up to arbitrary joins. When a vertex is accessed, its adjacent neighbors are known and those adjacents are more likely to be accessed next than a random distant neighbor. Thus, the vertices in a clique should reside side-by-side. Given the size of a clique (potentially 10s of vertices), they should be on the same page of disk. Loose cliques form clusters — e.g., students at a university are more likely to interact with each other than with students in different universities. Partitioning the vertices across physical machine servers according to their university affiliation may respect the principle of co-retrieval/co-location (see Educating the Planet with Pearson). Clusters form super-clusters — universities in the same region of the world are more likely to interact. However, at this scale, other dimensions of the graph (i.e. paths between vertices) could be asserting their effects on co-retrieval. The larger the scale, the harder it is to predict which bit will follow next. Below is a summary of the components of a graph and their co-location in the memory hierarchy according the perspective of a single vertex (where “perspective” is a process at the CPU).
structure location description id register A long value uniquely identifying a vertex/edge property L1 cache A string key and a primitive value incidents L2/L3 cache A vertex's incoming and outgoing edges adjacents Block of RAM The vertices incoming and outgoing to a vertex clique Page on Disk A group of collectively adjacent vertices cluster Server A group of vertices with high intra-connectivity and low inter-connectivity super cluster Server Rack A group of clusters with high intra-connectivity and low inter-connectivity filament Data Center A massive lattice-like structure connecting super clusters
The larger the structure, the slower it evolves as large structures are epiphenomena of their more atomic pieces. While the atoms of computing (bits/words) are the structures processed by the CPU, these mutations at small take time to effect changes at large — nanoseconds, seconds, days, years. The evolution of local neighborhoods in the graph percolate structural changes throughout the system. Thus, optimizing the layout of a universal graph across all data centers in the world is not only NP hard (given the possible number of permutations), but it is also a constantly evolving problem. The principle of divide-and-conquer is used today — optimize for key/values, incident edges, clusters and with any “luck” (a cache hit), the next piece of data is near a CPU.
The concept of computing is very general and can be implemented in a multitude of ways. In the everyday real-world, the Von Neumann machine architecture is the standard. These machines are composed of a CPU, memory, and a disk. They are networked together using high-speed interconnects within the same server rack. Server racks are stored in data centers. Data centers exist in regions across the globe. As graph structures become larger, so does the computing resources required to represent and process them. However, any one micro-structure of the universal graph is ultimately operated on by a particular CPU in the cluster. The collective computation of all CPUs yields the evolution of man’s (re)definition of reality contorted in space.
Dr. Matthias Bröcheler is an advocate of algorithm and data structure designs that are sympathetic to hardware — tailored to the memory hierarchy. Discussions on such matters fostered the writing of this post.