Graph databases haven’t made the news much because, I think, they don’t fit in convenient categories. They certainly aren’t the relational databases we’re all familiar with, nor are they the arbitrary keys and values provided by many NoSQL stores. But in a highly connected world–where it’s not what you know but whom you know–it makes intuitive sense to arrange our knowledge as nodes and edges.
Ted Nelson, the hypertext pioneer, recognized the power of viewing life in graphs. After the implosion of his historic Xanadu project, he embarked on a graph database tool called ZigZag. The most modern instantiations of graphs–the Neo4j store and the Alchemy.js tool for interactively visualizing graphs–were well represented this year at OSCON, O’Reilly’s Open Source Convention.
Both projects are growing impressively and finding users in domains they didn’t expect. According to Huston Hedinger, Founder and CEO of Portland startup GraphAlchemist, “Graph visualization can provide insight into information ranging from how communication flows across a social network to where vulnerabilities exist in a supply chain network.”
Some unusual domains where results pay off
Epidemiologists use graphs to trace or predict the flow of infections through populations. Similar techniques can be used by network administrators to determine which computer systems are most likely to be compromised by an intruder, a use case documented in the O’Reilly book Network Security Through Data Analysis. In Byron Ruth’s presentation on ETL (extract, transform, load) in a health setting, Neo4j was used to store provenance data (sources and transformations applied to data).
Graphs are natural representations for the relationships encoded in web sites through RDF, as well as the more formal and intimidating Semantic Web. The RDF uses are echoed in a proposed Open Graph protocol to provide more meaningful links between web pages than the familiar <a> tags provide.
Social networks use graphs. That’s how we get recommendations to add people whom the networks think we know. (When are the social networks going to realize that it is not in my best interest, or ultimately theirs, for me to make every person I know a “friend”?) And LinkedIn can tell me exactly how many degrees separate me from another member–obviously the application of a graph.
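To make the "degrees of separation" idea concrete, here is a minimal sketch of how such a number could be computed with a breadth-first search. It is not a description of how LinkedIn or any other service does it; the function name, the `connections` dictionary, and the people in it are all invented for illustration.

```python
from collections import deque


def degrees_of_separation(graph, start, target):
    """Return the number of edges on a shortest path, or None if unreachable.

    `graph` maps each person to the set of people they are connected to.
    """
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, distance = queue.popleft()
        for neighbor in graph.get(person, ()):
            if neighbor == target:
                return distance + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None


# Hypothetical toy network (undirected, so every link is listed both ways).
connections = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "dan"},
    "cat": {"ann"},
    "dan": {"bob", "eve"},
    "eve": {"dan"},
    "zed": set(),  # knows nobody in this network
}

print(degrees_of_separation(connections, "ann", "eve"))
print(degrees_of_separation(connections, "ann", "zed"))
```

Breadth-first search visits people in order of distance from the starting member, so the first time it reaches the target it has found a shortest chain of acquaintances.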
Improving user experience
Why don’t the social networks go all the way and let us directly view our own graphs? It would be neat to find out how many people I know in common with my children or with the person I worked with ten years ago.
Perhaps the incumbent social networking services don’t believe we could understand the graphs or use them without frustration. According to Neo4j’s Product Designer, Andreas Kollegger, they would have a good point. He says that graph databases have not learned the means to expose the deep information they hold–that they won’t catch on until their advanced features are as easy to use as spreadsheets.
Alchemy.js is a good start. It’s a graph drawing engine built on D3.js by Portland startup GraphAlchemist. You can ask it to color nodes and edges by attribute, and to cluster related nodes.
Graph architecture is simple–just two nodes connected by an edge, over and over and over again–but the fascination of graphs comes from emergent properties such as connectedness and how easily parts of the database could be severed by removing nodes. All of these properties are calculated from the nodes and edges, but they belong to the graph as a whole rather than to any single node or edge.
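As a rough illustration of one such emergent property, the sketch below finds the nodes whose removal would sever a graph into more pieces. It uses a deliberately simple brute-force approach (remove each node in turn and recount the connected components) rather than the faster algorithms a real graph database would likely use. The function names and the toy graph are my own assumptions.

```python
def count_components(graph, skip=None):
    """Count connected components, optionally pretending `skip` is removed."""
    seen = set()
    components = 0
    for node in graph:
        if node == skip or node in seen:
            continue
        components += 1
        stack = [node]
        seen.add(node)
        while stack:  # depth-first walk of one component
            current = stack.pop()
            for neighbor in graph[current]:
                if neighbor != skip and neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
    return components


def cut_nodes(graph):
    """Return the nodes whose removal increases the number of components."""
    base = count_components(graph)
    return sorted(n for n in graph if count_components(graph, skip=n) > base)


# Hypothetical graph: two triangles (a-b-c and d-e-f) joined by the edge c-d.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e", "f"},
    "e": {"d", "f"},
    "f": {"d", "e"},
}

print(count_components(toy_graph))
print(cut_nodes(toy_graph))
```

Nothing stored on node c or node d says "I hold the graph together"; that fact only appears when the structure is examined as a whole.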
For balance, on Tuesday evening I attended the PostgreSQL birds-of-a-feather session. Leaders presented glowing visions of new features in the upcoming version (9.4) along with more cautious assessments of complex security enhancements. One new feature will eliminate the need to outsource work to a programming language, while another will take the place of NoSQL competition. By the time the presenters were done, I was convinced PostgreSQL 9.4 is so powerful that if you run it you won’t even need a computer.
Like graph databases, advanced relational databases such as PostgreSQL create higher-level metadata that can be retrieved directly from the database and that reveals hidden characteristics of the data.
Kollegger says, “An RDBMS and a graph database are equally expressive, allowing you to capture the same kinds of information, but allow you to ask different kinds of questions. Taking OSCON as a data set, you’d ask PostgreSQL questions like ‘How many people attended’ or ‘What was the most popular session’, then turn to a graph database to ask questions like ‘Did people like me go to OSCON?’ and ‘What was the most popular sequence of sessions?’”
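The difference between the two kinds of questions can be sketched in a few lines of plain Python. This is only an illustration and uses neither PostgreSQL nor Neo4j; the attendee names and schedules are invented, and the variable names are assumptions of mine.

```python
from collections import Counter

# Hypothetical data: the sessions each attendee went to, in order.
schedules = {
    "ann": ["keynote", "graphs", "postgres"],
    "bob": ["keynote", "graphs", "d3"],
    "cat": ["keynote", "graphs"],
    "dan": ["graphs", "postgres"],
}

# Aggregate questions, the kind a COUNT or GROUP BY would answer.
attendee_count = len(schedules)
session_counts = Counter(
    session for sessions in schedules.values() for session in sessions
)
most_popular_session = session_counts.most_common(1)[0][0]

# Path question: treat each consecutive pair of sessions as a directed edge
# and ask which edge was traveled most often.
transitions = Counter(
    pair
    for sessions in schedules.values()
    for pair in zip(sessions, sessions[1:])
)
most_popular_sequence = transitions.most_common(1)[0][0]

print(attendee_count, most_popular_session, most_popular_sequence)
```

The first two answers come from counting rows. The third comes from following edges, which is where a graph model starts to feel natural.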
What would be great, in my view, would be a way for a graph database to provide key statistics such as connectedness (for instance, whether everybody tends to know everybody else in the database) and for Alchemy.js to display such characteristics.
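One candidate for such a statistic is graph density: the fraction of all possible connections that actually exist. The sketch below is a minimal version that assumes an undirected graph with no self-loops, stored as a dictionary of neighbor sets. It does not reflect any existing Neo4j or Alchemy.js feature, and the function name and sample graphs are hypothetical.

```python
def density(graph):
    """Fraction of possible undirected edges that are present (0.0 to 1.0).

    Assumes every edge is listed from both endpoints and no node links to itself.
    """
    n = len(graph)
    if n < 2:
        return 0.0
    # Each undirected edge appears twice, once in each endpoint's neighbor set.
    edge_count = sum(len(neighbors) for neighbors in graph.values()) // 2
    return edge_count / (n * (n - 1) / 2)


# Hypothetical examples: everybody knows everybody, versus a simple chain.
clique = {
    "a": {"b", "c", "d"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"a", "b", "c"},
}
chain = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c"},
}

print(density(clique))
print(density(chain))
```

A value near 1 means nearly everybody knows everybody else, and a value near 0 means the graph is sparse. A visualization tool could show a number like this alongside the drawing.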
Databases are advancing on all fronts. The mathematical rigor underlying relational databases continues to keep them appealing when you have to investigate many fixed relationships and not leave anybody out. The quite different complexity of graph databases will reveal a power of analysis that is yet to be fully explored.