- Mining Billion Node Graphs: Patterns and Scalable Algorithms (PDF) — slides from a CMU academic’s talk at C-BIG 2012.
- There Is No Now — One of the most important results in the theory of distributed systems is an impossibility result, showing one of the limits of the ability to build systems that work in a world where things can fail. This is generally referred to as the FLP result, named for its authors, Fischer, Lynch, and Paterson. Their work, which won the 2001 Dijkstra Prize for the most influential paper in distributed computing, showed conclusively that some computational problems that are achievable in a “synchronous” model in which hosts have identical or shared clocks are impossible under a weaker, asynchronous system model.
- Deep Learning Hardware Guide — One of the worst things you can do when building a deep learning system is to waste money on hardware that is unnecessary. Here I will guide you step by step through the hardware you will need for a cheap high performance system.
- Awesome Computer Vision — curated list of computer vision resources.
"graph databases" entries
Find emergent properties and solutions to new computing problems with graphs
Graph databases haven’t made the news much because, I think, they don’t fit in convenient categories. They certainly aren’t the relational databases we’re all familiar with, nor are they the arbitrary keys and values provided by many NoSQL stores. But in a highly connected world–where it’s not what you know but whom you know–it makes intuitive sense to arrange our knowledge as nodes and edges.
Ted Nelson, inventor of the hyperlink, recognized the power of viewing life in graphs. After the implosion of his historic Xanadu project, he embarked on a graph database tool called ZigZag. The most modern instantiations of graphs–the Neo4j store and the Alchemy.js tool for interactively visualizing graphs–were well represented this year at O’Reilly’s Open Source convention.
Business users are becoming more comfortable with graph analytics.
The rise of sensors and connected devices will lead to applications that draw from network/graph data management and analytics. As the number of devices surpasses the number of people — Cisco estimates 50 billion connected devices by 2020 — one can imagine applications that depend on data stored in graphs with many more nodes and edges than the ones currently maintained by social media companies.
This means that researchers and companies will need to produce real-time tools and techniques that scale to much larger graphs (measured in terms of nodes & edges). I previously listed tools for tapping into graph data, and I continue to track improvements in accessibility, scalability, and performance. For example, at the just-concluded Spark Summit, it was apparent that GraphX remains a high-priority project within the Spark1 ecosystem.
Data stores are rolling out easy-to-use analysis tools
Originated by the NSA, Apache Accumulo is a BigTable inspired data store known for being highly scalable and for its interesting security model. Federal agencies and Defense contractors have deployed Accumulo on clusters of a thousand or more servers. It also uses “cell-level” security to control access to values stored in individual cells1.
What Accumulo was lacking were easy-to-use, standard analytic engines that allow users to interact with data. The release of Sqrrl Enterprise this past week fills that gap. Sqrrl Enterprise provides an initial set of analytic engines for the Accumulo ecosystem2. It includes support for interactive SQL, fulltext search, and queries over graph data. Each of these engines takes into account security labels placed on data: since every data object ingested into Sqrrl has a security label, (query & analytic) results incorporate those access levels. Analysts interact with data as they normally would. For example Sqrrl’s indexing technology accounts for security labels, and search queries are written in standard Lucene syntax. Reminiscent of the Phoenix project for HBase3, SQL queries4 in Sqrrl are converted into optimized Accumulo iterators.