- 50 Years of Data Science (PDF) — Because all of science itself will soon become data that can be mined, the imminent revolution in Data Science is not about mere “scaling up,” but instead the emergence of scientific studies of data analysis science-wide.
- badwolf — a temporal graph store from Google.
- Why Biomedical Superstars are Signing on with Google (Nature) — “To go all the way from foundational first principles to execution of vision was the initial draw, and that’s what has continued to keep me here.” Research to retail, at Google scale.
- VR Basics — intro to terminology and hardware in the next gen of hardware, in case you’re late to the goldrush^w exciting field.
"graph databases" entries
Identifying the key requirements of a web application cloud architecture.
Download a free copy of “Azure for Developers,” an O’Reilly report by experienced .NET developer John Adams that breaks down Microsoft’s Azure platform in plain language, so that you can quickly get up to speed.
One of the most natural uses of the cloud is for web applications. You may already be using virtual machines on your own systems to make deploying your applications easier, either to new hardware or to additional servers. Microsoft Azure uses virtualization too, but it also brings useful benefits that virtualization cannot deliver alone. By hosting your application in the cloud, you can leverage automatic scaling, load balancing, system health monitoring, and logging. You also benefit from the fact that managed cloud platforms help narrow the attack surface of your system by automatically patching the operating system and runtimes and by keeping systems sandboxed. Let’s look at some examples of how to build some common web applications inside of Microsoft Azure.
Imagine that you work for a retailer who generates a significant amount of revenue through online sales. Imagine also that this retailer has been around for long enough that it already has an established web architecture that runs in a private data center. This retailer has decided that it wants to move to a hosted platform so that it no longer has any data center responsibilities and it can focus on its core business. How do you replatform this web application into Microsoft Azure? Let’s first identify some requirements for this system:
- It has high utilization and needs to serve a large number of concurrent users without timing out, even during peak hours such as Black Friday sales.
- It needs to accommodate a wide variety of products in its database that do not necessarily all follow the same schema.
- It needs a fast and intelligent search bar so that customers can find products easily.
- It needs to be able to recommend products to customers as they shop to help generate additional revenue.
However these requirements are being met today in the private data center, I can suggest some guidelines on how to reproduce this system in Microsoft Azure so you can boost performance instead of just replicating it. I will take each of these requirements in order and explain how to leverage certain Azure components so that these requirements are properly met.
Find emergent properties and solutions to new computing problems with graphs
Graph databases haven’t made the news much because, I think, they don’t fit in convenient categories. They certainly aren’t the relational databases we’re all familiar with, nor are they the arbitrary keys and values provided by many NoSQL stores. But in a highly connected world–where it’s not what you know but whom you know–it makes intuitive sense to arrange our knowledge as nodes and edges.
Ted Nelson, inventor of the hyperlink, recognized the power of viewing life in graphs. After the implosion of his historic Xanadu project, he embarked on a graph database tool called ZigZag. The most modern instantiations of graphs–the Neo4j store and the Alchemy.js tool for interactively visualizing graphs–were well represented this year at O’Reilly’s Open Source convention.
Business users are becoming more comfortable with graph analytics.
The rise of sensors and connected devices will lead to applications that draw from network/graph data management and analytics. As the number of devices surpasses the number of people — Cisco estimates 50 billion connected devices by 2020 — one can imagine applications that depend on data stored in graphs with many more nodes and edges than the ones currently maintained by social media companies.
This means that researchers and companies will need to produce real-time tools and techniques that scale to much larger graphs (measured in terms of nodes & edges). I previously listed tools for tapping into graph data, and I continue to track improvements in accessibility, scalability, and performance. For example, at the just-concluded Spark Summit, it was apparent that GraphX remains a high-priority project within the Spark1 ecosystem.
Data stores are rolling out easy-to-use analysis tools
Originated by the NSA, Apache Accumulo is a BigTable inspired data store known for being highly scalable and for its interesting security model. Federal agencies and Defense contractors have deployed Accumulo on clusters of a thousand or more servers. It also uses “cell-level” security to control access to values stored in individual cells1.
What Accumulo was lacking were easy-to-use, standard analytic engines that allow users to interact with data. The release of Sqrrl Enterprise this past week fills that gap. Sqrrl Enterprise provides an initial set of analytic engines for the Accumulo ecosystem2. It includes support for interactive SQL, fulltext search, and queries over graph data. Each of these engines takes into account security labels placed on data: since every data object ingested into Sqrrl has a security label, (query & analytic) results incorporate those access levels. Analysts interact with data as they normally would. For example Sqrrl’s indexing technology accounts for security labels, and search queries are written in standard Lucene syntax. Reminiscent of the Phoenix project for HBase3, SQL queries4 in Sqrrl are converted into optimized Accumulo iterators.