- Hackers Gain ‘Full Control’ of Critical SCADA Systems (IT News) — The vulnerabilities were discovered by Russian researchers who over the last year probed popular and high-end ICS and supervisory control and data acquisition (SCADA) systems used to control everything from home solar panel installations to critical national infrastructure. More on the Botnet of Things.
- mcl — Markov Cluster Algorithm, a fast and scalable unsupervised cluster algorithm for graphs (also known as networks) based on simulation of (stochastic) flow in graphs.
- Facebook to Launch Flipboard-like Reader (Recode) — what I’d actually like to see is Facebook join the open web by producing and consuming RSS/Atom/anything feeds, but that’s a long shot. I fear it’ll either limit you to whatever circle-jerk-of-prosperity paywall-penetrating content-for-advertising-eyeballs trades the Facebook execs have made, or else it’ll be a leech on the scrotum of the open web by consuming RSS without producing it. I’m all out of respect for empire-builders who think you’re a fool if you value the open web. AOL might have died, but its vision of content kings running the network is alive and well in the hands of Facebook and Google. I’ll gladly post about the actual product launch if it is neither partnership eyeball-abuse nor parasitism.
- Map Projections Illustrated with a Face (Flowing Data) — really neat, wish I’d had these when I was getting my head around map projections.
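For readers curious about the Markov Cluster Algorithm item above: the flow simulation alternates two steps on a column-stochastic matrix, expansion (squaring the matrix, which spreads flow along longer paths) and inflation (raising each entry to a power and renormalizing, which strengthens strong flows and starves weak ones). The following is a minimal pure-Python sketch of that idea, not the mcl package's own implementation; the function names, the inflation value of 2.0, the fixed iteration count, and the toy graph are all assumptions made for illustration.

```python
def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        if total > 0:
            for i in range(n):
                out[i][j] = m[i][j] / total
    return out


def expand(m):
    """Expansion step: square the matrix (two steps of the random walk)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, power):
    """Inflation step: elementwise power, then renormalize the columns."""
    return normalize_columns([[x ** power for x in row] for row in m])


def mcl(adjacency, inflation=2.0, iterations=50, eps=1e-6):
    """Cluster an undirected graph given as a 0/1 adjacency matrix."""
    n = len(adjacency)
    # Self-loops are added so that every node keeps some of its own flow.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Each row that still holds flow lists the members of one cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > eps)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Hypothetical toy graph: two triangles (0-1-2 and 3-4-5) joined by edge 2-3.
bridged = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(mcl(bridged))
```

A fixed iteration count keeps the sketch short; a fuller version would stop once the matrix stops changing, and would use sparse matrices to reach the scale the mcl item describes.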
ENTRIES TAGGED "visualizations"
Cloudera ventures into real-time queries with Impala, data centers are the new landfill, and Jesper Andersen looks at the relationship between art and data.
Here are a few stories from the data space that caught my attention this week.
Cloudera’s Impala takes Hadoop queries into real-time
Cloudera ventured into real-time Hadoop querying this week, opening up its Impala software platform. As Derrick Harris reports at GigaOm, Impala, an SQL query engine, doesn’t rely on MapReduce, making it faster than tools such as Hive. Cloudera estimates its queries run 10 times faster than Hive, and Charles Zedlewski, Cloudera’s VP of products, told Harris that “small queries can run in less than a second.”
Harris notes that Zedlewski pointed out that Impala wasn’t designed to replace business intelligence (BI) tools, and that “Cloudera isn’t interested in selling BI or other analytic applications.” Rather, Impala serves as the execution engine, still relying on software from Cloudera partners — Zedlewski told Harris, “We’re sticking to our knitting as a platform vendor.”
Joab Jackson at PC World reports that “[e]ventually, Impala will be the basis of a Cloudera commercial offering, called the Cloudera Enterprise RTQ (Real-Time Query), though the company has not specified a release date.”
Impala has plenty of competition on this playing field, which Harris also covers, and he notes the significance of all the recent Hadoop innovation:
“I can’t underscore enough how critical all of this innovation is for Hadoop, which in order to add substance to its unparalleled hype needed to become far more useful to far more users. But the sudden shift from Hadoop as a batch-processing engine built on MapReduce into an ad hoc SQL querying engine might leave industry analysts and even Hadoop users scratching their heads.”
You can read more from Harris’ piece here and Jackson’s piece here. Wired also has an interesting piece on Impala, covering the Google F1 database upon which it is based and the Googler Cloudera hired away to help build it.
(Cloudera CEO Mike Olson discussed Impala, Hadoop and the importance of real-time at this week’s Strata Conference + Hadoop World.)
Bitsy Bentley on the work behind a good visualization and why she hopes users will take data interactions for granted.
Because of the size, complexity and density of big data, it’s not always easy to find the important insights hiding in all that information. That’s where data visualization comes into play. A great visualization creates meaning where none existed.
Bitsy Bentley (@bitsybot) is the director of data visualization at GfK Custom Research, where she works with information designers to craft meaningful data experiences for a variety of business audiences. In the following interview, she discusses the space between a “wow” response and an “aha” moment, how her team addresses privacy concerns, and why practice is vital for both visualization creators and viewers.
Bentley will explore related visualization topics during her presentation at Strata Conference + Hadoop World in New York City later this month.
Why are data visualizations an effective way to understand the underlying data?
Bitsy Bentley: There is so much beauty and richness in big datasets, and now that we have enough processing power to harness that richness, it’s little wonder that interest in data visualization is exploding. To quote John Tukey: “The greatest value of a picture is when it forces us to notice what we never expected to see.” My clients find that, whether they’re more concerned with numbers or more concerned with stories, an appropriate visual is integral to their understanding of the data.
Visualization unlocks the serendipity of data analysis. It provides a language that is less intimidating than an overwhelming array of digits. Something as simple as a set of histograms breaking down the distribution of a data store makes it easy to find irregularities and outliers in the data.
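Bentley's histogram point can be illustrated with a small sketch. The snippet below bins a handful of integer values and draws a text histogram, so that a lone far-off bin stands out at a glance. The sample data, the bin width, and the function names are hypothetical and chosen only for illustration; they do not come from the interview.

```python
from collections import Counter


def histogram(values, bin_width):
    """Count integer values per bin; keys are the lower edge of each bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def sparse_bins(hist, max_count=1):
    """Return the bins holding at most max_count values (outlier candidates)."""
    return [start for start, count in hist.items() if count <= max_count]


def render(hist, bin_width):
    """Draw one row of '#' marks per non-empty bin."""
    rows = []
    for start, count in hist.items():
        label = f"{start}-{start + bin_width - 1}"
        rows.append(f"{label:>7} {'#' * count}")
    return "\n".join(rows)


# Hypothetical measurements: nine typical values and one irregular reading.
sample = [12, 15, 11, 14, 13, 18, 16, 12, 95, 14]
hist = histogram(sample, bin_width=10)
print(render(hist, bin_width=10))
print("bins worth a closer look:", sparse_bins(hist))
```

Empty bins are skipped here to keep the output short; a plotting library would normally show them, which makes the gap between the main cluster and the outlier even more visible.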
Hacking a Texas city, RIP Michael S. Hart, and the bar is raised for open gov visualizations.
This week on O'Reilly: Christopher Groskopf explained how he's going to hack a Texas city, Nat Torkington said goodbye to Project Gutenberg founder Michael S. Hart, and the value of government data visualizations reached a new standard thanks to LookatCook.com.
Data artist Jer Thorp on working at the New York Times and the aesthetics of data.
Jer Thorp, data artist in residence at the New York Times, sits at the crossroads of data, art and science. Here he discusses his work at the Times and, more broadly, how aesthetics shape our understanding of data.
Visualization Papers, Immersive Learning, Readability, and Quora's Technology
- Seven Foundational Visualization Papers — seven classics in the field that are cited and useful again and again.
- Git Immersion — a “walking tour” of Git inspired by the premise that to know a thing is to do it. Cf. Learn Python the Hard Way or even NASA’s Planet Makeover. We’ll see more and more tutorials that require participation because you don’t get muscle memory by reading. (NASA link via BoingBoing)
- Readability — strips out ads and sends money to the publishers you like. I’d never thought of a business model as something that’s imposed from the outside quite like this, but there you go.
- Quora’s Technology Examined (Phil Whelan) — In this blog post I will delve into the snippets of information available on Quora and look at Quora from a technical perspective. What technical decisions have they made? What does their architecture look like? What languages and frameworks do they use? How do they make that search bar respond so quickly? Lots of Python. (via Joshua Schachter on Delicious)
New Browser, Google APIs, NFC Checkin, and XSS Prevention
- Mozilla Home Dash — love this experiment in rethinking the browser from Mozilla. They call it a “browse-based browser” as opposed to “search-based browser” (hello, Chrome). Made me realize that, with Chrome, Google’s achieved a 0-click interface to search: you search without meaning to as you type in URLs, and you see advertising results without ever having visited a web site.
- Periodic Table of Google APIs — cute graphic, part of a large push from Google to hire more outreach engineers to do evangelism, etc. The first visible signs of Google’s hiring binge.
- NFC in the Real World (Dan Hill) — smooth airline checkin with fobs mailed to frequent fliers.
- XSS Prevention Cheat Sheet (OWASP) — HTML entity encoding doesn’t work if you’re putting untrusted data inside a script tag anywhere, or an event handler attribute like onmouseover, or inside CSS, or in a URL. So even if you use an HTML entity encoding method everywhere, you are still most likely vulnerable to XSS. You MUST use the escape syntax for the part of the HTML document you’re putting untrusted data into. That’s what the rules below are all about. (via Hacker News)
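The OWASP point about matching the escape syntax to the part of the document can be sketched with one encoder per context. This is a minimal illustration using only the Python standard library, not a substitute for the cheat sheet's rules or for a vetted templating library; the function names are hypothetical, and CSS contexts are left out.

```python
import html
import json
from urllib.parse import quote


def escape_html_body(text):
    """For untrusted text placed between tags, e.g. <p>HERE</p>."""
    return html.escape(text, quote=False)


def escape_html_attribute(text):
    """For untrusted text inside a quoted attribute value."""
    return html.escape(text, quote=True)


def escape_js_string(text):
    """For untrusted text emitted as a string literal inside a script tag.

    json.dumps produces a quoted, backslash-escaped literal. The extra
    replacements keep sequences such as </script> out of the page source.
    """
    literal = json.dumps(text)
    return (literal.replace("<", "\\u003c")
                   .replace(">", "\\u003e")
                   .replace("&", "\\u0026"))


def escape_url_param(text):
    """For untrusted text used as a single URL query parameter value."""
    return quote(text, safe="")


payload = "</script><script>alert(1)</script>"
print(escape_html_body(payload))
print(escape_js_string(payload))
print(escape_url_param(payload))
```

The same payload comes out differently from each function, which is the cheat sheet's argument in miniature: HTML entity encoding is only correct for the HTML body and attribute contexts, and each other context needs its own syntax.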
LinkedIn's Pete Skomoroch on the key capabilities of data scientists.
In this brief video interview, LinkedIn senior research scientist Pete Skomoroch reveals the three core skills of data scientists.