"visualizations" entries

Four short links: 24 December 2015

Python Viz, Linux Scavenger Hunt, Sandbox Environment, and Car Code

by Nat Torkington | @gnat | December 24, 2015

Folium — makes it easy to visualize data that’s been manipulated in Python on an interactive Leaflet map. It enables both the binding of data to a map for choropleth visualizations as well as passing Vincent/Vega visualizations as markers on the map.
scavenger-hunt — A scavenger hunt to learn Linux commands.
SEE — F-Secure’s open source Sandboxed Execution Environment (SEE) is a framework for building test automation in secured Environments.
The Problem with Self-Driving Cars: Who Controls the Code? (Cory Doctorow) — Here’s a different way of thinking about this problem: if you wanted to design a car that intentionally murdered its driver under certain circumstances, how would you make sure that the driver never altered its programming so that they could be assured that their property would never intentionally murder them?

Graphs in the world: Modeling systems as networks

See, extract, and create value with networks.

by Russell Jurney | @rjurney | June 30, 2015


Networks of all kinds drive the modern world. You can build a network from nearly any kind of data set, which is probably why network structures characterize some aspect of most phenomena. And yet, many people can’t see the networks underlying different systems. In this post, we’re going to survey a series of networks that model different systems, to see the different ways networks help us understand the world around us.

We’ll explore how to see, extract, and create value with networks. We’ll look at four examples where I used networks to model different phenomena, starting with startup ecosystems and ending in network-driven marketing.

Networks and markets

Commerce is one person or company selling to another, which is inherently a network phenomenon. Analyzing networks in markets can help us understand how market economies operate.
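
To make the "selling is a network" idea concrete, here is a minimal sketch of turning a table of sales into a directed graph. The records, company names, and helper functions are all invented for illustration; they are not from the original post.

```python
from collections import defaultdict

# Hypothetical sales records: (seller, buyer, amount). Invented for illustration.
SALES = [
    ("acme", "bolt", 120.0),
    ("acme", "cargo", 80.0),
    ("bolt", "cargo", 45.0),
    ("cargo", "dyna", 200.0),
    ("acme", "bolt", 30.0),
]


def build_market_graph(sales):
    """Return a weighted directed graph: graph[seller][buyer] = total amount sold."""
    graph = defaultdict(lambda: defaultdict(float))
    for seller, buyer, amount in sales:
        graph[seller][buyer] += amount
    return graph


def degree_summary(graph):
    """Map each node to (distinct customers, distinct suppliers)."""
    out_degree = defaultdict(int)
    in_degree = defaultdict(int)
    for seller, buyers in graph.items():
        for buyer in buyers:
            out_degree[seller] += 1
            in_degree[buyer] += 1
    nodes = set(out_degree) | set(in_degree)
    return {node: (out_degree[node], in_degree[node]) for node in sorted(nodes)}


market = build_market_graph(SALES)
summary = degree_summary(market)
for node, (customers, suppliers) in summary.items():
    print(f"{node}: sells to {customers}, buys from {suppliers}")
```

Each row becomes an edge from seller to buyer, and repeat sales add to that edge's weight. Counting edges in and out of a node already says something about a company's position in the market.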

Strength of weak ties

Mark Granovetter famously researched job hunting and discovered the Strength of Weak Ties. Read more…

Data APIs, design, and visual storytelling

One example of how using a data API can lead to better visualizations.

by Sébastien Pierre | @ssebastien | March 23, 2015

Over the past five years, international agencies such as the World Bank, OECD, and UNESCO have created portals to make their data available for everyone to explore. Many non-profits are also visualizing masses of data in the hope that it will give policymakers, funders, and the general public a better understanding of the issues they are trying to solve.

Data visualization plays a key role in telling the stories behind the data. For most audiences, data sets are hard to use and interpret: the average user will need a technical guide just to navigate the complicated hierarchies of categories, let alone interpret the information. But data visualizations trigger interest and insight because they are immediate, clear, and tangible.

At FFunction, we visualize a lot of data. Most of the time our clients send us Excel spreadsheets or CSV files, so we were happily surprised when we started to work with UNESCO Institute for Statistics on two fascinating education-related projects — Out-of-School Children and Left Behind — and realized that they had been working on a data API. As we began to work through the data ourselves, we uncovered several reasons why using an API helps immeasurably with data visualization. Read more…

Signals from Strata + Hadoop World New York 2014

From unique data applications to factories of the future, here are key insights from Strata + Hadoop World New York 2014.

by Mac Slocum | @macslocum | October 16, 2014

Experts from across the data world came together in New York City for Strata + Hadoop World New York 2014. Below we’ve assembled notable keynotes, interviews, and insights from the event.

Unusual data applications and the correct way to say “Hadoop”

Hadoop creator and Cloudera chief architect Doug Cutting discusses surprising data applications — from dating sites to premature babies — and he reveals the proper (but in no way required) pronunciation of “Hadoop.”

Read more…

Four short links: 5 September 2014

Pragmatic Ventures?, Pictures Vanishing, Vertical Progress, and Visualising Distributed Consensus

by Nat Torkington | @gnat | September 5, 2014

Intellectual Ventures Making Things (Bloomberg) — Having earned billions in payouts from powerful technology companies, IV is setting out to build things on its own. Rather than keeping its IP under lock and key, the company is looking to see if its ideas can be turned into products and the basis for new companies. Crazy idea. Madness. Building things never works.
Twitpic Shutting Down — I guess we know what Jason Scott will be doing for the next three weeks.
Thiel’s Contrarian Strategy (Fortune) — the distinction Thiel draws between transformative, “vertical” change—going from zero to one—and incremental, “horizontal” change—going from one to n. “If you take one typewriter and build 100, you have made horizontal progress,” he explains in the book’s first chapter. “If you have a typewriter and build a word processor, you have made vertical progress.”
Raft: Understandable Distributed Consensus — making sense of something that’s useful but not intuitive. Awesome.

Four short links: 15 January 2014

SCADA Security, Graph Clustering, Facebook Flipbook, and Projections Illustrated

by Nat Torkington | @gnat | January 15, 2014

Hackers Gain ‘Full Control’ of Critical SCADA Systems (IT News) — The vulnerabilities were discovered by Russian researchers who over the last year probed popular and high-end ICS and supervisory control and data acquisition (SCADA) systems used to control everything from home solar panel installations to critical national infrastructure. More on the Botnet of Things.
mcl — Markov Cluster Algorithm, a fast and scalable unsupervised cluster algorithm for graphs (also known as networks) based on simulation of (stochastic) flow in graphs.
Facebook to Launch Flipboard-like Reader (Recode) — what I’d actually like to see is Facebook join the open web by producing and consuming RSS/Atom/anything feeds, but that’s a long shot. I fear it’ll either limit you to whatever circle-jerk-of-prosperity paywall-penetrating content-for-advertising-eyeballs trades the Facebook execs have made, or else it’ll be a leech on the scrotum of the open web by consuming RSS without producing it. I’m all out of respect for empire-builders who think you’re a fool if you value the open web. AOL might have died, but its vision of content kings running the network is alive and well in the hands of Facebook and Google. I’ll gladly post about the actual product launch if it is neither partnership eyeball-abuse nor parasitism.
Map Projections Illustrated with a Face (Flowing Data) — really neat, wish I’d had these when I was getting my head around map projections.
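
For the curious, the flow simulation behind the Markov Cluster Algorithm mentioned in the mcl link can be sketched in a few lines of plain Python. This is a toy version written for this note, not the mcl tool's own code: the function names, the fixed iteration count, and the tiny example graph are all assumptions, and it uses dense lists that would not scale to real graphs.

```python
def normalize_columns(matrix):
    """Scale every column so it sums to 1 (a column-stochastic matrix)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] for j in range(n)] for i in range(n)]


def expand(matrix):
    """Expansion step: square the matrix, i.e. let the flow take two steps."""
    n = len(matrix)
    return [
        [sum(matrix[i][k] * matrix[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def inflate(matrix, power):
    """Inflation step: raise entries to a power, then renormalize columns."""
    return normalize_columns([[x ** power for x in row] for row in matrix])


def markov_cluster(edges, n, inflation=2.0, iterations=50, eps=1e-6):
    """Cluster an undirected graph with nodes 0..n-1 (toy, dense version)."""
    matrix = [[0.0] * n for _ in range(n)]
    for a, b in edges:
        matrix[a][b] = matrix[b][a] = 1.0
    for i in range(n):
        matrix[i][i] = 1.0  # self-loops, as commonly added before running MCL
    matrix = normalize_columns(matrix)
    for _ in range(iterations):
        matrix = inflate(expand(matrix), inflation)
    # Rows that still carry flow belong to attractors; their nonzero
    # columns are the members of that attractor's cluster.
    found = set()
    for row in matrix:
        members = frozenset(j for j, x in enumerate(row) if x > eps)
        if members:
            found.add(members)
    return sorted(sorted(cluster) for cluster in found)


# Hypothetical graph: two triangles joined by a single bridge edge (2-3).
EDGES = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
clusters = markov_cluster(EDGES, 6)
print(clusters)
```

On this toy graph the two triangles come out as separate clusters. Expansion spreads flow along paths, while inflation rewards the strong currents inside a dense group and starves the weak current across the bridge.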

Four short links: 27 December 2013

Dinosaur Tries to Suckle, Dashboard Design, Massive Visualizations, and Massive Machine Learning

by Nat Torkington | @gnat | December 27, 2013

Intel XDK — If you can write code in HTML5, CSS3 and JavaScript*, you can use the Intel® XDK to build an HTML5 web app or a hybrid app for all of the major app stores. It’s a .exe. What more do I need to say? FFS.
Behind the Scenes of a Dashboard Design — the design decisions that go into displaying complex info.
Superconductor — a web framework for creating data visualizations that scale to real-time interactions with up to 1,000,000 data points. It compiles to WebCL, WebGL, and web workers. (via Ben Lorica)
BIDMach: Large-scale Learning with Zero Memory Allocation (PDF) — GPU-accelerated machine learning. In this paper we describe a caching approach that allows code with complex matrix (graph) expressions at massive scale, i.e. multi-terabyte data, with zero memory allocation after the initial setup. (via Siah)

Four short links: 10 January 2013

Engineering Virality, App Store Numbers, App Store Data, and FPGA OS

by Nat Torkington | @gnat | January 10, 2013

How To Make That One Thing Go Viral (Slideshare) — excellent points about headline writing (takes 25 to find the one that works), shareability (your audience has to click and share, then it’s whether THEIR audience clicks on it), and A/B testing (they talk about what they learned doing it ruthlessly).
A More Complete Picture of the iTunes Economy — $12B/yr gross revenue through it, costs about $3.5B/yr to operate, revenue has grown at a ~35% compounded rate over last four years, non-app media 2/3 sales but growing slower than app sales. Lots of graphs!
Visualizing the iOS App Store — interactive exploration of app store sales data.
BORPH — an Operating System designed for FPGA-based reconfigurable computers. It is an extended version of the Linux kernel that handles FPGAs as if they were CPUs. BORPH introduces the concept of a ‘hardware process’, which is a hardware design that runs on an FPGA but behaves just like a normal user program. The BORPH kernel provides standard system services, such as file system access to hardware processes, allowing them to communicate with the rest of the system easily and systematically. The name is an acronym for “Berkeley Operating system for ReProgrammable Hardware”.

Strata Week: Real-time Hadoop

Cloudera ventures into real-time queries with Impala, data centers are the new landfill, and Jesper Andersen looks at the relationship between art and data.

by Jenn Webb | @JennWebb | October 26, 2012

Here are a few stories from the data space that caught my attention this week.

Cloudera’s Impala takes Hadoop queries into real-time

Cloudera ventured into real-time Hadoop querying this week, opening up its Impala software platform. As Derrick Harris reports at GigaOm, Impala — an SQL query engine — doesn’t rely on MapReduce, making it faster than tools such as Hive. Cloudera estimates its queries run 10 times faster than Hive, and Charles Zedlewski, Cloudera’s cloud VP of products, told Harris that “small queries can run in less than a second.”

Harris notes that Zedlewski pointed out that Impala wasn’t designed to replace business intelligence (BI) tools, and that “Cloudera isn’t interested in selling BI or other analytic applications.” Rather, Impala serves as the execution engine, still relying on software from Cloudera partners — Zedlewski told Harris, “We’re sticking to our knitting as a platform vendor.”

Joab Jackson at PC World reports that “[e]ventually, Impala will be the basis of a Cloudera commercial offering, called the Cloudera Enterprise RTQ (Real-Time Query), though the company has not specified a release date.”

Impala has plenty of competition on this playing field, which Harris also covers, and he notes the significance of all the recent Hadoop innovation:

“I can’t underscore enough how critical all of this innovation is for Hadoop, which in order to add substance to its unparalleled hype needed to become far more useful to far more users. But the sudden shift from Hadoop as a batch-processing engine built on MapReduce into an ad hoc SQL querying engine might leave industry analysts and even Hadoop users scratching their heads.”

You can read more from Harris’ piece here and Jackson’s piece here. Wired also has an interesting piece on Impala, covering the Google F1 database upon which it is based and the Googler Cloudera hired away to help build it.

(Cloudera CEO Mike Olson discussed Impala, Hadoop and the importance of real-time at this week’s Strata Conference + Hadoop World.)

Read more…

A search for balance between the “wow” and “aha” in visualizations

Bitsy Bentley on the work behind a good visualization and why she hopes users will take data interactions for granted.

by Ron Miller | @ron_miller | October 3, 2012

Because of the size, complexity and density of big data, it’s not always easy to find the important insights hiding in all that information. That’s where data visualization comes into play. A great visualization creates meaning where none existed.

Bitsy Bentley (@bitsybot) is the director of data visualization at GfK Custom Research, where she works with information designers to craft meaningful data experiences for a variety of business audiences. In the following interview, she discusses the space between a “wow” response and an “aha” moment, how her team addresses privacy concerns, and why practice is vital for both visualization creators and viewers.

Bentley will explore related visualization topics during her presentation at Strata Conference + Hadoop World in New York City later this month.

Why are data visualizations an effective way to understand the underlying data?

Bitsy Bentley: There is so much beauty and richness in big datasets, and now that we have enough processing power to harness that richness, it’s little wonder that interest in data visualization is exploding. To quote John Tukey: “The greatest value of a picture is when it forces us to notice what we never expected to see.” My clients find that, whether they’re more concerned with numbers or more concerned with stories, an appropriate visual is integral to their understanding of the data.

Visualization unlocks the serendipity of data analysis. It provides a language that is less intimidating than an overwhelming array of digits. Something as simple as a set of histograms breaking down the distribution of a data store makes it easy to find irregularities and outliers in the data. Read more…
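
As a rough illustration of the histogram point above, here is a small sketch that bins a list of numbers and draws a text histogram. The values are made up (one outlier is planted on purpose), and the bin width and function names are arbitrary choices for this example, not anything from the interview.

```python
from collections import Counter

# Made-up response times in milliseconds; the 950 is a planted outlier.
VALUES = [102, 98, 110, 95, 105, 99, 101, 97, 950, 103, 108, 96]


def histogram(values, bin_width):
    """Map each bin's lower edge to the number of values that fall in it."""
    return Counter((value // bin_width) * bin_width for value in values)


def render(counts, bin_width):
    """Draw one text bar per occupied bin, in ascending order."""
    lines = []
    for start in sorted(counts):
        label = f"{start}-{start + bin_width - 1}"
        lines.append(f"{label:>9} {'#' * counts[start]}")
    return "\n".join(lines)


counts = histogram(VALUES, bin_width=50)
print(render(counts, bin_width=50))
```

The bulk of the data piles up in two neighboring bars, and the lone bar far away from them is the irregularity that would be easy to miss in a raw column of digits.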