Strata Week: A realistic look at big data obstacles

Here are a few stories from the data space that caught my attention this week.

Big obstacles for big data

For the latest issue of Foreign Policy, Uri Friedman put together a summarized history of big data to show “[h]ow we arrived at a term to describe the potential and peril of today’s data deluge.” A couple of months ago, MIT’s Alex “Sandy” Pentland took a look at some of that big data potential for Harvard Business Review; this week, he looked at some of the perilous aspects. Pentland writes that to be realistic about big data, it’s important to look not only at its promise, but also at its obstacles. He identifies the problem of finding meaningful correlations as one of big data’s biggest obstacles:

“When your volume of data is massive, virtually any problem you tackle will generate a wealth of ‘statistically significant’ answers. Correlations abound with Big Data, but inevitably most of these are not useful connections. For instance, your Big Data set may tell you that on Mondays, people who drive to work rather than take public transportation are more likely to get the flu. Sounds interesting, and traditional research methods show that it’s factually true. Jackpot!

“But why is it true? Is it causal? Is it just an accident? You don’t know. This means, strangely, that the scientific method as we normally use it no longer works, because there are so many possible relationships to consider that many are bound to be ‘statistically significant’. As a consequence, the standard laboratory-based question-and-answering process — the method that we have used to build systems for centuries — begins to fall apart.”

Pentland says that big data is going to push us out of our comfort zone, requiring us to conduct experiments in the real world, outside our familiar laboratories, and to change the way we test the causality of connections. He also addresses the issues of understanding those correlations well enough to put them to use, knowing who owns the data, learning to forge new types of collaborations to use it, and how putting individuals in charge of their own data helps address big data privacy concerns. This piece and Pentland’s earlier post on big data’s potential are this week’s recommended reads.
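To make Pentland’s point about abundant “statistically significant” answers concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not anything from Pentland’s article: the data is pure random noise, the function names are hypothetical, and significance is judged with the Fisher z approximation at the conventional 5% level.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious(n_variables=200, n_samples=100, seed=0):
    """Count noise variables that look 'significantly' correlated with a
    noise target. Every variable is independent random noise (an assumption
    of this sketch), so any hit is a false positive by construction."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    z_critical = 1.96  # two-sided 5% threshold for a standard normal
    hits = 0
    for _ in range(n_variables):
        candidate = [rng.gauss(0, 1) for _ in range(n_samples)]
        r = pearson(target, candidate)
        # Fisher z approximation: atanh(r) * sqrt(n - 3) is roughly N(0, 1)
        # when the true correlation is zero.
        z = math.atanh(r) * math.sqrt(n_samples - 3)
        if abs(z) > z_critical:
            hits += 1
    return hits


if __name__ == "__main__":
    found = count_spurious()
    print(f"{found} of 200 unrelated variables passed the 5% significance test")
```

With a 5% threshold, roughly one in twenty unrelated variables should pass the test by chance alone, so the number of apparent findings grows with the number of relationships examined. That is the sense in which a massive data set is bound to yield significant but useless correlations.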

Informing intelligence with big data

NPR’s counterterrorism correspondent Dina Temple-Raston reported this week on how intelligence officials are seeking to make use of big data to help predict volatile situations around the world, such as the Arab Spring, and how a Swedish-American start-up called Recorded Future is working to make that happen.

Temple-Raston writes that the company’s algorithms parse huge amounts of data from around the world (from such sources as social media platforms, government economic projections, and newspaper reports) to develop a searchable visual timeline. Recorded Future co-founder Christopher Ahlberg explained to Temple-Raston that the idea is to “extract what we call signals of activity that relate to people and places and associate them with events and time” and noted that “[t]ime is often a forgotten dimension in analysis, and we think it is key.”

The algorithmic prediction technology doesn’t come without its skeptics, Temple-Raston reports. She notes a recent error in which Recorded Future’s algorithm connected the Ansar al-Sharia militant group in Libya with an al-Qaida group in Yemen of the same name; apart from the shared name, the two groups were not connected. This is where the human-touch aspect of big data analysis comes in. Ahlberg pointed out that Recorded Future users are experts and would catch mistakes like that. “We’re just trying to visualize data in a way that makes smart people smarter,” he told Temple-Raston. You can read the full piece here.

Google and Facebook are watching you … and cashing in

Want to know how much you’re worth to Facebook or Google? A new privacy add-on for Chrome and Firefox called Privacyfix is taking a stab at calculating that value, as well as pointing out questionable privacy settings on those platforms that users may want to review. Joe Mullin at Ars Technica reports that the add-on “measures your last 60 days of activity on Google, extrapolates that to a year, and uses a value-per-search estimate. … [it] also tells you how many of the websites you visit feed data back to Facebook and Google.”
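The arithmetic Mullin describes can be sketched in a few lines of Python. This is only an illustration of the extrapolation as reported, not Privacyfix’s actual code: the function name is hypothetical, and the search count and value-per-search figure in the usage example are made-up placeholders.

```python
def estimate_annual_value(searches_last_60_days, value_per_search):
    """Extrapolate 60 days of search activity to a year and price it.

    Both inputs are assumptions supplied by the caller; the add-on's real
    value-per-search estimate is not given in the report.
    """
    searches_per_day = searches_last_60_days / 60
    searches_per_year = searches_per_day * 365
    return searches_per_year * value_per_search


if __name__ == "__main__":
    # Hypothetical user: 600 searches in 60 days, at an assumed $0.01 each.
    print(f"Estimated annual value: ${estimate_annual_value(600, 0.01):.2f}")
```

The estimate is linear in both inputs, so it is only as good as the assumed value per search and the assumption that the last 60 days are typical of the whole year.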

The add-on combs through a user’s Facebook and Google privacy settings, highlights questionable areas, and then links directly to the setting the user may want to change. There’s also a tab that identifies websites that are tracking you right now. The add-on’s creator, Privacy Choice founder Jim Brock, told Mullin, “I haven’t had anyone go through this process who was not surprised by something [they saw].” That was true for me — Facebook tracks 84% of the sites I visit, and I was surprised to learn how much of my Facebook profile was being shared with friends’ apps. The add-on is only available on Chrome and Firefox right now, but the company FAQ says a version for Safari is in the works. Mullin reports that a mobile app is in development, along with similar privacy programs for Twitter and LinkedIn.

Selling our private data is nothing new, of course. Chris Taylor and Ron Webb took a look this week at what we sell and what we get in exchange. They write that “[s]ometimes the loss of privacy is the benefit,” but “[b]efore deciding to sell our personal information, we need to be clear about what people want to know about us, why they want to know it, how and when they intend to collect that information, and what’s being offered to us in return.” You can read their analysis here.

Tip us off

News tips and suggestions are always welcome, so please send them along.
