"data privacy" entries

Strata Week: Movers and shakers on the data journalism front

Reuters' Connected China, accessing Pew's datasets, Simon Rogers' move to Twitter, data privacy solutions, and Intel's shift away from chips.

Reuters launches Connected China, Pew instructs on downloading its data, and Twitter gets a data editor

Yue Qiu and Wenxiong Zhang took a look this week at Connected China, a data journalism effort and visualization application from Reuters. Qiu and Zhang report that “[o]ver the course of about 18 months, a dozen bilingual reporters based in Hong Kong dug into government websites, government reports, policy papers, Mainland major publications, English news reporting, academic texts, and think-tank reports to build up the database.”

Read more…

Strata Week: Court case sheds light on FBI stingray surveillance

Intrusiveness of FBI stingrays, IRS vs Fourth Amendment, Liquid Robotics' AWS of open seas, and Republicans want big data.

FBI and IRS push privacy envelope

Details about how the FBI uses stingray or IMSI-catcher technology — and how much more intrusive it is than previously known — have come to light in a tax fraud case against accused identity thief Daniel David Rigmaiden. Kim Zetter reports at Wired that the FBI, in coordination with Verizon Wireless, was able to track Rigmaiden’s location by reprogramming his air card to connect to the FBI’s fake cell tower, or stingray, when calls came to a landline controlled by the FBI. “The FBI calls, which contacted the air card silently in the background, operated as pings to force the air card into revealing its location,” Zetter explains.

The U.S. government claims it doesn’t need a warrant to use stingrays “because they don’t collect the content of phone calls and text messages and operate like pen-registers and trap-and-traces, collecting the equivalent of header information,” Zetter says. In this particular case, though, the FBI got a probable-cause warrant because the stingray remotely located and accessed the air card inside Rigmaiden’s apartment.

The issue at stake in this case is whether the court was fully informed about how intrusive the technology was when it granted the warrant. Read more…

Strata Week: Can big data save human language?

Big data and language preservation, growing data privacy concerns, and a comparison of big data to crude oil.

Preserving human language with big data

Inspired by Deb Roy’s 2011 TED Talk, “The Birth of a Word,” Nataly Kelly at the Huffington Post’s TEDWeekends took a look at the potential effect big data could have on language — specifically, on preserving endangered and dying languages.

Read more…

Strata Week: Raising the world’s data privacy IQ

Celebrating Data Privacy Day, how data fits into Bill Gates' education plan, and why "long data" deserves our attention.

Data Privacy Day and the fight against “digital feudalism”

Data Privacy Day was celebrated this week. Led by the National Cyber Security Alliance, the day is meant to increase awareness of personal data protection and “to empower people to protect their privacy and control their digital footprint and escalate the protection of privacy and data as everyone’s priority,” according to the website.

Many companies used the day as an opportunity to issue transparency reports, re-informing users and customers about how their data is used and how it’s protected. Google added a new section to its transparency report: a Q&A on how the company handles government and court requests for personal user data.

Read more…

Strata Week: The complexities of forecasting the big data market

IDC forecast underestimates big data growth, EU report sounds an alarm over FISA Amendments Act, and big data's growing role in daily life.

Here are a few stories from the data space that caught my attention this week.

Big data needs a bigger forecast

The International Data Corporation (IDC) released a forecast this week, projecting “the worldwide big data technology and services market will grow at a 31.7% compound annual growth rate (CAGR) — about seven times the rate of the overall information and communication technology (ICT) market — with revenues reaching $23.8 billion in 2016.”
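
As a quick sanity check on what that compound rate implies, here is a back-of-envelope Python sketch. It assumes the forecast window runs 2012 through 2016; the base-year figure is derived from the quoted endpoint, not taken from IDC’s report:

    # Back-of-envelope CAGR arithmetic (illustrative only).
    # Assumption: a 2012-2016 window; the base figure below is implied
    # by the quoted endpoint, not quoted from the IDC report itself.
    cagr = 0.317                      # 31.7% compound annual growth
    revenue_2016 = 23.8               # billions of dollars, per IDC
    years = 4                         # 2012 -> 2016
    implied_2012_base = revenue_2016 / (1 + cagr) ** years
    print(f"Implied 2012 base: ~${implied_2012_base:.1f}B")  # ~$7.9B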

According to the press release, IDC’s research also forecast specific segment growth, including a 21.1% CAGR for services and 53.4% for storage. GigaOm’s Derrick Harris says IDC’s research “only tells part of the story” and that the market will actually be much bigger. For instance, Harris notes that the report doesn’t include analytics software, a critical component of the big data market that the IDC predicts will hit $51 billion by 2016. And what of the outliers? Harris writes:

“… where does one include the rash of Software-as-a-Service applications targeting fields from marketing to publishing? They’re all about big data at their core, but the companies selling them certainly don’t fit into the mold of ‘big data’ vendors.”

Harris also highlights the problems the IDC may face in maintaining its report segments — servers, storage, networking, software, and services — as more and more cloud providers host big data applications and startups offer cloud-based big data services; calculating those revenues will be no easy feat, he writes. You can read Harris’ piece in full at GigaOm.

Read more…

Big, open and more networked than ever: 10 trends from 2012

Social media, open source in government, open mapping and other trends that mattered this year.

In 2012, change around the world was accelerated by the wave of social media, data, and mobile devices. In this year in review, I look back at some of the stories that mattered here at Radar and look ahead to what’s in store for 2013.

Below, you’ll find 10 trends that held my interest in 2012. This is by no means a comprehensive account of “everything that mattered in the past year” — try The Economist’s account of the world in 2012 or The Atlantic’s 2012 in review or Popular Science’s “year in ideas” if you’re hungry for that perspective — but I hope you’ll find something new to think about as 2013 draws near. Read more…

Strata Week: Big data gets warehouse services

AWS Redshift and BitYota launch, big data's problems could shift to real time, and NYPD may be crossing a line with cellphone records.

Here are a few stories from the data space that caught my attention this week.

Amazon, BitYota launch data warehousing services

Amazon this week announced the beta launch of Amazon Redshift, its Amazon Web Services data warehousing service. Paul Sawers at The Next Web reports that Amazon hopes to democratize data warehousing, offering affordable options that make such services viable for small businesses while enticing large companies with cheaper alternatives. Depending on the service plan, customers can launch Redshift clusters scaling to more than a petabyte for less than $1,000 per terabyte per year.
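
To put that pricing in perspective, a bit of simple arithmetic on the figures quoted above, treating $1,000 per terabyte per year as a ceiling:

    # Rough scale check on the quoted Redshift pricing (illustrative).
    price_per_tb_year = 1_000        # dollars; "less than $1,000"
    cluster_size_tb = 1_024          # roughly one petabyte
    print(price_per_tb_year * cluster_size_tb)  # ~$1M/year upper bound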

So far, the service has drawn in some big players — Sawers notes that the initial private beta has more than 20 customers, including NASA/JPL, Netflix, and Flipboard.

Brian Proffitt at ReadWrite took an in-depth look at the service, noting its speed and the importance of its architecture. Proffitt writes that Redshift’s massively parallel processing (MPP) architecture “means that unlike Hadoop, where data just sits cheaply waiting to be batch processed, data stored in Redshift can be worked on fast — fast enough for even transactional work.”
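
For intuition about what MPP means in practice — a toy Python sketch of the idea, not of Redshift’s actual implementation — the pattern is to split a table across nodes, let each node scan and aggregate its own slice in parallel, and have a leader combine the partial results:

    from multiprocessing import Pool

    # Toy massively-parallel-processing (MPP) aggregation: each worker
    # "node" scans only its own slice of the data; a leader combines
    # the partial results. A sketch of the idea, not Redshift itself.
    def partial_sum(data_slice):
        return sum(data_slice)

    if __name__ == "__main__":
        table = list(range(1_000_000))          # the full "table"
        n_nodes = 4
        chunk = len(table) // n_nodes
        slices = [table[i * chunk:(i + 1) * chunk] for i in range(n_nodes)]

        with Pool(n_nodes) as pool:
            partials = pool.map(partial_sum, slices)  # parallel scan
        print(sum(partials))                          # leader combines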

Proffitt also notes that Redshift isn’t without its red flags, pointing out that a public cloud service raises issues not only of data security, but also of the cost of data access — the bandwidth costs of transferring data back and forth. He also raises concerns that the service may play into Amazon’s typical business model of luring customers into its ecosystem a bit at a time. Proffitt writes:

“If you have been keeping your data and applications local, shifting to Redshift could also mean shifting your applications to some other part of the AWS ecosystem as well, just to keep the latency times and bandwidth costs reasonable. In some ways, Redshift may be the AWS equivalent of putting the milk in the back of the grocery store.”

In related news, startup BitYota also launched a data warehousing service this week. Larry Dignan reports at ZDNet that BitYota is built on a cloud infrastructure and uses SQL technology, and that service plans will start at $1,500 per month for 500GB of data. As to competition with AWS Redshift, BitYota co-founder and CEO Dev Patel told Dignan that it’s a non-issue: “[Redshift is] not a competitor to us. Amazon is taking the traditional data warehouse and making it available. We focus on a SaaS approach where the hardware layer is abstracted away,” he said.

Read more…

Strata Week: Investors embrace Hadoop BI startups

Platfora, Continuuity secure funding; the Internet of Things gets connected; and personal big data needs a national awareness campaign.

Here are a few stories from the data space that caught my attention this week.

Two Hadoop BI startups secure funding

There were a couple of notable pieces of investment news this week. Platfora, a startup looking to democratize Hadoop as a business intelligence (BI) tool for everyday business users, announced it has raised $20 million in series B funding, bringing its total funding to $25.7 million, according to a report by Derrick Harris at GigaOm.

Harris notes that investors seem to get the technology — CEO Ben Werther told Harris that in this funding round, discussions moved to signed term sheets in just three weeks. Harris writes that the smooth investment experience “probably has something to do with the consensus the company has seen among venture capitalists, who project Hadoop will take about 20 percent of a $30 billion legacy BI market and are looking for the startups with the vision to win that business.”

Platfora faces plenty of well-funded legacy BI competitors, but Werther told Christina Farr at VentureBeat that Platfora’s edge is speed: “People can visualize and ask questions about data within hours. There is no six-month cycle time to make Hadoop amazing.”

In other investment news, Continuuity announced it has secured $10 million in series A funding to further develop AppFabric, its cloud-based platform-as-a-service tool designed to host Hadoop-based BI applications. Alex Wilhelm reports at The Next Web that Continuuity is looking to make AppFabric “the de facto location where developers can move their big data tools from idea to product, without worrying about building their own backend, or fretting about element integration.”

Read more…

Strata Week: A realistic look at big data obstacles

Obstacles for big data, big data intelligence, and a privacy plugin puts Google and Facebook settings in the spotlight.

Here are a few stories from the data space that caught my attention this week.

Big obstacles for big data

For the latest issue of Foreign Policy, Uri Friedman put together a short history of big data to show “[h]ow we arrived at a term to describe the potential and peril of today’s data deluge.” A couple of months ago, MIT’s Alex “Sandy” Pentland took a look at some of that big data potential for Harvard Business Review; this week, he looked at some of the perilous aspects. Pentland writes that to be realistic about big data, it’s important to look not only at its promise, but also at its obstacles. He identifies the problem of finding meaningful correlations as one of big data’s biggest obstacles:

“When your volume of data is massive, virtually any problem you tackle will generate a wealth of ‘statistically significant’ answers. Correlations abound with Big Data, but inevitably most of these are not useful connections. For instance, your Big Data set may tell you that on Mondays, people who drive to work rather than take public transportation are more likely to get the flu. Sounds interesting, and traditional research methods show that it’s factually true. Jackpot!

“But why is it true? Is it causal? Is it just an accident? You don’t know. This means, strangely, that the scientific method as we normally use it no longer works, because there are so many possible relationships to consider that many are bound to be ‘statistically significant’. As a consequence, the standard laboratory-based question-and-answering process — the method that we have used to build systems for centuries — begins to fall apart.”

Pentland says that big data is going to push us out of our comfort zone, requiring us to conduct experiments in the real world — outside our familiar laboratories — and change the way we test the causality of connections. He also addresses the challenges of understanding correlations well enough to put them to use, of knowing who owns the data and forging new types of collaborations to use it, and of how putting individuals in charge of their own data helps address big data privacy concerns. This piece and Pentland’s earlier post on big data’s potential are this week’s recommended reads.
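
Pentland’s warning about spurious “statistically significant” correlations is easy to reproduce. In this minimal simulation — a sketch assuming nothing beyond numpy — every variable is pure noise, yet a handful of them still clear a conventional p < 0.05 bar against the outcome:

    import numpy as np

    # Pure noise: no feature has any real relationship to the outcome.
    rng = np.random.default_rng(0)
    n_samples, n_features = 1_000, 200
    X = rng.normal(size=(n_samples, n_features))
    y = rng.normal(size=n_samples)

    # Pearson correlation of each feature with the outcome.
    r = (X - X.mean(axis=0)).T @ (y - y.mean())
    r /= n_samples * X.std(axis=0) * y.std()

    # Large-sample two-sided threshold for |r| at p < 0.05.
    threshold = 1.96 / np.sqrt(n_samples)
    n_spurious = int((np.abs(r) > threshold).sum())
    print(f"{n_spurious} of {n_features} noise features look 'significant'")
    # Expect ~5% of 200, i.e., about 10 spurious hits.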

Read more…

The dark side of data

In a world of big, open data, "privacy by design" will become even more important.

Map of France in Google Earth by Steven La Roux

A few weeks ago, Tom Slee published “Seeing Like a Geek,” a thoughtful article on the dark side of open data. He starts with the story of a Dalit community in India, whose land was transferred to a group of higher-caste Mudaliars through bureaucratic manipulation under the guise of standardizing and digitizing property records. While this sounds like a good idea, it gave a wealthier, more powerful group a chance to erase older, traditional records that hadn’t been properly codified. One effect of passing laws requiring standardized, digital data is to marginalize all data that can’t be standardized or digitized, and to marginalize the people who don’t control the process of standardization.

That’s a serious problem. It’s sad to see oppression and property theft riding in under the guise of transparency and openness. But the issue isn’t open data itself; it’s how the data is used.

Read more…