"strata week" entries
Data brokers, workplace sensor studies, unreported drug side effects revealed in search data, and the dark side of big data.
The lowdown on data brokers, and the use of sensor data in the workplace
ProPublica’s Lois Beckett takes a look this week at data brokers. She says that although Congress is moving to require such companies to give consumers more control over their data and what happens to it, many people not only don’t know these data brokers exist, but also don’t know the extent of the data gathered or how it’s used.
Hortonworks' Data Platform for Windows, Intel's Hadoop distribution, invasive smartphone surveillance, and data-driven "House of Cards."
Windows gets Hadoop, Intel launches Hadoop distribution
Hortonworks released a beta version of its Hortonworks Data Platform for Windows this week. In the press release, the company says its mission is to “expand the reach of Apache Hadoop across the enterprise” and notes that the “100% open source Hortonworks Data Platform is the industry’s first and only Apache Hadoop distribution for both Windows and Linux.”
Barb Darrow notes at GigaOm that there’s likely no better way to bring big data to the masses than via Microsoft Excel. Darrow reports that Hortonworks’ VP of corporate strategy Shawn Connolly told her that “[t]he combination should make it easier to integrate data from SQL Server and Hadoop and to funnel all that into Excel for charting and pivoting and all the tasks Excel is good at,” stressing that the same Apache Hadoop distribution will run on both Windows and Linux. Connolly also noted to Darrow that “an analogous Hortonworks Data Platform for Windows Azure is still in the works.”
Is data collection entering discriminatory territory? Also, big data's role in crime fighting and its debut in the NBA.
Data mining opens new doors for discrimination, marginalization
“For most of the Internet’s short history, the primary goal of this data collection was classic product marketing: for example, advertisers might want to show me Nikes and my wife Manolo Blahniks. But increasingly, data collection is leapfrogging well beyond strict advertising and enabling insurance, medical and other companies to benefit from analyzing your personal, highly detailed ‘Big Data’ record without your knowledge. Based on this analysis, these companies then make decisions about you — including whether you are even worth marketing to at all.”
The consequences of such detailed data mining run deep. Fertik notes that advances in online data mining are enabling companies to “skirt the spirit of the law” and make discriminatory choices in who receives credit or loan offers, for example, by simply not displaying online offers to less credit-attractive users. “If you live on the wrong side of the digital tracks,” he says, “you won’t even see a credit offer from leading lending institutions, and you won’t realize that loans are available to help you with your current personal or professional priorities.”
Big data and language preservation, growing data privacy concerns, and a comparison of big data to crude oil.
Preserving human language with big data
Inspired by Deb Roy’s 2011 TEDTalk, “The Birth of a Word,” Nataly Kelly at the Huffington Post’s TEDWeekends took a look at the potential effect big data could have on language — specifically, on preserving endangered and dying languages.
Controversy surrounds EU's data reforms, an investigation into "data-ism," and geeks fighting for our civil liberties.
EU’s data protection reforms could “instigate a trade war”
Ars Technica’s Cyrus Farivar took a look this week at the European Commission’s proposed reform to existing data protection laws. Farivar highlights some of the major changes the proposed reform would bring:
“The data protection reforms as proposed by the Commission would consolidate existing data protection rules, would require data breach notification within 24 hours, and would include a ‘right to be forgotten,’ allowing citizens to ‘delete their data if there are no legitimate grounds for retaining it.’”
The reform would facilitate data portability as well, Farivar notes, making it easier to transfer personal data from LinkedIn to Facebook, for instance, and could impose fines from 1% to 4% of global revenues for companies held in violation of the EU rules.
The proposed reform has ignited quite a controversy. Farivar looks at a draft response (PDF) to the proposed legislation, published in January by Jan Philip Albrecht, a Green Party member of the European Parliament, that has “ruffled some feathers” by expanding data protection rights beyond what the European Commission proposed. Farivar also looks at the role Pirate Party members are playing in the debate, and at the response from U.S. officials.
Farivar reports that U.S. Foreign Service economic officer John Rodgers noted in a speech in Berlin (Google Translate) “that a vast right to delete such personal information was not technically feasible and would pose a huge problem for all globally minded companies” and he “warned that the data protection reform as currently conceived could ‘instigate a trade war.’”
“[L]obbying pressure from American government representatives and their corporate allies is intensifying at an unprecedented level,” Farivar reports. Joe McNamee, executive director of European Digital Rights, told Farivar that “[n]othing, not even ACTA, caused the U.S. to lobby on this scale in Brussels.” You can read Farivar’s full report at Ars Technica.
Celebrating Data Privacy Day, how data fits into Bill Gates' education plan, and why "long data" deserves our attention.
Data Privacy Day and the fight against “digital feudalism”
Data Privacy Day was celebrated this week. Led by the National Cyber Security Alliance, the day is meant to increase awareness of personal data protection and “to empower people to protect their privacy and control their digital footprint and escalate the protection of privacy and data as everyone’s priority,” according to the website.
Many companies used the day as an opportunity to issue transparency reports, re-informing users and customers about how their data is used and how it’s protected. Google added a new section to its transparency report, a Q&A on how the company handles personal user data requests from government agencies and courts.
The battle to open source OFA code; a student hacker uncovers security flaw, gets expelled; and ethics and taxes for user data collection.
A cloudy future for Obama’s election code
A battle is brewing between politicians and the dream team of programmers that helped Obama win the nerdiest election ever. Ben Popper reports at The Verge that the programmers who worked on the Obama for America (OFA) 2012 campaign want to open source the code behind the campaign’s website, its donation collection and email systems, and its mobile app. Yet “[t]hree months after the election, the data and software is still tightly controlled by the president and his campaign staff, with the fate of the code still largely undecided,” Popper writes.
OFA’s director of front-engineering Daniel Ryan told Popper that he believes the Democratic National Committee (DNC) will “mothball” the tech. He argues that the code should be open because it was built on top of open source code and, therefore, should go back to the public. Popper also notes that if the DNC keeps the code on ice until the 2016 election, it will be useless. “But if our work was open and people were forking it and improving it all the time,” Ryan told Popper, “then it keeps up with changes as we go.” Ryan also points out that keeping the code closed would not only stifle development for the next election, but would also hinder opportunities for other progressive organizations to build on the code in the next four years.
Popper reports that a DNC official responded to a request for comment, stating that “OFA is still working out the future of their tech and data infrastructure so any speculation at this time is premature and uninformed.” You can read Popper’s in-depth report at The Verge.
Inaugural 2013 app has plans for your data, the "unprecedented" security issues of the Internet of Things, and optical switches speed up data centers.
Here are a few stories from the data space that caught my attention this week.
Inaugural 2013 app takes as much as it gives
The Presidential Inaugural Committee (PIC) launched the first official inaugural smartphone app, Inaugural 2013 (for iOS and for Android), on Monday. Daniel Strauss reports in a post at The Hill that inauguration attendees can use the app to locate and RSVP to events, watch events via livestream, and navigate the event with an interactive map.
What isn’t front and center in the pomp and circumstance of the shiny new app are the terms of service and the privacy statement. Steve Friess at Politico points out that in the fine print, users are giving the PIC permission to share their data — phone numbers, email, home addresses, and GPS location data, for instance — “with candidates, organizations, groups or causes that [the PIC] believe have similar political viewpoints, principles or objectives.”
Gregory Ferenstein reports at TechCrunch that “privacy advocates find it troubling that the fine-print on the PIC’s website says it can use activity data ‘without limitation in advertising, fundraising and other communications in support of PIC and the principles of the Democratic party, without any right of compensation or attribution.’”
IDC forecast underestimates big data growth, EU report sounds an alarm over FISA Amendments Act, and big data's growing role in daily life.
Big data needs a bigger forecast
The International Data Corporation (IDC) released a forecast this week, projecting “the worldwide big data technology and services market will grow at a 31.7% compound annual growth rate (CAGR) — about seven times the rate of the overall information and communication technology (ICT) market — with revenues reaching $23.8 billion in 2016.”
According to the press release, IDC’s research also forecast growth for specific segments, including a 21.1% CAGR for services and 53.4% for storage. GigaOm’s Derrick Harris says IDC’s research “only tells part of the story” and that the market will actually be much bigger. For instance, Harris notes that the report doesn’t include analytics software, a critical component of the big data market that IDC predicts will hit $51 billion by 2016. And what of the outliers? Harris writes:
“... where does one include the rash of Software-as-a-Service applications targeting fields from marketing to publishing? They’re all about big data at their core, but the companies selling them certainly don’t fit into the mold of ‘big data’ vendors.”
Harris highlights potential problems IDC might have in maintaining its report segments (servers, storage, networking, software and services) as more and more cloud providers host big data applications and startups offer cloud-based big data services; calculating these revenues will be no easy feat, he writes. You can read Harris’ piece in full at GigaOm.
Automation opens new avenues for humans, big data could help curb gun violence, and results from the Digital Universe Study.
Happy new year! Here are a few stories from the data space that caught my attention recently.
It’s OK if all our jobs are belong to them
Kevin Kelly took a look at the effects of automation on society over at Wired and argues that we should welcome our forthcoming robot overlords with open arms. Kelly says that giving current work tasks, future yet-to-be-imagined tasks, and jobs we can’t do at all to machines opens up possibilities for humans to do things previously unimaginable and “will let us focus on becoming more human than we were” — much like the industrial revolution “led a greater percentage of the population to decide that humans were meant to be ballerinas, full-time musicians, mathematicians, athletes, fashion designers, yoga masters, fan-fiction authors, and folks with one-of-a kind titles on their business cards.” Kelly argues that our future employment incomes will depend on our ability to work with robots and that most of what we do won’t be possible without them.