"strata week" entries
Recommended resources from a former analyst
I was pretty cranky before I spoke with Q Ethan McCallum on the phone today.
I was cranky from absorbing the NSA news dominating many data conversations. There is a lot of yammering going on. Some good. Some super bad. My crankiness dissolved a bit after speaking with Q and other Chicago-based people who are working on positive impact data science projects. You’ll be seeing more from Q and these other data science people within the Strata blog very soon. Utilizing data for positive change makes me happy.
My crankiness also dissolved when I decided to not provide summary points on a few articles covering the latest NSA leaks for the Strata Week element. Instead, I decided to pretend that I was an analyst again and think about the resources that I would have wanted to visit in order to form my own insights and analysis.
Recommended Resources for Analysts
- The Guardian. Interested in reviewing the leaked documents and forming your own insights? The Guardian’s “Read the Documents” section will be very useful.
- U.S. House of Representatives Permanent Select Committee on Intelligence. There is always more than one side to a story. The latest committee updates are available as well as videos of recent hearings.
- Office of the Director of National Intelligence. Specifically, read the Federal Agency Data Mining Report to find out very quickly how the U.S. government defines “data mining.” We should all be aware of this.
- Accumulo. Are you technically oriented and want to understand more about the database that grew up within the NSA? Then you should look at the Wired coverage on Accumulo for background and then take a look around the open source project.
- ProPublica. I visit this investigative journalism site often and, in the interest of full disclosure, I have also donated personal money to ProPublica.
- Techmeme. While there are a lot of aggregators available, this is my go-to aggregator.
U.S. opens data, Wong tapped for U.S. chief privacy officer, FBI might read your email sans warrant, and big data spells trouble for anonymity.
U.S. government data to be machine-readable, Nicole Wong may fill new White House chief privacy officer role
The U.S. government took major steps this week to open up government data to the public. U.S. President Obama signed an executive order requiring government data to be made available in machine-readable formats, and the Office of Management and Budget and the Office of Science and Technology Policy released an Open Data Policy memo (PDF) to address the order’s implementation.
The press release announcing the actions notes the benefit the U.S. economy historically has experienced with the release of government data — GPS data, for instance, sparked a flurry of innovation that ultimately contributed “tens of billions of dollars in annual value to the American economy,” according to the release. President Obama noted in a statement that he hopes a similar result will come from this open data order: “Starting today, we’re making even more government data available online, which will help launch even more new startups. And we’re making it easier for people to find the data and use it, so that entrepreneurs can build products and services we haven’t even imagined yet.”
Jon Bruner's industrial Internet report; IBM, Belkin, and the Internet of Things; cars as software platforms; and coding is the job of the future.
Soon, everything will be an Internet platform
Ben Schiller at Fast Company took a look this week at a recent report by Jon Bruner on the industrial Internet. “According to Jon Bruner [the industrial Internet] is ‘machines becoming nodes on pervasive networks that use open protocols,’” writes Schiller. “And, to many others, it is as big a deal as the Internet itself: essentially completing a job that’s only half-finished with web sites, email, Twitter, and so on.”
Schiller pulls some highlights from Bruner’s report, especially noting how the industrial Internet will affect various industries, such as energy, health care, and transport.
Big data aids HR, DataKind heads to the U.K., and German regulators fine Google a "paltry" 145,000 euros.
Big data replaces gut instinct in HR management
In a post at the New York Times, Steve Lohr took a look this week at a new data discipline: work-force science. The field pairs big data with human resources to help remove subjectivity and gut instinct from the hiring process and HR management. Lohr notes that in the past, studies conducted to understand worker behavior included a few hundred test subjects at most. Today, they can include thousands of subjects and far more data points. Lohr writes:
“Today, every e-mail, instant message, phone call, line of written code and mouse-click leaves a digital signal. These patterns can now be inexpensively collected and mined for insights into how people work and communicate, potentially opening doors to more efficiency and innovation within companies. Digital technology also makes it possible to conduct and aggregate personality-based assessments, often using online quizzes or games, in far greater detail and numbers than ever before.”
Lohr looks at several companies applying data-driven decision making to HR management.
Reuters' Connected China, accessing Pew's datasets, Simon Rogers' move to Twitter, data privacy solutions, and Intel's shift away from chips.
Reuters launches Connected China, Pew instructs on downloading its data, and Twitter gets a data editor
Yue Qiu and Wenxiong Zhang took a look this week at a data journalism effort by Reuters, the Connected China visualization application. Qiu and Zhang report that “[o]ver the course of about 18 months, a dozen bilingual reporters based in Hong Kong dug into government websites, government reports, policy papers, Mainland major publications, English news reporting, academic texts, and think-tank reports to build up the database.”
Intrusiveness of FBI stingrays, IRS vs Fourth Amendment, Liquid Robotics' AWS of open seas, and Republicans want big data.
FBI and IRS push privacy envelope
Details about how the FBI uses stingray or IMSI-catcher technology — and how much more intrusive it is than previously known — have come to light in a tax fraud case against accused identity thief Daniel David Rigmaiden. Kim Zetter reports at Wired that the FBI, in coordination with Verizon Wireless, was able to track Rigmaiden’s location by reprogramming his air card to connect to the FBI’s fake cell tower, or stingray, when calls came to a landline controlled by the FBI. “The FBI calls, which contacted the air card silently in the background, operated as pings to force the air card into revealing its location,” Zetter explains.
The U.S. government claims it doesn’t need a warrant to use stingrays “because they don’t collect the content of phone calls and text messages and operate like pen-registers and trap-and-traces, collecting the equivalent of header information,” Zetter says, but in this particular case they got a probable-cause warrant because the stingray located and accessed the air card remotely through Rigmaiden’s apartment.
The issue at stake in this case is whether or not the court was fully informed as to the intrusiveness of the technology when it granted the warrant.
Strata Week: We give up more data than we realize, but CA residents soon may have access to all of it
Alessandro Acquisti's data research, the CA Right to Know Act of 2013, big data signal issues, and big data battles fraud and theft.
A look at personal data research and new government legislation
In a post at the New York Times this week, Somini Sengupta took an in-depth look at the work of Alessandro Acquisti, a behavioral economist at Carnegie Mellon University in Pittsburgh. Acquisti studies the choices we make when deciding what and how much data we’re willing to share and the things that cause us to often give up more data than we realize. Sengupta reports:
“Our browsing habits, search terms, e-mail communication — even our offering of our ZIP codes at the supermarket checkout — reveal bits of information that can be assembled by data companies, usually for the purpose of knowing what sorts of products we’re most likely to buy. The online advertising industry insists that the data is scrambled to make it impossible to identify individuals.
“Mr. Acquisti offers a sobering counterpoint. In 2011, he took snapshots with a webcam of nearly 100 students on campus. Within minutes, he had identified about one-third of them using facial recognition software. In addition, for about a fourth of the subjects whom he could identify, he found out enough about them on Facebook to guess at least a portion of their Social Security numbers.”
Anonymized phone data isn't as anonymous as we thought, a CFPB API, and NYC's "geek squad of civic-minded number-crunchers."
Mobile phone mobility traces ID users with only four data points
A study published this week in Scientific Reports, “Unique in the Crowd: The privacy bounds of human mobility,” shows that the location data in mobile phones poses an anonymity risk. Jason Palmer reported at the BBC that researchers at MIT and the Catholic University of Louvain reviewed data from 15 months’ worth of phone records for 1.5 million people and were able to identify “mobility traces,” or “evident paths of each mobile phone,” using only four locations and times to positively identify a particular user. Yves-Alexandre de Montjoye, the study’s lead author, told Palmer that “[t]he way we move and the behaviour is so unique that four points are enough to identify 95% of people.”
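For readers who want a feel for why a handful of points can single someone out, here is a toy sketch in Python. It is not the study's method or data: the `unicity` function, the tower names, and the three made-up traces are all assumptions for illustration. The idea is simply to take every combination of k (place, hour) points from a person's trace and ask whether anyone else's trace also contains all of them.

```python
from collections import defaultdict
from itertools import combinations


def unicity(traces, k):
    """Fraction of k-point samples that match exactly one user.

    `traces` maps a user id to a set of (location, hour) points.
    This is an illustrative, exhaustive version suited to toy data only.
    """
    if k < 1:
        raise ValueError("k must be at least 1")

    # Reverse index: which users were seen at each (location, hour) point?
    seen_at = defaultdict(set)
    for user, points in traces.items():
        for point in points:
            seen_at[point].add(user)

    unique = 0
    total = 0
    for user, points in traces.items():
        for sample in combinations(sorted(points), k):
            # Users whose traces contain every point in the sample.
            candidates = set.intersection(*(seen_at[p] for p in sample))
            total += 1
            if candidates == {user}:
                unique += 1
    return unique / total if total else 0.0


# Hypothetical traces: users "a" and "b" share a morning and a midday point.
traces = {
    "a": {("tower1", 8), ("tower2", 12), ("tower3", 18)},
    "b": {("tower1", 8), ("tower2", 12), ("tower4", 18)},
    "c": {("tower5", 9), ("tower6", 13), ("tower7", 19)},
}

for k in (1, 2, 3):
    print(k, unicity(traces, k))
```

Even in this tiny example, the share of uniquely identifying samples rises as k grows, which is the intuition behind the study's headline number. Real analyses work with far larger datasets and would sample points rather than enumerate every combination.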
Debating big data's potential social impact, CISPA is back, and a look at how cities are benefitting from big data.
Big data’s big social impact
In partnership with the Harvard Business Review, the Skoll World Forum on Social Entrepreneurship has been running a series of posts addressing and debating big data’s potential for large-scale social impact. A couple of posts from the series published this week stood out.
Robert Kirkpatrick, director of the Global Pulse initiative, argued that donating data needs to be the next evolution of philanthropic giving. He pointed out that many of the public programs we all take for granted in arts, health and education would not exist without support from the private sector, and noted that potential societal benefits from big data initiatives are no less important.
It's not a big data bubble — it's a big data revolution; connected cars are here; and executives get in on big data.
The magnitude of big data’s role eclipses the hype
In a post at NPR, Adam Frank argued that big data’s potential role and influence in our world is akin to the role the steam engine played in technological and scientific advances in the 19th century.
Frank highlighted a piece at Frankfurter Allgemeine Zeitung in which one detractor warned against becoming “bewitched” by data or expecting it to “replace our traditional methods of discovering the truth,” and argued that human intuition will still be required to achieve understanding. Frank wrote that while the writer’s point is taken, it doesn’t diminish the magnitude of big data’s potential:
“I believe there is something real and powerful happening in the Big Data revolution. It’s more than just a fad. It’s the next link in the long chain connecting culture and technology to human history. … Through new fields like data science and network theory, Big Data will not only change the world we move through as individuals, it will change the world we imagine through science.”