"data mining" entries
Computing Twitter Influence, Part 2
In the previous post in this series, we set out to compute the influence of a Twitter account and explored some variables relevant to arriving at a base metric. This post continues the conversation by presenting sample code for making “reliable” requests to Twitter’s API to facilitate the data collection process.
Given a Twitter screen name, it’s (theoretically) quite simple to get all of the account profiles that follow the screen name. Perhaps the most economical route is to use the GET /followers/ids API to request all of the follower IDs in batches of 5,000 per response, followed by the GET /users/lookup API to retrieve full account profiles for up to 100 of those IDs per response. Thus, if an account has X followers, you’d need to anticipate making ceiling(X/5000) API calls to GET /followers/ids and ceiling(X/100) API calls to GET /users/lookup. Although most Twitter accounts may not have enough followers that the total number of requests to each API resource presents rate-limiting problems, you can rest assured that the most popular accounts will trigger rate-limiting enforcements that manifest as an HTTP error in RESTful APIs.
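The arithmetic above, and one way to wrap requests so they survive rate limiting, can be sketched in Python. The function names and the retry policy here are illustrative assumptions, not part of Twitter’s API; `request_fn` stands in for whatever call your HTTP client makes:

```python
import math
import time

def required_requests(follower_count):
    """Number of API calls needed to collect every follower profile:
    GET /followers/ids returns up to 5,000 IDs per response, and
    GET /users/lookup resolves up to 100 profiles per response."""
    return (math.ceil(follower_count / 5000),
            math.ceil(follower_count / 100))

def make_reliable_request(request_fn, max_retries=3, backoff=1.0, sleep=time.sleep):
    """Call request_fn, retrying with exponential backoff when it raises.
    RuntimeError is a stand-in for the HTTP rate-limit error your
    client library would surface; swap in the real exception type."""
    for attempt in range(max_retries + 1):
        try:
            return request_fn()
        except RuntimeError:
            if attempt == max_retries:
                raise
            sleep(backoff * (2 ** attempt))
```

So an account with 12,000 followers costs 3 calls to GET /followers/ids but 120 calls to GET /users/lookup, which is why the profile-lookup step is the one most likely to hit the rate limit.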
Technology has changed the way we understand targeting and contextual relevance. How will marketing adapt?
Over the past five years, marketing has transformed from a primarily creative process into an increasingly data-driven discipline with strong technological underpinnings.
The central purpose of marketing hasn’t changed: brands still aim to tell a story, to emotionally connect with a prospective customer, with the goal of selling a product or service. But while the need to tell an interesting, authentic story has remained constant, customers and channels have fundamentally changed. Old Marketing took a spray-and-pray approach aimed at a broad, passive audience: agencies created demographic or psychographic profiles for theoretical consumers and broadcast ads on mass-consumption channels, such as television, print, and radio. “Targeting” was primarily about identifying high concentrations of a given consumer type in a geographic area.
The era of demographics is over. Advances in data mining have enabled marketers to develop highly specific profiles of customers at the individual level, using data drawn from actual personal behavior and consumption patterns. Now when a brand tells a story, it has the ability to tailor the narrative in such a way that each potential customer finds it relevant, personally. Users have become accustomed to this kind of sophisticated targeting; broad-spectrum advertising on the Internet is now essentially spam. At the same time, there is still a fine line between “well-targeted” and “creepy.”
Notes and links from the data journalism beat
It seems that new data journalism tools are being released every day. The latest include CivOmega, a modular prototype for querying government data that lets developers plug in their own APIs, and Fact Tank, a new data journalism platform from the Pew Research Center. Also, for journalists in the US concerned about protecting their own personal data, government investigators now face more hurdles when seeking a reporter’s records. And for a little data journalism levity, check out the latest project from Noah Veltman, a data journalism fellow at the BBC. Veltman used the GovTrack bulk data API, SQL, and Python to conduct a self-described “overly in-depth analysis” of Congressional acronym abuse from 1973 to the present.
Your links for the week:
- The alpha of CivOmega: A hack-day tool to parse civic data and tell you more about Beyoncé’s travels (Nieman Lab)
The idea of “a Siri or Wolfram Alpha for government data” — something that can connect natural language queries with multifaceted datasets — had been kicking around in the mind of MIT Media Lab and Knight-Mozilla veteran Dan Schultz ever since a Knight Foundation-sponsored election-year brainstorming session in 2011.
- Introducing Fact Tank: An Interview with Pew Research Center President Alan Murray (Data Driven Journalism)
“Obviously, we collect vast amounts of data, about demographics, about a variety of issues – we are basically a data shop. In the past, most of the dissemination of our data has been done through existing media. But we also felt it was important for us to get our own data relating to news events out to the public more quickly and more directly. Additionally, we also felt it was important for us to play a role in aggregating data sets which we can then present ourselves.”
Response to NSA data mining and the troubling lack of technical details, Facebook's Open Compute data center, and local police are growing their own DNA databases.
It’s a question of power, not privacy — and what is the NSA really doing?
In the wake of the leaked NSA data-collection programs, the Pew Research Center conducted a national survey to measure Americans’ response. The survey found that 56% of respondents think the NSA’s telephone record tracking program is an acceptable method to investigate terrorism, and 62% said the government’s investigations into possible terrorist threats are more important than personal privacy.
Rebecca J. Rosen at The Atlantic took a look at legal scholar Daniel J. Solove’s argument that we should care about the government’s collection of our data, but not for the reasons one might think — the collection itself, he argues, isn’t as troubling as the fact that they’re holding the data in perpetuity and that we don’t have access to it. Rosen quotes Solove:
“The NSA program involves a massive database of information that individuals cannot access. … This kind of information processing, which forbids people’s knowledge or involvement, resembles in some ways a kind of due process problem. It is a structural problem involving the way people are treated by government institutions. Moreover, it creates a power imbalance between individuals and the government. … This issue is not about whether the information gathered is something people want to hide, but rather about the power and the structure of government.”
Preview of The Laws of Data Mining Session at Strata Santa Clara 2013
Many years ago I was taught the three laws of thermodynamics. When that didn’t stick, I was taught a quick way to remember them, originally formulated by C.P. Snow:
- 1st Law: you can’t win
- 2nd Law: you can’t draw
- 3rd Law: you can’t get out of the game
These laws (well the real ones) were firmly established by the mid 19th century. Yet, it wasn’t until the 1930s that the value of the 0th law was identified.
The laws of data mining may possibly, just possibly, not be as important as the laws of thermodynamics, but at Strata they, too, will be supported by an equally important 0th Law.
Inaugural 2013 app has plans for your data, the "unprecedented" security issues of the Internet of Things, and optical switches speed up data centers.
Here are a few stories from the data space that caught my attention this week.
Inaugural 2013 app takes as much as it gives
The Presidential Inaugural Committee (PIC) launched the first official inaugural smartphone app, Inaugural 2013 (for iOS and for Android), on Monday. Daniel Strauss reports in a post at The Hill that inauguration attendees can use the app to locate and RSVP to events, watch events via livestream, and navigate the event with an interactive map.
What isn’t front and center in the pomp and circumstance of the shiny new app are the terms of service and the privacy statement. Steve Friess at Politico points out that in the fine print, users are giving the PIC permission to share their data — phone numbers, email, home addresses, and GPS location data, for instance — “with candidates, organizations, groups or causes that [the PIC] believe have similar political viewpoints, principles or objectives.”
Gregory Ferenstein reports at TechCrunch that “privacy advocates find it troubling that the fine-print on the PIC’s website says it can use activity data ‘without limitation in advertising, fundraising and other communications in support of PIC and the principles of the Democratic party, without any right of compensation or attribution.'”
Big data in 2013, and beyond; the Sunlight Foundation's new data mining app; and the growth of our planet's central nervous system.
Here are a few stories from the data space that caught my attention this week.
Big data will continue to be a big deal
“Big data” became something of a buzz phrase in 2012, with its role in the US Presidential election, and with businesses large and small starting to realize the benefits and challenges of mountains upon zettabytes of data — so much so that NPR’s linguist contributor Geoff Nunberg thinks it should have been the phrase of the year.
Nunberg says that though “it didn’t get the wide public exposure given to items like ‘frankenstorm,’ ‘fiscal cliff‘ and YOLO,” and might not have been “as familiar to many people as ‘Etch A Sketch’ and ’47 percent'” were during the election, big data has become a phenomenon affecting our lives: “It’s responsible for a lot of our anxieties about intrusions on our privacy, whether from the government’s anti-terrorist data sweeps or the ads that track us as we wander around the Web.” He also notes that big data has transformed statistics into “a sexy major” and predicts the term will long outlast “Gangnam Style.” (You can read Nunberg’s full case for big data at NPR.)