Need a break from the holiday madness? You’re not alone. Check out these items of interest from the land of data and see why even the big consumers face tough choices.
Does this place accept returns?
The Stack Exchange Data Explorer (SEDE) is an open source, web-based tool for querying the monthly dump of Creative Commons data from Stack Exchange’s four main Q&A sites (Stack Overflow, Server Fault, Super User, and Meta) as well as other sites in the Stack Exchange family. The primary reason given for moving it away from Platform-as-a-Service (in a polite write-up by Jeff Atwood and SEDE lead Sam Saffron) was the desire for fine-tuned control over the platform.
When you are using a [Platform-as-a-Service] you are giving up a lot of control to the service provider. The service provider chooses which applications you can run and imposes a series of restrictions. … It was disorienting moving to a platform where we had no idea what kind of hardware was running our app. Giving up control of basic tools and processes we use to tune our environment was extremely painful.
While the support that comes with Platform-as-a-Service was acknowledged, it seems that the ability to better automate, adjust, and perpetuate processes and systems with more fine-grained control won out as a bigger convenience.
Where did you get that lovely platform?
Of course, one company’s headache is another’s dream. Netflix, a company known for playing with big data and crowdsourcing solutions “before it was cool,” posted on Tuesday the four reasons they chose Amazon Web Services (AWS) as the platform they have moved onto over the last year.
Laudably, the company states that it viewed its tremendous recent growth (in terms of both members and streaming devices) as a license to question everything in the necessary process of re-architecting. Instead of building out their own data centers, etc., they decided to answer that set of questions by paying someone else to worry about it.
Also to their credit, Netflix has enough self-awareness to know what they are and aren’t good at. Building top-notch recommendation systems and providing entertainment? You betcha. Predicting customer growth and device engagement? Not so much.
How many subscribers would you guess used our Wii application the week it launched? How many would you guess will use it next month? We have to ask ourselves these questions for each device we launch because our software systems need to scale to the size of the business, every time.
Self-awareness is in fact the primary lesson in both Netflix’s and Stack Exchange’s platform decisions. If you feel your attention is better spent elsewhere, write a check. If you’ve got the time and expertise to hone your hardware, roll your own.
[Of course, Netflix doesn’t go for the pre-packaged solutions every time. They also posted recently about why they love open source software, and listed among the projects they make use of and contribute back to: Hadoop, Hive, HBase, Honu, Ant, Tomcat, Hudson, Ivy, Cassandra, etc.]
With what shall we shop?
The New York Times this week released a cool group of interactive maps based on data collected in the Census Bureau’s American Community Survey (ACS) from 2005 to 2009. Data is compared against the 2000 census to uncover rates of change.
[While similar to the census, the ACS is conducted every year instead of every 10 years. The ACS includes only a sampling of addresses instead of a comprehensive inventory. It covers much of the same ground on population (age, race, disability status, family relationships), but it also asks for information that is used to help make funding distribution decisions about community services and institutions.]
The Times maps explore education levels; rent, mortgage rates, and home values; household income; and racial distribution. Viewers can select among 22 maps in these four categories, and then pan and zoom to view national, state, or local trends down to the level of individual census tracts.
Above is the national view of the map that looks at change in median household income. The ACS website itself provides some maps displaying the survey numbers from the 2000 census and the 2005-2009 survey, as well as a listing of data tables.
The Times map shows the uneven way in which these numbers have gone up or down in various parts of the country, with some surprising results that are worth exploring. Note that the blue regions are places where income has dropped, and the yellow regions are places where it has increased. (No wonder a lot of us are getting creative with holiday shopping.)
If this kind of research floats your boat, check out Social Explorer, the mapping tool used to create the New York Times maps.
Even markets like to buy things
While Stewart Brand may be right in thinking information wants to be free, there’s also enormous value to be added by aggregating, structuring, and packaging data, as well as in matching up buyers with sellers. That’s the main service Data Marketplace aims to provide, particularly in the field of financial data.
At Infochimps, information is offered a la carte, and many of the site’s datasets are offered for free. These include sets as diverse as “Word List – 100,000+ official crossword words (Excel readable)”, “Measuring Worth: Interest Rates – US & UK 1790-2000”, and “Retrosheet: Game Logs (play-by-play) for Major League Baseball Games.” Data Marketplace is a bit different, in that it allows users to enter requests for data (with a deadline and budget, if desired) and then matches up would-be buyers with data providers.
Infochimps has said that Data Marketplace, which is less than a year old, will continue to operate as a standalone site, although its founders Steve DeWald and Matt Hodan will depart for new projects.