Here are a few of the data stories that caught my attention this week.
Megaupload’s seizure and questions about controlling user data
When the file-storage and sharing site Megaupload had its domain name seized, assets frozen and website shut down in mid-January, the U.S. Justice Department contended that the owners were operating a site dedicated to copyright infringement. But that posed a huge problem for those who were using Megaupload for the legitimate and legal storage of their files. As the EFF noted, these users weren’t given any notice of the seizure, nor were they given an opportunity to retrieve their data.
Moreover, it seemed this week that those users would have all their data deleted, as Megaupload would no longer be able to pay its server fees.
While it appears that users have won a two-week reprieve before any deletion actually occurs, the incident does raise a number of questions about users’ data rights and control in the cloud. Specifically: What happens to user data when a file-hosting or cloud provider goes under? And how much time and notice should users have to reclaim their data?
This is what you see when you visit Megaupload.com.
Bloomberg opens its market data distribution technology
The financial news and information company Bloomberg opened its market data distribution interface this week. The BLPAPI is available under a free-use license at open.bloomberg.com. According to the press release, some 100,000 people already use the BLPAPI, but with this week’s announcement, the interface will be more broadly available.
The company introduced its Bloomberg Open Symbology back in 2009, a move to provide an alternative to some of the proprietary systems for identifying securities (particularly those services offered by Bloomberg’s competitor Thomson Reuters). This week’s opening of the BLPAPI is a similar gesture, one that the company says is part of its “Open Market Data Initiative, an ongoing effort to embrace and promote open solutions for the financial services industry.”
The BLPAPI works with a range of programming languages, including Java, C, C++, .NET, COM and Perl. But while the interface itself is free to use, the content is not.
Pentaho moves Kettle to the Apache 2.0 license
By moving to the Apache license, Pentaho says it will be more in line with the licensing of Hadoop, HBase, and a number of NoSQL projects.
Kettle downloads and documentation are available at the Pentaho Big Data Community Home.
Oscar screeners and movie piracy data
Andy Baio took a look at some of the data surrounding piracy and the Oscar screening process. There has long been concern that the review copies of movies distributed to members of the Academy of Motion Picture Arts and Sciences were making their way online. Baio observed that while a record number of films have been nominated for Oscars this year (37), just eight of the “screeners” have been leaked online, “a record low that continues the downward trend from last year.”
However, while the number of screeners available online has diminished, almost all of the nominated films (34 of the 37) had already been leaked online in some form. “If the goal of blocking leaks is to keep the films off the Internet, then the MPAA [Motion Picture Association of America] still has a long way to go,” Baio wrote.
Baio has a number of additional observations about these leaks (and he also made the full data dump available for others to examine). But as the MPAA and others are making arguments (and helping pen related legislation) to crack down on Internet piracy, a good look at piracy trends seems particularly important.
Got data news?
Feel free to email me.