Strata Week: Hortonworks brings Hadoop to Windows

Hortonworks' Data Platform for Windows, Intel's Hadoop distribution, invasive smartphone surveillance, and data-driven "House of Cards."

Windows gets Hadoop, Intel launches Hadoop distribution

Hortonworks released a beta version of its Hortonworks Data Platform for Windows this week. In the press release, the company says its mission is to “expand the reach of Apache Hadoop across the enterprise” and notes that the “100% open source Hortonworks Data Platform is the industry’s first and only Apache Hadoop distribution for both Windows and Linux.”

Barb Darrow notes at GigaOm that there’s likely no better way to bring big data to the masses than via Microsoft Excel. Darrow reports that Hortonworks’ VP of corporate strategy, Shawn Connolly, told her that “[t]he combination should make it easier to integrate data from SQL Server and Hadoop and to funnel all that into Excel for charting and pivoting and all the tasks Excel is good at,” stressing that the same Apache Hadoop distribution will run on both Windows and Linux. Connolly also told Darrow that “an analogous Hortonworks Data Platform for Windows Azure is still in the works.”

Brian Proffitt at ReadWrite says the move is significant and not unexpected, but he wonders about the cost. Running Hadoop clusters on Linux servers is “all but frictionless,” Proffitt says, noting that there are no licensing costs on Linux and that configuration is easy. “But when the underlying operating system is Windows Server,” he writes, “licensing — i.e., explicitly not free — would seem likely to create a lot more friction when someone tries to build a Hadoop cluster.” That leads Proffitt to ask: wouldn’t running a Hadoop system on Windows Server be too expensive?

Hortonworks VP of Marketing David McJannet didn’t think so. As Proffitt reports, “McJannet’s concern was that too many Windows-based shops out there were shying away from Hadoop because they didn’t want to deal with adding Linux clusters and the related hassle of managing them.”

The beta release is available for download now at Hortonworks.

In other Hadoop news, Intel entered the arena, launching its own open source distribution of Apache Hadoop. In the company blog announcement, Pauline Nist outlined several reasons the semiconductor company wanted in on the Hadoop ecosystem, boiling it all down to its transformation potential: “Intel wants to see the Hadoop framework easily and widely adopted, as it believes the broad use of analytics, delivered at lower price points, can transform business and society, by turning big data into better insights.”

Rachel King reports at ZDNet that “Intel is framing its deployment of the open source software framework as a ground-up approach by baking Hadoop directly into the silicon level,” noting that during the invite-only announcement event, company executives said that they’ve “optimized [Intel's] Xeon chips, in particular, for networking and I/O use cases to ‘enable new levels’ of data analytics.”

Court document shows invasiveness of smartphone searches

The ACLU’s Chris Soghoian and Naomi Gilens report this week at the ACLU blog that they’ve discovered a document listing the types of data federal agents are allowed to pull from seized smartphones. According to Soghoian and Gilens, the document was submitted in court in connection with a drug investigation, and the ACLU has made it available for download (PDF). They say it “starkly demonstrates just how invasive cell phone searches are — and why law enforcement should be required to obtain a warrant before conducting them.”

Soghoian and Gilens report that the data agents pulled from an iPhone seized from a suspect’s bedroom in the drug investigation included a “huge array of personal data,” such as stored voicemails and text messages, eight different passwords, and 659 geolocation points, including 227 cell towers and 403 Wi-Fi networks to which the phone had previously connected. They note that in this particular case, agents did obtain a warrant to conduct a detailed search of the phone, but that “courts are divided about whether a warrant is necessary in these circumstances, and no statute requires one.”

Data-driven TV

In a post at the New York Times, David Carr examined Netflix’s use of big data in buying and developing its new original series “House of Cards,” a show that, Carr reports, Netflix claims is “the most streamed piece of content in the United States and 40 other countries.” Carr notes that data use in the film and television industries is nothing new, “but as a technology company that distributes and now produces content, Netflix has mind-boggling access to consumer sentiment in real time.”

Pointing to a post by Derrick Harris at GigaOm, Carr reports that “Netflix looks at 30 million ‘plays’ a day, including when you pause, rewind and fast forward, four million ratings by Netflix subscribers, three million searches as well as the time of day when shows are watched and on what devices.” That comes on top of all the movie and TV show metadata Netflix already has at its fingertips. Beyond shaping Netflix’s decisions on the creation of “House of Cards,” data informed its marketing as well. Carr writes:

“And there was not one trailer for ‘House of Cards,’ there were many. Fans of Mr. [Kevin] Spacey saw trailers featuring him, women watching ‘Thelma and Louise’ saw trailers featuring the show’s female characters and serious film buffs saw trailers that reflected Mr. [David] Fincher’s touch.”

Carr also looks at the wide-ranging responses to such data-driven approaches to TV content. You can read his full piece at the New York Times.

Tip us off

News tips and suggestions are always welcome, so please send them along.
