Strata Week: A home for negative and null results

Figshare wants research data, Accel makes a huge data investment, LinkedIn shares its DataFu.

Here are a few of the data stories that caught my attention this week:

Figshare sees the upside of negative results

Science data-sharing site Figshare relaunched its website this week, adding several new features. Figshare lets researchers publish all of their data online, including negative and null results.

Using the site, researchers can now upload and publish all file formats, including videos and datasets that are often deemed “supplemental materials” or excluded from current publishing models. This is part of a larger “open science” effort. According to Figshare:

“… by opening up the peer review process, researchers can easily publish null results, avoiding the file drawer effect and helping to make scientific research more efficient. Figshare uses creative commons licensing to allow frictionless sharing of research data whilst allowing users to maintain their ownership.”

As the startup argues: “Unless we as scientists publish all of our data, we will never achieve access to the sum of all scientific knowledge.”

Accel’s $100 million data fund makes its first ($52.5 million) investment

Late last year, the investment firm Accel Partners announced a new $100 million Big Data Fund, with a promise to invest in big data startups. This year, the first investment from that fund was revealed, with a whopping $52.5 million going to Code 42.

Founded in 2001, Code 42 is the creator of the backup software CrashPlan, and the company describes itself as building “high-performance hardware and easy-to-use software solutions that protect the world’s data.”

Describing the investment, GigaOm’s Stacey Higginbotham writes:

“With the growth in mobile devices and the data stored on corporate and consumer networks that is moving not only from device to server, but device to device, [CEO Matthew] Dornquast realized Code 42’s software could become more than just a backup and sharing service, but a way for corporations to understand what data and how data was moving between employees and the devices they use.”

Higginbotham also cites Accel Partners’ Ping Li, who notes that further investments from its Big Data Fund are unlikely to be so sizable.

LinkedIn open sources DataFu

LinkedIn has been a heavy user of Apache Pig for performing analysis with Hadoop on projects such as its People You May Know tool, among other things. For more advanced tasks like these, Pig supports User Defined Functions (UDFs), which allow the integration of custom code into scripts.

This week, LinkedIn announced the release of DataFu, the consolidation of its UDFs into a single, general-purpose library. DataFu enables users to “run PageRank on a large number of independent graphs, perform set operations such as intersect and union, compute the haversine distance between two points on the globe,” and more.
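For readers unfamiliar with the haversine distance mentioned in that list, here is a minimal standalone sketch in Python. It is not DataFu's code or API: the function name `haversine_km` and the mean Earth radius of 6,371 km are assumptions chosen for illustration, and the formula treats the Earth as a perfect sphere.

```python
import math

# Assumed mean Earth radius in kilometers (illustrative, not taken from DataFu).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees.

    Hypothetical helper for illustration only.
    """
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)

    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)

    # Clamp to 1.0 to guard against floating-point rounding before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))
```

Calling `haversine_km(lat1, lon1, lat2, lon2)` with two latitude/longitude pairs returns the spherical distance between them; in a Pig workflow, a UDF would wrap the same kind of calculation so it can be applied to each record in a script.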

LinkedIn is making DataFu available on GitHub under the Apache 2.0 license.

Got data news?

Feel free to email me.
