Making the web work for science

Kaitlin Thaney on the value of open science and why broken methods are blocking big advances.

The field of science is firmly on our radar as a vertical with huge interest in, and opportunities for, the things that are foundational to O’Reilly’s world view: openness, platforms and APIs, creating more value than you capture, the web as a foundational platform, the power of big data both as a key to analytics and as a business-model chokepoint, sensors and the Internet of Things, open publishing … and it’s just beginning to be accelerated by startups.

It’s an exciting time in science, so for an update I spent some time with Kaitlin Thaney (@kaythaney), who helps organise Science Foo Camp, co-chairs Strata Conference in London, and serves as manager of external partnerships for Digital Science, a technology company serving scientific research, where she is responsible for the group’s public-facing activities.

Nat Torkington: Thanks for joining me today. Let’s start with Science Foo Camp (“Sci Foo” for short). At O’Reilly, we get insight into how the trends we see in our early adopter technology market are feeding into (and being informed by) the work of practising scientists. What’s the Digital Science view of Science Foo?

Kaitlin Thaney: Sci Foo, to me, gets back to the curiosity and serendipity that often drive brilliant interactions and ideas. It’s a supercharged weekend with little to no structure, where you’re not scolded for leaving a session midway (or not attending one at all), that stretches your mind and leaves you inspired. It’s a mix of utterly fascinating people across disciplines and sectors, all somehow linked by a common interest or work in science. I’d say it’s a means for us to connect with our community, but while that’s true in some sense, it would vastly undersell the event. For attendees, of course, it’s energizing and open to possibilities, so very different from the usual scientific conference.

Nat Torkington: To some outsiders, it’s a bit of a strange thing for these Internet-age companies to be engaging with the white labcoats of science. Science is, after all, a centuries-old (millennia-old, if you take a wider view) endeavour. Why must science practice today be different from the scientific practice that gave us our understanding of atomic theory, black holes, and antibiotics?

Kaitlin Thaney: Those practices aren’t scalable anymore, and rather than helping the next generation of researchers build their careers, they’re hindering us from finding the next cure for a disease or making the next big discovery. In drug discovery, we’ve plucked the low-hanging fruit, and now need to look at larger-scale datasets to identify interesting pathways for targeting.

The way we produce and disseminate knowledge has moved beyond writing on a sheet of paper and passing it to a colleague. Technology, the web in particular, has fundamentally transformed how we interrogate and interact with content, how we discover information, our work environments, and our agility. It’s time we brought our scientific research methods into the 21st century.

Nat Torkington: Nicely said! How have things changed since 2006 when you and John Wilbanks, as part of the Science Commons team at Creative Commons, first started talking about this?

Kaitlin Thaney: In some ways, we’re still fighting the same theoretical battles, just in different arenas and increasingly now with the public’s attention, not just that of the zealots. That’s an important shift in and of itself, and it shows that our work in the early years, not only advocating publicly for this change but also backing that advocacy with legal and technical tools for use in research (not just in copyright), is working, slowly but surely.

The messaging and theory our team at Creative Commons crafted, from principles of open science to open data doctrine to broader theory on the commons, was meant for the community, and seeing it taken forward shows both that there’s still work to be done and that there’s an appetite for that change. Our initial aim, to make the web work better for science, is being carried forward by a number of organisations, projects and advocates, ranging from the open science world to software shops like ours. It’s a kind of validation to see that work continue, and to see that message still resonate.

It’s becoming a mainstream conversation, with an ever-increasing focus on making the fruits of research available to be learned from, built upon, and consumed by the broader public as a means of engagement. The conversation has shifted from making the actual content available (a hard sell even for, say, green Open Access in 2006) to making sure that all of the necessary components (the underlying data, the code, the software) are also listed or included, to reduce the burden on the next researcher who comes along wanting to pick up where the research left off, or to remix it into a new experiment.

Nat Torkington: Is it still a conversation? Why is actual change in practice so hard to make?

Kaitlin Thaney: In scientific research, we’re dealing with special circumstances: we’re trying to innovate upon hundreds of years of entrenched norms and practices, with broken incentive structures and information-discovery problems dramatically slowing the system and keeping us from making the advances needed to better society. This stuff is tough, and there’s no quick one-size-fits-all solution (trust me, I’ve looked).

I spoke to a number of these issues in my talk at OSCON this past July: for all of the incredible discoveries we see hit the news and the pages of Nature, and for all the times we hear about “Science 2.0” and even “3.0” (sorry, Tim), we haven’t even hit “0.5”, at least in the worlds I work in. These are oft-overlooked baseline issues, assumptions we take for granted that haven’t actually been addressed or solved.

Think of it as a calcified pipe whose blockages cause only a trickle to come out on the other end. That’s most research. The big advances in science? They have a high-pressure firehose on one end. More pressure, same broken pipe. Those breaks in the system are keeping us from doing more efficient work and truly advancing discovery.

Nat Torkington: Are there success stories you can point to, though, of gains from change in practices and behaviours?

Kaitlin Thaney: We’re beginning to see data treated more as a first-class citizen when it comes to reporting, publication, and availability. Take, for example, the changes recently undertaken by the National Science Foundation (NSF), one of the leading funding bodies in the sciences. They put in place a requirement that all proposals submitted in or after January 2011 include a data management plan. That has since been elaborated upon with an additional requirement that grantees list data, software, code, patents and other “products” as research outputs of that funding. And from what I can tell, they intend to enforce the requirement, which has surely caught the attention of researchers applying for NSF money.

There are a host of other mandates at the institutional and funder level that have helped push the access conversation from one that many feigned ignorance of to one that, while still uncomfortable for some, is at least unavoidable. This is progress, and it is helping, in a top-down fashion, to provide an incentive for researchers.

Offerings like figshare (which we at Digital Science are delighted to support) are increasingly being embedded in new ways, helping to better link data to publication (see its integration with Faculty of 1000’s new research journal), as well as to stitch data and other research objects (e.g., posters, datasets, video, figures) into schemes familiar to researchers, like citation. Not only is the interface stupidly simple and fast to use, it’s free, the defaults are set to open (CC-BY if copyright applies, CC0 for the rest), and everything uploaded gets a Digital Object Identifier (DOI) allowing for citation. That, to me, is what we need to see more of.
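To make that “defaults set to open, everything gets a DOI” flow concrete, here is a minimal sketch of what depositing a dataset through a figshare-style repository API could look like. The base URL, endpoint paths, field names and token handling are illustrative assumptions for the sketch, not figshare’s actual API, so treat it as the shape of the workflow rather than working client code.

```python
# Sketch: deposit a dataset in a figshare-style repository over HTTP.
# NOTE: the base URL, endpoints, and field names below are hypothetical.
import requests

API_BASE = "https://api.repository.example/v1"       # hypothetical base URL
HEADERS = {"Authorization": "token YOUR-API-TOKEN"}  # hypothetical auth scheme

# 1. Create a record for the research object, open by default (CC0 for data).
record = requests.post(
    f"{API_BASE}/articles",
    headers=HEADERS,
    json={
        "title": "Stem cell imaging data, experiment 01",
        "defined_type": "dataset",
        "license": "CC0",
    },
).json()

# 2. Attach the data file itself.
with open("experiment_01.csv", "rb") as f:
    requests.put(
        f"{API_BASE}/articles/{record['id']}/files",
        headers=HEADERS,
        files={"file": f},
    )

# 3. The repository mints a DOI, so the dataset is citable like a paper.
print("Cite as:", record.get("doi"))
```

The point of the sketch is the defaults: the licence is open unless the depositor opts out, and a citable identifier comes back without any extra effort.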

What I think will be interesting to watch is the continuing evolution in thinking when it comes to mapping our digital behaviours in other settings onto research: taking lessons from our habits as digital consumers (which, in many cases, have become second nature) and seeing if there’s an opportunity to apply them to science. For example, when we go to purchase an item, chances are many of us will look to online reviews, be it on Amazon or TripAdvisor, to decide whether it’s a smart investment. Now apply that to, say, sourcing biological materials like antibodies for experiments. Efficacy is a massive problem (and an expensive one at that): you can buy the same Huntingtin antibody from five top suppliers and not realise until you’re midway through your experiment that it’s of insufficient quality. It’s like shopping blindly, hoping for the best. A team we work with in Toronto, 1DegreeBio, is working to change those odds, providing useful reviews of products in an open fashion, matched with a catalogue of the top commercial and university offerings, allowing researchers to make educated decisions that won’t waste time and money.

Of course, moving toward more efficient, digital research opens up opportunities to remix, link up, tinker, and learn in new ways: from stumbling on the next scientific discovery thanks to new collaborative technologies and analysis tools, to having a better understanding of what’s going on under the hood and preserving that knowledge for the next crop of students to come along. I still think there’s work to be done in really making the web work for science, but we’re heading in the right direction, and luckily, there are plenty of us working towards that goal.

Nat Torkington: How has this mapped to tool development?

Kaitlin Thaney: In terms of the technology, we have seen a burst of growth in the scientific startup space, as well as innovation in big pharma, biotech, and non-profits, to name a few. Researchers are overwhelmed with choices for managing, sharing and analysing their research in new ways, with an ever-growing stack of tools and web apps that can enhance their experience and further bridge the gap between the way we consume, sift and create as digital consumers and the way we do so as researchers.

But at the end of the day, the researcher has to be incentivised to learn how to navigate a new interface or engage with a system. They have to trust that their investment of time and energy into adopting a new technology is going to pay off. It has to show immediate value.

At Digital Science, we craft software solutions for researchers, and many of us come from that world ourselves, which helps in grasping the severity of the problem and the right approach to get scientists on board. I can tell you, trust and incentives are not easy problems to crack, but underestimating their importance could be the downfall of your project or company. Beyond our core team, the founders we work with each come with a similar story: they encountered a problem in their day-to-day work, got frustrated, and crafted a solution to get around it.

To go back to the figshare example cited previously, Mark Hahnel, the founder of figshare, built it while finishing his PhD in stem cell biology, frustrated that he’d collected so much data that wasn’t going to be used in a publication but was still of use to the community. That’s how figshare was born. Labguru, a lightweight lab management tool, was developed by a researcher who was sick of how much waste (time, money, and materials) there was in his lab, where the norm was a bench organised with colored stickers, post-it notes and stacks of invoices. You’ll see a number of these projects, and they’re the ones I find most promising, where the team comes with domain expertise paired with a bit of frustration. With some added support, and integration where it makes sense, they really start to make a difference.

Nat Torkington: From my limited experience, most scientists aren’t as comfortable with software and the possibilities of the Internet as they would have to be in order to make the change in practice. Is this right?

Kaitlin Thaney: Yes. Looking beyond the tools on offer, skills training to match the technology is still leagues behind where it should be. This could be due in part, I’m told, to a lack of bandwidth or resources to update curricula at universities (arguably a legitimate concern to some extent, though I’m not sure it excuses the gap), and in part to increased specialisation at the disciplinary level.

As one researcher told me, “you’re either good at the wet lab stuff or the computational analysis,” depending on whether you pursue a traditional life sciences degree or the computational flavoring (e.g., computational neuroscience, computational biology, and so on). Some are self-taught, learning to understand packages such as R or to program in Python. Others are left at a loss, and the basic concepts necessary for understanding information in general, let alone discipline-specific research, fall by the wayside: basic statistical literacy, data management, visualisation, and analysis.
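As a small illustration of the baseline statistical literacy at stake, here is a self-contained Python snippet that summarises a handful of replicate measurements with a mean and a 95% confidence interval; the numbers are invented for the example.

```python
# Summarise replicate measurements with a mean and a 95% confidence interval.
# The measurement values are invented for illustration.
import numpy as np
from scipy import stats

measurements = np.array([4.1, 3.9, 4.4, 4.0, 4.2])  # e.g. five assay replicates

n = len(measurements)
mean = measurements.mean()
sem = measurements.std(ddof=1) / np.sqrt(n)  # standard error of the mean
t_crit = stats.t.ppf(0.975, df=n - 1)        # two-sided 95% critical value

low, high = mean - t_crit * sem, mean + t_crit * sem
print(f"mean = {mean:.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

A researcher who can read those few lines, and knows why the sample standard deviation uses ddof=1, is far better placed to judge what an experiment’s data truly represent.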

Resetting the defaults when students first learn how to, say, save their data alongside an experiment could make a tremendous difference in the long term: understanding how to store and mark up their data as they go along in something like figshare, how to tie that into publication, and how it fits into data management plans. A broader introduction, earlier in a student’s research career, to the vast array of tools that exist could also dramatically alter the behavioural chain in how they teach their own students, and the next generation after that, be it in better managing their lab, making their data available, or sourcing materials.
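As a hypothetical sketch of what those reset defaults might look like in a student’s day-to-day work, imagine being taught never to save a results file without a small machine-readable metadata record next to it, ready for later deposit or for a data management plan. The directory layout and metadata fields here are purely illustrative, not a formal standard.

```python
# Sketch of a "good defaults" habit: every dataset is saved alongside a small
# metadata record describing what it is, who made it, and its licence.
# The directory layout and metadata fields are illustrative, not a standard.
import csv
import json
from datetime import date
from pathlib import Path

out = Path("experiments/2024-06-01_antibody_assay")
out.mkdir(parents=True, exist_ok=True)

# The data itself.
rows = [("sample", "signal"), ("control", 0.12), ("antibody_a", 0.87)]
with open(out / "results.csv", "w", newline="") as f:
    csv.writer(f).writerows(rows)

# The metadata, saved right next to the data from day one.
metadata = {
    "title": "Antibody efficacy assay, experiment 01",
    "creator": "A. Student",
    "date": date.today().isoformat(),
    "license": "CC0",
    "description": "Signal intensity per sample; see lab notebook p. 14.",
}
with open(out / "results.metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)
```

Nothing here is clever; the point is that when the habit is the default from a student’s first experiment, depositing the data later is a copy rather than a reconstruction.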

The skills gap is only widening, and there are groups out there working to address it (e.g., Software Carpentry, rOpenSci, Young Rewired State, Code Club) that we can, and should, learn from, keeping this in mind as we continue developing top-notch tools for researchers and working to increase awareness. Having a better understanding of how a result was reached, or of what the data from an experiment truly represent, is empowering, and we need to make sure those methods of inquiry and skills aren’t lost.

Nat Torkington: So what’s going to happen next in this transformation of science practice? What should we be keeping our eye on and watching for?

Kaitlin Thaney: Content will continue to move online, not just in a digitised fashion, but meaningfully linked to the necessary components (where feasible) to reproduce that experiment. We’re making headway with getting the content online. Next is adding value to the content and making it useful for the community.

Work environments are also moving more to the cloud — private or public — providing a more distributed approach to sharing, analysing and filtering information.

I also foresee a more concerted effort to reset our defaults to open, from how we teach undergraduates to the tools we arm them with for conducting their research, and even publication. Providing entry points that are set up to foster collaboration, sharing of information and better information management will help us establish these practices as the norm, rather than as a conflicting way of doing research. And we’re starting to see the beginnings of that.

Nat Torkington: Thanks so much for taking the time to talk with me. Best of luck with your science revolution!

Kaitlin Thaney: Thanks, and I’ll see you at Kiwi Foo Camp one of these years!
