From data-driven government to our age of intelligence, here are key insights from Strata + Hadoop World in San Jose, CA, 2015.
Experts from across the big data world came together for Strata + Hadoop World in San Jose, CA, 2015. We’ve gathered insights from the event below.
U.S. chief data scientist
With a special recorded introduction from President Barack Obama, DJ Patil talks about his new role as the U.S. government’s first-ever chief data scientist and the nature of the U.S.’s emerging data-driven government, and he defines his mission in leading the data-driven initiative:
“Responsibly unleash the power of data for the benefit of the American public and maximize the nation’s return on its investment in data.”
Our things are getting wired together, and you're not secure if you can't control the destiny of your private information.
Editor’s note: The Electronic Frontier Foundation’s Cory Doctorow will be speaking at the Solid Conference in San Francisco June 23-25, 2015. Registration is now open — for more information on the program, visit the Solid website.
The digital world has been colonized by a dangerous idea: that we can and should solve problems by preventing computer owners from deciding how their computers should behave. I’m not talking about a computer that’s designed to say, “Are you sure?” when you do something unexpected — not even one that asks, “Are you really, really sure?” when you click “OK.” I’m talking about a computer designed to say, “I CAN’T LET YOU DO THAT DAVE” when you tell it to give you root, to let you modify the OS or the filesystem.
Case in point: the cell-phone “kill switch” laws in California and Minnesota. Designed to deter cell-phone thieves, these laws require manufacturers to design phones so that carriers or manufacturers can push an over-the-air update that bricks the phone without any user intervention. Early data suggests that the laws are effective in preventing this kind of crime, but at a high and largely needless (and ill-considered) price.
To understand this price, we need to talk about what “security” is, from the perspective of a mobile device user: it’s a whole basket of risks, including the physical threat of violence from muggers; the financial cost of replacing a lost device; the opportunity cost of setting up a new device; and the threats to your privacy, finances, employment, and physical safety from having your data compromised.
The real challenge going forward: we can't trust anything.
A few weeks ago, I wrote about postmodern computing, and characterized it as computing in a world of distrust.
This morning, I read Steve Bellovin’s blog post, “What Must We Trust?” Bellovin explains that “modern” (my word) security is founded on the idea of a “Trusted Computing Base” (TCB), defined (in part) in the U.S. Department of Defense’s Orange Book. There were parts of a system that you had to trust, and you had to guard their integrity vigilantly: the kernel, certainly, but also specific configuration files, executables, and so on.
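To make “guarding integrity” concrete, here is a minimal sketch in Python of the Tripwire-style approach: record a SHA-256 digest of each file you have decided to trust, then later report any file whose contents no longer match. The function names and the sample `sshd_config` file are hypothetical; this illustrates the general technique and is not something Bellovin or the Orange Book prescribes.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def snapshot(paths: list[Path]) -> dict[str, str]:
    """Record a baseline digest for each file we have chosen to trust."""
    return {str(p): sha256_of(p) for p in paths}


def changed_files(baseline: dict[str, str]) -> list[str]:
    """Return trusted files whose contents no longer match the baseline.

    A file that has been deleted also counts as changed.
    """
    changed = []
    for name, expected in baseline.items():
        p = Path(name)
        if not p.is_file() or sha256_of(p) != expected:
            changed.append(name)
    return changed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        cfg = Path(d) / "sshd_config"  # hypothetical trusted config file
        cfg.write_text("PermitRootLogin no\n")
        baseline = snapshot([cfg])
        print(changed_files(baseline))  # → []
        cfg.write_text("PermitRootLogin yes\n")  # simulate tampering
        print(changed_files(baseline))  # lists the path to sshd_config
```

Note that a checker like this only moves the trust problem: the hashing code, the stored baseline, and the kernel that reads the files all still have to be trusted.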
The TCB has always been problematic, particularly since (at least initially) it did not consider the problem of network connections. Networking aside, though, Bellovin argues that recent events have blown the idea of a “trusted” system to bits. We’ve seen attacks against (Bellovin’s list) batteries, webcams, USB, and more. If Andromedans (Bellovin doesn’t want to say NSA) have managed to infiltrate our disk drives, what can trust mean? And it would be naive to think that this stops with devices that have disk drives. Our devices, from Fitbits to data centers, have been pwned before they’re even built.
How to decide which framework is best for your particular use case.
Editor’s note: Mark Grover will be part of the team teaching the tutorial Architectural Considerations for Hadoop Applications at Strata + Hadoop World in San Jose. Visit the Strata + Hadoop World website for more information on the program.
Hadoop has become the de facto platform for storing and processing large amounts of data and is now widely used. In the Hadoop ecosystem, you can store your data in one of the storage managers (for example, HDFS, HBase, or Solr) and then use a processing framework to process the stored data. Hadoop first shipped with only one processing framework: MapReduce. Today, there are many other open source tools in the Hadoop ecosystem that can be used to process data in Hadoop; a few common tools include the following Apache projects: Hive, Pig, Spark, Cascading, Crunch, Tez, and Drill, along with Impala and Presto. Some of these frameworks are built on top of each other. For example, you can write queries in Hive that run on MapReduce or Tez. Another example, currently under development, is the ability to run Hive queries on Spark.
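MapReduce, the framework Hadoop first shipped with, structures a job as a map step, a framework-managed shuffle that groups intermediate values by key, and a reduce step. As a rough illustration only, here is a single-process Python word count that mimics those phases. It does not use Hadoop's Java API or Hadoop Streaming, and the function names are hypothetical.

```python
from collections import defaultdict
from collections.abc import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every whitespace-separated word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group every emitted value under its key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reducer: combine all values for one key, here by summing the counts."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence over the input lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    # Reduce keys in sorted order, as Hadoop presents keys to each reducer.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["Hive runs on MapReduce", "Hive runs on Tez"]))
```

Running the block prints `{'hive': 2, 'mapreduce': 1, 'on': 2, 'runs': 2, 'tez': 1}`. On a real cluster, many mapper and reducer tasks run in parallel across nodes and the shuffle moves data over the network. Higher-level tools such as Hive and Pig generate jobs of this shape (or Tez DAGs) so that users don't have to write them by hand.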
Amidst all of these options, two key questions arise for Hadoop users:
- Which processing frameworks are most commonly used?
- How do I choose which framework(s) to use for my specific use case?
This post will help you answer both of these questions, giving you enough context to make an educated decision regarding the best processing framework for your specific use case.
The O'Reilly Radar Podcast: Balaji Srinivasan on the bigger picture of bitcoin, liquid markets, and the future of regulation.
The promise of bitcoin and the blockchain extends well beyond bitcoin’s potential to disrupt currency. In this Radar Podcast episode, Balaji Srinivasan, a general partner at Andreessen Horowitz, explains how bitcoin is an enabling technology and why it’s like the Internet, in that “bitcoin will do for value transfer what the Internet did for communication — make it programmable.” I met up with Srinivasan at our recent O’Reilly Radar Summit: Bitcoin & the Blockchain, where he was speaking; you can see his talk, and all the others from the event, in the complete video compilation now available.
The bigger picture of bitcoin
More than just a digital currency, bitcoin can serve as an instigator for new markets. Srinivasan explained the potential for everything to become a liquid market:
“Bitcoin is a platform for programmable money, programmable interchange, or anything of value. That’s very general. People have probably heard at this point about how you can use a blockchain to trade — in theory — stocks, or houses, or other kinds of things, but programmable value transfer is even bigger than just trading things which we know already exist.
“One analogy I would give is in 1988, it was not possible to find information on anything instantly. Today, most of the time it is. From your iPhone or your Android phone, you can google pretty much anything. In the same way, I think what bitcoin is going to mean, is markets in everything. That is, everything will have a price on it — everything will be a liquid market. You’ll be able to buy and sell almost anything. Where today the fixed costs of setting up such a market are too high for anything other than things that are fairly valuable, tomorrow it’ll be possible for even images or things you would not even think of normally buying and selling.”
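Srinivasan's argument rests on the blockchain being a shared, tamper-evident record of transfers. As a deliberately simplified, hypothetical sketch (not bitcoin's actual data format), here is a hash-linked ledger in Python. Each block stores one transfer plus the SHA-256 hash of the block before it, so altering any block other than the newest breaks the next block's link. It leaves out what lets bitcoin work without a central record-keeper: digital signatures, proof of work, and peer-to-peer consensus.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    sender: str
    recipient: str
    amount: int
    prev_hash: str  # hex SHA-256 digest of the previous block

    def digest(self) -> str:
        # Hash a canonical JSON encoding of this block's fields.
        payload = json.dumps(
            [self.sender, self.recipient, self.amount, self.prev_hash]
        ).encode()
        return hashlib.sha256(payload).hexdigest()


GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def append(chain: list[Block], sender: str, recipient: str, amount: int) -> None:
    """Add a transfer that points at the hash of the current last block."""
    prev = chain[-1].digest() if chain else GENESIS_HASH
    chain.append(Block(sender, recipient, amount, prev))


def is_valid(chain: list[Block]) -> bool:
    """Check that every block points at the hash of the block before it."""
    expected = GENESIS_HASH
    for block in chain:
        if block.prev_hash != expected:
            return False
        expected = block.digest()
    return True


if __name__ == "__main__":
    ledger: list[Block] = []
    append(ledger, "alice", "bob", 5)
    append(ledger, "bob", "carol", 2)
    print(is_valid(ledger))  # → True
    # Rewrite history: change the amount of the first transfer.
    ledger[0] = Block("alice", "bob", 500, GENESIS_HASH)
    print(is_valid(ledger))  # → False
```

Keeping such a ledger trustworthy when no single party controls it is the hard problem that bitcoin's mining and consensus rules address.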