Salary insights from more than 800 data professionals reveal a correlation to skills and tools.
In the results of this year’s O’Reilly Media Data Science Salary Survey, we found a median total salary of $98k ($144k for US respondents only). The 816 data professionals in the survey included engineers, analysts, entrepreneurs, and managers (although almost everyone had some technical component in their role).
Why the high salaries? While the demand for data applications has increased rapidly, the number of people who set up the systems and perform advanced analytics has increased much more slowly. Newer tools such as Hadoop and Spark are likely to have even fewer expert users, and correspondingly we found that users of these tools have particularly high salaries.
Introducing Bitcoin & the Blockchain: An O’Reilly Radar Summit
When the creators of bitcoin solved the “double spend” problem in a decentralized manner, they introduced techniques that have implications far beyond digital currency. Our newly announced one-day event — Bitcoin & the Blockchain: An O’Reilly Radar Summit — is in line with our tradition of highlighting applications of developments in computer science. Financial services have long relied on centralized solutions, so in many ways, products from this sector have become canonical examples of the developments we plan to cover over the next few months. But many problems that require an intermediary are being reexamined with techniques developed for bitcoin.
How do you get multiple parties in a transaction to trust each other without an intermediary? In the case of a digital currency like bitcoin, decentralization means reaching consensus over an insecure network. As Mastering Bitcoin author Andreas Antonopoulos noted in an earlier post, several innovations lie at the heart of what makes bitcoin disruptive:
“Bitcoin is a combination of several innovations, arranged in a novel way: a peer-to-peer network, a proof-of-work algorithm, a distributed timestamped accounting ledger, and an elliptic-curve cryptography and key infrastructure. Each of these parts is novel on its own, but the combination and specific arrangement was revolutionary for its time and is beginning to show up in more innovations outside bitcoin itself.”
Examples of multi-layer, three-tier data-processing architecture.
Like CPU caches, which tend to be arranged in multiple levels, modern organizations direct their data into different data stores under the principle that a small amount is needed for real-time decisions and the rest for long-range business decisions. This article looks at options for data storage, focusing on one that’s particularly appropriate for the “fast data” scenario described in a recent O’Reilly report.
Many organizations deal with data on at least three levels:
- They need data at their fingertips, rather like a reference book you leave on your desk. Organizations use such data for things like determining which ad to display on a web page, what kind of deal to offer a visitor to their website, or what email message to suppress as spam. They store such data in memory, often in key/value stores that allow fast lookups. Flash is a second layer (slower than memory, but much cheaper), as I described in a recent article. John Piekos, vice president of engineering at VoltDB, which makes an in-memory database, says that this type of data storage is used in situations where delays of just 20 or 30 milliseconds mean lost business.
- For business intelligence, these organizations use a traditional relational database or a more modern “big data” tool such as Hadoop or Spark. Although the use of a relational database for background processing is generally called online analytical processing (OLAP), it is nowhere near as online as the first tier, where data is consulted within milliseconds for real-time decisions.
- Some data is archived with no immediate use in mind. It can be compressed and perhaps even stored on magnetic tape.
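The three levels above can be pictured as a lookup that tries the fastest store first and falls back to slower ones. The following is only a toy sketch under stated assumptions: the class name `TieredStore`, the tier labels, and the use of plain dictionaries as stand-ins for a key/value store, an analytics database, and a compressed archive are all hypothetical and do not come from the article or from any product it mentions.

```python
import zlib
from typing import Dict, Optional, Tuple


class TieredStore:
    """Toy model of three data tiers (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self.hot: Dict[str, str] = {}     # stand-in for an in-memory key/value store
        self.warm: Dict[str, str] = {}    # stand-in for a relational or Hadoop/Spark tier
        self.cold: Dict[str, bytes] = {}  # stand-in for a compressed archive

    def put(self, key: str, value: str, tier: str = "hot") -> None:
        """Store a value in the named tier; archived values are compressed."""
        if tier == "hot":
            self.hot[key] = value
        elif tier == "warm":
            self.warm[key] = value
        elif tier == "cold":
            self.cold[key] = zlib.compress(value.encode("utf-8"))
        else:
            raise ValueError(f"unknown tier: {tier}")

    def get(self, key: str) -> Optional[Tuple[str, str]]:
        """Return (tier, value), checking the fastest tier first."""
        if key in self.hot:
            return "hot", self.hot[key]
        if key in self.warm:
            return "warm", self.warm[key]
        if key in self.cold:
            raw = zlib.decompress(self.cold[key])
            return "cold", raw.decode("utf-8")
        return None


store = TieredStore()
store.put("ad:visitor42", "banner-7")               # needed for a real-time decision
store.put("sales:2014-q3", "report", tier="warm")   # business intelligence
store.put("logs:2009", "old log lines", tier="cold")  # archived, no immediate use
```

In a real deployment each tier would be a separate system with very different latency and cost; the sketch only shows the ordering of the lookups.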
For the new fast data tier, where performance is critical, techniques such as materialized views further improve responsiveness. According to Piekos, materialized views bypass a certain amount of database processing to cut milliseconds off queries.
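To make the idea of a materialized view concrete, here is a minimal sketch, assuming a simple click-count aggregate. It is not VoltDB's implementation or API; the class `ClickStore` and its method names are hypothetical. The point it illustrates is that the aggregate is updated at write time, so a read becomes a single lookup instead of a scan over the base rows.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple


class ClickStore:
    """Base table plus an incrementally maintained aggregate (hypothetical)."""

    def __init__(self) -> None:
        # Base "table": one (ad_id, clicks) row per insert.
        self.rows: List[Tuple[str, int]] = []
        # "Materialized view": total clicks per ad, kept current on every write.
        self.clicks_by_ad: DefaultDict[str, int] = defaultdict(int)

    def insert(self, ad_id: str, clicks: int) -> None:
        """Append a row and update the precomputed total in the same step."""
        self.rows.append((ad_id, clicks))
        self.clicks_by_ad[ad_id] += clicks

    def total_by_scan(self, ad_id: str) -> int:
        """Answer the query the slow way, by scanning every base row."""
        return sum(clicks for row_ad, clicks in self.rows if row_ad == ad_id)

    def total_from_view(self, ad_id: str) -> int:
        """Answer the same query with a single lookup in the view."""
        return self.clicks_by_ad.get(ad_id, 0)


clicks = ClickStore()
clicks.insert("ad-1", 3)
clicks.insert("ad-2", 5)
clicks.insert("ad-1", 4)
```

Both query methods return the same totals; the view trades a little extra work on each write for a cheaper read, which is the trade-off that matters when a response is needed within milliseconds.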
In this O'Reilly Radar Podcast: Dr. Gilad Rosner talks about data privacy, and Alasdair Allan chats about the broken IoT.
In this podcast episode, I catch up with Dr. Gilad Rosner, a visiting researcher at the Horizon Digital Economy Research Institute in England. Rosner focuses on privacy, digital identity, and public policy, and is launching an Internet of Things Privacy Forum. We talk about personal data privacy in the age of the Internet of Things (IoT), privacy as a social characteristic, an emerging design ethos for technologists, and whether or not we actually own our personal data. Rosner characterizes personal data privacy as a social construct and addresses the notion that privacy is dead:
“Firstly, it’s important to recognize the idea that privacy is not a regime to control information. Privacy is a much larger concept than that. Regimes to control information are ways that we as a society preserve privacy, but privacy itself emerges from social needs and from individual human needs. The idea that privacy is dead comes from the vulnerability that people are feeling because they can see that it’s very difficult to maintain walls between their informational spheres, but that doesn’t mean that there aren’t countercurrents to that, and it doesn’t mean that there aren’t ways, as we go forward, to improve privacy preservation in the electronic spaces that we continue to move into.”
As we move more and more into these electronic spaces and the Internet of Things becomes democratized, our notions of privacy are shifting on a cultural level beyond anything we’ve experienced as a society before.
From the Internet of Things to data-driven fashion, here are key insights from Strata + Hadoop World in Barcelona 2014.
Experts from across the big data world came together for Strata + Hadoop World in Barcelona 2014. We’ve gathered insights from the event below.
#IoTH: The Internet of Things and Humans
“If we could start over with these capabilities we have now, how would we do it differently?” Tim O’Reilly continues to explore data and the Internet of Things through the lens of human empowerment and the ability to “use technology to give people superpowers.”