FEATURED STORY

Announcing Cassandra certification

A new partnership between O’Reilly and DataStax offers certification and training in Cassandra.

I am pleased to announce a joint program between O’Reilly and DataStax to certify Cassandra developers. This program complements our developer certification for Apache Spark and, just as in the case of Databricks and Spark, we are excited to be working with the leading commercial company behind Cassandra. DataStax has done a tremendous job growing and nurturing the Cassandra community, user base, and technology.

Once the certification program is ready, developers can take the exam online, in designated test centers, and at select training courses. O’Reilly will also be developing books, training days, and videos targeted at developers and companies interested in the Cassandra distributed storage system.

Cassandra is a popular component used for building big data and real-time analytic platforms. Its ability to comfortably scale to clusters with thousands of nodes makes it a popular option for solutions that need to ingest and make sense of large amounts of time series and event data. As noted in an earlier post, real-time event data are at the heart of one of the trends we’re closely following: the convergence of cheap sensors, fast networks, and distributed computation. Read more…

Four short links: 27 May 2015

Domo Arigato Mr Google, Distributed Graph Processing, Experiencing Ethics, and Deep Learning Robots

  1. Roboto — Google’s signature font is open sourced (Apache 2.0), including the toolchain to build it.
  2. Pregel: A System for Large Scale Graph Processing — a walk through a key 2010 paper from Google, on the distributed graph system that is the inspiration for Apache Giraph and which sits under PageRank.
  3. How to Turn a Liberal Hipster into a Global Capitalist (The Guardian) — In Zoe Svendsen’s play “World Factory” at the Young Vic, the audience becomes the cast. Sixteen teams sit around factory desks playing out a carefully constructed game that requires you to run a clothing factory in China. How to deal with a troublemaker? How to dupe the buyers from ethical retail brands? What to do about the ever-present problem of clients that do not pay? […] And because the theatre captures data on every choice by every team, for every performance, I know we were not alone. The aggregated flowchart reveals that every audience, on every night, veers toward money and away from ethics. I’m a firm believer that games can give you visceral experience, not merely intellectual knowledge, of an activity. Interesting to see it applied so effectively to business.
  4. End to End Training of Deep Visuomotor Policies (PDF) — paper on using deep learning to teach robots how to manipulate objects, by example.

The state of augmented reality

A look at AR today and how we need to design it for tomorrow.

Attend O’Reilly’s Solid Conference, June 23–25, in San Francisco. Solid is our conference exploring how the collision of software and hardware is fueling the creation of a software-enhanced, networked physical world. Helen Papagiannis will speak at Solid on June 24.


Screenshot of the Google Sky Map app.

Unlike virtual reality (VR), augmented reality (AR) provides a gateway to a new dimension without the need to leave our physical world behind. We still see the real world around us in AR, whereas in VR, the real world is completely blocked out and replaced by a new world that immerses the user in a computer generated environment.

AR today

The most common definition of AR to date is a digital overlay on top of the real world, consisting of computer graphics, text, video, and audio, which is interactive in real time. This is experienced through a smartphone, tablet, computer, or AR eyewear equipped with software and a camera. Examples of AR today include the translation of signs or menus into the language of your choice, pointing at and identifying stars and planets in the night sky, and delving deeper into a museum exhibit with an interactive AR guide. AR presents the opportunity to better understand and experience our world in unprecedented ways.

AR is rapidly gaining momentum (and extreme amounts of funding) with great advances and opportunities in science, design, and business. It is not often that a whole new communications medium is introduced to the world. AR will have a profound effect on the way we live, work, and play. Now is the time to imagine, design, and build our virtual future. Read more…


How Shazam predicts pop hits

The O’Reilly Radar Podcast: Cait O’Riordan on Shazam’s predictive analytics, and Francine Bennett on using data for evil.

Subscribe to the O’Reilly Radar Podcast to track the technologies and people that will shape our world in the years to come.

In this week’s Radar Podcast, I chat with Cait O’Riordan, VP of product, music and platforms at Shazam. She talks about the current state of predictive analytics and how Shazam is able to predict the success of a song, often in the first few hours after its release. We also talk about the Internet of Things and how products like the Apple Watch affect Shazam’s product life cycles as well as the behaviors of their users.

Predicting the next pop hit

Shazam has more than 100 million monthly active users, and its users Shazam more than 20 million times per day. This, of course, generates a ton of data that Shazam uses in myriad ways, not the least of which is to predict the success of a song. O’Riordan explained how they approach their user data and how they’re able to accurately predict pop hits (and misses):

What’s interesting from a data perspective is when someone takes their phone out of their pocket, unlocks it, finds the Shazam app, and hits the big blue button, they’re not just saying, “I want to know the name of this song.” They’re saying, “I like this song sufficiently to do that.” There’s an amount of effort there that implies some level of liking. That’s really interesting, because you combine that really interesting intention on the part of the user plus the massive data set, you can cut that in lots and lots of different ways. We use it for lots of different things.

At the most basic level, we’re looking at what songs are going to be popular. We can predict, with a relative amount of accuracy, what will hit the Top 100 Billboard Chart 33 days out, roughly. We can look at that in lots of different territories as well. We can also look and see, in the first few hours of a track, whether a big track is going to go on to be successful. We can look at which particular part of the track is encouraging people to Shazam and what makes a popular hit. We know that, for example, for a big pop hit, you’ve got about 10 seconds to convince somebody to find the Shazam app and press that button. There are lots of different ways that we can look at that data, going right into the details of a particular song, zooming out worldwide, or looking in different territories just due to that big worldwide and very engaged audience.

Read more…


What today’s fitness technology means for tomorrow’s office

How the IoT could help organizations create a better employee experience.

Contributing Author: Claire Niech



At 5:37 a.m., Nina’s alarm softly begins to buzz and glow. It has calculated her recovery time based on her previous day’s workout and monitored her sleep tracker to identify the best point in her REM cycle to wake her up. After rising, she grabs a healthy breakfast, and her connected kitchen scale (a PrepPad or a Drop) records the fat, protein, calories, and carbohydrates in her meal.

For athletes like Nina, this kind of technology-enabled tracking is now standard. When Nina hits the gym for her daily routine, she warms up on a treadmill equipped with sensors to help gauge when she is striking at her optimal force. As she practices technique and form, a ‘smart’ surface records the location and duration of each move. Her training regimen is personalized based on this data; ‘instead of working off a generalized idea of what an athlete needs to be successful, [data analysis] has identified the specific abilities that a player requires to excel in a given sport.’ (From Faster, Higher, Stronger, by Mark McClusky)

Professional athletes today increasingly rely on Internet-connected devices and sensors to boost performance. Yet, the potential of such devices — commonly called the “Internet of Things” — extends beyond sports and fitness; as “weekend warriors” begin to bring these technologies mainstream, it is not hard to imagine that similar devices may soon also help us better understand other complex personal pursuits, such as creativity and productivity at work. As Laszlo Bock, who runs Google’s People Operations, notes: “We all have our opinions and case studies, but there is precious little scientific certainty around how to build great work environments, cultivate high-performing teams, maximize productivity, or enhance happiness.”

Today, many organizations tackle these questions with an industrial-organizational approach, diagnosing the issues most relevant to their workforce using tools such as annual surveys and benchmarking. But today’s approach seldom offers insight on “what works” — ways to track, teach, or reinforce new behaviors, or to see if specific initiatives are achieving the desired effect. Solutions to complex challenges like productivity or satisfaction often vary by organization, and demand better, real-time measurement and testing to enable experimentation.

By weaving together our physical and digital environments, sensors could help organizations analyze how factors like mood, focus, social engagement, or movement contribute to the employee experience — and even help replicate or enhance this experience. Consider how this new technology could impact how companies do work, assess outcomes, and enable employees to thrive. Read more…


Data science makes an impact on Wall Street

The O’Reilly Data Show Podcast: Gary Kazantsev on how big data and data science are making a difference in finance.

Having started my career in industry, working on problems in finance, I’ve always appreciated how challenging it is to build consistently profitable systems in this extremely competitive domain. When I served as a quant at a hedge fund in the late 1990s and early 2000s, I worked primarily with price data (time-series). I quickly found that it was difficult to find and sustain profitable trading strategies that leveraged data sources that everyone else in the industry examined exhaustively. In the early-to-mid 2000s the hedge fund industry began incorporating many more data sources, and today you’re likely to find many finance industry professionals at big data and data science events like Strata + Hadoop World.

During the latest episode of the O’Reilly Data Show Podcast, I had a great conversation with one of the leading data scientists in finance: Gary Kazantsev runs the R&D Machine Learning group at Bloomberg LP. As a former quant, I wanted to know the types of problems Kazantsev and his group work on, and the tools and techniques they’ve found useful. We also talked about data science, data engineering, and recruiting data professionals for Wall Street. Read more…
