Four short links: 2 September 2015

Hard Problems in Distributed Systems, Engineering Bootcamp, Scripted TV, and C Guidelines

  1. There Are Only Two Hard Problems in Distributed Systems — the best tweet ever. (via Tim Bray)
  2. Building LinkedIn’s New Engineering Bootcamp — transmitting cultural and practical knowledge in a structured format.
  3. Soul-Searching in TV Land Over the Challenges of a New Golden Age (NY Times) — The number of scripted shows produced by networks, cable networks and online services ballooned to 371 last year, according to statistics compiled by FX. Mr. Landgraf believes that figure will pass 400 this year, which would nearly double the 211 shows made in 2009. […] predicted that the number of shows would slowly return to about 325 over the next few years, in large part because scripted television is expensive.
  4. C Programming Substance Guidelines — This document is mainly about avoiding problems specific to the C programming language.

Lead with merit and be rewarded with success

Michael Lopp on the concept of merit badges for leaders.

"Girl Scout Sash with Badges," by Steve Snodgrass on Flickr. Used under a Creative Commons license.

Attend Cultivate, September 28 to 29 in New York, NY. Cultivate is our conference looking at the challenges facing modern management and aiming to train a new generation of business leaders who understand the relationship between corporate culture and corporate prosperity.

At our Cultivate conference in July, Michael Lopp had a fantastic session on “Leadership: By the Numbers,” which was a bit like a refresher on leadership fundamentals that you may have forgotten or never knew. He also had a few slides that were meant to be a brief tangent from his primary talk, but they grabbed my attention immediately. Lopp always has fantastic stories and pearls of wisdom from the hard-won experiences in his career, like “Busy is a bug, not a feature.” But before he launched into his “Vegetable Talk” on how being a good manager is really basic—like vegetables—he explored the idea of having merit badges for leaders. Read more…


Showcasing the real-time processing revival

Tools and learning resources for building intelligent, real-time products.

Register for Strata + Hadoop World NYC, which will take place September 29 to October 1, 2015.

A few months ago, I noted the resurgence in interest in large-scale stream-processing tools and real-time applications. Interest remains strong, and if anything, I’ve noticed growth in the number of companies wanting to understand how they can leverage the growing number of tools and learning resources to build intelligent, real-time products.

This is something we’ve observed using many metrics, including product sales, the number of submissions to our conferences, and the traffic to Radar and newsletter articles.

As we looked at putting together the program for Strata + Hadoop World NYC, we were excited to see a large number of compelling proposals on these topics. To that end, I’m pleased to highlight a strong collection of sessions on real-time processing and applications coming up at the event. Read more…

Four short links: 1 September 2015

People Detection, Ratings Patterns, Inspection Bias, and Cloud Filesystem

  1. End-to-End People Detection in Crowded Scenes — research paper and code. When parsing the title, bind “end-to-end” to “scenes” not “people”.
  2. Statistical Patterns in Movie Ratings (PLOSone) — We find that the distribution of votes presents scale-free behavior over several orders of magnitude, with an exponent very close to 3/2, with exponential cutoff. It is remarkable that this pattern emerges independently of movie attributes such as average rating, age and genre, with the exception of a few genres and of high-budget films.
  3. The Inspection Bias is Everywhere — In 1991, Scott Feld presented the “friendship paradox”: the observation that most people have fewer friends than their friends have. He studied real-life friends, but the same effect appears in online networks: if you choose a random Facebook user, and then choose one of their friends at random, the chance is about 80% that the friend has more friends. The friendship paradox is a form of the inspection paradox. When you choose a random user, every user is equally likely. But when you choose one of their friends, you are more likely to choose someone with a lot of friends. Specifically, someone with x friends is overrepresented by a factor of x.
  4. s3ql — a file system that stores all its data online using storage services like Google Storage, Amazon S3, or OpenStack. S3QL effectively provides a hard disk of dynamic, infinite capacity that can be accessed from any computer with internet access running Linux, FreeBSD or OS-X. (GPLv3)
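
The "overrepresented by a factor of x" claim in item 3 can be checked on a toy example. The sketch below is only an illustration: the five-person graph and all names in it are made up, and it simplifies the sampling to "pick a random friendship, then one of its two endpoints," which is the setting where a person with x friends is picked exactly x times as often.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical toy friendship graph (undirected); not data from the linked article.
EDGES = [
    ("ann", "bob"),
    ("ann", "cat"),
    ("ann", "dan"),
    ("ann", "eve"),
    ("bob", "cat"),
]


def degrees(edges):
    """Return a Counter mapping each person to their number of friends."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


def mean_degree(deg):
    """Average friend count when every person is equally likely to be picked."""
    return Fraction(sum(deg.values()), len(deg))


def mean_friend_degree(deg):
    """Average friend count when picking a random endpoint of a random friendship.

    A person with x friends sits on x friendships, so they are weighted by x;
    the size-biased mean is sum(x * x) / sum(x).
    """
    return Fraction(sum(x * x for x in deg.values()), sum(deg.values()))


if __name__ == "__main__":
    deg = degrees(EDGES)
    print("mean friends per person:", mean_degree(deg))
    print("mean friends per sampled friend:", mean_friend_degree(deg))
```

In this toy graph the plain average is 2 friends per person, while the friend-weighted average is 13/5, because the one well-connected person is counted four times.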

Designing at the intersection of disciplines

The O'Reilly Radar Podcast: Simon King on creating holistic, integrated experiences and the importance of discipline overlap.


Subscribe to the O’Reilly Radar Podcast to track the technologies and people that will shape our world in the years to come.

In this week’s Radar Podcast, I chat with Simon King, director of the Carnegie Mellon University Design Center. Harkening back to growing up on a family farm in Michigan, King talks about technology’s growing role in agriculture and the role design is playing in agriculture innovation. He also talks about his new book Understanding Industrial Design and the synergies between industrial design and interaction design. King will be speaking about industrial design at our newly launched O’Reilly Design Conference: Design the Future on January 19 to 22, 2016, in San Francisco.

Here are a few highlights from our conversation:

There’s been different eras of agriculture, and this latest one of precision agriculture or data-driven agriculture has the possibility of really changing the way people farm. I see that to some degree with people like my father and the new tools that he’s embracing slowly — things like autonomous driving tractors and some of the different data services. It’s an opportunity, I think, for new people to come into the field, and it’s going to be important.

Like most industries that are leading with technology, design trails. People are embracing the technology because it’s whole new capabilities that they never had before. Being able to do soil samples and analysis and then create nitrogen prescription maps so that you are not like wasting any chemicals — it’s such a great advancement that people are willing to fight through the fact that it’s poorly designed. We see that in medical; we see that in automotive. Any industry that reaches a certain curve where the technology has become mature, then all of a sudden the experience of using it begins to matter a lot more. I think that’s where design is going to start intersecting with agriculture really strongly and actually make it more accessible to farmers who are generally not that technically savvy.

Industrial design is such an older design discipline. Just purely from the design history standpoint, it’s something that everybody should be studying and be aware of how that discipline has evolved. It’s the underpinning of a lot of the different disciplines that design has kind of fragmented into.

Read more…


Bridging the divide: Business users and machine learning experts

The O'Reilly Data Show Podcast: Alice Zheng on feature representations, model evaluation, and machine learning models.

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data and data science.

As tools for advanced analytics become more accessible, data scientists’ roles will evolve. Most media stories emphasize a need for expertise in algorithms and quantitative techniques (machine learning, statistics, probability), and yet the reality is that expertise in advanced algorithms is just one aspect of industrial data science.

During the latest episode of the O’Reilly Data Show podcast, I sat down with Alice Zheng, one of Strata + Hadoop World’s most popular speakers. She has a gift for explaining complex topics to a broad audience, through presentations and in writing. We talked about her background, techniques for evaluating machine learning models, how much math data scientists need to know, and the art of interacting with business users.

Making machine learning accessible

People who work at getting analytics adopted and deployed learn early on the importance of working with domain/business experts. As excited as I am about the growing number of tools that open up analytics to business users, the interplay between data experts (data scientists, data engineers) and domain experts remains important. In fact, human-in-the-loop systems are being used in many critical data pipelines. Zheng recounts her experience working with business analysts:

It’s not enough to tell someone, “This is done by boosted decision trees, and that’s the best classification algorithm, so just trust me, it works.” As a builder of these applications, you need to understand what the algorithm is doing in order to make it better. As a user who ultimately consumes the results, it can be really frustrating to not understand how they were produced. When we worked with analysts in Windows or in Bing, we were analyzing computer system logs. That’s very difficult for a human being to understand. We definitely had to work with the experts who understood the semantics of the logs in order to make progress. They had to understand what the machine learning algorithms were doing in order to provide useful feedback. Read more…