Four short links: 2 April 2015

250 Whys, Amazon Dash, Streaming Data, and Lightning Networks

  1. What I Learned from 250 WhysLet’s Plan for a Future Where We’re All As Stupid as We Are Today.
  2. Thoughts on Amazon Dash (Matt Webb) — In a way, we’re really seeing the future of marketing here. We’ve separated awareness (advertising) and distribution (stores) for so long, but it’s no longer the way. When you get a Buy Now button in a tweet you’re seeing ads and distribution merging, and the Button is the physical instantiation of this same trend. […] in the future every product will carry a buy button
  3. A Collection of Links for Streaming Algorithms and Data Structures — is this not the most self-evident title ever?
  4. Lightning Networks (Rusty Russell) — I finally took a second swing at understanding the Lightning Network paper. The promise of this work is exceptional: instant reliable transactions across the bitcoin network. But the implementation is complex and the draft paper reads like a grab bag of ideas; but it truly rewards close reading! It doesn’t involve novel crypto, nor fancy bitcoin scripting tricks. There are several techniques which are used in the paper, so I plan to concentrate on one per post and wrap up at the end. Already posted part II.
Four short links: 1 April 2015

Four short links: 1 April 2015

Tuning Fanout, Moore's Law, 3D Everything, and Social Graph Analysis

  1. Facebook’s Mystery MachineThe goal of this paper is very similar to that of Google Dapper[…]. Both work [to] try to figure out bottlenecks in performance in high fanout large-scale Internet services. Both work us[ing] similar methods, however this work (the mystery machine) tries to accomplish the task relying on less instrumentation than Google Dapper. The novelty of the mystery machine work is that it tries to infer the component call graph implicitly via mining the logs, where as Google Dapper instrumented each call in a meticulous manner and explicitly obtained the entire call graph.
  2. The Multiple Lives of Moore’s LawA shrinking transistor not only allowed more components to be crammed onto an integrated circuit but also made those transistors faster and less power hungry. This single factor has been responsible for much of the staying power of Moore’s Law, and it’s lasted through two very different incarnations. In the early days, a phase I call Moore’s Law 1.0, progress came by “scaling up”—adding more components to a chip. At first, the goal was simply to gobble up the discrete components of existing applications and put them in one reliable and inexpensive package. As a result, chips got bigger and more complex. The microprocessor, which emerged in the early 1970s, exemplifies this phase. But over the last few decades, progress in the semiconductor industry became dominated by Moore’s Law 2.0. This era is all about “scaling down,” driving down the size and cost of transistors even if the number of transistors per chip does not go up.
  3. BoXZY Rapid-Change FabLab: Mill, Laser Engraver, 3D Printer (Kickstarter) — project that promises you the ability to swap out heads to get different behaviour from the “move something in 3 dimensions” infrastructure in the box.
  4. SociaLite (Github) — a distributed query language for graph analysis and data mining. (via Ben Lorica)
Comment: 1

Telling your data’s story

How storytelling can enhance the effectiveness of your visualizations.

Editor’s note: this post is part of our investigation into Big Data Design and Social Science. Michael Freeman covers the use of storytelling frameworks in visualizations in his new tutorial video “Using Storytelling to Effectively Communicate Data.”

Visualizing complex relationships in big data often requires involved graphical displays that can be intimidating to users. As the volume and complexity of data collection and storage scale exponentially, creating clear, communicative, and approachable visual representations of that data is an increasing challenge. As a data visualization specialist, I frightened one of my first sets of collaborators when I suggested using this display:

Data Visualization

What I had failed to communicate was that we would use a story structure to introduce audiences to the complex layout (you can see how I did it here).

This image captures three emerging limitations in big data visualization:

  1. Unclear visual encodings: People don’t know what each visual symbol represents
  2. Too much data: The volume of information displayed is overwhelming
  3. Too many variables: Simultaneous encodings of color, position, size, etc. precludes fully understanding each dimension

Read more…


The shape of software architecture

Five things we learned from the O’Reilly Software Architecture Conference 2015.

nautilusLast week, I had the opportunity to see the first Software Architecture Conference spring to life after a winter of preparation. Software architects, with or without the official title, swarmed the halls learning from speakers and attendees alike. I count myself among the people who were learning. Many notions about this profession and skill set have become clearer to me and I’m already planning to keep the content coming. I’m also in the early stages of developing out the next Software Architecture Conference (spring 2016).

Within this piece you’ll find my takeaways and lessons learned from the event. I expect these initial impressions to both shape our upcoming exploration of software architecture and be shaped by continued shifts within software architecture.
Read more…

Comments: 5

How the DevOps revolution informs software architecture

The O'Reilly Radar Podcast: Neal Ford on the changing role of software architects and the rise of microservices.


In this episode of the Radar Podcast, O’Reilly’s Mac Slocum sits down with Neal Ford, a software architect and meme wrangler at ThoughtWorks, to talk about the changing role of software architects. They met up at our recent Software Architecture Conference in Boston — if you missed the event, you can sign up to be notified when the Complete Video Compilation of all sessions and talks is available.

Slocum started the conversation with the basics: what, exactly, does a software architect do. Ford noted that there’s not a straightforward answer, but that the role really is a “pastiche” of development, soft skills and negotiation, and solving business domain problems. He acknowledged that the role historically has been negatively perceived as a non-coding, post-useful, ivory tower deep thinker, but noted that has been changing over the past five to 10 years as the role has evolved into real-world problem solving, as opposed to operating in abstractions:

“One of the problems in software, I think, is that you build everything on towers of abstractions, and so it’s very easy to get to the point where all you’re doing is playing with abstractions, and you don’t reify that back to the real world, and I think that’s the danger of this kind of ivory-tower architect. When you start looking at things like continuous delivery and continuous deployment, you have to take those operational concerns into account, and I think that is making the role of architect a lot more relevant now, because they are becoming much more involved in the entire software development ecosystem, not just the front edge of it.”

Read more…

Comment: 1

Redefining power distribution using big data

The O'Reilly Data Show Podcast: Erich Nachbar on testing and deploying open source, distributed computing components.

power_distribution_alex.ch_FlickrWhen I first hear of a new open source project that might help me solve a problem, the first thing I do is ask around to see if any of my friends have tested it. Sometimes, however, the early descriptions sound so promising that I just jump right in and try it myself — and in a few cases, I transition immediately (this was certainly the case for Spark).

I recently had a conversation with Erich Nachbar, founder and CTO of Virtual Power Systems, and one of the earliest adopters of Spark. In the early days of Spark, Nachbar was CTO of Quantifind, a startup often cited by the creators of Spark as one of the first “production deployments.” On the latest episode of the O’Reilly Data Show Podcast, we talk about the ease with which Nachbar integrates new open source components into existing infrastructure, his contributions to Mesos, and his new “software-defined power distribution” startup.

Ecosystem of open source big data technologies

When evaluating a new software component, nothing beats testing it against workloads that mimic your own. Nachbar has had the luxury of working in organizations where introducing new components isn’t subject to multiple levels of decision-making. But, as he notes, everything starts with testing things for yourself:

“I have sort of my mini test suite…If it’s a data store, I would just essentially hook it up to something that’s readily available, some feed like a Twitter fire hose, and then just let it be bombarded with data, and by now, it’s my simple benchmark to know what is acceptable and what isn’t for the machine…I think if more people, instead of reading papers and paying people to tell them how good or bad things are, would actually set aside a day and try it, I think they would learn a lot more about the system than just reading about it and theorizing about the system. Read more…