"distributed systems" entries

Four short links: 30 June 2015

Ductile Systems, Accessibility Testing, Load Testing, and CRAP Data

  1. Brittle Systems — More than two decades ago at Sun, I was convinced that making systems ductile (the opposite of brittle) was the hardest and most important problem in system engineering.
  2. tota11y — accessibility testing toolkit from Khan.
  3. Locust — an open source load testing tool.
  4. Impala: a Modern, Open-source SQL Engine for Hadoop (PDF) — CRAP, aka Create, Read, and APpend, as coined by an ex-colleague at VMware, Charles Fan (note the absence of update and delete capabilities). (via A Paper a Day)
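The CRAP model in item 4 can be pictured with a small sketch. The `AppendOnlyTable` class and its method names are hypothetical, invented here for illustration and not taken from Impala; the only point is a store that offers create, read, and append while deliberately exposing no update or delete.

```python
class AppendOnlyTable:
    """Hypothetical illustration of Create, Read, APpend (not Impala's API)."""

    def __init__(self, columns):
        # Create: the schema is fixed when the table is made.
        self._columns = tuple(columns)
        self._rows = []

    def append(self, row):
        # Append: new rows only ever go on the end; existing rows are never touched.
        if len(row) != len(self._columns):
            raise ValueError("row does not match schema")
        self._rows.append(tuple(row))

    def read(self, predicate=None):
        # Read: scan every row, returning fresh dicts so callers
        # cannot mutate what is stored.
        rows = (dict(zip(self._columns, r)) for r in self._rows)
        return [row for row in rows if predicate is None or predicate(row)]

    # Note the absence of update() and delete(): a correction is
    # expressed by appending a newer row, not by rewriting an old one.
```

A caller would build the table with `AppendOnlyTable(["id", "event"])`, call `append((1, "created"))`, and filter with `read(lambda r: r["id"] == 1)`.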
Four short links: 9 June 2015

Parallelising Without Coordination, AR/VR IxD, Medical Insecurity, and Online Privacy Lies

  1. The Declarative Imperative (Morning Paper) — on Dataflow. …a large class of recursive programs – all of basic Datalog – can be parallelized without any need for coordination. As a side note, this insight appears to have eluded the MapReduce community, where join is necessarily a blocking operator.
  2. Consensual Reality (Alistair Croll) — Among other things we discussed what Inbar calls his three rules for augmented reality design: 1. The content you see has to emerge from the real world and relate to it. 2. Should not distract you from the real world; must add to it. 3. Don’t use it when you don’t need it. If a film is better on the TV watch the TV.
  3. X-Rays Behaving Badly — According to the report, medical devices – in particular so-called picture archive and communications systems (PACS) radiologic imaging systems – are all but invisible to security monitoring systems and provide a ready platform for malware infections to lurk on hospital networks, and for malicious actors to launch attacks on other, high value IT assets. Among the revelations contained in the report: A malware infection at a TrapX customer site spread from an unmonitored PACS system to a key nurse’s workstation. The result: confidential hospital data was secreted off the network to a server hosted in Guiyang, China. Communications went out encrypted using port 443 (SSL) and were not detected by existing cyber defense software, so TrapX said it is unsure how many records may have been stolen.
  4. The Online Privacy Lie is Unraveling (TechCrunch) — The report’s authors argue it’s this sense of resignation that is resulting in data tradeoffs taking place — rather than consumers performing careful cost-benefit analysis to weigh up the pros and cons of giving up their data (as marketers try to claim). They also found that where consumers were most informed about marketing practices they were also more likely to be resigned to not being able to do anything to prevent their data being harvested. Something that didn’t make me regret clicking on a TechCrunch link.
Four short links: 27 May 2015

Domo Arigato Mr Google, Distributed Graph Processing, Experiencing Ethics, and Deep Learning Robots

  1. Roboto — Google’s signature font is open sourced (Apache 2.0), including the toolchain to build it.
  2. Pregel: A System for Large Scale Graph Processing — a walk through a key 2010 paper from Google, on the distributed graph system that is the inspiration for Apache Giraph and which sits under PageRank.
  3. How to Turn a Liberal Hipster into a Global Capitalist (The Guardian) — In Zoe Svendsen’s play “World Factory at the Young Vic,” the audience becomes the cast. Sixteen teams sit around factory desks playing out a carefully constructed game that requires you to run a clothing factory in China. How to deal with a troublemaker? How to dupe the buyers from ethical retail brands? What to do about the ever-present problem of clients that do not pay? […] And because the theatre captures data on every choice by every team, for every performance, I know we were not alone. The aggregated flowchart reveals that every audience, on every night, veers toward money and away from ethics. I’m a firm believer that games can give you visceral experience, not merely intellectual knowledge, of an activity. Interesting to see it applied so effectively to business.
  4. End to End Training of Deep Visuomotor Policies (PDF) — paper on using deep learning to teach robots how to manipulate objects, by example.
Four short links: 15 May 2015

Army Cloud, Google Curriculum, Immutable Infrastructure, and Task Queues

  1. Army Cloud Computing Strategy (PDF) — aka: “what we hope to do without having done, to use what we’re doing to them.”
  2. Guide to Technical Development (Google) — This guide is a suggested path for university students to develop their technical skills academically and non-academically through self-paced, hands-on learning.
  3. Immutable Infrastructure is the Future (Michael DeHaan) — The future of configuration management systems is in deploying cloud infrastructure that will later run immutable systems via an API level.
  4. machinery — an asynchronous task queue/job queue based on distributed message passing.
Four short links: 14 May 2015

Human-Machine Cooperation, Concurrent Systems Books, AI Future, and Gesture UI

  1. Ghosts in the Machines (Courtney Nash) — People are neither masters of machines, nor subservient to their machine-learning outcomes — we cannot, and should not, separate the two. We are actors, together, in a very complex system. David Woods calls this “joint cognitive systems.”
  2. TLA+ (Leslie Lamport) — two tutorials: “Principles of Concurrent Computing” and “Specification of Concurrent Systems.” Ironically, I see people grizzling that the book on distributed systems hasn’t been linearised. I wonder if you can partition it into the two tutorials and still have full availability…
  3. Deep Learning vs Probabilistic vs Logic — As of 2015, I pity the fool who prefers Modus Ponens over Gradient Descent.
  4. Touché (Disney Research) — measur[es] capacitive response of object and human at multiple frequencies, a technique that we called Swept Frequency Capacitive Sensing. The signal travels through different paths depending on its frequency, capturing the posture of human hand and body as well as other properties of the context. The resulted data is classified using machine learning algorithms to identify gestures that are then used to trigger desired responses of the user interface.
Four short links: 6 May 2015

Self-Driving Cars, Cloud BigTable, Define "Uptime," and Continuous Delivery Architectures

  1. Andrew Ng (Wired) — I think self-driving cars are a little further out than most people think. There’s a debate about which one of two universes we’re in. In the first universe it’s an incremental path to self-driving cars, meaning you have cruise control, adaptive cruise control, then self-driving cars only on the highways, and you keep adding stuff until 20 years from now you have a self-driving car. In universe two you have one organization, maybe Carnegie Mellon or Google, that invents a self-driving car and bam! You have self-driving cars. It wasn’t available Tuesday but it’s on sale on Wednesday. I’m in universe one. I think there’s a lot of confusion about how easy it is to do self-driving cars. There’s a big difference between being able to drive a thousand miles, versus being able to drive anywhere. And it turns out that machine-learning technology is good at pushing performance from 90 to 99 percent accuracy. But it’s challenging to get to four nines (99.99 percent). I’ll give you this: we’re firmly on our way to being safer than a drunk driver.
  2. Google Cloud BigTable — Google’s BigTable, with Apache HBase API, single-digit millisecond latency, and “fully managed”. G are hell-bent on catching up with Amazon and Microsoft at this cloud serving thing.
  3. Call Me Maybe: Aerospike — We’re setting a timeout of 500ms here, and operations still time out every time a partition between nodes occurs. In these tests we aren’t interfering with client-server traffic at all. Aerospike may claim “100% uptime”, but this is only meaningful with respect to particular latency bounds. Given Aerospike claims millisecond-scale latencies, you may want to reconsider whether you consider this “uptime”.
  4. 31 Continuous Delivery Architectures (Slideshare) — from a vendor, so one name crops up repeatedly (other than “Jenkins”), but it’s still good devops voyeurism/envy.
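Item 3's point that “uptime” only means something relative to a latency bound can be made concrete with a small calculation. This is a sketch with made-up sample latencies, not measurements from the Aerospike tests; the `availability` helper is a hypothetical name, and `None` stands for a request that never completed.

```python
def availability(latencies_ms, bound_ms):
    """Fraction of requests answered within bound_ms.

    A latency of None marks a request that never completed.
    """
    if not latencies_ms:
        raise ValueError("no requests observed")
    within_bound = sum(
        1 for latency in latencies_ms
        if latency is not None and latency <= bound_ms
    )
    return within_bound / len(latencies_ms)


# Synthetic, illustrative samples: mostly fast, two slow, one lost.
samples = [2, 3, 1, 650, None, 4, 2, 700, 3, 2]

for bound in (5, 500, 1000):
    print(f"bound {bound} ms: {availability(samples, bound):.0%} available")
```

The same request log yields different availability figures depending on the bound chosen, which is why an uptime claim needs its latency bound stated alongside it.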
Four short links: 5 May 2015

Agile Hardware, Time Series Data, Data Loss, and Automating Security

  1. How We Do Agile Hardware Development at Meld — In every sprint we built both hardware and software. This doesn’t mean we had a fully fabricated new board rev once a week. […] We couldn’t build a complete new board every week, and early on we didn’t even know for sure what parts we wanted in our final BOM (Bill of Materials) so we used eval boards. These stories of how companies iterated fast will eventually build a set of best practices for hardware startups, similar to those in software.
  2. Recording Time Series — if data arrives with variable latency, timestamps are really probabilistic ranges. How do you store your data for searches and calculations that reflect reality, and are not erroneous because you’re ignoring a simplification you made to store the data more conveniently?
  3. Call Me Maybe, ElasticSearch 1.5.0 — To be precise, Elasticsearch’s transaction log does not put your data safety first. It puts it anywhere from zero to five seconds later. In this test we kill random Elasticsearch processes with kill -9 and restart them. In a datastore like Zookeeper, Postgres, BerkeleyDB, SQLite, or MySQL, this is safe: transactions are written to the transaction log and fsynced before acknowledgement. In Mongo, the fsync flags ensure this property as well. In Elasticsearch, write acknowledgement takes place before the transaction is flushed to disk, which means you can lose up to five seconds of writes by default. In this particular run, ES lost about 10% of acknowledged writes.
  4. FIDO — Netflix’s open source system for automatically analyzing security events and responding to security incidents.
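The “written to the transaction log and fsynced before acknowledgement” property described in item 3 can be sketched in a few lines. This is a minimal illustration, not how any of the named datastores implement their logs; `durable_append` and the log file name are hypothetical.

```python
import os


def durable_append(log_path, record: bytes) -> bool:
    """Append one record to a log file and fsync before acknowledging.

    Returning True is the "acknowledgement": it only happens after the
    operating system has been asked to push the bytes to stable storage.
    """
    with open(log_path, "ab") as log:
        log.write(record + b"\n")
        log.flush()              # move data from Python's buffer to the OS
        os.fsync(log.fileno())   # ask the OS to flush its cache to disk
    return True
```

Acknowledging before the `os.fsync` call, or fsyncing only on a timer, is the pattern that opens a window in which a `kill -9` can lose writes the client believes were saved.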
Four short links: 27 April 2015

Living Figures, Design vs Architecture, Faceted Browsing, and Byzantine Comedy

  1. ‘Living Figures’ Make Their Debut (Nature) — In July last year, neurobiologist Björn Brembs published a paper about how fruit flies walk. Nine months on, his paper looks different: another group has fed its data into the article, altering one of the figures. The update — to figure 4 — marks the debut of what the paper’s London-based publisher, Faculty of 1000 (F1000), is calling a living figure, a concept that it hopes will catch on in other articles. Brembs, at the University of Regensburg in Germany, says that three other groups have so far agreed to add their data, using software he wrote that automatically redraws the figure as new data come in.
  2. Strategies Against Architecture (Seb Chan and Aaron Straup Cope) — the story of the design of the Cooper Hewitt’s clever “pen,” which visitors to the design museum use to collect the info from their favourite exhibits. (Visit the Cooper Hewitt when you’re next in NYC; it’s magnificent.)
  3. Two Way Street — an independent explorer for The British Museum collection, letting you browse by year acquired, year created, type of object, etc. I note there are more things from a place called “Brak” than there are from USA. Facets are awesome. (via Courtney Johnston)
  4. The Saddest Moment (PDF) — “How can you make a reliable computer service?” the presenter will ask in an innocent voice before continuing, “It may be difficult if you can’t trust anything and the entire concept of happiness is a lie designed by unseen overlords of endless deceptive power.” The presenter never explicitly says that last part, but everybody understands what’s happening. Making distributed systems reliable is inherently impossible; we cling to Byzantine fault tolerance like Charlton Heston clings to his guns, hoping that a series of complex software protocols will somehow protect us from the oncoming storm of furious apes who have somehow learned how to wear pants and maliciously tamper with our network packets. Hilarious. (via Tracy Chou)
Four short links: 24 April 2015

Jeff Jonas, Siri and Mesos, YouTube's Bandwidth Bill, and AWS Numbers

  1. Decoding Jeff Jonas (National Geographic) — “He thinks in three—no, four dimensions,” Nathan says. “He has a data warehouse in his head.” And that’s where the work takes place—in his head. Not on paper. Not on a computer. He resorts to paper only to work the details out. When asked about his thought process, Jonas reaches for words, then says: “It’s like a Rubik’s Cube. It all clicks into place.” “The solution,” he says, is “simply there to find.” Jeff’s a genius and has his own language for explaining what he does. This quote goes a long way to explaining it.
  2. How Apple Uses Mesos for Siri — great to see not only some details of the tooling that Apple built, but also their acknowledgement of the open source foundations and ongoing engagement with those open source communities. There have been times in the past when Apple felt like a parasite on the commons rather than a participant.
  3. Cheaper Bandwidth or Bust: How Google Saved YouTube (ArsTechnica) — Remember YouTube’s $2 million-a-month bandwidth bill before the Google acquisition? While it wasn’t an overnight transition, apply Google’s data center expertise, and this cost drops to about $666,000 a month.
  4. AWS Business Numbers — Amazon Web Services generated $5.2 billion over the past four quarters, and almost $700 million in operating income. During the first quarter of 2015, AWS sales reached $1.6 billion, up 49% year-over-year, and roughly 7% of Amazon’s overall sales.
Four short links: 22 April 2015

Perfect Security, Distributing Secrets, Stale Reads, and Digital Conversions

  1. Perfect Security (99% Invisible) — Since we lost perfect security in the 1850s, it has remained elusive. Despite tremendous leaps forward in security technology, we have never been able to get perfect security back. History of physical security, relevant to digital security today.
  2. keywhiz — a system for managing and distributing secrets. It can fit well with a service oriented architecture (SOA).
  3. Call Me Maybe: MongoDB Stale Reads — a master class in understanding modern distributed systems. Kyle’s blog is consistently some of the best technical writing around today.
  4. Users Convert to Digital Subscribers at a Rate of 1% (Julie Starr) — and other highlights of Jeff Jarvis’s new book, Geeks Bearing Gifts.