"distributed systems" entries

Four short links: 4 February 2016

Shmoocon Video, Smart Watchstrap, Generalizing Learning, and Dataflow vs Spark

  1. Shmoocon 2016 Videos (Internet Archive) — videos of the talks from an astonishingly good security conference.
  2. TipTalk — Samsung watchstrap that is the smart device … put your finger in your ear to hear the call. You had me at put my finger in my ear. (via WaPo)
  3. Ecorithms — Leslie Valiant at Harvard broadened the concept of an algorithm into an “ecorithm,” which is a learning algorithm that “runs” on any system capable of interacting with its physical environment. Algorithms apply to computational systems, but ecorithms can apply to biological organisms or entire species. The concept draws a computational equivalence between the way that individuals learn and the way that entire ecosystems evolve. In both cases, ecorithms describe adaptive behavior in a mechanistic way.
  4. Dataflow/Beam vs Spark (Google Cloud) — To highlight the distinguishing features of the Dataflow model, we’ll be comparing code side-by-side with Spark code snippets. Spark has had a huge and positive impact on the industry thanks to doing a number of things much better than other systems had done before. But Dataflow holds distinct advantages in programming model flexibility, power, and expressiveness, particularly in the out-of-order processing and real-time session management arenas.
Four short links: 20 January 2016

Rules-Based Distributed Code, Open Source Face Recognition, Simulation w/Emoji, and Berkeley's AI Materials

  1. Experience with Rules-Based Programming for Distributed Concurrent Fault-Tolerant Code (A Paper a Day) — To demonstrate applicability outside of the RAMCloud system, the team also re-wrote the Hadoop Map-Reduce job scheduler (which uses a traditional event-based state machine approach) using rules. The original code has three state machines containing 34 states with 163 different transitions, about 2,250 lines of code in total. The rules-based re-implementation required 19 rules in 3 tasks with a total of 117 lines of code and comments. Rules-based systems are powerful and underused.
  2. OpenFace — open source face recognition software using deep neural networks.
  3. Simulating the World in Emoji — fun simulation environment in the browser.
  4. Berkeley’s Intro-to-AI Materials — We designed these projects with three goals in mind. The projects allow students to visualize the results of the techniques they implement. They also contain code examples and clear directions, but do not force students to wade through undue amounts of scaffolding. Finally, Pac-Man provides a challenging problem environment that demands creative solutions; real-world AI problems are challenging, and Pac-Man is, too.
Four short links: 4 January 2016

How to Hire, Real World Distributed Systems, 3D-Printed Ceramics, and Approximate Spreadsheets

  1. How to Hire (Henry Ward) — this isn’t holy writ for everyone, but the clear way in which he lays out how he thinks about hiring should be a model to all managers, even those who disagree with his specific recommendations.
  2. From the Ground Up: Reasoning About Distributed Systems in the Real World (Tyler Treat) — When we try to provide semantics like guaranteed, exactly-once, and ordered message delivery, we usually end up with something that’s over-engineered, difficult to deploy and operate, fragile, and slow. What is the upside to all of this? Something that makes your life easier as a developer when things go perfectly well, but the reality is things don’t go perfectly well most of the time. Instead, you end up getting paged at 1 a.m. trying to figure out why RabbitMQ told your monitoring everything is awesome while proceeding to take a dump in your front yard. An approachable argument for shifting some consistency checks to application layer so the infrastructure can be simpler.
  3. 3D Printed Ceramics to 1700°C (Ars Technica) — The key step used in the new work is to replace the standard polymers used to create ceramics with a chemical that polymerizes when exposed to UV light. (These can have a variety of chemistries; the authors list thiol, vinyl, acrylate, methacrylate, and epoxy groups.) This means they’re able to be polymerized using a fairly standard 3D printer setup. In fact, the paper lists the model number of the version the authors bought from a different company.
  4. Guesstimate — spreadsheet for things that aren’t certain.
Four short links: 10 December 2015

Reactive Programming Theory, Attacking HTTP/2, Distributed Systems Explainer, and Auto Futures

  1. Distributed Reactive Programming (A Paper a Day) — this week’s focus on reactive programming has been eye-opening for me. I find the implementation details less interesting than the simple notion that we can define different consistency models for reactive programs and reason about them.
  2. Attacking HTTP/2 Implementations — Our talk focused on threats, attack vectors, and vulnerabilities found during the course of our research. Two Firefox, two Apache Traffic Server (ATS), and four Node-http2 vulnerabilities will be discussed alongside the release of the first public HTTP/2 fuzzer. We showed how these bugs were found, their root cause, why they occur, and how to trigger them.
  3. What We Talk About When We Talk About Distributed Systems — a great intro/explainer to the different concepts in distributed systems.
  4. The Autonomous Winter is Coming — The future of any given manufacturer will be determined by how successfully they manage their brands in a market split between Mobility customers and Driving customers.
Four short links: 7 December 2015

Telepresent Axeman, Toxic Workers, Analysis Code, and Cryptocurrency Attacks

  1. Axe-Wielding Robot w/Telepresence (YouTube) — graphic robot-on-wall action at 2m30s. (via IEEE)
  2. Toxic Workers (PDF) — In comparing the two costs, even if a firm could replace an average worker with one who performs in the top 1%, it would still be better off by replacing a toxic worker with an average worker by more than two-to-one. Harvard Business School research. (via Fortune)
  3. Replacing Sawzall (Google) — At Google, most Sawzall analysis has been replaced by Go […] we’ve developed a set of Go libraries that we call Lingo (for Logs in Go). Lingo includes a table aggregation library that brings the powerful features of Sawzall aggregation tables to Go, using reflection to support user-defined types for table keys and values. It also provides default behavior for setting up and running a MapReduce that reads data from the logs proxy. The result is that Lingo analysis code is often as concise and simple as (and sometimes simpler than) the Sawzall equivalent.
  4. Attacks in the World of Cryptocurrency — a review of some of the discussed weaknesses, attacks, or oddities in cryptocurrency (esp. bitcoin).
Four short links: 16 November 2015

Hospital Hacking, Security Data Science, Javascript Face-Substitution, and Multi-Agent Systems Textbook

  1. Hospital Hacking (Bloomberg) — interesting for both lax regulation (“The FDA seems to literally be waiting for someone to be killed before they can say, ‘OK, yeah, this is something we need to worry about,’ ” Rios says.) and the extent of the problem (Last fall, analysts with TrapX Security, a firm based in San Mateo, Calif., began installing software in more than 60 hospitals to trace medical device hacks. […] After six months, TrapX concluded that all of the hospitals contained medical devices that had been infected by malware.). It may take a Vice President’s defibrillator being hacked for things to change. Or would anybody notice?
  2. Cybersecurity and Data Science — pointers to papers in different aspects of using machine learning and statistics to identify misuse and anomalies.
  3. Real-time Face Substitution in Javascript — this is awesome. Moore’s Law is amazing.
  4. Multi-Agent Systems — undergraduate textbook covering distributed systems, game theory, auctions, and more. Electronic version as well as printed book.
Four short links: 2 November 2015

Anti-Caching, Tyranny of Ratings, Distributed Deep Learning, and Sorting Rated Things

  1. Anti-Caching (PDF) — paper outlining a clever reframing of the database strategy of keeping frequently accessed things in-memory, namely pushing to disk the things that won’t be accessed … aka, “anti-caching.”
  2. The Rating Game (Verge) — Until companies release ratings data, we can’t know for certain whether this is true, but a study of Airbnb users found that black hosts get less money for similar listings than white hosts, and another study found that white taxi drivers get higher tips than black ones. There’s no reason such biases wouldn’t carry over to ratings.
  3. Singa — Apache distributed deep learning platform turns 1.0.
  4. Scoring Items That Were Voted On or Rated — a Bayesian system to turn a set of ratings or up/down votes into a single score, such that you can sort a list from “best” to “worst.”
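To make the idea in item 4 concrete, here is a minimal sketch of one common Bayesian approach to up/down votes: treat each item's approval rate as a Beta distribution, start from a uniform prior, and sort by the posterior mean. This is an illustration under stated assumptions and not necessarily the method the linked article uses; the function names and the sample votes are hypothetical.

```python
def bayesian_score(ups: int, downs: int,
                   prior_ups: float = 1.0, prior_downs: float = 1.0) -> float:
    """Posterior mean of a Beta(prior_ups, prior_downs) prior updated with the votes.

    The default prior is uniform (an assumption for this sketch), so an item
    with no votes scores 0.5 and a single vote moves the score only modestly.
    """
    return (ups + prior_ups) / (ups + downs + prior_ups + prior_downs)


def rank(votes: dict) -> list:
    """Return item names sorted from "best" to "worst" by Bayesian score."""
    return sorted(votes, key=lambda name: bayesian_score(*votes[name]), reverse=True)


# Hypothetical data: name -> (up votes, down votes)
votes = {"one_lucky_vote": (1, 0), "well_tested": (90, 10), "unrated": (0, 0)}
print(rank(votes))
```

The item with 90 of 100 positive votes outranks the item with a single positive vote, even though the latter has a raw approval rate of 100%. Damping small samples this way is the main reason to prefer a prior over a plain average.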

Swarm v. Fleet v. Kubernetes v. Mesos

Comparing different orchestration tools.

Buy Using Docker Early Release.

Most software systems evolve over time. New features are added and old ones pruned. Fluctuating user demand means an efficient system must be able to quickly scale resources up and down. Demands for near zero-downtime require automatic fail-over to pre-provisioned back-up systems, normally in a separate data centre or region.

On top of this, organizations often have multiple such systems to run, or need to run occasional tasks such as data-mining that are separate from the main system, but require significant resources or talk to the existing system.

When using multiple resources, it is important to make sure they are efficiently used — not sitting idle — but can still cope with spikes in demand. Balancing cost-effectiveness against the ability to quickly scale is a difficult task that can be approached in a variety of ways.

All of this means that the running of a non-trivial system is full of administrative tasks and challenges, the complexity of which should not be underestimated. It quickly becomes impossible to look after machines on an individual level; rather than patching and updating machines one-by-one they must be treated identically. When a machine develops a problem it should be destroyed and replaced, rather than nursed back to health.

Various software tools and solutions exist to help with these challenges. Let’s focus on orchestration tools, which help make all the pieces work together, working with the cluster to start containers on appropriate hosts and connect them together. Along the way, we’ll consider scaling and automatic failover, which are important features.
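As a rough illustration of what "starting containers on appropriate hosts" and automatic failover involve, the toy scheduler below places each container on the host with the most free memory and re-places containers when a host dies. It is a sketch only: the `Host` class, the `place` and `fail_over` functions, and the memory figures are all hypothetical, and none of the orchestration tools named in the title is claimed to work this way.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    free_mem_mb: int
    containers: list = field(default_factory=list)


def place(hosts, container, mem_mb):
    """Put `container` on the host with the most free memory that can fit it.

    Returns the chosen host's name, or None if no host has enough room.
    """
    candidates = [h for h in hosts if h.free_mem_mb >= mem_mb]
    if not candidates:
        return None
    target = max(candidates, key=lambda h: h.free_mem_mb)
    target.free_mem_mb -= mem_mb
    target.containers.append(container)
    return target.name


def fail_over(hosts, failed, sizes):
    """Drop a failed host and try to re-place its containers on the survivors."""
    survivors = [h for h in hosts if h is not failed]
    moved = {c: place(survivors, c, sizes[c]) for c in failed.containers}
    return survivors, moved


# Hypothetical cluster and container sizes (in MB).
hosts = [Host("h1", 1024), Host("h2", 2048)]
sizes = {"web": 512, "db": 1024}
for name, mem in sizes.items():
    print(name, "->", place(hosts, name, mem))
survivors, moved = fail_over(hosts, hosts[1], sizes)
print(moved)
```

In this run both containers land on the larger host, and when that host fails only one of them fits on the survivor. That is the trade-off between packing machines efficiently and keeping enough headroom for failover.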

Four short links: 8 October 2015

Mystery Machine, Emotional Effect, Meeting Hacks, and Energy Consumption

  1. The Mystery Machine (A Paper a Day) — rundown of Facebook’s Mystery Machine, which can measure end-to-end performance from the initiation of a page load in a Web browser, all the way through the server-side infrastructure, and back out to the point where the page has finished rendering. Doing this requires a causal model of the relationships between components (happens-before). How do you get that? And especially, how do you get that if you can’t assume a uniform environment for instrumentation?
  2. Network Effect — hypnotic and emotional. (via Flowing Data)
  3. Cultivating Great Distributed Teams (Liza Daly) — updates and refinements on her awesome meeting hack/system.
  4. Smartphone Energy Consumption (Pete Warden) — I love new ways of looking at familiar things, and looking at code and features through the lens of power consumption is one of them. (I remember Craig from Craigslist talking at OSCON about using power as the denominator in your data center, changing how I saw the Web.) The article is full of surprising numbers and fascinating factoids. Active cell radio might use 800 mW. Bluetooth might use 100 mW. Accelerometer is 21 mW. Gyroscope is 130 mW. Microphone is 101 mW. GPS is 176 mW. Using the camera in ‘viewfinder’ mode, focusing and looking at a picture preview, might use 1,000 mW. Actually recording video might take another 200 to 1,000 mW on top of that.
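The figures quoted in item 4 lend themselves to quick back-of-the-envelope arithmetic. The sketch below adds up a few of them and estimates runtime. The power numbers come from the summary above; the 10,000 mWh battery capacity is a made-up assumption for illustration, as are the function names.

```python
# Approximate power draw in milliwatts, as quoted in the summary above.
POWER_MW = {
    "cell_radio": 800,
    "bluetooth": 100,
    "accelerometer": 21,
    "gyroscope": 130,
    "microphone": 101,
    "gps": 176,
    "camera_viewfinder": 1000,
}
VIDEO_EXTRA_MW = (200, 1000)  # extra draw while actually recording video


def total_mw(components):
    """Sum the quoted draw of the named components."""
    return sum(POWER_MW[c] for c in components)


def hours_on_battery(load_mw, battery_mwh):
    """Runtime in hours for a constant load, ignoring everything else on the phone."""
    return battery_mwh / load_mw


ASSUMED_BATTERY_MWH = 10_000  # hypothetical capacity, not from the article

motion_and_audio = total_mw(["accelerometer", "gyroscope", "microphone"])
video_low = POWER_MW["camera_viewfinder"] + VIDEO_EXTRA_MW[0]
video_high = POWER_MW["camera_viewfinder"] + VIDEO_EXTRA_MW[1]

print(motion_and_audio, "mW for accelerometer + gyroscope + microphone")
print(video_low, "to", video_high, "mW while recording video")
print(hours_on_battery(video_high, ASSUMED_BATTERY_MWH), "hours at the high end")
```

Even with a generous battery assumption, recording video at the high end of the quoted range would drain it in a handful of hours, whereas the three sensors together draw about a quarter of a watt.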