"distributed systems" entries

Four short links: 23 March 2016

Graph Query, API Economy, Mutual Interest, and The Multithreading Organization

by Nat Torkington | @gnat | +Nat Torkington | March 23, 2016

Dragon: A Distributed Graph Query Engine — Facebook describes its internal graph query engine. [T]he layout of these indices on storage is optimized based on a deeper understanding of query patterns (e.g., many queries are about friends), as opposed to accepting random sharding, which is common in these systems. Wisely, the system is tailored to the use cases they have and the patterns they see in access.
Almost Everyone Is Doing the API Economy Wrong (TechCrunch) — Redux: your API should help you make money when the API customer makes money, and you should set clear expectations for what’s acceptable and what’s not. But every developer should be forced to write 100 times: “if you build on a platform you don’t own, you’re building on a potential and probable future competitor.”
Traditional Economics Failed, Here’s a Blueprint — runs through the shifts happening in our thinking about the world and ourselves (simple to complex, independent to interdependent, rational calculator to irrational approximators, etc) and concludes: True self-interest is mutual interest. The best way to improve your likelihood of surviving and thriving is to make sure those around you survive and thrive. See above API note.
Blitzscaling (HBR) — as you move from village to city, functions are beginning to be differentiated; you’re really multithreading. I could write a thesis on the CAP theorem for business. And I have definitely worked for companies that have a “share nothing” approach to solving their threading issues.

Four short links: 22 March 2016

HCI Pioneers, Security Architecture, Trial by Cyborg, and Distributed Ledgers

by Nat Torkington | @gnat | +Nat Torkington | March 22, 2016

HCI Pioneers — Ben Shneiderman’s photo collection, acknowledging pioneers in the field. (via CCC Blog)
A Burglar’s Guide to the City (BLDGBLOG) — For the past several years, I’ve been writing a book about the relationship between burglary and architecture. Burglary, as it happens, requires architecture: it is a spatial crime. Without buildings, burglary, in its current legal form, could not exist. Committing it requires an inside and an outside; it’s impossible without boundaries, thresholds, windows, and walls. In fact, one needn’t steal anything at all to be a burglar. In a sense, as a crime, it is part of the built environment; the design of any structure always implies a way to break into it. Connection to computer security left as exercise to the reader.
Trial by Machine (Roth) — The current landscape of mechanized proof, liability, and punishment suffers from predictable but underscrutinized automation pathologies: hidden subjectivities and errors in “black box” processes; distorted decision-making through oversimplified — and often dramatically inaccurate — proxies for blameworthiness; the compromise of values protected by human safety valves, such as dignity, equity, and mercy; and even too little mechanization where machines might be a powerful debiasing tool but where little political incentive exists for its development or deployment. […] The article ultimately proposes a systems approach – “trial by cyborg” – that safeguards against automation pathologies while interrogating conspicuous absences in mechanization through “equitable surveillance” and other means. (via Marginal Revolution)
Distributed Ledger Technology: Blackett Review (gov.uk) — Distributed ledgers can provide new ways of assuring ownership and provenance for goods and intellectual property. For example, Everledger provides a distributed ledger that assures the identity of diamonds, from being mined and cut to being sold and insured. In a market with a relatively high level of paper forgery, it makes attribution more efficient, and has the potential to reduce fraud and prevent “blood diamonds” from entering the market. Report includes recommendations for policy makers. (via Dan Hill)

Metadata services can lead to performance and organizational improvements

The O’Reilly Data Show podcast: Joe Hellerstein on data wrangling, distributed systems, and metadata services.

by Ben Lorica | @bigdata | +Ben Lorica | February 11, 2016

Subscribe to the O’Reilly Data Show Podcast to explore the opportunities and techniques driving big data and data science: Stitcher, TuneIn, iTunes, SoundCloud, RSS.

In this episode of the O’Reilly Data Show, I spoke with one of the most popular speakers at Strata+Hadoop World: Joe Hellerstein, professor of Computer Science at UC Berkeley and co-founder/CSO of Trifacta. We talked about his past and current academic research (which spans HCI, databases, and systems), data wrangling, large-scale distributed systems, and his recent work on metadata services.

Data wrangling and preparation

The most interactive tasks that people do with data are essentially data wrangling. You’re changing the form of the data, you’re changing the content of the data, and at the same time you’re trying to evaluate the quality of the data and see if you’re making it the way you want it. … It’s really actually the most immersive interaction that people do with data and it’s very interesting.


Four short links: 4 February 2016

Shmoocon Video, Smart Watchstrap, Generalizing Learning, and Dataflow vs Spark

by Nat Torkington | @gnat | +Nat Torkington | February 4, 2016

Shmoocon 2016 Videos (Internet Archive) — videos of the talks from an astonishingly good security conference.
TipTalk — Samsung watchstrap that is the smart device … put your finger in your ear to hear the call. You had me at put my finger in my ear. (via WaPo)
Ecorithms — Leslie Valiant at Harvard broadened the concept of an algorithm into an “ecorithm,” which is a learning algorithm that “runs” on any system capable of interacting with its physical environment. Algorithms apply to computational systems, but ecorithms can apply to biological organisms or entire species. The concept draws a computational equivalence between the way that individuals learn and the way that entire ecosystems evolve. In both cases, ecorithms describe adaptive behavior in a mechanistic way.
Dataflow/Beam vs Spark (Google Cloud) — To highlight the distinguishing features of the Dataflow model, we’ll be comparing code side-by-side with Spark code snippets. Spark has had a huge and positive impact on the industry thanks to doing a number of things much better than other systems had done before. But Dataflow holds distinct advantages in programming model flexibility, power, and expressiveness, particularly in the out-of-order processing and real-time session management arenas.
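
To make the "out-of-order processing and real-time session management" point in the Dataflow/Beam item concrete, here is a minimal sketch of session windowing by event time, written in plain Python. It is not the Dataflow or Spark API; the function name `sessionize`, the `(user, event_time)` tuple shape, and the `gap` parameter are assumptions made for illustration. It also works on a finished batch, so it shows only the grouping rule, not the streaming machinery that handles late data.

```python
from collections import defaultdict


def sessionize(events, gap):
    """Group (user, event_time) pairs into per-user sessions.

    Events may arrive in any order. Two consecutive events (by event
    time) belong to the same session when they are less than `gap`
    apart; otherwise a new session starts.
    """
    by_user = defaultdict(list)
    for user, event_time in events:
        by_user[user].append(event_time)

    sessions = {}
    for user, stamps in by_user.items():
        stamps.sort()  # order by event time, not arrival order
        user_sessions = [[stamps[0]]]
        for ts in stamps[1:]:
            if ts - user_sessions[-1][-1] < gap:
                user_sessions[-1].append(ts)
            else:
                user_sessions.append([ts])
        sessions[user] = user_sessions
    return sessions


# Events listed in arrival order; event times are deliberately shuffled.
arrivals = [("a", 10), ("a", 1), ("b", 5), ("a", 3), ("a", 12)]
result = sessionize(arrivals, gap=5)
```

Sorting by event time is what makes arrival order irrelevant here; a streaming system has to reach the same grouping incrementally, which is where the two programming models differ.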

Four short links: 20 January 2016

Rules-Based Distributed Code, Open Source Face Recognition, Simulation w/Emoji, and Berkeley's AI Materials

by Nat Torkington | @gnat | +Nat Torkington | January 20, 2016

Experience with Rules-Based Programming for Distributed Concurrent Fault-Tolerant Code (A Paper a Day) — To demonstrate applicability outside of the RAMCloud system, the team also re-wrote the Hadoop Map-Reduce job scheduler (which uses a traditional event-based state machine approach) using rules. The original code has three state machines containing 34 states with 163 different transitions, about 2,250 lines of code in total. The rules-based re-implementation required 19 rules in 3 tasks with a total of 117 lines of code and comments. Rules-based systems are powerful and underused.
OpenFace — open source face recognition software using deep neural networks.
Simulating the World in Emoji — fun simulation environment in the browser.
Berkeley’s Intro-to-AI Materials — We designed these projects with three goals in mind. The projects allow students to visualize the results of the techniques they implement. They also contain code examples and clear directions, but do not force students to wade through undue amounts of scaffolding. Finally, Pac-Man provides a challenging problem environment that demands creative solutions; real-world AI problems are challenging, and Pac-Man is, too.
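
The rules-based style mentioned in the first link above can be sketched in a few lines of Python. This is an illustrative toy, not the RAMCloud or Hadoop code: the scheduler state, the three rules, and the `run_rules` helper are all hypothetical. The idea it shows is that each rule is a condition plus an action over shared state, and rules are re-evaluated until none of them applies, in place of an explicit state machine with named transitions.

```python
def run_rules(state, rules, max_passes=100):
    """Fire every enabled rule, pass after pass, until nothing changes."""
    for _ in range(max_passes):
        fired = False
        for condition, action in rules:
            if condition(state):
                action(state)
                fired = True
        if not fired:
            return state  # quiescent: no rule applies any more
    raise RuntimeError("rules did not quiesce")


def retry_failed(s):
    # Move one failed task back into the pending queue.
    s["failed"] -= 1
    s["pending"] += 1


def start_task(s):
    # Occupy a free slot with a pending task.
    s["pending"] -= 1
    s["running"] += 1


def finish_task(s):
    # Stand-in for a completion event arriving from a worker.
    s["running"] -= 1
    s["done"] += 1


# Hypothetical scheduler rules as (condition, action) pairs.
RULES = [
    (lambda s: s["failed"] > 0, retry_failed),
    (lambda s: s["pending"] > 0 and s["running"] < s["slots"], start_task),
    (lambda s: s["running"] > 0, finish_task),
]

final = run_rules(
    {"pending": 2, "failed": 1, "running": 0, "done": 0, "slots": 2},
    RULES,
)
```

Because each rule checks only the current state, recovery after a failure needs no special path: the same rules fire again from whatever state the task is left in.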

Four short links: 4 January 2016

How to Hire, Real World Distributed Systems, 3D-Printed Ceramics, and Approximate Spreadsheets

by Nat Torkington | @gnat | +Nat Torkington | January 4, 2016

How to Hire (Henry Ward) — this isn’t holy writ for everyone, but the clear way in which he lays out how he thinks about hiring should be a model to all managers, even those who disagree with his specific recommendations.
From the Ground Up: Reasoning About Distributed Systems in the Real World (Tyler Treat) — When we try to provide semantics like guaranteed, exactly-once, and ordered message delivery, we usually end up with something that’s over-engineered, difficult to deploy and operate, fragile, and slow. What is the upside to all of this? Something that makes your life easier as a developer when things go perfectly well, but the reality is things don’t go perfectly well most of the time. Instead, you end up getting paged at 1 a.m. trying to figure out why RabbitMQ told your monitoring everything is awesome while proceeding to take a dump in your front yard. An approachable argument for shifting some consistency checks to application layer so the infrastructure can be simpler.
3D Printed Ceramics to 1700°C (Ars Technica) — The key step used in the new work is to replace the standard polymers used to create ceramics with a chemical that polymerizes when exposed to UV light. (These can have a variety of chemistries; the authors list thiol, vinyl, acrylate, methacrylate, and epoxy groups.) This means they’re able to be polymerized using a fairly standard 3D printer setup. In fact, the paper lists the model number of the version the authors bought from a different company.
Guesstimate — spreadsheet for things that aren’t certain.
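
Tyler Treat's argument for moving some consistency checks into the application layer can be illustrated with an idempotent consumer. The sketch below is a toy under stated assumptions: the `Account` class, the message IDs, and the simulated broker in `deliver_at_least_once` are hypothetical and do not correspond to any real messaging library. The broker only promises at-least-once delivery, and the application discards duplicates itself.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Toy application state that tolerates redelivered messages."""

    balance: int = 0
    seen_ids: set = field(default_factory=set)  # IDs already applied

    def apply(self, message_id: str, amount: int) -> bool:
        """Apply a deposit once; return False for a duplicate."""
        if message_id in self.seen_ids:
            return False
        self.balance += amount
        self.seen_ids.add(message_id)
        return True


def deliver_at_least_once(account, messages):
    """Simulate a broker that delivers every message twice."""
    applied = 0
    for message_id, amount in messages:
        for _ in range(2):  # deliberate redelivery
            if account.apply(message_id, amount):
                applied += 1
    return applied


account = Account()
applied_count = deliver_at_least_once(account, [("m1", 10), ("m2", 5)])
```

The trade-off is that the application now has to carry a unique ID on every message and remember which ones it has seen, in exchange for a simpler delivery guarantee from the infrastructure.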

Four short links: 10 December 2015

Reactive Programming Theory, Attacking HTTP/2, Distributed Systems Explainer, and Auto Futures

by Nat Torkington | @gnat | +Nat Torkington | December 10, 2015

Distributed Reactive Programming (A Paper a Day) — this week’s focus on reactive programming has been eye-opening for me. I find the implementation details less interesting than the simple notion that we can define different consistency models for reactive programs and reason about them.
Attacking HTTP/2 Implementations — Our talk focused on threats, attack vectors, and vulnerabilities found during the course of our research. Two Firefox, two Apache Traffic Server (ATS), and four Node-http2 vulnerabilities will be discussed alongside the release of the first public HTTP/2 fuzzer. We showed how these bugs were found, their root cause, why they occur, and how to trigger them.
What We Talk About When We Talk About Distributed Systems — a great intro/explainer to the different concepts in distributed systems.
The Autonomous Winter is Coming — The future of any given manufacturer will be determined by how successfully they manage their brands in a market split between Mobility customers and Driving customers.

Four short links: 7 December 2015

Telepresent Axeman, Toxic Workers, Analysis Code, and Cryptocurrency Attacks

by Nat Torkington | @gnat | +Nat Torkington | December 7, 2015

Axe-Wielding Robot w/Telepresence (YouTube) — graphic robot-on-wall action at 2m30s. (via IEEE)
Toxic Workers (PDF) — In comparing the two costs, even if a firm could replace an average worker with one who performs in the top 1%, it would still be better off by replacing a toxic worker with an average worker by more than two-to-one. Harvard Business School research. (via Fortune)
Replacing Sawzall (Google) — At Google, most Sawzall analysis has been replaced by Go […] we’ve developed a set of Go libraries that we call Lingo (for Logs in Go). Lingo includes a table aggregation library that brings the powerful features of Sawzall aggregation tables to Go, using reflection to support user-defined types for table keys and values. It also provides default behavior for setting up and running a MapReduce that reads data from the logs proxy. The result is that Lingo analysis code is often as concise and simple as (and sometimes simpler than) the Sawzall equivalent.
Attacks in the World of Cryptocurrency — a review of some of the discussed weaknesses, attacks, or oddities in cryptocurrency (esp. bitcoin).

Four short links: 16 November 2015

Hospital Hacking, Security Data Science, JavaScript Face-Substitution, and Multi-Agent Systems Textbook

by Nat Torkington | @gnat | +Nat Torkington | November 16, 2015

Hospital Hacking (Bloomberg) — interesting for both lax regulation (“The FDA seems to literally be waiting for someone to be killed before they can say, ‘OK, yeah, this is something we need to worry about,’ ” Rios says.) and the extent of the problem (Last fall, analysts with TrapX Security, a firm based in San Mateo, Calif., began installing software in more than 60 hospitals to trace medical device hacks. […] After six months, TrapX concluded that all of the hospitals contained medical devices that had been infected by malware.). It may take a Vice President’s defibrillator being hacked for things to change. Or would anybody notice?
Cybersecurity and Data Science — pointers to papers in different aspects of using machine learning and statistics to identify misuse and anomalies.
Real-time Face Substitution in JavaScript — this is awesome. Moore’s Law is amazing.
Multi-Agent Systems — undergraduate textbook covering distributed systems, game theory, auctions, and more. Electronic version as well as printed book.