"distributed databases" entries

Four short links: 2 March 2016

Sensing Cognitive Load, Boring is Good, Replicating SQLite, and Intro to Autonomous Robots

  1. An Adaptive Learning Interface that Adjusts Task Difficulty based on Brain State (PDF) — using blood flow to measure cognitive load, this tool releases new lessons to you when you’re ready for them. The system measures blood flow using functional near-infrared spectroscopy (fNIRS). Increased activation in an area of the brain results in increased levels of oxyhemoglobin. These changes can be measured by emitting near-infrared light, which penetrates around 3 cm into brain tissue, and measuring the attenuation caused by oxyhemoglobin levels. I think we all want a widget on our computer that says “your brain is full, go offline to recover,” if only to validate naptime.
  2. Deploying Software — Your deploys should be as boring, straightforward, and stress-free as possible. cf. Maciej Ceglowski’s “if you find it interesting, it doesn’t belong in production.”
  3. Replicating SQLite Using Raft — rqlite is written in Go and uses Raft to achieve consensus across all the instances of the SQLite databases. rqlite ensures that every change made to the database is made to a quorum of SQLite files, or none at all. (A rough sketch of that quorum rule follows this list.)
  4. An Introduction to Autonomous Robots — An open textbook focusing on computational principles of autonomous robots. CC-NC-ND and for sale via Amazon.
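
A back-of-the-envelope Go sketch of the majority rule behind rqlite's guarantee (a hypothetical illustration, not rqlite's actual code): a change counts as committed only once a strict majority of the cluster has acknowledged it; otherwise it is applied nowhere.

    package main

    import "fmt"

    // committed reports whether a change acknowledged by acks of the
    // clusterSize nodes may be applied: Raft requires a strict majority.
    func committed(acks, clusterSize int) bool {
        return acks >= clusterSize/2+1
    }

    func main() {
        for acks := 1; acks <= 3; acks++ {
            fmt.Printf("3-node cluster, %d ack(s): committed=%v\n", acks, committed(acks, 3))
        }
    }

On a three-node cluster this prints committed=false for one acknowledgement and committed=true for two or three.
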
Four short links: 5 January 2016

Inference with Privacy, RethinkDB Reliability, T-Mobile Choking Video, and Real-Time Streams

  1. Privacy-Preserving Inference of Social Relationships from Location Data (PDF) — utilizes an untrusted server and computes the building blocks to support various social relationship studies, without disclosing location information to the server and other untrusted parties. (via CCC Blog)
  2. Jepsen takes on RethinkDB — the glowingest review I’ve seen from Aphyr. As far as I can ascertain, RethinkDB’s safety claims are accurate.
  3. T-Mobile’s BingeOn “Optimization” Is Just Throttling (EFF) — T-Mobile has claimed that this practice isn’t really “throttling,” but we disagree. It’s clearly not “optimization,” since T-Mobile doesn’t alter the actual content of the video streams in any way.
  4. qminer — BSD-licensed data analytics platform for processing large-scale, real-time streams containing structured and unstructured data.
Four short links: 19 November 2015

JavaScript Charting, Time-Series Database, PostgreSQL Clustering, and Organisational Warfare

  1. plotly.js — open source JavaScript charting library. See the announcement.
  2. Heroic — Spotify’s time-series database, built on Cassandra and Elasticsearch. See the announcement.
  3. Yoke — high-availability PostgreSQL cluster with automated cluster recovery and auto-failover. (A generic failover sketch follows this list.)
  4. Ten Graphs on Organisational Warfare — Simon Wardley in a nutshell :-)
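
The Yoke blurb above doesn't describe its failover mechanics, so the Go sketch below is only a generic illustration of auto-failover: poll the primary's health check and promote a standby after a few consecutive misses. The health URL and promoteStandby helper are hypothetical stand-ins; a real cluster manager would do something like run pg_ctl promote on the replica.

    package main

    import (
        "log"
        "net/http"
        "time"
    )

    const (
        healthURL   = "http://primary.example:8080/health" // hypothetical health endpoint
        maxFailures = 3
        interval    = 2 * time.Second
    )

    // healthy reports whether the primary answered its health check.
    func healthy() bool {
        resp, err := http.Get(healthURL)
        if err != nil {
            return false
        }
        defer resp.Body.Close()
        return resp.StatusCode == http.StatusOK
    }

    // promoteStandby stands in for whatever promotes the replica,
    // e.g. running pg_ctl promote on the standby host.
    func promoteStandby() {
        log.Println("primary unreachable: promoting standby")
    }

    func main() {
        failures := 0
        for range time.Tick(interval) {
            if healthy() {
                failures = 0
                continue
            }
            failures++
            if failures >= maxFailures {
                promoteStandby()
                return
            }
        }
    }
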
Four short links: 12 June 2015

OLAP Datastores, Timely Dataflow, Paul Ford is God, and Static Analysis

  1. pinot — a realtime distributed OLAP datastore, which is used at LinkedIn to deliver scalable real time analytics with low latency. It can ingest data from offline data sources (such as Hadoop and flat files) as well as online sources (such as Kafka). Pinot is designed to scale horizontally.
  2. Naiad: A Timely Dataflow System — timely dataflow combines structured loops for feedback, stateful vertices that consume and produce records without global coordination, and notifications once a vertex has received all the records for an iteration: the first two features are needed to execute iterative and incremental computations with low latency. The third feature makes it possible to produce consistent results, at both outputs and intermediate stages of computations, in the presence of streaming or iteration.
  3. What is Code (Paul Ford) — What the coders aren’t seeing, you have come to believe, is that the staid enterprise world that they fear isn’t the consequence of dead-eyed apathy but rather détente. Words and feels.
  4. Facebook Infer Opensourced — the static analyzer I linked to yesterday, released as open source today.
Four short links: 5 June 2015

IoT and New Hardware Movement, OpenCV 3, FBI vs Crypto, and Transactional Datastore

  1. New Hardware and the Internet of Things (Jon Bruner) — The Internet of Things and the new hardware movement are not the same thing. The new hardware movement is driven by new tools for: Prototyping (inexpensive 3D printers, CNC machine tools, cheap and powerful microcontrollers, high-level programming languages on embedded systems); Fundraising and business development (Highway1, Lab IX); Manufacturing (PCH, Seeed); Marketing (Etsy, Quirky). The IoT is driven by: Ubiquitous connectivity; Cheap hardware (i.e., the new hardware movement); Inexpensive data processing and machine learning.
  2. OpenCV 3.0 Released — I hadn’t realised how much hardware acceleration comes out of the box with OpenCV.
  3. FBI: Companies Should Help us Prevent Encryption (WaPo) — as Mike Loukides says, we are in a Post-Modern age where we don’t trust our computers and they don’t trust us. It’s jarring to hear the organisation that (over-zealously!) investigates computer crime arguing that citizens should not be able to secure their communications. It’s like police arguing against locks.
  4. cockroach — a scalable, geo-replicated, transactional datastore. The Wired piece about it drops the factoid that the creators of GIMP worked on Colossus, Google’s massive successor to the Google File System. From Photoshop-alike to massive file systems. Love it.
Four short links: 12 May 2015

Data Center Numbers, Utility Computing, NSA Art, and RIP CAP

  1. We Used to Build Steel Mills Near Cheap Sources of Power, but Now That’s Where We Build Datacenters — Hennessy & Patterson estimate that of the $90M cost of an example datacenter (just the facilities – not the servers), 82% is associated with power and cooling. The servers in the datacenter are estimated to only cost $70M. It’s not fair to compare those numbers directly since servers need to get replaced more often than datacenters; once you take into account the cost over the entire lifetime of the datacenter, the amortized cost of power and cooling comes out to be 33% of the total cost, when servers have a three-year lifetime and infrastructure has a 10-15 year lifetime. Going back to the Barroso and Holzle book, processors are responsible for about a third of the compute-related power draw in a datacenter (including networking), which means that just powering processors and their associated cooling and power distribution is about 11% of the total cost of operating a datacenter. By comparison, the cost of all networking equipment is 8%, and the cost of the employees that run the datacenter is 2%.
  2. Microsoft Invests in 3 Undersea Cable Projects — utility computing is an odd concept, given how quickly hardware cycles refresh. In the past, you could ask whether investors wanted to be in a high-growth, high-risk technology business or a stable blue-chip utility.
  3. Secret Power — Simon Denny’s NSA-logo-and-Snowden-inspired art makes me wish I could get to Venice. See also The Guardian piece on him.
  4. Please Stop Calling Databases CP or AP (Martin Kleppmann) — The fact that we haven’t been able to classify even one datastore as unambiguously “AP” or “CP” should be telling us something: those are simply not the right labels to describe systems. I believe that we should stop putting datastores into the “AP” or “CP” buckets. So readable!
Four short links: 6 May 2015

Self-Driving Cars, Cloud BigTable, Define "Uptime," and Continuous Delivery Architectures

  1. Andrew Ng (Wired) — I think self-driving cars are a little further out than most people think. There’s a debate about which one of two universes we’re in. In the first universe it’s an incremental path to self-driving cars, meaning you have cruise control, adaptive cruise control, then self-driving cars only on the highways, and you keep adding stuff until 20 years from now you have a self-driving car. In universe two you have one organization, maybe Carnegie Mellon or Google, that invents a self-driving car and bam! You have self-driving cars. It wasn’t available Tuesday but it’s on sale on Wednesday. I’m in universe one. I think there’s a lot of confusion about how easy it is to do self-driving cars. There’s a big difference between being able to drive a thousand miles, versus being able to drive anywhere. And it turns out that machine-learning technology is good at pushing performance from 90 to 99 percent accuracy. But it’s challenging to get to four nines (99.99 percent). I’ll give you this: we’re firmly on our way to being safer than a drunk driver.
  2. Google Cloud BigTable — Google’s BigTable, with Apache HBase API, single-digit millisecond latency, and “fully managed”. G are hell-bent on catching up with Amazon and Microsoft at this cloud serving thing. (A hedged Go client sketch follows this list.)
  3. Call Me Maybe: Aerospike — We’re setting a timeout of 500ms here, and operations still time out every time a partition between nodes occurs. In these tests we aren’t interfering with client-server traffic at all. Aerospike may claim “100% uptime”, but this is only meaningful with respect to particular latency bounds. Given Aerospike claims millisecond-scale latencies, you may want to reconsider whether you consider this “uptime”.
  4. 31 Continuous Delivery Architectures (Slideshare) — from a vendor, so one name crops up repeatedly (other than “Jenkins”), but it’s still good devops voyeurism/envy.
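
The Cloud BigTable item above highlights the HBase API, but there is also a native Go client. A hedged sketch of one write and one read follows; the project, instance, table, and column names are placeholders, the import path is the present-day cloud.google.com/go/bigtable package, and it assumes application default credentials are already configured.

    package main

    import (
        "context"
        "fmt"
        "log"

        "cloud.google.com/go/bigtable"
    )

    func main() {
        ctx := context.Background()

        // Placeholder project and instance names.
        client, err := bigtable.NewClient(ctx, "my-project", "my-instance")
        if err != nil {
            log.Fatal(err)
        }
        defer client.Close()

        tbl := client.Open("links")

        // Write one cell: row "radar#2015-05-06", column family "cf", column "title".
        mut := bigtable.NewMutation()
        mut.Set("cf", "title", bigtable.Now(), []byte("Four short links"))
        if err := tbl.Apply(ctx, "radar#2015-05-06", mut); err != nil {
            log.Fatal(err)
        }

        // Read the row back and print its cells.
        row, err := tbl.ReadRow(ctx, "radar#2015-05-06")
        if err != nil {
            log.Fatal(err)
        }
        for _, item := range row["cf"] {
            fmt.Printf("%s = %s\n", item.Column, item.Value)
        }
    }
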
Four short links: 11 February 2015

Crowdsourcing Working, etcd DKVS, Psychology Progress, and Inferring Logfile Rules

  1. Crowdsourcing Isn’t Broken — great rundown of ways to keep crowdsourcing on track. As with open sourcing something, just throwing open the doors and hoping for the best has a low probability of success.
  2. etcd Hits 2.0 — first major stable release of an open source, distributed, consistent key-value store for shared configuration, service discovery, and scheduler coordination. (A quick sketch of its v2 HTTP API follows this list.)
  3. You Can’t Play 20 Questions With Nature and Win (PDF) — There is, I submit, a view of the scientific endeavor that is implicit (and sometimes explicit) in the picture I have presented above. Science advances by playing 20 questions with nature. The proper tactic is to frame a general question, hopefully binary, that can be attacked experimentally. Having settled that bits-worth, one can proceed to the next. The policy appears optimal – one never risks much, there is feedback from nature at every step, and progress is inevitable. Unfortunately, the questions never seem to be really answered, the strategy does not seem to work. An old paper, but still resonant today. (via Mind Hacks)
  4. Sequence: Automated Analyzer for Reducing 100k Messages to 10s of Patterns — induces patterns from the examples in log files.
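
For a feel of the HTTP API that etcd 2.0 exposes (the v2 keys API), here is a hedged Go sketch that sets a key and reads it back; it assumes a local etcd member on the default client port 2379, and the key name is invented.

    package main

    import (
        "fmt"
        "io"
        "log"
        "net/http"
        "net/url"
        "strings"
    )

    const base = "http://127.0.0.1:2379/v2/keys"

    func main() {
        // PUT /v2/keys/config/feature-x with a form-encoded value, the way
        // the v2 API expects writes.
        form := url.Values{"value": {"on"}}
        req, err := http.NewRequest(http.MethodPut, base+"/config/feature-x",
            strings.NewReader(form.Encode()))
        if err != nil {
            log.Fatal(err)
        }
        req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
        putResp, err := http.DefaultClient.Do(req)
        if err != nil {
            log.Fatal(err)
        }
        putResp.Body.Close()

        // GET the key back; etcd replies with a JSON document describing the node.
        getResp, err := http.Get(base + "/config/feature-x")
        if err != nil {
            log.Fatal(err)
        }
        defer getResp.Body.Close()
        body, err := io.ReadAll(getResp.Body)
        if err != nil {
            log.Fatal(err)
        }
        fmt.Println(string(body))
    }
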
Four short links: 3 December 2014

Visual NoSQL, QA Mindset, Future Programming, and Interactive Cities

  1. Visual Guide to NoSQL Systems — not quite accurate in the “pick any two,” but still a useful frame for understanding the landscape.
  2. The QA Mindset (Michael Lopp) — Humans do strange shit to software that we could never predict in the controlled setting of our carefully constructed software development environments. This x1000.
  3. Future Programming 2014 Videos — a collection of talks on boundary-pushing ideas around IDEs, code control, distributed objects, GPUs, etc.
  4. Some of These Things are Not Like the Others (Tom Armitage) — writeup on sensor-rich interactive cityscapes designed for residents to thrive rather than for merchants to transact. Lovely.
Four short links: 17 November 2014

Tut Tut ISPs, Distributing Old Datastores, Secure Containers, and Design Workflow

  1. ISPs Remove Their Customers’ Email Encryption (EFF) — ISPs have apparently realised that man-in-the-middle is their business model.
  2. Dynomite (Netflix) — a sharding and replication layer. Dynomite can make existing non-distributed datastores, such as Redis or Memcached, into a fully distributed & multi-datacenter replicating datastore. (A rough consistent-hashing sketch follows this list.)
  3. After Docker — smaller, easier to manage, more secure containers via unikernels and immutable infrastructure.
  4. Pixelapse — something between Dropbox and Github for the design workflow and artifacts.
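
As a generic illustration of how a sharding layer can spread keys across otherwise standalone Redis or Memcached nodes, here is a minimal consistent-hash ring in Go. It is not Dynomite's actual implementation, and the node addresses and keys are invented: each node and key is hashed onto the same ring, a key belongs to the first node clockwise from its position, and adding or removing a node only remaps neighbouring keys.

    package main

    import (
        "fmt"
        "hash/fnv"
        "sort"
    )

    type ring struct {
        points []uint32          // sorted node positions on the ring
        owner  map[uint32]string // position -> node address
    }

    func hashOf(s string) uint32 {
        h := fnv.New32a()
        h.Write([]byte(s))
        return h.Sum32()
    }

    func newRing(nodes []string) *ring {
        r := &ring{owner: make(map[uint32]string)}
        for _, n := range nodes {
            p := hashOf(n)
            r.points = append(r.points, p)
            r.owner[p] = n
        }
        sort.Slice(r.points, func(i, j int) bool { return r.points[i] < r.points[j] })
        return r
    }

    // nodeFor returns the node that owns key: the first ring position at or
    // after the key's hash, wrapping around to the start of the ring.
    func (r *ring) nodeFor(key string) string {
        h := hashOf(key)
        i := sort.Search(len(r.points), func(i int) bool { return r.points[i] >= h })
        if i == len(r.points) {
            i = 0
        }
        return r.owner[r.points[i]]
    }

    func main() {
        r := newRing([]string{"redis-a:6379", "redis-b:6379", "redis-c:6379"})
        for _, k := range []string{"user:42", "session:9000", "cart:7"} {
            fmt.Printf("%-13s -> %s\n", k, r.nodeFor(k))
        }
    }
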