"databases" entries

Four short links: 3 July 2015

Four short links: 3 July 2015

Storage Interference, Open Source SSL, Pub-Sub Reverse-Proxy, and Web Components Checklist

  1. The Storage Tipping Pointthe performance optimization technologies of the last decade – log structured file systems, coalesced writes, out-of-place updates and, soon, byte-addressable NVRAM – are conflicting with similar-but-different techniques used in SSDs and arrays. The software we use is written for dumb storage; we’re getting smart storage; but smart+smart = fragmentation, write amplification, and over-consumption.
  2. s2n — Amazon’s open source ssl implementation.
  3. pushpina reverse proxy server that makes it easy to implement WebSocket, HTTP streaming, and HTTP long-polling services. It communicates with backend web applications using regular, short-lived HTTP requests (GRIP protocol). This allows backend applications to be written in any language and use any webserver.
  4. The Gold Standard Checklist for Web ComponentsThis is a working draft of a checklist to define a “gold standard” for web components that aspire to be as predictable, flexible, reliable, and useful as the standard HTML elements.
Four short links: 24 June 2015

Four short links: 24 June 2015

Big Data Architecture, Leaving the UK, GPU-powered Queries, and Gongkai in the West

  1. 100 Big Data Architecture Papers (Anil Madan) — you’ll either find them fascinating essential reading … or a stellar cure for insomnia.
  2. Software Companies Leaving UK Because of Government’s Surveillance Plans (Ars Technica) — to Amsterdam, to NYC, and to TBD.
  3. MapD: Massive Throughput Database Queries with LLVM and GPUs (nvidia) — The most powerful GPU currently available is the NVIDIA Tesla K80 Accelerator, with up to 8.74 teraflops of compute performance and nearly 500 GB/sec of memory bandwidth. By supporting up to eight of these cards per server, we see orders-of-magnitude better performance on standard data analytics tasks, enabling a user to visually filter and aggregate billions of rows in tens of milliseconds, all without indexing.
  4. Why It’s Often Easier to Innovate in China than the US (Bunnie Huang) — We did some research into the legal frameworks and challenges around absorbing gongkai IP into the Western ecosystem, and we believe we’ve found a path to repatriate some of the IP from gongkai into proper open source.
Four short links: 17 April 2015

Four short links: 17 April 2015

Distributed SQLite, Communicating Scientists, Learning from Failure, and Cat Convergence

  1. Replicating SQLite using Raft Consensus — clever, he used a consensus algorithm to build a distributed (replicated) SQLite.
  2. When Open Access is the Norm, How do Scientists Communicate? (PLOS) — From interviews I’ve conducted with researchers and software developers who are modeling aspects of modern online collaboration, I’ve highlighted the most useful and reproducible practices. (via Jon Udell)
  3. Meet DJ Patil“It was this kind of moment when you realize: ‘Oh, my gosh, I am that stupid,’” he said.
  4. Interview with Bruce Sterling on the Convergence of Humans and MachinesIf you are a human being, and you are doing computation, you are trying to multiply 17 times five in your head. It feels like thinking. Machines can multiply, too. They must be thinking. They can do math and you can do math. But the math you are doing is not really what cognition is about. Cognition is about stuff like seeing, maneuvering, having wants, desires. Your cat has cognition. Cats cannot multiply 17 times five. They have got their own umwelt (environment). But they are mammalian, you are a mammalian. They are actually a class that includes you. You are much more like your house cat than you are ever going to be like Siri. You and Siri converging, you and your house cat can converge a lot more easily. You can take the imaginary technologies that many post-human enthusiasts have talked about, and you could afflict all of them on a cat. Every one of them would work on a cat. The cat is an ideal laboratory animal for all these transitions and convergences that we want to make for human beings. (via Vaughan Bell)
Four short links: 13 March 2015

Four short links: 13 March 2015

Sad Sysadminning, Data Workflow, Ambiguous "Database," and Creepy Barbie

  1. The Sad State of Sysadmin in the Age of Containers (Erich Schubert) — a Grumpy Old Man rant, but solid. And since nobody is still able to compile things from scratch, everybody just downloads precompiled binaries from random websites. Often without any authentication or signature.
  2. Pinball — Pinterest open-sourced their data workflow manager.
  3. Disambiguating Databases (ACM) — The scope of the term database is vast. Technically speaking, anything that stores data for later retrieval is a database. Even by that broad definition, there is functionality that is common to most databases. This article enumerates those features at a high level. The intent is to provide readers with a toolset with which they might evaluate databases on their relative merits.
  4. Hello Barbie — I just can’t imagine a business not wanting to mine and repurpose the streams of audio data coming into their servers. “You listen to Katy Perry a lot. So do I! You have a birthday coming up. Have you told your parents about the Katy Perry brand official action figurines from Mattel? Kids love ’em, and demo data and representative testing indicates you will, too!” Or just offer a subscription service where parents can listen in on what their kids say when they play in the other room with their friends. Or identify product mentions and cross-market offline. Or …
Four short links: 10 March 2015

Four short links: 10 March 2015

Robot Swarms, Media Hacking, Inside-Out Databases, and Quantified Medical Self

  1. Surgical Micro-Robot SwarmsA swarm of medical microrobots. Start with cm sized robots. These already exist in the form of pillbots and I reference the work of Paolo Dario’s lab in this direction. Then get 10 times smaller to mm sized robots. Here we’re at the limit of making robots with conventional mechatronics. The almost successful I-SWARM project prototyped remarkable robots measuring 4 x 4 x 3mm. But now shrink by another 3 orders of magnitude to microbots, measured in micrometers. This is how small robots would have to be in order to swim through and access (most of) the vascular system. Here we are far beyond conventional materials and electronics, but amazingly work is going on to control bacteria. In the example I give from the lab of Sylvain Martel, swarms of magnetotactic bacteria are steered by an external magnetic field and, interestingly, tracked in an MRI scanner.
  2. Media Hacking — interesting discussion of the techniques used to spread disinformation through social media, often using bots to surface/promote a message.
  3. Turning the Database Inside Out with Apache Samzareplication, secondary indexing, caching, and materialized views as a way of getting into distributed stream processing.
  4. Apple Research Kit — Apple positioning their mobile personal biodata tools with medical legitimacy, presumably as a way to distance themselves from the stereotypical quantified selfer. I’m reminded of the gym chain owner who told me, about the Nike+, “yeah, maybe 5% of my clients will want this. The rest go to the gym so they can eat and drink what they want.”
Four short links: 9 March 2015

Four short links: 9 March 2015

Shareable Audio, Designing Robot Relationships, Machine Learning for Programming, and Geospatial Databases

  1. Four Types of Audio That People Share (Nieman Lab) — Audio Explainers, Whoa! Sounds, Storytellers, and Snappy Reviews, the results of experiments with NPR stations.
  2. Designing the Human-Robot Relationship (O’Reilly) — We can use those same principles [Jakob Nielsen’s usability heuristics] and look for implications of robots serving our higher ordered needs, as we move from serving needs related to convenience or performance to actually supporting our decision making to emerging technologies, moving from being able to do anything or be magic in terms of the user interface to being more human in the user interface.
  3. Machine Learning for General Programming — Peter Norvig talk. What more do you need to know?
  4. Why Are Geospatial Databases So Hard To Build?Algorithms in computer science, with rare exception, leverage properties unique to one-dimensional scalar data models. In other words, data types you can abstractly represent as an integer. Even when scalar data types are multidimensional, they can often be mapped to one dimension. This works well, as the majority of [what] data people care about can be represented with scalar types. If your data model is inherently non-scalar, you enter an algorithm wasteland in the computer science literature.
Four short links: 3 March 2015

Four short links: 3 March 2015

Wearable Warning, Time Series Data, App Cards, and Secure Comms

  1. You Guys Realize the Apple Watch is Going to Flop, Right? — leaving aside the “guys” assumption of its readers, you can take this either as a list of the challenges Apple will inevitably overcome or bypass when they release their watch, or (as intended) a list of the many reasons that it’s too damn soon for watches to be useful. The Apple Watch is Jonathan Ive’s new Newton. It’s a potentially promising form that’s being built about 10 years before Apple has the technology or infrastructure to pull it off in a meaningful way. As a result, the novel interactions that could have made the Apple watch a must-have device aren’t in the company’s launch product, nor are they on the immediate horizon. And all Apple can sell the public on is a few tweets and emails on their wrists—an attempt at a fashion statement that needs to be charged once or more a day.
  2. InfluxDB, Now With Tags and More UnicornsThe combination of these new features [tagging, and the use of tags in queries] makes InfluxDB not just a time series database, but also a database for time series discovery. It’s our solution for making the problem of dealing with hundreds of thousands or millions of time series tractable.
  3. The End of Apps as We Know ThemIt may be very likely that the primary interface for interacting with apps will not be the app itself. The app is primarily a publishing tool. The number one way people use your app is through this notification layer, or aggregated card stream. Not by opening the app itself. To which one grumpy O’Reilly editor replied, “cards are the new walled garden.”
  4. Signal 2.0Signal uses your existing phone number and address book. There are no separate logins, usernames, passwords, or PINs to manage or lose. We cannot hear your conversations or see your messages, and no one else can either. Everything in Signal is always end-to-end encrypted, and painstakingly engineered in order to keep your communication safe.
Four short links: 11 February 2015

Four short links: 11 February 2015

Crowdsourcing Working, etcd DKVS, Psychology Progress, and Inferring Logfile Rules

  1. Crowdsourcing Isn’t Broken — great rundown of ways to keep crowdsourcing on track. As with open sourcing something, just throwing open the doors and hoping for the best has a low probability of success.
  2. etcd Hits 2.0 — first major stable release of an open source, distributed, consistent key-value store for shared configuration, service discovery, and scheduler coordination.
  3. You Can’t Play 20 Questions With Nature and Win (PDF) — There is, I submit, a view of the scientific endeavor that is implicit (and sometimes explicit) in the picture I have presented above. Science advances by playing 20 questions with nature. The proper tactic is to frame a general question, hopefully binary, that can be attacked experimentally. Having settled that bits-worth, one can proceed to the next. The policy appears optimal – one never risks much, there is feedback from nature at every step, and progress is inevitable. Unfortunately, the questions never seem to be really answered, the strategy does not seem to work. An old paper, but still resonant today. (via Mind Hacks)
  4. Sequence: Automated Analyzer for Reducing 100k Messages to 10s of Patterns — induces patterns from the examples in log files.
Four short links: 10 February 2015

Four short links: 10 February 2015

Speech Recognition, Predictive Analytic Queries, Video Chat, and Javascript UI Library

  1. The Uncanny Valley of Speech Recognition (Zach Holman) — I’m reminded of driving up US-280 in 2003 or so with @raelity, a Kiwi and a South African trying every permutation of American accent from Kentucky to Yosemite Sam in order to get TellMe to stop giving us the weather for zipcode 10000. It didn’t recognise the swearing either. (Caution: features similarly strong language.)
  2. TuPAQ: An Efficient Planner for Large-scale Predictive Analytic Queries (PDF) — an integrated PAQ [Predictive Analytic Queries] planning architecture that combines advanced model search techniques, bandit resource allocation via runtime algorithm introspection, and physical optimization via batching. The resulting system, TUPAQ, solves the PAQ planning problem with comparable accuracy to exhaustive strategies but an order of magnitude faster, and can scale to models trained on terabytes of data across hundreds of machines.
  3. p2pvc — point-to-point video chat. In an 80×25 terminal window.
  4. Sortable — nifty UI library.
Four short links: 14 January 2015

Four short links: 14 January 2015

IoT and Govt, Exactly Once, Random Database Subset, and UX Checking

  1. Internet of Things: Blackett Review — the British Government’s review of Internet of Things opportunities around government. Government and others can use expert commissioning to encourage participants in demonstrator programmes to develop standards that facilitate interoperable and secure systems. Government as a large purchaser of IoT systems is going to have a big impact if it buys wisely. (via Matt Webb)
  2. Exactly Once Semantics with Kafka — designing for failure means it’s easier to ensure that things get done than it is to ensure that things get done exactly once.
  3. rdbms-subsetter — open source tool to generate a random sample of rows from a relational database that preserves referential integrity – so long as constraints are defined, all parent rows will exist for child rows. (via 18F)
  4. UXcheck — a browser extension to help you do a quick UX check against Nielsen’s 10 principles.