DeepDive — DeepDive is targeted to help users extract relations between entities from data and make inferences about facts involving the entities. DeepDive can process structured, unstructured, clean, or noisy data and outputs the results into a database.
Running Kafka at Scale (LinkedIn Engineering) — This tiered infrastructure solves many problems, but it greatly complicates monitoring Kafka and assuring its health. While a single Kafka cluster, when running normally, will not lose messages, the introduction of additional tiers, along with additional components such as mirror makers, creates myriad points of failure where messages can disappear. In addition to monitoring the Kafka clusters and their health, we needed to create a means to assure that all messages produced are present in each of the tiers, and make it to the critical consumers of that data.
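One way to picture the "are all produced messages present in each tier" check is to count messages per time bucket at every tier and compare against the producer's counts. The sketch below is only an illustration of that idea; the function names, tier names, and the 10-minute bucket size are assumptions and are not taken from LinkedIn's tooling.

```python
from collections import Counter


def count_by_bucket(timestamps, bucket_seconds=600):
    """Count messages per time bucket, keyed by the bucket's start time."""
    return Counter(ts - ts % bucket_seconds for ts in timestamps)


def audit_tiers(produced_counts, tier_counts):
    """Return {tier: {bucket: shortfall}} for buckets where a tier
    saw fewer messages than the producers reported."""
    report = {}
    for tier, counts in tier_counts.items():
        missing = {
            bucket: produced - counts.get(bucket, 0)
            for bucket, produced in produced_counts.items()
            if counts.get(bucket, 0) < produced
        }
        if missing:
            report[tier] = missing
    return report


# Hypothetical timestamps (seconds): the "aggregate" tier dropped one message.
produced = count_by_bucket([0, 5, 10, 605, 610])
tiers = {
    "local": count_by_bucket([0, 5, 10, 605, 610]),
    "aggregate": count_by_bucket([0, 5, 605, 610]),
}
print(audit_tiers(produced, tiers))  # {'aggregate': {0: 1}}
```

Comparing counts rather than message contents keeps the audit cheap, at the cost of not detecting a loss that is masked by a duplicate in the same bucket.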
A/A Testing — In an A/A test, you run a test using the exact same options for both “variants” in your test. That’s right, there’s no difference between “A” and “B” in an A/A test. It sounds stupid, until you see the “results.” (via Nelson Minar)
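A quick simulation shows why A/A "results" are sobering: with identical variants, a test at the 5% significance level should still declare a "winner" about one time in twenty. This is a minimal sketch using a pooled two-proportion z-test; the visitor counts, conversion rate, and function names are assumptions chosen for illustration, not anything from the linked article.

```python
import math
import random


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value from a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_a / n_a - conv_b / n_b) / se
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_aa_tests(num_tests=1000, visitors=500, rate=0.10, alpha=0.05, seed=0):
    """Run many A/A tests (both arms share the same true rate) and return
    the fraction that look 'significant' at level alpha."""
    rng = random.Random(seed)
    false_positives = 0
    for _ in range(num_tests):
        conv_a = sum(rng.random() < rate for _ in range(visitors))
        conv_b = sum(rng.random() < rate for _ in range(visitors))
        if two_proportion_p_value(conv_a, visitors, conv_b, visitors) < alpha:
            false_positives += 1
    return false_positives / num_tests


if __name__ == "__main__":
    print(f"A/A tests flagged as significant: {simulate_aa_tests():.1%}")
```

The flagged fraction should land near `alpha`. Checking results repeatedly while a test is still running pushes it higher, which is one reason A/A results can look alarming.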
NSA Declares War on General-Purpose Computing (BoingBoing) — NSA director Michael S Rogers says his agency wants “front doors” to all cryptography used in the USA, so that no one can have secrets it can’t spy on — but what he really means is that he wants to be in charge of which software can run on any general purpose computer.
Subjectivity-Exploitability Tradeoff — Voting-based DAOs, lacking an equivalent of shareholder regulation, are vulnerable to attacks where 51% of participants collude to take all of the DAO’s assets for themselves […] The example supplied here will define a new, third, hypothetical form of blockchain or DAO governance. Every day we’re closer to Stross’s Accelerando.
Sahale — open source cascading workflow visualizer to help you make sense of tasks decomposed into Hadoop jobs. (via Code as Craft)
Brain Time (David Eagleman) — the visual system is a distributed system with some flexible built-in consistency. So if the visual brain wants to get events correct timewise, it may have only one choice: wait for the slowest information to arrive. To accomplish this, it must wait about a tenth of a second. In the early days of television broadcasting, engineers worried about the problem of keeping audio and video signals synchronized. Then they accidentally discovered that they had around a hundred milliseconds of slop: As long as the signals arrived within this window, viewers’ brains would automatically resynchronize the signals; outside that tenth-of-a-second window, it suddenly looked like a badly dubbed movie.
CS Bumper Stickers (PDF) — Allocate four digits for the year part of a date: a new millennium is coming. —David Martin. From 1985.
ASCIIcam — real-time ASCII output from your videocamera. This is doing terrible things to my internal chronometer. Is it 2014 or 1984? Yes!
Wearable Power Assist Device Goes on Sale in Japan (WSJ, Paywall) — The Muscle Suit, which weighs 5.5 kilograms (12 pounds), can be worn knapsack-style and uses a mouthpiece as its control. Unlike other similar suits that rely on motors, it uses specially designed rubber tubes and compressed air as the source of its power. The Muscle Suit can help users pick up everyday loads with about a third of the usual effort. […] will sell for about ¥600,000 ($5,190), and is also available for rent at about ¥30,000 to ¥50,000 per month. Prof. Kobayashi said he expected the venture would ship 5,000 of them in 2015. (via Robot Economics)
Building a Complete Tweet Index (Twitter) — engineering behind the massive searchable Tweet collection: indexes roughly half a trillion documents and serves queries with an average latency of under 100ms.
Lost Lessons from an 8-bit BASIC — The little language that fueled the home computer revolution has been long buried beneath an avalanche of derision, or at least disregarded as a relic from primitive times. That’s too bad, because while the language itself has serious shortcomings, the overall 8-bit BASIC experience has high points that are worth remembering.
The Epic Struggle of the Internet of Things — a Bruce Sterling Kindle single, a powerfully-written challenge to the presumed-benevolent technology-pervaded universe that we label “the Internet of Things”. The Internet of Things is not about a talking refrigerator, because that is the old-fashioned consumer retail world of electrical white goods. It’s an archaic concept, like software bought in a plastic-wrapped box from a shelf. The genuine Internet of Things wants to invade that refrigerator, measure it, instrument it, monitor any interactions with it; it would cheerfully give away a fridge at cost.
mbeddr — a set of integrated and extensible languages for embedded software engineering, plus an IDE. It supports implementation, testing, verification and process aspects. It integrates with command-line build tools and integration servers, as well as file-based version control systems. Nice to see something beyond webdev getting tools love.
Replace wget With axel — download accelerator, aka a parallel wget for situations where the file you're fetching is available from multiple servers.
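The core trick behind a parallel downloader is splitting a file into byte ranges that separate connections can fetch independently with HTTP `Range` headers. The sketch below shows only that splitting step; `split_ranges` and `range_headers` are hypothetical helpers written for illustration and do not describe axel's actual implementation.

```python
def split_ranges(size, parts):
    """Split `size` bytes into up to `parts` contiguous (start, end) ranges.
    Both ends are inclusive, matching HTTP Range header semantics."""
    if size <= 0 or parts <= 0:
        return []
    parts = min(parts, size)
    base, extra = divmod(size, parts)
    ranges = []
    start = 0
    for i in range(parts):
        # The first `extra` ranges each take one additional byte.
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


def range_headers(size, parts):
    """Build one Range header value per connection."""
    return [f"bytes={start}-{end}" for start, end in split_ranges(size, parts)]


print(range_headers(10, 3))  # ['bytes=0-3', 'bytes=4-6', 'bytes=7-9']
```

Each range can then be requested from a different mirror or connection and the pieces written at their offsets in the output file, provided the servers support range requests.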
Dat — an open source project that provides a streaming interface between every file format and data storage backend. See the Wired piece on it.
Smithsonian Crowdsourcing Transcription (Smithsonian) — 49 volunteers transcribed 200 pages of correspondence between the Monuments Men in a week. Soon it’ll be mathematics test questions: “if 49 people transcribe 200 pages in 7 days, how many weeks will it take …”
MIT Guide to Family CompSci Sessions — This guide is for educators, community center staff, and volunteers interested in engaging their young people and their families to become designers and inventors in their community.
Machine Learning for Plant Properties — startup building database of plant genomics, properties, research, etc. for mining. The more familiar you are with your data and its meaning, the better your machine learning will be at suggesting fruitful lines of query … and the more valuable your startup will be.
Dissecting Message Queues — throughput, latency, and qualitative comparison of different message queues. MQs are to modern distributed architectures what function calls were to historic unibox architectures.
1915 Data Visualization Rules — a reminder that data visualization is not new, but research into effectiveness of alternative presentation styles is.