Eve, Version 0 (Chris Grainger) — Version 0 contains a database, compiler, query runtime, data editor, and query editor. Basically, it’s a database with an IDE. You can add data both manually or through importing a CSV and then you can create queries over that data using our visual query editor.
Eigenstyle — clever analysis and reconstruction of images through principal component analysis. And here are “prettiest ugly dresses,” those that I classified as dislikes, that the program predicted I would really like.
Turing Digital Archive — many of Turing’s letters, talks, photographs, and unpublished papers, as well as memoirs and obituaries written about him. It contains images of the original documents that are held in the Turing collection at King’s College, Cambridge. (Timely as Jason Scott works to save a manual archive: , , )
Google Patenting Machine Learning Developments (Reddit) — I am afraid that Google has just started an arms race, which could do significant damage to academic research in machine learning. Now it’s likely that other companies using machine learning will rush to patent every research idea that was developed in part by their employees. We have all been in a prisoner’s dilemma situation, and Google just defected. Now researchers will guard their ideas much more combatively, given that it’s now fair game to patent these ideas, and big money is at stake.
Machine Ethics (Nature) — machine learning ethics versus rule-driven ethics. Logic is the ideal choice for encoding machine ethics, argues Luís Moniz Pereira, a computer scientist at the Nova Laboratory for Computer Science and Informatics in Lisbon. “Logic is how we reason and come up with our ethical choices,” he says. I disagree with his premises.
The Storage Tipping Point — the performance optimization technologies of the last decade – log structured file systems, coalesced writes, out-of-place updates and, soon, byte-addressable NVRAM – are conflicting with similar-but-different techniques used in SSDs and arrays. The software we use is written for dumb storage; we’re getting smart storage; but smart+smart = fragmentation, write amplification, and over-consumption.
pushpin — a reverse proxy server that makes it easy to implement WebSocket, HTTP streaming, and HTTP long-polling services. It communicates with backend web applications using regular, short-lived HTTP requests (GRIP protocol). This allows backend applications to be written in any language and use any webserver.
The Gold Standard Checklist for Web Components — This is a working draft of a checklist to define a “gold standard” for web components that aspire to be as predictable, flexible, reliable, and useful as the standard HTML elements.
MapD: Massive Throughput Database Queries with LLVM and GPUs (nvidia) — The most powerful GPU currently available is the NVIDIA Tesla K80 Accelerator, with up to 8.74 teraflops of compute performance and nearly 500 GB/sec of memory bandwidth. By supporting up to eight of these cards per server, we see orders-of-magnitude better performance on standard data analytics tasks, enabling a user to visually filter and aggregate billions of rows in tens of milliseconds, all without indexing.
Why It’s Often Easier to Innovate in China than the US (Bunnie Huang) — We did some research into the legal frameworks and challenges around absorbing gongkai IP into the Western ecosystem, and we believe we’ve found a path to repatriate some of the IP from gongkai into proper open source.
Meet DJ Patil — “It was this kind of moment when you realize: ‘Oh, my gosh, I am that stupid,’” he said.
Interview with Bruce Sterling on the Convergence of Humans and Machines — If you are a human being, and you are doing computation, you are trying to multiply 17 times five in your head. It feels like thinking. Machines can multiply, too. They must be thinking. They can do math and you can do math. But the math you are doing is not really what cognition is about. Cognition is about stuff like seeing, maneuvering, having wants, desires. Your cat has cognition. Cats cannot multiply 17 times five. They have got their own umwelt (environment). But they are mammalian, you are a mammalian. They are actually a class that includes you. You are much more like your house cat than you are ever going to be like Siri. You and Siri converging, you and your house cat can converge a lot more easily. You can take the imaginary technologies that many post-human enthusiasts have talked about, and you could afflict all of them on a cat. Every one of them would work on a cat. The cat is an ideal laboratory animal for all these transitions and convergences that we want to make for human beings. (via Vaughan Bell)
The Sad State of Sysadmin in the Age of Containers (Erich Schubert) — a Grumpy Old Man rant, but solid. And since nobody is still able to compile things from scratch, everybody just downloads precompiled binaries from random websites. Often without any authentication or signature.
Pinball — Pinterest open-sourced their data workflow manager.
Disambiguating Databases (ACM) — The scope of the term database is vast. Technically speaking, anything that stores data for later retrieval is a database. Even by that broad definition, there is functionality that is common to most databases. This article enumerates those features at a high level. The intent is to provide readers with a toolset with which they might evaluate databases on their relative merits.
Hello Barbie — I just can’t imagine a business not wanting to mine and repurpose the streams of audio data coming into their servers. “You listen to Katy Perry a lot. So do I! You have a birthday coming up. Have you told your parents about the Katy Perry brand official action figurines from Mattel? Kids love ’em, and demo data and representative testing indicates you will, too!” Or just offer a subscription service where parents can listen in on what their kids say when they play in the other room with their friends. Or identify product mentions and cross-market offline. Or …
Surgical Micro-Robot Swarms — A swarm of medical microrobots. Start with cm sized robots. These already exist in the form of pillbots and I reference the work of Paolo Dario’s lab in this direction. Then get 10 times smaller to mm sized robots. Here we’re at the limit of making robots with conventional mechatronics. The almost successful I-SWARM project prototyped remarkable robots measuring 4 x 4 x 3mm. But now shrink by another 3 orders of magnitude to microbots, measured in micrometers. This is how small robots would have to be in order to swim through and access (most of) the vascular system. Here we are far beyond conventional materials and electronics, but amazingly work is going on to control bacteria. In the example I give from the lab of Sylvain Martel, swarms of magnetotactic bacteria are steered by an external magnetic field and, interestingly, tracked in an MRI scanner.
Media Hacking — interesting discussion of the techniques used to spread disinformation through social media, often using bots to surface/promote a message.
Apple Research Kit — Apple positioning their mobile personal biodata tools with medical legitimacy, presumably as a way to distance themselves from the stereotypical quantified selfer. I’m reminded of the gym chain owner who told me, about the Nike+, “yeah, maybe 5% of my clients will want this. The rest go to the gym so they can eat and drink what they want.”
Designing the Human-Robot Relationship (O’Reilly) — We can use those same principles [Jakob Nielsen’s usability heuristics] and look for implications of robots serving our higher ordered needs, as we move from serving needs related to convenience or performance to actually supporting our decision making to emerging technologies, moving from being able to do anything or be magic in terms of the user interface to being more human in the user interface.
Why Are Geospatial Databases So Hard To Build? — Algorithms in computer science, with rare exception, leverage properties unique to one-dimensional scalar data models. In other words, data types you can abstractly represent as an integer. Even when scalar data types are multidimensional, they can often be mapped to one dimension. This works well, as the majority of [what] data people care about can be represented with scalar types. If your data model is inherently non-scalar, you enter an algorithm wasteland in the computer science literature.
You Guys Realize the Apple Watch is Going to Flop, Right? — leaving aside the “guys” assumption of its readers, you can take this either as a list of the challenges Apple will inevitably overcome or bypass when they release their watch, or (as intended) a list of the many reasons that it’s too damn soon for watches to be useful. The Apple Watch is Jonathan Ive’s new Newton. It’s a potentially promising form that’s being built about 10 years before Apple has the technology or infrastructure to pull it off in a meaningful way. As a result, the novel interactions that could have made the Apple watch a must-have device aren’t in the company’s launch product, nor are they on the immediate horizon. And all Apple can sell the public on is a few tweets and emails on their wrists—an attempt at a fashion statement that needs to be charged once or more a day.
InfluxDB, Now With Tags and More Unicorns — The combination of these new features [tagging, and the use of tags in queries] makes InfluxDB not just a time series database, but also a database for time series discovery. It’s our solution for making the problem of dealing with hundreds of thousands or millions of time series tractable.
The End of Apps as We Know Them — It may be very likely that the primary interface for interacting with apps will not be the app itself. The app is primarily a publishing tool. The number one way people use your app is through this notification layer, or aggregated card stream. Not by opening the app itself. To which one grumpy O’Reilly editor replied, “cards are the new walled garden.”
Signal 2.0 — Signal uses your existing phone number and address book. There are no separate logins, usernames, passwords, or PINs to manage or lose. We cannot hear your conversations or see your messages, and no one else can either. Everything in Signal is always end-to-end encrypted, and painstakingly engineered in order to keep your communication safe.
Crowdsourcing Isn’t Broken — great rundown of ways to keep crowdsourcing on track. As with open sourcing something, just throwing open the doors and hoping for the best has a low probability of success.
etcd Hits 2.0 — first major stable release of an open source, distributed, consistent key-value store for shared configuration, service discovery, and scheduler coordination.
You Can’t Play 20 Questions With Nature and Win (PDF) — There is, I submit, a view of the scientific endeavor that is implicit (and sometimes explicit) in the picture I have presented above. Science advances by playing 20 questions with nature. The proper tactic is to frame a general question, hopefully binary, that can be attacked experimentally. Having settled that bits-worth, one can proceed to the next. The policy appears optimal – one never risks much, there is feedback from nature at every step, and progress is inevitable. Unfortunately, the questions never seem to be really answered, the strategy does not seem to work. An old paper, but still resonant today. (via Mind Hacks)