Nat has chaired the O'Reilly Open Source Convention and other O'Reilly conferences for over a decade. He ran the first web server in New Zealand, co-wrote the best-selling Perl Cookbook, and was one of the founding Radar bloggers. He lives in New Zealand and consults in the Asia-Pacific region.
Pattern — a web mining module for the Python programming language. It has tools for data mining (Google, Twitter and Wikipedia APIs, a web crawler, an HTML DOM parser), natural language processing (part-of-speech taggers, n-gram search, sentiment analysis, WordNet), machine learning (vector space model, clustering, SVM), network analysis and <canvas> visualization.
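A vector space model represents each document as a vector of term counts. It compares two documents by the cosine of the angle between their vectors. The plain-Python sketch below shows that idea. It does not use Pattern's own API, and tf_vector and cosine are hypothetical helper names.

```python
import math
import re
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Hypothetical helper: bag-of-words term-frequency vector (lowercased word counts)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Hypothetical helper: cosine similarity between two sparse term-frequency vectors."""
    # The dot product only needs the terms the two vectors share.
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc1 = tf_vector("the cat sat on the mat")
doc2 = tf_vector("the cat lay on the rug")
doc3 = tf_vector("stock prices fell sharply")
print(round(cosine(doc1, doc2), 3))  # shared words give a high score
print(cosine(doc1, doc3))            # no shared words gives 0.0
```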
Microsoft HoloLens Goggles (Wired) — a media release about the next thing from the person behind Kinect. I’m still trying to figure out (as are investors, I’m sure) where on the hype curve this goggles/AR/etc. amalgam lives. Is it only a tech proof-of-concept? Is it a games device like Kinect? Is it good and cheap enough for industrial apps? Or is this the long-awaited climb out of irrelevance for Virtual Reality?
The Facebook (YouTube) — brilliant fake 1995 ad for The Facebook. Excuse me, I’m off to cleanse.
Natural Language in Social Robotics (Robohub) — Natural language interfaces are turning into a de facto interface convention. Just like the GUI overlapped and largely replaced the command line, NLP is now being used by robots, the Internet of things, wearables, and especially conversational systems like Apple’s Siri, Google Now, Microsoft’s Cortana, Nuance’s Nina, Amazon’s Echo and others. These interfaces are designed to simplify, speed up, and improve task completion. Natural language interaction with robots, if anything, is an interface. It’s a form of UX that requires design.
Microservices and Testing (Martin Fowler) — testing across component boundaries, in the face of failing data stores and HTTP timeouts. The first discussion of testing in a web-scale world that I’ve seen from The Mainstream.
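One way to test that kind of boundary is to replace the remote call with a test double that fails the way the network does. The test then checks that the caller degrades gracefully. The sketch below uses Python's standard unittest.mock. The fetch_price function and the pricing.invalid URL are hypothetical and do not come from Fowler's article.

```python
import socket
import unittest
import urllib.error
import urllib.request
from unittest import mock


def fetch_price(url: str, fallback: float, timeout: float = 0.5) -> float:
    """Hypothetical client under test: ask a remote pricing service, fall back on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return float(resp.read().decode())
    except (urllib.error.URLError, socket.timeout, TimeoutError):
        # Timeouts and unreachable hosts degrade to a default instead of propagating.
        return fallback


class FetchPriceTest(unittest.TestCase):
    def test_timeout_uses_fallback(self):
        # Simulate the remote service timing out, without touching the network.
        with mock.patch("urllib.request.urlopen", side_effect=socket.timeout("timed out")):
            self.assertEqual(fetch_price("http://pricing.invalid/item/42", fallback=9.99), 9.99)

    def test_success_parses_body(self):
        # A fake response object that works as a context manager, like the real one.
        fake = mock.MagicMock()
        fake.read.return_value = b"12.50"
        fake.__enter__.return_value = fake
        with mock.patch("urllib.request.urlopen", return_value=fake):
            self.assertEqual(fetch_price("http://pricing.invalid/item/42", fallback=0.0), 12.5)
```

Run it with python -m unittest against the file that contains it.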
MIT Faculty Search — two open gigs at MIT, one around climate change and one “undefined.” Great job ad.
Scalability at What Cost? — evaluation of these systems, especially in the academic context, is lacking. Folks have gotten all wound up about scalability, despite the fact that scalability is just a means to an end (performance, capacity). When we actually look at performance, the benefits the scalable systems bring start to look much more sketchy. We’d like that to change.
Reset (Rowan Simpson) — It was a bit chilling to go back over a whole year’s worth of tweets and discover how many of them were just junk. Visiting the water cooler is fine, but somebody who spends all day there has no right to talk of being full.
Google’s AI Brain — on the subject of Google’s AI ethics committee … Q: Will you eventually release the names? A: Potentially. That’s something also to be discussed. Q: Transparency is important in this too. A: Sure, sure. Such reassuring.
AVA is now Open Source (Laura Bell) — Assessment, Visualization and Analysis of human organisational information security risk. AVA maps the realities of your organisation, its structures and behaviors. This map of people and interconnected entities can then be tested using a unique suite of customisable, on-demand, and scheduled information security awareness tests.
Deep Learning for Torch (Facebook) — Facebook AI Research open sources faster deep learning modules for Torch, a scientific computing framework with wide support for machine learning algorithms.
Putting the Nuclear Option Front and Centre (Tom Armitage) — offering what feels like the nuclear option front and centre, reminding the user that it isn’t a nuclear option. I love this. “Undo” changes your experience profoundly.
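Undo is what makes it safe to put a destructive action front and centre. Below is a minimal sketch of one common approach: a stack that records enough about each deletion to reverse it. ItemList is a hypothetical class. Nothing here reflects how any particular app implements undo; a server-backed app might use soft deletes instead.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ItemList:
    """Hypothetical document whose deletions can be undone."""
    items: List[str] = field(default_factory=list)
    # Each entry stores (index, item) so a delete can be reversed exactly.
    _undo_stack: List[Tuple[int, str]] = field(default_factory=list)

    def delete(self, index: int) -> None:
        item = self.items.pop(index)
        self._undo_stack.append((index, item))

    def undo(self) -> bool:
        """Reverse the most recent delete; return False if there is nothing to undo."""
        if not self._undo_stack:
            return False
        index, item = self._undo_stack.pop()
        self.items.insert(index, item)
        return True


doc = ItemList(["draft", "final", "notes"])
doc.delete(1)
doc.delete(0)
print(doc.items)
doc.undo()
doc.undo()
print(doc.items)
```

Undoing in last-in, first-out order restores each item to the index it was removed from, even after several deletions.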
3D-Printing Carbon Fibre (Makezine) — the machine doesn’t produce angular, stealth fighter-esque pieces with the telltale CF pattern seen on racing bikes and souped-up Mustangs. Instead, it creates an FDM 3D print out of nylon filament (rather than ABS or PLA), and during the process it layers in a thin strip of carbon fiber, melted into place from carbon fiber fabric using a second extruder head. (It can also add in Kevlar or fiberglass.)
The Devops Identity Crisis (Baron Schwartz) — I saw one framework-retailing bozo saying that devops was the art of ensuring there were no flaws in software. I didn’t know whether to cry or keep firing until the gun clicked.
Apache Giraph — an iterative graph processing system built for high scalability. For example, it is currently used at Facebook to analyze the social graph formed by users and their connections.
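Giraph is an open-source implementation of Google's Pregel model. Work proceeds in supersteps separated by a barrier. In each superstep, a vertex reads its incoming messages and may update its value and message its neighbours. The job ends when no messages remain. The single-process Python simulation below finds connected components by label propagation to show the shape of that model. It is not Giraph's Java API, and connected_components is a hypothetical helper.

```python
from typing import Dict, List


def connected_components(edges: Dict[int, List[int]]) -> Dict[int, int]:
    """Hypothetical Pregel-style label propagation, simulated in one process.

    Each vertex starts labelled with its own id and adopts the smallest label
    it hears about. Edges are assumed undirected (listed in both directions).
    """
    value = {v: v for v in edges}
    # Superstep 0: every vertex sends its label to its neighbours.
    inbox: Dict[int, List[int]] = {v: [] for v in edges}
    for v, nbrs in edges.items():
        for n in nbrs:
            inbox[n].append(value[v])
    # Keep running supersteps while any message is in flight.
    while any(inbox.values()):
        outbox: Dict[int, List[int]] = {v: [] for v in edges}
        for v, msgs in inbox.items():
            if msgs and min(msgs) < value[v]:
                # Label improved: update, then notify neighbours next superstep.
                value[v] = min(msgs)
                for n in edges[v]:
                    outbox[n].append(value[v])
            # Vertices with nothing to improve stay quiet (Pregel's "vote to halt").
        # Barrier: messages sent in this superstep are delivered in the next one.
        inbox = outbox
    return value


graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
print(connected_components(graph))
```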
Apache Flink — a data processing system and an alternative to Hadoop’s MapReduce component. It comes with its own runtime, rather than building on top of MapReduce. As such, it can work completely independently of the Hadoop ecosystem. However, Flink can also access Hadoop’s distributed file system (HDFS) to read and write data, and Hadoop’s next-generation resource manager (YARN) to provision cluster resources. Since most Flink users are using Hadoop HDFS to store their data, we already ship the required libraries to access HDFS.
Internet of Things: Blackett Review — the British Government’s review of Internet of Things opportunities around government. Government and others can use expert commissioning to encourage participants in demonstrator programmes to develop standards that facilitate interoperable and secure systems. Government as a large purchaser of IoT systems is going to have a big impact if it buys wisely. (via Matt Webb)
rdbms-subsetter — open source tool to generate a random sample of rows from a relational database that preserves referential integrity – so long as constraints are defined, all parent rows will exist for child rows. (via 18F)
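The core idea can be sketched with Python's built-in sqlite3 module. Sample child rows at random, then copy in every parent row they reference, so that each foreign key still resolves. The customers/orders schema and the subset function are hypothetical. rdbms-subsetter itself relies on the constraints defined in the database and handles many tables, which this sketch does not attempt.

```python
import random
import sqlite3

SCHEMA = [
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(id), total REAL)",
]


def subset(src: sqlite3.Connection, fraction: float, seed: int = 0) -> sqlite3.Connection:
    """Hypothetical helper: copy a random sample of orders plus every customer they reference."""
    rng = random.Random(seed)
    dst = sqlite3.connect(":memory:")
    dst.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce the REFERENCES clause
    for ddl in SCHEMA:
        dst.execute(ddl)
    orders = src.execute("SELECT id, customer_id, total FROM orders").fetchall()
    sample = [row for row in orders if rng.random() < fraction]
    # Parents first: every customer referenced by a sampled order must exist.
    for cid in sorted({row[1] for row in sample}):
        parent = src.execute("SELECT id, name FROM customers WHERE id = ?", (cid,)).fetchone()
        dst.execute("INSERT INTO customers VALUES (?, ?)", parent)
    dst.executemany("INSERT INTO orders VALUES (?, ?, ?)", sample)
    dst.commit()
    return dst


# Hypothetical source data: 20 customers, 200 orders.
src = sqlite3.connect(":memory:")
for ddl in SCHEMA:
    src.execute(ddl)
src.executemany("INSERT INTO customers VALUES (?, ?)", [(i, f"customer{i}") for i in range(1, 21)])
src.executemany("INSERT INTO orders VALUES (?, ?, ?)", [(i, (i % 20) + 1, i * 1.5) for i in range(1, 201)])

small = subset(src, fraction=0.1)
orphans = small.execute(
    "SELECT COUNT(*) FROM orders o LEFT JOIN customers c ON o.customer_id = c.id "
    "WHERE c.id IS NULL"
).fetchone()[0]
print(orphans)
```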
UXcheck — a browser extension to help you do a quick UX check against Nielsen’s 10 principles.
Building the Workplace We Want (Slack) — culture is the manifestation of what your company values. What you reward, who you hire, how work is done, how decisions are made — all of these things are representations of the things you value and the culture you’ve wittingly or unwittingly created. Nice (in the sense of small, elegant) explanation of what they value at Slack.
The Internet of Things Has Four Big Data Problems (Alistair Croll) — What the IoT needs is data. Big data and the IoT are two sides of the same coin. The IoT collects data from myriad sensors; that data is classified, organized, and used to make automated decisions; and the IoT, in turn, acts on it. It’s precisely this ever-accelerating feedback loop that makes the coin as a whole so compelling. Nowhere are the IoT’s data problems more obvious than with that darling of the connected tomorrow known as the wearable. Yet, few people seem to want to discuss these problems.
Keysweeper — a stealthy Arduino-based device, camouflaged as a functioning USB wall charger, that wirelessly and passively sniffs, decrypts, logs, and reports back (over GSM) all keystrokes from any Microsoft wireless keyboard in the vicinity. Designs and demo videos included.
The Toxoplasma of Rage — It’s in activists’ interests to destroy their own causes by focusing on the most controversial cases and principles, the ones that muddy the waters and make people oppose them out of spite. And it’s in the media’s interest to help them and egg them on.
Samza: LinkedIn’s Stream-Processing Engine — Samza’s goal is to provide a lightweight framework for continuous data processing. Unlike batch processing systems such as Hadoop, which typically have high-latency responses (sometimes hours), Samza continuously computes results as data arrives, which makes sub-second response times possible.
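A few lines of Python can show the batch-versus-stream contrast. A stream processor keeps state and emits an updated result as each message arrives. A batch job answers once, after it has read all its input. This sketch illustrates the idea only. It is not Samza's API, and the (user, page) click events are invented for the example.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple


def running_counts(events: Iterable[Tuple[str, str]]) -> Iterator[Dict[str, int]]:
    """Stream style: emit updated per-page view counts after every event."""
    counts: Counter = Counter()
    for _user, page in events:
        counts[page] += 1  # state is updated incrementally, one message at a time
        yield dict(counts)


def batch_counts(events: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Batch style: a single answer, available only after all input is read."""
    return dict(Counter(page for _user, page in events))


# Hypothetical click events.
clicks = [("ann", "/home"), ("bob", "/about"), ("ann", "/home")]
for snapshot in running_counts(clicks):
    print(snapshot)
print(batch_counts(clicks))
```

The final snapshot matches the batch answer. The difference is that the intermediate results were available all along.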