Nat has chaired the O'Reilly Open Source Convention and other O'Reilly conferences for over a decade. He ran the first web server in New Zealand, co-wrote the best-selling Perl Cookbook, and was one of the founding Radar bloggers. He lives in New Zealand and consults in the Asia-Pacific region.
UK Copyright Law Permits Researchers to Data Mine — changes mean Copyright holders can require researchers to pay to access their content but cannot then restrict text or data mining for non-commercial purposes thereafter, under the new rules. However, researchers that use the text or data they have mined for anything other than a non-commercial purpose will be said to have infringed copyright, unless the activity has the consent of rights holders. In addition, the sale of the text or data mined by researchers is prohibited. The derivative works will be very interesting: if a university mines the journals, finds a new possibility for a Thing, and that possibility is verified experimentally, is that Thing the university’s to license commercially for profit?
Efficient Online Summary of Microblogging Streams (PDF) — research paper. The algorithm we propose uses a word graph, along with optimization techniques such as decaying windows and pruning. It outperforms the baseline in terms of summary quality, as well as time and memory efficiency.
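To make "word graph, decaying windows and pruning" concrete, here is a minimal Python sketch of the general idea. It is not the paper's algorithm: the class name `DecayingWordGraph`, the `decay` and `prune_below` parameters, and the choice of bigram edges are all assumptions made for illustration.

```python
from collections import defaultdict


class DecayingWordGraph:
    """Hypothetical bigram graph whose edge weights fade as new posts arrive."""

    def __init__(self, decay=0.9, prune_below=0.05):
        self.decay = decay              # assumed multiplicative decay per post
        self.prune_below = prune_below  # assumed pruning threshold
        self.edges = defaultdict(float)  # (word, next_word) -> weight

    def add_post(self, text):
        # Age every existing edge, then drop the ones that have faded away.
        for edge in list(self.edges):
            self.edges[edge] *= self.decay
            if self.edges[edge] < self.prune_below:
                del self.edges[edge]
        # Strengthen the edges for each adjacent word pair in the new post.
        words = text.lower().split()
        for a, b in zip(words, words[1:]):
            self.edges[(a, b)] += 1.0

    def top_edges(self, n=3):
        # Heaviest edges first; ties broken alphabetically for stable output.
        return sorted(self.edges.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

Pruning keeps memory bounded on an endless stream: word pairs that stop appearing decay below the threshold and vanish, while pairs that keep recurring stay heavy and are natural candidates for a summary.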
Statistical Shortcomings in Standard Math Libraries — or “Why C Derivatives Are Not Popular With Statistical Scientists”. The following mathematical functions are necessary for implementing any rudimentary statistics application; and yet they are general enough to have many applications beyond statistics. I hereby propose adding them to the standard C math library and to the libraries which inherit from it. For purposes of future discussion, I will refer to these functions as the Elusive Eight.
fail2ban — open source tool that scans logfiles for signs of malice, and triggers actions (e.g., iptables updates).
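The scan-then-act loop can be sketched in a few lines of Python. This is an illustration of the idea only, not how fail2ban itself is implemented or configured: the `FAILED_LOGIN` pattern, the `offenders` and `ban_command` helpers, and the `max_retry` threshold are hypothetical names, and the sketch only builds the iptables command rather than running it.

```python
import re
from collections import Counter

# Assumed log format: OpenSSH-style "Failed password ... from <IPv4>" lines.
FAILED_LOGIN = re.compile(r"Failed password for .* from (\d+\.\d+\.\d+\.\d+)")


def offenders(log_lines, max_retry=3):
    """Return the IPs with at least max_retry failed logins, sorted."""
    failures = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            failures[match.group(1)] += 1
    return sorted(ip for ip, count in failures.items() if count >= max_retry)


def ban_command(ip):
    # Build (but do not run) an iptables rule that drops traffic from ip.
    return ["iptables", "-I", "INPUT", "-s", ip, "-j", "DROP"]
```

A real deployment also needs a time window for counting failures and an expiry for bans, which this sketch leaves out.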
Apple’s Secure Database for Users (Ian Waring) — excellent breakdown of how Apple have gone out of their way to make their cloud database product safe and robust. They may be slow to “the cloud” but they have decades of experience having users as customers instead of products.
Hidden Biases in Big Data — with every big data set, we need to ask which people are excluded. Which places are less visible? What happens if you live in the shadow of big data sets? (via Quinn Norton)
CoreObject — a version-controlled object database for Objective-C that supports powerful undo, semantic merging, and real-time collaborative editing.
Ethics and UX Design (Slideshare) — We are the thieves of time. This excellent talk challenges you (via Aristotle) to understand what a good life is, and whether you’re designing to bring it about. (via Keith Bolland)
Pepper Personal Robot — Japan’s lead in consumer-facing robotics is impressive. If this had been developed by an American company, it’d either have a Lua scripting interface or twin machine guns for autonomous death.
shrturl — spoof, edit, rewrite, and generally evil up webpages, hidden behind a URL shortening service.
Lessons for Building Magical Devices (First Round Review) — The most interesting devices I’ve seen take elements of the physical world and expose them to software.[...] If you buy a Tesla Model S today, the behavior of the car six months from now could be radically different because software can reshape the capability of the hardware continuously, exceeding the speed of customer demand.
End-to-End PGP in Gmail — Google releases an open source Chrome extension to enable end-to-end OpenPGP on top of Gmail. This is a good thing. As noted FSF developer Ben Franklin wrote: Those who would give up awkward key signing parties to purchase temporary convenience deserve neither.
Machine Learning Done Wrong — [M]ost practitioners pick the modeling algorithm they are most familiar with rather than pick the one which best suits the data. In this post, I would like to share some common mistakes (the don’t-s).
Bandits for Recommendations — A common problem for internet-based companies is: which piece of content should we display? Google has this problem (which ad to show), Facebook has this problem (which friend’s post to show), and RichRelevance has this problem (which product recommendation to show). Many of the promising solutions come from the study of the multi-armed bandit problem.
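For readers new to the multi-armed bandit problem, epsilon-greedy is one of the simplest strategies, and a minimal Python sketch shows its shape. It is not necessarily the approach the linked post uses; the `EpsilonGreedy` class and its parameters are illustrative assumptions, with each "arm" standing for one piece of content that could be shown.

```python
import random


class EpsilonGreedy:
    """Hypothetical epsilon-greedy bandit: mostly exploit, sometimes explore."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon          # probability of trying a random arm
        self.counts = [0] * n_arms      # times each arm has been shown
        self.values = [0.0] * n_arms    # running mean reward per arm
        self.rng = random.Random(seed)

    def select(self):
        # Explore with probability epsilon, otherwise pick the best arm so far.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))
        return max(range(len(self.values)), key=self.values.__getitem__)

    def update(self, arm, reward):
        # Incremental mean: no need to store the full reward history.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

In use, the caller would invoke `select()` to choose what to display, observe a reward such as 1.0 for a click and 0.0 otherwise, and feed it back with `update()`. The `epsilon` value trades off learning about untested content against showing what already performs well.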
Droplets — the Droplet is almost spherical, can self-right after being poured out of a bucket, and has the hardware capabilities to organize into complex shapes with its neighbors due to accurate range and bearing. Droplets are available open-source and use cheap vibration motors and a 3D printed shell. (via Robohub)
Apple’s App Store Approval Guidelines — some of the plainest English I’ve seen, especially the Introduction. I can only aspire to that clarity. If your App looks like it was cobbled together in a few days, or you’re trying to get your first practice App into the store to impress your friends, please brace yourself for rejection. We have lots of serious developers who don’t want their quality Apps to be surrounded by amateur hour.
Cockroach — a distributed key/value datastore which supports ACID transactional semantics and versioned values as first-class features. The primary design goal is global consistency and survivability, hence the name. Cockroach aims to tolerate disk, machine, rack, and even datacenter failures with minimal latency disruption and no manual intervention. Cockroach nodes are symmetric; a design goal is one binary with minimal configuration and no required auxiliary services.
Linux Foundation Providing for Core Infrastructure Projects — press release, but interested in how they’re tackling sustainability—they’re taking on identifying worthies (glad I’m not the one who says “you’re not worthy” to a project) and being the non-profit conduit for the dosh. Interesting: implies they think the reason companies weren’t supporting necessary open source projects was some combination of being unsure who to support (projects you use, surely?) and how to get them money (ask?). (Sustainability of open source projects is a pet interest of mine)
Beyond the Stack (Mike Loukides) — tools and processes to support software developers who are as massively distributed as the code they build.
Mary Meeker’s Internet Trends 2014 (PDF) — the changes on slide 34 are interesting: usage moving away from G+/Facebook-style omniblather creepware and towards phonebook-based chat apps.
Introduction to Software Engineering Ethics (PDF) — amazing set of provocative questions and scenarios for software engineers about the decisions they made and consequences of their actions. From a course in ethics from SCU.
Open Government Data Online: Impenetrable (Guardian) — Too much knowledge gets trapped in multi-page pdf files that are slow to download (especially in low-bandwidth areas), costly to print, and unavailable for computer analysis until someone manually or automatically extracts the raw data.