Neglected Machine Learning Ideas — Perhaps my list is a “send me review articles and book suggestions” cry for help, but perhaps it is useful to others as an overview of neat things.
First Crowdfunded Book on Booker Shortlist — Booker excludes self-published works, but “The Wake” was published through Unbound, a Threadless-style “if we hit this limit, the book is printed and you have bought a copy” site.
Watson Can Debate Its Opponents (io9) — Speaking in nearly perfect English, Watson/The Debater replied: “Scanned approximately 4 million Wikipedia articles, returning ten most relevant articles. Scanned all 3,000 sentences in top ten articles. Detected sentences which contain candidate claims. Identified borders of candidate claims. Assessed pro and con polarity of candidate claims. Constructed demo speech with top claim predictions. Ready to deliver.”
ipfs — a global, versioned, peer-to-peer file system. It combines good ideas from Git, BitTorrent, Kademlia, and SFS. You can think of it like a single BitTorrent swarm, exchanging Git objects, making up the web. IPFS provides an interface much simpler than HTTP, but has permanence built in. (via Sourcegraph)
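The “Git objects” comparison comes down to content addressing: a block is named by a hash of its bytes, so identical content always gets the same address and any peer can check what it received. Below is a minimal sketch of that one idea, not of IPFS itself. The `ContentStore` class and its methods are hypothetical names for illustration, and plain SHA-256 hex digests stand in for whatever addressing scheme IPFS actually uses.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, not the IPFS API)."""

    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content, so storing the same
        # bytes twice yields the same key and only one stored copy.
        key = hashlib.sha256(data).hexdigest()
        self._blocks[key] = data
        return key

    def get(self, key: str) -> bytes:
        data = self._blocks[key]
        # Re-hash on read: a block that no longer matches its address
        # has been corrupted or tampered with.
        if hashlib.sha256(data).hexdigest() != key:
            raise ValueError("block does not match its address")
        return data


store = ContentStore()
address = store.put(b"hello, web")
print(address == store.put(b"hello, web"))  # same bytes, same address
print(store.get(address))
```

Because the address never changes for a given piece of content, a link to it cannot silently start pointing at something else, which is the sense of “permanence” in the blurb.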
word2vec — This tool provides an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representations of words. These representations can be subsequently used in many natural language processing applications and for further research. From Google Research paper Efficient Estimation of Word Representations in Vector Space.
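To make the two architectures concrete, here is a rough sketch of how training examples are framed for each. It is not the word2vec tool's code: the function names and the fixed `window` parameter are assumptions for illustration, and the actual training step (the network and its optimisation) is left out entirely.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram framing: each center word predicts each nearby context word."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW framing: the surrounding context words jointly predict the center word."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        if context:
            examples.append((context, target))
    return examples


sentence = "the quick brown fox".split()
print(skipgram_pairs(sentence, window=1))
print(cbow_examples(sentence, window=1))
```

The two functions see the same windows; they differ only in which side is treated as the input and which as the prediction target.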
What Every Frontend Developer Should Know about Page Rendering — Rendering has to be optimized from the very beginning, when the page layout is being defined, as styles and scripts play a crucial role in page rendering. Professionals have to know certain tricks to avoid performance problems. This article does not study the inner browser mechanics in detail, but rather offers some common principles.
Cayley — an open-source graph database inspired by the graph database behind Freebase and Google’s Knowledge Graph.
Alice in Warningland (PDF) — We performed a field study with Google Chrome and Mozilla Firefox’s telemetry platforms, allowing us to collect data on 25,405,944 warning impressions. We find that browser security warnings can be successful: users clicked through fewer than a quarter of both browsers’ malware and phishing warnings and a third of Mozilla Firefox’s SSL warnings. We also find clickthrough rates as high as 70.2% for Google Chrome SSL warnings, indicating that the user experience of a warning can have tremendous impact on user behaviour.
Quick DT — open source (Java) decision tree learner.
Revealing Hidden Changes to Supreme Court Opinions — WHEREAS, It is now well-documented that the Supreme Court of the United States makes changes to its opinions after the opinion is published; and WHEREAS, Only “Four legal publishers are granted access to ‘change pages’ that show all revisions. Those documents are not made public, and the court refused to provide copies to The New York Times”; and WHEREAS, git makes it easy to identify when changes have been made; RESOLVED, I shall apply a cron job to at least identify when the actual PDF has changed so everyone can see which documents have changed.
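The detection step can be quite small. The project described above leans on git and a cron job; the sketch below swaps in a simpler technique, comparing SHA-256 digests of each fetched PDF against the digest recorded on the previous run. The function name, the JSON state file, and the document name are all hypothetical, and fetching the PDF and scheduling the run are left to the caller.

```python
import hashlib
import json
from pathlib import Path


def record_if_changed(name: str, content: bytes, state_file: Path) -> bool:
    """Record the digest of `content`; return True if it differs from the last run.

    A document seen for the first time is recorded but not reported as
    changed. `state_file` is a hypothetical JSON file mapping names to digests.
    """
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    digest = hashlib.sha256(content).hexdigest()
    changed = name in state and state[name] != digest
    state[name] = digest
    state_file.write_text(json.dumps(state, indent=2, sort_keys=True))
    return changed


if __name__ == "__main__":
    state = Path("opinion_digests.json")  # hypothetical location
    pdf_bytes = b"%PDF-1.4 placeholder bytes"  # stand-in for a downloaded opinion
    if record_if_changed("example-opinion.pdf", pdf_bytes, state):
        print("example-opinion.pdf changed since the last check")
```

A digest only says that something changed, not what; keeping each fetched copy under version control, as the resolution proposes, is what makes it possible to see when each revision appeared.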
Microsoft’s “Killer” Android Patents Revealed (Ars Technica) — Chinese Government required them disclosed as part of MSFT-Nokia merger. The patent lists are strategically significant, because Microsoft has managed to build a huge patent-licensing business by taxing Android phones without revealing what kind of legal leverage they really have over those phones.
HTTPie — a user-friendly command-line HTTP client.
Apple’s Secure Database for Users (Ian Waring) — excellent breakdown of how Apple have gone out of their way to make their cloud database product safe and robust. They may be slow to “the cloud” but they have decades of experience having users as customers instead of products.
101 Uses for Content Mining — between the list in the post and the comments from readers, it’s a good introduction to some of the value to be obtained from full-text structured and unstructured access to scientific research publications.
3D Printers Have a Lot to Learn from Sewing Machines — Sewing does not create more waste but, potentially, less, and the process of sewing is filled with opportunities for increasing one’s skills and doing it over as well as doing it yourself. What are quilts, after all, but a clever way to use every last scrap of precious fabric? (via Jenn Webb)