- word2vec — This tool provides an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representations of words. These representations can be subsequently used in many natural language processing applications and for further research. From Google Research paper Efficient Estimation of Word Representations in Vector Space.
- What Every Frontend Developer Should Know about Page Rendering — Rendering has to be optimized from the very beginning, when the page layout is being defined, as styles and scripts play the crucial role in page rendering. Professionals have to know certain tricks to avoid performance problems. This arcticle does not study the inner browser mechanics in detail, but rather offers some common principles.
- Cayley — an open-source graph inspired by the graph database behind Freebase and Google’s Knowledge Graph.
- Alice in Warningland (PDF) — We performed a field study with Google Chrome and Mozilla Firefox’s telemetry platforms, allowing us to collect data on 25,405,944 warning impressions. We find that browser security warnings can be successful: users clicked through fewer than a quarter of both browser’s malware and phishing warnings and third of Mozilla Firefox’s SSL warnings. We also find clickthrough rates as high as 70.2% for Google Chrome SSL warnings, indicating that the user experience of a warning can have tremendous impact on user behaviour.
ENTRIES TAGGED "open source"
Efficient Representation, Page Rendering, Graph Database, Warning Effectiveness
Google MillWheel, 20yo Bug, Fast Real-Time Visualizations, and Google's Speed King
- MillWheel: Fault-Tolerant Stream Processing at Internet Scale — Google Research paper on the tech underlying the new cloud DataFlow tool. Watch the video. Yow.
- The Integer Overflow Bug That Went to Mars — long-standing (20 year old!) bug in a compression library prompts a wave of new releases. No word yet on whether NASA will upgrade the rover to avoid being pwned by Martian script kiddies. (update: I fell for a self-promoter. The Martians will need to find another attack vector. Huzzah!)
- epoch (github) — Fastly-produced open source general purpose real-time charting library for building beautiful, smooth, and high performance visualizations.
- Achieving Rapid Response Times in Large Online Services (YouTube) — Jeff Dean‘s keynote at Velocity. He wrote … a lot of things for this. And now he’s into deep learning ….
Interactive Narrative, Robot Economies, 8-Bit Philosophy, and Citizens and Sensors
- odyssey.js — storytelling tool to assemble interactive stories from narrative, pictures, maps.
- Our Work Here is Done: Visions of a Robot Economy (NESTA) — free downloadable ebook containing pieces from a variety of authors covering economics, engineering, history, philosophy and innovation studies. (via Robot Economics)
- 8-Bit Philosophy — learn philosophy with an 8-bit aesthetic. (via EdSurge)
- Sensors and Citizens: Finding Balance in the New Urban Reality (Frog Design) — as the sensor systems themselves become capable of autonomous data collection and information creation, we will begin to encounter closed-loop spatial sensing networks capable not only of taking instructions, but also of taking action. When that happens, nearly every industry and government imaginable—and your daily life—will be deeply affected. This is exciting, scary, and inevitable.
More visible at Health Privacy Summit than Health Datapalooza.
On the first morning of the biggest conference on data in health care–the Health Datapalooza in Washington, DC–newspapers reported a bill allowing the Department of Veterans Affairs to outsource more of its care, sending veterans to private health care providers to relieve its burdensome shortage of doctors.
There has been extensive talk about the scandals at the VA and remedies for them, including the political and financial ramifications of partial privatization. Republicans have suggested it for some time, but for the solution to be picked up by socialist Independent Senator Bernie Sanders clinches the matter. What no one has pointed out yet, however–and what makes this development relevant to the Datapalooza–is that such a reform will make the free flow of patient information between providers more crucial than ever.
Decision Trees, Decision Modifications, Mobile Patents, Web Client
- Quick DT — open source (Java) decision tree learner.
- Revealing Hidden Changes to Supreme Court Opinions — WHEREAS, It is now well-documented that the Supreme Court of the United States makes changes to its opinions after the opinion is published; and WHEREAS, Only “Four legal publishers are granted access to “change pages” that show all revisions. Those documents are not made public, and the court refused to provide copies to The New York Times”; and WHEREAS, git makes it easy to identify when changes have been made; RESOLVED, I shall apply a cron job to at least identify when the actual PDF has changed so everyone can see which documents have changed.
- Microsoft’s “Killer” Android Patents Revealed (Ars Technica) — Chinese Government required them disclosed as part of MSFT-Nokia merger. The patent lists are strategically significant, because Microsoft has managed to build a huge patent-licensing business by taxing Android phones without revealing what kind of legal leverage they really have over those phones.
- HTTPie — a command line HTTP client, a user-friendly HTTP client.
Right to Mine, Summarising Microblogs, C Sucks for Stats, and Scanning Logfiles
- UK Copyright Law Permits Researchers to Data Mine — changes mean Copyright holders can require researchers to pay to access their content but cannot then restrict text or data mining for non-commercial purposes thereafter, under the new rules. However, researchers that use the text or data they have mined for anything other than a non-commercial purpose will be said to have infringed copyright, unless the activity has the consent of rights holders. In addition, the sale of the text or data mined by researchers is prohibited. The derivative works will be very interesting: if university mines the journals, finds new possibility for a Thing, is verified experimentally, is that Thing the university’s to license commercially for profit?
- Efficient Online Summary of Microblogging Streams (PDF) — research paper. The algorithm we propose uses a word graph, along with optimization techniques such as decaying windows and pruning. It outperforms the baseline in terms of summary quality, as well as time and memory efficiency.
- Statistical Shortcomings in Standard Math Libraries — or “Why C Derivatives Are Not Popular With Statistical Scientists”. The following mathematical functions are necessary for implementing any rudimentary statistics application; and yet they are general enough to have many applications beyond statistics. I hereby propose adding them to the standard C math library and to the libraries which inherit from it. For purposes of future discussion, I will refer to these functions as the Elusive Eight.
- fail2ban — open source tool that scans logfiles for signs of malice, and triggers actions (e.g., iptables updates).
Trusting Code, Deep Pi, Docker DevOps, and Secure Database
- Trusting Browser Code (Tim Bray) — on the fundamental weakness of the ‘net as manifest in the browser.
- Deep Learning in the Raspberry Pi (Pete Warden) — $30 now gets you a computer you can run deep learning algorithms on. Awesome.
- Announcing Docker Hub and Official Repositories — as Docker went 1.0 and people rave about how they use it, comes this. They’re thinking hard about “integrating into the build ship run loop”, which aligns well with DevOps-enabling tool use.
- Apple’s Secure Database for Users (Ian Waring) — excellent breakdown of how Apple have gone out of their way to make their cloud database product safe and robust. They may be slow to “the cloud” but they have decades of experience having users as customers instead of products.
Open Autopilot, Record Robot Sales, NSA Myths Busted, and Informative Errors
- beaglepilot (Github) — open source open hardware autopilot for Beagleboard. (via DIY Drones)
- IFR Robot Sales Charts (PDF) — 2013: all-time high of 179,000 industrial robots sold and growth continues in 2014. (via Robohub)
- The Top 5 Claims That Defenders of the NSA Have to Stop Making to Remain Credible (EFF) — great Mythbusting.
- Netflix’s New Error Message — instead of “buffering”, they point the finger at the carrier between them and the customer who is to blame for slow performance. Genius!
Machine Learning Mistakes, Recommendation Bandits, Droplet Robots, and Plain English
- Machine Learning Done Wrong — [M]ost practitioners pick the modeling algorithm they are most familiar with rather than pick the one which best suits the data. In this post, I would like to share some common mistakes (the don’t-s).
- Bandits for Recommendations — A common problem for internet-based companies is: which piece of content should we display? Google has this problem (which ad to show), Facebook has this problem (which friend’s post to show), and RichRelevance has this problem (which product recommendation to show). Many of the promising solutions come from the study of the multi-armed bandit problem.
- Droplets — the Droplet is almost spherical, can self-right after being poured out of a bucket, and has the hardware capabilities to organize into complex shapes with its neighbors due to accurate range and bearing. Droplets are available open-source and use cheap vibration motors and a 3D printed shell. (via Robohub)
- Apple’s App Store Approval Guidelines — some of the plainest English I’ve seen, especially the Introduction. I can only aspire to that clarity. If your App looks like it was cobbled together in a few days, or you’re trying to get your first practice App into the store to impress your friends, please brace yourself for rejection. We have lots of serious developers who don’t want their quality Apps to be surrounded by amateur hour.