Liquibase — source control for your database. Apache 2.0 licensed.
A Few Useful Things to Know About Machine Learning (PDF) — This article summarizes twelve key lessons that machine learning researchers and practitioners have learned. These include pitfalls to avoid, important issues to focus on, and answers to common questions. My fave: First-timers are often surprised by how little time in a machine learning project is spent actually doing machine learning. But it makes sense if you consider how time-consuming it is to gather data, integrate it, clean it and pre-process it, and how much trial and error can go into feature design.
Everything You Wanted to Know About Synchronization But Were Too Afraid to Ask (PDF) — This paper presents the most exhaustive study of synchronization to date. We span multiple layers, from hardware cache-coherence protocols up to high-level concurrent software. We do so on different types of architectures, from single-socket — uniform and non- uniform — to multi-socket — directory and broadcast-based many-cores. We draw a set of observations that, roughly speaking, imply that scalability of synchronization is mainly a property of the hardware.
Inside YouTube’s Fame Factory (FastCompany) — great article about the tipping point where peer-to-peer fame becomes stage-managed corporate fame, as Vidcon grows. See also Variety: If YouTube stars are swallowed by Hollywood, they are in danger of becoming less authentic versions of themselves, and teenagers will be able to pick up on that,” Sehdev says. “That could take away the one thing that makes YouTube stars so appealing.”
Material Design in the Google I/O App (Medium) — steps through design thinking as they put Google’s new design metaphor in place. I’ve been chewing on material design. It brings an internal consistency and logic to the Android world that Apple’s iOS and OS X visual worlds have been losing over the years. How long until web users expect this consistency too?
Stewart and Slack (Wired) — profile of Foo Stewart Butterfield and his shiny Slack startup.
Mesa: Geo-Replicated, Near Real-Time, Scalable Data Warehousing (PDF) — paper by Googlers on the database holding G’s ad data. Trillions of rows, petabytes of data, point queries with 99th percentile latency in the hundreds of milliseconds and overall query throughput of trillions of rows fetched per day, continuous updates on the order of millions of rows updated per second, strong consistency and repeatable query results even if a query involves multiple datacenters, and no SPOF. (via Greg Linden)
Thumbstopping (Salon) — The prime goal of a Facebook ad campaign is to create an ad “so compelling that it would get people to stop scrolling through their news feeds,” reports the Times. This is known, in Facebook land, as a “thumbstopper.” And thus, the great promise of the digitial revolution is realized: The best minds of our generation are obsessed with manipulating the movement of your thumb on a smartphone touch-screen.
Microsoft’s Development Practices (Ars Technica) — they get the devops religion but call it “combined engineering”. They get the idea of shared code bases, but call it “open source”. At least when they got the agile religion, they called it that. Check out the horror story of where they started: a two-year development process in which only about four months would be spent writing new code. Twice as long would be spent fixing that code. MSFT’s waterfall was the equivalent of American football, where there’s 11 minutes of actual play in the average 3h 12m game.
Discussion Graph Tool (Microsoft Research) — simplifies social media analysis by making it easy to extract high-level features and co-occurrence relationships from raw data.
Superlinear Productivity in Collective Group Actions (PLoS ONE) — study of open source projects shows small groups exhibit non-linear productivity increases by size, which drop off at larger sizes. we document a size effect in the strength and variability of the superlinear effect, with smaller groups exhibiting widely distributed superlinear exponents, some of them characterizing highly productive teams. In contrast, large groups tend to have a smaller superlinearity and less variability.
coop — cheat sheet of the most common concurrency program flows in Go.
Tessera — set of open source tools around Hadoop, R, and visualization.
Mining of Massive Datasets (PDF) — book by Stanford profs, focuses on data mining of very large amounts of data, that is, data so large it does not fit in main memory. Because of the emphasis on size, many of our examples are about the Web or data derived from the Web. Further, the book takes an algorithmic point of view: data mining is about applying algorithms to data, rather than using data to “train” a machine-learning engine of some sort.
Lessons from Iceland’s Failed Crowdsourced Constitution (Slate) — Though the crowdsourcing moment could have led to a virtuous deliberative feedback loop between the crowd and the Constitutional Council, the latter did not seem to have the time, tools, or training necessary to process carefully the crowd’s input, explain its use of it, let alone return consistent feedback on it to the public.
Thread a ZigBee Killer? — Thread is Nest’s home automation networking stack, which can use the same hardware components as ZigBee, but which is not compatible, also not open source. The Novell NetWare of Things. Nick Hunn makes argument that Google (via Nest) are taking aim at ZigBee: it’s Google and Nest saying “ZigBee doesn’t work”.
HP’s IoT Security Research (PDF) — 70% of devices use unencrypted network services, 90% of devices collected at least one piece of personal information, 60% of those that have UIs are vulnerable to things like XSS, 60% didn’t use encryption when downloading software updates, …
USB Security Flawed From Foundation (Wired) — The element of Nohl and Lell’s research that elevates it above the average theoretical threat is the notion that the infection can travel both from computer to USB and vice versa. Any time a USB stick is plugged into a computer, its firmware could be reprogrammed by malware on that PC, with no easy way for the USB device’s owner to detect it. And likewise, any USB device could silently infect a user’s computer. “It goes both ways,” Nohl says. “Nobody can trust anybody.” [...] “In this new way of thinking, you can’t trust a USB just because its storage doesn’t contain a virus. Trust must come from the fact that no one malicious has ever touched it,” says Nohl. “You have to consider a USB infected and throw it away as soon as it touches a non-trusted computer. And that’s incompatible with how we use USB devices right now.”
AdBlock vs AdBlock Plus — short answer: the genuinely open source AdBlock Plus, because AdBlock resiled from being open source, phones home, has misleading changelog entries, …. No longer trustworthy.
streisand — sets up a new server running L2TP/IPsec, OpenSSH, OpenVPN, Shadowsocks, Stunnel, and a Tor bridge. It also generates custom configuration instructions for all of these services. At the end of the run you are given an HTML file with instructions that can be shared with friends, family members, and fellow activists.