From Gongkai to Open Source (Bunnie Huang) — The West has a “broadcast” view of IP and ownership: good ideas and innovation are credited to a clearly specified set of authors or inventors, and society pays them a royalty for their initiative and good works. China has a “network” view of IP and ownership: the far-sight necessary to create good ideas and innovations is attained by standing on the shoulders of others, and as such there is a network of people who trade these ideas as favors among each other. In a system with such a loose attitude toward IP, sharing with the network is necessary as tomorrow it could be your friend standing on your shoulders, and you’ll be looking to them for favors. This is unlike the West, where rule of law enables IP to be amassed over a long period of time, creating impenetrable monopoly positions. It’s good for the guys on top, but tough for the upstarts.
Roaring Bitmaps — compressed bitmaps which tend to outperform conventional compressed bitmaps such as WAH, EWAH or Concise. In some instances, they can be hundreds of times faster and they often offer significantly better compression.
Two Eras of the Internet: From Pull to Push (Chris Dixon) — in which the consumer becomes the infinite sink for an unending and constant stream of updates, media, and social mobile local offers to swipe right on brands near you.
PaGMO — Parallel Global Multiobjective Optimizer […] a generalization of the island model paradigm working for global and local optimization algorithms. Its main parallelization approach makes use of multiple threads, but MPI is also implemented and can be mixed in with multithreading. PaGMO can be used to solve in a parallel fashion, global optimization tasks.
Avoiding the Tragedy of the Anticommons — Many people talk about “open source biology.” Mike Loukides pulls apart open source and biology to see what the relationship might be. I’m still chewing on what devops for bio would be. Modern software systems throw off gigabytes of data, and we have built tools to monitor those systems, archive their data, and automate much of the analysis. There are free and commercial packages for logging and monitoring, and it continues to be a very active area of software development, as anyone who’s attended O’Reilly’s Velocity conference knows.
peppytides (Makezine) — 3d-printed super accurate, scaled 3D-model of a polypeptide chain that can be folded into all the basic protein structures, like α-helices, β-sheets, and β-turns. (via Lenore Edman)
London Data Store — dashboard and open data catalogue for City of London’s data release efforts.
Angular JS Style Guide — I love style guides, to the point of having posted (I think) three for Angular. Reading other people’s style guides is like listening to them make-up after arguments: you learn what’s important to them, and what they regret.
Consensus Filters — filtering out misreads and other errors to allow all agents, or robots, in the network to arrive at the same value asymptotically by only communicating with their neighbours.
Why Banks are BASE not ACID — Consistency it turns out is not the Holy Grail. What trumps consistency is: Auditing, Risk Management, Availability.
Dynamics of Correlated Novelties (Nature) — paper on “the adjacent possible”. Here we propose a simple mathematical model that mimics the process of exploring a physical, biological, or conceptual space that enlarges whenever a novelty occurs. The model, a generalization of Polya’s urn, predicts statistical laws for the rate at which novelties happen (Heaps’ law) and for the probability distribution on the space explored (Zipf’s law), as well as signatures of the process by which one novelty sets the stage for another. (via Steven Strogatz)
Mining of Massive Datasets (PDF) — book by Stanford profs, focuses on data mining of very large amounts of data, that is, data so large it does not fit in main memory. Because of the emphasis on size, many of our examples are about the Web or data derived from the Web. Further, the book takes an algorithmic point of view: data mining is about applying algorithms to data, rather than using data to “train” a machine-learning engine of some sort.
Lessons from Iceland’s Failed Crowdsourced Constitution (Slate) — Though the crowdsourcing moment could have led to a virtuous deliberative feedback loop between the crowd and the Constitutional Council, the latter did not seem to have the time, tools, or training necessary to process carefully the crowd’s input, explain its use of it, let alone return consistent feedback on it to the public.
Thread a ZigBee Killer? — Thread is Nest’s home automation networking stack, which can use the same hardware components as ZigBee, but which is not compatible, also not open source. The Novell NetWare of Things. Nick Hunn makes argument that Google (via Nest) are taking aim at ZigBee: it’s Google and Nest saying “ZigBee doesn’t work”.
word2vec — This tool provides an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representations of words. These representations can be subsequently used in many natural language processing applications and for further research. From Google Research paper Efficient Estimation of Word Representations in Vector Space.
What Every Frontend Developer Should Know about Page Rendering — Rendering has to be optimized from the very beginning, when the page layout is being defined, as styles and scripts play the crucial role in page rendering. Professionals have to know certain tricks to avoid performance problems. This arcticle does not study the inner browser mechanics in detail, but rather offers some common principles.
Cayley — an open-source graph inspired by the graph database behind Freebase and Google’s Knowledge Graph.
Alice in Warningland (PDF) — We performed a field study with Google Chrome and Mozilla Firefox’s telemetry platforms, allowing us to collect data on 25,405,944 warning impressions. We find that browser security warnings can be successful: users clicked through fewer than a quarter of both browser’s malware and phishing warnings and third of Mozilla Firefox’s SSL warnings. We also find clickthrough rates as high as 70.2% for Google Chrome SSL warnings, indicating that the user experience of a warning can have tremendous impact on user behaviour.
Mapping Twitter Topic Networks (Pew Internet) — Conversations on Twitter create networks with identifiable contours as people reply to and mention one another in their tweets. These conversational structures differ, depending on the subject and the people driving the conversation. Six structures are regularly observed: divided, unified, fragmented, clustered, and inward and outward hub and spoke structures. These are created as individuals choose whom to reply to or mention in their Twitter messages and the structures tell a story about the nature of the conversation. (via Washington Post)
yasp — a fully functional web-based assembler development environment, including a real assembler, emulator and debugger. The assembler dialect is a custom which is held very simple so as to keep the learning curve as shallow as possible.
Fast Approximation of Betweenness Centrality through Sampling (PDF) — Betweenness centrality is a fundamental measure in social network analysis, expressing the importance or influence of individual vertices in a network in terms of the fraction of shortest paths that pass through them. Exact computation in large networks is prohibitively expensive and fast approximation algorithms are required in these cases. We present two efficient randomized algorithms for betweenness estimation.