PaGMO — Parallel Global Multiobjective Optimizer […] a generalization of the island model paradigm working for global and local optimization algorithms. Its main parallelization approach makes use of multiple threads, but MPI is also implemented and can be mixed in with multithreading. PaGMO can be used to solve in a parallel fashion, global optimization tasks.
Avoiding the Tragedy of the Anticommons — Many people talk about “open source biology.” Mike Loukides pulls apart open source and biology to see what the relationship might be. I’m still chewing on what devops for bio would be. Modern software systems throw off gigabytes of data, and we have built tools to monitor those systems, archive their data, and automate much of the analysis. There are free and commercial packages for logging and monitoring, and it continues to be a very active area of software development, as anyone who’s attended O’Reilly’s Velocity conference knows.
peppytides (Makezine) — 3d-printed super accurate, scaled 3D-model of a polypeptide chain that can be folded into all the basic protein structures, like α-helices, β-sheets, and β-turns. (via Lenore Edman)
London Data Store — dashboard and open data catalogue for City of London’s data release efforts.
Angular JS Style Guide — I love style guides, to the point of having posted (I think) three for Angular. Reading other people’s style guides is like listening to them make-up after arguments: you learn what’s important to them, and what they regret.
Consensus Filters — filtering out misreads and other errors to allow all agents, or robots, in the network to arrive at the same value asymptotically by only communicating with their neighbours.
Why Banks are BASE not ACID — Consistency it turns out is not the Holy Grail. What trumps consistency is: Auditing, Risk Management, Availability.
Dynamics of Correlated Novelties (Nature) — paper on “the adjacent possible”. Here we propose a simple mathematical model that mimics the process of exploring a physical, biological, or conceptual space that enlarges whenever a novelty occurs. The model, a generalization of Polya’s urn, predicts statistical laws for the rate at which novelties happen (Heaps’ law) and for the probability distribution on the space explored (Zipf’s law), as well as signatures of the process by which one novelty sets the stage for another. (via Steven Strogatz)
Mining of Massive Datasets (PDF) — book by Stanford profs, focuses on data mining of very large amounts of data, that is, data so large it does not fit in main memory. Because of the emphasis on size, many of our examples are about the Web or data derived from the Web. Further, the book takes an algorithmic point of view: data mining is about applying algorithms to data, rather than using data to “train” a machine-learning engine of some sort.
Lessons from Iceland’s Failed Crowdsourced Constitution (Slate) — Though the crowdsourcing moment could have led to a virtuous deliberative feedback loop between the crowd and the Constitutional Council, the latter did not seem to have the time, tools, or training necessary to process carefully the crowd’s input, explain its use of it, let alone return consistent feedback on it to the public.
Thread a ZigBee Killer? — Thread is Nest’s home automation networking stack, which can use the same hardware components as ZigBee, but which is not compatible, also not open source. The Novell NetWare of Things. Nick Hunn makes argument that Google (via Nest) are taking aim at ZigBee: it’s Google and Nest saying “ZigBee doesn’t work”.
word2vec — This tool provides an efficient implementation of the continuous bag-of-words and skip-gram architectures for computing vector representations of words. These representations can be subsequently used in many natural language processing applications and for further research. From Google Research paper Efficient Estimation of Word Representations in Vector Space.
What Every Frontend Developer Should Know about Page Rendering — Rendering has to be optimized from the very beginning, when the page layout is being defined, as styles and scripts play the crucial role in page rendering. Professionals have to know certain tricks to avoid performance problems. This arcticle does not study the inner browser mechanics in detail, but rather offers some common principles.
Cayley — an open-source graph inspired by the graph database behind Freebase and Google’s Knowledge Graph.
Alice in Warningland (PDF) — We performed a field study with Google Chrome and Mozilla Firefox’s telemetry platforms, allowing us to collect data on 25,405,944 warning impressions. We find that browser security warnings can be successful: users clicked through fewer than a quarter of both browser’s malware and phishing warnings and third of Mozilla Firefox’s SSL warnings. We also find clickthrough rates as high as 70.2% for Google Chrome SSL warnings, indicating that the user experience of a warning can have tremendous impact on user behaviour.
Mapping Twitter Topic Networks (Pew Internet) — Conversations on Twitter create networks with identifiable contours as people reply to and mention one another in their tweets. These conversational structures differ, depending on the subject and the people driving the conversation. Six structures are regularly observed: divided, unified, fragmented, clustered, and inward and outward hub and spoke structures. These are created as individuals choose whom to reply to or mention in their Twitter messages and the structures tell a story about the nature of the conversation. (via Washington Post)
yasp — a fully functional web-based assembler development environment, including a real assembler, emulator and debugger. The assembler dialect is a custom which is held very simple so as to keep the learning curve as shallow as possible.
Fast Approximation of Betweenness Centrality through Sampling (PDF) — Betweenness centrality is a fundamental measure in social network analysis, expressing the importance or influence of individual vertices in a network in terms of the fraction of shortest paths that pass through them. Exact computation in large networks is prohibitively expensive and fast approximation algorithms are required in these cases. We present two efficient randomized algorithms for betweenness estimation.
Druid — open source clustered data store (not key-value store) for real-time exploratory analytics on large datasets.
It’s Time to Engineer Some Filter Failure (Jon Udell) — Our filters have become so successful that we fail to notice: We don’t control them, They have agendas, and They distort our connections to people and ideas. That idea that algorithms have agendas is worth emphasising. Reality doesn’t have an agenda, but the deployer of a similarity metric has decided what features to look for, what metric they’re optimising, and what to do with the similarity data. These are all choices with an agenda.
Capstone — open source multi-architecture disassembly engine.
The Future of Employment (PDF) — We note that this prediction implies a truncation in the current trend towards labour market polarization, with growing employment in high and low-wage occupations, accompanied by a hollowing-out of middle-income jobs. Rather than reducing the demand for middle-income occupations, which has been the pattern over the past decades, our model predicts that computerisation will mainly substitute for low-skill and low-wage jobs in the near future. By contrast, high-skill and high-wage occupations are the least susceptible to computer capital. (via The Atlantic)
Hackers Gain ‘Full Control’ of Critical SCADA Systems (IT News) — The vulnerabilities were discovered by Russian researchers who over the last year probed popular and high-end ICS and supervisory control and data acquisition (SCADA) systems used to control everything from home solar panel installations to critical national infrastructure. More on the Botnet of Things.
mcl — Markov Cluster Algorithm, a fast and scalable unsupervised cluster algorithm for graphs (also known as networks) based on simulation of (stochastic) flow in graphs.
Facebook to Launch Flipboard-like Reader (Recode) — what I’d actually like to see is Facebook join the open web by producing and consuming RSS/Atom/anything feeds, but that’s a long shot. I fear it’ll either limit you to whatever circle-jerk-of-prosperity paywall-penetrating content-for-advertising-eyeballs trades the Facebook execs have made, or else it’ll be a leech on the scrotum of the open web by consuming RSS without producing it. I’m all out of respect for empire-builders who think you’re a fool if you value the open web. AOL might have died, but its vision of content kings running the network is alive and well in the hands of Facebook and Google. I’ll gladly post about the actual product launch if it is neither partnership eyeball-abuse nor parasitism.
Nanocubes — a fast datastructure for in-memory data cubes developed at the Information Visualization department at AT&T Labs – Research. Nanocubes can be used to explore datasets with billions of elements at interactive rates in a web browser, and in some cases it uses sufficiently little memory that you can run a nanocube in a modern-day laptop. (via Ben Lorica)