- ‘Living Figures’ Make Their Debut (Nature) — In July last year, neurobiologist Björn Brembs published a paper about how fruit flies walk. Nine months on, his paper looks different: another group has fed its data into the article, altering one of the figures. The update — to figure 4 — marks the debut of what the paper’s London-based publisher, Faculty of 1000 (F1000), is calling a living figure, a concept that it hopes will catch on in other articles. Brembs, at the University of Regensburg in Germany, says that three other groups have so far agreed to add their data, using software he wrote that automatically redraws the figure as new data come in.
- Strategies Against Architecture (Seb Chan and Aaron Straup Cope) — the story of the design of the Cooper Hewitt’s clever “pen,” which visitors to the design museum use to collect the info from their favourite exhibits. (Visit the Cooper Hewitt when you’re next in NYC; it’s magnificent.)
- Two Way Street — an independent explorer for The British Museum collection, letting you browse by year acquired, year created, type of object, etc. I note there are more things from a place called “Brak” than there are from the USA. Facets are awesome. (via Courtney Johnston)
- The Saddest Moment (PDF) — “How can you make a reliable computer service?” the presenter will ask in an innocent voice before continuing, “It may be difficult if you can’t trust anything and the entire concept of happiness is a lie designed by unseen overlords of endless deceptive power.” The presenter never explicitly says that last part, but everybody understands what’s happening. Making distributed systems reliable is inherently impossible; we cling to Byzantine fault tolerance like Charlton Heston clings to his guns, hoping that a series of complex software protocols will somehow protect us from the oncoming storm of furious apes who have somehow learned how to wear pants and maliciously tamper with our network packets. Hilarious. (via Tracy Chou)
Track key metrics to keep Elasticsearch running smoothly.
Elasticsearch is booming. Together with Logstash, a tool for collecting and processing logs, and Kibana, a tool for searching and visualizing data in Elasticsearch, it forms the “ELK” stack, and its adoption continues to grow by leaps and bounds. Once you actually start using Elasticsearch, you find that it generates a huge number of metrics. Instead of taking on the formidable task of tackling all-things-metrics in one blog post, I’ll take a look at 10 Elasticsearch metrics to watch. This should be helpful to anyone new to Elasticsearch, and also to experienced users who want a quick start on performance monitoring of Elasticsearch.
Most of the charts in this piece group metrics either by displaying multiple metrics in one chart, or by organizing them into dashboards. This is done to provide context for each of the metrics we’re exploring.
To start, here’s a dashboard view of the 10 Elasticsearch metrics we’re going to discuss:
10 Elasticsearch metrics in one compact SPM dashboard. This dashboard image, and all images in this post, are from Sematext’s SPM Performance Monitoring tool.
Now, let’s dig into each of the 10 metrics one by one and see how to interpret them.
Constant KV Store, Google Me, Learned Bias, and DRM-Stripping Lego Robot
- Sparkey — Spotify’s open-sourced simple constant key/value storage library, for read-heavy systems with infrequent large bulk inserts.
- The Truth of Fact, The Truth of Feeling (Ted Chiang) — story about what happens when lifelogs become searchable. Now with Remem, finding the exact moment has become easy, and lifelogs that previously lay all but ignored are now being scrutinized as if they were crime scenes, thickly strewn with evidence for use in domestic squabbles. (via BoingBoing)
- Algorithms Magnifying Misbehaviour (The Guardian) — when the training set embodies biases, the machine will exhibit biases too.
- Lego Robot That Strips DRM Off Ebooks (BoingBoing) — so. damn. cool. If it had been controlled by a C64, Cory would have hit every one of my geek erogenous zones with this find.
Model-Driven Configuration, 1,000 RSS Readers Bloom, JSON Query Language, and Doug Engelbart's Vision
- ansible — Model-driven configuration management, multi-node deployment/orchestration, and remote task execution system. Uses SSH by default, so no special software has to be installed on the nodes you manage. Ansible can be extended in any language.
- The Golden Age of RSS — One of the things I expected least to see in 2013 was that this year would mark the greatest flourishing of RSS reader applications in the decade since it first came to prominence on the web.
- JSONiq: the JSON Query Language — expressive and highly optimizable language to query and update NoSQL stores. It enables developers to leverage the same productive high-level language across a variety of NoSQL products. Implemented in Zorba, an Apache-licensed virtual machine for JSONiq and XQuery queries.
- Bret Victor on Doug Engelbart — If you attempt to make sense of Engelbart’s design by drawing correspondences to our present-day systems, you will miss the point, because our present-day systems do not embody Engelbart’s intent. Engelbart hated our present-day systems. Poetic, articulate, and bang on the money.
Microvideos for Microhelp, Organic Search, Probabilistic Programming, and Cluster Management
- How to Make Help Microvideos For Your Site (Adrian Holovaty) — Instead of one monolithic video, we decided to make dozens of tiny, five-second videos separately demonstrating features.
- How Google is Killing Organic Search — 13% of the real estate is organic results in a search for “auto mechanic”, 7% for “italian restaurant”, 0% if searching on an iPhone where organic results are four page scrolls away. SEO Book did an extensive analysis of just how important the top left of the page, previously occupied by organic results actually is to visitors. That portion of the page is now all Google. (via Alex Dong)
- Church — probabilistic programming language from MIT, with tutorials. (via Edd Dumbill)
- mesos — a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run Hadoop, MPI, Hypertable, Spark (a new framework for low-latency interactive and iterative jobs), and other applications. Mesos is open source in the Apache Incubator. (via Ben Lorica)
Thread Problems, Better Image Search, Open Standards, and GitHub Maps
- Multithreading is Hard — The compiler and the processor both conspire to defeat your threads by moving your code around! Be warned and wary! You will have to do battle with both. Sample code and explanation of WTF the eieio barrier is (hint: nothing to do with Old McDonald’s server farm). (via Erik Michaels-Ober)
- Improving Photo Search (Google Research) — volume of training images, number of CPU cores, and Freebase entities. (via Alex Dong)
- Is Google Dumping Open Standards for Open Wallets? (Matt Asay) — it’s easier to ship than standardise, to innovate than integrate, but the ux of a citizen in the real world is pants. Like blog posts? Log into Facebook to read your friends! (or Google+) Chat is great, but you’d better have one client per corporation your friends hang out on. Nobody woke up this morning asking for features to make web pages only work on one browser. The user experience of isolationism is ugly.
- GitHub Renders GeoJSON — Under the hood we use Leaflet.js to render the geoJSON data, and overlay it on a custom version of MapBox’s street view baselayer — simplified so that your data can really shine. Best of all, the base map uses OpenStreetMap data, so if you find an area to improve, edit away.
Search API, Cyberwar=Cyberbollocks, 4k Magic, and Geoparsing
- techu Search Server — Techu exposes a RESTful API for realtime indexing and searching with the Sphinx full-text search engine. We leverage Redis, Nginx and the Python Django framework to make searching easy to handle & flexible.
- In Defence of Digital Freedom — a member of the European Parliament’s piece on the risks to our online freedoms caused by framing computer security into cyberwarfare. Digital freedoms and fundamental rights need to be enforced, and not eroded in the face of vulnerabilities, attacks, and repression. In order to do so, essential and difficult questions on the implementation of the rule of law, historically place-bound by jurisdiction rooted in the nation-state, in the context of a globally connected world, need to be addressed. This is a matter for the EU as a global player, and should involve all of society. (via BoingBoing)
- Inside a 4k Demo — what it’s like to write an amazing demo with only 4k of code. (via Nelson Minar)
- CLAVIN — open source (Apache2) Java library for document geotagging and geoparsing that employs context-based geographic entity resolution. (via Pete Warden)
Search Ads Meh, Hacked Website Help, Web Design Sins, and Lazy Correlations
- Consumer Heterogeneity and Paid Search Effectiveness: A Large Scale Field Experiment (PDF) — We find that new and infrequent users are positively influenced by ads but that existing loyal users whose purchasing behavior is not influenced by paid search account for most of the advertising expenses, resulting in average returns that are negative. We discuss substitution to other channels and implications for advertising decisions in large firms. eBay-commissioned research, so salt to taste. (via Guardian)
- Google’s Help for Hacked Webmasters — what it says.
- 14 Lousy Web Design Trends Making a Comeback Thanks to HTML 5 — “mystery meat icons” a pet bugbear of mine.
- The Human Microbiome 101 (SlideShare) — SciFoo alum Jonathan Eisen’s talk. Informative, but super-notable for “complexity is astonishing, massive risk for false positive associations”. Remember this the next time your Big Data Scientist (aka kid with R) tells you one surprising variable predicts 66% of anything. I wish I had the audio from this talk!
Drug Interactions from Search History, Web Satire, Visible Peer Review, and Rights-based Copyright
- Pharmacovigilance — Signals from The Crowd (PDF) — in the NY Times’ words: Using automated software tools to examine queries by 6 million Internet users taken from Web search logs in 2010, the researchers looked for searches relating to an antidepressant, paroxetine, and a cholesterol-lowering drug, pravastatin. They were able to find evidence that the combination of the two drugs caused high blood sugar. (via New York Times)
- The World Wide Web is Moving to AOL — best satire you’ll read this month.
- Review History for Perceptual elements in Penn & Teller’s “Cups and Balls” magic trick — PeerJ makes peer review history available for the articles it publishes. Not only does this build reputation for peer reviewers who want it, but it is also a wonderful insight into how paranoid science must be to defend against mistakes in data interpretation. (The finished paper is fun, too)
- A New Basis for Copyright — NZ’s most technically-literate judge floats an idea for how copyright might be reimagined in a more useful way for the modern age by considering it in terms of human rights. Perhaps there should be consideration of a new copyright model that recognises content user rights against a backdrop of the right to receive and impart information and a truly balanced approach to information and expression that recognises that ideas expressed are building blocks for new ideas. Underpinning this must be a recognition on the part of content owners that the properties of new technologies dictate our responses, our behaviours, our values and our ways of thinking. These should not be seen as a threat but an opportunity. It cannot be a one-way street with traffic heading only in the direction dictated by content owners.