- Curation & Search — (Twitter) All curation grows until it requires search. All search grows until it requires curation.—Benedict Evans. (via Lists are the New Search)
- Average Developer Tenure (Seattle Times) — The average tenure of a developer in Silicon Valley is nine months at a single company. In Seattle, that length is closer to two years. (via Rands)
- An Interview with John Markoff (Robohub) — the interview will give you a flavour of his book, Machines of Loving Grace, a sweet history of AI told through the stories of the people who pioneered and now shape the field.
- Catapult Drone Launch (YouTube) — utterly nuts. That’s an SUV off its rear wheels! (via IEEE)
Track key metrics to keep Elasticsearch running smoothly.
Elasticsearch is booming. Together with Logstash, a tool for collecting and processing logs, and Kibana, a tool for searching and visualizing data in Elasticsearch (aka, the “ELK” stack), adoption of Elasticsearch continues to grow by leaps and bounds. When it comes to actually using Elasticsearch, there are tons of metrics generated. Instead of taking on the formidable task of tackling all-things-metrics in one blog post, I’ll take a look at 10 Elasticsearch metrics to watch. This should be helpful to anyone new to Elasticsearch, and also to experienced users who want a quick start into performance monitoring of Elasticsearch.
Most of the charts in this piece group metrics either by displaying multiple metrics in one chart, or by organizing them into dashboards. This is done to provide context for each of the metrics we’re exploring.
To start, here’s a dashboard view of the 10 Elasticsearch metrics we’re going to discuss:
10 Elasticsearch metrics in one compact SPM dashboard. This dashboard image, and all images in this post, are from Sematext’s SPM Performance Monitoring tool.
Now, let’s dig into each of the 10 metrics one by one and see how to interpret them.
Constant KV Store, Google Me, Learned Bias, and DRM-Stripping Lego Robot
- Sparkey — Spotify’s open-sourced simple constant key/value storage library, for read-heavy systems with infrequent large bulk inserts.
- The Truth of Fact, The Truth of Feeling (Ted Chiang) — story about what happens when lifelogs become searchable. Now with Remem, finding the exact moment has become easy, and lifelogs that previously lay all but ignored are now being scrutinized as if they were crime scenes, thickly strewn with evidence for use in domestic squabbles. (via BoingBoing)
- Algorithms Magnifying Misbehaviour (The Guardian) — when the training set embodies biases, the machine will exhibit biases too.
- Lego Robot That Strips DRM Off Ebooks (BoingBoing) — so. damn. cool. If it had been controlled by a C64, Cory would have hit every one of my geek erogenous zones with this find.
Model-Driven Configuration, 1,000 RSS Readers Bloom, JSON Query Language, and Doug Engelbart's Vision
- ansible — Model-driven configuration management, multi-node deployment/orchestration, and remote task execution system. Uses SSH by default, so no special software has to be installed on the nodes you manage. Ansible can be extended in any language.
- The Golden Age of RSS — One of the things I expected least to see in 2013 was that this year would mark the greatest flourishing of RSS reader applications in the decade since it first came to prominence on the web.
- JSONiq: the JSON Query Language — expressive and highly optimizable language to query and update NoSQL stores. It enables developers to leverage the same productive high-level language across a variety of NoSQL products. Implemented in Zorba, an Apache-licensed virtual machine for JSONiq and XQuery queries.
- Bret Victor on Doug Engelbart — If you attempt to make sense of Engelbart’s design by drawing correspondences to our present-day systems, you will miss the point, because our present-day systems do not embody Engelbart’s intent. Engelbart hated our present-day systems. Poetic, articulate, and bang on the money.
Microvideos for MIcrohelp, Organic Search, Probabilistic Programming, and Cluster Management
- How to Make Help Microvideos For Your Site (Alex Holovaty) — Instead of one monolithic video, we decided to make dozens of tiny, five-second videos separately demonstrating features.
- How Google is Killing Organic Search — 13% of the real estate is organic results in a search for “auto mechanic”, 7% for “italian restaurant”, 0% if searching on an iPhone where organic results are four page scrolls away. SEO Book did an extensive analysis of just how important the top left of the page, previously occupied by organic results actually is to visitors. That portion of the page is now all Google. (via Alex Dong)
- Church — probabilistic programming language from MIT, with tutorials. (via Edd Dumbill)
- mesos — a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run Hadoop, MPI, Hypertable, Spark (a new framework for low-latency interactive and iterative jobs), and other applications. Mesos is open source in the Apache Incubator. (via Ben Lorica)
Thread Problems, Better Image Search, Open Standards, and GitHub Maps
- Multithreading is Hard — The compiler and the processor both conspire to defeat your threads by moving your code around! Be warned and wary! You will have to do battle with both. Sample code and explanation of WTF the eieio barrier is (hint: nothing to do with Old McDonald’s server farm). (via Erik Michaels-Ober)
- Improving Photo Search (Google Research) — volume of training images, number of CPU cores, and Freebase entities. (via Alex Dong)
- Is Google Dumping Open Standards for Open Wallets? (Matt Asay) — it’s easier to ship than standardise, to innovate than integrate, but the ux of a citizen in the real world is pants. Like blog posts? Log into Facebook to read your friends! (or Google+) Chat is great, but you’d better have one client per corporation your friends hang out on. Nobody woke up this morning asking for features to make web pages only work on one browser. The user experience of isolationism is ugly.
- GitHub Renders GeoJSON — Under the hood we use Leaflet.js to render the geoJSON data, and overlay it on a custom version of MapBox’s street view baselayer — simplified so that your data can really shine. Best of all, the base map uses OpenStreetMap data, so if you find an area to improve, edit away.
Search API, Cyberwar=Cyberbollocks, 4k Magic, and Geoparsing
- techu Search Server — Techu exposes a RESTful API for realtime indexing and searching with the Sphinx full-text search engine. We leverage Redis, Nginx and the Python Django framework to make searching easy to handle & flexible.
- In Defence of Digital Freedom — a member of the European Parliament’s piece on the risks to our online freedoms caused by framing computer security into cyberwarfare. Digital freedoms and fundamental rights need to be enforced, and not eroded in the face of vulnerabilities, attacks, and repression. In order to do so, essential and difficult questions on the implementation of the rule of law, historically place-bound by jurisdiction rooted in the nation-state, in the context of a globally connected world, need to be addressed. This is a matter for the EU as a global player, and should involve all of society. (via BoingBoing)
- Inside a 4k Demo — what it’s like to write an amazing demo with only 4k of code. (via Nelson Minar)
- CLAVIN — open source (Apache2) Java library for document geotagging and geoparsing that employs context-based geographic entity resolution. (via Pete Warden)
Search Ads Meh, Hacked Website Help, Web Design Sins, and Lazy Correlations
- Consumer Heterogeneity and Paid Search Effectiveness: A Large Scale Field Experiment (PDF) — We ﬁnd that new and infrequent users are positively inﬂuenced by ads but that existing loyal users whose purchasing behavior is not inﬂuenced by paid search account for most of the advertising expenses, resulting in average returns that are negative. We discuss substitution to other channels and implications for advertising decisions in large ﬁrms. eBay-commissioned research, so salt to taste. (via Guardian)
- Google’s Help for Hacked Webmasters — what it says.
- 14 Lousy Web Design Trends Making a Comeback Thanks to HTML 5 — “mystery meat icons” a pet bugbear of mine.
- The Human Microbiome 101 (SlideShare) — SciFoo alum Jonathan Eisen’s talk. Informative, but super-notable for “complexity is astonishing, massive risk for false positive associations”. Remember this the next time your Big Data Scientist (aka kid with R) tells you one surprising variable predicts 66% of anything. I wish I had the audio from this talk!