"lucene" entries

10 Elasticsearch metrics to watch

Track key metrics to keep Elasticsearch running smoothly.

Elasticsearch is booming. Together with Logstash, a tool for collecting and processing logs, and Kibana, a tool for searching and visualizing data in Elasticsearch, it forms the “ELK” stack, and its adoption continues to grow by leaps and bounds. In actual use, Elasticsearch generates a huge number of metrics. Instead of taking on the formidable task of tackling all-things-metrics in one blog post, I’ll take a look at 10 Elasticsearch metrics to watch. This should be helpful to anyone new to Elasticsearch, and also to experienced users who want a quick start on performance monitoring of Elasticsearch.

Most of the charts in this piece group metrics either by displaying multiple metrics in one chart, or by organizing them into dashboards. This is done to provide context for each of the metrics we’re exploring.

To start, here’s a dashboard view of the 10 Elasticsearch metrics we’re going to discuss:

10 Elasticsearch metrics in one compact SPM dashboard. This dashboard image, and all images in this post, are from Sematext’s SPM Performance Monitoring tool.

Now, let’s dig into each of the 10 metrics one by one and see how to interpret them.

Four short links: 15 November 2011

Internet Asthma Care, C Fulltext, Citizen Science, and Mozilla

  1. Cost-Effectiveness of Internet-Based Self-Management Compared with Usual Care in Asthma (PLoSone) — Internet-based self-management of asthma can be as effective as current asthma care and costs are similar.
  2. Apache Lucy full-text search engine library written in C and targeted at dynamic languages. It is a “loose C” port of Apache Lucene™, a search engine library for Java.
  3. The Near Future of Citizen Science (Fiona Romeo) — near future of science is all about honing the division of labour between professionals, amateurs and bots. See Bryce’s bionic software riff. (via Matt Jones)
  4. Microsoft’s Patent Claims Against Android (Groklaw) — behold, citizen, the formidable might of Microsoft’s patents and how they justify a royalty from every Android device equal to that which you would owe if you built a Windows Mobile device: These Microsoft patents can be divided into several basic categories: (1) the ‘372 and ‘780 patents relate to web browsers; (2) the ‘551 and ‘233 patents relate to electronic document annotation and highlighting; (3) the ‘522 patent relates to resources provided by operating systems; (4) the ‘517 and ‘352 patents deal with compatibility with file names once employed by old, unused, and outmoded operating systems; (5) the ‘536 and ‘853 patents relate to simulating mouse inputs using non-mouse devices; and (6) the ‘913 patent relates to storing input/output access factors in a shared data structure. A shabby display of patent menacing.

Strata Week: Behind LinkedIn Signal

Life-size visualizations, how Hadoop is used, SciDB has its first release

In this edition of Strata Week: the open source technology behind LinkedIn Signal; Julia Grace on visualization; Hadoop usage survey results; and the first release of the SciDB project.

Four short links: 25 May 2010

European Economic Crisis, Scaling Guardian API, Cheerful Pessimism, and Science Mapping

  1. Lending Merry-Go-Round — these guys have been Australia’s sharpest satire for years, filling the role of the Daily Show. Here they ask some strong questions about the state of Europe’s economies … (via jdub on Twitter)
  2. What’s Powering the Guardian’s Content API — Scala and Solr/Lucene on EC2 is the short answer. The long answer reveals the details of their setup, including some of the indexing tricks that let Solr index all their content in just an hour. (via Simon Willison)
  3. What I Learned About Engineering from the Panama Canal (Pete Warden) — I consider myself a cheerful pessimist. I’ve been through enough that I know how steep the odds of success are, but I’ve made a choice that even a hopeless fight in a good cause is worthwhile. What a lovely attitude!
  4. Mapping the Evolution of Scientific Fields (PLoSone) — clever use of data. We build an idea network consisting of American Physical Society Physics and Astronomy Classification Scheme (PACS) numbers as nodes representing scientific concepts. Two PACS numbers are linked if there exist publications that reference them simultaneously. We locate scientific fields using a community finding algorithm, and describe the time evolution of these fields over the course of 1985-2006. The communities we identify map to known scientific fields, and their age depends on their size and activity. We expect our approach to quantifying the evolution of ideas to be relevant for making predictions about the future of science and thus help to guide its development.
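
The network construction described in the last item (PACS numbers as nodes, linked when a publication references both) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's code: the function name `build_cooccurrence_network` and the toy paper list are hypothetical, and the community-finding step is not shown.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_network(papers):
    """Count how often each pair of classification codes shares a paper.

    `papers` is an iterable of code lists, one list per publication.
    Returns a dict mapping a sorted (code_a, code_b) pair to the number
    of publications that carry both codes.
    """
    edges = defaultdict(int)
    for codes in papers:
        # Deduplicate and sort so each pair has one canonical ordering.
        for a, b in combinations(sorted(set(codes)), 2):
            edges[(a, b)] += 1
    return dict(edges)


# Hypothetical toy data: each inner list stands for one publication.
papers = [
    ["05.45.-a", "89.75.Hc"],
    ["89.75.Hc", "89.75.Fb", "05.45.-a"],
    ["03.67.-a", "03.65.Ud"],
]

network = build_cooccurrence_network(papers)
for (a, b), weight in sorted(network.items()):
    print(a, b, weight)
```

The edge weights could then be handed to any community-detection routine; which algorithm the authors used is not stated in the excerpt above.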