Elasticsearch Index Warmup API: it really whips real-time search's ass!

If you are into search and have not checked out Elasticsearch yet, you are really doing yourself a disservice. Elasticsearch has a fantastic set of features, it’s very approachable, super hackable and unit-testable (in-memory nodes rock). It also comes out of the box with the ability to scale to hundreds of nodes and provides automatic replication to boot. Combine that with simple administration and you have a real boon for tech startups that can’t afford dedicated sysadmins. All of the above contributed to us recently migrating our Solr setup to it.

The Elasticsearch engine comes with real-time search capabilities out of the box, which have been documented before. Having said that, you still need to tune its settings (refresh interval, merge sizes, GC new vs. tenured generation ratios, etc.) to your system’s usage patterns to get optimal performance out of it.
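For instance, the refresh interval can be adjusted on a live index with a single settings call. A minimal sketch (the index name and value below are made up, purely to show the shape of the request):

    # Purely illustrative: "influencers" is a made-up index name and 30s is not a recommendation.
    curl -XPUT 'http://localhost:9200/influencers/_settings' -d '{
      "index": {
        "refresh_interval": "30s"
      }
    }'

The merge-policy knobs are index settings as well, while the GC generation sizing is a JVM option rather than anything Elasticsearch-specific.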

Even then, if you frequently add large amounts of new documents to your index like we do at Traackr, you can’t really escape the inevitable merges that have to occur in the background to keep your Lucene segments to a manageable number.

It’s either that or watching your search times slow to a crawl as your segment count gets out of control. What this means is that your caches will inevitably get cleared as old segments are merged into new ones. And if you happen to use a cache-heavy query like top_children, then some search requests get the short end of the stick as they sit around waiting for those caches to re-populate.
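For context, a top_children query scores parent documents by their best-matching children, which pulls a lot of parent/child id data into the caches. A rough sketch of what one looks like (index, type and field names here are made up):

    # Made-up index ("influencers"), child type ("mention") and field ("channel"),
    # just to show the shape of a cache-heavy top_children query.
    curl -XPOST 'http://localhost:9200/influencers/_search' -d '{
      "query": {
        "top_children": {
          "type": "mention",
          "query": { "term": { "channel": "twitter" } },
          "score": "max"
        }
      }
    }'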

Enter the fantastic Index Warmup API, currently available in master. This feature allows you to define warm-up queries that will periodically get executed as merges occur and caches get invalidated. Those queries run on separate threads, performing all the heavy data loading in the background while users can still run regular queries without interruption. As a result, you get an almost perfectly smooth request latency curve.
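Registering a warmer is just a matter of PUTting a regular search body under the _warmer endpoint. As a rough sketch, reusing the made-up names from above, warming the top_children caches might look like this:

    # Hypothetical warmer: re-runs a top_children search in the background
    # after merges, so the parent/child id cache is hot before users hit it.
    curl -XPUT 'http://localhost:9200/influencers/_warmer/warm_mentions' -d '{
      "query": {
        "top_children": {
          "type": "mention",
          "query": { "match_all": {} },
          "score": "max"
        }
      }
    }'

    # Warmers can be inspected or removed the same way:
    curl -XGET 'http://localhost:9200/influencers/_warmer/warm_mentions'
    curl -XDELETE 'http://localhost:9200/influencers/_warmer/warm_mentions'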

If your requirements include near-real-time (NRT) search, you should absolutely give Elasticsearch a spin.

PS1: if you want to get great insights into how your Solr/Elasticsearch engine is behaving (pretty graphs above), try the performance monitor from Sematext.

PS2: for those of you too young to recognize the Nullsoft nod in the title, take a short stroll down memory lane. Since NRT search is a beast in and of itself, I figured it would be fitting.