Screaming fast Lucene searches using C++ via JNI

When Lucene executes a query, once the initial setup is done the true hot-spot is usually rather basic code: it decodes sequential blocks of integer docIDs, term frequencies and positions, matches them (e.g. taking the union or intersection for BooleanQuery), computes a score for each hit and finally, during collection, saves the hit if it’s competitive. Even apparently complex queries like FuzzyQuery or WildcardQuery go through a rewrite process that reduces them to much simpler forms like BooleanQuery. Lucene’s hot-spots are so simple that optimizing them by porting them to native C++ (via JNI) was too tempting to pass up!

So I did just that, creating the lucene-c-boost GitHub project, and the resulting speedups are exciting:

Task          QPS base   StdDev base   QPS opt   StdDev opt   Speedup
AndHighLow    469.2      (0.9%)        316.0     (0.7%)       0.7 X
Fuzzy1         63.0      (3.3%)         62.9     (2.0%)       1.0 X
Fuzzy2         25.8      (3.1%)         37.9     (2.3%)       1.5 X
AndHighMed     50.4      (0.7%)        110.0     (0.9%)       2.2 X
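
To make the hot-spot described earlier a bit more concrete, here is a minimal C++ sketch of that kind of loop: decode delta-encoded docIDs, intersect two postings lists (as a two-clause AND query would), score each match, and keep only the competitive hits in a small heap. This is not the code from lucene-c-boost or from Lucene itself. The names `Hit`, `WeakestOnTop`, `decodeDeltas` and `searchAnd` are hypothetical, the plain delta encoding stands in for Lucene's real block format, and the scoring function is whatever the caller passes in.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

// Hypothetical types for illustration only; not Lucene's or lucene-c-boost's API.
struct Hit {
    float score;
    int32_t doc;
};

// Orders the heap so that the weakest hit sits on top:
// lowest score first, and on equal scores the larger docID.
struct WeakestOnTop {
    bool operator()(const Hit& a, const Hit& b) const {
        if (a.score != b.score) return a.score > b.score;
        return a.doc < b.doc;
    }
};

// Assumed encoding: each entry is the gap to the previous docID.
// Returns the absolute docIDs in ascending order.
std::vector<int32_t> decodeDeltas(const std::vector<int32_t>& gaps) {
    std::vector<int32_t> docs;
    docs.reserve(gaps.size());
    int32_t doc = 0;
    for (int32_t gap : gaps) {
        doc += gap;
        docs.push_back(doc);
    }
    return docs;
}

// Intersects two ascending docID lists, scores every match with `score`,
// and returns at most k hits, best first.
template <typename ScoreFn>
std::vector<Hit> searchAnd(const std::vector<int32_t>& a,
                           const std::vector<int32_t>& b,
                           ScoreFn score,
                           std::size_t k) {
    assert(k > 0);
    std::priority_queue<Hit, std::vector<Hit>, WeakestOnTop> heap;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) {
            ++i;                      // doc only in a: advance a
        } else if (b[j] < a[i]) {
            ++j;                      // doc only in b: advance b
        } else {
            Hit hit{score(a[i]), a[i]};
            if (heap.size() < k) {
                heap.push(hit);       // heap not full yet: always keep
            } else if (hit.score > heap.top().score) {
                heap.pop();           // evict the weakest hit
                heap.push(hit);
            }
            ++i;
            ++j;
        }
    }
    // Popping yields weakest first, so reverse to get best first.
    std::vector<Hit> hits;
    while (!heap.empty()) {
        hits.push_back(heap.top());
        heap.pop();
    }
    std::reverse(hits.begin(), hits.end());
    return hits;
}
```

A caller would decode both postings lists with `decodeDeltas` and then call `searchAnd(a, b, scoreFn, 10)` to get the top 10 hits. The point of the sketch is only that the whole loop is a handful of branches and array reads, which is the sort of code that tends to be a good candidate for a native port.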

Source : http://www.javacodegeeks.com/2013/06/screaming-fast-lucene-searches-using-c-via-jni.html
