Lucene nightly benchmarks

Each night, an automated Python tool checks out the Lucene/Solr trunk source code and runs multiple benchmarks: indexing the entire English Wikipedia export three times (with different settings and document sizes); running a near-real-time latency test; and running a set of "hardish" auto-generated queries and tasks (typical query shapes are sketched after the chart list below). The tests take around 2.5 hours to run, and the results are verified against the previous run and then added to the graphs linked below.
The goal is to spot any long-term regressions (or gains!) in Lucene's performance that might otherwise accidentally slip past the committers, hopefully avoiding the fate of the boiling frog.
See more details in this blog post.
See pretty flame charts from Java Flight Recorder profiling at blunders.io.
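To make the near-real-time test concrete: the latency being charted is essentially the cost of reopening an index reader so that just-added documents become searchable. Here is a minimal Java sketch of that measurement; the directory, analyzer, field name, and document count are arbitrary illustrative choices, not the benchmark's actual configuration:

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.ByteBuffersDirectory;
    import org.apache.lucene.store.Directory;

    public class NrtLatencyProbe {
      public static void main(String[] args) throws Exception {
        try (Directory dir = new ByteBuffersDirectory();
             IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
          DirectoryReader reader = DirectoryReader.open(writer);
          for (int i = 0; i < 1000; i++) {
            Document doc = new Document();
            doc.add(new TextField("body", "document number " + i, Field.Store.NO));
            writer.addDocument(doc);
            // The NRT refresh: time how long it takes to make the new document searchable.
            long t0 = System.nanoTime();
            DirectoryReader newReader = DirectoryReader.openIfChanged(reader, writer);
            long micros = (System.nanoTime() - t0) / 1000;
            if (newReader != null) {  // null means nothing changed since the last reopen
              reader.close();
              reader = newReader;
            }
            System.out.println("refresh " + i + ": " + micros + " us");
          }
          reader.close();
        }
      }
    }

Charting this reopen delay nightly makes refresh-cost regressions stand out even when raw indexing throughput is unchanged.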
Near-real-time refresh latency
+high-freq +(medium-freq medium-freq)
+medium-freq +(high-freq high-freq)
Sloppy (~4) phrase
Span near (~10)
Ordered intervals (MAXWIDTH/10)
Edit distance 1
Edit distance 2
Primary key lookup
PrefixQuery (3 leading characters)
Numeric range filtering on last-modified-datetime
Term query + date hierarchy
All dates hierarchy
All months (doc values)
All dayOfYear (doc values)
Sorting (on TermQuery):
  Date/time (long, high cardinality)
  Title (string, high cardinality)
  Month (string, low cardinality)
  Day of year (int, medium cardinality)
Grouping (on TermQuery):
  1M groups (two pass block grouping)
  1M groups (single pass block grouping)
GC/JIT metrics during search benchmarks
Geo spatial benchmarks
Sparse vs dense doc values performance on NYC taxi ride corpus
"gradle -p lucene test" and "gradle precommit" time in lucene
Lucene GitHub pull-request counts
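To make the query categories above concrete, here is a minimal Java sketch showing how several of them are constructed with Lucene's public API. The field names ("body", "lastModified", "title") and the terms are hypothetical stand-ins; the nightly tool auto-generates its tasks from the Wikipedia corpus, so this illustrates the query shapes only, not the benchmark's actual code:

    import org.apache.lucene.document.LongPoint;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.FuzzyQuery;
    import org.apache.lucene.search.PhraseQuery;
    import org.apache.lucene.search.PrefixQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TermQuery;

    public class QueryShapes {
      public static void main(String[] args) {
        // +high-freq +(medium-freq medium-freq): a required high-frequency term
        // plus a required disjunction of two medium-frequency terms.
        Query conjDisj = new BooleanQuery.Builder()
            .add(new TermQuery(new Term("body", "the")), Occur.MUST)
            .add(new BooleanQuery.Builder()
                     .add(new TermQuery(new Term("body", "keyboard")), Occur.SHOULD)
                     .add(new TermQuery(new Term("body", "monitor")), Occur.SHOULD)
                     .build(),
                 Occur.MUST)
            .build();

        // Sloppy (~4) phrase: the terms may appear up to 4 positions apart.
        Query sloppyPhrase = new PhraseQuery(4, "body", "united", "states");

        // Edit distance 1 and 2 (FuzzyQuery).
        Query fuzzy1 = new FuzzyQuery(new Term("body", "united"), 1);
        Query fuzzy2 = new FuzzyQuery(new Term("body", "united"), 2);

        // PrefixQuery (3 leading characters).
        Query prefix = new PrefixQuery(new Term("body", "uni"));

        // Numeric range filtering on a last-modified timestamp indexed as a LongPoint.
        Query dateRange = LongPoint.newRangeQuery("lastModified", 1_500_000_000L, 1_600_000_000L);

        // Sorting on a high-cardinality string field such as title
        // (the field must have been indexed with doc values for this to work).
        Sort byTitle = new Sort(new SortField("title", SortField.Type.STRING));

        for (Query q : new Query[] {conjDisj, sloppyPhrase, fuzzy1, fuzzy2, prefix, dateRange}) {
          System.out.println(q);
        }
        System.out.println(byTitle);
      }
    }

The span-near and ordered-interval tasks follow the same pattern through SpanNearQuery and the intervals API, while the doc-values month/dayOfYear charts depend on how those fields were indexed rather than on the query shape.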
[last updated: 2021-04-12 03:31:03.165333; send questions to Mike McCandless]