Lucene nightly benchmarks

Each night, an automated Python tool checks out the Lucene/Solr trunk source code and runs multiple benchmarks: indexing the entire Wikipedia English export three times (with different settings / document sizes); running a near-real-time latency test; running a set of "hardish" auto-generated queries and tasks. The tests take around 2.5 hours to run, and the results are verified against the previous run and then added to the graphs linked below.
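
For a sense of what the indexing benchmark actually does, here is a minimal Java sketch of timing bulk indexing through IndexWriter. This is not the real harness: the corpus reader, field names, index path and RAM buffer size are illustrative stand-ins.

    import java.nio.file.Paths;
    import java.util.Arrays;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.store.FSDirectory;

    public class IndexingThroughput {
      public static void main(String[] args) throws Exception {
        IndexWriterConfig cfg = new IndexWriterConfig(new StandardAnalyzer());
        cfg.setRAMBufferSizeMB(256);  // one of the settings a run might vary
        long start = System.nanoTime();
        int count = 0;
        try (IndexWriter writer = new IndexWriter(
                FSDirectory.open(Paths.get("/tmp/bench-index")), cfg)) {
          for (String[] article : loadCorpus()) {  // hypothetical corpus reader
            Document doc = new Document();
            doc.add(new TextField("title", article[0], Field.Store.YES));
            doc.add(new TextField("body", article[1], Field.Store.NO));
            writer.addDocument(doc);
            count++;
          }
        }
        double sec = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d docs in %.1f sec = %.1f docs/sec%n",
            count, sec, count / sec);
      }

      // Stand-in for streaming (title, body) pairs from the Wikipedia export.
      static Iterable<String[]> loadCorpus() {
        return Arrays.asList(
            new String[] {"Apache Lucene", "Lucene is a search library."},
            new String[] {"Boiling frog", "A fable about gradual change."});
      }
    }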

The goal is to spot any long-term regressions (or gains!) in Lucene's performance that might otherwise accidentally slip past the committers, hopefully avoiding the fate of the boiling frog.

See more details in this blog post.



Indexing:
    Indexing throughput
    Analyzers throughput
    Near-real-time refresh latency
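
The near-real-time test opens a reader directly from the writer (DirectoryReader.open(writer)) and measures how long each refresh takes while indexing continues. A minimal sketch of the refresh step, assuming the writer and current reader are already set up:

    import java.io.IOException;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;

    class NrtRefresh {
      // Reopen a near-real-time reader against a live writer and report how
      // long the refresh took; returns the (possibly unchanged) reader.
      static DirectoryReader refresh(IndexWriter writer, DirectoryReader reader)
          throws IOException {
        long start = System.nanoTime();
        DirectoryReader newReader = DirectoryReader.openIfChanged(reader, writer);
        System.out.printf("NRT refresh: %.2f ms%n",
            (System.nanoTime() - start) / 1e6);
        if (newReader == null) {  // nothing changed since the last reopen
          return reader;
        }
        reader.close();
        return newReader;
      }
    }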

BooleanQuery:
    +high-freq +high-freq
    +high-freq +medium-freq
    high-freq high-freq
    high-freq medium-freq
    +high-freq +(medium-freq medium-freq)
    +medium-freq +(high-freq high-freq)
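
In the notation above, "+" marks a required clause. These task shapes map onto Lucene's BooleanQuery roughly as in the sketch below; the queries are not the benchmark's actual generated ones, and the field name and example terms (and which of them count as high- vs. medium-frequency) are hypothetical.

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause.Occur;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    class BooleanQueryShapes {
      // "+high-freq +medium-freq": both terms are required (MUST).
      static final Query CONJUNCTION = new BooleanQuery.Builder()
          .add(new TermQuery(new Term("body", "united")), Occur.MUST)
          .add(new TermQuery(new Term("body", "wikipedia")), Occur.MUST)
          .build();

      // "high-freq medium-freq": either term may match (SHOULD).
      static final Query DISJUNCTION = new BooleanQuery.Builder()
          .add(new TermQuery(new Term("body", "united")), Occur.SHOULD)
          .add(new TermQuery(new Term("body", "wikipedia")), Occur.SHOULD)
          .build();

      // "+high-freq +(medium-freq medium-freq)": a required term plus a
      // required two-term disjunction.
      static final Query NESTED = new BooleanQuery.Builder()
          .add(new TermQuery(new Term("body", "united")), Occur.MUST)
          .add(new BooleanQuery.Builder()
                  .add(new TermQuery(new Term("body", "wikipedia")), Occur.SHOULD)
                  .add(new TermQuery(new Term("body", "states")), Occur.SHOULD)
                  .build(),
              Occur.MUST)
          .build();
    }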

Proximity queries:
    Exact phrase
    Sloppy (~4) phrase
    Span near (~10)
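
A sketch of how these three proximity shapes are constructed (field and terms hypothetical):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.PhraseQuery;
    import org.apache.lucene.search.spans.SpanNearQuery;
    import org.apache.lucene.search.spans.SpanQuery;
    import org.apache.lucene.search.spans.SpanTermQuery;

    class ProximityShapes {
      // Exact phrase: terms must be adjacent and in order (slop = 0).
      static final PhraseQuery EXACT =
          new PhraseQuery("body", "united", "states");

      // Sloppy phrase: up to 4 position moves allowed between the terms.
      static final PhraseQuery SLOPPY =
          new PhraseQuery(4, "body", "united", "states");

      // Span near: both terms within 10 positions of each other, any order.
      static final SpanQuery NEAR = new SpanNearQuery(
          new SpanQuery[] {
            new SpanTermQuery(new Term("body", "united")),
            new SpanTermQuery(new Term("body", "states"))
          },
          10,      // slop
          false);  // inOrder = false: order does not matter
    }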

FuzzyQuery:
    Edit distance 1
    Edit distance 2
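
Both tasks use Lucene's FuzzyQuery, differing only in maxEdits; a sketch with a hypothetical term:

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.FuzzyQuery;

    class FuzzyShapes {
      // All indexed terms within Levenshtein edit distance 1 of the target.
      static final FuzzyQuery EDITS_1 =
          new FuzzyQuery(new Term("body", "wikipedia"), 1);

      // Edit distance 2 matches far more terms, so it is much more costly.
      static final FuzzyQuery EDITS_2 =
          new FuzzyQuery(new Term("body", "wikipedia"), 2);
    }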

Other queries:
    TermQuery
    Respell (DirectSpellChecker)
    Primary key lookup
    WildcardQuery
    PrefixQuery (3 leading characters)
    Numeric range filtering on last-modified-datetime
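
Sketches of these query shapes; field names and values are hypothetical, and the range query is shown with the points API (LongPoint):

    import org.apache.lucene.document.LongPoint;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.PrefixQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.WildcardQuery;

    class OtherQueryShapes {
      // TermQuery; a primary-key lookup is the same query against a unique
      // id field.
      static final Query TERM = new TermQuery(new Term("body", "wikipedia"));
      static final Query PK_LOOKUP = new TermQuery(new Term("id", "doc123456"));

      // WildcardQuery, and PrefixQuery with 3 leading characters.
      static final Query WILDCARD = new WildcardQuery(new Term("body", "un*d"));
      static final Query PREFIX = new PrefixQuery(new Term("body", "uni"));

      // Range over an indexed LongPoint holding last-modified millis.
      static final Query RANGE =
          LongPoint.newRangeQuery("lastMod", 0L, 1480723200000L);
    }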

Faceting:
    Term query + date hierarchy
    All dates hierarchy
    All months
    All months (doc values)
    All dayOfYear
    All dayOfYear (doc values)
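
These tasks count facets over the hits of a base query. Below is a sketch of the taxonomy-based counting path; the "(doc values)" variants count from doc values instead (e.g. SortedSetDocValuesFacetCounts). The "Date" dimension name is hypothetical.

    import java.io.IOException;
    import org.apache.lucene.facet.FacetResult;
    import org.apache.lucene.facet.Facets;
    import org.apache.lucene.facet.FacetsCollector;
    import org.apache.lucene.facet.FacetsConfig;
    import org.apache.lucene.facet.taxonomy.FastTaxonomyFacetCounts;
    import org.apache.lucene.facet.taxonomy.TaxonomyReader;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;

    class DateFacets {
      // Run the query, collect facet ordinals for every hit, then count the
      // top children of the "Date" hierarchy (e.g. year -> month -> day).
      static FacetResult count(IndexSearcher searcher, TaxonomyReader taxoReader,
                               FacetsConfig config, Query query) throws IOException {
        FacetsCollector fc = new FacetsCollector();
        FacetsCollector.search(searcher, query, 10, fc);
        Facets facets = new FastTaxonomyFacetCounts(taxoReader, config, fc);
        return facets.getTopChildren(10, "Date");
      }
    }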

Sorting (on TermQuery):
    Date/time (long, high cardinality)
    Title (string, high cardinality)
    Month (string, low cardinality)
    Day of year (int, medium cardinality)
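
Each of these sorts the hits of the same TermQuery; only the SortField changes. A sketch with hypothetical field names:

    import java.io.IOException;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TopFieldDocs;

    class SortedSearch {
      // Sort hits by a long doc-values field (high cardinality), newest first.
      static TopFieldDocs byDate(IndexSearcher searcher, Query query)
          throws IOException {
        Sort sort = new Sort(new SortField("lastMod", SortField.Type.LONG, true));
        return searcher.search(query, 10, sort);
      }

      // The string sorts look the same; only the field and type differ:
      //   new SortField("title", SortField.Type.STRING)  // high cardinality
      //   new SortField("month", SortField.Type.STRING)  // low cardinality
    }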

Grouping (on TermQuery):
    100 groups
    10K groups
    1M groups
    1M groups (two-pass block grouping)
    1M groups (single-pass block grouping)
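
A sketch of the basic (non-block) case, using the grouping module's GroupingSearch with a hypothetical single-valued "group" field. The block-grouping variants additionally require that each group's documents were indexed contiguously with IndexWriter.addDocuments, which is what makes a single collection pass possible.

    import java.io.IOException;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.grouping.GroupingSearch;
    import org.apache.lucene.search.grouping.TopGroups;
    import org.apache.lucene.util.BytesRef;

    class GroupedSearch {
      // Group the query's hits by the "group" field: find the top 10 groups,
      // then the top 10 documents within each group.
      static TopGroups<BytesRef> run(IndexSearcher searcher, Query query)
          throws IOException {
        GroupingSearch grouping = new GroupingSearch("group");
        grouping.setGroupSort(Sort.RELEVANCE);
        grouping.setGroupDocsLimit(10);  // docs returned per group
        return grouping.search(searcher, query, 0, 10);
      }
    }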

Others:
    Geo spatial benchmarks
    "ant clean test" time in lucene
    CheckIndex time
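
The CheckIndex graph times Lucene's full index verification. A minimal sketch, taking the index path from the command line:

    import java.nio.file.Paths;
    import org.apache.lucene.index.CheckIndex;
    import org.apache.lucene.store.FSDirectory;

    public class CheckIndexTime {
      public static void main(String[] args) throws Exception {
        try (FSDirectory dir = FSDirectory.open(Paths.get(args[0]));
             CheckIndex checker = new CheckIndex(dir)) {
          long start = System.nanoTime();
          CheckIndex.Status status = checker.checkIndex();  // full scan of the index
          System.out.printf("clean=%b, took %.1f sec%n",
              status.clean, (System.nanoTime() - start) / 1e9);
        }
      }
    }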


[last updated: 2016-12-03 00:35:24.302767; send questions to Mike McCandless]