Groups & Blocks in Apache Solr


Chicago Meetup - 2013-10-07

https://people.apache.org/~hossman/

@_hossman

Result Grouping

aka: Field Collapsing

Group documents with some attribute in common, returning the top document(s) per group, and the top groups based on what documents are in the groups.

Useful for limiting how many results of the same "type" are returned.

Google search results showing domain collapsing, circa 2011
Best Buy product search showing category collapsing, circa 2011

Result Grouping In Solr

  • Driven By Request Params
    • group=true
  • Various ways to specify groups
    • group.field
    • group.func
    • group.query
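As a minimal sketch of how these parameters combine into a request, the Python below builds a query string with the standard library. The host, port, and path in `BASE_URL` and the helper name `grouped_query_url` are assumptions for illustration, not taken from the slides; the `artist_name` field and `group.limit` parameter appear in the examples later on.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint (an assumption); adjust to your own host and core.
BASE_URL = "http://localhost:8983/solr/select"

def grouped_query_url(q, group_field, group_limit=1):
    """Build a select URL that turns on result grouping for one field."""
    params = [
        ("q", q),
        ("wt", "json"),
        ("group", "true"),                  # enables result grouping
        ("group.field", group_field),       # one group per distinct field value
        ("group.limit", str(group_limit)),  # docs returned per group
    ]
    return BASE_URL + "?" + urlencode(params)

url = grouped_query_url("love", "artist_name", group_limit=2)
print(url)
```

Swapping `group.field` for `group.func` or `group.query` would follow the same pattern, one parameter per group spec.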

Output Structure

  • group.format=grouped
    • Default: one DocList per group value, per group spec
  • group.format=simple
    • Flatten the groups: one (merged) DocList per group spec
  • group.main=true
    • Mimic normal response: one (merged) DocList for last group spec
    • Forces group.format=simple
    • Doesn't work with group.query in any useful way
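A client has to cope with whichever of these three shapes comes back. The sketch below is one possible way to flatten all of them into a plain list of docs; the helper name `docs_from_response` is hypothetical, and the sample responses are trimmed-down versions of the example output shown later in these slides.

```python
def docs_from_response(rsp, group_key=None):
    """Return a flat list of docs from any of the three output structures."""
    if "response" in rsp:
        # group.main=true (or no grouping at all): a normal top-level DocList
        return rsp["response"]["docs"]
    spec = rsp["grouped"][group_key]
    if "doclist" in spec:
        # group.format=simple: one merged DocList per group spec
        return spec["doclist"]["docs"]
    # group.format=grouped: one DocList per group value
    return [doc for group in spec["groups"] for doc in group["doclist"]["docs"]]

grouped = {"grouped": {"album_name": {"matches": 7, "groups": [
    {"groupValue": "Wayne's World (soundtrack)",
     "doclist": {"numFound": 2, "docs": [{"id": "114"}, {"id": "112"}]}},
    {"groupValue": "Classic Queen",
     "doclist": {"numFound": 1, "docs": [{"id": "406"}]}}]}}}
simple = {"grouped": {"album_name": {"matches": 7, "doclist": {
    "numFound": 7, "docs": [{"id": "114"}, {"id": "112"}, {"id": "406"}]}}}}
main = {"response": {
    "numFound": 7, "docs": [{"id": "114"}, {"id": "112"}, {"id": "406"}]}}

for rsp in (grouped, simple, main):
    print([doc["id"] for doc in docs_from_response(rsp, "album_name")])
```

All three calls print the same list of ids, which is the point: the documents are identical and only the wrapping differs.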

Sorting & Pagination

group=true    Of Groups    In Groups
Sorting       sort         group.sort
Pagination    start        group.offset
Pagination    rows         group.limit
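To make the two levels concrete, here is a toy in-memory imitation of how the six parameters interact. It is only a sketch, not Solr's implementation: it assumes both `sort` and `group.sort` are "score descending", and it uses five of the seven `love` matches from the sample data (the ones whose scores appear in these slides). The function name `group_results` is hypothetical.

```python
def group_results(docs, field, start=0, rows=10, group_offset=0, group_limit=1):
    """Toy imitation of grouped sorting and pagination (not Solr's code)."""
    # "sort": rank every match by score, best first (sorted() is stable).
    ranked = sorted(docs, key=lambda d: d["score"], reverse=True)
    groups = {}  # dicts keep insertion order, so groups follow their top doc
    for doc in ranked:
        groups.setdefault(doc[field], []).append(doc)
    # "start" / "rows": choose which groups land on this page.
    page = list(groups.items())[start:start + rows]
    # "group.offset" / "group.limit": choose which docs of each group are shown.
    return [(value, members[group_offset:group_offset + group_limit])
            for value, members in page]

DOCS = [
    {"id": "114", "artist_name": "Soundgarden", "score": 1.298945},
    {"id": "406", "artist_name": "Queen", "score": 1.298945},
    {"id": "112", "artist_name": "Eric Clapton", "score": 1.0824541},
    {"id": "503", "artist_name": "Bad Company", "score": 1.0824541},
    {"id": "532", "artist_name": "Bad Company", "score": 1.0824541},
]

# Second page of groups (start=2, rows=2), up to two docs per group.
second_page = group_results(DOCS, "artist_name", start=2, rows=2, group_limit=2)
for value, members in second_page:
    print(value, [doc["id"] for doc in members])
# Eric Clapton ['112']
# Bad Company ['503', '532']
```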

Result Grouping Examples

sample-data-docs-flat.xml
<add>
  <doc>
    <field name="id">101</field>
    <field name="song_name">Bohemian Rhapsody</field>
    <field name="artist_name">Queen</field>
    <field name="album_name">Wayne's World (soundtrack)</field>
  </doc>
  <doc>
    <field name="id">102</field>
    <field name="song_name">Hot and Bothered</field>
    <field name="artist_name">Cinderella</field>
    <field name="album_name">Wayne's World (soundtrack)</field>
  </doc>
  ...
  <doc>
    <field name="id">533</field>
    <field name="song_name">Hey, Hey</field>
    <field name="artist_name">Bad Company</field>
    <field name="album_name">The Original Bad Co. Anthology</field>
  </doc>
</add>
/select?q=love
{ "response":{"numFound":7,"start":0,"maxScore":1.298945,"docs":[
      {
        "id":"114",
        "song_name":"Loud Love",
        "artist_name":"Soundgarden",
        "album_name":"Wayne's World (soundtrack)",
        "score":1.298945},
      {
        "id":"406",
        "song_name":"One Year of Love",
        "artist_name":"Queen",
        "album_name":"Classic Queen",
        "score":1.298945},
      {
        "id":"112",
        "song_name":"Loving Your Lovin'",
        "artist_name":"Eric Clapton",
        "album_name":"Wayne's World (soundtrack)",
        "score":1.0824541},
      ...,
...&group.field=artist_name&group.limit=2
{ "grouped":{
    "artist_name":{
      "matches":7,
      "groups":[{
          "groupValue":"Soundgarden",
          "doclist":{"numFound":1,"start":0,"maxScore":1.298945,"docs":[
              { "id":"114",
                "song_name":"Loud Love",
                "artist_name":"Soundgarden",
                "album_name":"Wayne's World (soundtrack)",
                "score":1.298945}]
          }},
        ...,
        { "groupValue":"Bad Company",
          "doclist":{"numFound":3,"start":0,"maxScore":1.0824541,"docs":[
              { "id":"503",
                "song_name":"Ready For Love",
                "artist_name":"Bad Company",
                "album_name":"The Original Bad Co. Anthology",
                "score":1.0824541},
              { "id":"532",
                "song_name":"Hammer of Love",
                "artist_name":"Bad Company",
                "album_name":"The Original Bad Co. Anthology",
                "score":1.0824541}]
          }}]}}}
...&group.field=album_name&group.limit=2
{ "grouped":{
    "album_name":{
      "matches":7,
      "groups":[{
          "groupValue":"Wayne's World (soundtrack)",
          "doclist":{"numFound":2,"start":0,"maxScore":1.298945,"docs":[
              { "id":"114",
                "song_name":"Loud Love",
                "artist_name":"Soundgarden",
                "album_name":"Wayne's World (soundtrack)",
                "score":1.298945},
              { "id":"112",
                "song_name":"Loving Your Lovin'",
                "artist_name":"Eric Clapton",
                "album_name":"Wayne's World (soundtrack)",
                "score":1.0824541}]
          }},
        { "groupValue":"Classic Queen",
          "doclist":{"numFound":1,"start":0,"maxScore":1.298945,"docs":[
              { "id":"406",
                "song_name":"One Year of Love",
                "artist_name":"Queen",
                "album_name":"Classic Queen",
                "score":1.298945}]
          }},
        ...,
...&group.format=simple
{ "grouped":{
    "album_name":{
      "matches":7,
      "doclist":{"numFound":7,"start":0,"maxScore":1.298945,"docs":[
          { "id":"114",
            "song_name":"Loud Love",
            "artist_name":"Soundgarden",
            "album_name":"Wayne's World (soundtrack)",
            "score":1.298945},
          { "id":"112",
            "song_name":"Loving Your Lovin'",
            "artist_name":"Eric Clapton",
            "album_name":"Wayne's World (soundtrack)",
            "score":1.0824541},
          { "id":"406",
            "song_name":"One Year of Love",
            "artist_name":"Queen",
            "album_name":"Classic Queen",
            "score":1.298945},
          ...,
...&group.main=true
{ "response":{"numFound":7,"start":0,"maxScore":1.298945,"docs":[
      {
        "id":"114",
        "song_name":"Loud Love",
        "artist_name":"Soundgarden",
        "album_name":"Wayne's World (soundtrack)",
        "score":1.298945},
      {
        "id":"112",
        "song_name":"Loving Your Lovin'",
        "artist_name":"Eric Clapton",
        "album_name":"Wayne's World (soundtrack)",
        "score":1.0824541},
      {
        "id":"406",
        "song_name":"One Year of Love",
        "artist_name":"Queen",
        "album_name":"Classic Queen",
        "score":1.298945},
      ...,

For Further Consideration

  • group.truncate=true
    • Compute facet counts & stats using only the "top" result in each group
  • group.facet=true
    • Compute facet counts treating each group as if it were a single document
  • group.ngroups=true
    • Return the total number of groups found (per group spec)
  • SOLR-5045 & SOLR-5027: Generalizing group collapsing/expanding as an "aggregation" function
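A rough way to picture `group.ngroups` and `group.truncate` is sketched below on a subset of the sample data, grouped by `artist_name` and faceted on `album_name`. This is an in-memory illustration under the assumption that the docs are already in relevance order; it is not how Solr computes these values, and the helper name `top_doc_per_group` is hypothetical.

```python
from collections import Counter

# Five of the "love" matches from the sample data, already in score order.
DOCS = [
    {"id": "114", "artist_name": "Soundgarden",
     "album_name": "Wayne's World (soundtrack)"},
    {"id": "406", "artist_name": "Queen",
     "album_name": "Classic Queen"},
    {"id": "112", "artist_name": "Eric Clapton",
     "album_name": "Wayne's World (soundtrack)"},
    {"id": "503", "artist_name": "Bad Company",
     "album_name": "The Original Bad Co. Anthology"},
    {"id": "532", "artist_name": "Bad Company",
     "album_name": "The Original Bad Co. Anthology"},
]

def top_doc_per_group(docs, field):
    """Keep only the first (best-ranked) doc seen for each group value."""
    top = {}
    for doc in docs:
        top.setdefault(doc[field], doc)
    return list(top.values())

tops = top_doc_per_group(DOCS, "artist_name")
ngroups = len(tops)  # the number group.ngroups=true would report here
all_counts = Counter(doc["album_name"] for doc in DOCS)
truncated_counts = Counter(doc["album_name"] for doc in tops)

print(ngroups)
print(all_counts["The Original Bad Co. Anthology"])
print(truncated_counts["The Original Bad Co. Anthology"])
```

With truncation, the two Bad Company songs contribute a single count to their album's facet because only the top doc of the group is considered.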

Block Joins

Block Joining is a method of indexing a hierarchy of documents as a single "block" in the index. Special query wrappers let you find "parent" documents based on "children" that match some query, or vice-versa.

Amazon search result for small green t-shirts
PubMed search result for firstname=John lastname=Smith

Block Joins In Solr

  • Each hierarchy of docs must be indexed atomically to be in a single block:
    • SolrInputDocument.getChildDocuments()
    • Nested <doc/> tags in XML
  • Updating / Replacing docs must be done on an entire hierarchy (i.e., a block)
    • Can't update just one child
  • Special Query Parsers take advantage of the hierarchy structure:
    {!parent which="PARENT_FILTER"}CHILD_QUERY
    {!child     of="PARENT_FILTER"}PARENT_QUERY
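The nested `<doc/>` form can be generated rather than written by hand. The sketch below uses Python's standard `xml.etree.ElementTree` to build one album block with its songs as children; the helper names `add_field` and `album_block` are hypothetical, and the field names (`doctype`, `album_name`, `song_name`, `artist_name`) follow the sample data used in these slides.

```python
import xml.etree.ElementTree as ET

def add_field(doc, name, value):
    """Append <field name="...">value</field> to a <doc> element."""
    el = ET.SubElement(doc, "field", name=name)
    el.text = value

def album_block(album_id, album_name, songs):
    """Build one parent <doc> with its child <doc> elements nested inside."""
    parent = ET.Element("doc")
    add_field(parent, "id", album_id)
    add_field(parent, "doctype", "album")
    add_field(parent, "album_name", album_name)
    for song_id, song_name, artist_name in songs:
        child = ET.SubElement(parent, "doc")  # nesting is what makes it a block
        add_field(child, "id", song_id)
        add_field(child, "doctype", "song")
        add_field(child, "song_name", song_name)
        add_field(child, "artist_name", artist_name)
    return parent

add = ET.Element("add")
add.append(album_block("100", "Wayne's World (soundtrack)", [
    ("101", "Bohemian Rhapsody", "Queen"),
    ("102", "Hot and Bothered", "Cinderella"),
]))
xml_text = ET.tostring(add, encoding="unicode")
print(xml_text)
```

Because the whole hierarchy is one `<doc>` element inside `<add>`, parent and children are sent together in a single update message.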

Block Join Examples

sample-data-docs-blocks.xml
<add>
  <doc>
    <field name="id">100</field>
    <field name="doctype">album</field>
    <field name="album_name">Wayne's World (soundtrack)</field>
    <doc>
      <field name="id">101</field>
      <field name="doctype">song</field>
      <field name="song_name">Bohemian Rhapsody</field>
      <field name="artist_name">Queen</field>
    </doc>
    <doc>
      <field name="id">102</field>
      <field name="doctype">song</field>
      <field name="song_name">Hot and Bothered</field>
      <field name="artist_name">Cinderella</field>
    </doc>
    ...
  </doc>
  ...
</add>
/select?q=doctype:album
{  "response":{"numFound":5,"start":0,"docs":[
      {
        "id":"100",
        "album_name":"Wayne's World (soundtrack)"},
      {
        "id":"200",
        "album_name":"Empire Records (Soundtrack)"},
      {
        "id":"300",
        "album_name":"Reality Bites (Soundtrack)"},
      {
        "id":"400",
        "album_name":"Classic Queen"},
      {
        "id":"500",
        "album_name":"The Original Bad Co. Anthology"}]
  }}
/select?q=doctype:song
{ "response":{"numFound":94,"start":0,"docs":[
      { "id":"101",
        "song_name":"Bohemian Rhapsody",
        "artist_name":"Queen"},
      { "id":"102",
        "song_name":"Hot and Bothered",
        "artist_name":"Cinderella"},
      { "id":"103",
        "song_name":"Rock Candy",
        "artist_name":"BulletBoys"},
      { "id":"104",
        "song_name":"Dream Weaver",
        "artist_name":"Gary Wright"},
      { "id":"105",
        "song_name":"Sikamikanico",
        "artist_name":"Red Hot Chili Peppers"},
      ...,
q=soundtrack&fq=doctype:album
{ "response":{"numFound":3,"start":0,"docs":[
      {
        "id":"100",
        "album_name":"Wayne's World (soundtrack)"},
      {
        "id":"200",
        "album_name":"Empire Records (Soundtrack)"},
      {
        "id":"300",
        "album_name":"Reality Bites (Soundtrack)"}]
  }}
q=love&fq=doctype:song
{ "response":{"numFound":7,"start":0,"docs":[
      { "id":"114",
        "song_name":"Loud Love",
        "artist_name":"Soundgarden"},
      { "id":"112",
        "song_name":"Loving Your Lovin'",
        "artist_name":"Eric Clapton"},
      { "id":"406",
        "song_name":"One Year of Love",
        "artist_name":"Queen"},
      { "id":"503",
        "song_name":"Ready For Love",
        "artist_name":"Bad Company"},
      { "id":"532",
        "song_name":"Hammer of Love",
        "artist_name":"Bad Company"},
      ...,
q=soundtrack&fq={!parent which="doctype:album"}love
{ "response":{"numFound":2,"start":0,"docs":[
      {
        "id":"100",
        "album_name":"Wayne's World (soundtrack)"},
      {
        "id":"300",
        "album_name":"Reality Bites (Soundtrack)"}]
  }}
q=love&fq={!child of="doctype:album"}soundtrack
{ "response":{"numFound":3,"start":0,"docs":[
      {
        "id":"114",
        "song_name":"Loud Love",
        "artist_name":"Soundgarden"},
      {
        "id":"112",
        "song_name":"Loving Your Lovin'",
        "artist_name":"Eric Clapton"},
      {
        "id":"314",
        "song_name":"Baby, I Love Your Way",
        "artist_name":"Big Mountain"}]
  }}
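The last two queries can be mimicked in memory to show what each parser selects. This is a sketch of the semantics only, not of how Lucene executes block joins. It uses a subset of the sample data; placing song 314 under album 300 is an inference from its id prefix, and the `"lov"` substring test is a crude stand-in for Solr's analyzed match on `love`. All function names are hypothetical.

```python
BLOCKS = [
    {"id": "100", "album_name": "Wayne's World (soundtrack)", "songs": [
        {"id": "101", "song_name": "Bohemian Rhapsody"},
        {"id": "112", "song_name": "Loving Your Lovin'"},
        {"id": "114", "song_name": "Loud Love"}]},
    {"id": "300", "album_name": "Reality Bites (Soundtrack)", "songs": [
        {"id": "314", "song_name": "Baby, I Love Your Way"}]},  # album inferred
    {"id": "400", "album_name": "Classic Queen", "songs": [
        {"id": "406", "song_name": "One Year of Love"}]},
    {"id": "500", "album_name": "The Original Bad Co. Anthology", "songs": [
        {"id": "503", "song_name": "Ready For Love"},
        {"id": "533", "song_name": "Hey, Hey"}]},
]

def is_soundtrack(album):
    return "soundtrack" in album["album_name"].lower()

def is_love_song(song):
    # Crude stand-in for an analyzed match on "love" (also catches "Loving").
    return "lov" in song["song_name"].lower()

def parents_where_child(blocks, child_pred):
    """Like {!parent which=...}CHILD_QUERY: parents with a matching child."""
    return [b for b in blocks if any(child_pred(s) for s in b["songs"])]

def children_where_parent(blocks, parent_pred):
    """Like {!child of=...}PARENT_QUERY: every child of a matching parent."""
    return [s for b in blocks if parent_pred(b) for s in b["songs"]]

# q=soundtrack&fq={!parent which="doctype:album"}love
albums = [b["id"] for b in parents_where_child(BLOCKS, is_love_song)
          if is_soundtrack(b)]
# q=love&fq={!child of="doctype:album"}soundtrack
songs = [s["id"] for s in children_where_parent(BLOCKS, is_soundtrack)
         if is_love_song(s)]

print(albums)  # ['100', '300']
print(songs)   # ['112', '114', '314']
```

The ids match the two responses above; only the order of the songs differs, because this sketch does not score anything.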

Block Join Caveats

  • Really new feature, still evolving (SOLR-5142)
  • Currently only supported as constant score queries
  • No mechanism in Solr to return fields from the matching children (or parent) docs yet (SOLR-5285)
  • Special _root_ field needed to handle deletes when updating a block:
    • Currently has a bug if you "update" a parent document to have no children (SOLR-5211)
    • Doesn't play nicely with deleting by id -- need to use delete by query to ensure all children are removed
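One way to remove a whole block is a delete-by-query on `_root_`. The sketch below only builds the XML update message; it assumes that `_root_` on every document in a block holds the id of the block's top-level parent, so verify that against your own schema and Solr version before relying on it. The helper name `delete_block_xml` is hypothetical.

```python
import xml.etree.ElementTree as ET

def delete_block_xml(root_id):
    """Build a delete-by-query message targeting every doc in one block.

    Assumes _root_ stores the id of the block's top-level parent document.
    """
    delete = ET.Element("delete")
    query = ET.SubElement(delete, "query")
    query.text = "_root_:" + root_id
    return ET.tostring(delete, encoding="unicode")

message = delete_block_xml("100")
print(message)  # <delete><query>_root_:100</query></delete>
```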

Compare & Contrast

When is Grouping Better?

  • Can pick grouping criteria on the fly at query time
  • Simpler indexing rules, no need to re-index entire group if one doc changes
  • Need more than one document per group (1 < group.limit)

When is Block Join Better?

  • Less redundancy in stored / indexed fields
  • More "efficient" to query because blocks are already denormalized into the index structure
  • Less complexity in dealing with results and pagination -- no "tricks" in response format

Questions?