
I am using Python (Django) to post requests to Solr (9.0.0) in JSON format (JSON Request API) to retrieve data. When I use faceting, I run into a problem that is probably due to the JSON format of the request. Can anyone please help?

My aim is to find all two-letter lowercase words (using the regex [a-z]{2}) in the field ‘sentence’ of a core.

This query runs correctly from SOLR admin (URL decoded) and gives the desired result:

http://localhost:8983/solr/lexdb_genl93/select?facet.field=sentence&facet.matches=[a-z]{2}&facet.query=sentence:/[a-z]{2}/&facet=true&indent=true&q.op=OR&q=*:*&rows=0&start=0

My request json which returns 400 (Unknown top-level key in JSON request : facet.field):

{
   "query":"*:*",
   "limit":0,
   "facet":"true",
   "facet.field":"sentence",
   "facet.matches":"[a-z]{2}",
   "facet.query":"sentence:/[a-z]{2}/"
}

The python code:

import json
import requests

prerule= '[a-z]{2}'
solr_query= {"query": "*:*", 'limit':0, "facet": "true", "facet.field": "sentence", "facet.matches": prerule, "facet.query": "sentence:/"+prerule+"/"}
solr_headers = {'Content-type': 'application/json'}
resp = requests.post(lexsettings.solr_url+'/query', data=json.dumps(solr_query), headers=solr_headers, auth=(lexsettings.solr_user, lexsettings.solr_pass))
jresp= json.loads(resp.content)
print('solrlist: '+str(jresp))

Error portion of the output (of print statement):

"error":{
    "metadata":[
    "error-class",
    "org.apache.solr.common.SolrException",
    "root-error-class",
    "org.apache.solr.common.SolrException"
    ],
    "msg":"Unknown top-level key in JSON request : facet.field",
    "code":400
}

Thank you for reading this far. Please let me know if you need any further information.

2 Answers


  1. The URL that works sends classic query parameters, while your code sends a JSON body, and the two are not interchangeable: keys such as facet.field are not valid top-level keys in the JSON Request API. Send them as query parameters instead, so your request should be:

    import requests
    
    prerule= '[a-z]{2}'
    solr_query= {
        "q": "*:*", 
        "rows": 0,  # "limit" is the JSON Request API name; the query parameter is "rows"
        "facet": "true", 
        "facet.field": "sentence", 
        "facet.matches": prerule, 
        "facet.query": f"sentence:/{prerule}/"
    }
    
    solr_headers = {'Content-type': 'application/json'}
    
    resp = requests.post(
        lexsettings.solr_url+'/query', 
        params=solr_query, # <--------- Note the params kwarg
        headers=solr_headers, 
        auth=(lexsettings.solr_user, lexsettings.solr_pass)
    )
    
    print(resp.json())
    

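    If you truly need to send this as JSON, you can use the json kwarg rather than data, and requests will serialize the dict for you. The body is the same as with json.dumps, though, so the classic parameters still cannot sit at the top level; one option is to nest them under the JSON Request API's "params" key:

    resp = requests.post(
        lexsettings.solr_url+'/query',
        json={"params": solr_query}, # <---------- Here
        headers=solr_headers,
        auth=(lexsettings.solr_user, lexsettings.solr_pass)
    )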
    
  2. When you’re using the JSON interface, facet refers to the JSON Facet API, and not to the old query string facet interface (see Supported properties and syntax). That is why facet.field is rejected as an unknown top-level key.

    Your query should therefore use the JSON Facet API syntax directly under facet:

    {
      "query": "*:*",
      "facet": {
        "sentences" : {
          "type": "query",
          "q": "sentence:/[a-z]{2}/"
        }
      }
    }
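
From Python, the body above could be built and posted roughly as follows. This is a minimal sketch, not a definitive implementation: the helper names build_facet_query_body and post_to_solr are made up for illustration, the "limit": 0 entry is carried over from the question, and the URL and credentials are whatever your own settings provide.

```python
import json


def build_facet_query_body(field, pattern):
    """Build a JSON Request API body holding one JSON Facet API query facet."""
    return {
        "query": "*:*",
        "limit": 0,  # only the facet count is wanted, not the documents
        "facet": {
            "sentences": {  # arbitrary label chosen for this facet
                "type": "query",
                "q": f"{field}:/{pattern}/",
            }
        },
    }


def post_to_solr(solr_url, body, auth=None):
    """POST the body to <solr_url>/query (hypothetical helper).

    Needs the third-party requests package, imported here so that the
    builder above can be used without it.
    """
    import requests

    resp = requests.post(f"{solr_url}/query", json=body, auth=auth)
    resp.raise_for_status()
    return resp.json()


body = build_facet_query_body("sentence", "[a-z]{2}")
print(json.dumps(body, indent=2))
```

With a reachable core, post_to_solr(lexsettings.solr_url, body, auth=(lexsettings.solr_user, lexsettings.solr_pass)) should return a response whose "facets" section holds the count under the "sentences" label.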
    

    However, be aware that this will give you a count of the number of entries matching your query (which is what a facet query is). From your comment on the other answer, it seems like you want to retrieve all tokens for the field that have two letters.

    In that case it might be better to use the terms component:

    ?q=*:*&terms=true&terms.fl=sentence&terms.regex=[a-z]{2}
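
A sketch of how this terms request could be prepared and its result read in Python. It makes two assumptions worth checking against your setup: that the terms component is enabled on the handler you call, and that the JSON response lists terms as a flat [term, count, term, count, ...] array. The helper names and the sample response below are made up for illustration and are not real output.

```python
from urllib.parse import urlencode


def build_terms_params(field, regex, limit=100):
    """Query parameters for a terms component request with a regex filter."""
    return {
        "q": "*:*",
        "terms": "true",
        "terms.fl": field,
        "terms.regex": regex,
        "terms.limit": limit,  # how many terms to return
        "wt": "json",
    }


def pair_terms(flat):
    """Turn a flat [term, count, term, count, ...] list into a dict."""
    return dict(zip(flat[0::2], flat[1::2]))


params = build_terms_params("sentence", "[a-z]{2}")
query_string = urlencode(params)  # percent-encodes the brackets and braces
print(query_string)

# Illustrative data only, shaped like the assumed flat response format.
sample_response = {"terms": {"sentence": ["of", 41, "to", 37, "in", 29]}}
counts = pair_terms(sample_response["terms"]["sentence"])
print(counts)
```

Passing params to requests.get(..., params=params) would do the same encoding for you; urlencode is used here only to keep the sketch free of third-party imports.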
    

    If this is a query you’re going to issue often, it might be more effective to do the extraction as a pre-processing step at index time, storing the result in a dedicated field such as sentence_tokens_of_length_2 that only keeps tokens of the wanted length. You can then generate proper facets across that field and get more efficient statistics and filtering, instead of having to run a regex against the whole list of tokens for each request.
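
One way such a pre-processing step might look on the Python side, before documents are sent to Solr. This is a sketch under assumptions: the tokenization here is a plain letters-only split and will not necessarily match what your Solr analyzer produces, the function names are hypothetical, and sentence_tokens_of_length_2 would need to be defined as a multi-valued field in your schema.

```python
import re

WORD = re.compile(r"[A-Za-z]+")        # crude tokenizer: runs of ASCII letters
TWO_LETTER = re.compile(r"[a-z]{2}")   # the pattern from the question


def two_letter_tokens(sentence):
    """Return the tokens that consist of exactly two lowercase letters."""
    return [tok for tok in WORD.findall(sentence) if TWO_LETTER.fullmatch(tok)]


def add_token_field(doc):
    """Return a copy of doc with the extra field filled from doc['sentence']."""
    enriched = dict(doc)
    enriched["sentence_tokens_of_length_2"] = two_letter_tokens(doc["sentence"])
    return enriched


doc = add_token_field({"id": "1", "sentence": "It is an apple, so go to NY"})
print(doc["sentence_tokens_of_length_2"])
```

Tokens containing uppercase letters, such as "It" and "NY", are dropped because the pattern only allows lowercase letters; lowercase the sentence first if you want them counted.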

    But if the query string format does what you want, use that.
