I am using Python (Django) to post requests to Solr (9.0.0) in JSON format (JSON Request API) to retrieve data. When using faceting, I run into a problem that is probably due to the JSON format of the request. Can anyone please help?
My aim is to find all two-letter lowercase words (using the regex [a-z]{2}) in the field ‘sentence’ of a core.
This query runs correctly from SOLR admin (URL decoded) and gives the desired result:
http://localhost:8983/solr/lexdb_genl93/select?facet.field=sentence&facet.matches=[a-z]{2}&facet.query=sentence:/[a-z]{2}/&facet=true&indent=true&q.op=OR&q=*:*&rows=0&start=0
My request json which returns 400 (Unknown top-level key in JSON request : facet.field):
{
"query":"*:*",
"limit":0,
"facet":"true",
"facet.field":"sentence",
"facet.matches":"[a-z]{2}",
"facet.query":"sentence:/[a-z]{2}/"
}
The python code:
import json
import requests
prerule= '[a-z]{2}'
solr_query= {"query": "*:*", 'limit':0, "facet": "true", "facet.field": "sentence", "facet.matches": prerule, "facet.query": "sentence:/"+prerule+"/"}
solr_headers = {'Content-type': 'application/json'}
resp = requests.post(lexsettings.solr_url+'/query', data=json.dumps(solr_query), headers=solr_headers, auth=(lexsettings.solr_user, lexsettings.solr_pass))
jresp= json.loads(resp.content)
print('solrlist: '+str(jresp))
Error portion of the output (of the print statement):
"error":{
"metadata":[
"error-class",
"org.apache.solr.common.SolrException",
"root-error-class",
"org.apache.solr.common.SolrException"
],
"msg":"Unknown top-level key in JSON request : facet.field",
"code":400
}
Thank you for reading this far. Please let me know if you need any further information.
2 Answers
It’s because the query that works is not at all the same as the request you are putting together. You need query-parameters for most of this, so your request should be:
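A minimal sketch of what sending these as query parameters could look like with requests, reusing the parameter names from the working admin URL. The helper names (build_facet_params, fetch_facets), the base URL argument and the choice of GET against /select are assumptions made for illustration, not the asker's actual setup:

```python
def build_facet_params(regex):
    """Return the classic (query-string) facet parameters from the working URL."""
    return {
        "q": "*:*",
        "rows": 0,
        "facet": "true",
        "facet.field": "sentence",
        "facet.matches": regex,
        "facet.query": f"sentence:/{regex}/",
    }


def fetch_facets(base_url, regex, auth=None):
    """Send the facet request as URL query parameters (hypothetical helper)."""
    import requests  # third-party dependency, imported lazily

    # params= makes requests URL-encode the values into the query string,
    # so Solr sees facet.field etc. as ordinary request parameters
    # rather than as top-level keys of a JSON body.
    resp = requests.get(
        f"{base_url}/select",
        params=build_facet_params(regex),
        auth=auth,
    )
    resp.raise_for_status()
    return resp.json()
```

It could then be called as fetch_facets("http://localhost:8983/solr/lexdb_genl93", "[a-z]{2}"), passing auth=(user, password) if the core requires it.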
If you truly need to send this as JSON, then use the json kwarg rather than the data one, i.e. pass json=solr_query instead of data=json.dumps(solr_query).

When you’re using the JSON interface, facet refers to the JSON Facet API, and not to the old query string facet interface (see Supported properties and syntax). Your query should therefore use the JSON Facet API syntax directly under facet.
However, be aware that this will give you a count of the number of entries matching your query (which is what a facet query is). From your comment on the other answer, it seems like you want to retrieve all tokens for the field that have two letters.
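As a rough sketch of the JSON Facet API form of the facet query from the question, the request body could be built like this. The function name and the facet label two_letter_words are arbitrary choices made for this example, and the body only reproduces the facet.query part (a count of matching documents), not facet.matches:

```python
import json


def build_json_facet_request(regex):
    """Build a JSON Request API body with a JSON Facet API query facet."""
    return {
        "query": "*:*",
        "limit": 0,
        "facet": {
            # "two_letter_words" is an arbitrary label chosen for this sketch
            "two_letter_words": {
                "type": "query",
                "q": f"sentence:/{regex}/",
            }
        },
    }


body = build_json_facet_request("[a-z]{2}")
print(json.dumps(body, indent=2))
```

With requests, this body would be sent with something like requests.post(lexsettings.solr_url + '/query', json=body, auth=...), which sets the JSON content type for you.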
If you want the tokens themselves, it might be better to use the terms component, which can restrict the returned terms with a regular expression (terms.regex).
If this is a query you’re going to issue often, it might be more effective to do the extraction as a pre-processing step, having a single field such as sentence_tokens_of_length_2 which only keeps tokens with the wanted length. You can then generate proper facets across that field and get more efficient statistics and filtering instead of having to rely on regex-ing against the whole list of tokens for each request.
But if the query string format does what you want, use that.
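For the terms component suggestion above, a request might be put together roughly as follows. This is a sketch under assumptions: it presumes the /terms request handler is reachable on the core, the helper name build_terms_url is made up for this example, and which terms come back depends on how the sentence field is analyzed at index time:

```python
from urllib.parse import urlencode


def build_terms_url(base_url, field, regex, limit=-1):
    """Build a terms component URL restricted by a regular expression."""
    params = {
        "terms": "true",
        "terms.fl": field,      # field whose indexed terms are listed
        "terms.regex": regex,   # only return terms matching this pattern
        "terms.limit": limit,   # a negative value lifts the cap on returned terms
    }
    return f"{base_url}/terms?{urlencode(params)}"


url = build_terms_url(
    "http://localhost:8983/solr/lexdb_genl93", "sentence", "[a-z]{2}"
)
print(url)
```

The resulting URL can be fetched with requests.get(url, auth=...) in the same way as the other requests in this thread.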