I have an Elasticsearch index containing "hit" documents (with fields like ip, timestamp, uri, etc.) which are populated from my nginx access logs.
I’m looking for a way to get the total number of hits per IP, but only for a subset of IPs, namely the ones that made a request today.
I know I can have a filtered aggregation by doing:
/search?size=0
{
'query': { 'bool': { 'must': [
{'range': { 'timestamp': { 'gte': $today}}},
{'query_string': {'query': 'status:200 OR status:404'}},
]}},
'aggregations': {'c': {'terms': {'field': 'ip', 'size': 99999}}}
}
but this only sums the hits that were made today. I want the total number of hits in the index, but only for IPs that have hits today. Is this possible?
-edit-
I’ve tried the global option but while
'aggregations': {'c': {'global': {}, 'aggs': {'c2': {'terms': {'field': 'remote_user', 'size': 99999}}}}}
returns counts for all IPs, it ignores my filter on timestamp (e.g. it includes IPs whose only hits were a couple of days ago)
2 Answers
In the example you have shared you have a query and your documents are filtered according to that. But you want your aggregation to take all documents regardless of the query.
This is why the global option exists. Sample query example:
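A minimal sketch of a request body that combines the question's query with a global aggregation, written as a Python dict. The function name, the aggregation names all_hits and per_ip, and the use of the date-math value "now/d" for "start of today" are assumptions for illustration; the field names come from the question. Note that, as the question's edit points out, the global bucket ignores the query entirely, so it returns counts for every IP in the index.

```python
import json


def build_global_agg_body(today):
    """Build a search body whose aggregation ignores the query.

    `today` is whatever value the range filter should start from
    (assumed here to be an Elasticsearch date-math string).
    """
    return {
        "size": 0,
        # The query narrows the hits that would be returned...
        "query": {"bool": {"must": [
            {"range": {"timestamp": {"gte": today}}},
            {"query_string": {"query": "status:200 OR status:404"}},
        ]}},
        "aggs": {
            # ...but a global aggregation runs over all documents,
            # regardless of the query above.
            "all_hits": {
                "global": {},
                "aggs": {
                    "per_ip": {"terms": {"field": "ip", "size": 99999}}
                },
            }
        },
    }


if __name__ == "__main__":
    # Print the JSON that would be sent as the request body.
    print(json.dumps(build_global_agg_body("now/d"), indent=2))
```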
There is a way to achieve what you want in a single query, but since it involves scripting, performance might suffer depending on the volume of data you run this query on.
The idea is to leverage the scripted_metric aggregation in order to build your own aggregation logic over the whole document set. What we do below is pretty simple:
Here is what the query looks like:
And here is what the response looks like:
I think that pretty much does what you expect.
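To make the idea concrete, here is a plain-Python simulation of the kind of logic a scripted_metric aggregation could implement. It is not Elasticsearch script code and not necessarily the exact approach of the answer above: the function names, the per-shard lists, and the sample documents are all hypothetical. Each shard counts every hit per IP and remembers whether the IP was seen today; the reduce step merges the shard results and keeps only the IPs flagged as seen today.

```python
def map_shard(docs, today):
    """Map phase (per shard): count all hits per IP and flag IPs seen today.

    Timestamps are assumed to be ISO-8601 strings in one consistent
    format, so plain string comparison orders them correctly.
    """
    state = {}
    for doc in docs:
        entry = state.setdefault(doc["ip"], {"total": 0, "seen_today": False})
        entry["total"] += 1
        if doc["timestamp"] >= today:
            entry["seen_today"] = True
    return state


def reduce_shards(states):
    """Reduce phase: merge shard states, keep only IPs with a hit today."""
    merged = {}
    for state in states:
        for ip, entry in state.items():
            target = merged.setdefault(ip, {"total": 0, "seen_today": False})
            target["total"] += entry["total"]
            target["seen_today"] = target["seen_today"] or entry["seen_today"]
    return {ip: e["total"] for ip, e in merged.items() if e["seen_today"]}


if __name__ == "__main__":
    today = "2024-05-02T00:00:00"
    # Two hypothetical shards holding made-up hit documents.
    shard_1 = [
        {"ip": "192.0.2.1", "timestamp": "2024-05-01T23:00:00"},
        {"ip": "192.0.2.1", "timestamp": "2024-05-02T08:15:00"},
        {"ip": "192.0.2.2", "timestamp": "2024-05-01T10:00:00"},
    ]
    shard_2 = [
        {"ip": "192.0.2.1", "timestamp": "2024-04-30T12:00:00"},
        {"ip": "192.0.2.3", "timestamp": "2024-05-02T09:30:00"},
        {"ip": "192.0.2.2", "timestamp": "2024-04-29T07:45:00"},
    ]
    states = [map_shard(shard_1, today), map_shard(shard_2, today)]
    print(reduce_shards(states))
```

In this sketch 192.0.2.2 is dropped because none of its hits fall on or after `today`, while the other two IPs keep their full hit counts across the whole data set.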
The other option (more performant because no script) requires you to make two queries. First, a query with the date range and status check with a terms aggregation to retrieve all IPs that have hits today (like you do now), and then a second query where you filter on those IPs (using a terms query) over the whole index (no date range or status check) and get the hit count for each of them using a terms aggregation.
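The two-query approach could be sketched as follows, building both request bodies as Python dicts. The function names and the aggregation names today_ips and hits_per_ip are hypothetical, and sending the requests is left out; the sketch only assumes the usual terms-aggregation response shape (a list of buckets, each with a key).

```python
def build_today_ips_body(today):
    """First query: date range and status check, terms aggregation on ip."""
    return {
        "size": 0,
        "query": {"bool": {"must": [
            {"range": {"timestamp": {"gte": today}}},
            {"query_string": {"query": "status:200 OR status:404"}},
        ]}},
        "aggs": {"today_ips": {"terms": {"field": "ip", "size": 99999}}},
    }


def extract_ips(response):
    """Pull the IP keys out of the first query's aggregation buckets."""
    buckets = response["aggregations"]["today_ips"]["buckets"]
    return [bucket["key"] for bucket in buckets]


def build_total_hits_body(ips):
    """Second query: no date or status filter, only the IPs found above."""
    return {
        "size": 0,
        "query": {"terms": {"ip": ips}},
        # One bucket per requested IP; the size must be at least 1.
        "aggs": {
            "hits_per_ip": {"terms": {"field": "ip", "size": max(len(ips), 1)}}
        },
    }


if __name__ == "__main__":
    # A made-up response fragment standing in for the first query's result.
    fake_response = {"aggregations": {"today_ips": {"buckets": [
        {"key": "192.0.2.1", "doc_count": 4},
        {"key": "192.0.2.3", "doc_count": 1},
    ]}}}
    ips = extract_ips(fake_response)
    print(build_total_hits_body(ips))
```

If the first query returns a very large number of IPs, the second query may need to be sent in batches, since Elasticsearch limits how many values a terms query accepts.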