I am trying to find the MIN and MAX of a simple data set….
Score
15
12
8
28
5
Simply doing this would work.
SELECT MIN(Score) AS minScore, MAX(Score) AS maxScore FROM Scores;
Which should return 5 and 28.
But what if somehow a huge number ended up in the set throwing off the results…
Score
15
120000
8
28
5
This would return 5 and 120000.
What I need to do is to somehow have the query detect the unusually large number and ignore it in the result returning 5 and 28.
The actual set would have a large numbers in a certain unknown range, so the anomaly should stand out. Was hoping for some sort of mathematical way to omit the anomaly.
Not exactly the same as the similar question given….
This seems to omit some of the top numbers outside of the anomaly, which probably wouldn’t change the average much, but in my case I need the actual MIN and MAX values omitting the anomaly. Also, I cant use a subquery, must be a straight calculation to determine the number.
2
Answers
It would be pretty simple to exclude values outside of some sensible range.
Demo using your example values:
The CASE expression returns NULL by default if the value doesn’t satisfy its condition.
Most aggregate functions like MIN(), MAX(), SUM(), COUNT() ignore NULLs.
You can use some other expression if you want. The one I used,
BETWEEN 0 AND 100
is just an example.That’s a solution that works with a plain computation, as you requested. If you want anomalous values, that is, values that are outside of some standard deviation, that may be possible with window functions. You’ll have to clarify what you consider anomalous.
One option to exclude the scores that are considered to be anomaly is to calculate the percentage of the anomaly beeing bigger than average …
If there are some border intermediate scores ( 2400, 620 ) then you should decide to include them or not by adjusting the percentage condition …
… calculate what is the percentage of how much the current score is bigger than average
The results will depend on the size of anomaly and you should decide what is the margin when the score is too big to be taken into account.
The bigger the anomaly the narrower percentage to be selected. Expected it to be less than -95% for rows that should be included.
Use the above code as subquery (why not) and put the condition in Where clause that fits your actual data…
R e s u l t :
… if you change the Where condition value -95 to -65 then you’ll get max_score 620 …
fiddle