skip to Main Content

I am trying to find the MIN and MAX of a simple data set….

Score
15
12
8
28
5

Simply doing this would work.

SELECT MIN(Score) AS minScore, MAX(Score) AS maxScore FROM Scores;

Which should return 5 and 28.
But what if somehow a huge number ended up in the set throwing off the results…

Score
15
120000
8
28
5

This would return 5 and 120000.

What I need to do is to somehow have the query detect the unusually large number and ignore it in the result returning 5 and 28.

The actual set would have a large numbers in a certain unknown range, so the anomaly should stand out. Was hoping for some sort of mathematical way to omit the anomaly.

Not exactly the same as the similar question given….
This seems to omit some of the top numbers outside of the anomaly, which probably wouldn’t change the average much, but in my case I need the actual MIN and MAX values omitting the anomaly. Also, I cant use a subquery, must be a straight calculation to determine the number.

2

Answers


  1. It would be pretty simple to exclude values outside of some sensible range.

    Demo using your example values:

    mysql> select min(score) as MinScore, max(score) as MaxScore from mytable;
    +----------+----------+
    | MinScore | MaxScore |
    +----------+----------+
    |        5 |    12000 |
    +----------+----------+
    
    mysql> select 
      min(case when score between 0 and 100 then score end) as MinScore,
      max(case when score between 0 and 100 then score end) as MaxScore 
    from mytable;
    +----------+----------+
    | MinScore | MaxScore |
    +----------+----------+
    |        5 |       28 |
    +----------+----------+
    

    The CASE expression returns NULL by default if the value doesn’t satisfy its condition.

    Most aggregate functions like MIN(), MAX(), SUM(), COUNT() ignore NULLs.

    You can use some other expression if you want. The one I used, BETWEEN 0 AND 100 is just an example.

    That’s a solution that works with a plain computation, as you requested. If you want anomalous values, that is, values that are outside of some standard deviation, that may be possible with window functions. You’ll have to clarify what you consider anomalous.

    Login or Signup to reply.
  2. One option to exclude the scores that are considered to be anomaly is to calculate the percentage of the anomaly beeing bigger than average …
    If there are some border intermediate scores ( 2400, 620 ) then you should decide to include them or not by adjusting the percentage condition …

    --      S a m p l e    D a t a : 
    WITH
        tbl ( score ) As
          ( Select 15 Union All 
            Select 12 Union All 
            Select  8 Union All 
            Select 28 Union All
            Select 12000 Union All 
            Select 2400 Union All
            Select 620 Union All
            Select  5 
          )
    

    … calculate what is the percentage of how much the current score is bigger than average
    The results will depend on the size of anomaly and you should decide what is the margin when the score is too big to be taken into account.

    Select tbl.*, 
           ( score - Avg(score) Over() ) * 100 / Avg(score) Over() as pct_over_avg
    From tbl  
    
    score pct_over_avg
    15 -99.20466596
    12 -99.36373277
    8 -99.57582185
    28 -98.51537646
    12000 536.26723224
    2400 27.25344645
    620 -67.12619300
    5 -99.73488865

    The bigger the anomaly the narrower percentage to be selected. Expected it to be less than -95% for rows that should be included.
    Use the above code as subquery (why not) and put the condition in Where clause that fits your actual data…

    --       S Q L :  
    Select    Max(t.score) as max_score, Min(t.score) as min_score
    From      ( Select tbl.*, 
                       ( score - Avg(score) Over() ) * 100 / Avg(score) Over() as pct_over_avg
                From tbl 
              ) t
    Where     t.pct_over_avg < -95
    

    R e s u l t :

    max_score min_score
    28 5

    … if you change the Where condition value -95 to -65 then you’ll get max_score 620 …

    fiddle

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search