skip to Main Content

I was wondering why this line that is underlined in red isnt AVG(tenure) OVER() AS avg_tenure? The output table is correct the way it is below, but not sure when we should use AVG() normally vs. AVG() OVER() as a window function?

SELECT gender, Churn,
COUNT(*) as count_by_category,
COUNT(*)/SUM(COUNT(*)) OVER () AS pct_of_total,
AVG(tenure) AS avg_tenure
FROM churn.telco
GROUP BY gender, Churn;

enter image description here

2

Answers


  1. In MySQL, AVG() and AVG() OVER() are both functions that can be used to calculate the average value of a column or expression in a query. However, they are used in different contexts and have different syntax and behavior.

    AVG() is an aggregate function that is used with the GROUP BY clause to calculate the average value of a column for each group of rows. For example:

    SELECT department, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department;
    

    This query calculates the average salary for each department in the employees table.

    AVG() OVER() is a window function that is used to calculate the average value of a column over a specified window of rows, without grouping them. For example:

    SELECT name, salary, AVG(salary) OVER () AS avg_salary
    FROM employees;
    

    This query calculates the average salary for all rows in the employees table and adds it as a new column to the result set. The window frame is not specified, so the entire result set is used.

    The main difference between AVG() and AVG() OVER() is that the former requires a GROUP BY clause and returns a single row per group, while the latter does not require grouping and returns the same number of rows as the input table, with an additional column for the calculated average value.

    Note that AVG() OVER() can also be used with a window frame specification, to calculate the average value over a specific range of rows, such as the last three rows. For example:

    SELECT name, salary, AVG(salary) OVER (ORDER BY hire_date ROWS 2 PRECEDING) AS avg_salary_last_three
    FROM employees;
    

    This query calculates the average salary for each row and the two preceding rows, ordered by hire date.

    Login or Signup to reply.
  2. AVG OVER and AVG do very different things.

    Here, you are grouping by gender and Churn; plain AVG(tenure) returns an average tenure from all source rows with the same gender and Churn. If you had no group by, selecting AVG or any other aggregate function implicitly groups all source rows into one result row.

    AVG(tenure) OVER () on the other hand, returns an average of what tenure would be for all result rows returned by the select. Since if you were to do select tenure from ... group by gender, Churn it would return an arbitrary tenure from one of the rows for each gender, Churn, taking an average of that is not useful (and in fact is not allowed under the recommended ONLY_FULL_GROUP_BY sql_mode).

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search