skip to Main Content

I am trying to create a correlation function in MySql 5 that takes in two columns (x, y) and returns a single Decimal value (correlation_coefficient).

Here is my function definition

-- Calculate pearson correlation coefficient.
-- INPUT: X and Y should be columns of data (decimal) 
-- RETURN: A value between -1 and 1 depending on the strength of the relationship between the 2 columns e.g. 0.43.

DELIMITER $$

CREATE FUNCTION PearsonCorrelation(
    x Decimal(10,1),
    y Decimal(10,1)
)
RETURNS Decimal(10,1)
DETERMINISTIC
BEGIN
    DECLARE correlation_coefficient  DECIMAL(3,2);
    SET correlation_coefficient = (avg(x * y) - avg(x) * avg(y)) / (sqrt(avg(x * x) - avg(x) * avg(x)) * sqrt(avg(y * y) - avg(y) * avg(y)));
    RETURN(correlation_coefficient);
END $$

DELIMITER ;

However, when I execute the function call, I get the error ‘invalid use of group function’. Here are some test data, where the correlation coefficient returned from the dataset should be
0.86.

CREATE TABLE data_table
(
x Decimal(3,1) NOT NULL,
y Decimal(3,1) NOT NULL
)
INSERT INTO data_table 
VALUES(11.2, 10.4),
(9.7, 4.6),
(4.5, 2.1)

I would intend to call this function as follows:

Select PearsonCorrelation(x,y) as corrcoef
FROM data_table

Given feedback, my question may be reframed to say, is it possible to pass in table columns as arguments in to the correlation function and if so, how do I adapt the function to achieve this?

2

Answers


  1. AVG() Function has no meaning here..
    only in a select statement like this, where it will give you the average of values.

    SELECT * FROM Products
    WHERE Price > (SELECT AVG(Price) FROM Products);
    

    just remove all AVG() from the code

    Login or Signup to reply.
  2. You’re trying to pass columns of table data to a MySQL stored function. You Can’t Do Thatâ„¢. Aggregate functions like AVG() only work in the context of SQL statements like SELECT — statements that mention FROM sometable.

    You could try something like this.

     SELECT (AVG(x * y) - AVG(x) * AVG(y)) /
            (SQRT(AVG(x * x) - AVG(x) * AVG(x)) *
             SQRT(AVG(y * y) - AVG(y) * AVG(y))) correlation_coefficient
      FROM data_table;
    

    You should know that MySQL translates all numbers to double-precision floating point before doing arithmetic on them. DECIMAL(3,1) is probably not a good choice of data format.

    Some other makes and versions of table server let you create your own aggregate functions, but not MariaDB / MySQL.

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search