Downloads

Information

Lies, damn lies, and MySQL UDFs

about

MySQL UDFs offer a powerful way to extend the functionality of your MySQL database.

The UDFs on this site are all free: free to use, free to distribute, free to modify and free of charge. Even for commercial projects.

lib_mysqludf_stat

MySQL supports a few statistical functions natively. However, for moderately more advanced statistical analysis, you need to resort to custom queries or third party tools. This library lib_mysqludf_stat aims to offer a complete set of commonly used statistical functions.

This library lib_mysqludf_stat aims to offer a complete set of commonly used statistical functions. The following functions are currently supported:

  1. stat_accum_int - calculates a running sum of integer values.
  2. stat_accum_double - calculates a running sum of double (floating point) values.
  3. stat_pmcc_sample - computes the sample correlation using Pearson's product moment correlation coefficient (aggregate).
  4. stat_ptbis_sample - computes the sample point biserial correlation (aggregate).

Use lib_mysqludf_stat_info() to obtain information about the currently installed version of lib_mysqludf_stat.

stat_accum_int

stat_accum_int accumulates the argument integer value and calculates a 'running sum'. It is like sum except that it calculates the intermediate result for each row instead of the final result for the entire group or set.

Syntax

stat_accum_int(x)

Parameters and Return Values

x
An integer value that is to contribute to the running sum. If x is not an integer type, it is coerced into one. If x is NULL, it will be ignored.
returns
The sum of the current and prior NOT NULL arguments passed to stat_accum_int.

Examples

Cumulative frequency:

SELECT frequency
,      stat_accum_int(frequency) as cumulative_frequency    
FROM   observations
    

Emulation of Oracle rownum pseudocolumn:

SELECT stat_accum_int(1) as rownum  
FROM   aTable
    

See Also

stat_pmcc_samp

The stat_pmcc_samp aggregate function calculates Pearson's product moment correlation coefficient. Correlation indicates the strength of the relationship between two variables. Correlation ranges from 1 (strong relationship) to -1 (strong inverse relationship) and correlation around 0 indicates the absence of a relationship. An example of variables with a strong correlation:

There are several metrics to compute correlation, each with their own specific applications. Pearson's correlation works best in those cases where both variables follow the normal distribution. Use point-biserial to calculate the correlation in those cases where on the variables follows a normal distribution and the other variable is dichotomous (for examples: true or false; pass or fail)

Syntax

stat_pmcc_samp(x,y)

Parameters and Return Values

x
An real value expressing the abundance of one variable. If x is not an real type, it is coerced into a real type. If x is NULL, that particular x,y pair is not used in the calculation.
y
An real value expressing the abundance of the other variable. If y is not an real type, it is coerced into a real type. If y is NULL, that particular x,y pair is not used in the calculation.
returns
A real value between 1 and -1. The closer the correlation is to 1, the stronger the relationship between the variables. The closer the correlation is to 0, the weaker the relationship between the variables. Negative correlations expres an inverse relationship between the variables.

Examples

Relationship between smoking and age:

SELECT stat_pmcc_samp(age,sigarettes_per_day) 
FROM   smokers
    

Relationship between smoking and blood pressure:

SELECT stat_pmcc_samp(sigarettes_per_day,blood_pressure) 
FROM   smokers
    

See Also

stat_ptbis_samp

The stat_ptbis_samp aggregate function calculates the point biserial correlation coefficient. Correlation indicates the strength of the relationship between two variables. Correlation ranges from 1 (strong relationship) to -1 (strong inverse relationship) and correlation around 0 indicates the absence of a relationship.

There are several metrics to compute correlation, each with their own specific applications. Use point-biserial to calculate the correlation in those cases where on the variables follows a normal distribution and the other variable is dichotomous. In those cases where both variables follow the normal distribution, use Pearson's correlation coefficient.

Syntax

stat_ptbis_samp(x,y)

Parameters and Return Values

x
An real value expressing the abundance of one variable. If x is not an real type, it is coerced into a real type. If x is NULL, that particular x,y pair is not used in the calculation.
y
An integer value expressing the mode of the other variable. There are two modes: either y is 0, or y is not equal to 0. If y is not an integer type, it is coerced into an integer type. If y is NULL, that particular x,y pair is not used in the calculation.
returns
A real value between 1 and -1. The closer the correlation is to 1, the stronger the relationship between the variables. The closer the correlation is to 0, the weaker the relationship between the variables. Negative correlations expres an inverse relationship between the variables.

Examples

Relationship between smoking and age:

SELECT stat_ptbis_samp(age,sigarettes_per_day) 
FROM   smokers
    

Relationship between smoking and blood pressure:

SELECT stat_ptbis_samp(sigarettes_per_day,blood_pressure) 
FROM   smokers
    

See Also