lib_mysqludf_stat
MySQL supports a few statistical functions natively.
However, for moderately more advanced statistical analysis, you need to resort to custom queries or third party tools.
This library lib_mysqludf_stat aims to offer a complete set of commonly used statistical functions.
This library lib_mysqludf_stat aims to offer a complete set of commonly used statistical functions.
The following functions are currently supported:
stat_accum_int- calculates a running sum of integer values.stat_accum_double- calculates a running sum of double (floating point) values.stat_pmcc_sample- computes the sample correlation using Pearson's product moment correlation coefficient (aggregate).stat_ptbis_sample- computes the sample point biserial correlation (aggregate).
Use lib_mysqludf_stat_info() to obtain information about the currently installed version of lib_mysqludf_stat.
stat_accum_int
stat_accum_int accumulates the argument integer value and calculates a 'running sum'.
It is like sum except that it calculates the intermediate result for each row instead of the final result for the entire group or set.
Syntax
stat_accum_int(x)
Parameters and Return Values
x-
An integer value that is to contribute to the running sum.
If
xis not an integer type, it is coerced into one. IfxisNULL, it will be ignored. - returns
-
The sum of the current and prior
NOT NULLarguments passed tostat_accum_int.
Examples
Cumulative frequency:
SELECT frequency
, stat_accum_int(frequency) as cumulative_frequency
FROM observations
Emulation of Oracle rownum pseudocolumn:
SELECT stat_accum_int(1) as rownum
FROM aTable
See Also
stat_pmcc_samp
The stat_pmcc_samp aggregate function calculates Pearson's product moment correlation coefficient.
Correlation indicates the strength of the relationship between two variables.
Correlation ranges from 1 (strong relationship) to -1 (strong inverse relationship) and
correlation around 0 indicates the absence of a relationship.
An example of variables with a strong correlation:
There are several metrics to compute correlation, each with their own specific applications. Pearson's correlation works best in those cases where both variables follow the normal distribution. Use point-biserial to calculate the correlation in those cases where on the variables follows a normal distribution and the other variable is dichotomous (for examples: true or false; pass or fail)
Syntax
stat_pmcc_samp(x,y)
Parameters and Return Values
x-
An
realvalue expressing the abundance of one variable. Ifxis not anrealtype, it is coerced into arealtype. IfxisNULL, that particularx,ypair is not used in the calculation. y-
An
realvalue expressing the abundance of the other variable. Ifyis not anrealtype, it is coerced into arealtype. IfyisNULL, that particularx,ypair is not used in the calculation. - returns
-
A
realvalue between1and-1. The closer the correlation is to1, the stronger the relationship between the variables. The closer the correlation is to0, the weaker the relationship between the variables. Negative correlations expres an inverse relationship between the variables.
Examples
Relationship between smoking and age:
SELECT stat_pmcc_samp(age,sigarettes_per_day)
FROM smokers
Relationship between smoking and blood pressure:
SELECT stat_pmcc_samp(sigarettes_per_day,blood_pressure)
FROM smokers
See Also
stat_ptbis_samp
The stat_ptbis_samp aggregate function calculates the point biserial correlation coefficient.
Correlation indicates the strength of the relationship between two variables.
Correlation ranges from 1 (strong relationship) to -1 (strong inverse relationship) and
correlation around 0 indicates the absence of a relationship.
There are several metrics to compute correlation, each with their own specific applications. Use point-biserial to calculate the correlation in those cases where on the variables follows a normal distribution and the other variable is dichotomous. In those cases where both variables follow the normal distribution, use Pearson's correlation coefficient.
Syntax
stat_ptbis_samp(x,y)
Parameters and Return Values
x-
An
realvalue expressing the abundance of one variable. Ifxis not anrealtype, it is coerced into arealtype. IfxisNULL, that particularx,ypair is not used in the calculation. y-
An
integervalue expressing the mode of the other variable. There are two modes: eitheryis0, oryis not equal to0. Ifyis not anintegertype, it is coerced into anintegertype. IfyisNULL, that particularx,ypair is not used in the calculation. - returns
-
A
realvalue between1and-1. The closer the correlation is to1, the stronger the relationship between the variables. The closer the correlation is to0, the weaker the relationship between the variables. Negative correlations expres an inverse relationship between the variables.
Examples
Relationship between smoking and age:
SELECT stat_ptbis_samp(age,sigarettes_per_day)
FROM smokers
Relationship between smoking and blood pressure:
SELECT stat_ptbis_samp(sigarettes_per_day,blood_pressure)
FROM smokers