Kernel Density Estimation
Kernel density estimation (KDE) is a non-parametric method to estimate the probability density function of a random variable through kernel smoothing [9].
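The kernel density estimate $\hat{f}_h$ of a density $f$ based on a random sample $X_1, \ldots, X_n$ is defined as

$$\hat{f}_h(x) = \frac{1}{n h} \sum_{i=1}^{n} K\left(\frac{x - X_i}{h}\right),$$

where $h > 0$ is the bandwidth and $K$ is the kernel function.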
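To make the estimator concrete, the following is a minimal sketch of a kernel density estimate evaluated at a single point. It assumes a Gaussian kernel and the normal reference rule of thumb for the bandwidth; both are illustrative choices, and the names `gaussian_kernel`, `rule_of_thumb_bandwidth` and `kde` are hypothetical helpers that are not part of the package described here.

```julia
using Statistics  # std

# Standard Gaussian kernel K(u); an assumed choice for this sketch.
gaussian_kernel(u) = exp(-u^2 / 2) / sqrt(2π)

# Normal reference rule-of-thumb bandwidth: h = 1.06 * σ * n^(-1/5).
# Illustrative only; the automatic selector described in the text may differ.
rule_of_thumb_bandwidth(xs) = 1.06 * std(xs) * length(xs)^(-1 / 5)

# Kernel density estimate at point x for the sample xs and bandwidth h.
function kde(x, xs; h=rule_of_thumb_bandwidth(xs))
    n = length(xs)
    return sum(gaussian_kernel((x - xi) / h) for xi in xs) / (n * h)
end
```

For example, `kde(5.0, [4.8, 5.1, 5.3, 9.7, 10.2])` evaluates the estimate at 5 with the rule-of-thumb bandwidth. A smaller `h` follows the data more closely, while a larger `h` gives a smoother estimate.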
The kernel density estimation is exposed through the EmpiricalDistribution. Since the bandwidth is selected automatically, only a vector containing the data must be passed to the constructor.
d = EmpiricalDistribution(x)
Internally, we perform the kernel density estimation to obtain the PDF of the distribution. From this PDF we estimate the support of the distribution through numerical root finding. The CDF and the quantile function (inverse CDF) are interpolated from the numerical integral of the PDF, using a default number of interpolation points. As a ContinuousUnivariateDistribution, the EmpiricalDistribution can be used in the same way as any of the native distributions from Distributions.jl.
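The interpolation step can be sketched as follows. This is an illustration under stated assumptions, not the package's internal code: it uses a Gaussian kernel with a hand-picked bandwidth, fixes the integration bounds manually instead of estimating the support by root finding, and integrates with the trapezoidal rule. The names `kde_pdf`, `cdf_table`, `lininterp`, `approx_cdf` and `approx_quantile` are hypothetical.

```julia
# Gaussian-kernel density estimate with a fixed, hand-picked bandwidth h.
kde_pdf(x, xs, h) =
    sum(exp(-((x - xi) / h)^2 / 2) for xi in xs) / (length(xs) * h * sqrt(2π))

# Tabulate the CDF on a regular grid with the cumulative trapezoidal rule.
function cdf_table(pdf, lo, hi, n)
    grid = range(lo, hi; length=n)
    p = pdf.(grid)
    F = zeros(n)
    for i in 2:n
        F[i] = F[i-1] + (p[i-1] + p[i]) / 2 * step(grid)
    end
    F ./= F[end]  # renormalize so that F(hi) == 1
    return collect(grid), F
end

# Piecewise-linear interpolation of the table (xs, ys) at q; xs must be sorted.
function lininterp(xs, ys, q)
    q <= xs[1] && return ys[1]
    q >= xs[end] && return ys[end]
    i = searchsortedlast(xs, q)
    t = (q - xs[i]) / (xs[i+1] - xs[i])
    return ys[i] + t * (ys[i+1] - ys[i])
end

data = [4.0, 5.0, 6.0, 9.0, 10.0, 11.0]
grid, F = cdf_table(x -> kde_pdf(x, data, 1.0), -1.0, 16.0, 2001)

approx_cdf(x) = lininterp(grid, F, x)       # CDF: interpolate F over the grid
approx_quantile(u) = lininterp(F, grid, u)  # quantile: swap the roles of grid and F
```

Because the tabulated CDF is non-decreasing, the quantile function is obtained by interpolating the same table with its axes swapped. More grid points reduce the interpolation error at the cost of more PDF evaluations.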
Example
As an example we consider synthetic data generated from a bimodal distribution and fit the empirical distribution.
x = [rand(Normal(5), 500)..., rand(Normal(10), 500)...]
ed = EmpiricalDistribution(x)
Next, we plot the normalized histogram of the data and the resulting PDF.