
Kernel Density Estimation

Kernel density estimation (KDE) is a non-parametric method to estimate the probability density function of a random variable through kernel smoothing [9].

The kernel density estimate $\hat{f}_h$ of a univariate density $f$ based on a random sample $X_1, \ldots, X_n$ is defined as

$$\hat{f}_h(x) = n^{-1} \sum_{i=1}^{n} h^{-1} K\left\{\frac{x - X_i}{h}\right\},$$

where $h$ is the so-called bandwidth and $K$ is the kernel function. The kernel function is assumed to be a symmetric probability density; in UncertaintyQuantification.jl it is set to a Gaussian density. The bandwidth $h$, also called the smoothing parameter, has a strong effect on the resulting density estimate. Various methods exist to select an optimal bandwidth. Here we apply the method developed by Sheather & Jones [10] for its excellent performance and straightforward implementation.
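To make the formula concrete, here is a minimal sketch of the estimator with a Gaussian kernel. The function name `kde_pdf` is illustrative, and for brevity the bandwidth is chosen with Silverman's simple rule of thumb rather than the Sheather & Jones method used by the package:

```julia
using Statistics

# Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2π)
gauss(u) = exp(-u^2 / 2) / sqrt(2π)

# Evaluate the kernel density estimate f̂_h at a point x
# (illustrative helper, not the package implementation)
function kde_pdf(x, data, h)
    n = length(data)
    return sum(gauss((x - Xi) / h) for Xi in data) / (n * h)
end

data = randn(1000)                           # standard normal sample
h = 1.06 * std(data) * length(data)^(-1/5)   # Silverman's rule of thumb
kde_pdf(0.0, data, h)                        # density estimate at zero
```

For a large standard normal sample, the estimate at zero should lie close to the true density value $1/\sqrt{2\pi} \approx 0.399$.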

Kernel density estimation is exposed through the `EmpiricalDistribution`. Since the bandwidth is selected automatically, only a vector containing the data must be passed to the constructor.

```julia
d = EmpiricalDistribution(x)
```

Internally, we perform the kernel density estimation to obtain the PDF of the distribution. From this PDF, we estimate the support of the distribution through numerical root finding. The CDF and the quantile function (inverse CDF) are interpolated from the numerical integral of the PDF. The number of points used for this interpolation (defaults to $10^4$) can be passed to the constructor as an optional second parameter. As a `ContinuousUnivariateDistribution`, the `EmpiricalDistribution` can be used like any of the native distributions from Distributions.jl.
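For instance, once constructed, the distribution supports the standard Distributions.jl API. A short sketch, assuming `x` is any data vector:

```julia
using UncertaintyQuantification

x = randn(1000)
d = EmpiricalDistribution(x)

pdf(d, 0.0)       # density at a point
cdf(d, 0.0)       # interpolated CDF
quantile(d, 0.5)  # inverse CDF (median)
rand(d, 10)       # sampling works as well
```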

Example

As an example, we consider synthetic data generated from a bimodal distribution and fit an empirical distribution.

```julia
x = [rand(Normal(5), 500)..., rand(Normal(10), 500)...]
ed = EmpiricalDistribution(x)
```

Next, we plot the normalized histogram of the data and the resulting PDF.
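Such a plot can be sketched with Plots.jl as follows; the plotting package and styling choices here are illustrative, not prescribed by UncertaintyQuantification.jl:

```julia
using Plots, UncertaintyQuantification

# Normalized histogram of the data overlaid with the fitted KDE
histogram(x; normalize=:pdf, alpha=0.5, label="data")
plot!(t -> pdf(ed, t), minimum(x), maximum(x); lw=2, label="KDE")
```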