Bayesian Updating
Bayesian updating is a method of statistical inference where Bayes' theorem is used to update the probability distributions of model parameters based on prior beliefs and available data.
Bayes' Theorem
Bayes' theorem is defined as

P(\theta \mid D) = \frac{P(D \mid \theta)\, P(\theta)}{P(D)},

where \theta are the model parameters, D is the available data, P(\theta) is the prior distribution, P(D \mid \theta) is the likelihood, and P(D) is the evidence. The evidence serves as a normalizing constant for the posterior probability. However, since it can be difficult or even impractical to calculate, it is often disregarded. Instead, only the product of likelihood and prior is used, as it is proportional to the posterior probability:

P(\theta \mid D) \propto P(D \mid \theta)\, P(\theta).

Based on this relationship, the posterior probability can be approximated without calculation of P(D).
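In practice one works with logarithms, which turns the product into a numerically stable sum. As a minimal sketch in plain Julia (a hypothetical one-dimensional example, not part of the package API):

using Distributions

# Unnormalized log-posterior: log p(θ|D) = log p(D|θ) + log p(θ) + const,
# so the evidence P(D) never needs to be evaluated.
loglikelihood(θ) = logpdf(Normal(θ, 1.0), 0.5)  # one observed datum y = 0.5
logprior(θ) = logpdf(Normal(0.0, 2.0), θ)
logposterior(θ) = loglikelihood(θ) + logprior(θ)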
Markov Chain Monte Carlo
Markov chains are sequences of random variables, where each variable depends only on the previous one. In a discrete state space, this Markov property reads

P(X_{n+1} = x \mid X_1 = x_1, \dots, X_n = x_n) = P(X_{n+1} = x \mid X_n = x_n).

A Markov chain is called ergodic or irreducible when it is possible to reach each state from every other state with a positive probability. Markov chains that are ergodic and time-homogeneous, i.e. where the transition probabilities between states do not depend on time, have a unique stationary distribution \pi such that

\pi(x') = \sum_{x} \pi(x) P(x \to x').
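As a small self-contained illustration (plain Julia, independent of the package), the stationary distribution of a two-state chain can be computed as the left eigenvector of its transition matrix:

using LinearAlgebra

# Transition matrix of a two-state chain; rows sum to one.
P = [0.9 0.1; 0.5 0.5]

# The stationary distribution is the left eigenvector of P for eigenvalue 1,
# i.e. the eigenvector of P' with the largest eigenvalue.
λ, V = eigen(Matrix(P'))
stationary = real.(V[:, argmax(real.(λ))])
stationary ./= sum(stationary)  # normalize to a probability vector

stationary  # ≈ [0.833, 0.167]; satisfies πP = π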
The goal of Markov chain Monte Carlo (MCMC) sampling methods is to construct a Markov chain whose stationary distribution is equal to the posterior distribution of Bayes' theorem. Samples generated from such a Markov chain are then equivalent to random samples from the desired distribution. The very first MCMC algorithm is the Metropolis-Hastings (MH) algorithm.
Metropolis-Hastings
The Metropolis-Hastings algorithm was published in 1970 by W. K. Hastings [21]. The MH algorithm is a random-walk algorithm that provides a selection criterion for choosing the next sample: a candidate \theta^* drawn from a proposal distribution q(\theta^* \mid \theta) is accepted with probability

\alpha = \min\left(1, \frac{P(\theta^* \mid D)\, q(\theta \mid \theta^*)}{P(\theta \mid D)\, q(\theta^* \mid \theta)}\right).

Substituting the posterior with Bayes' theorem yields

\alpha = \min\left(1, \frac{P(D \mid \theta^*)\, P(\theta^*)\, q(\theta \mid \theta^*)}{P(D \mid \theta)\, P(\theta)\, q(\theta^* \mid \theta)}\right).

Note how the normalization constant P(D) cancels in this ratio and therefore never needs to be computed. In practice, a random number u is drawn from \mathcal{U}(0, 1) and the candidate is accepted if u \le \alpha.
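The acceptance step translates almost directly into code. The following is a minimal random-walk MH sketch in plain Julia (not the package implementation), assuming a one-dimensional log-density as the target:

using Distributions

# Minimal random-walk Metropolis-Hastings for a 1D log-density.
function metropolis_hastings(logtarget, x0, n; step=0.2)
    samples = Vector{Float64}(undef, n)
    x = x0
    for i in 1:n
        x_candidate = x + rand(Normal(0, step))        # symmetric proposal
        log_α = logtarget(x_candidate) - logtarget(x)  # q cancels for symmetric proposals
        if log(rand()) <= log_α                        # accept with probability min(1, α)
            x = x_candidate
        end
        samples[i] = x
    end
    return samples
end

# Example: sample from a standard normal target.
samples = metropolis_hastings(x -> logpdf(Normal(), x), 0.0, 5000)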
As an example, consider a synthetic data sequence Y generated as the outcome of n = 100 Bernoulli trials with unknown success probability p (here p = 0.8).
using UncertaintyQuantification

n = 100
Y = rand(n) .<= 0.8
The likelihood function, which (like a Model) must accept a DataFrame, follows a Binomial distribution and returns the likelihood for each row in the DataFrame as a vector. The prior is chosen as a Beta(1, 1) distribution, i.e. a uniform prior on [0, 1].
function loglikelihood(df)
    return [sum(logpdf.(Binomial.(n, df_i.p), sum(Y))) for df_i in eachrow(df)]
end
logprior = df -> logpdf.(Beta(1,1), df.p)
UncertaintyQuantification.jl implements a variant of the MH algorithm known as single-component Metropolis-Hastings, where the proposal and acceptance steps are performed independently for each dimension. To run the algorithm, we must first define the SingleComponentMetropolisHastings object, which requires a UnivariateDistribution as the proposal, a NamedTuple x0 defining the starting point of the Markov chain, the number of samples, and the number of burn-in samples. The burn-in samples are used to start the chain but are later discarded.
proposal = Normal(0, 0.2)
x0 = (; p=0.5)
n_samples = 4000
burnin = 500

mh = SingleComponentMetropolisHastings(proposal, x0, n_samples, burnin)
SingleComponentMetropolisHastings(Normal{Float64}(μ=0.0, σ=0.2), (p = 0.5,), 4000, 500, true)
The final optional argument islog=true can be omitted when passing the log-likelihood and log-prior. When set to false, the algorithm will apply the logarithm to the output of both the likelihood and the prior. Finally, the algorithm is executed using the bayesianupdating function, which returns the samples and the average acceptance rate.
mh_samples, α = bayesianupdating(logprior, loglikelihood, mh)
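The returned samples can be summarized directly. Assuming the samples are returned as a DataFrame with a column :p (matching the parameter name in x0), the posterior mean and standard deviation are:

using Statistics

# Posterior summary of the success probability p.
mean(mh_samples.p), std(mh_samples.p)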
The following figure shows a histogram of the samples returned by the Metropolis-Hastings algorithm. For comparison, we also plot the analytical posterior distribution obtained using conjugate priors [22].
As a second example, we will attempt to sample from a bimodal target distribution in two dimensions. The prior is uniform over [-2, 2] in each dimension.
prior = Uniform(-2, 2)
logprior = df -> logpdf.(prior, df.x) .+ logpdf.(prior, df.y)
N1 = MvNormal([-0.5, -0.5], 0.1)
N2 = MvNormal([0.5, 0.5], 0.1)
loglikelihood =
    df -> log.([0.5 * pdf(N1, collect(x)) + 0.5 * pdf(N2, collect(x)) for x in eachrow(df)])
n = 2000
burnin = 500
x0 = (; x=0.0, y=0.0)
proposal = Normal()
mh = SingleComponentMetropolisHastings(proposal, x0, n, burnin)
mh_samples, α = bayesianupdating(logprior, loglikelihood, mh)
The scatter plot clearly shows that the MH algorithm has converged to only one of the two peaks of the bimodal target (contour also plotted). In fact, this is a known weakness of the MH algorithm. However, there are a number of alternative MCMC methods that aim to solve this problem. One of these methods, known as Transitional Markov Chain Monte Carlo [23], will be presented next.
Transitional Markov Chain Monte Carlo
The Transitional Markov Chain Monte Carlo (TMCMC) method [23] is an extension of the MH algorithm where, instead of directly sampling a complex posterior distribution, samples are obtained from a series of simpler transitional distributions. The samples are obtained from independent single-step Markov chains. The transitional distributions are defined as

P_j(\theta) \propto P(D \mid \theta)^{\beta_j} P(\theta),

where \beta_j is a tempering parameter with 0 = \beta_0 < \beta_1 < \dots < \beta_m = 1, so that the sequence transitions gradually from the prior (\beta_0 = 0) to the posterior (\beta_m = 1).

The complete TMCMC algorithm can be summarized as:

1. Set j = 0 and \beta_0 = 0. Sample the initial samples \theta_0^{(i)} from the prior P(\theta).
2. Set j = j + 1.
3. Compute the next tempering parameter \beta_j.
4. Determine the weights w_j^{(i)} = P(D \mid \theta_{j-1}^{(i)})^{\beta_j - \beta_{j-1}}.
5. Generate a single-step Markov chain for each sample, selected with probability proportional to its weight.
6. Repeat steps (2) to (5) until (and including) \beta_j = 1.
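The choice of the next tempering parameter in step (3) is typically automated; a common criterion targets a coefficient of variation of the weights in step (4) of 100% [23]. The following plain-Julia sketch (not the package implementation) finds the next β by bisection under that assumption:

using Statistics

# Find the next tempering parameter β such that the coefficient of variation
# of the incremental weights w⁽ⁱ⁾ = P(D|θ⁽ⁱ⁾)^(β - βj) reaches `target`.
# `loglik` holds the log-likelihood values of the current samples.
function next_beta(loglik::Vector{Float64}, βj::Float64; target=1.0)
    cov_weights(β) = begin
        w = exp.((β - βj) .* (loglik .- maximum(loglik)))  # scaling cancels in the CoV
        std(w) / mean(w)
    end
    cov_weights(1.0) <= target && return 1.0  # the posterior is reached
    lo, hi = βj, 1.0
    for _ in 1:50  # bisection on the monotonically increasing CoV
        mid = (lo + hi) / 2
        cov_weights(mid) > target ? (hi = mid) : (lo = mid)
    end
    return (lo + hi) / 2
end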
We return to the bimodal example, this time using the TMCMC algorithm. To apply a different MCMC algorithm, we only need to construct a TransitionalMarkovChainMonteCarlo object and pass it to the bayesianupdating method. The definitions of prior and likelihood remain the same. In contrast to SingleComponentMetropolisHastings, the log evidence is returned instead of the acceptance rate.
tmcmc = TransitionalMarkovChainMonteCarlo(RandomVariable.(Uniform(-2,2), [:x, :y]), n, burnin)
tmcmc_samples, S = bayesianupdating(logprior, loglikelihood, tmcmc)
The resulting scatter plot shows how TMCMC is able to sample both peaks of the bimodal target distribution. The standard implementation of TMCMC uses a multivariate Gaussian proposal distribution centred at each sample. Alternatively, the inner sampling step can be performed component-wise, as in SingleComponentMetropolisHastings, resulting in SingleComponentTransitionalMarkovChainMonteCarlo.
Note
SingleComponentTransitionalMarkovChainMonteCarlo is currently not available but is planned for implementation.
For convenience, the prior can be automatically constructed from the random variables passed to TransitionalMarkovChainMonteCarlo.
tmcmc = TransitionalMarkovChainMonteCarlo(RandomVariable.(Uniform(-2,2), [:x, :y]), n, burnin)
tmcmc_samples, S = bayesianupdating(loglikelihood, tmcmc)
Maximum likelihood and maximum a posteriori estimates
Instead of using sample-based methods, it is also possible to calculate the local maxima of the likelihood (maximum likelihood estimate, MLE) or of the posterior distribution (maximum a posteriori estimate, MAP) directly. Generally, the MAP estimate can be considered a regularized version of the MLE, since the prior distribution is used to constrain the estimates. Both estimates are found with optimization schemes and thus come with the usual drawbacks, including convergence to local maxima. Both methods are therefore sensitive to initial conditions. Further note that both estimates do not calculate the mean but the mode of the respective distributions, i.e. for distributions that have a high variance these estimates do not provide much information about the parameter distribution.
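Formally, the two estimates maximize the likelihood and the unnormalized posterior, respectively:

\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} P(D \mid \theta), \qquad \hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta} P(D \mid \theta)\, P(\theta).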
The implementation in UncertaintyQuantification.jl is analogous to the sampling methods. A MaximumAPosterioriBayesian or a MaximumLikelihoodBayesian object is created that takes the important settings as input: the prior, the optimization method to use, and the starting points for the optimization. For convenience, multiple starting points can be specified. The optimization method is given as a string so that the Optim.jl package does not need to be included in the script. Finally, the bayesianupdating function is used again with the known syntax, i.e. a likelihood, an array of UQModels, and the desired method object are passed, and a DataFrame is returned. The log-densities are stored in the DataFrame as the variable :maxval. Below is an example of the implementation.
μ = 0
σ = 0.2
prior = RandomVariable.(Normal(μ, σ), [:x, :y])

N1 = MvNormal([-0.5, -0.5], 0.1)
N2 = MvNormal([0.5, 0.5], 0.1)

loglikelihood =
    df -> log.([0.5 * pdf(N1, collect(x)) + 0.5 * pdf(N2, collect(x)) for x in eachrow(df)])

# Joint prior density per row, used below for plotting.
priorFunction =
    df -> prod.([pdf.(Normal(μ, σ), collect(x)) for x in eachrow(df)])

x0 = [[0.1, 0.1], [-0.1, -0.1]]

MAP = MaximumAPosterioriBayesian(prior, "LBFGS", x0)
MLE = MaximumLikelihoodBayesian(prior, "LBFGS", x0)

mapestimate = bayesianupdating(loglikelihood, UQModel[], MAP)
mlestimate = bayesianupdating(loglikelihood, UQModel[], MLE)
# Evaluation grid for the contours (setup not shown in the original snippet).
xs = ys = range(-2, 2; length=200)
likelihood_eval = [0.5 * pdf(N1, [x, y]) + 0.5 * pdf(N2, [x, y]) for y in ys, x in xs]
prior_eval = [pdf(Normal(μ, σ), x) * pdf(Normal(μ, σ), y) for y in ys, x in xs]

contour(xs, ys, likelihood_eval; lim=[-2, 2], c=:red)
contour!(xs, ys, prior_eval; lim=[-2, 2], c=:blue)
contour!(xs, ys, likelihood_eval .* prior_eval / 0.05; lim=[-2, 2], c=:black)  # scaled posterior
scatter!(mapestimate.x, mapestimate.y; lim=[-2, 2], label="MAP estimate", c=:black)
scatter!(mlestimate.x, mlestimate.y; lim=[-2, 2], label="ML estimate", c=:red)
plot!([0, 0], [0, 0]; c=:red, label="Likelihood")  # dummy series for legend entries
plot!([0, 0], [0, 0]; c=:blue, label="Prior")
plot!([0, 0], [0, 0]; c=:black, label="Posterior")
The figure shows the (bimodal) likelihood in red and the prior distribution in blue. The difference between MAP and MLE is clearly visible: the MLE coincides directly with the maxima of the likelihood, while the MAP is shifted in the direction of the prior mean.
Bayesian calibration of computer simulations
UncertaintyQuantification.jl allows for complex computer models to be included in the Bayesian updating procedure, if for example one wishes to infer unknown model parameters from experimental data of model outputs. Several models can be evaluated in order to compute the likelihood function, by passing a vector of UQModel
s to the bayesianupdating
method. These will be executed before the likelihood is evaluated and models outputs will be available in the DataFrame
inside the likelihood function. There is no restriction on the type of UQModel
used. For example, it is possible to use an ExternalModel
and call an external solver.
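As a schematic sketch of such a calibration (the model, data, and noise level are hypothetical; the call signature with a model vector follows the MAP/MLE example above):

# Hypothetical example: calibrate the parameter θ of the model y = θ * x²
# at a fixed input x = 2 against repeated noisy measurements of y.
y_data = [7.9, 8.1, 8.0]  # synthetic measurements (true θ ≈ 2)

# The model evaluates the response for each parameter sample in the DataFrame.
model = Model(df -> df.θ .* 2.0^2, :y)

# Gaussian measurement-noise likelihood evaluated on the model output :y.
loglikelihood = df -> [sum(logpdf.(Normal(y, 0.1), y_data)) for y in df.y]

prior = RandomVariable(Normal(2.0, 1.0), :θ)
tmcmc = TransitionalMarkovChainMonteCarlo([prior], 1000, 200)

# The model is executed before each evaluation of the likelihood.
samples, S = bayesianupdating(loglikelihood, [model], tmcmc)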
For a complete example refer to the Inverse eigenvalue problem.