
Densities, Bayes and MCMC

When fitting a parametrized model to data, there are two general approaches that may be taken. The first is to look for the single set of parameters ($ \omega$) which best fits the data. This is called a point estimate of the parameters. A special case of this is Maximum Likelihood estimation, where we look for the set of parameters that maximizes the probability of seeing this realization of the data given the model and its parameters.

$\displaystyle \omega_{ML}=\arg\max_{\omega\in\Omega}\mathcal{P}(Y\vert\omega,M),$ (1)

where $ Y$ is the data and $ M$ is the model.
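
As a concrete illustration (not part of the original text), the following Python sketch finds the maximum likelihood parameters of equation 1 numerically for a hypothetical model in which the data are Gaussian with unknown mean and standard deviation; the simulated data, the parametrization and the use of scipy.optimize.minimize are all assumptions made for this example.

    import numpy as np
    from scipy.optimize import minimize

    # Hypothetical example: data assumed Gaussian with unknown mean mu and
    # standard deviation sigma, so omega = (mu, log sigma).
    rng = np.random.default_rng(0)
    y = rng.normal(loc=2.0, scale=1.5, size=100)   # simulated data Y

    def neg_log_likelihood(omega, y):
        """-log P(Y | omega, M) for the assumed Gaussian model M."""
        mu, log_sigma = omega              # optimize log(sigma) to keep sigma > 0
        sigma = np.exp(log_sigma)
        return 0.5 * np.sum(((y - mu) / sigma) ** 2) + y.size * np.log(sigma)

    # omega_ML = arg max_omega P(Y | omega, M)  (equation 1), found numerically
    result = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]), args=(y,))
    mu_ml, sigma_ml = result.x[0], np.exp(result.x[1])
    print(mu_ml, sigma_ml)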

The second approach is to associate a probability density function with the parameters. In the Bayesian framework, this distribution is called the posterior distribution on the parameters given the data

$\displaystyle \mathcal{P}(\omega\vert Y,M)=\frac{\mathcal{P}(Y\vert\omega,M)\mathcal{P}(\omega)}{\mathcal{P}(Y\vert M)}.$ (2)

This posterior density allows us to ask, of any hypervolume $ \mathcal{V}$ in the parameter space $ \Omega$, ``What is our belief, given the measured data, that the true value of $ \omega$ lies in $ \mathcal{V}$?''. In the one-dimensional case this question becomes, for any interval $ (\omega_0,\omega_1)$, ``What is our belief that $ \omega$ lies between $ \omega_0$ and $ \omega_1$?''. These questions, and their answers, express the uncertainty we have in the values of the parameters $ \omega$.
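
In a simple one-dimensional case the posterior of equation 2 can be evaluated on a grid and the question ``What is our belief that $ \omega$ lies between $ \omega_0$ and $ \omega_1$?'' answered by numerical integration. The sketch below assumes a Gaussian likelihood with known noise level and a flat prior; these choices, and the simulated data, are illustrative assumptions only.

    import numpy as np

    # Hypothetical 1D example: omega is the mean of Gaussian data with known
    # noise standard deviation; the flat prior and the data y are assumptions.
    rng = np.random.default_rng(1)
    y = rng.normal(loc=1.0, scale=2.0, size=50)
    sigma_noise = 2.0

    omega_grid = np.linspace(-5.0, 5.0, 2001)            # discretized Omega
    log_like = np.array([-0.5 * np.sum((y - w) ** 2) / sigma_noise ** 2
                         for w in omega_grid])           # log P(Y | omega, M)
    log_prior = np.zeros_like(omega_grid)                # flat prior P(omega)

    # Posterior (equation 2): numerator normalized by P(Y | M) (equation 3),
    # here approximated by numerical integration over the grid.
    unnorm = np.exp(log_like - log_like.max() + log_prior)
    posterior = unnorm / np.trapz(unnorm, omega_grid)

    # "What is our belief that omega lies between omega_0 and omega_1?"
    omega_0, omega_1 = 0.5, 1.5
    mask = (omega_grid >= omega_0) & (omega_grid <= omega_1)
    belief = np.trapz(posterior[mask], omega_grid[mask])
    print(belief)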

Unfortunately, calculating this pdf is seldom straightforward. The denominator in equation 2 is

$\displaystyle \mathcal{P}(Y\vert M)=\int_{\Omega}\mathcal{P}(Y\vert\omega,M)\mathcal{P}(\omega)d\omega,$ (3)

an integral which is often not tractable analytically. To make matters worse, this joint posterior pdf on all parameters is often not the distribution we are most interested in: frequently we want the posterior pdf on a single parameter, or on some interesting subset of parameters. Obtaining these marginal distributions again involves performing large integrals,

$\displaystyle \mathcal{P}(\omega_I\vert Y,M)=\int_{\Omega_{-I}}\mathcal{P}(\omega\vert Y,M)d\omega_{-I},$ (4)

where $ \omega_{I}$ are the parameters of interest and $ \omega_{-I}$ are all other parameters. Again these integrals are seldom tractable analytically.
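
The following sketch illustrates equation 4 for a hypothetical two-parameter Gaussian model: the joint posterior on (mu, sigma) is evaluated on a grid and the nuisance parameter sigma is integrated out numerically to give the marginal on the parameter of interest, mu. The model, priors and data are assumptions made for illustration; such direct grid integration is feasible only in very low dimensions, which is why sampling methods are needed.

    import numpy as np

    # Hypothetical 2-parameter example: joint posterior on (mu, sigma) for
    # Gaussian data on a grid; the data and the flat priors are assumptions.
    rng = np.random.default_rng(2)
    y = rng.normal(loc=1.0, scale=1.5, size=50)

    mu_grid = np.linspace(-2.0, 4.0, 301)
    sigma_grid = np.linspace(0.3, 4.0, 301)
    MU, SIGMA = np.meshgrid(mu_grid, sigma_grid, indexing="ij")

    # log P(Y | mu, sigma, M) on the grid (flat priors, so this is also the
    # log of the unnormalized joint posterior).
    log_post = (-0.5 * ((y[None, None, :] - MU[..., None]) / SIGMA[..., None]) ** 2
                ).sum(axis=-1) - y.size * np.log(SIGMA)

    joint = np.exp(log_post - log_post.max())
    joint /= np.trapz(np.trapz(joint, sigma_grid, axis=1), mu_grid)  # normalize

    # Marginal on the parameter of interest mu (equation 4): integrate out sigma.
    marginal_mu = np.trapz(joint, sigma_grid, axis=1)
    print(mu_grid[np.argmax(marginal_mu)])   # mode of the marginal posterior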

One solution to this problem is to draw samples in parameter space from the joint posterior distribution, implicitly performing the integrals numerically. For example, we may repeatedly choose random sets of parameter values and accept or reject each sample according to a criterion based on the value of the numerator in equation 2. It can be shown (e.g. [12]) that a correct choice of this criterion will result in the accepted samples being distributed according to the joint posterior pdf (equation 2). Rejection sampling and importance sampling are schemes of this kind, generating independent samples from the posterior. Any marginal distribution may then be obtained by examining only the samples of the parameters of interest. However, these kinds of sampling schemes tend to be painfully slow, particularly in high-dimensional parameter spaces, because samples are proposed at random and each therefore has a very small chance of being accepted.
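
A minimal rejection-sampling sketch of the kind described above follows; the stand-in unnormalized posterior, the uniform proposal region and the bound on the density are all assumptions made for illustration. The low acceptance rate it prints is exactly the inefficiency referred to in the text.

    import numpy as np

    # Hypothetical 1D target: unnormalized posterior proportional to the
    # numerator of equation 2 (likelihood x prior); here a stand-in function.
    def unnorm_posterior(omega):
        return np.exp(-0.5 * (omega - 1.0) ** 2 / 0.5 ** 2)   # assumed shape

    rng = np.random.default_rng(3)
    n_proposals = 100_000

    # Propose uniformly over a (wide) region of parameter space and accept each
    # proposal with probability proportional to the unnormalized density there.
    proposals = rng.uniform(-10.0, 10.0, size=n_proposals)
    bound = 1.0                     # upper bound on unnorm_posterior (assumed)
    accept = rng.uniform(0.0, bound, size=n_proposals) < unnorm_posterior(proposals)
    samples = proposals[accept]

    # The acceptance rate is low because most proposals land in regions of
    # negligible posterior density -- the inefficiency described in the text.
    print(accept.mean(), samples.mean(), samples.std())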

Markov Chain Monte Carlo (MCMC) (e.g. [12,13]) is a sampling technique which addresses this problem by proposing samples preferentially in areas of high probability. Samples drawn from the posterior are no longer independent of one another, but the high probability of accepting each proposed sample allows many samples to be drawn and, in many cases, the posterior pdf to be built up in a relatively short time.
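
A minimal random-walk Metropolis sketch (one member of the MCMC family, not necessarily the scheme of [12,13]) is given below; the stand-in log-posterior, starting point and proposal width are assumptions made for illustration. Each proposal is a small perturbation of the current sample, so the chain spends most of its time in regions of high posterior probability.

    import numpy as np

    # Hypothetical 1D target: log of the unnormalized posterior (numerator of
    # equation 2); any log-density could be substituted here.
    def log_unnorm_posterior(omega):
        return -0.5 * (omega - 1.0) ** 2 / 0.5 ** 2      # assumed shape

    rng = np.random.default_rng(4)
    n_samples, step = 50_000, 0.5     # chain length and proposal width (assumed)

    samples = np.empty(n_samples)
    omega = 0.0                       # arbitrary starting point
    log_p = log_unnorm_posterior(omega)

    for i in range(n_samples):
        # Propose a small random move away from the current sample, so proposals
        # stay preferentially in regions of high posterior probability.
        proposal = omega + step * rng.standard_normal()
        log_p_prop = log_unnorm_posterior(proposal)
        # Metropolis acceptance rule: always accept uphill moves, accept downhill
        # moves with probability equal to the posterior ratio.
        if np.log(rng.uniform()) < log_p_prop - log_p:
            omega, log_p = proposal, log_p_prop
        samples[i] = omega

    print(samples.mean(), samples.std())  # should approach 1.0 and 0.5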

