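We are going to approximate the distribution in equation 3 by replacing the discrete labels, $x$, with continuous weights vectors, $\tilde{w}$:

$$p(y \mid \tilde{w}, \theta) = \prod_{i} \sum_{k=1}^{K} \tilde{w}_{ik}\, p(y_i \mid \theta_k) \tag{9}$$
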
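where $\tilde{w}_i = \{\tilde{w}_{i1}, \ldots, \tilde{w}_{iK}\}$ is the continuous weights vector at voxel $i$. Equation 9 only approximates equation 3 if we apply certain constraints to the continuous weights vectors. If we choose a prior on the continuous weights vector, $p(\tilde{w}_i)$, with the constraints that $\tilde{w}_{ik} \geq 0$ and $\sum_{k} \tilde{w}_{ik} = 1$, then, as $p(\tilde{w}_i)$ tends to delta functions at $\tilde{w}_{ik} = 0$ and $\tilde{w}_{ik} = 1$, equation 9 tends to equation 3. To impose these constraints, the prior we use is:

$$p(\tilde{w}) = \int p(\tilde{w} \mid w)\, p(w)\, dw$$
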
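where:

$$p(w) = \prod_{i} \prod_{k=1}^{K} p(w_{ik}), \qquad p(w_{ik}) \propto 1, \quad w_{ik} \in \mathbb{R}$$
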
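and $p(\tilde{w} \mid w) = \prod_{i} p(\tilde{w}_i \mid w_i)$, where, crucially, $p(\tilde{w}_i \mid w_i)$ is a deterministic relationship by which $\tilde{w}_i$ and $w_i$ are related by the logistic transform:

$$\tilde{w}_{ik} = \frac{\exp(w_{ik}/\gamma)}{\sum_{k'=1}^{K} \exp(w_{ik'}/\gamma)}$$
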
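The normalising constant in the logistic transform ensures that the condition $\sum_k \tilde{w}_{ik} = 1$ is met, and because the exponential is strictly positive, $\tilde{w}_{ik} > 0$, so the non-negativity constraint also holds. The transform is also order-preserving: $\tilde{w}_{ik} > \tilde{w}_{ik'}$ if and only if $w_{ik} > w_{ik'}$. Figure 1 shows how the logistic transform produces an approximation to the delta functions as $\gamma$ gets smaller. We fix $\gamma$ to 0.05 and bound $w_{ik}$: the small $\gamma$ gives the desired approximation to delta functions at 0 and 1, while the bound ensures that $\exp(w_{ik}/\gamma)$ can be computed without overflow.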
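The transform and the effect of $\gamma$ can be checked numerically. The Python sketch below is a minimal illustration assuming NumPy; the clip range `W_BOUND = 30.0` is a hypothetical choice, since the text does not state the bound used. It shows the weights summing to 1, the ordering of $w_i$ being preserved, and $\tilde{w}_i$ approaching a one-hot vector as $\gamma$ shrinks.

```python
import numpy as np

GAMMA = 0.05    # gamma is fixed to 0.05 in the text
W_BOUND = 30.0  # hypothetical bound on |w_ik|: the text does not give the range used

def logistic_transform(w, gamma=GAMMA, bound=W_BOUND):
    """Map unconstrained weights w (last axis = K classes) to tilde_w on the simplex.

    tilde_w_ik = exp(w_ik / gamma) / sum_k' exp(w_ik' / gamma)

    Clipping w to [-bound, bound] keeps exp(w / gamma) finite in float64:
    exp(30 / 0.05) = exp(600) is about 4e260, below the float64 maximum (about 1.8e308).
    """
    w = np.clip(np.asarray(w, dtype=float), -bound, bound)
    e = np.exp(w / gamma)
    return e / e.sum(axis=-1, keepdims=True)

# Weights for one voxel with K = 3 classes: smaller gamma pushes tilde_w towards one-hot.
w = np.array([0.3, 0.1, -0.2])
for gamma in (1.0, 0.2, 0.05):
    print(gamma, np.round(logistic_transform(w, gamma), 4))
```

Bounding $w_{ik}$ is the approach described in the text. A common alternative, not the one described here, is to subtract $\max_k w_{ik}$ before exponentiating, which leaves $\tilde{w}_i$ unchanged and avoids overflow without clipping.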

To summarise, we now have two vectors of continuous weights at each voxel, $w_i$ and $\tilde{w}_i$. The $w_i$ are weights with a prior that is uniform on the real line. We then use the logistic transform to map $w_i$ deterministically to $\tilde{w}_i$ at each voxel. The $\tilde{w}_i$ are therefore the continuous weights that approximate the discrete labels, with a prior that approximates delta functions at 0 and 1.
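The chain summarised above ($w_i \to \tilde{w}_i \to$ equation 9) can be sketched end to end. This is a minimal illustration under loudly stated assumptions: the Gaussian class densities and their parameters are hypothetical, a bounded uniform draw on $[-1, 1]$ stands in for the improper uniform prior on $w_{ik}$, and the discrete-label likelihood is taken as $\prod_i p(y_i \mid \theta_{x_i})$. It checks that one-hot weights reproduce the label likelihood exactly and reports how far the continuous-weights form is from it as $\gamma$ decreases.

```python
import numpy as np

def logistic_transform(w, gamma=0.05, bound=30.0):
    """Logistic transform from the text; `bound` is a hypothetical clip range."""
    e = np.exp(np.clip(w, -bound, bound) / gamma)
    return e / e.sum(axis=-1, keepdims=True)

def class_densities(y, mu, sigma):
    """f[i, k] = p(y_i | theta_k); Gaussian classes are an illustrative assumption."""
    z = (y[:, None] - mu[None, :]) / sigma[None, :]
    return np.exp(-0.5 * z**2) / (sigma[None, :] * np.sqrt(2.0 * np.pi))

def per_voxel_log_lik_weights(f, tilde_w):
    """log sum_k tilde_w_ik p(y_i | theta_k): the continuous-weights form (equation 9)."""
    return np.log(np.sum(tilde_w * f, axis=1))

def per_voxel_log_lik_labels(f, x):
    """log p(y_i | theta_{x_i}): the assumed discrete-label form."""
    return np.log(f[np.arange(len(x)), x])

rng = np.random.default_rng(0)
N, K = 1000, 3
mu = np.array([-2.0, 0.0, 3.0])      # hypothetical class means
sigma = np.array([0.5, 1.0, 0.8])    # hypothetical class standard deviations

# w_i: a bounded uniform draw standing in for the improper uniform prior on the real line
w = rng.uniform(-1.0, 1.0, size=(N, K))
x = np.argmax(w, axis=1)             # label that tilde_w_i concentrates on as gamma -> 0
y = rng.normal(mu[x], sigma[x])      # synthetic data generated from those labels
f = class_densities(y, mu, sigma)
exact = per_voxel_log_lik_labels(f, x)

# With exactly one-hot weights the two forms agree voxel by voxel.
one_hot = np.eye(K)[x]
assert np.allclose(per_voxel_log_lik_weights(f, one_hot), exact)

# Mean absolute per-voxel gap between the two forms for decreasing gamma.
gaps = {}
for gamma in (1.0, 0.2, 0.05):
    tilde_w = logistic_transform(w, gamma)
    gaps[gamma] = np.mean(np.abs(per_voxel_log_lik_weights(f, tilde_w) - exact))
    print(f"gamma={gamma:<5} mean |gap| = {gaps[gamma]:.4f}")
```

The gap that remains at $\gamma = 0.05$ is expected to come mainly from voxels whose two largest $w_{ik}$ are nearly equal, where $\tilde{w}_i$ is still some way from one-hot.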