## Modeling Using Priors – A guide on the conception of an image denoising model

The ideal model is one that works perfectly in every single case. Unfortunately, models that get closer to this ideal tend to be very complex and, as a consequence, impractical. Sometimes the best approach is to focus on a particular class of problems, use all the assumptions that come with it in your favor, and devise a model tailored to that class. In the following, we are going to work through image restoration models. The concepts introduced here will be motivated by the problem of image denoising.

## An example from politics

Try to answer the question: Who is going to be the next president of Brazil?

It can be very hard to answer, even if you are Brazilian. Nonetheless, you can be more confident in your answer if additional pieces of information are given. For example, your guess will be different in each of the scenarios below.

1. Candidate Beto is involved in a corruption scheme.
2. Candidate Amanda has 70% of votes one week before elections.
3. The current government is going through a serious economic and political crisis, and the current vice-president is an unscrupulous person.

The key idea here is to use available information, or to assume reasonable hypotheses about your problem, that help you solve it.

## Image denoising problem

You are given an image in which some of the pixels are corrupted by noise. By noise, I mean any set of unexpected pixels that together add no information to the description of the image. Noisy pixels can occur, for example, as a consequence of interference during data transmission or due to the quality of the equipment used.

In the denoising problem, one has as input a noisy image $v$ and wants to recover the image without noise (the perfect image) $u$. By denoting the noise component as $\eta$, the problem can be written mathematically as:

$v = u + \eta.$

The noise component $\eta$ is well modeled by a Gaussian with zero mean and unit variance (the Central Limit Theorem offers one reason why):

$\eta \sim \mathcal{N}(0,1).$

The denoising problem is an example of an inverse problem. Corrupting an image is straightforward: you just add a noise component. One can generate countless corrupted images by this process, and that is one of the reasons why recovering the original perfect image is so difficult. In general, inverse problems are hard to solve.
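The forward (corruption) process is easy to simulate. Below is a minimal sketch in pure Python, assuming the image is stored as a 2D list of grayscale floats; the image, noise level, and function name are illustrative choices of mine:

```python
import random

def add_gaussian_noise(image, sigma=1.0, seed=0):
    """Corrupt a grayscale image (2D list of floats) with i.i.d. Gaussian noise."""
    rng = random.Random(seed)
    return [[pixel + rng.gauss(0.0, sigma) for pixel in row] for row in image]

# A tiny constant "perfect" image u; the observation is v = u + eta.
u = [[10.0] * 4 for _ in range(4)]
v = add_gaussian_noise(u, sigma=1.0)
```

Going from $u$ to $v$ is trivial; the denoising problem asks for the opposite, much harder, direction.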

## A Bayesian model for image denoising

Let $A$ and $B$ be two events in some sample space $U$. Recall that:

$\displaystyle \Pr(A | B) = \frac{ \Pr( A \wedge B) }{\Pr(B)} \,, \quad\Pr(B | A) = \frac{ \Pr( A \wedge B) }{\Pr(A)} .$

Then, Bayes’ theorem says:

$\Pr(A | B) \cdot \Pr(B) = \Pr(B | A) \cdot \Pr(A).$
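This identity can be verified numerically on any small joint distribution; the joint probability table below is an arbitrary example of mine:

```python
# Joint distribution Pr(A, B) over two binary events (values are illustrative).
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

pr_a = sum(p for (a, b), p in joint.items() if a == 1)  # Pr(A)
pr_b = sum(p for (a, b), p in joint.items() if b == 1)  # Pr(B)
pr_a_and_b = joint[(1, 1)]                              # Pr(A and B)

pr_a_given_b = pr_a_and_b / pr_b
pr_b_given_a = pr_a_and_b / pr_a

# Bayes' theorem: Pr(A|B) * Pr(B) == Pr(B|A) * Pr(A)
assert abs(pr_a_given_b * pr_b - pr_b_given_a * pr_a) < 1e-12
```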

We can use Bayes’ theorem to devise a way to attack the denoising problem. The strategy will be to find the value $u^\star$ that maximizes the probability of $u$ conditioned on the noisy image $v$. In mathematical notation:

$\displaystyle u^\star = \text{arg}\max_u{\Pr(u|v)}.$

This technique is known as Maximum a Posteriori (MAP) estimation.

At first glance, this approach seems useless, as we don’t have a direct expression for $\Pr(u|v)$. However, using Bayes’ theorem, we can derive one. First, note that we can write the conditional probability in the other direction:

$\displaystyle \Pr(v | u ) = \Pr( \eta = v-u ) = \frac{1}{\sqrt{2\pi}}e^{-\frac{\parallel v-u \parallel^2}{2}}.$

Applying Bayes’ theorem:

$\displaystyle \Pr(u|v) = \frac{ \Pr(v|u) \cdot \Pr(u) }{\Pr(v)}.$

The value of $\Pr(v)$ plays no role in the maximization of $\Pr(u|v)$, so we can simply ignore it. The probability $\Pr(u)$, on the other hand, must be supplied. This is where our prior assumption comes in.

I claim that the perfect image $u$ has small total variation $TV(u)$. The formal definition of the total variation of a function is not so straightforward, but for our purposes it is enough to say that:

$\displaystyle TV(u) = \int{|\nabla u|}.$

The total variation assumption is reasonable because it is natural to imagine that an image is a set of regions (objects) in which its intensity level (or color) variation is zero or close to zero in its interior (see Mumford-Shah functional).
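On a discrete image, the integral becomes a sum of gradient magnitudes over pixels. Here is a minimal sketch of the (anisotropic, forward-difference) discrete version; the function name and test patches are my own illustrations:

```python
def total_variation(image):
    """Anisotropic discrete TV: sum of absolute forward differences
    in the horizontal and vertical directions."""
    rows, cols = len(image), len(image[0])
    tv = 0.0
    for i in range(rows):
        for j in range(cols):
            if j + 1 < cols:
                tv += abs(image[i][j + 1] - image[i][j])
            if i + 1 < rows:
                tv += abs(image[i + 1][j] - image[i][j])
    return tv

flat  = [[5.0, 5.0], [5.0, 5.0]]   # constant region: zero variation
noisy = [[0.0, 9.0], [9.0, 0.0]]   # alternating pixels: large variation
```

A constant patch scores $TV = 0$, while the alternating patch scores high, which is exactly the behavior the prior exploits.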

In order to use MAP, we need to attach a probability distribution to the total variation energy $TV(u)$. There are reasons that lead us to use the Gibbs distribution (more about those reasons later):

$\displaystyle \Pr(u) \propto e^{-\beta \cdot TV(u)},$

where the normalizing constant is omitted since it does not affect the maximization.

By replacing it in our maximization problem:

$\displaystyle \text{arg}\max_u{\Pr(u|v)} =\text{arg}\max_u{ \left ( \: \Pr(v |u) \cdot e^{-\beta \cdot TV(u)} \: \right ) }.$

Now we apply a series of tricks to make our life easier. First, since $\log$ is monotonically increasing, maximizing a function is the same as maximizing its logarithm, so we can rewrite the expression above as:

$\begin{array}{ll} u^\star &= \displaystyle \text{argmax}_u{ \left ( \: \log \left ( \Pr(v |u) \right ) -\beta \cdot TV(u) \: \right ) } \\[0.3in] &= \displaystyle \text{argmax}_u{ \left ( \: \log ( \frac{1}{\sqrt{2\pi}} ) - \frac{\parallel v-u \parallel^2}{2} -\beta \cdot TV(u) \: \right ) }. \end{array}$

The $\log ( \frac{1}{\sqrt{2\pi}} )$ term is a constant and can be eliminated. Further, we can remove the minus signs by solving the equivalent argmin problem:

$\displaystyle u^\star = \text{argmin}_u{ \left ( \: \frac{ \parallel v-u \parallel^2}{2} + \beta \cdot TV(u) \: \right ) }.$
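This objective can be minimized numerically. The sketch below runs plain gradient descent on a 1D signal, with $|\cdot|$ replaced by the smooth surrogate $\sqrt{x^2+\epsilon}$ so the TV term is differentiable; the step size, $\beta$, $\epsilon$, and test signal are illustrative choices of mine, not part of the derivation above:

```python
import math

def denoise_1d(v, beta=1.0, eps=0.1, step=0.05, iters=500):
    """Gradient descent on E(u) = ||v - u||^2 / 2
    + beta * sum_i sqrt((u[i+1] - u[i])^2 + eps)."""
    u = list(v)  # start from the noisy signal itself
    n = len(u)
    for _ in range(iters):
        grad = [u[i] - v[i] for i in range(n)]     # data-fidelity gradient
        for i in range(n - 1):
            d = u[i + 1] - u[i]
            g = beta * d / math.sqrt(d * d + eps)  # smoothed-TV gradient of this pair
            grad[i] -= g
            grad[i + 1] += g
        u = [u[i] - step * grad[i] for i in range(n)]
    return u

# A flat signal with one spurious spike: the TV prior pulls the spike down,
# while the data term keeps the genuine step (indices 5-7) mostly intact.
v = [0.0, 0.0, 5.0, 0.0, 0.0, 10.0, 10.0, 10.0]
u_star = denoise_1d(v)
```

Note how the two terms compete: larger $\beta$ flattens the signal more aggressively, while $\beta \to 0$ returns the noisy input unchanged.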

## Other Priors

A prior is a special piece of information available about the problem you want to solve. Different priors work better for different problems. For example, if one wants to segment an image that is known in advance to contain straight-line contours, it may be a good idea to use a curvature prior. The property exploited here is that straight lines have zero curvature.

## On the Gibbs distribution

In order to use the MAP (maximum a posteriori) framework, one first defines an energy function (the prior energy, for example, total variation), and then uses the Gibbs distribution to have an expression for $\Pr(u)$. The question that now arises is: What is the motivation to use the Gibbs Distribution? The answer is not clear to me at the moment, but I have some clues that I am going to investigate further.

The origins of the Gibbs distribution in image models date back to a paper by Hassner and Sklansky (The Use of Markov Random Fields as Models of Texture). The connection comes from the Hammersley-Clifford theorem, which establishes the equivalence between Markov Random Fields (MRFs) and Gibbs distributions.