In this third part of my log I will talk about determining metrics from images transformed with POT, metrics which will help in solving the discontinuity problem.
The main point of "stabilizing" the POT is that, because it is trained line by line, the transform actually applied can vary from one line to the next, producing "discontinuities" along the vertical axis of the transformed image. Such irregularities are hard to code, so the 2D coder applied afterwards doesn't work as well as it could. As we've seen in the previous post, at low bitrates these discontinuities become noticeable. They are clearly more prominent in the BIFR compressed image than in the waterfill one. It thus seems that non-optimal rate allocation might worsen the discontinuity problem.
In order to address this issue it is of course necessary to measure the possible causes of this problem. One source of discontinuities could be the image itself, which naturally presents such variations that are then amplified by POT. The second possibility is the transform "training" process. You will remember from the previous post that I talked about side information and the t parameter used in computing the rotation matrix. This parameter takes values between -1 and +1 and depends on the correlation between bandlines. If the two components (bandlines) are loosely correlated, then the t parameter can easily "jump" from -1 to +1. This is not desirable, as a t parameter that continuously jumps like this will produce discontinuities in the transformed image. In the image below, we see a plot (like the one in the previous post) of this t parameter for a specific image.
So, as mentioned in the previous post, one metric we'd like to determine is the variance of the differences between t values on consecutive lines. Another metric would be the number of jumps the t parameter makes from -1 to +1 (and vice versa).
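To make these two metrics concrete, here is a minimal C++ sketch. It is not the program described in this log. It assumes the t values of one component have already been read from the side information into a `std::vector<double>`, in line order. The function names, the use of the population variance, and the jump `threshold` of 1.5 are all my assumptions for illustration; the post does not say how close to -1 and +1 the two values must be for a change to count as a jump.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Population variance of the line-to-line differences d[i] = t[i+1] - t[i].
// Returns 0.0 when fewer than two t values are given.
// (Assumption: population rather than sample variance.)
double diffVariance(const std::vector<double>& t) {
    if (t.size() < 2) return 0.0;
    const std::size_t n = t.size() - 1;  // number of differences
    double mean = 0.0;
    for (std::size_t i = 0; i < n; ++i) mean += t[i + 1] - t[i];
    mean /= static_cast<double>(n);
    double var = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        const double d = (t[i + 1] - t[i]) - mean;
        var += d * d;
    }
    return var / static_cast<double>(n);
}

// Counts consecutive pairs where t changes sign AND moves by at least
// `threshold`. The threshold is a hypothetical tuning knob: 2.0 would
// count only exact -1 <-> +1 flips, smaller values are more permissive.
std::size_t countSignJumps(const std::vector<double>& t,
                           double threshold = 1.5) {
    assert(threshold > 0.0);
    std::size_t jumps = 0;
    for (std::size_t i = 0; i + 1 < t.size(); ++i) {
        const bool signFlip = (t[i] < 0.0) != (t[i + 1] < 0.0);
        if (signFlip && std::fabs(t[i + 1] - t[i]) >= threshold) ++jumps;
    }
    return jumps;
}
```

A t sequence that drifts smoothly gives a small variance and no jumps, while one that flips between the two extremes scores high on both, which is the behaviour we want these metrics to flag.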
In the case of the lossless transform, the rotation matrix is decomposed into a product of elementary matrices, each of which is equivalent to a sequence of lifting steps. Applying the KLT through this lifting decomposition gives us the reversible KLT (RKLT), which guarantees the lossless nature of the transform. There are two possible lifting decompositions, depending on whether |t| >= |p| or |t| < |p|, where p = sqrt(1 - t^2). If t were to switch often between |t| >= |p| and |t| < |p|, we would again have discontinuities in the transformed image. Our third metric is thus the number of these switches, or p-jumps.
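Counting p-jumps can be sketched in the same way. Again, this is only an illustration under my own assumptions: the function names are made up, and the t values are assumed to be available as a `std::vector<double>` in line order. Since p = sqrt(1 - t^2), the condition |t| >= |p| is the same as |t| >= 1/sqrt(2) (about 0.707), so a p-jump happens whenever |t| crosses that value between two consecutive lines.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// True when |t| >= |p| with p = sqrt(1 - t^2), i.e. when the first of the
// two lifting decompositions would be selected. Expects t in [-1, +1].
bool usesFirstDecomposition(double t) {
    assert(std::fabs(t) <= 1.0);
    const double p = std::sqrt(1.0 - t * t);
    return std::fabs(t) >= std::fabs(p);
}

// Counts how many times the selected decomposition changes between
// consecutive lines (the "p-jumps").
std::size_t countPJumps(const std::vector<double>& t) {
    std::size_t jumps = 0;
    for (std::size_t i = 0; i + 1 < t.size(); ++i) {
        if (usesFirstDecomposition(t[i]) != usesFirstDecomposition(t[i + 1])) {
            ++jumps;
        }
    }
    return jumps;
}
```

Note that a p-jump does not require a sign change: t moving from 0.9 to 0.5 already switches the decomposition, so this metric can be high even when the -1 to +1 jump count is zero.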
I have written a C++ program which, given the side information file of an image, computes these three metrics. The program was run on various images to see how the metrics vary from image to image and to determine the best approach to solving the discontinuity problem.