I've been busy trying to make the decoder output the bits and their interpretation for all the (macro)block parts. And I think I have succeeded. I still need to clean it up and make it compatible/configurable with the existing codebase. But I wanted to share some details so others (mainly @IainCole) can already start thinking/working on an extension of the online tool. Also for others to get used to the idea.
I think it will bring us to the next level.
Thanks, arnezami, great work! I've been using your previous version with my automated bit flipper, and it still doesn't work. But that's due to the objective function not yet being good enough, and maybe (hopefully not) due to the search algorithm being too greedy. I'll keep working on it, and I'll integrate the new version.
In the meantime, I've seen a few posts by people saying that they have no idea what's going on. That's usually my position (especially with ISS stuff, acronym soup alert!), but now I'm coming from the other side, so I'll make an attempt at an explanation of all the stuff that's going on here. Hopefully, that will allow more people to help fix things. I'll probably make a mistake here and there and/or oversimplify something, so if you spot an error, let me know and I'll fix the text.

MPEG4
The damaged video stream that SpaceX made available is in the MPEG4 format. MPEG4 is a standard that says how you should convert a video into a list of ones and zeros and back, in such a way as to make that list nice and short so you can download it quickly. A video here is a sequence of pictures, each of which consists of a rectangle of pixels. Each of the pixels has a colour, expressed as the amount of Red light, the amount of Green light, and the amount of Blue light. That's what we're trying to pack into the MPEG4 file. What I'm going to describe here is the procedure for converting video from the Falcon 9 camera into the MPEG4 bitstream that we have. To turn this back into a picture, you just do everything in reverse.

Downsampling and colour space conversion
So, we take the first picture, and we split it up into a set of 16x16 pixel squares, called macroblocks (MBs), which we can process separately. The SpaceX video has 44 MBs horizontally and 30 MBs vertically. We convert each pixel in the macroblock from Red-Green-Blue (RGB) into a brightness value (Luma or Y) and two additional values that describe the colour (Chroma or C). Since the human eye doesn't see as much detail in colour as it does in brightness, we average each 2x2 pixel block of chroma values, reducing each of the two sets of chroma values to 8x8. So, of the original 16x16x3 = 768 numbers in the macroblock, we now have 16x16 + 8x8 + 8x8 = 384 numbers. That's a 50% reduction in space already! If we split the 16x16 block of Luma numbers into four 8x8 blocks, then we have a total of six 8x8 blocks per macroblock.

Discrete Cosine Transform
Next, we apply to each 8x8 block a mathematical trick called the Discrete Cosine Transform (DCT). The outcome of the DCT is another 8x8 block of numbers, one of which is simply the average of all 64 numbers, while the other 63 describe the differences (e.g. how much brighter the left half is than the right half, and 62 other patterns). The average is called the DC value, the differences are the AC values. Another way of interpreting this is to say that the DC value describes the overall brightness and colour, while the AC values describe the texture of the block.
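To make this a bit more concrete, here's a minimal (and deliberately slow) Python sketch of the 2D DCT applied to an 8x8 block. One small caveat: with the usual orthonormal scaling used here, the DC value comes out as 8 times the average rather than the average itself, but the idea is the same — a flat block has one non-zero DC value and 63 zero AC values.

```python
import math

def dct_8x8(block):
    """Naive 2D DCT-II of an 8x8 block (orthonormal scaling)."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out

# A completely flat block: every AC coefficient is 0, only DC is non-zero.
flat = [[100] * 8 for _ in range(8)]
coeffs = dct_8x8(flat)
# With this scaling the DC value is 8 * average = 800; all 63 AC values are ~0.
```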
If you look at arnezami's BITLOG output, then you see dc_lum0, dc_lum1, dc_lum2, dc_lum3, dc_chrom4 and dc_chrom5. These are the DC values of the six blocks in the macroblock.

Quantisation
In itself, the DCT does not save any space. The reason we do this DCT is that it turns out that for most blocks in your typical image, most of the resulting 64 values are 0, or close to 0. Also, small errors in these numbers do not make the image look much worse. So, the next step is to quantise these values, which basically means dividing them by a given number, and rounding to a whole number again. Smaller values use fewer digits (bits, in this case), so that saves space, and if we store the value that we divided by, then the player can multiply by it and get almost the correct values to put into its inverse DCT.
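Here's the quantise/dequantise round trip in a few lines of Python (the coefficient values are made up for illustration). Notice how the small values become 0, and how the restored values are close to, but not exactly, the originals:

```python
# Quantise by dividing and rounding; dequantise by multiplying back.
# The rounding error is exactly what gets lost.
def quantise(coeffs, q):
    return [round(c / q) for c in coeffs]

def dequantise(levels, q):
    return [level * q for level in levels]

coeffs = [802, -63, 14, 3, -2, 1, 0, 0]
levels = quantise(coeffs, 8)        # [100, -8, 2, 0, 0, 0, 0, 0]
restored = dequantise(levels, 8)    # [800, -64, 16, 0, 0, 0, 0, 0]
```

The long runs of zeros that quantisation produces are what the later coding stages compress so well.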
If you quantise too much, then you get those typical artifacts around sharp edges that you sometimes see in JPEG images. Do it right, and you can save quite a lot of space with little degradation in image quality. In this post, another problem was found: transmission damage resulted in the decoder reading a quantisation factor of 4, rather than 1. In this case, it therefore multiplied the chroma (colour) values by 4 instead of by 1, resulting in overly intense colours. Changing the 4 back to a 1 fixed the image.

DC and AC prediction
The basic idea of the DCT is that all those pixels in a block usually have quite a lot in common (the DC value), so it helps to store what they have in common once, and then add only a short description of the few differences (the quantised AC values) here and there. After we've done the above for all the macroblocks in an image, we can try to apply the same trick across the macroblocks. For example, adjacent macroblocks often have roughly the same colour, and thus similar DC values for their six blocks. If we order the macroblocks in reading order (left-to-right and top-to-bottom), then we can store the difference of the DC values of the six blocks relative to the corresponding ones for the left or top neighbour macroblock. This again results in shorter numbers and space savings. Actually, since texture is also often similar between adjacent macroblocks, we can do the same with the AC values (if ac_pred equals 1, that's what's happening). And with the quantisation multiplier: if you see a dquant value, then that's a difference as well.
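A toy version of that differences trick, delta-coding a row of DC values against the left neighbour only. The real standard is cleverer (it picks between the left and top neighbour per block, and the starting predictor value is fixed by the spec; the 128 here is just for the example), but the shrinking of the numbers, and the way one bad value poisons everything after it, both show up already:

```python
# Delta-code a row of DC values against the left neighbour; the decoder
# reverses it by accumulating the differences.
def encode_deltas(dc_values, predictor=128):
    deltas = []
    for v in dc_values:
        deltas.append(v - predictor)
        predictor = v
    return deltas

def decode_deltas(deltas, predictor=128):
    values = []
    for d in deltas:
        predictor += d
        values.append(predictor)
    return values

row = [130, 131, 131, 133, 90]
deltas = encode_deltas(row)   # [2, 1, 0, 2, -43] -- mostly small numbers
assert decode_deltas(deltas) == row
```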
It gets even better. Sometimes, the differences between a current block and the corresponding block in a neighbouring MB that we're using for reference are so small that we can just drop them altogether. In that case, we don't need to store anything about this block at all. The cbpc (for the chroma blocks) and cbpy (for the luma blocks) values specify for which blocks in this macroblock information is stored (see tables B-6 through B-11 in the standard document linked from the wiki).
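To give an idea of what such a coded block pattern means once it has been read from the bitstream: each block gets one bit saying whether any coefficient information is stored for it. In the real stream these patterns are themselves variable-length coded (that's what tables B-6 through B-11 are for); the bit ordering in this sketch is an assumption for illustration only:

```python
# Interpret a coded-block pattern as a bit mask: one bit per block,
# most significant bit first, 1 = coefficients are stored for that block.
# (The exact bit order here is an assumption made for this example.)
def coded_blocks(cbpy, cbpc):
    luma = [(cbpy >> (3 - i)) & 1 for i in range(4)]    # luma blocks 0-3
    chroma = [(cbpc >> (1 - i)) & 1 for i in range(2)]  # chroma blocks 4-5
    return luma + chroma

# cbpy = 0b1010: luma blocks 0 and 2 carry data; cbpc = 0b01: only
# chroma block 5 does. The other three blocks are skipped entirely.
assert coded_blocks(0b1010, 0b01) == [1, 0, 1, 0, 0, 1]
```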
This does introduce a dependency between macroblocks. If one macroblock gets damaged and is decoded incorrectly, all the other macroblocks that are encoded as differences relative to this one will also be wrong. Overwriting the damaged macroblock with one that looks about right gives the dependent macroblocks a good reference again and, if there are no other errors, will allow them to be decoded (more or less) correctly. In this way, a single -mmb command, which disables and/or fixes broken macroblocks, can fix a whole chunk of the image. The -mmb -1:x:y and similar commands do different kinds of resetting of broken blocks.

Variable length coding
As you've probably noticed, we're moving a lot of numbers around here. We've used various tricks already to make those numbers smaller, and thus shorter. There's another trick we can apply if some numbers occur more often than others, and that is to translate each number into a code. A code is simply another number, but the trick is that commonly occurring numbers have a shorter code, while uncommon numbers have a longer code. Thus, we lose a bit of space on uncommon numbers, but gain a lot on common numbers. MPEG4 refers to this as variable length coding (VLC).
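Here's a toy prefix code in Python that shows both the space saving and the fragility that matters for our bit flipping. The code table is made up for this example; the real MPEG4 VLC tables are fixed in the standard. The key property is that no code is a prefix of another, so the decoder always knows where one code ends and the next begins — until a flipped bit shifts those boundaries:

```python
# A toy prefix code: common values get short codes, rare values long ones.
# This table is invented for illustration; MPEG4's tables are standardised.
CODES = {0: "1", 1: "01", -1: "001", 2: "0001", -2: "0000"}
DECODE = {bits: value for value, bits in CODES.items()}

def encode(values):
    return "".join(CODES[v] for v in values)

def decode(bits):
    values, current = [], ""
    for b in bits:
        current += b
        if current in DECODE:       # a complete code has been read
            values.append(DECODE[current])
            current = ""
    return values

stream = encode([0, 0, 1, 0, -1])   # "11011001": 8 bits for 5 values
assert decode(stream) == [0, 0, 1, 0, -1]
# Flip the first bit and the code boundaries shift; everything after the
# flip is misread, not just the one value.
assert decode("0" + stream[1:]) != [0, 0, 1, 0, -1]
```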
As a result of this, most numbers that you see in arnezami's debug output do not actually occur in the file as the corresponding binary number, and a single flipped bit can result in a very different value. In fact, it can completely mess up decoding if the flipped bit results in a shorter or longer code being read. The decoder will then start reading the next value too early or too late, causing another misread, lots of error messages, and colourful junk for output. Sometimes, this can be fixed by manually restarting the decoding process of the next (or a later) macroblock at the correct position. That's what the -mmb x:y:position commands do. Usually, you want to combine this with a reset (see above) just before the restart, to give the differential values something to reference.

I-frames and P-frames
So, if you've done the above for a whole picture, then you have an I-frame. I-frames stand alone: they contain all the information needed to decode the whole picture. As far as we currently know (and we're pretty sure, I think), the SpaceX video contains 15 such I-frames; every 20th frame is one. The other 269 frames are P-frames. A P-frame uses the store-only-differences trick between frames. So, if there are parts of the picture that are the same as in the previous frame (e.g. the rocket body at the bottom of the image), then a P-frame will not store them at all. If there are parts that still look the same, but have moved, then the P-frame will contain a motion vector, which tells the decoder where to find them in the previous frame. So, to decode a P-frame, you need to fix the last I-frame before it, plus all the P-frames in between, to get a correct reference image for the P-frame to use.

Transport Stream
Doing all the above yields a list of ones and zeros that describe a video. However, it's only the video. In this case that's all we have, but usually you also have audio, and sometimes multiple sound and video tracks, subtitles, station information, etc., that all have to be combined into a single file or stream. This is where the Transport Stream (TS) comes in. It works by splitting the bits of each substream into 188-byte chunks called packets. Each packet carries a small sequence counter and a description of which stream it belongs to and what kind of content it carries. The decoder unravels this, picks out the video data, and decodes it as described above.
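A minimal Python sketch of pulling packets back out of a transport stream. The 188-byte packet size, the 0x47 sync byte, and the 13-bit PID (the stream identifier) are real TS features; the two packets here are fabricated, and for simplicity the sketch assumes the basic 4-byte header with no adaptation field:

```python
# Split a transport stream into 188-byte packets and read each packet's
# sync byte (0x47) and 13-bit PID, which says which substream it belongs to.
PACKET_SIZE = 188

def parse_packets(data):
    packets = []
    for off in range(0, len(data), PACKET_SIZE):
        pkt = data[off:off + PACKET_SIZE]
        if len(pkt) < PACKET_SIZE or pkt[0] != 0x47:
            break  # lost sync: a damaged stream needs realignment here
        pid = ((pkt[1] & 0x1F) << 8) | pkt[2]
        packets.append((pid, pkt[4:]))  # 4-byte header, then payload
    return packets

# Two fabricated packets: one on PID 0x100 (say, video), one on PID 0x101.
raw = (bytes([0x47, 0x01, 0x00, 0x10]) + bytes(184)
       + bytes([0x47, 0x01, 0x01, 0x10]) + bytes(184))
pids = [pid for pid, _ in parse_packets(raw)]
assert pids == [0x100, 0x101]
```

This also shows why the TS layer was comparatively easy to repair: every packet starts with a known sync byte at a known spacing, so damage is easy to spot and realign.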
Thanks mainly to Shanuson and Princess, we now have the TS packets all fixed up, and I think we've also got the starting points for (almost) all the MPEG4 frames. So what remains is fixing those up. It's one epic jigsaw puzzle, but then I hear it takes several months to get to Mars, and this particular jigsaw puzzle is zero-g compatible.
There is much much much more to be said about MPEG4 (the standard document linked from the wiki is 536 pages!), but I think I've covered most of the stuff that's been discussed in this thread (which is the limit of on-topicness I guess) and we've also pretty much reached the end of my current knowledge. So I'll leave it at this. If anything is unclear or wrong, please post or PM me and I'll try to fix it, or if you have questions, ask!