Author Topic: SpaceX Falcon 9 v1.1 CRS-3 Splashdown Video Repair Task Thread  (Read 851021 times)

Offline saliva_sweet

  • Full Member
  • ****
  • Posts: 508
  • Liked: 380
  • Likes Given: 1249
Alpha 3 of my transport stream search and repair tool is finished.

Were you able to find any other sub-byte shifts aside from the one we already know about?

I recall there are at least two ts packets in raw.ts in iframe 8 that are missing one bit and the next packet has an extra bit causing the following ones to be aligned again. I expect there are more in other frames.

Going to port my iframe8 changes to this new iframe8.
How do you do that?

Apparently the new and the old iframe8 are byte identical. Was hoping some of the issues would be fixed, but apparently not.

That made for an easy port.

Offline JamesH

  • Full Member
  • ****
  • Posts: 525
  • United Kingdom
  • Liked: 283
  • Likes Given: 7
Hi guys,

I just noticed that there are no B-frames in the file. Only I-frames and P-frames.

Quote
./ffprobe try1.ts -show_frames > frames.txt

Not sure yet what the implications are. Hopefully it will get easier that way.

Regards,

arnezami

That is very fortunate happenstance.  P frames will only have compound errors from any damage from the frames before them in the sequence.  B frames would have compound errors from both before and after them in the sequence.

Not really that 'fortunate' - there are very few if any real time encoders of the sort that would be used here that use B-frames. They need to encode in near real time, which means basing frames on frames that haven't yet appeared doesn't really work.  B-frames tend to work best when doing offline encoding. They do give a better compression ratio, but are more computationally expensive.

Online seanpg71

  • Member
  • Posts: 26
  • Liked: 21
  • Likes Given: 0
...

Going to port my iframe8 changes to this new iframe8.
How do you do that?

Apparently the new and the old iframe8 are byte identical. Was hoping some of the issues would be fixed, but apparently not.

Comparing try1 to the frames from the edit1 zip above:
iframes 1, 4, 5, 7, 8, 9 and 13 match the frames from try1 exactly.
iframe 11 and 12 match up except for a couple bitflips
iframes 2, 6, and 10 have extra bits in try1 that aren't in the edit1 (dunno if they're important bits).

So we're probably not going to get any better versions of the current frames using the new iframes in Shanuson's edit1-fixed-iframe.zip.

The new frames are exciting though.
« Last Edit: 05/13/2014 08:54 AM by seanpg71 »

Offline Kaputnik

  • Extreme Veteran
  • Senior Member
  • *****
  • Posts: 2806
  • Liked: 454
  • Likes Given: 397
A long thread but worth it. I feel like I have just been on a crash course in digital video processing!
Keep up the amazing work- the world is watching :)
Waiting for joy and raptor

Offline SwissCheese

  • Full Member
  • *
  • Posts: 164
  • Liked: 249
  • Likes Given: 80
So I quickly tried the new frame 131 (between 6 and 7)! I threw away some valid blocks so it should be possible to improve it further.

0:0:-1,
1:3:7967,0:7:-1,
12:8:23866,26:14:-1,
34:16:65834,25:20:-1,
14:21:92080,24:25:-1,
14:27:120887,10:29:-1
« Last Edit: 05/13/2014 11:10 AM by SwissCheese »

Offline gospacex

  • Senior Member
  • *****
  • Posts: 3028
  • Liked: 535
  • Likes Given: 604

To find out how many different TS PIDs exist, I counted all distinct "0x47xxxx" occurrences in raw_edit1.ts.
These are all that occurred 20 times or more:
20 471be8
20 4733e8
21 470fe8
25 4703e9
63 474703
74 474020
81 474000
269 4743e8
4968 471fff
15016 4703e8
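
A count like the one above can be reproduced roughly as follows. This is only a sketch: the function name count_sync_prefixes and the min_count parameter are made up for illustration, and it assumes the whole file fits in memory. It looks at every byte offset, so it will also pick up false hits where 0x47 is not really a sync byte.

```python
from collections import Counter

def count_sync_prefixes(data, min_count=20):
    """Count every 3-byte pattern that starts with the sync byte 0x47.

    Returns (count, hex_string) pairs, least frequent first.
    """
    counts = Counter()
    for i in range(len(data) - 2):
        if data[i] == 0x47:
            counts[data[i:i + 3].hex()] += 1
    # Keep only the frequent patterns and sort them by count.
    return sorted((n, h) for h, n in counts.items() if n >= min_count)

# Usage (file name taken from the post above):
#   with open("raw_edit1.ts", "rb") as f:
#       for n, h in count_sync_prefixes(f.read()):
#           print(n, h)
```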


I think the first 4 are permutations of the last ones by bitflips.
4743e8 and 4703e8 have the same PID (they differ only in the payload_unit_start_indicator bit).
When one checks where 474703 is found, one sees that the next byte is E8. So the first 47 is the last byte of a previous packet, and therefore 474703 is not part of a TS-header.
So we end up with only 4 PIDs (13 bit long):
0x0000, 0x0020, 0x03E8, 0x1FFF
Each PID has its own continuity counter, which should help deciding which packet is which.
 
With the different flag bits, we end up with the following TS headers:
0x4740001W
0x4740201X
0x4703E81Y, 0x4703E83Y and 0x4743E83Y
0x471FFF1Z
with the 4 different counters W,X,Y,Z
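
For anyone who wants to check headers like these by hand, here is a minimal sketch that splits a 4-byte TS header into its fields. The field layout is the standard MPEG-TS one; the function name parse_ts_header and the dictionary keys are just illustrative choices.

```python
def parse_ts_header(header):
    """Split a 4-byte MPEG-TS packet header into its fields."""
    if len(header) != 4:
        raise ValueError("a TS header is exactly 4 bytes")
    b0, b1, b2, b3 = header
    return {
        "sync_ok": b0 == 0x47,                      # sync byte must be 0x47
        "transport_error": (b1 >> 7) & 1,
        "payload_unit_start": (b1 >> 6) & 1,
        "transport_priority": (b1 >> 5) & 1,
        "pid": ((b1 & 0x1F) << 8) | b2,             # 13-bit PID
        "scrambling": (b3 >> 6) & 3,
        "adaptation_field_control": (b3 >> 4) & 3,  # 1 = payload only, 3 = adaptation + payload
        "continuity_counter": b3 & 0x0F,            # 4-bit counter, per PID
    }

# Example with a header from the hex dump in this thread:
#   parse_ts_header(bytes.fromhex("4743e838"))
#   gives pid 0x03E8, payload_unit_start 1, adaptation_field_control 3, counter 8
```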

For completeness, I counted PIDs by absolute bit position in the raw.ts, here are the top 50:

PID       HEX      COUNT
1000    x03E8      15592    -- Video
8191    x1FFF      5070     -- Null packet
   0    x0000      84
  32    x0020      75
5096    x13E8      49       -- 1 bit off from 1000
1001    x03E9      48       -- 1 bit off from 1000
 904    x0388      40       -- 2 bits off from 1000
7144    x1BE8      30       -- 2 bits off from 1000
 984    x03D8      29       -- 2 bits off from 1000
1003    x03EB      29       -- 2 bits off from 1000
 616    x0268      28
3048    x0BE8      28
1512    x05E8      27
2000    x07D0      27
 996    x03E4      26
1008    x03F0      26
1002    x03EA      25
4095    x0FFF      24
 500    x01F4      23
1006    x03EE      23
4072    x0FE8      23
8190    x1FFE      23
 808    x0328      19
1004    x03EC      19
 992    x03E0      17
 232    x00E8      16
 488    x01E8      16
 968    x03C8      16
1016    x03F8      16
2024    x07E8      16
6120    x17E8      15
 872    x0368      14
 936    x03A8      14
4000    x0FA0      14
 680    x02A8      13
 744    x02E8      13
1005    x03ED      13
2047    x07FF      13
2536    x09E8      12
8167    x1FE7      11
8189    x1FFD      11
7807    x1E7F      10
 840    x0348      9
5119    x13FF      9
7423    x1CFF      9
8095    x1F9F      9
8127    x1FBF      9
8185    x1FF9      9
 125    x007D      8
 994    x03E2      8


All 1650 are in the attached csv.
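
The "bits off" annotations can be checked mechanically with a Hamming distance against the four PIDs believed to be real. A minimal sketch, assuming the 4-PID theory holds; the names KNOWN_PIDS, bit_distance and nearest_known_pid are invented for this example.

```python
# The four PIDs proposed earlier in the thread (an assumption, not a certainty).
KNOWN_PIDS = (0x0000, 0x0020, 0x03E8, 0x1FFF)

def bit_distance(a, b):
    """Number of differing bits between two 13-bit PIDs."""
    return bin((a ^ b) & 0x1FFF).count("1")

def nearest_known_pid(pid, known=KNOWN_PIDS):
    """Return (distance, known_pid) for the closest known PID.

    Ties are broken in favour of the numerically smaller PID.
    """
    return min((bit_distance(pid, k), k) for k in known)

# Usage: nearest_known_pid(0x13E8) returns (1, 0x03E8),
# i.e. one flipped bit away from the video PID.
```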

I think your 4-PID theory is correct.

I think the "ultimate fix" for the TS packet stream is to identify the nature and sequence number of nearly every packet. It is in fact possible despite instances of heavy local corruption.

Example of how it can be done (looking at raw.hex):

00d9ce8 471fff1bffffffffffffffffffffffffffffffffffffffff...
00d9da4 071e7f1cfffffffffffce7ffaffffffffffbffe7ffffffff...
00d9e60 5e9eff1bfffffc3ff77fffffffffffff53ffffffffffffff...
00d9f1c 465eff389fcebbfd67f0ffffffbffe79ffebffaffe1fdf7f
00d9fd8 26bebae7fbffdfe73fafff646f669f7ecd5c63a4e271d4a5

above packets are null packets. last two are heavily corrupted.
Now data packets follow:

00da094 4743e8380710002d2a507e00000001e00000818007210169
00da150 4703e819cdc0466941eb5c9df5f0320710bc3a330e091ba7
00da20c c703e81a059c70c40701a982190958cd31e2f06e6605a0b3
00da2c8 470fe41bb1d1204529aa6b20c9e3161765247f8f27400625
00da384 4703e81c2b371e0c9e2220df2056a413bc0c4f06c727fd05
00da440 4703e81ddfeeb416075502c8320cc4680dab3f6dafe923dc
00da4fc 4703e81e8d3801135e80608ecd4ee8a6b831473a5c0704c6
00da5b8 4703e81f19c3e015e44317a641ab463daf6c9474f5404178
00da674 5303e810845492731ce7088df908d5c69c9d8650f83de0c9
00da730 c0aa09d1aff4edb39423532036d6414be3bff3774fec9ff7
00da7ec 3870d4b906f3dc1101ac21f8154e663f444e7a684e908e5f
00da8a8 98167426424d9a350043e7e988b7648e76139028fef4b339
00da964 5060f71f0180f1b65c02ad0414880a6a0594d1211547cc01
00daa20 8e07902a9908c0d17d52b1e4cd2a16ed7895f5209a251ca7

last four packets above are heavily corrupted - header is completely
unrecognizable, but they *must be* data packets, otherwise seq numbers
don't match: there is 8,9,a,b,c,d,e,f,0,1 sequence before them,
and below the same sequence continues: 6,7,8,9,...
So these four packets above *have to be* 2,3,4,5,
and since seq numbers are counting independently for different PIDs,
they must be data packets.

00daadc 4703e816f0cd370cc3300b7a55347680436640e0c8456c32
00dab98 4703e81752d05c5b6234200ae027f951e53499e182195038
00dac54 4703e81893671ee0c8058f24c320c9ce0b250c9cf878496e
00dad10 4703e81973877ea096f1181004e2301e774163c04f72c038
00dadcc 4703e81a40b2a83cdc8e085b9d3413de15028243200af9e7
00dae88 4703e81bb9ec065fc3da74f299fd281e0b551ffcd3aab013
00daf44 4703e81c5938420482ded1e067066230400bdd77cf833832
00db000 4f03d81d380ff2904f9ae02286676e2bd40704b73e1a17e9
00db0bc 4703e8127c77ab0958cc54ac742ce54c00c3a0b2fd029871

Corrupted seq no above, should be 'e'.

00db178 471fff10ffffffffffffffffffffffffffffffffffffffff
00db234 4703e81f82ba45e0090670958f067b80dac02da97f1904f8

As you see, every packet can be identified in this example, despite
some headers being shot to hell.
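
The counter argument above can be sketched in code. This is only an illustration, under the assumptions that the list holds the continuity counters of consecutive packets of one PID, that each of those packets carries payload (the counter only advances for packets with payload), and that None marks an unreadable header. The function name fill_counter_gaps is hypothetical. A gap is filled only when its length agrees with the counters on both sides, and gaps of 16 or more packets are inherently ambiguous because the counter wraps.

```python
def fill_counter_gaps(counters):
    """Infer unreadable 4-bit continuity counters from their neighbours.

    counters: list of ints 0..15, with None for unreadable headers.
    Returns a new list; gaps that do not fit are left as None.
    """
    out = list(counters)
    known = [i for i, c in enumerate(out) if c is not None]
    for a, b in zip(known, known[1:]):
        gap = b - a - 1                      # number of unknown packets in between
        if gap == 0:
            continue
        # The gap fits if counting up from out[a] lands exactly on out[b].
        if (out[b] - out[a]) % 16 == (gap + 1) % 16:
            for k in range(1, gap + 1):
                out[a + k] = (out[a] + k) % 16
    return out

# The example from the hex dump: ..., f, 0, 1, four unreadable packets, 6, 7
#   fill_counter_gaps([0xF, 0, 1, None, None, None, None, 6, 7])
```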
« Last Edit: 05/13/2014 12:14 PM by gospacex »

Offline Jakusb

  • Full Member
  • ****
  • Posts: 580
  • NL
  • Liked: 301
  • Likes Given: 98

Offtopic: An updated version of the sequence from video I-Frames that I posted some pages ago, now with the legs clearly moving!

Could someone maybe make a small addition where, in the top right corner of each frame, there is an estimate of how the stage would look from the side?

Offline theshadow27

  • Member
  • Posts: 28
  • Liked: 27
  • Likes Given: 7

last four packets above are heavily corrupted - header is completely
unrecognizable, but they *must be* data packets, otherwise seq numbers
don't match: there is 8,9,a,b,c,d,e,f,0,1 sequence before them,
and below the same sequence continues: 6,7,8,9,...
So these four packets above *have to be* 2,3,4,5,
and since seq numbers are counting independently for different PIDs,
they must be data packets.


Careful with this assumption - the continuity counter/sequence number is just 4 bits. It is no more special or immune to noise than any other four bits in the header or data. Yes, including it in the match will increase the chances of getting the right PID, but it's not a magic bullet in itself.

Offline saliva_sweet

  • Full Member
  • ****
  • Posts: 508
  • Liked: 380
  • Likes Given: 1249
Careful with this assumption - the continuity counter/sequence number is just 4 bits. It is no more special or immune to noise than any other four bits in the header or data. Yes, including it in the match will increase the chances of getting the right PID, but it's not a magic bullet in itself.

The info is not in the 4 bits of an individual counter, but the sequence of thousands of them. The question is whether including completely garbled packets does more good or harm.

Offline mvpel

  • Full Member
  • ****
  • Posts: 1116
  • New Hampshire
  • Liked: 1280
  • Likes Given: 1676
Careful with this assumption - the continuity counter/sequence number is just 4 bits. It is no more special or immune to noise than any other four bits in the header or data. Yes, including it in the match will increase the chances of getting the right PID, but it's not a magic bullet in itself.
I don't think he's trying to say that, but rather merely pointing out that there's a second dimension from which the correct header can be derived, kind of like a macroblock referring to the one above or to the left. There are only four PIDs in the stream, and only two which occur frequently (0x03e8 and 0x1fff), so even with a completely trashed header like 0x5060F71, pulling these known quantities together you can figure out with a very high level of confidence that it's supposed to be 0x4703E814.

Of course, it might be 0x34, not 0x14, if it's an adaptation packet, but there's also context available to figure that out too - such as if the next byte after the header is greater than 184 (0xB8) since that would mean the adaptation field is longer than the rest of the packet and so either that byte is wrong or the 0x34 should be 0x14.

Now whether the likely completely trashed payload that goes with that completely trashed header will be of any use is another question, but at least we have the payload.

Of course this doesn't help much if you go too far beyond 16 packets in either direction (3008 bytes), but I don't think we've seen that much contiguous unrecognizable data anywhere in the transport stream.
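
The adaptation-field sanity check described above might look something like this. It is a rough sketch, not a full validator: the function name adaptation_flag_plausible is made up, and it uses the slightly stricter limit that follows from the packet layout (188 bytes minus the 4-byte header minus the length byte itself leaves at most 183 bytes of adaptation field).

```python
TS_PACKET_SIZE = 188
TS_HEADER_SIZE = 4

def adaptation_flag_plausible(packet):
    """Check whether the adaptation-field flag is consistent with the length byte.

    packet: at least the first 5 bytes of a TS packet.
    Returns False when the header claims an adaptation field that could not
    fit inside the packet, which means either the flag or the length byte is damaged.
    """
    afc = (packet[3] >> 4) & 3
    if afc in (2, 3):                      # adaptation field present
        length = packet[4]                 # adaptation_field_length
        # The length byte plus the field itself must fit after the header.
        return 1 + length <= TS_PACKET_SIZE - TS_HEADER_SIZE
    return True                            # no adaptation field claimed

# Usage: a header ending in 0x34 followed by a length byte of 200 fails the
# check, which suggests that 0x34 should be 0x14 or that the length is wrong.
```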
"Ugly programs are like ugly suspension bridges: they're much more liable to collapse than pretty ones, because the way humans (especially engineer-humans) perceive beauty is intimately related to our ability to process and understand complexity. A language that makes it hard to write elegant code makes it hard to write good code." - Eric S. Raymond

Offline SwissCheese

  • Full Member
  • *
  • Posts: 164
  • Liked: 249
  • Likes Given: 80
I updated frame 131 (and can finally make .png as output):

0:0:-1,
7:2:5895,4:7:-1,
11:7:20490,26:14:-1,
18:15:56376,13:16:-1,
14:16:62505,20:16:-1,
21:16:64068,27:16:-2,28:16:-1,
32:16:65707,25:20:-1,
14:21:92080,24:25:-1,
14:26:117098,41:26:-1,
14:27:120887,10:29:-1


Offline Shanuson

  • Full Member
  • **
  • Posts: 258
  • Liked: 171
  • Likes Given: 411
Here are the best results on the first 2 I-frames IF0 and IF 0.5 and the ts-file where they came from:


Offline Okie_Steve

  • Full Member
  • **
  • Posts: 265
  • Oklahoma
  • Liked: 71
  • Likes Given: 101
As you see, every packet can be identified in this example, despite
some headers being shot to hell.
Interesting approach. Do you see more 0->1 or 1->0 bit flips, or are they about equal? If non-random then the people doing genetic bit flips might want to bias their attempts.

Also, might be interesting to see an "XOR" file the same size as the input that could be applied to generate the "repaired" headers if someone wants to look for patterns and apply them to the data too.
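
Both ideas (the flip-direction count and the XOR file) can be sketched together. This assumes you already have a "repaired" copy of the same length as the received data; the name flip_stats is hypothetical, and "0->1" here means a bit that should be 0 but was received as 1.

```python
def flip_stats(received, repaired):
    """Compare received bytes against a repaired copy of the same length.

    Returns (xor_mask, zero_to_one, one_to_zero), where xor_mask XORed with
    the received data yields the repaired data.
    """
    if len(received) != len(repaired):
        raise ValueError("inputs must have the same length")
    xor_mask = bytes(r ^ f for r, f in zip(received, repaired))
    # Bits set in the received data that are clear in the repaired data.
    zero_to_one = sum(bin(r & ~f & 0xFF).count("1") for r, f in zip(received, repaired))
    # Bits clear in the received data that are set in the repaired data.
    one_to_zero = sum(bin(~r & f & 0xFF).count("1") for r, f in zip(received, repaired))
    return xor_mask, zero_to_one, one_to_zero

# Example with one of the damaged headers from the hex dump:
#   flip_stats(bytes.fromhex("c703e81a"), bytes.fromhex("4703e81a"))
```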

Offline arnezami

  • Full Member
  • **
  • Posts: 282
  • Liked: 262
  • Likes Given: 341
@Michael Niedermayer:

I've been investigating the possibility of automatically setting all L1,L2,L3,L4,C1,C2 values for all macroblocks for a frame. Basically inputting an mmb and letting a script detect "bad lines" in the picture, moving from the bottom-right corner to the top-left.

I'm starting to believe (using some extra logging and some manual experimentation) that this might actually be possible. If it is, this would improve the quality of the frames quite dramatically and would save a lot of manual work!

However there is one thing that I find difficult to do myself, and I think you might be able to help us. Currently I use this kind of setting:

15:14:-1:0:-40:0:0:0:20

where I replace the current macroblock with a blank one and change the DC values.

But what I really want to do is this:

15:14:57217:0:-40:0:0:0:20

In other words: I want to keep the macroblock with its (structured) data as much as I can (at the very least the blocks in it which are set to 0 above) but change only the DC "starting" values. So not blanking it, but modifying it so that these DC values make propagation to other blocks work the same as it does now with -1 (therefore fixing the color and intensity of the following blocks), while also keeping the current blocks inside the current macroblock.

Is something like that possible?

Regards,

arnezami
Hi guys,

I've created a new feature that implements the above.

Simply put you can now do something like this:

15:14:57217:0:-40:0:0:0:20

This will keep the macroblock (so not nullify it) but it will change the DC values. With this you can really fix the lum/chrom issues in the current I-frames. Should have a big effect. I hope.

[edit] Forgot: it now also logs the 6 DC values for each macroblock. Which can come in handy in combination with this new feature.

Also you can do this now:

15:14:57217:0:0:0:0:0:0:0
15:14:57217:0:0:0:0:0:0:63

What this does is change the direction from which a DC prediction is inherited: left or top. Each 1-bit in this number tells it to get the DC prediction from the top for a specific DC (4 x lum, 2 x chrom). A 0-bit means left. The number consists of 6 bits (hence the max is 63). So in effect 0 is all left and 63 is all top. I don't know yet whether this feature is going to be very useful, but it can't hurt.
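
To make the 6-bit number easier to read, here is a small sketch that expands it into one direction per DC value. Note the loud assumption: the post does not say which bit belongs to which block, so the order used here (bit 0 for the first luma block, bit 5 for the second chroma block) and the names in DC_NAMES are guesses for illustration only. Only the two extremes, 0 and 63, are independent of that order.

```python
# ASSUMED bit order, not confirmed by the patch: bit 0 = first luma block.
DC_NAMES = ("Y0", "Y1", "Y2", "Y3", "Cb", "Cr")

def dc_directions(mask):
    """Expand the 6-bit direction number into 'top' or 'left' per DC value."""
    if not 0 <= mask <= 63:
        raise ValueError("direction mask must be in 0..63")
    return {
        name: "top" if (mask >> bit) & 1 else "left"
        for bit, name in enumerate(DC_NAMES)
    }

# Usage: dc_directions(0) is all 'left', dc_directions(63) is all 'top'.
```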

I've attached the 3 source files that were changed. If somebody can check that it doesn't conflict with the existing branch (and maybe test it) and then push it to github, that would be great. It's these files btw:

libavcodec/mpegvideo.h
libavcodec/mpeg4video.h
libavcodec/h263dec.c

[edit] Fixed a bug: when using -2 it would get stuck in that mode. zip re-uploaded.

I need some sleep now ;)

Regards,

arnezami

PS. I was also trying to make a reverse-macroblock finder of sorts. So instead of telling it its starting bit position you would tell it its ending bit position (which would usually be a starting bit position of a mb you already have). But I ran into a lot of issues. Even when you try all possible DC-directions it still won't find the right starting position. Even when I fake the right DC values to the left of the block. I'm not sure why yet. Probably something I don't understand yet. This part of the code is in there but it's commented out. It was also extremely error-log-heavy...
« Last Edit: 05/15/2014 10:01 PM by arnezami »

Offline SwissCheese

  • Full Member
  • *
  • Posts: 164
  • Liked: 249
  • Likes Given: 80
Hi guys,

I just noticed that there are no B-frames in the file. Only I-frames and P-frames.

Quote
./ffprobe try1.ts -show_frames > frames.txt

Not sure yet what the implications are. Hopefully it will get easier that way.

Regards,

arnezami

That is very fortunate happenstance.  P frames will only have compound errors from any damage from the frames before them in the sequence.  B frames would have compound errors from both before and after them in the sequence.

Not really that 'fortunate' - there are very few if any real time encoders of the sort that would be used here that use B-frames. They need to encode in near real time, which means basing frames on frames that haven't yet appeared doesn't really work.  B-frames tend to work best when doing offline encoding. They do give a better compression ratio, but are more computationally expensive.

I am trying to recreate the full movie using the I-frames, a few "nice-looking" P-frames and fill the rest with empty P-frames. How can I convert a png into a P-frame? Or should I create the empty P-frames differently?

For the I-frames, I use for example:
ffmpeg -i frame_131.png -s 704x480 -aspect 22:15 -f image2 -r 44999/3003 frame131.mpg4-img
« Last Edit: 05/13/2014 05:30 PM by SwissCheese »

Offline theshadow27

  • Member
  • Posts: 28
  • Liked: 27
  • Likes Given: 7
Careful with this assumption - the continuity counter/sequence number is just 4 bits. It is no more special or immune to noise than any other four bits in the header or data. Yes, including it in the match will increase the chances of getting the right PID, but it's not a magic bullet in itself.
I don't think he's trying to say that, but rather merely pointing out that there's a second dimension from which the correct header can be derived, kind of like a macroblock referring to the one above or to the left. There are only four PIDs in the stream, and only two which occur frequently (0x03e8 and 0x1fff), so even with a completely trashed header like 0x5060F71, pulling these known quantities together you can figure out with a very high level of confidence that it's supposed to be 0x4703E814.

Of course, it might be 0x34, not 0x14, if it's an adaptation packet, but there's also context available to figure that out too - such as if the next byte after the header is greater than 184 (0xB8) since that would mean the adaptation field is longer than the rest of the packet and so either that byte is wrong or the 0x34 should be 0x14.

Now whether the likely completely trashed payload that goes with that completely trashed header will be of any use is another question, but at least we have the payload.

Of course this doesn't help much if you go too far beyond 16 packets in either direction (3008 bytes), but I don't think we've seen that much contiguous unrecognizable data anywhere in the transport stream.

Yes, and if null packets occurred one at a time (or even N at a time, N in known set) every ~M data packets, this would be useful, and I agree that looking at any particular segment you might believe you have recovered a header based on the counter, but it adds much less value than you might hope. The problem is that null packets do not occur in a predictable way. I ran some stats to illustrate..

The first graph shows the frequency of such runs vs their length. There are obviously some outliers resulting from problematic data-packet matching, but the variance between 1 and 32 looks real enough to me. Strings of 1 occur often, but percentage-wise (third graph) they only constitute 2.58% of null packets.

The second graph shows the distance between null packets. Excluding the case where d=0 (i.e. a continuous string per above, 5103 counts) and presuming that d=1..3 (120 counts) might be d=0 with over-optimistic data-packet identification, there is still a huge variance in the potential number of data packets before the next null packet.

The third graph is the most important, since it answers the question "suppose I know for sure I have N null packets, what is the chance that the next packet is also a null packet". The chance is pretty even to about N=38, meaning that the 16 values offered by the continuity counter could alias twice with equal probability.
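
For reference, the quantity behind that third graph can be computed from a list of null-packet run lengths roughly like this. It is a sketch with made-up names (continuation_probability, run_lengths) and it says nothing about how the runs were extracted in the first place.

```python
def continuation_probability(run_lengths, n):
    """Estimate P(next packet is null | the current run already has n nulls).

    run_lengths: lengths of all observed runs of consecutive null packets.
    Returns None when no run of at least n packets was observed.
    """
    at_least_n = sum(1 for r in run_lengths if r >= n)   # runs that reached n
    longer = sum(1 for r in run_lengths if r > n)        # runs that went past n
    if at_least_n == 0:
        return None
    return longer / at_least_n

# Usage with toy data (not the real statistics):
#   continuation_probability([1, 1, 2, 3], 1) returns 0.5
```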

That said, I did implement the counter into the packet-matcher data (see attached). As I expected, it does not seem to have made much of a difference.
« Last Edit: 05/13/2014 06:48 PM by theshadow27 »

Offline wronkiew

  • Full Member
  • *
  • Posts: 186
  • 34.502327, -116.971697
  • Liked: 105
  • Likes Given: 116
I am trying to recreate the full movie using the I-frames, a few "nice-looking" P-frames and fill the rest with empty P-frames. How can I convert a png into a P-frame? Or should I create the empty P-frames differently?

For the I-frames, I use for example:
ffmpeg -i frame_131.png -s 704x480 -aspect 22:15 -f image2 -r 44999/3003 frame131.mpg4-img

When you convert the p-frames to PNGs, some information about what to do with the blocks is lost. I don't think it is possible to reassemble the movie that way.

I think probably mlindner's idea to add a frame specifier to the mmb parameter is going to work best for contiguous sequences of i- and p-frames. In cases where we are missing p-frames, there is going to need to be some manual interpolation and reassembly.

Offline arnezami

  • Full Member
  • **
  • Posts: 282
  • Liked: 262
  • Likes Given: 341
Regarding the update of the ffmpeg:

I just fixed a bug: "when using -2 it would get stuck in that mode. zip re-uploaded".

Please re-download the new one above.
« Last Edit: 05/13/2014 07:40 PM by arnezami »

Offline mvpel

  • Full Member
  • ****
  • Posts: 1116
  • New Hampshire
  • Liked: 1280
  • Likes Given: 1676
Could pframes be used to help fine tune iframes? If you start with a high quality iframe like 6, and then look for spots where the pframes indicate no change, could you simply carry the block from the good iframe forward (or backward) or does the compression in the iframe make that impractical?

Offline moralec

Hi guys,

I just noticed that there are no B-frames in the file. Only I-frames and P-frames.

Quote
./ffprobe try1.ts -show_frames > frames.txt

Not sure yet what the implications are. Hopefully it will get easier that way.

Regards,

arnezami

That is very fortunate happenstance.  P frames will only have compound errors from any damage from the frames before them in the sequence.  B frames would have compound errors from both before and after them in the sequence.

Not really that 'fortunate' - there are very few if any real time encoders of the sort that would be used here that use B-frames. They need to encode in near real time, which means basing frames on frames that haven't yet appeared doesn't really work.  B-frames tend to work best when doing offline encoding. They do give a better compression ratio, but are more computationally expensive.

I am trying to recreate the full movie using the I-frames, a few "nice-looking" P-frames and fill the rest with empty P-frames. How can I convert a png into a P-frame? Or should I create the empty P-frames differently?

For the I-frames, I use for example:
ffmpeg -i frame_131.png -s 704x480 -aspect 22:15 -f image2 -r 44999/3003 frame131.mpg4-img

How does it look with just the iframes? Care to join us in the IRC channel?
