Author Topic: LIVE: SpaceX Dragon CRS-1 (SpX-1) (EOM) Unberthing, Entry, Splashdown  (Read 124578 times)

Offline Space Pete

I'm pretty sure Shuttle had occasional issues with one of its triple redundant flight computers... Was the same "sky is falling" mentality expressed about that?

True, but Shuttle had five GPCs, all of which were rad hardened - making total loss much less likely.
NASASpaceflight ISS Writer

Offline mmeijeri

  • Senior Member
  • *****
  • Posts: 7772
  • Martijn Meijering
  • NL
  • Liked: 397
  • Likes Given: 822
Just a datapoint for everyone. I used to work at a company that used the RAD750 (for GLAST, now the Fermi Gamma-ray Space Telescope). It is a rad-hard version of the PowerPC 750. Apple called it the PowerPC G3; it was used in the multi-color iMacs. I think it is still at or near the top of the heap for rad-hard CPUs. It runs at 200 MHz. It is used on Curiosity, Juno and many others. Cost per board was ~$200,000.

And Mongoose-V apparently costs only $20k-$40k, depending on the quantities ordered. Much less powerful than RAD750, but another interesting datapoint.

Given that these costs are no more than a rounding error on the price of a Dragon launch I don't understand why you wouldn't simply buy a bunch of rad hard processors. And that's not even counting the fact that Dragon is reusable.
Pro-tip: you don't have to be a jerk if someone doesn't agree with your theories

Offline Robotbeat

  • Senior Member
  • *****
  • Posts: 39359
  • Minnesota
  • Liked: 25388
  • Likes Given: 12164
I'm pretty sure Shuttle had occasional issues with one of its triple redundant flight computers... Was the same "sky is falling" mentality expressed about that?

True, but Shuttle had five GPCs, all of which were rad hardened - making total loss much less likely.
Yes, but they had a very low MTBF (where low is bad) rating compared to modern electronics: http://www.atarimagazines.com/compute/issue132/92_Space_shuttle_techno.php
« Last Edit: 11/16/2012 04:55 pm by Robotbeat »
Chris  Whoever loves correction loves knowledge, but he who hates reproof is stupid.

To the maximum extent practicable, the Federal Government shall plan missions to accommodate the space transportation services capabilities of United States commercial providers. US law http://goo.gl/YZYNt0

Offline Go4TLI

  • Full Member
  • ****
  • Posts: 816
  • Liked: 96
  • Likes Given: 0
Attacking strawmen is not "bringing balance."

Identifying true mistakes and errors /does/ bring balance.

+1.

"Because they have three computers, I don't see what the big deal is about losing one on an early mission. When you have a system in place to deal with failure gracefully, people shouldn't get their underwear in a bundle when a failure does occur."

That's a direct quote from you.  So this and everything else is "no big deal", etc.  Take your own advice. 

Offline alk3997

  • Full Member
  • ***
  • Posts: 380
  • Liked: 31
  • Likes Given: 27
I'm pretty sure Shuttle had occasional issues with one of its triple redundant flight computers... Was the same "sky is falling" mentality expressed about that?

True, but Shuttle had five GPCs, all of which were rad hardened - making total loss much less likely.
Yes, but they had a very low MTBF (where low is bad) rating compared to modern electronics: http://www.atarimagazines.com/compute/issue132/92_Space_shuttle_techno.php

That article is old and incorrect.  The AP-101S (not B) was spec'd at 10,000 hrs MTBF and had reached 80,000 hrs MTBF by the end of the program.  80,000 hrs is over 9 years without a failure, much better than generic commercial processors/computers.  It is what is needed when someone's life depends upon the avionics working. 


Offline Robotbeat

  • Senior Member
  • *****
  • Posts: 39359
  • Minnesota
  • Liked: 25388
  • Likes Given: 12164
I'm pretty sure Shuttle had occasional issues with one of its triple redundant flight computers... Was the same "sky is falling" mentality expressed about that?

True, but Shuttle had five GPCs, all of which were rad hardened - making total loss much less likely.
Yes, but they had a very low MTBF (where low is bad) rating compared to modern electronics: http://www.atarimagazines.com/compute/issue132/92_Space_shuttle_techno.php

That article is old and incorrect.  The AP-101S (not B) was spec'd at 10,000 hrs MTBF and had reached 80,000 hrs MTBF by the end of the program.  80,000 hrs is over 9 years without a failure, much better than generic commercial processors/computers.  It is what is needed when someone's life depends upon the avionics working. 


It did indeed experience failure: http://www.slashgear.com/space-shuttle-crew-gets-midnight-wakeup-over-computer-failure-15165274/

Not a huge, scary problem. There were five redundant ones. But it wasn't error-free, either.
Chris  Whoever loves correction loves knowledge, but he who hates reproof is stupid.

To the maximum extent practicable, the Federal Government shall plan missions to accommodate the space transportation services capabilities of United States commercial providers. US law http://goo.gl/YZYNt0

Offline Go4TLI

  • Full Member
  • ****
  • Posts: 816
  • Liked: 96
  • Likes Given: 0
I'm pretty sure Shuttle had occasional issues with one of its triple redundant flight computers... Was the same "sky is falling" mentality expressed about that?

True, but Shuttle had five GPCs, all of which were rad hardened - making total loss much less likely.
Yes, but they had a very low MTBF (where low is bad) rating compared to modern electronics: http://www.atarimagazines.com/compute/issue132/92_Space_shuttle_techno.php

That article is old and incorrect.  The AP-101S (not B) was spec'd at 10,000 hrs MTBF and had reached 80,000 hrs MTBF by the end of the program.  80,000 hrs is over 9 years without a failure, much better than generic commercial processors/computers.  It is what is needed when someone's life depends upon the avionics working. 


It did indeed experience failure: http://www.slashgear.com/space-shuttle-crew-gets-midnight-wakeup-over-computer-failure-15165274/

Not a huge, scary problem. There were five redundant ones. But it wasn't error-free, either.

Come on.  As I said earlier problems do come up.  It is how they are dealt with that is important.  Yes, there was this issue.  However what is said above about the general failure rate and their overall reliability is quite correct.  With this experience on Dragon, are you attempting to claim that reliability does not matter as long as there is sufficient redundancy?

You seem to be arguing both sides of the fence here.  If you are going to point to this issue on the orbiter: if approximately the same situation as on Dragon had occurred on the orbiter, would it still be "no big deal"?

Offline MP99

Let me take this viewpoint: NASA purchased launch services from SpaceX, and in the process of doing so issued a number of requirements for SpaceX to conform to. Was the use of RAD-hardened computer equipment amongst those requirements?

If NO, then NASA apparently was short-sighted OR NASA expected other means of compensating for the increased radiation-induced malfunctions in-orbit to be sufficient. (Such as flying with added redundancy).

If YES, then SpaceX didn't perform as required OR the radiation environment in-orbit exceeds the requirement levels. IF the former turns out to be correct, one could wonder why NASA signed off on a flight that did not meet requirements.

But, the above is all big IF's.

ISTM the requirement should just be about demonstrated reliability, not about how that reliability is achieved. This is especially true given the "hands off" approach of COTS.

cheers, Martin

Offline Robotbeat

  • Senior Member
  • *****
  • Posts: 39359
  • Minnesota
  • Liked: 25388
  • Likes Given: 12164
I'm pretty sure Shuttle had occasional issues with one of its triple redundant flight computers... Was the same "sky is falling" mentality expressed about that?

True, but Shuttle had five GPCs, all of which were rad hardened - making total loss much less likely.
Yes, but they had a very low MTBF (where low is bad) rating compared to modern electronics: http://www.atarimagazines.com/compute/issue132/92_Space_shuttle_techno.php

That article is old and incorrect.  The AP-101S (not B) was spec'd at 10,000 hrs MTBF and had reached 80,000 hrs MTBF by the end of the program.  80,000 hrs is over 9 years without a failure, much better than generic commercial processors/computers.  It is what is needed when someone's life depends upon the avionics working. 


It did indeed experience failure: http://www.slashgear.com/space-shuttle-crew-gets-midnight-wakeup-over-computer-failure-15165274/

Not a huge, scary problem. There were five redundant ones. But it wasn't error-free, either.

Come on.  As I said earlier problems do come up.  It is how they are dealt with that is important.  Yes, there was this issue.  However what is said above about the general failure rate and their overall reliability is quite correct.  With this experience on Dragon, are you attempting to claim that reliability does not matter as long as there is sufficient redundancy?

You seem to be arguing both sides of the fence here.  If you are going to point to this issue on the orbiter: if approximately the same situation as on Dragon had occurred on the orbiter, would it still be "no big deal"?
In both cases, it's no big deal. Sure, it's better not to have any, but we live in an imperfect world.

And yeah, sufficient redundancy can indeed compensate (in fact, usually MORE than compensates) for individual non-perfect reliability. Aggregate reliability is increased. That is why Shuttle went for 5 computers... they originally designed to a reliability rating of an MTBF of 5,000 hours per computer. But having 5 computers meant they could still do their missions safely.
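A quick sketch of why redundancy more than compensates, assuming independent failures and a simple exponential failure model (my numbers, using the 5,000-hour MTBF figure above; this toy model ignores common-cause failures):

```python
import math

def p_fail(mtbf_hours: float, mission_hours: float) -> float:
    """Probability a single unit fails during the mission,
    assuming a constant failure rate (exponential model)."""
    return 1.0 - math.exp(-mission_hours / mtbf_hours)

mission = 14 * 24                 # a two-week mission, in hours
single = p_fail(5000, mission)    # one GPC at a 5,000 h MTBF
all_five = single ** 5            # all five fail, independently

print(f"P(one GPC fails during the mission): {single:.3f}")
print(f"P(all five GPCs fail):               {all_five:.2e}")
```

Even with a ~6.5% chance that any one unit fails during a two-week flight, the chance of losing all five independently is on the order of one in a million.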

I do systems engineering for redundant computer systems as a part-time job.
« Last Edit: 11/16/2012 09:21 pm by Robotbeat »
Chris  Whoever loves correction loves knowledge, but he who hates reproof is stupid.

To the maximum extent practicable, the Federal Government shall plan missions to accommodate the space transportation services capabilities of United States commercial providers. US law http://goo.gl/YZYNt0

Offline alk3997

  • Full Member
  • ***
  • Posts: 380
  • Liked: 31
  • Likes Given: 27
In both cases, it's no big deal. Sure, it's better not to have any, but we live in an imperfect world.

And yeah, sufficient redundancy can indeed compensate (in fact, usually MORE than compensates) for individual non-perfect reliability. Aggregate reliability is increased. That is why Shuttle went for 5 computers... they originally designed to a reliability rating of an MTBF of 5,000 hours per computer. But having 5 computers meant they could still do their missions safely.

I do systems engineering for redundant computer systems as a part-time job.

Historical numbers for GPCs - The AP-101B was spec'd at 1,000 hours MTBF and demonstrated 5,000 hours MTBF before being replaced by the AP-101S.  The AP-101S MTBF specs were previously discussed.  If you work out the odds of a failure of the AP-101B over a 16 day flight, 8 times a year, you'll see one of the reasons the AP-101S was needed.
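Working out those odds for the AP-101B spec value (my own back-of-envelope, using a simple exponential failure model):

```python
import math

mtbf_b = 1000.0        # AP-101B spec MTBF, in hours
flight = 16 * 24       # 16-day flight, in hours

# Chance that a given GPC fails at some point during one flight:
p_one = 1.0 - math.exp(-flight / mtbf_b)

# Expected GPC failures per year with 5 units flying 8 flights/year:
expected_per_year = 5 * 8 * (flight / mtbf_b)

print(f"P(a given GPC fails on one flight): {p_one:.2f}")
print(f"Expected GPC failures per year:     {expected_per_year:.1f}")
```

At the 1,000-hour spec, each GPC has roughly a one-in-three chance of failing at some point on a 16-day flight, and a busy flight year would see double-digit expected failures across the fleet, which illustrates why the AP-101S upgrade was needed.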

From T-20 minutes until OPS 2 (on-orbit) start, we used a 4-GPC redundant set with 1 BFS.  This was also true from just after the start of OPS 3 (entry) to landing and wheels stop.  So, that would have been about 5 hours at most per flight.  Most of on-orbit was flown with a distributed set - 1 or 2 GPCs flying the orbiter (GN&C) and another single GPC running the systems onboard.  Different software for each function.  It was also possible to have 3 GPCs in GN&C but always only 1 in SM.  Why not more?  Because extra GPCs eat power, and it wasn't necessary because the GPCs had such high reliability.

On the last Shuttle flight, the GPC problem was actually an SEU that hit a register (not memory, but a register).  A few of the really smart Shuttle veterans got to work one last issue and showed a high probability that it was a register SEU.  So there was a positive to it, and it wasn't a destructive failure.

The SEU was a really unlucky hit (to hit the small space of a register) and was only the second time in Shuttle history that a register was "hit".  The crew reconfigured one of the "off" GPCs for the SM function and went back to bed.  The next day they brought the computer that suffered the SEU back up; it worked perfectly and was returned to full use for the rest of the flight.  The nine years on average between expected failures still stands even with that incident. 

« Last Edit: 11/17/2012 02:45 am by alk3997 »

Offline mmeijeri

  • Senior Member
  • *****
  • Posts: 7772
  • Martijn Meijering
  • NL
  • Liked: 397
  • Likes Given: 822
And yeah, sufficient redundancy can indeed compensate (in fact, usually MORE than compensates) for individual non-perfect reliability. Aggregate reliability is increased.

In fact, redundancy with voting (at the gate level) is one of the techniques used in rad-hard processors. From what I've read, other techniques include different process technologies such as SOS (silicon-on-sapphire), increasing insulation between gates through other means, error-correcting codes for memory and registers, and finally shielding.
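The gate-level voting idea can be shown in miniature with a triple-modular-redundancy (TMR) majority vote (a toy sketch of the general technique, not any particular part's implementation):

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority vote across three redundant copies: each
    output bit is whatever at least two of the three inputs agree
    on, so a single upset in any one copy is masked."""
    return (a & b) | (a & c) | (b & c)

stored = 0b1011_0110
upset = stored ^ 0b0000_1000      # one copy takes a single-bit flip
assert tmr_vote(stored, stored, upset) == stored
```

In a rad-hard part the same vote happens in replicated logic on every clock, so the upset never propagates.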
Pro-tip: you don't have to be a jerk if someone doesn't agree with your theories

Offline anik

  • Global Moderator
  • Senior Member
  • *****
  • Posts: 7776
  • Liked: 955
  • Likes Given: 368

Offline mlindner

  • Software Engineer
  • Senior Member
  • *****
  • Posts: 2928
  • Space Capitalist
  • Silicon Valley, CA
  • Liked: 2240
  • Likes Given: 827
Posted on the SpX-2 General Discussion:

http://www.aviationweek.com/Blogs.aspx?plckBlogId=Blog:04ce340e-4b63-4d23-9695-d49ab661f385&plckPostId=Blog%3A04ce340e-4b63-4d23-9695-d49ab661f385Post%3Aa8b87703-93f9-4cdf-885f-9429605e14df

Dragon uses the same design principles as the Shuttle and Hubble.

So they have no intention of changing anything about the rad-hardness of the computers, and it also looks like they intend to keep using these indefinitely. In other words, they're going to fly to Mars without any rad-hardened parts. Awesome!
LEO is the ocean, not an island (let alone a continent). We create cruise liners to ride the oceans, not artificial islands in the middle of them. We need a physical place, which has physical resources, to make our future out there.

Offline Jim

  • Night Gator
  • Senior Member
  • *****
  • Posts: 37818
  • Cape Canaveral Spaceport
  • Liked: 22048
  • Likes Given: 430
In other words, they're going to fly to Mars without any rad-hardened parts. Awesome!

No, that was not said or implied.

Offline mlindner

  • Software Engineer
  • Senior Member
  • *****
  • Posts: 2928
  • Space Capitalist
  • Silicon Valley, CA
  • Liked: 2240
  • Likes Given: 827
In other words, they're going to fly to Mars without any rad-hardened parts. Awesome!

No, that was not said or implied.

Replied in SpX-2 Discussion.
LEO is the ocean, not an island (let alone a continent). We create cruise liners to ride the oceans, not artificial islands in the middle of them. We need a physical place, which has physical resources, to make our future out there.

Offline woods170

  • IRAS fan
  • Senior Member
  • *****
  • Posts: 12192
  • IRAS fan
  • The Netherlands
  • Liked: 18491
  • Likes Given: 12560
Come on.  As I said earlier problems do come up.  It is how they are dealt with that is important.  Yes, there was this issue.  However what is said above about the general failure rate and their overall reliability is quite correct.  With this experience on Dragon, are you attempting to claim that reliability does not matter as long as there is sufficient redundancy?

You seem to be arguing both sides of the fence here.  If you are going to point to this issue on the orbiter: if approximately the same situation as on Dragon had occurred on the orbiter, would it still be "no big deal"?
Yes, no big deal. They had similar experience on Shuttle, during the first Hubble repair mission, and it was no big deal, exactly because of the level of redundancy and the way the redundancy logic is applied.
Read this: http://www.aviationweek.com/Blogs.aspx?plckBlogId=Blog:04ce340e-4b63-4d23-9695-d49ab661f385&plckPostId=Blog%3A04ce340e-4b63-4d23-9695-d49ab661f385Post%3Aa8b87703-93f9-4cdf-885f-9429605e14df

Offline alk3997

  • Full Member
  • ***
  • Posts: 380
  • Liked: 31
  • Likes Given: 27
Come on.  As I said earlier problems do come up.  It is how they are dealt with that is important.  Yes, there was this issue.  However what is said above about the general failure rate and their overall reliability is quite correct.  With this experience on Dragon, are you attempting to claim that reliability does not matter as long as there is sufficient redundancy?

You seem to be arguing both sides of the fence here.  If you are going to point to this issue on the orbiter: if approximately the same situation as on Dragon had occurred on the orbiter, would it still be "no big deal"?
Yes, no big deal. They had similar experience on Shuttle, during the first Hubble repair mission, and it was no big deal, exactly because of the level of redundancy and the way the redundancy logic is applied.
Read this: http://www.aviationweek.com/Blogs.aspx?plckBlogId=Blog:04ce340e-4b63-4d23-9695-d49ab661f385&plckPostId=Blog%3A04ce340e-4b63-4d23-9695-d49ab661f385Post%3Aa8b87703-93f9-4cdf-885f-9429605e14df

Good reference and John's explanations are good.  Two notes:

1) ISS uses its commercial ThinkPad suite as user input/output (UI) devices.  Rad-tolerant MDMs (the main computers) use i386s with protected memory.  The user-input laptops talk to the MDMs, which make sure the commands are acceptable before executing them.  So, there are rad-tolerant/rad-hardened processors on ISS doing the actual work.

2) The part about Hubble was no different from any other Shuttle flight.  There was an SEU counter available on each GPC; that is what John was watching.  It counted the number of times the rad-tolerant memory had detected an SEU and corrected for it.  All SEUs that affected a single bit in a byte could be corrected; no double error in a byte was ever detected (a double error could be detected but not corrected).  This is, of course, for the AP-101S.  But to say we watched the bit errors "rolling" doesn't really convey that this was expected behavior, corrected by the hardware.  So that part of the article wasn't really a good explanation, and it had nothing to do with redundancy or redundancy logic but rather with high-quality, low-failure design.
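The correct-any-single-bit-error behavior described above can be sketched with a textbook Hamming(7,4) code (an illustration of the general technique only, not the AP-101S's actual ECC, which protected wider words and could also detect double errors):

```python
def hamming74_encode(nibble: int) -> list:
    """Encode 4 data bits into a 7-bit codeword with parity bits
    at positions 1, 2 and 4 (1-based), per the classic Hamming code."""
    d = [(nibble >> i) & 1 for i in range(4)]
    p1 = d[0] ^ d[1] ^ d[3]   # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]   # covers positions 2, 3, 6, 7
    p4 = d[1] ^ d[2] ^ d[3]   # covers positions 4, 5, 6, 7
    return [p1, p2, d[0], p4, d[1], d[2], d[3]]

def hamming74_correct(code: list) -> list:
    """Correct a single flipped bit: the XOR of the (1-based)
    positions holding a 1 is zero for a valid codeword, and equals
    the flipped bit's position after a single-bit upset."""
    syndrome = 0
    for pos, bit in enumerate(code, start=1):
        if bit:
            syndrome ^= pos
    if syndrome:
        code = code.copy()
        code[syndrome - 1] ^= 1   # flip the bad bit back
    return code

codeword = hamming74_encode(0b1010)
hit = codeword.copy()
hit[4] ^= 1                       # simulate an SEU at position 5
assert hamming74_correct(hit) == codeword
```

The hardware applies this kind of check-and-correct on every memory access, which is why single-bit SEUs were routine bookkeeping rather than failures.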

We started collecting SEU data on the early ThinkPads during STS-61.  My master's thesis used this data, as well as other Shuttle and Mir flight data, to assess the impacts of spaceflight on small portable computers.  The number of SEUs increases with LEO altitude and (although not as much) with increasing inclination.

« Last Edit: 11/19/2012 12:57 pm by alk3997 »

Offline edkyle99

  • Expert
  • Senior Member
  • *****
  • Posts: 15502
    • Space Launch Report
  • Liked: 8788
  • Likes Given: 1386
I'm pretty sure Shuttle had occasional issues with one of its triple redundant flight computers... Was the same "sky is falling" mentality expressed about that?

True, but Shuttle had five GPCs, all of which were rad hardened - making total loss much less likely.
The following paper describes the improved GPCs and the single-event upsets on their non-radiation-hardened SRAM chips.  Note that software was included to constantly check for upset events, to do error checking and correction, etc.  I wonder if SpaceX does all of that.

http://klabs.org/DEI/Processor/shuttle/oneill_94.pdf

 - Ed Kyle

Offline MikeAtkinson

  • Full Member
  • ****
  • Posts: 1980
  • Bracknell, England
  • Liked: 784
  • Likes Given: 120
I'm pretty sure Shuttle had occasional issues with one of its triple redundant flight computers... Was the same "sky is falling" mentality expressed about that?

True, but Shuttle had five GPCs, all of which were rad hardened - making total loss much less likely.
The following paper describes the improved GPCs and the single-event upsets on their non-radiation-hardened SRAM chips.  Note that software was included to constantly check for upset events, to do error checking and correction, etc.  I wonder if SpaceX does all of that.

http://klabs.org/DEI/Processor/shuttle/oneill_94.pdf

 - Ed Kyle

According to John Muratore, SpaceX director of vehicle certification (http://www.aviationweek.com/Blogs.aspx?plckBlogId=Blog:04ce340e-4b63-4d23-9695-d49ab661f385&plckPostId=Blog%3A04ce340e-4b63-4d23-9695-d49ab661f385Post%3Aa8b87703-93f9-4cdf-885f-9429605e14df)

Quote
Think of a computer as lots of white marbles that are arranged in a specific pattern on a table, and a black marble comes in and knocks one of the white marbles out of place. Now, the memories of our computers are constantly checking for that happening. So if we take a hit in our most dense part of our computer – the memory – the computer detects it and repairs it and there's no harm done. But our other circuits in the computer, places like where we're bringing information in and out of the processor, if we take a hit there it can cause basically a bit to flip from a zero to a one. And that instruction can be wrong, and that is where the two processors in a single computer element voting on each other can detect that, and it can force a reboot. And that's what happened, we rebooted the computer.

They use 3 computers, each of which consists of a dual processor-and-memory pair in a single processing unit. There are 18 of these processing units on Dragon.
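Muratore's "two processors voting on each other" is a self-checking (fail-silent) pair; a minimal sketch of the compare-and-reboot behavior (hypothetical code, not SpaceX's actual implementation):

```python
def lockstep_step(compute_a, compute_b, inputs):
    """Run the same step on both halves of a lockstep pair and
    compare. If the outputs disagree (say, an SEU flipped a bit in
    one processor's datapath), signal a reboot rather than emit a
    possibly corrupted result."""
    out_a = compute_a(inputs)
    out_b = compute_b(inputs)
    if out_a != out_b:
        return None, "REBOOT"   # fail silent: no bad output escapes
    return out_a, "OK"

healthy = lambda x: x * 2
faulted = lambda x: (x * 2) ^ 0b100   # simulated bit flip in the result

assert lockstep_step(healthy, healthy, 21) == (42, "OK")
assert lockstep_step(healthy, faulted, 21) == (None, "REBOOT")
```

With three such pairs voting at the vehicle level, a rebooting unit drops out cleanly while the other two keep flying, which matches what was reported on this flight.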

Offline alk3997

  • Full Member
  • ***
  • Posts: 380
  • Liked: 31
  • Likes Given: 27
I'm pretty sure Shuttle had occasional issues with one of its triple redundant flight computers... Was the same "sky is falling" mentality expressed about that?

True, but Shuttle had five GPCs, all of which were rad hardened - making total loss much less likely.
The following paper describes the improved GPCs and the single-event upsets on their non-radiation-hardened SRAM chips.  Note that software was included to constantly check for upset events, to do error checking and correction, etc.  I wonder if SpaceX does all of that.

http://klabs.org/DEI/Processor/shuttle/oneill_94.pdf

 - Ed Kyle

I wouldn't call that software.  It was done outside of the software's "knowledge", by a dedicated system that constantly scrubbed memory.  Maybe "system-level routines" would be a better term, although that's not entirely accurate either.  It ran much faster than software of that era could have.
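A toy model of that scrub cycle, using triplicated storage and majority voting for simplicity (the real GPC hardware scrubbed ECC-protected words, not triplicated copies):

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise majority vote across three copies of a word."""
    return (a & b) | (a & c) | (b & c)

def scrub_pass(mem_a: list, mem_b: list, mem_c: list) -> int:
    """One pass over triplicated memory: vote each word and write
    the winner back to all three copies, repairing a single upset
    before a second hit can accumulate in the same word. Returns
    the number of words corrected."""
    fixed = 0
    for i in range(len(mem_a)):
        good = majority(mem_a[i], mem_b[i], mem_c[i])
        if (mem_a[i], mem_b[i], mem_c[i]) != (good, good, good):
            mem_a[i] = mem_b[i] = mem_c[i] = good
            fixed += 1
    return fixed
```

The point of running this continuously, in hardware below the software's view, is that single correctable errors never get the chance to pair up into uncorrectable double errors.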
« Last Edit: 11/19/2012 02:11 pm by alk3997 »
