Author Topic: SpaceX CRS-1 Software/Computer Design Discussion Thread  (Read 36092 times)

Offline GalacticIntruder

  • Full Member
  • ****
  • Posts: 512
  • Pet Peeve:I hate the word Downcomer. Ban it.
  • Huntsville, AL
  • Liked: 247
  • Likes Given: 70
SpX says rad problems were expected. Much ado about nothing. Their non-hardened but tolerant systems approach is superior. I find it hard to believe they could do that in a Mars transit, but they have a plan. I just have to take their word for it.


http://www.aviationweek.com/Blogs.aspx?plckBlogId=Blog:04ce340e-4b63-4d23-9695-d49ab661f385&plckPostId=Blog%3a04ce340e-4b63-4d23-9695-d49ab661f385Post%3aa8b87703-93f9-4cdf-885f-9429605e14df
« Last Edit: 11/20/2012 07:25 pm by GalacticIntruder »
"And now the Sun will fade, All we are is all we made." Breaking Benjamin

Offline Go4TLI

  • Full Member
  • ****
  • Posts: 816
  • Liked: 96
  • Likes Given: 0
« Reply #1 on: 11/19/2012 09:37 pm »
Yes, you have an opinion on everything and seem to act like an expert on just about everything as well.  As I said, carry on.
To quote Jim: Pot, meet kettle.

:) 

But I have actually been part of a team that has developed, tested, engineered, built, and flown hardware in space and brought it back on multiple programs for about 15 years now. 

I don't claim to be an expert on everything and I don't comment where I don't have insight, which is why I am absent from certain parts of this forum or on certain subject matter, unlike some.

Don't mistake experience for anything else but what it is.  If you and others don't like it, don't read it or don't comment on it.  Hell, it's not like I don't have other things to do anyway if true experience is not welcome here.
« Last Edit: 11/19/2012 09:44 pm by Go4TLI »

Offline Nomadd

  • Senior Member
  • *****
  • Posts: 8840
  • Lower 48
  • Liked: 60430
  • Likes Given: 1305
« Reply #2 on: 11/20/2012 01:08 pm »
 

Don't mistake experience for anything else but what it is.  If you and others don't like it, don't read it or don't comment on it.  Hell, it's not like I don't have other things to do anyway if true experience is not welcome here.

 Your posts are always welcome as far as I'm concerned.
 I agree with you. Using redundancy can be a good way to increase reliability, but it's often used as a way to slack off on standards. I've seen a lot of equipment and more than one life lost because of redundancy-induced complacency. (A phrase I just invented for this post)
 Saying that one computer failing isn't a big deal because there are two more is a great way to ensure a LOM. It's no more acceptable than ignoring the loss of an engine because you still had 8 working ones.
 Knowing SpaceX they'll get into the radiation hardened electronics business now. Probably mine their own silicon.
Those who danced were thought to be quite insane by those who couldn't hear the music.

Offline JBF

  • Full Member
  • ****
  • Posts: 1459
  • Liked: 472
  • Likes Given: 914
« Reply #3 on: 11/20/2012 01:34 pm »
 

Don't mistake experience for anything else but what it is.  If you and others don't like it, don't read it or don't comment on it.  Hell, it's not like I don't have other things to do anyway if true experience is not welcome here.

 Your posts are always welcome as far as I'm concerned.
 I agree with you. Using redundancy can be a good way to increase reliability, but it's often used as a way to slack off on standards. I've seen a lot of equipment and more than one life lost because of redundancy-induced complacency. (A phrase I just invented for this post)
 Saying that one computer failing isn't a big deal because there are two more is a great way to ensure a LOM. It's no more acceptable than ignoring the loss of an engine because you still had 8 working ones.
 Knowing SpaceX they'll get into the radiation hardened electronics business now. Probably mine their own silicon.

They didn't lose a computer; it rebooted just fine. NASA just didn't want them to resync it while it was at the ISS due to the complication of explaining it to all the partners. According to what's been reported elsewhere, SpaceX just plans to make the resyncing an automatic operation.
"In principle, rocket engines are simple, but that’s the last place rocket engines are ever simple." Jeff Bezos

Offline Go4TLI

  • Full Member
  • ****
  • Posts: 816
  • Liked: 96
  • Likes Given: 0
« Reply #4 on: 11/20/2012 01:52 pm »
They didn't lose a computer; it rebooted just fine. NASA just didn't want them to resync it while it was at the ISS due to the complication of explaining it to all the partners. According to what's been reported elsewhere, SpaceX just plans to make the resyncing an automatic operation.

This is from some of the reports on what went wrong:

• One of three flight computers failed while Dragon was docked at ISS due to a suspected radiation hit. The computer was restarted but was not resynchronized with the other two units. SpaceX says that NASA felt it was not necessary to continue the mission.
• One of three GPS units, the Propulsion and Trunk computers and Ethernet switch also experienced suspected radiation hits, but they were recovered during a power cycle.

This is for about a 2-week-long flight where the majority of the time it was not doing anything and just attached to ISS.

While these anomalies were managed, on a more active flight they become an unnecessary distraction and issues that must be worked.  It takes away from the reason for the flight in the first place and, the law of averages suggests, someday they will have to deal with a more serious flight issue.

These issues that crop up because their electronics are unreliable, or under the constant threat of being disrupted by rad hits, will make things much more difficult and just introduce additional risk into the mission for little reason as far as I can tell.

Offline mlindner

  • Software Engineer
  • Senior Member
  • *****
  • Posts: 2908
  • Space Capitalist
  • Silicon Valley, CA
  • Liked: 2204
  • Likes Given: 818
« Reply #5 on: 11/20/2012 01:53 pm »
Knowing SpaceX they'll get into the radiation hardened electronics business now. Probably mine their own silicon.

I've noticed a problem on this board: while everyone here is _very_ knowledgeable on rocket technology, there seems to be quite an absence of knowledge of how the integrated circuit industry works and the design processes involved. (Not picking on anyone specifically.)

There is a reason a giant like Apple had to _buy_ a company to make the _designs_ of the hardware. Even more so, the company they bought, itself, _buys_ parts of the design of the hardware from ARM. And then even then, they still don't actually burn anything to silicon, they contract that out to Samsung (oh the irony).

There is no way in hell SpaceX will actually start manufacturing its own integrated circuits. They said they don't even manufacture their own printed circuit boards in house when I asked them about it at the career fair here a few months ago.

The whole idea of making rad-hardened extreme-expense parts is part of the era of too-big-to-fail philosophy. Using enough computation and redundancy you can automatically correct and adjust for failure in hardware.
« Last Edit: 11/20/2012 01:56 pm by mlindner »
LEO is the ocean, not an island (let alone a continent). We create cruise liners to ride the oceans, not artificial islands in the middle of them. We need a physical place, which has physical resources, to make our future out there.

Offline mlindner

  • Software Engineer
  • Senior Member
  • *****
  • Posts: 2908
  • Space Capitalist
  • Silicon Valley, CA
  • Liked: 2204
  • Likes Given: 818
« Reply #6 on: 11/20/2012 02:00 pm »
They didn't lose a computer; it rebooted just fine. NASA just didn't want them to resync it while it was at the ISS due to the complication of explaining it to all the partners. According to what's been reported elsewhere, SpaceX just plans to make the resyncing an automatic operation.
These issues that crop up because their electronics are unreliable, or under the constant threat of being disrupted by rad hits, will make things much more difficult and just introduce additional risk into the mission for little reason as far as I can tell.

I suggest you read this url.
« Last Edit: 11/20/2012 02:01 pm by mlindner »
LEO is the ocean, not an island (let alone a continent). We create cruise liners to ride the oceans, not artificial islands in the middle of them. We need a physical place, which has physical resources, to make our future out there.

Offline Go4TLI

  • Full Member
  • ****
  • Posts: 816
  • Liked: 96
  • Likes Given: 0
« Reply #7 on: 11/20/2012 02:04 pm »
The whole idea of making rad-hardened extreme-expense parts is part of the era of too-big-to-fail philosophy. Using enough computation and redundancy you can automatically correct and adjust for failure in hardware.

For the rest of your post, I have to say I'm pretty sure the person you were responding to was speaking tongue-in-cheek.

Now for this, please see my previous post.  I would not call rad-hardened components "extreme expense".  It takes a certain amount of money and effort to qualify components to be rad-hardened but once they are, they are only moderately more expensive than non rad-hardened. 

This is why rad-hardened electronics lag the industry here on the ground.  Trade that against buying all the parts necessary to add whatever layer of redundancy to compensate for non-reliability and it is a wash or your solution is more expensive. 

Finally, your "too-big-to-fail" line is inaccurate at best.  You want to imply that this is a product of a bygone era, but on the flip side, being cavalier about it is equally disturbing.  Especially for a company that claims it will send people, etc. to Mars.

Perhaps there are other ways of doing things, never suggested otherwise, but discounting hard-won experience operating in the most dangerous environment possible is naive. 

Offline alk3997

  • Full Member
  • ***
  • Posts: 380
  • Liked: 31
  • Likes Given: 27
« Reply #8 on: 11/20/2012 02:31 pm »
I think I'll make this my last append on the subject since I only got interested because someone was misquoting GPC specs.  However, I'd like to make a few points, if you don't mind...

1) This is particle radiation that is impacting memory, registers, etc.  Particle radiation comes in different flavors, whether that be protons or heavy ions or positrons.  Each generates a different type of effect.  For instance, a heavy ion can impact multiple memory cells.  These particles are moving fast and are high energy for the extremely small size of the particle.  Some equipment can be pretty immune to protons and very soft to heavy ions.

2) There are a few rules-of-thumb.  First is that if you see a pattern, it isn't a radiation hit (I think I may have written that previously).  Second, an SEU can do anything that bad programming can do.  So when you are looking for effects of SEUs, just imagine if a compiler changes a random 1 to a 0 when creating the object code.  What will that do?  Maybe someone can work out the different probabilities.  Also writing to a location will remove the SEU.

3) There are no absolutes.  An SEU is a random event.  Where it occurs in memory is also a random location.  So, to say that an SEU will occur at this particular time is incorrect.  You can average them or say that over a flight you'll get this many *on average*, but being more specific than that is not possible.

4) A slight clarification to #1 and #3 is that any good discussion of SEUs in LEO must include the South Atlantic Anomaly.  More proton events occur over the SAA than anywhere else.  We had 6 ThinkPads go down nearly simultaneously while over the SAA on one flight.  Guess which type of Shuttle flight that was?  (hint: high altitude)

5) The LEO environment is much different than beyond the Van Allen belts.

6) We haven't even discussed latch-ups rather than SEUs.  Latch-ups are always a possibility (although small in LEO for memory).
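
A minimal Python sketch of points 2 and 3 above, under stated assumptions: an SEU is modelled as one random bit flip in a 32-bit word, and upsets are treated as a Poisson process with some average rate. The function names, the toy `memory` dictionary and the rate of 0.1 upsets per day are all hypothetical illustration values, not figures from any flight.

```python
import math
import random
from typing import Optional


def flip_random_bit(word: int, width: int = 32,
                    rng: Optional[random.Random] = None) -> int:
    """Return `word` with one randomly chosen bit inverted (a toy SEU)."""
    rng = rng or random.Random()
    return word ^ (1 << rng.randrange(width))


def prob_at_least_one_upset(mean_rate_per_day: float, days: float) -> float:
    """Poisson model: P(N >= 1) = 1 - exp(-rate * time)."""
    return 1.0 - math.exp(-mean_rate_per_day * days)


# Toy "memory": a single word holding a known value.
memory = {0x1000: 0x000000FF}

# The upset: where it lands is random, so only averages can be predicted.
memory[0x1000] = flip_random_bit(memory[0x1000], rng=random.Random(1))
corrupted = memory[0x1000]

# Writing a fresh value to the location removes the upset.
memory[0x1000] = 0x000000FF

# Hypothetical rate: 0.1 upsets/day on average over a 14-day flight.
p_hit = prob_at_least_one_upset(0.1, 14)
```

The model only gives an average over a flight; it says nothing about when or where a particular hit lands, and a constant rate ignores the SAA concentration described in point 4.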

Hope all of that helps your informed discussions.

Andy
« Last Edit: 11/20/2012 02:32 pm by alk3997 »

Online mmeijeri

  • Senior Member
  • *****
  • Posts: 7772
  • Martijn Meijering
  • NL
  • Liked: 397
  • Likes Given: 822
« Reply #9 on: 11/20/2012 02:34 pm »
Latch-ups are always a possibility (although small in LEO for memory).

Isn't Silicon on Insulator supposed to be immune to latchups?
Pro-tip: you don't have to be a jerk if someone doesn't agree with your theories

Offline Robotbeat

  • Senior Member
  • *****
  • Posts: 39270
  • Minnesota
  • Liked: 25240
  • Likes Given: 12115
« Reply #10 on: 11/20/2012 02:35 pm »
... I've seen a lot of equipment and more than one life lost because of redundancy induced complacency. (A phrase I just invented for this post)...
Same argument applies to ANY reliability increase, does it not?

With added redundancy, you have the additional benefit of more "near misses," which gives you more opportunities to improve the system. Without redundancy, you either succeed or you fail hard, much fewer near misses.

For instance, SpaceX is going to improve their error-handling capability for the next mission, making resyncing automatic. This will make them more robust to future problems.
« Last Edit: 11/20/2012 02:37 pm by Robotbeat »
Chris  Whoever loves correction loves knowledge, but he who hates reproof is stupid.

To the maximum extent practicable, the Federal Government shall plan missions to accommodate the space transportation services capabilities of United States commercial providers. US law http://goo.gl/YZYNt0

Offline Go4TLI

  • Full Member
  • ****
  • Posts: 816
  • Liked: 96
  • Likes Given: 0
« Reply #11 on: 11/20/2012 02:40 pm »

With added redundancy, you have the additional benefit of more "near misses," which gives you more opportunities to improve the system. Without redundancy, you either succeed or you fail hard, much fewer near misses.


As far as I can tell not a single person has suggested there should be no redundancy.  Redundancy is an obvious thing to have.

The discussion is whether or not unlimited redundancy should be the answer-all for unreliability.

I suggest there is a middle ground where reliability is high but redundancy is there because things happen. 

Offline Robotbeat

  • Senior Member
  • *****
  • Posts: 39270
  • Minnesota
  • Liked: 25240
  • Likes Given: 12115
« Reply #12 on: 11/20/2012 02:42 pm »

With added redundancy, you have the additional benefit of more "near misses," which gives you more opportunities to improve the system. Without redundancy, you either succeed or you fail hard, much fewer near misses.


As far as I can tell not a single person has suggested there should be no redundancy.  Redundancy is an obvious thing to have.

The discussion is whether or not unlimited redundancy should be the answer-all for unreliability.

I suggest there is a middle ground where reliability is high but redundancy is there because things happen. 
Nobody disagrees with you on the idea there should be a middle ground. Or at least, nobody should.
Chris  Whoever loves correction loves knowledge, but he who hates reproof is stupid.

To the maximum extent practicable, the Federal Government shall plan missions to accommodate the space transportation services capabilities of United States commercial providers. US law http://goo.gl/YZYNt0

Offline Go4TLI

  • Full Member
  • ****
  • Posts: 816
  • Liked: 96
  • Likes Given: 0
« Reply #13 on: 11/20/2012 02:49 pm »
Nobody disagrees with you on the idea there should be a middle ground. Or at least, nobody should.

"You can go a LONG ways just adding redundancy and still end up with a much /more/ reliable system even with FEWER reliable components. This is more true with computer systems than it is for other engineered systems."

That's a quote from you.  I read that as suggesting reliability is unimportant as long as there is sufficient redundancy to compensate. 

The problem is that how deep that redundancy has to be is likely a variable that is a function of the issue at hand and the operations being performed at the time.
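
The trade being argued here can be put into numbers with a minimal sketch, assuming each unit fails independently with the same probability and the voter itself never fails. Both assumptions are generous, since radiation hits can be correlated. The reliability figures 0.999, 0.99 and 0.4 are hypothetical, chosen only to show the shape of the curve.

```python
from math import comb


def k_of_n_reliability(n: int, k: int, r: float) -> float:
    """Probability that at least k of n independent units work,
    when each unit works with probability r."""
    return sum(comb(n, i) * r**i * (1 - r)**(n - i) for i in range(k, n + 1))


# One "hardened" unit versus three cheaper units with 2-of-3 voting.
single_good = 0.999
tmr_cheaper = k_of_n_reliability(3, 2, 0.99)   # 3r^2 - 2r^3

# Redundancy does not rescue genuinely poor parts: below r = 0.5,
# 2-of-3 voting is worse than a single unit.
tmr_poor = k_of_n_reliability(3, 2, 0.4)

print(f"single unit  : {single_good:.6f}")
print(f"2-of-3 @ 0.99: {tmr_cheaper:.6f}")
print(f"2-of-3 @ 0.40: {tmr_poor:.6f}")
```

Under these assumptions, three 0.99 units with voting edge out one 0.999 unit, while three 0.4 units do worse than one. How deep the redundancy must go depends on the per-unit figure, which in turn depends on the environment and mission phase.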

Offline JBF

  • Full Member
  • ****
  • Posts: 1459
  • Liked: 472
  • Likes Given: 914
« Reply #14 on: 11/20/2012 02:50 pm »
This is very off topic, but as someone who designs PCBs: no one except the very largest manufacturers etches their own PCBs, and unless you are manufacturing at least several hundred assemblies a month it's generally not worth it to do your own component placement either.

The primary reason rad-hardened components are so expensive is scale of manufacture. The smaller the run of silicon wafers the more expensive the set-up time is per wafer.


There is no way in hell SpaceX will actually start manufacturing its own integrated circuits. They said they don't even manufacture their own printed circuit boards in house when I asked them about it at the career fair here a few months ago.


"In principle, rocket engines are simple, but that’s the last place rocket engines are ever simple." Jeff Bezos

Offline mlindner

  • Software Engineer
  • Senior Member
  • *****
  • Posts: 2908
  • Space Capitalist
  • Silicon Valley, CA
  • Liked: 2204
  • Likes Given: 818
« Reply #15 on: 11/20/2012 02:53 pm »

With added redundancy, you have the additional benefit of more "near misses," which gives you more opportunities to improve the system. Without redundancy, you either succeed or you fail hard, much fewer near misses.


As far as I can tell not a single person has suggested there should be no redundancy.  Redundancy is an obvious thing to have.

The discussion is whether or not unlimited redundancy should be the answer-all for unreliability.

I suggest there is a middle ground where reliability is high but redundancy is there because things happen. 
Nobody disagrees with you on the idea there should be a middle ground. Or at least, nobody should.

I agree with this as well. The problem is there is a very large range of possible middle grounds. The inflection point could be much further out than conventionally thought. I trust SpaceX to do this calculation. My personal belief though is that there has to be a better solution than using 15-year-old technology.

Following Moore's law, there have been 10 doublings in transistor density since then, which implies a roughly 1024-fold increase in computational power. So, assuming you do distributed computing (even more radiation prone) and assuming that distributed computing scales linearly (it doesn't), you need roughly 1000 of these processors to match the speed of one modern processor.
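
The arithmetic can be checked with a short sketch. The 1.5-year doubling period is an assumption, implied by 10 doublings in 15 years. The Amdahl's law function and its 95% parallel fraction are swapped in purely to illustrate the "it doesn't scale linearly" caveat; they are not something measured on any real flight computer.

```python
def moore_factor(years: float, doubling_period_years: float = 1.5) -> float:
    """Density growth factor after `years`, doubling every period."""
    return 2.0 ** (years / doubling_period_years)


def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Amdahl's law: speedup is limited by the serial fraction of the work."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / processors)


factor = moore_factor(15)                 # 2**10 doublings
ideal = amdahl_speedup(1.0, 1000)         # perfectly parallel work
realistic = amdahl_speedup(0.95, 1000)    # hypothetical 95% parallel work

print(f"Moore factor over 15 years: {factor:.0f}x")
print(f"1000 old CPUs, ideal scaling: {ideal:.0f}x")
print(f"1000 old CPUs, 95% parallel : {realistic:.1f}x")
```

With even a 5% serial portion, a thousand old processors come nowhere near one modern one in this model, which is why the linear-scaling assumption matters.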
« Last Edit: 11/20/2012 03:00 pm by mlindner »
LEO is the ocean, not an island (let alone a continent). We create cruise liners to ride the oceans, not artificial islands in the middle of them. We need a physical place, which has physical resources, to make our future out there.

Offline Robotbeat

  • Senior Member
  • *****
  • Posts: 39270
  • Minnesota
  • Liked: 25240
  • Likes Given: 12115
« Reply #16 on: 11/20/2012 03:02 pm »
Nobody disagrees with you on the idea there should be a middle ground. Or at least, nobody should.

"You can go a LONG ways just adding redundancy and still end up with a much /more/ reliable system even with FEWER reliable components. This is more true with computer systems than it is for other engineered systems."

That's a quote from you.  I read that as suggesting reliability is unimportant as long as there is sufficient redundancy to compensate. 
That is something that certainly isn't true for many engineered systems but for computer systems, it really can be true. It takes more software development prowess, which can often turn out to be more expensive than just throwing hardware (i.e. better rad-hard capabilities) at it, but it essentially is true.

Quote
The problem is that how deep that redundancy has to be is likely a variable that is a function of the issue at hand and the operations being performed at the time.

Your statement is more true for mechanical systems than it is for modern computer systems, owing to the fact that there have been several, several orders of magnitude improvements in capabilities beyond what's strictly necessary. Perhaps a factor of a million greater than strictly necessary.

Another point is that while throwing redundancy at the problem can indeed solve almost any reliability issue, the cost of such a decision may be an order of magnitude increase in complexity and software development costs.
« Last Edit: 11/20/2012 03:40 pm by Robotbeat »
Chris  Whoever loves correction loves knowledge, but he who hates reproof is stupid.

To the maximum extent practicable, the Federal Government shall plan missions to accommodate the space transportation services capabilities of United States commercial providers. US law http://goo.gl/YZYNt0

Offline mlindner

  • Software Engineer
  • Senior Member
  • *****
  • Posts: 2908
  • Space Capitalist
  • Silicon Valley, CA
  • Liked: 2204
  • Likes Given: 818
« Reply #17 on: 11/20/2012 03:11 pm »
Another point is that while throwing redundancy at the problem can indeed solve almost any reliability issue, the cost of such a decision may be an order of magnitude increase in complexity and software development costs.

A properly designed system should be plug-n-play, able to take as many computational "modules" in the loop as you want. As long as they didn't hardcode the 3 modules of 2 computing units architecture, it may already be capable of dropping additional modules into the loop.
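
A minimal sketch of a voter that is not hardcoded to three modules, assuming each module reduces its result to a single comparable value per cycle. `majority_vote` is a hypothetical name, not SpaceX's actual design, and the sketch ignores timing, synchronization and hardware I/O entirely.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


def majority_vote(outputs: Sequence[Hashable]) -> Tuple[Hashable, List[int]]:
    """Return the strict-majority value and the indices of dissenting modules.

    Works for any number of modules; raises ValueError if no value is
    reported by more than half of them.
    """
    if not outputs:
        raise ValueError("no module outputs to vote on")
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise ValueError("no strict majority among module outputs")
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters


# Three modules, one upset: module 2 is outvoted and flagged for resync.
value3, bad3 = majority_vote([42, 42, 7])

# Five modules drop into the same loop without code changes.
value5, bad5 = majority_vote([1, 1, 1, 2, 3])
```

The flagged indices are the natural hook for an automatic resync step. The voting itself is the easy part of such a design.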
« Last Edit: 11/20/2012 03:12 pm by mlindner »
LEO is the ocean, not an island (let alone a continent). We create cruise liners to ride the oceans, not artificial islands in the middle of them. We need a physical place, which has physical resources, to make our future out there.

Online dunderwood

  • Full Member
  • *
  • Posts: 158
  • Liked: 7
  • Likes Given: 6
« Reply #18 on: 11/20/2012 03:33 pm »

A properly designed system should be plug-n-play, able to take as many computational "modules" in the loop as you want. As long as they didn't hardcode the 3 modules of 2 computing units architecture, it may already be capable of dropping additional modules into the loop.

This sounds like something a software engineer would say :)

Once you start adding hardware inputs/outputs into the equation, it becomes much harder to abstract your 'modules' in such a way. 

Offline Go4TLI

  • Full Member
  • ****
  • Posts: 816
  • Liked: 96
  • Likes Given: 0
« Reply #19 on: 11/20/2012 05:10 pm »
Another point is that while throwing redundancy at the problem can indeed solve almost any reliability issue, the cost of such a decision may be an order of magnitude increase in complexity and software development costs.

And there you go.  So redundancy alone is not the fix-all you were suggesting.

Also, redundancy does NOT solve reliability issues.  It is a mitigation of the fact that one has poor reliability; that is a major difference.

And operationally speaking, if one takes a step back and looks at the big picture of why these systems and vehicles exist, there are phases of potential mission scenarios where it is not optimal to have to assume one has poor reliability and then rely solely on redundancies that may require crew/ground input at less than ideal times and/or circumstances.
« Last Edit: 11/20/2012 05:28 pm by Go4TLI »
