Quote from: Go4TLI on 11/19/2012 04:22 pm Yes, you have an opinion on everything and seem to act like an expert on just about everything as well. As I said, carry on. To quote Jim: Pot, meet kettle.
Don't mistake experience for anything else but what it is. If you and others don't like it, don't read it or don't comment on it. Hell, it's not like I don't have other things to do anyway if true experience is not welcome here.
Quote from: Go4TLI on 11/19/2012 09:37 pm Don't mistake experience for anything else but what it is. If you and others don't like it, don't read it or don't comment on it. Hell, it's not like I don't have other things to do anyway if true experience is not welcome here. Your posts are always welcome as far as I'm concerned. I agree with you. Using redundancy can be a good way to increase reliability, but it's often used as a way to slack off on standards. I've seen a lot of equipment and more than one life lost because of redundancy-induced complacency. (A phrase I just invented for this post.) Saying that one computer failing isn't a big deal because there are two more is a great way to ensure a LOM. It's no more acceptable than ignoring the loss of an engine because you still had 8 working ones. Knowing SpaceX, they'll get into the radiation-hardened electronics business now. Probably mine their own silicon.
They didn't lose a computer; it rebooted just fine. NASA just didn't want them to resync it while it was at the ISS, due to the complication of explaining it to all the partners. According to what's been reported elsewhere, SpaceX just plans to make the resyncing an automatic operation.
Quote from: JBF on 11/20/2012 01:34 pm They didn't lose a computer; it rebooted just fine. NASA just didn't want them to resync it while it was at the ISS, due to the complication of explaining it to all the partners. According to what's been reported elsewhere, SpaceX just plans to make the resyncing an automatic operation. These issues that crop up because the electronics are unreliable, or under constant threat of disruption by rad hits, make operations more difficult and add risk to the mission for little reason, as far as I can tell.
The whole idea of making extreme-expense rad-hardened parts belongs to the too-big-to-fail era of design philosophy. With enough computation and redundancy you can automatically correct and adjust for failures in hardware.
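For anyone who hasn't seen it in code, the "correct automatically" idea boils down to majority voting across redundant lanes. A toy sketch in Python (nothing SpaceX-specific; the function name and lane values are made up for illustration):

```python
from collections import Counter

def tmr_vote(a, b, c):
    """Triple-modular-redundancy voter: return the value that at least
    two of the three redundant lanes agree on, or None on a double
    fault that voting cannot mask."""
    value, count = Counter([a, b, c]).most_common(1)[0]
    return value if count >= 2 else None
```

A single upset in one lane gets outvoted (`tmr_vote(42, 42, 99)` returns `42`); the faulty lane can then be rebooted and resynced in the background while the other two keep flying.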
Latch-ups are always a possibility (although small in LEO for memory).
With added redundancy, you have the additional benefit of more "near misses," which gives you more opportunities to improve the system. Without redundancy, you either succeed or you fail hard, with far fewer near misses.
Quote from: Robotbeat on 11/20/2012 02:35 pm With added redundancy, you have the additional benefit of more "near misses," which gives you more opportunities to improve the system. Without redundancy, you either succeed or you fail hard, with far fewer near misses. As far as I can tell, not a single person has suggested there should be no redundancy. Redundancy is an obvious thing to have. The discussion is whether or not unlimited redundancy should be the answer to unreliability. I suggest there is a middle ground where reliability is high but redundancy is there because things happen.
Nobody disagrees with you on the idea there should be a middle ground. Or at least, nobody should.
There is no way in hell SpaceX will actually start manufacturing its own integrated circuits. They said they don't even manufacture their own printed circuit boards in house when I asked them about it at the career fair here a few months ago.
Quote from: Robotbeat on 11/20/2012 02:42 pmNobody disagrees with you on the idea there should be a middle ground. Or at least, nobody should."You can go a LONG ways just adding redundancy and still end up with a much /more/ reliable system even with FEWER reliable components. This is more true with computer systems than it is for other engineered systems."That's a quote from you. I read that as suggesting reliability is unimportant as long as there is sufficient redundancy to compensate.
The problem is that how deep that redundancy has to be is likely a function of the issue at hand and the operations being performed at the time.
Another point is that while throwing redundancy at the problem can indeed solve almost any reliability issue, the cost of such a decision may be an order of magnitude increase in complexity and software development costs.
A properly designed system should be plug-and-play, able to drop any number of computational "modules" into the loop as you want. As long as they didn't hardcode the 3-modules-of-2-computing-units architecture, it may already be capable of dropping additional modules into the loop.
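To make the "any number of modules" point concrete, here's what a lane-count-agnostic voter could look like. Purely a sketch under the assumption that nothing hardcodes three lanes; the names are mine:

```python
from collections import Counter

def vote(readings):
    """Majority vote over however many redundant modules are online.
    Nothing here assumes exactly three lanes: plug in more modules
    and the same logic applies unchanged."""
    if not readings:
        raise ValueError("no modules online")
    value, count = Counter(readings).most_common(1)[0]
    # Demand a strict majority, not just a plurality, before trusting a value.
    return value if count > len(readings) // 2 else None
```

Demanding a strict majority means an even split (say 2 vs 2 after losing lanes) is reported as a fault rather than guessed at.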
Quote from: Robotbeat on 11/20/2012 03:02 pm Another point is that while throwing redundancy at the problem can indeed solve almost any reliability issue, the cost of such a decision may be an order of magnitude increase in complexity and software development costs. And there you go. So redundancy alone is not the fix-all you were suggesting.
Also, redundancy does NOT solve reliability issues. It is a mitigation for poor reliability, and that is a major difference. ...
I'd love to hear more details about why SpaceX went with non-hardened components. So far I've heard C++ and Linux as an explanation, which I don't find very convincing since both work just fine on several rad-hard processors. There must be more to it.
And operationally speaking, if one takes a step back and looks at the big picture of why these systems and vehicles exist, there are phases of potential mission scenarios where it is not optimal to have to assume one has poor reliability and then rely solely on redundancies that may require crew/ground input at less than ideal times and/or circumstances.
Could ITAR or other limitations be one of the reasons not to choose rad-hardened hardware? Especially since Musk mentioned that ultimately he could try to sell Dragons to third parties, possibly outside of the US...
Quote from: mmeijeri on 11/20/2012 07:11 pm I'd love to hear more details about why SpaceX went with non-hardened components. So far I've heard C++ and Linux as an explanation, which I don't find very convincing since both work just fine on several rad-hard processors. There must be more to it. AFAIK the going price for the BAE RAD750 board (PowerPC architecture) is in the $400-800k range. I'd expect that's "price competitive" in this market with similar products running lesser-known instruction sets like the USAF 1750A and the USN ANsomething-or-other. IIRC this is about the capability of a mid-90s Apple Mac. Aside from the *eyewatering* price, I think you'll find these boards are *mostly* instruction set compatible with other PowerPCs, but not *exactly*, much as the European equivalent (Mongoose?) is based on the SPARC V7 architecture. So on the upside, the hardware is mfg in a rad-hard process (SOS/SOI substrates are only the *start*): from the transistor up, *all* registers are likely to have 3-way voting, as is all I/O and the watchdog timer, so you get defense in depth (*provided* your software makes appropriate use of the features). *But* you've got not-quite compatibility with less popular instruction sets (possibly with *substantial* limitations, like a 1MB address space on 1750A, still used by ULA IIRC, or the Shuttle's 4 Pi architecture), probably favoring military-standard 1553B bus protocols (with mil-spec pricing), and a clock frequency at most in the low 100s of MHz, with *no* control over the form factor, and any additional peripherals will be available at the same "competitive" pricing. I note in all this talk I've not seen any comment on what SpaceX actually *uses*. My instinct is x86 compatibles or ARMs (which have enjoyed *much* better power consumption).
From that Aviation Week link it's obvious that SpaceX spent a good deal of time engineering a computing solution. They did a lot of analysis and even a good amount of testing. The result is the current set of computing resources used in the SpaceX vehicles. So far it's worked out. Please remember that "radiation hardened" means a lot of different things. There are the transient effects of particles hitting computer components, and there is the total dose over time. Even hardened components suffer from SEUs, and you have to deal with that no matter what type of parts you use. The total dose that a Dragon computer might see in a LEO mission is low; just guessing, I'd say 1 rad or so. The Curiosity rover has a RAD750 computer that is specified for 100k rads. To get that 100k rads you get to pay a reported $400k or so for it. Now SpaceX has said that the Dragon could land on any solid surface in the solar system. Good luck landing on Io with the current computer setup. At the surface of Io you get about 2 rads per minute. The current computer system would not survive the radiation environment for long. What about Mars? I'd say that is a maybe or maybe not.
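Putting the post's own numbers together as a back-of-envelope check (the 2 rad/min Io figure and the 100k rad RAD750 rating are as quoted above):

```python
io_dose_rate = 2              # rad per minute at Io's surface, per the post
rad750_rating = 100_000       # total-dose rating in rad, per the post

minutes_to_rating = rad750_rating / io_dose_rate    # 50,000 minutes
days_to_rating = minutes_to_rating / (60 * 24)      # about 34.7 days
```

So even a 100k rad part only buys about a month on Io's surface from total dose alone, never mind the transient upsets.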
So, four or five of those puppies would get to be in the millions of dollars, not counting peripherals.
That becomes a significant portion of the spacecraft's cost... SpaceX is a company that likes to spend as little as possible on outside components.
And presumably, they would want similarity to their rocket's avionics as well. That would mean millions for each Falcon 9 or even Falcon 1 (back when they were still pursuing it), or the extra overhead of having two very different platforms.
This is from some of the reports on what went wrong:
•One of three flight computers failed while Dragon was docked at the ISS due to a suspected radiation hit. The computer was restarted but was not resynchronized with the other two units; SpaceX says that NASA felt it was not necessary to continue the mission.
•One of three GPS units, the Propulsion and Trunk computers, and an Ethernet switch also experienced suspected radiation hits, but they were recovered during a power cycle.
This is for an approximately two-week-long flight where the majority of the time it was not doing anything, just attached to the ISS.
This is very off topic, but as someone who designs PCBs: no one except the very largest manufacturers etches their own PCBs, and unless you are manufacturing at least several hundred assemblies a month, it's generally not worth it to do your own component placement either.
Quote from: Robotbeat on 11/20/2012 08:43 pm That becomes a significant portion of the spacecraft's cost... SpaceX is a company that likes to spend as little as possible on outside components. It does mount up. The AvWeek article said they have about 54 processors on the whole LV/capsule doing various things. Commonality seems to be a *very* big SpaceX trait. Why support 2 (or 3?) architectures when you can standardize on 1?
We've got 54 in a Dragon – and they're all different kinds of computers, different kinds of processors.
Quote from: JBF on 11/20/2012 02:50 pm I note in all this talk I've not seen any comment on what SpaceX actually *uses*. My instinct is x86 compatibles or ARMs (which have enjoyed *much* better power consumption). I have no knowledge about SpaceX avionics architecture, but I'd be shocked beyond words if it were ARM based today.
You can use rad-hardened hardware, but if you want to carry humans you have to work on shielding. In the long run it could be "smarter" for SpaceX to focus on shielding and gain experience with it, investing bucks on it and saving by not using rad-hardened electronics.
I "believe" in redundancy, but a strong event (rad storm?) could destroy all your redundant parts and leave you naked.
That paper says hydrogen-rich materials shield better: could it be feasible to put CPU boards and RAM in a sphere inside the future methane tanks for free shielding?
And what about redundant CPU boards mounted in orthogonal directions, to minimize the damage in case of directional rays (solar bursts)?
Last thought: modern CPUs and RAM are way smaller, so they lessen the chance of a hit, but I presume the smaller transistors are damaged by lower energy levels than bigger ones. Is that true?
Replying to two items from this thread: I watch flight boards get populated/placed and assembled in unit quantities down to qty=1 in the lab next to mine, on a regular basis. Space-rated PWA and PWB assembly is a different game than nearly anything commercial and absolutely everything high-volume.
Quote from: mlindner on 11/20/2012 03:11 pm A properly designed system should be plug-and-play, able to drop any number of computational "modules" into the loop as you want. As long as they didn't hardcode the 3-modules-of-2-computing-units architecture, it may already be capable of dropping additional modules into the loop. This sounds like something a software engineer would say. Once you start adding hardware inputs/outputs into the equation, it becomes much harder to abstract your 'modules' in such a way.
Quote from: Robotbeat on 11/20/2012 02:42 pm Quote from: Go4TLI on 11/20/2012 02:40 pm Quote from: Robotbeat on 11/20/2012 02:35 pm With added redundancy, you have the additional benefit of more "near misses," which gives you more opportunities to improve the system. Without redundancy, you either succeed or you fail hard, with far fewer near misses. As far as I can tell, not a single person has suggested there should be no redundancy. Redundancy is an obvious thing to have. The discussion is whether or not unlimited redundancy should be the answer to unreliability. I suggest there is a middle ground where reliability is high but redundancy is there because things happen. Nobody disagrees with you on the idea there should be a middle ground. Or at least, nobody should. I agree with this as well. The problem is there is a very large range of possible middle grounds; the inflection point could be much further out than conventionally thought. I trust SpaceX to do this calculation. My personal belief, though, is that there has to be a better solution than using 15-year-old technology. Following Moore's law there have been 10 doublings in transistor density since then, which implies a roughly 1024-fold increase in computational power. Meaning, assuming you do distributed computing (even more radiation prone) and assuming that distributed computing scales linearly (it doesn't), you need roughly 1000 of these processors to get to the speed of one modern processor.
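The doubling arithmetic in that post, spelled out (15 years and the classic 18-month doubling period are the assumptions being made):

```python
years = 15
months_per_doubling = 18      # Moore's-law rule of thumb

doublings = (years * 12) // months_per_doubling     # 10 doublings in 15 years
density_factor = 2 ** doublings                     # 2^10 = 1024x
```

Whether 1024x the transistor density turns into 1024x usable throughput is exactly the linear-scaling caveat the post flags.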
If your hardware elements (transistors, memory cells, etc.) are 1000 times smaller, wouldn't that make them much more susceptible to individual rad hits, perhaps of lower energy? Also, I wonder if a hit that would have affected one component before might now affect multiple?
Can't say I'm sure of the process here, but making ten of anything at once often costs little more than making a single item.
Quote from: jimvela on 11/21/2012 04:44 am Replying to two items from this thread: I watch flight boards get populated/placed and assembled in unit quantities down to qty=1 in the lab next to mine, on a regular basis. Space-rated PWA and PWB assembly is a different game than nearly anything commercial and absolutely everything high-volume. Which is exactly why they cost so much.
That cost is in the noise compared to the cost of a failure- which is why they are built that way.
Quote from: jimvela on 11/21/2012 02:34 pmThat cost is in the noise compared to the cost of a failure- which is why they are built that way.So it's the QA in design & build coupled with testing *after* mfg and population that soaks up the cash?
I'd guessed it might have something to do with needing some kind of forced flow (either gas or liquid) cooling due to zero g.
This also raises a point. Are layer counts and line widths for space-rated PWAs and PWBs (those sound like IBM terms; I thought most people call them PCBs) behind those of terrestrial boards, in the same way space-rated parts tend to be a generation or two behind their ground-based equivalents?
An uneducated guess: on a slow system the realtime requirements may not be met with Linux and C++.
I think SpaceX is just trying to follow Amdahl's Law in that you shouldn't optimize a small part of the problem.
Quote from: guckyfan on 11/20/2012 07:57 pmAn uneducated guess: on a slow system the realtime requirements may not be met with Linux and C++.I doubt it. You can do cycle-perfect simulations of Apollo hardware in Javascript in a browser nowadays, so that can't be it. Console video games run on limited hardware too, and C++ is the language of choice for that.
Quote from: mmeijeri on 11/21/2012 05:42 pmQuote from: guckyfan on 11/20/2012 07:57 pmAn uneducated guess: on a slow system the realtime requirements may not be met with Linux and C++.I doubt it. You can do cycle-perfect simulations of Apollo hardware in Javascript in a browser nowadays, so that can't be it. Console video games run on limited hardware too, and C++ is the language of choice for that.That is quite a few orders of magnitude slower. Some here on the forum were even surprised they use Linux at all because it is not hard realtime.
Quote from: guckyfan on 11/21/2012 06:28 pmQuote from: mmeijeri on 11/21/2012 05:42 pmQuote from: guckyfan on 11/20/2012 07:57 pmAn uneducated guess: on a slow system the realtime requirements may not be met with Linux and C++.I doubt it. You can do cycle-perfect simulations of Apollo hardware in Javascript in a browser nowadays, so that can't be it. Console video games run on limited hardware too, and C++ is the language of choice for that.That is quite a few orders of magnitude slower. Some here on the forum were even surprised they use Linux at all because it is not hard realtime.Linux is a re-implementation of Unix. Soft real time Unix made its living controlling telephone exchanges. For SpaceX it probably comes down to how fast a rocket engine can gimbal.
I'm pretty sure SpaceX isn't using Linux in that portion of their avionics... probably some other embedded, fully real-time operating system.
So digging in the careers section of spacex.com should have been done a while ago. Summarized (I deleted common traits and traits that were generic), a few comments: they use a lot of Linux. They don't use x86. Looks like PowerPC and ARM mainly.
uC rolls up a bunch of Unix commands and a shell into 1 single block to speed up loading.
Quote from: john smith 19 on 11/22/2012 05:15 pmuC rolls up a bunch of Unix commands and a shell into 1 single block to speed up loading.uC stands for microcontroller (Atmel AVR, PIC, TI MSP430 are a few common ones), so I'm not sure what you meant by this.
As a side note, my group here flies MSP430s in space. We and other groups have quite often flown them on cubesats. They have quite high reliability; I haven't really heard of any permanently failing. Occasionally they crash and have to be rebooted, but that's why we fly everything with watchdog timers (to reboot them). What's more, the newer ones are FRAM (ferroelectric RAM) based, which is inherently radiation hardened because the data is stored in a ferroelectric polarization state rather than as charge that could be disrupted by radiation. I should also note that they cost around $6 USD per chip.
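The watchdog pattern described above, as a toy software model (a real MSP430 watchdog is a hardware peripheral; the class and names here are just illustrative):

```python
class Watchdog:
    """Toy watchdog timer: the main loop must kick it before the
    timeout elapses, or the 'hardware' forces a reboot."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.elapsed = 0
        self.reboots = 0

    def kick(self):
        self.elapsed = 0          # a healthy main loop restarts the countdown

    def tick(self):
        self.elapsed += 1
        if self.elapsed >= self.timeout_ticks:
            self.reboots += 1     # on real hardware: full CPU reset
            self.elapsed = 0
```

If an SEU crashes the main loop, the kicks stop, the timeout expires, and the chip reboots itself, which is why the occasional crash is survivable on these cubesats.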
This is a misrepresentation of "real-time." If your system runs fast enough, then even if it is not "real-time," it acts as if it is, as long as you can service events fast enough.
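"Fast enough acts like real-time" can be made measurable: a soft real-time loop counts deadline misses instead of guaranteeing zero, which is the distinction being argued about here. A hypothetical sketch (function and parameter names are mine):

```python
import time

def run_loop(step, period_s, iterations):
    """Run `step` once per `period_s` seconds, counting deadline
    misses rather than preventing them (soft real time)."""
    misses = 0
    deadline = time.monotonic() + period_s
    for _ in range(iterations):
        step()
        now = time.monotonic()
        if now > deadline:
            misses += 1                   # late: slipped past the period
            deadline = now + period_s     # resynchronize rather than spiral
        else:
            time.sleep(deadline - now)
            deadline += period_s
    return misses
```

A no-op step at a generous period should essentially never miss; a step that takes longer than the period misses every cycle, and that's the regime where not being hard real-time starts to matter.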
OT, but how did they fare over the South Atlantic Anomaly?
Quote from: john smith 19 on 11/23/2012 09:53 am OT, but how did they fare over the South Atlantic Anomaly? Not sure on that. I'm not directly involved with the mission that has logged the most time in space. We don't (yet) actually fly radiation monitors in space, so we can only tell when it resets. You can take a look at http://rax.engin.umich.edu/ It flies an MSP430 (an older flash-based model) as its flight computer; apparently it works fine, doing great science. Nanosats generally don't fly with any redundancy anywhere because of mass and space requirements; if it breaks, it breaks.
I've seen pictures taken with digital cameras with closed shutters over the SAA vs. other parts of their orbit. It's an impressive demonstration of the *relative* radiation level.
Quote from: john smith 19 on 11/23/2012 03:10 pmI've seen pictures taken with digital cameras and closed shutters over the SAA Vs other parts of their orbit.It's an impressive demonstration of the *relative* radiation level.Links?
Quote from: Nomadd on 11/20/2012 01:08 pm... I've seen a lot of equipment and more than one life lost because of redundancy induced complacency. (A phrase I just invented for this post)...Same argument applies to ANY reliability increase, does it not?For instance, SpaceX is going to improve their error-handling capability for the next mission, making resyncing automatic. This will make them more robust to future problems.
Quote from: Robotbeat on 11/20/2012 02:35 pm Quote from: Nomadd on 11/20/2012 01:08 pm ... I've seen a lot of equipment and more than one life lost because of redundancy-induced complacency. (A phrase I just invented for this post)... Same argument applies to ANY reliability increase, does it not? For instance, SpaceX is going to improve their error-handling capability for the next mission, making resyncing automatic. This will make them more robust to future problems. This might be a fix, or it might be the wrong direction. Remember the "weakest link": auto-resyncing with an error-prone processor makes the whole system weak.