Author Topic: SpaceX Flight Software: How good is it?  (Read 15356 times)

Offline oldAtlas_Eguy

  • Senior Member
  • *****
  • Posts: 3567
  • Florida
  • Liked: 1993
  • Likes Given: 239
SpaceX Flight Software: How good is it?
« on: 10/22/2011 03:30 pm »
The ASAP initial pushback against SpaceX’s unconcern over its flight software seems to have now been allayed.

http://www.spacenews.com/civil/111021-spacex-soft-review-softens.html

Looking in from the outside, several items indicate how good SpaceX’s software team is:

1) Their IT security is extremely good and is all homegrown. You don’t get that type of quality from second-best programmers.

2) They can quickly resolve software problems. Again, the speed of resolution speaks to the grasp the software team has on every aspect of the software, something only top-quality programmers can accomplish. (I have known a lot of programmers, and this ability is found in very few of the more capable ones.)

3) One of the problems that comes with these types of top programmers is a lot of arrogance when it comes to their software. They know they are good and they know their software is good. This trait is very obvious in the NASA Aug 9 meeting. (The second is because they are even more careful at testing their software extensively personally before submitting it to be tested by others, which is why so few bugs, even minor ones, show up from these programmers and why industry pays them very large salaries. A large amount of the cost of software development comes from delays when bugs or, even worse, design errors are found; the fewer the bugs, the cheaper the project, even if the programmers are being paid well above industry standards.)

4) Top programmers like challenges more than salary, but not so much that they would work for a subpar salary. (It just means that for the same pay you can get the good ones to work for you instead of for someone else; it’s an attraction factor that still requires equitable pay in order to keep them.)

5) Their flight system computers are cutting-edge, complex, triple- and quad-redundant real-time systems that far surpass the Shuttle hardware in capability. (This hardware brings both enablers and detractors when it comes to good software. Enabler: more memory storage and CPU speed mean less optimization and more built-in tests in line with the code, checking for bad data conditions that can cause the software to die. Detractor: more capabilities and more data being processed mean a lot more code, which means an exponential growth in software complexity and, with it, more possibility of software errors. If their flight software programmers weren’t really good, the F1 or the F9 would not have made it to orbit without a software error causing a failure. See the DOD’s major concern over fielding software and putting systems through realistic field trials just to test the complex software, and how many times severe software problems have caused a weapon system to be cancelled [Reference: Sgt York].)

6) A very common modern problem is the software review itself. An organization that writes very little “fielded” software does not have the real expertise to understand the current generation of very large, complex software implementations. Modern software review has very little to do with examining the code itself; it reviews the software object design maps, methodologies, documentation thoroughness, version control, requirements validation checkoffs, the software testing performed and its results, and the adequacy of the company’s internal software life cycle process. A good signoff on these items does not ensure that the software will be error free or even work, but a bad rating in a review means a high likelihood that the software will never work. For the government it is a lessons-learned item aimed at cost and risk reduction: a guide to what is more likely to result in good software. The other associated item is that these reviewing organizations very rarely have on their review team someone able to understand why the software, organized as it is, would be a good thing or a bad one. (NASA is not as bad here as is normal; they should be adequate enough to understand why, but they are not going to be out front except in very limited circumstances where there is no comparable industry experience, i.e. certain cutting-edge in-house R&D software projects whose personnel would rarely take part in a software review of a contractor unless it was within the very narrow field they work in, or at least close enough to it that their expertise would be a plus. An external software review of a competent software producer is not really going to accomplish much other than a feel-good for the non-software-oriented management. The same can be said of some engineering reviews.
The reasoning behind these types of reviews is to force contractors’ management, and even NASA project management on NASA-owned systems, to manage their projects well. This is not always successful, and catching it at the review is sometimes way too late.) Sorry about the length, but software review has serious limitations and limited applicability.

Having extensive software experience, starting in 1974 at the age of 16, on both sides of the street, government and contractor, as well as being on projects that were well run and others that were more haphazard, I know what can get you to the goal post quickly and cheaply, and what at first looks cheap and quick but ends up very expensive and never finishes. That is why I agree that the software review, and what it enforces on a software project team, is a good thing. But the review itself is not the goal; the goal is the “best practices” the project team performs in order to pass the review when it occurs.

Elon Musk, although a programmer (I am also of this stripe), is an innovative one. He knows what and who are necessary to create good, solid software design and implementation. Although the resulting team will seem arrogant to those outside the world of programming, it will be a tight-knit, high-performance group more interested in the quality of its work than in sleep, an aspect that is also common to some engineering teams.

So how good is the SpaceX flight software?

I have no concerns looking from the outside in, backed up by almost 40 years of experience writing software, including real-time systems and LV flight software. The compromises that 1974-era tools forced into the code in order to make the total system work are no longer seen today; the software design and testing tools are sophisticated, and the hardware limitations have so little impact that compromises and workarounds to make the system work are rarely needed. The quality of the software team has the most impact on the quality of the software regardless of methodologies, but good “practices” improve the quality of the product, and the ASAP just acknowledged they have few concerns over SpaceX’s methodology.

Offline baldusi

  • Senior Member
  • *****
  • Posts: 7438
  • Buenos Aires, Argentina
  • Liked: 1459
  • Likes Given: 4518
Re: SpaceX Flight Software: How good is it?
« Reply #1 on: 10/22/2011 05:01 pm »
I'm of the theory that the definition of software quality depends a lot on complexity and iterations. I usually regard software as well designed when it's correctly divided into parts that have a very clear input and output sequence and can handle bad parameters gracefully. It's no surprise I like OpenBSD so much. As a professor once told me, when you have a big problem, you divide it into lots of simple problems.
How to do that without falling into an interrelationship mess is what good design is all about. What's more, good design lets you design testing tools. A great example was SGI's tester of NFS implementations, which sent the stack more than 65,000 corner cases.
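To make the point about clear inputs, clear outputs and graceful handling of bad parameters concrete, here is a minimal Python sketch. It is only an illustration: the function name, the value range and the corner-case table are all hypothetical, and none of it is taken from SGI's tester, OpenBSD or any flight code. The parser has one contract (a number in [0.0, 1.0] comes back as a float, anything else comes back as None, and nothing raises), and a small table of awkward inputs is pushed through it in the spirit of a corner-case tester.

```python
import math
from typing import Optional


def parse_fraction(raw: object) -> Optional[float]:
    """Parse a value expected to be a number in [0.0, 1.0].

    Returns the float on success and None on any bad input; it never raises.
    (Hypothetical example function, not from any real code base.)
    """
    # Reject types we do not intend to interpret (bool is a subclass of int).
    if isinstance(raw, bool) or not isinstance(raw, (int, float, str)):
        return None
    try:
        value = float(raw)
    except (ValueError, OverflowError):
        # Unparseable text, or an integer too large to fit in a float.
        return None
    # NaN and out-of-range values (including infinities) are rejected.
    if math.isnan(value) or not 0.0 <= value <= 1.0:
        return None
    return value


# A tiny, illustrative corner-case table; a real tester would have thousands.
CORNER_CASES = ["", " ", "0", "1", "1.0000001", "nan", "inf", "1e400",
                "0x10", None, True, [], 10**400, b"0.5"]


def run_corner_cases() -> list:
    """Feed every corner case through the parser and collect accepted values."""
    results = (parse_fraction(case) for case in CORNER_CASES)
    return [value for value in results if value is not None]
```

Because the contract is this narrow, the test driver is trivial to write, which is really the point: the design is what makes the testing tool possible.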
I also think that sometimes system engineers forget that people outside the circle write code on which life depends and/or where a single error can cost hundreds of millions. And those outsiders have developed systems where reliability is paramount, but code iteration speed is also fundamental. In fact, I like to have lots of small code iterations at the same time as a good design framework.
I don't care how good you are, the initial design won't work. And doing some prototyping helps a lot in understanding the mechanics and critical code paths. I've followed the evolution of OpenBSD's pf very closely, and they have changed the architecture at least three times, even though they had to make the initial version in four months. And those guys haven't had a single root attack on that part in some ten years.
So, I wouldn't be surprised if SpaceX's software guys use a development model quite different from NASA's, and that made NASA very wary. In fact, I would expect nothing short of a clash of cultures. I've seen a syndrome of "only us are the critical software professionals", which wouldn't surprise me here.

Offline Antares

  • ABO^2
  • Senior Member
  • *****
  • Posts: 5202
  • Done arguing with amateurs
  • Liked: 368
  • Likes Given: 226
Re: SpaceX Flight Software: How good is it?
« Reply #2 on: 10/22/2011 10:29 pm »
There's a lot of stuff taken on faith and supposition in the OP.  If you have evidence to back up your opinions, please cite it.  You seem to base it on hearsay of ASAP.
If I like something on NSF, it's probably because I know it to be accurate.  Every once in a while, it's just something I agree with.  Facts generally receive the former.

Offline alk3997

  • Full Member
  • ***
  • Posts: 380
  • Liked: 29
  • Likes Given: 27
Re: SpaceX Flight Software: How good is it?
« Reply #3 on: 10/22/2011 10:50 pm »
Actually, the two things that really are a concern in the Space News article are:

1) The reported SpaceX statement that, "the software chief [at SpaceX] said he didn’t worry about errors because ‘there were no mistakes in the software.’" and

2) That there was no CMMI accredited...process.

On #1, that is the "kiss of death" in human-rated software.  When you start believing that there are no errors is when errors start creeping in because your software is tied to your process and the people running the processes.

For #2, I really don't care about a certificate but the process is the key.  The higher CMMI accreditations call for a process that takes software errors and feeds them back into the process to prevent future occurrences.  It's only then that you start to have mature software.

So, wow.  I wouldn't even think to have said the quote about no errors to my NASA customer because we all know there is no such thing as error-free software (at least anything greater than 10 lines or so).  Maybe they meant it as a joke?

Hope they have corrected their issues since so much of the future of human spaceflight is counting on it.

Andy


Offline mmeijeri

  • Senior Member
  • *****
  • Posts: 7486
  • Martijn Meijering
  • NL
  • Liked: 103
  • Likes Given: 263
Re: SpaceX Flight Software: How good is it?
« Reply #4 on: 10/22/2011 10:59 pm »
CMMI is a harmful load of bollocks (but if you're good and have plenty of resources you can still succeed with it), but it is very disappointing to hear a senior manager say there aren't any mistakes. Other than that I agree with Antares.
We will be vic-toooooo-ri-ous!!!

Offline hop

  • Senior Member
  • *****
  • Posts: 3351
  • Liked: 484
  • Likes Given: 842
Re: SpaceX Flight Software: How good is it?
« Reply #5 on: 10/22/2011 11:41 pm »
Quote from: alk3997
On #1, that is the "kiss of death" in human-rated software.  When you start believing that there are no errors is when errors start creeping in because your software is tied to your process and the people running the processes.
Heck, if said in seriousness by an engineering type (not marketing ;)) that would have me running screaming from any software vendor.

As for the OP - reading WAY too much into a press account.

Offline peter-b

  • Dr. Peter Brett
  • Full Member
  • ****
  • Posts: 651
  • Oxford, UK
  • Liked: 17
  • Likes Given: 74
Re: SpaceX Flight Software: How good is it?
« Reply #6 on: 10/23/2011 12:21 am »
Quote from: hop
Heck, if said in seriousness by an engineering type (not marketing ;)) that would have me running screaming from any software vendor.

Not necessarily. There is such a thing as formally provable software (consider, for example, the work of Donald Knuth). Unfortunately, such software requires highly skilled computer scientists to write and maintain. On the other hand, that sounds like just the sort of person who SpaceX would be able to attract to work for them, doesn't it?

Furthermore, in their 6 launches so far, SpaceX have had no mission anomalies that I am aware of that could be attributed to flight software bugs. Incorrect mathematical system models leading to incorrect coefficients in the algorithms, certainly (e.g. the stage separation and fuel oscillation problems), but no evidence of the sort of mistakes that led to the Ariane 5 test flight failure.

There seem to be three scenarios: SpaceX actually are using some kind of ultra-rigorous software development practice; Mr. Musk has deliberately recruited someone with a dangerously blasé attitude towards code quality to lead software development for an environment where bugs literally cost lives; or someone's been quoted out of context. Which seems most plausible to you?
Research Scientist (Sensors), Sharp Laboratories of Europe, UK

Offline 2552

  • Full Member
  • ****
  • Posts: 486
  • Liked: 41
  • Likes Given: 519
Re: SpaceX Flight Software: How good is it?
« Reply #7 on: 10/23/2011 12:49 am »
Quote from: hop
Heck, if said in seriousness by an engineering type (not marketing ;)) that would have me running screaming from any software vendor.

Quote from: peter-b
Not necessarily. There is such a thing as formally provable software (consider, for example, the work of Donald Knuth). Unfortunately, such software requires highly skilled computer scientists to write and maintain. On the other hand, that sounds like just the sort of person who SpaceX would be able to attract to work for them, doesn't it?

Furthermore, in their 6 launches so far, SpaceX have had no mission anomalies that I am aware of that could be attributed to flight software bugs. Incorrect mathematical system models leading to incorrect coefficients in the algorithms, certainly (e.g. the stage separation and fuel oscillation problems), but no evidence of the sort of mistakes that led to the Ariane 5 test flight failure.

There seem to be three scenarios: SpaceX actually are using some kind of ultra-rigorous software development practice; Mr. Musk has deliberately recruited someone with a dangerously blasé attitude towards code quality to lead software development for an environment where bugs literally cost lives; or someone's been quoted out of context. Which seems most plausible to you?

‘there were no mistakes in the software.’

Seems likely this refers to previous flight performance of the software.

Offline Lurker Steve

  • Full Member
  • ****
  • Posts: 1420
  • Liked: 35
  • Likes Given: 9
Re: SpaceX Flight Software: How good is it?
« Reply #8 on: 10/23/2011 01:08 am »
Quote from: mmeijeri
CMMI is a harmful load of bollocks (but if you're good and have plenty of resources you can still succeed with it), but it is very disappointing to hear a senior manager say there aren't any mistakes. Other than that I agree with Antares.

Actually, just levels 4 and 5 are useless. They pretty much just guarantee more jobs for quality engineers, since they mostly consist of collecting metrics and theoretically modifying your process to eliminate some common repeated errors.

Everyone makes mistakes, but those quality engineers aren't happy unless you find x amount of major and y amount of minor errors per line of code or page of design documentation. Normally, I just point out minor spelling or punctuation errors and don't log them, but when you have to fulfill a quota, sometimes you go overboard.

Right now, I'm in an organization that is at the complete other end of the spectrum. We only do very informal reviews, if any. Management sometimes pays lip service to a quality process, but project schedules and funding would never allow us to even get close to level 2. We are successful because the group is predominately made up of senior engineers, each with 25+ years of experience. We also have the advantage of not having to coordinate the efforts of hundreds of engineers. Our small group of 10-15 does just fine.


Online A_M_Swallow

  • Elite Veteran
  • Senior Member
  • *****
  • Posts: 8599
  • South coast of England
  • Liked: 375
  • Likes Given: 167
Re: SpaceX Flight Software: How good is it?
« Reply #9 on: 10/23/2011 04:18 am »
Quote from: alk3997
2) That there was no CMMI accredited...process.
{snip}

CMMI is for cost plus contract only.  A big increase in paperwork but does not solve the problems.  Mechanical engineering companies like it.

SpaceX is a mechanical engineering firm but Mr Musk is a finance software engineering man.

Online A_M_Swallow

  • Elite Veteran
  • Senior Member
  • *****
  • Posts: 8599
  • South coast of England
  • Liked: 375
  • Likes Given: 167
Re: SpaceX Flight Software: How good is it?
« Reply #10 on: 10/23/2011 04:36 am »
Quote from: alk3997
{snip}
On #1, that is the "kiss of death" in human-rated software.  When you start believing that there are no errors is when errors start creeping in because your software is tied to your process and the people running the processes.

The second launch of the Falcon 1 had a software problem.
"This anomaly is two-fold. First, an incorrect propellant utilization file was loaded into the engine computer. This error caused the engine mixture ratio to be lean on lift-off and rich at altitude. {snip}
http://www.spacex.com/F1-DemoFlight2-Flight-Review.pdf

This change control problem may have required a rocket engine engineer to spot.

The no faults statement may have just covered the last flight of the Falcon 9.

Offline peter-b

  • Dr. Peter Brett
  • Full Member
  • ****
  • Posts: 651
  • Oxford, UK
  • Liked: 17
  • Likes Given: 74
Re: SpaceX Flight Software: How good is it?
« Reply #11 on: 10/23/2011 06:38 am »
Quote from: A_M_Swallow
The second launch of the Falcon 1 had a software problem.
"This anomaly is two-fold. First, an incorrect propellant utilization file was loaded into the engine computer. This error caused the engine mixture ratio to be lean on lift-off and rich at altitude. {snip}

Because I am a pedant, I would argue that that was not, strictly speaking, a software problem. :P

Offline Lars_J

  • Senior Member
  • *****
  • Posts: 6161
  • California
  • Liked: 665
  • Likes Given: 195
Re: SpaceX Flight Software: How good is it?
« Reply #12 on: 10/23/2011 06:56 am »
This thread is 100% speculation/conjecture. Please let it die.

Offline oldAtlas_Eguy

  • Senior Member
  • *****
  • Posts: 3567
  • Florida
  • Liked: 1993
  • Likes Given: 239
Re: SpaceX Flight Software: How good is it?
« Reply #13 on: 10/23/2011 04:34 pm »
Quote from: alk3997
{snip}
On #1, that is the "kiss of death" in human-rated software.  When you start believing that there are no errors is when errors start creeping in because your software is tied to your process and the people running the processes.

Quote from: A_M_Swallow
The second launch of the Falcon 1 had a software problem.
"This anomaly is two-fold. First, an incorrect propellant utilization file was loaded into the engine computer. This error caused the engine mixture ratio to be lean on lift-off and rich at altitude. {snip}
http://www.spacex.com/F1-DemoFlight2-Flight-Review.pdf

This change control problem may have required a rocket engine engineer to spot.

The no faults statement may have just covered the last flight of the Falcon 9.

Thank you for pointing out the version control problem in F1 DemoFlt2 that ended in an error. What would worry me a lot more is whether they have had any more errors of this type. Either they solved the process deficiency that could result in this type of problem or they are very lucky (there are a lot of files and a lot of file versions, so getting the correct set for loading prior to launch is not a small task). One of the items that the SRB is supposed to look for is repeating errors of the same type, especially ones related to the life cycle process.

As an aside you shouldn't even have anything but the verified file set for flight on the ground computers at the pad. Something I think they have learned.
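As a rough illustration of what "nothing but the verified file set" could mean in practice, here is a minimal Python sketch. It is purely hypothetical and not anything SpaceX has described: the function names and the manifest layout (file name mapped to an approved SHA-256 digest) are assumptions made for the example. The idea is that the manifest is signed off at review time, and the load step refuses to proceed if any file is missing, unapproved or altered.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify_file_set(load_dir: Path, manifest: dict[str, str]) -> list[str]:
    """Compare the files in load_dir against an approved manifest.

    manifest maps file name -> expected SHA-256 hex digest (hypothetical layout).
    Returns a list of problems; an empty list means the set matches exactly.
    """
    approved = set(manifest)
    present = {p.name for p in load_dir.iterdir() if p.is_file()}

    problems = []
    # Files the review approved but that are not in the load directory.
    for name in sorted(approved - present):
        problems.append(f"missing: {name}")
    # Files in the load directory that nobody approved.
    for name in sorted(present - approved):
        problems.append(f"unapproved: {name}")
    # Approved files whose contents differ from the reviewed version.
    for name in sorted(approved & present):
        if sha256_of(load_dir / name) != manifest[name]:
            problems.append(f"modified: {name}")
    return problems
```

A load script would call verify_file_set() and abort on a non-empty result. A check like this only catches the wrong version of a file being staged; it says nothing about whether the approved version was itself correct.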

The second failure related to software on the same flight, F1 DemoFlt2, was in the second stage:
Quote
Upper Stage Control Anomaly
An oscillation appeared in the upper stage control system approximately 90 seconds into the burn. This instability grew in pitch and yaw axes initially and after about 30 seconds also induced a noticeable roll torque. This roll torque eventually overcame the 2nd stage’s roll control thrusters and centrifuged the propellants, causing flame-out of the Kestrel engine. There is high confidence that LOX slosh was the primary contributor to this instability. This conclusion has been verified by third party industry experts that have reviewed the flight telemetry.

Falcon 1 did not use slosh baffles in the second stage tanks, as simulations done prior to flight indicated the slosh instability was a low risk. Given that in space there are no gust or buffet effects, the simulations did not take into account a perturbation, as occurred due to the hard slew maneuver after stage separation. Extensive 2nd stage slosh baffles will be included in all future flights, as is currently the case with the 1st stage.

This is both an engineering and a software fault. It is a software fault because of a deficiency in the simulation, which should have caught such an engineering error. Assumptions based on simulations that are not mature have a tendency to lead to problems. This is another learning item and another thing for the SRB to investigate to see if there is a repeat of this problem.

Side Note: This thread is to investigate why the concerns that the NAC and ASAP had have been lowered, and whether the “no errors” statement comes from rigorous testing of the software or just plain arrogance. As a past chairman of an SRB, if someone had made such a statement I would have beaten them with endless very pointed questions on design, testing and process. This reaction seems to be a consensus among the posters as well as the NAC and ASAP. Why would the SpaceX software project manager make such a statement in the first place? Is it a lack of experience?

Offline Antares

  • ABO^2
  • Senior Member
  • *****
  • Posts: 5202
  • Done arguing with amateurs
  • Liked: 368
  • Likes Given: 226
Re: SpaceX Flight Software: How good is it?
« Reply #14 on: 10/24/2011 02:17 am »
The Falcon 1 flight 2 anomaly is as obsolete to this discussion as STS-93 or Delta 259 is to a discussion of those fleets today.  Culture evolves quickly.  Process fixes are easy.
« Last Edit: 10/24/2011 02:18 am by Antares »

Offline Lurker Steve

  • Full Member
  • ****
  • Posts: 1420
  • Liked: 35
  • Likes Given: 9
Re: SpaceX Flight Software: How good is it?
« Reply #15 on: 10/24/2011 02:42 pm »
Quote from: alk3997
2) That there was no CMMI accredited...process.
{snip}

Quote from: A_M_Swallow
CMMI is for cost plus contract only.  A big increase in paperwork but does not solve the problems.  Mechanical engineering companies like it.

SpaceX is a mechanical engineering firm but Mr Musk is a finance software engineering man.

CMMI is not for Cost Plus contracts only.

I've used it at several Motorola and Rockwell organizations that were building real commercial products. It's a valid set of processes for coordinating the activities of large development teams and subcontractors. A CMMI level 3 certification was required in order to compete for certain automotive component contracts.

I have heard complaints from friends who were working at a CMM level 5 firm that they spent more time worrying about the process than they did the actual product design (quality engineers run amok), but there is nothing wrong with the basic framework.

Offline alk3997

  • Full Member
  • ***
  • Posts: 380
  • Liked: 29
  • Likes Given: 27
Re: SpaceX Flight Software: How good is it?
« Reply #16 on: 10/24/2011 02:49 pm »
Quote from: alk3997
2) That there was no CMMI accredited...process.
{snip}

Quote from: A_M_Swallow
CMMI is for cost plus contract only.  A big increase in paperwork but does not solve the problems.  Mechanical engineering companies like it.

SpaceX is a mechanical engineering firm but Mr Musk is a finance software engineering man.

Quote from: Lurker Steve
CMMI is not for Cost Plus contracts only.

I've used it at several Motorola and Rockwell organizations that were building real commercial products. It's a valid set of processes for coordinating the activities of large development teams and subcontractors. A CMMI level 3 certification was required in order to compete for certain automotive component contracts.

I have heard complaints from friends who were working at a CMM level 5 firm that they spent more time worrying about the process than they did the actual product design (quality engineers run amok), but there is nothing wrong with the basic framework.

I'll also add that if implemented properly everyone is involved - not just the quality engineers.  If not, the buy-in on the process improvements won't be there.  Whether certified by CMMI or not, the basic concept of continuous software improvement was one of the big keys to making Shuttle flight software as high quality as it was (we were very good at not making the same mistake twice).

Andy
