-
#820
by
JonathanD
on 07 Feb, 2020 15:05
-
More details prior to the teleconference today:
https://blogs.nasa.gov/commercialcrew/2020/02/07/nasa-shares-initial-findings-from-boeing-starliner-orbital-flight-test-investigation/
Worth quoting in full:
Definitely some red meat here. In particular:
"Due to these breakdowns found in design, code and test of the software, they will require systemic corrective actions. The team has already identified a robust set of 11 top-priority corrective actions. More will be identified after the team completes its additional work."
With such a statement, it is difficult to see how they could justify flying humans on the next flight of this vehicle. Perhaps Boeing already knows this, and that was the reason for preemptively taking the $410 million pre-tax charge in the latest company financial release.
-
#821
by
leovinus
on 07 Feb, 2020 15:13
-
More details prior to the teleconference today:
https://blogs.nasa.gov/commercialcrew/2020/02/07/nasa-shares-initial-findings-from-boeing-starliner-orbital-flight-test-investigation/
Breakdowns in the design and code phase inserted the original defects. Additionally, breakdowns in the test and verification phase failed to identify the defects preflight despite their detectability.
Well, reading that, here are a couple of questions for whoever can dial in later today at 3:30pm. @ Boeing,
- Could you clarify whether these software processes are 100% in-house, or are teams outside of the USA involved with writing StarLiner code?
- Could you clarify whether the software defects are due to incomplete specifications, or communication & integration issues between teams?
- Could you clarify what corrective actions are being taken? Such as: involving more senior engineers? Or, just better specifications?
- Could you please clarify your main software methodologies for building StarLiner? Is it DoD-like specification driven? Iterative or waterfall? Do you use Agile programming?
- Can you clarify the experience structure of the software team? How many seasoned vs first-space-project engineers?
- What is the CMM level of the team (in house or external) that wrote the bulk of the StarLiner software?
- What other projects is the same software team involved in? What other software projects have they successfully delivered?
- Are these breakdowns in the software process of design, coding and testing limited to the StarLiner, or would SLS use the same processes?
-
#822
by
wolfpack
on 07 Feb, 2020 15:21
-
Boeing doesn't need to purchase an Atlas for a Starliner IFA.
No, but they'll ask for another $100M to design an adapter.
-
#823
by
leovinus
on 07 Feb, 2020 15:31
-
More details prior to the teleconference today:
https://blogs.nasa.gov/commercialcrew/2020/02/07/nasa-shares-initial-findings-from-boeing-starliner-orbital-flight-test-investigation/
Breakdowns in the design and code phase inserted the original defects. Additionally, breakdowns in the test and verification phase failed to identify the defects preflight despite their detectability.
Well, reading that, here are a couple of questions for whoever can dial in later today at 3:30pm. @ Boeing,
- Could you clarify whether these software processes are 100% in-house, or are teams outside of the USA involved with writing StarLiner code?
- Could you clarify whether the software defects are due to incomplete specifications, or communication & integration issues between teams?
- Could you clarify what corrective actions are being taken? Such as: involving more senior engineers? Or, just better specifications?
- Could you please clarify your main software methodologies for building StarLiner? Is it DoD-like specification driven? Iterative or waterfall? Do you use Agile programming?
- Can you clarify the experience structure of the software team? How many seasoned vs first-space-project engineers?
- What is the CMM level of the team (in house or external) that wrote the bulk of the StarLiner software?
- What other projects is the same software team involved in? What other software projects have they successfully delivered?
- Are these breakdowns in the software process of design, coding and testing limited to the StarLiner, or would SLS use the same processes?
And a question for Jim B. at NASA - Given all your experience with building spacecraft, couldn't you identify these software issues and risks at Boeing at an earlier stage? Finding them at the end of the project is typically a "bad thing". Can you comment on why these corrective actions could not have been identified at an earlier stage, e.g., 2 years ago?
-
#824
by
rpapo
on 07 Feb, 2020 15:32
-
Come on guys. What in the world does IFA have to do with the current problem? Almost nothing. The software failures inside of Starliner exist independently of any abort test, and would not have been revealed by such a test.
-
#825
by
JonathanD
on 07 Feb, 2020 15:36
-
Come on guys. What in the world does IFA have to do with the current problem? Almost nothing. The software failures inside of Starliner exist independently of any abort test, and would not have been revealed by such a test.
I'd agree with that. And in fact a highly orchestrated IFA test would really not address the problem of software defects involving other critical periods of flight (insertion and EDL). So if folks want to say there should be an IFA too, fine, but that's really a separate issue.
NASA may be getting some input from international partners given these revelations as well.
-
#826
by
ChrisWilson68
on 07 Feb, 2020 15:37
-
Come on guys. What in the world does IFA have to do with the current problem? Almost nothing. The software failures inside of Starliner exist independently of any abort test, and would not have been revealed by such a test.
The point of IFA wouldn't be to find these bugs, it would be to find other bugs.
The fact that these problems weren't found before flight reduces confidence that there aren't other bugs that would be found by an IFA test.
-
#827
by
Nomadd
on 07 Feb, 2020 15:48
-
Come on guys. What in the world does IFA have to do with the current problem? Almost nothing. The software failures inside of Starliner exist independently of any abort test, and would not have been revealed by such a test.
The point of IFA wouldn't be to find these bugs, it would be to find other bugs.
The fact that these problems weren't found before flight reduces confidence that there aren't other bugs that would be found by an IFA test.
Exactly right. IFA wasn't planned because of confidence in simulations and analysis. That confidence is getting a little shaky because of the software problems they've encountered, and substituting abort simulations for a real IFA doesn't quite have the same credibility in some minds now.
-
#828
by
SoftwareDude
on 07 Feb, 2020 15:56
-
Over the years, I've stepped into several situations where basic software engineering processes don't exist. They don't have a build. There is no source control system. There is no consistency in sharing code and design paradigms. There is no consistency in testing. There is no build server. There is no automated test for each build. Code quality among the engineers is inconsistent. Etc.
I am usually not surprised by this and it is something I fix.
What is NASA getting for its money?
It will be at least a year before Boeing can refly Starliner.
Boeing will do some hiring.
-
#829
by
Coastal Ron
on 07 Feb, 2020 16:02
-
From that pre-conference statement:
NASA Shares Initial Findings from Boeing Starliner Orbital Flight Test Investigation...
There was no simple cause of the two software defects making it into flight. Software defects, particularly in complex spacecraft code, are not unexpected. However, there were numerous instances where the Boeing software quality processes either should have or could have uncovered the defects. Due to these breakdowns found in design, code and test of the software, they will require systemic corrective actions. The team has already identified a robust set of 11 top-priority corrective actions. More will be identified after the team completes its additional work.
...
It really seems to me that NASA should be asking Boeing to do a complete review of their software, because it is clear that NO PART of their codebase can be relied upon due to the faulty testing processes they had in place.
After the complete review and retest of the software, then Boeing should re-run the uncrewed orbital test flight (OFT). I wasn't advocating for this previously, but these latest revelations have made me question the safety of the Starliner vehicle as it currently stands, and the only way to validate the fixes is to rerun the OFT. Too many issues in question.
We'll see what Bridenstine has to say about this...
-
#830
by
freddo411
on 07 Feb, 2020 16:47
-
Hopefully we will hear some more during the teleconference tomorrow, but at first blush this seems like a pretty significant thing to omit from the initial post-flight news conference or even during live coverage. There seem to be three main possibilities:
1. NASA and Administrator Bridenstine were not fully aware of the extent of mid-flight software updates at the time of the post-flight press conference, which could mean a) Boeing didn't think it was significant enough to share, or b) Boeing deliberately withheld the information at that time.
2. NASA and Administrator Bridenstine were aware of the software update, but based upon the communications from Boeing were assured it was not a significant situation worth mentioning.
3. NASA and Administrator Bridenstine were aware of the software update and the potentially serious events that could have happened had it not been caught on the fly, and still chose not to mention it in the post-flight news conference.
Scenarios 1. and 2. are situations where, ostensibly, NASA and Administrator Bridenstine could have been acting in good faith and simply did not have sufficient information to make a statement about that portion of the mission directly following its completion, especially in light of the focus on the MET issue.
Scenario 3. would be a bit more concerning, because it makes some of the "a lot of things went right" cheer-leading seem a whole lot less appropriate.
At this point we have a spacecraft that was unable to achieve its orbital insertion due to an MET issue, which cascaded into erratic firing of control thrusters that pushed them beyond specifications (concerning enough on its own), and then had a separate and unrelated software flaw that could have resulted in loss of vehicle had it not been corrected on the fly mid-mission. Yikes. I had really been trying to give the Boeing team the benefit of the doubt up until this point, but they are losing me in a hurry.
I'm getting the impression some of the space press feel a bit lied-to as well. It will be interesting to not only hear what facts are shared tomorrow, but how much of the call comes off as damage control as opposed to a transparent accounting of who knew what and when.
Agreed, and well put.
I have to add:
* Failure to have effective communication with the spacecraft. While blackouts can be part of a mission, it's clear that planning one during a critical system event on a first-time flight is a very poor plan. This was either poor mission design, a comms problem, or a comms problem due to poor handling of other issues.
-
#831
by
cebri
on 07 Feb, 2020 16:53
-
Extremely sensitive situation. NASA is basically admitting it has lost confidence in Boeing's software development processes. They are really going to need to work very hard to build that trust again. I hope we get some additional information later, but three key questions for me would be:
- Why wasn't it detected?
- Did they follow the testing procedures properly?
-- If so, why didn't the test reveal there were issues with the software?
NASA is going to want to review all testing procedures as well as the test scenarios and have the software be subject to a very strict testing regime to iron out any issue. It will take time.
-
#832
by
freddo411
on 07 Feb, 2020 17:00
-
a valve mapping software issue, which was diagnosed and fixed in flight.
So if I'm parsing this correctly, they messed up the mapping, so the valves the software *thinks* it's controlling are not the *actual* valves it's wired to?
That's how I read it as well. It seems reasonable to assume that the MET issue is the only reason they found this bug too. As I understand the MET issue, it caused rapid firing of the thrusters at the wrong time.
I can see a path to finding this bug where someone was very confused as to why the system tried to fire thruster A to cause some m/s change in one axis, but instead got the opposite result. So now the computer thinks it needs to fire thruster A again, even longer; a feedback loop ensues, and bad things happen.
If you watch the launch video, you can observe on the screens several different thruster firings per second. Maybe oscillating back and forth. To my eyes, that looks unlikely to ever be a normal scenario.
Your narrative makes a *LOT* more sense as to why the thruster firings were so active. It never made sense that a timer would cause highly active thruster firings.
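To make the feedback-loop idea above concrete, here is a toy sketch of a one-axis proportional controller. Everything in it is an assumption for illustration (the `simulate` function, the gain, the units); it is not based on any knowledge of the actual Starliner flight software. It only shows that if a command reaches the opposing thruster, each "correction" makes the error larger instead of smaller.

```python
def simulate(mapping_sign, steps=5, gain=0.5):
    """Return the rate-error history for a toy one-axis proportional controller.

    mapping_sign = +1: the commanded thruster is the one actually wired.
    mapping_sign = -1: the command reaches the opposing thruster instead.
    All values are hypothetical and unitless.
    """
    error = 1.0  # initial rate error, arbitrary units
    history = [error]
    for _ in range(steps):
        command = gain * error            # controller asks for a correction
        actual = mapping_sign * command   # what the hardware really delivers
        error = error - actual            # a correct mapping shrinks the error
        history.append(error)
    return history


good = simulate(+1)
bad = simulate(-1)
print("correct mapping:", good)
print("swapped mapping:", bad)
```

With the correct mapping the error halves each step; with the swapped mapping it grows by half each step, so the controller keeps asking for longer and longer firings.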
-
#833
by
abaddon
on 07 Feb, 2020 17:09
-
Come on guys. What in the world does IFA have to do with the current problem? Almost nothing. The software failures inside of Starliner exist independently of any abort test, and would not have been revealed by such a test.
This is obviously incorrect. The Starliner would, at separation, have had the +11 hour mission clock and almost certainly would immediately have behaved incorrectly following separation from the Atlas V.
I'm not saying they do or do not have to redo IFA, but how can you possibly claim IFA wouldn't have revealed the clock issue?
-
#834
by
ugordan
on 07 Feb, 2020 17:16
-
Come on guys. What in the world does IFA have to do with the current problem? Almost nothing. The software failures inside of Starliner exist independently of any abort test, and would not have been revealed by such a test.
This is obviously incorrect. The Starliner would, at separation, have had the +11 hour mission clock and almost certainly would immediately have behaved incorrectly following separation from the Atlas V.
Playing devil's advocate here, but you have no basis for asserting that "obviously incorrect" statement. You have no knowledge of the inner workings of Starliner software or its internal state machine during powered ascent, hence no way of asserting what would have actually happened in an IFA scenario.
Am I saying that this does NOT diminish my personal confidence in Boeing's "simulations" of an IFA scenario? No, I'm not. I'm saying things are usually not as black and white as that when it comes to complex sw development.
-
#835
by
butters
on 07 Feb, 2020 17:29
-
Come on guys. What in the world does IFA have to do with the current problem? Almost nothing. The software failures inside of Starliner exist independently of any abort test, and would not have been revealed by such a test.
This is obviously incorrect. The Starliner would, at separation, have had the +11 hour mission clock and almost certainly would immediately have behaved incorrectly following separation from the Atlas V.
I'm not saying they do or do not have to redo IFA, but how can you possibly claim IFA wouldn't have revealed the clock issue?
The way in which the MET anomaly affected the OFT spacecraft revealed that some Starliner subsystems rely on the MET clock more than others. If the MET clock affected the OMS in the same way that it affected the RCS, then Starliner would have initiated its insertion burn too early along with the RCS entering the precision guidance mode for that burn. But that's not what happened. One subsystem went by the clock and another didn't. So outside of the Starliner avionics and flight software teams, it's anybody's guess how the MET anomaly might have affected the execution of an abort command. The flight mode sequencing logic is not consistent enough for us to draw conclusions.
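As a rough illustration of "one subsystem went by the clock and another didn't", here is a toy sequencer sketch. The mode names, the 1800-second threshold and the `active_modes` function are all hypothetical; only the 11-hour clock error comes from this thread. Nobody outside the program knows how the real sequencing logic is structured.

```python
ELEVEN_HOURS_S = 11 * 3600  # size of the clock error mentioned in the thread


def active_modes(met_s, burn_commanded):
    """Toy sequencer: one mode keyed to the clock, one keyed to an explicit event."""
    modes = []
    if met_s >= 1800:       # hypothetical threshold for a clock-driven mode
        modes.append("precision_attitude_control")
    if burn_commanded:      # hypothetical gate for an event-driven action
        modes.append("insertion_burn")
    return modes


true_met_s = 900  # hypothetical true elapsed time shortly after separation
nominal = active_modes(true_met_s, burn_commanded=False)
skewed = active_modes(true_met_s + ELEVEN_HOURS_S, burn_commanded=False)
print("nominal clock:", nominal)
print("skewed clock: ", skewed)
```

In this sketch the skewed clock switches on the clock-driven mode early while the event-gated burn stays off. Whether an abort command would sit on the clock-driven side or the event-driven side is exactly what we can't tell from outside.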
-
#836
by
niwax
on 07 Feb, 2020 17:37
-
Does Boeing not do a static fire with the finalized capsule? The wrong thrusters firing is Proton levels of oversight, although reminiscent of the parachute issue in the pad abort test.
-
#837
by
ChrisWilson68
on 07 Feb, 2020 17:39
-
Does Boeing not do a static fire with the finalized capsule? The wrong thrusters firing is Proton levels of oversight, although reminiscent of the parachute issue in the pad abort test.
Even if they do a static fire, the static fire is likely using special software to test the thrusters and that software might have the right mapping between thrusters while the flight software does not.
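A minimal sketch of the kind of cross-check that would catch this, assuming (hypothetically) that both the ground-test software and the flight software carry a table from logical thruster name to physical valve channel. The table names, thruster names and channel numbers below are invented for illustration.

```python
# Hypothetical tables: logical thruster name -> physical valve channel.
GROUND_TEST_MAP = {"RCS_A": 1, "RCS_B": 2, "RCS_C": 3}
FLIGHT_MAP = {"RCS_A": 1, "RCS_B": 3, "RCS_C": 2}  # B and C swapped on purpose


def mapping_mismatches(reference, candidate):
    """Return sorted names whose channel differs or is missing in either table."""
    names = set(reference) | set(candidate)
    return sorted(n for n in names if reference.get(n) != candidate.get(n))


mismatches = mapping_mismatches(GROUND_TEST_MAP, FLIGHT_MAP)
print(mismatches)
```

A check like this only helps if both pieces of software are compared against each other or against a single source of truth; a static fire run with its own correct table would pass regardless of what the flight table says.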
-
#838
by
LastWyzard
on 07 Feb, 2020 17:49
-
Over the years, I've stepped into several situations where basic software engineering processes don't exist. They don't have a build. There is no source control system. There is no consistency in sharing code and design paradigms. There is no consistency in testing. There is no build server. There is no automated test for each build. Code quality among the engineers is inconsistent. Etc.
I am usually not surprised by this and it is something I fix.
What is NASA getting for its money?
It will be at least a year before Boeing can refly Starliner.
Boeing will do some hiring.
I worked in a similar position before I retired. Unfortunately I have to agree that a year delay is certainly possible although I would hope for six months or so. I assume the software was outsourced? I tried for years to outsource and was never successful. It was just impossible to write specs to the detail required for developers who were essentially removed from all other aspects of a project. In house, the developers could always just turn to the engineer sitting nearby for clarification (and subsequent revision of the specs).
Anyway, this issue puts tremendous pressure on NASA. The ISS will have a limited American/European/Japanese presence for a long time unless SpaceX is able to pick up the slack.
-
#839
by
Lemurion
on 07 Feb, 2020 17:50
-
I had previously been of the opinion that while I think relying on simulations for the IFA is a bad idea, Boeing had signed a contract to that effect and there was no real reason that anyone should insist they change it.
Now, I’m not so sure.
The big thing for me is that since both the pad abort and OFT had things slip through that absolutely should have been caught earlier, running an IFA would be a very good test to make sure those procedures didn’t let anything through that they shouldn’t.
Boeing’s physical tests have done a very good job of illuminating issues and they should be encouraged strongly to do many more.