Starliner faced “catastrophic” failure before software bug found
"If it had gone uncorrected it would have led to erroneous thruster firing."
ERIC BERGER - 2/6/2020, 9:39 PM
On Thursday, during its quarterly meeting, NASA's Aerospace Safety Advisory Panel dropped some significant news about a critical commercial crew test flight. The panel revealed that Boeing's Starliner may have been lost during a December mission had a software error not been found and fixed while the vehicle was in orbit.
The safety panel also recommended that NASA conduct "an even broader" assessment of Boeing's Systems Engineering and Integration processes. Only after these assessments, panel member Paul Hill said, should NASA determine whether the Starliner spacecraft will conduct a second, uncrewed flight test into orbit before astronauts fly on board.
https://starlinerupdates.com/statement-on-independent-review-team-recommendations-for-the-starliner-orbital-flight-test-anomalies/
Statement on Independent Review Team Recommendations for the Starliner Orbital Flight Test Anomalies
February 6, 2020
We accept and appreciate the recommendations of the jointly led NASA-Boeing Independent Review Team (IRT) as well as suggestions from the Aerospace Safety Advisory Panel following Starliner’s Orbital Flight Test (OFT). Their insights are invaluable to the Commercial Crew Program and we will work with NASA to comprehensively apply their recommendations.
Regarding the Mission Elapsed Timer anomaly, the IRT believes they found root cause and provided a number of recommendations and corrective actions.
The IRT also investigated a valve mapping software issue, which was diagnosed and fixed in flight. That error in the software would have resulted in an incorrect thruster separation and disposal burn. What would have resulted from that is unclear.
The IRT is also making significant progress on understanding the command dropouts encountered during the mission and is further investigating methods to make the Starliner communications system more robust on future missions.
We are already working on many of the recommended fixes including re-verifying flight software code.
Our next task is to build a plan that incorporates IRT recommendations, NASA’s Organizational Safety Assessment (OSA) and any other oversight NASA chooses after considering IRT findings. Once NASA approves that plan, we will be able to better estimate timelines for the completion of all tasks. It remains too soon to speculate about next flight dates.
Interesting update: NASA ordered the full-fledged review of Boeing before the Starliner problems, according to a source.
More details prior to the teleconference today:
https://blogs.nasa.gov/commercialcrew/2020/02/07/nasa-shares-initial-findings-from-boeing-starliner-orbital-flight-test-investigation/
NASA Shares Initial Findings from Boeing Starliner Orbital Flight Test Investigation
Following the anomaly that occurred during the December Boeing Starliner Orbital Flight Test (OFT), NASA and Boeing formed a joint investigation team tasked with examining the primary issues, which occurred during that test. Those issues included three specific concerns revealed during flight:
1. An error with the Mission Elapsed Timer (MET), which incorrectly polled time from the Atlas V booster nearly 11 hours prior to launch.
2. A software issue within the Service Module (SM) Disposal Sequence, which incorrectly translated the SM disposal sequence into the SM Integrated Propulsion Controller (IPC).
3. An Intermittent Space-to-Ground (S/G) forward link issue, which impeded the Flight Control team’s ability to command and control the vehicle.
The joint investigation team convened in early January and has now identified the direct causes and preliminary corrective actions for the first two anomalies. The intermittent communications issues still are under investigation. NASA reviewed these results on Friday, Jan. 31 along with multiple suggested corrective actions recommended by the team. While NASA was satisfied that the team had properly identified the technical root cause of the two anomalies, they requested the team to perform a more in-depth analysis as to why the anomalies occurred, including an analysis of whether the issues were indicative of weak internal software processes or failure in applying those processes. The team is in the process of performing this additional analysis, as well as continuing the investigation of the intermittent communications issues. NASA briefed the Aerospace Safety Advisory Panel on the status of the investigation this week.
Regarding the first two anomalies, the team found the two critical software defects were not detected ahead of flight despite multiple safeguards. Ground intervention prevented loss of vehicle in both cases. Breakdowns in the design and code phase inserted the original defects. Additionally, breakdowns in the test and verification phase failed to identify the defects preflight despite their detectability. While both errors could have led to risk of spacecraft loss, the actions of the NASA-Boeing team were able to correct the issues and return the Starliner spacecraft safely to Earth.
There was no simple cause of the two software defects making it into flight. Software defects, particularly in complex spacecraft code, are not unexpected. However, there were numerous instances where the Boeing software quality processes either should have or could have uncovered the defects. Due to these breakdowns found in design, code and test of the software, they will require systemic corrective actions. The team has already identified a robust set of 11 top-priority corrective actions. More will be identified after the team completes its additional work.
The joint team made excellent progress for this stage of the investigation. However, it’s still too early for us to definitively share the root causes and full set of corrective actions needed for the Starliner system. We do expect to have those results at the end of February, as was our initial plan. We want to make sure we have a comprehensive understanding of what happened so that we can fully explain the root causes and better assess future work that will be needed. Most critically, we want to assure that these necessary steps are completely understood prior to determining the plan for future flights. Separate from the anomaly investigation, NASA also is still reviewing the data collected during the flight test to help determine that future plan. NASA expects a decision on this review to be complete in the next several weeks.
NASA and Boeing are committed to openly sharing the information related to the mission with the public. Thus, NASA will be holding a media teleconference at 3:30 p.m. EST Friday, Feb. 7.
In addition to these reviews, NASA is planning to perform an Organizational Safety Assessment of Boeing’s work related to the Commercial Crew Program. The comprehensive safety review will include individual employee interviews with a sampling from a cross section of personnel, including senior managers, mid-level management and supervision, and engineers and technicians at multiple sites. The review would be added to the company’s Commercial Crew Transportation Capability contract. NASA previously completed a more limited review of the company. The goal of the Organizational Safety Assessment will be to examine the workplace culture with the commercial crew provider ahead of a mission with astronauts.
Boeing’s Orbital Flight Test launched on Friday, Dec. 20, on a United Launch Alliance Atlas V rocket from Space Launch Complex 41 at Cape Canaveral Air Force Station in Florida. The mission successfully landed two days later on Sunday, Dec. 22, completing an abbreviated test that performed several mission objectives before returning to Earth as the first orbital land touchdown of a human-rated capsule in U.S. history.
Author Marie Lewis
Posted on February 7, 2020
Categories Boeing, CCtCap, Commercial Spaceflight, International Space Station, NASA
Bridenstine: Notes that they usually don't do a press conference before an investigation is complete. But he states that, in the interest of transparency and due to some of the things he saw online yesterday, he wanted to make sure that NASA provided an update.
NASA's @DouglasLoverro says the two major software errors are indicators of a deeper underlying problem. Now working to understand the full scope of the problem. Major process problems. (This is not good).
Loverro: this is why we test. We briefed ASAP and they told you.
Point is it's not just the specific issues, it's the numerous process escapes we discovered
Jim (Boeing): Admits they WOULD NOT have found the second software issue that would have destroyed Starliner in reentry if the first Mission Elapsed Timer issue hadn't occurred. @NASA @BoeingSpace #Starliner #OFT
Chilton: I think the independent review team has done an excellent job.
We would not have found the second software issue (service module) had they not gone looking for additional errors after the MET anomaly.
Chilton: The service module could have collided with the crew module after separation had the second issue not been fixed.
John (Boeing): Starliner should only have pulled the mission time in terminal count (T-4mins through T0). Because of the software code error, it pulled the time before it should have... 11hrs early. @NASA @BoeingSpace #Starliner #OFT
Mulholland: timer issue. We pull MET from launch vehicle only during terminal count. If pull before that it's not mapped to right launch time. Req was to have both conditions met, but software missed that second req -- to pull after terminal count.
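To make the "both conditions" point concrete, here is a minimal Python sketch of a guard that only accepts booster time inside terminal count. This is not Boeing's flight code. The T-4 minute window comes from the briefing notes above; the `data_valid` flag is a hypothetical stand-in for the first requirement, which was not spelled out on the call, and every name here is made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Terminal count window from the briefing: T-4 minutes through T0.
TERMINAL_COUNT_START_S = -240.0


@dataclass
class BoosterTimeSample:
    countdown_s: float  # seconds relative to T0 (negative before liftoff)
    data_valid: bool    # hypothetical stand-in for the unnamed first condition


def latch_met(sample: BoosterTimeSample) -> Optional[float]:
    """Return the time to seed the mission clock with, or None to reject it."""
    in_terminal_count = TERMINAL_COUNT_START_S <= sample.countdown_s <= 0.0
    # Both conditions must hold; checking only data_valid would accept
    # a sample polled hours before launch.
    if sample.data_valid and in_terminal_count:
        return sample.countdown_s
    return None
```

With this guard, a sample polled 11 hours before launch (`countdown_s = -39600.0`) is rejected, while one polled at T-2 minutes is accepted.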
Boeing's Mulholland: The "valve mapping error" (the second software issue) came after Starliner's service module separated. The software had the same valve mapping for both the separation and disposal burns.
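For anyone wondering how a shared valve mapping could have been caught on the ground, here is a rough Python sketch of a preflight consistency check. It assumes, based only on the description above, that the separation and disposal burns are supposed to use different thruster-to-valve tables. The thruster names, valve numbers, and function name are all invented for illustration and have nothing to do with the real Starliner tables.

```python
from enum import Enum


class Burn(Enum):
    SEPARATION = "separation"
    DISPOSAL = "disposal"


# Hypothetical thruster-to-valve tables, one per burn phase.
VALVE_MAP = {
    Burn.SEPARATION: {"T1": 3, "T2": 4},
    Burn.DISPOSAL: {"T1": 7, "T2": 8},
}


def phases_sharing_a_map(valve_map):
    """Return pairs of burn phases whose valve tables are identical."""
    phases = list(valve_map)
    return [
        (a, b)
        for i, a in enumerate(phases)
        for b in phases[i + 1:]
        if valve_map[a] == valve_map[b]
    ]
```

An empty result means every phase has its own table. A table copied from one phase to another shows up as a pair, which a ground test could flag before flight.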
Boeing's Mulholland: Third Starliner issue was communication from ground with the spacecraft. It took us several minutes to actually establish a link with the spacecraft. That would have exposed itself whether or not the timing anomaly (first issue) happened.
John (Boeing): The antenna issue "would have presented regardless of other issues." No real info to present at this point. @NASA @BoeingSpace #Starliner #OFT
.@Free_Space: How much of Starliner's software needs fixing or updating?
Boeing: "We believe we need to go back and re-verify ALL of the flight software code."
Boeing adds: Starliner has approximately 1 million lines of code. We exercised ~66% of the scripts correctly during the mission but we're going to go back and re-verify.
Doug (NASA) saying that the Organizational Assessment shouldn't be connected to the issue on Starliner. Begs the question of why they're doing one now if they don't think a previous one would have revealed the things that happened on OFT. @NASA @BoeingSpace #Starliner #OFT
The frequency interference on the Starliner-to-ground link was "similar to cell phone towers." @NASA @BoeingSpace #Starliner #OFT