Quote from: Boeing
a valve mapping software issue, which was diagnosed and fixed in flight.
So if I'm parsing this correctly, they messed up the mapping between the valves the software *thinks* it's controlling and the *actual* valves it's wired to?
That's how I read it as well. It seems reasonable to assume that the MET issue is the only reason they found this bug too. As I understand the MET issue, it caused rapid firing of the thrusters at the wrong time.
I can see a path to finding this bug where someone was very confused as to why the system tried to fire thruster A to cause some m/s change in one axis, but instead got the opposite result. So now the computer thinks it needs to fire thruster A again, even longer; a feedback loop ensues and bad things happen.
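To make that feedback loop concrete, here is a toy sketch in Python. Everything in it is an assumption for illustration (a single axis, a simple proportional controller, made-up gain and target); it says nothing about how Starliner's actual control law works. It only shows why a sign error between "commanded" and "delivered" thrust makes the error grow instead of shrink.

```python
def simulate(mapping_sign, steps=10, gain=0.5, target=1.0):
    """Return the velocity error remaining after each control step.

    mapping_sign is +1 when the commanded thruster pushes the way the
    software expects, and -1 when the command reaches a thruster that
    pushes the opposite way (the hypothetical mis-mapping).
    """
    velocity = 0.0
    errors = []
    for _ in range(steps):
        error = target - velocity
        commanded_dv = gain * error              # what the controller asks for
        velocity += mapping_sign * commanded_dv  # what the hardware delivers
        errors.append(target - velocity)
    return errors


good = simulate(+1)   # error halves every step
bad = simulate(-1)    # error grows by 1.5x every step

print("correct mapping, final error:", good[-1])
print("swapped mapping, final error:", bad[-1])
```

With the correct sign the error shrinks geometrically; with the swapped sign each "correction" makes things worse, so the controller keeps asking for longer burns.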
But reportedly this second issue was identified in *ground* testing while actual hardware was in orbit. What kind of ground testing did they do beforehand, then, and why was it not identified *before* flight?
So many questions...
Valve mapping relates to which valve must be operated, and for how long, to provide a certain amount of thrust along a certain axis. Get this wrong and you fire the wrong thrusters, or you may fire the right thrusters for too long or too short a time. None of which is good, obviously.
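As a rough illustration of what such a mapping might look like, here is a minimal Python sketch. The thruster names, valve IDs, thrust values, and the constant-mass burn-time formula (t = m * dv / F) are all assumptions made up for the example, not anything taken from Boeing's software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thruster:
    valve_id: int     # hypothetical hardware channel
    axis: str         # "x", "y" or "z"
    direction: int    # +1 or -1 along that axis
    thrust_n: float   # thrust in newtons (made-up value)


# Hypothetical mapping table: logical thruster name -> physical valve and effect.
VALVE_MAP = {
    "A": Thruster(valve_id=1, axis="x", direction=+1, thrust_n=100.0),
    "B": Thruster(valve_id=2, axis="x", direction=-1, thrust_n=100.0),
}


def plan_burn(axis, delta_v, mass_kg):
    """Pick the thruster and burn time for a delta-v (m/s) along one axis."""
    sign = 1 if delta_v >= 0 else -1
    for name, thruster in VALVE_MAP.items():
        if thruster.axis == axis and thruster.direction == sign:
            # Constant-mass approximation: t = m * dv / F
            burn_s = mass_kg * abs(delta_v) / thruster.thrust_n
            return name, thruster.valve_id, burn_s
    raise LookupError(f"no thruster for axis {axis}, direction {sign:+d}")


plan = plan_burn("x", -0.5, mass_kg=1000.0)
print(plan)
```

If the `valve_id` or `direction` of an entry in the table is wrong, the planner still returns a confident answer; it is just an answer for the wrong piece of hardware.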
Over the years, I've stepped into several situations where basic software engineering processes don't exist. They don't have a build. There is no source control system. There is no consistency in sharing code and design paradigms. There is no consistency in testing. There is no build server. There is no automated test for each build. Code quality among the engineers is inconsistent. Etc.
I am usually not surprised by this and it is something I fix.
What is NASA getting for its money?
It will be at least a year before Boeing can refly Starliner.
Boeing will do some hiring.
Never bought the whole "timer" story, what was the IMU reading...
Does Boeing not do a static fire with the finalized capsule? The wrong thrusters firing is Proton levels of oversight, although reminiscent of the parachute issue in the pad abort test.
Even if they do a static fire, the static fire is likely using special software to test the thrusters and that software might have the right mapping between thrusters while the flight software does not.
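If the test software and the flight software really do carry separate copies of the mapping, one cheap safeguard is a check that compares them. A minimal Python sketch, where both tables and their contents are invented purely for illustration:

```python
def mapping_mismatches(flight_map, test_map):
    """Return {thruster: (flight_valve, test_valve)} for every disagreement."""
    names = sorted(set(flight_map) | set(test_map))
    return {
        name: (flight_map.get(name), test_map.get(name))
        for name in names
        if flight_map.get(name) != test_map.get(name)
    }


# Made-up tables: the flight copy has thrusters B and C swapped.
FLIGHT_MAP = {"A": 1, "B": 3, "C": 2}
TEST_RIG_MAP = {"A": 1, "B": 2, "C": 3}

mismatches = mapping_mismatches(FLIGHT_MAP, TEST_RIG_MAP)
print(mismatches)
```

Better still would be a single table that both programs load, so there is nothing to drift apart, but whether that is practical depends on details we don't know.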
I had previously been of the opinion that while I think relying on simulations for the IFA is a bad idea, Boeing had signed a contract to that effect and there was no real reason that anyone should insist they change it.
Now, I’m not so sure.
The big thing for me is that since both the pad abort and OFT had things slip through that absolutely should have been caught earlier, running an IFA would be a very good test to make sure those procedures didn’t let anything through that they shouldn’t.
Boeing’s physical tests have done a very good job of illuminating issues and they should be encouraged strongly to do many more.
Come on guys. What in the world does IFA have to do with the current problem? Almost nothing. The software failures inside of Starliner exist independently of any abort test, and would not have been revealed by such a test.
This is obviously incorrect. The Starliner would, at separation, have had the +11 hour mission clock and almost certainly would immediately have behaved incorrectly following separation from the Atlas V.
Playing devil's advocate here, but you have no basis for asserting that "obviously incorrect" statement. You have no knowledge on the inner workings of Starliner software or its internal state machine during powered ascent, hence no way of asserting what would have actually happened in an IFA scenario.
Am I saying that this does NOT diminish my personal confidence in Boeing's "simulations" of an IFA scenario? No, I'm not. I'm saying things are usually not as black and white as that when it comes to complex sw development.
Does Boeing not do a static fire with the finalized capsule? The wrong thrusters firing is Proton levels of oversight, although reminiscent of the parachute issue in the pad abort test.
Even if they do a static fire, the static fire is likely using special software to test the thrusters and that software might have the right mapping between thrusters while the flight software does not.
I thought that, to test space systems, data or readings were injected into the article being tested, so the test didn't rely on external systems. This used to be the norm (I read von Braun's book). Perhaps 'old space' is forgetting what it learned, or yearly bonuses are now the overriding factor.
Come on guys. What in the world does IFA have to do with the current problem? Almost nothing. The software failures inside of Starliner exist independently of any abort test, and would not have been revealed by such a test.
This is obviously incorrect. The Starliner would, at separation, have had the +11 hour mission clock and almost certainly would immediately have behaved incorrectly following separation from the Atlas V.
Playing devil's advocate here, but you have no basis for asserting that "obviously incorrect" statement. You have no knowledge on the inner workings of Starliner software or its internal state machine during powered ascent, hence no way of asserting what would have actually happened in an IFA scenario.
Am I saying that this does NOT diminish my personal confidence in Boeing's "simulations" of an IFA scenario? No, I'm not. I'm saying things are usually not as black and white as that when it comes to complex sw development.
Having been involved in several large software projects, my experience has been that if something isn't right, you have no idea where or how deep in the system the error originated until you really understand what happened.
Until the system has been thoroughly inspected and understood, there is no way to know how errors propagate, so such an assertion of independence is not correct. If you don't know what happened, assuming everything else is OK is not a good approach.
Does this inform us concerning the SLS software? Boeing is Prime for both.
Come on guys. What in the world does IFA have to do with the current problem? Almost nothing. The software failures inside of Starliner exist independently of any abort test, and would not have been revealed by such a test.
This is obviously incorrect. The Starliner would, at separation, have had the +11 hour mission clock and almost certainly would immediately have behaved incorrectly following separation from the Atlas V.
Playing devil's advocate here, but you have no basis for asserting that "obviously incorrect" statement.
For those of us not trained in flight software systems, we hope the Starliner crew can manually override any and all software-based failures throughout the mission. It would be interesting to get the astronauts' personal opinions on these latest problems.
Come on guys. What in the world does IFA have to do with the current problem? Almost nothing. The software failures inside of Starliner exist independently of any abort test, and would not have been revealed by such a test.
This is obviously incorrect. The Starliner would, at separation, have had the +11 hour mission clock and almost certainly would immediately have behaved incorrectly following separation from the Atlas V.
Playing devil's advocate here, but you have no basis for asserting that "obviously incorrect" statement.
So, to be clear, you're good with @rpap asserting "the software failures [...] would not have been revealed by such a test" but don't like my "almost certainly would have been revealed"? This seems a little inconsistent.
Can we concede that if the software pulls an MET value that's in the wrong day and acts on it that there is a major issue with the software verification process?
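For what it's worth, the kind of check I have in mind is nothing exotic. Here is a minimal Python sketch of a plausibility test on a polled MET value; the function name and the time window are made up for illustration, and I have no idea what the real flight software does or doesn't check.

```python
def met_is_plausible(met_seconds, phase_window):
    """Return True if the polled MET falls inside the window expected for this phase."""
    low, high = phase_window
    return low <= met_seconds <= high


# Hypothetical window: at spacecraft separation the MET should be a matter of
# minutes after liftoff, not hours. These bounds are invented for the example.
SEPARATION_WINDOW_S = (10 * 60, 20 * 60)

nominal = met_is_plausible(15 * 60, SEPARATION_WINDOW_S)
off_by_11h = met_is_plausible(15 * 60 + 11 * 3600, SEPARATION_WINDOW_S)

print("nominal MET accepted:", nominal)
print("MET 11 hours off accepted:", off_by_11h)
```

A clock that is 11 hours off fails a check like this immediately, which is why acting on that value unchallenged points at the verification process rather than at one unlucky line of code.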
But we have known that since the flight.
What's new is this "thruster mapping issue".
Perhaps the "thruster mapping" error is a single error that explains the collection of issues including thruster flailing, excessive fuel usage, thruster overuse, and why Boeing was testing and patching software after launch, including the SM separation thruster issue that was reported to have potential for LOM/LOC.
That passes Occam's Razor. One explanation for many issues.
NASA Shares Initial Findings from Boeing Starliner Orbital Flight Test Investigation
Following the anomaly that occurred during the December Boeing Starliner Orbital Flight Test (OFT), NASA and Boeing formed a joint investigation team tasked with examining the primary issues that occurred during that test. Those issues included three specific concerns revealed during flight:
1. An error with the Mission Elapsed Timer (MET), which incorrectly polled time from the Atlas V booster nearly 11 hours prior to launch.
2. A software issue within the Service Module (SM) Disposal Sequence, which incorrectly translated the SM disposal sequence into the SM Integrated Propulsion Controller (IPC).
3. An Intermittent Space-to-Ground (S/G) forward link issue, which impeded the Flight Control team’s ability to command and control the vehicle.
The joint investigation team convened in early January and has now identified the direct causes and preliminary corrective actions for the first two anomalies. The intermittent communications issues still are under investigation. NASA reviewed these results on Friday, Jan. 31 along with multiple suggested corrective actions recommended by the team. While NASA was satisfied that the team had properly identified the technical root cause of the two anomalies, they requested the team to perform a more in-depth analysis as to why the anomalies occurred, including an analysis of whether the issues were indicative of weak internal software processes or failure in applying those processes. The team is in the process of performing this additional analysis, as well as continuing the investigation of the intermittent communications issues. NASA briefed the Aerospace Safety Advisory Panel on the status of the investigation this week.
Regarding the first two anomalies, the team found the two critical software defects were not detected ahead of flight despite multiple safeguards. Ground intervention prevented loss of vehicle in both cases. Breakdowns in the design and code phase inserted the original defects. Additionally, breakdowns in the test and verification phase failed to identify the defects preflight despite their detectability. While both errors could have led to risk of spacecraft loss, the actions of the NASA-Boeing team were able to correct the issues and return the Starliner spacecraft safely to Earth.
There was no simple cause of the two software defects making it into flight. Software defects, particularly in complex spacecraft code, are not unexpected. However, there were numerous instances where the Boeing software quality processes either should have or could have uncovered the defects. Due to these breakdowns found in design, code and test of the software, they will require systemic corrective actions. The team has already identified a robust set of 11 top-priority corrective actions. More will be identified after the team completes its additional work.
The joint team made excellent progress for this stage of the investigation. However, it’s still too early for us to definitively share the root causes and full set of corrective actions needed for the Starliner system. We do expect to have those results at the end of February, as was our initial plan. We want to make sure we have a comprehensive understanding of what happened so that we can fully explain the root causes and better assess future work that will be needed. Most critically, we want to assure that these necessary steps are completely understood prior to determining the plan for future flights. Separate from the anomaly investigation, NASA also is still reviewing the data collected during the flight test to help determine that future plan. NASA expects a decision on this review to be complete in the next several weeks.
NASA and Boeing are committed to openly sharing the information related to the mission with the public. Thus, NASA will be holding a media teleconference at 3:30 p.m. EST Friday, Feb. 7.
In addition to these reviews, NASA is planning to perform an Organizational Safety Assessment of Boeing's work related to the Commercial Crew Program. The comprehensive safety review will include individual employee interviews with a sampling from a cross section of personnel, including senior managers, mid-level management and supervision, and engineers and technicians at multiple sites. The review would be added to the company's Commercial Crew Transportation Capability contract. NASA previously completed a more limited review of the company. The goal of the Organizational Safety Assessment will be to examine the workplace culture with the commercial crew provider ahead of a mission with astronauts.
Boeing's Orbital Flight Test launched on Friday, Dec. 20, on a United Launch Alliance Atlas V rocket from Space Launch Complex 41 at Cape Canaveral Air Force Station in Florida. The mission successfully landed two days later on Sunday, Dec. 22, completing an abbreviated test that performed several mission objectives before returning to Earth as the first orbital land touchdown of a human-rated capsule in U.S. history.
Author Marie Lewis
Posted on February 7, 2020