-
#900
by
Wargrim
on 07 Feb, 2020 20:36
-
Did I hear correctly that:
- The timer issue was that the time should have been grabbed twice, and the second read, which should have given the correctly synced time, did not happen?
- The timer bug was a consequence of requirements not being written correctly, i.e. the second reading of the time was missing?
- The timer bug got past "3 or 4" checks that should have caught it?
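For anyone trying to picture the failure mode in the question, here is a toy Python model. It assumes, purely for illustration, that the booster's timer reports time against a stale reference until some point in the count, and that the spacecraft derives its MET from whichever booster reading it took last. Every name and number below (`BoosterTimer`, `SpacecraftMET`, a reset one hour before launch) is hypothetical; this is not Boeing's design, just a sketch of why a single early read can leave the clock ~11 hours off while a second, later read would correct it.

```python
# Toy model only: hypothetical names and numbers, not Starliner flight software.
# All times are seconds on one shared local clock, with launch at t = 0.
HOUR = 3600.0


class BoosterTimer:
    """Booster timer that reports time against a stale epoch until reset_at."""

    def __init__(self, stale_epoch: float, correct_epoch: float, reset_at: float) -> None:
        self.stale_epoch = stale_epoch      # reference used before the reset
        self.correct_epoch = correct_epoch  # reference used after the reset
        self.reset_at = reset_at            # local time at which the reset happens

    def read(self, now: float) -> float:
        epoch = self.correct_epoch if now >= self.reset_at else self.stale_epoch
        return now - epoch


class SpacecraftMET:
    """MET = last booster reading plus local time elapsed since that reading."""

    def __init__(self) -> None:
        self.sample_value = 0.0
        self.sample_time = 0.0

    def sync(self, booster: BoosterTimer, now: float) -> None:
        self.sample_value = booster.read(now)
        self.sample_time = now

    def met(self, now: float) -> float:
        return self.sample_value + (now - self.sample_time)


if __name__ == "__main__":
    launch = 0.0
    booster = BoosterTimer(stale_epoch=-11 * HOUR, correct_epoch=launch, reset_at=-1 * HOUR)

    one_read = SpacecraftMET()
    one_read.sync(booster, now=-11 * HOUR)  # early read only

    two_reads = SpacecraftMET()
    two_reads.sync(booster, now=-11 * HOUR)
    two_reads.sync(booster, now=-0.5 * HOUR)  # second read, after the reset

    print(one_read.met(launch) / HOUR)   # 11.0: thinks 11 h have elapsed at liftoff
    print(two_reads.met(launch) / HOUR)  # 0.0
```

In this toy the error is exactly the gap between the early read and the reset, so any end-to-end test that simply checked MET at liftoff would flag it.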
-
#901
by
Lemurion
on 07 Feb, 2020 20:37
-
Doug's position that another OFT may not be needed, because we don't fly to test for these types of issues and they should be found during testing, is insanely scary given that the issues demonstrated during the OFT were not detected during testing and could have resulted in the loss of the vehicle.
Yep, to me the whole takeaway here is that Boeing's testing is suspect, more than the vehicle itself. I expect problems to crop up in projects of this magnitude; I also expect testing to catch them before the vehicle reaches the pad.
-
#902
by
Rocket Science
on 07 Feb, 2020 20:37
-
NASA+Boeing=Amateur Hour... Not your grand-dad's space agency...
-
#903
by
abaddon
on 07 Feb, 2020 20:38
-
Doug's position that another OFT may not be needed, because we don't fly to test for these types of issues and they should be found during testing, is insanely scary given that the issues demonstrated during the OFT were not detected during testing and could have resulted in the loss of the vehicle.
Despite what was said, the list of things that went wrong is now quite varied; we're way beyond a single timer issue. I would be very surprised at this point if an OFT2 isn't ordered when all is said and done.
As much as reporters kept asking the question, I can even agree that now isn't really the right time for Boeing to commit to a reflight. It would have been nice, though, if Boeing had emphasized that they are prepared to do a reflight, if warranted, once the investigation and remediation processes are complete.
-
#904
by
ChrisWilson68
on 07 Feb, 2020 20:39
-
The nature of the second major software issue clearly points to a systemic problem.
Good to hear that NASA is now looking into how Boeing does its software development and verification process.
Absolutely.
But why didn't NASA notice that their testing procedures wouldn't have caught this? With all the in-depth oversight NASA has had of both commercial crew providers, why did nobody at NASA ask whether they were doing an all-up, full-mission simulation? That should have caught all of these bugs.
-
#905
by
woods170
on 07 Feb, 2020 20:40
-
More details prior to the teleconference today:
https://blogs.nasa.gov/commercialcrew/2020/02/07/nasa-shares-initial-findings-from-boeing-starliner-orbital-flight-test-investigation/
Worth quoting in full:
NASA Shares Initial Findings from Boeing Starliner Orbital Flight Test Investigation
Following the anomaly that occurred during the December Boeing Starliner Orbital Flight Test (OFT), NASA and Boeing formed a joint investigation team tasked with examining the primary issues, which occurred during that test. Those issues included three specific concerns revealed during flight:
1. An error with the Mission Elapsed Timer (MET), which incorrectly polled time from the Atlas V booster nearly 11 hours prior to launch.
2. A software issue within the Service Module (SM) Disposal Sequence, which incorrectly translated the SM disposal sequence into the SM Integrated Propulsion Controller (IPC).
3. An Intermittent Space-to-Ground (S/G) forward link issue, which impeded the Flight Control team’s ability to command and control the vehicle.
The joint investigation team convened in early January and has now identified the direct causes and preliminary corrective actions for the first two anomalies. The intermittent communications issues still are under investigation. NASA reviewed these results on Friday, Jan. 31 along with multiple suggested corrective actions recommended by the team. While NASA was satisfied that the team had properly identified the technical root cause of the two anomalies, they requested the team to perform a more in-depth analysis as to why the anomalies occurred, including an analysis of whether the issues were indicative of weak internal software processes or failure in applying those processes. The team is in the process of performing this additional analysis, as well as continuing the investigation of the intermittent communications issues. NASA briefed the Aerospace Safety Advisory Panel on the status of the investigation this week.
Regarding the first two anomalies, the team found the two critical software defects were not detected ahead of flight despite multiple safeguards. Ground intervention prevented loss of vehicle in both cases. Breakdowns in the design and code phase inserted the original defects. Additionally, breakdowns in the test and verification phase failed to identify the defects preflight despite their detectability. While both errors could have led to risk of spacecraft loss, the actions of the NASA-Boeing team were able to correct the issues and return the Starliner spacecraft safely to Earth.
There was no simple cause of the two software defects making it into flight. Software defects, particularly in complex spacecraft code, are not unexpected. However, there were numerous instances where the Boeing software quality processes either should have or could have uncovered the defects. Due to these breakdowns found in design, code and test of the software, they will require systemic corrective actions. The team has already identified a robust set of 11 top-priority corrective actions. More will be identified after the team completes its additional work.
The joint team made excellent progress for this stage of the investigation. However, it’s still too early for us to definitively share the root causes and full set of corrective actions needed for the Starliner system. We do expect to have those results at the end of February, as was our initial plan. We want to make sure we have a comprehensive understanding of what happened so that we can fully explain the root causes and better assess future work that will be needed. Most critically, we want to assure that these necessary steps are completely understood prior to determining the plan for future flights. Separate from the anomaly investigation, NASA also is still reviewing the data collected during the flight test to help determine that future plan. NASA expects a decision on this review to be complete in the next several weeks.
NASA and Boeing are committed to openly sharing the information related to the mission with the public. Thus, NASA will be holding a media teleconference at 3:30 p.m. EST Friday, Feb. 7.
In addition to these reviews, NASA is planning to perform an Organizational Safety Assessment of Boeing’s work related to the Commercial Crew Program. The comprehensive safety review will include individual employee interviews with a sampling from a cross section of personnel, including senior managers, mid-level management and supervision, and engineers and technicians at multiple sites. The review would be added to the company’s Commercial Crew Transportation Capability contract. NASA previously completed a more limited review of the company. The goal of the Organizational Safety Assessment will be to examine the workplace culture with the commercial crew provider ahead of a mission with astronauts.
Boeing’s Orbital Flight Test launched on Friday, Dec. 20, on a United Launch Alliance Atlas V rocket from Space Launch Complex 41 at Cape Canaveral Air Force Station in Florida. The mission successfully landed two days later on Sunday, Dec. 22, completing an abbreviated test that performed several mission objectives before returning to Earth as the first orbital land touchdown of a human-rated capsule in U.S. history.
Author Marie Lewis
Posted on February 7, 2020
Categories Boeing, CCtCap, Commercial Spaceflight, International Space Station, NASA
Emphasis mine.
Wow. Just wow. NASA is clearly indicating here that there are systemic failures in the way Boeing does its software development AND software testing.
This is even worse than I feared (and voiced as such) earlier today.
-
#906
by
Wargrim
on 07 Feb, 2020 20:41
-
Also: Did I hear correctly that Starliner was effectively jammed by CELLPHONE TOWERS?
-
#907
by
Rocket Science
on 07 Feb, 2020 20:43
-
Also: Did I hear correctly that Starliner was effectively jammed by CELLPHONE TOWERS?
I heard it was causing noise from the ground; others, please confirm...
-
#908
by
woods170
on 07 Feb, 2020 20:43
-
Earlier NASA said that clearly their oversight of Boeing had been insufficient and that they intended to address that going forward.
Ouch. IMO Kathy Lueders will have some explaining to do. Not good for her and the CCP office.
-
#909
by
RDoc
on 07 Feb, 2020 20:45
-
Is there a standard differentiation between "lines of code" and "scripts"?
If Starliner "operated ~66% of the scripts correctly" then it operated about a third of them incorrectly!
How is a third of their "scripts" being operated incorrectly not a horrible thing?
Is it OK if 90% of their scripts are operated correctly?
I doubt it.
Scripts are presumably higher-level exercises of sequences of functions e.g. a script that would point Starliner from heading A to heading B by executing a series of thruster firings. (I made that up).
They're just saying that the OFT exercised 66% of their scripts and the remaining third weren't exercised, possibly because of the abbreviated mission duration or because they're contingency-based (they wouldn't execute on a nominal mission).
My guess is that they've got a bunch of primitive software functions and a way to invoke a subset of them in a particular order to actually do something, with each ordered sequence of function invocations referred to as a "script". This is a standard way to write software for complex systems.
However, something to keep in mind is that saying 66% of the scripts ran correctly is not at all the same thing as saying that 66% of the code ran correctly. In systems like that there are often a few functions that are only used by a small number of scripts and other functions that are used by many or all of the scripts. The complexity of the individual functions usually varies a lot and the order the scripts call the functions in is also a great source of errors. ABC runs fine, ACB crashes.
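A toy Python sketch of the structure guessed at above, assuming primitives that carry hidden preconditions. The names `power_on`, `open_valve`, `fire_thruster` and the letters A/B/C are made up for illustration. It shows both points: the same three functions succeed in one order and fail in another, and running one script out of three can still call every primitive, so script coverage and code coverage are different numbers.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical vehicle state: flags set by the primitives below.
State = Dict[str, bool]


def power_on(state: State) -> None:
    """Primitive A: power the (imaginary) propulsion electronics."""
    state["powered"] = True


def open_valve(state: State) -> None:
    """Primitive B: precondition is that power is on."""
    if not state.get("powered"):
        raise RuntimeError("open_valve called before power_on")
    state["valve_open"] = True


def fire_thruster(state: State) -> None:
    """Primitive C: precondition is that the valve is open."""
    if not state.get("valve_open"):
        raise RuntimeError("fire_thruster called with valve closed")
    state["fired"] = True


# A "script" is just a string of primitive names executed left to right.
PRIMITIVES: Dict[str, Callable[[State], None]] = {
    "A": power_on,
    "B": open_valve,
    "C": fire_thruster,
}


def run_script(script: str) -> bool:
    """Run the primitives in order on fresh state; False if any step fails."""
    state: State = {}
    try:
        for name in script:
            PRIMITIVES[name](state)
    except RuntimeError:
        return False
    return True


def coverage(run: List[str], all_scripts: List[str]) -> Tuple[float, float]:
    """Return (fraction of scripts run, fraction of primitives those scripts call)."""
    funcs_run = {name for script in run for name in script}
    funcs_all = {name for script in all_scripts for name in script}
    return len(run) / len(all_scripts), len(funcs_run) / len(funcs_all)


if __name__ == "__main__":
    print(run_script("ABC"))  # True
    print(run_script("ACB"))  # False: C runs before B has opened the valve
    # Only one of three scripts run, yet every primitive was called.
    print(coverage(["ABC"], ["ABC", "AB", "A"]))
```

Even 100% of primitives exercised says nothing about the orderings that were never run, which is exactly where "ACB crashes" hides.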
-
#910
by
woods170
on 07 Feb, 2020 20:47
-
I think I understand. But isn't the problem that Boeing failed at the 'known knowns'?
Sort of.
Boeing failed to detect bugs that were perfectly detectable with a properly set up test.
-
#911
by
FutureSpaceTourist
on 07 Feb, 2020 20:48
-
Wow, that was quite a call.
Clearly Boeing have huge issues in software. Both critical software issues found during OFT had multiple process failures that resulted in the faults getting to flight.
Reverifying on the order of 1 million lines of code is no small task, especially as the process issues need to be addressed too. I don't see there being another flight (either OFT2 or CFT) for many more months yet. We may get a flight in by the end of this year, but I wouldn't say even that is a given at this point.
-
#912
by
Stan-1967
on 07 Feb, 2020 20:48
-
Regarding the 34% of software scripts not run during the test, keep in mind they never got to approach and dock with the ISS: arguably the most complicated single task of the mission, and one that could easily account for the unexecuted scripts.
-
#913
by
FutureSpaceTourist
on 07 Feb, 2020 20:50
-
Earlier NASA said that clearly their oversight of Boeing had been insufficient and that they intended to address that going forward.
Ouch. IMO Kathy Lueders will have some explaining to do. Not good for her and the CCP office.
Possibly, but it may be wider than that. Is the issue that NASA (wrongly) assumed that it didn't need that level of oversight and so didn't provide the resources needed? Kathy did say that the things they did look at were fine, while acknowledging that they could only look at so much.
-
#914
by
ChrisWilson68
on 07 Feb, 2020 20:56
-
Earlier NASA said that clearly their oversight of Boeing had been insufficient and that they intended to address that going forward.
Ouch. IMO Kathy Lueders will have some explaining to do. Not good for her and the CCP office.
Possibly, but it may be wider than that. Is the issue that NASA (wrongly) assumed that it didn't need that level of oversight and so didn't provide the resources needed? Kathy did say that the things they did look at were fine, while acknowledging that they could only look at so much.
It shouldn't have taken much in the way of NASA resources to catch the fact that Boeing's test plan would miss such blatant and easily found bugs. A single day of meetings by one good software person should have flagged the problem.
Also, I'd fault every level of Boeing's management for this.
-
#915
by
wxmeddler
on 07 Feb, 2020 21:02
-
One of my "favorite" lines during the call (forgive me, I don't remember who said it; it was in reference to the separation "bug") was:
"We know the software patch we uploaded worked because the craft landed safely"
You uploaded a software patch without which you potentially faced a catastrophic LOV, and your primary indicator of whether it worked was to fly the re-entry sequence and cross your fingers?
That's a whole lot of YIKES.
-
#916
by
LiamS
on 07 Feb, 2020 21:04
-
Is there a recording of the call somewhere? I got my time zone conversion mixed up and missed it.
EDIT:
I found a good recording, if anyone else is looking for one; it came from '@tp_1024' on Twitter.
-
#917
by
freddo411
on 07 Feb, 2020 21:09
-
Doug's position that another OFT may not be needed, because we don't fly to test for these types of issues and they should be found during testing, is insanely scary given that the issues demonstrated during the OFT were not detected during testing and could have resulted in the loss of the vehicle.
Yep, to me the whole takeaway here is that Boeing's testing is suspect, more than the vehicle itself. I expect problems to crop up in projects of this magnitude; I also expect testing to catch them before the vehicle reaches the pad.
How do you have confidence in "the vehicle itself" if the testing is suspect?
I think it is valid and necessary to treat software as an integral part of "the vehicle itself" as we discuss and analyze the functionality and safety of the vehicle. With that in mind, NO, the takeaway is that the vehicle has failed in several key areas and is suspect in others.
While this was not discussed in the presser, there may be problems with the hardware and/or the design that have not been disclosed yet. Comms problem root cause? Thrusters failing after minimal usage? Lack of enough reserve propellant?
-
#918
by
Rocket Science
on 07 Feb, 2020 21:11
-
Earlier NASA said that clearly their oversight of Boeing had been insufficient and that they intended to address that going forward.
Ouch. IMO Kathy Lueders will have some explaining to do. Not good for her and the CCP office.
Possibly, but it may be wider than that. Is the issue that NASA (wrongly) assumed that it didn't need that level of oversight and so didn't provide the resources needed? Kathy did say that the things they did look at were fine, while acknowledging that they could only look at so much.
That's the point I was making in this thread: Boeing gets away with a lot of "self-certification" from the Gov...
-
#919
by
freddo411
on 07 Feb, 2020 21:14
-
Also: Did I hear correctly that Starliner was effectively jammed by CELLPHONE TOWERS?
I heard it was causing noise from the ground; others, please confirm...
That is essentially what I heard.
Does anyone on NSF have any insight into other (any other) instances of TDRS interference or known failure modes?
The question about using or not using S-band was not answered in the presser. There's a known issue with TDRS and S-band over the Indian Ocean (at least as far as I can tell from googling).