Total Members Voted: 61
Voting closed: 09/07/2024 11:32 am
Quote from: LouScheffer on 06/09/2022 11:45 am
Quote from: woods170 on 06/09/2022 11:22 am
Waterfall does not guarantee deficient testing. Neither does it guarantee sufficient testing. [...]
This has happened before. Mars Polar Lander crashed when the leg extension (performed while high up) bumped the "weight on legs" switch. [...] The landing simulation started with the legs out, and so never saw the transient.
So, a lesson not learned by Boeing. Cutting up tests into temporal blocks is not necessarily a bad thing. But it can become a bad thing when the split is made at the exact point of a phase transition.
Quote from: woods170 on 06/09/2022 11:22 am
Waterfall does not guarantee deficient testing. Neither does it guarantee sufficient testing. [...] The reason they initially cut it up in temporal blocks was to save money. Instead, that decision cost them money: ~$685M.

This has happened before. Mars Polar Lander crashed when the leg extension (performed while still high up) bumped the "weight on legs" switch. The software saw this, thought it was on the ground, and cut the engines, leading to the crash.

The error was that the software should have ignored the transient as the legs opened (this was a known effect at the system level, but it did not get translated into the software specifications).

But why was it not caught during testing of the flight hardware? There were a number of reasons, but as I understand it, the hardware-in-the-loop simulations were broken into two phases: entry through legs out, and legs out through landing. The landing simulation started with the legs out, and so never saw the transient.
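The failure mode LouScheffer describes boils down to a latched transient that a phase-split test can never observe. A toy sketch of the idea (entirely hypothetical timings and function names, not the actual MPL flight code):

```python
# Toy reconstruction of the Mars Polar Lander failure mode: a momentary
# "weight on legs" indication during leg deployment is latched by the
# flight software, but a test that starts *after* leg deployment never
# sees the transient, so the split test passes.

def flight_software(events):
    """Return the time of engine cutoff, triggered by the first latched
    touchdown indication, or None if no touchdown was ever indicated."""
    touchdown_latched = False
    for t, weight_on_legs in events:
        if weight_on_legs:
            touchdown_latched = True
        if touchdown_latched:
            return t  # engine cutoff time
    return None

# Full descent: leg deployment at t=2 bumps the sensor for one sample.
full_descent = [(0, False), (1, False), (2, True),   # the transient
                (3, False), (4, False), (5, False)]

# Split test, landing phase only: legs already out, transient never seen.
landing_only = [(3, False), (4, False), (5, False)]

print(flight_software(full_descent))  # premature cutoff, still in flight
print(flight_software(landing_only))  # None: the split test passes
```

Run end-to-end, the toy "vehicle" cuts its engine at the deployment transient; run only from legs-out onward, the bug is invisible, which is exactly why splitting at a phase transition is dangerous.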
Waterfall does not guarantee deficient testing. Neither does it guarantee sufficient testing.

In fact, Boeing still uses waterfall for Starliner development efforts. But they improved their integrated end-to-end test strategy by no longer cutting that test up into temporal blocks. The test now runs the full 3 days. The reason they initially cut it up into temporal blocks was to save money. Instead, that decision cost them money: ~$685M.
As you know, in 2011 NASA published the report "The Legacy of Space Shuttle Flight Software" (NTRS 20110014946), covering 30 years of development, testing, and improvement of the Shuttle's Primary Avionics Software System (PASS), which led to a list of "Lessons Learned". That software process was rated at CMM level 5, i.e., the highest and most reproducible. We discussed the report and its Lessons Learned in more detail in an earlier post.

It is therefore simply incomprehensible that one of NASA's partners and contractors (Boeing) was allowed to build a new human spaceflight system (Starliner) without fully embracing those Lessons Learned, even though the report makes the end-to-end, hardware/software, system testing recommendation at least twice in Section IX "Lessons Learned", items (c) and (h), pp. 33-35. In addition, the 2011 NASA report appeared around the same time as the start of Starliner development (IIRC between 2010 and 2012), i.e., hot off the press and ready to use.

The question of why Boeing did not embrace the Lessons Learned from the best comparable human spaceflight system available (the Shuttle's PASS) will one day make a very good article on this site, after some investigative sleuthing and interviewing of the people involved. We can speculate that the list of 61 issues relates to this question as well. In the meantime, let's hope Boeing has learned its lesson and that we will get a safe and effective Starliner.
I think an FOIA request on NASA's 80 recommendations to Boeing is in order.
Quote from: SoftwareDude on 06/09/2022 06:03 pm
I think an FOIA request on NASA's 80 recommendations to Boeing is in order.
It will be denied. Proprietary information.
Does anyone know where to direct the FOIA request for the recommendations to Boeing? HQ, which is in/near Washington DC, Cape Kennedy, or one of the others? It looks like the FOIA might be accepted, because all NASA opinions are fair game.
Quote from: SoftwareDude on 06/10/2022 01:11 am
Does anyone know where to direct the FOIA request for the recommendations to Boeing? [...]
No, because proprietary information is exempt from FOIA. And NASA recommendations would reveal such information.
Quote from: woods170 on 06/09/2022 01:38 pm
Quote from: LouScheffer on 06/09/2022 11:45 am
[...] The landing simulation started with the legs out, and so never saw the transient.
So, a lesson not learned by Boeing. Cutting up tests into temporal blocks is not necessarily a bad thing. But it can become a bad thing when the split is made at the exact point of a phase transition.
It's important to take the state from the previous block and use it to initialize the next one.
Quote from: Lee Jay on 06/09/2022 01:53 pm
[...] It's important to take the state from the previous block and use it to initialize the next one.
Which often requires adding some additional stuff to the test setup. That adds cost, and we all know that Boeing was cutting costs left and right during Starliner development. It caused OFT-1 to go the way it went.

Ultimately, in fixing the problem, Boeing chose to implement the better of the two fixes: to NOT cut up the test into temporal blocks, instead running it in one long go.
Quote from: woods170 on 06/10/2022 03:40 pm
Quote from: Lee Jay on 06/09/2022 01:53 pm
It's important to take the state from the previous block and use it to initialize the next one.
Which often requires adding some additional stuff to the test setup. [...] Ultimately, in fixing the problem, Boeing chose to implement the better of the two fixes: to NOT cut up the test into temporal blocks, instead running it in one long go.
You seem to know a lot about this. What do you think about the approach of using accelerated time to do tasks such as this?
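For what it's worth, "accelerated time" generally only helps the pure software-in-the-loop portions of a test: a simulation's virtual clock can be scaled, but real hardware in the loop (thermal behavior, batteries, radios) still runs at wall-clock speed. A minimal sketch of a scaled virtual clock (all names here are invented for illustration):

```python
# A virtual clock that runs faster than wall time. Every consumer in the
# simulation must read sim_time, never the real clock, or the two drift
# apart and the test becomes meaningless.

class VirtualClock:
    def __init__(self, scale: float):
        self.scale = scale     # 100.0 -> 100 simulated seconds per real second
        self.sim_time = 0.0

    def advance(self, real_dt: float):
        """Advance the simulation by real_dt seconds of wall time."""
        self.sim_time += real_dt * self.scale

clock = VirtualClock(scale=100.0)
for _ in range(10):            # ten 1-second real-time ticks
    clock.advance(1.0)
print(clock.sim_time)          # 1000.0 simulated seconds elapsed
```

Time-warping like this compresses long coast phases cheaply, but it cannot exercise timing-dependent hardware behavior, which is one argument for also running at least one full-duration end-to-end test.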
Quote from: Lee Jay on 06/09/2022 01:53 pm
It's important to take the state from the previous block and use it to initialize the next one.
Which often requires adding some additional stuff to the test setup. That adds cost, and we all know that Boeing was cutting costs left and right during Starliner development. It caused OFT-1 to go the way it went.

Ultimately, in fixing the problem, Boeing chose to implement the better of the two fixes: to NOT cut up the test into temporal blocks, instead running it in one long go.
It's important to take the state from the previous block and use it to initialize the next one.
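The state-handoff idea can be sketched in a few lines: phase 2 of a split test is initialized from phase 1's final state rather than from a fresh nominal state, so any fault latched in phase 1 survives the block boundary. (A hypothetical sketch; the state fields and function names are invented.)

```python
# Carrying simulation state across test-block boundaries. Initializing
# phase 2 from a hand-written "nominal" state hides whatever phase 1
# actually left behind; initializing it from phase 1's output does not.
from dataclasses import dataclass

@dataclass
class VehicleState:
    mission_elapsed_time: float
    legs_deployed: bool
    touchdown_latched: bool    # fault latched during leg deployment

def run_phase1_entry_through_legs_out() -> VehicleState:
    # ...simulate entry; leg deployment bumps the touchdown sensor...
    return VehicleState(mission_elapsed_time=180.0,
                        legs_deployed=True,
                        touchdown_latched=True)

def run_phase2_landing(initial: VehicleState) -> bool:
    # Phase 2 sees only whatever its initial state contains.
    return initial.touchdown_latched   # would the engine cut early?

# Wrong: fresh nominal state, latched fault invisible to phase 2.
fresh = VehicleState(0.0, legs_deployed=True, touchdown_latched=False)
print(run_phase2_landing(fresh))                                # False

# Right: phase 1's final state feeds phase 2, fault is exposed.
print(run_phase2_landing(run_phase1_entry_through_legs_out()))  # True
```

The "additional stuff" woods170 mentions is essentially the serialization and restore machinery around this handoff, which is exactly the part that costs money to build and validate.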
So long as we're beating a dead horse: I ran Test Automation for a Silicon Valley company. Over four decades on the Dev side I had seen first-hand how badly testing was done at organizations like Microsoft, and I wanted to approach it differently and integrate it completely into the development cycle. The framework I built started at midnight every night by rebuilding the entire test automation code base and then began running tests across multiple platforms, cloud services, mobile devices, and custom hardware we developed. By 9am, my dashboard would report on over 17,000 tests run on development, production, and staging services.

I can't imagine not running end-to-end tests. I get it, simulating months of coast time seems like a waste, but somewhere in an organization there will be systems doing longevity tests, so put them to use. Of course you can independently speed up sectional tests by pre-setting values, but you can't tell me that somewhere in the years of delays there wasn't time or systems available to run full-length mission tests. I had lunch yesterday with a good friend who was brought in to fix the Hubble mirror flaw. Hubble sat in storage for four years with that flaw. Another case of years of opportunity, and inadequate testing.

Since this thread is going over lots of previous ground, I don't recall ever hearing what the resolution was for the communications issue that was blamed on cell towers.
I used to work with a guy who had a favorite saying: "Why is there never enough time to do it right, but enough time to do it over?".
Quote from: AJW on 06/12/2022 10:43 pm
Hubble sat in storage for four years with that flaw. Another case of years of opportunity, and inadequate testing.
Not possible to test in the spacecraft configuration. Testing had to be done on the mirror. Don't even know if testing was feasible in the telescope configuration.
Wollensak said that the Perkin-Elmer design did not allow for the primary mirror to be tested after it was placed into the assembled telescope. He said that the instrument was designed to work in zero gravity and that the gravity of Earth caused the glass to sag slightly, which would have changed the focus.

Kodak and Itek, however, said Wollensak, had developed a way to prevent the sag and thus test the mirrors as an assembled unit.
Quote from: AJW on 06/12/2022 10:43 pm
[...] I had lunch yesterday with a good friend who was brought in to fix the Hubble mirror flaw. Hubble sat in storage for four years with that flaw. Another case of years of opportunity, and inadequate testing.
The ironic thing is that the Hubble backup mirror, which is now in the Smithsonian collection, did not have the manufacturing flaw.
https://airandspace.si.edu/collection-objects/hubble-space-telescope-backup-mirror/nasm_A20010288000
https://www.nytimes.com/1990/07/18/us/hubble-has-backup-mirror-unused.html