-
#380
by
Barley
on 08 Jun, 2022 01:44
-
Coding is part of software design. It's something you do once, not for each deployment of the software.
The software equivalent of production is loading the image to PROM or FLASH or disk. This does get messed up occasionally, but by and large it's a solved problem.
The sooner all the people who think coding is a manufacturing process retire the better.
Yes and No. Yes, once a product's code has been written, compiled, loaded, and tested, it then becomes part of the production configuration that's cranked out with each delivered unit.
No, in that code is NOT design. A proper software design is code-agnostic. IOW, I can implement a given design in one of several different languages that support the product hardware. Also, software requirements, high level design, and detail design documents all comprise what I'd call "software design."
Most real-time code is not OS / processor / language agnostic. It would be like designing something that can be made of steel, aluminum or wood. If it's language agnostic, it's only in the sense that the language proper doesn't define anything you actually care about. If a language does define something you care about you can't easily change to a different one that defines things differently, unless your specs are so loose that it doesn't count as real time.
-
#381
by
woods170
on 08 Jun, 2022 10:06
-
Starliner issues to date have been design flaws.
Not true. More misinformation
So incorrect initialization of the MET timer and erroneous thruster mapping tables were part of the original functional spec? Fascinating.
incorrect initialization of the MET timer is an input error. Also, software errors are not design flaws
Emphasis mine.
No, it was NOT an input error. ULA's Atlas V gave the correct inputs.
The INcorrect initialization of the MET timer was in fact a design error, just as GreyBeardEngineer had correctly indicated.
[snip]
That's a very long-winded way to say the software implementation did not follow the software design. Software design occurs before coding (if you're getting into even pseudocode you're going too deep), and should be independent of the actual code written - so you could implement the same software design in a variety of different languages with the same desired outcome. The design was correct, but it was not implemented (coded) correctly.
The hardware analogy makes the distinction clear: if you specify a part X mm long to fit in an X mm deep hole, and the part is produced X m long, the design was correct but the implementation was not. If the design specifies an X m long part to fit in an X mm deep hole, that's a design flaw.
If you consider your implemented code as your software design, you're in the same situation as cutting hardware to fit as you build it: the days of individually fettled parts are long over and it is now bad practice due to lack of testability, repeatability, and documentation.
True for a waterfall software design.
A modified waterfall process led the Starliner software efforts.
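The design-versus-implementation distinction being argued here can be sketched in code. The snippet below is purely illustrative - the names, timeline, and values are invented, not Boeing's flight software. The design intent is "latch the booster's MET during the terminal count"; the bug is latching it at the wrong time, i.e. an implementation that fails to follow a correct design:

```python
# Hypothetical sketch of the OFT-1 class of MET-initialization error.
# All names and numbers are illustrative, not actual flight code.

class Booster:
    """Stand-in for the launch vehicle's avionics bus."""
    def __init__(self):
        # The booster's clock counts up through a long pre-launch timeline.
        self.met = -11 * 3600.0  # hours before launch at power-up

    def advance_to_terminal_count(self):
        self.met = -60.0  # T-60 s: the intended polling window

def init_met_buggy(booster):
    # Bug: latch MET as soon as power is applied, hours before launch.
    return booster.met

def init_met_fixed(booster):
    # Fix (per the design intent): latch MET only inside the terminal count.
    booster.advance_to_terminal_count()
    return booster.met

booster = Booster()
stale = init_met_buggy(booster)  # off by ~11 hours
good = init_met_fixed(booster)   # -60 s, as designed
```

The same correct design, implemented twice: one version violates it, the other follows it - which is the whole argument in this post.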
-
#382
by
Jim
on 08 Jun, 2022 11:18
-
The CST-100 should have avoided using the Atlas MET. The standard timeframe for going internal power could also be used for final configuration commanding of the vehicle. As for MET, liftoff and separation breakwires could do the rest.
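A hedged sketch of what Jim suggests here - deriving MET from a liftoff breakwire rather than copying the booster's clock. The class and signal names are hypothetical:

```python
class BreakwireMET:
    """Hypothetical MET source driven by a liftoff breakwire, per the
    suggestion above, rather than by copying the booster's clock."""
    def __init__(self):
        self._t0 = None  # wall-clock time at liftoff, unset on the pad

    def on_liftoff_breakwire(self, now):
        # The breakwire severs at first motion: that instant IS MET = 0.
        if self._t0 is None:
            self._t0 = now

    def met(self, now):
        if self._t0 is None:
            return None  # still on the pad; no MET yet
        return now - self._t0

clock = BreakwireMET()
assert clock.met(100.0) is None      # pre-launch: no MET
clock.on_liftoff_breakwire(100.0)    # first motion
assert clock.met(130.0) == 30.0      # 30 s into flight
```

The design appeal is that there is no external clock to copy, so there is nothing to copy at the wrong time.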
-
#383
by
edzieba
on 08 Jun, 2022 11:26
-
The CST-100 should have avoided using the Atlas MET. The standard timeframe for going internal power could also be used for final configuration commanding of the vehicle. As for MET, liftoff and separation breakwires could do the rest.
Now that would come under the heading of "design flaw".
Regardless, two flights in and Starliner has gone from "issues that could cause LoM/LoC" to "issues that reduce available redundancy". Whatever issues remain hardly seem insurmountable.
-
#384
by
deadman1204
on 08 Jun, 2022 13:43
-
That is wrong as well as the rest of the explanation.
Wrong takeaway again
Care to elaborate instead of just claiming you know better and walking away again?
I can't share everything I know.
I understand - lots of stuff is restricted information. However, all the one-phrase "you're just wrong" comments come off as insulting and could be better said.
-
#385
by
mn
on 08 Jun, 2022 17:03
-
That is wrong as well as the rest of the explanation.
Wrong takeaway again
Care to elaborate instead of just claiming you know better and walking away again?
I can't share everything I know.
I understand - lots of stuff is restricted information. However, all the one-phrase "you're just wrong" comments come off as insulting and could be better said.
This forum is not a beauty pageant and Jim is not competing for any awards; he's sharing information when he chooses to, and we (at least I think most of us) appreciate his info and opinions (and learn to disregard the form if it bothers you).
-
#386
by
Khadgars
on 08 Jun, 2022 17:16
-
That is wrong as well as the rest of the explanation.
Wrong takeaway again
Care to elaborate instead of just claiming you know better and walking away again?
I can't share everything I know.
I understand - lots of stuff is restricted information. However, all the one-phrase "you're just wrong" comments come off as insulting and could be better said.
This forum is not a beauty pageant and Jim is not competing for any awards; he's sharing information when he chooses to, and we (at least I think most of us) appreciate his info and opinions (and learn to disregard the form if it bothers you).
Agreed. Very few if any have the insight or experience that Jim has here. Even if I disagree I appreciate him taking the time to respond.
-
#387
by
SoftwareDude
on 08 Jun, 2022 18:22
-
A modified waterfall process led the Starliner software efforts.
I assumed so because the testing was deficient. I think the "why waterfall" question is interesting.
-
#388
by
king1999
on 08 Jun, 2022 20:37
-
I understand - lots of stuff is restricted information. However, all the one-phrase "you're just wrong" comments come off as insulting and could be better said.
You can put a userid on the ignore list from your Forum Profile.
-
#389
by
SoftwareDude
on 08 Jun, 2022 20:53
-
A modified waterfall process led the Starliner software efforts.
I assumed so because the testing was deficient. I think the "why waterfall" question is interesting.
GAO just now dropped a report and the first page after the cover is about how few DOD programs use modern software development and how that is a major impediment to deployment. I imagine that answers my question about Boeing/NASA software development being like Boeing/DOD; stuck in a methodology developed for the Poseidon program.
GAO Report:
https://www.gao.gov/assets/gao-22-105230.pdf
Interesting that the GAO is tracking waterfall vs agile by measuring the iteration interval. This is a huge step forward in my view.
-
#390
by
Redclaws
on 08 Jun, 2022 20:55
-
A modified waterfall process led the Starliner software efforts.
I assumed so because the testing was deficient. I think the "why waterfall" question is interesting.
I’m also a software guy and waterfall doesn’t imply anything bad about testing…. It’s just a different approach which is pretty well suited to static requirements. We - humanity - have successfully built many systems broadly in this way and it is not intrinsically bad.
-
#391
by
thirtyone
on 08 Jun, 2022 21:16
-
Okay, so I think there's actually some misunderstanding here with semantics for engineering processes, and maybe no one wants to explain/point it out. It feels like half the posters here take design in a more vernacular sense of the word, and the other half is looking at a very specific engineering meaning. I'm going to explain my take on some of the short replies for everyone's benefit:
To most engineers (maybe I should say engineering managers, actually), if your engineering design is poor, then it means you missed something on the early parts of your engineering processes (user needs, design inputs, design outputs, etc. - they use different terms in aerospace but I can't remember off the top of my head). The problems would be along the lines of, someone wrote a requirement that there must be an MET, but forgot to specify that it must start at rocket launch. Or on a lower level, the MET code module itself had a written requirement to start the timer on power-on instead of say the launch signal. These would be egregious (even more egregious than what happened) as many industry processes are structured around ensuring even the most careless engineers cannot get away with messing up this design stage because of the number of reviews and cross-checks during this early, critical stage.
Nothing serious is really built or even implemented in this stage, for the most part.
From everything I've heard, it really doesn't sound like this was a design issue, but only insiders would know for sure.
Now, poorly implemented code happens all the time. If you've ever written code before, well, usually the very first time you compile and run, it just "doesn't work" the way it was supposed to (i.e. to the spec). That is, by definition, poorly implemented code, and that's why testing your code well is arguably more important than writing good code. A mechanical analog, by the way, might be accidentally having a hole through the wall of what was specified to be a pressure vessel in a drawing, and failing to catch it until the part's already made. Poorly written code making its way to product is probably not considered to be a design problem by many engineers - it's a testing or verification problem.
That's not to say it's not a serious problem because it's not a "design" problem. In this case, the inability to catch such a critical implementation mistake was indicative of serious problems with the software engineering testing processes in the organization, which is why it took a good year-plus to return to flight. It is just not what many would consider to be a "design" problem. Arguably, it was an engineering process design problem.
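A minimal illustration of this point, with invented names: the requirement (the "design") is correct, the implementation violates it, and a requirement-level verification test is what exposes the bug:

```python
# Illustrative only: a requirement-level test catching an implementation bug.
# Requirement (the "design"): the mission timer shall start at the launch
# signal, not at power-on. Names are hypothetical.

class MissionTimer:
    def __init__(self):
        self.started = False
        self.t0 = None

    def power_on(self, now):
        # BUG: implementation starts the timer here instead of at launch.
        self.started = True
        self.t0 = now

    def launch(self, now):
        pass  # should have been: self.started = True; self.t0 = now

def timer_requirement_check():
    """Verification written against the requirement, not the code."""
    timer = MissionTimer()
    timer.power_on(0.0)
    # Per the requirement, the timer must NOT be running before launch.
    return not timer.started

result = timer_requirement_check()  # False: the check fails, exposing the bug
```

The design document never changed; only a test derived from it could reveal that the code diverged - which is why the post above calls this a testing/verification problem rather than a design problem.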
-
#392
by
SoftwareDude
on 08 Jun, 2022 21:30
-
A modified waterfall process led the Starliner software efforts.
I assumed so because the testing was deficient. I think the "why waterfall" question is interesting.
I’m also a software guy and waterfall doesn’t imply anything bad about testing…. It’s just a different approach which is pretty well suited to static requirements. We - humanity - have successfully built many systems broadly in this way and it is not intrinsically bad.
I personally didn't mind waterfall either because I felt I knew what I was supposed to do, which it would turn out was the result of an organization-wide delusion. But as a manager of development, and later as an executive, the waterfall is unacceptable because no one can understand enough to know what has to be done to achieve a requirement until they get into the details during software development.
Many organizations say that they are agile when they are not. The GAO looking at iteration length shows the government understands that organizations lie to themselves about being agile, or it's just a checkmark on an RFP.
Waterfall works out the answer in all its detail upfront before development while agile is a management and development process that converges on the answer even as requirements change.
-
#393
by
LouScheffer
on 09 Jun, 2022 00:07
-
Part of the reason it's hard to say something is a "design error" or "implementation error" is that most errors are both.
Consider an architect who designs a building. They then hand the plans to a structural engineer, who designs the size, placement, and materials of the supports, joists, and so on. These may be built of manufactured wood (for example), which is designed by a different expert with experience in composites of lumber and plastic. The plastic resins, in turn, were designed by chemists.
Now suppose the building fails since the synthetic lumber was not as strong as specified under some condition (hot, cold, wet, etc). From the point of view of the architect and structural engineer, this is an implementation error. For the lumber design guy, it's a design error. For the chemist, it may not be an error at all, but simply a mis-use of their product.
In general, anyone above the level of the actual problem will call it an implementation error, but it's a design error at some lower level.
-
#394
by
groknull
on 09 Jun, 2022 01:23
-
Part of the reason it's hard to say something is a "design error" or "implementation error" is that most errors are both.
Consider an architect who designs a building. They then hand the plans to a structural engineer, who designs the size, placement, and materials of the supports, joists, and so on. These may be built of manufactured wood (for example), which is designed by a different expert with experience in composites of lumber and plastic. The plastic resins, in turn, were designed by chemists.
Now suppose the building fails since the synthetic lumber was not as strong as specified under some condition (hot, cold, wet, etc). From the point of view of the architect and structural engineer, this is an implementation error. For the lumber design guy, it's a design error. For the chemist, it may not be an error at all, but simply a mis-use of their product.
In general, anyone above the level of the actual problem will call it an implementation error, but it's a design error at some lower level.
So true. And an excellent example.
"Incompetent employees" / "inept management" is the corresponding dichotomy on the people side of things. "Most errors are both" applies here too. In my occasional role/responsibility to eliminate that finger-pointing dichotomy, success was more likely if leadership was 100% committed to eliminating it. High commitment from employees but low management buy-in generally resulted in failure. It is not clear to me if design / implementation has the same sort of asymmetry. I'd love to hear examples from forum members (by PM).
-
#395
by
deltaV
on 09 Jun, 2022 04:23
-
I get the impression that old space generally doesn't respect software engineers (SWEs) as much as hardware engineers and doesn't pay SWEs nearly as well as top software firms. An extreme example: some of Boeing's 737 MAX software was apparently outsourced to $9/hour SWEs (https://www.industryweek.com/supply-chain/article/22027840/boeings-737-max-software-outsourced-to-9anhour-engineers). The space-is-cool factor may allow old space to attract some great SWEs regardless, but I'm guessing that their average SWE quality isn't great. This presumably increases the chances of software errors like the OFT-1 one.
-
#396
by
woods170
on 09 Jun, 2022 11:22
-
A modified waterfall process led the Starliner software efforts.
I assumed so because the testing was deficient. I think the "why waterfall" question is interesting.
Waterfall does not guarantee deficient testing. Neither does it guarantee sufficient testing.
In fact, Boeing still uses waterfall for Starliner development efforts. But they improved their integrated end-to-end test strategy by no longer cutting that test up in temporal blocks. The test now runs the full 3 days.
The reason they initially cut it up in temporal blocks, was to save money. Instead, that decision cost them money: ~ $685M.
-
#397
by
LouScheffer
on 09 Jun, 2022 11:45
-
Waterfall does not guarantee deficient testing. Neither does it guarantee sufficient testing.
In fact, Boeing still uses waterfall for Starliner development efforts. But they improved their integrated end-to-end test strategy by no longer cutting that test up in temporal blocks. The test now runs the full 3 days.
The reason they initially cut it up in temporal blocks, was to save money. Instead, that decision cost them money: ~ $685M.
This has happened before. Mars Polar Lander crashed when the leg extension (performed while high up) bumped the "weight on legs" switch. The software saw this, thought it was on the ground, and cut the engines leading to the crash.
The error was that the software should have ignored the transient case as legs opened (this was a known effect at the system level, but did not get translated into the software specifications).
But why was it not caught during testing of the flight hardware? There were a number of reasons, but as I understand it, the hardware-in-the-loop simulations were broken into two phases - entry through legs out, and legs out through landing. The landing simulation started with the legs out, and so never saw the transient.
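The failure mode described above can be sketched as follows. This is illustrative only - the real MPL logic and phase names were different - but it shows the shape of the fix: gate the touchdown signal on flight phase and require it to persist, so the leg-deploy transient is rejected:

```python
# Illustrative sketch of the Mars Polar Lander failure mode described above.
# The touchdown sensor glitches for one sample as the legs deploy at
# altitude; gating on flight phase (plus a simple debounce) rejects it.
# Phase names and sample counts are hypothetical.

def engine_cutoff_buggy(events):
    # Cuts engines on the *first* weight-on-legs indication, ever.
    for phase, weight_on_legs in events:
        if weight_on_legs:
            return phase  # cutoff issued in whatever phase this was

def engine_cutoff_gated(events):
    # Only honor the sensor during terminal descent, and require it to
    # persist for two consecutive samples (a minimal debounce).
    streak = 0
    for phase, weight_on_legs in events:
        if phase == "terminal_descent" and weight_on_legs:
            streak += 1
            if streak >= 2:
                return phase
        else:
            streak = 0

# Leg deploy at altitude produces a one-sample transient:
timeline = [("descent", False), ("leg_deploy", True), ("descent", False),
            ("terminal_descent", True), ("terminal_descent", True)]

assert engine_cutoff_buggy(timeline) == "leg_deploy"        # cutoff at altitude
assert engine_cutoff_gated(timeline) == "terminal_descent"  # cutoff at landing
```

Note that a hardware-in-the-loop run that starts *after* leg deploy never feeds the buggy version its fatal transient - which is exactly the testing gap described above.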
-
#398
by
woods170
on 09 Jun, 2022 13:38
-
Waterfall does not guarantee deficient testing. Neither does it guarantee sufficient testing.
In fact, Boeing still uses waterfall for Starliner development efforts. But they improved their integrated end-to-end test strategy by no longer cutting that test up in temporal blocks. The test now runs the full 3 days.
The reason they initially cut it up in temporal blocks, was to save money. Instead, that decision cost them money: ~ $685M.
This has happened before. Mars Polar Lander crashed when the leg extension (performed while high up) bumped the "weight on legs" switch. The software saw this, thought it was on the ground, and cut the engines leading to the crash.
The error was that the software should have ignored the transient case as legs opened (This was a known effect at the system level, but did not get translated into the software specifications).
But why was it not caught during testing of the flight hardware? There were a number of reasons, but as I understand it, the hardware-in-the-loop simulations were broken into two phases - entry through legs out, and legs out through landing. The landing simulation started with the legs out, and so never saw the transient.
So, a lesson not learned by Boeing. Cutting up tests in temporal blocks is not necessarily a bad thing. But it can become a bad thing when the split is made at the exact point of a phase transition.
-
#399
by
Lee Jay
on 09 Jun, 2022 13:53
-
Waterfall does not guarantee deficient testing. Neither does it guarantee sufficient testing.
In fact, Boeing still uses waterfall for Starliner development efforts. But they improved their integrated end-to-end test strategy by no longer cutting that test up in temporal blocks. The test now runs the full 3 days.
The reason they initially cut it up in temporal blocks, was to save money. Instead, that decision cost them money: ~ $685M.
This has happened before. Mars Polar Lander crashed when the leg extension (performed while high up) bumped the "weight on legs" switch. The software saw this, thought it was on the ground, and cut the engines leading to the crash.
The error was that the software should have ignored the transient case as legs opened (This was a known effect at the system level, but did not get translated into the software specifications).
But why was it not caught during testing of the flight hardware? There were a number of reasons, but as I understand it, the hardware-in-the-loop simulations were broken into two phases - entry through legs out, and legs out through landing. The landing simulation started with the legs out, and so never saw the transient.
So, a lesson not learned by Boeing. Cutting up tests in temporal blocks is not necessarily a bad thing. But it can become a bad thing when the split is made at the exact point of a phase transition.
It's important to take the state from the previous block and use it to initialize the next one.
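That point can be sketched in a few lines. Everything here is hypothetical - the idea is just that a split test must seed each block with the previous block's final state, or transients at the boundary are silently erased:

```python
# Sketch of the state-handoff point above: when a long test is split into
# temporal blocks, seed each block with the *final state* of the previous
# one, not a hand-built idealized state. Event names are invented.

def simulate_phase(initial_state, phase_events):
    state = dict(initial_state)
    for event in phase_events:
        state[event] = True  # each event latches a flag in vehicle state
    return state

# Phase 1 ends with a spurious touchdown flag latched during leg deploy.
end_of_phase1 = simulate_phase({}, ["legs_out", "spurious_touchdown"])

# Wrong: restart phase 2 from a hand-built "clean" state.
clean_start = {"legs_out": True}   # the latched transient is silently erased

# Right: carry phase 1's actual final state across the boundary.
carried_start = end_of_phase1

assert "spurious_touchdown" not in clean_start      # bug hidden from phase 2
assert carried_start["spurious_touchdown"] is True  # bug visible to phase 2
```

With the carried-over state, the phase 2 simulation sees the latched transient and the defect surfaces in test rather than in flight.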