-
#240
by
intrepidpursuit
on 22 Dec, 2019 16:31
-
Sometimes dumb/simple is good. See KISS principle. At least one Shuttle payload with a giant solid rocket motor (Transfer Orbit Stage on ACTS mission) used a simple MET clock triggered by separation from Shuttle for the entire mission sequencing. However, there were actually three MET clocks for dual fault tolerance, with a majority-voting scheme for critical events like rocket motor ignition.
That was a man-rated software/avionics design, because the system had to be two-fault-tolerant against the MET clock sequencing igniting the rocket motor too close to Shuttle. So it was good enough to be man-rated after extreme scrutiny by the NASA/JSC safety panel.
Again, that was just three MET clocks, initiated by the separation event, with majority voting to null the vote of one errant clock.
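A minimal sketch of that voting idea, not the actual TOS logic. The function names, the 2700 s trigger time, and the clock readings are all made up for illustration; each clock votes on whether the trigger time has passed, and a strict majority decides.

```python
from typing import Sequence


def majority_vote(votes: Sequence[bool]) -> bool:
    """Return True only if a strict majority of the votes are True."""
    return sum(votes) > len(votes) // 2


def event_due(met_readings_s: Sequence[float], trigger_s: float) -> bool:
    """Each MET clock votes independently on whether the trigger time has passed."""
    return majority_vote([met >= trigger_s for met in met_readings_s])


# Hypothetical trigger time, in seconds after separation (assumption, not a real value).
TRIGGER_S = 2700.0

# One clock stuck low: the two healthy clocks outvote it and the event proceeds.
print(event_due([2701.0, 2700.5, 12.0], TRIGGER_S))     # True

# One clock running wildly fast: the two healthy clocks outvote it and nothing happens.
print(event_due([2650.0, 2651.0, 99999.0], TRIGGER_S))  # False
```

With three voters this masks a single errant clock in either direction, which is the "null the vote of one errant clock" behavior described above.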
It was good enough for Shuttle man-rated payloads, so it wouldn't surprise me if CST-100 used a similar MET scheme.
Is that really a good comparison, though? You're talking about a MET for a single-event system - ignition of a solid rocket motor, as well as presumably guidance during firing, all during a very defined/rigid trajectory. A manned spacecraft that has to have complex sequences of motor/rcs firings to rendezvous with the space station, dock, has abort modes, etc is far more complicated, no?
Yes, and those state-vector-dependent functions are controlled by flight computer/IMU logic. In the TOS example I gave, the MET clock sequencer enabled “unsafing” of the SRM igniter, but actual ignition timing was controlled by the flight computer based on IMU input.
So you can have a combination of MET-based event sequencing and flight-computer logic control for timing-critical events.
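As a rough sketch of that split (assumed names and numbers, not the real TOS design): the MET sequencer can only remove the inhibit, and the flight computer's ignition command is honored only once the inhibit is gone.

```python
from dataclasses import dataclass


@dataclass
class IgniterState:
    armed: bool = False   # set by the MET sequencer ("unsafing")
    fired: bool = False   # set only by a flight-computer command while armed


def step(state: IgniterState, met_s: float, arm_after_s: float,
         fc_ignite_cmd: bool) -> IgniterState:
    """Advance the igniter logic by one cycle.

    met_s         -- mission elapsed time in seconds (hypothetical input)
    arm_after_s   -- MET at which the sequencer removes the inhibit (assumption)
    fc_ignite_cmd -- True when the flight computer, using IMU data, wants ignition
    """
    # The MET sequencer never commands ignition; it only arms.
    if met_s >= arm_after_s:
        state.armed = True
    # The flight computer decides the timing, but only an armed igniter can fire.
    if fc_ignite_cmd and state.armed and not state.fired:
        state.fired = True
    return state


s = IgniterState()
step(s, met_s=100.0, arm_after_s=2700.0, fc_ignite_cmd=True)
print(s)  # IgniterState(armed=False, fired=False): command ignored, still safed
step(s, met_s=2800.0, arm_after_s=2700.0, fc_ignite_cmd=False)
print(s)  # IgniterState(armed=True, fired=False): armed, waiting on flight computer
step(s, met_s=2900.0, arm_after_s=2700.0, fc_ignite_cmd=True)
print(s)  # IgniterState(armed=True, fired=True)
```

The point of the design is that neither a bad clock alone nor a bad flight-computer command alone is enough to light the motor.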
There was some similar system disagreement here. Fortunately Starliner didn't just fire its thrusters in the wrong direction and bring down the capsule. It stabilized itself as if it was firing thrusters but then never did. And apparently it stabilized itself for much longer than it should have, or more aggressively, or both, enough to blow out RCS sensors and burn 25% of its fuel. So some part of the computer thought it was time for orbit insertion and some part didn't. Those systems not checking with each other before doing conflicting actions is the part that is the most baffling to me.
The system doing the stabilizing should check and see if the thrusters are firing and if they aren't within a window then safe. The RCS system should know it only needs to stabilize for a short time and if it goes way longer than that then it should safe. If the RCS thruster sensors are reaching unsafe temperatures then they should throw a warning and if they aren't in a critical part of the mission they should safe. If the antennas aren't pointed at the TDRS satellites then it should throw an error and prioritize getting connected before burning out the RCS.
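The checks described above could look something like this. It is only a sketch: the mode names, the function, and every threshold are invented for illustration and have nothing to do with Starliner's real flight software.

```python
from enum import Enum


class Mode(Enum):
    HOLD_ATTITUDE = "hold_attitude"   # tight attitude hold in support of a burn
    SAFE = "safe"                     # relaxed, propellant-conserving state


def next_mode(mode: Mode, hold_elapsed_s: float, burn_started: bool,
              thruster_temp_c: float, *, burn_window_s: float = 30.0,
              max_hold_s: float = 120.0, temp_limit_c: float = 200.0,
              critical_phase: bool = False) -> Mode:
    """Decide whether to keep holding attitude or drop to a safe state.

    All limits are hypothetical placeholders, not real vehicle values.
    """
    if mode is not Mode.HOLD_ATTITUDE:
        return mode
    # Holding for a burn that never started within its window: stop and safe.
    if not burn_started and hold_elapsed_s > burn_window_s:
        return Mode.SAFE
    # Holding far longer than any burn should need: stop and safe.
    if hold_elapsed_s > max_hold_s:
        return Mode.SAFE
    # Thrusters running hot outside a critical phase: stop and safe.
    if thruster_temp_c > temp_limit_c and not critical_phase:
        return Mode.SAFE
    return mode


# The burn never starts, so the hold is abandoned once the window expires.
print(next_mode(Mode.HOLD_ATTITUDE, 45.0, False, 50.0))   # Mode.SAFE
# The burn is running and everything is within limits, so the hold continues.
print(next_mode(Mode.HOLD_ATTITUDE, 45.0, True, 50.0))    # Mode.HOLD_ATTITUDE
```

The `critical_phase` flag stands in for the mission-planning decision about when a system is allowed to push past its limits to recover a bad situation.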
As others have mentioned, a collaboration between the hardware, mission planning, and software people should determine the safe states in each scenario and where systems should be allowed to push beyond their limits to recover a bad situation. It is hard to believe none of that was done, but so many compounding errors here make it seem like there is very little fault tolerance in the system.
-
#241
by
Kabloona
on 22 Dec, 2019 16:51
-
Obviously Boeing has some problems with their control logic based on MET, but it has been done correctly in the past, so the premise itself isn’t necessarily flawed. It got past NASA/JSC safety review panel, which is unbelievably rigorous.
As always, the devil is in the details.
-
#242
by
wolfpack
on 22 Dec, 2019 18:41
-
I wouldn’t fly Starliner again (crewed or uncrewed) without being confident that this isn’t evidence of a much broader failing. It’ll be interesting to see what ASAP etc have to say about this.
In my opinion there should be a lengthy stand down for Starliner until the code is 100% verified, line by line, between NASA and Boeing.
They raked SpaceX over the coals on software, and rightfully so. It should be no different from a contractor with whom they’ve had a much longer relationship, especially in light of Boeing’s recent spate of problems.
-
#243
by
llanitedave
on 22 Dec, 2019 18:56
-
Just a wild guess, but they may have run full simulations where inputs were faked to make the system think it was actually in flight. That's how they test new airliners. That could mean they screwed up in returning everything to flight configuration.
On the same level as the parachute connector, then.
-
#244
by
TheRadicalModerate
on 22 Dec, 2019 19:21
-
This is a little O/T, but since we seem to be having a software discussion, I'd like to ask a more general question about the right way to be developing software for space systems, and how incumbents could transition over to it without losing their minds.
First, an assertion: I'm guessing that Starliner software (and software for all Boeing products, and likely all the products from Lockmart and NGIS) is developed using the old-timey once-through spec/design/review/code/test methodology, rather than one of the newfangled iterative approaches like Agile. Anybody have information to confirm or disprove this hypothesis?
Next, let's think about how to use iterative methodology on a spacecraft. Obviously, the idea of a "minimum viable product" doesn't fit very well with a spacecraft, which is a pretty freakin' maximal piece of hardware before the crew is allowed onboard. But there can be lots of minimum viable testbeds.
Furthermore, simulation is a lot more advanced than it was back in the day. I'm wondering if a decent methodology would be to develop a simulation of each hardware component right along with the software designed to control it. So you'd be unit testing and integrating your sim in parallel with unit testing and integrating the actual iron.
That would make your "sprint" look something like this:
1) Define features/user stories for the sprint.
2) Develop, unit test, and integration test to the sim.
3) Unit test and integration test to whatever extent possible on unit or integrated hardware.
4) Feed back deviations from the hardware into the next sprint for the sim.
5) Complete the sprint.
This is obviously pushing the simulation really, really hard, and I'm on very shaky ground whether you can build an entire virtual spacecraft this way--I'm looking for feedback from people in the industry, because I haven't been in it for many, many years. But if you can do this, it has the wonderful property that you have a complete regression testbed that can run automatically. Essentially, it's the ultimate in test-centric design.
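To make the idea concrete, here is a toy sketch of developing a component sim alongside the control code and running both as an automatic regression. Everything in it (the `Valve` interface, `SimValve`, `open_with_verify`, the stuck-closed fault) is hypothetical and far simpler than any real spacecraft component.

```python
from typing import Protocol


class Valve(Protocol):
    """Interface the control code is written against; real hardware or a sim can sit behind it."""
    def command(self, open_: bool) -> None: ...
    def is_open(self) -> bool: ...


class SimValve:
    """Simulated valve. The stuck_closed flag models a deviation fed back from hardware testing."""

    def __init__(self, stuck_closed: bool = False) -> None:
        self._open = False
        self._stuck = stuck_closed

    def command(self, open_: bool) -> None:
        if not self._stuck:
            self._open = open_

    def is_open(self) -> bool:
        return self._open


def open_with_verify(valve: Valve, retries: int = 2) -> bool:
    """Control code under test: command the valve open and confirm it actually opened."""
    for _ in range(retries + 1):
        valve.command(True)
        if valve.is_open():
            return True
    return False  # caller is expected to raise a fault or safe the system


def run_regression() -> dict:
    """Run every known scenario against the sim; meant to run automatically on each change."""
    return {
        "nominal": open_with_verify(SimValve()),
        "stuck_closed": open_with_verify(SimValve(stuck_closed=True)),
    }


print(run_regression())  # {'nominal': True, 'stuck_closed': False}
```

Each deviation found on the real iron becomes another scenario in `run_regression`, so the sim and the regression suite grow together sprint by sprint.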
Is this possible? Is this even close to what's actually happening with the new space companies?
The next question is how an incumbent that's using the One Spec to Rule Them All methodology could transition to an iterative methodology without essentially blowing up their engineering groups and starting over. This is really, really tough. Consider what has to change:
1) Your integration methodology gets blown up.
2) None of your existing QA suites are adaptable to the new system, so you have massive one-time test-engineering costs.
3) Your source code control system probably doesn't work any more, and we all know how fun it is migrating one of those.
4) Not only do you have to retrain your software staff, but you also have to invent a whole new hardware engineering methodology to go with it, and get people who've spent their entire careers doing things one way to do them a completely different way.
5) The interfaces with manufacturing have to change.
6) Even things like contract administration and sales can't proceed business as usual, because the process of going from RFP to a CDR is different and twisty.
There's obviously an existence proof that it's possible to design a system like this from scratch and make NASA and DoD comfy somehow, because SpaceX has done it, at least partially. But I sorta-kinda think that the incumbents are pretty screwed when it comes to fixing this, which gives SpaceX a sustainable advantage that likely makes their pricing power look like a drop in the bucket.
It should also be noted that there's a military dimension to this that doesn't get talked about very much: If most of our big military-industrial developers are stuck with the old methodology, a bunch of our new geopolitical opponents (esp. China) likely have a sustainable advantage from adopting modern development practices when they started growing their tech base.
This seems like a problem that desperately needs an answer. I have no clue if I'm on the right track here or not, and would appreciate some feedback.
-
#245
by
TheRadicalModerate
on 22 Dec, 2019 19:34
-
I wouldn’t fly Starliner again (crewed or uncrewed) without being confident that this isn’t evidence of a much broader failing. It’ll be interesting to see what ASAP etc have to say about this.
In my opinion there should be a lengthy stand down for Starliner until the code is 100% verified, line by line, between NASA and Boeing.
They raked SpaceX over the coals on software, and rightfully so. It should be no different from a contractor with whom they’ve had a much longer relationship, especially in light of Boeing’s recent spate of problems.
This seems a bit extreme to me. This certainly wouldn't be the first time that somebody pulled stuff from the wrong address (if that's what actually happened). It's basically a source code control glitch. It's a pretty serious one, and is highly indicative of an old methodology, but we are dealing with a launcher that was developed in the 90's, using a second stage that was developed in the 60's.
I'm sure that there are better ways to build aerospace code these days, even in old companies, but back on the Space Shuttle, we'd go for weeks or even months without doing a full build, because there were so many hard-coded addresses used in ground and crew procedures that publishing an update was a major undertaking. After the code was feature complete, we'd do machine-language patches to fix bugs until we could get them merged into the source code.
Atlas V is more modern than that, but not by a huge amount.
If this does turn out to be a case where somebody pulled stuff from the wrong spot, I agree that the procedures for how that gets controlled need to be pretty carefully reviewed. That's likely a short stand-down, but it's a far cry from what you're recommending.
-
#246
by
JEF_300
on 22 Dec, 2019 20:03
-
Here's the post-landing briefing.
Edit for commentary: I was really impressed with the Boeing representative's openness here.
-
#247
by
ninjaneer
on 22 Dec, 2019 20:09
-
I'm wondering if a decent methodology would be to develop a simulation of each hardware component right along with the software designed to control it. So you'd be unit testing and integrating your sim in parallel with unit testing and integrating the actual iron.
The buzzword du jour for what you are referencing is "digital twin." However, the emulators that I've dealt with don't allow mocks of direct address reading, only bus and protocol mocks.
-
#248
by
JEF_300
on 22 Dec, 2019 20:17
-
For those of you who wanted an exact number, in the post landing presser the Boeing rep said that the MET was 11 hours off.
-
#249
by
Oberon_Command
on 22 Dec, 2019 20:31
-
I'm sure that there are better ways to build aerospace code these days, even in old companies, but back on the Space Shuttle, we'd go for weeks or even months without doing a full build, because there were so many hard-coded addresses used in ground and crew procedures that publishing an update was a major undertaking. After the code was feature complete, we'd do machine-language patches to fix bugs until we could get them merged into the source code.
That's terrifying. In every other part of the software industry - even video games, which is notoriously behind the curve when it comes to modern processes - we typically have an automated system that takes a snapshot of the current state of the repository and makes a build from it once an hour to see if any regressions slipped through developers' local testing or if two changes to the code interact badly with one another. I've worked at places where the automation ran on every commit.
I can't even imagine how frustrating and time consuming - and expensive - it must have been not to have a working build for months.
-
#250
by
yokem55
on 22 Dec, 2019 21:11
-
For those of you who wanted an exact number, in the post landing presser the Boeing rep said that the MET was 11 hours off.
Not sure how correlated this is - but the Atlas power up event was at ~ T- 11 hours. If I were to bet, the Starliner MET could have come from the Atlas startup uptime.
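The arithmetic behind that bet, as a sketch only: if a clock counts from power-up instead of from liftoff, every MET it reports is off by the gap between the two epochs. The function name and the 31-minute sample time are made up; the 11-hour figure is the one quoted above.

```python
from datetime import timedelta


def met_from_epoch(time_since_liftoff: timedelta,
                   epoch_offset_from_liftoff: timedelta) -> timedelta:
    """MET as seen by a clock whose zero point sits at the given offset from liftoff."""
    return time_since_liftoff - epoch_offset_from_liftoff


liftoff_epoch = timedelta(0)          # clock zeroed at liftoff (intended behavior)
powerup_epoch = timedelta(hours=-11)  # clock zeroed at power-up, ~T-11 hours (the guess above)

now = timedelta(minutes=31)           # hypothetical moment, 31 minutes after liftoff

correct_met = met_from_epoch(now, liftoff_epoch)
wrong_met = met_from_epoch(now, powerup_epoch)

print(correct_met)              # 0:31:00
print(wrong_met)                # 11:31:00
print(wrong_met - correct_met)  # 11:00:00
```

The error is constant regardless of when you sample it, which would be consistent with a flat "11 hours off" figure, but that is all this shows.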
-
#251
by
Roy_H
on 22 Dec, 2019 21:23
-
At first I was alarmed that NASA would consider this flight to be good enough to proceed with crewed flight without docking. But then I looked at it from the safety viewpoint and realized that if the rendezvous and docking failed with crew aboard, they would have to return to earth, but no danger of loss of life, just a mission failure.
-
#252
by
ThomasGadd
on 22 Dec, 2019 21:23
-
Of all the events to have, this is the best.
Nothing was destroyed, and they get tons of data to go through.
They launched and got to orbit, just not the right orbit.
They successfully landed at their intended landing site, just a few days earlier.
For the launch team, unique on-the-job training. When was the last time this Boeing team handled a launch?
-
#253
by
ThomasGadd
on 22 Dec, 2019 21:29
-
At first I was alarmed that NASA would consider this flight to be good enough to proceed with crewed flight without docking. But then I looked at it from the safety viewpoint and realized that if the rendezvous and docking failed with crew aboard, they would have to return to earth, but no danger of loss of life, just a mission failure.
I think if the automated rendezvous and docking failed the crew would have overridden it and manually docked.
-
#254
by
CorvusCorax
on 22 Dec, 2019 22:08
-
At first I was alarmed that NASA would consider this flight to be good enough to proceed with crewed flight without docking. But then I looked at it from the safety viewpoint and realized that if the rendezvous and docking failed with crew aboard, they would have to return to earth, but no danger of loss of life, just a mission failure.
I think if the automated rendezvous and docking failed the crew would have overridden it and manually docked.
Yeah, the autonomous docking can be done manually on a manned Starliner, or you could have an astronaut ready to take over manually if anything goes wrong and test it just before docking. Also, for proximity operations, they could check out maneuvering around a pretend space station while in the wrong orbit.
I think the only thing missing is the close range sensor suite telling them where in relation to the station they are. But with crew that is not an issue, since Starliner has windows and they can fall back to the Mk1 eyeballs and do it during orbital daylight (which they do anyway so the on-station crew could press the abort button) should those sensors fail on a crewed flight.
Really the only thing that could endanger the crew that could not be checked now but could be checked if it had docked is a fatal mechanical or electromechanical design flaw in the docking ring - leading to loss of pressure during the actual docking procedure.
Considering the mechanics were tested on the ground that's incredibly unlikely, but it's the only thing that
1. cannot be tested without actually docking
2. cannot be fixed by the crew and would endanger the crew (assume the capsule cannot be sealed anymore)
If that were to happen, it doesn't necessarily lead to loss of crew, but - assume the worst case
- capsule is attached to station, but not properly
- the capsule lost pressure
- they cannot undock due to mechanical failure
- the forward hatch is blocked by the station docking adapter
- in that case someone from the station would have to do an EVA and rescue the crew through the side hatch, then carry them to the airlock
- then the ISS would have more crew on board than rescue boat seats, so they'd have to send an extra empty Soyuz to provide a ride back
-
#255
by
clongton
on 22 Dec, 2019 22:38
-
Not to be overlooked, as several posters are doing, is the legally binding contractual requirement for Boeing: "The Contractor’s flight test program shall include an uncrewed orbital flight test to the ISS," the document states. And this test should include, "Automated rendezvous and proximity operations, and docking with the ISS, assuming ISS approval." Boeing signed this contract, as did SpaceX, who was required to fulfill this specific requirement with their spacecraft. For Boeing to be allowed to simply ignore this binding requirement would be, well, not helpful. Whether they like it or not, Boeing is required by the language of the contract they signed to demonstrate, with an uncrewed spacecraft, automated rendezvous and proximity operations, and docking with the ISS. Expect Boeing to fight hard to be let out of this requirement, even though SpaceX was required to fulfill it. If the terms of the contract are to be fulfilled, OFT-1 must be reflown and the first crewed flight of Starliner must now be the 3rd flight of that spacecraft. We'll see if NASA has any gonads or caves to Boeing pressure to allow this contractual requirement to be ignored.
-
#256
by
MATTBLAK
on 22 Dec, 2019 22:43
-
"To Boldly Go where Gonads have gone before..."
Truly sorry: couldn't resist it!
-
#257
by
gongora
on 22 Dec, 2019 23:11
-
I wouldn’t fly Starliner again (crewed or uncrewed) without being confident that this isn’t evidence of a much broader failing. It’ll be interesting to see what ASAP etc have to say about this.
In my opinion there should be a lengthy stand down for Starliner until the code is 100% verified, line by line, between NASA and Boeing.
They raked SpaceX over the coals on software, and rightfully so. It should be no different from a contractor with whom they’ve had a much longer relationship, especially in light of Boeing’s recent spate of problems.
It isn't NASA's role to verify, line-by-line, the software in these vehicles. They are not NASA vehicles.
-
#258
by
skater
on 22 Dec, 2019 23:43
-
Here's the post-landing briefing.
Edit for commentary: I was really impressed with the Boeing representative's openness here.
I must admit, Bridenstine has been very engaged with this.
-
#259
by
Patchouli
on 22 Dec, 2019 23:43
-
[EDIT] Adding and responding to Jim's second comment - that's what I was talking about, curious what your thoughts are on that point specifically, "what the spacecraft does with it".
I think even on Dragon some events are handled in a similar way. I remember reading they had to manually command the solar wings to deploy on CRS-2 when the thrusters failed to fire, as normally this event would not happen until after it fired its thrusters to clear the second stage and get into the right attitude.