-
#360
by
CorvusCorax
on 25 Dec, 2019 21:59
-
I am not concerned with "pulling the wrong value" - that's a mistake that happens very easily when you interface with hardware from a different vendor and the terminology and documentation standards differ even slightly.
I am also not too concerned with the bug slipping through QC and HITL tests. Although it's possible that the tests were not as stringent as they should have been, it's entirely possible this slipped through because of the way the simulation EDS boxes were used in the HITL and SITL tests (for example, powered up just before launch as opposed to 11 hours prior, masking the issue by chance).
What's really unsettling IMHO is the effect the wrong clock had, causing a chain of fault propagation leading to
- wrong attitude
- overloaded RCS system
- tons of propellant wasted
- communication outage
- ...
because that speaks of an architecture that lacks fault tolerance as a whole when it comes to wrong data being introduced.
This was triggered by a small thing which is easy to fix, but if the architecture is that fragile, one doesn't dare to imagine what kind of mayhem data corruption from faulty sensors, memory, power fluctuations and similar could cause during a mission.
Remember when Apollo 12 was hit by lightning during a launch upsetting the electronics?
After what happened on this launch I seriously doubt a simple "SCE to Aux" would fix it...
-
#361
by
JEF_300
on 25 Dec, 2019 22:01
-
Someone upthread pointed out that Atlas start-up is about 11 hours before launch, and I recall hearing that Starliner sets its MET based on data from the Atlas. I'd appreciate it if someone could verify both of these facts for me.
If they are both true however, it suggests that the error was simply retrieving the wrong data from Atlas. I don't know how Starliner/Atlas coding works, but I assume the fix would in fact be a simple change in code. Perhaps as simple as changing one variable.
Those facts are correct that I know of, and I don't think anyone disagrees with the assessment that the fix for the MET issue is very likely to be a simple one. The concern is regarding the series of actions taken by the guidance software subsequent to the failure.
Anyone who has written software long enough (in my 25th year myself, yegods) knows how likely it is to fix a bug and then find a different vector to the same issue, or similar issues expressed differently. The guidance software seems rather fragile in how it handled the error condition, which is not what you want in an autonomous system trusted with human lives...
I hope and expect there will be a thorough review of all systems that might be affected similarly before a reflight (which, regardless of what anyone here thinks, will be crewed).
I wasn't trying to make a statement, just clarify the facts
-
#362
by
ulm_atms
on 25 Dec, 2019 22:28
-
With all the comparison of other things/missions...I am having the hardest time accepting no do-over for CST for one very simple issue.
Their craft requires an accurate MET since it seems to be a digital cam timer. So with the system requiring an accurate MET....they have no checking from any other source and basically pull it from Atlas and hope for the best with no way to even verify if what they pulled from Atlas is correct.
All things being equal, that statement above makes me question the thought processes of the engineers for its entire design. This vital piece has no verification/checking.....

Side Bar: I have also been told I don't "imagine" hard enough when trying to find some way something can go wrong so I will try and help with that here

So they fix the wrong field being pulled. They still have absolutely no way to verify that it is correct however. Read correct field. Bit flip. Take two on what the hell is going on by the ground guys.....
Edit: typo
-
#363
by
saliva_sweet
on 26 Dec, 2019 00:18
-
Their craft requires an accurate MET since it seems to be a digital cam timer. So with the system requiring an accurate MET....they have no checking from any other source and basically pull it from Atlas and hope for the best with no way to even verify if what they pulled from Atlas is correct.
1) I think this is good design. If the MET is that important it is correct to have one clock to rule them all. The Atlas clock is proven right. It is the one that got the craft to (almost) orbit. If that clock is wrong it doesn't matter if you can conjure a better one. Your flight is doomed anyway. They pulled the wrong clock from Atlas and it can be fixed. This is why we test.
2) The egg-timer is reliable. Your egg will be done right if the clock is programmed right. It is the simplest approach and I do not agree with people that say the system should have been smarter. Trying to derive a better clock through heuristics from multiple inputs is even more risky and very difficult to do right.
3) I see this as a trial by fire for ASAP. I have been really skeptical of ASAP and their dwelling on theoretical concerns. This scenario is what ASAP was called into existence for. NASA leadership has got the go fever. Real testing has revealed the need for more testing. I do not buy the NASA stance of docking-schmocking at all. Major objectives of the OFT mission (according to my intuition, common sense, preflight communications and contractual agreements) were not accomplished. Going to ISS orbit, meeting the ISS, docking with it, integrated ops, returning from it, all are risky endeavours. And they were not tested. Starliner failed the very second it detached from Atlas and this is not a good look. OFT-2 should be required, I feel.
4) Should Boeing do the IFA? I'm not sure. Of course it would be nice. SpaceX wanted to do it and it is an edge of the envelope test. Always nice to have. But I think there could be many other contingency scenarios that are as plausible as in flight abort at max q. Can you test them all? Should you? If you should would you ever fly? IFA is expensive for Boeing as it would scrap a reusable Starliner.
-
#364
by
LouScheffer
on 26 Dec, 2019 00:36
-
So they fix the wrong field being pulled. They still have absolutely no way to verify that it is correct however.
Remember the Atlas has two EDS systems. I'm guessing they read the time from both and if they agree then they consider it correct. I can't imagine they do no verification at all for such an important quantity - if that turns out to be the case, there is no way astronauts should fly in it until there is complete software review to look for other such missing double checks.
-
#365
by
ulm_atms
on 26 Dec, 2019 01:08
-
So they fix the wrong field being pulled. They still have absolutely no way to verify that it is correct however.
Remember the Atlas has two EDS systems. I'm guessing they read the time from both and if they agree then they consider it correct. I can't imagine they do no verification at all for such an important quantity - if that turns out to be the case, there is no way astronauts should fly in it until there is complete software review to look for other such missing double checks.
I agree. I just find it hard to justify that, with modern timekeeping methods available (GPS, WWVB, ground clock, internal RTC), it had no way of noticing that its MET was 11 hours off, and instead took that MET at its word and went about its job incorrectly.
As saliva said above, egg timers are reliable....as long as it gets set correctly...otherwise the roast still burns

But all being equal...they pulled the wrong time off Atlas...how in the world did that get missed in integration testing? That's the one that causes more ???s
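To make the cross-check idea concrete, here is a minimal sketch of comparing a reported MET against one derived from an independent clock. Everything in it is an assumption for illustration: the names `check_met` and `MetCheck`, the 5-second tolerance, and the idea that liftoff time is known in UTC. Nothing here reflects how Starliner's software actually works.

```python
from dataclasses import dataclass


@dataclass
class MetCheck:
    reported_met_s: float     # MET as handed over by the launch vehicle
    independent_met_s: float  # MET derived from an independent clock
    plausible: bool           # True if the two agree within tolerance


def check_met(reported_met_s: float, liftoff_utc_s: float,
              now_utc_s: float, tolerance_s: float = 5.0) -> MetCheck:
    """Hypothetical sanity check: flag a reported MET that disagrees
    with (current UTC - liftoff UTC) by more than tolerance_s."""
    independent_met_s = now_utc_s - liftoff_utc_s
    plausible = abs(reported_met_s - independent_met_s) <= tolerance_s
    return MetCheck(reported_met_s, independent_met_s, plausible)


# 15 minutes after liftoff, with a reported MET that started counting 11 hours early
result = check_met(reported_met_s=11 * 3600 + 900, liftoff_utc_s=0.0, now_utc_s=900.0)
print(result.plausible)  # False
```

The sketch only flags the disagreement. What the vehicle should do next (hold, ask the ground, prefer one source) is a separate design decision.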
-
#366
by
Comga
on 26 Dec, 2019 01:09
-
Follow this first principles engineering concept:
Once released from the second stage on a suborbital trajectory, Starliner needs a prograde burn to achieve that vaunted “stable orbit” i.e. above the atmosphere.
Starliner has a forward looking camera.
The camera can see stars (unless it’s facing the Sun.)
A modest processor can centroid the stars and determine the direction of motion between multiple images.
The direction of motion indicates the pointing angle with respect to the velocity.
The limit of the star tracks indicates the angle above the horizon.
The curvature of the limit could even give an estimate of altitude.
This would be sufficient to set the spacecraft orientation for the orbit insertion burn (and antenna pointing to TDRS.)
I have designed space-borne camera based metrology and star tracking. It can be easily good to a milliradian and run on an FPGA. Space qualified, flight proven camera systems like this are available for $50k. (I can’t say where they are currently flying, but they are.)
If this solution differed dramatically from the baseline burn plan, say by 90 degrees, it would be time to raise an alarm, at the least.
Didn’t Bridenstine tout “dissimilar redundancy”?
Of course Starliner’s star trackers can do that and more, like give vehicle absolute pointing to 5 or 10 microradians, so it should have been apparent that the burn wasn’t set up right. How the system reacted like it did is astounding.
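As a rough sketch of the "raise an alarm if it differs dramatically" step: compare the planned burn direction with a measured direction of motion and flag a large angle between them. The function names and the 10-degree limit are made up for illustration, and the star-centroiding part that would produce the measured vector is not shown.

```python
import math
from typing import Sequence


def angle_between_deg(a: Sequence[float], b: Sequence[float]) -> float:
    """Angle in degrees between two non-zero 3-vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_theta))


def burn_attitude_alarm(planned_dir: Sequence[float],
                        measured_dir: Sequence[float],
                        limit_deg: float = 10.0) -> bool:
    """Hypothetical check: True if the planned burn direction is more than
    limit_deg away from the independently measured direction of motion."""
    return angle_between_deg(planned_dir, measured_dir) > limit_deg


# Planned burn pointing 90 degrees away from the measured velocity direction
print(burn_attitude_alarm((1.0, 0.0, 0.0), (0.0, 1.0, 0.0)))  # True
```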
-
#367
by
clongton
on 26 Dec, 2019 01:26
-
So they fix the wrong field being pulled. They still have absolutely no way to verify that it is correct however.
Remember the Atlas has two EDS systems. I'm guessing they read the time from both and if they agree then they consider it correct. I can't imagine they do no verification at all for such an important quantity - if that turns out to be the case, there is no way astronauts should fly in it until there is complete software review to look for other such missing double checks.
I agree. I just find it hard to justify that, with modern timekeeping methods available (GPS, WWVB, ground clock, internal RTC), it had no way of noticing that its MET was 11 hours off, and instead took that MET at its word and went about its job incorrectly.
As saliva said above, egg timers are reliable....as long as it gets set correctly...otherwise the roast still burns
But all being equal...they pulled the wrong time off Atlas...how in the world did that get missed in integration testing? That's the one that causes more
Ok so Atlas has 2 EDSs? Do they include a MET timer? I'm not up on the EDS circuitry so I can't identify where the MET timers are but several people have mentioned that there are 2 EDSs onboard. So I assume they are "implying" that the MET timer is part of those Emergency Detection Systems. I do not know if that is correct or not. Perhaps someone more knowledgeable of the Atlas EDS can clarify that question.
So going with that -(admitted)- assumption, I would have to conclude that BOTH systems informed Starliner the identical incorrect MET. Otherwise they wouldn't have agreed - PROVIDED that the Starliner flight avionics compares both to see if there is a disagreement. Is that a fair conclusion? Or is the MET circuitry located elsewhere and mention of the EDS in the posts above is uninformed? Where is the MET circuitry?
-
#368
by
ulm_atms
on 26 Dec, 2019 01:28
-
Follow this first principles engineering concept:
Once released from the second stage on a suborbital trajectory, Starliner needs a prograde burn to achieve that vaunted “stable orbit” i.e. above the atmosphere.
Starliner has a forward looking camera.
The camera can see stars (unless it’s facing the Sun.)
A modest processor can centroid the stars and determine the direction of motion between multiple images.
The direction of motion indicates the pointing angle with respect to the velocity.
The limit of the star tracks indicates the angle above the horizon.
The curvature of the limit could even give an estimate of altitude.
This would be sufficient to set the spacecraft orientation for the orbit insertion burn (and antenna pointing to TDRS.)
I have designed space-borne camera based metrology and star tracking. It can be easily good to a milliradian and run on an FPGA. Space qualified, flight proven camera systems like this are available for $50k. (I can’t say where they are currently flying, but they are.)
If this solution differed dramatically from the baseline burn plan, say by 90 degrees, it would be time to raise an alarm, at the least.
Didn’t Bridenstine tout “dissimilar redundancy”?
Of course Starliner’s star trackers can do that and more, like give vehicle absolute pointing to 5 or 10 microradians, so it should have been apparent that the burn wasn’t set up right. How the system reacted like it did is astounding.
I thought the star tracker on Starliner wasn't available right after separation before the last burn to a stable orbit?
Pondering thought...
With everything that happened....I wonder if NASA will ask that Starliner be placed in a stable orbit at separation from now on? That would have bought them the time to straighten the situation out, and Starliner would probably be at the ISS currently. I know they wanted the suborbital separation to reduce prop weight, but I don't see a downside overall.
-
#369
by
Comga
on 26 Dec, 2019 01:39
-
So they fix the wrong field being pulled. They still have absolutely no way to verify that it is correct however.
Remember the Atlas has two EDS systems. I'm guessing they read the time from both and if they agree then they consider it correct. I can't imagine they do no verification at all for such an important quantity - if that turns out to be the case, there is no way astronauts should fly in it until there is complete software review to look for other such missing double checks.
I agree. I just find it hard to justify that, with modern timekeeping methods available (GPS, WWVB, ground clock, internal RTC), it had no way of noticing that its MET was 11 hours off, and instead took that MET at its word and went about its job incorrectly.
As saliva said above, egg timers are reliable....as long as it gets set correctly...otherwise the roast still burns
But all being equal...they pulled the wrong time off Atlas...how in the world did that get missed in integration testing? That's the one that causes more
Ok so Atlas has 2 EDSs? Do they include a MET timer? I'm not up on the EDS circuitry so I can't identify where the MET timers are but several people have mentioned that there are 2 EDSs onboard. So I assume they are "implying" that the MET timer is part of those Emergency Detection Systems. I do not know if that is correct or not. Perhaps someone more knowledgeable of the Atlas EDS can clarify that question.
So going with that -(admitted)- assumption, I would have to conclude that BOTH systems informed Starliner the identical incorrect MET. Otherwise they wouldn't have agreed - PROVIDED that the Starliner flight avionics compares both to see if there is a disagreement. Is that a fair conclusion? Or is the MET circuitry located elsewhere and mention of the EDS in the posts above is uninformed? Where is the MET circuitry?
Yes
One of the Atlas videos shown pre-launch touted the two identical EDS boxes on opposite sides of the rocket.
It’s a good probability that the software, which was described as fetching the MET, pulled the same incorrect value from each side.
They matched.
Success
Or so it thought.
-
#370
by
clongton
on 26 Dec, 2019 01:45
-
So they fix the wrong field being pulled. They still have absolutely no way to verify that it is correct however.
Remember the Atlas has two EDS systems. I'm guessing they read the time from both and if they agree then they consider it correct. I can't imagine they do no verification at all for such an important quantity - if that turns out to be the case, there is no way astronauts should fly in it until there is complete software review to look for other such missing double checks.
I agree. I just find it hard to justify that, with modern timekeeping methods available (GPS, WWVB, ground clock, internal RTC), it had no way of noticing that its MET was 11 hours off, and instead took that MET at its word and went about its job incorrectly.
As saliva said above, egg timers are reliable....as long as it gets set correctly...otherwise the roast still burns
But all being equal...they pulled the wrong time off Atlas...how in the world did that get missed in integration testing? That's the one that causes more
Ok so Atlas has 2 EDSs? Do they include a MET timer? I'm not up on the EDS circuitry so I can't identify where the MET timers are but several people have mentioned that there are 2 EDSs onboard. So I assume they are "implying" that the MET timer is part of those Emergency Detection Systems. I do not know if that is correct or not. Perhaps someone more knowledgeable of the Atlas EDS can clarify that question.
So going with that -(admitted)- assumption, I would have to conclude that BOTH systems informed Starliner the identical incorrect MET. Otherwise they wouldn't have agreed - PROVIDED that the Starliner flight avionics compares both to see if there is a disagreement. Is that a fair conclusion? Or is the MET circuitry located elsewhere and mention of the EDS in the posts above is uninformed? Where is the MET circuitry?
Yes
One of the Atlas videos shown pre-launch touted the two identical EDS boxes on opposite sides of the rocket.
It’s a good probability that the software, which was described as fetching the MET, pulled the same incorrect value from each side.
They matched.
Success
Or so it thought.
And yet we know that the spacecraft itself has a MET timer in the avionics, triggered by a break wire through the umbilical disconnects. So it polled the 3 values, found that the 2 from the LV agreed, voted, and disregarded the (correct) MET that the spacecraft had reported? Wow.
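For anyone who wants to see why that scenario would be so nasty, here is a minimal sketch of a plain majority vote over three MET values. It is purely hypothetical (the function `vote_met` and the 1-second tolerance are my own, and nobody here knows whether the flight software votes at all), but it shows how two sources sharing the same error outvote the one correct value.

```python
from typing import Optional, Sequence


def vote_met(values: Sequence[float], tolerance_s: float = 1.0) -> Optional[float]:
    """Hypothetical majority vote: return the mean of the first group of
    values that agree within tolerance_s and form a strict majority,
    or None if no such group exists."""
    for candidate in values:
        agreeing = [v for v in values if abs(v - candidate) <= tolerance_s]
        if len(agreeing) * 2 > len(values):
            return sum(agreeing) / len(agreeing)
    return None


# Two launch-vehicle values carrying the same 11-hour offset, one correct onboard value
lv_a = 11 * 3600 + 900.0
lv_b = 11 * 3600 + 900.0
onboard = 900.0
print(vote_met([lv_a, lv_b, onboard]))  # 40500.0, the wrong value wins
```

Voting protects against independent failures. It does nothing against a common-mode error, which is what fetching the same wrong field from two identical boxes would be.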
-
#371
by
Comga
on 26 Dec, 2019 02:08
-
Yes
One of the Atlas videos shown pre-launch touted the two identical EDS boxes on opposite sides of the rocket.
It’s a good probability that the software, which was described as fetching the MET, pulled the same incorrect value from each side.
They matched.
Success
Or so it thought.
And yet we know that the spacecraft itself has a MET timer in the avionics, triggered by a break wire through the umbilical disconnects. So it polled the 3 values, found that the 2 from the LV agreed, voted, and disregarded the (correct) MET that the spacecraft had reported? Wow.
We don’t know that.
It could have compared them and “voted”. (Oof!)
It could have not compared them. (Damn!)
It could have used both values for different purposes, contributing to the pandemonium that was seen. (Wow!)
There are probably other possibilities.
None are good because the flight wasn’t good.
-
#372
by
GWH
on 26 Dec, 2019 08:19
-
With regard to the next flight being crewed, I think a proper discussion on risk assessment is needed. Based on what we know and what wasn't demonstrated because docking didn't happen, what risks will be introduced, and how severe will they be?
Further does having crew on board increase or decrease risk to human life and damage to ISS?
Purely regarding the undemonstrated rendezvous and docking, I would think having crew on board Starliner and humans in the loop would significantly reduce risk. Based on this first flight I don't see any reason to not move to the first crewed flight, except...
The real risk IMO that needs to be resolved is the process that led to the past 2 Starliner anomalies, and the systemic problems they're indicative of.
The issue to resolve is the testing procedures and QC. An additional flight test isn't likely to fix those. At best one more problem may get uncovered. But it won't resolve the root cause, which is the overarching process. Review that first, and don't focus on the red herring of untested rendezvous and docking. Test pilots can mitigate the risk on that.
-
#373
by
launchwatcher
on 26 Dec, 2019 16:55
-
It could have used both values for different purposes, contributing to the pandemonium that was seen. (Wow!)
pure speculation here:
I'm wondering if part of the pandemonium was due to two different entities within the control software disagreeing over (and alternately exercising control over) the desired orientation -- prograde-ish for the orbital insertion burn vs. tail to the sun for charging. Intended behavior is of course that one is in charge for the insertion burn and the other during ordinary coasting flight, and never both at the same time...
Now that could end up wasting a lot of fuel in a few minutes...
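Continuing the speculation, the usual way to rule this out is a single arbiter that picks exactly one attitude command per control cycle, so two modes can never take turns driving the thrusters. A minimal sketch, with every name (`AttitudeRequest`, `arbitrate`, the mode labels and priorities) invented for illustration:

```python
from typing import NamedTuple, Optional, Sequence


class AttitudeRequest(NamedTuple):
    source: str    # which mode is asking, e.g. "insertion_burn"
    priority: int  # higher number wins
    target: str    # requested orientation, kept as a label here


def arbitrate(requests: Sequence[AttitudeRequest]) -> Optional[AttitudeRequest]:
    """Hypothetical arbiter: return the single highest-priority request
    for this control cycle, or None if nobody is asking."""
    if not requests:
        return None
    return max(requests, key=lambda r: r.priority)


cycle = [
    AttitudeRequest("coast_charging", priority=1, target="tail_to_sun"),
    AttitudeRequest("insertion_burn", priority=5, target="prograde"),
]
print(arbitrate(cycle).source)  # insertion_burn
```

Of course this only helps if both modes agree on what time it is in the first place, so that only the right one is asking.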
-
#374
by
Lars-J
on 26 Dec, 2019 19:20
-
2) The egg-timer is reliable. Your egg will be done right if the clock is programmed right. It is the simplest approach and I do not agree with people that say the system should have been smarter. Trying to derive a better clock through heuristics from multiple inputs is even more risky and very difficult to do right.
Relying on an egg-timer is NOT reliable. What if your system crashes and has to reboot? With cosmic particles impacting computer systems in LEO, this can (and will) happen even if your system has ZERO bugs. Which is why you either A) harden your hardware against radiation to the extreme, or B) use multiple redundant computers (usually 3), or C) both.
Relying on a single source is very risky.
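A tiny illustration of option B, assuming three copies of the same value that fail independently: a bitwise two-out-of-three vote repairs a single flipped bit in any one copy. This is a generic textbook technique, not a description of Starliner's avionics, and the function name is my own.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 vote: each output bit takes the value that
    at least two of the three inputs agree on."""
    return (a & b) | (a & c) | (b & c)


good = 0b1011_0010
upset = good ^ (1 << 5)  # one copy suffers a single bit flip
print(majority_vote(good, upset, good) == good)  # True
```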
-
#375
by
Coastal Ron
on 26 Dec, 2019 21:27
-
2) The egg-timer is reliable. Your egg will be done right if the clock is programmed right. It is the simplest approach and I do not agree with people that say the system should have been smarter. Trying to derive a better clock through heuristics from multiple inputs is even more risky and very difficult to do right.
Relying on an egg-timer is NOT reliable. What if your system crashes and has to reboot? With cosmic particles impacting computer systems in LEO, this can (and will) happen even if your system has ZERO bugs. Which is why you either A) harden your hardware against radiation to the extreme, or B) use multiple redundant computers (usually 3), or C) both.
Relying on a single source is very risky.
Indeed!
As we all dig deeper into this issue, and I refresh my public knowledge about available systems, I keep wondering why my iPhone has more ability to determine its time and location in space than the hugely expensive Boeing Starliner?
Doing a quick search turns up lots of inertial navigation systems (INS) that are inexpensive for spacecraft uses.
Here is one for $9,126 that does:
Spatial Dual is a ruggedized miniature GPS aided inertial navigation system and AHRS that provides accurate position, velocity, acceleration and orientation under the most demanding conditions. It combines temperature calibrated accelerometers, gyroscopes, magnetometers and a pressure sensor with a dual antenna RTK GNSS receiver. These are coupled in a sophisticated fusion algorithm to deliver accurate and reliable navigation and orientation.
One of these units would know that the launch vehicle has left the ground, would know if it was following the right launch profile, and would certainly KNOW WHAT TIME IT IS.
I fear Boeing will convince NASA to NOT make the results of the investigation public, since it will likely show that Boeing designed the system inadequately - and Boeing doesn't need more attention paid to their inability to design fool-proof flight systems...
-
#376
by
Comga
on 26 Dec, 2019 21:30
-
2) The egg-timer is reliable. Your egg will be done right if the clock is programmed right. It is the simplest approach and I do not agree with people that say the system should have been smarter. Trying to derive a better clock through heuristics from multiple inputs is even more risky and very difficult to do right.
Relying on an egg-timer is NOT reliable. What if your system crashes and has to reboot? With cosmic particles impacting computer systems in LEO, this can (and will) happen even if your system has ZERO bugs. Which is why you either A) harden your hardware against radiation to the extreme, or B) use multiple redundant computers (usually 3), or C) both.
Relying on a single source is very risky.
No kitchen egg timer has ever undergone a Single Event Upset from radiation.
-
#377
by
Lars-J
on 26 Dec, 2019 23:37
-
2) The egg-timer is reliable. Your egg will be done right if the clock is programmed right. It is the simplest approach and I do not agree with people that say the system should have been smarter. Trying to derive a better clock through heuristics from multiple inputs is even more risky and very difficult to do right.
Relying on an egg-timer is NOT reliable. What if your system crashes and has to reboot? With cosmic particles impacting computer systems in LEO, this can (and will) happen even if your system has ZERO bugs. Which is why you either A) harden your hardware against radiation to the extreme, or B) use multiple redundant computers (usually 3), or C) both.
Relying on a single source is very risky.
No kitchen egg timer has ever undergone a Single Event Upset from radiation.
I had assumed all of us in this thread knew that “egg timer” was not to be taken literally, but you proved me wrong. A software egg timer in space would certainly be susceptible without proper precautions having been taken.
But maybe you can explain your position? You believe a software timer cannot undergo a “single event upset”?
-
#378
by
freddo411
on 26 Dec, 2019 23:43
-
Questions for the group:
If you review the launch video, in the control room, you can see a computer graphic of the SL which appears to be displaying both position and engine firings in real time.
* If communication was not established with the spacecraft, how was that data available? Is/was there a one way downlink, or something else?
-
#379
by
clongton
on 26 Dec, 2019 23:49
-
Questions for the group:
If you review the launch video, in the control room, you can see a computer graphic of the SL which appears to be displaying both position and engine firings in real time.
* If communication was not established with the spacecraft, how was that data available? Is/was there a one way downlink, or something else?
I believe that was after communications were reestablished. I assume that what was being shown was the spacecraft's orientation, attitude and thruster firings, with the avionics doing what it wanted to do, in accordance with the erroneous MET, while the ground controllers were attempting to override it and gain manual control. The display wasn't up very long before the broadcast was ended. I don't have corroboration for that. It just seems to me, knowing what we now do, that that is most likely what was going on at that time.