-
#120
by
litton4
on 02 Jun, 2024 12:35
-
Aren't some of these issues down to the fact that this is a human-rated mission and they are applying tighter limits?
They constantly mention (about every 3 seconds) that they are taking super-duper-extra special care, as this is the first time people have launched on Atlas V.
I think Tory mentioned that normally for the chattering valve, they would just recycle to stop it.
Since there were people going up there, they didn't.
Would they have launched with 2/3 ground sequencers running (or re-cycled to attempt to get all 3 back on line)?
(I know this was an instantaneous window, so the latter may not apply.)
-
#121
by
SWGlassPit
on 02 Jun, 2024 12:45
-
So this Shuttle Flight Director is now with ULA?
That's Richard Jones, the lead flight director. You'll recall him being lead on OFT-1 -- watch for shots of him at the center of the room as the team watched the spacecraft out of control and not responding to commands, firing its thrusters and using up the prop (ignore the commentary; they were blithely sticking to the script).
He's the deputy program manager now.
-
#122
by
Slothman
on 02 Jun, 2024 13:23
-
I think Tory mentioned that normally for the chattering valve, they would just recycle to stop it.
Since there were people going up there, they didn't.
Not even recycle, just manually cycle the valve.
-
#123
by
Ken the Bin
on 02 Jun, 2024 13:25
-
Can someone please remind me (I can't find the post that originally dealt with this question, sorry)? The Starliner stack was rolled to the pad on 26/04/24 (date the correct way round), IIRC. I'm sure I saw a post saying that some consumables (solid fuels?) are only certified for a defined period after rollout, and that these limits would be reached on 07/06/24. So if we get a couple of weather delays (hopefully nothing more serious), does the stack need to return to the barn for a refurb around that time anyway?
The latest NGA Rocket Launching marine hazard notice (which I posted over in the launch topic) has hazard periods for June 5, June 6, and June 9. So I assume that the limited life issues (batteries is what I primarily recall) are good at least through June 9.
-
#124
by
cplchanb
on 02 Jun, 2024 13:45
-
So the launcher computer hardware is broken now? Still doesn't make it any better that they once again have to delay the launch by almost a week.
-
#125
by
meekGee
on 02 Jun, 2024 13:57
-
It's like climate vs. weather.
Fiber is important as a rule, as are other things, but you can go on any non-poisonous diet for a week or two and nothing bad will happen.
If the effect is linear then being on a bad diet for a couple weeks would hurt life expectancy by something like a day. Do you have scientific evidence that a linear model isn't correct here?
Edit: if life expectancy is 35 years, increasing the probability of loss of crew from 1 in 270 to 1 in 264.4 costs one day of life expectancy. So an extra day of life expectancy is not huge, but it could make the difference between barely meeting the LOC requirement and barely missing it.
It's even better than linear. The body repairs itself.
Chronic bad diet will add up and shorten your life, but there's no mechanism by which your body remembers 20 years later that week you went to Germany and only ate local food.
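The loss-of-crew arithmetic in the quoted edit can be reproduced under the same linear model the post discusses: expected life lost equals the increase in LOC probability times the remaining lifetime. The 35-year figure comes from the post; treating it as remaining lifetime and using 365.25 days per year are assumptions of this sketch.

```python
def expected_days_lost(p_old: float, p_new: float, remaining_years: float) -> float:
    """Expected life lost (days) when loss-of-crew probability rises from
    p_old to p_new, assuming the simple linear model:
        loss = (p_new - p_old) * remaining lifetime
    """
    remaining_days = remaining_years * 365.25  # assumption: Julian year
    return (p_new - p_old) * remaining_days


if __name__ == "__main__":
    loss = expected_days_lost(1 / 270, 1 / 264.4, 35)
    print(f"{loss:.2f} days")
```

The result comes out at roughly one day, matching the figure in the quoted edit.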
-
#126
by
DanClemmensen
on 02 Jun, 2024 14:25
-
So the launcher computer hardware is broken now? Still doesn't make it any better that they once again have to delay the launch by almost a week.
Assuming I understood the press conference, I heard Tory(?) explain that the affected computer was GSE that was physically located in a rack in an enclosure near the rocket, and could not be safely accessed until the rocket was completely safed. I interpreted this to mean that they intended to access it to fix the problem. I speculate they intend to replace any failed or problematic field-replaceable units, probably circuit cards. The next-day launch was precluded because 24 hours is not enough time to safe the rocket, access the enclosure, replace hardware, test the replacement, and restart the countdown from a safe condition.
I do not know why they needed a 4-day delay instead of a 2-day delay in this case, but we have seen this sort of pattern in other launches, and it was related to management of the cryogenics in the GSE.
-
#127
by
laszlo
on 02 Jun, 2024 14:40
-
I have architected successful redundant systems. You need three (or more) cards if you need to support hot swap. In the GLS case, for reasons that are unclear, they need all three cards to be functional: the scrub occurred because one card failed, and hot swap is not used. If all three elements must function, then (other things being equal) one card is more reliable than three cards, simply because traces on a PC card are more reliable than connectors. In earlier decades, the circuitry was more likely to be so large that it needed to be on multiple cards anyway, but depending on the function it is now often feasible to use a single card.
But not if the redundancy criteria require physical isolation to prevent a single catastrophic event from taking down all three circuits at once. Don't know if that's the case here, but that is a valid reason to have the systems on separate cards for redundancy.
As for why all three needed to be functional, at some point during the broadcast I thought I heard the speaker say that the sequencer was required to go into the terminal count with all 3 cards functioning because, among other things, they blow the hold-down bolts. If that's actually true and not just a bad memory on my part, then it makes sense to require all three to be good so that in case of a double failure the third can still blow the bolts.
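The two readings of the "all three cards" rule above can be compared with a simple k-out-of-n model. If all three must work, the set behaves as a series system (reliability r^3, worse than a single card). If only one of three must survive to fire the hold-down bolts, it is far better. This sketch assumes independent, identical cards, and the per-card reliability of 0.99 is purely hypothetical. It ignores the trace-versus-connector and common-cause effects raised above.

```python
from math import comb


def k_of_n_reliability(k: int, n: int, r: float) -> float:
    """Probability that at least k of n independent, identical units work,
    each with reliability r (binomial sum)."""
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    r = 0.99  # hypothetical per-card reliability
    print("single card        :", r)
    print("3-of-3 (all needed):", round(k_of_n_reliability(3, 3, r), 6))
    print("2-of-3 (voting)    :", round(k_of_n_reliability(2, 3, r), 6))
    print("1-of-3 (any one)   :", round(k_of_n_reliability(1, 3, r), 6))
```

Requiring all three at the start of terminal count trades a lower "GO" probability for a much higher chance that at least one card survives to do the critical job.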
-
#128
by
Yellowstone10
on 02 Jun, 2024 14:51
-
I do not know why they needed a 4-day delay instead of a 2-day delay in this case
They had announced at the press conference on Friday that their backup launch opportunities were on the 2nd, 5th, and 6th. I don't know what rules out the 3rd and 4th, but it was not due to the nature of the GSE failure.
-
#129
by
Lee Jay
on 02 Jun, 2024 15:18
-
Two came up correctly, one came up slow and triggered a red-line.
I used to be a tech support engineer at Intel and I saw a lot of problems like this in the field. As I recall, every single one of them I ever saw turned out to be a power supply issue - low voltage or noisy power supply voltage due to improper grounding or failed/undersized capacitors.
Perhaps I should mention why I said this.
If they go out and start replacing circuit cards, discover that "everything works fine now", but don't understand that the underlying reason for the issue is the power situation, it can lead to an unreliable system. For example, maybe the reason "everything works fine now" is that one card is slightly more tolerant of out-of-spec power supply voltages than another, but that doesn't mean the power supply is in spec. It just makes the system marginal. A slight change in the EMI state (say, from fluids flowing nearby, the atmospheric electric potential, or even the current power grid voltage) can change the behavior of the system. You don't want that; you want it to always work regardless of these other things.
If I were them, I'd go out there and test the system with an o-scope hooked up to the power supply rails of the cards in question and record a trace of what the power supply looks like while the cards are coming up. Of course, I don't know the system so I'm only stating my past experience with this sort of failure. Computers don't just "come up slow". Given the right supply of power, they will come up within a window of microseconds or tens of microseconds every time. I've tested this on many types of systems.
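As a rough illustration of the scope-trace check suggested above, here is a minimal sketch that post-processes a captured rail voltage exported as evenly spaced samples. It lists every out-of-tolerance sample and reports when the rail finally settles. The function names, the ±5% tolerance and the synthetic 3.3 V trace are all assumptions; the real limits would come from the cards' power specifications.

```python
from typing import List, Optional, Sequence


def out_of_tolerance_times(samples: Sequence[float], dt: float,
                           nominal: float, tol: float = 0.05) -> List[float]:
    """Times (s) of every sample outside nominal * (1 +/- tol)."""
    lo, hi = nominal * (1 - tol), nominal * (1 + tol)
    return [i * dt for i, v in enumerate(samples) if not lo <= v <= hi]


def settle_time(samples: Sequence[float], dt: float,
                nominal: float, tol: float = 0.05) -> Optional[float]:
    """Time (s) after which the rail stays in tolerance for the rest of the
    record, or None if the last sample is still out of tolerance."""
    lo, hi = nominal * (1 - tol), nominal * (1 + tol)
    bad = [i for i, v in enumerate(samples) if not lo <= v <= hi]
    if not bad:
        return 0.0
    if bad[-1] == len(samples) - 1:
        return None
    return (bad[-1] + 1) * dt


if __name__ == "__main__":
    # Synthetic 3.3 V rail sampled every 1 us: ramps up, reaches tolerance,
    # then sags once (index 7) before recovering.
    trace = [0.0, 0.66, 1.32, 1.98, 2.64, 3.3, 3.3, 3.0, 3.3, 3.3]
    print(out_of_tolerance_times(trace, 1e-6, 3.3))
    print(settle_time(trace, 1e-6, 3.3))
```

Reporting the last excursion rather than the first in-tolerance sample is deliberate: a rail that reaches regulation and then dips again is exactly the marginal behavior described above.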
-
#130
by
Jim
on 02 Jun, 2024 15:26
-
I too was struck more by the suit fan issue than the launch control system issue. Happily it was easy to resolve it quickly. That said, transitions to internal power are common in space launch. Does the industry have 'standard' test cases for that?
No standard test. Batteries are seldom put on line. Some flight ones are installed at the launch site.
-
#131
by
duh
on 02 Jun, 2024 16:43
-
Two came up correctly, one came up slow and triggered a red-line.
I used to be a tech support engineer at Intel and I saw a lot of problems like this in the field. As I recall, every single one of them I ever saw turned out to be a power supply issue - low voltage or noisy power supply voltage due to improper grounding or failed/undersized capacitors.
Thought about just clicking the "like" button on this one, but then started to remember a number of things that fit into this general category. Some strike me as a bit mind-boggling, so I will try not to bore anyone with details, but here are some that come to mind:
A breadboard tested at the required high-temperature limit of 85 deg C worked fine. For whatever reason (think about that for a while, if you will), instead of accepting this, I had the oven temperature raised. Yes, it failed at 86 deg C. Yes, the circuit was not designed correctly (guess who was guilty; I do not like fingers pointed at me even when it is done correctly). Redesigning a FET circuit eliminated the problem.
In another case, a circuit switched on and drew a large but acceptable amount of power. The power supply voltage remained within tolerance, but the dv/dt of the supply adversely affected a circuit demodulating an analog signal. The problem took a major effort to isolate; the solution was relatively simple.
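The dv/dt point is worth making concrete: a rail can sit inside its voltage tolerance the whole time and still slew fast enough to upset a sensitive circuit. A minimal sketch, assuming evenly spaced samples; the 5 V rail, ±5% band and 100 mV step are hypothetical numbers, not data from the anecdote.

```python
from typing import Sequence


def max_slew(samples: Sequence[float], dt: float) -> float:
    """Largest |dv/dt| (V/s) between consecutive samples spaced dt seconds apart."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return max(abs(b - a) / dt for a, b in zip(samples, samples[1:]))


def within_tolerance(samples: Sequence[float], nominal: float, tol: float) -> bool:
    """True if every sample lies inside nominal * (1 +/- tol)."""
    return all(nominal * (1 - tol) <= v <= nominal * (1 + tol) for v in samples)


if __name__ == "__main__":
    # Hypothetical 5 V rail sampled every 1 us: a 100 mV step when a load
    # switches on. Every sample is inside +/-5%, yet the edge is 0.1 V/us.
    rail = [5.00, 5.00, 4.90, 4.90, 4.90, 5.00]
    print(within_tolerance(rail, 5.0, 0.05))
    print(f"{max_slew(rail, 1e-6):.3g} V/s")
```

A pure min/max tolerance check would pass this trace; only a rate-of-change check flags it.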
Another involved a flip-flop in a digital circuit with two signals that were asynchronous to each other. How far the input signal's timing drifted relative to the flip-flop's clock was not supposed to matter. The problem came when the input arrived close to the clock edge that triggered the flip-flop. It was supposedly a "don't care" situation: if the flip-flop changed state, fine; if not, also fine, because it would change on the next clock. Supposedly the output of the flip-flop is only ever in one state or the other. Haha. The output burped: the Q output put out a pulse that exceeded the logic 0 level but did not reach the logic 1 level. This signal was sent to two different places. Ironically, the problem showed up rarely (days apart) and presented two different signatures. The fix was simple: add a second flip-flop so that the setup time would not be violated on the second flip-flop. Interestingly, this is referred to as metastability, and some flip-flops are supposedly guaranteed not to have this problem.
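Why a second flip-flop helps can be sketched with the commonly used first-order synchronizer model, MTBF = exp(t_r/τ) / (T_w · f_clk · f_data), where t_r is the time allowed for a metastable output to resolve, τ is the device's resolution time constant and T_w its capture window. Adding a stage buys roughly one extra clock period of t_r, and MTBF grows exponentially with it. Every parameter value below is hypothetical; real τ and T_w come from device characterization data.

```python
import math


def synchronizer_mtbf(t_resolve: float, tau: float, t_window: float,
                      f_clk: float, f_data: float) -> float:
    """First-order metastability MTBF in seconds:
    exp(t_resolve / tau) / (t_window * f_clk * f_data)."""
    return math.exp(t_resolve / tau) / (t_window * f_clk * f_data)


if __name__ == "__main__":
    # All device and clock parameters below are hypothetical.
    tau, t_w = 50e-12, 50e-12        # resolution time constant, capture window
    f_clk, f_data = 100e6, 1e6       # 100 MHz clock, 1 MHz async input toggles
    one_ff = synchronizer_mtbf(0.5e-9, tau, t_w, f_clk, f_data)  # little slack
    two_ff = synchronizer_mtbf(9.0e-9, tau, t_w, f_clk, f_data)  # ~1 extra period
    print(f"single flip-flop: {one_ff:.3g} s")
    print(f"two-flop synchronizer: {two_ff:.3g} s")
```

The model also explains why such failures look random and rare: the failure rate depends on how often the asynchronous input happens to land inside a tiny window around the clock edge.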
Another problem involved a supposedly flight-proven oscillator circuit. Interestingly, in one case the oscillator (a simple single-IC SSI design) took 8 seconds for its output voltage to develop. Let's just say the circuit in production had never been subjected to a satisfactory worst-case design analysis before its initial release.
Perhaps I should stop now. I'm just trying to show that there is information supporting the quoted material. (OK, somebody may be muttering "why didn't he just press the like button instead of boring us with all this trivia?")
Perhaps I should mention why I said this.
If they go out and start replacing circuit cards, discover that "everything works fine now", but don't understand that the underlying reason for the issue is the power situation, it can lead to an unreliable system. For example, maybe the reason "everything works fine now" is that one card is slightly more tolerant of out-of-spec power supply voltages than another, but that doesn't mean the power supply is in spec. It just makes the system marginal. A slight change in the EMI state (say, from fluids flowing nearby, the atmospheric electric potential, or even the current power grid voltage) can change the behavior of the system. You don't want that; you want it to always work regardless of these other things.
If I were them, I'd go out there and test the system with an o-scope hooked up to the power supply rails of the cards in question and record a trace of what the power supply looks like while the cards are coming up. Of course, I don't know the system so I'm only stating my past experience with this sort of failure. Computers don't just "come up slow". Given the right supply of power, they will come up within a window of microseconds or tens of microseconds every time. I've tested this on many types of systems.
-
#132
by
joek
on 02 Jun, 2024 17:04
-
...
If I were them, I'd go out there and test the system with an o-scope hooked up to the power supply rails of the cards in question and record a trace of what the power supply looks like while the cards are coming up. Of course, I don't know the system so I'm only stating my past experience with this sort of failure. Computers don't just "come up slow". Given the right supply of power, they will come up within a window of microseconds or tens of microseconds every time. I've tested this on many types of systems.
Plenty of room for speculation, but if I were them, I'd be looking at the diagnostic logs, which I hope and expect will tell the story. This is a system, not just a CPU, and I expect there is a series of initialization, test, and validation steps. Those can take anywhere from milliseconds to minutes for the entire system.
If any of those initialization steps hung or was delayed by excessive retries, that might explain the adjective "slow". For example, it may have taken too long for the system to see what it expected to see on an I/O interface, which could point to a problem with the interface or with whatever the interface is connected to.
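To illustrate how retries alone can make an otherwise working bring-up "slow", here is a sketch of a sequential initialization with per-step retries, using simulated time so it runs instantly. The step names, costs and retry counts are hypothetical and are not taken from the actual ground launch sequencer.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class InitStep:
    name: str
    attempt_cost_s: float            # simulated time per attempt
    probe: Callable[[int], bool]     # attempt index -> did the check pass?
    max_retries: int = 5


def bring_up(steps: List[InitStep]) -> float:
    """Run steps in order, retrying each failed probe; return total simulated
    time. Raises RuntimeError if a step exhausts its retries."""
    elapsed = 0.0
    for step in steps:
        for attempt in range(step.max_retries + 1):
            elapsed += step.attempt_cost_s
            if step.probe(attempt):
                break
        else:
            raise RuntimeError(f"{step.name}: no response after {step.max_retries} retries")
    return elapsed


if __name__ == "__main__":
    healthy = [
        InitStep("self_test", 0.5, lambda attempt: True),
        InitStep("io_link", 2.0, lambda attempt: True),
        InitStep("config_load", 1.0, lambda attempt: True),
    ]
    # Same sequence, but the I/O link only answers on the 4th attempt.
    flaky = [healthy[0], InitStep("io_link", 2.0, lambda attempt: attempt >= 3), healthy[2]]
    print(bring_up(healthy), bring_up(flaky))  # 3.5 9.5
```

Nothing in the flaky run actually fails; it just takes almost three times as long, which is the kind of signature a come-up-time redline would catch.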
-
#133
by
Remes
on 02 Jun, 2024 17:21
-
Tory said they have occasionally had similar launch sequencer issues with previous Atlas launches. Not common, but not unheard of (just like with the valve issue back in May).
I find this attitude concerning, verging on unprofessional. If you have a known problem that bites you a few percent of the time,
He didn't say "a few percent of the time". Every piece of electronics can die at any time. It can be dead on arrival, or it can die after 2 days, 20 days or 20 years. That's the bathtub curve: the failure rate might be low, but it is never zero.
Tory commented on the card in the press conference after the scrub. At that point they didn't know whether it was a defect or what the real cause was, just that, as with everything else, they have also had defects in that part of the system, but nothing significant. Totally legitimate from my point of view.
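The bathtub curve mentioned above is often modeled as the sum of three hazard rates: a decreasing infant-mortality term, a constant random-failure term, and an increasing wear-out term. A minimal sketch using Weibull hazards; the shape and scale parameters (in hypothetical hours) are illustrative only. The constant term is what keeps the failure rate from ever reaching zero.

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Weibull hazard h(t) = (beta / eta) * (t / eta) ** (beta - 1), for t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1)


def bathtub_hazard(t: float,
                   infant: tuple = (0.5, 1_000.0),      # beta < 1: falling rate
                   random_rate: float = 1e-5,           # constant floor
                   wearout: tuple = (5.0, 100_000.0)) -> float:  # beta > 1: rising
    """Failures per hour at age t hours (all parameters hypothetical)."""
    return weibull_hazard(t, *infant) + random_rate + weibull_hazard(t, *wearout)


if __name__ == "__main__":
    for hours in (1, 100, 10_000, 200_000):
        print(f"{hours:>7} h: {bathtub_hazard(hours):.2e} per hour")
```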
-
#134
by
joek
on 02 Jun, 2024 17:53
-
This might help clarify the GLS issue a bit:
To assure triple redundancy, “each of those three big racks—those three big computers—do a health check and monitor to see that those [ground launch sequencer] cards came up when they were commanded to … and begin doing their job,” ULA CEO Tory Bruno told reporters after the scrub. “Two came up normally, and the third was slow to come up. And that tripped a redline that created an automatic hold.”
...
The leading suspect is a hardware problem or a problem with the network communication among the three computers. “We won't really know until we get physical access and troubleshoot that one rack that has this one card that came up slow,” Bruno said.
...
“We have seen card failures in the past,” Bruno noted, “But they are relatively rare. This is the 100th flight of an Atlas V, and we haven’t had more than a handful of these over the years. So it’s not a common occurrence, but neither is it something that’s unheard of.”
Edit: spoke too soon...
In short, we don't yet know if it is a "card" issue or something else (chassis, intra-rack, inter-rack, ...).
P.S. My bet is it's I/O related, whether on- or off-card. All those PITA interfaces and connectors.
From the update thread...
Repair is complete (replacement of power chassis).
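Bruno's description (each of the three racks checks that its GLS card came up when commanded, and a slow card trips a redline and an automatic hold) can be sketched as a simple all-must-pass check. The rack labels and the 2-second redline are hypothetical; the real limit and logic were not given.

```python
from typing import Dict, Optional

REDLINE_S = 2.0  # hypothetical come-up time limit, seconds


def check_sequencer_cards(come_up_s: Dict[str, Optional[float]],
                          redline_s: float = REDLINE_S) -> str:
    """Return "GO" only if every rack's card came up within the redline.

    come_up_s maps a rack label to the time its card took to report healthy
    after being commanded, or None if it never reported.
    """
    late = [rack for rack, t in sorted(come_up_s.items())
            if t is None or t > redline_s]
    return "HOLD: " + ", ".join(late) if late else "GO"


if __name__ == "__main__":
    print(check_sequencer_cards({"A": 0.8, "B": 0.9, "C": 0.7}))  # GO
    print(check_sequencer_cards({"A": 0.8, "B": 0.9, "C": 3.1}))  # HOLD: C
```

A check like this cannot distinguish a bad card from a bad power chassis or link; it only reports which rack was late, which is why physical troubleshooting was needed.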
-
#135
by
dglow
on 02 Jun, 2024 18:24
-
-
#136
by
Jorge
on 02 Jun, 2024 18:39
-
Two came up correctly, one came up slow and triggered a red-line.
I used to be a tech support engineer at Intel and I saw a lot of problems like this in the field. As I recall, every single one of them I ever saw turned out to be a power supply issue - low voltage or noisy power supply voltage due to improper grounding or failed/undersized capacitors.
Thought about just clicking the "like" button on this one, but then started to remember a number of things that fit into this general category. Some strike me as a bit mind-boggling, so I will try not to bore anyone with details, but here are some that come to mind:
A breadboard tested at the required high-temperature limit of 85 deg C worked fine. For whatever reason (think about that for a while, if you will), instead of accepting this, I had the oven temperature raised. Yes, it failed at 86 deg C. Yes, the circuit was not designed correctly (guess who was guilty; I do not like fingers pointed at me even when it is done correctly). Redesigning a FET circuit eliminated the problem.
In another case, a circuit switched on and drew a large but acceptable amount of power. The power supply voltage remained within tolerance, but the dv/dt of the supply adversely affected a circuit demodulating an analog signal. The problem took a major effort to isolate; the solution was relatively simple.
Another involved a flip-flop in a digital circuit with two signals that were asynchronous to each other. How far the input signal's timing drifted relative to the flip-flop's clock was not supposed to matter. The problem came when the input arrived close to the clock edge that triggered the flip-flop. It was supposedly a "don't care" situation: if the flip-flop changed state, fine; if not, also fine, because it would change on the next clock. Supposedly the output of the flip-flop is only ever in one state or the other. Haha. The output burped: the Q output put out a pulse that exceeded the logic 0 level but did not reach the logic 1 level. This signal was sent to two different places. Ironically, the problem showed up rarely (days apart) and presented two different signatures. The fix was simple: add a second flip-flop so that the setup time would not be violated on the second flip-flop. Interestingly, this is referred to as metastability, and some flip-flops are supposedly guaranteed not to have this problem.
Another problem involved a supposedly flight-proven oscillator circuit. Interestingly, in one case the oscillator (a simple single-IC SSI design) took 8 seconds for its output voltage to develop. Let's just say the circuit in production had never been subjected to a satisfactory worst-case design analysis before its initial release.
Perhaps I should stop now. I'm just trying to show that there is information supporting the quoted material. (OK, somebody may be muttering "why didn't he just press the like button instead of boring us with all this trivia?")
Perhaps I should mention why I said this.
If they go out and start replacing circuit cards, discover that "everything works fine now", but don't understand that the underlying reason for the issue is the power situation, it can lead to an unreliable system. For example, maybe the reason "everything works fine now" is that one card is slightly more tolerant of out-of-spec power supply voltages than another, but that doesn't mean the power supply is in spec. It just makes the system marginal. A slight change in the EMI state (say, from fluids flowing nearby, the atmospheric electric potential, or even the current power grid voltage) can change the behavior of the system. You don't want that; you want it to always work regardless of these other things.
If I were them, I'd go out there and test the system with an o-scope hooked up to the power supply rails of the cards in question and record a trace of what the power supply looks like while the cards are coming up. Of course, I don't know the system so I'm only stating my past experience with this sort of failure. Computers don't just "come up slow". Given the right supply of power, they will come up within a window of microseconds or tens of microseconds every time. I've tested this on many types of systems.
duh, your quoting is broken in this post. As a result, it looks like you didn't write anything and all your words are attributed to Lee Jay.
-
#137
by
yoram
on 02 Jun, 2024 18:57
-
Does anyone know what kind of computer failed? Like IBM 4 PI, maybe?
Atlas is an old rocket, so it's likely an ancient computer too. I hope it's not something they have to keep alive with parts from eBay.
-
#138
by
zoey
on 02 Jun, 2024 19:07
-
Does anyone know what kind of computer failed? Like IBM 4 PI, maybe?
Atlas is an old rocket, so it's likely an ancient computer too. I hope it's not something they have to keep alive with parts from eBay.
In another CFT-related thread, Tony(?) said, IIRC, that they had spares of everything ready to go if something goes out.
-
#139
by
meekGee
on 02 Jun, 2024 19:33
-
This might help clarify the GLS issue a bit:
To assure triple redundancy, “each of those three big racks—those three big computers—do a health check and monitor to see that those [ground launch sequencer] cards came up when they were commanded to … and begin doing their job,” ULA CEO Tory Bruno told reporters after the scrub. “Two came up normally, and the third was slow to come up. And that tripped a redline that created an automatic hold.”
...
The leading suspect is a hardware problem or a problem with the network communication among the three computers. “We won't really know until we get physical access and troubleshoot that one rack that has this one card that came up slow,” Bruno said.
...
“We have seen card failures in the past,” Bruno noted, “But they are relatively rare. This is the 100th flight of an Atlas V, and we haven’t had more than a handful of these over the years. So it’s not a common occurrence, but neither is it something that’s unheard of.”
Edit: spoke too soon...
In short, we don't yet know if it is a "card" issue or something else (chassis, intra-rack, inter-rack, ...).
p.s. My bet is I/O related, whether on- or off-card. All those PITA interfaces and connectors.
From the update thread...
Repair is complete (replacement of power chassis).
If it's the 100th flight and they've ONLY (?) seen a handful of these... Handful being more than 2 and less than 10?
If a system fails 5% of the time, it's beyond unreliable; it's broken. Would you accept that from your car or your TV? You'd declare it a lemon and ship it back. Why is this much more expensive system any different?
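The "handful" arithmetic can be made explicit. Suppose, as this post does, that the events are counted per flight over 100 flights (Bruno's quote says "over the years", so that is an assumption). Then k events give an observed rate of k/n, and a one-sided Clopper-Pearson bound shows how high the true per-flight rate could plausibly be. A minimal sketch using the binomial CDF and bisection; k = 2, 5 and 9 just span the "more than 2, less than 10" guess.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def upper_bound(k: int, n: int, conf: float = 0.95) -> float:
    """One-sided Clopper-Pearson upper bound on the per-trial probability
    after observing k events in n trials (solved by bisection)."""
    if k >= n:
        return 1.0
    alpha = 1.0 - conf
    lo, hi = k / n, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > alpha:
            lo = mid   # seeing <= k events is still too likely; bound is higher
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    n = 100  # assumed: events counted per flight over 100 flights
    for k in (2, 5, 9):
        print(f"k={k}: observed {k / n:.0%}, 95% upper bound {upper_bound(k, n):.1%}")
```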