-
#1100
by
mn
on 12 Feb, 2020 14:48
-
And if I recall, Boeing took pictures of the pins and didn't bother to look at them until after the failure.
For a while I would come on here and do my best to answer questions from an engineering perspective. I was called "Boeing PR", etc. I rarely come on here any more because many of you know better than anyone/everyone else.
But this....this is just out of order. The pin "buy" is a "TQN" buy, meaning the tech believed it was installed, Quality believed it was installed correctly, and NASA believed the pin was installed correctly. Not every closeout photo is reviewed, especially when three different and independent people believed it to be installed correctly.
That said, this was a definite lesson learned, and the procedure and process have been revised.
Are you serious? Belief belongs in a church and not in an aerospace company! I work with aircraft and that word is pretty much taboo.
Bob, have you tightened the wing attachment bolts? Yeah, I believe so...
If QA at Boeing can’t establish beyond doubt whether a shackle pin has been tightened to spec and just ‘believes’ that it has, the process is entirely pointless. A design that cannot be properly QA-checked during and after assembly is a bad design, and a QA team just going through the motions is beyond the pale. If the part cannot be inspected properly after assembly, at least review the photographs!
The software issues that also hadn’t been picked up pre flight just highlight that their QA is basically flawed. Excusing that is not Boeing PR, it’s Boeing hypocrisy IMHO...
I’ve got to agree with that.
Having a critical safety system that can be misconfigured by a simple human error, and whose misconfiguration is not obviously revealed, is simply unacceptable in most high-hazard industries. Multiple human checks buy nothing. I have witnessed multiple critical checks failing first hand, simply because everyone assumed it was being thoroughly checked multiple times by other people.
What I would like to know is if this was a tried and tested shackle design that was simply too venerable for anybody to bother doing a meaningful failure modes analysis, or if it was a design that had been ‘improved’ in some well meaning but ultimately misguided manner (I’ve seen that a few times too).
What is most worrying through all this is that I can still sense a reluctance to admit that anything has really gone wrong. Safety culture is built on embracing failure and learning from it.
And what kind of process changes can you do to fix humans not checking what they are supposed to check. Adding more checks won't fix that, might even make it worse.
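To put rough numbers on why a fourth identical check buys almost nothing (every rate below is an assumption for illustration, not real QA data), a simple beta-factor sketch of common-cause failure makes the point:

```python
# Purely illustrative: assumed miss rates, not real QA figures.

def escape_independent(miss_rate, checkers):
    """P(defect escapes) if every check truly fails independently."""
    return miss_rate ** checkers

def escape_common_cause(miss_rate, checkers, beta):
    """Beta-factor sketch: with probability `beta` a miss is a shared-cause
    miss ("surely someone else looked properly") that defeats every checker
    at once; otherwise the checks fail independently."""
    return beta * miss_rate + (1 - beta) * miss_rate ** checkers

# Three checkers who each miss 1% of defects:
print(escape_independent(0.01, 3))        # about one in a million, if independent
print(escape_common_cause(0.01, 3, 0.5))  # roughly one in 200 with shared causes

# A fourth identical checker barely moves the common-cause number:
print(escape_common_cause(0.01, 4, 0.5))
```

Under independence, three 1% checkers let one defect in a million escape; let half the misses share a cause and the escape rate collapses to roughly one in 200, and a fourth identical checker changes almost nothing. Diversity of checks, not count, is what buys margin.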
-
#1101
by
Tulse
on 12 Feb, 2020 15:01
-
Are there any more details on the comms issue(s) during the test? I really have a hard time understanding the claim that cell phone tower signals had a critical impact on comms. If so, why haven't other missions experienced this, and why wasn't it anticipated? (And how the heck do cell towers impact communications with a craft 100 km away, above them?)
-
#1102
by
abaddon
on 12 Feb, 2020 15:02
-
And what kind of process changes can you do to fix humans not checking what they are supposed to check. Adding more checks won't fix that, might even make it worse.
Humans who are supposed to check and sign off on things should be held accountable when they don't do it. If there are no repercussions, their reactions will likely be yours, namely to shrug and say "better luck next time".
I sure hope I'm not flying on planes built, serviced, or piloted by humans who think like you do. Fortunately there are organizations like the NTSB that protect against that kind of attitude.
-
#1103
by
Captain Crutch
on 12 Feb, 2020 15:40
-
The issues with Starliner could be excusable if they were isolated, one-time flukes. However, the fact that QA on these capsules seems to be constantly lacking is very concerning. What I find concerning is this: the failure would likely have been catastrophic, and yet some of those involved suggest to the media that it's not a big deal and nothing worthy of attention (claiming it's not an issue because they, fortunately, caught it at the last possible second before it became one); they have even misrepresented the effects of issues (Ars Technica revealed that, when they contacted Boeing, a representative told them the software issue would not have affected the capsule during reentry). It's also appalling that NASA said they want to be transparent and hold everyone equally accountable, then shortly after went back on that assertion by keeping an issue that could have resulted in LOV secret for over a month.
The day Crew Dragon blew up on LZ-1 there was media coverage of the issue, and the press were quickly informed of as much as was known at the time. Yet a month went by without a peep after Boeing was hours away from a catastrophic failure. I don't see how anyone could deny that this is unequal treatment, and it is unacceptable that Starliner's issues are this common and this severe.
SpaceX, a relatively new company, sent a spacecraft to the ISS and returned it to Earth on their first try. Boeing has been around the block and (to my knowledge) played some part in some of the first-ever spacecraft rendezvous, yet they just barely brought themselves into orbit on their first shot with their new vehicle. I understand issues happen, but it seems like every televised test of this vehicle has experienced a painfully obvious issue.
-
#1104
by
punder
on 12 Feb, 2020 16:08
-
For a while I would come on here and do my best to answer questions from an engineering perspective.
Perhaps. Along with calling people "trolls" for expressing reasonable and mildly stated opinions.
If you can't see the dissonance between "give us nearly twice as much money for being mighty Boeing" and "oops sorry we forgot to connect the parachute" then perhaps there truly is some bias involved.
Mods, do your will...
-
#1105
by
kdhilliard
on 12 Feb, 2020 16:50
-
...
The day Crew Dragon blew up on LZ-1 there was media coverage on the issue, and press were informed of as much as was known at the time rapidly. A month goes by without a peep after Boeing was hours away from a catastrophic failure. I don't see how anyone could deny that this is unequal treatment and unacceptable that Starliner's issues are this common and this severe.
...
Had the CST-100 valve mapping problem resulted in a large cloud of NTO visible to beachgoers, followed by the leak of a dramatic video, then it would have been promptly discussed by NASA and Boeing. The Commercial Crew Development process is largely opaque, and this isn't the first serious issue revealed long after the fact via an ASAP public meeting. There may be differences in transparency between the organizations when it comes to revealing setbacks, but it's like comparing two different shades of welding glass.
-
#1106
by
Tommyboy
on 12 Feb, 2020 18:35
-
And what kind of process changes can you do to fix humans not checking what they are supposed to check. Adding more checks won't fix that, might even make it worse.
Humans who are supposed to check and sign off on things should be held accountable when they don't do it. If there are no repercussions, their reactions will likely be yours, namely to shrug and say "better luck next time".
I sure hope I'm not flying on planes built, serviced, or piloted by humans who think like you do. Fortunately there are organizations like the NTSB that protect against that kind of attitude.
What? Mn is asking exactly the right question; If 3 people doing the same checks didn't catch this error, what safety margin would adding a fourth person doing the exact same checks add? Why do we not start by analyzing what events/assumptions/process/culture led to the engineer not attaching the pin properly, and why did none of the three checking people catch the error?
Major failures are a long process with multiple dependencies, not singular events.
-
#1107
by
mn
on 12 Feb, 2020 18:53
-
And what kind of process changes can you do to fix humans not checking what they are supposed to check. Adding more checks won't fix that, might even make it worse.
Humans who are supposed to check and sign off on things should be held accountable when they don't do it. If there are no repercussions, their reactions will likely be yours, namely to shrug and say "better luck next time".
I sure hope I'm not flying on planes built, serviced, or piloted by humans who think like you do. Fortunately there are organizations like the NTSB that protect against that kind of attitude.
A: Lucky for you I don't build planes. (But Boeing does, do you fly?)
B: I was not suggesting that I think this is OK. I was asking how can Boeing fix their problem with their people not doing what they were supposed to do.
This is a serious question: Boeing and NASA are both saying that they will fix the processes that allowed these issues to go undetected, and I'm wondering what process changes you make to fix people problems.
-
#1108
by
DistantTemple
on 12 Feb, 2020 18:55
-
Well, if the same checks aren't working, find an alternative way to check: get external auditing, contract others to oversee... ask NASA to take a look! :-) Make up some new testing regimes, like testing the thruster firings on the ground; if not live, then instrument it and see which valves open, etc. If no one signs it off, it doesn't fly. If it is signed off fraudulently, that's a felony in the space business, and jail results.
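Instrumenting the valves and diffing what actually actuates against the spec is essentially an automated mapping check. A minimal sketch of the idea (every name and table below is hypothetical, not Starliner's actual software):

```python
# Hypothetical example: compare the flight software's thruster->valve
# mapping against the independently maintained propulsion spec, rather
# than trusting human review alone. All names and tables are made up.

SPEC_MAPPING = {              # what the spec says (hypothetical)
    "RCS_FWD_1": "valve_A1",
    "RCS_FWD_2": "valve_A2",
    "RCS_AFT_1": "valve_B1",
}

FLIGHT_SW_MAPPING = {         # what the software actually commands
    "RCS_FWD_1": "valve_A1",
    "RCS_FWD_2": "valve_B1",  # two transposed entries -- exactly the kind
    "RCS_AFT_1": "valve_A2",  # of bug a visual review can miss
}

def mapping_errors(spec, actual):
    """Return each thruster whose commanded valve differs from the spec,
    mapped to (expected, actual)."""
    return {t: (spec[t], actual.get(t))
            for t in spec if actual.get(t) != spec[t]}

errors = mapping_errors(SPEC_MAPPING, FLIGHT_SW_MAPPING)
print(errors)   # flags the RCS_FWD_2 / RCS_AFT_1 transposition
```

On the real vehicle the "actual" side would come from instrumented ground firings, but the principle is the same: the check is mechanical, so it doesn't depend on a human noticing a transposition.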
-
#1109
by
abaddon
on 12 Feb, 2020 19:15
-
B: I was not suggesting that I think this is OK. I was asking how can Boeing fix their problem with their people not doing what they were supposed to do.
Like everyone else in the world does - hold them accountable. Since you seem to like bold.
If there were three individuals that separately signed off on the pin being in place when it wasn't, that's not a "process" issue, that's a "people" issue. The "process" part of the issue is Boeing hand-waving the result as if it's not a problem, which reinforces the kind of culture where people - shockingly - assume it's not a big deal. Whether those folks got the same message or not internally is a separate question.
-
#1110
by
ThatOldJanxSpirit
on 12 Feb, 2020 19:40
-
B: I was not suggesting that I think this is OK. I was asking how can Boeing fix their problem with their people not doing what they were supposed to do.
Like everyone else in the world does - hold them accountable. Since you seem to like bold.
If there were three individuals that separately signed off on the pin being in place when it wasn't, that's not a "process" issue, that's a "people" issue. The "process" part of the issue is Boeing hand-waving the result as if it's not a problem, which reinforces the kind of culture where people - shockingly - assume it's not a big deal. Whether those folks got the same message or not internally is a separate question.
No, it's an engineering issue and a process issue: the engineering should not have permitted an unrevealed failure, and a process that relies on multiple identical human checks is fundamentally a bad process.
It's easy to blame the people, but a good systems approach takes into account that people are fallible.
-
#1111
by
mn
on 12 Feb, 2020 19:51
-
B: I was not suggesting that I think this is OK. I was asking how can Boeing fix their problem with their people not doing what they were supposed to do.
Like everyone else in the world does - hold them accountable. Since you seem to like bold.
If there were three individuals that separately signed off on the pin being in place when it wasn't, that's not a "process" issue, that's a "people" issue. The "process" part of the issue is Boeing hand-waving the result as if it's not a problem, which reinforces the kind of culture where people - shockingly - assume it's not a big deal. Whether those folks got the same message or not internally is a separate question.
No, it's an engineering issue and a process issue: the engineering should not have permitted an unrevealed failure, and a process that relies on multiple identical human checks is fundamentally a bad process.
It's easy to blame the people, but a good systems approach takes into account that people are fallible.
This makes sense, but the question is so "how"?
-
#1112
by
meberbs
on 12 Feb, 2020 20:07
-
As stated above, this sounds like the problem is more with people than process. In that case, the solution is to fix the people. In general, that is known as "training." Here it seems there is a broader culture problem, so fixing it will take more than training: it will take demonstrated consequences for improper performance. Of course, you can't always fix "people"; in that case the solution is to get new people (i.e. fire the people who don't do their jobs and hire new ones; turnover also does something to fix a broken culture).
While I agree that some of the statements coming from Boeing make it sound questionable how seriously they are taking it, the statements from NASA make it sound like NASA has recognized the fundamental issues and will require Boeing to address them properly. Public statements from Boeing that minimize the issue won't get them past NASA without proper remediation.
-
#1113
by
Chasm
on 12 Feb, 2020 22:08
-
B: I was not suggesting that I think this is OK. I was asking how can Boeing fix their problem with their people not doing what they were supposed to do.
Like everyone else in the world does - hold them accountable. Since you seem to like bold.
If there were three individuals that separately signed off on the pin being in place when it wasn't, that's not a "process" issue, that's a "people" issue. The "process" part of the issue is Boeing hand-waving the result as if it's not a problem, which reinforces the kind of culture where people - shockingly - assume it's not a big deal. Whether those folks got the same message or not internally is a separate question.
It can be dead easy to make Boeing care...
1st step: Ask Boeing when the next flight can be done.
3rd step: Do the flight on that date.
Starliner has 7 seats. This is a fully automated test flight with no required crew interactions.
2nd step: Strap the Boeing CEO and a random 6 (out of 13) Members of the Board of Directors into it...
If the test flight fails ask Boeing again when the next flight can be done. This time with the replacement CEO and the Board members that did not fly, last seat goes to the CFO...
Unfortunately this is the real world, so the best idea for NASA is to keep NASA astronauts off the test flight and let the commercial test pilots earn their wage. Or, maybe more politically important, not to do a ride share until the thing has actually worked more than once. (Applies to both Starliner and Dragon.)
But there is a seat shortage and as far as NASA politics is concerned everything is fine...
-
#1114
by
Lemurion
on 12 Feb, 2020 23:24
-
I know I've mentioned it before, but I see Boeing's repeated attempts to categorize failures as successes as a big problem. If you're not willing to admit that things went wrong, you have no impetus to fix them.
One "positive" of the DM-1 pad explosion was that there was no way to spin it as a success. Everyone knew it was a failure, and one that had to be taken seriously. Minimizing the issues with Starliner shifts the focus away from correcting the problems and onto managing perceptions.
-
#1115
by
xyv
on 13 Feb, 2020 01:35
-
What? Mn is asking exactly the right question; If 3 people doing the same checks didn't catch this error, what safety margin would adding a fourth person doing the exact same checks add? Why do we not start by analyzing what events/assumptions/process/culture led to the engineer not attaching the pin properly, and why did none of the three checking people catch the error?
Major failures are a long process with multiple dependencies, not singular events.
An engineer shouldn't be touching flight hardware. Engineers want to do the next thing. Quality techs and assemblers have a mental makeup that takes pride in doing the 50th one like the first one and those are the personality types you want matched to that type of work.
This is a serious question: Boeing and NASA are both saying that they will fix the processes that allowed these issues to go undetected and I'm wondering what process changes do you make to fix people problems.
You don't fix a people problem with a process change; you fix it with a culture change. There's a great article going around about how McDac took over Boeing's engineering-centric culture and replaced it with a focus on the shareholder. This cannot be fixed quickly. While I never thought that highly of Martin Marietta, when I toured the Michoud facility that was building the first Shuttle tanks I heard that "...every day starts with a safety talk...they have to know that it starts at the top..." This was then seamlessly melded into a discussion about quality control; they were basically empowering the workforce to control safety and quality at the ground floor. I fear Boeing has a long road ahead.
-
#1116
by
SWGlassPit
on 13 Feb, 2020 14:30
-
B: I was not suggesting that I think this is OK. I was asking how can Boeing fix their problem with their people not doing what they were supposed to do.
Like everyone else in the world does - hold them accountable. Since you seem to like bold.
If there were three individuals that separately signed off on the pin being in place when it wasn't, that's not a "process" issue, that's a "people" issue. The "process" part of the issue is Boeing hand-waving the result as if it's not a problem, which reinforces the kind of culture where people - shockingly - assume it's not a big deal. Whether those folks got the same message or not internally is a separate question.
It can be dead easy to make Boeing care...
1st step: Ask Boeing when the next flight can be done.
3rd step: Do the flight on that date.
Starliner has 7 seats. This is a fully automated test flight with no required crew interactions.
2nd step: Strap the Boeing CEO and a random 6 (out of 13) Members of the Board of Directors into it...
If the test flight fails ask Boeing again when the next flight can be done. This time with the replacement CEO and the Board members that did not fly, last seat goes to the CFO...
Unfortunately this is the real world, so the best idea for NASA is to keep NASA astronauts off the test flight and let the commercial test pilots earn their wage. Or, maybe more politically important, not to do a ride share until the thing has actually worked more than once. (Applies to both Starliner and Dragon.)
But there is a seat shortage and as far as NASA politics is concerned everything is fine...
Just stop. There is no value in comments like this. Take your reality TV fantasy somewhere else.
-
#1117
by
niwax
on 13 Feb, 2020 15:39
-
What? Mn is asking exactly the right question; If 3 people doing the same checks didn't catch this error, what safety margin would adding a fourth person doing the exact same checks add? Why do we not start by analyzing what events/assumptions/process/culture led to the engineer not attaching the pin properly, and why did none of the three checking people catch the error?
Major failures are a long process with multiple dependencies, not singular events.
An engineer shouldn't be touching flight hardware. Engineers want to do the next thing. Quality techs and assemblers have a mental makeup that takes pride in doing the 50th one like the first one and those are the personality types you want matched to that type of work.
This is a serious question: Boeing and NASA are both saying that they will fix the processes that allowed these issues to go undetected and I'm wondering what process changes do you make to fix people problems.
You don't fix a people problem with a process change; you fix it with a culture change. There's a great article going around about how McDac took over Boeing's engineering-centric culture and replaced it with a focus on the shareholder. This cannot be fixed quickly. While I never thought that highly of Martin Marietta, when I toured the Michoud facility that was building the first Shuttle tanks I heard that "...every day starts with a safety talk...they have to know that it starts at the top..." This was then seamlessly melded into a discussion about quality control; they were basically empowering the workforce to control safety and quality at the ground floor. I fear Boeing has a long road ahead.
I really recommend some reading around the Shuttle to understand how these situations arise and how they can be solved. Wayne Hale is always recommended, of course, but even the Rogers Commission report is quite readable and enlightening, if morbid. Ben Rich has a section in his book about the actions they took at Lockheed when people used wrong components or left stuff in fuel tanks.
Another ironically interesting one is Joe Sutter. He wrote at length about his work on the Rogers Commission in his book on developing the 747. I wonder what he would say if he saw Boeing today.
-
#1118
by
Paul Moir
on 13 Feb, 2020 16:27
-
... Humans who are supposed to check and sign off on things should be held accountable when they don't do it. ...
This is entirely the wrong course to take when you find a missed check in your fault tree. If you fail to ask yourself why someone didn't do something they were supposed to do, you will never fix your process.
In my experience, human checks have always had a sizeable fault rate. It is somewhat controllable with culture; the best I've seen is how pilots do them. I think checks so often fail to control a process because the activity is fundamentally inhuman: it is very difficult to see what you don't expect to see.
-
#1119
by
gaballard
on 13 Feb, 2020 18:58
-
... Humans who are supposed to check and sign off on things should be held accountable when they don't do it. ...
This is entirely the wrong course to take when you find a missed check in your fault tree. If you fail to ask yourself why someone didn't do something they were supposed to do, you will never fix your process.
In my experience, human checks have always had a sizeable fault rate. It is somewhat controllable with culture; the best I've seen is how pilots do them. I think checks so often fail to control a process because the activity is fundamentally inhuman: it is very difficult to see what you don't expect to see.
That might be the case if the Starliner issues were edge cases that were hard to predict. They're not. They forgot to put a pin in a parachute. That's like forgetting to check that your wheels are attached before driving your car; it's such a basic and incredibly important thing to miss. They pulled the wrong time. They mapped thrusters wrong. None of these things should have slipped through. QA processes and checks have worked for plenty of other companies and organizations throughout history; Boeing's not an exception by coincidence.