Author Topic: A Different Take On Reliability  (Read 26656 times)

Offline Toast

A Different Take On Reliability
« on: 03/29/2016 04:45 am »
Many of you may have seen the Launch Vehicle Reliability Stats on Space Launch Report that are generated by forum member Ed Kyle. Lately, I've been interested in how to improve the overall quality of the predicted success rates for rocket launches.

My Methodology and Rationale:

Before getting into my approach to these issues, I should inform you that I am not a statistician. I enjoy studying statistics as a hobby, but I am not by any stretch of the imagination an expert, and I would gladly welcome feedback from those of you who are. That said, we can move on.

I would like to make some key improvements to the previous predicted rates generated on Space Launch Report, as they are calculated using a method that has a few problems, mostly stemming from unjustified or undisclosed assumptions. Specifically:

First: The method uses an uninformative prior, which is inappropriate here. As has been pointed out in discussions elsewhere, this essentially states that our prior assumption for an unknown rocket's success rate is 50%. This is absurd, especially for rockets with low flight rates, as the prior overwhelms the observed data. For example, with the (questionable) distinction of the Falcon 9 FT as a separate rocket from previous Falcon 9 iterations, the Full Thrust variant has a projected reliability of 75%. This doesn't pass the smell test, so to speak: nobody expects that 5 of the next 20 Falcon 9 launches will fail.
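To make that 50% point concrete, here is a quick sketch (in Python, purely for illustration) of the posterior mean under a uniform Beta(1,1) prior; assuming the Falcon 9 FT had two successes and no failures on the books at the time, it reproduces the implausible 75% figure:

```python
# Posterior mean under a uniform Beta(1, 1) prior -- the "rule of succession".
# With s successes and f failures, the posterior is Beta(1 + s, 1 + f),
# whose mean is (s + 1) / (s + f + 2).
def uniform_prior_mean(successes, failures):
    return (successes + 1) / (successes + failures + 2)

# A rocket with no flight history "starts" at 50% predicted reliability:
print(uniform_prior_mean(0, 0))  # 0.5

# Assuming F9 FT counted 2 successes and 0 failures at the time of writing:
print(uniform_prior_mean(2, 0))  # 0.75
```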
 
Second: The method projects an inappropriate level of precision, giving only a single projected rate rather than an interval of likely rates. We are working with fairly sparse data, and it is very hard to estimate the true reliability of a rocket from a small handful of launches. Only for a few rockets (e.g. Soyuz) do we have a sufficiently large dataset to be relatively confident of the overall reliability.

Third: The methodology for distinguishing rockets is sometimes questionable. This certainly applies to the Falcon 9, which has been broken up into F9 1.0, F9 1.1 and F9 FT, but also applies to others, including the Soyuz 2 and Ariane 5. This problem is compounded by the use of an uninformative prior, which is quite pessimistic in the context of rocket launches. It may seem reasonable to distinguish these variants, as it is fair to expect that (for example) the different upper stages across Ariane 5 variants would not have equal reliability. However, by not grouping similar rockets together we are saying that the prior is more informative than the reliability of related rockets. To use an example, this is stating that for the F9 FT, we expect the reliability to be closer to 50% than to the previously observed reliability of Falcon 9 launches (95%). That's quite clearly wrong: while the F9 FT's reliability may differ from previous iterations, it's undoubtedly closer to 95% than to 50%.

With that out of the way, here is my methodology in approaching this problem:

I would like to make clear, at each stage, what my assumptions are and why I am making them.
First, I cleaned up and modified Ed Kyle’s dataset, giving me this:



I merged many related platforms because (as discussed above) while the reliability of variants of a rocket may not be equal to each other, they are probably more closely correlated to each other than to the prior. In simple terms, I expect (for example) the Soyuz 2a's long-term reliability to be closer to the Soyuz 2b's than to the average of all rockets.

That brings us to the next item: the prior that I am using to draw these conclusions is based on the overall success rate of rocket platforms in active use today. Specifically, the prior I am using is a beta distribution with α = 0.95618 and β = 0.04382. Graphically, we are stating that the probability distribution is as follows, where the x-axis is the reliability (where 1 = 100%) and the y-axis is the probability density:
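As a quick sanity check on those parameters (a sketch in Python; the α and β values are the ones quoted above), note that α + β = 1, so this prior carries the weight of just one pseudo-launch, and its mean is the historical success rate:

```python
# The prior Beta(alpha, beta) has mean alpha / (alpha + beta).
# Here alpha + beta = 1, so the prior carries the weight of a single
# "pseudo-launch", and its mean is the overall historical success rate.
alpha, beta = 0.95618, 0.04382
prior_mean = alpha / (alpha + beta)
prior_weight = alpha + beta

print(round(prior_mean, 5))    # 0.95618
print(round(prior_weight, 5))  # 1.0
```

Because the prior's total weight is only one launch, a platform's own record starts to dominate after just a few flights.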



The assumption here is that, in the absence of platform-specific historical data, a good starting point for rocket reliability is the reliability of previous launches. This is obviously better than starting with 50:50 odds of success or failure, but it does have a few weaknesses. Most notably, this does not prevent predictions of 100% maximum reliability. My decision to use this method stems, in part, from the fact that I am hampered by a lack of good pre-existing datasets. Alternate approaches that I considered include:

Use a prior distribution based on cumulative failures per platform over time (the sequence of launches/failures). This would have the advantage of more accurately reflecting the increased risk in new rockets as well as the continuous improvement that takes place as a platform matures. This is a case where available data is the biggest problem: Space Launch Report has by far the best dataset in the genre, but unfortunately does not denote failure modes in a way that is easily computer-readable when looking at historical data for rocket platforms. Specifically, while the datasets for platform launch reliability denote success/failure/partial failure quite clearly, the platform launch history uses footnotes both for failures and for unique or notable launches. This makes it difficult to automate the construction of a dataset that includes the sequence of launches/failures. Eventually, I may revisit this methodology as I think it could represent a significant improvement.

Use a prior distribution for each rocket manufacturer independently. This would have the advantage of not penalizing highly reliable manufacturers by applying a prior based in part on less reliable manufacturers' success rates (and vice versa). The problem with this approach comes down to the amount of available data. Very few manufacturers have a large enough dataset to allow this methodology, leaving relatively new platforms (such as Antares) with insufficient data to generate a good prior. Further, the benefit of this methodology is questionable, as most manufacturers that do have a large number of launches to their name tend to have fairly similar success rates—even the relatively unreliable Proton has a 90% success rate. Combined with the diluted impact of the prior, this shouldn't significantly penalize successful manufacturers (at the very least, it is a huge improvement over an uninformative prior).


Finally, I took the data on each launch platform and calculated a distribution using binom.bayes in R, with the prior distribution's parameters and the number of successes and failures for each platform as inputs. The results are as follows:
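For readers without R handy, the core of what binom.bayes computes here can be sketched in Python (the conjugate beta-binomial update; binom.bayes additionally returns credible intervals, highest-density ones by default if I recall the package correctly, which I won't reproduce here). The Antares success/failure counts are my assumption, inferred from the figures quoted below:

```python
# Conjugate update: a Beta(a0, b0) prior plus s successes and f failures
# gives a Beta(a0 + s, b0 + f) posterior; its mean is the point estimate.
A0, B0 = 0.95618, 0.04382  # the prior described above

def posterior_mean(successes, failures, a0=A0, b0=B0):
    a, b = a0 + successes, b0 + failures
    return a / (a + b)

# Assuming Antares stood at 4 successes and 1 failure at the time,
# this lands close to the 82.59% mean reported below:
print(round(posterior_mean(4, 1), 3))  # 0.826
```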



That’s a bit hard to parse, though, so let’s go rocket-by-rocket through a few of these.
 
Antares



Reliability: Mean 82.59%, Range 53.83% to ~100%

We’ll start with a case where the data is sparse: with only a handful of Antares launches and a single failure among them, we get a very wide range, and it’s hard to make any firm projections this early in the rocket’s lifespan.

Ariane 5


 
Reliability: Mean 97.62%, Range 94.43% to 99.94%
Moving on, here’s a rocket with a reasonably large launch history, the Ariane 5. Here, we can make predictions with significantly more confidence. The Ariane 5 family has only had two failures in 85 launches, both fairly early in its history.

Atlas 5



Reliability: Mean 98.34%, Range 95.15% to ~100.00%
Another rocket with an excellent and long track record. Atlas 5 has only suffered one complete failure out of 62 launches, so again we can be fairly confident in predicting a high success rate in future launches.

Delta II



Reliability: Mean 98.67%, Range 96.87% to 99.97%
One of the most reliable platforms to date, the Delta II has a good history and a large number of launches. Again, we can strongly predict a high likelihood of success in Delta II’s remaining launches.
 
Delta IV



Reliability: Mean 96.74%, Range 90.53% to ~100.00%
The Delta IV doesn’t have the long track record of its little brother, but still has a strong overall success rate. Due to the smaller amount of data available as well as the single launch failure, the high predicted success rate is accompanied by a relatively long tail.

Falcon 9



Reliability: Mean 95.46%, Range 86.92% to ~100.00%
A rocket that I’m sure many of you are interested in. The Falcon 9 had a strong start, but was hampered by a failure in its 19th launch. We’re still early in the Falcon 9’s career, so we’ve got a fairly large range of probabilities.

Proton M


 
Reliability: Mean 90.64%, Range 85.33% to 95.56%
Proton has a long launch history, but it’s significantly more checkered than the ones we’ve looked at so far. This is the first rocket we’ve looked at where we can confidently place an upper bound, about 96%, on its reliability in future launches.

Soyuz-2


 
Reliability: Mean 90.83%, Range 83.19% to 97.55%
The newcomer with big shoes to fill, the Soyuz 2 has had a rocky start but still has an overall positive record. Still, without significant improvement it will not reach the reliability of its older sibling, the Soyuz-U.

Soyuz-U


 
Reliability: Mean 97.18%, Range 95.97% to 98.31%
Soyuz-U has the longest track record of any rocket to date, and it has been quite successful. As you can see from the Beta Density on the chart, we can have extreme confidence in this prediction, and this is the shortest interval we’ll see.

Super Strypi


 
Reliability: Mean 47.78%, Range 0.00% to 94.07%
For the sake of showing the extremes of the model, we’ll look at the analysis of the Super Strypi, which failed its first launch attempt a few months back. We’ve got the largest interval in our dataset here, leaning towards the lower end of the scale. Despite our prior informing the model that the average rocket enjoys a 95% success rate, the S. Strypi is off to a rough start, and we can’t be very confident about where it will go from here.



That's about all I've got in me for now, it's taken quite a while to put this together so I'll leave it here for now. If you want any of the probability distribution graphs for a platform I didn't include here, shoot me a message and I'll try to get one to you. If you've got feedback, I'd love to hear it. Moving forward, the next step I'd like to take is to overhaul this methodology to reflect the sequence of success/failure. Rockets tend to fail more frequently early in their history, after which they become more and more reliable through continuous improvement, and these distributions do not take that into account. Beyond that, I'd like to delve a bit more deeply into failure modes. For these intervals, I only used complete failures (e.g. RUD, failure to orbit, etc.). A more accurate picture would also incorporate partial failures, such as missing a target orbit or losing a secondary payload. If there's something else you'd like me to address, let me know!

Offline Chris Bergin

Re: A Different Take On Reliability
« Reply #1 on: 03/29/2016 01:39 pm »
I don't have anything useful to add, but only a "like" doesn't do this justice. What a super thread starter. It's not in a viable format for an article, but that has more information than an article. Fine work!

I'm sure some people will have potential refinements, but that's the whole point of a forum, to allow discussion.

Offline DAZ

Re: A Different Take On Reliability
« Reply #2 on: 03/29/2016 11:07 pm »
I very much like what you are trying to do here, as it is an attempt to go beyond just successes and failures.  I have a suggestion for going beyond the simple approach: how about assigning an age to each data point?  Older data points would have less weight than newer ones.  This would take into account how systems seem to fail earlier in their lives and then, as the problems are discovered, become more reliable with time.  And if a system is not becoming more reliable with time, this might be a way to highlight that.  For example, data points that are 20 years old could age to a weight of 0, whereas the most recent launch would have a weight of 100.  You could also use a similar system for weighting partial failures as opposed to total successes or failures.  You would have to be careful about how you weighted your data, as you could skew the results to the point of reaching false conclusions.  In the above example, giving a data point that is 20 years or older a weight of 0 might reflect the fact that very few or none of the people who were working on the rocket at that time are still working on it.  The reliability would then rest on the inherent robustness of the design and the strength of the systems that were set up to build the rocket.
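One minimal way this aging idea could be realized is sketched below (illustrative only; the linear 20-year decay and the example launch history are made up, not real data):

```python
# Age-weighted success rate: each launch gets a weight that decays linearly
# from 1 (just flown) to 0 (20 or more years ago), per the suggestion above.
def weighted_reliability(launches, horizon_years=20.0):
    """launches: list of (years_ago, succeeded) pairs."""
    num = den = 0.0
    for years_ago, succeeded in launches:
        w = max(0.0, 1.0 - years_ago / horizon_years)
        num += w * (1.0 if succeeded else 0.0)
        den += w
    return num / den if den else None

# Hypothetical history: one old failure, three more recent successes.
# The 18-year-old failure barely moves the needle:
history = [(18, False), (10, True), (5, True), (1, True)]
print(round(weighted_reliability(history), 3))  # ~0.957 vs. 0.75 unweighted
```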

Offline Stan-1967

Re: A Different Take On Reliability
« Reply #3 on: 03/29/2016 11:33 pm »
The addition of the Beta Density axis is a very helpful metric.   The conversations around here could be much better informed by overlaying different rockets on the same scaled chart.   This method makes a very informative graphic of differences that just can't be summarized by a % reliability guesstimate.

Offline deaville

Re: A Different Take On Reliability
« Reply #4 on: 04/11/2016 03:05 pm »
I'm not qualified to comment on assessing the reliability of launch vehicles. However, I did use a very simple analogy when teaching to show just how much testing was done to achieve the reliability of the Saturn rockets.

If your car starts on the first try 9 times out of ten, it is 90% reliable. If you have two cars and they both start on the first try 9 times out of ten, then the combined pair has 81% reliability [9x9 over 10x10]. With three cars with the same characteristic, the combined reliability is 72.9% [9x9x9 over 10x10x10].
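That arithmetic generalizes to any chain of components that must all work (a trivial sketch, in Python):

```python
# Series reliability: n independent components, each with reliability r,
# must all work, so the combined reliability is r ** n.
def series_reliability(r, n):
    return r ** n

print(round(series_reliability(0.9, 2), 4))  # 0.81  -- two cars
print(round(series_reliability(0.9, 3), 4))  # 0.729 -- three cars
```

Run in reverse, it also shows why a million-part vehicle needs extreme per-part reliability for the whole stack to have decent odds.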

The Saturn/Apollo complex was made up of several million components and was reckoned to be 99.9% reliable. Now that is mind boggling ..... !

On edit - The vehicle had 5,600,000 parts in 1,500,000 systems and assemblies. If all functioned with a 99.9% reliability factor, there would still be over 5,600 defects or malfunctions on the mission. Some space experts claimed that this was not worth the gamble even though the trip would face fewer unknowns than faced by Columbus.
« Last Edit: 04/17/2016 05:28 am by deaville »
Light travels faster than sound, which is why some people appear bright until they speak.

Offline savuporo

Re: A Different Take On Reliability
« Reply #5 on: 04/18/2016 04:56 am »
Would be awesome to see manned spacecraft side by side here too. Soyuz, Shenzhou, STS, Gemini, Apollo
Orion - the first and only manned not-too-deep-space craft

Offline deaville

Re: A Different Take On Reliability
« Reply #6 on: 04/18/2016 05:30 am »
Would be awesome to see manned spacecraft side by side here too. Soyuz, Shenzhou, STS, Gemini, Apollo

What about Mercury, Vostok and Voskhod?  :-\
Light travels faster than sound, which is why some people appear bright until they speak.

Offline Tev

Re: A Different Take On Reliability
« Reply #7 on: 04/18/2016 11:35 am »
This is a great addition to Ed's great project! I hope he sees it that way too and figures out how to include it in his Space Launch Report. :)

Offline rocx

Re: A Different Take On Reliability
« Reply #8 on: 04/18/2016 01:21 pm »
I think the merging of launcher types, while giving a better result than keeping them completely separate, is throwing away useful data. Right now there are two levels of reliability: all launchers, and launcher type. I think reliability could be tracked on four levels: all launchers, manufacturer, type and version. And aging of data points should also give an improvement in the results, once the right weighting of parameters is found.
Any day with a rocket landing is a fantastic day.

Offline Proponent

Re: A Different Take On Reliability
« Reply #9 on: 04/18/2016 01:49 pm »
How is failure defined?  For instance, yes, there was one catastrophic failure.  But there was a satellite or two that didn't achieve their orbits.  *cough* interesting *cough*

Ideally, one would keep statistics according to multiple definitions of success, such as "complete success," "primary payload delivered as promised" and "payload delivered to a trajectory significantly impairing operations."

Offline Proponent

Re: A Different Take On Reliability
« Reply #10 on: 04/18/2016 02:04 pm »
I think the merging of launcher types, while giving a better result than keeping them completely separate, is throwing away useful data. Right now there are two levels of reliability: all launchers, and launcher type. I think reliability could be tracked on four levels: all launchers, manufacturer, type and version. And aging of data points should also give an improvement in the results, once the right weighting of parameters is found.

Aging is interesting, and not just for new launch vehicles.  One factor cited by Wayne Eleazer in several launch failures is the impending phase-out of a vehicle, resulting in a lack of manpower and attention.

Offline Toast

Re: A Different Take On Reliability
« Reply #11 on: 05/03/2016 09:51 pm »
Hey everybody, haven't had time to work on this much lately but I'm anticipating some free time in a week or two. I'll be updating all the numbers with the launches we've seen over the last little while, and I'll start reworking my model to include some of these suggestions.

Specifically, I'll try to work in:
   -Consecutive successes/weighting of recent results
   -Breakout for crewed vehicles
   -Breakdown of complete success/partial success/failure
   -Split version of reliability using global/manufacturer/rocket/variant-specific data

I'll also probably just upload this all as a pdf or other downloadable document, that will be easier than separately uploading each graph.

Additionally (with a big maybe) I'll try to start working on some infographics. I love the "Rockets of the World" infographic, but it is perpetually out of date (with launches almost every week, it's hard to keep your tallies updated). I'm trying to work on a few ways of scripting the generation of such a chart or infographic so that I can have it update without any intervention on my part. Don't hold your breath on this part, it's something I'm determined to do but it'll require some time to get set up and running, and I'm still looking for the right tools to automate the job.

Offline vaporcobra

Re: A Different Take On Reliability
« Reply #12 on: 04/13/2018 12:05 am »
I just found this while googling launch vehicle reliability, had no idea it existed! Bravo, Toast ;D

I'd be very curious to see this updated for Soyuz, Proton, and Falcon 9.

Offline Toast

Re: A Different Take On Reliability
« Reply #13 on: 04/13/2018 01:01 am »
Yeah, it's definitely getting a bit dated where the Falcon 9 is concerned, we're at twice the launch history now. Unfortunately I lost the R script that I used, so I can't just plug in some new numbers, and I've been too busy/lazy (and really more the latter) to get around to rebuilding it. I've been wondering about shaking up the methodology too, but I'm just not enough of a statistician to do much to improve it.

But soonTM I'll get around to it...

Offline vaporcobra

Re: A Different Take On Reliability
« Reply #14 on: 04/13/2018 01:15 am »
Yeah, it's definitely getting a bit dated where the Falcon 9 is concerned, we're at twice the launch history now. Unfortunately I lost the R script that I used, so I can't just plug in some new numbers, and I've been too busy/lazy (and really more the latter) to get around to rebuilding it. I've been wondering about shaking up the methodology too, but I'm just not enough of a statistician to do much to improve it.

But soonTM I'll get around to it...

Nooooo! Sorry to hear that, I can sympathize with the annoyance of losing work. No worries at all, just wanted to let you know that the interest is there if you can ever spare the time :) Life comes first!

Offline S.Paulissen

Re: A Different Take On Reliability
« Reply #15 on: 05/24/2018 06:11 pm »
How is SpaceX reliability faring these days?
"An expert is a person who has found out by his own painful experience all the mistakes that one can make in a very narrow field." -Niels Bohr
Poster previously known as Exclavion going by his real name now.

Offline envy887

Re: A Different Take On Reliability
« Reply #16 on: 05/24/2018 06:22 pm »
How is SpaceX reliability faring these days?

It's now 54/56=.964 vs the 21/22=.954 listed here (counting AMOS-6 as a failure and not counting the partial failure on CRS-1).

I can't rerun the analysis here, but I'd guess the mean hiked up to about .96, and the spread tightened quite a bit. Both failures are now early in the F9's history (man, that sounds really weird to say only 18 months after AMOS), so that would help a lot here.

Offline rocketguy101

Re: A Different Take On Reliability
« Reply #17 on: 05/24/2018 06:37 pm »
Wow!  Nice job!  I missed this first time around...I have nothing to add, except I liked this story from “APOLLO The Race to the Moon” by Charles Murray and Catherine Bly Cox...

Quote
The joke that made the rounds of NASA was that the Saturn V had a reliability rating of .9999.  In the story, a group from headquarters goes down to Marshall and asks Wernher von Braun how reliable the Saturn is going to be. Von Braun turns to four of his lieutenants and asks, "Is there any reason why it won't work?" to which they answer: "Nein." "Nein." "Nein." "Nein." Von Braun then says to the men from headquarters, "Gentlemen. I have a reliability of four nines."
David

Offline edkyle99

Re: A Different Take On Reliability
« Reply #18 on: 05/24/2018 06:51 pm »
How is SpaceX reliability faring these days?
SpaceX altogether, or just the currently-operated launch vehicle variants? 

Falcon 9 v1.2 has flown 35 times with no launch failures, but this ignores the AMOS 6 failure which involved a fully stacked vehicle with payload during a pre-launch test.  If AMOS 6 is included, Falcon 9 v1.2 sports a 0.85-0.99-ish 95% confidence interval range with a 0.95 point estimate, placing it about 8th or so among the world's major launchers (behind H-2A but ahead of the CZ-4 series, all incrementally).  If AMOS 6 is ignored, the numbers are 0.91-1.0 for the range and 0.97 for the point estimate, slightly behind Atlas 5 and slightly ahead of Delta 4 Medium and Ariane 5 ECA. 

For SpaceX entirely, the company has performed 61 launches with 5 failures, not including AMOS 6.  If AMOS 6 is included, the total is 56 successes in 62 "campaigns", a 0.80-0.96 95% confidence interval range with a 0.89 point estimate (not as good as, say, Rokot, but better than, say, Pegasus, or, more apropos, slightly better than OSC-Orbital-ATK's 68/77, which gives 0.79-0.94 with a 0.87 point estimate). 
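As an aside, these point estimates look consistent with a uniform-prior posterior mean, i.e. the Laplace rule of succession (s + 1) / (n + 2). That is an inference on my part, not a confirmed description of the method. A quick check in Python:

```python
# Laplace rule of succession: point estimate (s + 1) / (n + 2)
# for s successes in n trials (posterior mean under a Beta(1, 1) prior).
def laplace_point(successes, total):
    return (successes + 1) / (total + 2)

print(round(laplace_point(56, 62), 2))  # 0.89 -- SpaceX incl. AMOS 6
print(round(laplace_point(35, 36), 2))  # 0.95 -- Falcon 9 v1.2 incl. AMOS 6
```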

SpaceX reliability has obviously improved from the early days, but we don't really need statistics to tell us that result.

 - Ed Kyle
« Last Edit: 05/24/2018 07:16 pm by edkyle99 »

Offline EnigmaSCADA

Re: A Different Take On Reliability
« Reply #19 on: 05/24/2018 07:51 pm »
Would you implement the aging of data based on time, number of launches elapsed, or some percentage of the total number of launches?

To me, if I'm buying launch services: option A flies once annually and has a near-perfect record over the last 15 years, versus option B, which flies ~30 times a year and has 3 failures that happened 8, 12, and 15 years ago. I'm leaning toward trusting option B with my expensive payload.

The same could be said of choosing a surgeon. Do you want a guy who has performed a procedure twice a year over 20 years or a guy who does it nearly every day for the past 10 years?

So aging by number of launches in the past makes sense to me vs by time in the past. Maybe a way to compound the two?
