ABSTRACT

Title: RISK AND ECONOMIC ESTIMATION OF INSPECTION POLICY FOR PERIODICALLY TESTED REPAIRABLE COMPONENTS

Carlos Eduardo Barroeta, M.S., 2005

Thesis directed by: Professor Mohammad Modarres, Mechanical Engineering Department

This report presents a model to identify the optimal time between surveillance tests and the overhaul frequency of components whose failures are detected upon inspection. The model is based on minimizing the total cost per unit time during the component renewal cycle. It considers the component availability assuming that the unit is "as old" after tests and repairs and "as new" after overhauls. The model takes into account costs associated with tests and maintenance, as well as potential losses related to unavailability. General conditions and a case study are discussed to evaluate the effect of costs, maintenance task durations, and the uncertainty of the reliability parameters on the optimal inspection policy of typical tested components. This report also discusses the advantage of cost-based optimization over the traditional approach based on maximal availability.

RISK AND ECONOMIC ESTIMATION OF INSPECTION POLICY FOR PERIODICALLY TESTED REPAIRABLE COMPONENTS

By Carlos Eduardo Barroeta

Thesis submitted to the Faculty of the Graduate School of the University of Maryland, College Park, in partial fulfillment of the requirements for the degree of Master of Science 2005

Advisory Committee:
Professor Mohammad Modarres, Chair
Professor Ali Mosleh
Professor Aristos Christou

© Copyright by Carlos Eduardo Barroeta 2005

Acknowledgements

I really appreciate the guidance and advice of Dr. Mohammad Modarres in the development of this work, as well as the valuable comments of Dr. Ali Mosleh and Dr. Aris Christou. I also appreciate the support of my mates in the CTRS: José Luis, Reza, Mercedes, Mohammad and Genebelin. I am especially thankful to my wife, Rosa Ana, for her love and company during these years.
Table of Contents

Acknowledgements
Table of Contents
List of Tables
List of Figures
Chapter 1: Introduction
Chapter 2: Theoretical Background
  2.1 General definitions
    2.1.1 Non-repairable units
    2.1.2 Repairable units
  2.2 Availability of repairable items
  2.3 Maintenance and renewal theory
    2.3.1 Ordinary renewal process (ORP)
    2.3.2 Non-homogeneous Poisson process (NHPP)
    2.3.3 Generalized renewal process (GRP)
  2.4 Inspection policies for periodically tested components
Chapter 3: Analytical Model
  3.1 Time between successive failures
  3.2 Cost per unit time and cycle lengths
  3.3 Increasing test and repair costs
    3.3.1 Linear function
    3.3.2 Non-linear function 1
    3.3.2 Non-linear function 2
  3.3 Description of the variables involved in the analytical model
Chapter 4: Results
  4.1 Average availability
  4.2 Cost per unit time
  4.3 Optimal inspection interval and overhaul frequency
  4.4 Availability versus cost-based optimization
  4.5 Sensitivity evaluation
  4.6 Uncertainty in Weibull parameters
Chapter 5: Extensions
  5.1 Generalized Renewal Process after test cycles
  5.2 Imperfect surveillance inspections
  5.3 Further uncertainty and risk analysis
  5.4 Systems of periodically tested components
Chapter 6: Conclusion
Appendix
Bibliography

List of Tables

Table 1. Comparison between typical stochastic repair processes
Table 2. Example of parameters for repair cost functions
Table 3. Values for the average availability analysis
Table 4. Arbitrary values for cost rate function examples
Table 5. Input values for safety relief valve example
Table 6. Availability versus cost-based optimization results for a relief valve
Table 7. Results of sensitivity evaluation for the case study
Table 8. Average availability for systems with periodically tested units

List of Figures

Figure 1. Categories of stochastic point processes for repairable systems
Figure 2. Basic notation for a stochastic point process
Figure 3. Conditional probability of occurrence of failure
Figure 4. Approximate point unavailability for periodically tested components
Figure 5. Probability density function considering inspection at T
Figure 6. Basic notation for the mathematical model
Figure 7. Conditional probability of occurrence of failure with inspection at T
Figure 8. Behavior of three types of incrementing repair cost functions
Figure 9. Average availability versus test cycle number for β > 1
Figure 10. Average availability versus test cycle number for β ≤ 1
Figure 11. Cost rate function for different values of N
Figure 12. Optimal test interval versus overhaul frequency for different ?
Figure 13. Limiting cost rate function for different values of N
Figure 14. T_opt versus overhaul frequency for a safety relief valve
Figure 15. crf(T_opt) versus overhaul frequency for a safety relief valve
Figure 16. Cost rate function versus T and N for a safety relief valve
Figure 17. Average availability versus T for a relief valve with N = 10
Figure 18. Cost rate function for N = 10 for a safety relief valve
Figure 19. T_opt versus N for different incrementing cost models for a relief valve
Figure 20. crf vs. T for different N for a relief valve with no unavailability impact
Figure 21. Point estimates and 90% bounds for T_opt vs. N for a relief valve
Figure 22. Point estimates and 90% bounds for crf(T_opt) vs. N for a relief valve
Figure 23. Frequency chart for the optimal overhaul frequency for a relief valve

Chapter 1: Introduction

While inspection and maintenance strategies have been widely studied for monitored components whose failures are immediately detected, less attention has been given to periodically tested units.
The latter are often related to emergency or protection systems and are therefore important elements in reliability and risk assessment. The purpose of this work is to present a model to identify the optimal time between surveillance tests and the overhaul frequency of components whose failures are detected upon inspection and which are periodically tested to ensure high availability. This type of equipment includes emergency and spare units, as well as hardware components with hidden or dormant failures in normal operation.

Although the average and time-dependent point availability of periodically tested components have been studied in the past [1-5], the consideration of economic aspects in the study of their optimal inspection interval and overhaul frequency is more recent. Adachi and Nishida [6] discussed the optimal inspection policy based on maximizing the component availability with "as new" repairs and preventive maintenance, using random test and repair durations. Vaurio [7] analyzed the optimal availability and cost rate function (cost per unit of time) of periodically inspected, preventively maintained units, considering constant durations of repair and maintenance tasks, with "as old" tests and "as new" repairs. Badía et al. [8] studied the minimal cost per unit time of components with less than perfect tests and "as new" corrective and preventive maintenance, but with negligible test and repair durations. More recently, Martorell et al. [9] and Lapa et al. [10] studied surveillance test policy optimization under constraints through the use of genetic algorithms.

The analytical model presented in this report is developed for periodically tested components with overhauls (preventive maintenance) after a certain number of inspections. The model is based on the assessed component availability during the renewal cycle and considers an "as old" process after tests and repairs (component aging).
It takes into account costs associated not only with surveillance tests and maintenance, but also with the potential losses related to the unit unavailability. The model also considers the duration of repairs and inspections (often neglected in the literature), and allows for uncertainty in the parameters of the probability density function (pdf) of the time between failures.

In summary, the analytical model in this study is based on the following assumptions:

1. Component failures are only detected upon inspection. Inspection tasks are perfect (probability of detection equals 1).
2. Inspections are carried out every T units of time (constant interval). Repairs are conducted in case of failure detection.
3. Inspection and repair durations are not negligible but constant.
4. The component is subject to aging and remains "as old" after surveillance tests and repairs. Inspection and repair tasks do not deteriorate the unit.
5. A periodic overhaul is performed after every N inspection cycles regardless of the unit condition. The component returns to "as new" after the overhaul.
6. Component unavailability may cause economic losses with conditional probability $\Pr[L \mid u]$ (probability of losses given an unavailable unit). These losses are independent of component age.
7. Direct inspection and repair costs may increase after every test cycle. Inflation and other financial effects are negligible.

The theoretical background related to the model, including general definitions about availability and maintenance of repairable components, is presented in chapter 2. Chapter 3 explains the analytical model in detail, and chapter 4 discusses important results based on numerical examples. Possible extensions of the model and of the scope of this work are discussed in chapter 5. Finally, chapter 6 presents the concluding remarks of this research work.
Chapter 2: Theoretical Background

2.1 General definitions

The study of the reliability of components and systems is of growing interest due to the increasing complexity of equipment and a general awareness of performance, quality and, especially, safety issues. Reliability is defined as the probability that a unit or system will perform a required function for a given period of time when used under stated operating conditions [11]. It has an important role in the assessment of system performance and of the potential failures that may lead to adverse consequences for equipment, people and the environment.

Reliability and risk analyses are strongly related to maintenance, as they take into account the condition, repairs and renewal of units and systems at a moment or over a period of time. When performing reliability studies, it is important to make a distinction between repairable and non-repairable items, since they involve different failure characteristics as well as different methods for predicting their reliability and availability [12].

2.1.1 Non-repairable units

These items are discarded and replaced with new ones when they fail (e.g. light bulbs). Their reliability is expressed in terms of the time-to-failure probability distribution.

2.1.2 Repairable units

These items are in general not replaced after the occurrence of failures; rather, they are repaired and put into operation again. Their reliability depends on their renewal model and on a stochastic process different from the models used for non-repairable units. Valves and pumps (when seen as single items) are examples of this type of unit.

2.2 Availability of repairable items

Availability is defined as the probability that a repairable element (component or system) is performing its required function at a given point in time or over a stated period, when operated and maintained in a prescribed manner [11].
Availability depends not only on the chance of failure of a given item, but also on its maintainability, which is the likelihood that the failed unit will be restored or repaired to a specified condition within a period of time when maintenance is performed. As in reliability analysis, the rules of probability theory can be applied to quantify this measure. Accordingly, some important definitions can be established [12]:

a) Instantaneous availability, a(t): the probability that the unit or system is up at a time t.

b) Limiting availability: the following limit of the instantaneous availability,

$a = \lim_{t \to \infty} a(t)$ (1)

c) Average availability: defined for a fixed time interval (mission time), T,

$\bar{a}(T) = \frac{1}{T} \int_0^T a(t)\,dt$ (2)

d) Limiting average availability:

$\bar{a}_l = \lim_{T \to \infty} \frac{1}{T} \int_0^T a(t)\,dt$ (3)

The previous definitions constitute the basis for evaluating the availability of equipment. To analyze these measures for a specific unit (component), it is necessary to establish the type of unit, whether it is repairable or not and, particularly, the kind of failure it presents. Thus, the following classification is typically used in the literature:

a) Replaceable components: no repair action is foreseen (non-repairable units).
b) Repairable components with failures which are immediately detected (revealed faults).
c) Repairable components with failures which are detected upon demand (faults remain unrevealed until the next demand occurs).
d) Repairable units whose failures are detected upon inspection (faults remain unrevealed until the next inspection is carried out).

As mentioned before, this report focuses on repairable components whose failures are revealed upon inspection and which have to be periodically tested in order to detect possible faults (type d). This kind of unit includes items used in emergency conditions, components held as spares or in storage, and also normally operated units with hidden (dormant) faults only detectable by inspection.
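The availability measures of Equations 1 to 3 can be illustrated numerically. The sketch below is not part of the model developed in this report: it assumes the standard two-state unit with constant failure rate `lam` and constant repair rate `mu` whose failures are revealed immediately (type b above), for which the instantaneous availability has a well-known closed form. The rate values and function names are hypothetical.

```python
import math

def instantaneous_availability(t, lam, mu):
    """a(t) for a two-state unit that is up at t = 0 (constant rates lam, mu)."""
    s = lam + mu
    return mu / s + (lam / s) * math.exp(-s * t)

def average_availability(a, T, steps=10_000):
    """Equation 2: (1/T) * integral of a(t) over [0, T], trapezoidal rule."""
    h = T / steps
    total = 0.5 * (a(0.0) + a(T))
    for i in range(1, steps):
        total += a(i * h)
    return total * h / T

# Hypothetical rates per hour, chosen only for illustration.
lam, mu = 1e-3, 1e-1

def a(t):
    return instantaneous_availability(t, lam, mu)

limiting = mu / (lam + mu)                               # Equation 1
avg_100 = average_availability(a, 100.0)                 # Equation 2, T = 100 h
avg_long = average_availability(a, 1e5, steps=200_000)   # approaches Equation 3

print(f"limiting availability  : {limiting:.6f}")
print(f"average over 100 h     : {avg_100:.6f}")
print(f"average over 100,000 h : {avg_long:.6f}")
```

For this simple unit the average availability over a short mission exceeds the limiting value, because the unit starts in the up state, and it converges to the limiting value as the mission time grows.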
2.3 Maintenance and renewal theory

Maintenance is defined as the combination of all technical and corresponding supervisory and administrative actions intended to retain an entity in, or restore it to, a state in which it can perform its required function [13].

Two general types of maintenance can be distinguished: reactive and proactive maintenance. The former is performed in response to unplanned or unscheduled downtime of the unit, usually as a result of a failure, whether internal (inherent) or external (e.g. operator-induced). Proactive maintenance, on the other hand, is performed prior to failures and may be either preventive or predictive [11].

Preventive maintenance is scheduled downtime, usually periodic, in which a well-designed set of tasks, such as repair, replacement, cleaning and adjustment, is performed. Predictive maintenance estimates, through diagnostic tools and measurements, when a part is near failure and should be repaired or replaced (i.e. maintenance based on condition). Candidates for predictive maintenance are normally operating equipment whose condition can be monitored over time. Unlike regular preventive maintenance, maintenance based on diagnostics (predictive tasks) is not necessarily periodic and represents a cost-effective alternative for monitored items.

As discussed before, reliability and availability are related to maintenance. In particular, the study of repairable components and systems strongly depends on the model of repair or renewal involved in the maintenance process. A repairable item is one which undergoes repair and can be returned to operation by a method other than replacement of the entire item. The following sections discuss different stochastic models considered for the analysis of repairable units and systems (see Figure 1).
[Diagram: categories of repair (perfect repair (AGAN), normal repair, minimal repair (ABAO)) and the associated processes (homogeneous Poisson process, other life distributions, generalized renewal, non-homogeneous Poisson process, superimposed renewal)]

Figure 1. Categories of stochastic point processes for repairable systems

2.3.1 Ordinary renewal process (ORP)

This model assumes that, following a repair, the unit returns to an "as good as new" (AGAN) condition. In this process, the interarrival times, $x_i$, between successive failures (see Figure 2) are considered independent and identically distributed random variables. It is a generalization of a Homogeneous Poisson Process (HPP). This model represents an ideal situation; it is only appropriate for replaceable items and hence has very limited applications in the analysis of repairable components and systems.

Variations of the ORP can also be defined. The modified renewal process, where the first interarrival time differs from the others, and the superimposed renewal process (union of many independent ORPs) are examples of these possible variations [14].

[Diagram: time axis with arrival times $t_1, t_2, t_3, t_4$ and interarrival times $x_1, x_2, x_3, x_4$; t: arrival times, x: interarrival times]

Figure 2. Basic notation for a stochastic point process

2.3.2 Non-homogeneous Poisson process (NHPP)

This model is also called "minimal repair", and it assumes that the unit returns to an "as bad as old" (ABAO) condition after a repair. That is, after the restoration the item is assumed to be operative but as old as it was before the failure. The NHPP differs from the HPP in that the rate of occurrence of failures varies with time rather than being constant [14]. Unlike the previous model, in this process the interarrival times are neither independent nor identically distributed.

The NHPP is a stochastic point process in which the number of failures in any interval $[t_1, t_2]$ has a Poisson distribution with

$\text{mean} = \int_{t_1}^{t_2} \lambda(t)\,dt$ (4)

where $\lambda(t)$ is the rate of occurrence of failures (ROCOF), defined as the inverse of the expected interarrival times, $1/E[x_i]$.

One of the most common forms of ROCOF used in reliability analysis of repairable components and systems is the Power Law Model:

$\lambda(t) = \frac{\beta}{\alpha} \left( \frac{t}{\alpha} \right)^{\beta - 1}$ (5)

This form comes from the assumption that the interarrival times between successive failures follow a conditional Weibull probability density function with scale parameter $\alpha$ and shape parameter $\beta$. This model implies that the arrival of the ith failure is conditional on the cumulative operating time up to the (i − 1)th failure. Figure 3 shows a schematic of this conditionality [15]. The Weibull distribution is typically used due to its flexibility and applicability to various failure processes; however, solutions for Gamma and Log-normal distributions are also possible.

[Plot: density f(t) versus t, with the region beyond $t_1$ illustrating P(Time ≤ t | Time > $t_1$)]

Figure 3. Conditional probability of occurrence of failure

2.3.3 Generalized renewal process (GRP)

A repairable system may end up in one of five possible states after a repair:

a. As good as new
b. As bad as old
c. Better than old, but worse than new
d. Better than new
e. Worse than old

The two models described before, the ordinary renewal process and the NHPP, account for the first two states respectively. However, the last three repair states have received less attention since they involve more complex mathematical models.

In 1986 Kijima and Sumita [16] proposed a probabilistic model for all the after-repair states called the Generalized Renewal Process (GRP). According to this approach, the ordinary renewal process and the NHPP are considered specific cases of the generalized model.

The GRP theory of repairable items introduces the concept of virtual age ($A_n$). This value represents the calculated age of the element immediately after the nth repair occurs.
For $A_n = y$ the system has a time to the (n + 1)th failure, $x_{n+1}$, which is distributed according to the following cumulative distribution function (cdf):

$F(x \mid A_n = y) = \frac{F(x + y) - F(y)}{1 - F(y)}$ (6)

where F(x) is the cdf of the time to the first failure (TTFF) distribution of a new component or system.

The summation

$S_n = \sum_{i=1}^{n} x_i$ (7)

with $S_0 = 0$, is called the real age of the element.

The model assumes that the nth repair only compensates for the damage accumulated during the time between the (n − 1)th and the nth failure. With this assumption, the virtual age of the component or system after the nth repair is:

$A_n = A_{n-1} + q x_n = q S_n$ (8)

where q is the repair effectiveness (or rejuvenation) parameter and $A_0 = 0$.

According to this model, assuming q = 0 leads to an ordinary renewal process (as good as new), while assuming q = 1 corresponds to a non-homogeneous Poisson process (as bad as old). Values of q in the interval 0 < q < 1 represent the after-repair states in which the condition of the element is better than old but worse than new, whereas cases where q > 1 correspond to a condition worse than old. Similarly, cases with q < 0 would suggest a component or system restored to a state better than new. Therefore, physically speaking, q can be seen as an index representing the effectiveness and quality of repairs [15].

Even though the q value of the GRP model constitutes a realistic approach to simulating the quality of maintenance, it is important to point out that the model assumes an identical q for every repair in the item life. A constant q may not hold for some equipment and maintenance processes, but it is a reasonable approach for most repairable components and systems.

The three models described above have advantages and limitations. In general, the more realistic the model, the more complex the mathematical expressions involved.
Table 1 summarizes the main strengths and weaknesses of each repair approach. As mentioned in the table, the NHPP model has been shown to provide good results even for realistic situations with better-than-old but worse-than-new repairs [17]. Based on this, and given its conservative nature and manageable mathematical expressions, the NHPP (ABAO repair model) was selected for this particular work. The specific analytical modeling is discussed in chapter 3.

Table 1. Comparison between typical stochastic repair processes

Repair model: Ordinary renewal process (AGAN)
Strengths:
- Represents a first, simple approach to model repairable components.
- It is appropriate for modeling replaceable units.
- In general, its mathematical expressions are simpler than those of the other models.
Weaknesses:
- It is generally not appropriate for systems, since replacements typically apply to a single part and not to the entire system.
- Assumes that the interarrival times between failures are independent and identically distributed.

Repair model: Non-homogeneous Poisson process (ABAO)
Strengths:
- It is a useful and quite simple model to represent equipment under aging (deterioration).
- Involves relatively simple mathematical expressions.
- It is a conservative approach and in most cases provides results very similar to those of more complex models like GRP with 0.1 < q < 1 [17].
Weaknesses:
- It is not adequate to simulate repair actions that restore the unit to conditions better than new or worse than old.

Repair model: Generalized renewal process (GRP)
Strengths:
- It is a realistic general model which covers all the possible restoration conditions, from better than new to worse than old.
Weaknesses:
- Involves an additional parameter (q) and more complex mathematical equations.
- Assumes a constant rejuvenation parameter (q) for all the repairs.
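The relations in Equations 4 to 8 can be illustrated with a short numerical sketch. It assumes a Weibull time to first failure; the names `alpha` (scale) and `beta` (shape), the function names, and all numerical values are assumptions made here for illustration only.

```python
import math

def weibull_cdf(x, alpha, beta):
    """F(x) of the time to first failure; alpha = scale, beta = shape (assumed names)."""
    return 1.0 - math.exp(-((x / alpha) ** beta)) if x > 0 else 0.0

def conditional_cdf(x, y, alpha, beta):
    """Equation 6: cdf of the next interarrival time x given virtual age y."""
    Fy = weibull_cdf(y, alpha, beta)
    return (weibull_cdf(x + y, alpha, beta) - Fy) / (1.0 - Fy)

def virtual_ages(interarrival_times, q):
    """Equation 8: A_n = A_(n-1) + q * x_n, starting from A_0 = 0."""
    ages, age = [], 0.0
    for x in interarrival_times:
        age += q * x
        ages.append(age)
    return ages

def expected_failures_power_law(t1, t2, alpha, beta):
    """Equation 4 with the power-law ROCOF of Equation 5, integrated in closed form."""
    return (t2 / alpha) ** beta - (t1 / alpha) ** beta

# Hypothetical interarrival times (hours) and Weibull parameters.
x = [100.0, 80.0, 60.0]
alpha, beta = 1000.0, 2.0

for q in (0.0, 0.5, 1.0):
    ages = virtual_ages(x, q)
    p_next = conditional_cdf(100.0, ages[-1], alpha, beta)
    print(f"q = {q}: virtual ages {ages}, P(next failure within 100 h) = {p_next:.4f}")

print("expected failures in [0, 1000] h:", expected_failures_power_law(0.0, 1000.0, alpha, beta))
```

With q = 0 the virtual age stays at zero (ordinary renewal process), and with q = 1 it equals the real age $S_n$ (NHPP). For a shape parameter above 1, a larger virtual age raises the probability of failure in the next 100 hours, which is the aging effect the ABAO assumption preserves.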
2.4 Inspection policies for periodically tested components

Until this point, repair models and the definitions of reliability and availability have been discussed without making any special distinction between components and systems. Components are single units or elements that represent the minimum level at which information is available. Systems, on the other hand, are arrangements of two or more components usually working simultaneously.

As mentioned before, this work is oriented to components (single units) which are periodically tested and, in case of failure, remain failed until the next inspection. From now on, emphasis will be given to components under this assumption, and particular expressions for availability and cost functions will be discussed.

The availability study of periodically tested components (PTC) in most cases leads to difficult analyses, especially if the repair and inspection times are treated as random variables. This problem has been studied through different approaches [1-5]. In particular, Hilsmeier, Aldemir and Vesely [5] presented general expressions for calculating the point unavailability of aging standby components according to a standard extension of the classic renewal equation. For the case of tests performed every T hours, considering inspection and repair times negligible with respect to T, "as old" tests and "as new" repairs, the following equations were presented:

$q(t) = 1 - a(t) = \int_{nT}^{t} \tilde{f}(t')\,dt' + \sum_{k=1}^{n} q(kT)\,R[(n-k)T]\,q(kT, t), \quad nT < t < (n+1)T$ (9)

where:

$R(t) = \exp\left[ -\int_0^t \lambda(t')\,dt' \right]$ (10)

and

$q(kT, t) = 1 - \exp\left[ -\int_{(n-k)T}^{t-kT} \lambda(t')\,dt' \right]$ (11)

where:
$\tilde{f}$: is the first failure density function,
n: is the cycle number, k = 0, 1, 2, ..., n,
T: is the fixed time between inspections, and
$\lambda(t)$: is the time-dependent failure rate.

According to these expressions, the availability of PTC exhibits damped oscillations and eventually settles to the so-called "saw-tooth" periodic behavior, where the settling time increases when the interval between inspections is decreased.

Since the availability of PTC is a periodic function of time, the study of the average availability over the test interval becomes especially interesting. For this situation, the effect of inspection and repair times is usually considered in practical cases. For constant rates, assuming $\lambda T \ll 1$ and renewal after each cycle, the following expression is often presented in textbooks (see Figure 4) [12], [18]:

Average availability per cycle $= \bar{a}(T) \approx 1 - \frac{\lambda T}{2} - \frac{t_t}{T} - \frac{\lambda}{\mu}$ (12)

where:
T: is the inspection interval,
$t_t$: is the constant time to test ($t_t \ll T$),
$\lambda$: is the failure rate, and
$\mu$: is the corresponding repair rate (mean time to repair $= 1/\mu \ll T$).

[Plot: approximate pointwise unavailability 1 − a(t) versus time; marked quantities: T, $t_t$, $1/\mu$, $\lambda T$, 1]

Figure 4. Approximate point unavailability for periodically tested components

Clearly, the availability of the unit is strongly affected by the time between surveillance tests (T). Therefore, the study of the optimal inspection interval is particularly interesting for maximizing the average or mission availability of periodically tested components. From the previous equation, if $\bar{a}(T)$ is differentiated with respect to T and the result is set equal to zero:

$\frac{d\bar{a}(T)}{dT} = -\frac{\lambda}{2} + \frac{t_t}{T^2} = 0$ (13)

The optimal inspection interval, for this simple case, can be established as:

$T_{opt} = \sqrt{\frac{2 t_t}{\lambda}}$ (14)

Other authors have studied availability maximization in periodically tested components as a function of the time between inspections, considering different assumptions in more realistic cases [3], [6], [19].

In particular, Jardine [19] presented an approach based on the expected uptime per cycle (average availability), considering the following:

- The component is periodically tested every T units of time, and it will be repaired if found to be failed.
- The unit is considered renewed after inspection and repairs (as good as new).
- The lengths of time needed to inspect ($T_i$) and repair ($T_r$) are known and constant.

Thus, the average availability is expressed as:

$\bar{a}(T) = \frac{\text{expected uptime}}{\text{expected cycle length}}$ (15)

According to the theory of expectations, the expected uptime is the operative time of a good cycle, T, multiplied by its probability, R(T), plus the mean time to failure given that inspection takes place at T, multiplied by [1 − R(T)].

To determine the mean time to failure given the periodic inspection, the mean of the shaded portion of Figure 5 is considered. Thus:

$\text{MTTF}_{shaded} = \frac{\int_{-\infty}^{T} t f(t)\,dt}{F(T)}$ (16)

The mean of the shaded region is similar to the mean of the entire distribution, but treats the unshaded portion as an impossible region for failures [19].

Figure 5. Probability density function considering inspection at T

Likewise, the expected cycle length is the duration of a good cycle, ($T + T_i$), multiplied by its probability, R(T), plus the length of a failure cycle, ($T + T_i + T_r$), multiplied by its probability, [1 − R(T)]. Accordingly, the average availability in the renewal cycle becomes:

$\bar{a}(T) = \frac{T R(T) + \int_{-\infty}^{T} t f(t)\,dt}{T + T_i + T_r [1 - R(T)]}$ (17)

This expression can be used to calculate the average availability and the optimal inspection interval that maximizes the uptime for different probability density functions, f(t).

The equation assumes that after tests the unit is in the "as new" state, which may be the result of minor modifications made during the surveillance inspections. In practice, this assumption can be reasonable, and it holds exactly if the failure distribution of the component is exponential. If the "as new" assumption does not hold and the time to failure distribution has an increasing failure rate, the expression for the average availability becomes more complex.
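Equations 14 and 17 can be compared numerically. The following sketch assumes a Weibull time to failure, so that the lower integration limit in Equation 17 is effectively 0, and searches a coarse grid of candidate intervals instead of solving for the optimum analytically. The parameter names `alpha` and `beta`, the function names, the grid, and all numerical values are hypothetical.

```python
import math

def weibull_R(t, alpha, beta):
    """Reliability R(t) = 1 - F(t) for a Weibull time to failure."""
    return math.exp(-((t / alpha) ** beta))

def partial_mean(T, alpha, beta, steps=2000):
    """Integral of t*f(t) from 0 to T by the trapezoidal rule."""
    def t_times_pdf(t):
        z = (t / alpha) ** beta
        return beta * z * math.exp(-z)  # algebraically equal to t * f(t)
    h = T / steps
    total = 0.5 * (t_times_pdf(0.0) + t_times_pdf(T))
    for i in range(1, steps):
        total += t_times_pdf(i * h)
    return total * h

def average_availability(T, Ti, Tr, alpha, beta):
    """Equation 17: expected uptime over expected cycle length."""
    R = weibull_R(T, alpha, beta)
    return (T * R + partial_mean(T, alpha, beta)) / (T + Ti + Tr * (1.0 - R))

def best_interval(candidates, Ti, Tr, alpha, beta):
    """Candidate interval with the highest average availability (grid search)."""
    return max(candidates, key=lambda T: average_availability(T, Ti, Tr, alpha, beta))

def t_opt_constant_rate(t_t, lam):
    """Equation 14: optimal interval for constant rates and lam*T << 1."""
    return math.sqrt(2.0 * t_t / lam)

# Hypothetical values: 2 h to inspect, 10 h to repair, exponential failures
# (beta = 1) with failure rate 1e-4 per hour, i.e. alpha = 1e4 h.
grid = range(50, 1001, 10)
T_grid = best_interval(grid, Ti=2.0, Tr=10.0, alpha=1e4, beta=1.0)
T_formula = t_opt_constant_rate(2.0, 1e-4)
print("grid optimum from Equation 17:", T_grid, "h")
print("Equation 14 approximation    :", round(T_formula, 1), "h")
```

In the exponential case the grid optimum of Equation 17 falls next to the closed-form value of Equation 14, as expected when $\lambda T \ll 1$. Setting `beta` above 1 gives the aging case, where Equation 14 no longer applies and only the numerical search remains.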
Until now, the optimal time between inspections in periodically tested components has been discussed in terms of maximum availability, without taking into account the economic considerations related to periodic inspection and repairs or the cost of unavailability. Nowadays, besides maximizing the operative time of units, operators and inspectors face economic limitations that may affect the frequency of surveillance tests. Thus, to identify the optimal inspection interval, it becomes necessary to consider not only the maximum operative time (availability) but also the direct cost of tests and repairs and the potential losses associated with unavailability. In summary, it is important to consider the following:

- The cost of unavailability due to random failures
- The downtime cost due to surveillance tests and possible repairs
- The direct cost of periodic inspections
- The direct cost of repairs

For downtime losses (unavailability cost), a distinction has to be made between periodically tested components that are in standby and those in normal operation. To model this aspect, a probability factor of actual losses given an unavailable unit, Pr[L | u], can be introduced. Likewise, for the test interval optimization, the cost rate function (cost per unit time) is considered in the literature [7], [8]. As a general approach, the following expressions can be used to study the optimal time between inspections based on cost minimization:

$$\text{Cost per unit time} = \frac{Ci + Cr \cdot F(T) + p \cdot Cp \cdot E[D]}{E[L]} \tag{18}$$

where
T: inspection interval
Ci: direct cost of inspection
Cr: direct cost of repair
Cp: cost of lost production
p: probability of losses given unavailable component
E[D]: expected downtime
E[L]: expected cycle length

The expected downtime, the expected cycle length and the expected uptime are defined as:

$$E[D] = E[L] - E[U] \tag{19}$$

$$E[L] = [Ti + T]\,R(T) + [T + Ti + Tr]\,F(T) = Ti + T + Tr \cdot F(T) \tag{20}$$

$$E[U] = T \cdot R(T) + \left[\frac{1}{F(T)} \int_0^T t\,f(t)\,dt\right] F(T) = T \cdot R(T) + \int_0^T t\,f(t)\,dt \tag{21}$$

where Ti is the time to inspect and Tr is the mean time to repair. Equations 18 to 21 can be used to estimate the optimal inspection interval for PTC with "as good as new" restoration after every test cycle. More general expressions are presented in the next chapter.

Chapter 3: Analytical Model

This chapter presents the mathematical model developed to calculate the availability and cost per unit time of periodically tested components. The model uses the minimal repair approach (non-homogeneous Poisson process) after every test cycle and is based on the following assumptions:

1. Component failures are only detected upon inspection; in case of failure, the unit remains failed until the next scheduled test. Inspection tasks are perfect (probability of fault detection equals 1).
2. Inspections are carried out every T units of time (constant interval). Repairs are conducted in case of fault or failure detection.
3. Inspection and repair durations are not negligible but constant (deterministic values).
4. The component is subject to aging and remains "as bad as old" after surveillance tests and repairs. Inspection and repair tasks do not deteriorate the unit.
5. Preventive periodic overhaul is performed after every N inspection cycles regardless of the unit condition. The component returns to "as good as new" after the overhaul. This maintenance action defines the renewal cycle.
6. Component unavailability may cause economic losses with conditional probability Pr[L | u] (probability of losses given an unavailable item). These losses are considered in terms of cost per unit of downtime (Cp). This cost is independent of component age.
7. Direct inspection and repair costs increase after every test cycle. Inflation and other financial effects are negligible.

The analytical model used in this work is an extension of the approach introduced in the last pages of the previous chapter.
The model extends the notion of expected availability and expected cycle length to "as bad as old" tests and repairs. It uses the cost rate function (cost per unit time) in the renewal cycle as a basis for identifying the optimal inspection interval, T, and overhaul frequency, N (i.e. the values that minimize the cost rate function). Figure 6 shows a schematic of the basic notation.

[Figure 6: timeline from 0 to N·T divided into test cycles of length T, showing the ABAO test and possible repair times, and the overhaul that closes the renewal cycle and returns the unit to the "as new" condition]
Figure 6. Basic notation for the mathematical model

3.1 Time between successive failures

This model assumes that the interarrival times between successive failures follow a conditional Weibull probability distribution, where the arrival of the ith failure is conditional on the cumulative operating time up to the (i - 1)th failure. This conditionality comes from the fact that the component retains an "as bad as old" state after repairs.

The rate of occurrence of failures (ROCOF) under this assumption corresponds to a power law expression, with parameters α and β:

$$\lambda(t) = \frac{\beta}{\alpha} \left(\frac{t}{\alpha}\right)^{\beta - 1} \tag{22}$$

For the case of components inspected at time T (see Figure 7), the following conditional probability is defined:

$$P(\text{time} \le t \mid \text{time} > T) = \frac{F(t) - F(T)}{R(T)} = \frac{1 - R(t) - 1 + R(T)}{R(T)} = 1 - \frac{R(t)}{R(T)} \tag{23}$$

[Figure 7: probability density function f(t) with the region between T and t marked as P(Time ≤ t | Time > T)]
Figure 7. Conditional probability of occurrence of failure with inspection at T

In equation 23, the functions F and R are the probability of failure and the reliability (1 - F) at the respective times. Then, considering the Weibull probability distribution, where R(x) = exp[-(x/α)^β], equation 23 becomes:

$$F(t_i) = 1 - \exp\left[\left(\frac{t_{i-1}}{\alpha}\right)^{\beta} - \left(\frac{t_i}{\alpha}\right)^{\beta}\right] \tag{24}$$

and the conditional Weibull density function, dF(t_i)/dt_i, is:

$$f(t_i) = \frac{\beta}{\alpha} \left(\frac{t_i}{\alpha}\right)^{\beta - 1} \exp\left[\left(\frac{t_{i-1}}{\alpha}\right)^{\beta} - \left(\frac{t_i}{\alpha}\right)^{\beta}\right] \tag{25}$$

The previous equation constitutes the basis of the model in terms of probability of failure.
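Equations 24 and 25 translate directly into code. The sketch below is a minimal Python illustration, not part of the original Mathcad implementation; the function names are hypothetical, and the sampling helper simply inverts equation 24 to draw the next failure time given the previous one.

```python
import math

def cond_weibull_cdf(t, t_prev, alpha, beta):
    """Equation 24: probability of failure by t given survival up to t_prev."""
    return 1.0 - math.exp((t_prev / alpha) ** beta - (t / alpha) ** beta)

def cond_weibull_pdf(t, t_prev, alpha, beta):
    """Equation 25: derivative of equation 24 with respect to t (t > 0)."""
    rocof = (beta / alpha) * (t / alpha) ** (beta - 1.0)     # equation 22
    return rocof * math.exp((t_prev / alpha) ** beta - (t / alpha) ** beta)

def sample_next_failure(t_prev, alpha, beta, u):
    """Invert equation 24: return t such that cond_weibull_cdf(t, ...) == u,
    where u is a uniform random number in (0, 1)."""
    return alpha * ((t_prev / alpha) ** beta - math.log(1.0 - u)) ** (1.0 / beta)

# Illustrative parameters (scale 3500 months, shape 1.5)
alpha, beta = 3500.0, 1.5
print(cond_weibull_cdf(102.0, 51.0, alpha, beta))   # failure probability between 51 and 102 months
print(sample_next_failure(51.0, alpha, beta, 0.5))  # conditional median of the next failure time
```

For β = 1 the conditional CDF depends only on the elapsed time t - t_prev, which is the memoryless behavior of the exponential distribution.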
3.2 Cost per unit time and cycle lengths

For a component periodically tested every T units of time, the failure probability and reliability in the ith test cycle are:

$$F_i(T) = \int_{(i-1)T}^{iT} f(t)\,dt \tag{26}$$

$$R_i(T) = 1 - \int_{(i-1)T}^{iT} f(t)\,dt \tag{27}$$

where f(t) is the conditional Weibull pdf presented above. Likewise, the cost per unit time for the ith test cycle, considering inspection every time T, is defined as follows:

$$\text{Cost per unit time}_i = \frac{Cost_i}{E[L]_i} = \frac{Ci_i + Cr_i \cdot F_i(T) + p \cdot Cp \cdot E[D]_i}{E[L]_i} \tag{28}$$

where:
T: inspection interval
Ci_i: direct cost of inspection in the ith test cycle
Cr_i: direct cost of repair in the ith test cycle
Cp: cost of lost production
p: probability of losses given unavailable component, Pr[L | u]

and

$$E[D]_i = E[L]_i - E[U]_i \tag{29}$$

$$E[L]_i = T + Ti + Tr \cdot F_i(T) \tag{30}$$

$$E[U]_i = T \cdot R_i(T) + \left[\frac{1}{F_i(T)} \int_{(i-1)T}^{iT} [t - (i-1)T]\,f(t)\,dt\right] F_i(T) \tag{31}$$

where Ti and Tr are the average times to test and repair (constant deterministic values).

The expected uptime (equation 31) is defined as the test interval, T, multiplied by its probability, R_i(T), plus the mean of the pdf between two successive tests multiplied by its probability, F_i(T). As noted before, this expression is an extension of the concepts explained in Chapter 2.

In accordance with the assumptions, the probability of failure, the reliability and the cost rate function (cost per unit time) are different for each inspection cycle in this model. Likewise, the values of the expected uptime and cycle length vary after every test interval, since the component is aging and remains "as bad as old" after every test or repair.

After N test cycles the component is subject to overhaul and returns to the "as good as new" condition. Thus, the renewal cycle length is defined as follows:

$$L_{total} = \sum_{i=1}^{N} E[L]_i \tag{32}$$

Similarly, the total cost in the renewal cycle is:

$$Cost_{total} = Co + \sum_{i=1}^{N} Cost_i \tag{33}$$

where Co is the cost of overhaul. Therefore, the cost per unit time or cost rate function (crf) in the renewal cycle becomes:

$$crf(T, N) = \frac{Cost_{total}}{L_{total}} \tag{34}$$

Equation 34 constitutes the basis for this study, since the purpose of this work is to identify an inspection policy that minimizes the costs during the renewal cycle. Notice that the crf is a function of two variables, T and N. Consequently, for every overhaul frequency, N, there is a test interval, T, that minimizes the cost per unit time (crf).

3.3 Increasing test and repair costs

As indicated in the model assumptions, the approach presented in this report considers increasing costs of surveillance tests and repairs (corrective actions). In accordance with the "as bad as old" premise, as the unit becomes older, it may become more expensive to perform inspections and possible subsequent repairs. This effect is taken into account by considering costs that vary as a function of the test cycle number (i).

Although it is reasonable to state that in real situations the direct cost of tests and repairs will increase with the component age, the specific function that represents the increment is uncertain and depends on the type of component. Given this uncertainty, three different incrementing-cost functions are considered in this work: a linear relation and two non-linear functions.

3.3.1 Linear function

According to this model, the costs of inspection and repair in the ith test cycle are:

$$Ci_i = Ci_0 + m \cdot i \tag{35}$$

$$Cr_i = Cr_0 + b \cdot i \tag{36}$$

3.3.2 Non-linear function 1

In this case, the costs of inspection and repair in the ith test cycle are:

$$Ci_i = Ci_0 + m^{i} \tag{37}$$

$$Cr_i = Cr_0 + b^{i} \tag{38}$$

3.3.3 Non-linear function 2

For this case, the costs of test and repair in the ith test cycle are assumed to be:

$$Ci_i = Ci_0 + i^{m} \tag{39}$$

$$Cr_i = Cr_0 + i^{b} \tag{40}$$

where Ci_0, Cr_0, m and b are arbitrary constants, and i is the test cycle number.
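The model of equations 26 to 40 can be sketched in a few lines of Python. This is an illustrative re-implementation under stated assumptions, not the Mathcad sheet used in this work: the function names are hypothetical, the integral of equation 31 is evaluated with a composite Simpson rule, and the inputs are the arbitrary values that the Results chapter lists in Table 4.

```python
import math

def simpson(func, a, b, n=200):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * func(a + k * h)
    return total * h / 3.0

def cycle_terms(i, T, alpha, beta, Ti, Tr):
    """Return F_i(T), E[L]_i and E[U]_i for test cycle i (eqs. 26, 30, 31)."""
    start, end = (i - 1) * T, i * T
    shift = (start / alpha) ** beta

    def pdf(t):  # conditional Weibull density, eq. 25
        return (beta / alpha) * (t / alpha) ** (beta - 1.0) * math.exp(shift - (t / alpha) ** beta)

    F_i = 1.0 - math.exp(shift - (end / alpha) ** beta)       # eq. 26 in closed form
    partial_mean = simpson(lambda t: (t - start) * pdf(t) if t > start else 0.0, start, end)
    E_U = T * (1.0 - F_i) + partial_mean                      # eq. 31
    E_L = T + Ti + Tr * F_i                                   # eq. 30
    return F_i, E_L, E_U

def linear(c0, k, i):        # eqs. 35 and 36
    return c0 + k * i

def nonlinear_1(c0, k, i):   # eqs. 37 and 38
    return c0 + k ** i

def nonlinear_2(c0, k, i):   # eqs. 39 and 40
    return c0 + i ** k

def crf(T, N, alpha, beta, Ci0, m, Cr0, b, Co, pCp, Ti, Tr, increment=linear):
    """Cost rate function over the renewal cycle (eq. 34)."""
    total_cost, total_length = Co, 0.0
    for i in range(1, N + 1):
        F_i, E_L, E_U = cycle_terms(i, T, alpha, beta, Ti, Tr)
        E_D = E_L - E_U                                       # eq. 29
        total_cost += increment(Ci0, m, i) + increment(Cr0, b, i) * F_i + pCp * E_D
        total_length += E_L                                   # eq. 32
    return total_cost / total_length

# Inputs of Table 4 (days and dollars), linear cost increments
table4 = dict(alpha=20000.0, beta=1.2, Ci0=500.0, m=50.0, Cr0=1000.0, b=100.0,
              Co=20000.0, pCp=0.7 * 20000.0, Ti=2.0, Tr=8.0)

# Coarse grid search for the test interval that minimizes crf, for N = 1 and N = 5
grid = range(100, 801, 5)
T_opt_by_N = {N: min(grid, key=lambda T: crf(T, N, **table4)) for N in (1, 5)}
print(T_opt_by_N)
```

The grid search is deliberately crude; a one-dimensional minimizer would serve equally well, since the cost rate function has a single minimum in T for a fixed N.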
To give an idea of the behavior of these incrementing models, the three repair cost functions are plotted in Figure 8. For the example, the following values are assumed:

Table 2. Example of parameters for repair cost functions

Linear          Non-linear 1    Non-linear 2
Cr_0 = 1000     Cr_0 = 1000     Cr_0 = 1000
b = 100         b = 2           b = 3

[Figure 8: cost of repair ($, axis from 0 to 2500) versus inspection cycle number (1 to 10) for Cr (linear), Cr (non-linear 1) and Cr (non-linear 2)]
Figure 8. Behavior of three types of incrementing repair cost functions

3.4 Description of the variables involved in the analytical model

As mentioned before, the cost rate function (crf) is a function of two variables, T and N. Yet, equation 34 involves many different cost values and task durations that depend on the specific component to be evaluated. The following paragraphs summarize each of these values:

α: The scale parameter of the conditional Weibull distribution. This parameter is similar in magnitude to the mean of the distribution, so it can be seen as a representation of the mean time between successive failures. The value of α is always greater than zero and depends on the specific unit to be studied.

β: The shape parameter of the conditional Weibull density function. Like α, this value is an attribute of the component under study. Values of β smaller than 1 are used to represent units with decreasing failure rates (e.g. infant mortality), β = 1 corresponds to a constant failure rate, and values of β greater than 1 represent components with increasing hazard functions. The model presented in this work is oriented to the latter case, that is, to aging components whose failure rate increases with time. Typically, the value of β ranges from 1 to 5 for units under wear-out (deterioration).

In the next chapters, this work explores the effect of α and β on the optimal test interval of periodically tested units, and considers variability (uncertainty) in these two parameters.

Ci: The direct cost of a planned surveillance inspection, expressed in dollars or another monetary unit. This cost is assumed to include both materials and labor. As explained before, this value may increase after each test cycle according to the equations presented in the previous section. Nevertheless, this cost is a non-random value.

Cr: The direct cost of the repair tasks (corrective actions) that may follow a surveillance test, expressed in dollars or another monetary unit. This cost is assumed to include both materials and labor. As explained before, this value may increase after each test cycle according to the equations presented in the previous section. Nevertheless, like the inspection cost, this cost is a non-random variable. Its magnitude is generally greater than the test cost.

p: The conditional probability of losses given an unavailable component, Pr[L | u]. For normally operated units, this is the chance that the absence of a component actually impacts the production of the system or facility. For standby or spare items, this probability represents the chance of simultaneous failures, that is, the chance that the component is required while it is unavailable. This value varies from 0 to 1.

Cp: The cost (economic impact) of downtime. It is sometimes the profit margin associated with the system or plant where the component is located. It is expressed in terms of money per unit of time. This value can be established in most production facilities and represents the amount of money that operators lose when a given system becomes unavailable.

Co: The total cost of preventive overhaul, expressed in dollars or another currency. It involves the preventive maintenance tasks, materials and labor required to restore the component to an "as good as new" condition. It may be a total replacement of the unit.
Since it is a well-planned maintenance task, it is assumed to be short in duration and to be performed in operational windows, without any production impact due to unit downtime. This is a deterministic value in the analytical model, and it is greater than the cost of corrective (unplanned) repairs.

Ti: The duration of the surveillance test, expressed in units of time. For this particular model, this is assumed to be a constant (deterministic) value. It can be estimated as the average test duration according to the experience and maintenance history of the component.

Tr: The duration of the repair task that may follow an inspection, expressed in units of time. For this particular model, this is assumed to be a constant (deterministic) value. It can be estimated as the average or mean time to repair according to the experience and maintenance history of the component. This duration is usually larger than the average time to inspect.

Chapter 4: Results

The analytical model described in the previous chapter was computed with Mathcad Professional version 8. With this tool, it is possible to solve the equations involved in the model, which in most cases require numerical calculation. The following sections discuss the results obtained.

4.1 Average availability

An important measure in the study of repairable components is the average availability in a period of time. This value represents the probability that the unit is operative for a specific mission time or interval. For the periodically tested components studied in this work, the average uptime (availability) in a given test cycle is particularly interesting. The probability of being operative in the test cycle is calculated as the expected uptime divided by the expected test cycle length. It depends on several factors: the inspection interval, the chance of failures, and the duration of the test and possible repairs.
In particular, the chance of failures within the inspection interval is determined by the probability function of the time between failures, that is, by the conditional Weibull density function with parameters α and β (equation 25). The scale parameter, α, gives an idea of the mean time between failures [11]. The parameter β of this equation defines the type of failure rate: whether it is decreasing, constant, or increasing with time. The latter corresponds to components under aging processes, that is, to units whose failure rate increases with the age of the item.

The effect of the shape parameter, β, on the test cycle average availability is depicted in Figures 9 and 10. These charts are based on arbitrary values of T, Ti, Tr and α (Table 3) and show the uptime probability per test cycle in a component with "as bad as old" inspections and repairs (a component that retains its old condition after every test cycle).

Table 3. Values for the average availability analysis

Inspection interval        T = 325 days
Test duration              Ti = 2 days
Repair duration            Tr = 8 days
Weibull scale parameter    α = 20000 days

[Figure 9: availability (axis from 0.9800 to 1.0000) versus test cycle number (1 to 10) for β = 1.2, β = 2 and β = 2.5]
Figure 9. Average availability versus test cycle number for β > 1

Figure 9 illustrates the availability behavior of a unit with increasing failure rate (β > 1). As anticipated, the chart shows how the average uptime decreases as the unit becomes older, and how the aging process is more significant for larger values of the shape parameter. A value of β greater than one represents components under aging, which is the case for most maintained units and the focus of this work.

[Figure 10: availability (axis from 0.9100 to 0.9900) versus test cycle number (1 to 10) for β = 1, β = 0.7 and β = 0.5]
Figure 10. Average availability versus test cycle number for β ≤ 1

Figure 10 depicts the average availability for β equal to 1 and for smaller values (decreasing failure rate).
The former case constitutes the particular situation where the Weibull distribution becomes the exponential pdf (constant failure rate). Under this circumstance, even with the "as bad as old" assumption, the component does not exhibit any aging (availability is the same in every cycle) due to the memoryless property of the exponential distribution. For values of β smaller than one, the average uptime in the test cycles increases with the age of the unit. This reflects the behavior of items whose failure rate decreases with time. An example of this situation is the so-called "infant mortality", a characteristic observed in some components in very early stages of their lifespan.

4.2 Cost per unit time

The cost rate function, crf(T, N), represents the cost per unit of time in the renewal cycle. To find the optimal time between inspections, T_opt, it is necessary to find the value of T that minimizes the cost rate function. Since crf is a function of two variables, an optimal T can be found for every value of N. The overhaul frequency, N, is often established a priori, so the optimal test interval is usually found once the value of N has been fixed by the maintenance planner. Nevertheless, we will see later that an optimal combination of overhaul frequency and test interval can also be determined.

Figure 11 shows the cost rate function versus T for three different values of N. For this chart, the following arbitrary values were assumed:

Table 4. Arbitrary values for cost rate function examples

Time between failures    α = 20000 days, β = 1.2
Cost values              Ci_0* = 500 $, Cr_0* = 1000 $, Co = 20000 $, p·Cp = (0.7)·(20000) $/day
Durations                Ti = 2 days, Tr = 8 days

*with linear increment according to equations 35 and 36 (m = 50, b = 100)

[Figure 11: cost rate function (dollars/day, axis from 100 to 600) versus T (days, axis from 0 to 800) for N = 1, N = 5 and N = 10]
Figure 11. Cost rate function for different values of N

According to Figure 11, the minimum of the curve for N = 1 is reached at T ≈ 510 days, while for N = 5 and N = 10 the minimum corresponds to T ≈ 360 and T ≈ 320 days respectively. These three values constitute the optimal test intervals for these three cases of overhaul frequency.

It is important to notice that T_opt decreases as N becomes larger. This behavior is consistent with the inputs used for this example. For N = 1 (overhaul at every test) it is economically feasible to have relatively large inspection intervals. On the other hand, since the component is aging (β > 1), if the overhaul is performed every five or ten inspection cycles, it becomes necessary to decrease the test interval to reduce the expected cost of unavailability and hence the total cost per unit time.

Based on the inputs of Table 4, the behavior of the optimal inspection interval versus N is plotted in Figure 12. Here it is important to observe that the value of T_opt decreases as N increases, but in all cases tends to an asymptotic value for very large N.

[Figure 12: T_opt (days, axis from 0 to 600) versus N (1 to 20) for α = 20000, α = 10000 and α = 5000]
Figure 12. Optimal test interval versus overhaul frequency for different α

Another important aspect to be discussed is the limiting value of the cost rate function. To observe this limiting behavior, the three curves of Figure 11 are plotted in Figure 13 for very large values of the test interval, T.

[Figure 13: crf (dollars/day, axis from 0 to 2×10^4) versus T (days, axis from 0 to 3×10^6) for N = 1, N = 5 and N = 10, with the level p·Cp marked]
Figure 13. Limiting cost rate function for different values of N

From Figure 13, two observations are important.
First, the cost rate function has only one minimum. Second, the limiting value of the function corresponds to p·Cp regardless of the overhaul frequency N. This fact confirms the importance of the cost of unavailability in this analytical model. The following set of equations verifies the limiting behavior depicted in Figure 13.

Knowing that R(∞) = 0 and F(∞) = 1, for a given test cycle we have:

$$Cost_i = Ci_i + Cr_i + p \cdot Cp \cdot E[D]_i \tag{41}$$

$$E[L]_i = Ti + Tr + T \tag{42}$$

$$E[U]_i = MRL_i \tag{43}$$

where MRL_i is the mean residual life after the (i-1)th test cycle.

$$E[D]_i = E[L]_i - E[U]_i = Ti + Tr + T - MRL_i \tag{44}$$

So, for the renewal cycle the total cost is:

$$Cost = Co + \sum_{i=1}^{N} Ci_i + \sum_{i=1}^{N} Cr_i + p \cdot Cp \left[ N (Ti + Tr + T) - \sum_{i=1}^{N} MRL_i \right] \tag{45}$$

and the renewal cycle length is:

$$L = N (Ti + Tr + T) \tag{46}$$

Thus, with Ti + Tr + T ≈ T, the cost rate function for the renewal cycle is:

$$crf = \frac{Co + \sum_{i=1}^{N} Ci_i + \sum_{i=1}^{N} Cr_i}{N \cdot T} + p \cdot Cp \left[ N \cdot T - \sum_{i=1}^{N} MRL_i \right] \frac{1}{N \cdot T} \tag{47}$$

Since the direct costs and the mean residual lives remain finite, for T → ∞ this yields:

$$\text{Limiting } crf = p \cdot Cp \tag{48}$$

4.3 Optimal inspection interval and overhaul frequency

In the previous sections, general results for arbitrary data were discussed. This section presents particular results based on a case study.

Safety relief valves are typical industrial components that have to be tested and calibrated periodically. These units are normally installed in systems or operating plants; they are used under emergency or abnormal situations to relieve overpressure and avoid failure of pressure vessels or process pipelines. Relief valves should operate automatically in case of overpressure, so their availability should be guaranteed by periodic inspection and appropriate calibration of the pressure set-point.

As a practical example, a typical safety relief valve was considered for this study. The following data were used:

Table 5. Input values for safety relief valve example

Time between failures    α = 3500 months, β = 1.5
Cost values              Ci* = 500 + 50·i $, Cr* = 5000 + 500·i $, Co = 2×10^4 $, p·Cp = 0.4·8×10^5 $/month
Durations                Ti = 0.05 months, Tr = 0.25 months

*with linear increment after every test cycle according to equations 35 and 36; i = cycle number

The reliability parameters, α and β, were selected according to failure data of nuclear facilities [12]. The Weibull scale parameter (3500 months) gives an idea of the mean time to failure of the unit. The shape parameter (1.5) indicates that the component has an increasing failure rate (aging process).

Direct costs of inspection and repair were assumed based on experience and engineering judgment. Here, a linear cost increment according to equations 35 and 36 is taken into account.

Similarly, the cost of overhaul (preventive maintenance) and the cost of unavailability were selected according to experience and typical data of industrial plants. The cost of overhaul involves material and labor, while the cost of unavailability expresses the total losses, in dollars per month, due to system downtime.

The parameter p is the conditional probability of actual system downtime given that the valve is unavailable. A value of p equal to 0.4 indicates a 40% chance of economic impact due to valve unavailability. This aspect is discussed further in the next chapter.

The last two columns of Table 5 show the duration of the test and repair tasks. Based on experience, a periodic valve inspection is assumed to take 1.5 days (0.05 months), while an unplanned corrective maintenance task is taken as 1 week (0.25 months) in length.

In general, the values presented in Table 5 are reference numbers for this academic work. They were selected to illustrate the use of the model and are not intended to be used as exact values in real projects. Real-life analysis should be based on specific data and will vary according to the particular characteristics of the unit and system under study.
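To show how the inputs of Table 5 feed equation 34, the following Python sketch searches for the test interval that minimizes the cost rate function for each overhaul frequency from 1 to 10 and then picks the N with the lowest cost rate. It is an independent illustration with hypothetical function names (Simpson integration and a golden-section search stand in for the Mathcad solver), so its numbers may differ slightly from the values reported for the Mathcad computation.

```python
import math

def simpson(func, a, b, n=100):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * func(a + k * h)
    return total * h / 3.0

def crf(T, N, alpha, beta, Ci0, m, Cr0, b, Co, pCp, Ti, Tr):
    """Cost rate function of eq. 34 with linear cost increments (eqs. 35, 36)."""
    total_cost, total_length = Co, 0.0
    for i in range(1, N + 1):
        start, end = (i - 1) * T, i * T
        shift = (start / alpha) ** beta
        pdf = lambda t: (beta / alpha) * (t / alpha) ** (beta - 1.0) * math.exp(shift - (t / alpha) ** beta)
        F_i = 1.0 - math.exp(shift - (end / alpha) ** beta)                  # eq. 26
        E_U = T * (1.0 - F_i) + simpson(
            lambda t: (t - start) * pdf(t) if t > start else 0.0, start, end)  # eq. 31
        E_L = T + Ti + Tr * F_i                                              # eq. 30
        total_cost += (Ci0 + m * i) + (Cr0 + b * i) * F_i + pCp * (E_L - E_U)
        total_length += E_L
    return total_cost / total_length

def golden_min(func, lo, hi, tol=0.01):
    """Golden-section search for the minimizer of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = hi - inv_phi * (hi - lo), lo + inv_phi * (hi - lo)
    f1, f2 = func(x1), func(x2)
    while hi - lo > tol:
        if f1 < f2:
            hi, x2, f2 = x2, x1, f1
            x1 = hi - inv_phi * (hi - lo)
            f1 = func(x1)
        else:
            lo, x1, f1 = x1, x2, f2
            x2 = lo + inv_phi * (hi - lo)
            f2 = func(x2)
    return 0.5 * (lo + hi)

# Inputs of Table 5 (months and dollars)
valve = dict(alpha=3500.0, beta=1.5, Ci0=500.0, m=50.0, Cr0=5000.0, b=500.0,
             Co=2.0e4, pCp=0.4 * 8.0e5, Ti=0.05, Tr=0.25)

results = {}
for N in range(1, 11):
    T_opt = golden_min(lambda T: crf(T, N, **valve), 5.0, 150.0)
    results[N] = (T_opt, crf(T_opt, N, **valve))

N_opt = min(results, key=lambda N: results[N][1])
for N, (T_opt, cost_rate) in results.items():
    print(N, round(T_opt, 1), round(cost_rate, 1))
print("optimal overhaul frequency:", N_opt)
```

The golden-section search relies on the cost rate function having a single minimum in T for each N, which is the behavior observed in Figure 13.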
Using the inputs presented in Table 5, and optimizing equation 34 for different values of overhaul frequency (N), different results were obtained for the optimal inspection interval (T_opt) for this case study. The following chart shows the results:

[Figure 14: T_opt (months, axis from 0 to 80) versus N (1 to 10)]
Figure 14. T_opt versus overhaul frequency for a safety relief valve

Figure 14 shows the optimal inspection interval for different values of the overhaul frequency. As discussed before, the value of T_opt decreases and tends to stabilize for large values of N. If the inspection and maintenance policy is to pre-set the overhaul frequency based on management strategy, this chart can be used to establish the corresponding optimal test interval for a given value of N. This may be the case in some industrial facilities where the frequency of overhaul (preventive maintenance) has to meet certain standards or regulations.

Now, in order to identify the optimal overhaul frequency, it is important to explore the value of the cost rate function, crf, for each N. Accordingly, Figure 15 plots the crf evaluated at T_opt for different values of overhaul frequency.

[Figure 15: crf(T_opt) ($/month, axis from 600 to 1200) versus N (1 to 10)]
Figure 15. crf(T_opt) versus overhaul frequency for a safety relief valve

Figure 15 indicates that N = 2 constitutes the optimal overhaul frequency, since it yields the minimum value of the cost rate function for this case study. Thus, the optimal inspection policy for this safety relief valve is to inspect the valve every 51 months (from Figure 14) with preventive overhaul every 2 test cycles.

This optimal combination of time between inspections (T) and overhaul frequency (N) can also be seen when plotting the crf as a function of these two variables. Figure 16 illustrates, by using a surface chart, how the optimal combination (minimal point in the crf surface) is reached at T = 51 months and N = 2. Surface plotting constitutes a useful way to represent the general idea of optimizing the total cost per unit time as a basis to establish the best inspection and preventive maintenance policy.

[Figure 16: surface chart of crf ($/month, axis from 500 to 2000) versus T (months, 20 to 59) and N (1 to 5)]
Figure 16. Cost rate function versus T and N for a safety relief valve

4.4 Availability versus cost-based optimization

The inspection interval of periodically tested components has usually been selected without considering cost aspects. Traditionally, the inspection policy has been based on maximizing the unit availability, with almost no attention to inspection, repair and failure costs.

Using the relief valve example, it is possible to identify the value of the test interval (T) that yields the maximal average availability over the renewal cycle. This calculation can be done for any value of overhaul frequency. For instance, taking N equal to 10, the average availability for the renewal cycle is defined as follows:

$$\bar{a}(T) = \frac{\text{Expected uptime}}{\text{Expected cycle length}} = \frac{\sum_{i=1}^{10} E[U]_i}{\sum_{i=1}^{10} E[L]_i} \tag{49}$$

where E[L]_i and E[U]_i are calculated according to equations 30 and 31 respectively.

By definition, the expected uptime and the expected renewal cycle length are functions of the inspection interval. Thus, the average availability for the renewal cycle can be plotted as a function of the time between inspections (T). Figure 17 shows the behavior of ā(T) versus the inspection interval for N = 10. This chart also indicates the value of T that maximizes the average availability for the renewal cycle assuming overhaul after 10 surveillance tests.
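Equation 49 can be evaluated numerically in the same way as the cost rate function. The Python sketch below is illustrative only: the function names are hypothetical, the integral of equation 31 is computed with a Simpson rule, and the maximizing interval is located with a coarse half-month grid using the relief valve parameters of Table 5.

```python
import math

def simpson(func, a, b, n=100):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * func(a + k * h)
    return total * h / 3.0

def renewal_availability(T, N, alpha, beta, Ti, Tr):
    """Equation 49: sum of E[U]_i divided by sum of E[L]_i over N test cycles."""
    uptime, length = 0.0, 0.0
    for i in range(1, N + 1):
        start, end = (i - 1) * T, i * T
        shift = (start / alpha) ** beta
        pdf = lambda t: (beta / alpha) * (t / alpha) ** (beta - 1.0) * math.exp(shift - (t / alpha) ** beta)
        F_i = 1.0 - math.exp(shift - (end / alpha) ** beta)                   # eq. 26
        uptime += T * (1.0 - F_i) + simpson(
            lambda t: (t - start) * pdf(t) if t > start else 0.0, start, end)   # eq. 31
        length += T + Ti + Tr * F_i                                           # eq. 30
    return uptime / length

# Relief valve parameters from Table 5 (months)
alpha, beta, Ti, Tr = 3500.0, 1.5, 0.05, 0.25

grid = [0.5 * k for k in range(10, 201)]          # 5 to 100 months in 0.5-month steps
T_max_avail = {N: max(grid, key=lambda T: renewal_availability(T, N, alpha, beta, Ti, Tr))
               for N in (2, 10)}
print(T_max_avail)
```

Note that no cost input enters this calculation; the availability-based interval depends only on the reliability parameters and the task durations.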
[Figure 17: average availability a(T) (axis from 0.95 to 1) versus T (months, axis from 0 to 100), with the maximum marked at T = 29 months]
Figure 17. Average availability versus T for a relief valve with N = 10

Here, it is important to point out that the value of T that yields the maximum availability (29 months) is smaller than the one that minimizes the cost per unit time for N equal to 10 (31 months, from Figure 14). Moreover, since the cost rate function (crf) is decreasing to the left of T_opt, the value of the crf evaluated at T for maximal availability is greater than the one evaluated at T for minimal cost (Figure 18).

[Figure 18: crf ($/month, axis from 980 to 1100) versus T (months, axis from 20 to 40)]
Figure 18. Cost rate function for N = 10 for a safety relief valve

Accordingly, establishing the inspection interval based on cost optimization instead of maximizing availability (traditional approach) yields important savings, especially if we consider the large number of periodically tested units that may exist in a system or facility. For instance, if we consider the optimal combination obtained for the case study (N = 2, T_opt = 51 months), we would get:

Table 6. Availability versus cost-based optimization results for a relief valve

T_opt                          51 months
crf(T_opt)                     869 $/month
T_max. availability            42 months
crf(T_max. availability)       895 $/month
Savings per valve per year     312 $

Here, the amount saved per valve would represent a total saving of 31,200 $/year in a plant with 100 periodically tested valves.

4.5 Sensitivity evaluation

To evaluate the model sensitivity to changes in the different inputs, marginal variations (one input at a time) were considered for the case study. Results of this evaluation are presented in Table 7.

We can note from Table 7 that the case study results do not change with moderate variations (±40%) of the repair duration or of the direct costs of inspection and repair. On the other hand, the optimal combination of N and T is sensitive even to small changes of Ti, Cp or Co. The optimal value of N decreases with increments in Ti or Cp, and increases when Co is increased. As anticipated, the optimal time between inspections increases with increments in the test duration.

Table 7. Results of sensitivity evaluation for the case study

Input               Value      N_opt    T_opt (months)
Ti (months)         0.03       3        39
                    0.04       2        48
                    0.05       2        51
                    0.06       1        71
                    0.07       1        74
Tr (months)         0.15       2        51
                    0.20       2        51
                    0.25       2        51
                    0.30       2        51
                    0.35       2        51
Ci_0* (dollars)     300        2        51
                    400        2        51
                    500        2        51
                    600        2        51
                    700        2        51
Cr_0* (dollars)     3000       2        51
                    4000       2        51
                    5000       2        51
                    6000       2        51
                    7000       2        51
Cp ($/month)        6×10^5     2        54
                    7×10^5     2        52
                    8×10^5     2        51
                    9×10^5     1        68
                    10×10^5    1        66
Co (dollars)        10000      1        61
                    15000      1        66
                    20000      2        51
                    25000      2        53
                    30000      3        46

*with linear increment after every test cycle according to equations 35 and 36 (m = 50, b = 500)

T_opt is also very sensitive to changes in the cost of unavailability (Cp). According to Table 7, for a given optimal overhaul frequency, the optimal test interval decreases when Cp is increased. Conversely, T_opt becomes larger with increments in the cost of overhaul (Co) for the same N_opt.

As mentioned above, the optimal test policy does not change with moderate variations of the inspection cost (Ci_0) and repair cost (Cr_0) for the case of linear increments in these values. The effect of the different incrementing cost models (see Section 3.3) in this case study is presented in Figure 19:

[Figure 19: T_opt (months, axis from 25 to 75) versus N (1 to 10) for the linear, non-linear 1 and non-linear 2 cost models]
Figure 19. T_opt versus N for different incrementing cost models for a relief valve

Figure 19 shows that the results are almost identical for the three models.
Thus, for this particular case study, the optimal test policy is insensitive not only to moderate changes in the values of the direct costs Ci_0 and Cr_0, but also to the model considered for the increment of these costs after every test cycle. It follows that even rough approximations of these inputs will yield acceptable results for this example.

In general, the values of the test duration and cost of overhaul are well known in most cases (this helps to control the sensitivity). However, the impact of unavailability (p·Cp) is often uncertain. The uncertainty of this last value may be controlled by taking into account different risk scenarios for this input, that is, by considering an event analysis with all the potential scenarios and their corresponding consequences and likelihoods. This analysis can be done with an event tree and probabilistic risk assessment [12]. With this approach, instead of a single input for the expected impact of unavailability (p·Cp), there would be an expectation composed of many scenarios, e.g. p_1·Cp_1 + p_2·Cp_2 + p_3·Cp_3 + ... + p_n·Cp_n.

A special situation regarding the consequence of unavailability is the case of no impact (p·Cp = 0). Under this assumption (see Figure 20) the cost rate function does not reach a finite minimum.

[Figure 20 plots the curves crf1(T), crf5(T) and crf10(T), in $/month, against T from 0 to 100 months.]

Figure 20. crf vs. T for different N for a relief valve with no unavailability impact

4.6 Uncertainty in Weibull parameters

Although not shown in Table 7, the parameters of the time-between-failure distribution (α, β) constitute the basic inputs for the inspection policy analysis. In most cases, especially when no failure history is available, these values are difficult to establish. The following paragraphs discuss the variability of these parameters through the consideration of uncertainty.
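To make the role of these two inputs concrete, the following Python sketch evaluates a two-parameter Weibull time-between-failure distribution. It is only an illustration: the parameter values are the mean values adopted later in this section (scale 3500 months, shape 1.5), the symbols α and β are assumed to denote the scale and shape parameters, and the function names and the 51-month evaluation point are chosen here for illustration rather than taken from the Mathcad sheet.

```python
import math


def weibull_cdf(t, alpha, beta):
    """Probability that the time between failures is at most t (t >= 0)."""
    return 1.0 - math.exp(-((t / alpha) ** beta))


def weibull_mean(alpha, beta):
    """Mean time between failures of a Weibull(alpha, beta) distribution."""
    return alpha * math.gamma(1.0 + 1.0 / beta)


# Assumed point estimates: scale in months, shape dimensionless.
alpha, beta = 3500.0, 1.5
test_interval = 51.0  # months, illustrative evaluation point

mtbf_months = weibull_mean(alpha, beta)
failure_probability = weibull_cdf(test_interval, alpha, beta)

print(f"Mean time between failures: {mtbf_months:.0f} months")
print(f"P(failure within {test_interval:.0f} months of a new unit): {failure_probability:.5f}")
```

With a shape parameter above 1 the Weibull hazard rate increases with age, so both functions respond to changes in either parameter; this is the dependence that the uncertainty analysis explores.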
To account for the variability of the time-between-failure distribution, the parameters α and β of the Weibull pdf were treated as random variables. To conduct this analysis, a conventional Monte Carlo Simulation (MCS) was carried out. Monte Carlo Simulation is a technique based on the use of generated random numbers [20]. For this analysis, uniformly distributed random numbers between 0 and 1 were sampled using Mathcad Professional in order to generate a set of random values for the parameters α and β. Within the process, the zero-to-one sampled numbers were converted into α or β values by using inverse cumulative distribution functions. For this particular study, the following probability distributions were considered (see Mathcad sheet in Appendix):

α: normally distributed, with mean equal to 3500 months and standard deviation equal to 350 months.
β: normally distributed, with mean equal to 1.5 and standard deviation equal to 0.1.

Then, using 1000 repetitions of the MCS sampling process, the following results were obtained for the case study:

[Figure 21 plots T_opt, from 0 to 120 months, against N from 1 to 10, showing a point estimate and an interval bar for each N.]

Figure 21. Point estimates and 90% confidence bounds for T_opt vs. N for a relief valve

As presented in Figure 21, with uncertainty in the Weibull parameters we get a range of values (a random variable) for each N instead of a single number for the optimal test interval. The dots in the figure indicate point estimates (means of the distributions), while the bars show the 90% confidence intervals (5th and 95th percentiles).

[Figure 22 plots crf(T_opt), from 0 to 1800 $/month, against N from 1 to 10.]

Figure 22. Point estimates and 90% confidence bounds for crf(T_opt) vs. N for a relief valve

Similarly, Figure 22 shows the point estimates and 90% confidence intervals for the cost rate function evaluated at T_opt for different N.
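The sampling step described above can be sketched in Python as follows. This is a minimal illustration, not the original Mathcad sheet: it uses the standard library's `statistics.NormalDist.inv_cdf` as the inverse cumulative distribution function, assumes that α and β are drawn from independent uniform numbers, and propagates each sampled pair to the Weibull mean time between failures rather than to T_opt, since the cost optimization itself is not reproduced here. The seed and the helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, quantiles


def sample_weibull_parameters(n_samples=1000, seed=2005):
    """Draw (alpha, beta) pairs by inverse-transform sampling.

    Assumption: alpha and beta use independent uniform random numbers.
    """
    rng = random.Random(seed)  # fixed (hypothetical) seed for repeatability
    alpha_dist = NormalDist(mu=3500.0, sigma=350.0)  # scale, months
    beta_dist = NormalDist(mu=1.5, sigma=0.1)        # shape, dimensionless
    pairs = []
    for _ in range(n_samples):
        # inv_cdf requires 0 < u < 1, so guard against an exact 0.0 draw.
        u_alpha = max(rng.random(), 1e-12)
        u_beta = max(rng.random(), 1e-12)
        pairs.append((alpha_dist.inv_cdf(u_alpha), beta_dist.inv_cdf(u_beta)))
    return pairs


def weibull_mean(alpha, beta):
    """Mean of a Weibull distribution with scale alpha and shape beta."""
    return alpha * math.gamma(1.0 + 1.0 / beta)


pairs = sample_weibull_parameters()
mtbf = [weibull_mean(a, b) for a, b in pairs]

# quantiles(..., n=20) returns 19 cut points at 5%, 10%, ..., 95%.
cuts = quantiles(mtbf, n=20)
p05, p95 = cuts[0], cuts[-1]

print(f"point estimate (mean): {mean(mtbf):.0f} months")
print(f"90% interval: [{p05:.0f}, {p95:.0f}] months")
```

In the actual study, the quantity computed for each sampled pair would be the optimal test interval (and the corresponding cost rate) for a given N, and the same mean and 5th/95th percentile summary would then give the dots and bars of Figures 21 and 22.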
It can be seen in both Figures 21 and 22 that the point estimates are consistent with the results plotted in Figures 14 and 15. It can also be noticed (especially in Figure 21) that the 90% confidence intervals become narrower as N increases, so the effect of uncertainty in the Weibull parameters is more significant for small values of the overhaul frequency.

Although the optimal N is obvious in Figure 15, we can observe in Figure 22 that the optimal frequency also becomes a random variable when uncertainty in the Weibull parameters is considered. From the Monte Carlo Simulation (see Figure 23), N_opt is a discrete random variable with mode equal to 2 and standard deviation equal to 0.8 (5th percentile = 1, 95th percentile = 3).

[Figure 23 plots the relative frequency, from 0 to 0.5, of each value of N_opt from 1 to 5.]

Figure 23. Frequency chart for the optimal overhaul frequency for a relief valve

Summing up, for this practical example we can say with 90% confidence that the optimal inspection interval for the relief valve is between 36 and 70 months (3 and 5.8 years), with an optimal overhaul frequency that falls between 1 and 3.

The use of uncertainty in the inputs of the model provides important information for decision making, especially when there is no failure history available. This kind of analysis provides results in terms of ranges of values (confidence intervals) instead of single point estimates. Unlike the deterministic analysis, the consideration of uncertainty bounds usually gives inspection planners more confidence in the results.

Chapter 5: Extensions

Previous chapters described the analytical model developed for this work and discussed important results based on general inputs and particular examples related to a case study. The following paragraphs explain possible extensions of this work that can be considered in future studies.
Among the aspects that can be expanded or added, the following elements can be taken into account:

5.1 Generalized Renewal Process after test cycles

The model presented in this report assumes that the component returns to an "as bad as old" condition after every test cycle (NHPP model). Nevertheless, in real situations it is possible to find cases where the inspection/maintenance tasks restore the unit to conditions different from "as bad as old". Even though it has been shown that the NHPP model provides results very similar to those obtained with the Generalized Renewal Process (GRP) with 0.1 < q < 1 [17], the GRP approach is still useful for situations with q > 1 (worse-than-old restoration) and q < 0 (better-than-new restoration).

To consider the GRP model after the inspection cycle, instead of the NHPP expression (equation 24), the following equation should be considered [15]:

$$F(t_i) = 1 - \exp\left[\left(\frac{q\sum_{j=1}^{i-1} t_j}{\alpha}\right)^{\beta} - \left(\frac{t_i + q\sum_{j=1}^{i-1} t_j}{\alpha}\right)^{\beta}\right] \qquad (50)$$

Like equation 24, equation 50 assumes that the interarrival times between successive failures follow a Weibull distribution. However, the GRP expression incorporates an additional parameter (q) to account for the quality of the maintenance tasks.

5.2 Imperfect surveillance inspections

The analytical model described in this document is based on perfect surveillance tests. According to this assumption, unrevealed failures are always detected by the inspection tasks (probability of detection equal to 1). A possible extension of this analysis may incorporate the probability of failures remaining undiscovered after the test, to account for the possibility of imperfect inspections. Considerations for optimal inspection and maintenance policies of units with unrevealed failures and less than perfect tests are given by Badía et al. [8].

5.3 Further uncertainty and risk analysis

In chapter 4 the model sensitivity was discussed.
As presented in Table 7 and explained in section 4.5, the results for the optimal inspection policy are very susceptible to changes in the values of the test duration (Ti), the cost of overhaul (Co) and the potential cost of unavailability (p·Cp). As done for the Weibull parameters of the time-between-failure distribution, random variables may be used to account for the variability of Ti and Co. As presented in the Appendix (Mathcad work sheet), this extended analysis can be modeled by using random numbers and Monte Carlo Simulation.

The uncertainty associated with the impact of unavailability may be handled with a rigorous risk analysis that considers different scenarios. Moreover, the analytical model and risk analysis may be extended to incorporate not only the impact of unavailability in terms of cost per unit time, but also potential accidents with instant safety consequences.

5.4 Systems of periodically tested components

This work is focused on concepts and equations for availability and cost analysis at the component level. However, future studies may be extended to system analysis. The analysis of systems consists of studying multiple components simultaneously, considering not only the properties of the single items but also the way they are related.

Besides single periodically tested units, there are also systems with components whose failures are only detected upon inspection. In these cases, the evaluation of availability becomes more complex, because different testing strategies may be considered.

To analyze the availability of this type of system it is important to give special attention to the test interval, T, and also to the inspection strategy. Depending on the inspection crew and maintenance plan, units can be tested simultaneously or in a staggered manner, which will affect the estimation of the system availability.
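The effect of the test strategy can be illustrated numerically. The Python sketch below works under simplifying assumptions of the kind that underlie Table 8: two identical units with the same constant failure rate, negligible test and repair times, and each unit restored at its own test. The half-interval offset used for the staggered case, the failure rate and the test interval are hypothetical choices made for this illustration, as are the function names.

```python
import math


def unit_unavailability(time_since_test, lam):
    """Probability that a unit has failed undetected since its last test."""
    return 1.0 - math.exp(-lam * time_since_test)


def average_availability(lam, T, offset, parallel, steps=20000):
    """Average system availability over one test interval (midpoint rule).

    offset is the delay of unit 2's test relative to unit 1's test:
    0 for simultaneous testing, T / 2 for (assumed) evenly staggered testing.
    """
    dt = T / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt
        q1 = unit_unavailability(t, lam)
        # Time since unit 2 was last tested, wrapped into [0, T).
        q2 = unit_unavailability((t - offset) % T, lam)
        if parallel:
            total += 1.0 - q1 * q2          # system fails only if both fail
        else:
            total += (1.0 - q1) * (1.0 - q2)  # series: both must work
    return total / steps


lam, T = 0.001, 50.0  # hypothetical failure rate (1/month) and interval (months)

par_sim = average_availability(lam, T, 0.0, parallel=True)
par_stag = average_availability(lam, T, T / 2.0, parallel=True)
ser_sim = average_availability(lam, T, 0.0, parallel=False)
ser_stag = average_availability(lam, T, T / 2.0, parallel=False)

print(f"parallel: simultaneous {par_sim:.6f}, staggered {par_stag:.6f}")
print(f"series:   simultaneous {ser_sim:.6f}, staggered {ser_stag:.6f}")
```

For small values of the product of failure rate and test interval, the four numerical averages should agree closely with the quadratic approximations listed in Table 8.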
Lewis [18] developed simple expressions, assuming test and repair times that are negligible with respect to the inspection interval, to evaluate the effect of staggered testing on system availability. Table 8 presents approximate equations for typical system configurations of periodically tested components with identical constant failure rate (λ):

Table 8. Average availability for systems with periodically tested units

Test strategy    2-unit series system         2-unit parallel system
Simultaneous     1 - λT + (2/3)(λT)^2         1 - (1/3)(λT)^2
Staggered        1 - λT + (13/24)(λT)^2       1 - (5/24)(λT)^2

From Table 8, it is important to point out that the availability of redundant systems is greater with staggered tests, while the availability of series configurations is better with simultaneous inspections.

Redundant systems with common cause failures

Redundant systems are usually subject to dependencies and common cause failures that may affect more than one unit at the same time. For these cases, the inspection strategy plays an important role in system availability, especially for those systems where the component failures remain unrevealed until the next inspection.

Recent works have explored the effect of test strategies on the availability of redundant systems. In particular, Vaurio [21] presented a set of equations to evaluate the common cause failure probabilities in standby systems, and an economic model to optimize the inspection interval according to different inspection policies.

The cost-based optimization of the inspection policy of systems with periodically tested components is still an area of research. Given the additional complication that the system itself represents, research works usually make simplifications at the component level when studying complex systems. In general, the optimal inspection policy of systems with periodically tested units calls for further considerations.
Analyses of this topic should take into account not only the optimal test interval and maintenance frequency, but also the optimal degree of test staggering.

Chapter 6: Conclusion

Inspection and maintenance strategies have been widely studied for monitored components; however, less attention has been paid to units whose failures are detected upon surveillance tests. In this work, a cost rate function model was developed to identify the optimal inspection policy of periodically tested repairable components subject to aging.

The model presented in this report is based on minimizing the total cost per unit time during the component renewal cycle. It considers the unit availability assuming that the item is "as old" after tests and repairs and "as new" after overhauls. The model takes into account costs associated not only with surveillance tests and maintenance, but also with the potential losses related to component unavailability. The model also considers the duration of inspections and repairs, and allows for uncertainty in the parameters of the probability density function of the time between failures.

The effect of overhaul policy on the optimal test interval, T_opt, was studied considering different values of the overhaul frequency, N. Results obtained from diverse sets of costs and time-to-failure parameters suggest that T_opt decreases and tends to stabilize when N is increased.

The effect of N on the total cost per unit time was studied by evaluating the cost rate function, crf, at T_opt for different values of N. The analysis reveals that an optimal N can be identified, so the model provides not only an optimal time between surveillance tests, but also an optimal overhaul frequency for periodically tested components.
A practical numerical example carried out for a typical tested unit (a safety relief valve) with mean time between failures of 10^5 days shows that the optimal policy would be to inspect the valve every 51 months, with preventive overhaul every two test cycles.

Comparison between the optimal test interval obtained from cost optimization and the one obtained for maximal availability (traditional approach) shows that the cost-based optimization may provide significant savings, especially in facilities with several periodically tested units.

Sensitivity evaluation in the case study indicates that the results are almost insensitive to moderate variations of the repair duration, as well as to variations of the direct costs of inspection and repair (this behavior may not hold in other analyses). On the other hand, the optimal inspection policy is very susceptible to changes in test duration (Ti), cost of overhaul (Co), cost of unavailability (Cp), and the Weibull parameters of the time between failures (α and β). The optimal value of N decreases with increments in Ti or Cp, and increases when Co is increased. This confirms the significance of these inputs and the importance of considering the test duration (often neglected in the literature) in the analysis of periodically tested units.

Monte Carlo Simulation was used to study the effect of uncertainty in the Weibull parameters of the relief valve failure distribution. Results obtained from this analysis are consistent with those obtained from deterministic parameters. The use of uncertainty in the inputs of the model provides important information for inspection decision making, especially when there is no failure history available. This kind of analysis offers results in terms of confidence intervals instead of single point estimates.
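Possible extensions of this work may include the consideration of the Generalized Renewal Process (GRP) for modeling the component maintenance (both repairs and preventive overhaul), as well as the inclusion of imperfect inspections. The use of uncertainty in other model inputs, such as time to inspect and cost of overhaul, and the analysis at system level (arrangements of many tested components) are also important aspects to be considered in future research.

Appendix

Mathcad work sheets

RISK AND ECONOMIC ESTIMATION OF INSPECTION INTERVAL FOR PERIODICALLY TESTED REPAIRABLE COMPONENTS

Monte Carlo Simulation

$i = 1, \dots, 1000$
$r_i = \mathrm{rnd}(1)$
$\beta_i = \mathrm{qnorm}(r_i,\ 1.5,\ 0.1)$
$\alpha_i = \mathrm{qnorm}(r_i,\ 3571,\ 357)$
$f(t,i) = \frac{\beta_i}{\alpha_i}\left(\frac{t}{\alpha_i}\right)^{\beta_i-1} \exp\left[-\left(\frac{t}{\alpha_i}\right)^{\beta_i}\right]$ ($\alpha$ is the Weibull scale parameter)
$R(T,i) = 1 - \int_0^T f(t,i)\,dt$
$F(T,i) = 1 - R(T,i)$

First cycle

$\mathrm{Ti} = 0.05,\quad \mathrm{Tr} = 0.25,\quad n = 1$
$\mathrm{Cio} = 500,\quad \mathrm{mi} = 50,\quad \mathrm{Cro} = 5000,\quad \mathrm{mr} = 500$
$\mathrm{Cp} = 800000,\quad p = 0.4,\quad \mathrm{Co} = 20000$
$\mathrm{Ci1} = \mathrm{Cio} + \mathrm{mi}\cdot n,\quad \mathrm{Cr1} = \mathrm{Cro} + \mathrm{mr}\cdot n$
$L_1(T,i) = \mathrm{Ti} + T + \mathrm{Tr}\cdot F(T,i)$
$U_1(T,i) = T\cdot R(T,i) + \int_0^T t\,f(t,i)\,dt$
$D_1(T,i) = L_1(T,i) - U_1(T,i)$
$\mathrm{Cost1}(T,i) = \mathrm{Ci1} + \mathrm{Cr1}\cdot F(T,i) + p\cdot\mathrm{Cp}\cdot D_1(T,i) + \mathrm{Co}$

Cost rate function

$y_1(T,i) = \frac{\mathrm{Cost1}(T,i)}{L_1(T,i)}$

Second cycle

$n = 2$

Weibull conditional PDF (Power Law Model)

$f_2(t,i,T) = \frac{\beta_i}{\alpha_i}\left(\frac{t}{\alpha_i}\right)^{\beta_i-1} \exp\left[\left(\frac{T}{\alpha_i}\right)^{\beta_i} - \left(\frac{t}{\alpha_i}\right)^{\beta_i}\right]$
$R_2(T,i) = 1 - \int_T^{2T} f_2(t,i,T)\,dt$
$F_2(T,i) = 1 - R_2(T,i)$
$\mathrm{Ci2} = \mathrm{Cio} + \mathrm{mi}\cdot n,\quad \mathrm{Cr2} = \mathrm{Cro} + \mathrm{mr}\cdot n$
$L_2(T,i) = \mathrm{Ti} + T + \mathrm{Tr}\cdot F_2(T,i)$
$U_2(T,i) = T\cdot R_2(T,i) + \int_T^{2T} (t - T)\,f_2(t,i,T)\,dt$
$D_2(T,i) = L_2(T,i) - U_2(T,i)$
$\mathrm{Cost2}(T,i) = \mathrm{Ci2} + \mathrm{Cr2}\cdot F_2(T,i) + p\cdot\mathrm{Cp}\cdot D_2(T,i)$

Cost rate function

$\mathrm{Cost}(T,i) = \mathrm{Cost1}(T,i) + \mathrm{Cost2}(T,i)$
$L(T,i) = L_1(T,i) + L_2(T,i)$
$y_2(T,i) = \frac{\mathrm{Cost}(T,i)}{L(T,i)}$

Third cycle

$n = 3$

Weibull conditional PDF (Power Law Model)

$f_3(t,i,T) = \frac{\beta_i}{\alpha_i}\left(\frac{t}{\alpha_i}\right)^{\beta_i-1} \exp\left[\left(\frac{2T}{\alpha_i}\right)^{\beta_i} - \left(\frac{t}{\alpha_i}\right)^{\beta_i}\right]$
$R_3(T,i) = 1 - \int_{2T}^{3T} f_3(t,i,T)\,dt$
$F_3(T,i) = 1 - R_3(T,i)$
$\mathrm{Ci3} = \mathrm{Cio} + \mathrm{mi}\cdot n,\quad \mathrm{Cr3} = \mathrm{Cro} + \mathrm{mr}\cdot n$
$L_3(T,i) = \mathrm{Ti} + T + \mathrm{Tr}\cdot F_3(T,i)$
$U_3(T,i) = T\cdot R_3(T,i) + \int_{2T}^{3T} (t - 2T)\,f_3(t,i,T)\,dt$
$D_3(T,i) = L_3(T,i) - U_3(T,i)$
$\mathrm{Cost3}(T,i) = \mathrm{Ci3} + \mathrm{Cr3}\cdot F_3(T,i) + p\cdot\mathrm{Cp}\cdot D_3(T,i)$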
Cost rate function

$\mathrm{Cost}(T,i) = \mathrm{Cost1}(T,i) + \mathrm{Cost2}(T,i) + \mathrm{Cost3}(T,i)$
$L(T,i) = L_1(T,i) + L_2(T,i) + L_3(T,i)$
$y_3(T,i) = \frac{\mathrm{Cost}(T,i)}{L(T,i)}$

Fourth cycle

$n = 4$

Weibull conditional PDF (Power Law Model)

$f_4(t,i,T) = \frac{\beta_i}{\alpha_i}\left(\frac{t}{\alpha_i}\right)^{\beta_i-1} \exp\left[\left(\frac{3T}{\alpha_i}\right)^{\beta_i} - \left(\frac{t}{\alpha_i}\right)^{\beta_i}\right]$
$R_4(T,i) = 1 - \int_{3T}^{4T} f_4(t,i,T)\,dt$
$F_4(T,i) = 1 - R_4(T,i)$
$\mathrm{Ci4} = \mathrm{Cio} + \mathrm{mi}\cdot n,\quad \mathrm{Cr4} = \mathrm{Cro} + \mathrm{mr}\cdot n$
$L_4(T,i) = \mathrm{Ti} + T + \mathrm{Tr}\cdot F_4(T,i)$
$U_4(T,i) = T\cdot R_4(T,i) + \int_{3T}^{4T} (t - 3T)\,f_4(t,i,T)\,dt$
$D_4(T,i) = L_4(T,i) - U_4(T,i)$
$\mathrm{Cost4}(T,i) = \mathrm{Ci4} + \mathrm{Cr4}\cdot F_4(T,i) + p\cdot\mathrm{Cp}\cdot D_4(T,i)$

Cost rate function

$\mathrm{Cost}(T,i) = \mathrm{Cost1}(T,i) + \mathrm{Cost2}(T,i) + \mathrm{Cost3}(T,i) + \mathrm{Cost4}(T,i)$
$L(T,i) = L_1(T,i) + L_2(T,i) + L_3(T,i) + L_4(T,i)$
$y_4(T,i) = \frac{\mathrm{Cost}(T,i)}{L(T,i)}$

Fifth cycle

$n = 5$

Weibull conditional PDF (Power Law Model)

$f_5(t,i,T) = \frac{\beta_i}{\alpha_i}\left(\frac{t}{\alpha_i}\right)^{\beta_i-1} \exp\left[\left(\frac{4T}{\alpha_i}\right)^{\beta_i} - \left(\frac{t}{\alpha_i}\right)^{\beta_i}\right]$
$R_5(T,i) = 1 - \int_{4T}^{5T} f_5(t,i,T)\,dt$
$F_5(T,i) = 1 - R_5(T,i)$
$\mathrm{Ci5} = \mathrm{Cio} + \mathrm{mi}\cdot n,\quad \mathrm{Cr5} = \mathrm{Cro} + \mathrm{mr}\cdot n$
$L_5(T,i) = \mathrm{Ti} + T + \mathrm{Tr}\cdot F_5(T,i)$
$U_5(T,i) = T\cdot R_5(T,i) + \int_{4T}^{5T} (t - 4T)\,f_5(t,i,T)\,dt$
$D_5(T,i) = L_5(T,i) - U_5(T,i)$
$\mathrm{Cost5}(T,i) = \mathrm{Ci5} + \mathrm{Cr5}\cdot F_5(T,i) + p\cdot\mathrm{Cp}\cdot D_5(T,i)$

Cost rate function

$\mathrm{Cost}(T,i) = \mathrm{Cost1}(T,i) + \mathrm{Cost2}(T,i) + \mathrm{Cost3}(T,i) + \mathrm{Cost4}(T,i) + \mathrm{Cost5}(T,i)$
$L(T,i) = L_1(T,i) + L_2(T,i) + L_3(T,i) + L_4(T,i) + L_5(T,i)$
$y_5(T,i) = \frac{\mathrm{Cost}(T,i)}{L(T,i)}$

NOTES: The procedure is similar for subsequent test cycles. Optimizations were done by using Mathcad commands MAXIMIZE and MINIMIZE.

Bibliography

1. L. CALDAROLA, "Unavailability and Failure intensity of Components", Nuclear Engineering and Design J., 44, p. 147, 1977.
2. W. E. VESELY et al., "FRANTIC II: A Computer Code for Time Dependent Unavailability Analysis", Washington: U.S. Nuclear Regulatory Commission, NUREG/CR 1924, 1981.
3. S. H. SIM, "Unavailability Analysis of Periodically Tested Components of Dormant Systems", IEEE Transactions on Reliability, R-34, 1, p. 88, 1985.
4. E. DIALYNAS and D. MICHOS, "Time Dependent Unavailability Analysis of Standby Units Including Arbitrary Failure Distributions and Three Inspection/Maintenance Policies", Reliability Engineering and System Safety, 39, p. 35, 1993.
5. T. HILSMEIER, T. ALDEMIR, and W. E. VESELY, "Time-dependent Unavailability of Aging Standby Components Based on Nuclear Plant Data", Reliability Engineering and System Safety, 47, p. 199, 1995.
6. K. ADACHI and T. NISHIDA, "An Optimum Inspection Policy for a One-unit System with Preventive Maintenance", Journal of the Operations Research Society of Japan, 26, 2, p. 105, 1983.
7. J. VAURIO, "Availability and Cost Functions for Periodically Inspected Preventively Maintained Units", Reliability Engineering and System Safety, 63, p. 133, 1999.
8. F. BADIA, M. BERRADE, C. CAMPOS, "Optimal Inspection and Preventive Maintenance of Units with Revealed and Unrevealed Failures", Reliability Engineering and System Safety, 78, p. 157, 2002.
9. S. MARTORELL et al., "Constrained Optimization of Test Intervals Using a Steady-state Genetic Algorithm", Reliability Engineering and System Safety, 67, p. 215, 2000.
10. C. LAPA, C. PEREIRA, P. FRUTUOSO, "Surveillance Test Policy Optimization Through Genetic Algorithms Using Non-periodic Intervention Frequencies and Considering Seasonal Constraints", Reliability Engineering and System Safety, 81, p. 103, 2003.
11. C. EBELING, "An Introduction to Reliability and Maintainability Engineering", New York, McGraw-Hill, 1997.
12. M. MODARRES, M. KAMINSKIY, and V. KRIVTSOV, "Reliability Engineering and Risk Analysis", New York, Marcel Dekker, 1999.
13. M. RAUSAND and A. HØYLAND, "System Reliability Theory", New Jersey, John Wiley & Sons, 2004.
14. H. ASCHER and H. FEINGOLD, "Repairable System Reliability: Modeling, Inference, Misconceptions and their Causes", New York, Marcel Dekker, 1984.
15. M. YANES, F. JOGLAR and M. MODARRES, "Generalized Renewal Process for Analysis of Repairable Systems with Limited Failure Experience", Reliability Engineering and System Safety, 77, p. 167, 2002.
16. M. KIJIMA and N. SUMITA, "A Useful Generalization of Renewal Theory: Counting Process Governed by Non-negative Markovian Increments", Journal of Applied Probability, 23, p. 71, 1986.
17. J. L. HURTADO, F. JOGLAR and M. MODARRES, "Generalized Renewal Process: Models, Parameter Estimation and Applications to Maintenance Problems", International Journal on Performability Engineering, Submitted for Publication (2005).
18. E. LEWIS, "Introduction to Reliability Engineering", New Jersey, John Wiley & Sons, 1996.
19. A. JARDINE, "Maintenance, Replacement and Reliability", New Jersey, John Wiley & Sons, 1973.
20. J. W. EVANS and J. Y. EVANS, "Product Integrity and Reliability in Design", London, Springer-Verlag, 2001.
21. J. VAURIO, "Common Cause Failure Probabilities in Standby Safety Systems Fault Tree Analysis with Testing-scheme and Timing Dependencies", Reliability Engineering and System Safety, 79, p. 43, 2003.