ABSTRACT 
 
 
Title of Thesis: SYSTEM LEVEL APPROACH FOR LIFE CONSUMPTION 
MONITORING OF ELECTRONICS 
  
 Sathyanarayan Ganesan, Master of Science, 2004 
  
Thesis Directed By: Professor Michael Pecht  
Department of Mechanical Engineering 
 
Life consumption monitoring involves monitoring the operating and environmental 
conditions of a product to predict its remaining life. This thesis presents a life consumption 
monitoring process that is applicable at the system level. Failure Modes, Mechanisms and 
Effects Analysis (FMMEA) is introduced as a new step in the life consumption 
monitoring process that systematically identifies potential failure mechanisms and 
models for all potential failure modes, and prioritizes the failure mechanisms to identify 
the high priority mechanisms. High priority mechanisms help in determining the right suite 
of product parameters that need to be monitored for determining the damage and life 
consumed. A case study describing the FMMEA process for a simple electronic circuit 
board assembly is presented. A new methodology for remaining life prediction is 
introduced into the life consumption monitoring process and is validated using data from a previous 
life consumption monitoring case study. A discussion of the uncertainty and accuracy of 
prediction is also presented. 
 
 
 
SYSTEM LEVEL APPROACH FOR LIFE CONSUMPTION 
MONITORING OF ELECTRONICS  
 
By 
Sathyanarayan Ganesan 
 
 
 
Thesis submitted to the Faculty of the Graduate School of the  
University of Maryland, College Park, in partial fulfillment 
of the requirements for the degree of 
Master of Science 
2004 
 
 
 
 
 
 
Advisory Committee: 
Professor Michael Pecht, Chair 
Associate Professor Peter Sandborn  
Associate Professor Bongtae Han 
 
 
 
 
 
 
 
 
 
 
© Copyright by 
Sathyanarayan Ganesan 
2004 
 
 
ACKNOWLEDGEMENTS 
 
First of all, I am grateful to Dr. Michael Pecht and Dr. Diganta Das for giving me the 
opportunity to work in CALCE as a research assistant and undertake this work. They 
have been advisors to me in more ways than just academically. Without their guidance 
this work wouldn't have been possible. Next, I would like to thank my thesis committee 
for appreciating and acknowledging my graduate research work. My thanks also extend 
to Dr. Michael Osterman, Dr. Peter Rodgers, Dr. Valerie Everoy, Dr. Michael Azarian, 
Dr. Sanka Ganesan and Dan Donahoe for their inputs and suggestions to the thesis. 
I am greatly thankful to all my colleagues at CALCE (everyone@calce.umd.edu) for their 
help and support. My special thanks to Dr. Miky Lee, Keith Rogers, Raju Shah, 
Satchidananda Mishra, Sanjay Tiku, Paul Casey, Subramaniam Rajagopal, Yuki Fukuda, 
Dr. Ji Wu, Yu-Chul Hwang and Nikhil Vichare, Rajeev Mishra, Anupam Choubey, Sony 
Mathew, Kaushik Ghosh, Manash Dash, Karambu Nathan, Raj Bahadur, Niranjan 
Vijayaragavan, Vidyasagar Shetty, Anshul Shrivastava, Sudhir Kumar and Vimal 
Mayank for their good company. Last but not least, my regards and gratitude go to 
my parents for their constant support and motivation. 
 
 
TABLE OF CONTENTS 
 
ABSTRACT 
CHAPTER 1: INTRODUCTION 
CHAPTER 2: IMPROVED LIFE CONSUMPTION MONITORING APPROACH 
CHAPTER 3: FAILURE MODES, MECHANISMS AND EFFECTS ANALYSIS METHODOLOGY 
3.1 FAILURE MODES, MECHANISMS AND EFFECTS ANALYSIS METHODOLOGY 
3.1.1 System definition, elements and functions 
3.1.2 Potential failure modes 
3.1.3 Potential failure causes 
3.1.4 Potential failure mechanisms 
3.1.5 Failure models 
3.1.6 Life cycle environment and operating conditions 
3.1.7 Failure mechanism prioritization 
3.1.8 Documentation 
3.2 CASE STUDY 
3.3 BENEFITS 
CHAPTER 4: PROGNOSTICS AND REMAINING LIFE ESTIMATION 
4.1.1 Leap-Frog Technique 
4.1.2 Accuracy of Remaining Life Prediction 
CHAPTER 5: CONCLUSIONS 
CHAPTER 6: REFERENCES 
 
LIST OF FIGURES 
 
Figure 1: Improved life consumption monitoring methodology 
Figure 2: FMEA worksheet [13] 
Figure 3: FMMEA methodology 
Figure 4: Failure mechanism prioritization 
Figure 5: Elements in the circuit card system 
Figure 6: Choosing the right data window 
Figure 7: Leap-Frog Algorithm 
Figure 8: Comparison of remaining life estimation models - Gradual 
Figure 9: Comparison of remaining life estimation models - Gradual & Sudden 
Figure 10: Accuracy of prediction 
Figure 11: Error and uncertainty in predictions 
 
 
LIST OF TABLES 
 
Table 1: Occurrence ratings 
Table 2: Severity ratings 
Table 3: Risk Matrix 
Table 4: FMMEA worksheet for the Case Study 
Chapter 1: INTRODUCTION 
Health monitoring is a method of assessing the degradation of a product's health 
(reliability) in its life cycle environment by continuous or periodic monitoring and 
interpretation of the parameters indicative of its health. Based on the product's health, 
determined from the monitored actual life cycle conditions, procedures can be developed 
to maintain the product [28] [32]. Health monitoring therefore permits new products to be 
concurrently designed for a life cycle environment known through monitoring. Product 
health monitoring can be implemented through the use of various techniques to sense and 
interpret the parameters indicative of: 
1. Performance degradation (e.g., deviation of operating parameters from their expected 
values); 
2. Physical or electrical degradation (e.g., cracks, corrosion, delamination, increase in 
electrical resistance or threshold voltage); 
3. Changes in life cycle environment (e.g., usage duration and frequency, ambient 
temperature, vibration, shock, humidity). 
Health monitoring systems are typically categorized as 
diagnostic, prognostic, or life consumption monitoring (LCM) systems.  
Diagnostic systems monitor the current operating state of health to identify potential 
causes of failure in order to restore the system. These systems are widely used across 
different industries for fault identification purposes. An example of a diagnostic system is 
the use of piezoelectric sensors, which detect and analyze the ultrasonic acoustic signals 
traveling through machinery to report fault or wearout conditions [29]. 
Prognostic systems monitor the faults or precursors to failure, and predict the time or 
number of operational cycles to failure induced by a monitored fault. Examples of 
prognostic systems include Self-Monitoring Analysis and Reporting Technology 
(SMART) employed in computer hard drives [30]. 
LCM is a method of monitoring parameters indicative of a system's life cycle health 
and converting the measured data into life consumed [32]. The LCM process involves 
continuous or periodic measuring, sensing, recording, and interpretation of product 
parameters to quantify the amount of product degradation. LCM systems have been 
introduced in the automotive industry for monitoring engine oil [31], the 
degradation of which depends upon time, temperature, and contamination related to 
engine usage. Such LCM systems incorporate physics-based mechanisms and predictive 
models that estimate the remaining life of the oil based on the monitored engine usage.  
The model algorithms are programmed into the engine control modules to inform the 
driver of the oil life status.  
Failures in electronic products are often attributable to various combinations, 
intensities, and durations of environmental loads, such as temperature, humidity, 
vibration, and radiation. For many of the failure mechanisms in electronic products, there 
are models that relate environmental loads to the time to failure of the product. Thus, by 
monitoring the environment of the product over its life cycle, it may be possible to 
determine the amount of damage induced by various loads and predict when the product 
might fail [32]. 
Based on knowledge of the product failure mechanisms, appropriate life 
consumption monitoring systems and prognostics strategies can be developed. This thesis 
discusses the application of FMMEA in support of the implementation of such strategies. 
The FMMEA methodology presented here has the potential to contribute to effective 
life consumption monitoring of electronics. Ideally, products should be designed such 
that the threshold of damage required to cause failure is not reached within the usage 
life of the product. To achieve that, knowledge of the usage environment, the failure modes, 
the failure mechanisms and their impact on the design is necessary. To evaluate product 
reliability and to design for reliability, all relevant failure mechanisms must be 
considered. Determining the set of relevant failure mechanisms can become a 
large undertaking for most electronic systems. FMMEA can be used to accomplish this task 
in a systematic manner. 
Chapter 2: IMPROVED LIFE CONSUMPTION MONITORING APPROACH 
Ramakrishnan [26] proposed a physics-of-failure based methodology for determining 
the damage or life consumed in a product. That work showed how to make the environmental data 
compatible with physics-of-failure models to estimate the amount of accumulated 
damage. Mishra [27] extended the approach to include estimation of the remaining life of 
a product from the collected environmental parameters. 
System level life consumption monitoring requires systematically identifying all 
the parameters that can drive failure in a system and monitoring the select few 
that drive most of the failures. FMMEA is introduced as a new step in the LCM process that 
systematically identifies all failure mechanisms and models for all potential failure 
modes, and prioritizes the failure mechanisms to identify the high priority mechanisms.  
Ideally all failure mechanisms and their interactions must be considered for product 
design and analysis. In the life cycle of a product, several failure mechanisms may be 
activated by different environmental and operational parameters acting at various stress 
levels, but only a few operational and environmental parameters and failure mechanisms 
are in general responsible for the majority of the failures. High priority mechanisms 
are those select failure mechanisms that 
determine the operational stresses and the environmental and operational parameters that 
must be accounted for in the design or be controlled; focusing on them makes effective 
use of resources. This enables selection of the right suite of 
product parameters to be monitored for determining the damage and life 
consumed. The monitored parameters are then simplified to reduce memory requirements 
and to be compatible with the failure models associated with the critical failure 
mechanisms. The simplified data are used for performing stress and damage accumulation 
analysis, and the accumulated damage is subsequently used for predicting the remaining life.  
The remaining life estimation algorithms used in earlier approaches to the life 
consumption monitoring process were overly simplistic and took into account only the 
previous data point during each iteration. A new prognostic algorithm for remaining life 
prediction that incorporates all data points is demonstrated, and it is 
compared with other remaining life prediction models using data from a previous case 
study. 
The new LCM methodology has five steps to estimate the remaining life of an 
electronic product, as shown in Figure 1. These steps are FMMEA, monitoring of product 
parameters, data simplification and processing, stress and damage accumulation analysis, 
and remaining life estimation.  
[Figure 1 flowchart. Boxes: Conduct FMMEA; Monitor product parameters; Conduct data simplification and processing; Perform stress and damage accumulation analysis; Estimate the remaining life of the product. Decision: "Remedial action?" with Yes/No branches to "Schedule maintenance or replace product" and "Continue monitoring".]
 
Figure 1: Improved life consumption monitoring methodology 
Chapter 3: FAILURE MODES, MECHANISMS AND EFFECTS ANALYSIS 
METHODOLOGY 
The competitive marketplace and the need to reduce product life cycle cost are 
driving product developers and manufacturers to look for economical ways to improve the 
product development process. Increased demands on companies for high quality, reliable 
products and the increasing capabilities and functionality of many products are making it 
difficult for manufacturers to maintain quality and reliability. Industry has been 
interested in a systematic approach that gives a better understanding of potential 
failures and how they might affect product performance. Some organizations are either 
using or requiring the use of Failure Mode and Effects Analysis (FMEA) towards 
achieving that goal. Failure is the loss of the ability of a product to perform its required 
function [1]. FMEA is a systematic procedure to evaluate potential failures, identify the 
effects of failures, and determine actions that could eliminate or reduce the chance of 
the potential failure occurring [10]. 
FMEA was developed as a formal methodology in the 1950s at Grumman Aircraft 
Corporation, where it was used to analyze the safety of flight control systems for naval 
aircraft. From the 1970s through the 1990s, various military and professional society 
standards and procedures were written to define the FMEA methodology [7] [8] [13] to 
meet the needs of various industry sectors. In 1971, the Electronic Industries Association 
(EIA) G-41 committee on reliability published "Failure Mode and Effects Analysis". In 
1974, the US Department of Defense published Mil-Std 1629 "Procedures for Performing 
a Failure Mode, Effects and Criticality Analysis", which through several revisions became 
the basic approach for analyzing systems. In 1985, the International Electrotechnical 
Commission (IEC) introduced IEC 812 "Analysis Techniques for System Reliability - 
Procedure for Failure Modes and Effects Analysis". In the late 1980s the automotive 
industry adopted the FMEA practice. In 1993, the Supplier Quality Requirements Task 
Force, composed of representatives from Chrysler, Ford and GM, introduced FMEA into 
the quality manuals through the QS 9000 process. In 1994, the Society of Automotive 
Engineers (SAE) published the SAE J-1739 "Potential Failure Modes and Effects Analysis in 
Design and Potential Failure Modes and Effects Analysis in Manufacturing and 
Assembly Processes" reference manual, which provided general guidelines for preparing an 
FMEA. In 1999, Daimler Chrysler, Ford and GM, as part of the International Automotive 
Task Force, agreed to recognize the new international standard "ISO/TS 16949", which 
included FMEA and would eventually replace QS 9000 in 2006. 
FMEAs are used across many industries and are often referred to by types such as 
System FMEA, Design FMEA (DFMEA), Process FMEA (PFMEA), Machinery FMEA 
(MFMEA), Functional FMEA, Interface FMEA and Detailed FMEA. Although the 
purpose, terminology and details can vary according to type and industry, the principal 
objectives of FMEAs are to anticipate the most important problems early in the 
development process and either prevent the problems or minimize their consequences. 
FMEA can be applied at any point in the product life cycle, from design to end-of-life, 
and provides a formal and systematic approach for product and process development.  
FMEA was initially limited to the analysis of the effects of the failure modes for 
safety analysis. Failure Modes, Effects and Criticality Analysis (FMECA) was considered 
an extension of FMEA that included assessing the probability of occurrence and 
criticality of potential failure modes. Today, the distinctions between the two have 
become less well defined and the terms FMEA and FMECA are used interchangeably [6] 
[7]. FMEA is also one of the Six Sigma tools [9] and is utilized by some Six Sigma 
organizations in some form or other. 
The FMEA methodology is based on a hierarchical approach to determine how 
possible failure modes affect the system [7]. The basic procedure is to: 
1. Identify elements or functions in the system 
2. Identify all element or function failure modes 
3. Determine the effect(s) of each failure mode and its severity 
4. Determine the cause(s) of each failure mode and its probability of occurrence 
5. Identify the current controls in place to prevent or detect the potential failure modes 
6. Assess risk, prioritize failures and assign corrective actions to eliminate or mitigate 
the risk 
7. Document the process 
FMEA involves inputs from a cross-functional team having the ability to analyze the 
whole product life cycle [15]. To achieve the greatest value, FMEA should be conducted 
before a failure mode has been unknowingly built into the product, when design 
changes are easier and less expensive [4]. A typical design FMEA worksheet is shown in 
Figure 2. For risk assessment, an FMEA uses occurrence and detection probabilities in 
conjunction with severity criteria to develop a risk priority number (RPN). RPN is the 
product of severity, occurrence and detection. After the RPNs are evaluated, they are 
prioritized and corrective actions are taken to mitigate the risk. Once the corrective 
actions are implemented, the severity, occurrence and detection values are reassessed, 
and a new RPN is calculated. This process continues until the risk level is acceptable. 
Thus, FMEA is reviewed and updated periodically. 
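The RPN calculation and ranking described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the failure mode names, the 1 to 10 rating scale and the example ratings are hypothetical and are not taken from the worksheet in this thesis.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    """One FMEA worksheet row (names and ratings are hypothetical)."""
    name: str
    severity: int    # assumed 1-10 scale
    occurrence: int  # assumed 1-10 scale
    detection: int   # assumed 1-10 scale

    @property
    def rpn(self) -> int:
        # RPN is the product of severity, occurrence and detection.
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes):
    """Return failure modes sorted from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


modes = [
    FailureMode("connector intermittent", severity=6, occurrence=3, detection=7),
    FailureMode("capacitor short", severity=9, occurrence=2, detection=3),
    FailureMode("solder joint open", severity=8, occurrence=4, detection=5),
]

for mode in rank_by_rpn(modes):
    print(f"{mode.name}: RPN = {mode.rpn}")
```

After corrective actions are implemented, the same function could be rerun with the reassessed ratings to obtain the new RPNs.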
[Figure 2 worksheet layout]
Title: Potential Failure Mode and Effects Analysis (Design FMEA)
Header fields: System; Subsystem; Component; Design Lead; Core Team; FMEA Number; Prepared By; FMEA Date; Key Date; Revision Date; Page __ of __
Columns: Item / Function | Potential Failure Mode(s) | Potential Effect(s) of Failure | Sev | Potential Cause(s) of Failure | Prob | Current Design Controls | Det | RPN | Recommended Action(s) | Responsibility & Target Completion Date
Action Results columns: Actions Taken | New Sev | New Occ | New Det | New RPN
 
Figure 2: FMEA worksheet [13] 
Neither FMEA nor FMECA identifies the failure mechanisms and models in the analysis 
and reporting process. Failure mechanisms are the processes by which specific 
combinations of physical, electrical, chemical and mechanical stresses induce failure [1]. 
In order to understand and prevent failures, they must be identified with respect to the 
predominant stresses (mechanical, thermal, electrical, chemical, radiation) that cause 
them. Knowledge about the causes and consequences of these mechanisms helps in 
several design and development steps. These include virtual qualification, accelerated 
testing, root cause analysis and life consumption monitoring, which are essential for 
developing reliable products in an economical manner. Besides these benefits, 
understanding the failure mechanisms also helps to identify the acceptable level of 
"defects" and variability in manufacturing and material parameters and to specify 
appropriate ratings for the products. 
Because of its lack of utilization of failure mechanism information, FMEA cannot 
provide meaningful input to procedures such as virtual qualification, root cause analysis 
and accelerated test programs. In FMEA, all failure modes are considered individually 
and the combined effect of the failure modes is not taken into consideration. Also, FMEA 
is based on precipitation and detection of failure, and it is not designed to be applied in 
cases that involve continuous monitoring of performance degradation over time, such as 
life consumption monitoring and prognostics. 
FMEA does not use environmental and operating conditions at a quantitative level. 
At best, they are used to eliminate certain failure modes from consideration. For 
failure prioritization in FMEA, a qualitative scale is transformed into a quantitative scale 
for evaluating the RPN. Potential failure modes having higher RPNs are assumed to 
represent a higher risk than those having lower numbers. In the transformation from a 
qualitative to a quantitative scale, all three indices (severity, occurrence and detection) have 
the same metric and are treated as equally important [9]. However, because the RPN is a 
product, a small change in one of the factors can have very different effects on the RPN 
depending on the values of the other two. Hence, some 
implementations of FMEA provide a false sense of granularity between the different 
failure modes when none exists. For example, if detection and occurrence both have a 
rating of 10, a 1 point difference in the severity ranking results in a 100 point difference 
in the RPN; at the other extreme, if detection and occurrence both have a rating of 1, the 
same 1 point difference gives only a 1 point difference in the RPN [11].  
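The arithmetic behind this example can be checked with a short Python sketch. The function names and the baseline severity of 5 are assumptions made only for illustration; any baseline gives the same differences because the RPN is linear in severity.

```python
def rpn(severity, occurrence, detection):
    """Risk priority number as the product of the three ratings."""
    return severity * occurrence * detection


def rpn_change_per_severity_point(occurrence, detection, severity=5):
    """Change in RPN when severity rises by one point (baseline is hypothetical)."""
    return rpn(severity + 1, occurrence, detection) - rpn(severity, occurrence, detection)


# Occurrence and detection both rated 10: one severity point shifts the RPN by 100.
high = rpn_change_per_severity_point(occurrence=10, detection=10)

# Occurrence and detection both rated 1: the same severity point shifts the RPN by 1.
low = rpn_change_per_severity_point(occurrence=1, detection=1)

print(high, low)
```

The same one point change in a qualitative rating therefore moves the RPN by very different amounts depending on the other two ratings.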
There is a need to prioritize failures using the predominant stresses from the 
environmental and operating conditions from which they arise and evaluate them 
quantitatively to make the prioritization process more scientific. Use of failure 
mechanisms and models is a step in that direction. The task of determining the failure 
mechanisms can become a large undertaking for most products and systems. FMMEA 
can be used to achieve this task in a systematic manner. 
3.1 Failure modes, mechanisms and effects analysis methodology 
FMMEA is a systematic approach to identify failure mechanisms and models for all 
potential failure modes, and prioritize them to identify high priority failure mechanisms. 
High priority failure mechanisms determine the operational stresses and the 
environmental and operational parameters that need to be accounted for in the design or 
be controlled. 
FMMEA is based on understanding the relationships between product requirements 
and the physical characteristics of the product (and their variation in the production 
process), the interactions of product materials with loads (stresses at application 
conditions) and their influence on product failure susceptibility with respect to the use 
conditions. This involves finding the failure mechanisms and the reliability models to 
quantitatively evaluate failure susceptibility. 
The FMMEA process merges the systematic nature of the FMEA template with the 
"design for reliability" philosophy and knowledge. In addition to the information 
gathered and used for FMEA, FMMEA uses application conditions and the duration of 
the intended application with knowledge of active stresses and potential failure 
mechanisms. The potential failure mechanisms are considered individually, and their 
assessment using appropriate models enables design and qualification of the product for the 
intended application. 
The steps in conducting a FMMEA are illustrated in Figure 3. The individual steps 
are described in greater detail in the following sections. 
3.1.1 System definition, elements and functions 
The FMMEA process begins by defining the system to be analyzed. A system is a 
composite of subsystems or levels that are integrated to achieve a specific objective. The 
system is divided into various subsystems or levels, and the division continues down to the 
lowest possible level, which is a "component" or "element".  
Based on convenience or the needs of the team conducting the analysis, the system 
breakdown can be either by function (i.e., according to what the system elements "do"), 
or by location (i.e., according to where the system elements "are"), or both (i.e., 
functional within the location based, or vice versa). For example, in an automobile 
system, a functional breakdown would involve the cooling system, braking system, and 
propulsion system. A location breakdown would involve the engine compartment, passenger 
compartment and dashboard or control panel. In a printed circuit board system, a location 
breakdown would include the package, plated through hole (PTH), metallization, and the 
board itself.  
For each component or element all the associated functions are listed. For example 
the primary function of a solder joint is to interconnect two materials. Hence, the failure 
of a solder joint will relate to its inability to perform as a physical and electrical 
interconnect.  
[Figure 3 flowchart. Boxes: Define system and identify elements and functions to be analyzed; Identify potential failure modes; Identify life cycle environmental and operating conditions; Identify potential failure causes; Identify potential failure mechanisms; Identify failure models; Prioritize failure mechanisms; Document the process.]
Figure 3: FMMEA methodology 
3.1.2 Potential failure modes 
A failure mode is defined as the way in which a component, subsystem, or system 
could fail to meet or deliver the intended function [10]. For example, in a solder joint the 
potential failure modes are an open or an intermittent change in resistance, either of which 
can hamper its functioning as an interconnect.  
A potential failure mode may be the cause of a potential failure mode in a higher level 
subsystem, or system, or be the effect of one in a lower level component. For all the 
elements that have been identified, all possible failure modes for each given element are 
listed. In cases where information on possible failure modes that may occur is not 
available, potential failure modes may be identified using numerical stress analysis, 
accelerated tests to failure (e.g., HALT), past experience and engineering judgment [12]. 
3.1.3 Potential failure causes 
A failure cause is defined as the circumstances during design, manufacture, or use 
that lead to a failure mode [12]. For each failure mode, all possible ways a failure can 
result are listed. Failure causes are identified by finding the basic reason that may lead to 
a failure during design, manufacturing, storage, transportation or use condition. 
Knowledge of potential failure causes can help identify the failure mechanisms driving 
the failure modes for a given element. For example, in an automotive underhood 
environment the solder joint failure modes (open and intermittent change in resistance) can 
potentially be caused by temperature cycling, random vibration and shock impact. 
3.1.4 Potential failure mechanisms 
Failure mechanisms are the processes by which specific combinations of physical, 
electrical, chemical and mechanical stresses induce failure [1]. Failure mechanisms are 
determined based on the combination of potential failure mode and cause of failure [5] and 
the selection of appropriate available mechanisms corresponding to the failure mode and 
cause. Studies on electronic material failure mechanisms, and on the application of 
physics-based damage models to the design of reliable electronic products, covering all relevant 
wearout and overstress failures in electronics, are available in the literature [2] [3]. 
Failure mechanisms thus identified are categorized as either overstress or wearout 
mechanisms. Overstress failures arise as a result of a single load 
(stress) condition. Wearout failures, on the other hand, arise as a 
result of cumulative load (stress) conditions [12]. For example, in the case of a solder joint, 
the potential failure mechanisms driving the opens and shorts caused by temperature, 
vibration and shock impact are fatigue and overstress shock. Further analysis of the 
failure mechanisms depends on the type of mechanism. 
3.1.5 Failure models 
Failure models use stress and damage analysis to evaluate susceptibility to failure. 
Failure susceptibility is evaluated by assessing the time-to-failure or likelihood of a 
failure for a given geometry, material construction, and environmental and operational 
condition. For example, in the case of solder joint fatigue, the Dasgupta [22] and Coffin-Manson 
[20] failure models are used for stress and damage analysis for temperature cycling. 
Failure models of overstress mechanisms use stress analysis to estimate the likelihood 
of a failure based on a single exposure to a defined stress condition. The simplest 
formulation for an overstress model is the comparison of an induced stress versus the 
strength of the material that must sustain that stress. Wearout mechanisms are analyzed 
using both stress and damage analysis to calculate the time required to induce failure 
based on a defined stress condition. In the case of wearout failures, damage is 
accumulated over a period until the item is no longer able to withstand the applied load. 
Therefore, an appropriate method for combining multiple conditions must be determined 
for assessing the time to failure. Sometimes, the damage due to the individual loading 
conditions may be analyzed separately, and the failure assessment results may be 
combined in a cumulative manner [13]. 
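The distinction between the two model types can be illustrated with a minimal Python sketch. It assumes a linear (Palmgren-Miner) damage sum for combining loading conditions, which is one common choice and is not named in the text above; the function names and the cycle counts are hypothetical.

```python
def overstress_fails(induced_stress, strength):
    """Simplest overstress model: failure if induced stress exceeds strength."""
    return induced_stress > strength


def accumulated_damage(loads):
    """Linear damage sum over (applied_cycles, cycles_to_failure) pairs.

    Assumes the Palmgren-Miner rule: each loading condition consumes the
    fraction applied_cycles / cycles_to_failure of life, and failure is
    predicted when the sum reaches 1.
    """
    return sum(applied / to_failure for applied, to_failure in loads)


# Hypothetical loading history: two conditions with different lives.
loads = [(2500, 10000), (1000, 2000)]
damage = accumulated_damage(loads)

print(f"accumulated damage = {damage:.2f}")
print("wearout failure predicted:", damage >= 1.0)
print("overstress failure predicted:", overstress_fails(induced_stress=80.0, strength=120.0))
```

In this sketch the cycles to failure for each condition would come from a failure model evaluated at that condition's stress level, and the damage fractions are simply added.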
The use of failure models may be limited by the availability and accuracy of models for 
quantifying the time to failure of the system. It may also be limited by the ability to 
combine the results of multiple failure models for a single failure site and the ability to 
combine results of the same model for multiple stress conditions [12]. If no failure 
models are available, the appropriate parameter(s) to monitor can be selected based on an 
empirical model developed from prior field failure data or models derived from 
accelerated testing. 
3.1.6 Life cycle environment and operating conditions 
Life cycle loads include environmental conditions such as temperature, humidity, 
pressure, vibration or shock, chemical environments, radiation, contaminants, and loads 
due to operating conditions, such as current, voltage, and power [1]. The life cycle 
environment of a product consists of assembly, storage, handling, and usage conditions of 
the product, including the severity and duration of these conditions. Information on life 
cycle conditions can be used for eliminating failure modes that may not occur under the
given application conditions. 
In the absence of field data, information on the product usage conditions can be 
obtained from environmental handbooks or data monitored in similar environments. 
Ideally, such data should be obtained and processed during actual application. Recorded 
data from the life cycle stages for the same or similar products can serve as input towards 
the FMMEA process. Some organizations collect, record, and publish data in the form of 
handbooks that provide guidelines for designers and engineers developing products for 
market sectors of their interest. Such handbooks can provide first approximations for 
environmental conditions that a product is expected to undergo during operation. These 
handbooks typically provide an aggregate value of environmental variables and do not 
cover all the life cycle conditions. For example, for an automotive application, the life cycle
environment and operating conditions can be obtained from the SAE handbook [23], but the
application conditions even in the SAE handbook are limited.
3.1.7 Failure mechanism prioritization  
Ideally all failure mechanisms and their interactions must be considered for product 
design and analysis. In the life cycle of a product, several failure mechanisms may be 
activated by different environmental and operational parameters acting at various stress 
levels, but only a few operational and environmental parameters and failure mechanisms 
are in general responsible for the majority of the failures. High priority mechanisms are 
those select failure mechanisms that determine the operational stresses and the 
environmental and operational parameters that must be accounted for in the design or be 
controlled. High priority failure mechanisms are identified through prioritization of all 
the potential failure mechanisms. The methodology for failure mechanism prioritization 
is shown in Figure 4. 
[Figure 4 (flowchart): potential failure mechanism list; first level prioritization; evaluate failure susceptibility; evaluate occurrence; evaluate severity; second level prioritization; high risk, medium risk, low risk]
Figure 4: Failure mechanism prioritization 
Environmental and operating conditions are used for first level prioritization of all 
potential failure mechanisms. If the stress levels generated by certain operational and 
environmental conditions are non-existent or negligible, the failure mechanisms that are 
exclusively dependent on those environmental and operating conditions are assigned a 
"low" risk level and are eliminated from further consideration.
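The first level screening can be sketched as a simple filter. In this sketch each mechanism is described by the set of loads that drive it, and a mechanism is eliminated only when every one of those loads is absent or negligible. The data structure, function name, and example entries are assumptions made for illustration.

```python
def first_level_prioritization(mechanisms, negligible_loads):
    """Split mechanisms into those kept and those eliminated as low risk.

    mechanisms: dict mapping mechanism name -> set of loads that drive it.
    negligible_loads: set of loads that are absent or negligible in the
    life cycle environment.
    A mechanism is eliminated only if all loads it depends on are negligible.
    """
    kept, eliminated = [], []
    for name, loads in mechanisms.items():
        if loads and loads <= negligible_loads:
            eliminated.append(name)
        else:
            kept.append(name)
    return kept, eliminated


# Illustrative entries; the load names are hypothetical labels.
mechanisms = {
    "solder joint fatigue": {"temperature cycling", "random vibration"},
    "EOS/ESD": {"high voltage discharge"},
    "EMI": {"high current or magnetic source"},
}
negligible = {"high voltage discharge", "high current or magnetic source"}
kept, eliminated = first_level_prioritization(mechanisms, negligible)
print(kept)
print(eliminated)
```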
For all the failure mechanisms remaining after the first level prioritization, the 
susceptibility to failure by those mechanisms is evaluated using the previously identified 
failure models when such models are available. For the overstress mechanisms, failure 
susceptibility is evaluated by conducting a stress analysis to determine if failure is 
precipitated under the given environmental and operating conditions. For the wearout 
mechanisms, failure susceptibility is evaluated by determining the time-to-failure under 
the given environmental and operating conditions. To determine the combined effect of 
all wearout failures, the overall time-to-failure is also evaluated with all wearout 
mechanisms acting simultaneously. In cases where no failure models are available, the 
evaluation is based on past experience, manufacturer data, or handbooks. 
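One way to picture the combined evaluation is to assume that each wearout mechanism consumes life at a constant rate of 1/TTF and that these rates add linearly. This is an assumption made for illustration only; the text does not specify how the overall time-to-failure is computed, and the function name is hypothetical.

```python
def combined_time_to_failure(ttfs):
    """Overall TTF when the damage rates (1 / TTF) of all mechanisms add linearly."""
    if any(t <= 0 for t in ttfs):
        raise ValueError("time-to-failure values must be positive")
    return 1.0 / sum(1.0 / t for t in ttfs)


# Two hypothetical wearout mechanisms with TTFs of 200 and 50 days.
print(round(combined_time_to_failure([200.0, 50.0]), 1))  # 40.0
```

Under this assumption the overall time-to-failure is always shorter than the shortest individual time-to-failure, which is why it is a useful benchmark for the individual mechanisms.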
After evaluation of failure susceptibility, occurrence ratings under environmental and 
operating conditions applicable to the system are assigned to the failure mechanisms. For 
the overstress failure mechanisms that precipitate failure, the highest occurrence rating,
"frequent," is assigned. In case no overstress failures are precipitated, the lowest
occurrence rating, "extremely unlikely," is assigned. For the wearout failure mechanisms,
the ratings are assigned based on benchmarking the individual time-to-failure for a given 
wearout mechanism, with overall time-to-failure, expected product life, past experience 
and engineering judgment. The occurrence ratings shown in Table 1 are defined below. 
A "frequent" occurrence rating applies to failure mechanisms with very low time-to-failure
(TTF) and to overstress failures that are almost inevitable in the use condition. A
"reasonably probable" rating applies to failure mechanisms with low TTF. An "occasional"
rating applies to failures with moderate TTF. A "remote" rating applies to failure
mechanisms that have a high TTF. An "extremely unlikely" rating is assigned to
failures with very high TTF or to overstress failure mechanisms that do not produce any
failure.
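The rating rules above can be sketched as a small function. The overstress branch follows the text directly. The wearout branch compares TTF with the expected product life using numeric cutoffs that are hypothetical placeholders; in practice the benchmarking also draws on the overall time-to-failure, past experience and engineering judgment.

```python
def occurrence_rating(mechanism_type, fails=None, ttf=None, product_life=None,
                      cutoffs=(0.5, 1.0, 2.0, 5.0)):
    """Assign an occurrence rating to a failure mechanism.

    Overstress: "frequent" if the stress analysis precipitates failure,
    otherwise "extremely unlikely".
    Wearout: the ratio TTF / product_life is compared with `cutoffs`,
    which are hypothetical values standing in for engineering judgment.
    """
    if mechanism_type == "overstress":
        return "frequent" if fails else "extremely unlikely"
    ratio = ttf / product_life
    labels = ("frequent", "reasonably probable", "occasional", "remote")
    for cutoff, label in zip(cutoffs, labels):
        if ratio < cutoff:
            return label
    return "extremely unlikely"


print(occurrence_rating("overstress", fails=False))
print(occurrence_rating("wearout", ttf=3.0, product_life=1.0))
```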
To provide a qualitative measure of the failure effect, each failure mechanism is 
assigned a severity rating. The failure effect is assessed first at the level being analyzed, 
then the next higher level, the subsystem level, and so on to the system level. Safety 
issues and impact of a failure mechanism on the end system are used as the primary
criteria for assigning the severity ratings. In the severity rating, the worst possible
consequence is assumed for the failure mechanism being analyzed. Past experience and
engineering judgment may also be used in assigning severity ratings. The severity ratings 
shown in Table 2 are defined below.  
A "very high or catastrophic" severity rating involves a failure mode that may involve
loss of life or complete failure of the system. A "high" severity rating may involve a
failure mode that might cause a severe injury or a loss of function of the system. A
"moderate or significant" rating involves failure modes that may cause minor injury or
gradual degradation in performance over time through loss of availability. A "low or
minor" rating may involve a failure mode that may cause no injury or may result in the
system operating at reduced performance. A "very low or none" rating involves a failure
mode that does not cause any injury and has no impact on the system or, at most, is a
minor nuisance.
Second level prioritization involves prioritizing the failure mechanisms into three risk 
levels using the risk matrix shown in Table 3. In principle, all failure mechanisms with a
"high risk" level are high priority mechanisms that need to be accounted for and
controlled. Mechanisms having lower risk levels can also be classified as high priority, or
further prioritization within a given risk level may be done, depending on the product type,
use condition, and the needs and objectives of the organization.
Table 1: Occurrence ratings
Rating | Criteria
Frequent | Overstress failure or very low TTF
Reasonably probable | Low TTF
Occasional | Moderate TTF
Remote | High TTF
Extremely unlikely | No overstress failure or very high TTF
Table 2: Severity ratings
Rating | Criteria
Very high or catastrophic | System failure or safety-related catastrophic failures
High | Loss of function
Moderate or significant | Gradual performance degradation
Low or minor | System operable at reduced performance
Very low or none | Minor nuisance
Table 3: Risk Matrix
(rows: severity; columns: occurrence)
Severity | Frequent | Reasonably probable | Occasional | Remote | Extremely unlikely
Very high or catastrophic | High | High | High | Moderate | Moderate
High | High | High | Moderate | Moderate | Low
Moderate or significant | High | Moderate | Moderate | Low | Low
Low or minor | High | Moderate | Low | Low | Low
Very low or none | Moderate | Moderate | Low | Low | Low
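The risk matrix of Table 3 lends itself to a direct lookup. The sketch below encodes the matrix as a dictionary keyed by severity, with one risk level per occurrence rating. The shortened severity keys and the function name are choices made here for illustration.

```python
OCCURRENCE = ["frequent", "reasonably probable", "occasional", "remote",
              "extremely unlikely"]

# One list per severity rating, ordered as in OCCURRENCE (values from Table 3).
RISK_MATRIX = {
    "very high": ["high", "high", "high", "moderate", "moderate"],
    "high":      ["high", "high", "moderate", "moderate", "low"],
    "moderate":  ["high", "moderate", "moderate", "low", "low"],
    "low":       ["high", "moderate", "low", "low", "low"],
    "very low":  ["moderate", "moderate", "low", "low", "low"],
}


def risk_level(severity, occurrence):
    """Second level prioritization: look up the risk level in the matrix."""
    return RISK_MATRIX[severity][OCCURRENCE.index(occurrence)]


print(risk_level("very high", "frequent"))
print(risk_level("very low", "remote"))
```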
3.1.8 Documentation 
The documentation of the FMMEA process facilitates data organization, distribution, 
and analysis. For products already developed and manufactured, root-cause analysis is 
conducted for the identified high priority mechanisms and corrective actions are taken to
mitigate the risk. Once the corrective actions are implemented, the failure prioritization 
may be conducted again to reassess the risks. This process continues until the risk level is 
acceptable. The history and lessons learned contained within the documentation provide a 
framework for future product introductions. 
3.2 Case study 
A printed circuit board (PCB) assembly used in an automotive application was used to
demonstrate the FMMEA process. The system was an FR-4 PCB with copper
metallizations, plated through-holes (PTHs) and eight surface mount inductors soldered to
the pads using 63Sn-37Pb solder. The ends of the inductors were connected to the PTHs
through the PCB metallization. The PTHs were solder filled and an event detector circuit
was connected in series with all the inductors through the PTHs.
The PCB assembly was mounted at all four corners in the engine compartment of a 
1997 Toyota 4Runner. Mountings were not considered as failure locations. System 
failure was defined as one that would result in breakdown, or no current passage in the 
event detector circuit. The system was broken down by location into six different 
elements: surface mount inductor, pad, PTH, PCB, metallization and solder interconnect 
as shown in Figure 5.  
[Figure 5 (diagram) labels: PCB, inductor, metallization, PTH, interconnect, pad, mounting]
Figure 5: Elements in the circuit card system 
For all the elements listed, the corresponding functions and the potential failure 
modes were identified. The function of all the elements was to maintain electrical 
continuity. For the PCB, besides electrical continuity, an additional function included 
providing mechanical support to the system. Table 4 shows the physical location of all 
possible failure modes for the listed elements. For example, for the solder joint the 
potential failure modes are open and intermittent change in resistance.  
For the sake of simplicity and demonstration purposes, it was assumed that the test setup,
the board and its components were defect free. Also, stresses induced on the board and its
components from manufacturing, storage, handling and transportation were assumed to be
negligible. Potential failure causes were identified for the failure modes and the listing is 
shown in Table 4. For example, for the solder joint the potential failure causes for open 
and intermittent change in resistance are temperature cycling, random vibration or sudden 
shock impact caused by vehicle collision. 
Based on the potential failure causes that were assigned for the failure modes, the 
corresponding potential failure mechanisms were identified. Table 4 lists the failure 
mechanisms for the failure causes that were identified. For example, for the open and 
intermittent change in resistance in solder joint, the mechanisms driving the failure were 
solder joint fatigue and shock.  
For each of the failure mechanisms listed, the appropriate failure model was
identified from the literature. Information about product dimensions and geometry was
obtained from the design specification, board layout drawing and component manufacturer
data sheets. Table 4 shows all the failure models for the failure mechanisms that were
listed. For example, in the case of solder joint fatigue, the Coffin-Manson [20] failure model was
used for stress and damage analysis for temperature cycling.
The assembly was powered by a three volt battery source independent of the
automobile electrical system. There were no high current, voltage, magnetic or radiation
sources in the area. For the temperature, vibration and humidity conditions prevalent in
the automotive underhood environment, data were obtained from the Society of Automotive
Engineers (SAE) environmental handbook [23], as no manufacturer field data were
available for the automotive underhood environment for the Washington, DC area. The
maximum temperature in the automotive underhood environment was 121°C [23]. The car
was assumed to operate on average three hours per day in two equal trips in the
Washington, DC area. Random vibration effects were assumed and the maximum shock level
was assumed to be 45G for 3ms. The maximum relative humidity in the underhood
environment was 98% at 38°C [23]. The average daily maximum and minimum
temperatures in the Washington, DC area [24] for the period the study was conducted
were 127°C and 16°C respectively.
After all potential failure modes, causes, mechanisms and models were identified for 
each element, the first level prioritization was made. The first level prioritization was 
made based on the life cycle environmental and operating conditions. In the automotive
underhood environment for the given test setup, failures driven by electrical overstress
(EOS) and electrostatic discharge (ESD) were ruled out because of the absence of active
devices, the low voltage source of the batteries and the relatively large thickness of the PCB.
Electromagnetic interference (EMI) was also not expected because, besides the low
voltage and current from the batteries used to power the test setup, there were no high
current, voltage, or magnetic sources in the test area. Hence EOS, ESD and EMI were
assigned a "low" risk level.
The time to failure for the wearout failure mechanisms was calculated using
calcePWA [footnote 1]. Occurrence ratings were assigned based on benchmarking the time-to-failure
for a given wearout mechanism with the overall time-to-failure with all wearout
mechanisms acting together. For the inductors there was no failure model available, and
the occurrence rating was assigned based on failure rate data of inductors obtained from
the Telcordia handbook [25]. Since no model was available for wearout associated with the
pads, it was arbitrarily assigned a "remote" occurrence rating.
An assessment of shock as an overstress mechanism, with a shock level of 45G for 3ms
using calcePWA, produced no failure for the interconnects and the board; hence it was
assigned an "extremely unlikely" occurrence rating. Since no overstress shock failure was
expected on the board and the interconnects, it was assumed there would also be no
failure on the pads. Hence overstress shock failure on pads (for which no model was
available) was also assigned an "extremely unlikely" rating. The glass transition temperature
for the board was 150°C. Since the maximum temperature in the underhood environment
was only 121°C [23], no glass transition was expected to occur and it was assigned an
"extremely unlikely" rating.
A short or open PTH would not have had any impact on the functioning of the circuit, as
the PTHs were used only as terminations for the inductors. Hence, it was assigned a "very low"
severity rating. For all other elements, any given failure mode of the element would have
led to disruption in the functioning of the circuit. Hence, all other elements were assigned
a "very high" severity rating.
                                                 
Footnote 1: A physics-of-failure based virtual reliability assessment tool developed by the CALCE Electronic Products and Systems Center.
Second level prioritization and risk assessment for the failure mechanisms are shown in
Table 4. The entire process was documented in a single worksheet, as shown in Table 4.
Out of all the failure mechanisms that were analyzed, fatigue due to thermal cycling and
fatigue due to vibration at the solder joint interconnect were the only failure mechanisms
that had a high risk. Being high risk failure mechanisms, they were identified as high priority.
An FMEA on the assembly would have identified all the elements, their functions, 
potential failure modes and failure causes as in FMMEA. FMEA would then have 
identified the effect of failure. For example, in the case of a solder joint interconnect, the 
failure effect of the open joint would have involved no current passage in the test set up. 
Next the FMEA would have identified the severity, occurrence and detection 
probabilities associated with each failure mode. For example, in case of a solder joint 
open failure mode, based on past experience and use of engineering judgment each of the 
metrics, severity, occurrence and detection would have received a rating on a scale of ten. 
The product of severity, occurrence and detection would then have been used to calculate 
RPN. The RPNs for other failure modes would have been calculated in a similar manner 
and then all the failure modes would have been prioritized based on the RPN values. This 
is unlike FMMEA, which used failure mechanisms and models and the combined effect
of all failure mechanisms to quantitatively evaluate the occurrence, and then, in conjunction
with severity, assigned a risk level to each failure mechanism for prioritization.
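For comparison, the RPN ranking used in FMEA can be sketched in a few lines. The ratings below are hypothetical, and the scale of ten is assumed to run from 1 to 10. Note that the first two failure modes receive the same RPN even though their individual ratings differ.

```python
def rpn(severity, occurrence, detection):
    """Risk priority number: product of the three FMEA ratings."""
    for value in (severity, occurrence, detection):
        if not 1 <= value <= 10:
            raise ValueError("ratings are expected on a scale of 1 to 10")
    return severity * occurrence * detection


# Hypothetical (severity, occurrence, detection) ratings per failure mode.
failure_modes = {
    "mode A": (8, 3, 5),
    "mode B": (4, 6, 5),
    "mode C": (9, 2, 2),
}
ranked = sorted(failure_modes, key=lambda m: rpn(*failure_modes[m]), reverse=True)
for mode in ranked:
    print(mode, rpn(*failure_modes[mode]))
```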
3.3 Benefits 
FMMEA allows the design team to take into account the available scientific
knowledge of failure mechanisms and merge it with the systematic features of the
FMEA template, in keeping with the "design for reliability" philosophy. The
part of the FMEA that is incorporated in the FMMEA aids in being systematic in the
identification process so that all the elements are considered and nothing gets overlooked.
The idea of prioritization embedded in the FMEA process is also utilized in FMMEA to 
identify the mechanisms that are likely to cause failures during the product life cycle. 
FMMEA differs from FMEA in a few respects. In FMEA, potential failure modes are 
examined individually and the combined effects of coexisting failure causes are not
considered. FMMEA, on the other hand, considers the impact of failure mechanisms
acting simultaneously. FMEA involves precipitation and detection of failure for updating 
and calculating the RPN, and cannot be applied in cases that involve a continuous 
monitoring of performance degradation over time. FMMEA on the contrary does not 
require the failure to be precipitated and detected, and the uncertainties associated with 
the detection estimation are not present. Use of environmental and operating conditions is 
not made at a quantitative level in FMEA. At best they are used to eliminate certain 
failure modes. FMMEA prioritizes the failure mechanisms using the information on 
stress levels of environmental and operating conditions to identify high priority 
mechanisms that must be accounted for in the design or be controlled. This prioritization 
in FMMEA overcomes the shortcomings of the RPN prioritization used in FMEA, which
provides a false sense of granularity. Thus the use of FMMEA provides more
quantitative information regarding product reliability and opportunities for improvement
than FMEA, as it takes specific failure mechanisms and the stress levels of
environmental and operating conditions into account in the analysis process.
There are several benefits to organizations that use FMMEA. It provides specific
information on stress conditions so that the acceptance and qualification tests yield
usable results. Use of the failure models at the development stage of a product also
allows for appropriate "what-if" analysis on proposed technology upgrades. FMMEA
can also be used to aid several design and development steps considered to be the best 
practices, which can only be performed or enhanced by the utilization of the knowledge 
of failure mechanisms and models. These include virtual qualification, accelerated 
testing, root cause analysis, life consumption monitoring and prognostics. All the 
technological and economic benefits provided by these practices are realized better 
through the adoption of FMMEA. 
FMMEA enhances the value of FMEA by identifying and evaluating the relevant
failure mechanisms and models using stress levels of environmental and operating
conditions, and provides a high return on investment by providing knowledge about the
possible failures and their causes in a quantifiable manner. While FMEA and FMECA
are often implemented as a standard requirement or contractual obligation, FMMEA 
makes the process useful by incorporating the scientific knowledge regarding the failure 
mechanisms and models.
Table 4: FMMEA worksheet for the Case Study
Element | Potential failure mode | Potential failure cause | Potential failure mechanism | Mechanism type | Failure model | Failure susceptibility | Occurrence | Severity | Risk
PTH | Electrical open in PTH | Temperature cycling | Fatigue | Wearout | CALCE PTH barrel thermal fatigue [16] | > 10 years | Remote | Very low | Low
Metallization | Electrical short/open, change in resistance in the metallization traces | High temperature | Electromigration | Wearout | Black [18] | > 10 years | Remote | Very high | Moderate
Metallization | Electrical short/open, change in resistance in the metallization traces | High relative humidity; ionic contamination | Corrosion | Wearout | Howard [19] | > 10 years | Remote | Very high | Moderate
Component (inductors) | Short/open between windings and the core | High temperature | Wearout of winding insulation | Wearout | No model | | Remote* | Very high | Moderate
Interconnect | Open/intermittent change in electrical resistance | Temperature cycling | Fatigue | Wearout | Coffin-Manson [20] | 170 days | Frequent | Very high | High
Interconnect | Open/intermittent change in electrical resistance | Random vibration | Fatigue | Wearout | Steinberg [21] | 43 days | Frequent | Very high | High
Interconnect | Open/intermittent change in electrical resistance | Sudden impact | Shock | Overstress | Steinberg [21] | No failure | Extremely unlikely | Very high | Moderate
PCB | Electrical short between PTHs | High relative humidity | CFF | Wearout | Rudra and Pecht [17] | 4.6 years | Occasional | Very low | Low
PCB | Crack/fracture | Random vibration | Fatigue | Wearout | Basquin [21] | > 10 years | Remote | Very high | Moderate
PCB | Crack/fracture | Sudden impact | Shock | Overstress | Steinberg [21] | No failure | Extremely unlikely | Very high | Moderate
PCB | Loss of polymer strength | High temperature | Glass transition | Overstress | No model | No failure | Extremely unlikely | Very high | Moderate
PCB | Open | Discharge of high voltage through dielectric material | EOS/ESD | Overstress | No model | Eliminated in first level prioritization | | | Low
PCB | Excessive noise | Proximity to high current or magnetic source | EMI | Overstress | No model | Eliminated in first level prioritization | | | Low
Pad | Lift/crack | Temperature cycling / random vibration | Fatigue | Wearout | No model | | Remote | Very high | Moderate
Pad | Lift/crack | Sudden impact | Shock | Overstress | No model | | Extremely unlikely | Very high | Moderate
* Based on failure rate data of inductors in Telcordia [25]
Chapter 4: PROGNOSTICS AND REMAINING LIFE ESTIMATION 
Remaining life estimation provides an estimate of the remaining life based on the
accumulated damage of the product. It involves choosing the
right amount of historical data for prediction to maximize the system's adaptability to
rapid changes in degradation while maintaining an acceptable amount of predictive 
variability and uncertainty. Based on the remaining life information, the user can decide 
whether to keep the product in operation with continuous monitoring or to schedule a 
maintenance or replacement action. 
The first step in finding the right historical data is to use regression analysis with the
largest time window for data acquisition, which will give the best estimate of the
regression fit [33]. This prediction is tested to see if it is reasonably compatible with the 
most recent data points. If so, then the regression is used. If not, then the size of the 
regression window is reduced and the analysis is repeated. This recursive regression 
continues until it yields a small enough window that is compatible with the most recent 
data points. As a result, this method can detect if the most recent data points indicate a 
change from the long-term regression. 
Remaining life estimations are difficult to formulate, as their accuracy is subject to
stochastic processes. As a result of uncertainty, prognostics methods must consider the
interrelationships between accuracy, precision and confidence [32]. This leads to a paradox:
the more precise the remaining life estimate, the less probable it is that the estimate will be
correct. Finding where an extrapolated trend meets a condemnation threshold may
provide an expectation of remaining life, but it does not provide sufficient information to
make a decision.
The total useful life of a product at any point on the remaining life vs. time plot can
be estimated by adding the time in use (the x-coordinate) and the remaining life (the
y-coordinate) at that point. However, this analysis is a one-point estimation and does not
take into account the product usage trend. The product usage trend can be taken into
account by extrapolating a trend line using the available remaining life data points. The
intersection of the trend line with the time axis gives the total useful life of the product as 
shown in Figure 6.  
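The trend line estimate can be sketched with an ordinary least-squares fit. The remaining life points are fitted with a straight line, and the time at which the line reaches zero remaining life is taken as the total useful life. The data values and function names below are hypothetical, and a linear trend is assumed.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def total_useful_life(times, remaining):
    """Time at which the fitted trend line crosses zero remaining life."""
    slope, intercept = linear_fit(times, remaining)
    if slope >= 0:
        raise ValueError("remaining life is not decreasing; no intersection")
    return -intercept / slope


# Hypothetical remaining life estimates (days) at four times in use (days).
times = [0, 5, 10, 15]
remaining = [40, 35, 30, 25]
print(total_useful_life(times, remaining))  # 40.0
```

Passing only the last few points to `total_useful_life` corresponds to choosing a shorter data window for the trend line.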
Statistical methods for remaining life prediction include multivariate regression, 
Bayesian regression methods, time-series analysis and discrimination or clustering 
analysis. Analysis may focus on single or multiple parameters. For single parameter 
remaining life prediction the regression model is applied as the data is collected to 
determine the trends. This is compared in real time to a metric failure limit that is 
established offline. The point of predicted failure is calculated as the intersection of these 
two lines. If an unexpected event occurs that dramatically increases degradation, it is
identified and addressed.
The remaining life estimation is updated at the end of a pre-selected time period to 
take into account any sudden change in the life cycle environment or usage of the 
product. The amount of data included in the analysis affects the prediction. Use of large 
amounts of data spanning a long window of data acquisition tends to yield more stable, 
less variable predictions. However, it also may yield a prediction that is less sensitive to 
recent changes, as shown in Figure 6. Use of a smaller data set spanning the most recent
operating history tends to produce predictions with more variability that are also more
sensitive to current operating conditions. The goal while carrying out remaining life
prediction is to choose among varying sizes of data window to maximize the system's
adaptability to change while maintaining an acceptable amount of predictive uncertainty.
[Figure 6 (plot): remaining life (days) versus time in use (days), with three trend lines: based on all points (useful life: 40 days), based on four points (useful life: 29 days), and based on three points (useful life: 26 days)]
Figure 6: Choosing the right data window 
4.1.1 LEAP-Frog Technique
One of the methods for remaining life estimation is the LEAP-Frog technique [33]. In
this method the system is assumed to be in a steady state of health. The goal is to predict
the future health without being sensitive to data that is correlated and noisy. The
technique is also responsive to changes in system performance; hence changes in the system
health are detected and the predictions are adjusted accordingly.
[Figure 7 (flowchart): select the full data set; perform regression analysis; find the expected prediction for the next period; is the expected prediction compatible with the most recent data points? If no, shorten the data set and repeat the regression analysis; if yes, final prediction]
Figure 7: LEAP-Frog Algorithm
The prediction goal is to make a prediction at the current time for the value (and
uncertainty intervals) of the system at a future time, given all past data and a relatively
small set of models/time windows. The method begins with a regression analysis using
a large time window for data acquisition, which will likely give the best estimate of the
regression fit if the system has a constant rate of change of health (for example, steady with a
slow rate of degradation). This prediction and an uncertainty distribution about the
estimates are tested to see if the prediction is reasonably compatible with the most
recent data points. If so, then the regression is used. If not, then the size of the regression
window is reduced and the analysis is repeated. This method continues until it yields a
small enough window that is compatible with the most recent data points. A flowchart of
the LEAP-Frog algorithm is shown in Figure 7. As a result, this method can detect if the
most recent data points indicate a change from the long-term regression (as would be the
case if the system had a change in rate of degradation). In the end, the method uses the
longest regression window that does not result in evidence (based on the most recent
records) that refutes the assumption of a good linear fit, and then uses this window to
predict the remaining life.
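The window selection loop can be sketched as follows. The text does not define the compatibility test or how the window is shortened, so both are assumptions here: a line fitted to the older points of the window is extrapolated to the newest points, the window is accepted if every extrapolation error is within a fixed tolerance, and otherwise the oldest point is dropped. All names, defaults and data are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leap_frog_fit(times, values, n_recent=2, min_points=3, tolerance=0.5):
    """Return (slope, intercept, start) for the longest compatible window.

    For each candidate window, a line is fitted to the window without its
    n_recent newest points and extrapolated to those points. The window is
    accepted if every extrapolation error is within `tolerance` (assumed
    compatibility test). Otherwise the oldest point is dropped.
    """
    n = len(times)
    last_start = n - n_recent - min_points
    if last_start < 0:
        raise ValueError("not enough data points")
    for start in range(last_start + 1):
        older_t = times[start:n - n_recent]
        older_v = values[start:n - n_recent]
        slope, intercept = linear_fit(older_t, older_v)
        errors = [abs(values[i] - (slope * times[i] + intercept))
                  for i in range(n - n_recent, n)]
        if max(errors) <= tolerance:
            break
    # Refit on the whole accepted window (the shortest one if none passed).
    slope, intercept = linear_fit(times[start:], values[start:])
    return slope, intercept, start


# Hypothetical remaining life data: slow degradation, then a faster rate.
times = list(range(15))
values = [50 - t for t in range(11)] + [40 - 3 * (t - 10) for t in range(11, 15)]
slope, intercept, start = leap_frog_fit(times, values)
print(start, slope, -intercept / slope)  # window start, rate, predicted end of life
```

For data with a constant degradation rate the full window is accepted on the first pass, so the sketch reduces to an ordinary regression over all points.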
Figure 8 and Figure 9 show three prognostic methods compared with the LEAP-Frog 
method.  
[Figure 8 (plots): remaining life (days) versus days in use, and a bar chart of average prediction error (days) for the ALL (Reg), Last 5 (Reg), Last 3 (Reg) and LEAP-Frog methods]
Figure 8: Comparison of remaining life estimation models - Gradual
 
[Figure 9 (plots): remaining life versus days in use, and a bar chart of average prediction error (days) for the ALL (Reg), Last 5 (Reg), Last 3 (Reg) and LEAP-Frog methods]
Figure 9: Comparison of remaining life estimation models - Gradual & Sudden
The four methods used to predict future remaining life are: 
1. the regression of the damage on all the data since the start of data collection 
2. the regression of the damage on the last 3 data points 
3. the regression of the damage on the last 5 data points 
4. the LEAP-Frog method
The histograms show the average prediction errors for the regression methods under the
two degradation patterns. The degradation patterns were obtained from the case studies
conducted in previous life consumption monitoring studies [26] [27]. For both
conditions, shown in Figure 8 (steady degradation) and Figure 9 (slow steady degradation
with a sudden change to a fast degradation in between), the LEAP-Frog regression
method has the lowest average prediction error. This illustrates the ability of the
LEAP-Frog method to adapt to a rapidly changing situation.
4.1.2 Accuracy of Remaining Life Prediction 
The rate at which the estimate is updated affects the
accuracy of prediction. The algorithm or model used for estimation, and how it trends the
data into a future estimate of remaining life, also affects the accuracy of prediction. Accuracy
further depends on whether future planned operational usages are factored into the estimates. As
more data is collected, the level of knowledge about the characteristics of the 
approaching failure can be refined. As the time of actual failure approaches, the 
remaining life prediction is likely to become more and more accurate. 
[Figure 10 (plot): frequency distribution of the predicted time of failure along the time axis, with the actual time to fail marked; area A is labeled prognostic inefficiency and area B is labeled prognostic failure]
Figure 10: Accuracy of prediction 
Predictions in area (A) always precede failure. They lead to prognostic inefficiency,
since equipment is removed with useful life remaining, and have a financial cost
associated with them. On the other hand, predictions in area (B) always follow failure.
They lead to prognostic failure and have inadequate operational performance associated
with them.
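The two outcomes can be expressed as a simple comparison between the predicted and the actual time of failure. The sketch below also reports the useful life left unused by an early prediction. The function names, the "exact" label for the boundary case, and the numbers are hypothetical.

```python
def classify_prediction(predicted_failure_time, actual_failure_time):
    """Label a prediction by where it falls relative to the actual failure."""
    if predicted_failure_time < actual_failure_time:
        return "prognostic inefficiency"   # area A: removed with life remaining
    if predicted_failure_time > actual_failure_time:
        return "prognostic failure"        # area B: failure occurs first
    return "exact"


def unused_life(predicted_failure_time, actual_failure_time):
    """Useful life given up when the prediction precedes the actual failure."""
    return max(actual_failure_time - predicted_failure_time, 0.0)


# Hypothetical predictions (days) for a unit that actually fails at day 40.
for predicted in (35.0, 40.0, 46.0):
    print(predicted, classify_prediction(predicted, 40.0), unused_life(predicted, 40.0))
```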
Although some measure of remaining life estimation is possible using the historical
data, the best estimates will be achieved when the future planned operational usage can be
factored into these estimates. While the current remaining life estimation algorithms use
fairly simple metrics and features to measure and characterize the changes in the sensor 
data, an alternative solution is to use neural nets coupled with appropriate feature 
extractors.  
A prediction can be made for a short time horizon, or it can be an estimate of the time
until the part needs to be replaced or a failure will occur. The spread of the error bars
associated with the remaining life prediction indicates whether the remaining life
estimation models are good for short or longer time horizons. If the error bars spread
rapidly, the predictions are reliable only for the shorter time horizon. If they are
narrow and follow the true trajectory accurately, the information from the predictions is
useful for longer time horizons.
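The spreading of error bars with horizon can be illustrated for the simple case of a straight-line least-squares fit, where the standard error of a predicted value at time t0 grows with the distance of t0 from the mean of the observed times. The sketch below computes that standard error. The function name and the sample data are hypothetical, and a linear trend with independent, equal-variance residuals is assumed; the prognostic models discussed here need not satisfy those assumptions.

```python
import math


def prediction_std_error(times, values, t0):
    """Standard error of a new observation predicted at time t0 by a
    least-squares straight line fitted to (times, values)."""
    n = len(times)
    if n < 3:
        raise ValueError("need at least three points to estimate scatter")
    t_mean = sum(times) / n
    v_mean = sum(values) / n
    s_tt = sum((t - t_mean) ** 2 for t in times)
    if s_tt == 0:
        raise ValueError("time values must not all be equal")
    slope = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values)) / s_tt
    intercept = v_mean - slope * t_mean
    # Residual variance with n - 2 degrees of freedom
    sse = sum((v - (intercept + slope * t)) ** 2 for t, v in zip(times, values))
    s2 = sse / (n - 2)
    # The (t0 - t_mean)**2 term is what widens the error bars with horizon
    return math.sqrt(s2 * (1.0 + 1.0 / n + (t0 - t_mean) ** 2 / s_tt))


# Hypothetical damage readings with a little scatter
times = [0, 1, 2, 3, 4]
values = [0.00, 0.11, 0.19, 0.31, 0.40]

se_short = prediction_std_error(times, values, 5)
se_medium = prediction_std_error(times, values, 10)
se_long = prediction_std_error(times, values, 20)
```

How quickly the standard error grows between the short and long horizons gives a rough sense of how far ahead a fitted trend can be trusted.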
Approaches to remaining life prediction can be broadly classified into three
categories. The first are physical models that have been developed and validated with
large data sets. The second are systems that use rules of thumb, and the third are
statistical models that learn from historical data.
While the physical and rule-of-thumb based models are capable of anticipating fault
events that are yet to occur, learning systems based on statistical models are only as
good as the data on which they have been trained. However, learning systems can process a
wide variety of data types and have an edge over the other methods because they exploit
the nuances in the data. This is particularly true for new sources of data for which
expert analysis, physical models and rules have not yet been developed.
In practice, failure indications become more pronounced and easier to interpret as
remaining life decreases. In general, the true remaining life probability density
function (PDF) should become narrower (less uncertain) and more stable as the damage
condition progresses towards failure [34], as shown in Figure 11.
[Figure 11 graphic: successive failure distributions plotted against time as the current
time advances and additional data accumulate, with the actual failure and the expected
failure marked. Prediction error = actual failure - expected failure. Uncertainty =
spread of the failure distribution.]
Figure 11: Error and uncertainty in predictions 
Allowing higher order models provides a better fit to the observed data, but under
extrapolation even a small error in the coefficients can be magnified a great deal. Using
lower order models therefore helps prevent minor coefficient estimation errors from
exaggerating the extrapolated values.
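A short worked illustration of this magnification, under the assumption that the trend is a polynomial in time and that a single coefficient is off by a small amount: an error in the coefficient of t^k shifts the extrapolated value at time t by that error multiplied by t^k, so the same coefficient error does far more harm in a higher order term at a long horizon. The function name and the numbers below are illustrative only.

```python
def extrapolation_shift(coeff_error, order, t):
    """Change in a polynomial's value at time t caused by an error
    coeff_error in the coefficient of t**order."""
    return coeff_error * t ** order


# The same small coefficient error (0.001), evaluated at horizon t = 10,
# for the linear, quadratic and cubic terms
shifts = {order: extrapolation_shift(0.001, order, 10.0) for order in (1, 2, 3)}
```

In this example the shift grows by a factor of ten with each additional order, which is why a cubic trend fitted to noisy damage data can diverge from a linear one well before the predicted failure time.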
Chapter 5: CONCLUSIONS 
The improved LCM process extends and generalizes the LCM approach to the system
level and has several improvements over the earlier versions by Ramakrishnan and
Mishra [26] [27]. System level life consumption monitoring requires systematically
identifying all the parameters that drive failure in a system and then monitoring the
select few that matter most. FMMEA is introduced as a new step in the LCM process
that systematically identifies all failure mechanisms and models for all potential
failure modes, and prioritizes the failure mechanisms to identify the high priority
mechanisms.
Ideally, all failure mechanisms and their interactions must be considered for product
design and analysis. In the life cycle of a product, several failure mechanisms may be
activated by different environmental and operational parameters acting at various stress
levels, but only a few operational and environmental parameters and failure mechanisms
are in general responsible for the majority of the failures. High priority mechanisms
are those select failure mechanisms that determine the operational stresses and the
environmental and operational parameters that must be accounted for in the design or be
controlled. Focusing on them makes effective use of resources and enables selection of
the right suite of product parameters to be monitored for determining the damage and
life consumed.
The traditional remaining life estimation algorithm used in earlier approaches to the
life consumption monitoring process was overly simplistic and took into account only the
previous data point during iteration. A new algorithm for remaining life prediction,
called the Leap-Frog technique, has been incorporated. It takes past data into account
and is sensitive to changes in the health of the system, so it adapts quickly to rapid
changes in degradation. The superiority of the Leap-Frog method is validated by
comparing it with three other prognostic methods using data from a previous life
consumption monitoring case study. A discussion of the accuracy of remaining life
predictions and the uncertainty associated with the predictions has also been presented.
Chapter 6: REFERENCES 
[1] Hu, J., Barker, D., Dasgupta, A., and Arora, A., "Role of Failure-mechanism
Identification in Accelerated Testing," Journal of the IES, Vol. 36, No. 4,
pp. 39-45, July 1993.
[2] Dasgupta, A. and Pecht, M., "Material Failure Mechanisms and Damage Models,"
IEEE Transactions on Reliability, Vol. 40, No. 5, pp. 531-536, December 1991.
[3] JEDEC Publication JEP 122-B, "Failure Mechanisms and Models for
Semiconductor Devices," August 2003.
[4] JEDEC Publication JEP 131, "Process Failure Modes and Effects Analysis
(FMEA)," February 1998.
[5] JEDEC Publication JEP 148, "Reliability Qualification of Semiconductor Devices
Based on Physics of Failure Risk and Opportunity Assessment," April 2004.
[6] Bowles, J.B. and Bonnell, R.D., "Failure Modes, Effects and Criticality Analysis -
What Is It and How To Use It," Tutorial Notes Annual Reliability and
Maintainability Symposium, 1998.
[7] Bowles, J.B., "Fundamentals of Failure Modes and Effects Analysis," Tutorial
Notes Annual Reliability and Maintainability Symposium, 2003.
[8] Kara-Zaitri, C., Keller, A.Z., Fleming, P.V., "A Smart Failure Mode and Effect
Analysis Package," Annual Reliability and Maintainability Symposium
Proceedings, pp. 414-421, 1992.
[9] Signor, M.C., "The Failure-Analysis Matrix: a Kinder, Gentler Alternative to
FMEA for Information Systems," Annual Reliability and Maintainability
Symposium Proceedings, pp. 173-177, January 2002.
[10] SAE Standard SAE J1739, "Potential Failure Mode and Effects Analysis in Design
(Design FMEA) and Potential Failure Mode and Effects Analysis in Manufacturing
and Assembly Processes (Process FMEA) and Effects Analysis for Machinery
(Machinery FMEA)," August 2002.
[11] Bowles, J.B., "An Assessment of RPN Prioritization in a Failure Modes Effects and
Criticality Analysis," Proceedings of Annual Reliability and Maintainability
Symposium 2003, pp. 380-386, January 2003.
[12] IEEE Standard 1413.1-2002, IEEE Guide for Selecting and Using Reliability
Predictions Based on IEEE 1413, 2003.
[13] "Guidelines for Failure Mode and Effects Analysis for Automotive, Aerospace and
General Manufacturing Industries," Dyadem Press, Ontario, Canada, 2003.
[14] Failure Modes and Effects Analysis (FMEA): "A Guide for Continuous
Improvement for the Semiconductor Equipment Industry," Technology Transfer
#92020963B-ENG, SEMATECH, 1992.
[15] Franceschini, F., Galetto, M., "A New Approach for Evaluation of Risk Priorities of
Failure Modes in FMEA," International Journal of Production Research, Vol. 39,
No. 13, pp. 2991-3002, 2001.
[16] Bhandarkar, S.M., et al., "Influence of Selected Design Variables on
Thermomechanical Stress Distributions in Plated Through Hole Structures,"
Transactions of the ASME - Journal of Electronic Packaging, Vol. 114, pp. 8-13,
March 1992.
[17] Rudra, A.B., et al., "Electrochemical Migration in Multichip Modules," Circuit
World, Vol. 22, No. 1, pp. 67-70, 1995.
[18] Black, J.R., "Physics of Electromigration," IEEE Proceedings of International
Reliability Physics Symposium, pp. 142-149, 1983.
[19] Howard, R.T., "Electrochemical Model for Corrosion of Conductors on Ceramic
Substrates," IEEE Transactions on CHMT, Vol. 4, No. 4, pp. 520-525, December
1981.
[20] Foucher, B., Boullie, J., Meslet, B., Das, D., "A Review of Reliability Predictions
Methods for Electronic Devices," Microelectronics Reliability, Vol. 42, No. 8, pp.
1155-1162, August 2002.
[21] Steinberg, D.S., "Vibration Analysis for Electronic Equipment," 2nd Edition, John
Wiley & Sons, 1988.
[22] Dasgupta, A., Oyan, C., Barker, D. and Pecht, M., "Solder Creep-Fatigue Analysis
by an Energy-Partitioning Approach," ASME Transactions, Journal of Electronic
Packaging, Vol. 114, No. 2, pp. 152-160, 1992.
[23] Society of Automotive Engineers, Recommended Environmental Practices for
Electronic Equipment Design, SAE J1211, Rev. Nov 1978.
[24] Monthly Temperature Averages for the Washington, DC Area,
<http://www.weather.com/weather/climatology/monthly/USDC0001>, accessed
August 17, 2003.
[25] Telcordia Technologies, Special Report SR-332: "Reliability Prediction Procedure
for Electronic Equipment Issue 1," Telcordia Customer Service, Piscataway, N.J.,
May 2001.
[26] Ramakrishnan, A., "Health and Life Consumption Monitoring Using Sensor
Technologies," Master's Thesis, Department of Mechanical Engineering, University
of Maryland, College Park, 2001.
[27] Mishra, S., "Life Consumption Monitoring for Electronics," Master's Thesis,
Department of Mechanical Engineering, University of Maryland, College Park, 2003.
[28] Mishra, S., Pecht, M., and Goodman, D., "In-situ Sensors for Product Reliability
Monitoring," Proceedings of SPIE, Vol. 4755, pp. 10-19, 2002.
[29] Swantech, SWAN Technology, http://www.swantech.com/technology.html,
accessed May 2004.
[30] PC Guide, "Self-Monitoring Analysis and Reporting Technology (SMART),"
http://www.pcguide.com/ref/hdd/qual/featuresSMART-c.html, accessed May
2004.
[31] General Motors, "GM Oil Life Monitoring Systems,"
http://www.gm.com/company/gmabity/environent/news_issues/news/oillifemonitor041603.html,
accessed June 2004.
[32] Ramakrishnan, A. and Pecht, M., "Implementing a Life Consumption Monitoring
Process for Electronic Product," IEEE Components Packaging and Manufacturing
Technology, Vol. 26, Issue 3, pp. 625-634, 2003.
[33] Greitzer, F.L. and Ferryman, T.A., "Predicting Remaining Life of Mechanical
Systems," Intelligent Ship Symposium IV, April 2-3, 2001.
[34] Engel, S.J., et al., "Prognostics, The Real Issues Involved with Predicting Life
Remaining," IEEE Aerospace Conference Proceedings, Vol. 6, pp. 457-469, 2000.