ABSTRACT 
 
 
 
 
Title of Dissertation: PROJECT SCHEDULING DISPUTES: 
EXPERT CHARACTERIZATION AND 
ESTIMATE AGGREGATION 
  
 Lauren Elizabeth Neely, Doctor of Philosophy, 
2017 
  
Dissertation directed by: Dr. Gregory Baecher, Civil and Environmental 
Engineering 
 
 
Project schedule estimation continues to be a tricky endeavor.  Stakeholders bring a 
wealth of experience to each project, but also biases which could affect their final 
estimates.  This research proposes to study differences among stakeholders and 
develop a method to aggregate multiple estimates into a single estimate a project 
manager can defend.  Chapter 1 provides an overview of the problem.  Chapter 2 
summarizes the literature on historical scheduling issues, scheduling best practices, 
decision analysis, and expert aggregation.  Chapter 3 describes data 
collection/processing, while Chapter 4 provides the results.  Chapter 5 provides a 
discussion of the results, and Chapter 6 provides a summary and recommendation for 
future work.   
The research consists of two major parts.  The first part categorizes project 
stakeholders by three major demographics:  “position”, “years of experience”, and 
“level of formal education”.  Subjects were asked to answer several questions on risk 
aversion, project constraints, and general opinions on scheduling struggles.  Using 
Design of Experiments (DOE), responses were compared to the different 
demographics to determine whether or not certain attitudes concentrated themselves 
within certain demographics. Subjects were then asked to provide activity duration 
and confidence estimates across several projects, as well as opinions on the activity 
list itself.  DOE and Bernoulli trials were used to determine whether or not subjects 
within different demographics estimated differently from one another.  Correlation 
coefficients among various responses were then calculated to determine if certain 
attitudes affected activity duration estimates. 
The second part of this research dealt primarily with aggregation of opinions 
on activity durations.  The current methodology uses the Program Evaluation and 
Review Technique (PERT) to calculate the expected value and variance of an 
activity duration based on three inputs and assuming the unknown duration follows a 
Beta distribution.  This research proposes a methodology using Morris’ Bayesian 
belief-updating methods and unbounded distributions to aggregate multiple expert 
opinions.  Using the same three baseline estimates, this methodology combines 
multiple opinions into one expected value and variance which can then be used in a 
network schedule.  This aggregated value represents the combined knowledge of the 
project stakeholders, which helps mitigate biases ingrained in a single expert’s 
opinion. 
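As an illustration of the current PERT methodology described above, the following minimal sketch computes the expected value and variance of an activity duration from three inputs, using the standard PERT approximations for a Beta-distributed duration. The function names (`pert_estimate`, `path_estimate`) and the example durations are hypothetical and are not taken from this research; summing along a path additionally assumes the activities are independent.

```python
def pert_estimate(best_case, most_likely, worst_case):
    """Standard PERT approximations for one activity.

    Returns (expected value, variance) from the three duration inputs.
    """
    if not best_case <= most_likely <= worst_case:
        raise ValueError("expected best_case <= most_likely <= worst_case")
    expected = (best_case + 4 * most_likely + worst_case) / 6
    variance = ((worst_case - best_case) / 6) ** 2
    return expected, variance


def path_estimate(activities):
    """Sum expected values and variances along one network path.

    Assumes the activity durations are independent (illustrative only).
    """
    total_expected = 0.0
    total_variance = 0.0
    for best_case, most_likely, worst_case in activities:
        expected, variance = pert_estimate(best_case, most_likely, worst_case)
        total_expected += expected
        total_variance += variance
    return total_expected, total_variance


if __name__ == "__main__":
    # Hypothetical activity: best case 2 days, most likely 4, worst case 12.
    print(pert_estimate(2, 4, 12))
```

The weighting of the most likely value by four and the division of the range by six are the conventional PERT simplifications; they apply to a single set of three estimates and give no way to combine estimates from several experts.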
 
 
 
 
 
 
  
 
 
 
 
PROJECT SCHEDULING DISPUTES: EXPERT CHARACTERIZATION AND 
ESTIMATE AGGREGATION 
 
 
 
by 
 
 
Lauren Elizabeth Neely 
 
 
 
 
 
Dissertation submitted to the Faculty of the Graduate School of the  
University of Maryland, College Park, in partial fulfillment 
of the requirements for the degree of 
Doctor of Philosophy 
2017 
 
 
 
 
 
 
 
 
 
 
Advisory Committee: 
Professor Gregory Baecher, Chair 
Dr. Qingbin Cui 
Dr. Mohammad Modarres 
Dr. Allison Reilly 
Dr. Alaa Zeitoun 
 
 
 
 
 
 
 
  
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
© Copyright by 
Lauren Elizabeth Neely 
2017 
 
 
 
 
 
 
 
 
 
 
 
 
 
  
 
Preface 
The material in this research is based upon work supported by the National 
Aeronautics and Space Administration under Contract Number NNG10WA14C and 
Contract Number NNG16WA71C.   
 
Any opinions, findings, and conclusions or recommendations expressed in this 
material are those of the author and do not necessarily reflect the views of the 
National Aeronautics and Space Administration.
 
Throughout this work, to simplify the grammar, the “Decision Maker” is referred to 
as “she” and the “Expert” is referred to as “he”. 
  
  
 
Dedication 
This dissertation is dedicated to family.  To my parents, brother, sister-in-law, and all 
my extended family who stood by me and encouraged me throughout this endeavor.  I 
can’t thank you enough for helping me to keep striving towards my goal.  To my 
Wallops family, without whom this research would not have been possible.  The 
dedication of the men and women of Wallops has contributed to the success of 
countless missions and I’m forever grateful that they took time to help me succeed in 
this personal mission. 
  
 
Acknowledgements 
First and foremost, I thank God for clearing the obstacles I could not and allowing me 
to pursue this opportunity.  I’d also like to thank my advisor for his assistance over 
the past…well…never mind how long it’s been.  His recommendations and guidance 
were instrumental towards focusing my efforts and his suggestions helped shine a 
light on those efforts when I began to flounder into unknown territory.  I would also 
like to thank Steve Kremer, Nancy Olyha, and Lindsay Robertson for taking time out 
of their busy schedules to provide a review for this dissertation.   
  
 
Table of Contents 
 
Preface ........................................................................................................................... ii 
Dedication .................................................................................................................... iii 
Acknowledgements ...................................................................................................... iv 
Table of Contents .......................................................................................................... v 
List of Tables ............................................................................................................. viii 
List of Figures .............................................................................................................. ix 
List of Abbreviations .................................................................................................... x 
Chapter 1: Introduction ................................................................................................. 1 
1.1 The Problem with Scheduling ............................................................................. 1 
1.2 Goals and Objectives .......................................................................................... 2 
1.3 Potential Implications ......................................................................................... 3 
1.4 Background of Wallops Flight Facility............................................................... 4 
1.5 Background of Project Types.............................................................................. 6 
1.6 Research Summary ............................................................................................. 8 
Chapter 2: Literature Review ...................................................................................... 11 
2.1 Scheduling in NASA – GAO Reports .............................................................. 11 
2.1.1 Lack of Resources/Inadequate funding ...................................................... 12 
2.1.2 No overall plan (business case) .................................................................. 23 
2.1.3 Changes, Uncertainty, and the “Experts” .................................................. 33 
2.1.4 Concluding remarks ................................................................................... 43 
2.2 Scheduling Basics ............................................................................................. 44 
2.2.1 Developing the Schedule ........................................................................... 44 
2.2.2 Dealing with uncertainty: Stochastic estimates ......................................... 47 
2.2.3 Problems with PERT.................................................................................. 50 
2.2.4 Other Alternatives ...................................................................................... 58 
2.3 Decision Analysis and Expert Opinion ............................................................. 60 
2.3.1 Recognized Biases and Their Effects ........................................................ 61 
2.3.2 “Your Overconfidence Is Your Weakness” (Marquand 1983) .................. 66 
2.3.3 “Your Faith in Your Friends Is Yours” (Marquand 1983) ........................ 72 
2.3.4 Options for Overcoming Bias .................................................................... 74 
2.3.5 Loss and Risk Aversion ............................................................................. 77 
2.4 Experts as Data in a Bayesian Model ............................................................... 79 
Chapter 3: Methods and Materials, Data Collection ................................................... 87 
3.1 Data Collection ................................................................................................. 87 
3.1.1 Traits/Opinions Survey .............................................................................. 89 
3.1.2 Scheduling and Follow-on Surveys ........................................................... 90 
3.1.3 “Course of Action” (COA) Survey ............................................................ 91 
3.2 Data Processing ................................................................................................. 92 
3.2.1 Categorizing the Subjects .......................................................................... 92 
3.2.2 Risk Tolerance ........................................................................................... 94 
3.2.3 Constraint Preference ................................................................................. 97 
3.2.4 Schedule Survey Data .............................................................................. 101 
3.2.5 Follow-on Survey..................................................................................... 103 
3.3 Data Analysis – Characterization .................................................................... 103 
3.3.1 Constraints Analysis – by Constraint ....................................................... 104 
3.3.2 Network Path Standard Deviation ........................................................... 104 
3.3.3 Comparison Questions ............................................................................. 105 
3.3.4 Design of Experiments ............................................................................. 108 
3.3.5 Constraints Analysis/Risk Aversion – by Demographic ......................... 116 
3.3.6 Confidence Analysis ................................................................................ 117 
3.3.7 Correlating the Results ............................................................................. 118 
3.4 Data Analysis - Application ............................................................................ 121 
3.4.1 Participant Behavior in Estimating Durations ......................................... 121 
3.4.2 Calculation of PERT Beta parameters ..................................................... 124 
3.5 Duration Estimate Modeling and Expert Aggregation ................................... 125 
3.5.1 Determining the Prior .............................................................................. 125 
3.5.2 Calibrating the Experts ............................................................................ 131 
3.5.3 Calculating the Posterior Probability ....................................................... 137 
Chapter 4 Results – Opinions on Scheduling Issues ................................................. 145 
4.1 COA Survey – The Results ............................................................................. 145 
4.1.1 Why do projects struggle? – Agreements ................................................ 145 
4.1.2 Why do projects struggle? – Disagreements and Editorials .................... 151 
4.1.3 Summing Up ............................................................................................ 153 
4.2 Scheduling Surveys – Beyond the Duration Estimates ................................... 153 
4.2.1 Adequacy of Resources Assigned ............................................................ 154 
4.2.2 Activity Necessity .................................................................................... 155 
4.2.3 Activity List Completeness ...................................................................... 155 
4.2.4 Summarizing the Results ......................................................................... 158 
Chapter 5 Results – Priorities, Personalities, and Predictions .................................. 160 
5.1 “Course of Action” Survey: Is it really necessary? ........................................ 161 
5.2 Traits/Opinions Results ................................................................................... 163 
5.2.1 Constraints Analysis – by Constraint – The Results ................................ 164 
5.2.2 Constraints Analysis – by Demographic – The Results ........................... 167 
5.2.3 Utility/Risk Tolerance – The Results ....................................................... 168 
5.2.4 Confidence Analysis – The Results ......................................................... 176 
5.3 Scheduling Results .......................................................................................... 177 
5.3.1 Network Path Standard Deviation Results ............................................... 177 
5.3.2 Comparison Results ................................................................................. 178 
5.3.3 Correlation Results................................................................................... 182 
5.3.4 Data Collection Challenges ...................................................................... 184 
5.4 Predicting Te ................................................................................................... 185 
5.4.1 Worst-Case Estimate as Related to Most Likely ..................................... 186 
5.4.2 Expanding the Results – Te Assessment .................................................. 188 
5.4.3 Duration Estimate Skew .......................................................................... 189 
Chapter 6 Results – Aggregating the Estimates ........................................................ 191 
6.1 Determining the Prior ..................................................................................... 191 
6.2 Calibrating the Expert ..................................................................................... 197 
6.3 Calculating the Posterior ................................................................................. 199 
6.4 Further Examples ............................................................................................ 209 
Chapter 7:  Discussion .............................................................................................. 216 
7.1 Past is Present: GAO Reports vs. Current Results .......................................... 216 
7.1.1 External Influences .................................................................................. 217 
7.1.2  Internal Influences .................................................................................. 219 
7.2 Stakeholder Responses: What to Expect ......................................................... 222 
7.2.1: The Influence of Demographics ............................................................. 222 
7.2.2 Discrete vs. Continuous Confidence Assessments .................................. 228 
7.2.3 Risk Aversion........................................................................................... 231 
7.2.4 Risk Aversion as Applies to Scheduling .................................................. 233 
7.2.5 Summary .................................................................................................. 235 
7.3 Aggregating Estimates .................................................................................... 236 
7.3.1 The PERT (Beta) Prior ............................................................................. 236 
7.3.2 Bayesian Prior .......................................................................................... 239 
7.3.3 A New Prior Model .................................................................................. 242 
7.3.4 Calibrating the Experts ............................................................................ 247 
7.3.5 Posterior Distribution ............................................................................... 251 
Chapter 8:  Conclusions and Future Work ................................................................ 254 
8.1 Conclusions ..................................................................................................... 254 
8.1.1 Influence of Demographics ...................................................................... 254 
8.1.2 Aggregating Estimates ............................................................................. 259 
8.2 Future Work .................................................................................................... 261 
8.2.1: Participant Dependence .......................................................................... 261 
8.2.2 Research Expansion and Refinement ....................................................... 262 
8.2.3 Data for the Decision Maker .................................................................... 263 
8.2.4 Communication of Assumptions.............................................................. 264 
8.2.5 Dominating Outliers................................................................................. 265 
8.2.6 Confidence and Risk ................................................................................ 266 
8.2.7 Approximations and Direct Calculation .................................................. 266 
8.2.8 Filter settings ............................................................................................ 267 
Appendices ................................................................................................................ 271 
A.1 Recruitment E-mail ........................................................................................ 272 
A.2 Traits/Opinions Survey .................................................................................. 273 
A.3 Scheduling Survey ......................................................................................... 276 
A.4 Follow-On Survey .......................................................................................... 278 
A.5 “Course of Action” (COA) Survey ................................................................ 279 
A.6 Participant List ............................................................................................... 281 
A.7 Utility results .................................................................................................. 282 
A.8 AHP Results ................................................................................................... 283 
A.9 Scheduling Survey – Estimation Results and Calculations ........................... 286 
A.10 GEV Max Beta Filters .................................................................................. 317 
A.11 GEV Min Beta Filters .................................................................................. 318 
A.12 Normal Beta Filters ...................................................................................... 319 
A.13 DesignExpert™ Experiment Settings .......................................................... 320 
Bibliography ............................................................................................................. 323 
 
  
 
List of Tables 
 
Table 3-1: Demographic Identifiers ............................................................................ 93 
Table 3-2: Example Preference Matrix ....................................................................... 99 
Table 3-3: Example Matrix ....................................................................................... 100 
Table 3-4: Generalized AHP Matrix ......................................................................... 100 
Table 3-5: Comparison Questions ............................................................................ 106 
Table 3-6: Correlation Questions .............................................................................. 120 
Table 3-7: α and β Beta Filter Parameters ................................................................ 133 
Table 3-8: Beta Filter Modes .................................................................................... 134 
Table 3-9: Calculating the Aggregated Posterior Distribution ................................. 138 
Table 3-10: Example Full Process Calculations ....................................................... 139 
Table 5-1: Management COA Response .................................................................. 162 
Table 5-2: Technician COA Response ..................................................................... 162 
Table 5-3: Average weight per constraint ................................................................. 164 
Table 5-4: Statistical Significance of Weight Differences ....................................... 165 
Table 5-5: Significant Factors per Constraint ........................................................... 168 
Table 5-6: Expected weights per factor level ........................................................... 168 
Table 5-7: Expected Confidence Values ................................................................... 176 
Table 5-8: Binomial Analysis by Demographic ....................................................... 181 
Table 5-9: Correlation Results .................................................................................. 183 
Table 5-10: Correlation Conclusions ........................................................................ 184 
Table 5-11: Separation Weight Ratio ....................................................................... 187 
Table 5-12: Outlier Weight Significant Factors ........................................................ 187 
Table 5-13: Outlier Weight Ratio ............................................................................. 187 
Table 5-14: Skew Results ......................................................................................... 190 
Table 6-1: Prior Distributions ................................................................................... 192 
Table 6-2: Mean/Std Dev Comparisons .................................................................... 193 
Table 6-3: Calibration Examples .............................................................................. 198 
Table 6-4: Summary Example Estimates .................................................................. 200 
Table 6-5: Posterior Duration Results....................................................................... 201 
Table 6-6: GEV Max Example Prior Distribution .................................................... 210 
Table 6-7: GEV Min Example Prior Distribution..................................................... 211 
Table 6-8: DM and Expert Complete Agreement ..................................................... 212 
Table 6-9: DM and Expert Severe Disagreement ..................................................... 214 
Table 8-1: Relationship of α and β for the Beta Filter .............................................. 268 
Table A-1: DOE Experiment Set-up – Project Constraints ...................................... 320 
Table A-2: DOE Experiment Set-up –Risk Aversion ............................................... 321 
Table A-3: DOE Experiment Set-up – Confidence Analysis ................................... 321 
Table A-4: DOE Experiment Set-up – Duration Estimate Skew .............................. 322 
Table A-5: DOE Experiment Set-up – Outlying Estimate Analysis ......................... 322 
 
  
 
List of Figures 
 
Figure 2-1: NASA Project Life Cycle ........................................................................ 28 
Figure 3-1: Example Basic Utility Curves .................................................................. 96 
Figure 3-2: Example Risk Averse and Risk Prone Behavior ...................................... 96 
Figure 5-1: Utility Curve – “Position” Demographic ............................................... 171 
Figure 5-2: Utility Curve – “Years of Experience” Demographic ........................... 172 
Figure 5-3: Utility Curve – “Years of Experience” Demographic (continued) ........ 173 
Figure 5-4: Utility Curve – “Level of Formal Education” Demographic ................. 174 
Figure 5-5: Utility Curve – “Level of Formal Education” Demographic (continued) ... 175 
Figure 5-6: Standard Deviation of Te ........................................................................ 177 
Figure 6-1: Decision Maker – GEV and Beta Distribution Models ......................... 194 
Figure 6-2: Expert #1 – GEV and Beta Distribution Models ................................... 195 
Figure 6-3: Expert #2 – GEV and Beta Distribution Models ................................... 195 
Figure 6-4: Expert #3 – GEV and Beta Distribution Models ................................... 196 
Figure 6-5: Expert #1 - GEV Max Calibration Results ............................................ 198 
Figure 6-6: Expert #2 - GEV Min Calibration Results ............................................. 198 
Figure 6-7: Expert #3 - Normal Calibration Results ................................................. 199 
Figure 6-8: Decision Maker and Expert #1 ............................................................... 202 
Figure 6-9: Decision Maker and Expert #2 ............................................................... 203 
Figure 6-10: Decision Maker and Expert #3 ............................................................. 204 
Figure 6-11: Decision Maker, Expert #1, and Expert #2 .......................................... 205 
Figure 6-12: Decision Maker, Expert #1, and Expert #3 .......................................... 206 
Figure 6-13: Decision Maker, Expert #2, and Expert #3 .......................................... 207 
Figure 6-14: Decision Maker, Expert #1, Expert #2, and Expert #3 ........................ 208 
Figure 6-15: GEV Max Example Priors and Posterior ............................................. 210 
Figure 6-16: GEV Min Example Priors and Posterior .............................................. 211 
Figure 6-17: Posterior: Decision Maker and 9 Experts; Full Agreement – GEV Max 
Model ........................................................................................................................ 213 
Figure 6-18: Posterior: Decision Maker and 9 Experts; Full Agreement – GEV Min 
Model ........................................................................................................................ 213 
Figure 6-19: Posterior: Decision Maker and 9 Experts; Full Agreement – Normal 
Model ........................................................................................................................ 214 
Figure 6-20: Decision Maker and Expert #1 – Severe Disagreement ...................... 215 
Figure 8-1: Relationship of α and β for the Beta Filter ............................................. 268 
Figure 8-2: Relationship of Likelihood of Surprise and α for the Beta Filter .......... 269 
 
  
  
 
List of Abbreviations 
AHP Analytic Hierarchy Process 
AOA Activity on Arrow 
AON Activity on Node 
BC Best Case 
brlt basic reference lottery ticket 
CDF Cumulative Distribution Function 
CDR Critical Design Review 
CI Consistency Index 
COA Course of Action 
CPM Critical Path Method 
DM Decision Maker 
DOE Design of Experiments 
EFT Early Finish Time 
EMV Expected Monetary Value 
EST Early Start Time 
EVM Earned Value Management 
FA Formulation Agreement 
FAD Formulation Authorization Document 
GAO Government Accountability Office 
GEV Generalized Extreme Value 
IG Inspector General 
IRB Institutional Review Board 
ISS International Space Station 
JCL Joint Cost and Schedule Confidence Level 
KDP Key Decision Point 
LFT Late Finish Time 
LoE  Level of Formal Education 
LoS Likelihood of Surprise 
LST Late Start Time 
MDR Mission Design Review 
ML Most Likely 
NASA National Aeronautics and Space Administration 
NPR NASA Procedural Requirement 
OMB Office of Management and Budget 
PDF Probability Distribution Function 
PDR Preliminary Design Review 
PERT Program Evaluation and Review Technique 
PMBOK Project Management Body of Knowledge 
PRR Production Readiness Review 
SDR System Design Review 
SLS Space Launch System 
SRR Systems Requirements Review 
Te Total Network Path Duration 
WBS Work Breakdown Structure 
WC Worst Case 
WFF Wallops Flight Facility 
YoE Years of Experience 
 
  
 
Chapter 1: Introduction 
 
1.1 The Problem with Scheduling 
 The Guide to the Project Management Body of Knowledge (PMBOK) tells us 
that on any given project, several constraints must be managed to achieve project 
success (PMI 2013, para. 1.3).  The schedule constraint, if mismanaged, is one of the 
more immediate indicators of a problem in the project.  On a small scale, if a task 
does not finish on time, it can cause other tasks in the project to be late as well.  On a 
larger scale, when the entire project finishes late, stakeholders begin to question the 
capabilities of the project manager.  How then can a project manager give herself the 
best chance of success during the planning stages of the project?  The quick answer 
would be to find experts who know the most about the project and ask them for help 
in putting together the schedule (PMI 2013, para. 6.5.2.1).  Herein lies the problem: 
who exactly is the “expert?”  Is it the engineer/technician who does the work?  Is it 
the functional manager who has seen the work over the course of several years?   Is it 
the senior manager who has a better idea of the “bigger picture” across all projects?   
The people who actually do the work frequently claim that management does 
not allow enough time to complete a given project or task (Goldratt 1997, 40). 
Goldratt, on the other hand, seems to be of the opinion that most time estimates are 
padded and are larger than they actually need to be (Goldratt 1997, 118).  Further 
compounding the issue is the fact that managers and those who do the work (hereafter 
referred to as “technicians”, to include both engineers and technicians/operators) may 
have different views on what defines the success of any activity or project.  For 
example, a technician’s key concern may be technical accuracy which could also be 
interpreted as the project constraint “quality.”  A manager may be more concerned 
about the schedule and budget (e.g. it may not be up to full operating specs, but if it 
meets the requirements, anything further is unnecessary).  These different definitions 
of success could drive different time estimates.   
Experience can be another major factor in estimating differences (PMI 2013, 
para. 6.2.2.3).   A senior technician, for example, has seen the worst and will probably 
make estimates based on those experiences (Kahneman 2011, 236–37; Goldratt 1997, 
48).  Things do not always turn out badly, however, so when the activity is completed 
early, it will lead management to believe there was too much padding in the estimate, 
and they will question the next estimate that is provided (Goldratt 1997, 41).  Over 
time, this back and forth can create tension between management personnel and those 
they manage.  Given the considerations discussed above, how then, should a project 
manager use the schedule inputs provided by peers and project team members?  And 
if said project manager is questioned on her final schedule estimate, what basis can 
she use for backing up her decision?   
1.2 Goals and Objectives 
 The goals of this dissertation are broken down into two parts.  The first goal is 
to develop an understanding of differing perspectives of project stakeholders and  
how project stakeholders estimate differently from one another.  The second goal is to 
provide project managers a method to incorporate multiple opinions when developing 
inputs for a network schedule.   
Through work experience, the author noted that when project schedules were being 
developed, certain groups of stakeholders appeared to disagree 
about how long activities should take.  Based on this observation, the first 
objective of this research is to analyze the differences in stakeholder opinions about 
various project constraints and practices based on three major demographics: Position 
(manager vs technician), Years of Experience (YoE), and Level of Formal Education 
(LoE).  Using these same demographic categories, the next objective is to study how 
project stakeholders differed from one another when asked to provide duration 
estimates on project activities.   
Based on the results noted in the scheduling estimation study, the final 
objective is to develop a procedure to allow a project manager to use Bayesian 
methods to update her own beliefs about activity durations based on stakeholder 
estimates.  This updating model is not tied to the results of the first part of the study 
in that the updating method only considers the estimates provided by the decision 
maker and experts, without consideration of their demographic or scheduling trends. 
1.3 Potential Implications 
 Whether it exists or not, there is a perception of a divide between those who 
manage the work and those who perform the work.  If this research can find where 
the differences hide or if, in fact, project stakeholders are not actually so different 
from one another, then perhaps the two groups can open a better dialog.  
Management’s perception seems to be that the technicians inflate their estimates 
when asked “how long will this take.”  The technicians, on the other hand, seem to be 
of the opinion that the schedules are not realistic.  If this research can expose and 
document these underlying beliefs, then perhaps the dialog between the two groups 
can be improved.   
Current scheduling methodology focuses on creating network schedules based on 
three-point estimates (PMI 2013, para. 6.5.2.4).  Personal biases (known and unknown) and 
gaps in information can affect these estimates and ultimately provide bad inputs to the 
network schedule (Regnier 2005b, 8).  By incorporating the estimates of multiple 
stakeholders, biases can be more readily filtered out.   This method could also increase the 
level of stakeholder engagement in the project by allowing everyone to have their say in the 
schedule (Surowiecki 2005, 212, 227; PMI 2013, para. 6.5.2.5).   The final estimate may not 
match any one stakeholder’s estimate, but it does reflect the collective assessment of the 
team.   
Creating this aggregate estimate represents a departure from the current methodology 
both by incorporating multiple estimates and by requiring a new distribution model for the 
three-point estimates required by PERT.   
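For reference, the classical three-point calculation from which the aggregate estimate departs can be sketched as follows.  This is a minimal illustration of the standard PERT approximation, in which the mean is (a + 4m + b)/6 and the standard deviation is (b - a)/6.  It is not the new distribution model developed in this research, and the function name and activity durations are hypothetical.

```python
def pert_estimate(optimistic, most_likely, pessimistic):
    """Return (mean, standard deviation) under the classical PERT approximation."""
    if not optimistic <= most_likely <= pessimistic:
        raise ValueError("expected optimistic <= most_likely <= pessimistic")
    # The most likely value is weighted four times as heavily as either extreme.
    mean = (optimistic + 4 * most_likely + pessimistic) / 6
    # The range between the extremes is treated as spanning six standard deviations.
    std_dev = (pessimistic - optimistic) / 6
    return mean, std_dev


# Hypothetical activity: 4 days at best, 6 days most likely, 14 days at worst.
mean, std_dev = pert_estimate(4, 6, 14)
print(f"expected duration: {mean:.2f} days, standard deviation: {std_dev:.2f} days")
```

Because the pessimistic value sits farther from the most likely value than the optimistic one does, the expected duration is pulled above the most likely value.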
 
1.4 Background of Wallops Flight Facility 
 The data gathered for this research was obtained by analyzing several active 
projects at Wallops Flight Facility (WFF), a launch range and test facility located on 
the Eastern Shore of Virginia.  Like Cape Canaveral, WFF provides a spacelift 
capability (although on a smaller scale than Cape Canaveral), as well as providing a 
launch area for smaller rockets whose primary mission is atmospheric study or 
vehicle validation.  WFF is owned and operated by the National Aeronautics and 
Space Administration (NASA) and its primary mission has been to support smaller 
test and scientific launches as opposed to major spacelift operations, although it has 
started to expand its spacelift capabilities (Kremer 2013b, 8–10).   “Spacelift” is the 
ability to use a rocket to launch a payload.  Rockets typically consist of two parts: the 
booster and the payload.  The booster comprises most of what one typically thinks of 
when one hears the term “rocket.”  It provides the thrust required to allow the payload 
to travel along its intended trajectory.  That trajectory can either be orbital (the 
payload will orbit the earth) or suborbital (the payload will fly in a parabolic shape 
and return to the earth without ever reaching orbit).  The payload is, in most cases, the 
booster’s raison d’être.  It can be anything from a space shuttle to a simple bank of 
instruments and transmitters (Jenner 2015). 
WFF supports a unique subset of the spacelift mission known as the 
“sounding rocket.”  In the context of the rocket world, a sounding rocket typically 
carries a scientific payload on a sub-orbital voyage to gather atmospheric data or data 
on the geomagnetic fields that create the stunning auroras that can be seen in the 
extreme northern and southern parts of Earth.  These smaller rockets are also used to 
demonstrate vehicle capability.  In these cases, the intent of the mission is not to 
gather data about our atmosphere, but to gather data about the booster itself  (“NASA 
Sounding Rockets Annual Report 2013” 2013, 4, 20). 
Just as the rocket has two parts, a launch campaign also has two parts: the 
vehicle (described above) and the ground support.  The vehicle gathers data and 
transmits it back to systems waiting on the ground.  In order to receive and process 
these signals, an extensive network of equipment is required.  Typically, this ground 
equipment can be divided into three parts: radar, telemetry, and command (Kremer 
2013a, 6). Radars are used to track the flight of the vehicle, which not only tells the 
scientists/engineers where the vehicle is headed, but also helps determine how well 
the vehicle is performing (Kremer 2013b, 50). Telemetry assets can also be used to 
track the vehicle during fly-out, but typically telemetry assets are more concerned 
with receiving the data transmitted back from the vehicle during its flight (Kremer 
2013b, 43–44). Command assets protect public safety by ensuring that an errant 
vehicle can be destroyed before it violates federal safety criteria (Kremer 2013b, 47).  
Beyond these major categories, several other systems tie together to provide the 
required support infrastructure, including communications and networking, data 
processing, weather measurements, and photo/optical products.  Together, all of these 
systems provide the ground support required to ensure that the data provided by the 
vehicle during fly-out gets back to the appropriate stakeholders (Kremer 2013b, 42). 
 
1.5 Background of Project Types 
 This research deals with three major types of activities at WFF: operations, 
maintenance, and engineering.  Although all three project types accomplish different 
tasks, they all ultimately point to the same end goal and are necessary to accomplish 
WFF’s mission.  
Operations projects involve supporting the preparation, launch, and post-flight 
data collection of the vehicles that launch from WFF or one of its deployed ranges 
such as Poker Flat Research Range in Alaska or the Andøya Space Center in Norway 
(Kremer 2013b, 7).  These projects involve reviewing the requirements of the various 
range customers and supporting pre-launch testing to ensure that the range 
instrumentation (telemetry, radar, command, etc.) is interacting correctly with the 
vehicle and with the other range instrumentation.  When all of these pieces are in 
place, the range supports a launch by tracking the vehicle and recording the data sent 
back from the vehicle during flight.  After the flight, that data is processed and 
provided to the customer for further analysis.  When supporting at one of its deployed 
ranges, operations projects involve not only supporting the actual mission along with 
its pre-launch tests, but in some cases, also bringing up a site that has not been used in 
several months and ensuring it is still in good working order.  This usually requires a 
team of people to travel to the location prior to the actual operation to get ready for 
the mission before the customer first requires support.   
Beyond operations activities, personnel at WFF are also responsible for 
maintenance projects which entail maintaining the instrumentation and systems that 
support launch operations.  When personnel are not actively supporting a launch, they 
must perform scheduled maintenance activities on the instrumentation.  This applies 
to both WFF and deployed sites that have a more permanent setup (i.e., the 
instrumentation stays in place although the site is not actively manned the entire year 
by WFF personnel).  For the truly deployed sites, the instrumentation is returned to 
WFF where it undergoes its standard maintenance.  Maintenance activities vary in 
complexity and frequency depending on the type of instrumentation or system on 
which the maintenance is being performed.  Typically, there are two types of 
maintenance performed on the instrumentation/systems:  preventative maintenance 
and corrective maintenance.  The former is scheduled and known.  These are specific 
activities to check out the system/instrumentation and ensure it is in good working 
order (e.g. clearing dust out, checking connections, greasing gears, etc.).  The latter is 
unscheduled and unknown.  This type of maintenance is performed when something 
breaks or does not perform as expected.  This type is harder to estimate with respect 
to completion time  (Kremer 2015, 36–37). 
Engineering projects at WFF can be extremely varied in their scope and type.  
For this research project, the engineering projects could be described in one of two 
ways: system upgrades and system acquisitions.  Projects of the “system upgrade” 
type typically involve upgrading an already-existing system with a new part, 
capability, or software.  These projects take systems that already exist and make 
changes using locally (at WFF) developed products or “Commercial-Off-The-Shelf” 
products which are then tested and integrated into the already-existing infrastructure.  
Projects dealing with system acquisition occur when WFF purchases an already-
developed system and integrates it into the WFF infrastructure.  These projects 
typically involve finding a physical location for the system, assembling and testing 
the system, integrating the system with the existing infrastructure at WFF, and finally, 
certifying the system for operational use (Kremer 2013b, 37–45). 
 
1.6 Research Summary 
 Given the dynamic nature of projects and specifically projects at WFF, 
developing an accurate schedule can be a challenge.  Some believe too much time is 
given on a project while others believe not enough time is allotted.  Unexpected 
challenges during project execution frustrate the technicians who execute the tasks, 
leaving them with a desire for more time for the next similar project.  When the next 
project goes smoothly and does not require the full amount of allotted time, 
management is left feeling like the project could have been completed more quickly.  
As time progresses, these mindsets become ingrained while the project manager is 
left trying to find the “right” answer (Kahneman 2011, 80–81; Goldratt 1997, 40–41). 
In order to determine trends in estimating practices, subjects from a variety of 
different backgrounds were asked to provide activity duration estimates on several 
projects of the types described above.  Subjects were provided several surveys, the 
first of which was a survey that captured basic demographic and project-constraint 
preference information.  Later, subjects were provided different project surveys with 
lists of activities required to complete each project.  These surveys were designed to 
capture estimates on how long activities should take and determine whether or not 
subjects believed the provided list was accurate.  A second survey was provided to 
those engaged in executing the projects to record how long the activities actually took 
along with any other changes or challenges that took place during the project.    
These survey responses were compiled and analyzed using Design of 
Experiments (DOE) to determine if there was any correlation between the 
demographics of the subjects and the results of the other surveys (Montgomery 2008, 
208–10). A new estimating method was then developed which used Bayesian 
updating to combine the inputs of multiple experts (Morris 1977). 
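As a rough illustration of the idea of Bayesian updating with expert input, the sketch below treats the project manager's prior belief about an activity duration as a normal distribution and each expert's estimate as an independent, normally distributed observation of the true duration.  These are simplifying assumptions made only for illustration; they are not the model developed in this research, and Morris (1977) treats the general problem more carefully.  All names and numbers are hypothetical.

```python
import math


def update_belief(prior_mean, prior_sd, expert_estimates):
    """Combine a normal prior with expert estimates given as (estimate, sd) pairs.

    Assumes independent experts and normal distributions (illustrative only).
    Returns the posterior (mean, standard deviation).
    """
    # Precision is the reciprocal of variance; more confident inputs carry more weight.
    precision = 1.0 / prior_sd ** 2
    weighted_sum = prior_mean * precision
    for estimate, sd in expert_estimates:
        weight = 1.0 / sd ** 2
        precision += weight
        weighted_sum += estimate * weight
    return weighted_sum / precision, math.sqrt(1.0 / precision)


# Hypothetical case: the manager expects 10 days (sd 4); one expert says 14 days
# with more confidence (sd 2) and another says 12 days with less (sd 4).
posterior_mean, posterior_sd = update_belief(10.0, 4.0, [(14.0, 2.0), (12.0, 4.0)])
print(f"updated belief: {posterior_mean:.2f} days (sd {posterior_sd:.2f})")
```

Under these assumptions, the updated mean is a precision-weighted average of the manager's prior and the expert estimates, and the updated standard deviation is smaller than any single input's.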
Because the human element plays a heavy role in project planning and 
execution, responses obtained during the period of study were also analyzed to 
determine if project stakeholders think differently from one another and if those 
opinions are part of the disconnect that seems to occur when determining how long a 
project or activity should take.  These observations were then compared to the 
scheduling data to determine if the stated opinions of different stakeholders matched 
their scheduling estimates in the hope of revealing some of the underlying reasons for 
why different stakeholders estimate the way they do. 
Ultimately, this research seeks to provide insight into the mindsets of a 
diverse group of project stakeholders and provide a method to combine these diverse 
opinions into one estimate that can be used in the development of a network schedule.  
By having a better understanding of the thought process behind the estimates and by 
including estimates from multiple experts, a project manager can not only create a 
better project schedule, but can also better defend one should it go awry.  Because 
this research gathers real-world data, it is hoped that the results will reflect what a project 
manager will actually encounter when asked to develop a schedule, making them 
a useful tool to help accurately assess how long it should take 
to successfully complete a project. 
 
 
 
 
 
  
 
Chapter 2: Literature Review 
 
 
 The process of scheduling a project can be very complicated.  Politics, 
budgets, past experiences, and present “unknowns” are just some of the challenges 
faced by a project manager trying to determine a likely completion date for a given 
project.  Several scheduling “best practices” exist and are available for use by a 
project manager, but those best practices are entirely dependent on the input provided 
to them (Malcolm et al. 1959, 650–51; Grubbs 1962, 914; Pickard 2004, 1569).  The 
inputs to these scheduling best practices should come from the “experts”  (PMI 2013, 
para. 6.5.2.1), but how do those experts decide on what their inputs should be?  Are 
scheduling challenges seen at Wallops Flight Facility unique or has NASA as an 
organization encountered similar problems?  This chapter will provide an overview of 
the scheduling challenges faced by NASA over the past several decades to see if there 
are any trends that can be applied to the scheduling challenges at WFF.  The chapter 
will then go on to discuss best practices for scheduling and some caveats that 
accompany those best practices.  It will then move on to the current literature on 
decision analysis and how it can affect scheduling estimates.  It will conclude with a 
discussion of the Bayesian aggregation method used in this project.   
2.1 Scheduling in NASA – GAO Reports 
 While Wallops Flight Facility may have a unique mission within the 
constructs of NASA, the project management (and specifically scheduling) challenges 
experienced by the project teams at WFF are not unique to the facility.  According to 
its website, the Government Accountability Office (GAO) is responsible for 
monitoring government spending of American tax dollars.  Within this role, it 
provides reports on how well certain programs are being managed along with any 
concerns about the ability of the project to be successful.  These reports document 
challenges encountered and often provide recommendations for overcoming these 
challenges and how to proceed (“About GAO” 2015).  A word search on “Schedule” 
was conducted on the GAO website, with those results being further narrowed down 
to those reports related to NASA.  This search returned nearly 800 results, and of 
those approximately 75 were chosen and reviewed based on the apparent applicability 
provided in the report’s abstract.  These reports spanned a variety of projects and 
several decades, but many seemed to have several common themes that played out 
over and over again.  The information below is a summary of the issues identified in 
those reports which seem to be contributing factors to schedule challenges. One 
interesting thing to note throughout this section is the span of years shown in the references.  
The first two-digit number in each of the references describes the year the report was 
written.  In several cases, the same issue is described years (and even decades) apart.  
2.1.1 Lack of Resources/Inadequate funding 
 One recurring theme seen throughout several of the reports was that of 
schedule delays being caused by a lack of resources and/or inadequate funding.  In 
the movie Apollo 13, there is a scene where engineers are working to develop a 
procedure to turn the Command Module back on after it had been shut down for 
several days.  The required systems are determined, but those systems will overreach 
the available power budget.  At one point, one of the engineers states that the 
command module thrusters must be warmed up due to the extreme cold of space and 
the other engineer replies that he will have to trade off the parachutes or something to 
make that happen.  The first engineer responds that if the parachutes do not open, 
then there is no point to continue trying.  The second engineer then replies with a 
statement that has stuck with this author as applicable to nearly all resource 
constraints:  “You’re telling me what you need.  I’m telling you what we have to 
work with at this point. I’m not making this stuff up.”  (Howard 1995).  The same 
principle can be seen with nearly any resource required for a project.  Although 
funding is the resource that comes most readily to mind, there are several which must 
be considered, including: time, money, technology, personnel, and knowledge (GAO 
2011, 7, 2009b, 6).  In one example involving the Space Launch System (SLS), the 
report stated that the program’s budget was $400 million short of what it needed.  
Without the required funds in place (among other issues), officials at NASA were not 
able to complete the contracts needed to proceed with development.  This in turn 
increased the risk to both the cost and the schedule of the program (GAO 2014, 10–11).  
NASA told the government what it needed and the government replied with 
what NASA had to work with.  This is just one example, but it can be seen over and 
over again across multiple projects spanning nearly forty years.  Without the 
resources required to execute the tasks in the schedule, whether it be people, money, 
or equipment, it does not matter how well one estimates how long something should 
take.  Without the capability to get started, the duration will remain “indefinite”.  
 Returning to the example of the SLS, NASA realizes its need to operate 
within a constrained budget.  While it is doing its best to keep within the prescribed 
funding limits, the program has consistently struggled to ensure technical and 
programmatic requirements of the system are met within the constraints of available 
funding.  The program has listed this as its number one risk and stated that it does not 
believe its current planned budget will cover the current design, which does not even 
account for changes and challenges during development and testing.  This lack of 
funding is predicted to delay the launch date by six months which, in turn, increases 
the overall cost (GAO 2014, 11).  Even forty-five years later on a project designed to 
once again carry humans into space, one group is telling the other what it needs, and the 
other is responding with what it has to work with.  Based on a recommended “best 
practice” called the Joint Cost and Schedule Confidence Level (JCL), NASA requires 
its launch programs to have a 70% probability of meeting their cost and schedule 
baselines.  The JCL looks at the proposed requirements, cost, and schedule goals of a 
given project and analyzes the probability that the project can meet those goals (GAO 
2014, 5–6).  Given the problems already encountered by the SLS, NASA must 
decide what it will sacrifice in order to keep the project moving forward:  increased 
cost, increased schedule, or pressing forward with a JCL rating of less than 70% 
(GAO 2014, 10–11). 
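The idea behind a joint confidence level can be sketched with a small Monte Carlo simulation: draw many possible cost and schedule outcomes, then count the fraction of draws in which both baselines are met.  The sketch below is not NASA's JCL tooling; the duration range, cost model, and baselines are invented purely for illustration.

```python
import random


def joint_confidence(cost_baseline, schedule_baseline, trials=20000, seed=1):
    """Estimate the probability that BOTH cost and schedule baselines are met.

    Durations are in months and costs in millions of dollars (hypothetical values).
    """
    rng = random.Random(seed)  # fixed seed so repeated runs give the same estimate
    hits = 0
    for _ in range(trials):
        # Hypothetical duration: 30 months at best, 60 at worst, 40 most likely.
        duration = rng.triangular(30, 60, 40)
        # Hypothetical cost: fixed cost plus a monthly burn rate plus random noise,
        # so a schedule slip also drives the cost up.
        cost = 200 + 25 * duration + rng.gauss(0, 50)
        if cost <= cost_baseline and duration <= schedule_baseline:
            hits += 1
    return hits / trials


jcl = joint_confidence(cost_baseline=1400, schedule_baseline=48)
print(f"estimated joint confidence level: {jcl:.0%}")
print("meets a 70% requirement" if jcl >= 0.70 else "falls short of a 70% requirement")
```

Because both conditions must hold in the same draw, the joint confidence level can never exceed the confidence in cost alone or in schedule alone, which is why relaxing only one baseline may not be enough to reach the required level.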
 A mismatch of resources and requirements is not necessarily always the fault 
of the project team, especially in the case of research and development.  In some 
cases, the teams knew what was required to successfully complete the project, but the 
resources simply were not available (GAO 1991a, 29; Martin 2012, 27).    A recurring 
theme throughout several of these reports seems to be delays in receipt of funding 
from Congress (GAO 1988a, 1,2,5, 12, 14-15, 1991a, 4, 1977d, 3).  In some cases, 
this was due to governmental constraints that were out of NASA’s hands.  One report 
released in 2012 states that, since its inception in 1959,  NASA has started the fiscal 
year with its allocated funding only seven times.  Without the funding in hand, 
managers had to restructure the project plan in order to conform to the available 
resources (usually in the form of some type of continuation) (Martin 2012, vii).  In 
other cases, if Congress does not believe that a project can meet its stated cost and 
schedule estimates, it can delay funding until NASA can provide such assurances 
(GAO 1991a, 31, 2008, 10, 1997, 6).  If designs and plans lag behind early in the 
project, Congress may delay funding until it has some assurance that the project can 
succeed.  If the perception is that the project is mired in problems, then it is less likely 
that Congress will authorize funding, even if the program is already in work (GAO 
1991a, 30–31, 1991b, 5).  In other cases, funds were simply not approved, causing 
delays in start dates that propagate through the project (GAO 1980a, 44–45).  In 
one report, a response from NASA criticizes the author for failing to acknowledge 
that funding constraints were a major contributor to projects running behind schedule 
and that these funding constraints were externally driven (GAO 1980a, 65).  
In yet another budgeting challenge with Congress, project managers must 
contend with increased scope and stagnant budgets (Martin 2012, 29).  This is 
another example of the Apollo 13 phenomenon of funding:  NASA tells Congress 
what it needs, and Congress responds with what NASA has to work with.  As mentioned 
before, in the SLS program, NASA is striving to remain within the budget profiles set 
by Congress.  Despite efforts to remain within this profile, the number one risk is that 
it will run out of funding prior to the first launch, which will push the launch date 
out, which will cause an increase in required funding, which will push the launch 
date out yet again (GAO 2014, 11).  Given the vast portfolio that must be managed, NASA 
works to create levels of prioritization among its projects.  The theory is that NASA 
will rank its projects such that the approved projects will fall within the funding 
profile allocated by Congress and ensure that the most important projects get the 
funding they need.  The problem, though, is that even with this prioritization, NASA 
was exceeding the likely allocation it would be provided by Congress.  When the 
allocated funding is not received, sacrifices must be made to other project constraints 
(GAO 1994a, 1–2).  
Another major recurring theme was that of NASA officials having to manage 
and estimate project costs based on annual budgets as opposed to life-cycle costs  
(GAO 1988a, 19, 2002a, 2).  Because NASA is required to manage projects based on 
annual funding requirements, funding may not necessarily be available in accordance 
with the planned schedule (GAO 2002b, 10–11).  In cases such as these, the funding 
seems to be driving the schedule as opposed to matching funding to scheduled 
milestones as would be recommended in an Earned Value Management (EVM) 
construct (GAO 1994a, 1; Mantel Jr. et al. 2004, 237–44).  When that funding is not 
available, adjustments must be made to the project in order to remain within the 
budget constraints (GAO 1988a, 5, 19).  Even high-priority projects such as the space 
shuttle fall victim to managing by annual budget.  One report stated that aspects of the 
program experienced schedule extensions of 13-15 months with the primary driver 
being the need to remain within the annual budget (GAO 1977d, 3).  A report issued 
that same year described a space telescope project that was delayed from the 
beginning by at least one year due to requests for funds being denied by the Office of 
Management and Budget (OMB) (GAO 1977c, iii, 4).  In another example, from a 
later date, even the International Space Station (ISS) experienced schedule delays that 
resulted from trying to make the project plan fit the annual funding schedule.  In this 
same report, NASA admitted that in this instance the funding delays were not a major 
issue, but that the uncertainty caused by unstable funding profiles did negatively 
affect the stability of the project. It also stated later in the report that trying to match 
the project plan to allocated funding forced a schedule delay of 18 months, although it 
did provide some improved stability in the plan (GAO 1991a, 4, 29, 34).  In 
interviews conducted by the Inspector General (IG), personnel across NASA were 
asked about different challenges facing their projects.  In these interviews, funding 
instability was cited as a major challenge by nearly 75% of respondents.  
When the budget was changed, the teams had to adjust their projects accordingly 
which often affected the overall schedule (Martin 2012, 25).  Because of the lifespan 
of several of these development projects, NASA also faces the challenge of keeping 
funding in the face of changing government officials in both the executive and 
legislative branch.  An effort that was a priority for one president may not be a 
priority for another.  Congressional leaders change and with those changes, the 
allocation of funding can change as well (GAO 2008, 18). 
Given all of these issues with funding, there are some other basic underlying 
causes which are major contributors to the scheduling problem.  One of these issues 
(which will be discussed later in this chapter) is that much of what NASA deals with 
is research and development.  These types of projects are notoriously difficult to 
estimate because of all the unknowns.  As the teams progress in the project, they gain 
more and more understanding and unknowns resolve themselves into increases in the 
requests for budgets and schedules (GAO 2014, 7, 2012, 12).  The problem is that, 
whether legitimate or not, that initial budget declaration becomes an anchor point 
from which NASA and Congressional leaders base their perceptions (Kahneman 
2011, 119).  Projects that do not live within that perception can then run into funding 
issues when they request more money (GAO 2014, preface).  This also gives the 
perception to Congress that the project is not under control, which makes Congress 
less likely to provide more money (GAO 1991b, 5, 1991a, 30–31, 2003, 8–9).   
Another major contributor is what has been dubbed the “Hubble Psychology” 
(Martin 2012, 16).  The Hubble telescope was a complete disaster from a project 
management perspective, exceeding cost and schedule estimates and initially being 
plagued by technical problems.  Despite this, it continued to receive funding and 
schedule support and engineers were ultimately able to resolve issues.  Now Hubble 
provides unprecedented views of our universe, making its project management 
failures pale in comparison to its technical success (Martin 2012, vi).  This 
psychology has given rise to the belief that as long as a team can achieve technical 
success, sins against the more materialistic success criteria will be forgiven.  This 
does not inspire project managers to be overly concerned with whether or not their 
projects come in on time and on budget as long as the project is a technological 
success (Martin 2012, 11–12).  In general, NASA has a culture of optimism which 
helps bring about these technological successes.  Its “go forth and conquer” mentality 
allows people to accomplish amazing things (Martin 2012, 37–38).  An interesting 
contrast to this culture of optimism, however, is NASA’s culture of placing safety and mission 
assurance before cost and schedule, which nonetheless results in the same prioritization of mission 
success over project constraints.  For all the wonderful things it has accomplished, 
when NASA fails in its technological endeavors, it tends to fail spectacularly (or 
worse).  Missions often consist of one-of-a-kind payloads or, even more importantly, 
human lives.  In the event of a mishap, the former is difficult to recover from, the 
latter, impossible.  Because of these high-stakes missions, NASA must carefully 
consider its management of project constraints (GAO 1988b, 18, 1977d, 60, 1977a, 9; 
PMI 2013, para. 1.3; Martin 2012, 13, 18).  A quote from Walt W. Williams, the 
Program Manager for X-15 and Mercury, perfectly sums up the attitude of NASA 
towards safety versus schedule: “You will never remember the many times the launch 
slipped, but the on-time failures are with you always” (waynehale 2015).  In an 
environment such as this, it is highly unlikely that risk mitigation options will favor 
relieving cost and schedule risks when those mitigations could potentially cause a 
technological mission failure (GAO 2017, 15–17, 22–23; Mantel Jr. et al. 2004, 105). 
Once a project is under way and effort has been expended on it, it becomes 
much more difficult from a psychological perspective to give up on the project (Arkes 
1985, 129).  The longer the team works on the project and the more money invested, 
the more attached team members and managers become, reflecting the concept of  
“sunk cost” (Kahneman 2011, 345; Arkes 1985, 132).  As resources are “sunk” into 
the project, the attitude of, “we’ve already put so much into this, let’s just finish it” 
becomes harder to escape (Kahneman 2011, 354; Arkes 1985, 135).  Some would 
argue to ignore what has been done and focus only on whether or not it makes sense 
to continue down the current path, although others would argue careful consideration 
of all factors is required (Kahneman 2011, 343; Mantel Jr. et al. 2004, 270; Farr 2012, 
5–13; Arkes 1985, 124).  A prevailing attitude at NASA, however, is that as long as 
the project continues to make technical progress, “someone” will find extra funding 
to keep the project alive (Martin 2012, vi). The problem with this, though, is that in 
some cases, the funding must come from other, lower priority projects (Martin 2012, 
viii).  Part of the “sunk cost” struggle is that abandoning a project means admitting defeat on a goal.  
NASA encourages a “can do” culture of optimism that translates over into its project 
management.  When given a project, the tendency is to say “yes”, despite possible 
funding and schedule challenges (GAO 1993a, 12; Martin 2012, iv). If the project 
managers cannot remain grounded in the initial stages of planning, then the project 
has little hope of meeting the already-unrealistic schedule once issues and challenges 
arise (Martin 2012, 12–13). 
As previously mentioned, one of the major hurdles to successfully managing 
project constraints at NASA is the instability of available funding.  The resulting 
uncertainty leads to issues not only with actual funding concerns, but also with 
another critical resource: people.  The revolving funding door takes its toll on project 
members and their motivation to continue work knowing that at any moment, their 
project could be on the chopping block (GAO 1991a, 23, 29, 2008, 18, 1993a, 11, 
1991a, 32).  When people are worried about their jobs, they will be less likely to 
focus on solving the technical problems at hand.  This in turn means that project 
managers will need to spend more time focusing on managing personnel issues and 
less time managing project constraints (GAO 1991a, 32).   
Even fictional space research and development projects run into personnel 
problems.  In the movie Return of the Jedi, the project manager in charge of Death 
Star construction insisted that the schedule could not be met because he needed more 
men (Marquand 1983; Ward 2015, 68). As previously mentioned, the aerospace 
career field is highly specialized and requires a very specific skill set, so even if there 
are enough people available, having the right skill set is equally important.  A good 
project manager can help keep a project moving towards schedule completion, but, to 
quote David Mamet, “Old age [experience] and treachery [also experience] will 
always beat youth and exuberance” (Mamet 2015).  Although research and 
development projects can be very different as far as requirements, experience can 
teach a project manager where to look for pitfalls and also how to “work” the system 
to get things done (e.g. where to get approvals, who to ask, good times to ask, bad 
times to ask, how to anticipate and mitigate personnel issues, etc.).  NASA is facing a 
growing concern over its workforce development as its experienced project managers 
and engineers are beginning to reach retirement age.  Those who know the ins-and-
outs of the systems and who also know how to recognize a trend which can lead to a 
problem are starting to leave.  Those who remain behind will become good project 
managers in their time, but they still need time to develop (GAO 2006a, 4, 2006e, 
10). 
The other problem facing NASA is the capability to backfill people once they 
retire or as new projects come online.  Funding limitations make it difficult to hire on 
new people, not just in management roles, but in technical roles as well (GAO 2006d, 
6).  Personnel are also challenged with performing work on multiple projects, forcing 
them to prioritize which projects receive attention.  In these cases, trying to do more 
with fewer people usually results in work being put off until time is available. 
Personnel find it difficult to remain dedicated to side projects when their primary jobs 
are already consuming a significant amount of time (GAO 1991b, 27, 2006e, 10, 15, 
1980b, 13).  Further complicating this issue is the fact that NASA has outsourced 
much of its technical knowledge base which now rests more with contractors than it 
does with the government civilians (GAO 2006a, 3–4).  Because of this shift, NASA 
must now provide technical and project oversight to new contractors who may or may 
not have experience with the types of projects NASA requires them to do.  This 
inexperience can lead to costly delays as work must be re-done to meet the required 
standard (GAO 1991b, 27–28).  In some cases, it is not only the contractor who lacks 
experience, but, as described above, the NASA project manager as well.  This 
inexperience can affect how well the project is managed not only from a technical 
perspective, but from a project management perspective as well.  Without the proper 
direction, contractors hired to do the job must fulfill requirements to the best of their 
understanding, but that understanding may be incorrect (GAO 2006f, 14, 1994a, 5, 
1991b, 28).  Schedule challenges are further complicated when funding is not 
available for outsourced work to be completed or when a contract cannot be 
definitized.  When allocated funding is withheld, contractors cannot begin (or 
continue) to work.  This can delay work to the point that it affects the overall 
completion of the entire project (GAO 2014, 17, 2009a, 14). 
 While funding is one of the major resources in short supply on a given project, 
other resources can also wreak havoc with planned schedules.  In a specialized field 
such as aerospace, facilities can also be a cause for concern with respect to schedule.  
When several projects are vying for the same test facility, invariably someone must 
give way, which will usually result in a schedule delay (GAO 2008, preface, 2008, 
13).  Other times, facilities with the required capabilities are no longer in existence, 
having been shut down in previous rounds of budget cuts (GAO 2008, 14).  In some 
cases, facilities are available, but there are no people to man the facilities (GAO 1976, 
i).  Facilities are not the only material resources that can end up in short supply.  
Hardware and software can also delay schedules when they are delivered late or when 
quality issues require re-work.  This requires finding alternative ways to make the 
technology work which can, in turn, lead to more schedule delays. In some cases, 
equipment has become so obsolete that the technology required to put together the 
equipment no longer exists or is much harder to find, which can also cause schedule 
delays (GAO 2004, 11, 1994b, 3–4, 2012, 31; Martin 2012, 22–23).   
2.1.2 No overall plan (business case) 
While some of the funding issues discussed in the previous section were out 
of the project manager’s control, other issues may have been exacerbated by a failure 
to have a valid business case that adequately described the resource needs of the 
project (GAO 2014).  One project management best practice states that prior to the 
start of any project, a business case should be developed to demonstrate the need for 
the project at hand  (PMI 2013, para. 4.1.1.1).  NASA goes on to define the business 
case as ensuring that project resources are matched to customer needs.  Here, NASA 
defines resources not only as time, money and people, but also knowledge (GAO 
2006a, 10).  As mentioned in the previous section, a major problem facing NASA 
right now is the retirement and outsourcing of its project management staff (GAO 
2006a, 22).   This exodus is a major concern for NASA because, as people leave, the 
knowledge leaves with them.  Without this knowledge, it is much more difficult to 
accurately estimate how much something will cost or how long it will take to 
complete (GAO 2006a, 4).   Both PMBOK and NASA state that cost and schedule 
estimates should be derived from past projects’ records and expert opinion, but when 
all the experts leave, the ability to make good estimates leaves with them (GAO 2007, 
4, 2006a, 11,22-23, 2003, 7, 2004, 2, 2006b, 2–3, 2012, 4, 2011, 8, 2009c, 5–6; PMI 
2013, para. 6.5.2.1, 7.2.2.1). 
One GAO report recommends that NASA should implement policies which 
require better reviews before moving from one project development stage to the next.  
They refer to this approach as a “knowledge-based” approach to systems engineering.  
Basically, each project is required to prove that they have the “knowledge” needed to 
proceed to the next phase of development (GAO 2008, 16).  This includes 
understanding the requirements and how the project will meet those requirements as 
well as (and this is stressed several times) whether or not the technology currently 
available to the project is capable of meeting those requirements.  They further state 
that these projects should have good requirements and well defined cost and schedule 
estimates before progressing from “formulation” to “implementation” (GAO 2006a, 
3, 2012, 5).  This matches with PMBOK’s recommendation of planning the project 
before moving on to the execution phase (PMI 2013, para. 3.4).  Several GAO reports 
mention that a failure to obtain the correct knowledge base prior to beginning a 
project or moving to the next phase significantly increases the probability of a 
“project management” failure of the project (GAO 2014, preface).  One of the 
recurring themes in the later GAO reports is that the NASA teams seem to start 
projects without the knowledge required to truly evaluate the probability of success.  
It is almost a “figure it out along the way” mentality.  Interestingly, this concept of 
knowledge-based engineering appears to be specifically called out more frequently 
only within the last ten to fifteen years.  Prior to that, the general idea may have been 
mentioned, but problems were mostly blamed on the familiar culprits of inadequate 
funding, frozen budgets, and changing requirements.   
NASA indicates that from their perspective, a business case must not only 
address the technical specifications of the program, but it must also show that the 
required technology is available and that the basis for the budget and schedule is  
reasonable (GAO 2009c, 6).  PMBOK states that a business case is created to, 
“determine whether or not the project is worth the required investment” (PMI 2013, 
para. 4.1.1.2).   Based on these GAO reports, NASA’s business cases seem to have a 
slightly different purpose to them in that they seem to occur later in the project 
lifecycle than is discussed in PMBOK.  According to PMBOK, development of the 
project charter occurs in the “Initiating” Process Group, which is the first process 
group in the lifecycle of a project.  The project charter is the official approval to 
proceed for any project which means that no real work can begin on the project until 
it is approved (PMI 2013, para. 3.3).  The business case is listed as one of the inputs 
to the project charter, meaning that the business case must be developed before any 
work on the project officially begins.  The business case itself is an analysis of a 
statement of work which describes the high-level need and general scope of the 
project.  The business case then provides a high-level analysis of the statement of 
work to determine if the benefit of undergoing the project has enough return to justify 
the cost of the effort (PMI 2013, para. 4.1.1.2).  At this early stage of the project, it 
would be nearly impossible to have a good understanding of exactly what the project 
would entail with respect to requirements, cost, and schedule.  According to the 
PMBOK model, only high level information would be available about the project at 
this time. In fact, in some cases, a project manager has not even been assigned at this 
stage  (PMI 2013, para. 3.3,  4.1). 
 The next process group according to PMBOK is the “Planning” process 
group.  In this process group, the project manager and the team take the high-level 
information of the project charter and begin to refine it into actionable parts.  This 
process should result in the Project Management Plan which should document every 
aspect of what will be required to successfully meet the business need stated in the 
project charter (PMI 2013, para. 4.2).  The first step in creating the Project 
Management Plan is to define the scope of the project (referred to as “Project Scope 
Management”) and one of the major steps of project scope management is to collect 
the requirements (PMI 2013, para. 5.2).  This step is crucial to the success of the 
project as all project constraints will be tied to these requirements.  The success or 
failure of the project will also be judged in most cases by how thoroughly these 
project requirements are met  (PMI 2013, para. 3.4, 5.1.3.1-5.1.3.2).  After 
determining requirements, it is recommended that the project team define the scope 
and create the Work Breakdown Structure (WBS) of the project.  A basic scope has 
previously been defined in the project charter, but now that the team has well-defined 
requirements, the scope can be more accurately defined (PMI 2013, para. 5.3).  
Defining the scope helps prevent “scope creep” where project stakeholders seek to 
expand on the requirements.  These expansions can wreak havoc with project costs 
and schedules, but they can be difficult to challenge if they can be tied to something 
already within the scope of the project  (Mantel Jr. et al. 2004, 42).  The final step in 
the planning process group of the scope process is to establish a WBS.  The WBS 
translates the requirements into actions to be taken by the project team.  These actions 
can then be assigned a cost in terms of labor and materials and can also be assigned a 
duration (how long it should take to complete the activity) and organized into a 
schedule (PMI 2013, para. 5.4.2.2, 5.4.3.1).  At this point, the project manager should 
have what is needed to portray to “the powers that be” an accurate depiction of the 
best estimate of what it will take to complete the project.  
 NASA’s project development processes are defined in NASA Procedural 
Requirement (NPR) 7120.5E.  These processes are further described in a “best 
practices” handbook called the NASA Space Flight Program and Project 
Management Handbook (NASA/SP-2014-3705) which was released in September 
2014.  The document covers both program management and project management, 
stating, much like PMBOK, that projects must fit into the overall strategic goals of 
the organization (NASA 2014, 21; PMI 2013, para. 4.1.1.2).  NASA’s planning 
processes break major projects into six phases (designated “A” through “F”), and in 
some cases a “pre-phase A” for concept development.   Each phase concludes by 
undergoing a boarded review, information from which is used in a “Key Decision 
Point” (KDP).  These KDPs provide senior leadership the chance to review the 
project’s current progress and determine whether or not to allow it to continue.  Each 
KDP is a gateway point that the project must pass before entering into that particular 
phase, so “KDP A” will usher in Phase A (as opposed to concluding it)  (NASA 2014, 
114).  Figure 2-1 (NASA 2014, 26) below shows the entire project process along with 
the associated reviews and decision points.  Several of these will be described in the 
following paragraphs.  
 
Figure 2-1: NASA Project Life Cycle  
 
The six phases just discussed are further divided into two stages referred to as 
“Formulation” and “Implementation”.  Prior to Formulation, the project engages in 
“pre-phase A” activities where a need or concept is identified and analyzed to ensure 
it aligns with the overall strategic goals of NASA.  These projects then undergo a 
high-level analysis to determine feasibility and potential challenges that could face 
the program (NASA 2014, 138).  These concept studies probably most closely match 
the “business case” as described in PMBOK.  They look at the different mission ideas 
presented to upper management and determine which one is the most likely to 
produce a good return on investment.  Once a mission concept is selected, upper level 
management at NASA develops the Formulation Authorization Document (FAD).  
This document most closely matches a “Project Charter” as defined by PMBOK in 
that it officially authorizes the project to begin and covers a wide variety of high-level 
project characterizations such as scope, funding, authority, and constraints.  
According to NPR 7120.5E, this document should contain, “requirements, schedules, 
and project funding requirements.” (NASA 2015a, 24, 2014, 141)  The NASA Project 
Handbook further clarifies that these should be project level requirements at this stage 
and project-level cost and schedule, reflecting at least the completion date and 
possibly broken down further into the cost and general schedule of each phase of the 
project (NASA 2015a, 143, 146).  The project team then responds with the 
Formulation Agreement (FA) which is a preliminary plan to meet the requirements 
described in the FAD (NASA 2015a, 25). 
Once the project has been officially approved and passes KDP-A, it begins to 
refine the mission concept.  Throughout Phase A, a preliminary Project Plan should 
be developed containing many of the same sections as a PMBOK recommended 
Project Plan (NASA 2015b, 33, 2015a, 137–77).  At the end of Phase A (at KDP-B), 
the project requirements should be refined to at least the system level and the project 
team should have an idea of what sub-system requirements will be (NASA 2014, 
153).  At KDP-B, the project team should be able to provide external stakeholders a 
general roadmap describing when and where the time and money will be spent 
(NASA 2014, 154).  Within this phase, the team will conduct a Systems 
Requirements Review (SRR) which is meant to demonstrate that the project 
requirements as understood by the team will fill the need defined at the program level 
(NASA 2014, 32).  Once the requirements are approved, the team will continue to 
develop its architecture and undergo a System Design Review (SDR)/Mission Design 
Review (MDR).  These reviews communicate the team’s plan of execution to the 
review board who will then provide an assessment as to whether or not the course of 
action will meet the approved requirements (NASA 2014, 153).   The cost estimates 
should be broken down into fiscal years expanding over the expected life of the 
project by this point (NASA 2014, 160).   At KDP-B, the team should have a good 
understanding of what the project should accomplish (requirements), how to 
accomplish that objective (technical plans), the resources needed to complete those 
plans (time, money, people, materials, etc.), and they should be reasonably certain 
that it can be accomplished within the provided estimates of those aforementioned 
resources (NASA 2014, 153–55, 165–66).  All project planning to date should be 
consolidated into a preliminary Project Plan, which should be available for review by 
stakeholders by the SDR/MDR (NASA 2014, 173). 
Once the team has successfully navigated through KDP-B, Phase B can begin.  
This phase is characterized by further refining the requirements and planned design.  
By the end of this phase, requirements should be baselined down to the sub-system 
level.  Cost and schedule updates should be made based on the team’s understanding 
of the current risks facing the project and the Project Plan will be baselined prior to 
the Preliminary Design Review (PDR) (NASA 2014, 183,185).  The team should also 
begin refining its time-phased cost estimates and comparing them to the project budget to 
be provided by Congress.  The non-monetary resource requirements are also updated 
at this point to reflect the project team’s better understanding of requirements and 
plans (NASA 2014, 182–183,185).   
“Phase C” is characterized by further refinement of the plans in “Phase B”.  
This is the last phase before full scale fabrication and testing of the system to be 
delivered by the project, so the team and review panel must ensure that details are 
understood (NASA 2014, 182–183,185).  As stated before, the Project Plan has been 
baselined by this stage, so the team begins to implement the described execution 
plans.  The team should also continue to provide updates on cost, schedule, risks, and 
resources throughout this phase.  At this point, especially for large and expensive 
projects, the team must inform upper-level management of any milestone that is 
anticipated to be delayed over six months.  They must also inform upper management 
of any cost growth in excess of 15%.  For projects expected to have a life-cycle cost 
over $250 million, increases above 15% must be reported to Congress.  Increases 
over 30% could be subject to re-authorization.  In this phase, the team must undergo a 
Critical Design Review (CDR) to prove that the design is ready and also a Production 
Readiness Review (PRR) to prove that the team is ready to produce the systems 
required to successfully complete the project (NASA 2014, 189–98).  “Phase D” is 
where the team actually implements all of the technical plans and begins to build and 
test the system.  Drawings and technical documents reflect the “as-built” 
configuration and are baselined.  “Phase D” completes with the successful initial 
operational function of the project in question (NASA 2014, 196–205). 
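The Phase C reporting thresholds described above can be sketched as a simple rule check.  The following Python sketch is illustrative only and is not taken from NPR 7120.5E or the NASA handbook: the names ProjectStatus and reporting_actions are hypothetical, cost growth is assumed to be measured against the baselined life-cycle cost, and the 30% re-authorization trigger is assumed to apply to the same projects (life-cycle cost over $250 million) that must report to Congress.

```python
from dataclasses import dataclass


@dataclass
class ProjectStatus:
    """Hypothetical snapshot of a project during Phase C."""
    baseline_cost_musd: float      # baselined life-cycle cost, in $ millions
    current_estimate_musd: float   # current life-cycle cost estimate, in $ millions
    milestone_slip_months: float   # largest anticipated milestone delay, in months


def reporting_actions(status: ProjectStatus) -> list[str]:
    """Return the reporting actions suggested by the thresholds in the text."""
    # Assumption: growth is measured relative to the baselined cost.
    growth = (status.current_estimate_musd - status.baseline_cost_musd) \
        / status.baseline_cost_musd
    actions = []
    if status.milestone_slip_months > 6:
        actions.append("notify upper management: milestone delay")
    if growth > 0.15:
        actions.append("notify upper management: cost growth")
        # Assumption: Congressional reporting and re-authorization apply
        # only to projects with a life-cycle cost over $250 million.
        if status.baseline_cost_musd > 250:
            actions.append("report to Congress")
            if growth > 0.30:
                actions.append("possible re-authorization")
    return actions


if __name__ == "__main__":
    # Invented figures, for illustration only.
    print(reporting_actions(ProjectStatus(300.0, 400.0, 2.0)))
```

The nesting reflects one reading of the thresholds: each higher level of reporting is reached only after the lower one has been triggered.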
Ultimately, each lifecycle phase of the project is an expansion and refinement 
of the previous phase.  As the team learns more about the project, requirements are 
better defined, which allows for more detailed designs, which allows for a better 
informed cost and schedule estimate.  In the years prior to the NASA Program and 
Project Management Handbook previously described, the GAO criticized NASA for 
failing to follow good project management practices.  Based on data from one report 
on the Constellation program, the major culprit seems to have been that the 
program/project manager did not fully develop the required information in the early 
phases of the project lifecycle.  There was a lack of understanding by those involved, 
especially when it came to managing customer expectations such that they fit within 
the allowable resources of the project.  The report also stated that the project team 
lacked a good understanding of the requirements and exactly what resources would be 
required to meet those requirements (more will be discussed on requirements in the 
next section). It was also stated that the project team fell victim to its own optimism 
and failed to correctly estimate how much time and money it would take to 
successfully complete the project (GAO 2009c, 1,3,5-6).  
It does appear that NASA has made great strides in its efforts to close the 
knowledge gaps called out in multiple GAO reports (GAO 2008, 8, 2006a, 13, 2006b, 
2).  In a report from 2006, GAO recommended implementing several different 
“Knowledge Points”, where Knowledge Point 1 represented the point where the team 
could show that the requirements could be met with the available resources.  It also 
stated that it believed that NASA did not have a system in place which adequately 
analyzed whether or not the current level of technology was adequate to meet the 
requirements of the project (GAO 2006a, 3–4, 10, 13–15). NASA seems to have 
taken this advice to heart and has updated its best practices as described in the 
previous paragraphs.  These updates include multiple reviews and plans that ensure 
that the right people are looking at the project to ensure technology and other 
resources are in place prior to making a major commitment to the project.  In previous 
versions of NPR 7120.5 (the versions active when the above reports were written), 
there were reviews required, but they were not nearly as extensive as those in the current 
version.  The NPR also did not have as many phases and “back down” points as the 
current version (NASA 2015b).  
While NASA’s phases and definitions of project planning may vary from those 
of PMBOK, the overall end-goal is the same.  Both groups seek to clearly define how 
a project will further the overall goals of the company/agency and both processes are 
designed to help manage project constraints and ensure that stakeholders have a good 
understanding of what is being asked of them.  By following these best practices, the 
project team is given its best opportunity to successfully complete a project (GAO 
2006a, 11). 
2.1.3 Changes, Uncertainty, and the “Experts” 
 The previous section described the best practice of developing a viable 
business case.  It also described the issues that were caused when a project failed to 
develop this business case.  There are many challenges NASA has faced in trying to 
develop an overall viable business case, some having to do with failure to follow 
best-practices, some well outside the control of the project manager. 
One of the major struggles faced by many projects at NASA was the fact that 
requirements often were not well defined prior to the start of the development phase 
of the project (GAO 1993b, 4, 1993a, 11). It is nearly impossible to fully develop a 
complete requirements list early in the project and high level requirements rarely 
provide enough detail to develop a truly legitimate schedule (GAO 2014, 25).  One 
report stated that a failure to adequately define requirements for both technical and 
management aspects of the program was the most significant cause of both cost and 
schedule growth (GAO 1993a, 11).  When a system is not fully defined, the design 
team may need to spend a significant amount of both time and resources working re-
designs (GAO 1991a, 4).  NASA is aware of the struggles and consequences of a 
failure to develop detailed requirements and has even stated that it is expected that 
research and development projects are going to experience changes  (GAO 1977b, ii).  
In some cases, this is simply a matter of requirements changing due to a better 
understanding of the system and how it will work, as opposed to an outright failure to 
adequately define requirements (GAO 1977c, 14).  NASA’s process even allows for 
this, as discussed in the previous section, where requirements are refined even after 
the official requirements review.  Trying to develop a schedule in the midst of this 
uncertainty presents a challenge to project teams.  Without fully knowing what 
changes will occur, it can be difficult to anticipate how long something will take 
(GAO 2014m, 3).  When project requirements are not well defined, the project team 
must make assumptions about the intent of the requirements as they are written.  
When the team begins to design while requirements or other aspects of the project 
plan are still in flux, there is a very good chance changes will be required after the 
team has already invested significant time and effort into a plan. In some cases, the 
project will make it all the way to the Implementation Phase before design problems 
are discovered which can cause massive amounts of rework (GAO 2001, 4, 2001, 7).  
Another issue in developing requirements is ensuring that all stakeholders are able to 
review and discuss requirements before they are finalized.  If key stakeholders are 
excluded from the reviews, costly re-work could become necessary when the project 
is more developed and less adaptable (GAO 1998, 16,19, 1992a, 3–4, 1982, 1,3).  
In the case of the Ares I (rocket) and Orion crew transport (payload), although 
requirements actually were baselined at the project level, some uncertainty remained 
regarding the more specific requirements at the system level.  These efforts were both 
separate projects, but were tied to one another and being developed together.  When 
the team had uncertainty regarding specific technical requirements about the systems, 
it made it difficult to guess at the correct design that would be optimal for both 
projects, which led to re-baselining at least one of the projects (GAO 2008, 8).  In 
another example from a major NASA project, the James Webb Space Telescope ran into 
trouble because the launch vehicle was not selected until the telescope was already 
being designed.  Once the vehicle was selected, it was discovered that the telescope 
would not fit.  It can be inferred from the report that working this issue resulted in a 
one year delay of the mission (GAO 2014p, 7–8).  Time and again, it appears that this 
inadequate definition of requirements led to either a schedule delay or a cost increase.  
In some cases requirements were simply not well defined, while in other cases, the 
requirements themselves actually changed (GAO 1991a, 14, 20).  Either way, it 
presented a challenge to the design team to ensure that the actual product produced 
met the overall objective of the project (GAO 1992b, 2, 2003, 9, 2002a, 2, 2014, 21–
22, 1998, 16).  
 In other cases, the problem was not so much with the requirements, but with 
the design itself.  Beyond the struggle of contending with undefined requirements and 
designs, some projects had to work around requirements/designs that changed mid-
stream (GAO 1991a, 4).  Changes were sometimes caused by a better understanding 
of how the technology would realize the end goal of the projects, but other times the 
requirements were changed by direction from a higher power (for example a review 
board or even Congress) due to budgetary and schedule concerns (GAO 1993a, 11, 
1991a, 4).  NASA has stated that one of its accepted best practices is to ensure that at 
least 90% of the engineering drawings for a system are mature enough at the CDR 
that they could, in theory, be released to the production team with minimal changes 
required (GAO 2014, 7).  Several GAO reports mentioned challenges with NASA 
project personnel failing to stabilize the design of the system, which led to challenges 
with both cost and schedule.  Most reports mentioned a generic difficulty in 
stabilizing designs, but in one report, the GAO stated that NASA had failed to follow 
this best practice and that many projects had reached CDR without first stabilizing the 
design.  Another GAO report (written nearly two decades later) stated that the 
majority of the projects that had conducted a CDR during the year assessed failed to 
stabilize the system design prior to that review (GAO 1991a, 4, 2010, 5, 2003, 12, 
1993a, 17, 2009b, 13). 
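The 90% drawing-release guideline described above amounts to a simple ratio test.  The following Python sketch is a minimal illustration under stated assumptions: the function name design_stable_at_cdr and its parameters are hypothetical, and "mature enough" is assumed to have already been judged for each drawing, so the function only counts them.

```python
def design_stable_at_cdr(releasable_drawings: int,
                         total_drawings: int,
                         required_pct: int = 90) -> bool:
    """Check whether at least required_pct percent of drawings are releasable.

    Integer arithmetic is used so that a project sitting exactly on the
    threshold is not misclassified by floating-point rounding.
    """
    if total_drawings <= 0:
        raise ValueError("total_drawings must be positive")
    if not 0 <= releasable_drawings <= total_drawings:
        raise ValueError("releasable_drawings must be between 0 and total_drawings")
    return releasable_drawings * 100 >= required_pct * total_drawings


if __name__ == "__main__":
    # Invented counts, for illustration only.
    print(design_stable_at_cdr(450, 500))  # exactly 90% of drawings releasable
    print(design_stable_at_cdr(400, 500))  # 80% of drawings releasable
```

A project reaching CDR with a False result here would correspond to the unstabilized designs the GAO reports describe.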
Part of the problem with achieving a stable design was the complexity of 
many of the systems (GAO 1991b, 2–3).  The teams would begin development based 
on what they thought they understood about the requirements, but as the design 
progressed, it became apparent that actually meeting the requirements would be a 
much more complicated endeavor than originally anticipated (GAO 1991c, 6, 1993b, 
4, 1989, 21).  Given that NASA is often pushing the boundary of what is defined as 
scientifically possible, project managers have stated that they struggle to discover 
how to achieve technical success, let alone project management success (Martin 
2012, 17).  This can affect schedule in a variety of different ways including re-design 
of implementation plans, delays in receiving parts, and problems selecting the correct 
contractor to implement the design (GAO 1991b, 23).  One GAO report stated that in 
a study of 29 programs, “technical complexities” was one of the six major categories 
of reasons for cost and schedule changes (GAO 1993a, 11).  One report completed by 
the IG nicely summed up the relationship between technical complexity and schedule 
delays.  It stated that, based on past evidence, the more technically complex a given 
project is, the more likely it is that schedule-busting problems will plague the 
program (GAO 2013, 18). 
This complexity contributed to another major struggle encountered by many 
projects which involved battling issues that arose during the design and testing of the 
system (GAO 2010, 5).  These technical challenges seem to occur over and over 
again, which is not surprising given the nature of the work performed by NASA.  In 
several GAO reports, technical challenges are listed as the cause of cost increases and 
schedule slippages (GAO 2004, 10, 1991b, 15, 2006f, 18, 1991c, 4, 1993a, 17).  
Several GAO reports simply refer to “technical problems” as an overarching term, but 
some reports specify things such as: failures during testing or testing 
restrictions/limitations (GAO 2008, 13, 2006c, 9, 1991c, 5, 1977c, preface), 
reductions in available tests that might have detected possible issues earlier (GAO 
1977d, 6), problems with the actual technology itself (GAO 2006f, 18), and 
integration challenges (GAO 2013, 23, 2009a, 17).  
One major recurring theme that was specifically called out 
throughout these reports was the failure of the planned technology to meet an 
appropriate level of maturity.  In one report from the early 1990s, four of thirteen 
projects were cited for a failure to adequately mature the required technology prior to 
fabrication, implying that time and money were being spent to build something that 
had never been proven to work as expected.  Any problems encountered would 
require re-work and a probable increase in schedule (GAO 1991a, 4, 2009a, 16).  In 
the Ares/Orion example cited earlier, a report from 2008 predicted problems for the 
project because a design review for the entire rocket was conducted prior to the first 
stage “demonstrat[ing] maturity” (GAO 2008, 12).  The James Webb Space 
Telescope was also listed as being in danger of a schedule slip, with one of the 
primary causes listed as a failure to adequately mature technologies (GAO 2006c, 9).  
An IG report which covered several of these problems issued a recommendation as to 
when a project should be allowed to proceed.  In this recommendation it listed 
“mature technologies” as a resource which was critical to success (Martin 2012, 20).  
Another recurring theme was the difficulty in managing contractors hired to 
complete much of the technical work for NASA.  While not a technical challenge per 
se, it appears that much of the difficulty in managing the contractors arose from a 
failure by the contractor to fully appreciate the difficulty of the work involved in the 
project (GAO 2006c, 9).  In some cases, contractors brought in to complete the work 
underestimated the difficulty involved or did not have the skills or expertise to deal 
with the technical challenges that arose (GAO 2009b, preface, 14, 2009a, 19, 2012, 
12).  Further compounding the issue is that NASA has struggled in the past with 
providing the proper management and oversight of the contractors completing the 
work (GAO 1993a, 16, 1991b, 2–3).  When the contractors run into technical 
problems, the overall project can suffer with delays in schedule and cost as more time 
is required to resolve these issues (GAO 2006f, 11, 1991b, 27).  Sometimes lack of 
knowledge on both the contractor and NASA sides can result in issues.  If NASA 
does not provide good direction and the contractors do not have the required 
knowledge, the likelihood of technical challenges which will cause schedule 
problems will increase (GAO 1991c, 8, 1993a, 16).  Some oversight challenges are 
the result of a lack of personnel resources (i.e., personnel were busy trying to 
complete other commitments and could not dedicate the time required to provide 
adequate oversight to the contractor) (GAO 1989, 4, 1980b, 13).  It should also be 
noted that when bidding a job, a contractor is going to be “in it to win it”.  One report 
even suggests that the bids are deliberately understated in an effort to win the overall 
contract (Martin 2012, 20).  This will involve seeking ways to offer the lowest 
possible bid, which may ultimately result in problems once the contract is awarded 
because of overconfidence in capability or the assumption that past success in a 
different field will translate into present success in the space field (GAO 1991b, 19, 
23). 
Some challenges were not due to new technology but were caused by trying to 
retrofit heritage technology to make it useful to current projects (GAO 2010, 5, 
1991a, 4).  The theory is that heritage technology is already developed and tested.  It 
is a “known quantity” that can help reduce uncertainty about technical performance as 
well as cost and schedule.  Unfortunately, though, heritage technology is just that:  
heritage.  Like trying to install new software on an older computer, sometimes there 
are compatibility issues that must be overcome.  In this case teams must integrate 
new technology with the old technology, which is bound to present some challenges.  
NASA must weigh the challenges of developing completely new technology against 
the challenges of developing integration solutions for a new/heritage mix.  Take, for 
example, the SLS program.  From the outside, the vehicle is very reminiscent of the 
Saturn V rocket used to launch the Apollo astronauts toward the moon.  Despite the 
similarities, nearly fifty years separate the current vehicle from the first Apollo launch 
and many things have changed since then, including design standards.  The design 
team must figure out how to integrate what NASA has already accomplished with 
what it still wants to accomplish (GAO 2009b, 11, 2014, 16–17). 
In one GAO report, it was stated that problems with heritage technology were 
encountered in over half of the projects under review.  In this case, the team 
underestimated the difficulty of using this technology, even though it had flown on 
previous missions.  The result of that underestimation was a schedule slip of nine 
months (GAO 2009b, 14; Martin 2012, 23).  As stated in the previous section, one of 
the resource challenges faced by NASA is the inability to obtain required parts, 
especially in situations where the use of heritage technology is required.  Companies 
that develop parts for spaceflight do not have the advantage of mass production to 
increase profitability.  If a certain technology is no longer needed, it can be difficult 
for these companies to maintain enough profit margin to stay in business.  If NASA 
then decides to go back and use an older technology, there is a chance that the 
original source no longer exists and that the knowledge base that developed that 
original source disappeared with the company (Martin 2012, 22). 
In some cases, this trade-off did not work.  For example, the Ares and Orion 
projects mentioned earlier originally tried to use heritage technology.  Ultimately, 
however, changes to the designs resulted in the team distancing itself from heritage 
technology because newer development was deemed to be more cost effective.  In 
another case it was discovered during testing that heritage material that was originally 
deemed acceptable for use did not fit the bill, forcing the team to look for other 
options.  This ultimately resulted in a schedule delay of nine months (Martin 2012, 
23).  Another challenge to using heritage technology was that the project team was 
having trouble re-creating it.  As discussed in the previous section, this may 
have been due to the retirement of knowledgeable personnel or the lack of facilities 
still capable of manufacturing the required parts (GAO 2008, 6).  In theory, it makes 
sense to try and leverage past knowledge and previous designs to meet current goals, 
but in practice, it tends to be more of a challenge than anticipated (Martin 2012, 22). 
In some cases, design stability is further threatened by changes mandated by levels 
above the project (Martin 2012, 27).  In the reports reviewed, the primary driver for 
these changes seemed to derive from one of two sources: either the project was 
seriously over cost/schedule estimates and the project was directed to re-design the 
system to rein it back in (GAO 1991a, 22, 1994a, 1–2) or there was a directive to 
remain within a predetermined budget profile which dictated that the system had to be 
re-designed to fit within the profile (GAO 1991a, 4).  In the first case, projects bring 
the re-design on themselves.  Significant technical problems call into question the 
feasibility of the program, causing Congress to question whether or not NASA has bitten 
off more than it can chew (Martin 2012, 27).  Projects also fall victim to the failure 
to define requirements.  The project goes to Congress too early and too optimistically 
and once the project figures out what is really required, the increase in cost and 
schedule is no longer palatable to those who control the purse strings (GAO 1993a, 
11; Martin 2012, 12).  In an IG report, some interviewees even hinted that NASA’s 
estimates to Congress were low-balled just to get the project out of the gate, meaning 
it was not just the contractors who were guilty of underestimating.  The theory, as 
discussed before, was that if the project could just get started, it could probably get 
funding to continue as needed.  If the cost was too high, it would not have a chance 
to start in the first place (GAO 2004, 11, 17; Martin 2012, 13, 20, 32).    
In the second case, it was often Congress or even the President directing 
NASA to make changes to the design.  The project’s design would be reported to 
Congress who would then determine whether or not the proposed cost fit within the 
pre-determined funding profile.  If it did not, NASA was directed to re-design the 
project to meet the funding limits (GAO 1991a, 4).  In other cases, the prices quoted 
to Congress amounted to what was effectively sticker shock and NASA was sent back 
to the drawing board to try again (GAO 1991a, 17).  Design changes of this nature, 
while helping to ensure fiscal responsibility with the limited resources available, do 
have a tradeoff.  Redesigns lead to schedule increases, so there must be a careful 
balance struck between cost savings gained from a new design versus the cost 
increases derived from an increase in the project schedule (GAO 1991a, 25). 
2.1.4 Concluding remarks 
 As can be seen throughout this section, scheduling challenges are nothing new 
for NASA and its partners.  While there are multiple causes for these schedule delays, 
there also seem to be common themes weaving throughout the past four decades.  In 
order to have the best chance for finishing a project on time, one must first understand 
what it is one is trying to do and how it fits into the overall grand scheme.  
Requirements must be understood and clearly documented and funding and resources 
must be available at the appropriate time.  Once the team understands what is 
required, they can begin to design and build the system.  Herein is the difficult part.  
Even if all requirements are fully understood and all resources are firmly in place, 
problems will still occur as the team works through the design and fabrication phase.  
How, then, should a project team schedule these activities to allow for these problems, 
but still keep within a reasonable constraint of how long a project should take?  The 
next section will discuss current recommended practices for creating a project 
schedule and some of the challenges with implementing these practices. 
  
2.2 Scheduling Basics  
 This section describes the recommended best practices for developing a 
project schedule, as well as some of the challenges with the currently proposed 
methods.  It also describes some alternative methods to the best practices designed to 
help alleviate some of the noted challenges.   
2.2.1 Developing the Schedule 
 Once the project is approved and requirements are defined, one of the first 
steps of building a schedule is to take each element of the lowest level of the WBS 
and break it down into its component activities (Mantel Jr. et al. 2004, 73; PMI 2013, 
para. 6.2).  When developing this activities list, it is recommended that the subject 
matter experts and team members get involved early in the process.  Personnel who 
are familiar with the deliverable described by the WBS package will most likely be 
the most knowledgeable about what activities will be required to produce said 
deliverable (Mantel Jr. et al. 2004, 75; PMI 2013, para. 6.2.2).  Given that each level 
of the WBS further specifies the previous level, and that the activity list is the lowest 
required specificity, if the project team successfully identifies each required activity, 
then completion of those activities will roll up into its WBS package which will in 
turn roll up into the next WBS level, ultimately resulting in the successful delivery of 
the project’s ultimate deliverable (Mantel Jr. et al. 2004, 73; PMI 2013, para. 5.4, 
5.4.2.2).   
Once project activities have been successfully identified, they must be placed 
in the proper order.  PMBOK refers to this as “sequencing” the activities (PMI 2013, 
para. 6.3).  Activities are arranged in a logical order and are connected to one another 
in such a way that the team can tell which activities have predecessors (activities 
which must be completed before the current activity can take place) and successors 
(activities that must follow the current activity).  Not all activities will be tied to one 
another, but every activity except the first and the last should have at least one 
predecessor and one successor (PMI 2013, para. 6.3).  Sequencing activities naturally 
lends itself to producing some type 
of chart which can easily demonstrate the predecessor/successor relationships of each 
of the activities.  The current method for sequencing activities is referred to as 
Activity on Node (AON).  AON networks depict activities as “nodes” and 
dependencies as arrows connecting the nodes (Mantel Jr. et al. 2004, 136).   
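As a minimal sketch of the AON idea, a network can be stored as a mapping from each activity (node) to its set of predecessors (incoming arrows).  The four activities below are hypothetical and are not taken from any of the cited sources; Python's standard `graphlib` module is used only to show that such a mapping is enough to recover a valid working order.

```python
from graphlib import TopologicalSorter

# Hypothetical activities (nodes) mapped to their predecessors (incoming arrows).
predecessors = {
    "A": set(),        # start activity: no predecessor
    "B": {"A"},
    "C": {"A"},
    "D": {"B", "C"},   # D cannot begin until both B and C are complete
}

# static_order() yields each activity only after all of its predecessors.
order = list(TopologicalSorter(predecessors).static_order())
print(order)
```

Activities B and C are not tied to one another, so either may appear first in the returned order; only the predecessor/successor relationships are guaranteed.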
After sequencing the network, resources are assigned to each activity which 
then allows a project manager to begin working with the team to estimate how long 
each activity will take.  According to both PMBOK and the original developers of the 
PERT system, these duration estimates should come from the people most familiar 
with the work to be completed (the experts) (Malcolm et al. 1959, 650; PMI 2013, 
para. 6.5.2).  These estimates are typically informed by recorded durations of a 
particular activity or project (“analogous estimating”), or, when that data has not been 
recorded, they can be based on the previous experience of the project team member 
(PMI 2013, para. 6.5.2.1-6.5.2.3).  Duration estimates can be either deterministic or 
stochastic, depending on what input a project manager is able to glean from the 
project team (Mantel Jr. et al. 2004, 147; PMI 2013, para. 6.5.2.4).  The latter will be 
discussed in greater detail in the next section. 
Now that the schedule has been sequenced, resources have been assigned, and 
activity durations have been estimated, the project manager can determine the duration of the 
entire project.  A popular procedure to achieve this is referred to as the Critical Path 
Method (CPM) and involves following the “path” of the activities based on their 
sequencing from the beginning of the project to the end (Mantel Jr. et al. 2004, 138–
41).  The completion time of each activity is basically the completion time of the 
predecessor activity plus the current activity’s duration.  If an activity has two or 
more predecessors, the largest predecessor completion time is carried forward as the 
start time of the current activity.  This procedure, called the “forward pass”, is 
completed for all activities and across all possible paths of the network schedule.  The 
result provides the earliest possible point at which the project could finish and also 
provides the Early Start Time (EST) and Early Finish Time (EFT) of each activity.  
Once completed, the same procedure is applied, but in reverse.  Starting at the end of 
the project with the previously calculated project duration from the forward pass, 
each possible path is followed back to the start of the project, where the start time of 
each successor activity becomes the completion time of the current activity.  For 
activities with two or more successors, the smallest successor start time 
becomes the completion time of the current activity.  This result provides the Late Start 
Time (LST) and Late Finish Time (LFT) of each activity and allows for the 
calculation of “total float” for each path through the network as well as the 
calculation of “free float” which shows how long an individual activity can be 
delayed before it affects the EST of its successor.  This calculation of float allows a 
project manager to determine the “critical path” of activities.  This critical path is the 
longest possible path (also the shortest possible completion time) through the network 
and has the smallest amount of total float (typically no float or negative float).  If any 
activity on this path is delayed, it will delay the overall completion date of the project 
(Mantel Jr. et al. 2004, 134–43; Malcolm et al. 1959, 654–57; PMI 2013, para. 
6.6.2.2). 
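The forward and backward passes described above can be sketched in a few lines of Python.  The network, activity names, and durations below are hypothetical, only finish-to-start relationships are modeled, and total float is computed per activity (LST minus EST) rather than per path; this is an illustration under those assumptions, not a substitute for scheduling software.

```python
# Hypothetical network: activity -> (duration, list of predecessors).
# Activities are listed so that every predecessor appears before its successors.
activities = {
    "A": (3, []),
    "B": (4, ["A"]),
    "C": (2, ["A"]),
    "D": (5, ["B", "C"]),
}

def critical_path_method(acts):
    est, eft = {}, {}
    # Forward pass: an activity starts at the largest EFT among its predecessors.
    for name, (dur, preds) in acts.items():
        est[name] = max((eft[p] for p in preds), default=0)
        eft[name] = est[name] + dur
    project_duration = max(eft.values())

    # Invert the predecessor lists to obtain each activity's successors.
    successors = {name: [] for name in acts}
    for name, (_, preds) in acts.items():
        for p in preds:
            successors[p].append(name)

    lst, lft = {}, {}
    # Backward pass: an activity must finish by the smallest LST among its successors.
    for name in reversed(list(acts)):
        lft[name] = min((lst[s] for s in successors[name]), default=project_duration)
        lst[name] = lft[name] - acts[name][0]

    total_float = {name: lst[name] - est[name] for name in acts}
    critical = [name for name in acts if total_float[name] == 0]
    return project_duration, total_float, critical

duration, total_float, critical = critical_path_method(activities)
print(duration, total_float, critical)
```

For this small network the project duration is 12 time units, the critical path is A, B, D, and activity C carries 2 units of float, so C could slip by up to 2 units without delaying the project.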
The preceding paragraphs provided a basic description of simple schedule 
development.  In practice, project schedules will incorporate things such as lead/lag 
time (e.g. time for ordering materials early or required delays between the completion 
of one activity and the start of the next) and can have a variety of different 
predecessor/successor relationships such as finish-to-start, start-to-start, start-to-
finish, and finish-to-finish.  The basic method of calculating a schedule remains the 
same, but these nuances can complicate the development.  For larger, more complex 
schedules, software is available that will allow the user to enter activities, durations, 
predecessor/successor relationships, lead/lag times, etc. and will calculate the critical 
path and project duration, as well as display the schedule in a Gantt chart for quick 
assessments of project progress.  While all of these tools are extremely helpful, 
ultimately the accuracy of the schedule is going to be dependent on the accuracy of 
the duration estimates received from the “experts” and in an uncertain world, 
deterministic estimates probably will not fit the bill (Mantel Jr. et al. 2004, 141, 162, 
167; Regnier 2005b, 8; PMI 2013, para. 6.5.2.4, 6.7.3.2). 
2.2.2 Dealing with uncertainty: Stochastic estimates  
 In the previous section, CPM was discussed as a way to organize the schedule 
and determine the estimated completion time of the project.  It showed how much 
contingency time was available on each network path and within each activity.  The 
Program Evaluation and Review Technique (PERT), created in the late 1950s by the 
Navy to assist with the development of the Polaris system, used a similar method for 
organizing its project activities (Regnier 2005a, 1).  The creators of PERT went 
beyond the organizational tactics of scheduling, however, and introduced a method to 
try and account for the uncertainty in those deterministic methods.  Their method 
used three estimates for each activity: most likely, best case (if everything went 
right), and worst case (if everything went wrong) (Malcolm et al. 1959, 650–51).  
These values were then combined in a weighted average using Equation 2-1 which 
provided the expected value of the duration of the activity (Mantel Jr. et al. 2004, 
144; Malcolm et al. 1959, 651; PMI 2013, para. 6.5.2.4).  This expected value could 
then be used within the network schedule to follow the procedure described above for 
determining project durations and float time (Mantel Jr. et al. 2004, 146).  

\[ T_e = \frac{BC + 4\,ML + WC}{6} \]                                                              Eqn 2-1 

where T_e is the expected duration, BC is the optimistic (“best case”) duration, 
ML is the “most-likely” duration, and WC is the pessimistic (“worst case”) duration. 
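As a worked illustration of Equation 2-1, assume hypothetical estimates (not drawn from any cited source) of BC = 4 days, ML = 6 days, and WC = 14 days:

```latex
T_e = \frac{4 + 4(6) + 14}{6} = \frac{42}{6} = 7 \text{ days}
```

The expected duration (7 days) exceeds the most likely value (6 days) because the worst case lies farther from the most likely value than the best case does.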
 The PERT formula can also be used to determine the standard deviation of the 
estimate distribution by using Equation 2-2.  The variance can be found by squaring 
the value of σ found using Equation 2-2 (Mantel Jr. et al. 2004, 145; Malcolm et al. 
1959, 652). 
\[ \sigma = \frac{WC - BC}{6} \]                                                                 Eqn 2-2 

where σ is the standard deviation, BC is the optimistic duration, and WC is the 
pessimistic duration. 
  
The “6” in Equation 2-2 indicates the belief that the values between the two 
outside estimates (optimistic and pessimistic) cover approximately 99% of all 
possible durations of an activity and that the duration of the activity will be outside of 
this range less than 1% of the time.  For those who are less confident in their 
estimates, the divisor of Equation 2-2 can be altered to represent the different 
confidence levels (with 95% and 90% being other popular choices).  When converted 
back to a variance by squaring the result of Equation 2-2, these individual variances 
can be useful in determining the overall variance of either the critical path or other 
paths of interest in the overall project network, assuming each activity can be treated 
as statistically independent.  The variance can also help the project manager 
determine the level of uncertainty that went into the original estimates based on the 
size of said variance (Mantel Jr. et al. 2004, 145–46, 151).  To determine this range, 
the creators of PERT looked to the Normal distribution as a guide.  A Normal 
distribution is defined from (-∞, ∞), but truncating the distribution at 
±2.66 standard deviations encompasses 99.2% of the probability density and also results in 
the standard deviation equaling 1/6th of the range.  Given the assumption that there 
was negligible density below the BC estimate or above the WC estimate, the creators 
of PERT decided that a good approximation for the variance of their beta distribution 
was to borrow from the Normal distribution and assume that the relationship between 
the standard deviation and the range was also 1/6th (Clark 1962, 406; Regnier 
2005b, 6; NIST 2017a).  
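The use of Equation 2-2 to build up a path variance can be sketched as follows.  The three activities and their estimates are hypothetical, the `divisor` argument is an illustrative name for the “6” discussed above, and the summation is only valid under the stated assumption that the activities are statistically independent.

```python
import math

def pert_sigma(best_case, worst_case, divisor=6.0):
    """Standard deviation per Equation 2-2; `divisor` is 6 for the ~99% range."""
    return (worst_case - best_case) / divisor

# Hypothetical (best case, worst case) estimates, in days, for three activities on one path.
path = [(4, 14), (2, 8), (5, 11)]

# Variances (not standard deviations) add when activities are independent.
path_variance = sum(pert_sigma(bc, wc) ** 2 for bc, wc in path)
path_sigma = math.sqrt(path_variance)
print(round(path_variance, 3), round(path_sigma, 3))
```

Here the path variance is about 4.778 days squared and the path standard deviation about 2.186 days.  An estimator who is less confident in the extremes would pass a smaller `divisor`, which enlarges every activity variance and therefore the path variance.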
  
2.2.3 Problems with PERT 
 The developers of the PERT method of schedule duration estimation were 
themselves working under a very tight deadline.  They were tasked to provide a 
process to analyze a complex schedule and they were only provided one month to 
accomplish this task (Malcolm et al. 1959, 647).  Given the short timeline, the team 
developed a basic methodology, but as the system became more widely used, some of 
the finer points of the methodology came into question.   
 One of the first questions involved the beta distribution itself.  The creators of 
PERT did not have a particular distribution in mind, but in developing their risk 
concept, they felt that a unimodal distribution with low probabilities at the tails would 
adequately model the behavior of an activity duration (Malcolm et al. 1959, 650–51; 
Clark 1962, 406).  These assumptions easily lent themselves to settling on a beta 
distribution as the chosen model (Malcolm et al. 1959, 651–52).  Since that time, the 
beta distribution has become the generally accepted model used to account for 
uncertainty in duration estimates (Keefer and Verdini 1993, 1087; Pickard 2004, 
1570–71; Bennett, Lu, and AbouRizk 2001, 513; D. Johnson 2002a, 457–58; David 
Johnson 1997, 387).  Having said that, the fact remains that the beta distribution is an 
assumption and the true distribution of the activity durations is not known (D. 
Johnson 1998, 254–55; Grubbs 1962, 914–15; Bennett, Lu, and AbouRizk 2001, 513; 
D. Johnson 2002a, 463–64, 1998, 253; Pickard 2004, 1567). 
 A second concern involved the estimates obtained from the experts and their 
correspondence to true statistical values.  The creators of PERT asked personnel to 
provide their best case, worst case, and most likely estimates.  From 
there, a bounded distribution was created with the most likely value representing the 
peak of the curve, while the best case and worst case values represented the bounds of 
the curve.  Given these three numbers, the developers then calculated the mean and 
variance of the estimates.  From a practical standpoint, this gives an idea of how long 
the person performing the activity thinks it should take, as well as a proxy measure of 
their uncertainty in their estimate (Malcolm et al. 1959, 650–51).  From a statistical 
perspective, however, this method is problematic.  Typically a distribution curve is 
based on multiple observed data points and the mean and variance are derived from 
this data.  The PERT process creates a distribution using just three estimated numbers 
provided by personnel who may or may not have a background in statistics (Grubbs 
1962, 914; Golenko-Ginzburg 1988, 770; Keefer and Verdini 1993, 1087; Pickard 
2004, 1567).  Without knowing the true underlying distribution, these estimates may 
or may not encompass the full range of possible duration values for an activity  
(Grubbs 1962, 914–15; Regnier 2005b, 8; Pickard 2004, 1569).  A similar problem 
occurs with the estimate of the most likely (mode) value, which is simple enough 
conceptually, but not easily estimable in the true statistical sense (D. Johnson 2002b, 
457).  
Because the ultimate goal is to determine the completion date of an entire 
project, the creators of the PERT methodology required the calculation of the 
expected time and variance for each activity.  Assuming independence of each 
activity, at its most basic level this allowed the calculation of the total project 
duration by summing all of the activities along a given network path (Clark 1962, 
406; Malcolm et al. 1959, 651–52).  This assumption of independence allows a 
decision maker to apply the Central Limit Theorem when summing the means and 
variances of each activity along a given path which ultimately provides the mean and 
variance of the total project duration (Keefer and Verdini 1993, 1086; Steyn 2001, 
365).  Pickard stated that this is a prime example of an inverse statistics problem 
where the decision maker wishes to know a certain parameter of a statistical 
distribution, but must derive that information given different parameters of the 
distribution (Pickard 2004, 1567–68).  In the PERT case, estimation of the beta mean 
and variance is not intuitive, making it easier to derive these parameters from the 
three estimates that are more easily understood (D. Johnson 2002b, 457).  This case is 
further complicated by the fact that the true distribution is unknown (Pickard 2004, 
1567).   In this case, the desired statistical parameters are the mean and variance and 
the available estimated parameters are the mode and extremes of the distribution as 
provided by technical personnel working the activity (Malcolm et al. 1959, 648, 659; 
Clark 1962, 406; Pickard 2004, 1569).  Because the underlying distribution is 
unknown and given the challenges with estimating the mode (most likely) and 
extremes (best case/worst case), it is entirely possible that the values used to derive 
the mean and variance do not accurately describe the true distribution of the variable  
(Pickard 2004, 1573; Steyn 2001, 368).  This in turn can lead to inaccurate estimates 
of the true mean and variance of each activity which ultimately results in an 
inaccurate project duration.  Pickard has suggested that some of these challenges may 
be overcome by supplementing the standard three estimates with information about 
the supplier’s previous experience with similar projects (i.e. the number of times the 
estimator had worked on a similar project).  Converting this into a likelihood, Pickard 
was able to develop a method, using several assumptions, to fully characterize the 
beta distribution in a more statistically sound manner (Pickard 2004).   
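To illustrate how the summed path mean and variance discussed at the start of this paragraph are typically used, the sketch below applies a Normal approximation to a hypothetical critical path.  The path totals and the deadline are assumed values chosen for illustration, and the result is only as good as the independence and Central Limit Theorem assumptions behind it.

```python
from statistics import NormalDist

# Hypothetical critical-path totals: summed expected durations and summed variances (days).
path_mean = 21.0
path_variance = 4.0

# Under the independence and Central Limit Theorem assumptions, treat the
# path duration as approximately Normal with these two moments.
path_duration = NormalDist(mu=path_mean, sigma=path_variance ** 0.5)

deadline = 23.0
p_on_time = path_duration.cdf(deadline)
print(round(p_on_time, 3))
```

With these inputs the deadline sits one standard deviation above the mean, giving a probability of about 0.841 of finishing on time.  If the activity means and variances are themselves inaccurate, as the concerns above suggest they may be, this probability inherits that inaccuracy.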
Further compounding the issues just discussed is the concern regarding the 
accuracy of Equations 2-1 and 2-2.  To calculate the true mean and variance of a beta 
distribution, one must know the defining parameters of the curve, α and β, which 
describe the distribution.  Because this curve is developed based on estimates and not 
on observation of actual events, the defining parameters of the curve are not known  
(D. Johnson 2002b, 457).  The mean and variance must therefore be derived based on 
available information, namely some combination of the mode/median and extreme 
estimates of duration (Pickard 2004, 1568).  The creators of PERT made some 
assumptions regarding the values of α and β and developed Equations 2-1 and 2-2 
based on those assumptions (Malcolm et al. 1959, 651–52; Mantel Jr. et al. 2004, 
144; Grubbs 1962, 914; Golenko-Ginzburg 1988, 768; D. Johnson 2002b, 457). 
These equations provided a good approximation for specific values of α and β, but 
when later compared to a wide range of beta distributions with varying set values for 
α and β, it was discovered that Equations 2-1 and 2-2 did not provide good estimates 
for the true means and variances of the distributions as calculated using Equations 2-3 
and 2-4 (Keefer and Verdini 1993, 1089; Regnier 2005b, 7; Grubbs 1962, 914). 
\[ \mu = \frac{\alpha}{\alpha + \beta} \]                                                                          Eqn 2-3 
\[ \sigma^2 = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)} \]                                                         Eqn 2-4 
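The gap between the PERT approximations and the true beta moments can be seen in a small sketch.  The shape parameters α = 2 and β = 5 are assumed purely for illustration, and the best and worst cases are taken to be the exact bounds (0 and 1) of the standard beta distribution, which is more favorable to PERT than real expert estimates would be.

```python
def beta_moments(alpha, beta):
    """True mean and variance of a beta distribution on [0, 1] (Equations 2-3 and 2-4)."""
    mean = alpha / (alpha + beta)
    var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return mean, var

def pert_moments(best_case, most_likely, worst_case):
    """PERT approximations (Equation 2-1, and Equation 2-2 squared for the variance)."""
    mean = (best_case + 4 * most_likely + worst_case) / 6
    var = ((worst_case - best_case) / 6) ** 2
    return mean, var

alpha, beta = 2, 5                         # assumed shape parameters
mode = (alpha - 1) / (alpha + beta - 2)    # mode of a beta distribution when alpha, beta > 1
true_mean, true_var = beta_moments(alpha, beta)
approx_mean, approx_var = pert_moments(0.0, mode, 1.0)
print(true_mean, approx_mean, true_var, approx_var)
```

For this particular curve the PERT mean is about 0.300 against a true mean of about 0.286, and the PERT variance is about 0.0278 against a true variance of about 0.0255.  Other choices of α and β produce smaller or much larger discrepancies.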
 
 Keefer and Verdini consolidated several recommended modifications to the 
original PERT formula and compared their various estimating capabilities by 
comparing the results of the approximating equations for the mean and variance to the 
true mean and variance as derived by Equations 2-3 and 2-4.  These values were 
calculated using the inverse cumulative distribution function (CDF) of a normalized 
beta distribution defined on the interval 0 < x < 1.  It was shown that the most 
accurate approximation for the mean was a method developed by Pearson-Tukey, 
which resulted in a maximum error of 0.07% for the mean and -1.6% for the variance 
(as opposed to the original PERT formula which resulted in a maximum error of 
451% for the mean and 5506% for the variance).  A close second was the Swanson-
Megill approximation which resulted in errors of 0.33% and 11.1% for the mean and 
variance respectively.  These two approximations are provided below in Equations 2-
5 and 2-6 (Keefer and Verdini 1993; Pearson and Tukey 1965; Megill 1971).   
\[ \mu = 0.63\,x(0.50) + 0.185\,[x(0.05) + x(0.95)] \]                 Eqn 2-5 

\[ \mu = 0.40\,x(0.50) + 0.300\,[x(0.10) + x(0.90)] \]             Eqn 2-6 
where x(a) represents the value of the inverse CDF calculated at the “a”th fractile 
(Keefer and Verdini 1993, 1087–88).  It should be noted that while the original PERT 
formula uses the mode of the distribution, Equations 2-5 and 2-6 rely instead on the 
median.  It should also be noted that while the PERT formula requires personnel to 
provide estimates encompassing nearly all possible durations of a given activity, 
Equations 2-5 and 2-6 rein in the estimate to a smaller range.  In this case, estimators 
are required to provide the 5% and 95% fractiles (for Equation 2-5) or the 10% and 
90% fractiles (for Equation 2-6), which the duration is expected to fall outside of only 
10% or 20% of the time, respectively (Keefer and Bodily 1983, 1090; 
Mantel Jr. et al. 2004, 144; Regnier 2005b, 8). 
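Equations 2-5 and 2-6 can be tried on a known beta distribution as a rough check.  The sketch below assumes integer shape parameters (2 and 5, chosen only for illustration) so that the beta CDF has a closed binomial-sum form, and it inverts that CDF by bisection; the function names are hypothetical and this is not the procedure used by Keefer and Verdini.

```python
from math import comb

def beta_cdf(x, a, b):
    """CDF of a beta distribution on [0, 1]; this closed form is valid for integer a, b only."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))

def beta_fractile(p, a, b, tol=1e-12):
    """Inverse CDF x(p), found by bisection because the CDF is increasing on [0, 1]."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def pearson_tukey_mean(x):
    return 0.63 * x(0.50) + 0.185 * (x(0.05) + x(0.95))    # Equation 2-5

def swanson_megill_mean(x):
    return 0.40 * x(0.50) + 0.300 * (x(0.10) + x(0.90))    # Equation 2-6

a, b = 2, 5                                # assumed integer shape parameters
x = lambda p: beta_fractile(p, a, b)
true_mean = a / (a + b)                    # Equation 2-3
pt_mean, sm_mean = pearson_tukey_mean(x), swanson_megill_mean(x)
print(true_mean, pt_mean, sm_mean)
```

Both three-point approximations land very close to the true mean for this curve, using only the median and two fractiles rather than the mode and the absolute extremes.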
  
 Given that the underlying distribution for an activity time is unknown, it is 
possible that the random variable representing duration is not, in fact, beta distributed 
(D. Johnson 2002b, 464).  Even if the underlying distribution is actually beta, it is 
highly unlikely that accurate parameters for the curve could be directly assessed (D. 
Johnson 2002b, 459).  Without these parameters, it is impossible to know the exact 
location of the mean within the distribution, although Equations 2-5 and 2-6 have 
been shown to be good approximations.  To help alleviate this issue, Johnson 
proposed the use of a triangular distribution whose descriptive parameters match the 
three values gathered during a PERT survey (i.e. two extremes and the 
mode/median).  Johnson was able to show that for a beta distribution with shape parameters p, q > 2, 
parameters for the triangular distribution could be derived such that the error between 
the values provided by the beta distribution and the triangular distribution were 
minimized.  He went on to show that by sacrificing some accuracy in matching the 
shape of the beta distribution, the descriptive parameters for the triangular distribution 
could be modified to better determine the mean and variance of the triangular 
distribution.  These estimates of the mean and variance closely matched the estimates 
developed by Pearson-Tukey when using the 5% fractile, further confirming that this
equation is the best estimator of the mean/variance for an unknown, yet positively
skewed beta function (D. Johnson 1998, 260).  Johnson went on to show that his
method of estimating the triangular mean and variance works well for a variety of
distributions, making knowledge of the underlying distribution less critical since these
equations could provide a good estimate of the mean and variance in the absence of 
absolute knowledge of the true distribution (D. Johnson 2002a, 464).  Ultimately, 
  
however, all of these distributions and estimates are still dependent on the raw data 
provided by the expert.   
 The final two challenges relating to the PERT estimation problem deal with 
the actual estimates provided.  The first challenge involves estimating the extremes.  
Research has shown that it is extremely difficult to estimate the truly extreme
“best case” and “worst case” durations as described by the originators of PERT.
These values are meant to encompass over 99% of all possible durations, but it can be
difficult to provide estimates which truly encompass this wide range. As that range is
narrowed, it begins to fall more readily into the realm of experience of those 
providing the estimates who are then able to provide a more accurate assessment of 
what will occur “90% of the time” or “95% of the time” (D. Johnson 1998, 253; 
Mantel Jr. et al. 2004, 145; Moder and Rodgers 1968, B-76; Selvidge 1980, 502; 
Davidson and Cooper 1980, 67; Murphy and Winkler 1977, 790, 792).  The second 
challenge involves estimating the mode.  The original creators of PERT asked for the 
mode, but they did not specify where the mode should fall in relation to the extremes 
(Malcolm et al. 1959, 651; Pickard 2004, 1568).  Research has shown that, when 
based on estimates provided by project personnel, the mode is often placed closer to 
the best case value than the worst case value (Moder and Rodgers 1968, B-82; 
Golenko-Ginzburg 1988, 767).  It was further proven that not only does the estimated 
mode typically reside closer to the best case estimate, but that the relation between 
the mode and the extremes was (using the notation of this research): ML = 
(2BC+WC)/3 (Golenko-Ginzburg 1988, 770).  Further complicating matters is the 
fact that the mode, while easily understood in theory, does not translate into an
  
easily assessable probability, especially for those with no probability background
(Golenko-Ginzburg 1988, 770). Without knowing the underlying distribution and 
being able to physically observe the peak of the curve, there is no way to easily place 
the mode in the appropriate fractile (D. Johnson 1998, 255, 2002b, 457).  The median, 
on the other hand, translates easily into the language of probability, falling at the 0.5 
fractile.  It also translates easily into a survey question wherein the participant is 
asked for an estimate such that 50% of the time the activity duration will be smaller 
than the estimate, and 50% of the time it will be greater than the estimate.  Another
benefit is that it allows for a readily available measure of the calibration of the
individual’s estimating capabilities (D. Johnson 1998, 255, 2002b, 457).  This makes
Equations 2-5 and 2-6 even more attractive because not only are they better
estimators of the true mean of a beta distribution across a wide range of α and β, but
they also make use of estimates which are more likely to be accurate (Keefer and
Verdini 1993, 1087–88; Grubbs 1962, 914–15).
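The mode relation reported by Golenko-Ginzburg can be made concrete with a short sketch.  The code below assumes the classical PERT mean, (BC + 4ML + WC)/6, which this section refers to but does not restate; the function names and the durations are hypothetical.

```python
def pert_mean(bc, ml, wc):
    """Classical PERT mean: (BC + 4*ML + WC) / 6."""
    return (bc + 4.0 * ml + wc) / 6.0


def typical_mode(bc, wc):
    """Mode location reported by Golenko-Ginzburg: ML = (2*BC + WC) / 3."""
    return (2.0 * bc + wc) / 3.0


# Hypothetical best case and worst case durations in days.
bc, wc = 10.0, 25.0
ml = typical_mode(bc, wc)  # sits one third of the way from BC to WC
print(ml, pert_mean(bc, ml, wc))
```

Under this relation the mode always lies closer to the best case than to the worst case, so the resulting PERT mean falls below the midpoint of the two extremes.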
 In summary, although the PERT equation remains recommended practice
for accounting for uncertainty (PMI 2013, para. 6.5.2.4), there are several problems
with the formula.  First, it cannot be proven that the beta distribution is, in fact, the 
underlying distribution of the activity. Assuming it is the correct underlying 
distribution, its defining shape parameters are not known.  Because its shape 
parameters are not known, equations have been developed to estimate the 
distribution’s mean and variance.  These equations, however, have been shown to be
poor estimators of the true mean and variance of a beta distribution except within a very
narrow margin.  Beyond this, the three estimates typically required in a PERT
  
equation are very difficult to estimate in a true statistical sense.  Despite all these 
challenges, PERT remains a popular method for compensating for the unknown when 
estimating schedule durations.  
2.2.4 Other Alternatives 
 Other solutions to manage schedule risk have also been developed, focusing 
less on the PERT equation itself and more on the path through the chains of activities.  
One of the major issues with the PERT methodology is its deterministic assumption 
of a single critical path (Keefer and Verdini 1993, 1086; Goldratt 1997, 157; Mantel 
Jr. et al. 2004, 155).  Any path could potentially become critical given a long enough 
delay in one or more of the activities (Goldratt 1997, 157–59; Gould 2005, 262; 
Mantel Jr. et al. 2004, 155–56; PMI 2013, para. 6.6.2.2, 6.7.2.1).  With the rise of 
computers and scheduling software, computer simulations provided a new way to 
account for path duration uncertainty.  Specifically, Monte Carlo simulations, as
applied in a scheduling context, use the basic concepts of PERT, but take into account
the fact that some project activities will match their optimistic estimate, some their
“most likely” estimate, and some will even match their pessimistic estimate.  An appropriate
range and distribution are chosen for each activity, and each activity is organized
within a network path.  The simulation is then run multiple times (at the user’s discretion),
sampling a value for each activity (within its previously stated parameters) along
the assigned network path, finally arriving at a distribution for the overall project
completion time based on the longest path through the network.  Because this
simulation accounts for multiple different scenarios involving activity completion 
times, it provides a better picture of what could happen versus simply choosing one 
  
value for each activity and calculating the project duration from that single data point 
(Mantel Jr. et al. 2004, 156–60). Simulation, especially in complex projects, requires 
the use of software which project managers may or may not have access to and 
requires an understanding of the background statistics to effectively use the software 
(Mantel Jr. et al. 2004, 161; Shih 2005, 744).  Simulations are also completely 
dependent on the input ranges for each of the activities (Mantel Jr. et al. 2004, 157).  
Different input ranges could result in drastically different schedules, making it crucial 
to gather estimates from knowledgeable personnel.  This, however, raises the question:
what defines a knowledgeable person?
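The simulation procedure described above can be sketched in a few lines of Python.  This is an illustration under stated assumptions, not the approach of any particular scheduling package: the four-activity network, the three-point estimates, and the choice of a triangular distribution for every activity are all hypothetical.

```python
import random

# Hypothetical four-activity network (all names and values are illustrative).
# Each entry is (best case, most likely, worst case) in days.
ACTIVITIES = {
    "A": (2.0, 4.0, 9.0),
    "B": (3.0, 5.0, 10.0),
    "C": (6.0, 9.0, 14.0),
    "D": (1.0, 2.0, 4.0),
}
# Two paths through the network; both finish with activity D.
PATHS = [("A", "B", "D"), ("C", "D")]


def simulate_project(n_runs=10_000, seed=1):
    """Return sorted project durations and how often each path was longest."""
    rng = random.Random(seed)
    durations = []
    critical_counts = {path: 0 for path in PATHS}
    for _ in range(n_runs):
        # random.triangular takes its arguments as (low, high, mode).
        draw = {name: rng.triangular(bc, wc, ml)
                for name, (bc, ml, wc) in ACTIVITIES.items()}
        lengths = {path: sum(draw[a] for a in path) for path in PATHS}
        longest = max(lengths, key=lengths.get)
        critical_counts[longest] += 1
        durations.append(lengths[longest])
    return sorted(durations), critical_counts


def percentile(sorted_values, p):
    """Simple empirical percentile for 0 <= p < 1 (no interpolation)."""
    return sorted_values[int(p * len(sorted_values))]


results, critical_counts = simulate_project()
print("median duration:", round(percentile(results, 0.5), 1))
print("90th percentile:", round(percentile(results, 0.9), 1))
print("times each path was critical:", critical_counts)
```

Using only the “most likely” values, both paths in this hypothetical network take 11 days, so a single-point calculation cannot say which one is critical.  The simulation instead reports how often each path turned out to be the longest, along with a distribution of completion times rather than one number.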
 Another risk mitigation technique, suggested by Goldratt, is referred to as the 
Critical Chain.  This method again uses the basic PERT methodology of developing a 
network of activities and eliciting estimates from project personnel.  Goldratt 
maintains that project personnel typically pad their estimates to mitigate the risk of 
finishing late and that their “most likely” estimates are much closer to their “true” 80-90%
completion times.  These estimates build extra time, referred to as buffers by
Goldratt, into each activity.  Goldratt maintains that the risk mitigation happens in
the wrong place.  According to the Critical Chain methodology, estimates should be 
much closer to the 50% mark, where there is only a 50% chance of completing the 
activity on time.  Risk mitigation is handled by a project buffer which consolidates 
each of the activity buffers as a dummy activity at the end of the project (Goldratt 
1997, 45, 65–67, 154–55; Steyn 2001, 365–66).  Personnel are no longer responsible 
for the due date of each activity, only for ensuring that the entire project is completed 
by the due date (Steyn 2001, 365). The basic concept is that each activity is a link in 
  
the project chain and the project buffer is used to strengthen each link as it becomes 
weak (i.e. gets behind schedule) (Goldratt 1997, 89–95). The method also adopts the 
concept of feeding buffers. These buffers protect the critical path from delays in 
network paths that merge into the critical path.  Another key aspect of the Critical 
Chain is that it factors resource constraints into the schedule.  This allows project 
managers to not only monitor the time of each activity, but also potential bottlenecks 
in resources which are required by more than one activity (Goldratt 1997, 89–95, 
157–58, 215–21; Steyn 2001, 366). 
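The buffer consolidation described above can be illustrated with a short sketch.  The activity names and durations are hypothetical, and pooling the removed padding by straight summation is an assumption made here for simplicity; the text states only that the project buffer consolidates the activity buffers, and other sizing rules exist.

```python
# Hypothetical chain: (padded "safe" estimate, 50% estimate) in days.
CHAIN = {
    "design": (10.0, 6.0),
    "build": (20.0, 13.0),
    "test": (8.0, 5.0),
}


def critical_chain_schedule(chain):
    """Strip per-activity padding and pool it into one end-of-project buffer."""
    aggressive_total = sum(median for _, median in chain.values())
    # Assumption: the project buffer is the simple sum of the removed padding.
    project_buffer = sum(safe - median for safe, median in chain.values())
    return aggressive_total, project_buffer


work_days, buffer_days = critical_chain_schedule(CHAIN)
print(work_days, buffer_days, work_days + buffer_days)
```

In this sketch the committed completion date is unchanged, but the contingency is held once at the end of the chain instead of inside each activity.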
 
 
2.3 Decision Analysis and Expert Opinion 
The previous two sections discussed why projects struggle at NASA and the 
current best practices for scheduling a project.  The scheduling best practices 
described several methods for organizing inputs and developing statistical methods to 
predict the expected value of an activity duration.  The GAO reports describing 
NASA’s struggles to complete projects on time indicate that the best practices have 
flaws.  Where, then, is the problem?  Is the issue the method or is it the inputs into the
method (Regnier 2005b, 8; Shanteau 1992, 12; Meehl 1954, 136–38)?  From the
discussions in the previous section, it is probably a little of both.  The previous sections
described obstacles to project success and challenges with the method of PERT 
scheduling.  This section will provide information on the challenges of obtaining 
good inputs for the PERT method. 
  
2.3.1 Recognized Biases and Their Effects 
Human beings must process vast amounts of information on a daily basis.  
Kahneman has proposed that in order to cope with this constant barrage of 
information, the mind has developed a two-tiered processing scheme.  System 1 is the 
gate-keeper.  It receives various stimuli and provides a quick judgment on the 
appropriate response to said stimuli based on its previous experiences (Kahneman 
2011, 11, 24, 56–57, 71).  System 2 is the decision maker who does not like to be
disturbed.  For most decisions, System 1 sends along an information package with a 
sticky-tab saying “sign here” and System 2 is happy to oblige.  When System 1 has 
trouble processing a new situation, it sends System 2 a package with a note saying, 
“Further information required.”  System 2 must then research, consider, deliberate, 
and ultimately make the final decision on how to act (Kahneman 2011, 24).  If
System 2 relies too readily on System 1, it may use past memories to inaccurately 
assess current situations.  These faulty associations result in biases which can 
severely affect judgment (Kahneman 2011, 3, 6, 127; Winkler 1981, 482; Hogarth 
1975, 284).  
 Working in concert, these two systems prefer to make sense of the world by 
using past experiences to fit a new experience into a recognizable picture
(Kahneman 2011, 24, 66; Zajonc 1968, 23).  Problems occur when System 2 blithely
accepts inputs from System 1 without critical review of the input data.  System 1 has 
a problem with overconfidence in its assessment abilities.  In its efforts to make sense 
of a stimulus, it will explain away inconsistencies in the narrative it is trying to 
construct.  It invests itself in the narrative and passes the information along to System 
  
2 as the truth (Kahneman 2011, 114; Tversky and Kahneman 1981, 457).  If System 2 
accepts the narrative without applying critical thinking, the narrative becomes truth, 
and is established in memory as a good point of reference from which to make future 
decisions  (Kahneman 2011, 45, 114; Tversky and Kahneman 1974, 1124).  Because 
our minds are programmed to create a narrative which describes the stimuli we 
encounter, and because System 1 gets a first crack at organizing all of the 
information, it will pull information from memory that confirms its initial assessment 
(Tversky and Kahneman 1981, 453).  Unless System 2 actively examines all of the 
data, including the memories which conflict with the current narrative, the story 
created by System 1 will get stronger and stronger, resulting in a bias from the truth 
(Kahneman 2011, 134, 247).   This type of bias is referred to as “confirmation bias” 
because the brain will seek out stories which fit the narrative and reject those that do 
not (Kahneman 2011, 45, 61, 80–81; Mumpower 1996, 196).  This is tied closely to
what researchers have dubbed the “priming effect”, where one stimulus triggers the
ability to remember other events that are deemed to be tied to the original stimulus.
These memories can also go on to trigger other memories in what is known as the
ideomotor effect (Kahneman 2011, 52–53).  The more often one sees something, the
more readily it will be brought to mind the next time (Kahneman 2011, 60; Benson, 
Curley, and Smith 1995, 1650; Whittlesea 1990, 716).  Because System 2 can 
redefine System 1’s definition of surprise, once System 2 has decided what is 
“normal”, System 1 will work to retrieve memories to confirm that new opinion and 
discount conflicting evidence (Kahneman 2011, 122–24; Hubbard 2010, 222–23). 
  
 In some cases, however, recollection of memory is less a bias and better 
defined as “learning”.  System 1 has learned to recognize that certain stimuli lead to 
certain results.  Kahneman states that, “Intuition is nothing more and nothing less
than recognition” (Kahneman 2011, 235–36).  In some cases, the experience gained by
an expert allows him to recognize stimuli and pull relevant information to predict a 
result.  Kahneman points out that in situations where feedback is frequent and 
immediate, an expert’s predictions can be better trusted because System 1 is 
recognizing the stimuli and responding accordingly.  Because the feedback is 
immediate, the cues are less likely to be confounded by spurious data and the effect is 
more closely tied to the actual cause (Kahneman 2011, 240, 242).  In fact, one 
proposed definition of an “expert” is that the person is both consistent with his 
prediction when provided similar stimuli and that he is also in agreement with other 
experts who have been provided the same stimuli  (Einhorn 1974, 564–65; Winkler 
1968, B-70). Others have pointed out, however, that one must be cautious when 
considering group agreement since a group can be in total agreement and still be 
wrong (Rowe and Wright 2001, 343).  Different authors agreed with the premise that 
the expert should be consistent in his predictions, but added the criterion that he should
also be able to distinguish among environmental cues that appeared similar.  They 
referred to these traits as discrimination and consistency and showed through 
experiments that those deemed to be experts were more capable of demonstrating 
discrimination among cues than lay-people.  They also pointed out, however, that the 
scale was only comparative among the experts within the group  (Weiss and Shanteau 
2014, 107, 108, 112; Dawes 1979, 573; Shanteau 1992, 16).  They based their 
  
definition of expertise on the behavior of the person in question as opposed to 
whether or not the person’s prediction was correct or if the title was conferred by 
others (Weiss and Shanteau 2014, 104; Rowe and Wright 2001, 341–42; Önkal et al. 
2003, 182; Shanteau 1992, 17).  They also maintained that expertise is non-transferable,
stating that a person who is an expert in one field may not necessarily be
an expert in another field, which would seem to contradict the practice of requesting 
“almanac” type questions to determine the calibration of an expert (Weiss and 
Shanteau 2014, 105–6; D. V. Lindley 1982, 122; Shanteau 1992, 13).  Another study 
pointed out that when asking experts for their assessments, the value to be assessed 
should be within the domain of the tasks which the expert would typically perform 
(Rowe and Wright 2001, 343).   Lindley has also pointed out that an estimator’s 
knowledge of an unknown variable and his skills as a probability assessor are two 
different things (D. V. Lindley 1982, 121).
 Another bias also caused by the priming effect is referred to as “anchoring”.
In studies of this bias, a number was provided to a person, who was then asked to estimate an
unrelated quantity.  Because that initial number had triggered the brain to reference
similar numbers, the estimates provided were frequently in the range of
the initial number that was provided.  Ariely goes on to point out that this initial
anchor can continue to affect future decisions (Kahneman 2011, 119–20; Tversky and 
Kahneman 1974, 1128–29; Ariely 2009b, 26, 45; Tversky 1974, 154). 
 Closely related to the confirmation bias and the priming effect is the
availability heuristic.  In this case, when trying to estimate how often an event
happens, the brain recalls information about one’s personal experiences with that
  
event.  If System 2 does not catch this error, then personal experience becomes
“truth” rather than being verified against the actual frequency of
occurrence (Kahneman 2011, 129–30, 132, 135, 159; Tversky and Kahneman 1974, 
1127–28; Tversky 1974, 152).   This availability heuristic becomes even more 
pronounced when combined with the affect heuristic. The affect heuristic occurs 
when emotions are brought to bear on the situation and beliefs are determined not by 
data, but by feelings (Kahneman 2011, 102, 237, 301).  Ariely further demonstrated 
the affect heuristic by proving that decisions made while a particular emotion was 
triggered created, in Kahneman’s parlance, a reference point for System 1.  When met 
with a similar situation in the future, participants in Ariely’s study followed the same 
decision patterns demonstrated during the initial scenario even though the strong 
emotions were no longer present (Ariely 2009a, Loc. 3572, 3590, 3625, 3643). 
 Kahneman also pointed out (and Meehl also suggested) that the brain is 
performing a type of substitution.  System 1 does not know the answer to the question 
it has been asked, so it replaces the original question with a different question that it 
is capable of answering (Meehl 1954, 111; Kahneman 2011, 97).  It then passes this 
answer on to System 2 as the correct answer, which, if not checked, will be accepted
and converted into a belief (Kahneman 2011, 100, 129–30, 138–39).  The primary issue
with substitution is that the new question System 1 substitutes may not be an accurate 
representation of the original question  (Kahneman 2011, 97, 138, 149). Additionally, 
heuristics will rarely withstand scrutiny under statistical analysis (Kahneman 2011, 
149, 151).  Kahneman warns that relying wholly on intuition will always make the 
predictor overconfident because System 1 is working to form a complete picture that 
  
will ignore any evidence to the contrary (Kahneman 2011, 194, 200, 249, 261; 
Benson, Curley, and Smith 1995, 1648; Hubbard 2010, 221–22). 
 
2.3.2 “Your Overconfidence Is Your Weakness” (Marquand 1983) 
Despite their drawbacks, Kahneman does point out that heuristics can be 
useful in guiding initial responses (Kahneman 2011, 151).  When these heuristics 
dominate the discussion, however, it can result in stereotyping which Kahneman 
describes as, “…statements about the group that are (at least tentatively) accepted as 
facts about every member.” (Kahneman 2011, 167)  Beyond the obvious social 
implications, that basic description implies that in everyday decision making, people 
are depending on their past experiences in one case to describe what will occur every 
time a similar situation occurs (Kahneman 2011, 173; Ariely 2009b, 211, 2009a, Loc 
3590, 3607).   
 In System 1’s efforts to create a plausible story to explain the world around it, 
it will find an explanation that is most readily available.  When things turn out well, 
the most ready explanation is that the expert was wise beyond his years.  When things
turn out poorly, it is due to incompetence.  Kahneman warns, however, that we as 
humans tend to give ourselves too much credit.  We ignore the elements that are 
outside our control and attribute everything to the skill sets (or lack thereof) of those 
involved in solving the problem (Kahneman 2011, 13, 199–204, 207).  Narratives 
such as these become engrained in the psyche, as does the belief that the situation was
under better control than it actually was (Kahneman 2011, 209–11).  When
performing a project post-mortem, or even when the next similar project comes 
  
around, System 1 searches its database to create a story that will quickly and easily 
explain the outcome.  The simplest story is that the outcome was driven mostly by the 
decisions of the decision makers or experts in the groups.  As the old saying goes, 
“Hindsight is 20/20” and because System 1 is searching for the simplest explanation, 
it begins to assign causes to the outcomes based on the benefit of hindsight 
(Kahneman 2011, 217; Mumpower 1996, 197).  As Silver puts it, “…we are seeing 
signals in the noise” (Silver 2012, 7).   
 In our efforts to explain the world around us, research suggests that we have 
placed too much faith in the abilities of the experts.  Multiple studies, several of
which are discussed in Meehl (1954, 83–128), have shown that expert
predictions do not offer significant advantages over outcomes predicted by simple
formulas (Kahneman 2011, 218–19, 222–23; Dawes 1979, 573; Ruland 1978, 441;
Silver 2012, 51–52; Roebber and Bosart 2014, 554; Dawes and Corrigan 1974, 96–
97; Meehl 1954, 119).  A formula can typically outperform a human because it is not 
subject to the biases that plague humans.  A formula is programmed to respond a 
certain way at a certain threshold, so it will process information in the same way 
every time (Kahneman 2011, 240).  Humans, however, have a tendency to over-think
the problem and try to accommodate the nuances of each different situation
(Kahneman 2011, 223).   
Despite the fact that the formulas typically outperform the experts, the 
formulas still require inputs from the experts.  One good measure of an expert is
whether or not more information produces better predictions (Silver 2012, 99–100; 
Heath and Gonzalez 1995, 308).  Unfortunately, studies showed that experts did not 
  
increase in accuracy with more information, but their confidence in their estimates 
increased (Heath and Gonzalez 1995, 310; Tsai, Klayman, and Hastie 2008, 98, 102; 
Hubbard 2010, 225–26).  Because System 1 continually insists that our intuitions are 
correct, they become entrenched to the point that to question the belief is to question 
competency (Kahneman 2011, 45; Silver 2012, 216). When confronted with 
dissenting opinions, one study found that participants actually increased their 
confidence in their estimates as opposed to decreasing it in the face of opposing 
information.  The authors proposed that this occurred because participants had to
prepare mental counterarguments for each conflicting viewpoint which convinced 
them even more of the rightness of their beliefs.  The study also showed that while 
these interactions increased the confidence of the participants, they did not increase 
the accuracy of their decisions.  This effect has been termed the “inference 
certification hypothesis”, where new information is treated as a confirmation of 
previously evaluated information as opposed to actual new information   (Heath and 
Gonzalez 1995, 305–6, 310–11, 317, 321; Tversky and Kahneman 1974, 1125–26; 
Ariely 2009b, 53–54).   
Another study went on to show that the more information an expert had 
available, the higher the confidence rating, even if the accuracy of the assessment did 
not improve.  They asserted that humans are simply not capable of accurately 
processing large amounts of information, although they believe they are.  Their 
research confirmed the opinion that an expert should be defined by his ability to 
successfully interpret relevant cues while discarding irrelevant information.  They 
went a step further to show that experts are capable of recognizing relevant cues, but 
  
they struggle with assimilating those cues to create a more accurate representation of 
the situation, similar to a computer reaching its processing capacity and being unable 
to accept new input (Tsai, Klayman, and Hastie 2008, 98, 100–102).
 Experts also had a tendency to explain away their mistakes with a
series of excuses.  It was never the expert’s fault, just the circumstances surrounding
the prediction (Kahneman 2011, 218–19; Tetlock 2005, 132).  Tetlock went on to
describe two different types of experts, based on the work of Berlin:  the hedgehogs 
and the foxes.  Hedgehogs are exceedingly confident in themselves and their ability.  
They make bold predictions and cannot fathom the thought that they could be wrong.  
The foxes are more careful.  They include a wider margin in their predictions and 
their predictions are not as drastic (Tetlock 2005, 72, 80; Berlin, Hardy, and Ignatieff 
2013, 15, 50–53).  A fox’s star may not shine as brightly because his predictions will 
be more tempered, but he also does not have as far to fall when his predictions are 
wrong (Kahneman 2011, 191; Tetlock 2005, 84–85).   
 Humans tend to think optimistically about things they are personally involved 
in (Ariely 2009b, 180; Hubbard 2009, 37).  Kahneman and Tversky applied this 
tendency to the scheduling context by identifying another bias for the optimists of the 
project world.  In what they termed the “planning fallacy”, this bias causes project 
planners to provide a “most likely” estimate that is more descriptive of the “best case” 
scenario than the “most likely” scenario (Kahneman 2011, 246, 249; Buehler, Griffin, 
and Ross 1994, 366).  Buehler et al. noted that in practice, the planning fallacy was 
not as apparent when estimators were providing durations for other people’s work.  It 
was hypothesized that these outsiders were not personally invested in the outcome of 
  
the estimate and therefore had less need to prove their capability to complete a task 
quickly.  It was also pointed out that outside-estimators could blame the people 
performing the work for failing to try hard enough while the people doing the work 
will blame circumstances beyond their control (Buehler, Griffin, and Ross 1994, 368, 
378).  Estimators exhibit overconfidence by not accounting for potential setbacks
(Kahneman 2011, 86, 258–60; Brenner et al. 1996, 212; Önkal et al. 2003, 177).  Their
baseline plan may be a very detailed list of things which must happen, but the planning
will most likely end with the list itself, without a probability tree of potential
failures.  Estimators short themselves on time estimates, and when the issues and setbacks
invariably occur, there is no contingency to mitigate those delays (Kahneman 2011, 
251–52, 255).  In some cases, Kahneman maintains that this fallacy is driven by the 
same desire seen in the GAO reports:  project teams know that optimistic schedules 
are more likely to get approved, and once the project is approved, it is much harder to
shut the project down despite any issues (Kahneman 2011, 250).  This behavior is
reflective of another fallacy referred to as the “sunk cost” fallacy where project team 
members focus more on what has already been invested than the potential benefits of 
finishing the project (Kahneman 2011, 344–45).  The defense against the planning 
fallacy is to pay attention to past projects of a similar nature.  Actual durations 
provide a much better baseline than biased personal assessments.  Actual durations 
will include all the challenges, setbacks, and excluded activities that have been 
encountered in the past.  The fact that many, even experts, do not pay attention to 
these historical data points is known as base-rate neglect and will be discussed further 
in Section 2.4 (Kahneman 2011, 251–52; Tversky 1975, 164). It has been suggested 
  
that one reason for the neglect of base rates is that estimators struggle to match the 
current situation with past experience.  There is always just enough variance in the 
present case to make it “special” and therefore incapable of being compared to the 
past  (Buehler, Griffin, and Ross 1994, 367). 
 There is also some research which shows that overconfidence is a problem no
matter the level of expertise and that experts could be even more guilty of it
than less experienced assessors (Mosleh, Bier, and Apostolakis 1988, 66; Baecher
1999, 5; Murphy and Winkler 1977, 42; Önkal et al. 2003, 176–77, 181).  In a study 
closely applicable to the present research, Mosleh et al. showed that, when asked to 
estimate the median duration of a task, subjects provided estimates that were smaller than
experience would dictate by a factor of three.  A study by the same group went on to 
show that when asked for duration estimates, subjects demonstrated a moderate 
degree of overconfidence in their estimations (Mosleh, Bier, and Apostolakis 1988, 
70, 79).  Another study showed that, once again, when asked how long an activity 
would take, those polled would fall victim to the planning fallacy and under-estimate 
the time required.  This study had the added advantage of polling for continuous time 
estimations of familiar events as opposed to point-estimate almanac-type questions 
(Buehler, Griffin, and Ross 1994, 367).  In another study conducted by Buehler et al.,
an optimistic bias in providing estimates was again discovered.  Most respondents to
this study failed to complete their tasks by their “most likely” estimates and fewer
than half finished before their “worst case” estimates.  In variations of this study,
Buehler et al. discovered that these optimistic biases did not seem to be solely driven 
by a desire to appear able to complete the task quickly.  Even in cases where the 
  
participants were unaware of the intent of the experiment, the bias still manifested 
itself.  One suggested reason for this bias is that people are typically in a success-
oriented mindset when planning.  Obstacles to success are much more difficult to 
imagine than the simple and clear path.  They maintained that estimators must be 
made to associate past experiences with present prediction problems, or they will not 
make the connection (Buehler, Griffin, and Ross 1994, 369, 371–72, 376).  Hubbard 
went on to point out that experts can be guilty of the confirmation bias when it comes 
to assessing their confidence levels.  Without maintaining accurate records of the 
results of their assessments, experts will rely on their memory to determine whether 
or not their decisions have been good ones, and their memories confirm that their 
decisions have usually been good (Hubbard 2010, 225).  Hubbard’s assertion falls in 
line with other research on human behavior since, as Ariely points out, people tend to 
look favorably on their own performance and since, as Kahneman points out, there is 
a tendency to only remember things that confirm a person’s already-held opinion.   
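Hubbard’s point about record keeping suggests a simple check that could be run against a log of past estimates.  The sketch below is illustrative only: the log entries are hypothetical, and the check assumes each estimator supplied a 5% and a 95% bound, so that a well-calibrated estimator would capture the actual duration about 90% of the time.

```python
def interval_hit_rate(records):
    """Fraction of actual outcomes that fell inside the stated interval."""
    hits = sum(1 for low, high, actual in records if low <= actual <= high)
    return hits / len(records)


# Hypothetical log: (stated 5% bound, stated 95% bound, actual duration) in days.
LOG = [
    (4.0, 8.0, 7.0),
    (10.0, 15.0, 19.0),
    (2.0, 3.0, 3.5),
    (6.0, 12.0, 9.0),
    (5.0, 9.0, 14.0),
]

rate = interval_hit_rate(LOG)
print(f"stated coverage 90%, observed coverage {rate:.0%}")
```

An observed coverage well below the stated coverage is the signature of overconfidence, and every miss in this hypothetical log is an overrun, which is the direction the planning fallacy would predict.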
 
2.3.3 “Your Faith in Your Friends Is Yours” (Marquand 1983) 
Because System 1 wants all data to confirm its initial impression, it is no 
surprise that people seek out others who will confirm their beliefs. When enough 
people band together these beliefs get reinforced and an us-vs.-them mentality is 
established (Kahneman 2011, 217; Ariely 2009b, 216; Silver 2012, 3; Heath and 
Gonzalez 1995, 323).  When one group has had a bad experience with another group, 
the affect heuristic comes into play, and the divide becomes even stronger.  As 
members of the same group back each other up with more evidence of the 
duplicity of the other group, the distrust enters a feedback loop which is 
very difficult to escape (Ariely 2009b, 258, 265, 268).  In some of his experiments, 
Ariely showed that the distrust had become so ingrained that participants in the study 
were incapable of recognizing true statements from a distrusted source (Ariely 2009b, 
261). 
 When members of each group consider themselves experts, it stands to reason 
that members of the other group are considered “lay-people” who do not have the 
knowledge necessary to make an accurate decision.  Unfortunately for the experts, 
however, research has shown that experts are not immune to the biases previously 
described (Kahneman 2011, 140). Beyond the biases already discussed is another bias 
unofficially termed the “Not Invented Here” bias (Ariely 2009a, Loc. 1443).  In this 
case, ideas and concepts developed outside the group are considered less valid than 
the solutions proposed by the group (Ariely 2009a, Loc. 1467, 1571, 1608; Ruland 
1978, 441).  
 If there are multiple groups who consider themselves experts, what then, are 
the possible sources of difference driving the experts?  Hammond attributes these 
disagreements to one of three causes:  incompetence, venality, and ideology.  
Incompetence can be described by a baseline lack of knowledge or a judgment based 
on misinformation.  Venality describes decisions based not on data, but on acting in 
one’s best interest.  Ideology can be described by the confirmation bias discussed 
by Kahneman and Tversky, which combines with base-rate neglect to cause experts to 
ignore their statistical accuracy rate and base their confidence on cases where they 
remember being correct (Mumpower 1996, 193–94; Buehler, Griffin, and Ross 1994, 
368; Hammond 1996, 272, 290; Tetlock 2005, 40; Tversky and Kahneman 1974, 
1124; Kahneman 2011, 80–81; Einhorn and Hogarth 1978, 399–401, 413–14). 
Mumpower, however, maintains that in some cases, differences are due less to 
incompetence or malice and more because people simply perceive information 
differently from one another.  Differing assessments could be the result of different 
ways of organizing the available information, priming, differences in assessments of 
the relevance of the cues, and differences in personal biases.  The priming effect 
describes the case where information received first drives the perception of future 
information (Kahneman 2011, 52). Mumpower goes on to say that a further source of 
expert disagreement is because experts are not assessing the situation that “is”, but are 
assessing what they believe the result of the situation ought to be.  When these beliefs 
differ, the resulting assessments will differ  (Mumpower 1996, 195–96, 201).  Bram 
goes a step further and points out that in the case of management, personnel are 
making the best decisions they can based on the information provided, but in some 
cases, that information may be skewed to a positive light.  No one wants to give the 
boss bad news, so it is feasible that reports provided to management may not 
accurately represent problems or challenges for fear of being perceived as ineffective 
(Bram 2011, 95, 112). 
2.3.4 Options for Overcoming Bias 
Given all of the biases, differing opinions, and general pitfalls of assessing the 
probability of an uncertain event, it seems unlikely that any one person will have all 
of the necessary tools to produce a valid judgment (D. V. Lindley, Tversky, and 
Brown 1979, 147).  Differences in the order in which the data was received can affect 
which memories are brought to mind first and cascade information can relegate 
important information to background clutter.  Mumpower states that poor feedback or 
poor/missing information can affect an expert’s ability to provide a valid assessment.  
Differing personalities, analytical styles, and social norms can also affect how 
different experts process data (Mumpower 1996, 196, 203, 205–6; Surowiecki 2005, 
37).  Each individual stakeholder, though, may possess new information or a 
different perspective about the problem.  These perceptions can also be influenced by 
previously held beliefs.  As Ariely points out, a previous belief can strongly influence 
the perception of a present stimulus (Ariely 2009b, 204, 206).  The literature of expert 
aggregation maintains that a group assessment will provide a more accurate 
assessment than any one person’s estimate under specific circumstances (Kane 1995, 
63; Surowiecki 2005, 4–10; D. V. Lindley, Tversky, and Brown 1979, 149; Budescu 
and Rantilla 2000, 373–74; Sniezek and Henry 1990, 77–78).  There is some research 
to suggest, however, that expert collaboration does have its limits and will eventually 
start decreasing the accuracy of the decisions being made (Hubbard 2010, 225).  One 
suggested aggregation option is to aggregate the opinions of teams as opposed to the 
opinions of individuals.  This allows different groups to interact with one another to 
incorporate different knowledge areas.  Each team’s input is treated as if it were 
developed by a single entity and is then aggregated as it would be if the estimate 
had truly come from a single person.  This method increases the chances of cancelling 
out biases while still allowing team members to feed off each other’s knowledge base.   
It has been pointed out, however, that informal, non-structured elicitation sessions may not 
result in the mathematically rigorous results that are desired in such assessments, 
typically because experts are relying on heuristics (Mosleh, Bier, and Apostolakis 
1988, 72, 78–79; Surowiecki 2005, 10).  Obtaining the opinions of multiple 
stakeholders with different viewpoints can help ensure that all perspectives are 
considered without locking the group into one particular point of view (Hubbard 
2009, 47–48; Surowiecki 2005, 29; Winkler 1968, B-71). 
 If humans are incapable of fully processing all available data, then having 
several experts examine the problem would seem to alleviate that issue.  Experts 
should be able to pick out relevant cues, but people pick out cues differently based on 
how they are hardwired and in what order they receive the information.  Bringing 
experts together results in a better overall judgment (Tsai, Klayman, and Hastie 
2008, 100–103; Surowiecki 2005, 29; Sniezek and Henry 1990, 77–78). 
Before aggregating expert judgment, one must first gather the judgments of 
several experts.  There is a wide field of study on the elicitation of expert judgment 
with many different methods proposed.  Several of these methods propose a 
regimented process where an analyst works with the subject matter expert to help him 
refine his assessments and point out inconsistencies.  Benson et al. also point out that 
even when a decision analyst works with an expert, it can be very difficult to 
overcome the biases exhibited by the expert (Benson, Curley, and Smith 1995, 1640, 
1642).   There are some elicitation methods, however, that provide better results than 
others, including asking the expert to actively assess cases that contradict his point of 
view or to break down the problem into smaller, more easily assessable 
estimates (Mosleh, Bier, and Apostolakis 1988, 64, 67). As in the single expert case, 
there is evidence that aggregation formulas will outperform consensus 
gathering methods where the possibility of one stakeholder dominating the 
assessment is much greater (Mosleh, Bier, and Apostolakis 1988, 67–68).  Mosleh et 
al. maintain that proper elicitation methods in a group setting can protect against 
stakeholders agreeing with each other simply for the sake of agreeing (Mosleh, Bier, 
and Apostolakis 1988, 81). 
2.3.5 Loss and Risk Aversion 
While not a bias per se, humans do tend to exhibit aversions to certain things 
which could negatively impact their lives.  Two aversions which can have an effect 
on scheduling estimations are loss aversion and risk aversion.  Loss aversion refers to 
the desire to avoid the perception that one has taken a step backward from the status 
quo, no matter what that status quo may be (Ariely 2009b, 172, 177).  Risk aversion 
is a tendency to invest extra resources to avoid falling victim to certain risks.  In 
utility theory, this behavior is represented by a utility curve that is concave at all 
points (Kahneman and Tversky 1979, 264; Raiffa 1968, 68).  Risk-seeking behavior, 
on the other hand is represented by a convex region of the utility curve (Raiffa 1968, 
94–97).  Tversky discovered that for many people, behavior in the tails of the 
probability spectrum takes on what he termed a “certainty effect.”  This effect 
describes the human tendency to take disproportionately large measures to ensure that 
a small risk probability will drop to zero and, conversely, take these same large 
measures to make a highly probable opportunity a certainty (Tversky 1975, 167; 
Kahneman and Tversky 1979, 265).   
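The relationship between a concave utility curve and risk aversion can be illustrated numerically.  The following is a minimal sketch only: the square-root utility function and the two-outcome gamble are assumptions chosen for illustration and do not come from the cited sources.  Under these assumptions, the certainty equivalent of the gamble falls below its expected value, and the difference is the premium a risk-averse decision maker would give up to avoid the uncertainty.

```python
import math


def utility(x):
    """Hypothetical concave utility (square root); an assumption for illustration."""
    return math.sqrt(x)


def inverse_utility(u):
    """Inverse of the square-root utility above."""
    return u ** 2


def expected_value(outcomes, probs):
    """Probability-weighted average of the outcomes."""
    return sum(p * x for x, p in zip(outcomes, probs))


def certainty_equivalent(outcomes, probs):
    """Sure amount whose utility equals the expected utility of the gamble."""
    expected_utility = sum(p * utility(x) for x, p in zip(outcomes, probs))
    return inverse_utility(expected_utility)


# Hypothetical gamble: equal chances of receiving 0 or 100 units.
outcomes = [0.0, 100.0]
probs = [0.5, 0.5]

ev = expected_value(outcomes, probs)
ce = certainty_equivalent(outcomes, probs)
risk_premium = ev - ce  # positive for a concave (risk-averse) utility
```

With these assumed values, the expected value is 50 and the certainty equivalent is 25, so the risk premium is 25.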
 Ariely points out that this concept of loss aversion applies not only to things, 
but also to beliefs and opinions (Ariely 2009b, 60–61, 177).  Earlier in this 
discussion, it was pointed out that the harder one works for an idea, the more 
ownership one takes.  The more one feels one owns an idea, the more personally one 
will take any criticism of the idea due to the perception of losing face (Ariely 2009a, 
Loc 1328, 1518).  In a scheduling context, merging the concepts of loss aversion and 
risk aversion could be thought of as actions taken which help the participant save 
face.  For example, consider a manager who is responsible for ensuring a project finishes on time 
and on budget.  In order to save face, the manager may take actions to ensure the 
project is completed within the allotted time and budget.  According to the certainty 
effect, this indicates that actions taken will be for the purpose of reducing a risk with 
negligible probability to zero probability or an opportunity with high probability to a 
certainty.  Kahneman and Tversky pointed out that the standard utility curve does not 
account for this certainty effect and that the effect must be considered from a 
reference point.  The function above this reference point will be concave, indicating 
risk aversion when gains are at stake, while the 
function below this point will be convex, indicating risk-seeking behavior 
when losses relative to the status quo are at stake.  They also 
pointed out that this curve tends to be, “steeper for losses than for gains”, indicating 
once again that humans will work harder to prevent negative changes to the status quo 
than they will to produce positive changes  (Kahneman and Tversky 1979, 274, 277–
79; Tversky and Shafir 1992, 307; Ariely 2009b, 174, 2009a, Loc 3770; Kahneman 
2011, 348).  Ultimately, this results in an S-shaped utility curve where the status-quo 
is the point of reference about which the curve revolves (Tversky and Wakker 1995, 
1255).  Once a decision maker’s utility curve is established, it can be used by a 
different person as a point of reference for how a decision maker would respond in 
different circumstances (Raiffa 1968, 69–70).  Dawes and Corrigan maintain that 
linear models can provide a similar capability and provide a general guideline for 
how a manager can be expected to respond in certain situations.  These rules help 
mitigate the effects of primacy and other mental cues, since an average behavior of 
the manager will be comprised of multiple situations that were managed under 
different sets of information (Dawes and Corrigan 1974, 100–102).   
 
2.4 Experts as Data in a Bayesian Model 
From the previous sections, while experts do provide a wealth of knowledge, 
they are subject to both personal and environmental biases.  Past experiences, 
political beliefs, and even the order in which information was received can affect the 
estimating process.  The project manager is then faced with two choices:  pick the 
estimate of one stakeholder and base the schedule on that stakeholder’s opinion or 
aggregate the inputs of all available stakeholders into one final number that can be 
used in a network schedule.  If the project manager chooses the second option, an 
aggregation method must be selected. 
From the literature, there appear to be two popular methods for expert 
aggregation.  The first is a weighting scheme where opinions of different experts are 
weighted according to how much faith the decision maker places in the expert (West 
and Crosse 1992, 286; Winkler 1986, 300).  Winkler provides several suggestions for 
weighting methodologies which can be applied to the distributions provided to the 
Decision Maker (DM) by the experts.  These methods include both subjective 
assessments regarding the abilities of the expert and assessments based on 
performance data (Winkler 1968, B63–64).   
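A weighting scheme of this kind can be sketched as a weighted average of the experts’ distributions, often called a linear opinion pool.  The example below is an illustration under stated assumptions rather than a reproduction of Winkler’s procedures: each expert is assumed to summarize an activity duration with a mean and a variance, and the weights are hypothetical values the DM might assign.

```python
def linear_opinion_pool(means, variances, weights):
    """Mean and variance of a weighted mixture of expert distributions.

    The weights are normalized so that they sum to one.  All inputs are
    hypothetical; the function name is introduced only for illustration.
    """
    total = sum(weights)
    w = [x / total for x in weights]

    pooled_mean = sum(wi * m for wi, m in zip(w, means))
    # Second moment of a mixture is the weighted sum of component second moments.
    second_moment = sum(wi * (v + m ** 2) for wi, m, v in zip(w, means, variances))
    pooled_variance = second_moment - pooled_mean ** 2
    return pooled_mean, pooled_variance


# Two hypothetical experts estimating one activity duration (days).
means = [10.0, 14.0]
variances = [4.0, 4.0]
weights = [3.0, 1.0]  # the DM places three times as much faith in the first expert

pooled_mean, pooled_variance = linear_opinion_pool(means, variances, weights)
```

Note that the pooled variance (7 in this example) exceeds either expert’s own variance because it also reflects the disagreement between the two means.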
The second method involves the application of Bayes’ Theorem.  In this 
model, a DM makes an initial probability 
assessment about an unknown variable based on the information she has available to 
her.  This assessment is referred to as the “prior” and essentially describes the point at 
which a person is willing to take action based on their assessment (Silver 2012, 255; 
Chaloner and Duncan 1983, 174; Simon French 1980, 45; Mumpower 1996, 198–99; 
Baecher 1999, 3).  Experts are then treated as if they were data points by which the DM 
can update her beliefs (Morris 1977, 680; Clemen 1987, 373–74; S. French 
1986, 315; Morris 1974, 1235; Winkler 1986, 299; Morris 1983, 24; D. V. Lindley, 
Tversky, and Brown 1979, 146; Savage 1971, 797; Simon French 1980, 44; D. V. 
Lindley 1982, 117–18; Roberts 1965, 52–58; Simon French 1985, 188–91).  The 
concept is that there is a “truth” to each unknown variable and that a DM can 
approach the truth by gathering more and more data and updating her beliefs based on 
that data  (Silver 2012, 232, 241; Benson, Curley, and Smith 1995, 1644–45, 1647).  
Another advantage to the Bayesian method is that it allows for the messiness of the 
real world and the potential for human biases through the use of a prior distribution 
(Silver 2012, 252–55).  It also accounts for how difficult it can be to gather real-world 
data by allowing for a constant updating process when new information is obtained 
(Silver 2012, 409).   
As new data is received, the DM updates her beliefs resulting in a new 
probability assessment referred to as the “posterior.”  Formally, this can be described 
mathematically using Equation 2-7 (“Bayes’ Theorem” 2017; Gelman et al. 2013, 7; 
Dennis V. Lindley 1983, 1; West and Crosse 1992, 285–86; Winkler 1981, 479; S. 
French 1986, 315).  
\[ p(\theta \mid y) = \frac{p(\theta)\,p(y \mid \theta)}{p(y)} \tag{Eqn 2-7} \]
 
In this method, each expert provides an assessment on an unknown variable, 
θ, based on his previously held knowledge, preferably before talking to any of the 
other experts (Silver 2012, 245).  According to Morris and Gelman, for a continuous 
variable, the variance of the expert’s assessment can be used as a gauge for the 
certainty the expert has in his estimate where a smaller variance suggests higher 
certainty (Gelman et al. 2013, 32; Morris 1977, 688).   
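Equation 2-7 can be illustrated with a discrete example.  The sketch below assumes a hypothetical activity whose duration θ can take one of three values and a hypothetical likelihood p(y|θ) for an observed report y; all numbers are invented for illustration, and the function name is not drawn from any cited source.

```python
def bayes_update(prior, likelihood):
    """Apply Bayes' Theorem on a discrete set of values of theta.

    prior:      dict mapping theta -> p(theta)
    likelihood: dict mapping theta -> p(y | theta) for the observed y
    Returns the posterior dict and the normalizing constant p(y).
    """
    unnormalized = {t: prior[t] * likelihood[t] for t in prior}
    evidence = sum(unnormalized.values())  # p(y)
    posterior = {t: v / evidence for t, v in unnormalized.items()}
    return posterior, evidence


# Hypothetical prior over an activity duration (days).
prior = {8: 0.2, 10: 0.5, 12: 0.3}
# Hypothetical likelihood of the observed report under each duration.
likelihood = {8: 0.1, 10: 0.3, 12: 0.6}

posterior, evidence = bayes_update(prior, likelihood)
```

In this example the report favors the longest duration, so the posterior shifts probability from 10 days toward 12 days relative to the prior.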
Earlier in this chapter, it was mentioned that humans are prone to fall victim 
to a fallacy described as base-rate neglect (Kahneman 2011, 88; Brenner et al. 1996, 
217).  The base rate is a measure of the number of entities in a particular class in 
relation to the entire population.  In a Bayesian context, this corresponds to the prior term, p(θ), in 
Equation 2-7 (Kahneman 2011, 146; “Bayes’ Theorem” 2017).  According to 
Kahneman, there are two types of base rates:  statistical (which describes overall 
populations) and causal (which describes information about a specific case).  He 
maintains that people will typically ignore the statistical base rate when given 
information that is specific to the situation (Kahneman 2011, 166–67).  He also 
maintains that in the midst of uncertainty and the absence of any other data, the base 
rate should rule the day (Kahneman 2011, 152).  In a scheduling context, a study by 
Buehler et al. showed that observers (those outside the project) were much more 
likely to use the statistical base rate of the actor’s past performance than the actors 
(those involved directly in the project).  The actors in this study seemed to ignore 
their past performance in favor of anticipated future performance.  As part of this 
study, Buehler et al. created a condition where they attempted to get their subjects to 
associate past performance with future results.  Although participants still resisted the 
association, bringing up past struggles did seem to cause the participants to at least 
consider some of the challenges which they could face (Buehler, Griffin, and Ross 
1994, 367, 376, 379). 
Given the human tendency of ignoring base rates and basing probability 
assessments on memory, it becomes necessary to calibrate the human assessment 
machine to account for biases.  Just as with any measuring instrument, when 
compared to a “truth” source, if the measurement is off, the results must be calibrated 
(e.g. a thermometer that always reads ten degrees higher than the actual temperature).  
Calibration refers to the ability of a forecaster to make accurate forecasts (Weiss and 
Shanteau 2014, 105, 109; Morris 1983, 24).  In a forecasting context, when a forecast 
predicts that there is an “X” probability of an event happening, over an extended 
period of time given the same circumstances, that event should happen X% of the time 
(DeGroot and Fienberg 1983, 13; D. V. Lindley, Tversky, and Brown 1979, 147; 
Morris 1983, 24).  For a continuous variable, a similar concept applies, except that 
instead of a point estimate, the estimate provided should correspond with the 
appropriate point in the cumulative distribution function (CDF) (Morris 1983, 28–29).  
Based on this assessment method of calibration, it is possible for forecasters to “game 
the system” by providing estimates which, over time, allow them to appear well 
calibrated (DeGroot and Fienberg 1983, 14).  Weiss and Shanteau maintain that in 
order to be properly calibrated, a probability assessor must demonstrate both 
discrimination (recognizing changes to the situation) and consistency (providing the 
same assessment in the same situation) (Weiss and Shanteau 2014, 109).  Önkal et al. 
agree with the assessment of discrimination and add the requirement to apply the cues 
provided in the situation (Önkal et al. 2003, 179). 
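The frequency interpretation of calibration described above can be checked with a simple tabulation.  The following sketch groups a forecaster’s stated probabilities and compares each with the observed frequency of the event; the forecasts and outcomes are hypothetical, and the function name is introduced only for illustration.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes):
    """Compare stated probabilities with observed frequencies.

    forecasts: stated probabilities, one per forecast
    outcomes:  booleans, True if the forecast event occurred
    Returns a dict mapping stated probability -> (count, observed frequency).
    """
    groups = defaultdict(list)
    for stated, happened in zip(forecasts, outcomes):
        groups[stated].append(1 if happened else 0)
    return {
        stated: (len(hits), sum(hits) / len(hits))
        for stated, hits in sorted(groups.items())
    }


# Hypothetical record of ten forecasts and what actually happened.
forecasts = [0.8, 0.8, 0.8, 0.8, 0.8, 0.2, 0.2, 0.2, 0.2, 0.2]
outcomes = [True, True, True, True, False, False, False, True, False, False]

table = calibration_table(forecasts, outcomes)
```

In this invented record the events forecast at 0.8 occurred four times out of five and those forecast at 0.2 occurred once out of five, which is the pattern a well-calibrated forecaster would produce.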
From Section 2.3.2 and a study conducted by Alpert and Raiffa (Alpert and 
Raiffa 1982, 294–305), research has shown that humans tend to be overconfident in 
their probability assessments.  It has been suggested that this trait can be used as a 
marker for calibration of experts when requesting probability assessments (Mosleh, 
Bier, and Apostolakis 1988, 71).  It has also been shown that training can help 
mitigate some of this overconfidence and help people become 
better overall assessors of probability (Mosleh, Bier, and Apostolakis 1988, 80; 
Dennis V. Lindley 1983, 9; D. V. Lindley 1982, 125; Selvidge 1980, 502; 
Lichtenstein, Fischhoff, and Phillips 1977, 294, 316–17).   
Knowing that humans are subject to some form of bias, according to Morris’ 
method, the DM will first establish her prior, based on her past experiences and the 
data available to her at the time (D. V. Lindley, Tversky, and Brown 1979, 146).  
Each expert will also individually establish his prior based on this same information 
and provide that information to the DM (Morris 1977, 680).  If empirical calibration 
data is available, the DM can use that information to modify the expert’s prior to 
more accurately reflect his ability to perform a probability assessment.  If empirical 
data is not available, subjective calibration is an alternative option (S. French 1986, 
316; Morris 1983, 24, 1986, 327, 1977, 688–89).  Using subjective calibration, the 
DM encodes her beliefs about the expert’s ability as a probability assessor by 
determining the likelihood that the actual value will fall within the tails of the expert’s 
prior distribution (Morris 1974, 1235, 1977, 691).  For independent experts, the 
posterior distribution is then described by the prior of the DM multiplied by the 
calibrated priors of the experts.  This calibrated prior is effectively a likelihood 
assessment describing the likelihood that the expert will provide that particular 
probability assessment, given the revealed value of the unknown variable.  If the DM 
believes that the expert overstates his level of knowledge, she can modify the prior 
such that the variance is wider (Winkler 1981, 482).  If she believes he understates his 
knowledge, she can modify the prior such that the variance is smaller.  The degree of 
modification reflects her belief in the expert’s ability as a probability assessor.  The 
final result is a single posterior assessment that provides a decision maker with a 
recommended course of action based on Bayesian principles of probability (Morris 
1977, 679; Silver 2012, 327). 
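For the independent case, the multiplication of the DM’s prior by the calibrated expert priors has a simple closed form when every distribution is assumed to be normal, since a product of normal densities is proportional to a normal density whose precision is the sum of the individual precisions.  The sketch below rests on that normality assumption and on a hypothetical variance-inflation factor standing in for the DM’s calibration of each expert; it is not Morris’ calibration function, and all numbers are invented.

```python
def combine_normals(dm_prior, experts):
    """Posterior mean and variance from multiplying normal densities.

    dm_prior: (mean, variance) of the DM's prior
    experts:  list of (mean, variance, inflation), where inflation > 1
              widens the prior of an expert the DM believes is overconfident
              and inflation < 1 narrows the prior of an underconfident one.
    Assumes the experts are independent and all densities are normal.
    """
    prior_mean, prior_variance = dm_prior
    precision = 1.0 / prior_variance
    weighted_sum = prior_mean / prior_variance

    for mean, variance, inflation in experts:
        calibrated_variance = variance * inflation
        precision += 1.0 / calibrated_variance
        weighted_sum += mean / calibrated_variance

    posterior_variance = 1.0 / precision
    posterior_mean = weighted_sum * posterior_variance
    return posterior_mean, posterior_variance


# Hypothetical activity duration estimates (days).
dm_prior = (10.0, 4.0)
experts = [
    (12.0, 1.0, 4.0),  # judged overconfident: variance widened from 1 to 4
    (8.0, 2.0, 1.0),   # judged well calibrated: variance left unchanged
]

posterior_mean, posterior_variance = combine_normals(dm_prior, experts)
```

With these assumed inputs the posterior mean is 9.5 days with a variance of 1.0; without the inflation factor, the first expert’s narrow distribution would pull the result much closer to 12 days.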
 One of the major concerns with any expert aggregation problem is the 
dependence of the experts.  Statistical independence is difficult to achieve in the 
expert aggregation case due to a variety of factors including similar expert 
backgrounds and review of similar data (Clemen 1987, 373; Winkler 1968, B-65; D. 
V. Lindley 1982, 120; Winkler 1981, 480; Clemen 1986, 313; Winkler 1986, 302; 
Morris 1974, 1236, 1983, 24; Clemen 1987, 374-375; 378-379; Harrison 1977, 320).  
The concern derives from the fact that the posterior distribution will change based on 
the level of dependence of the experts (Clemen 1987, 374; Winkler 1981, 487).  
Morris also points out that the simple multiplicative rule described in his 1977 paper 
(Morris 1977) is only applicable under certain conditions, one of which is 
independence.  In cases of dependence, a more complex method is required (Morris 
1986, 322).  He goes on to say, however, that in a case where the expert is treated as 
“…a measurement instrument measuring data from a physical experiment”, the 
multiplicative rule is appropriate (although still requiring modification when 
modeling dependence) (Morris 1986, 325; D. V. Lindley, Tversky, and Brown 1979, 
149).   
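The effect of dependence on the combined estimate can be illustrated under a deliberately simple assumption: n experts whose estimation errors share a common variance σ² and a common pairwise correlation ρ.  This is an illustrative simplification, not the more complex method Morris refers to.  Under it, the variance of the experts’ average is σ²(1 + (n − 1)ρ)/n, so correlated experts carry less information than their number suggests.

```python
def variance_of_average(sigma2, n, rho):
    """Variance of the mean of n estimates with common variance sigma2
    and common pairwise correlation rho (an illustrative assumption)."""
    return sigma2 * (1.0 + (n - 1) * rho) / n


def equivalent_independent_experts(n, rho):
    """Number of independent experts that would give the same variance."""
    return n / (1.0 + (n - 1) * rho)


sigma2 = 4.0  # hypothetical error variance of each expert (days squared)
n = 4         # hypothetical number of experts

independent = variance_of_average(sigma2, n, 0.0)  # no shared information
correlated = variance_of_average(sigma2, n, 0.5)   # partially shared information
identical = variance_of_average(sigma2, n, 1.0)    # fully shared information

effective_n = equivalent_independent_experts(n, 0.5)
```

Under these assumed values, four experts with a pairwise correlation of 0.5 are worth only 1.6 independent experts, and four perfectly correlated experts are worth no more than one.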
One specific concern with respect to dependence is that subjective calibration 
of the experts can never be truly independent of the decision maker performing the 
calibration (S. French 1986, 320; Clemen 1986, 313–14; Winkler 1986, 299, 302; 
West and Crosse 1992, 287, 291; Genest and Schervish 1985, 1198; Simon French 
1980, 43–46; Schervish 1984). Morris points out, however, that the expert-decision 
maker dependence problem may not be a major issue.  He maintains that in most 
cases, the expert will have significantly more information than the decision maker or 
that the decision maker will refrain from expressing an opinion to prevent swaying 
the judgment.  In both cases, Morris maintains that the degree of dependence then 
becomes negligible (Morris 1986, 326). 
 A second concern is dependence not on the decision maker, but on the shared 
backgrounds of the experts.  It is highly unlikely that personnel from the same work 
area involved in the same project will not be privy to the same information, training, 
or experiences.  From earlier in this chapter, it was seen that perceptions can differ 
based on various factors (e.g. time of receipt of data, order of receipt of data, mood 
when data was received, etc.).  The actual data, however, will probably be the same, 
even if it is processed differently by each expert.  These shared backgrounds and data 
sets complicate the Bayesian updating process due to, as Clemen and Morris phrased 
it, “double counting” the data (Clemen 1986, 314; Morris 1986, 324; Schervish 1986, 
309; Simon French 1980, 46).  The DM cannot update her prior with the expert’s 
information if they both are using the same information.  This does not provide the 
DM with any new data, only a rehash of the data she already has (T. R. Johnson, 
Budescu, and Wallsten 2001, 137–39; Budescu and Rantilla 2000, 374).  While this is 
certainly an issue for further exploration, for the purposes of this research, it is 
assumed that the experts are all statistically independent of one another.  
 
Chapter 3: Methods and Materials, Data Collection 
 
 Determining differences in the priorities and estimating practices of project 
stakeholders required a diverse set of both projects and stakeholders.  As an 
operational facility, NASA’s WFF provided both of these conditions.  Once the data 
were collected, they were organized and masked to the best extent possible in order 
to protect the anonymity of the research subjects.  
The chapter begins by describing the surveys used to collect data from the subjects 
who consented to participate in the research, followed by a description of how these 
inputs were processed and consolidated.  Following this is a description of the 
analysis performed on the processed data, as well as a high-level description of the 
Design of Experiments (DOE) factorial analysis method used to analyze the data.  
The chapter concludes with a description of how the DOE process can be used to 
predict the response of an estimator within a particular demographic and a description 
of a proposed estimate aggregation method.   
 
3.1 Data Collection 
 Data were collected from on-going projects planned, managed, and executed 
by personnel working at NASA’s WFF.  Projects were selected based on whether or 
not they were active during the data collection period and whether or not they had 
timelines which seemed to be a source of conflict among the stakeholders.  For 
example, given the nature of the work at WFF, certain projects, although still projects 
by definition, have become routine in their execution and the durations for each 
activity seem to be known and accepted.  In these cases, the schedule is mostly driven 
by the project documentation process, so project team members are given the due date 
and are left to take the actions required to meet that due date.  Other projects, 
however, involve more planning and are subject to more scrutiny from project 
stakeholders.  These projects often leave stakeholders asking, “why does it take this 
long” or “why is the schedule so compressed”, depending on the job and perspective 
of the stakeholder.  The data collection period for this research project was 
approximately 26 March 2014 – 20 March 2015 and the projects selected typically 
fell into one of three categories: operations, maintenance, and engineering.  
Operations is defined as a project which is directly tied to launch campaign support, 
including set up and testing of all equipment.  Engineering is defined as an effort to 
implement new or upgrade existing technical systems.  Maintenance is defined as 
actions taken to ensure the operational systems remain in good working order.  While 
maintenance is technically not a project since it has no defined beginning or ending, 
certain aspects of the maintenance performed at WFF lend themselves to the project 
definition (PMI 2013, para. 1.2). 
Initial project selection was made by determining which projects were either 
currently in work or were just entering (or just about to enter) the planning phase.  
Personnel assigned to the project were contacted individually using “Recruitment E-
mail” (Appendix A.1).  If the person responded that he/she was willing to participate, 
a meeting was set up to complete the consent form required by the Institutional 
Review Board (IRB) and to provide instruction on filling out the various surveys.   
Four surveys were created:   
1) Traits/Opinions Survey, 
2) Scheduling Survey, 
3) Follow-on Survey, 
4)  Course of Action Survey. 
3.1.1 Traits/Opinions Survey 
The first survey was called the “Traits/Opinions” survey (Appendix A.2) and 
was used to gather basic demographic information about the participant as well as get 
his/her opinion on managing project constraints and attitudes toward risk.  This 
survey was created using GoogleDrive™ and could be accessed using a website 
provided to the participant.  Once the participant completed the survey, the results 
were consolidated into a separate web page.  During the consent process, each 
participant chose a random three-digit number that would be used for identification 
when filling out this survey.  This number was stored on an iPad™ using a program 
called SafeNote™.  When filling out the Traits/Opinions survey, each participant 
would begin by filling in this random 3-digit number.  All responses of the participant 
would then be associated with that number during data compilation.  This number 
was then transferred to a spreadsheet which associated the name of the participant 
with the number selected.  Given that the scheduling surveys were submitted through 
e-mail, this “linking” spreadsheet was critical to associating the results supplied by   
e-mail (identified by the sender) to the results supplied in the GoogleDrive™ survey 
(identified by the random 3-digit number). 
3.1.2 Scheduling and Follow-on Surveys 
The Scheduling Survey and Follow-on Survey (Appendix A.3 and A.4, 
respectively) were administered over e-mail.  These two surveys were project specific 
and allowed subjects to state how much time they thought should be allotted for each 
activity of the project, indicate whether they believed adequate resources were assigned 
and whether the activity list accurately described the scope of the project, and provide an 
estimate of the overall project duration.  The activity list was developed with the 
assistance of a knowledgeable technician or manager directly involved with the 
project.  Once the activity list was populated, the survey was e-mailed out to 
appropriate members of that project team who had consented to participate in this 
study.  For the technicians, activity lists were sometimes broken down by different 
task sections so that subjects only had to provide estimates on applicable project tasks 
(as opposed to all tasks in the project).  Managers were asked to complete estimates 
for a full project task list, since typically a manager is expected to oversee all 
activities within a project (PMI 2013, para. 1.7).  All subjects were asked to complete 
and return the surveys prior to starting any of the activities on the list so that the 
values provided would be estimates and not a recording of what had already 
happened.  A “Follow-on” survey was also e-mailed along with the “Scheduling 
Survey” to each subject directly involved with project execution.  This Follow-on 
Survey allowed subjects to record the actual durations of each of the listed activities 
on the Scheduling Survey, as well as describe major changes in team size, work 
hours, or availability that occurred while executing the project.  This survey also 
allowed the participant to comment on challenges encountered during the project. 
Section 1.5 mentioned that data from three types of projects were collected for 
this study (operations, engineering, and maintenance).  Due to the close-knit nature of 
the community at Wallops Flight Facility, the projects discussed in these results are 
not divided by type to help maintain the anonymity of the subjects who provided data 
for this research. 
When collecting estimates from various subjects, organizational strategies and 
project risk tolerances were not explicitly accounted for in their effects on the 
estimation process. Subjects were asked to provide scheduling estimates based on 
their understanding of project requirements, while also factoring in their 
understanding of possible constraints both within the project and other organizational 
commitments.   
3.1.3 “Course of Action” (COA) Survey 
The COA survey (Appendix A.5) delved into the question of different 
perceptions of “risk-mitigation” versus “gold plating”, where “gold plating” is 
defined as going over and above a requirement to provide unnecessary capability or 
performance.  This survey was developed and uploaded to GoogleDrive ™, and the 
address of the survey was e-mailed to each of the subjects.  This survey was intended 
to be anonymous, so no survey identifiers were collected.  After initially clicking on 
the link, subjects were asked to choose “management” or “technician” based on the 
choices they picked in the Traits/Opinions survey.  This selection directed the 
participant to a new page where the primary question was phrased in such a way to 
make it applicable to the participant (whether management or technician).  The 
primary question involved a generalized scenario stating that a piece of equipment 
was within specifications but just barely.  It went on to say that fixing the equipment 
would cause a schedule delay.  Two questions were then posed to each participant:  
should they fix the equipment or leave it alone, and would they consider taking the 
extra time to fix the equipment risk mitigation or gold plating (see Appendix A.5 for 
the exact verbiage).  A final question on this survey asked subjects to describe why 
they believed projects in general fell behind schedule. 
 
3.2 Data Processing 
 This section describes the methods used to organize and aggregate the raw 
data collected from the surveys.  It also describes the concepts behind the survey 
questions described in Section 3.1 as well as some challenges encountered during the 
data collection process. 
3.2.1 Categorizing the Subjects 
 Results from the “Demographics” portion of the Traits/Opinions survey were 
categorized by three demographics which were then broken down into different 
levels.  The demographics chosen were: position (technical or management), years of 
experience (0-7, 8-15, 16-23, 24+), and completed level of formal education (high 
school, associate’s/technical, bachelor’s, master’s).   
The Position demographic was labeled as an “M” or a “T”.  Subjects were 
categorized as “management” if they were primarily responsible for managing 
personnel or project constraints.  Subjects were categorized as “technical” if they 
were primarily responsible for completing the technical work required to achieve the 
technical project objective.   
The Years of Experience (YoE) demographic was labeled as a number 
between one and four (with “1” being 0-7 years of experience and “4” being 24+ 
years of experience).  The Level of Formal Education (LoE) was labeled as either an “H”, “T”, 
“B”, or “M” corresponding to the levels of education described above.  For example, 
a manager with 0-7 years of experience and a master’s degree would appear as 
“M1M”.   Given that some subjects shared the same demographic traits (i.e. there was 
more than one manager who had 0-7 years of experience and a master’s degree), it 
was necessary to assign another designator to distinguish all of the subjects.  Table 
3-1 summarizes the demographics and their representations. 
 
Demographic                 Levels
Position                    M:  Management
                            T:  Technical
Years of Experience         1:  0-7 years
                            2:  8-14 years
                            3:  15-23 years
                            4:  24+ years
Level of Formal Education   M:  Master’s
                            B:  Bachelor’s
                            T:  Associate’s/Technical
                            H:  High School
Table 3-1: Demographic Identifiers 
As results from the Traits/Opinions surveys came in, each respondent’s selected 3-
digit number was converted to a second, randomly selected number (to help protect 
anonymity).  This number was then linked to the appropriate demographics 
designator which was linked to the responses from the other surveys in this study.  
This process ensured each participant was assigned a demographic category while 
still ensuring that results of the other surveys were associated with the correct 
participant. 
When collecting data for this research, training in the field of project 
management was not directly taken into consideration.  Subjects were asked to 
provide their level of completed formal education, but were not asked specifics about 
the field of study.  Participants were also asked to provide the number of years of 
experience in their field, but were not asked if any of this experience was related to 
technical training, management in general, or specifically project management.  
 
3.2.2 Risk Tolerance 
As a measure of risk behavior, participant responses to various gambles were 
used to create a utility curve (Dennis V. Lindley 1983, 2).  Based on a method 
developed by Raiffa, in a situation with two outcomes, one considered a “win” and 
one considered a “loss”, a decision maker would be handed a “basic reference lottery 
ticket” (brlt).  This ticket contained a value between zero and one, and represented the 
probability of winning.  This probability was represented by the variable “π”.  The 
probability of losing was “1-π”.  This “π” designator is the reason the “ticket” is 
generally referred to as a “π-brlt” (Raiffa 1968, 57–60).  Although there are 
many uses for the π-brlt, especially in decision trees involving non-monetary values, 
this research focused primarily on their use in developing utility curves for those who 
did not behave according to the principles of Expected Monetary Value (EMV) 
(Raiffa 1968, 61–65).  EMV is calculated by multiplying a monetary value (whether 
positive or negative) by the probability of “winning” said monetary value.  This will 
result in a straight line passing through the points [xmin,0] and [xmax, 1]  (Raiffa 1968, 
8-9,66-67). When a decision maker feels that the EMV line is either too aggressive or 
not aggressive enough, a similar procedure can be used to determine a new curve that 
more accurately represents the decision maker’s risk thresholds (Raiffa 1968, 51–53).  
This curve is referred to as the π-indifference curve and is created by plotting the π-
brlt value on the y-axis and the monetary value at which a decision maker will trade 
the chance of winning for actual money on the x-axis.  Starting with [xmin, 0] and 
finishing with [xmax, 1], points in between these two values describe the shape of the 
curve (Raiffa 1968, 66–67).  
The general case of the π-indifference curve is referred to as the utility curve.  
Since these values are used for decision makers who do not follow an EMV line, the 
π-brlt value provides a conversion point which can be used in a decision tree to 
determine the best course of action.  In this context, the π-brlt value can be referred to 
as the utility of the monetary value in question (i.e. the point at which a decision 
maker is indifferent between on-the-table money and the probability of winning the 
full prize).  The curve is described by πi = u(xi), where “u” is the shape of the utility 
curve, and πi is the value of the curve evaluated at a given monetary value, xi  (Raiffa 
1968, 86–89).  The shape of this curve can provide an indication of the risk tolerance 
of a participant.  A concave curve indicates risk aversion while a convex curve 
indicates risk-prone behavior.  The more pronounced the curve, the more risk-
averse/risk-prone the participant (Raiffa 1968, 68–70, 94).  Using the example survey 
provided to subjects in this survey, an example of the three basic risk attitudes (EMV, 
risk averse, and risk prone) can be seen in Figure 3-1.  Figure 3-2 shows two subjects 
who demonstrate different levels of extremity for both risk averse and risk prone 
behavior.  
 
Figure 3-1: Example Basic Utility Curves 
 
Figure 3-2: Example Risk Averse and Risk Prone Behavior 
In order to model the risk tolerance of the subjects, a scenario was presented 
wherein the participant was handed a hypothetical piece of paper that gave him/her a 
certain probability of winning $5000.  The participant was then asked how much 
“cash on the table” would be required to trade the chance at winning $5000 for the 
immediate cash-out (see “Risk Tolerance” in Appendix A.2).  The percentages 
provided were: 10%, 35%, 50%, 68%, and 87%.  These percentages were chosen 
such that they would cover a wide spectrum of probabilities of winning, but not at 
such easily calculated intervals that they would lend themselves to choosing utility.  
Given the extreme curvature of some of the responses (discussed further in Chapter 
5), only the responses at the 50% probability were analyzed.  Risk aversion was 
measured by distance from the EMV value of $2500 (i.e. $5000*(0.5)).  To the left of 
this point, risk aversion increased as the trade-in value decreased.  To the right of the 
EMV point, risk aversion decreased as the trade-in value increased (Raiffa 1968, 68–
69). 
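The EMV reference values and the comparison at the 50% gamble can be sketched in a few lines of Python.  This is only an illustration of the arithmetic described above, not the spreadsheet used in the study; the function names and the sample cash-out answers are hypothetical.

```python
# Sketch: EMV of each gamble and a simple risk-attitude label at the 50% gamble.
PRIZE = 5000                          # prize named in the survey scenario
WIN_PERCENTS = [10, 35, 50, 68, 87]   # probabilities of winning, in percent


def emv(win_percent, prize=PRIZE):
    """Expected monetary value: probability of winning times the prize."""
    return prize * win_percent / 100


def risk_attitude(cash_out, win_percent=50, prize=PRIZE):
    """Label a response by its position relative to the EMV.

    Returns the label and the signed distance (cash_out - EMV).  A negative
    distance means the subject accepts less cash than the EMV (risk averse).
    """
    distance = cash_out - emv(win_percent, prize)
    if distance < 0:
        label = "risk averse"
    elif distance > 0:
        label = "risk prone"
    else:
        label = "EMV (risk neutral)"
    return label, distance


emv_column = [emv(p) for p in WIN_PERCENTS]
print(emv_column)

# Hypothetical cash-out answers at the 50% gamble, not data from the study.
for answer in (1000, 2500, 4000):
    print(answer, risk_attitude(answer))
```

The larger the magnitude of the signed distance, the more pronounced the risk-averse or risk-prone behavior at that gamble.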
Once the subjects submitted their answers, an Excel ™ table was developed 
where the first column listed out the probabilities of winning and the second column 
listed the EMV monetary value (Raiffa 1968, 8–9).   The following columns listed 
responses from each participant where each response was listed in the row 
corresponding to the appropriate probability of winning (see Appendix A.7).  Once 
this table was established, the Excel ™ graph function was used to plot the 
curve of each participant’s responses in relation to an EMV utility curve (a 
straight line passing through [0, 0], [2500, 0.5], and [5000, 1]) (see Figure 5-1 through Figure 
5-5).  The monetary values represented by the participant’s responses served as the x-
axis, while the probabilities of winning served as the y-axis.  The format of each line 
(dots, dashes, etc.) was based on the 3-digit number assigned to the 
participant and was used to help distinguish one line from another.  These lines were 
then plotted using the “connect the dots” feature in Excel ™. 
3.2.3 Constraint Preference 
One of the suppositions in this research is that stakeholders, based on their 
demographics, have different priorities when determining which project constraints 
(PMI 2013, para. 1.3) are more important than the others.  The hypothesis was that 
certain demographics would tend to sacrifice performance on one particular constraint 
for better performance on another, preferred constraint. 
To test this hypothesis, a set of questions was developed which set four of the 
major project constraints against one another (Appendix A.2, “Preference Analysis”): 
• Cost 
• Schedule 
• Quality 
• Risk 
Based on concepts derived from the Analytic Hierarchy Process (AHP), the 
survey pitted two constraints directly against one another and then asked the 
participant to choose which one should be sacrificed for the sake of the other (e.g. 
subjects were asked to choose between having an increased cost or an increased 
schedule on any given project) (Winston 2003, 785–91; Mantel Jr. et al. 2004, 5–6; 
PMI 2013, para. 1.3). 
Responses to each question in this survey resulted in each of the four chosen 
constraints being compared to one another.  In their responses, subjects were asked to 
write down which option they chose, as well as a preference between one and nine, 
indicating how strongly they preferred their choice (Winston 2003, 787).  In this 
scale, a “1” response meant that the participant was indifferent between the two 
options and a “9” meant the participant strongly preferred one option over the other.  
The other numbers represented various levels of preference between those two 
extremes (Zio 1996, 129).  A scale was provided on the survey itself that explained 
the meaning behind each number on the scale.     
Once the subjects’ responses were received, a grid was developed in Excel ™ 
to compile the results.  For each participant, a 4x4 matrix was created which listed the 
four project constraints in the same order down the rows and across the columns (see 
Table 3-2 for a partially populated example matrix).  The cells along the diagonal 
were populated with a “1” since comparing a constraint with itself implies no preference either way.  The 
participant’s preference ranking was then placed in the appropriate cell where the two 
constraints under consideration intersected.  The reciprocal of this value would be 
placed in the reciprocal row and column.  For example, the first question in the 
survey asked if the participant preferred an increased cost or an increased schedule.  
If the participant responded, “Increased Cost, 7” indicating he had a moderate 
preference for increased cost over increased schedule, then a “7” would be placed at 
the intersection of the “Cost” row and the “Schedule” column.  Conversely, a “1/7” 
would be placed in the “Schedule” row and the “Cost” column (see Table 3-2).  This 
process was repeated for each constraint comparison and ultimately resulted in each 
constraint being compared to all the others. 
 
            Cost    Schedule    Quality    Risk
Cost        1       7
Schedule    1/7     1
Quality                         1
Risk                                       1
Table 3-2: Example Preference Matrix 
Once the entire grid was filled out, a weight for each constraint was calculated 
using Equation 3-1 (Winston 2003, 788).  This weight represented a subject’s 
willingness to sacrifice that constraint for the sake of achieving success in meeting 
the other constraints.  For example, based on the responses of the subject, if the 
schedule constraint had a calculated weight of 0.42 and the quality constraint had a 
calculated weight of 0.12, that indicates the subject is more willing to sacrifice the 
schedule for improved quality than he is to sacrifice quality for a faster completion 
time. 
 
W_i = \frac{1}{n} \sum_{j=1}^{n} \frac{a_{ij}}{\sum_{k=1}^{n} a_{kj}}                    Eqn 3-1

where “i” is the row number, “j” is the column number, “k” indexes the rows summed in each column total, and “n” is the total number of 
constraints under consideration.  See Table 3-3 for a fully populated example matrix 
and its resulting calculated weights.  The generalized form of this table can be seen in 
Table 3-4. 
 
            Cost    Schedule    Quality    Risk    Wi
Cost        1       4           4          6       0.55
Schedule    1/4     1           4          4       0.27
Quality     1/4     1/4         1          2       0.11
Risk        1/6     1/4         1/2        1       0.07
Table 3-3: Example Matrix 
 
 
 
            Constraint 1    Constraint 2    …    Constraint n    Wi
Cost        a11             a12             …    a1n             W1
Schedule    a21             a22             …    a2n             W2
Quality     …               …               …    …               …
Risk        an1             an2             …    ann             Wn
Table 3-4: Generalized AHP Matrix 
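The weight calculation can be checked with a short Python sketch.  It assumes the column-normalization reading of Equation 3-1 (divide each entry by its column sum, then average across each row); the function and variable names are illustrative only.  Applied to the matrix in Table 3-3, it reproduces the Wi column of that table.

```python
from fractions import Fraction as F

CONSTRAINTS = ["Cost", "Schedule", "Quality", "Risk"]

# Pairwise comparison matrix from Table 3-3 (row compared against column).
TABLE_3_3 = [
    [1,       4,       4,       6],
    [F(1, 4), 1,       4,       4],
    [F(1, 4), F(1, 4), 1,       2],
    [F(1, 6), F(1, 4), F(1, 2), 1],
]


def ahp_weights(matrix):
    """Eqn 3-1: normalize each column by its sum, then average each row."""
    exact = [[F(value) for value in row] for row in matrix]
    n = len(exact)
    column_sums = [sum(row[j] for row in exact) for j in range(n)]
    return [
        float(sum(row[j] / column_sums[j] for j in range(n)) / n)
        for row in exact
    ]


weights = ahp_weights(TABLE_3_3)
for name, weight in zip(CONSTRAINTS, weights):
    print(f"{name:<10}{weight:.2f}")
```

Because every column is normalized before averaging, the weights always sum to one.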
  
Equation 3-2 calculates the consistency of the participant preferences.  The 
result of this equation is referred to as the Consistency Index (CI).   
CI = \frac{\left( \frac{1}{n} \sum_{i=1}^{n} \frac{\sum_{j=1}^{n} a_{ij} W_j}{W_i} \right) - n}{n - 1}                    Eqn 3-2
Based on a table provided in Winston, because there were four preferences 
under consideration, a Random Index (RI) value of 0.90 was selected  (Winston 2003, 
788–89).  The CI value was then divided by the RI value to determine the ratio of 
CI/RI.   According to Winston, if this ratio is less than 0.10, then the participant can 
be considered “consistent” and their preferences considered valid (Winston 2003, 
789).  For this research, the consistency for each participant was only calculated as a 
point of interest to determine whether or not the subjects were behaving in a 
consistent manner.  Consistency is a measure of comparison among each of the 
constraints.  If a participant weights Constraint B twice as much as Constraint A, and 
Constraint C four times as much as Constraint A, then, by definition Constraint C 
should be weighted twice as much as Constraint B.  If not, then the participant is not 
consistent in his preferences.  The ratio of CI/RI provides a measurement of the 
consistency of the participant.  A perfectly consistent matrix will result in a CI of zero  
(Winston 2003, 786).  Appendix A.8 provides the compiled list of project weights and 
consistency ratings for each participant. 
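A minimal Python sketch of the consistency check follows, assuming the usual reading of Equation 3-2 in which each row of the matrix is multiplied by the weight vector, divided by that row's weight, and the results averaged.  Names are illustrative; the RI value of 0.90 is the one cited above for four constraints.  The second matrix encodes the B = 2A, C = 4A example from the text and should give a CI of essentially zero.

```python
RANDOM_INDEX_N4 = 0.90   # RI value cited from Winston for four constraints

TABLE_3_3 = [
    [1,     4,     4,     6],
    [1 / 4, 1,     4,     4],
    [1 / 4, 1 / 4, 1,     2],
    [1 / 6, 1 / 4, 1 / 2, 1],
]

# Perfectly consistent example: B is weighted twice A, C four times A,
# and therefore C twice B.  Rows and columns are ordered A, B, C.
PERFECT = [
    [1, 1 / 2, 1 / 4],
    [2, 1,     1 / 2],
    [4, 2,     1],
]


def ahp_weights(matrix):
    """Column-normalize the matrix, then average across each row."""
    n = len(matrix)
    column_sums = [sum(row[j] for row in matrix) for j in range(n)]
    return [sum(row[j] / column_sums[j] for j in range(n)) / n for row in matrix]


def consistency_index(matrix, weights):
    """Average of (row of A times W) / W_i, minus n, divided by n - 1."""
    n = len(matrix)
    ratios = [
        sum(matrix[i][j] * weights[j] for j in range(n)) / weights[i]
        for i in range(n)
    ]
    return (sum(ratios) / n - n) / (n - 1)


ci = consistency_index(TABLE_3_3, ahp_weights(TABLE_3_3))
cr = ci / RANDOM_INDEX_N4
print(f"CI = {ci:.3f}, CI/RI = {cr:.3f}, consistent = {cr < 0.10}")

ci_perfect = consistency_index(PERFECT, ahp_weights(PERFECT))
print(f"CI for the perfectly consistent example = {ci_perfect:.6f}")
```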
3.2.4 Schedule Survey Data  
 Responses to the scheduling surveys were compiled using Excel™ 
spreadsheets where participant numerical/demographic designators were listed in the 
same row as their responses (Appendix A.9).  Projects that had multiple independent 
parts were split up and treated as separate projects, except in one case where the 
subjects for the two separate parts were all the same; those responses were merged 
and treated as one “project”.  Data provided by each participant consisted of a “most 
likely” (ML) estimate describing how long the participant believed the activity should 
normally take, a “best case” (BC) estimate describing how long it should take if 
everything went well, and a “worst case” (WC) estimate describing how long it 
should take if everything went poorly.  Subjects were also asked to provide an 
estimate of how confident they were in their ML estimate (further discussion on this 
to be provided in Chapter 5).   
 Question 5 on the Scheduling Survey (Appendix A.3) asked subjects to 
provide either a completion date or a recommended start date for the project, 
depending on whether or not the completion date for a given project had already been 
set.  To further clarify, at WFF, some projects are given a completion date and the 
team must work to be ready by said completion date, so the schedule is effectively 
built by first conducting a “backward pass”.  Other projects are more traditional in 
that they would be built using the “forward pass” first in order to determine the 
completion date.  This question was an effort to compensate for the fact that an actual 
network schedule with activity relationships was not developed.  It was hoped that a 
final completion date would give an idea of the total expected duration of the project 
regardless of how the activities were related to one another within the project itself.  
Unfortunately, for various reasons, surveys for each project were not necessarily sent 
or completed at the same time (e.g. time of participant consent, operational 
commitments, etc.) and the format of responses varied among subjects, requiring 
several assumptions regarding the intent of the response.  For these reasons, these 
results were not included in data analysis or Appendix A.9. 
3.2.5 Follow-on Survey  
The Follow-on Survey was intended to compare the estimates provided by 
each participant to the actual duration of each activity.  Unfortunately, only a few 
Follow-on Surveys were returned.  For those that were returned, confirming the 
reported values would have been challenging.  In some cases, the reported values differed 
among project subjects, indicating that the perception of what constituted task 
completion may not have been standard across all project team members.  Because of 
these challenges, results from the Follow-on Surveys were not included in data 
analysis or in Appendix A.9. 
3.3 Data Analysis – Characterization 
 The following section describes the methods used to characterize the subjects 
in this study.  It describes the method used to compare project constraint preferences 
and differences in total project duration estimates, without regard to demographics. It 
goes on to describe several questions that were developed to compare differences in 
the responses of stakeholders of varying demographics.  It then describes the method 
used to analyze the results of the project constraint questions, as well as the method 
used to analyze the confidence and risk aversion levels of the subjects with respect to 
their stated demographics.  It provides a high-level description of the Design of 
Experiments (DoE) process used to analyze the data.  Table A-1 through Table A-5 
are provided to allow for the recreation of the experiments conducted here using 
DesignExpert™ and the data found in Appendix A.7 through Appendix A.9.  Those 
tables show the inputs that were used in that program to set up the experiment prior to 
analysis.  Following this section is a description of the method used to determine the 
correlation between personality traits and project estimating practices. 
3.3.1 Constraints Analysis – by Constraint 
After developing the normalized matrix for the project constraints as 
described in Section 3.2.3, results for each participant were organized by constraint 
such that all “cost” weights from all subjects were grouped together, all schedule 
weights, etc.  The average weight for each constraint was then calculated.  To 
determine whether or not the differences seen in the average weights among each 
constraint were statistically significant, a simple t-test was performed using the Data 
Analysis package on Excel™.  The “t-statistic with unequal variances” was selected 
with H0: μ1 = μ2, and alpha = 0.05.  If the one-tailed p-value was below 0.05, the 
difference in the averages was considered statistically significant.   
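For readers without Excel™, the statistic behind an unequal-variance t-test can be sketched in plain Python.  The sketch below computes only Welch's t statistic and the Welch-Satterthwaite degrees of freedom; turning these into a p-value requires a t-distribution, which Excel™ or a statistics package supplies.  The function name and the sample weights are hypothetical and are not data from this study.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each sample (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, dof


# Hypothetical constraint weights for five subjects, not data from the study.
cost_weights = [0.55, 0.40, 0.48, 0.35, 0.52]
schedule_weights = [0.27, 0.30, 0.22, 0.41, 0.25]
t_stat, dof = welch_t(cost_weights, schedule_weights)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof:.2f}")
```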
3.3.2 Network Path Standard Deviation 
 Within each project, the total duration estimates for each participant were 
compared to determine whether or not there were differences in the way different 
stakeholders estimated duration times. It was hypothesized that if, given the same 
information, all stakeholders in a project were in agreement about how long the 
project should take, the standard deviation among the project duration estimates 
would be zero.  Because network schedules were not built for these projects, each 
project is assumed to have a single path where each activity begins upon the 
completion of the previous activity.  Given that assumption, the total duration of the 
project is the sum of each activity’s PERT average and is represented by the variable 
Te.  Once Te was calculated for each participant on each project, the standard 
deviation of Te among each of the subjects per project was calculated.  Demographics 
of the subjects were not considered in this part of the analysis.  A histogram plot was 
created using Excel™ to display the results.   
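The single-path calculation can be sketched as follows.  The sketch assumes the standard PERT weighting (BC + 4·ML + WC)/6 for each activity and a sample standard deviation across subjects; neither detail is spelled out above, so both are assumptions.  The subject labels and estimates are hypothetical.

```python
from statistics import stdev


def pert_mean(best_case, most_likely, worst_case):
    """Standard PERT weighted average of a three-point estimate (assumed)."""
    return (best_case + 4 * most_likely + worst_case) / 6


def total_duration(activities):
    """Te for a single-path project: sum of the activity PERT averages."""
    return sum(pert_mean(bc, ml, wc) for bc, ml, wc in activities)


# Hypothetical (BC, ML, WC) estimates in days for one three-activity project.
estimates = {
    "subject_A": [(2, 4, 6), (1, 2, 9), (3, 5, 7)],
    "subject_B": [(3, 5, 10), (2, 3, 4), (4, 6, 14)],
    "subject_C": [(1, 3, 5), (1, 2, 3), (2, 4, 6)],
}

te_by_subject = {name: total_duration(acts) for name, acts in estimates.items()}
# Sample standard deviation of Te across the subjects on this project.
spread = stdev(list(te_by_subject.values()))

print(te_by_subject)
print(f"standard deviation of Te = {spread:.2f} days")
```

A spread of zero would indicate that all subjects agreed on the total project duration.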
3.3.3 Comparison Questions 
The three baseline estimates provided by each participant were further 
analyzed to determine whether or not certain demographics exhibited similar 
estimating trends.  For each of the three demographics considered, eleven questions 
were developed.  Questions 1-7 and 10 were derived from the Scheduling survey.  
Questions 8, 9, and 11 were derived from the Traits/Opinions survey.   
Table 3-5 lists out the questions for the Position demographic.  For the YOE 
demographic, the questions remain the same, but the word “management” is replaced 
with “fewer years of experience” and the word “technician” with “more years of 
experience”.  The same is true for the LOE demographic, where “management” is 
replaced with “more formal education” and “technician” with “less formal education”.  
The questions were answered by comparing the results of each of the surveys.  The 
YOE and LOE demographics each had four levels of comparison as opposed to the 
Position demographic which only had two.  If a certain project had members of only 
one level of the demographic in question, that survey was excluded from that group 
(e.g. if a project had two respondents, both of which fell under the “management” 
category, that project was excluded from the Position group because there would be 
no point of comparison on that demographic).   
Question 1: Is management’s total project duration estimate Te (based on PERT) lower than the technicians’? 
Question 2: For all demographics, is the separation between the ML and BC estimates smaller than the separation between the ML and WC estimates? 
Question 3: Does management have a smaller separation between the ML and BC estimates than technicians? 
Question 4: Does management have a smaller separation between the ML and WC estimates than technicians? 
Question 5: Is management’s ML estimate higher than the technicians’? 
Question 6: Is management’s BC estimate higher than the technicians’? 
Question 7: Is management’s WC estimate higher than the technicians’? 
Question 8: Are management personnel less risk averse than technicians? 
Question 9: Is management’s variance smaller than technicians’? 
Question 10: Is management’s confidence greater than technicians’? 
Question 11: Is management’s willingness to sacrifice schedule less than technicians’? 
     
Table 3-5: Comparison Questions 
 
 For each of the questions in Table 3-5, the Excel ™ “IF” command was used 
in one of the following ways, depending on how the question was asked: 
=IF(X<Y, "Yes", "No")        or       =IF(X>Y, "Yes", "No") 
 In the commands above, “X” and “Y” represent the responses 
provided by each participant in each project.  For projects with only two subjects, a 
simple comparison was performed for each project activity using the formula above.  
When a project had more than two subjects, each participant was compared to the 
others in its group, as long as they did not share the same demographic.  For example, 
for a project where the respondents consisted of two managers and two technicians, 
responses from Manager 1 would be compared to Technician 1 and then the 
procedure would be repeated with Technician 2.  Manager 2 would then be compared 
to Technician 1 and then to Technician 2, with each comparison providing a different 
data point.  This procedure ultimately resulted in comparing each participant with 
every other participant in that project, as long as they did not share the same 
demographic.   
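The cross-demographic pairing described above can be sketched in Python.  This is an illustration of the comparison logic for a question of the form "is X lower than Y?", not the spreadsheet used in the study; the respondent labels and Te values are hypothetical.

```python
from itertools import product

# Hypothetical respondents for one project: (label, position, Te in days).
respondents = [
    ("Manager 1", "M", 12.0),
    ("Manager 2", "M", 15.5),
    ("Technician 1", "T", 14.0),
    ("Technician 2", "T", 9.0),
]


def cross_group_tally(people, group_x="M", group_y="T"):
    """Count how often X < Y over every cross-demographic pair.

    Mirrors the spreadsheet test =IF(X<Y, "Yes", "No").  Respondents who
    share a demographic level are never compared with each other.
    """
    xs = [person for person in people if person[1] == group_x]
    ys = [person for person in people if person[1] == group_y]
    answers = ["Yes" if x[2] < y[2] else "No" for x, y in product(xs, ys)]
    return answers.count("Yes"), len(answers)


yes_count, total = cross_group_tally(respondents)
print(f"{yes_count} of {total} comparisons answered Yes")
```

With two managers and two technicians the function produces four data points, matching the Manager 1 and Manager 2 example in the text.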
 Each “Yes” answer was tallied along with the total number of responses to 
determine the total percentage of times the data supported the premise of the 
questions seen in Table 3-5.  To determine the statistical significance of the results, 
RStudio™ was used to conduct a binomial test by using one of two commands listed 
below (R Core Team 2014):        
      binom.test(x, n, p = 0.5, alternative = "greater") 
      binom.test(x, n, p = 0.5, alternative = "less") 
where x is the number of successes (“Yes” answers) and n is the number of tests. 
The null hypothesis was that if all stakeholders behaved the same, each group should 
agree with the premise of the questions in Table 3-5 about 50% of the time.  For 
example, all things being equal, 50% of the time management should have a higher 
total duration estimate and 50% of the time technicians should have a higher total 
duration estimate (i.e. 0.5 in the RStudio™ command).   For those cases where the 
number of “Yes” answers divided by the total number of answers was greater than 
0.50, the binomial test determined whether or not the true probability of success (a 
“Yes” answer) was greater than 50%.  For cases where it was less than 0.50, the test 
determined whether or not the true probability of success was less than 50% 
(“Binomial Distribution” 2016; Gelman et al. 2013, 29–32).  Statistical significance 
was defined as p < 0.05 at a confidence level of 95%.   
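The one-sided binomial test can also be reproduced without R.  The Python sketch below sums the exact binomial tail probability, which is what a one-sided exact binomial test reports as its p-value; the function name and the example tally are hypothetical.

```python
from math import comb


def binom_test_one_sided(successes, trials, p=0.5, alternative="greater"):
    """Exact one-sided binomial p-value.

    "greater" returns P(X >= successes) and "less" returns P(X <= successes),
    where X follows a Binomial(trials, p) distribution.
    """
    if alternative == "greater":
        ks = range(successes, trials + 1)
    elif alternative == "less":
        ks = range(0, successes + 1)
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    return sum(comb(trials, k) * p ** k * (1 - p) ** (trials - k) for k in ks)


# Hypothetical tally: 15 "Yes" answers out of 20 comparisons.
p_value = binom_test_one_sided(15, 20, alternative="greater")
print(f"p = {p_value:.4f}, significant at 0.05: {p_value < 0.05}")
```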
  
3.3.4 Design of Experiments 
 In several of the sections below, results are analyzed using Design of 
Experiments (DOE) with the DesignExpert™ software.  DOE allows an experimenter 
to monitor how certain factors affect the response of a system and which of those 
factors has a significant part in driving the response  (Montgomery 2008, 162–64).  
For example, if a machine had three levers, each with a high and low setting, an 
experimenter could run eight experiments, testing the response of the system at each 
combination of each lever.  These results could then be analyzed to determine which 
of the three levers, if any, had the most significant effect on the performance of the 
machine.  The experimenter could then run each of these experiments again to 
confirm the results.  These runs would be referred to as replicates and they help 
ensure that the results are not due to random error or factors outside of the study.  
Repeat measurements are measurements that are taken within the same experimental 
run  (Montgomery 2008, 12–13).  The advantage of the DOE process is that it 
minimizes the number of runs necessary to determine which, if any, of the factors are 
significant.  In the above example, this would mean that the experimenter would not 
necessarily need to complete all eight experiments to determine the significant factor.   
Each participant was treated as an experimental run and his or her 
demographics represented the different settings of the human machine.  Responses to 
several surveys were then gathered and analyzed to determine how the different 
demographic settings changed the response of the human machine. Different subjects 
within the same demographic category were treated as replicate measurements.  
When a participant provided responses for multiple surveys of the same type, the 
responses were treated as repeat measurements and were averaged to create one 
overall response for that participant (Montgomery 2008, 607). 
Although humanity is a machine with myriad factors, this research concerned 
itself with only three:  position (management or technical), years of experience 
(YoE), and level of formal education (LoE).  The position factor had two levels, 
while YoE and LoE had four settings each.  The breakdown of these factors is 
described in Section 3.2.1.  The goal of this part of the research was to poll several 
subjects, each representing different levels of the three factors, and determine which, 
if any, of those factors was driving the differences in response.  This process also 
predicted the expected response of someone outside the study who shared the same 
demographic factor.   
Analysis on the data for this research was conducted using a mixed factorial 
design, specifically a 2^k and 4^k factorial design.  A 2^k factorial design studies the 
effects and interactions of “k” number of factors, each with two settings:  high and 
low. A 4^k factorial design is a modification of the 2^k design (Montgomery 2008, 162, 
207, 382–85).  According to the process, the experimenter would determine the 
factors he wished to study and develop a run-sheet where he systematically altered 
each factor between its high and low settings.  Once completed, the run sheet would 
represent all possible combinations of all possible settings for all factors.  The sheet 
would then be randomized to reduce the chance of a hidden variable affecting the 
results  (Montgomery 2008, 12). The experiments would then be completed with the 
run-sheet telling the experimenter how to configure the system for each experimental 
run.  Results from each run would then be recorded in the results column of the 
appropriate row and the entire sheet would be analyzed when completed.   
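As an illustration of the run-sheet idea, the Python sketch below enumerates every high/low combination for a set of 2-level factors and then shuffles the run order.  It is not the procedure used by DesignExpert™; the function name, the lever names (borrowed from the three-lever machine example earlier in this section), and the seed are hypothetical.

```python
import random
from itertools import product


def run_sheet(factors, seed=None):
    """Full 2^k factorial run sheet in randomized run order.

    Each run maps a factor name to -1 (low setting) or +1 (high setting).
    """
    runs = [
        dict(zip(factors, levels))
        for levels in product((-1, +1), repeat=len(factors))
    ]
    random.Random(seed).shuffle(runs)   # randomize the order of the runs
    return runs


# Three 2-level factors give 2^3 = 8 runs; lever names are placeholders.
sheet = run_sheet(["lever_A", "lever_B", "lever_C"], seed=1)
for number, run in enumerate(sheet, start=1):
    print(number, run)
```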
For this research, the collected results were effectively randomized by the fact 
that subjects were independent of one another and were polled after signing the 
consent form.  Constructing experiments using this method allows researchers to 
determine not only the effects of each chosen factor upon the results using the 
minimum number of experiments, but also if interactions among the factors have an 
effect (Montgomery 2008, 4, 162–64). 
Two of the factors under study (YoE and LoE) had four levels instead of two, 
requiring a slightly modified version of the 2^k design referred to as a mixed 
level design.  To accomplish this, a method called replacement was used, in which 
each level of a four-level factor is described using two replacement variables.  This 
effectively converts the 4-level factor into two 2-level factors.  The replacement 
variables are both set at their low value for the first level of the 4-level factor, both 
at their high value for the fourth level, and at alternating high/low settings for the 
two levels in between.  This effectively turns the mixed design into a higher-order 2^k 
design.  For example, if each of the three factors in this research had been at two 
levels, a 2^3 design requiring eight runs would have sufficed for a full factorial 
analysis.  Because two of the factors were at four levels, the total number of runs 
increased.  Each of the 4-level factors (YoE and LoE) was treated as if it were two 
2-level factors; together with the one actual 2-level factor (the position 
demographic), this effectively resulted in five 2-level factors.  To test each 
combination of each factor, a 2^5 design would be needed, which would require 32 
experimental runs (Montgomery 2008, 382–85). 
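As a minimal sketch of the replacement method, the snippet below codes each 4-level factor as two replacement variables and confirms that the three demographics expand to 32 distinct coded runs.  The particular assignment of the two middle levels to (+1, -1) and (-1, +1) is an assumption made for illustration, as are the names `REPLACEMENT` and `encode`.

```python
import itertools

# Assumed mapping: level of a 4-level factor -> two replacement variables.
# Level 1 is low/low and level 4 is high/high; the middle levels alternate.
REPLACEMENT = {1: (-1, -1), 2: (1, -1), 3: (-1, 1), 4: (1, 1)}


def encode(position, yoe_level, loe_level):
    """Code one subject as five two-level variables (position is already -1 or +1)."""
    return (position,) + REPLACEMENT[yoe_level] + REPLACEMENT[loe_level]


all_runs = {
    encode(position, yoe, loe)
    for position, yoe, loe in itertools.product((-1, 1), range(1, 5), range(1, 5))
}
print(len(all_runs))  # 2 * 4 * 4 = 32 distinct coded runs, i.e. a 2^5 design
```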
 For a full factorial design, upon completion, the run-sheet would show the 
response of the system at every possible factor combination.  In order to reduce the 
number of runs required, the experimenter could also use a fractional factorial design.  
In this case, the number of required runs can be reduced based on the principle of 
sparsity of effects.  This principle states that responses are most likely to be driven by 
the main effects and/or the lower-order interactions.  When reducing the design, terms 
become aliased with one another where the result of that factor level combination is 
described by the main effect plus its aliased factors (interaction factors which share 
the same sign as the main effect).  In these cases, the sparsity of effects principle 
states that the result is most likely being driven by the main effect and the higher-
order interactions can be ignored (Montgomery 2008, 290–93).  There are several 
methods available to reduce the number of runs required to successfully analyze the 
data (Montgomery 2008, 380, 450–53).   
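Aliasing can be illustrated with a small, hypothetical half fraction of a three-factor design.  In the sketch below the third factor is generated as C = A*B (an assumed generator chosen only for illustration), so the column for C is indistinguishable from the column for the AB interaction.

```python
import itertools

# Half fraction of a 2^3 design built from the assumed generator C = A * B.
half_fraction = [(a, b, a * b) for a, b in itertools.product((-1, 1), repeat=2)]

col_c = [c for _, _, c in half_fraction]
col_ab = [a * b for a, b, _ in half_fraction]

print(half_fraction)    # only four of the eight possible runs
print(col_c == col_ab)  # the C column and the AB column carry identical signs
```

Because the two columns are identical, any estimate of the main effect of C also contains the AB interaction; the sparsity of effects principle is what justifies attributing the result to the main effect.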
Because this research used a mixed design of both 2- and 4-level factors and 
did not have enough data points for a full factorial design, the DesignExpert™ 
software was used to create a D-optimal design using a coordinate exchange 
(Montgomery 2008, 380).  A D-optimal design creates a run-sheet where the selected 
experiments to be completed will minimize the variance of the regression coefficients 
of the model (Montgomery 2008, 254, 336, 452).  This is accomplished by 
building a matrix from the factor level settings of each selected combination on the 
run-sheet, multiplying the transpose of that matrix by the matrix itself, and then 
maximizing the determinant of the resulting product (Montgomery 2008, 253–54).  The 
coordinate exchange is an algorithm that searches the design space, based on the 
initial parameters selected by the experimenter (i.e. the factors, their levels, and the 
estimated standard deviation of the response), for the combination of experimental 
runs that best meets the D-optimality criterion (Montgomery 2008, 453, 456–57).   
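The D-optimality criterion can be illustrated with a small sketch that scores candidate run-sheets by the determinant of the transposed matrix multiplied by the matrix itself.  The sketch assumes a main-effects model with an intercept and, because the example is tiny, replaces the coordinate exchange with an exhaustive search over all four-run subsets of a 2^3 design; it is not the algorithm DesignExpert™ uses, and all function names are hypothetical.

```python
import itertools


def model_matrix(runs):
    """Main-effects model matrix: an intercept column plus the coded settings."""
    return [[1.0] + [float(x) for x in run] for run in runs]


def xtx(matrix):
    """Multiply the transpose of a list-of-rows matrix by the matrix itself."""
    cols = list(zip(*matrix))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]


def determinant(square):
    """Determinant by Gaussian elimination with partial pivoting."""
    m = [row[:] for row in square]
    n, det = len(m), 1.0
    for i in range(n):
        pivot = max(range(i, n), key=lambda r: abs(m[r][i]))
        if abs(m[pivot][i]) < 1e-12:
            return 0.0
        if pivot != i:
            m[i], m[pivot] = m[pivot], m[i]
            det = -det
        det *= m[i][i]
        for r in range(i + 1, n):
            factor = m[r][i] / m[i][i]
            for c in range(i, n):
                m[r][c] -= factor * m[i][c]
    return det


def d_criterion(runs):
    """Larger values mean smaller generalized variance of the coefficients."""
    return determinant(xtx(model_matrix(runs)))


candidates = list(itertools.product((-1, 1), repeat=3))  # all 8 runs of a 2^3 design
best = max(itertools.combinations(candidates, 4), key=d_criterion)
print(best, d_criterion(best))
```

A run-sheet that repeats the same run four times scores zero on this criterion, which is why such a design cannot be used to estimate the coefficients.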
The DesignExpert™ software sets the minimum number of runs which must 
be completed in order to successfully analyze the data, but options for extra 
experimental runs are left to the experimenter’s discretion based on the availability of 
data.  Additional runs beyond those required for basic analysis help improve the 
accuracy of the predicted results and increase the power of the analysis (Montgomery 
2008, 12; DesignExpert (version 9.0.6.2) 2015, n. Help File).  DesignExpert™ then 
creates a run-sheet with the desired number of experimental runs based on the input 
from the experimenter and adhering to the requirement to minimize the variance of 
the regression coefficients.  If the design is orthogonal, the coefficients are 
independent of one another and all have equal variances.  If the design is not 
orthogonal, the coefficients could change based on the selection of the model terms 
(DesignExpert (version 9.0.6.2) 2015, n. Help File).  The run-sheets created by 
DesignExpert™ using this method contained combinations of factors that were not 
available in the data set (e.g. a manager whose highest level of formal education was 
high school).  Because of this, modifications to the run-sheet were required, which 
meant that the design no longer met the optimality criterion but could still be used for 
analysis.   
 Once the experiment was designed and the results from the subjects were 
matched to the appropriate lines in the run-sheet, a predicted response could be 
developed using the regression model in Equation 3-3 (Montgomery 2008, 164). 
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_i x_i + \beta_{12} x_1 x_2 + \dots + \beta_{1 \cdots i}\, x_1 \cdots x_i + \varepsilon$$         Eqn 3-3   
where y is the expected response, β0 is the average of all responses collected (also known 
as the intercept), βi is the coefficient of factor i, and xi is a coded variable 
representing one of the factors in the experiment.  The first part of the equation 
represents the main effects of the actual factors, the second part of the equation 
represents interactions between the factors (e.g. β12x1x2 is a term describing the 
results of an interaction between Factor 1 and Factor 2), and ε is a term to cover the 
random error.  The “x” terms in Equation 3-3 are derived from the desired level of 
each factor.  For this research, each factor was qualitative, so the coded variables are 
set to either -1 or 1 to represent the “low” and “high” levels of each factor, respectively.  
DesignExpert™ represented the 4-level factors by using replacement variables whose 
different settings represented the required four levels.   
 In order to calculate the β coefficient for a main effect in a balanced two-level 
design, the average of the results when the factor is at its “low” level is subtracted 
from the average of the results when the factor is at its “high” level, and the 
difference is then divided by 2 (Montgomery 2008, 163, 208–9).  The interaction 
coefficient is calculated in the same manner, using the product of the coded variables 
of the interacting factors to decide which results count as “high” and which as “low”.  
The magnitude and direction of the 
resulting coefficients will provide a relatively good indication of whether or not the 
effect or interaction is driving the result.  If the term does not have a significant effect 
on the final result, it can be removed from the equation, leaving only those terms that 
do significantly affect the result (Montgomery 2008, 164, 210; DesignExpert (version 
9.0.6.2) 2015, n. Help File).  
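A worked example may help make the coefficient calculation concrete.  The sketch below uses made-up responses for the four runs of a 2^2 design and takes each coefficient as half the difference between the average response at the high level and the average response at the low level, which is one common convention for balanced two-level designs.  The data and the function names are hypothetical.

```python
def coefficient(responses, coded_column):
    """Half the effect: (mean response at +1 minus mean response at -1) / 2."""
    high = [y for y, x in zip(responses, coded_column) if x == 1]
    low = [y for y, x in zip(responses, coded_column) if x == -1]
    effect = sum(high) / len(high) - sum(low) / len(low)
    return effect / 2


# Made-up responses for the four runs of a 2^2 design (illustration only).
x1 = [-1, 1, -1, 1]
x2 = [-1, -1, 1, 1]
y = [20.0, 30.0, 24.0, 38.0]
x1x2 = [a * b for a, b in zip(x1, x2)]  # coded column for the interaction

b0 = sum(y) / len(y)  # intercept: average of all responses
b1 = coefficient(y, x1)
b2 = coefficient(y, x2)
b12 = coefficient(y, x1x2)


def predict(a, b):
    """Evaluate the fitted model of Equation 3-3 at coded settings a and b."""
    return b0 + b1 * a + b2 * b + b12 * a * b


print(b0, b1, b2, b12)
print([predict(a, b) for a, b in zip(x1, x2)])
```

With no replicates and every term retained, the fitted model reproduces the four observed responses exactly; removing a small term such as the interaction would leave a residual at each run.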
 The error term from Equation 3-3 can be used to determine whether or not the 
model is a good fit and conforms to the basic assumption that the errors in the model 
are “…normally and independently distributed with mean zero and constant but 
unknown variance”.  These errors are analyzed by determining the residuals of the 
model (Montgomery 2008, 75). The residuals are calculated by evaluating Equation 
3-3 at each factor setting from the run-sheet and subtracting that value from the actual 
response recorded for that setting.  For experiments with replicates, this process 
would be repeated with each replicate (Montgomery 2008, 213–14).  These residuals 
are then plotted versus several comparison criteria to check the assumption of 
normality and to ensure that there are no significant factors hidden within the error 
term.  A correct model with no hidden factors in the error term will show a “fat-pencil” 
straight line in the normality test and a random scatter on the remaining tests.  
In the event of non-constant variance, several transformations are available to 
mitigate this effect.  The DesignExpert™ program automatically provides a 
recommended transform if it detects an issue (Montgomery 2008, 75–83). 
 In order to determine whether or not a factor is statistically significant, 
analysis of variance (ANOVA) using the sum of squares and mean squares is used.  
The sum of squares for each factor is calculated by squaring the contrast (i.e. the sum 
of the results of all observations at the low setting of that factor subtracted from the 
sum of the results of all observations at the high setting of that factor) and dividing by 
the number of replicates multiplied by the sum of the squares of the contrast 
coefficients (i.e. the value of the xi variable for that particular factor setting, either 
-1 or 1).  For orthogonal designs, the sum of squares of the contrast coefficients will 
be equal across all main effects and interactions.  This will not be the case for 
non-orthogonal designs.   
The total sum of squares is calculated by first summing together all of the results 
in the experiment, squaring that sum, and dividing it by the total number of 
observations.  This value is then subtracted from the result obtained by squaring each 
result in the experiment and summing the squares together.  The sum of squares of the 
error is calculated by subtracting the sum of squares of each factor and interaction 
from the sum of squares of the total; the remainder is the sum of squares of the error 
(Montgomery 2008, 70, 87–90, 208–12).  The mean square is calculated by dividing the 
sum of squares for the 
model/factor/interaction in question by the total number of degrees of freedom used 
by that model/factor/interaction (Montgomery 2008, 67, 223).  The F-statistic is then 
calculated by dividing the mean square of the model/factor/interaction in question by 
the mean square of the error of the model.  The p-value can then be calculated from 
this F-statistic to determine the statistical significance of the model/factor/interaction 
in question (Montgomery 2008, 69). 
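The ANOVA arithmetic can be sketched for a small balanced example.  The snippet below assumes a 2^2 design with two replicates per treatment and made-up observations; the function name `anova_two_factor` is hypothetical.  It stops at the F-statistic, because converting F to a p-value requires the F distribution, which the Python standard library does not provide.

```python
def anova_two_factor(cells):
    """cells maps coded settings (x1, x2) -> list of replicate observations."""
    observations = [y for values in cells.values() for y in values]
    total_n = len(observations)
    # Total sum of squares: sum of squared results minus (sum of results)^2 / N.
    ss_total = sum(y * y for y in observations) - sum(observations) ** 2 / total_n

    signs = {
        "A": lambda x1, x2: x1,
        "B": lambda x1, x2: x2,
        "AB": lambda x1, x2: x1 * x2,
    }
    ss = {}
    for term, sign in signs.items():
        # Contrast: signed sum of the treatment totals.
        contrast = sum(sign(x1, x2) * sum(values) for (x1, x2), values in cells.items())
        # Balanced design with +/-1 coefficients: replicates * sum(c^2) equals N.
        ss[term] = contrast ** 2 / total_n

    ss_error = ss_total - sum(ss.values())
    df_error = total_n - len(cells)  # observations minus treatment combinations
    ms_error = ss_error / df_error
    # Each two-level term uses one degree of freedom, so its mean square equals its SS.
    f_stats = {term: (value / 1) / ms_error for term, value in ss.items()}
    return ss, ss_error, f_stats


# Made-up data: two replicates at each of the four treatment combinations.
cells = {
    (-1, -1): [19.0, 21.0],
    (1, -1): [29.0, 31.0],
    (-1, 1): [23.0, 25.0],
    (1, 1): [37.0, 39.0],
}
ss, ss_error, f_stats = anova_two_factor(cells)
print(ss, ss_error, f_stats)
```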
 The DOE method in general, and the factorial design in particular, allows an 
experimenter not only to determine whether an effect is statistically significant, but 
also to make statements about the response of a population based on the 
sample data collected (Montgomery 2008, 213).  By determining the response of a 
system based on the levels of certain factors, an experimenter can obtain a desired 
result by setting the system only at the levels that produce it.  In a 
human context, these experiments provide a reference point for what to expect from 
personnel at various levels of the different demographics. 
3.3.5 Constraints Analysis/Risk Aversion – by Demographic 
 In Section 3.2.3, a procedure was described to compare project constraints to 
one another with no respect to the demographics.  This section looks at each project 
constraint individually and analyzes the weights assigned by each participant with 
respect to the three demographics described in Section 3.2.1.  This is done to 
determine if certain demographic traits affect how subjects view that particular 
constraint.  It also describes the process used to determine whether or not any 
demographic factors had a significant effect on a participant’s risk tolerance (i.e. 
utility).   
 For the project constraints (cost, schedule, risk, and quality), a total of 36 data 
points were available from the 36 weights obtained for each constraint calculated 
using Equation 3-1.  Thirty-eight data points were available for the risk-aversion 
analysis.  Table A-1 in the Appendix describes the parameters used to configure 
DesignExpert™ for the project constraints model.  Table A-2 in the Appendix 
describes the parameters used for the development of the risk aversion model.   
DesignExpert™ then produced a matrix of 36 model points comprised of 
various levels of the three demographic settings.  Normally this program is used to 
help inform the collection of data points, where an experimenter would set each factor 
to the level recommended by DesignExpert™, and then observe and record the 
response.  In this case, the data had already been collected, so the design matrix was 
modified to accurately reflect the demographic levels of the pool of subjects.  For 
each constraint, the weights calculated using Equation 3-1 (see Appendix A.8) were 
then entered into the “results” column of the design matrix to allow the program to 
perform its analysis.  
3.3.6 Confidence Analysis 
 As part of the Scheduling survey, each participant was asked to provide a 
“most likely” estimate for each activity, as well as their confidence in that estimate 
(see Appendix A.3 for the verbiage used in the survey).  For each project, the average 
of each participant’s confidence levels across all activities in that project was 
calculated.  This average was interpreted as the confidence in the total project 
duration for that participant.  Because some subjects provided estimates for more than 
one project, a second average consisting of all responses from that participant was 
calculated.  This was necessary to avoid “repeat measurements” which would affect 
the DOE process (Montgomery 2008, 13).  This resulted in each participant having 
one confidence value that represented their confidence in their estimating ability.   
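The two-stage averaging can be sketched as follows.  The sketch assumes that the second average is the mean of the per-project averages; pooling every activity-level response into a single mean is an alternative reading of the procedure and would give a different value when projects have different numbers of activities.  The function name and the sample values are hypothetical.

```python
from statistics import mean


def participant_confidence(confidences_by_project):
    """confidences_by_project maps project name -> per-activity confidence values."""
    # Stage 1: one confidence value per project (average over its activities).
    project_means = [mean(values) for values in confidences_by_project.values()]
    # Stage 2: one confidence value per participant (average over projects).
    return mean(project_means)


# Made-up example: a participant who estimated two projects.
example = {"Project 1": [80, 90], "Project 2": [60]}
print(participant_confidence(example))
```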
DesignExpert™ was again used in the same manner as described in Section 
3.3.4.  Design parameters used for this analysis are listed in Table A-3.  A total of 26 
data points were used in this model.  The total number of responses is different from 
the previous analysis because not all subjects who provided responses in the 
Traits/Opinions survey provided responses to the Schedule survey. 
3.3.7 Correlating the Results 
 Five questions were developed to study whether or not personality traits (e.g. 
confidence and risk tolerance) and project constraint preferences had any bearing on 
the final duration estimates provided by the participant.  The questions and rationale 
behind the questions are listed in Table 3-6.  The “C” after each question is to identify 
it as a separate list from the eleven questions in Table 3-5. 
Question # Question 
1C Are confidence levels and standard deviation negatively correlated 
(i.e. a lower confidence results in a larger standard deviation)? 
 
Rationale: Each participant was asked to give their confidence level 
on their “most likely” estimate.  It was effectively asking “how 
confident are you that the activity will take exactly the time you 
listed as your most likely estimate”.  Based on this, it would seem the 
more confident one was in the “most likely” estimate, the smaller the 
standard deviation would be since there would be less need to 
compensate for uncertainty. 
 
Method:  Each activity had its own standard deviation and 
confidence, providing a 1-to-1 comparison.  A correlation coefficient 
was calculated for each participant on each project.  The average of 
all of these coefficients was then calculated to provide one final 
value across all projects. 
2C Does higher confidence positively correlate to higher utility values 
(i.e. are people who are confident in their estimates less worried 
about risks)? 
 
Rationale: Someone who has high confidence in their estimates is 
optimistic they will succeed.  A higher utility value is someone who 
believes they can beat the odds and will succeed.   
 
Method:  Calculated the average confidence value for each 
participant for each project such that each participant had one value 
for confidence and one value for utility (per project). The average of 
all of these coefficients was then calculated to provide one final 
value across all projects. 
3C Are utility values and standard deviation negatively correlated (i.e. a 
smaller utility value means a wider standard deviation)? 
 
Rationale:  The standard deviation for a PERT average is described 
by Equation 2-2.   Using utility as an indicator of optimism, a person 
with a higher utility value is going to be more likely to believe things 
will go well.  Once they have established a “most likely” estimate, 
the “best case” and “worst case” estimates will be closer to the “most 
likely” estimate because they will not feel as much need to “hedge 
their bet”, where the “bet” is whether or not the advertised 
completion time is correct. 
 
Method:  For each participant, calculated the variance for each 
activity and summed them together.  Took the square root of the sum 
to determine the project standard deviation (Keefer and Verdini 
1993, 1086).  This resulted in one standard deviation and one utility 
value for each participant for each project, providing a 1-to-1 
comparison.  The average of all of these coefficients was then 
calculated to provide one final value across all projects. 
4C Are utility values and Te negatively correlated (i.e. does a smaller 
utility value mean a higher Te value)? 
 
Rationale:  Te is the sum of the PERT weighted average of the 
activity durations.  A smaller utility value corresponds to “risk 
aversion”.  If risk is defined as “failing to complete the project within 
the projected time”, then someone who is averse to risk will have a 
higher Te estimate to allow for things to go wrong, but still have a 
chance of finishing the project by the advertised completion time.   
 
Method:  Straight comparison.  When provided by the participant, 
each participant had only one utility value and one Te per project, 
providing the 1-to-1 comparison for each participant on each project.  
The average of all of these coefficients was then calculated to 
provide one final value across all projects. 
5C Does a higher Schedule AHP weight positively correlate to a higher 
Te value (i.e. does willingness to sacrifice schedule mean a higher 
duration estimate)? 
 
Rationale: The way the survey asked the question, a larger AHP 
value indicated more willingness to sacrifice the schedule in favor of 
preserving other project constraints (cost, risk, and quality).  
Working under the assumption that higher quality requires longer 
durations (as indicated by subjects stating they are always too rushed 
to do their jobs), someone who is ready to sacrifice the schedule for 
the sake of quality will provide higher duration estimates because 
they want to make sure they have enough time to do the work to their 
satisfaction.  Someone less willing to sacrifice the schedule will have 
lower PERT estimates because a project completed quickly keeps 
end users happy. 
 
Method:  Straight comparison.  When provided by the participant, 
each participant had only one Schedule AHP value and one Te per 
project, providing the 1-to-1 comparison for each participant for each 
project.  The average of all of these coefficients was then calculated 
to provide one final value across all projects. 
Table 3-6: Correlation Questions 
 In order to calculate a correlation coefficient for each of the comparisons 
listed in Table 3-6, two arrays were created for each project, one with the results of 
the first value in question, the other with the second value.  The Excel™ function 
“=correl(array1, array2)” was then used to determine the correlation coefficient 
within each project of the two values in question.  An average correlation coefficient 
was then calculated across all projects.  A second average correlation coefficient was 
calculated after removing all results where only two subjects provided values.  In 
those cases where only two subjects provided estimates, the results were necessarily 
either fully positively correlated or fully negatively correlated (i.e. “1” or “-1”), 
because a straight line can always be fitted exactly through two points.  When these 
results were factored into the 
average, it was believed that they were unduly influencing the results.  Removal of 
these binary data points was an effort to remove this influence.  When constructing 
the arrays used in the “correl” function, if a value was missing, it was replaced with 
the letters “BLNK” to indicate that the value was blank. 
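The per-project correlation and the two averages described above can be sketched in Python as a stand-in for the spreadsheet calculation.  The sketch computes the Pearson coefficient directly, represents a missing value as `None` and skips that pair (an assumption about how blank entries are treated), and uses made-up data; the function names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Return (Pearson r, number of usable pairs); pairs with a None entry are skipped."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy), n


def average_correlation(projects, min_subjects=2):
    """Average the per-project coefficients, keeping projects with enough subjects."""
    results = [pearson(xs, ys) for xs, ys in projects]
    kept = [r for r, n in results if n >= min_subjects]
    return sum(kept) / len(kept)


# Made-up data: one project with four subjects and one with only two.
projects = [
    ([1, 2, 3, 4], [2, 1, 4, 3]),
    ([50, 70], [9, 4]),
]
print(average_correlation(projects))                  # all projects included
print(average_correlation(projects, min_subjects=3))  # two-subject projects removed
```

In this example the two-subject project contributes a coefficient of exactly -1, which pulls the overall average well away from the value obtained from the four-subject project alone.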
 
  
3.4 Data Analysis - Application 
 The previous section characterized the subjects based on the responses 
provided to the various surveys and matched those responses to behaviors in relation 
to scheduling.  This section delves into the estimates themselves and compares the 
duration values provided to the demographics of the subjects.  This was done to 
determine the expected behavior of subjects within a particular demographic group.  
This section also describes a Bayesian aggregation method that uses the prior 
distributions of a Decision Maker (DM) and several experts to develop a single 
posterior distribution which can be used in a network schedule.   
3.4.1 Participant Behavior in Estimating Durations 
 The methods described in Section 3.3.3 looked at “fact of” differences of the 
responses provided by the subjects.  They analyzed whether or not estimates for one 
particular demographic were higher than another, but they did not analyze the 
magnitude of the difference, nor were they able to determine if one particular 
demographic had the largest effect on the responses provided.  This section uses the 
three point estimates provided by the subjects to determine whether or not there are 
any trends in the estimation practices of stakeholders based on their demographics. 
 For each participant on each project, activity estimates were added together to 
get a total BC estimate, a total ML estimate, and a total WC estimate.  It can be 
shown that the total network path duration (Te) calculated by summing the PERT 
average across all activities in a network path results in the same duration as summing 
each of the three point estimates across all activities in the path and then calculating a 
PERT average using these totals (Keefer and Verdini 1993, 1086): 
$$\sum_{a=1}^{n} \frac{BC_a + 4\,ML_a + WC_a}{6} = \frac{\sum_{a=1}^{n} BC_a + 4\sum_{a=1}^{n} ML_a + \sum_{a=1}^{n} WC_a}{6}$$                Eqn 3-4 
where “n” is the total number of activities in the path.  Note that this same principle 
does not hold when calculating the variance.   
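A quick numerical check of Equation 3-4, and of the remark about the variance, is sketched below.  The activity values are made up, and the variance formula ((WC - BC)/6)^2 is the standard PERT approximation, assumed here to be the one the dissertation gives as Equation 2-2.

```python
def pert_mean(bc, ml, wc):
    """PERT weighted average of a three-point estimate."""
    return (bc + 4 * ml + wc) / 6


def pert_variance(bc, wc):
    """Standard PERT variance approximation (assumed form of Equation 2-2)."""
    return ((wc - bc) / 6) ** 2


# Made-up (BC, ML, WC) estimates for three activities on one path.
activities = [(2, 4, 9), (1, 3, 5), (5, 6, 13)]

total_bc = sum(bc for bc, _, _ in activities)
total_ml = sum(ml for _, ml, _ in activities)
total_wc = sum(wc for _, _, wc in activities)

sum_of_means = sum(pert_mean(*activity) for activity in activities)
mean_of_sums = pert_mean(total_bc, total_ml, total_wc)
print(sum_of_means, mean_of_sums)  # the two sides of Equation 3-4 agree

sum_of_variances = sum(pert_variance(bc, wc) for bc, _, wc in activities)
variance_of_sums = pert_variance(total_bc, total_wc)
print(sum_of_variances, variance_of_sums)  # these differ
```

The mean is linear in the three estimates, so the order of summing and averaging does not matter; the variance is quadratic, so it does.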
 For this section, the ML, BC, and WC values refer to the sum of those values 
across all activities for a given participant on a given project, as opposed to the value 
for each individual activity.  To test the skew of the estimates, the separation 
between the ML and BC estimates was compared to the total separation between the 
BC and WC estimates, i.e. the sum of the (ML-BC) separation and the (WC-ML) 
separation (see Equation 3-5).    
$$\frac{ML - BC}{(ML - BC) + (WC - ML)}$$                                                          Eqn 3-5 
If the values of the two terms in the denominator of Equation 3-5 are equal, then the 
ratio created by the numerator and the denominator would be 0.5, since each term in 
the denominator is contributing half of the total and the numerator consists of the first 
term in the denominator.  If the (ML-BC) value is slightly smaller than the (WC-ML) 
value, it will contribute slightly less to the overall total, thus changing the ratio (e.g. 
45% as opposed to 50%).  Smaller values of (ML-BC) result in smaller ratios for 
Equation 3-5.   
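The behavior of Equation 3-5 can be checked with a short sketch; the function name `skew_ratio` and the sample totals are hypothetical.

```python
def skew_ratio(bc, ml, wc):
    """Equation 3-5: share of the total BC-to-WC separation that lies below ML."""
    return (ml - bc) / ((ml - bc) + (wc - ml))


print(skew_ratio(10, 20, 30))  # equal separations on both sides of ML
print(skew_ratio(10, 19, 30))  # (ML-BC) slightly smaller than (WC-ML)
print(skew_ratio(10, 12, 30))  # strongly right-skewed estimate
```

The first call gives 0.5 and the others give progressively smaller ratios, matching the interpretation in the text.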
 After computing this ratio for each participant for each project, the results 
were organized as described in Section 3.3.5.  Multiple results from the same 
participant were averaged into one value such that each participant was represented 
by a single ratio, resulting in 29 data points.  These results were then entered into 
DesignExpert™ to determine if there were any significant factors driving the results.  
Design parameters are listed in Table A-4.  
 These results, however, only told half the story.  Given the ML value, any 
number of BC and WC values could result in the expected values predicted by 
DesignExpert™.  To further narrow down the possible results, a second set of ratios 
was developed.  This time, for each participant for each project, the BC, ML, and WC 
estimates were each summed to provide a single value for each type of estimate in a 
project.  Two ratios were then calculated using Equation 3-6 
and Equation 3-7 to determine the ratio of the outlying estimate as compared to the 
sum of the outlying estimate and the ML estimate.  For Equation 3-6, a result close to 
0.5 meant that the BC estimate was relatively close to the ML estimate.  As the 
separation between the BC and ML value increased, the result of Equation 3-6 would 
get smaller and smaller.  For Equation 3-7, a result close to 0.5 meant that the WC 
estimate was relatively close to the ML estimate.  Larger values indicated a wider 
separation between the WC estimate and the ML estimate.  
$$\frac{BC}{ML + BC}$$                                               Eqn 3-6   

$$\frac{WC}{ML + WC}$$                                                 Eqn 3-7   
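The two ratios of Equation 3-6 and Equation 3-7 can be sketched as follows, using made-up project totals.  The function names are hypothetical.

```python
def bc_ratio(bc, ml):
    """Equation 3-6: BC relative to the sum of ML and BC (0.5 when BC equals ML)."""
    return bc / (ml + bc)


def wc_ratio(wc, ml):
    """Equation 3-7: WC relative to the sum of ML and WC (0.5 when WC equals ML)."""
    return wc / (ml + wc)


# Made-up project totals (BC, ML, WC) for one participant.
bc, ml, wc = 18.0, 20.0, 30.0
print(bc_ratio(bc, ml))  # just under 0.5: BC sits close to ML
print(wc_ratio(wc, ml))  # 0.6: WC sits well above ML
```

Read together, the two values locate both outlying estimates relative to ML, which the single ratio of Equation 3-5 cannot do on its own.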
 
The results from Equation 3-6 and Equation 3-7 were organized as described in the 
previous section.  Multiple results from the same participant were averaged such that 
each participant was associated with a single value, resulting in a total of 29 values.  
These values were then entered into DesignExpert™ with the design parameters 
shown in Table A-5.  
 
3.4.2 Calculation of PERT Beta parameters 
 Chapter 2 describes how the beta distribution is widely accepted as descriptive 
of the probability distribution of activity duration times (Malcolm et al. 1959, 651).  
The probability density function (pdf) of the beta distribution is described by 
Equation 3-8.   
$$f(x;\alpha,\beta) = \frac{1}{B(\alpha,\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}; \quad 0 \le x \le 1$$              Eqn 3-8 
 
where α and β are the shape parameters of the distribution and “B” is the Beta function   
(“NIST/SEMATECH E-Handbook of Statistical Methods” 2016, 1.3.6.6.17, “Beta 
Distribution” 2016). 
 
Because the pdf of the beta distribution is defined only on the interval from zero to one, 
Equation 3-9 converts “x” to “t”, where “t” is the actual time estimate provided by the 
participant (Grubbs 1962). 
t = BC + (WC – BC)x                                     Eqn 3-9 
 
Using the value of “t”, the mean (Te) and the mode (ML) can be calculated using 
Equation 3-10 and Equation 3-11 (Grubbs 1962). 
$$T_e = BC + (WC - BC)\left[\frac{\alpha}{\alpha + \beta}\right]$$                          Eqn 3-10 
 
$$ML = BC + (WC - BC)\left[\frac{\alpha - 1}{\alpha + \beta - 2}\right]$$                       Eqn 3-11 
 
Using a system of simultaneous equations, and assuming Equation 2-1 is a valid 
approximation of Te, these two equations were manipulated to solve for the two beta 
parameters, α and β using Equation 3-12 and Equation 3-13. 
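$$\alpha = \frac{-1 + 2\left(\frac{ML - BC}{WC - BC}\right)}{\left(\frac{ML - BC}{WC - BC}\right) + \left(\frac{WC - T_e}{T_e - BC}\right)\left(\frac{ML - BC}{WC - BC}\right) - 1}$$                                     Eqn 3-12 
 
$$\beta = \left(\frac{WC - T_e}{T_e - BC}\right)\alpha$$                                                    Eqn 3-13 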
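The solution of Equation 3-12 and Equation 3-13 can be sketched numerically.  The sketch assumes that Equation 2-1 is the usual PERT weighted average, Te = (BC + 4ML + WC)/6, and uses a made-up estimate; the function name is hypothetical.  When the estimate is perfectly symmetric, both the numerator and the denominator of Equation 3-12 are zero, so the sketch raises an error rather than dividing.

```python
def pert_beta_parameters(bc, ml, wc):
    """Solve Equations 3-12 and 3-13 for alpha and beta from a three-point estimate."""
    if not bc < ml < wc:
        raise ValueError("estimates must satisfy BC < ML < WC")
    te = (bc + 4 * ml + wc) / 6       # assumed form of Equation 2-1
    m = (ml - bc) / (wc - bc)         # relative position of the mode
    r = (wc - te) / (te - bc)
    denominator = m + r * m - 1
    if abs(denominator) < 1e-12:
        raise ValueError("symmetric estimate: Equation 3-12 reduces to 0/0")
    alpha = (-1 + 2 * m) / denominator   # Equation 3-12
    beta = r * alpha                     # Equation 3-13
    return alpha, beta, te


# Made-up right-skewed estimate.
bc, ml, wc = 2.0, 5.0, 12.0
alpha, beta, te = pert_beta_parameters(bc, ml, wc)
print(alpha, beta)

# Round trip through Equations 3-10 and 3-11 recovers Te and ML.
print(bc + (wc - bc) * alpha / (alpha + beta), te)
print(bc + (wc - bc) * (alpha - 1) / (alpha + beta - 2), ml)
```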
 
 
3.5 Duration Estimate Modeling and Expert Aggregation 
 The following sections describe the proposed new method for modeling 
duration estimates and aggregating those estimates based on the work of Roberts 
(Roberts 1965) as refined by Morris (Morris 1977).  In light of some of the challenges 
described in Section 2.2.3 with the PERT beta model, Section 3.5.1 provides a 
recommendation for a new model.  It also describes the method for converting the 
three baseline estimates obtained in the PERT methodology into the shape parameters 
required to describe the new model.  Section 3.5.2 describes a method for calibrating 
estimates provided by expert stakeholders.  Section 3.5.3 describes Morris’ 
aggregation method as it applies specifically to the problem of aggregating the duration 
estimates of multiple project stakeholders. 
3.5.1 Determining the Prior 
When using the opinion of one person to develop a schedule, the beta 
distribution serves as a good model for activity durations and the approximations of 
its mean and variance provide a quick way to determine the values required for 
network scheduling techniques (Malcolm et al. 1959, 659).  When multiple opinions 
are available, however, the beta distribution becomes more problematic because the 
distribution is zero outside the defined range.  The method to combine these different 
estimates will be described in more detail in Section 3.5.3, but to summarize, the 
distribution functions of each estimator (the “priors”) are evaluated across a large 
number of durations ranging from the smallest estimate to the largest estimate among 
all estimators.  The results are then multiplied together to produce a new distribution 
(the “posterior”) which is a single distribution with a new mean and variance to be 
used in the network schedule.  In the case of a beta distribution, the posterior will 
have density only between duration values shared among the estimates; all values 
outside this range will drop to zero.  To resolve this issue, a distribution was needed 
that, like the beta distribution, had a single mode and was capable of handling left- 
and right-skewed data, as well as data without any skew.  Unlike the beta distribution, 
this alternate distribution had to be defined on the entire positive real number line to 
ensure it could accommodate all possible estimates.  A Type I Generalized Extreme 
Value (GEV) distribution met nearly all of these criteria; its “maximum” form 
handled the right-skewed estimates and its “minimum” form handled the left-skewed 
estimates.  A Normal distribution could model a case with no skew (“Statistical 
Distributions” 2016). 
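The problem with bounded priors summarized above can be demonstrated with a small sketch.  It evaluates each estimator's density on a common grid, multiplies the values pointwise, and normalizes the product.  A triangular density is used purely as a simple bounded stand-in for the beta distribution, the estimates are made up, and the calibration steps of the full aggregation method are omitted; all function names are hypothetical.

```python
def triangular_pdf(t, bc, ml, wc):
    """Bounded stand-in prior: zero outside [bc, wc], peak at ml."""
    if t < bc or t > wc:
        return 0.0
    if t <= ml:
        return 2 * (t - bc) / ((wc - bc) * (ml - bc))
    return 2 * (wc - t) / ((wc - bc) * (wc - ml))


def posterior_on_grid(estimates, steps=1000):
    """Multiply the priors pointwise from the smallest BC to the largest WC."""
    lo = min(bc for bc, _, _ in estimates)
    hi = max(wc for _, _, wc in estimates)
    width = (hi - lo) / steps
    grid = [lo + i * width for i in range(steps + 1)]
    product = []
    for t in grid:
        value = 1.0
        for bc, ml, wc in estimates:
            value *= triangular_pdf(t, bc, ml, wc)
        product.append(value)
    area = sum(product) * width  # rectangle-rule approximation of the integral
    return grid, [p / area for p in product]


# Two made-up estimators whose ranges only partly overlap (BC, ML, WC).
estimates = [(2.0, 5.0, 12.0), (4.0, 6.0, 9.0)]
grid, density = posterior_on_grid(estimates)
outside = [d for t, d in zip(grid, density) if t < 3.99 or t > 9.01]
print(max(outside))  # 0.0: no density survives outside the shared range
```

If the ranges of two estimators did not overlap at all, the product would be zero everywhere and the normalization would fail, which is the extreme form of the issue that motivates a distribution with unbounded support.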
Once a new distribution was selected, the standard ML, BC, and WC 
estimates were modeled using this new distribution.  The intent was to match the 
shape of the beta distribution as closely as possible given that it has been used as a 
model for over fifty years and, as the creators pointed out, does seem to provide a 
reasonable approximation of the behavior of activity durations (Malcolm et al. 1959, 
651).  A GEV distribution is defined by three parameters:  ξ, k, and μ, where ξ 
defines the GEV Type, k is the shape parameter, and μ is the location parameter.  For 
this research ξ = 0 since a Type I distribution was used to model the data.  The μ 
parameter was defined as the participant’s ML estimate.  This left only the shape 
parameter, k, undefined.  The desire was to match the form of the beta distribution as 
closely as possible, so with the mode of the new distribution already defined, the next 
step was to define the “end points” of the distribution  (NIST 2016b, 1.3.6.6.16; 
“Generalized Extreme Value Distribution - Wikipedia” 2016).  
For most subjects’ estimates, the separation between the BC and ML estimates 
was less than the separation between the ML and WC estimates, indicating that the 
subjects were compensating more for things going wrong than assuming things would 
go right.  This group was referred to as the “pessimists” because they were planning 
for the worst.  With these characteristics, the GEV distribution was able to model the 
estimates due to the right-skew of the estimates.  In some cases, however, the 
separation between the BC and ML estimates was larger than the separation between 
the ML and WC estimates.  This group was referred to as the “optimists” because 
they appeared to be less inclined to believe things would go badly and more inclined 
to believe things would go well as indicated by their left-skewed distribution.  The 
form of the GEV distribution used for the pessimists (referred to hereafter as GEV 
Max) did not accurately model this left-skewed distribution, but by negating the “x” 
value and subtracting from one, the GEV Max right-skewed distribution could be 
converted into a left-skewed distribution (referred to hereafter as GEV Min)  
(“Generalized Extreme Value Distribution - Wikipedia” 2016, MATLAB (version 
9.1.0.441655) 2016, n. Help File). 
In cases where the separation between the ML and BC estimates equaled the 
separation between the ML and WC estimates, the Normal distribution served as the 
model.  In this case the two parameters, μ and σ, were taken directly from the 
subject’s estimates.  The first parameter, μ, was set to the ML value provided by the 
participant.  The second parameter, σ, was the separation between the ML value and 
either endpoint (e.g. WC – ML or ML – BC; either will work since the separations are 
equal) divided by 3.  This ensures that 99.7% of the density falls between the BC and 
WC estimates (Farr 2012, 29).  Between the GEV Max, GEV Min, and Normal 
distributions, nearly all subjects’ estimates could be modeled.  The one exception, 
which occurred in a few of the estimates, was the case where the participant provided 
the same estimate for both the BC and the ML values; this case cannot be modeled by 
any standard distribution curve.  In these cases, it is recommended that the decision 
maker request a new estimate where the BC value must be less than the ML value.   
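The selection rule described above can be sketched in a few lines of Python.  This is only an illustration and not the procedure used in this research; the function names `select_model` and `normal_sigma` and the tolerance argument are hypothetical, and the sketch assumes BC ≤ ML ≤ WC.

```python
import math


def select_model(bc, ml, wc, tol=1e-9):
    """Pick a prior model from best-case (BC), most-likely (ML), worst-case (WC)."""
    if not (bc <= ml <= wc):
        raise ValueError("expected BC <= ML <= WC")
    if math.isclose(bc, ml, abs_tol=tol):
        # The text recommends requesting a new estimate in this case.
        raise ValueError("BC equals ML; request a new estimate with BC < ML")
    lower, upper = ml - bc, wc - ml
    if math.isclose(lower, upper, abs_tol=tol):
        return "Normal"                      # symmetric estimates
    # Longer upper tail: "pessimist" (right-skewed); otherwise "optimist".
    return "GEV Max" if lower < upper else "GEV Min"


def normal_sigma(ml, wc):
    """Sigma for the symmetric case: one third of the ML-to-endpoint separation."""
    return (wc - ml) / 3.0
```

For example, estimates of (8, 10, 15) would be treated as a GEV Max prior, while (7, 10, 13) would be treated as a Normal prior with σ = 1.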
Because the GEV distribution is defined on (-∞, ∞), in order to mimic the 
PERT beta model it was necessary to solve for “k” such that most of the density fell 
between the BC and WC estimates.  For the GEV Max case, graphing various 
estimates showed that the PDF began to rise above the x-axis at the BC estimate when 
“k” was set such that the cumulative distribution function (CDF), evaluated at 
the BC estimate, was equal to 0.0001.  For the GEV Min case, the same behavior was 
noted at the WC estimate when “k” was set such that the CDF, evaluated at the WC 
estimate, was equal to 0.99995.  Equation 3-14 through 
Equation 3-19 provide the PDF and CDF of each distribution model (“Generalized 
Extreme Value Distribution - Wikipedia” 2016, “Normal Distribution” 2016, 
MATLAB (version 9.1.0.441655) 2016, n. Help File)  
$\mathrm{PDF}_{GEV\ Max} = \frac{1}{k}\, e^{-\frac{(x-\mu)}{k}}\, e^{-e^{-\frac{(x-\mu)}{k}}}$    Eqn 3-14

$\mathrm{CDF}_{GEV\ Max} = F(x) = e^{-e^{-(x-\mu)/k}}$    Eqn 3-15

$\mathrm{PDF}_{GEV\ Min} = \frac{1}{k}\, e^{\frac{(x-\mu)}{k}}\, e^{-e^{\frac{(x-\mu)}{k}}}$    Eqn 3-16

$\mathrm{CDF}_{GEV\ Min} = F(x) = 1 - e^{-e^{(x-\mu)/k}}$    Eqn 3-17

$\mathrm{PDF}_{Normal} = \frac{1}{\sqrt{2\sigma^2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$    Eqn 3-18

$\mathrm{CDF}_{Normal} = F(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$    Eqn 3-19
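As an illustration, Equation 3-14 through Equation 3-19 can be written directly as Python functions using only the standard library.  The function names are hypothetical, and the guards against floating-point overflow far out in the tails are an assumption of this sketch and not part of the original spreadsheet implementation.

```python
import math


def gev_max_pdf(x, mu, k):
    z = (x - mu) / k
    if z < -700.0:                      # guard: exp(-z) would overflow
        return 0.0
    return math.exp(-z - math.exp(-z)) / k          # Eqn 3-14


def gev_max_cdf(x, mu, k):
    z = (x - mu) / k
    return 0.0 if z < -700.0 else math.exp(-math.exp(-z))   # Eqn 3-15


def gev_min_pdf(x, mu, k):
    z = (x - mu) / k
    if z > 700.0:                       # guard: exp(z) would overflow
        return 0.0
    return math.exp(z - math.exp(z)) / k            # Eqn 3-16


def gev_min_cdf(x, mu, k):
    z = (x - mu) / k
    return 1.0 if z > 700.0 else 1.0 - math.exp(-math.exp(z))  # Eqn 3-17


def normal_pdf(x, mu, sigma):
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2)) / math.sqrt(
        2.0 * sigma ** 2 * math.pi)                 # Eqn 3-18


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))  # Eqn 3-19
```

Evaluating each CDF at x = μ returns e⁻¹ ≈ 0.3679 for GEV Max, 1 − e⁻¹ ≈ 0.6321 for GEV Min, and 0.5 for the Normal model.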
 
Setting the probabilities to these extreme values in both cases also helped 
ensure that most of the probability density would fall within the outlying estimates as 
the creators of PERT intended (Malcolm et al. 1959, 651).  It also meant, however, 
that unlike PERT, there was a non-zero probability that the duration could be outside 
the extreme estimates, which helps account for the cases when the estimator is wrong 
about the location of the extremes of the distribution.   
To solve for “k” in the GEV Max model, Equation 3-15 was set to equal 
0.0001 and evaluated at the BC estimate, where x = BC and μ = ML.  Rearranging the 
variables to solve for k resulted in Equation 3-20 for the GEV Max case.   
$k = \frac{BC - ML}{-\ln\left(-\ln(0.0001)\right)}$    Eqn 3-20
It was noted that there was a relationship between the k parameter and the 
separation between the ML and BC estimates.  When plotted using Excel™ with 
the “trendline” option, this relationship proved to be linear, as expected from 
Equation 3-20, whose denominator is a constant (1/ln(-ln(0.0001)) ≈ 0.45038).  For a 
GEV Max model, solving for k therefore simplifies to:  
$k = 0.45038 \cdot (ML - BC)$    Eqn 3-21  
To solve for “k” in the GEV Min model, Equation 3-17 was set to 0.99995  
and evaluated at the WC estimate, where x = WC and μ = ML.  Rearranging to 
solve for “k” resulted in Equation 3-22.   
$k = \frac{WC - ML}{\ln\left(-\ln\left(-(0.99995 - 1)\right)\right)}$    Eqn 3-22
There was also a linear relationship between the “k” parameter and the 
separation between the WC and ML values for the GEV Min model.  For this model, 
solving for k simplifies to:  
$k = 0.43613 \cdot (WC - ML)$    Eqn 3-23
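The two constants can be checked numerically.  The sketch below evaluates Equation 3-20 and Equation 3-22 directly; the function names and the default arguments are hypothetical, while the CDF targets 0.0001 and 0.99995 are those given in the text.

```python
import math

P_BC = 0.0001     # CDF value assigned at the BC estimate (GEV Max)
P_WC = 0.99995    # CDF value assigned at the WC estimate (GEV Min)


def k_gev_max(bc, ml, p=P_BC):
    """Equation 3-20: k such that the GEV Max CDF equals p at x = BC."""
    return (bc - ml) / (-math.log(-math.log(p)))


def k_gev_min(wc, ml, p=P_WC):
    """Equation 3-22: k such that the GEV Min CDF equals p at x = WC."""
    return (wc - ml) / math.log(-math.log(1.0 - p))
```

With a unit separation (ML − BC = 1 or WC − ML = 1), these functions return approximately 0.45038 and 0.43613, the slopes used in Equation 3-21 and Equation 3-23.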
Solving for “k” using the endpoint estimates meant that all three distribution 
parameters were defined.  The subjects’ beliefs regarding the probability of activity 
duration could then be described by the PDFs of the GEV Max distribution , the GEV 
Min distribution, or the Normal distribution  as seen in Equation 3-14, Equation 3-16, 
and Equation 3-18, respectively.   
Unfortunately, this meant that for the GEV Max model, while the ML and 
the BC estimates matched a beta distribution reasonably closely, the WC value did 
not necessarily match up as well (see Figure 6-1).  The opposite was true of the GEV 
Min model, where the ML and WC estimates matched, but the BC estimate did not.  
This is because the GEV distribution is determined by two parameters, so only two of 
the three values can be set.  The Normal distribution matched closely on both ends 
due to its symmetrical nature.  A discussion on this limitation of the model will be 
provided in Chapter 5. 
3.5.2 Calibrating the Experts 
 Using the method described in Section 3.5.1, a project manager can take the 
three estimates provided by any given project team member (including herself) and 
model that member’s belief about the probability of the task duration.  It has been shown, 
however, that in general, people are not good estimators of probability for many of 
the reasons discussed in Chapter 2 (Morris 1977, 682; Hubbard 2010, 57).  To 
compensate for that limitation, the prior distribution must be calibrated, just as a 
weight scale must be calibrated to compensate for any off-set in the measurement.  
Since empirical calibration data was not available for this study, a subjective 
calibration scheme was used instead.  In Morris’ work, the CDF of the expert’s prior 
distribution is run through a calibration function which is defined by the decision 
maker’s beliefs about the expert (Morris 1977, 682, 684–88).  If the decision maker 
believes the expert has understated his knowledge, the calibration function will take 
on a different form than if the decision maker believes the expert has overstated his 
knowledge.  When a decision maker believes an expert has understated his 
knowledge, it is as if the decision maker looks at the estimate and says, “I believe you 
know more about this than you think you do.  We don’t need to account for such a 
wide range and can tighten up this variance.”  Conversely, if the decision maker 
believes an expert has overstated his knowledge, the decision maker would say, “I 
don’t believe you know as much about this as you think you do.  We need to account 
for a wider range of values and widen this variance”.  Based on this belief, the 
calibration function will modify the expert’s prior distribution to alter the variance 
and make it either smaller or larger, depending on how the decision maker feels about 
the expert (Morris 1977, 688; Savage 1971, 796). 
 For this research, the beta distribution was chosen to calibrate the experts.  
The beta distribution was able to accommodate both the overstated and understated 
experts in the left-skewed, right-skewed, and symmetrical cases.  It can also handle 
the case of a fully calibrated expert.  In this case, the beta distribution is treated less 
as a distribution function and more as a filter through which the signal of the expert’s 
prior distribution is processed.  To avoid confusion, it will be referred to as the beta 
filter from this point forward.  
 Morris recommends a calibration scheme based on whether or not the expert 
will be surprised by the revealed value of the variable, in this case, the actual duration 
of the activity.  He defines surprise as the revealed value occurring below the 0.1 
fractile or above the 0.9 fractile (Morris 1977, 691). The probability the decision 
maker assigns to the likelihood of this event (i.e. the actual value falls in the tails of 
the expert’s distribution) is the basis for the shape of the calibration curve.   
 The decision maker’s belief about the expert can be modeled by altering the 
parameters of the beta filter.  The table below provides a general guideline for the 
relationship between the two beta filter parameters so that the model will reflect the 
decision maker’s belief about the expert.  
Expert’s Prior Skew          DM’s Belief about the Expert  Beta Parameter Setting
Left, Right, or Symmetrical  Fully Calibrated              α = β = 1
Left                         Understated                   α > β  ;  α > 1 ;  β > 1
Right                        Understated                   α < β  ;  α > 1 ;  β > 1
Left                         Overstated                    α < β  ;  α < 1 ;  β < 1
Right                        Overstated                    α > β  ;  α < 1 ;  β < 1
Symmetrical                  Understated                   α = β > 1
Symmetrical                  Overstated                    α = β < 1
  Table 3-7: α and β Beta Filter Parameters 
 By providing estimates, experts are indirectly providing information on their 
uncertainty about the duration (Gelman et al. 2013, 32; Morris 1977, 688).  When 
those beliefs are modeled as the “prior”, the expert indirectly informs the decision 
maker of the revealed value that would surprise him.  The lower bound of “surprise” 
is determined by setting the CDF to 0.1 and solving for “x”, where ML and “k” have 
already been set.  The upper bound of “surprise” is determined by setting the CDF to 
0.9 and again solving for “x” (Morris 1977, 683,691; Keefer and Verdini 1993, 1087).  
If the decision maker believes the expert is fully calibrated, the sum of the area under 
the two tails of the beta filter curve will be 0.2, where the area between zero and 0.1 
is 0.1 and the area between 0.9 and 1.0 is also 0.1 (Önkal et al. 2003, 181; Yates 
1990, 21–23, 69–71).  This is effectively saying that the decision maker believes that 
the likelihood of the revealed value of the duration falling in one of the tails of the 
expert’s prior is 20%.   If the decision maker believes that the expert is understating 
his knowledge, she believes the sum of the areas under the tails will be less than 0.2.  
If she feels the expert is overstating his knowledge, she believes the sum will be 
greater than 0.2 (Morris 1977, 688; Winkler 1981, 482). 
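The bounds of “surprise” follow from inverting the CDF of the expert’s prior at 0.1 and 0.9.  The sketch below does this in closed form for the two GEV models (by rearranging Equation 3-15 and Equation 3-17) and with the standard library’s `statistics.NormalDist` for the Normal model.  The function name `surprise_bounds` and its argument names are hypothetical.

```python
import math
from statistics import NormalDist


def surprise_bounds(model, mu, spread, lower=0.1, upper=0.9):
    """Return the durations at the lower and upper fractiles of the prior.

    `spread` is k for the GEV models and sigma for the Normal model.
    """
    if model == "GEV Max":
        # Invert Eqn 3-15: p = exp(-exp(-(x - mu) / k))
        quantile = lambda p: mu - spread * math.log(-math.log(p))
    elif model == "GEV Min":
        # Invert Eqn 3-17: p = 1 - exp(-exp((x - mu) / k))
        quantile = lambda p: mu + spread * math.log(-math.log(1.0 - p))
    elif model == "Normal":
        quantile = NormalDist(mu, spread).inv_cdf
    else:
        raise ValueError("unknown model: " + str(model))
    return quantile(lower), quantile(upper)
```

A revealed duration below the first returned value or above the second would “surprise” the expert in Morris’ sense.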
Once the decision maker’s beliefs about the expert are determined, the values 
of the parameters can be set.  The mode of a beta distribution (and therefore the beta 
filter) is described by Equation 3-24 (“Beta Distribution” 2016). 
$\text{Beta Filter Mode} = \frac{\alpha - 1}{\alpha + \beta - 2}$    Eqn 3-24
 For the three distribution models used (GEV Max, GEV Min, and Normal), 
when the CDFs of these distributions are evaluated at the ML value (i.e. the mode), 
the result is always the value shown in Table 3-8.  Since the intent was to only alter 
the variance of the expert’s prior distribution and not its location on the number line, 
the values of α and β were set such that solving Equation 3-24 using those values of α 
and β would result in the values below. 
Expert’s Prior Distribution Shape   Value of the Beta Filter Mode
GEV Max                             0.3679
GEV Min                             0.6321
Normal                              0.5
  Table 3-8: Beta Filter Modes 
With that constraint in place, the values of α and β were then adjusted so that, 
while still meeting the constraint of Equation 3-24, they also resulted in the desired 
likelihood of surprise (see Appendix A.10 through A.12).   
The CDF of the beta filter is described by the incomplete beta function which 
is numerically challenging to evaluate (“Beta Distribution” 2016).  As a practical 
implementation, the Excel™ function “=betadist(x,a,b)” was used, where “x” is the 
value at which the CDF of the beta distribution (in this case the beta filter) is 
evaluated, and “a” and “b” are the shape parameters of the distribution.  To calculate 
the Likelihood of Surprise (LoS), Equation 3-25 was entered into an Excel™ 
spreadsheet.  
LoS = (betadist(0.1,α,β)) + (1 – (betadist(0.9, α, β)))        Eqn 3-25 
where α and β are the shape parameters of the filter.  The first term calculates the 
likelihood that the revealed value will fall below the expert’s prior’s 0.1 fractile and 
the second term calculates the likelihood that the revealed value will fall above the 
expert’s prior’s 0.9 fractile.  Together, they represent the likelihood that the revealed 
value will surprise the expert.  Using the “Solver” application in Excel™, Equation 
3-25 was systematically set to all values from 0.01 to 0.99 in increments of 0.01 (i.e. 
likelihoods from 1% to 99%), subject to the constraint that the values of α and β had 
to satisfy Equation 3-24 when set to the values shown in Table 3-8.  Appendix A.10 
through A.12 show the different combinations of α and β that result in the desired 
likelihood.  These appendices consist of tables for all three prior distribution models, as 
well as the values of 1/B(α,β), which will be required for Equation 3-26. 
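For readers without access to Excel™, the same search can be sketched in Python.  The code below is an assumption-laden stand-in for the Solver runs and not the procedure used in this research: `beta_cdf` is a hand-written regularized incomplete beta function (a standard continued-fraction evaluation) playing the role of “betadist”, `beta_for_mode` rearranges Equation 3-24 to express β in terms of α, and `solve_alpha` uses simple bisection.  All function names are hypothetical.

```python
import math


def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        for num in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta function, i.e. the beta CDF at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def likelihood_of_surprise(a, b):
    """Equation 3-25."""
    return beta_cdf(0.1, a, b) + (1.0 - beta_cdf(0.9, a, b))


def beta_for_mode(a, mode):
    """Equation 3-24 rearranged for beta, given alpha and the target mode."""
    return 1.0 + (a - 1.0) * (1.0 - mode) / mode


def solve_alpha(target_los, mode, lo, hi, tol=1e-10):
    """Bisection on alpha; [lo, hi] must bracket the target LoS."""
    gap = lambda a: likelihood_of_surprise(a, beta_for_mode(a, mode)) - target_los
    g_lo = gap(lo)
    if g_lo * gap(hi) > 0.0:
        raise ValueError("target likelihood is not bracketed by [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        g_mid = gap(mid)
        if g_lo * g_mid <= 0.0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return 0.5 * (lo + hi)
```

For example, `solve_alpha(0.10, 0.3679, 1.0, 50.0)` searches the understated range (α > 1) for a GEV Max prior; passing the result to `beta_for_mode` gives the matching β.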
  
 With the expert’s prior and the beta filter both fully determined, the process of 
calibrating the expert is relatively straightforward.  First, the CDF of the expert’s 
prior is calculated based on their original estimates by using Equation 3-15, Equation 
3-17, or Equation 3-19 as appropriate. Once these values are determined, they can be 
processed through the beta filter described by Equation 3-26 
(“NIST/SEMATECH E-Handbook of Statistical Methods” 2016, 1.3.6.6.17, “Beta 
Distribution” 2016).  
$\Phi(F(x)) = \frac{1}{B(\alpha,\beta)}\, x^{\alpha-1} (1-x)^{\beta-1}$    Eqn 3-26
where “x” is the value of F(x) as calculated by Equation 3-15, Equation 3-17, or 
Equation 3-19, α and β are the filter parameters (previously determined), and B(α,β) is 
the beta function (used to normalize the equation) evaluated at α and β  (“Beta Function” 
2016).  This result is then multiplied by the original PDF of the expert to produce the 
calibrated prior function. 
$f_{cE_i}(x) = \left[\frac{1}{B(\alpha,\beta)}\, x^{\alpha-1} (1-x)^{\beta-1}\right] \cdot f_{E_i}(x)$    Eqn 3-27
where $f_{E_i}(x)$ is the prior distribution of the “i”th Expert, as described by either 
Equation 3-14, Equation 3-16, or Equation 3-18, depending on the original estimates 
provided by the Expert, and “x” is the duration estimate at which the function is 
evaluated (within the bracketed filter term, “x” again denotes the CDF value F(x), as in 
Equation 3-26).  The result is a calibrated expert prior that can then be combined with the 
decision maker’s prior to calculate the posterior probability of the activity duration 
(Morris 1977, 683, 685–86). 
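The calibration step can be illustrated for a GEV Max prior as follows.  This is a minimal sketch with hypothetical function names; it assumes α ≥ 1 and β ≥ 1 so that the filter stays finite when F(x) underflows to 0 or rounds to 1 in the far tails.

```python
import math


def gev_max_pdf(x, mu, k):
    z = (x - mu) / k
    return 0.0 if z < -700.0 else math.exp(-z - math.exp(-z)) / k   # Eqn 3-14


def gev_max_cdf(x, mu, k):
    z = (x - mu) / k
    return 0.0 if z < -700.0 else math.exp(-math.exp(-z))           # Eqn 3-15


def beta_filter(p, alpha, beta):
    """Equation 3-26 evaluated at p = F(x)."""
    inv_b = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return inv_b * p ** (alpha - 1.0) * (1.0 - p) ** (beta - 1.0)


def calibrated_gev_max(x, mu, k, alpha, beta):
    """Equation 3-27: filter applied to the CDF, multiplied by the PDF."""
    return beta_filter(gev_max_cdf(x, mu, k), alpha, beta) * gev_max_pdf(x, mu, k)
```

With α = β = 1 the filter equals 1 everywhere and the calibrated prior is identical to the original prior, which corresponds to the fully calibrated expert.  Because the filter is itself a density on the CDF scale, the calibrated prior still integrates to one.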
  
3.5.3 Calculating the Posterior Probability 
 With the prior probability of the decision maker defined and the expert 
defined and calibrated, a posterior probability distribution for the activity duration 
could be calculated.  In a Bayesian belief-updating model, a decision maker forms a 
prior probability distribution based on the available data (Simon French 1985, 189; 
Jeffreys 1983, 33–34).  As new data are received, this distribution is updated to 
reflect the new data  (Silver 2012, 241, 247–48).  Using a Bayesian belief-updating 
model and treating each expert’s estimation as new data, the decision maker’s belief 
about the probability of the activity duration is updated based on the information 
provided by the experts (Morris 1977, 680).  For the purposes of this research, the 
assumption is made that the decision maker and all experts are independent of one 
another in the statistical sense.   
 According to Morris’ method, assuming independence, the posterior 
probability curve is described by Equation 3-28 
$f_p(x) = c \cdot f_{DM}(x) \cdot f_{cE_1}(x) \cdot \ldots \cdot f_{cE_n}(x)$;    for i = 1…n          Eqn 3-28 
where $f_p(x)$ is the posterior distribution, “c” is a normalizing constant, $f_{DM}(x)$ is the 
decision maker’s prior probability distribution, and $f_{cE_i}(x)$ is the ith expert’s 
calibrated prior probability distribution for “n” total experts  (Morris 1977, 687).  
Evaluating each term on the right side of the equation over a range from zero to some 
value larger than the largest WC value (to compensate for the tail) will produce a 
fully defined posterior curve.  To normalize the curve, each value is then multiplied 
by the reciprocal of the area under the curve.  
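The grid evaluation and normalization just described can be sketched as below.  The function name `posterior_on_grid` and its arguments are hypothetical; the decision maker’s prior and each calibrated expert prior are passed in as ordinary Python callables, and the area under the curve is approximated by a simple sum of rectangles of width `step`, mirroring a spreadsheet calculation.

```python
def posterior_on_grid(dm_pdf, expert_pdfs, x_max, step=0.1):
    """Evaluate Equation 3-28 on a grid from 0 to x_max and normalize it.

    dm_pdf      -- callable, the decision maker's prior PDF
    expert_pdfs -- list of callables, the calibrated expert priors
    Returns (durations, normalized posterior values).
    """
    count = int(round(x_max / step)) + 1
    durations = [i * step for i in range(count)]
    raw = []
    for x in durations:
        value = dm_pdf(x)
        for expert_pdf in expert_pdfs:
            value *= expert_pdf(x)      # independence: priors multiply
        raw.append(value)
    area = sum(raw) * step              # rectangle-rule area under the curve
    if area <= 0.0:
        raise ValueError("priors do not overlap on the chosen grid")
    c = 1.0 / area                      # normalizing constant
    return durations, [c * v for v in raw]
```

The value of `x_max` should be chosen larger than the largest WC estimate so that the upper tail is captured.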
  
The full process for calculating the posterior is described below using an 
example activity, a DM, and one Expert.  The prior distributions of both the DM and 
Expert are modeled using a GEV Max distribution.  Table 3-9 explains the methods 
used to arrive at the example values shown in Table 3-10.  
          Value in the column represents:                   Calculated using:
Column A  Duration in increments of 0.1                     N/A
Column B  DM’s prior distribution PDF                       Equation 3-14, where x = value in Column A
Column C  DM’s prior distribution CDF                       Equation 3-15, where x = value in Column A
Column D  Expert’s prior distribution PDF                   Equation 3-14, where x = value in Column A
Column E  Expert’s prior distribution CDF                   Equation 3-15, where x = value in Column A
Column F  Calibration of the expert                         Equation 3-26, where α = 1.44 and β = 1.76, and x = value in Column E
Column G  Expert’s Calibrated Prior                         Column D multiplied by Column F
Column H  Normalized aggregated posterior distribution PDF  Column B multiplied by Column G and the normalizing constant, c
Column I  Aggregated posterior distribution CDF             Column H multiplied by 0.1
c         Normalizing constant                              The reciprocal of the sum of all values in Column I
Table 3-9: Calculating the Aggregated Posterior Distribution  
From Table 3-10, it can be seen that the maximum value of the posterior 
distribution (Column H) occurs at a duration of 21.8 (Column A). 
 
  Table 3-10: Example Full Process Calculations 
To avoid the algebraic quagmire of finding a general equation to describe the 
curve in Column H, it seemed best to use an approximation with known equations for 
the probability density, cumulative probability, mean and variance.  It was noted that 
plotting the results of Equation 3-28 resulted in a curve that still closely resembled 
either a GEV Max, GEV Min, or Normal distribution.  The end-points and mode were 
in new locations, but the general shape remained similar.  To determine the required 
shape, Equation 3-29 subtracted the sum of the area above the mode from the sum of 
the area below the mode. 
$S = \sum_{d=0}^{m-1} CDF_d - \sum_{d=m+1}^{n} CDF_d$    Eqn 3-29
where “d” is a duration estimate, “m” is the mode, “n” is the total number of activities 
and $CDF_d$ is the value of Equation 3-28 multiplied by the value by which the 
duration is incremented (e.g. 0.1 in this example).   
 Once this value for “S” was calculated, the following Excel™ command was 
used to quickly determine the best approximation for the posterior curve.   
=IF(AND(S < 0.13,S > -0.13),"Normal",IF(S > 0.13, "GEV Min", IF(S < -0.13,"GEV Max")))
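A Python equivalent of Equation 3-29 and the Excel™ test is sketched below.  The function names are hypothetical.  One assumption differs from the spreadsheet formula: a value of S exactly equal to ±0.13 is assigned to the corresponding GEV model here, whereas the Excel™ formula returns FALSE in that case.

```python
def shape_statistic(posterior_values, step):
    """Equation 3-29: area below the mode minus area above the mode."""
    mode_index = max(range(len(posterior_values)),
                     key=posterior_values.__getitem__)
    below = sum(posterior_values[:mode_index]) * step
    above = sum(posterior_values[mode_index + 1:]) * step
    return below - above


def classify_shape(s, threshold=0.13):
    """Choose the approximating distribution from the statistic S."""
    if -threshold < s < threshold:
        return "Normal"
    # More area below the mode means a left-skewed curve (GEV Min).
    return "GEV Min" if s >= threshold else "GEV Max"
```

For reference, an exact GEV Max curve has about 0.368 of its area below the mode and 0.632 above it, giving S ≈ -0.26, while a symmetric curve gives S ≈ 0.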
After determining the general shape of the resulting posterior distribution 
curve using the Excel™ commands just described, it was necessary to once again 
determine the parameters that defined the mean and the variance needed for 
development of a network schedule.  The new mode (i.e. ML estimate) was 
determined by finding the largest value of $f_p(x)$ from Equation 3-28.  This was 
accomplished by visually scanning the results of the spreadsheet (Column H from 
Table 3-10 in this example) with the assistance of Excel’s™ color-scaling feature.  
This value represented the peak of the posterior distribution curve. The “x” value 
from Column A (i.e. the duration) associated with that peak was set as the mode of 
the posterior distribution, which in turn defined the “μ” parameter of the GEV and 
Normal approximations.  This left only “k” or “σ”, depending on the form of the 
approximating curve, to be defined.  After some experimentation, it was determined 
that the best method for matching the GEV approximation to the posterior distribution 
as calculated using Equation 3-28 was to set Equation 3-14 (for GEV Max) or 
Equation 3-16 (for GEV Min) equal to $f_p(mode)$, which is the value of Equation 3-28 
evaluated at the mode, and solve for “k”. When evaluated at the mode, Equation 3-14 
and Equation 3-16 reduce to Equation 3-30.  
$k = 0.367879 / f_p(mode)$    Eqn 3-30 
With both “μ” and “k” defined, the GEV approximation for the decision maker’s 
posterior probability is fully characterized.  This, in turn, allows for the calculation of 
the mean and variance of the posterior probability by using Equation 3-31 through 
Equation 3-33 (NIST 2016b, 1.3.6.6.16; “Generalized Extreme Value Distribution - 
Wikipedia” 2016, “MinStableDistribution—Wolfram Language Documentation” 
2017). 
$Mean_{GEV\ Max} = \mu + k\gamma$    Eqn 3-31 
$Mean_{GEV\ Min} = \mu - k\gamma$    Eqn 3-32 
$Variance = k^2 \frac{\pi^2}{6}$    Eqn 3-33 
where “μ” is the mode of the GEV approximation, “k” is the shape parameter, and “γ” 
is Euler’s constant (approximated at 0.57721) (“Euler–Mascheroni Constant” 2016).     
In the event the closest model of Equation 3-28 is a Normal distribution, the 
same procedure is followed as for the GEV approximations, but Equation 3-30 is 
replaced by Equation 3-34, which defines the standard deviation σ (the variance is σ²), 
and the mean is equal to the mode (“Normal Distribution” 2016). 
$\sigma = \sqrt{\frac{\left(1 / f_p(mode)\right)^2}{2\pi}}$    Eqn 3-34 
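Equation 3-30 through Equation 3-34 can be collected into one helper that returns the mean and variance of the approximating distribution from the location and height of the posterior peak.  The function name `approx_moments` is hypothetical, and Euler’s constant is taken at the five-decimal value quoted in the text.

```python
import math

EULER_GAMMA = 0.57721   # approximation quoted in the text


def approx_moments(shape, mode, peak):
    """Return (mean, variance) of the approximating distribution.

    shape -- "GEV Max", "GEV Min", or "Normal"
    mode  -- duration at which the posterior peaks
    peak  -- posterior value at the mode, f_p(mode)
    """
    if shape == "Normal":
        sigma = 1.0 / (peak * math.sqrt(2.0 * math.pi))   # Eqn 3-34
        return mode, sigma ** 2
    k = math.exp(-1.0) / peak                             # Eqn 3-30
    variance = k ** 2 * math.pi ** 2 / 6.0                # Eqn 3-33
    if shape == "GEV Max":
        return mode + k * EULER_GAMMA, variance           # Eqn 3-31
    if shape == "GEV Min":
        return mode - k * EULER_GAMMA, variance           # Eqn 3-32
    raise ValueError("unknown shape: " + str(shape))
```

These two values per activity are the inputs required for the network schedule calculation.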
For a typical network schedule, the mean and variance of each activity 
distribution are sufficient to calculate the total project duration and project variance.  
In some cases, however, it may be desirable to determine a BC and WC value for the 
posterior distribution, perhaps for use in a Monte Carlo simulation (Mantel Jr. et al. 
2004, 156–60).  Because the GEV Max and Min approximations only have two 
parameters, only one of the extremes can be set, but the other can be approximated.   
For a GEV Max prior distribution, the BC estimate was used to solve for the 
shape parameter “k” by setting the CDF of the distribution to 0.0001 and evaluating 
at the BC estimate.  For the posterior distribution, the shape parameter has already 
been set as described above.  With the shape parameter set, Equation 3-20 and 
Equation 3-21 can be rearranged to solve for the BC value as seen in Equation 3-35 
or, for a simplified version, Equation 3-36.   
$BC = -k \ln\left(-\ln(0.0001)\right) + ML$    Eqn 3-35 
$BC = ML - \frac{k}{0.45038}$    Eqn 3-36 
For the GEV Min case, the WC parameter was used to solve for k by setting 
the CDF of the distribution to 0.99995 when evaluated at the WC value.  For this 
distribution model, rearranging Equation 3-22 and Equation 3-23 solves for the WC 
value as seen in Equation 3-37 and, for the simplified version, Equation 3-38.   
$WC = \ln\left(-\ln(0.00005)\right) k + ML$    Eqn 3-37 
$WC = ML + \frac{k}{0.43613}$    Eqn 3-38 
The remaining extreme estimate (WC for the GEV Max distribution and BC 
for the GEV Min) cannot be calculated since there are no remaining parameters in the 
distribution equation, but the values can be set such that the density between the BC 
and WC values is at the desired level.  The original creators of PERT intended that 
most of the density should fall between the BC and WC estimates  (Malcolm et al. 
1959, 651).  Given that 3σ of the Normal distribution comprises 99.7% of the density 
between the BC and WC estimates, it is recommended to set the remaining parameter 
for both models of the GEV distribution such that 99.7% of the density will also fall 
between the BC and WC value (to standardize across all models)  (Farr 2012, 29). 
Equation 3-39 and Equation 3-40 calculate the probability density between the 
BC and WC values for the GEV Max and GEV Min distributions, respectively.    
$\Delta_{GEV\ Max} = e^{-e^{-\frac{WC-\mu}{k}}} - e^{-e^{-\frac{BC-\mu}{k}}}$    Eqn 3-39 
$\Delta_{GEV\ Min} = \left(1 - e^{-e^{\frac{WC-\mu}{k}}}\right) - \left(1 - e^{-e^{\frac{BC-\mu}{k}}}\right)$    Eqn 3-40 
For Equation 3-39, $\Delta_{GEV\ Max}$ is set to 0.997 to force 99.7% of the density to 
fall between the BC and WC estimates.  The final term of that equation is equal to 
0.0001 based on the discussion in Section 3.5.1, so the first term must equal 
0.997 + 0.0001 = 0.9971.  Rearranging Equation 3-39 to solve 
for the WC value results in Equation 3-41  
$WC = ML - k \ln\left(-\ln(0.9971)\right)$    Eqn 3-41 
For the GEV Min case, $\Delta_{GEV\ Min}$ is once again set to 0.997 and the first term 
on the right of the equation reduces to 0.99995 based on the discussion in Section 
3.5.1.  Rearranging Equation 3-40 and solving for the BC value results in Equation 
3-42. 
$BC = ML + k \ln\left(-\ln(0.99705)\right)$    Eqn 3-42 
 
In the event of a Normal approximation, the ML value will equal the mean, 
and the BC and WC values can be found by solving Equation 3-43 and Equation 
3-44. 
$BC = ML - 3\sigma$    Eqn 3-43 
$WC = ML + 3\sigma$    Eqn 3-44 
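The endpoint calculations for all three approximations can be sketched together.  The function name `posterior_bounds` is hypothetical.  The sketch derives each endpoint directly from Equation 3-39 and Equation 3-40 by requiring that the CDF difference between WC and BC equal 0.997, with the fixed endpoint held at 0.0001 (GEV Max) or 0.99995 (GEV Min) as described in Section 3.5.1.

```python
import math

DENSITY = 0.997    # share of the density placed between BC and WC
P_BC_MAX = 0.0001  # GEV Max CDF value at BC
P_WC_MIN = 0.99995 # GEV Min CDF value at WC


def posterior_bounds(shape, ml, spread):
    """Return (BC, WC); `spread` is k for GEV shapes and sigma for Normal."""
    if shape == "GEV Max":
        bc = ml - spread * math.log(-math.log(P_BC_MAX))
        # CDF at WC must be DENSITY above the CDF at BC.
        wc = ml - spread * math.log(-math.log(P_BC_MAX + DENSITY))
    elif shape == "GEV Min":
        wc = ml + spread * math.log(-math.log(1.0 - P_WC_MIN))
        # CDF at BC must be DENSITY below the CDF at WC.
        bc = ml + spread * math.log(-math.log(1.0 - (P_WC_MIN - DENSITY)))
    elif shape == "Normal":
        bc, wc = ml - 3.0 * spread, ml + 3.0 * spread
    else:
        raise ValueError("unknown shape: " + str(shape))
    return bc, wc
```

The resulting BC, ML, and WC values can then serve as three-point inputs to a Monte Carlo simulation.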
 Ultimately, the method just described provides a means for incorporating the 
beliefs of multiple team members while still using the network scheduling methods 
that have been developed and refined over the last fifty years.  It also provides a way 
for project managers to show a basis for estimate when presenting the schedule to 
senior leadership.  Examples of this process and its results will be demonstrated in 
Chapter 7. 

Chapter 4 Results – Opinions on Scheduling Issues 
 
The previous chapter described the methods used to categorize subjects, 
gather their inputs on activity durations, and explore some of the thinking behind 
those estimates.  This chapter presents the results of the “essay” questions from the 
“Course of Action” (COA) survey and the Scheduling surveys.  Section 4.1 
consolidates the responses from the COA survey, which asked why subjects believed 
projects fall behind schedule.  Section 4.2 provides the results of the second part of 
the Scheduling survey.  These questions were related to whether or not subjects 
believed the provided task list covered all required activities, if any activities were 
missing, and if the resources provided to the project were adequate.    
4.1 COA Survey – The Results 
The following section describes participant responses to the question “why do 
projects fall behind schedule?”  Subjects were identified only by their Position 
demographic (management or technical).  Responses are organized by the ways in 
which the two groups agreed, the ways in which they disagreed (along with some 
“editorial” comments which provide further insights into the perceptions held by 
members of each group), and finally, a summary of the results.  
4.1.1 Why do projects struggle? – Agreements 
 From Part 2 of the survey found in Appendix A.5, subjects were asked why in 
general, given their professional experiences, they believed projects fell behind 
schedule.  Although the answers varied, several themes emerged among the subjects 
and even across the management/technical boundary.  In some cases, the same theme 
appeared across this boundary, but the sides took opposing views.  Below is a 
summary of the responses organized by general theme.  Note that throughout this 
section, the thoughts and opinions expressed are those of the research subjects.  Some 
phrases are taken directly from the responses of the subjects while others are 
paraphrased, but all of the ideas and concepts are derived from the anonymous 
surveys submitted by the subjects. 
 One of the themes mentioned by both management and technical subjects was 
a perceived inability to properly plan out a project.  Several subjects simply listed out 
“poor planning” or “inadequate planning” as an explanation for why projects fail, 
which in this context, is interpreted to be primarily focused on the failure to truly 
capture all activities required to complete a project. Other subjects expanded this 
definition of “poor planning” by explaining that resources (which seemed to refer 
mostly to people) were not properly managed because adequate time was not spent on 
resource allocation.  Still others went on to clarify that this failure to plan resulted in 
schedule slips because oversights in the planning phase resulted in problems during 
the execution phase which led to re-work and re-design.  Interestingly, while 
technicians listed funding as a cause of schedule slips, subjects in the management 
category did not.  The specific funding issues mentioned by the subjects were: lack of 
funding, timeliness of funding, and end-of-fiscal-year spending driving purchases 
before requirements and design were completed (i.e. a system is purchased and the 
project is planned around the already-purchased system as opposed to planning a 
project and purchasing a system to meet the needs of that project).   
Another cause of schedule slips which fell under the “poor planning” category 
was a project’s inability to deal with unforeseen circumstances.  Some subjects listed 
“unforeseen issues” as a cause while others more specifically called out weather or 
equipment failures, the latter notably mentioned only by the technicians and not by 
management.  Still others listed logistics problems or delays by organizations outside 
of the project team’s control.  Subjects in both the management and technical 
categories mentioned the need to build contingency into the schedule to handle these 
unforeseen issues.  One participant contended that most projects do not include 
contingency because those with approval authority are more likely to approve 
projects that initially show a quick completion date. 
 Another theme, closely tied to “poor planning” and mentioned mostly by 
management subjects, but also by at least two technicians, is a failure to adequately 
define requirements.  The concerns mostly centered on the fact that a project rarely 
has a complete understanding of everything that is required at the outset.  Teams 
develop requirements as they understand them, but things are complicated by 
unknowns (especially in research and development projects), changing requirements, 
and poor communication with stakeholders at the beginning of the project.  As one 
participant mentioned, the failure to fully define requirements at the beginning of the 
project hurts the project team in later project phases, as they must incorporate updated 
requirements during development and execution.  Another participant pointed out that 
project teams are sometimes hamstrung by processes that do not necessarily match 
the project and that it is sometimes impossible to fully define requirements at the 
outset of the project unless the team were to be eternally stuck in requirements 
development.  The participant pointed out that the danger of remaining in 
requirements definition too long was that technology would speed past the 
development team, who would then be stuck in an endless loop of updating 
requirements to match technological capabilities.   
 One cause mentioned multiple times by subjects in both categories was the 
belief that most project schedules are too aggressive from the start.  Both 
management and technical subjects felt there was a problem with unrealistic 
expectations/goals being placed on the project team to complete the project by a 
given date.  One management participant even went on to say that the schedules were 
aggressive because of the need to meet an already-unachievable target date and a 
technical participant stated that adequate time was not allocated from the beginning of 
the project.  One of the reasons listed by both management and technical subjects was 
that schedules were created and assumptions were made at the milestone level and 
that these milestones did not adequately describe the level of work that needed to be 
completed.  One participant went on to say that it was not possible to properly 
allocate resources using only a milestone schedule.  The participant stated that this 
allocation could only be completed with a fully realized schedule, but creation of 
detailed schedules was rarely accomplished.  Further complicating matters was the 
belief by both managers and technicians that schedules were developed and presented 
in a way that would ensure customer approval to proceed (or continue) as opposed to 
a schedule that accurately reflected how long it would take to complete the project.  
Technical subjects also believed that there was a problem with non-technical 
personnel dictating the schedule.  The belief seemed to be that these non-technical 
personnel did not have enough understanding of the details of the work and were 
therefore not able to develop accurate schedules.  These responses also suggested that 
the technical team had not had the opportunity to review these schedules before they 
were presented outside the team.   
 Probably the most frequently mentioned cause of schedule delays across both 
groups of subjects was the fact that personnel assigned to a project were pulled off the 
project prior to completion or they were not allowed to focus solely on the project at 
hand due to having several other projects that also needed attention.  Management 
subjects pointed out that when schedules are created, they are based on the known 
available resources.  When those resources are decreased, the schedule will also slip.  
Another subject pointed out that even if the resources are re-assigned to the project, 
there is a learning curve associated with getting them re-acquainted with the project 
and caught up on current progress.  Technical subjects seemed to focus more on 
resources being over-tasked.  Their contention was that there were too many activities 
required to be completed by too few people and that schedules do not account for the 
realities of a matrixed organization. 
 An interesting theme that manifested itself mostly in the technical group was 
too much interference by personnel who were not directly involved in the project.  
These subjects believed that too many people who were not directly involved in the 
project and did not have intimate knowledge of the details were able to affect the 
schedule.  One believed that the constant review process hampered progress and one 
subject went so far as to say these reviewers were putting up roadblocks that the team 
must then overcome.  Another subject pointed out that project personnel may be 
following certain project methodologies because they have been directed to by a 
higher authority, and not because it is a good fit for the project.  As mentioned before, 
one complaint was that inexperienced personnel were shrinking the schedule because 
they did not understand the full scope of the project.  This theme was also reflected in 
two of the management responses.  One believed there was a problem with the 
schedule planner being too inexperienced to know to build contingency into the 
schedule and the other believed that the weak-matrixed nature of the organization did 
not allow for a project manager to understand what other projects his/her resources 
were also assigned to.  This second issue is reflective of another issue mentioned by 
one subject: poor communication.  On a more positive note, one technical response 
did state the belief that as project managers were gaining experience, they were 
learning to manage contingency instead of simply using it for the sake of using it. 
 Several other causes were mentioned which fell outside of the major themes 
just described.  Poor estimating was mentioned by both management and technical 
subjects, with one subject stating that it is difficult to learn from mistakes because 
there were no records of how long previous projects actually took.  One management 
subject also mentioned poor execution and poor teamwork as causes for schedule 
delays.   Technical subjects brought up inadequate training on equipment and no 
concrete delivery date (which meant no accountability for the completion).  This 
second cause was further clarified to say that those who will accept the project may 
not necessarily feel any pressure to approve the work if there is no official due date 
set. 
4.1.2 Why do projects struggle? – Disagreements and Editorials 
 The previous section described several similarities between the perspectives 
of management and technical subjects regarding why schedules fall behind.  Despite 
these similarities, there were a few marked differences.  Two management subjects 
pointed out that they believed schedules were being lengthened because team 
members kept trying to improve the system in question “just a little bit more”.  One 
subject quoted that the “enemy of good is better” and believed that it was okay to try 
and optimize the system as long as the project could absorb the effort.  Otherwise, the 
subject feared that the project teams would get caught in an improvement loop.  
Another management subject stated the belief that there was too much “gold plating” 
in projects overall.  This subject specifically mentioned the customer and technical 
leads pushing to continue testing even once the system had proved operational 
capability. 
 On the other side of the fence, the technicians believed that the problem was 
on the other end.  One subject stated that [technical people] prefer to have all systems 
as close to perfection as possible and will therefore usually push for as much testing 
as the project is willing to give them.  This statement was offered without hinting 
whether or not the subject thought this was good or bad.  Other subjects, however, 
believed that the attitude of “close enough” was too prevalent and that this ultimately 
caused problems later on in the project.  One subject stated that the survey question 
itself was contradictory because if the system had “issues” then, by definition, it 
needed to be repaired.  The subject went on to say that, with a management hat on and 
without all the facts (implying that managers typically do not have all of the technical 
details), accepting the system would be acceptable given that the system was 
meeting requirements.  Another subject echoed these sentiments by saying that in the 
given scenario, one should make the system better in order to allow for system 
adjustments later on.  One management subject did state that it was important to note 
the trending behavior of the system.  This subject stated that if one could predict the 
system would be out of specification by the time it was needed, then one should take 
the extra week at the onset as opposed to accomplishing several rounds of testing and 
then having to re-do the work when the system failed.  Two technical subjects 
brought up a conflict between funding to replace/upgrade/repair the system versus the 
funding it would take to extend the project to fix the system.  Both subjects advocated 
increased funding for the systems to mitigate potential future schedule delays driven 
by equipment failures.   
 Although not related directly to why schedules fail, subjects did provide some 
insight into what seem to be prevalent attitudes between management and technical 
personnel.  A management participant stated that there can be a wide variation in 
estimates among management, functional supervisors, and project personnel.  The 
participant also stated that, based on experience, the longest estimates came from the 
project personnel (i.e. the technical people).  Speaking somewhat to that point, one of 
the technical subjects hinted that, in the past, estimates had been exaggerated with 
respect to how long it should take to complete a given task.  The participant stated 
that in an effort to curb this trend, activity durations were cut significantly, with the 
suggestion that the cuts may have been too severe.  The participant also stated that the 
durations are starting to lengthen again as project managers gain experience.   
4.1.3 Summing Up 
 To summarize the responses to the question “why do projects fall behind 
schedule”, it would appear that multi-tasking of project resources is a major offender, 
followed by overly aggressive initial schedules which are then subject to revision by 
those who are not directly involved in the day-to-day details.  There was a belief that 
technical personnel were not given adequate input into the schedule, but also that, 
when given input, their estimates were typically the longest.  Lack of funding and the 
allocation of that funding were also mentioned, as well as a debate regarding the 
criticality of certain project activities. 
4.2 Scheduling Surveys – Beyond the Duration Estimates 
 The following section provides the results of the second part of the Scheduling 
survey.  In the first part of that survey, subjects were asked to provide duration 
estimates for a list of activities that were deemed necessary to finish the project 
described in the survey.  The second part of the survey asked three questions: 
• Are the resources assigned adequate to successfully complete the project? 
• Are all activities on this list required for successful project completion? 
• Are there any activities missing from the list that are required for successful 
project completion? 
 
These questions gave each participant the opportunity to provide comments on 
human resource levels and legitimacy of the activity list provided.  This section 
analyzes those results to determine if there are any patterns that could help explain 
why estimating continues to be an ongoing challenge.   
  
4.2.1 Adequacy of Resources Assigned 
The first question asked if more personnel needed to be assigned to the 
project.  Of the 70 responses received, 48 responded that no additional personnel 
were required and that the suggested number of project personnel was adequate to 
meet the requirements.  Eight other subjects answered with a qualified no.  The 
qualifiers generally fell into one of two categories.  One qualification was that the 
number of people assigned was adequate as long as those assigned were able to 
dedicate their time primarily to the task at hand.  The other category focused more on 
project unknowns.  If training was involved or if someone was unable to work, then 
the participant would have preferred to have an extra person available.   
Five subjects believed that either one or two more project team members were 
needed to successfully execute the project.  Four subjects provided a qualified “yes”, 
with three stating that more people would be needed if the desire was to decrease the 
schedule.  One of the subjects who responded with a “yes” may have misunderstood 
the intent of the question.  When reading the clarification statements, it appeared that 
the “yes” response focused on activities that were outside the scope of the project 
phase in question.  Two subjects were undecided on whether or not more people were 
needed.  Both of these subjects mentioned overtasking personnel with too many 
competing priorities.  One participant stated that either more people needed to be 
added or those already working on the project needed more time to focus solely on 
the project.  The other stated that the roles needed to be clarified to ensure the correct 
additions or it would not matter who was added.  Three subjects did not provide a 
response. 
4.2.2 Activity Necessity 
 In the next question, subjects were asked if they believed each of the listed 
activities needed to be completed in order to successfully complete the project.  Of 
the 70 responses, 45 stated that all activities listed in the survey were required for 
successful completion.  Seven responses could be categorized as a “qualified yes” 
with reasons including that some activities were only needed due to a special case on 
that project, a statement that the participant was “still learning”, and a statement of the 
belief that one of the activities had already been completed when the survey was 
filled out.  Twelve responses indicated that not all items listed in the survey 
needed to be completed or that activities listed had already been accounted for in 
other activities.  Two subjects provided a “qualified no” with one indicating that if no 
problems were found some activities would not be necessary and the other stating that 
some activities were not technically required, but should be given a “best effort”.  
One response was categorized as “undecided” because the participant was unsure if 
another activity would be needed for testing.  Three subjects did not provide 
responses. 
4.2.3 Activity List Completeness  
 The next Scheduling Survey question asked subjects if they believed any 
activities were missing from the activity list provided.  Of the 70 responses, 28 
responded that they believed the activity list provided needed to be expanded.  Most 
subjects who responded “yes” stated that only 1 to 3 activities needed to be added, 
although some requested 4 or more.  Additional close-out activities and reviews, 
along with activities required to close out those reviews were mentioned several times 
as extra activities that needed to be accounted for in the activity list.   One participant 
stated that they were anticipating additional requirements while another stated that 
there was work to be done that, “was not explicitly called out in the schedule”.  This 
participant stated that the work was probably covered under activities that were listed, 
but that extra time was added to those activities to account for those not specifically 
called out.  Another participant stated that there were activities that were not 
essential, but that would assist the project team.   
Three subjects were listed in the “qualified yes” category, with one participant 
listing roughly 15 activities to be added, but upon closer inspection, a case could be 
made that these activities were expansions of overarching activities already listed in 
the survey.  Another participant factored in extra time for unaccounted for activities 
and also seemed to suggest that additional activities were recommended, but were not 
explicitly added.  The third “qualified yes” participant replaced one activity with 
another, removing an erroneously duplicated activity in the original survey and 
replacing it with a new activity which had been left out.   
Twenty-five subjects responded that they believed the list provided was 
adequate. Nine subjects provided answers that were categorized as a “qualified no”.  
Some subjects listed activities that needed to be completed, but they were either 
associated with a different project or out of the scope of the project phase covered by 
the survey.  One participant added activities that had previously been completed.  
Another participant listed management activities, but stated these would not cause 
additions to the schedule.  One participant in this category listed unknowns that could 
potentially increase the level of activities and another commented that there were 
complications to listed activities, but that no new activities needed to be added.  Five 
subjects did not provide a response.  Of those five, one participant’s response is 
unknown.  In the raw data consolidation spreadsheet, the response says “comment” to 
signify further clarification written elsewhere beyond a “yes”/“no” answer, but the 
comment could not be located. 
To clarify the number of responses, in some cases, projects could be broken 
up into several different independent sections that were all needed to 
achieve project success, but were not necessarily dependent on one another.  In these 
cases, subjects responsible for managing multiple sections were given one survey 
with all of the different sections, while those responsible for execution of the project 
were given only sections applicable to their assignments.  In the results, each of these 
different sections was broken out and treated as a separate project.  If a manager 
responded “no” to the overall survey on any of the questions, it was assumed that the 
answer applied to each of the different sections of that survey.  The manager’s 
response, therefore, is counted independently for each section.  For example, if a 
project had activities for Team A, Team B, and Team C, each member within that 
team would receive a survey specific to that team, but the manager would receive a 
larger survey with activities for all three teams.  If the manager then responded “no” 
to any of the questions just described, that “no” response would be tallied in each of 
the individual surveys of Team A, Team B, and Team C. 
4.2.4 Summarizing the Results 
The results from the first question, regarding whether or not more team 
members were needed, indicate that the number of assigned project personnel is not 
the major concern of project teams.  Factoring in both the “no” and “qualified no” 
responses, 80% of respondents believed the number of personnel was adequate.  Of 
the approximately 13% of those who said that more people needed to be added, all 
factor levels of all demographics were represented with the exception of the 24+ YoE 
factor and the High School LoE factor.  What does seem to be a major concern, 
however, is allowing personnel who are assigned to a project to focus primarily on 
that project.  The implication here is that if personnel who are assigned to multiple 
projects are allowed to focus entirely on one project, then extra personnel will be 
needed to backfill the other projects, or those other projects must resign themselves to 
a delayed schedule until personnel are again available.   
For the next question which asked whether or not all activities on the list 
needed to be completed, accounting for both the “yes” and “qualified yes” answers, 
74% of the subjects agreed that all activities on the list were required for successful 
project completion.  Of the 20% (“no” and “qualified no”) that believed activities 
could be removed, all factor levels of the three demographics were represented.  
Based on the data collected, it would appear that most stakeholders, regardless of 
demographic, are in reasonable agreement when presented with a list of activities for 
a project.  These results show slightly less agreement among stakeholders than was 
seen on the first question, but it does appear that disagreement on proposed activities 
is not driving the disagreements regarding schedule duration.  This, however, is only 
one half of the coin. 
The second question discussed whether or not stakeholders believed listed 
activities needed to be completed.  The final question asked whether or not any 
activities were left off of the task list.  This question seemed to be the point of most 
disagreement among stakeholders.  Factoring in both the “no” and “qualified no” 
answers, approximately 48.5% of respondents believed the provided list was adequate 
and additional activities were not needed.  On the other hand, factoring in both the 
“yes”/“qualified yes” responses, approximately 44% of respondents believed 
additional activities were needed to successfully complete the project.  All factor 
levels of all demographics were covered in both categories.  Of the three questions 
just discussed, this question represents the most likely driver behind differing 
duration estimates.  While it may not be a significant contributor, if one stakeholder’s 
assumptions regarding required activities differ from another’s within the same 
project, it could cause disagreements on how long the project should take, especially 
if those assumptions are never discussed and one set of assumptions is driving the 
schedule.    
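The percentages quoted in this section follow directly from the response counts reported in Sections 4.2.1 through 4.2.3.  The short sketch below recomputes them; the dictionary keys and the function name are illustrative labels chosen here and are not part of the survey instrument.

```python
# Response counts as reported in Sections 4.2.1-4.2.3 (70 responses per question).
# The question labels below are shorthand invented for this sketch.
RESPONSES = {
    "more_personnel_needed": {
        "no": 48, "qualified no": 8, "yes": 5, "qualified yes": 4,
        "undecided": 2, "no response": 3,
    },
    "all_activities_required": {
        "yes": 45, "qualified yes": 7, "no": 12, "qualified no": 2,
        "undecided": 1, "no response": 3,
    },
    "activities_missing": {
        "yes": 28, "qualified yes": 3, "no": 25, "qualified no": 9,
        "no response": 5,
    },
}


def combined_share(counts, answer):
    """Percent of all responses giving `answer` or its qualified form."""
    total = sum(counts.values())
    return 100.0 * (counts[answer] + counts["qualified " + answer]) / total


for question, counts in RESPONSES.items():
    yes_share = combined_share(counts, "yes")
    no_share = combined_share(counts, "no")
    print(f"{question}: yes {yes_share:.1f}%, no {no_share:.1f}%")
```

Each share is taken over all 70 responses, including the undecided and missing ones, which is why the “yes” and “no” shares for a question do not sum to 100%.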
  
 
Chapter 5 Results – Priorities, Personalities, and Predictions 
 
The previous chapter described the opinions of several subjects regarding why 
they believe projects struggle to finish on time.  It also described opinions regarding 
the activity lists from the Scheduling survey and whether or not both the activity list 
and assigned resources were adequate.  This chapter covers the remaining survey 
questions and investigates how the three demographics chosen for study, Position, 
Years of Experience (YoE), and Level of Formal Education (LoE), relate to 
personality traits such as confidence and risk aversion.  It also investigates how these 
demographics relate to schedule duration estimating practices.   
The DesignExpert™ software was used to set up factorial experiments which 
used ANOVA to determine which, if any, of the demographic factors were driving 
the results.  The software was also used to determine the expected response of a 
stakeholder with a given set of demographics based on the results seen in this study.  
Correlations between personality traits and estimating practices were also examined.  
Seventy subjects were contacted regarding participation in this study.  Of those 70, 45 
signed the consent form and agreed to participate.  Throughout the different surveys, 
the total number of respondents differed because not all subjects responded to the 
surveys and of those who did respond, not all subjects answered all questions.  The 
total number of responses for each part of each survey is listed in the sections below.   
 
  
 
5.1 “Course of Action” Survey: Is it really necessary?  
Subjects were given a “Course of Action” (COA) survey asking what they 
would do given a situation where equipment was barely within specifications, but 
repairing it would cause a schedule delay (see Appendix A.5).  The purpose of this 
question was to determine whether or not managers and technicians perceived the 
criticality of an activity differently from one another.  “Gold Plating” is defined as 
unnecessarily going above and beyond stated requirements.   
It was hypothesized that one possible cause of scheduling disagreements was 
differing perceptions of what constituted “necessary” work.  This survey 
received a total of 27 responses, 11 from those identifying as “management” and 16 
from those identifying as “technical”.  The breakdown of responses from 
management subjects and technical subjects can be seen below in Table 5-1 and 
Table 5-2, respectively.  The rows describe the action recommended by the 
participant and the columns describe whether or not the participant believed the extra 
effort was truly necessary.  For example, in Table 5-1 six subjects recommended 
taking an extra week to bring the equipment up to full operating specification, as 
opposed to leaving it in its “barely operational” status.  They considered this work a 
necessary action to mitigate the risk of system failure.  In contrast, four subjects 
believed the equipment should be left alone, believing that any troubleshooting efforts 
constituted unnecessary work (i.e. “gold plating”). 
 
 
 
  
Management               Risk Mitigation   Gold Plating
Take extra week to fix   6                 0
Leave “As Is”            1                 4
Table 5-1: Management COA Response

Technician               Risk Mitigation   Gold Plating
Take extra week to fix   8                 0
Leave “As Is”            6                 2
Table 5-2: Technician COA Response
 
When the survey was initially provided, it was believed that the subjects 
would respond in one of two ways:  take the extra week to fix the problem as a risk 
mitigation strategy or leave the system “as is” because further work would be 
unnecessarily going beyond the required work.  As can be seen from Table 5-1 and 
Table 5-2, a third option was also selected: the subjects stated the belief that extra 
time spent working on the system would constitute risk mitigation, but chose to leave 
the system “as is”.  This particular selection was favored more by the technicians than 
the managers.   
Knowing that technicians are primarily responsible for ensuring the 
equipment is functioning, it is interesting that some technicians, believing that an 
extra week would help mitigate a potential risk, would still forgo that extra week, 
thereby allowing the project to meet its schedule.  This would seem to contradict 
results of an experiment described later in this chapter regarding whether or not 
schedule should be sacrificed for the sake of quality.  It should also be noted that the 
split between risk mitigation and gold plating for the managers was 64/36 while the 
split on the technician side was 87.5/12.5.  This indicates that while managers 
somewhat disagree about what constitutes a necessary fix, technicians are more 
united in their opinions.  Although the technicians are in better agreement than the 
managers regarding opinions about the necessity of the work, both groups are nearly 
evenly divided as to whether or not to take the extra week to repair the system or 
leave it as is, with the managers showing a very slight preference to take the extra 
time to fix the system. 
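The splits quoted above can be recovered from the cell counts in Table 5-1 and Table 5-2.  The following sketch tallies each table by recommended action and by stated belief; the data layout and function name are assumptions made for illustration only.

```python
# Cell counts from Table 5-1 and Table 5-2, keyed by (action, belief).
# The key strings are shorthand invented for this sketch.
COA = {
    "management": {
        ("fix", "risk mitigation"): 6, ("fix", "gold plating"): 0,
        ("leave", "risk mitigation"): 1, ("leave", "gold plating"): 4,
    },
    "technical": {
        ("fix", "risk mitigation"): 8, ("fix", "gold plating"): 0,
        ("leave", "risk mitigation"): 6, ("leave", "gold plating"): 2,
    },
}


def share(counts, position, label):
    """Percent of subjects whose action (position 0) or belief (position 1) is `label`."""
    total = sum(counts.values())
    matching = sum(n for key, n in counts.items() if key[position] == label)
    return 100.0 * matching / total


for group, counts in COA.items():
    belief = share(counts, 1, "risk mitigation")
    action = share(counts, 0, "fix")
    print(f"{group}: risk mitigation {belief:.1f}%, take extra week {action:.1f}%")
```

Tallying by belief gives the risk mitigation versus gold plating split, while tallying by action shows how evenly each group divides between fixing the system and leaving it as is.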
 
5.2 Traits/Opinions Results  
From Section 5.1, it was seen that there are differences in the way managers 
and technicians perceive what constitutes “necessary” work.  This section expands on 
that line of inquiry.  The Traits/Opinions survey organized subjects by the 
demographics of Position, YoE, and LoE.  Beyond this basic categorization, this 
survey also gathered information about each participant’s level of risk aversion and 
also their preferences for what project constraint to sacrifice first when things go 
wrong.  The Scheduling survey collected estimates from subjects on activity durations 
across several different projects.  The results of both surveys were then compared to 
the demographic results to determine whether or not stakeholders in different 
demographics respond differently from one another.  Correlations between the 
personality traits of risk aversion/confidence and schedule estimates were then 
calculated.  For example, did a lower risk tolerance correlate with a wider standard 
deviation in the schedule estimate?  Or did rating schedule slips as a low priority 
correlate to higher estimates in the scheduling surveys?   
5.2.1 Constraints Analysis – by Constraint – The Results 
 After gathering basic demographic information, the Traits/Opinions survey 
asked subjects to rate which project constraint they would sacrifice first should a 
project start to falter.  Because it can be difficult to assign a quantitative number to a 
preference, this method gauged whether or not subjects treated each project constraint 
equally.  If they did, for each subject, the calculated weights for each constraint 
would be equal.  As can be seen in Table 5-3, however, this is not the case.  Thirty-six 
subjects responded to this survey.  After gathering the initial constraint rankings as 
described in Section 3.2.3 from the data collected from the survey in Appendix A.2, a 
weight was calculated for each constraint for each subject using Equation 3-1  (see 
Appendix A.8 for each subject’s individual constraint weights).  A higher weight 
indicates more willingness to fail at meeting that constraint for the sake of 
successfully meeting the others.  For each constraint, the weights provided by the 36 
subjects were averaged, and the results are provided in Table 5-3.  
Constraint   Average weight (μ)
Schedule     0.40
Cost         0.35
Risk         0.15
Quality      0.10
Table 5-3: Average weight per constraint
 From the table, it can be seen that the average weights for each constraint are 
not equal and that the average weights for the Schedule and Cost constraints are higher 
than the average weights for the Risk and Quality constraints.  This indicates more 
willingness to sacrifice cost and schedule for the sake of minimizing project risk or 
maintaining project quality; that is, there is a clear preference for increasing cost and 
schedule before decreasing quality or increasing risk.   
As described in Section 3.3.1, the 28 responses per constraint and Excel™ “t-
test with unequal variances” function were used to determine whether or not the 
differences seen in the averages shown in Table 5-3 were statistically significant, 
where significance is defined as p<0.05, H0: μi=μj, and “i” and “j” are the two 
constraints being compared as seen in Table 5-4. 
Constraints Compared       Difference in Average Significant?   P-value
i = Schedule; j = Cost     No                                   0.087
i = Schedule; j = Quality  Yes                                  2.63E-13
i = Schedule; j = Risk     Yes                                  4.63E-10
i = Cost; j = Quality      Yes                                  9.82E-12
i = Cost; j = Risk         Yes                                  3.86E-08
i = Quality; j = Risk      No                                   0.062
Table 5-4: Statistical Significance of Weight Differences
 Based on the p-values in the last column of Table 5-4, the null hypothesis 
(equal constraint averages) can be rejected when either the Cost or Schedule 
constraint is compared to either the Risk or Quality constraint.  It cannot be rejected, 
however, when the Cost and Schedule constraints are compared or when the Quality 
and Risk are compared.   
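For readers unfamiliar with the unequal-variance t-test, the sketch below shows how its test statistic and degrees of freedom are computed (Welch's form with the Welch-Satterthwaite approximation).  The two sample lists are hypothetical placeholders and are not the subject weights from Appendix A.8; the function name is likewise an assumption.  Converting the statistic to a p-value requires the t distribution, which Excel™ or a statistics package supplies.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_i, sample_j):
    """Return (t statistic, degrees of freedom) for an unequal-variance t-test."""
    n_i, n_j = len(sample_i), len(sample_j)
    # Squared standard error of each sample mean (sample variance / n).
    se2_i = variance(sample_i) / n_i
    se2_j = variance(sample_j) / n_j
    t_stat = (mean(sample_i) - mean(sample_j)) / sqrt(se2_i + se2_j)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    dof = (se2_i + se2_j) ** 2 / (se2_i ** 2 / (n_i - 1) + se2_j ** 2 / (n_j - 1))
    return t_stat, dof


# Hypothetical constraint weights for five subjects (illustration only).
HYPOTHETICAL_SCHEDULE = [0.45, 0.38, 0.41, 0.36, 0.40]
HYPOTHETICAL_QUALITY = [0.08, 0.12, 0.10, 0.09, 0.11]

t_stat, dof = welch_t(HYPOTHETICAL_SCHEDULE, HYPOTHETICAL_QUALITY)
print(f"t = {t_stat:.2f}, degrees of freedom = {dof:.2f}")
```

The null hypothesis H0: μi=μj is rejected when the two-sided p-value associated with this statistic falls below the chosen threshold, which is 0.05 in Table 5-4.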
From these results, it can be inferred that quality is the most important 
constraint, followed by risk (defined as minimizing the risk of project failure), then 
cost, then schedule.  This matches what was found in the GAO reports:  technical 
success is the key indicator of project success (Martin 2012, vi).  Project concerns 
such as cost and schedule increases will be forgotten as long as there is technical 
success and no one got hurt.  Based on these responses, if problems arise, the 
schedule will take a hit to ensure the overall technical quality is maximized.   
There are some issues concerning the returned data that should be considered 
when interpreting the results.  First, as described in Section 3.2.3, the weights among 
each preference should have a consistency rating of less than 0.1 in order to be 
considered valid.  Out of all of the subjects who responded to the survey, only 33% 
(12/36) were below the 0.1 threshold for consistency.  One participant ranked all 
preferences equally, thus resulting in perfect consistency, but providing little insight 
into the actual preference.  This response was removed from the data set and was not 
included in the analysis.  Another 33% (12/36) of the subjects exhibited slightly 
inconsistent behavior, with their consistencies falling above the 0.1 threshold, but 
below 0.2.  The remaining 33% of subjects (12/36) were very inconsistent among 
their preferences.  Having said that, this exercise was only a gauge meant to provide 
insight into how employees at WFF view the importance of meeting different project 
constraints.   
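To illustrate what a consistency rating measures, the sketch below computes a consistency ratio for a 4x4 pairwise comparison matrix.  It assumes the rating described in Section 3.2.3 is the consistency ratio of the Analytic Hierarchy Process, with a random index of 0.90 for four items; that assumption, the function names, and the example matrices are not taken from this study, and the weights returned here are not necessarily those produced by Equation 3-1.

```python
RANDOM_INDEX_4 = 0.90  # Saaty's random index for a 4x4 comparison matrix


def principal_eigen(matrix, iterations=100):
    """Estimate the principal eigenvector (normalized) and eigenvalue by power iteration."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        product = [sum(matrix[r][c] * weights[c] for c in range(n)) for r in range(n)]
        total = sum(product)
        weights = [value / total for value in product]
    # Estimate the eigenvalue as the average ratio (A w)_r / w_r.
    product = [sum(matrix[r][c] * weights[c] for c in range(n)) for r in range(n)]
    lambda_max = sum(product[r] / weights[r] for r in range(n)) / n
    return weights, lambda_max


def consistency_ratio(matrix):
    """Consistency ratio of a 4x4 positive reciprocal comparison matrix."""
    n = len(matrix)
    if n != 4:
        raise ValueError("this sketch only carries the random index for n = 4")
    _, lambda_max = principal_eigen(matrix)
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX_4


# A subject who rates every pair of constraints equally is perfectly consistent.
ALL_EQUAL = [[1.0] * 4 for _ in range(4)]
# A deliberately circular set of preferences (A over B, B over C, C over A).
CIRCULAR = [
    [1.0, 9.0, 1 / 9, 1.0],
    [1 / 9, 1.0, 9.0, 1.0],
    [9.0, 1 / 9, 1.0, 1.0],
    [1.0, 1.0, 1.0, 1.0],
]

print(consistency_ratio(ALL_EQUAL))
print(consistency_ratio(CIRCULAR))
```

The all-equal matrix mirrors the participant who ranked every preference equally: the ratio is zero, yet the resulting weights are identical and say nothing about which constraint would be sacrificed first.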
Another more serious issue resulted from a potential misunderstanding of 
what was asked in the survey.  Some subjects regarded the rankings as a sliding scale 
where a “1” meant Constraint A was preferred over Constraint B, a “9” meant Constraint B 
was preferred over Constraint A, and a “5” meant there was no strong preference 
either way.  When it was obvious this mistake was made (usually because the 
participant only provided a numerical ranking with no constraint associated with it), 
the participant was asked to resupply answers using the correct ranking system.  The 
error may not have been as obvious, however, if some subjects listed both a constraint 
and a numerical ranking.   
Another potential misunderstanding revolved around what exactly was meant 
by the project constraint “risk”.  When the survey was originally developed, risk was 
listed as one of the project constraints, but the meaning of the term was not clearly 
defined.  The term was intended as a somewhat broad concept covering the risk of 
project failure or risk to personal safety.  Some subjects were unsure what was meant 
by risk in that it is usually tied to one of the other project constraints (e.g. risk of 
schedule increase, risk of cost increase, etc.).  Given this ambiguity, some subjects 
may have understood the meaning behind “risk” differently from one another, which 
could have affected their rankings. 
Finally, some subjects struggled with the fact that the question asked what the 
preferred constraint was “in general”.  The participant who rated everything equally 
said that it was impossible to pick unless project specifics were known (e.g. for some 
projects, schedule is very important, for others cost is very important), so it is 
impossible to know what to sacrifice without knowing the nature of the project.   
5.2.2 Constraints Analysis – by Demographic – The Results 
Section 5.2.1 provided the results of the constraint preferences across all 
participant responses.  This section looks at each constraint individually to determine 
whether or not one of the three demographics (Position, YoE, or LoE) is a significant 
factor driving the differences seen in the responses.  If subjects in a particular 
demographic regarded all project constraints as equally important, there would be no 
statistically significant difference among the average weights for the different levels of the 
demographic for that particular constraint.  For these results, statistical significance 
was defined as p<0.1 where H0: μ1 = μ2 = … = μn, where “n” is the number of levels of the factor 
under consideration (Montgomery 2008, 70–71).  Table 5-5 consolidates the results. 
Constraint | Significant Factors? | Factor P-value
Cost | None | N/A
Schedule | Position | 0.0706
Quality* | Level of Formal Education | 0.0621
Risk* | None | N/A
*Inverse Square Root Transformation 
Table 5-5: Significant Factors per Constraint 
 
Based on the data collected, Table 5-6 shows the expected weight for the 
Schedule constraint for stakeholders in the two Position categories.  It also shows the 
expected weight for the Quality constraint for stakeholders in the four LoE categories.   
 Management Technical 
Schedule 0.33 0.43 
 
 Masters Bachelors Tech/Associates High School 
Quality 0.147 0.069 0.07 0.098 
Table 5-6: Expected weights per factor level 
 From these results, it can be expected that a technician will be more willing to 
sacrifice schedule than a manager, as indicated by the larger expected value of the 
weight.   For the Quality constraint, it can be expected that those with a Master’s 
degree will be more ready to sacrifice quality, with the remaining three categories 
significantly less willing. 
5.2.3 Utility/Risk Tolerance – The Results 
 Sections 3.2.2 and 3.3.5 described the process for obtaining the risk tolerance 
of each participant and determining which (if any) of the three demographics under 
study drove the response.  To summarize that procedure, in the survey, subjects were 
asked to provide the monetary value required to trade in their chance of winning 
$5000 for cash-in-hand.  Each participant was asked for that value against 5 different 
probabilities of winning.  There were 38 total responses for this question. 
 When plotting the results of this survey, the plots showed that most subjects 
did not exhibit risk-averse behavior across the entire spectrum of probabilities as 
would be indicated by a curve that was concave at all points (Raiffa 1968, 68).  This 
is not entirely unexpected, however: as previous research has demonstrated, utility 
curves that are both concave and convex are actually very common and reflect 
changing preferences as the risk/reward ratio varies (Raiffa 1968, 8–9, 94–95).   
 Some subjects provided responses that resulted in an extreme curvature which 
may be tied to the way the question was asked.  The questions in the survey 
mentioned only the possibility of winning $5000 and did not conclude with the 
statement, “…or of walking away with nothing.”  It is believed some subjects may 
have anchored on the possibility of winning the full $5000 total, meaning that 
anything less than that total would be considered a loss.  As shown in Chapter 2, 
most people will focus more on a possible loss than a potential gain 
(Kahneman 2011, 119, 281–84).   In this case, turning in the ticket represented a loss 
of the difference between the initial $5000 and the trade in value.  In some cases the 
prospect of that loss appeared to cause the subjects to demand a trade value greater 
than utility.  Given the questionable nature of the resulting curves and the issue 
described with the question itself, the results were simplified as described in the next 
paragraph.  The complete responses for each participant can be found in Appendix 
A.7. 
  The results below reflect only three points:  (0,0), (X, 0.5), and (5000, 1).  
These points correspond, respectively, to the minimum monetary trade value at the minimum 
probability of winning; the participant’s monetary trade-in value X at the 50% chance 
point; and the maximum possible monetary trade value at the maximum probability of 
winning.  These simplified 
curves are shown in Figure 5-1 through Figure 5-5.  In these figures, the top two 
graphs show the utility curves described above.  Each point represents a response 
from the participants in the demographic category as described in the chart title.  The 
bottom two graphs are a histogram of the frequency of a particular response.   
Figure 5-1 describes the behavior along the management/technical divide, 
Figure 5-2 and Figure 5-3 describe the responses among the different ranges of the 
YoE demographic, and Figure 5-4 and Figure 5-5 describe the responses among the 
LoE demographic.  Utility curves were created using the MATLAB™ “fit” command 
with the ‘power1’ fit option, with one exception.  One participant responded that the 
trade-in value at a 50 percent chance of winning was the full $5000 offered.  An 
acceptable fit curve was not found to match this data, so that plot simply connects the 
three points (0,0), (5000,0.5), and (5000,1).  These results can be seen in the top-left of 
Figure 5-1, the top-right of Figure 5-2, and the top-right of Figure 5-4.     
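As a rough illustration of how the three simplified points determine a curve, the sketch below uses a normalized power curve u(x) = (x / 5000)^b forced exactly through (0,0), (X, 0.5), and (5000, 1).  This closed form is an assumption made for illustration and is not the least-squares ‘power1’ fit used to produce the figures; the function names and the sample trade-in values are hypothetical.

```python
import math

PRIZE = 5000.0  # dollar value of the prize in the survey question


def power_exponent(trade_value, prize=PRIZE):
    """Exponent b such that u(x) = (x / prize) ** b satisfies u(trade_value) = 0.5."""
    if not 0.0 < trade_value < prize:
        # At trade_value == prize, log(1) == 0 and no finite exponent exists.
        raise ValueError("trade value must lie strictly between 0 and the prize")
    return math.log(0.5) / math.log(trade_value / prize)


def utility(x, exponent, prize=PRIZE):
    """Utility of a cash amount x on the normalized power curve."""
    return (x / prize) ** exponent


for trade_value in (1250.0, 2500.0, 4000.0):  # hypothetical responses
    b = power_exponent(trade_value)
    shape = "concave" if b < 1 else "convex" if b > 1 else "linear"
    print(f"X = {trade_value:>6.0f}  b = {b:.3f}  ({shape})")
```

In this simplified form, a trade-in value below $2500 gives an exponent below one (a concave, risk-averse curve), and a trade-in value equal to the full $5000 has no finite exponent, which is consistent with the one response for which no acceptable fit was found.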
  
Figure 5-1: Utility Curve – “Position” Demographic 

Figure 5-2: Utility Curve – “Years of Experience” Demographic 

Figure 5-3: Utility Curve – “Years of Experience” Demographic (continued) 

Figure 5-4: Utility Curve – “Level of Formal Education” Demographic 

Figure 5-5: Utility Curve – “Level of Formal Education” Demographic (continued)

 While differences can be seen in the responses among the different 
demographics, the model was not statistically significant and of the three 
demographics, there were no significant factors driving the results.  The effect of risk 
aversion on estimating practices is provided in Section 5.3.3 and discussed further in 
Section 7.2.4. 
5.2.4 Confidence Analysis – The Results 
 Based on the methods described in Section 3.3.6, the model describing the 
confidence estimates was significant at a p-value of 0.02 and the YoE factor was the 
significant factor driving the participant responses (p<0.02).  There were 26 data 
points used in the analysis.  Table 5-7 provides the expected confidence level for 
subjects at each of the YoE factor levels:  
Factor Level Expected Response 
0-7 0.725 
8-14 0.767 
15-23 0.858 
24+ 0.891 
Table 5-7: Expected Confidence Values 
 These results indicate that stakeholders gain confidence in their estimates as 
they progress through their careers.  Based on these results, it would appear that 
experience is driving confidence when it comes to schedule estimation.  A discussion 
on the meaning of “confidence” as it applies to a single value will be provided in 
Chapter 7. 
 
 
  
5.3 Scheduling Results  
 The previous section provided results on stakeholder project constraint 
priorities and stakeholder personality traits.  The following section provides the 
results of the Scheduling surveys and analyzes those results in the light of the 
demographics of the respondents.   
5.3.1 Network Path Standard Deviation Results 
 For each participant response within a particular project, a total project 
duration (Te) was calculated by summing the PERT average duration for each activity 
in the project.  Within each project, the standard deviation among the Te value for 
each participant was calculated.  The resulting standard deviations are provided below 
in Figure 5-6.  It was hypothesized that if stakeholders agreed about the total duration 
of a project, within that project, the standard deviation of the total time to completion 
should be zero.    
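The calculation described above can be sketched as follows.  The sketch assumes that the PERT average of Equation 2-1 is the classic (BC + 4·ML + WC)/6 weighting and that the sample standard deviation is used; the participant labels and the hour values are hypothetical.

```python
from statistics import stdev


def pert_average(best_case, most_likely, worst_case):
    """Classic PERT weighted average (assumed to match Equation 2-1)."""
    return (best_case + 4.0 * most_likely + worst_case) / 6.0


def total_duration(activities):
    """Te for one participant: sum of PERT averages over (BC, ML, WC) triples."""
    return sum(pert_average(bc, ml, wc) for bc, ml, wc in activities)


# Hypothetical estimates (hours) from three participants for one two-activity project.
project = {
    "participant_a": [(2, 4, 8), (5, 6, 10)],
    "participant_b": [(3, 4, 6), (4, 6, 9)],
    "participant_c": [(2, 5, 12), (6, 8, 14)],
}

te_values = {name: total_duration(acts) for name, acts in project.items()}

# Sample standard deviation across participants (the choice of sample rather
# than population standard deviation is an assumption).
te_spread = stdev(list(te_values.values()))
```

If every participant supplied identical estimates, the Te values would coincide and the spread would be zero, which is the hypothesized outcome under full agreement.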
 
Figure 5-6: Standard Deviation of Te (histogram; x-axis: bins of the standard deviation of Te in hours, from 8 to 80 in 8-hour steps plus “More”; y-axis: frequency, 0 to 7)

From this chart, it can be seen that for the 19 projects used in the analysis, the 
standard deviation in total duration for six projects was less than eight hours (i.e. a 
standard work day).  On the other end of the chart, the standard deviation was several 
days, with the most extreme deviation being over a month.   For projects below the 
“80 hours” bin, total project duration did not seem to be a driving factor of the 
standard deviation (i.e. longer projects did not necessarily always have a larger 
standard deviation).  For projects above the “80 hours” bin, there did appear to be a 
correlation between project length and standard deviation, but the correlation was not 
linear and did not hold for all projects.  The gap in the middle is reflective of the 
estimated durations of the individual activities.  For projects to the left of the gap, 
most activities were estimated to take ten hours or less.  On the opposite end, activity 
estimates are much higher, especially the “worst case” estimates.  With more room to 
maneuver, subjects had a wider variety of opinions regarding how long things should 
take, resulting in a wider standard deviation.   
These results show that there are differences in the estimates provided by 
stakeholders; otherwise the standard deviation within each project would be zero.  
While some of these differences are nearly insignificant (less than half of a standard 
work day), other differences are quite extreme.  The question then becomes what is 
driving these differences in estimation. 
5.3.2 Comparison Results 
 Surveys for several different types of projects were created and provided to 
subjects assigned to those projects who had agreed to participate in the study.  Thirty-nine 
individual surveys were created, along with eight “collective” surveys consisting 
of compilations of previously created surveys, where independent parts of the same 
project were combined (i.e. all parts had to be completed for the overall project to be 
successful, but the individual parts did not interact with one another).  These 
“collective” surveys were provided to management subjects who were responsible for 
more than one aspect of a project, but contained the same activity lists as the 
individual surveys.  When analyzing the data, responses from the “collective” surveys 
were broken out and associated with their original individual survey.   
One survey was left out of the final analysis because the responses of the 
subjects were so disparate in format that it would have been extremely challenging 
to compare them accurately without making several assumptions.  Taking into 
account the information just provided, of the 39 surveys created, usable responses 
were received from at least one participant on 30 of the surveys.  In the raw data 
provided in Appendix A.9, it should be noted that there are thirty-five projects listed 
with their associated estimates.  Five of these projects are “dummy projects” and are 
being used to help further mask the subjects.  Data from these projects was not used 
during the analysis process.   
 Out of the 45 subjects who agreed to participate in the study, 31 provided 
responses to the scheduling surveys.  Several subjects provided responses to more 
than one survey.  Appendix A.9 provides a summary of each of the responses of each 
participant on each survey and provides a description of how to read the consolidated 
data. 
 The data from Appendix A.9 (and some from Appendices A.7 and A.8) were 
organized using the questions listed in Table 3-5.  The results of the analysis for each 
demographic are listed below in Table 5-8.  The first column indicates the question 
number from Table 3-5. The second column indicates the number of successes (i.e. 
the total number of “Yes” answers for that particular question).  The third column 
indicates the total number of trials.  The fourth column indicates the sample 
probability of success and is calculated by dividing the second column by the third 
column (R Core Team 2014).   For example, in the Position demographic, for 
question #1, the 0.81 value indicates that 81% of the time a management participant 
provided a Te value higher than a technician.  The fifth column indicates the 
alternative hypothesis for each question, where the alternative hypothesis is either that 
the true population success rate is greater than 50% or less than 50%, depending on 
the sample success rate.  The sixth column indicates the p-value for each 
binomial test and the seventh column indicates whether or not the results are 
statistically significant (p<0.05).    
Q# | # of successes | Total Tests | Sample success rate | Alternative Hypothesis | p-value | Statistically Significant?
Position Demographic 
1 | 21 | 26 | 0.81 | μ1 > 0.5 | 0.001247 | Yes
2 | 447 | 602 | 0.74 | μ1 > 0.5 | <2.2x10^-16 | Yes
3 | 106 | 217 | 0.49 | μ1 < 0.5 | 0.393 | No
4 | 136 | 217 | 0.63 | μ1 > 0.5 | 0.0001 | Yes
5 | 55 | 217 | 0.25 | μ1 < 0.5 | 9.92x10^-14 | Yes
6 | 73 | 305 | 0.24 | μ1 < 0.5 | <2.2x10^-16 | Yes
7 | 65 | 305 | 0.21 | μ1 < 0.5 | <2.2x10^-16 | Yes
8 | 17 | 33 | 0.52 | μ1 > 0.5 | 0.5 | No
9 | 200 | 305 | 0.66 | μ1 > 0.5 | 2.897x10^-8 | Yes
10 | 62 | 152 | 0.41 | μ1 < 0.5 | 0.014 | Yes
11 | 23 | 28 | 0.82 | μ1 > 0.5 | 0.0004 | Yes
Years of Experience Demographic 
1 | 20 | 40 | 0.50 | μ1 > 0.5 | 0.5627 | No
2 | 447 | 602 | 0.74 | μ1 > 0.5 | <2.2x10^-16 | Yes
3 | 179 | 367 | 0.49 | μ1 < 0.5 | 0.3381 | No
4 | 165 | 367 | 0.45 | μ1 < 0.5 | 0.0300 | Yes
5 | 147 | 367 | 0.40 | μ1 < 0.5 | 8.171x10^-5 | Yes
6 | 183 | 482 | 0.38 | μ1 < 0.5 | 7.103x10^-8 | Yes
7 | 191 | 482 | 0.40 | μ1 < 0.5 | 3.025x10^-6 | Yes
8 | 25 | 49 | 0.51 | μ1 > 0.5 | 0.5 | No
9 | 244 | 482 | 0.51 | μ1 > 0.5 | 0.4099 | No
10 | 71 | 241 | 0.29 | μ1 < 0.5 | 7.61x10^-11 | Yes
11 | 35 | 40 | 0.88 | μ1 > 0.5 | 6.913x10^-7 | Yes
Level of Formal Education Demographic 
1 | 25 | 41 | 0.61 | μ1 > 0.5 | 0.1055 | No
2 | 447 | 602 | 0.74 | μ1 > 0.5 | <2.2x10^-16 | Yes
3 | 161 | 361 | 0.45 | μ1 < 0.5 | 0.02268 | Yes
4 | 185 | 361 | 0.51 | μ1 > 0.5 | 0.3969 | No
5 | 153 | 361 | 0.42 | μ1 < 0.5 | 0.0022 | Yes
6 | 158 | 416 | 0.38 | μ1 < 0.5 | 5.405x10^-7 | Yes
7 | 158 | 416 | 0.38 | μ1 < 0.5 | 5.405x10^-7 | Yes
8 | 18 | 45 | 0.40 | μ1 < 0.5 | 0.1163 | No
9 | 220 | 418 | 0.53 | μ1 > 0.5 | 0.1522 | No
10 | 106 | 264 | 0.40 | μ1 < 0.5 | 0.0008 | Yes
11 | 28 | 37 | 0.76 | μ1 > 0.5 | 0.001282 | Yes
Table 5-8: Binomial Analysis by Demographic  
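The p-values in Table 5-8 are consistent with a one-sided exact binomial test against a 50% success rate.  The sketch below is a minimal pure-Python version of that test, offered as an illustration under that assumption and not as the code used in the study; the function name is hypothetical, and the inputs are rows taken from Table 5-8.

```python
from math import comb


def binomial_p_value(successes, trials, alternative):
    """One-sided exact binomial p-value against a 50% success rate."""
    if alternative == "greater":
        tail = range(successes, trials + 1)
    elif alternative == "less":
        tail = range(0, successes + 1)
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    # Under p = 0.5 every outcome sequence has probability 1 / 2**trials.
    return sum(comb(trials, k) for k in tail) / 2 ** trials


# (successes, trials, alternative) taken from Table 5-8.
p_position_q1 = binomial_p_value(21, 26, "greater")
p_position_q8 = binomial_p_value(17, 33, "greater")
p_yoe_q1 = binomial_p_value(20, 40, "greater")
```

For these three rows the sketch agrees with the tabulated p-values of 0.001247, 0.5, and 0.5627 to the precision shown in the table.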
 For the results focused solely on the schedule estimates (Questions 1-7,9), the 
Position demographic has the smallest probability of occurring by chance (i.e. 
smallest p-value) with one exception (Question #2 is the same for each demographic 
and is therefore discounted):  for Question #3, which looks at the separation between 
the ML and BC values, the number of successes in the LoE demographic is 
statistically significant while the others are not.   
The results for the remaining questions (Question 8,10-11) were not as clear-
cut.  These results were based on answers from the Traits/Opinions survey and the 
confidence estimates from the Scheduling Survey.  These results show that there are 
no significant factors driving risk-aversion in stakeholders among the different 
demographics.  This general result matches the results achieved using DOE, but 
the p-values seen using this binary yes/no method do not correspond to the 
significance order seen in DOE.   DOE showed that the LoE demographic was the 
factor with the least effect on the results provided by the subjects.  These binary 
comparisons would indicate it is the largest (although still not statistically 
significant).  With respect to confidence, the Years of Experience demographic 
produced the most significant results, which correlates with the results found using 
DOE with the actual estimates.  When determining readiness to sacrifice the schedule 
for other project constraints, the Years of Experience demographic once again 
exhibited the most significant results.  These results differ from those found using 
DOE where the Position demographic was determined to be the most significant 
factor driving the results. 
Across all demographics, for Question #2, when examining the separation 
between the ML and BC estimates versus the separation between the ML and WC 
estimates, nearly 75% of subjects are allowing more time for things to go wrong than 
they hope that things will go right.  This result correlates with the literature that states 
most people fear loss more than they appreciate gain (Kahneman 2011, 281–84).  
These results indicate that subjects were compensating more for unknowns by 
providing a larger WC estimate than they were assuming things would go well which 
would be indicated by a small BC estimate.   
5.3.3 Correlation Results 
 One objective was to determine whether or not certain demographics 
exhibited particular traits and, if so, whether those traits had an effect on project duration estimates.   
The results described in previous sections of this chapter have discussed some 
different characteristics such as risk aversion and project constraint preferences.  This 
section provides the results for the correlation questions described in Section 3.3.7.  In 
Table 5-9, the third column displays the correlation coefficient calculated using all available data.  The fourth 
column displays the coefficient if only projects with three or more subjects were included.   
Correlation Question | Correlation Factors | Correlation Coefficient (all projects) | Correlation Coefficient (projects with 3 or more subjects)
QC1 | Confidence and standard deviation negatively correlated | -0.30 | N/A*
QC2 | Confidence and Utility positively correlated | 0.1 | 0.14
QC3 | Utility values and standard deviation negatively correlated | 0.15 | -0.21
QC4 | Utility and Te negatively correlated | -0.04 | -0.29
QC5 | AHP and Te positively correlated | 0.02 | -0.10
* In this case, a correlation coefficient was calculated for each participant within a project.  These correlation coefficients were then averaged across all projects to provide the value shown.  Because the coefficient was calculated per participant and not per project, there was no case where only two values were used in the correlation.  
Table 5-9: Correlation Results 
  
 Based on the results of the table above, using the results from the fourth 
column where only projects with three or more subjects were considered, the 
following conclusions were drawn as shown in Table 5-10: 
Question # Conclusion 
QC1 There is a weak negative correlation between subjects having a larger 
standard deviation and a lower confidence.  This indicates that project 
stakeholders who are less confident in their ML estimate will probably 
provide a wider range between their BC and WC values to 
compensate for that uncertainty.   
QC2 There is a very weak positive correlation between confidence in the 
ML estimate and Utility.  This would indicate that level of risk 
aversion does not significantly affect confidence levels regarding the 
ML estimate.  This confidence level may be more driven by 
familiarity with the project as opposed to an overarching personality 
trait. 
QC3 There is a weak negative correlation between Utility values and 
standard deviation.  This would indicate that personnel who exhibit 
risk-averse behavior will manifest that behavior in a scheduling 
context by compensating for the unknown with a wider range of 
possible activity completion times.  This wider range increases the 
probability that the actual time will fall somewhere within the 
provided estimate, thus mitigating the risk of failing to provide a good 
estimate. 
QC4 There is a weak negative correlation between Utility and Te.  This would 
indicate that personnel who exhibit risk-averse behavior in general 
will manifest that behavior in a scheduling context by compensating 
for the unknown with a higher Te.  This behavior increases the chance 
that the final completion time will fall below the expected value as 
calculated by summing activity times using the PERT average.   
QC5 There is a very weak correlation between willingness to sacrifice 
schedule (as measured by the AHP weight) and Te.  It also indicates 
that what little correlation exists is negative.  This indicates that 
willingness to sacrifice schedule in the event of problems on the 
project does not significantly affect the initial Te estimate.  It also 
hints that those willing to sacrifice schedule first are providing smaller 
Te estimates.   
 
Table 5-10: Correlation Conclusions 
 
5.3.4 Data Collection Challenges 
When collecting the data, some challenges arose which may have affected the 
estimates provided.  Every attempt was made to gather inputs prior to the beginning 
of project execution such that the estimates provided were true estimations and not 
after-the-fact reconstructions of the actual events.  Some subjects provided estimates 
a day or two after the project started (cells highlighted in gray in Appendix A.9), but 
the estimates are included because these were all management subjects and it is 
believed that very little information about activity completion had been reported by 
the time the estimate was provided.  In many cases, the projects under consideration 
had already been planned out by the time the estimates were gathered.  Subjects were 
asked to provide estimates based on what they would recommend if they were 
completely in charge, but the already-planned timelines may have affected the 
estimates.  Another factor which may have affected the outcome was the availability 
of resources on the project (i.e. percentage of time project personnel were allocated to 
work on the given project).  Some project surveys stated that assets should be 
considered to be allocated at 100%, so these should not be an issue.  For other 
projects, the assumption of allocated time was less than 100%.  Some respondents 
may have provided an estimate based on the provided availability, but others for that 
same survey may have assumed 100% availability of all resources.  These differences 
could have affected the total durations provided for each activity.  In rare instances, 
subjects provided BC estimates that were larger than the ML estimates.  These 
activity estimates were removed from the data set.  If a PERT duration could not be 
calculated for any activity in a project, that activity was removed from the Te 
summation for all subjects in an effort to standardize the number of activities used in 
the summation. 
 
5.4 Predicting Te 
 The previous section discussed how demographics can affect personality traits 
and even how one estimates activity durations.  This section compares the results of 
the duration estimates based on the demographics of the subjects who provided the 
estimates.  This was done not only to determine which demographic drives the 
response, but also to predict the future estimates of stakeholders belonging to that 
particular demographic.  If a project manager can only get one estimate from a 
stakeholder, the results of this section will allow her to calculate the remaining two 
estimates needed to determine a PERT average, as long as she knows which 
demographic category the stakeholder belongs to.  
5.4.1 Worst-Case Estimate as Related to Most Likely 
 The first part of Section 3.4.1 described the method for studying the skew of a 
participant’s prior distribution, assuming a PERT beta model.  If subjects accounted 
equally for things going well and things going poorly, the distribution would have no 
skew, indicating that the separation between the ML and WC estimates should have 
been equal to the separation between the ML and BC estimates.  With the BC and 
WC estimates being equidistant from the ML estimate, performing Equation 3-5 on 
the estimates should result in a value of 0.5 because the numerator will always be half 
of the denominator.  If the result of Equation 3-5 was less than 0.5, it would indicate a 
positive skew, where the smaller the value, the larger the skew.  A value greater than 
0.5 indicates a negative skew, where a larger value indicates a larger skew.  After 
performing Equation 3-5 on the estimates from each participant, consolidating, and 
analyzing the data, it was determined that the significant factor affecting these results 
was the Position demographic (p < 0.015).  It was also seen that, based on the 
calculated expected value, both managers and technicians exhibited some level of 
positive skew in their estimates.  The predicted results for the Position demographic 
are listed below in Table 5-11. 
  
Factor Level | (ML-BC) / ((ML-BC) + (WC-ML)) | 1 - (ML-BC) / ((ML-BC) + (WC-ML))
Management | 0.3882 | 0.6118
Technician | 0.293 | 0.707
Table 5-11: Separation Weight Ratio 
 These results show that management subjects exhibit a roughly 40/60 split in 
the separation between the ML and BC estimates and the ML and WC estimates, 
respectively.  Technicians, on the other hand, exhibit a roughly 30/70 split.  This 
would indicate that, in general, technicians have a higher positive skew than their 
manager counterparts and are compensating for what could go wrong more than the 
managers. 
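The separation ratio discussed above can be sketched for a single set of estimates as follows.  The sketch assumes that Equation 3-5 is the ratio shown in the header of Table 5-11; the function names and the hour values are hypothetical.

```python
def separation_ratio(best_case, most_likely, worst_case):
    """Share of the BC-to-WC range that lies below the ML estimate."""
    lower = most_likely - best_case
    upper = worst_case - most_likely
    if lower < 0 or upper < 0 or lower + upper == 0:
        raise ValueError("estimates must satisfy BC <= ML <= WC with BC < WC")
    return lower / (lower + upper)


def skew_direction(ratio):
    """Interpret the ratio: below 0.5 is positive skew, above 0.5 is negative."""
    if ratio < 0.5:
        return "positive"
    if ratio > 0.5:
        return "negative"
    return "none"


# Hypothetical activity estimates in hours: (BC, ML, WC).
symmetric = separation_ratio(8, 10, 12)   # equal separations on both sides
long_tail = separation_ratio(10, 17, 31)  # WC much farther from ML than BC
```

The symmetric case returns exactly 0.5 (no skew), while the long-tail case returns one third, a stronger positive skew than the 0.3882 expected for managers but a weaker one than the 0.293 expected for technicians.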
 This result tells only part of the story, however, since many different estimates 
could combine to produce these ratios.  To narrow down the possible values, 
Equation 3-6 and Equation 3-7 were applied to each participant’s estimates, and the results were 
consolidated and analyzed.  Once again, it was shown that the “Position” 
demographic was the significant factor driving the results which are summarized 
below in Table 5-12. 
Result | Model Significant? | Model P-value | Significant Factor | Factor P-value
BC/(ML+BC) | No | 0.194 | N/A | N/A
WC/(ML+WC) | Yes | 0.048 | Position | 0.048
Table 5-12: Outlier Weight Significant Factors 
 Given that the “WC” ratio was significant, the expected response for the two 
Position demographic levels are listed below in Table 5-13. 
Demographic WC/(ML+WC) =  
Management 0.5577 
Technician 0.6315 
Table 5-13: Outlier Weight Ratio 
  
  
5.4.2 Expanding the Results – Te Assessment 
 The results from Table 5-11 through Table 5-13, together with Equation 3-5, Equation 3-6 
and Equation 3-7, resulted in the system of simultaneous equations listed below for 
the management demographic:  
WC / (ML + WC) = 0.5577                                                 Eqn 5-1   

(ML - BC) / ((ML - BC) + (WC - ML)) = 0.3882                                       Eqn 5-2 
 
Solving the first equation for the WC value in terms of the ML value, and then 
substituting that term for the WC value in the second equation, provided the  
following results for the management case:  
WC = 1.2609*(ML)                                   Eqn 5-3  
 
BC = 0.8345*(ML)                                        Eqn 5-4 
 
For the technicians, the following equations were used: 
WC / (ML + WC) = 0.6315                                                        Eqn 5-5 
(ML - BC) / ((ML - BC) + (WC - ML)) = 0.293                                            Eqn 5-6 
 
This resulted in the following values for the technician demographic: 
WC = 1.7137*(ML)                          Eqn 5-7 
 
  
BC = 0.7042*(ML)                                         Eqn 5-8 
 From these equations, without knowing anything except the demographic of 
the stakeholder and the ML estimate of an activity from that stakeholder, it is possible 
to make a reasonable assumption about the BC and WC values.  These 
values also provide some insight as to what in the estimate is driving the higher 
duration times.  Based on these ratios, assuming both a technician and a manager 
provide the same ML value, the expected value of the activity duration as calculated 
by Equation 2-1 will always result in the technician having a higher estimate than the 
manager.  These results also indicate that the expected standard deviation as 
calculated using Equation 2-2 will be larger for a technician than for a manager. 
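
The substitution described in this section can be checked numerically.  The sketch below solves the two ratio equations with ML normalized to 1, and then applies the classic PERT mean (BC + 4·ML + WC)/6 and standard deviation (WC - BC)/6, which are assumed here to correspond to Equation 2-1 and Equation 2-2.  The function names are hypothetical.

```python
def bc_wc_multipliers(wc_ratio, separation_ratio):
    """Solve WC/(ML+WC) = wc_ratio and (ML-BC)/(WC-BC) = separation_ratio for ML = 1.

    The denominator (ML-BC) + (WC-ML) simplifies to WC - BC.
    """
    wc = wc_ratio / (1.0 - wc_ratio)
    bc = (1.0 - separation_ratio * wc) / (1.0 - separation_ratio)
    return bc, wc


def pert_mean_and_std(bc, ml, wc):
    """Classic PERT formulas, assumed to correspond to Equations 2-1 and 2-2."""
    return (bc + 4.0 * ml + wc) / 6.0, (wc - bc) / 6.0


# Ratios taken from Table 5-11 and Table 5-13.
manager_bc, manager_wc = bc_wc_multipliers(0.5577, 0.3882)
technician_bc, technician_wc = bc_wc_multipliers(0.6315, 0.293)

manager_mean, manager_std = pert_mean_and_std(manager_bc, 1.0, manager_wc)
technician_mean, technician_std = pert_mean_and_std(technician_bc, 1.0, technician_wc)
```

The multipliers agree with Eqn 5-3, Eqn 5-4, Eqn 5-7 and Eqn 5-8 to within rounding, and for a common ML value both the PERT mean and the PERT standard deviation come out larger for the technician than for the manager.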
 
5.4.3 Duration Estimate Skew 
The results from Section 5.4.2 were calculated based on the summation of 
each of the estimates for each project.  From these results, at the project level, the 
relation of the BC, ML, and WC estimates typically resulted in a positive skew.  
When looking at each individual activity estimate and performing a simple 
comparison of the separation between the ML and BC values and the ML and WC 
values, the results tell a slightly different story.  Using the equation (WC-ML) – (ML-
BC) on each individual activity estimate, the following results were obtained:   
 
 
 
  
Result of (WC-ML) - (ML-BC) | < 0 (Negative Skew) | = 0 (No Skew) | > 0 (Positive Skew) | Total
Total | 50 | 105 | 447 | 602
Management | 32 | 71 | 201 | 304
Technical | 18 | 34 | 246 | 298
Table 5-14: Skew Results 
  
From Table 5-14, it can be seen that, across all subjects, most 
estimates had a positive skew.  It can also be seen, however, that 
several estimates had no skew at all.  In some rare cases, estimates even had a 
negative skew, indicating there was more uncertainty about the BC estimate than the 
WC estimate.  When these numbers were broken down into the two Position 
demographic factors, it can be seen that the technicians heavily favored the positively 
skewed distributions.  Managers also favor this distribution, but they are more likely 
to provide estimates resulting in either no skew or negative skew. 
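
The comparison between the two Position levels is easier to see as proportions of each row of Table 5-14.  The short sketch below performs that arithmetic; the counts are taken from the table, and the variable and function names are illustrative.

```python
# Counts of activity estimates from Table 5-14: (negative, none, positive).
skew_counts = {
    "Management": (32, 71, 201),
    "Technical": (18, 34, 246),
}


def skew_shares(counts):
    """Convert (negative, none, positive) counts to fractions of the row total."""
    total = sum(counts)
    return tuple(count / total for count in counts)


shares = {group: skew_shares(counts) for group, counts in skew_counts.items()}
for group, (negative, none, positive) in shares.items():
    print(f"{group:<10}  negative {negative:.1%}  none {none:.1%}  positive {positive:.1%}")
```

Roughly 83% of the technical estimates are positively skewed, compared with roughly 66% of the management estimates.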
  
  
 
 
Chapter 6 Results – Aggregating the Estimates 
 
The previous chapter described how stakeholders from different demographic 
categories respond differently from one another with respect to project constraint 
preferences and schedule estimating practices.  It also showed how stakeholders differ 
in personality traits, even if there was not one specific demographic driving those 
differences.  The correlation analysis showed that these different personality traits did 
appear to have some bearing, however slight, in how stakeholders estimated activity 
durations.  Given these results and the decision analysis literature which shows how 
biases and perceptions can affect assessments of the unknown, this chapter describes 
a method which allows a project manager to aggregate all of the duration estimates 
provided by the team into one final estimate that can be used in a network schedule. 
 
6.1 Determining the Prior 
 Section 3.5.1 described the method for converting the estimates provided by 
project team members into a probability distribution.  In Bayesian statistics, this 
distribution is referred to as the “prior” because it is based solely on the individual’s 
prior state of knowledge.  This section will step through the method described in 
Section 3.5.1 using example estimates and compare those results to what would have 
been derived using the PERT beta distribution.   
  
 For this process example, it is assumed that a project manager, also known as 
the Decision Maker (DM), is developing a schedule and has asked for inputs from 
three other people: a fellow manager who has worked several similar projects in the 
past and two technicians who are currently assigned to the project.  These three 
people will be referred to as Expert #1 (manager), Expert #2 (technician), and Expert 
#3 (technician).  Table 6-1 shows the estimates (in hours) for one particular activity in 
this project and the resulting mean and standard deviation.  In Table 6-1, k is the 
shape parameter of the GEV distribution (Max or Min) as calculated using Equation 
3-20 and Equation 3-22, the location parameter is determined by the ML estimate, 
and Delta is the difference between the GEV CDF evaluated at the WC value and the 
GEV CDF evaluated at the BC value.  For Expert #3, k is replaced by σ, the standard 
deviation of a Normal distribution, and Delta is the standard value of 3σ for a Normal 
distribution.   The mean for the GEV Max and GEV Min case were calculated using 
Equation 3-31 and Equation 3-32, respectively, and the standard deviation was 
calculated by taking the square root of Equation 3-33. 
            ML   BC   WC     Type      k or σ    Delta   Mean    Std Dev
DM          17   10   31     GEV Max   3.15269   0.988   18.82   4.04
Expert #1   25   13   51     GEV Max   5.40461   0.992   28.12   6.93
Expert #2   15   8    19.5   GEV Min   1.96259   0.972   13.87   2.52
Expert #3   20   11   29     Normal    3         0.997   20      3
Table 6-1: Prior Distributions 
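
As a rough cross-check of Table 6-1, the sketch below recomputes the Mean, Std Dev, 
and Delta columns for the three GEV rows.  It assumes that the GEV Max and GEV Min 
models are the Type I (Gumbel) forms with the ML estimate as the mode and k as the 
scale, and that Equation 3-31 through Equation 3-33 reduce to the standard Gumbel 
moment formulas.  The function names are illustrative and are not part of the method 
in Section 3.5.1. 

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def gumbel_cdf(x, mode, k, kind):
    """CDF of a Type I extreme value model; kind is 'max' or 'min' (assumed forms)."""
    z = (x - mode) / k
    if kind == "max":
        return math.exp(-math.exp(-z))
    return 1.0 - math.exp(-math.exp(z))


def prior_summary(ml, bc, wc, k, kind):
    """Return (mean, std_dev, delta) for one GEV row of Table 6-1."""
    sign = 1.0 if kind == "max" else -1.0
    mean = ml + sign * EULER_GAMMA * k          # mode shifted by gamma * k
    std_dev = k * math.pi / math.sqrt(6.0)      # same for Max and Min
    delta = gumbel_cdf(wc, ml, k, kind) - gumbel_cdf(bc, ml, k, kind)
    return mean, std_dev, delta


# (ML, BC, WC, k, kind) taken from Table 6-1
rows = {
    "DM": (17, 10, 31, 3.15269, "max"),
    "Expert #1": (25, 13, 51, 5.40461, "max"),
    "Expert #2": (15, 8, 19.5, 1.96259, "min"),
}
for name, row in rows.items():
    mean, std_dev, delta = prior_summary(*row)
    print(f"{name}: mean={mean:.2f} std={std_dev:.2f} delta={delta:.3f}")
```

Under these assumptions the printed values agree with the tabulated entries to the 
precision shown in Table 6-1. 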
 
  
 
As a point of reference, the PERT means and standard deviations are provided 
in Table 6-2.  In this table, k is the shape parameter and μ is the location parameter 
for the GEV distributions, with the means and standard deviations calculated as 
described above.  The Normal distribution is described by the location parameter μ 
and the shape parameter σ.  The PERT beta distribution is described by the α and β 
shape parameters as calculated using Equation 3-12 and Equation 3-13.  The means 
and standard deviations of the PERT examples were calculated using Equation 2-1 
and Equation 2-2.  In order to solve for the two beta parameters, α and β, using 
Equation 3-12 and Equation 3-13, an assumption was made that the PERT mean (as 
calculated using Equation 2-1) was reasonably close to the true beta mean as 
calculated by Equation 2-3. 
            GEV Approximation                     PERT Beta Approximation
            k or σ    μ    Mean    Std Dev        α      β      Mean    Std Dev
DM          3.15269   17   18.82   4.04           2.33   3.67   17.33   2.67
Expert #1   5.40461   25   28.12   6.93           2.26   3.74   27.33   6.33
Expert #2   1.96259   15   13.87   2.52           3.43   2.57   14.58   1.92
Expert #3   3         20   20      3              3      3      20      3
Table 6-2: Mean/Std Dev Comparisons 
By definition, the entire density of a beta distribution must fall between the 
BC and WC estimates since the distribution is zero outside that stated range (“Beta 
Distribution” 2016; Grubbs 1962, 913).  For the two GEV distributions and the 
Normal distribution, there is some density outside the chosen range (“Generalized 
Extreme Value Distribution - Wikipedia” 2016). 
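
The PERT columns of Table 6-2 can be sketched as follows for the three Expert rows.  
The mean and standard deviation use the classic PERT three-point formulas, which are 
assumed here to be Equation 2-1 and Equation 2-2.  The α and β expressions are an 
assumption: they follow from setting the PERT mean equal to the beta mean with 
α + β = 6 and are consistent with the tabulated values, but Equation 3-12 and 
Equation 3-13 may be written differently.  The function name is illustrative. 

```python
def pert_summary(bc, ml, wc):
    """Return (mean, std_dev, alpha, beta) for a BC/ML/WC three-point estimate."""
    span = wc - bc
    mean = (bc + 4.0 * ml + wc) / 6.0        # classic PERT mean
    std_dev = span / 6.0                     # classic PERT standard deviation
    alpha = 1.0 + 4.0 * (ml - bc) / span     # assumed form, with alpha + beta = 6
    beta = 1.0 + 4.0 * (wc - ml) / span
    return mean, std_dev, alpha, beta


# (BC, ML, WC) for the Expert rows of Table 6-1
experts = {
    "Expert #1": (13, 25, 51),
    "Expert #2": (8, 15, 19.5),
    "Expert #3": (11, 20, 29),
}
for name, (bc, ml, wc) in experts.items():
    mean, std_dev, alpha, beta = pert_summary(bc, ml, wc)
    print(f"{name}: mean={mean:.2f} std={std_dev:.2f} "
          f"alpha={alpha:.2f} beta={beta:.2f}")
```

Because the beta density is confined to the interval from BC to WC, these four numbers 
describe the whole PERT model, whereas the GEV and Normal models also require a 
statement of how much density lies outside that interval (the Delta column of 
Table 6-1). 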
  
 
 Figure 6-1 through Figure 6-4 show plots of the distributions described in 
Table 6-2.  In the legend, “c” is the normalizing constant used on the beta distribution 
so it could be easily compared to the GEV and Normal distribution models. 
 
Figure 6-1: Decision Maker – GEV and Beta Distribution Models 
 
  
 
 
Figure 6-2: Expert #1 – GEV and Beta Distribution Models 
 
Figure 6-3: Expert #2 – GEV and Beta Distribution Models 
  
 
 
Figure 6-4: Expert #3 – GEV and Beta Distribution Models  
 
 As can be seen from Figure 6-1 through Figure 6-4, the graph of the GEV 
Max approximation begins to move away from the x-axis (i.e., gain appreciable 
density) at the location of the BC estimate and the GEV Min begins to move away 
from the x-axis at the WC estimate.  It can also be seen that the modes of both the 
Beta/GEV and Beta/Normal approximations occur at the ML value.  The major 
differences between the two approximations occur at the WC value (for the GEV 
Max distribution) and the BC value (for the GEV Min distribution).  With the GEV 
distribution defined on the entire real number axis, only two of the three parameters 
could be set.  The choice was made to set the parameter which seemed to exhibit less 
uncertainty (the BC estimate for the GEV Max model and the WC estimate for the 
GEV Min model).  This resulted in the remaining parameter not matching as closely 
to the PERT Beta approximation.   
6.2 Calibrating the Expert 
 Once the prior distribution had been established, it could be subjectively 
calibrated based on the DM’s belief about the Expert by using the charts in Appendix 
A.10 – Appendix A.12.  Multiplying together Equation 3-26 (using the appropriate 
values of α and β) and the appropriate prior shape, as described by either Equation 3-14, 
Equation 3-16, or Equation 3-18, causes the variance of the original prior to shrink or 
grow.  The examples below show the effects of different calibration schemes on the 
three prior distribution types.  The parameters for the beta filter are provided in Table 
6-3 as a reference (the entire list can be found in Appendix A.10 – Appendix A.12). 
 
Figure #     Distribution Type   Calibration Percentage   α      β
Figure 6-5   GEV Max             5%  - Understated        2.03   2.77
             GEV Max             10% - Understated        1.44   1.76
             GEV Max             15% - Understated        1.17   1.29
             GEV Max             30% - Overstated         0.80   0.65
             GEV Max             35% - Overstated         0.73   0.54
             GEV Max             40% - Overstated         0.68   0.45

Figure 6-6   GEV Min             5%  - Understated        2.77   2.03
             GEV Min             10% - Understated        1.76   1.44
             GEV Min             15% - Understated        1.29   1.17
             GEV Min             30% - Overstated         0.65   0.80
             GEV Min             35% - Overstated         0.54   0.73
             GEV Min             40% - Overstated         0.45   0.68

Figure 6-7   Normal              5%  - Understated        2.09   2.09
             Normal              10% - Understated        1.53   1.53
             Normal              15% - Understated        1.22   1.22
             Normal              30% - Overstated         0.71   0.71
             Normal              35% - Overstated         0.60   0.60
             Normal              40% - Overstated         0.52   0.52
Table 6-3: Calibration Examples 
In Figure 6-5 through Figure 6-7, the graph on the left represents an 
understated expert calibrated at the 15%, 10%, and 5% levels.  The graph on the right 
shows an overstated expert calibrated at the 30%, 35%, and 40% levels.  As can be 
seen in these figures, the variance of the calibrated expert changes while the mode 
remains in the same location along the x-axis.  
 
Figure 6-5: Expert #1 - GEV Max Calibration Results 
 
Figure 6-6: Expert #2 - GEV Min Calibration Results 
  
 
 
 
Figure 6-7: Expert #3 - Normal Calibration Results 
6.3 Calculating the Posterior 
 Once the Expert’s prior has been calibrated, the final posterior distribution can 
be calculated.  The posterior is calculated by multiplying together the DM’s prior 
with all of the calibrated Expert estimates that have been provided.  Because the final 
equation describing the resulting curve is unknown, an approximation is used to 
calculate the mean and variance of the posterior.  Using Excel™ and evaluating the 
posterior curve from zero to some value larger than the largest WC estimate (among 
all estimates provided) allows the Decision Maker to determine the maximum value 
(i.e., the mode) of the resulting posterior distribution.  Once the curve is normalized, Equation 3-30 
can be used to solve for k, the shape parameter of the distribution.  With the mode 
and shape parameter determined, the mean and variance of the posterior distribution 
can be calculated.   
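
The numerical procedure above can be sketched outside of a spreadsheet.  The example 
below multiplies the uncalibrated DM and Expert #1 priors from Table 6-1 on a grid, 
takes the normalizing constant c as the reciprocal of the area under the product, and 
reads off the mode.  Several points are assumptions rather than the method of record: 
the priors are taken to be Gumbel (Type I GEV Max) densities, the grid step and upper 
limit are arbitrary, and Equation 3-30 is assumed to be the peak-height relation 
k = 1/(e · f_max) that holds for a Gumbel density.  Function names are illustrative. 

```python
import math

EULER_GAMMA = 0.5772156649015329


def gumbel_max_pdf(x, mode, k):
    """Density of a Gumbel (Type I GEV Max) model with the given mode and scale."""
    z = (x - mode) / k
    return math.exp(-z - math.exp(-z)) / k


def posterior_on_grid(priors, upper, step=0.01):
    """Multiply prior densities on [0, upper]; return (mode, c, normalized peak)."""
    n = int(round(upper / step))
    xs = [i * step for i in range(n + 1)]
    ys = [math.prod(pdf(x) for pdf in priors) for x in xs]
    area = step * (sum(ys) - 0.5 * (ys[0] + ys[-1]))  # trapezoidal rule
    c = 1.0 / area
    peak_index = max(range(len(ys)), key=ys.__getitem__)
    return xs[peak_index], c, c * ys[peak_index]


priors = [
    lambda x: gumbel_max_pdf(x, 17.0, 3.15269),  # DM prior (Table 6-1)
    lambda x: gumbel_max_pdf(x, 25.0, 5.40461),  # Expert #1 prior (Table 6-1)
]
mode, c, peak = posterior_on_grid(priors, upper=80.0)
k = 1.0 / (math.e * peak)           # assumed form of Equation 3-30
mean = mode + EULER_GAMMA * k       # Gumbel Max mean
std_dev = k * math.pi / math.sqrt(6.0)
print(f"mode={mode:.2f} c={c:.2f} k={k:.3f} mean={mean:.2f} std={std_dev:.2f}")
```

With these inputs the sketch gives a mode of roughly 20.8, c of roughly 38.9, and k of 
roughly 2.94, in line with the uncalibrated DM * E1 entry reported for this 
combination.  Calibrated cases would require the beta filter of Equation 3-26, which 
is not reproduced here. 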
 Using the example estimates shown in Table 6-1 (reproduced in part below in 
Table 6-4), it can be seen how the various parameters change with different 
combinations of expert opinion and different calibration levels.  Table 6-5 provides a 
summary of several example combinations.  In Table 6-5, the first column describes 
the combination of prior estimates and the resulting posterior distribution model (i.e., 
GEV Max, GEV Min, or Normal).  In the subsequent columns, ML is the resulting 
posterior mode, c is the normalizing constant used when multiplying together the 
priors, k is the shape parameter for the GEV posterior approximation, and σ is the 
standard deviation of the Normal model.  Note that the mean and standard deviation 
for the GEV model are calculated using Equation 3-32 and Equation 3-33, while the 
mean and standard deviation for the Normal distribution match the ML estimate and 
the σ parameter. 
            ML   Type      k         σ     Mean    Std Dev
DM          17   GEV Max   3.15269   N/A   18.82   4.04
Expert #1   25   GEV Max   5.40461   N/A   28.12   6.93
Expert #2   15   GEV Min   1.96259   N/A   13.87   2.52
Expert #3   20   Normal    N/A       3     20      3
Table 6-4: Summary Example Estimates 
 
Posterior Combination/Approximation   ML     c         k or σ    Mean    Std Dev
Expert Calibration = No Calibration
DM * E1 – GEV Max                     20.8   38.905    2.93712   22.50   3.77
DM * E2 – Normal                      15.6   15.748    1.36851   15.6    1.37
DM * E3 – Normal                      18.8   13.227    2.41226   18.8    2.41
DM * E1 * E2 – Normal                 16.7   3520.98   1.05089   16.7    1.05
DM * E1 * E3 – Normal                 20.4   451.428   2.0966    20.4    2.10
DM * E2 * E3 – Normal                 16.3   345.924   1.14922   16.3    1.15
DM * E1 * E2 * E3 – Normal            17.1   50249.4   0.94969   17.1    0.95
Expert Calibration = 10%
DM * E1 – GEV Max                     21.8   38.120    2.70253   23.36   3.47
DM * E2 – Normal                      15.4   17.426    0.93837   15.4    0.94
DM * E3 – Normal                      19.2   11.855    2.16911   19.2    2.17
DM * E1 * E2 – Normal                 16.7   11402     0.87696   16.7    0.88
DM * E1 * E3 – Normal                 20.8   426.25    1.72798   20.8    1.73
DM * E2 * E3 – Normal                 16.3   475.547   0.95227   16.3    0.95
DM * E1 * E2 * E3 – Normal            17     185451    0.78244   17      0.78
Expert Calibration = 30%
DM * E1 – GEV Max                     20.1   41.533    3.03118   21.85   3.89
DM * E2 – Normal                      15.7   17.989    1.49371   15.7    1.49
DM * E3 – Normal                      18.4   14.154    2.68723   18.4    2.69
DM * E1 * E2 – Normal                 16.7   2468.3    1.1748    16.7    1.17
DM * E1 * E3 – Normal                 20     531.53    2.39821   20      2.40
DM * E2 * E3 – Normal                 16.3   336.75    1.30241   16.3    1.30
DM * E1 * E2 * E3 – Normal            17     35256     1.0787    17      1.08

Table 6-5: Posterior Duration Results 
 Figure 6-8 through Figure 6-14 provide a graphical representation of the data 
shown in Table 6-5.  The graph in the top left shows the prior distributions of each 
participant.  The remaining graphs show the posterior distribution when calculated 
using Equation 3-28 and also the resulting GEV Max, GEV Min, or Normal 
approximation, as appropriate.  The top right graph shows the resulting posterior 
distribution if all experts are fully calibrated, the bottom left graph shows a 10% 
calibration level, and the bottom right graph shows a 30% calibration level.  
  
 
 
Figure 6-8: Decision Maker and Expert #1 
  
 
 
Figure 6-9: Decision Maker and Expert #2 
  
 
 
Figure 6-10: Decision Maker and Expert #3 
  
 
 
Figure 6-11: Decision Maker, Expert #1, and Expert #2 
  
 
 
Figure 6-12: Decision Maker, Expert #1, and Expert #3  
  
 
 
Figure 6-13: Decision Maker, Expert #2, and Expert #3  
  
 
 
Figure 6-14: Decision Maker, Expert #1, Expert #2, and Expert #3
  
 
6.4 Further Examples 
 Section 6.3 provided several examples of how the posterior distribution 
changes based on various combinations of calibration schemes and expert inputs.  
Because the example estimates in Table 6-1 represented all three distribution models, 
the resulting posterior distribution was often most closely modeled by a Normal 
distribution.  This section will provide several examples of the posterior distribution 
when the prior distributions all share the same form and one case where the DM and 
the Expert provide wildly disparate estimates.  To simplify these examples, it is 
assumed that all experts are fully calibrated.  
 In Section 6.3, when different types of distributions were multiplied together, 
the tails effectively cancelled each other out, resulting in a posterior that most closely 
resembled a Normal distribution.  As seen in Table 5-14, however, most people 
favored a positively skewed distribution which would be most closely modeled by a 
GEV Max distribution.  When all of the prior distributions are of the same type, the 
resulting posterior distribution will maintain its GEV shape unless there are a large 
number of experts providing estimates.  Equation 3-29 and the Excel™ command 
described just below Equation 3-29 will provide an indicator of when one should 
switch from the GEV Max approximation to the Normal approximation.  Table 6-6 
shows the initial estimates and prior distribution parameters for a DM and two 
Experts.  It also shows the resulting parameters of the posterior distribution, 
calculated using Equation 3-28.  The priors for all three stakeholders, as well as the 
resulting posterior can be modeled by a GEV Max distribution.  Figure 6-15 provides 
a graph of the priors, the result of Equation 3-28, and the GEV Max approximation 
used to describe the curve that results from Equation 3-28.  It also shows the Normal 
approximation for reference.  In Table 6-6, the normalizing constant, c, is not 
required for the prior distributions.  For a GEV Max approximation, the BC estimate 
can be calculated using Equation 3-35 and the WC estimate can be approximated 
using Equation 3-41.   
                  ML     BC     WC     c        k         Mean    Std Dev
DM Prior          17     10     26              3.15269   18.82   4.04
Expert #1 Prior   25     13     49              5.40461   28.12   6.93
Expert #2 Prior   15     9      35              2.70230   16.56   3.47
Posterior         18.7   14.1   30.8   972.32   2.09243   19.91   2.68
Table 6-6: GEV Max Example Prior Distribution 
 
 
Figure 6-15: GEV Max Example Priors and Posterior 
The same results can be demonstrated in the GEV Min case.  Table 6-7 
provides similar information to Table 6-6, except the prior estimates and posterior are 
all described by the GEV Min distribution.  Figure 6-16 provides a graph of the priors 
on the left and the posterior on the right.  The three posterior graphs represent the 
result of Equation 3-28 as performed on the priors, the resulting GEV Min 
approximation, and the Normal approximation, provided for reference.  For the GEV 
Min case, the BC value can be solved using Equation 3-42, and the WC value can be 
approximated using Equation 3-37.   
                  ML     BC     WC   c        k         Mean    Std Dev
DM Prior          35     19     40            2.18066   33.74   2.80
Expert #1 Prior   25     15.3   30            2.18066   23.74   2.80
Expert #2 Prior   40     32     46            2.61679   38.49   3.36
Posterior         27.2   20     30   103860   1.23115   26.49   1.58
Table 6-7: GEV Min Example Prior Distribution 
 
 
Figure 6-16: GEV Min Example Priors and Posterior 
The final examples in this chapter deal with two extreme cases.  The first case 
shows the results of multiple experts in complete agreement with one another and the 
second case shows the results of a DM and Expert in complete disagreement with one 
another.  For each of the prior distribution models, Table 6-8 shows the parameters 
for the DM and nine experts, all of whom are in complete agreement with one 
another, so their prior distributions are all exactly the same.  Table 6-8 also shows the 
parameters of the resulting posterior distributions (with rounded ML, BC, and WC 
values) in these cases of extreme agreement.  In each of the three posterior cases, the 
resulting posterior curve is most closely modeled by a Normal distribution.  The 
shape parameter, mean, and standard deviation are all reflective of this Normal 
model.  Figure 6-17 through Figure 6-19 show the resulting posterior distributions in 
each of the three cases. 
                                       ML   BC     WC   c           k or σ    Mean    Std Dev
DM & 9 Experts: GEV Max                17   10     26               3.15269   18.82   4.04
DM & 9 Experts: GEV Min                25   15.3   30               2.18066   23.74   2.80
DM & 9 Experts: Normal                 20   11     29               3         20      3
Posterior – GEV Max, Model - Normal    17   14     20   847946379   1.00531   17      1.01
Posterior – GEV Min, Model - Normal    25   23     27   30727596    0.69535   25      0.70
Posterior – Normal, Model - Normal     20   17     23   243164795   0.94868   20      0.95
Table 6-8: DM and Expert Complete Agreement 
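
For the all-Normal case in Table 6-8 the posterior can be checked in closed form, 
since a product of Normal densities is proportional to another Normal density whose 
precision is the sum of the individual precisions.  The sketch below applies that 
standard result to ten identical priors (the DM and nine experts); the function name 
is illustrative, and the GEV rows are not covered because their products have no 
comparably simple form. 

```python
import math


def combine_normals(priors):
    """Combine (mean, sigma) Normal priors by multiplying their densities.

    Returns the (mean, sigma) of the normalized product, which is the
    precision-weighted mean and the pooled standard deviation.
    """
    precision = sum(1.0 / sigma**2 for _, sigma in priors)
    mean = sum(m / sigma**2 for m, sigma in priors) / precision
    return mean, math.sqrt(1.0 / precision)


# DM and nine experts, all with ML = 20 and sigma = 3 (Table 6-8, Normal row)
mean, sigma = combine_normals([(20.0, 3.0)] * 10)
print(f"posterior mean={mean:.2f} sigma={sigma:.5f}")
```

The resulting standard deviation is 3/√10, which matches the 0.94868 listed for the 
Normal posterior and illustrates how quickly the posterior narrows when many 
stakeholders agree. 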
  
  
 
 
Figure 6-17: Posterior: Decision Maker and 9 Experts; Full Agreement – GEV Max 
Model 
 
Figure 6-18: Posterior: Decision Maker and 9 Experts; Full Agreement – GEV Min 
Model 
 
  
 
 
 
Figure 6-19: Posterior: Decision Maker and 9 Experts; Full Agreement – Normal 
Model 
 
The final example shows a case when the DM and Expert are in complete 
disagreement with one another.  Table 6-9 provides the prior distributions for a DM 
and Expert, as well as the resulting posterior distribution as calculated by Equation 
3-28 and the resulting GEV Min approximation.  The BC value and WC value are 
once again solved/approximated using Equation 3-42 and Equation 3-37, 
respectively. 
                        ML     BC     WC   c        k or σ    Mean    Std Dev
DM – GEV Max            17     10     26            3.15269   18.82   4.04
Expert #1 – GEV Min     40     30     46            2.61679   38.49   3.36
Posterior – GEV Min     35.4   0.18   49   1188.1   6.04750   31.91   7.76
Table 6-9: DM and Expert Severe Disagreement 
  
 
 
Figure 6-20: Decision Maker and Expert #1 – Severe Disagreement 
 Figure 6-20 shows that in cases of extreme disagreement, the GEV 
approximation of the curve calculated by Equation 3-28 begins to break down.  This 
graph also illustrates another concern with the GEV Min model.  Looking closely at 
the dashed line in the left graph, it can be seen that the graph has not quite collapsed 
to the x-axis at X=0.  This indicates that, using the GEV Min approximation, there is 
a small, but non-zero probability that the activity duration will be negative.  Further 
discussion on this issue will be provided in Chapter 7. 
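
The size of this negative-duration probability can be estimated with a short sketch.  
It assumes the GEV Min approximation is the Type I (Gumbel minimum) form, with the 
posterior mode of 35.4 and k of 6.04750 from Table 6-9 as its location and scale; the 
function name is illustrative. 

```python
import math


def gumbel_min_cdf(x, mode, k):
    """CDF of a Gumbel (Type I GEV Min) model: 1 - exp(-exp((x - mode) / k))."""
    return -math.expm1(-math.exp((x - mode) / k))


# Posterior parameters from Table 6-9 (severe disagreement case)
p_negative = gumbel_min_cdf(0.0, mode=35.4, k=6.04750)
print(f"P(duration < 0) = {p_negative:.4%}")
```

Under these assumptions the probability is a little under 0.3%, which is small but 
not zero, consistent with the dashed curve not reaching the x-axis at X=0. 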
  
  
 
Chapter 7:  Discussion 
 
 
Scheduling challenges are neither unique to Wallops Flight Facility (WFF), 
nor are they localized to the recent past.  Many practices have been developed to 
describe how one should schedule a project, but despite these recommendations, 
many projects fail to meet deadlines.  Based on this research, it appears there are 
trends in estimation practices among stakeholders in the three demographics studied.  
How can this information be used to develop better schedules?  What can be learned 
from the GAO reports that can be augmented by scheduling best practices?  What is 
the best way to incorporate the estimates of a diverse group of stakeholders?  This 
chapter seeks to tie together previous chapters to answer these questions.   
7.1 Past is Present: GAO Reports vs. Current Results 
The GAO reports from Chapter 2 provided a quick summary of project history 
at NASA.  As a government agency, projects at NASA are routinely analyzed to 
determine what went right and what went wrong (“About GAO” 2015).  This analysis 
coupled with long-standing research in decision making biases helps shed light on the 
challenges faced by project managers in developing an accurate schedule.  This 
section brings together the results of the GAO reports and the responses of subjects in 
this research and discusses those results through the lens of biases identified in the 
decision-making literature to determine reasons why projects struggle to finish on 
time. 
  
 
 
7.1.1 External Influences 
In an uncertain world, there are many roadblocks to successfully completing a 
project on time.  Some of these can be managed within the project, but some are 
outside the control of anyone involved in planning or execution.  The effects of these 
external constraints could be seen throughout the GAO reports and also in the 
responses of the subjects in this study.   
One external influence that can affect the accuracy of the estimating process 
is the retirement of the stakeholders who possess background history and knowledge 
(GAO 2006a, 4, 22, 2006e, 10, 2008, 6).  Chapter 2 provided several methods for 
defining an expert, but typically, these are people who have, “…special knowledge 
about an uncertain quantity or event” (Morris 1977, 679).  Without the knowledge 
base of these experts, there will be gaps in the estimating processes. This is why it is 
critical to account for the opinions of multiple stakeholders to ensure missing pieces 
to the knowledge base are minimized and also to ensure the correct mix of project 
stakeholders within the project team.  Each piece of information is a data point by 
which the decision maker can update her beliefs.   
Review boards served as another external influence on a project’s schedule.  
The purpose of these boards is to review the plan and question the project team to 
determine whether or not the plan is mature enough to continue.  A GAO report from 
2006 recommended more reviews and project-stop points throughout the cycle.  In 
theory, this would lead the project teams to develop more thorough plans to ensure 
approval from the review board.  It seems, however, based on the responses of some 
subjects in this research that input from these review boards is not always 
appreciated.  In some cases, they were referred to as stumbling blocks to the project.   
Project teams may perceive that these reviews are required as proof to 
outsiders that the project team knew what it was doing.  These beliefs and perceptions 
may be indicative of the “not invented here” bias identified by Ariely (Ariely 2009a, 
Loc. 1443).  When the project plan is presented to a board, questions can be perceived 
as personal attacks, and since board members are an entity outside the project team, 
the project team is less likely to want to incorporate the suggestions since they were 
not invented within the project team itself.   If those suggestions cause extra work that 
was not originally accounted for, the perception of the project team could be that the 
delay was caused by a force beyond their control and that they are no longer 
responsible for the resulting schedule delays.   
The GAO reports studied pointed out that the culture of NASA is generally 
twofold.  Both the “can do” attitude and the culture of safety could almost be merged 
into one general goal:  “Make it happen, but make it happen safely” (Martin 2012, 
11–17; GAO 2017, 15–17, 21–22).  While not a constraint per se, this culture does 
have an effect on project schedules.  From the Demographics survey, it could be seen 
that across all subjects, schedule and cost took a back seat to reducing risk and 
ensuring technical success.  The desire to preserve technical success over other 
constraints can also have a significant effect on the schedule.  From the project 
preferences survey, when comparing the different project constraints to one another, 
there is a clear preference for sacrificing the material constraints of time and money 
in order to reduce the risk of project failure/personal injury or to avoid decreasing the 
quality of support.  The GAO reports point out that there is a prevailing attitude at 
NASA that states that as long as technical success is achieved, no one will care whether 
or not the project came in on time or under budget, or at least that memories of those 
project management failures will fade in the light of the technical success (Martin 
2012, 11–12).  Several subjects in this study mentioned the same mentality, confirming 
that this same attitude is also prevalent at WFF (Kremer 2017c).  The implication for 
scheduling practices is that if problems arise in a project, the schedule will be the first 
thing to go, and as schedule increases, the cost will most likely increase as well.     
  
7.1.2 Internal Influences 
Some constraints are beyond the project manager’s control, but others are 
within her sphere of influence.  One of the major complaints in both the GAO reports 
and across both the management and technical subjects in this research was a failure 
to adequately define requirements (GAO 2009c, 1,3,5-6, 1993b, 4, 1993a, 11).  
According to PMBOK, in order to manage a project successfully, all activities must 
be tied back to a requirement (PMI 2013, para. 5.4, 6.2).  If requirements are only 
developed at a high level, there may be confusion about what is actually required in 
order to meet a requirement (GAO 2014, 25; Mantel Jr. et al. 2004, 82).  As was seen 
in the COA Survey, perceptions differ between stakeholders in the 
management/technical divide.  An activity that one stakeholder believes is necessary 
to meet a high-level requirement may be perceived as “gold-plating” to another.   
Additionally, stakeholders may be planning to different definitions of loss 
aversion, where loss aversion is defined as saving face for the project constraint they 
are most responsible for (Ariely 2009a, Loc 1328, 1518).  For the project manager, 
saving face means finishing the project on-time and on-budget, so she may not be as 
concerned with repeated testing as long as it works.  For the technician, saving face 
means ensuring the system performs as advertised.  Each group, however, still 
shoulders some responsibility for those other constraints.  The project manager must 
still deliver a working system or risk losing face and the technician can still be held 
accountable if the system is not ready when advertised.  In the latter case especially, 
this could be a major reason for the oft cited complaint that the schedules were too 
compressed to begin with (GAO 1991c, 6, 1993b, 4, 1989, 21).  When the technician 
provides an estimate, he may be factoring in the additional testing that will prove he 
did everything in his power to deliver a functioning system that meets requirements.   
At a project level, when looking at individual estimates in the surveys 
provided, it was noted that several of the estimates followed a pattern where the same 
or similar best case (BC), worst case (WC), and most likely (ML) estimates were 
provided for several activities in a row.  This could be a form of anchoring where a 
subject is trying to formulate an estimate quickly, and once an initial number is 
settled on, that value is used over and over for similar activities and may, in fact, 
be affecting the estimates of all subsequent activities.  This could unknowingly 
influence all of the values in the estimate, depending on how carefully the estimate is 
made.  Although it would be ineffective to allow an estimator to provide only one 
estimate at a time, the anchoring effect is something all estimators should be mindful 
of as they provide the estimates.   
  
 
 As stated in the GAO reports and in the responses from the subjects in this 
study, there is a tendency to plan to high-level schedules (GAO 2014, 25).  If these 
schedules are too high level, then the activity anchoring problem can be compounded 
by estimators making different assumptions about what is meant by the activity title.   
From the scheduling survey responses, it was shown that the biggest disagreement 
among the survey subjects came from determining whether or not all required 
activities were included in the activity list.  Many subjects felt that activities were left 
out, while still others believed that the activities were included, but that they may 
have been rolled up into one overarching activity.  By not explicitly calling out 
specific activities, assumptions from different stakeholders can result in significantly 
different estimates (Mumpower 1996, 194).  This would be akin to estimating the 
drive time from College Park to Dulles International on Friday afternoon at 4:00pm 
by taking into account only the miles to travel.  Without having “technical” 
knowledge of the area and the associated traffic patterns, this would seem like a good 
measure.  Experts, however, know differently.   
It is this expert knowledge of the nuances of the activities that is critical to 
accurately assessing how long an activity should take (Moder and Rodgers 1968, 
B-79).  And when these experts are also responsible for completing work on multiple 
different projects, it is less likely that they will be able to spend their time carefully 
planning out those nuances.  As a remedy to this, even if the project team cannot be 
involved in the initial planning of the project, it may be helpful for the project 
manager to develop the initial schedule and then provide it individually for each 
project stakeholder to review as they are available.   
  
 
From the Scheduling surveys, it was seen that subjects were in general 
agreement that the provided lists were good starting points for the schedule (i.e. very 
few felt there were extraneous activities on the lists provided).  One option is for 
project managers to provide the high-level schedules individually to the project team 
members and ask them to list out the activities they believed were required to meet 
the high-level milestone.  Private initial assessment ensures that no one person can 
dominate the discussion and squelch the quieter members of the group (Baecher 1999, 
20; Surowiecki 2005, 29).  Before obtaining the estimates, however, it is helpful to 
understand some of the expected estimating behaviors of those who provide the 
estimates.   
 
7.2 Stakeholder Responses: What to Expect 
This section discusses how demographic traits influence personality traits and 
scheduling duration estimations.  It combines the results of the Demographics and 
Scheduling surveys and interprets the results of Chapter 5.   
7.2.1 The Influence of Demographics 
The present results suggest that a stakeholder’s Position demographic exerts a 
heavy influence.  This was shown in multiple analyses of the estimation data 
collected from subjects.  Analysis showed that managers traditionally provide lower 
duration estimates than do technicians.  For each project, when activity durations 
were summed to find the total project duration (Te), the sample population showed 
that 81% of the time, managers estimated a lower Te than technical subjects did.  
A binomial test on these results indicated that the difference was statistically 
significant (p<0.001).  The other two demographics did not show any statistically 
significant differences among demographic levels.   
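
The binomial test described above can be sketched as follows.  The number of 
manager/technician comparisons behind the 81% figure is not restated in this section, 
so the counts below are hypothetical and serve only to illustrate the calculation.  
The null hypothesis assumed here is that a manager is equally likely to give a lower 
or a higher Te than a technical subject, and the function name is illustrative. 

```python
import math


def binomial_upper_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        math.comb(trials, i) * p**i * (1.0 - p) ** (trials - i)
        for i in range(successes, trials + 1)
    )


# Hypothetical counts (not from the study): managers lower in 81 of 100 comparisons
p_value = binomial_upper_tail(81, 100)
print(f"one-sided p-value = {p_value:.2e}")
```

With counts of this size an 81% split lies far in the tail of the null distribution, 
but the p-value depends strongly on the number of comparisons, so the same proportion 
from a much smaller sample would be weaker evidence. 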
The pattern repeated itself when looking individually at each of the three 
estimates which compose a PERT mean.  In this case, when the estimates were 
compared within each demographic, each case showed statistically significant 
differences among the levels of each demographic.  For example: 
• Managers provided smaller estimates than technical subjects 
• Those with fewer years of experience provided smaller estimates than 
those with more years of experience 
 
• Those with a more formal education provided smaller estimates than 
those with a less formal education.  
 
Because these comparisons were made at the activity level of each project, there 
should not be any concern that the results are skewed by comparison of dissimilar 
activities, especially in the case of the ML estimate.  The BC and WC estimates may 
have bias, since they are based on the ML estimate, but they demonstrate the same 
pattern.  When using Equation 2-1 to solve for a project duration mean, even if the 
ML values are identical between two estimators, a smaller BC and smaller WC 
estimate will result in a smaller overall mean.  It also stands to reason that if the ML 
value of an estimator is smaller, in most cases the two outlying estimates will also be 
smaller, again resulting in a smaller activity duration.   
  
 
Although each of the demographics showed statistically significant 
differences, given that the smallest p-value belonged to the Position demographic by 
several orders of magnitude, it would appear that demographic is the driving force 
behind the differences in estimates.  DOE could not be used to confirm these results 
because the sampled projects were very different from one another in both size and 
complexity.  For example, one project may have a larger total duration simply 
because it is a larger project and not because the subjects are responding in a certain 
manner.   
When the magnitude of the difference between the ML and BC estimates was 
compared, only the LoE demographic produced statistically significant results.  In 
this case, the results showed that the magnitude of the difference between the ML and 
BC estimates of a subject with more formal education is likely to be smaller than that 
of a subject with less formal education.  Given an equal ML estimate, this means that 
estimators with less formal education cluster their BC estimates more closely to their 
ML estimates than the estimators with more formal education.   
When the difference between the ML and WC estimates was compared, the 
Position demographic again showed the clearest effect.  In this case, the results showed 
that technicians are less optimistic than the managers as indicated by the wider 
separation between the ML estimate and the WC estimate.  This spread is 
representative of the contingency the estimator believes is required to account for 
project unknowns.  These results indicate that technicians are accounting for more 
things to go wrong than the managers.   The other two demographics were also 
statistically significant, but with larger p-values.  These results show that the largest 
estimate will most likely be provided by a technician with many years of experience 
and modest formal education.  
After the model was selected (GEV Max, GEV Min, or Normal) and after the 
parameters were set, the probability density between the BC and WC values was 
evaluated.   When the differences were evaluated using DOE, the Position 
demographic once again showed statistical significance (p<0.0490).  In this case, the 
expected value was 0.97 for managers and 0.98 for technicians.  While these 
values are separated by only a very small margin, they are another indicator that 
technicians are accounting for a wider range of possible durations than the managers.  
The technicians are leaving less density in the tails of their estimates, so they are less 
likely to be surprised when the project is actually completed. 
When the variances, calculated as the square of Equation 2-2, were compared, 
the results showed once again that the Position demographic was driving the 
differences in estimates.  A smaller variance indicates less uncertainty in an estimate 
(Morris 1977, 688).  The estimator is not compensating for things going wrong or for 
things going right.  The larger variances in the technical group correlate with the 
results seen earlier when looking at each individual estimate.  With a larger ML 
estimate than the managers, and a larger separation between the ML and WC 
estimates, these results again demonstrate that technicians are providing larger 
estimates and also accounting for more uncertainty in their estimates.   
When the contribution ratios of the outlying estimates were calculated using 
Equation 3-5 through Equation 3-7, the Position demographic was once again the 
driving factor, except in the case of Equation 3-6; the contribution ratio using the BC 
estimate showed no significant factors.  Equation 4-3, Equation 4-4, Equation 4-7, 
and Equation 4-8 went on to show that, when the BC and WC values were calculated 
from only an estimator’s ML value, the WC multiplying constant for technicians was 
larger than that for managers.  Conversely, the managers had a larger 
BC multiplying constant than the technicians.  While the multiplying constant to 
solve for the BC estimate was relatively close between the managers and technicians, 
the constant to solve for the WC estimate had a much larger separation.  These results 
contradict Question 4 in Table 5-8, which showed that managers were more likely 
to have a smaller BC estimate than technicians, but they confirm Question 1 (overall 
PERT estimates) and Question 9 (variance analysis).  With a smaller BC multiplying 
constant and a larger WC multiplying constant, assuming equal ML values, the 
variance of a technician will be larger than that of a manager.  Again assuming equal 
ML values, these multiplying constants will result in a larger PERT activity average 
for technicians when calculating the average using Equation 2-1.  Results from 
Question 1 showed that managers’ ML estimates are typically smaller than technicians’ 
ML estimates, so the difference between the two groups’ estimates will be even larger. 
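The effect of the multiplying constants can be sketched in a few lines of Python.  The constants below are hypothetical placeholders chosen only to mirror the pattern described (technicians with a slightly smaller BC constant and a much larger WC constant); they are not the fitted values from Chapter 4.  Equation 2-1 and Equation 2-2 are assumed to be the standard PERT mean and standard deviation, and all function names are illustrative.

```python
def pert_mean(bc, ml, wc):
    """Standard PERT activity mean: (BC + 4*ML + WC) / 6."""
    return (bc + 4.0 * ml + wc) / 6.0


def pert_std(bc, wc):
    """Standard PERT standard deviation: (WC - BC) / 6."""
    return (wc - bc) / 6.0


def from_ml(ml, k_bc, k_wc):
    """Derive BC and WC from ML using multiplying constants."""
    return k_bc * ml, k_wc * ml


# Hypothetical constants for illustration only (not the Chapter 4 values).
GROUPS = {
    "manager": {"k_bc": 0.80, "k_wc": 1.40},
    "technician": {"k_bc": 0.75, "k_wc": 1.80},
}

ml = 10.0  # the same ML value is assumed for both groups
results = {}
for name, k in GROUPS.items():
    bc, wc = from_ml(ml, k["k_bc"], k["k_wc"])
    results[name] = (pert_mean(bc, ml, wc), pert_std(bc, wc))
    print(name, results[name])
```

With these placeholder constants, the technician's mean and standard deviation both exceed the manager's for the same ML value.  The mean is larger because the sum of the two constants is larger.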
The calculations described above also provided an indication as to how 
skewed an estimate is.  If the separation between the ML and BC estimates is equal 
to the separation between the ML and WC estimates, the distribution will not have 
any skew.  If the separation between the ML and BC estimates is smaller than the 
separation between the ML and WC estimates, the distribution model will be skewed 
to the right.  Chapter 4 described that the expected value of the ratio calculated in 
Equation 3-5 was approximately 39% for managers and 30% for technicians, 
indicating that technicians have a heavier positive skew than the managers.   In this 
context, the heavier positive skew indicates that the technical estimators are 
compensating for more adverse uncertainty than the manager estimators.  Adverse 
uncertainty is meant to describe project challenges as opposed to things that will clear 
the way for project success.  The managers seem to have more hope that the good 
things could happen as well as the bad.  The technicians seem less hopeful in their 
estimates, with their uncertainty manifesting itself as a larger WC estimate.  Another 
interpretation of the data is that the technicians are less sure of their estimates.  It may 
not be that they believe everything will go wrong, but uncertainty breeds caution and 
caution includes providing a larger WC estimate that is more likely to incorporate 
unknown issues (Goldratt 1997, 152; Golenko-Ginzburg 1988, 767).  This ties into 
the discussion regarding the confidence estimates provided by the subjects which will 
be presented later in this chapter.  
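The skew classification described above can be sketched as follows.  The ratio used here, (ML - BC)/(WC - BC), is an assumption made for illustration and is not necessarily the ratio defined in Equation 3-5; the function names are likewise hypothetical.

```python
def mode_position(bc, ml, wc):
    """Relative position of ML within [BC, WC]; 0.5 means a symmetric estimate."""
    if not bc < wc:
        raise ValueError("BC must be smaller than WC")
    if not bc <= ml <= wc:
        raise ValueError("ML must lie between BC and WC")
    return (ml - bc) / (wc - bc)


def skew_type(bc, ml, wc, tol=1e-9):
    """Classify the skew implied by a three-point estimate."""
    r = mode_position(bc, ml, wc)
    if abs(r - 0.5) <= tol:
        return "none"
    # ML closer to BC leaves a longer tail toward WC (positive skew).
    return "positive" if r < 0.5 else "negative"


print(skew_type(8, 10, 12))  # equal separations
print(skew_type(8, 10, 16))  # ML closer to BC
print(skew_type(4, 10, 12))  # ML closer to WC
```

Under this assumed ratio, a smaller value places the ML estimate closer to the BC estimate and therefore corresponds to a heavier positive skew.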
The expected value of Equation 3-5 was the result of performing DOE on the 
aggregated data of each of the subjects.  The final expected value provided by DOE 
showed that, in general, estimates resulted in a positively skewed distribution.  
Breaking down this analysis, however, to look at individual activities tells a slightly 
different story.   Looking at estimates for all of the activities provided, the subjects 
heavily favored the positively skewed model.  This seems to indicate a general belief 
that things are more likely to go wrong on a project than they are to go right.  When 
these results are further decomposed into the two factors of the Position demographic, 
the results are more interesting.  These results show that managers are twice as likely 
(34% vs 17%) to provide an estimate that has either no skew or negative skew as the 
technicians.  This could be an indicator as to why managers and technicians disagree 
on project durations.  In this context, a distribution with no skew or negative skew 
indicates a higher level of optimism than a distribution with a positive skew.  
Managers are effectively saying that they believe things are either equally likely to go 
right or wrong on the project, or that they even believe that there is a better chance 
that things will go right instead of going wrong.  This may also be an indicator that 
managers feel they have more control of the schedule and can therefore drive the 
schedule to ensure it meets the originally advertised completion time (Kremer 
2017a).  Technicians, on the other hand, seem to believe things are much more likely to 
go poorly and that more time should be allotted for impending doom.  From the 
results seen in this study, however, it is important to at least be aware of the 
possibility of a negatively skewed distribution, especially when working with 
managers.   
 
7.2.2 Discrete vs. Continuous Confidence Assessments   
When given the Scheduling Survey, subjects were asked to provide a value for 
how confident they were in their assessment of the ML estimate, described as a 
probability.  The survey treated the ML estimate as a discrete value with an 
associated probability that the activity would finish within an operating window about 
that time (Tetlock 2005, 40; Önkal et al. 2003, 182–83).  In hindsight, the survey 
should have been more specific about the definition of “confidence,” but none of the 
subjects questioned the legitimacy of the way the confidence estimate was asked.   
An indicator that subjects were treating the ML estimate as a discrete value 
could be seen in a few of the confidence levels provided.  Most subjects provided 
confidence levels in the 70%-90% range, but a few provided confidence rates much 
lower, some even below 50%.  It would seem that if a person provided a probability 
of less than 50%, then there should be some other value on the positive number line 
that they believed would more accurately represent the actual duration of the activity.  
By assessing a probability of less than 50%, these subjects seemed to indicate that 
they were treating the ML estimate as an event.  An event either happens or does not, 
as opposed to a continuous variable that can move within some range (Murphy and 
Winkler 1977, 45).  These low probabilities show the subject treated the ML duration 
as an event finishing within the rounded-off time.  If the subjects were not treating the 
ML estimate this way, they would have adjusted their ML estimate to a different 
value that they believed was closer to how long the activity would take.  Another 
possible interpretation of the confidence estimates is that they were tied less to the 
actual estimate and were actually reflective of the estimator’s confidence in his 
estimating ability.   In cases where the estimated probability was very low, it could be 
that subjects were so uncertain about the activity that they were effectively providing 
a uniform probability for all possible durations.  If this is a more accurate 
interpretation of the confidence estimate collected in this study, Hubbard suggests 
using Fermi decomposition to approach the desired confidence range (Hubbard 2010, 
10–12).  It should also be noted that Kahneman and Tversky’s anchoring concept 
seemed to apply here as well.  In most cases, the same confidence level was used for 
all activities estimated by the subject.  The first estimate provided may have heavily 
influenced subsequent estimates as the subject anchored on that first estimate.   
The variance, as represented by the BC and WC estimates, is indicative of an 
estimator’s level of uncertainty (Morris 1977, 688).  The larger the separation 
between the BC and WC estimates, the larger the uncertainty.  Assuming the first 
interpretation of the confidence estimate is correct, if the subjects were treating the 
ML estimate as an event with a probability of occurring, then the BC and WC 
estimates are compensating for the uncertainty in that estimate.  For example, if a 
subject estimates that an activity will most likely take 10 hours and she is 90% certain 
that it will take 10 hours, then she believes that there is a 10% chance the event will 
take on some other discrete value.  The BC and WC estimates are an attempt to 
account for that other 10%.  The less certain an estimator is about the ML value, the 
wider the spread of the BC and WC estimate should be to account for that 
uncertainty.   
Testing the relationship between the confidence level (interpreted as above) 
and the standard deviation (calculated using the PERT approximation) yielded a 
correlation coefficient of -0.3: lower confidence tended to accompany a wider spread.   
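A correlation of this kind can be computed as sketched below.  The responses are invented for illustration and are not the survey data; the standard deviation is assumed to be the PERT form (WC - BC)/6, and the function names are hypothetical.

```python
from math import sqrt


def pert_std(bc, wc):
    """Standard PERT standard deviation: (WC - BC) / 6."""
    return (wc - bc) / 6.0


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical (confidence, BC, WC) responses for one activity.
responses = [
    (0.90, 9.0, 12.0),
    (0.80, 8.0, 14.0),
    (0.70, 8.0, 17.0),
    (0.60, 7.0, 16.0),
    (0.85, 9.0, 15.0),
]
confidence = [c for c, _, _ in responses]
spread = [pert_std(bc, wc) for _, bc, wc in responses]
r = pearson_r(confidence, spread)
print(round(r, 2))
```

A negative coefficient means that the subjects reporting lower confidence also tended to report a wider BC-to-WC range.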
When the confidence estimates were analyzed using DOE it was discovered 
that this was one of the few cases where the Position demographic was not the 
significant factor driving the results.  YoE was the driving factor.  Assuming the 
interpretation of confidence as a probability of a discrete event is correct, averaging 
across projects and subjects revealed that confidence hovered around 75% for 
subjects in the first half of their career, but rose to 85-90% in the second half of their 
careers.  When broken down further, confidence started out at an average of 73% in 
the first eight years, increased to 77% in the second eight years, increased again to 
86% in the third eight years, and finally topped out at 89% in the 24+ years category.  
This pattern suggests that as subjects progress in their careers, they gain confidence in 
their estimating abilities (which may not be warranted) (Shanteau 1992, 12; Trumbo 
et al. 1962, 68–71).  This could be for several reasons, but the most likely explanation 
is that as subjects gain experience, they learn what to expect on certain types of 
projects.  As subjects learn what to expect, their uncertainty decreases because they 
begin to get a feel for problems that were originally unknowns.  As the uncertainty 
decreases, they become more confident that the estimate provided is sufficient to 
account for uncertainties and also matches historical completion times as experienced 
by the subjects.  Because the significant factor driving confidence was YoE, this 
would indicate that this increase in confidence levels applies to both managers and 
technicians (Kremer 2017a).   
7.2.3 Risk Aversion  
If confidence is a measure of uncertainty, is high confidence indicative of 
knowledge or does it reflect a subject’s belief about his or her ability as an estimator?  
The original hypothesis was that management subjects would be less risk averse than 
technicians, by dint of the presumed personality types of managers and technicians.   
Risk aversion manifests in a concave utility function (Raiffa 1968, 68).  Both 
management and technicians were presumed to exhibit risk aversion, but the 
supposition was that management would exhibit less than technicians, and thus have 
less concave utility functions.   
The results did not confirm these hypotheses.  From the full set of results in 
Appendix A.7, subjects, regardless of their demographic, tended to be to the left of 
the CME line, indicating risk averse behavior.  Most subjects, however, demonstrated 
some convexity in their curve, indicating risk prone behavior.   
As Raiffa points out, this behavior has been documented in several studies and 
is not unexpected (Raiffa 1968, 95).  When the results were reduced to reflect only 
the mid-point estimates, the results still did not show a clear delineation among the 
different demographics.  As can be seen in Figure 5-1 and the results in Table 5-8, 
there was no statistically significant difference between the responses of the 
management group and the technical group, nor did the other two demographics show 
any statistically significant differences (see Figure 5-2 through Figure 5-5).  It appears 
that subjects with differing levels of risk aversion, as measured through a monetary 
bet, can be found at different leadership levels, experience levels, and education 
levels.   
While Utility did not have any significant factors driving the results, and 
confidence appeared to be driven by the YoE demographic, there was a weak 
correlation between risk aversion and confidence.  In this case, it was hypothesized 
that risk-seeking subjects are, in general, confident by nature because in order to seek 
out risk, one must believe one will succeed (Raiffa 1968, 94–95).  A person with a 
high level of confidence in their general abilities will probably also have high 
confidence in their estimating abilities.  With no regard to demographics, the 
correlation between the two traits was 0.14 for cases with three or more projects, 
indicating that these two personality traits may be weakly linked. 
Although this Utility assessment was developed assuming the use of personal 
money, and the projects studied were funded through government money, it is 
believed that a Utility curve created by using personal money is still a good 
assessment, because as subjects provided estimates for the projects, they became 
personally invested in those estimates.  Now, instead of money, reputation is attached 
to the project, so it translates the government project to the personal realm.  Ariely 
conducted a study where subjects were asked to perform a menial task.  Upon 
completion, for some subjects the results of the work were destroyed as soon as they 
were completed while others’ work was left alone.  Those who saw their work 
destroyed were less motivated to continue working.  This simple experiment is an 
indicator of how quickly people can become invested in their work (Ariely 2009a, 
Loc 883-1015).   
7.2.4 Risk Aversion as Applies to Scheduling  
If schedule risk is defined as the project finishing either above or below the 
extreme estimates provided by an estimator, and if risk averse subjects typically take 
action to mitigate a risk, then it was hypothesized that subjects who exhibit risk 
averse behavior would have a larger separation between their BC and WC estimates 
in an effort to compensate for as much uncertainty as possible.  If variance is regarded 
as an “I told you so” buffer where an estimator can claim success as long as the 
actual duration falls between the BC and WC values, then it stands to reason that a 
risk averse person will make that range wider to increase the chance of the actual 
duration falling within that range.   
While Table 5-8 shows that there are no significant factors driving the levels 
of risk aversion, it also shows that for standard deviation, the Position demographic 
was the only one of the three where one group was consistently behaving differently 
from the others within the demographic.  In this case, when looking at each individual 
activity, the managers were providing smaller variances than their technician 
counterparts.  When the results of these two surveys were tested for correlation, the 
average correlation coefficient, calculated using only projects with three or more 
subjects, was -0.21.  Although the 
correlation is weak, it indicates that at some level, personnel described as risk averse 
based on the Utility model are providing wider estimate ranges.  Risk-averse 
estimators may believe this helps to ensure that any unknown contingencies are 
accounted for and decreases the probability the estimator will be blamed for a bad 
estimate.   
Closely related to the correlation between utility and standard deviation is the 
correlation between Utility and Te.  The Te value is heavily influenced by the ML 
estimate since, as can be seen in Equation 2-1, the ML value is given four times the 
weight of the other two estimates.  As was seen in Table 5-8, the Position 
demographic is the driving force behind the differences in responses for both the 
variance and the Te estimates, while Utility was not being driven by any particular 
demographic.  From those results, it was shown that typically managers were 
providing smaller Te values and also providing smaller variances than the technicians.  
As was shown previously, there was a negative correlation between utility and 
standard deviation.  A slightly stronger negative correlation (-0.29) was also observed 
when comparing Utility and Te for projects with three or more subjects.   
While Te is partly composed of the BC and WC values that are also used in 
the calculation of standard deviation (see Equation 2-2), the value is dominated by the 
ML value since its weight is four times that of each of the other two estimates.  While 
the correlation between Utility and standard deviation reflects how subjects compensate 
for uncertainty by widening their estimated range, the correlation between Utility and 
Te also factors in the magnitude of the estimate itself.  Given that the ML estimate is 
the driving force 
behind the Te value, it can be inferred that not only are risk averse personnel 
providing a wider range in their estimates, but they are also providing larger ML 
estimates.  A larger ML estimate negatively correlated with risk aversion indicates 
that personnel who are risk averse are more concerned with a project finishing late 
than they are with it finishing early.  For these people, success is defined as finishing 
before the deadline.  The best way to mitigate the risk of finishing after the deadline 
is to provide a larger ML estimate that will be less likely to be exceeded.   
7.2.5 Summary   
The preceding sections described how personality traits and estimating practices 
could be observed at varying levels of the three demographics.  They also showed how 
personality traits and estimating practices relate to one another.  In a scheduling 
context, the data indicate that subjects in the technical career fields provide higher 
estimates and wider variances than those in management.   
 
7.3 Aggregating Estimates 
There is considerable literature on both expert aggregation and improvement 
of schedule estimations, but the two fields do not seem to intersect.  The literature on 
scheduling provided many ways to improve on the PERT equation, but seemed to 
focus on estimates derived from a single person.  Although there will be overlap, each 
stakeholder in a project brings a slightly different view with slightly different 
information (Surowiecki 2005, 9–10; Budescu and Rantilla 2000, 373–74).  The 
“unknown unknown” of one stakeholder may be a known issue to another (Silver 
2012, 420).  Using all available information gives a decision maker the best possible 
chance of accounting for uncertainties.  Aggregating these estimates, however, 
presents challenges.   
7.3.1 The PERT (Beta) Prior 
The creators of PERT needed a way to account for uncertainty in schedules 
and they only had a month to develop that method.  They believed that the best 
person to provide duration estimates was the “technical man” who was actually doing 
the work.  They also knew, however, the “technical man” was busy doing technical 
work and did not have much time to devote to the development of a schedule 
(Malcolm et al. 1959, 648, 650, 659).  They were also aware that most people do not 
have training in probability assessment and would need an easier way to express their 
uncertainty (Chaloner and Duncan 1983, 174; Clark 1962, 406).  To that end, the 
creators of PERT settled on a method that required an estimator to simply provide 
BC, WC and ML estimates (Malcolm et al. 1959, 651).  These were used to develop a 
distribution curve for the activity using the beta distribution.   
 The recommended PERT formulas for calculating the expected value and variance 
of an activity duration from the three-point estimates are given in Equation 2-1 and 
Equation 2-2 
(Malcolm et al. 1959, 651–52; Mantel Jr. et al. 2004, 144; PMI 2013, para. 6.5.2.4). 
These are simple equations, easily remembered and applied.  The creators of PERT 
did not start out with a particular distribution, but, as Clark pointed out, they needed 
something to model the probability distribution of the activity duration and a beta 
distribution fit the need (Clark 1962, 406).  It is flexible; it can be unimodal, modeling 
the tendency of an activity to have only one ML value; and it tapers off at the tails, 
which models how the probability decreases as the duration moves away from the mode 
(Malcolm et al. 1959, 651).   
To fully define a beta PDF, the two hyper-parameters, α and β, must be known.  
Having these parameters allows the calculation of the mean and variance of the 
distribution.  Without proper training, it can be very difficult to estimate these 
parameters, so Equation 2-1 and Equation 2-2 were developed in an attempt to create 
an approximation to the actual mean and variance of the beta distribution (Regnier 
2005b, 6).   
While the PERT approximations served their creators relatively well, once 
statisticians began to take a hard look, the model began to break down (Grubbs 1962, 
913).  The PERT distribution is based on an estimate provided by an imperfect 
human.  As the PERT methodology was analyzed more closely, it was pointed out 
that the estimates provided by the experts were just that:  estimates.  There was no 
way to know if the BC estimate used in Equation 2-1 was actually the true absolute 
shortest duration of an activity or if it was just the belief of the expert that it was the 
shortest (Grubbs 1962, 914–15).  The same logic applied to both the ML and the WC 
estimates.  Pickard provided a solution to that issue, but the data collection and 
approximating algebra are much more complex than the simple PERT formula 
(Pickard 2004, 1571–74).   
The PERT approximations are exact only when the sum of the hyper-parameters 
equals six (Grubbs 1962, 914).  Keefer and Verdini compared 
how well Equation 2-1 and Equation 2-2 approximate the true mean and variance of 
the beta distribution.  They confirmed that the PERT approximation worked when the 
hyper-parameters summed to six, but as the sum increased, the 
approximation became less accurate.  The variance approximation was even worse (Keefer and 
Verdini 1993, 1087–88).  Grubbs pointed out that this constraint put a limitation on 
the beta distribution which had been chosen for its versatility (Grubbs 1962, 914).  
Several researchers have proposed new approximating equations for the beta 
distribution which allow for a wider range of hyper-parameter values (Pearson and 
Tukey 1965; Golenko-Ginzburg 1988; Megill 1971).  Keefer and Verdini provided a 
summary of several of these approximations and their capability to accurately 
approximate the true beta mean and variance (Keefer and Verdini 1993, 1087–88).  
The two most accurate approximations (Equation 2-5 and Equation 2-6) were no 
more mathematically complicated than the PERT estimation, but use the median and 
varying endpoint fractiles.  Unfortunately, they still ran into the same problem as all 
the other approximations: complete dependence on the estimates of flawed experts 
coupled with an unknown underlying distribution (Regnier 2005b, 8).    
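The sum-to-six condition can be checked numerically.  The sketch below assumes the standard beta parameterization, with density proportional to x^(α-1)(1-x)^(β-1), rescaled to the interval [BC, WC], and assumes Equation 2-1 and Equation 2-2 are the usual PERT mean and standard deviation.  The endpoints and parameter values are hypothetical.

```python
def beta_mean(a, b, alpha, beta):
    """True mean of a beta distribution rescaled to [a, b]."""
    return a + (b - a) * alpha / (alpha + beta)


def beta_mode(a, b, alpha, beta):
    """True mode of a rescaled beta; requires alpha > 1 and beta > 1."""
    return a + (b - a) * (alpha - 1.0) / (alpha + beta - 2.0)


def beta_var(a, b, alpha, beta):
    """True variance of a beta distribution rescaled to [a, b]."""
    s = alpha + beta
    return (b - a) ** 2 * alpha * beta / (s ** 2 * (s + 1.0))


def mean_error(a, b, alpha, beta):
    """PERT mean (using the true mode as ML) minus the true beta mean."""
    ml = beta_mode(a, b, alpha, beta)
    return (a + 4.0 * ml + b) / 6.0 - beta_mean(a, b, alpha, beta)


def var_error(a, b, alpha, beta):
    """PERT variance ((b - a) / 6)^2 minus the true beta variance."""
    return ((b - a) / 6.0) ** 2 - beta_var(a, b, alpha, beta)


# Hypothetical endpoints BC = 10 and WC = 25.
print(mean_error(10.0, 25.0, 2.0, 4.0))  # alpha + beta = 6
print(mean_error(10.0, 25.0, 3.0, 9.0))  # alpha + beta = 12
print(var_error(10.0, 25.0, 2.0, 4.0))
```

In this sketch the mean error vanishes when the parameters sum to six and is nonzero when they sum to twelve.  The variance does not match for the parameters shown even though they sum to six.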
The creators of PERT chose the beta distribution because they believed its 
general shape matched what could be expected for an activity duration (Malcolm et 
al. 1959, 651; Clark 1962, 406).  Its versatility also had the added benefit of enabling 
it to model what in this research is referred to as the optimists, the pessimists, and the 
neutrals simply by adjusting the hyper-parameters.  While the creators of PERT 
appeared to be leaning towards a Bayesian construct by modeling the belief of the 
estimator (Malcolm et al. 1959, 651), others approached the model from a frequentist 
perspective and struggled with the fact that the true underlying distribution of the 
activity was unknown (Pickard 2004, 1568–73; Grubbs 1962, 914–15; PMI 2013, 
para. 6.5.2.4).  Given that activity durations must have a lower limit, but could 
technically remain uncompleted for an extended period of time, it would seem that a 
frequentist distribution of an activity would be skewed to the right. From this 
research, however, it was seen that estimators would sometimes provide BC, WC, and 
ML estimates that were more accurately modeled by a left-skewed or no-skew 
distribution.  In the frequentist view, the estimates provided may not accurately 
reflect the true mode or extreme estimates of the distribution of historical durations.  
In a subjective context, the model reflects the beliefs of the expert. 
7.3.2 Bayesian Prior 
Estimation of the mode has been criticized because it can be difficult to 
accurately estimate the mode of an unknown distribution from data (Golenko-
Ginzburg 1988, 770).  In the Bayesian (degree-of-belief) sense, the mode is easy to 
determine because it reflects the estimator’s belief in the ML duration (Chaloner and 
Duncan 1983, 175).  If the estimator believed that a different value had a higher 
probability of occurring, then the mode would move to that value.  The end-points 
also become reflective of a belief about the activity instead of the actual smallest and 
largest durations of an activity.   
Interpreting the three baseline estimates as a function of Kahneman’s System 
1 and System 2 may explain why estimating the median may not necessarily be 
easier than estimating the mode.  The three PERT baseline estimates could be 
interpreted as a function of System 1, whereas the other approximations are more a 
function of System 2.  For example, when someone asks an expert how long 
something should take, a number will immediately pop into his head, based on past 
experiences or analogous estimating (Kahneman 2011, 24; Clark 1962, 406; PMI 
2013, para. 6.5.2.1-6.5.2.2).  The same holds true for asking for a BC or WC estimate.  
Asking an expert to provide a number for which half the time the duration is below 
that number and half the time the duration is above that number, however, involves 
more careful thought (Chaloner and Duncan 1983, 175).  A number will probably 
present itself in the expert’s mind, but he will need to stop and consider it in relation 
to all other possible durations to determine whether or not it is the median.  Even if 
the median is considered a better statistical assessment (Keefer and Verdini 1993, 
1088), it is still an assessment. 
Golenko-Ginzburg has argued that estimating the mode may be unnecessary 
if the two endpoint estimates are provided.  He showed that when estimators were asked 
to provide duration estimates, the mode typically fell at (2BC + WC)/3, 
that is, at the one-third point of the range (Golenko-Ginzburg 1988, 770).  This experiment was 
repeated in the present project, and similar results were found:  there was no 
statistically significant difference between the calculated mode and the actual mode, 
even when comparing the management subjects who had a more varied skew type.  
These results were calculated using the “t-Test: Two-Sample Assuming 
Unequal Variances” tool in Excel™’s Data Analysis Add-On, which compared the mean of the 
ML estimates provided by the subjects with the mean of the values 
obtained by using the calculation described above (Golenko-Ginzburg 1988, 769). 
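The comparison can be sketched with the Python standard library.  The estimates below are hypothetical, and the function computes only the unequal-variance (Welch) t statistic and its degrees of freedom.  Obtaining a p-value would additionally require a t-distribution CDF from a statistics package, which is not shown here.

```python
from math import sqrt
from statistics import mean, variance


def one_third_mode(bc, wc):
    """Mode implied by the endpoints alone: (2*BC + WC) / 3."""
    return (2.0 * bc + wc) / 3.0


def welch_t(xs, ys):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    nx, ny = len(xs), len(ys)
    vx, vy = variance(xs) / nx, variance(ys) / ny  # sample variances / n
    t = (mean(xs) - mean(ys)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return t, df


# Hypothetical (BC, ML, WC) estimates, not survey data.
estimates = [
    (8.0, 10.0, 15.0),
    (20.0, 24.0, 30.0),
    (5.0, 6.0, 9.0),
    (12.0, 16.0, 22.0),
    (30.0, 35.0, 48.0),
]
stated = [ml for _, ml, _ in estimates]
calculated = [one_third_mode(bc, wc) for bc, _, wc in estimates]
t_stat, dof = welch_t(stated, calculated)
print(t_stat, dof)
```

A t statistic close to zero, relative to the critical value at the computed degrees of freedom, corresponds to the finding of no statistically significant difference between the stated and calculated modes.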
Golenko-Ginzburg asserted this meant that the ML estimate was unimportant 
in calculating the expected value of activity duration (Golenko-Ginzburg 1988, 769).  
In the present research, when providing estimates, several estimators omitted the ML 
estimates on their survey sheets.  The reason is unknown, but when an estimate was 
left out, it was always the ML estimate, perhaps further indicating that the location of 
the mode is not considered a useful measure.  When using the distributions proposed 
in this research (GEV or Normal), however, the mode serves as the location 
parameter of the curve and becomes more important in modeling the estimator’s 
beliefs.  
Despite the criticisms of the mode, it is useful to remember the reason PERT 
was originally developed.  The technique developed by the creators of PERT allowed 
the “technical man” a way to provide a quick estimate without having to take the time 
for a thorough statistical analysis (Malcolm et al. 1959, 659; Clark 1962, 406).  
Certain activities must be scheduled based on a set number (e.g. delivery dates, work 
crew arrival dates, etc.).  Because most people do not spend their days calculating the 
expected value of an unknown quantity based on their perceived range of values, they 
have only their “ML” estimate by which to live.   
7.3.3 A New Prior Model 
 For all of the beta distribution’s versatility, it has two major drawbacks.  
The first is the requirement to estimate the hyper-parameters which describe the 
shape.  The second is its limited domain.  Because the beta distribution is only 
defined between its two endpoints and is zero elsewhere, using it in an expert 
aggregation context is a challenge unless the experts have the same BC and WC 
estimates.   In its standard form, the beta distribution is only defined on the interval 
[0,1] (“Beta Distribution” 2016).  This interval can be converted using the 
change of variables described by Grubbs such that the endpoints 0 and 1 are mapped to 
the BC and WC estimates, but this involves extra conversion steps when aggregating 
expert opinion (Grubbs 1962, 913).  For example, one expert could have a BC 
estimate of 10 hours and a WC estimate of 25 while another could have values of 15 
and 40.  Plotting both of these distributions on the interval [0,1] would yield an 
inaccurate representation of the distributions. To accurately display the distribution, 
the estimates would need to be converted to their actual locations on the number line.  
In addition, given the definition of the beta domain, it can be seen that, using Morris’ 
aggregation method, there would be no probability density for anything below 15 or 
above 25 because one or the other of the two distributions is zero beyond those 
points.   
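The zero-density problem can be illustrated with a minimal sketch.  It assumes that the aggregation multiplies the stakeholders' densities pointwise, as the discussion of Morris' method in this section implies, and it uses hypothetical beta shape values (alpha = 2, beta = 3) for both experts.  Only the BC and WC values of 10/25 and 15/40 come from the example above.

```python
import math

def scaled_beta_pdf(x, bc, wc, alpha=2.0, beta=3.0):
    """Beta density rescaled from [0, 1] to [bc, wc]; zero outside that interval.

    alpha and beta are illustrative shape values, not values from the research.
    """
    if x <= bc or x >= wc:
        return 0.0
    u = (x - bc) / (wc - bc)
    norm = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return norm * u ** (alpha - 1) * (1 - u) ** (beta - 1) / (wc - bc)

def unnormalized_product(x):
    """Pointwise product of the two experts' densities (before normalization)."""
    return scaled_beta_pdf(x, 10, 25) * scaled_beta_pdf(x, 15, 40)

# Durations at 12 and 30 hours lie outside one expert's range, so the product is zero there.
for hours in (12, 20, 30):
    print(hours, unnormalized_product(hours))
```

Any duration outside the overlap of the two ranges receives zero aggregated density, no matter how plausible the other expert considers it.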
The fact that PERT remains a popular method of accounting for project 
schedule uncertainty seems to confirm the belief that the overall shape of the model is 
good.  These considerations were the driving force behind the selection of the GEV 
Max, GEV Min, and Normal distributions as the new models for the subjective 
  
duration assessment.  These three distributions maintain the general 
unimodal/tapering shape of the beta distribution, but have the added advantage of 
being defined along the entire real number line.  Between the three models, it was 
also possible to account for all three skew types seen in the expert estimates.  The 
necessity of using three different distributions complicates matters slightly, but if a 
decision maker knows what she is looking for, these complications can be overcome.  
These distributions present advantages over the beta distribution in the aggregation.  
First, all three distributions are defined on (-∞,∞) so, even though the density may be 
negligible, there is no point at which the density is completely zero.   In the 
aggregation, this is critical to ensuring that the posterior density exists everywhere.   
 Another advantage of using the GEV Max-GEV Min-Normal model is that its 
defining parameters are directly related to the estimates provided by experts.  For the 
GEV Max and GEV Min models, the distribution is defined by three parameters.  The 
first parameter, ξ, was set to zero which allowed the distribution to be defined along 
the entire number line.  This meant that at least some density could occur below the 
BC estimate and above the WC estimate, acknowledging that the person providing 
the estimate may not have fully accounted for extreme cases.  The second parameter, 
μ, is simply equal to the mode of the distribution which is the ML estimate provided 
by the expert.  No special considerations are needed for this parameter as it translates 
directly from the estimate to the distribution and locks the distribution into the 
appropriate location on the number line.  The shape parameter, referred to as “k”, is 
directly dependent on the initial estimates provided by the expert as can be seen in 
Equation 3-20 and Equation 3-22.   
  
With these parameters defined, the mean and variance for the GEV Max/GEV 
Min distributions can be directly calculated without the need for an approximating 
equation.  For the Normal distribution, defining the distribution is even simpler.  The 
location on the number line is once again defined by the ML estimate and the 
standard deviation is found by taking the separation between the ML estimate and 
either endpoint and dividing by three (the 3-sigma assumption); the variance is the 
square of that value.   
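The parameter mapping described above can be sketched as follows.  This is a reconstruction under stated assumptions rather than a reproduction of Equations 3-20 through 3-23, which are not repeated in this section: it assumes that the GEV models with ξ = 0 take the standard Gumbel (extreme value type I) form, that "k" plays the role of the Gumbel scale, and that "k" is obtained by pinning the CDF at the BC estimate (GEV Max) or the WC estimate (GEV Min) to the tail values discussed in this section.  The function names are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def gev_max_params(bc, ml, p_below_bc=0.0001):
    """Gumbel-maximum model pinned to the ML (mode) and BC estimates.

    Solves exp(-exp(-(bc - ml) / k)) = p_below_bc for k.
    """
    k = (ml - bc) / math.log(-math.log(p_below_bc))
    mean = ml + EULER_GAMMA * k
    variance = (math.pi * k) ** 2 / 6
    return k, mean, variance

def gev_min_params(ml, wc, p_below_wc=0.99995):
    """Gumbel-minimum model pinned to the ML (mode) and WC estimates.

    Solves 1 - exp(-exp((wc - ml) / k)) = p_below_wc for k.
    """
    k = (wc - ml) / math.log(-math.log(1 - p_below_wc))
    mean = ml - EULER_GAMMA * k
    variance = (math.pi * k) ** 2 / 6
    return k, mean, variance

def normal_params(ml, endpoint):
    """Normal model: the endpoint is assumed to sit three standard deviations from ML."""
    sigma = abs(endpoint - ml) / 3
    return sigma, ml, sigma ** 2

# Illustrative estimates only (BC = 10, ML = 15, WC = 25).
print(gev_max_params(10, 15))
print(gev_min_params(15, 25))
print(normal_params(15, 10))
```

Under these assumptions the mean of the right-skewed model sits above the ML estimate and the mean of the left-skewed model sits below it, while the Normal mean equals the ML estimate.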
 While the defining parameters for the GEV Max/GEV Min distributions are 
tied directly to the baseline estimates, it should be noted one major assumption was 
required in order to determine the shape parameter used to define the distribution.  
The intent of the PERT creators was that there should be negligible density below the 
BC estimate and above the WC estimate (Malcolm et al. 1959, 651).  Based on the 
acceptance of the general shape of beta distribution, the goal was to make the new 
distribution models match the shape of the beta distribution as closely as possible.  
From Equation 3-15 and Equation 3-17 the value of F(x) was set at 0.0001 for the 
GEV Max distribution and 0.99995 for the GEV Min distribution.   
For the GEV Max distribution, setting the CDF to 0.0001 ensured that the 
density below the BC estimate was negligible, but not zero.  The value of 0.0001 was 
selected by plotting the GEV model in MatLab™ and selecting the value at which the 
curve begins to rise above the x-axis.  As can be seen in Figure 6-1, this is roughly 
equivalent to the beta distribution starting to have a density at its BC value.  The same 
logic was applied to the GEV Min case, where the value of 0.99995 was roughly the 
point where the curve began to collapse back down to the x-axis (see Figure 6-3).  
The advantage of this method is that if there is a belief that more density should be 
  
below the BC or above the WC estimates, the calculation of “k” is still possible.  The 
decision maker would simply need to change the value in the denominator of 
Equation 3-20 or Equation 3-22, depending on the model in question (and noting that 
Equation 3-21 and Equation 3-23 would need to be reworked to reflect the new 
density value selected).  If the intent is to match the general shape of the PERT 
distribution, however, these values should be reasonable estimates. 
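The sensitivity of "k" to the chosen tail value can be explored with a short sketch.  It assumes the Gumbel form of the GEV Max model (ξ = 0) and a hypothetical helper, gev_max_k, that solves F(BC) = p for the scale; it is not a reproduction of Equation 3-20.

```python
import math

def gev_max_k(bc, ml, p_below_bc):
    """Scale of a Gumbel-maximum model with mode ml and CDF value p_below_bc at bc.

    Requires 0 < p_below_bc < 1/e so that the logarithm in the denominator is positive.
    """
    return (ml - bc) / math.log(-math.log(p_below_bc))

# Illustrative estimates: BC = 10, ML = 15. Allowing more density below BC widens the curve.
for p in (0.0001, 0.001, 0.01):
    print(p, gev_max_k(10, 15, p))
```

Raising the permitted density below the BC estimate increases "k", which in turn increases both the mean and the variance of the model.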
 One disadvantage of the GEV Max/GEV Min distributions is that there are 
three baseline estimates to model, but only two distribution parameters to describe the 
location and shape.  Because one of those parameters is taken up with the mode, that 
leaves only the shape parameter to match either the BC estimate or the WC estimate.  
For the GEV Max case, the choice was made to match the BC estimate.  Given the 
results seen in the GAO reports, and the opinions provided by subjects in this 
research, it was deemed unlikely that the project would finish earlier than the BC 
estimate.  Using these assumptions and looking again at Figure 6-1, it can be seen that 
the GEV Max model matches the BC estimate relatively closely and matches the 
mode exactly, but that the curve does not collapse back down to the x-axis until 
considerably after the provided WC estimate.   
For a typical network schedule, unless the decision maker is using a Monte 
Carlo simulation, the only required values from the distribution are the mean and the 
variance.  For the GEV Max model, the mean and variance are calculated using μ and 
“k” and “k” is only dependent on the BC and ML values, which are provided by the 
expert.  Because of this, the fact that the WC value is less accurately modeled is not 
as critical.   
  
The skew of the expert estimates was taken as indication of the certainty in an 
expert’s outlying estimates.  For example, a right-skewed estimate (as modeled by the 
GEV Max distribution) indicated that an expert was relatively confident that the 
duration would not be less than the BC estimate, but that there was more uncertainty 
about the WC estimate. The longer tail to the right of the mean accounts for that 
uncertainty.  A left-skewed estimate (as modeled by the GEV Min distribution) was 
an indication that there was relatively high confidence that the duration would not 
exceed the WC estimate, but that there was less certainty about the BC estimate, 
leading to the longer tail to the left of the mean.   
For the GEV Min distribution the shape parameter was calculated slightly 
differently, based on the discussion in the preceding paragraph, the desire to continue 
to match the beta distribution, and the fact that trying to match the BC estimate using 
the 0.0001 value resulted in a distribution that did not appear to accurately model 
either end-point estimate. For the GEV Min distribution, the decision was made to 
match the ML and WC estimates and leave the BC estimate as the “floating” 
estimate.  In this case, the shape parameter is defined by the ML and WC estimates, 
so the mean and variance are defined by the two parameters in which the expert 
seems to hold the most confidence.  In these cases, it is believed that the left-skewed 
nature of the distribution indicates the estimator feels less need to account for the 
duration exceeding the WC estimate.  Although this is not typically the case, experts 
who provide estimates of this type are either extremely optimistic or they have some 
information that gives them more confidence that they will not exceed their WC 
estimate.   
  
The GEV Min model has a drawback in Monte Carlo simulation.  Because the 
domain of this distribution is the entire real number line, if the activity duration is 
relatively short (roughly less than ten time units), the distribution could result in 
appreciable probability density assigned to negative duration values.  A solution for 
this issue is suggested in Chapter 8.  
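The size of the negative-duration problem can be gauged with a minimal sketch.  It assumes the Gumbel-minimum form of the GEV Min model (ξ = 0) with the scale pinned so that the CDF equals 0.99995 at the WC estimate; the ML and WC values are hypothetical.

```python
import math

def gev_min_k(ml, wc, p_below_wc=0.99995):
    """Scale of a Gumbel-minimum model with mode ml and CDF value p_below_wc at wc."""
    return (wc - ml) / math.log(-math.log(1.0 - p_below_wc))

def gev_min_cdf(x, ml, k):
    """CDF of the Gumbel-minimum model: 1 - exp(-exp((x - ml) / k))."""
    return -math.expm1(-math.exp((x - ml) / k))

def negative_mass(ml, wc):
    """Probability assigned to durations below zero."""
    return gev_min_cdf(0.0, ml, gev_min_k(ml, wc))

# Same ML-to-WC spread (4 time units), short versus long activity.
print(negative_mass(5, 9))    # short activity: a few percent of the mass is negative
print(negative_mass(50, 54))  # long activity: negligible negative mass
```

A Monte Carlo simulation drawing from the short-activity model would therefore occasionally sample negative durations unless the draws are truncated or otherwise adjusted.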
7.3.4 Calibrating the Experts 
 One problem with eliciting expert probabilities is that experts are rarely 
calibrated (Morris 1977, 682; Baecher 1999, 4).  Part of the process in Morris’ 
method requires the decision maker to calibrate the expert either empirically or 
subjectively.  For this research, data was not available to empirically calibrate the 
experts, so a subjective calibration method was developed.  While the beta 
distribution’s limited domain was problematic for aggregating the experts, it was 
perfect as a calibration filter.  Because the CDF of any probability is only defined on 
the interval [0,1], the beta distribution was an ideal match for processing the input 
data (“Beta Distribution” 2016).  To calibrate the expert, Morris’ method passes the 
CDF of the prior through the calibration filter (Morris 1977, 682–86, 689).   
In an example provided in Morris’ 1977 article, he used curves that were 
described by polynomial equations to describe experts who were either over-stated or 
under-stated in their knowledge, but the general shape resembled a symmetrical beta 
distribution (Morris 1977, 691).  Using that example as a baseline, it was determined 
that the beta function’s versatility allowed for a single equation to describe the entire 
calibration range, including the case where no calibration was required, simply by 
adjusting the hyper-parameters of the defining equation of the beta curve.  
  
Empirically, for a continuous variable, a person is fully calibrated if actual values fall 
within the correct fractile.  In a scheduling context, the expert is well calibrated if 
across several assessments actual values occur at the frequency predicted by the 
expert (Tversky and Kahneman 1974, 1128–29).  For example, across several 
activities, actual durations should fall below the expert’s 0.1 fractile no more than 
10% of the time.  If they fall below it more often than 10% of the time, then the 
expert requires calibration to make reality match the prediction (Baecher 1999, 5–6).   
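An empirical calibration check of this kind can be sketched as follows.  The history records are invented for illustration (no such data were available in this research), and the function name is hypothetical.

```python
def tail_hit_rates(records):
    """Fraction of actual durations below the 0.1 fractile and above the 0.9 fractile.

    records: iterable of (q10, q90, actual) tuples, one per completed activity.
    """
    records = list(records)
    below = sum(1 for q10, q90, actual in records if actual < q10)
    above = sum(1 for q10, q90, actual in records if actual > q90)
    return below / len(records), above / len(records)

# Hypothetical history for one expert: (0.1 fractile, 0.9 fractile, actual duration).
history = [
    (8, 14, 15), (5, 9, 7), (20, 30, 33), (3, 6, 6.5), (10, 18, 12),
    (7, 12, 13), (4, 8, 5), (15, 22, 25), (6, 10, 9), (9, 16, 17),
]
below_rate, above_rate = tail_hit_rates(history)
print(below_rate, above_rate)
```

For a well-calibrated expert each rate would be close to 0.1.  In this invented history the upper tail is exceeded in six of ten activities, which would indicate an expert who overstates his knowledge of the worst case.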
Morris contends that subjective calibration is also possible in the absence of 
empirical data (Morris 1977, 689).  In this case, the decision maker assesses her 
opinion of the expert to determine whether or not she believes the expert will be 
surprised by the actual results, where surprise is defined as the actual value being 
anything below the 0.1 fractile or above the 0.9 fractile (Morris 1977, 691).  She can 
calculate the duration value that falls within a particular fractile using the CDF of the 
prior distribution and determine from that value whether or not she believes the 
expert will be surprised by the results.  To that end, the intent of the calibration 
function is effectively to adjust the variance by altering the fractiles of the extreme 
estimates.   
Passing the CDF of the expert’s prior through the beta filter alters the duration 
values that fall at the 0.1 or 0.9 fractile.  It widens the values for experts the decision 
maker believes to be overstating their knowledge and shrinks the values of those she 
believes to be understating their knowledge (Morris 1977, 691).  Another advantage 
of using the beta curve as a calibration function was that it allowed the expert’s mode 
to remain unaltered as it passed through the filter.  Because the intent was to only 
  
affect the extreme values where the expert could be “surprised”, the values for the 
two beta hyper-parameters provided in Appendices 8-10 were calculated with the 
constraint that the filter could not affect the mode of the prior distribution.  Based on 
the above discussions regarding the importance of the mode in subjective probability 
assessments, it was believed that the mode should not be altered by filtering.  If both 
the mode and the endpoints were altered drastically, then the purpose of eliciting an 
expert’s opinion would be rendered moot.    
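The effect of the filter on the extreme fractiles can be sketched for the Normal case.  Several assumptions are made here: "passing the CDF through the filter" is read as composing a beta CDF with the prior CDF, the three symmetric beta settings (a = b = 0.5, 1, and 2) are illustrative and are not the tabulated hyper-parameters, and the prior (ML = 15, endpoints 5 units away) is hypothetical.  These three settings were chosen because their CDFs have closed forms.

```python
import math
from statistics import NormalDist

# Closed-form CDFs of three symmetric beta distributions on [0, 1].
# The (a, b) values are illustrative, not the hyper-parameters from the appendices.
BETA_FILTERS = {
    "widen": lambda u: (2 / math.pi) * math.asin(math.sqrt(u)),  # a = b = 0.5
    "identity": lambda u: u,                                      # a = b = 1
    "narrow": lambda u: 3 * u ** 2 - 2 * u ** 3,                  # a = b = 2
}

def calibrated_fractile(prior, beta_cdf, p, lo, hi, tol=1e-9):
    """Bisection for the duration x at which beta_cdf(prior.cdf(x)) equals p."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if beta_cdf(prior.cdf(mid)) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

prior = NormalDist(mu=15, sigma=5 / 3)  # hypothetical Normal prior, 3-sigma endpoints at 10 and 20
for name, beta_cdf in BETA_FILTERS.items():
    q10 = calibrated_fractile(prior, beta_cdf, 0.1, 0, 30)
    q50 = calibrated_fractile(prior, beta_cdf, 0.5, 0, 30)
    q90 = calibrated_fractile(prior, beta_cdf, 0.9, 0, 30)
    print(name, round(q10, 3), round(q50, 3), round(q90, 3))
```

With a symmetric filter and a symmetric Normal prior, the center of the distribution stays at the ML estimate while the 0.1 and 0.9 fractiles move outward or inward.  For the skewed GEV priors, holding the mode fixed requires the constrained hyper-parameters described above, which this sketch does not attempt to reproduce.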
Although subjective calibration was used for this research, Equation 3-39 and 
Equation 3-40 could possibly be used as a point of reference for how well calibrated 
the expert is.  These equations are used to determine the density between the BC and 
WC estimates for the GEV Max and GEV Min distributions, respectively (a Normal 
distribution will always have a density of 0.997 based on the 3-sigma assumption 
used in this research) (NIST 2017a).  For the GEV Max model, due to the assumption 
that the density below the BC estimate was 0.0001, the value of Δ from Equation 3-39 
can only change based on the location of the WC estimate.  The further away the WC 
estimate is from the ML estimate, the more density is accounted for between the two 
extreme estimates.  If the decision maker has a point of reference for what is 
considered an appropriate value of Δ, the location of the WC estimate could possibly 
be used to calibrate the expert.  For the GEV Min case, the same holds true, except 
that due to the assumption that the density in the range (-∞, WC] is 0.99995, Equation 
3-40 can only change based on the location of the BC estimate.  As in the GEV Max 
case, for a GEV Min prior, if the decision maker has a point of reference for her 
  
beliefs regarding how much density falls between the BC and WC estimates for a well-
calibrated estimator, she can use these as a point of reference for calibration. 
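The dependence of Δ on the WC estimate in the GEV Max case can be sketched as follows.  Equation 3-39 is not reproduced in this section, so the sketch assumes that Δ is the CDF at the WC estimate minus the CDF at the BC estimate for a Gumbel-maximum model (ξ = 0) whose scale is pinned by F(BC) = 0.0001.  The estimates are hypothetical.

```python
import math

def gev_max_delta(bc, ml, wc, p_below_bc=0.0001):
    """Probability mass between the BC and WC estimates for a Gumbel-maximum prior."""
    k = (ml - bc) / math.log(-math.log(p_below_bc))
    cdf_wc = math.exp(-math.exp(-(wc - ml) / k))
    return cdf_wc - p_below_bc

# Same BC and ML, progressively more distant WC estimates.
for wc in (20, 25, 40):
    print(wc, gev_max_delta(10, 15, wc))
```

Because the lower tail is fixed by assumption, moving the WC estimate away from the ML estimate is the only way the captured mass can grow.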
Current project management software allows for the adjustment of the weights 
typically used to calculate a PERT Mean (Equation 2-1) (Microsoft 2017).  This 
adjustment allows a project manager to account for uncertainty by changing the 
weights of the three estimates (e.g., if the project manager anticipates problems, the 
weight of the WC estimate can be increased and the weight of the ML estimate 
decreased).  Changing the weights, however, may violate the assumption of the mean 
of a beta distribution as approximated by Equation 2-1, so the method should be used 
with caution. 
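The weight adjustment can be illustrated with a short sketch.  It assumes that Equation 2-1 is the classic PERT mean, (BC + 4·ML + WC)/6; the alternative weights and the estimates are hypothetical.

```python
def pert_mean(bc, ml, wc, weights=(1, 4, 1)):
    """Weighted three-point mean; the default weights give the classic PERT mean."""
    w_bc, w_ml, w_wc = weights
    return (w_bc * bc + w_ml * ml + w_wc * wc) / (w_bc + w_ml + w_wc)

print(pert_mean(10, 15, 25))             # classic weights: 95/6, about 15.83
print(pert_mean(10, 15, 25, (1, 3, 2)))  # weight shifted from ML to WC: 17.5
```

Shifting weight toward the WC estimate moves the mean later, but the result is no longer the beta-distribution mean that the classic weights approximate.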
The method proposed in this research is intended to model the subjective 
probability of the assessor.  Because these are subjective assessments, a DM can best 
account for uncertainty through the placement of the ML estimate.  Instead of 
influencing the mean by weighting one estimate more heavily than another, the DM 
should consider her beliefs and determine a ML estimate accordingly.  The mean for 
the two GEV models is then calculated using this ML value and the shape parameter 
(as calculated using the extreme estimates; BC for GEV Max and WC for GEV Min).  
For the Normal model, the mean will equal the ML value.  With respect to the 
Expert’s assessments, the DM can account for uncertainty through the calibration 
process.  The calibration function will alter the mean of the posterior distribution 
depending on the DM’s assessment of the Expert’s estimating abilities.  This allows 
the DM to effectively weight her faith in the expert because, due to the assumption of 
  
independence, the estimate with the tighter variance will have the heavier 
influence on the mean of the posterior. 
7.3.5 Posterior Distribution  
Although the aggregation method was able to combine the estimates of 
multiple experts, there are things to consider when using the method.  When 
aggregating expert inputs, it was noted that with the addition of each input, the 
variance of the posterior distribution gets smaller (see Figure 6-8 vs. Figure 6-17).  
From a sampling theory view, this is correct: based on the assumption of 
independence, as more experts are added, the variance shrinks because their “errors” 
cancel (Gelman et al. 2013, 32).  If multiple experts are in relative agreement with 
one another, then the expert’s uncertainty regarding the actual duration can be 
drastically reduced.  Instead of accounting for a wide range of possible outcomes, the 
agreement among the experts means that the decision maker no longer needs to 
account for huge amounts of uncertainty.   
In gathering multiple opinions, the decision maker has followed the advice to, 
“Trust, but verify”.  As the number of experts increases, the variance of the posterior 
distribution may become unrealistically narrow.  This occurs as a function of the 
multiplication in Equation 3-28, but it may not accurately reflect the uncertainty of 
the decision maker, especially if the experts were not in reasonable agreement with 
one another and a wider variance is warranted to account for the disagreement. 
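The shrinking variance can be shown exactly for the Normal case.  The sketch assumes that Equation 3-28 amounts to a normalized product of the stakeholders' densities under independence; for Normal densities that product is again Normal, with precisions (inverse variances) adding.  The estimates are hypothetical.

```python
def combine_normals(estimates):
    """Mean and standard deviation of the normalized product of Normal densities.

    estimates: list of (mean, sigma) pairs, one per stakeholder.
    """
    precision = sum(1 / sigma ** 2 for _, sigma in estimates)
    mean = sum(m / sigma ** 2 for m, sigma in estimates) / precision
    return mean, precision ** -0.5

# Adding experts with similar opinions: the posterior spread keeps shrinking.
experts = [(15, 2), (16, 2), (15.5, 2), (15, 2)]
for n in range(1, len(experts) + 1):
    print(n, combine_normals(experts[:n]))

# Disagreeing experts: the tighter estimate pulls the posterior mean toward itself,
# and the posterior is narrower than either input despite the disagreement.
print(combine_normals([(10, 1), (20, 3)]))
```

The last line illustrates the concern raised above: two experts ten time units apart still yield a posterior narrower than either prior, because the product rule has no term that widens the result when the inputs disagree.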
 Another issue noted in the calculation of the posterior distribution was the 
tendency of an outlying prior distribution to dominate the shape and location of the 
posterior.  As with the shrinking variance, this phenomenon is a result of the 
  
multiplication performed in Equation 3-28.  In order to match the intent of PERT, an 
effort was made to concentrate most of the probability density between the BC and 
WC estimates of each stakeholder (both decision maker and experts).  Although the 
density outside these estimates is non-zero, it is still very small.  Because the shape of 
the curve of the posterior distribution is created by multiplying together the various 
PDFs of each stakeholder, for any given duration value, one small result of Equation 
3-14, Equation 3-16, or Equation 3-18 can dominate the result of Equation 3-28.   
The dominating outlier will depend greatly on the prior probabilities of each 
stakeholder.  For example, assume each stakeholder’s prior is modeled by a GEV 
Max distribution.  For each stakeholder, the density below the BC value is negligible, 
but there may be considerable density above the WC estimate.  If there is a cluster of 
prior probabilities with their modes at the lower end of the number line with one 
outlier with its mode at the higher end, the negligible density of the outlier will 
collapse the larger densities of the clustered probabilities, even though they are in 
agreement with one another.  On the other end of the spectrum, the three clustered 
priors may still have considerable density as the density of the outlier begins to 
substantially increase.  Because each stakeholder has non-negligible density at that 
point, the posterior curve will also have non-negligible density which will cause the 
posterior distribution to more closely resemble the outlying prior distribution.   
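The dominating-outlier effect can be reproduced numerically.  The sketch assumes Gumbel-maximum priors (GEV Max with ξ = 0) and treats the posterior as proportional to the product of the prior densities; the modes and scale values are hypothetical.  Log-densities are summed to avoid numerical underflow, and the normalizing constant is ignored because it does not move the mode.

```python
import math

def gumbel_max_logpdf(x, mode, k):
    """Log-density of a Gumbel-maximum distribution with the given mode and scale."""
    z = (x - mode) / k
    return -math.log(k) - z - math.exp(-z)

def posterior_mode(priors, lo, hi, steps=4000):
    """Grid search for the mode of the product of the prior densities."""
    best_x, best_lp = lo, -math.inf
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        lp = sum(gumbel_max_logpdf(x, mode, k) for mode, k in priors)
        if lp > best_lp:
            best_x, best_lp = x, lp
    return best_x

cluster = [(10, 1.0), (11, 1.0), (12, 1.0)]  # three stakeholders in close agreement
outlier = (30, 2.0)                          # one stakeholder with a much later mode

print(posterior_mode(cluster, 0, 60))
print(posterior_mode(cluster + [outlier], 0, 60))
```

With the cluster alone the posterior mode sits among the three agreeing priors.  Adding the single outlier drags the posterior mode most of the way toward the outlier's mode, because the outlier's density near the cluster is vanishingly small while the cluster's right tails remain non-negligible.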
The exact opposite case will happen in the GEV Min distribution, where the 
prior with the smallest mode will dominate the posterior.  For an outlier modeled by a 
Normal distribution, the variance of the outlier will play a role in how heavily it 
dominates the posterior.  The wider the variance, the smaller the effect the outlier will 
  
have on the posterior because the density of the outlier will be spread across a wide 
region and will be more likely to still have significant density where the cluster of 
expert prior probabilities sits elsewhere on the number line. 
 As was seen in Figure 6-20, if stakeholders are in severe disagreement with 
one another where one expert is modeled with a GEV Max distribution and a mode 
on the lower end of the number line and another is modeled with a GEV Min 
distribution and a mode on the upper end of the number line, the outcome of Equation 
3-28 does not cleanly result in any of the three recommended approximation models.  
This is due to the multiplication of the priors from Equation 3-28.  Because the two 
priors are “facing” one another, their peaks are on opposite ends of the number line, 
but each still has considerable density when they intersect in the middle.  This results 
in an oddly shaped posterior that cannot be closely approximated by any of the 
recommended models (Winkler 1986, 302).  In cases such as these the 
recommendation is to either base the approximation on the general skew of the 
calculated posterior curve or to approach the stakeholders with the disagreement and 
request a reassessment of the estimate (Winkler 1968, 70).   
 
  
  
Chapter 8:  Conclusions and Future Work 
 
 Projects are different, but the problems are the same.  External influences such 
as resource constraints and unforeseen issues can cause a project to fall behind 
schedule, but the biggest issues come from within.  Differences in opinion regarding 
how long an activity should take plague project managers as stakeholders vie to alter 
the planned duration based on past experiences and current constraints.  This research 
sought to understand the differences among stakeholders regarding activity and 
project durations and to develop methods to bridge those differences in opinion.   
8.1 Conclusions 
 Project stakeholders come from a wide variety of backgrounds, experience 
levels, and agendas.  A major part of this research was to determine how these 
differences manifested in scheduling practices.  To that end, project stakeholders 
from Wallops Flight Facility (WFF) were surveyed to gather information on 
demographics, risk aversion, project constraint preferences, and schedule estimation 
practices.  A method was then developed which allowed the aggregation of 
scheduling estimates from multiple stakeholders.  This aggregate estimate represented 
the sum knowledge of all participating stakeholders and allowed the project manager 
to create a plan based on input from her entire team. 
8.1.1 Influence of Demographics 
 Several constraints were analyzed by comparing responses of stakeholders in 
different demographics to determine if different demographics responded in 
  
predictable ways.  When it came to project constraint preferences, Section 5.2.1  
showed there was a clear preference among those polled to sacrifice schedule and 
cost for the sake of preserving quality and reducing risk.  This correlated well with 
the literature from GAO reports seen in Section 2.1.1 which consistently showed the 
same results given the technical focus and culture of safety seen at NASA.  This 
prevailing attitude contributes to scheduling difficulties in the face of challenges 
because any challenge that threatens technical quality or is perceived as a potential 
threat to personnel safety could cause a schedule slip (Martin 2012, 11).  When 
constraints were looked at individually in Section 5.2.2, the results showed that only 
Schedule and Quality had significant factors driving the results.  The Position 
demographic drove the AHP weights calculated for the Schedule constraint, where 
Table 5-6 showed that technicians were more willing to sacrifice Schedule for the 
sake of Cost/Risk/Quality than the managers.   The LoE demographic drove the 
Quality constraint, where Table 5-6 showed that those with more formal 
education were more ready to sacrifice Quality for the sake of Cost/Schedule/Risk 
than those with less formal education.   
With respect to risk aversion, the results seen in Section 5.2.3 did not appear 
to be driven by any particular factor.  Subjects across all demographic divides 
demonstrated a wide variety of risk behaviors, although several demonstrated the S-
curve as described by Kahneman and Tversky  (Kahneman 2011, 282; Raiffa 1968, 
94–95). 
 Confidence was treated not as confidence intervals, but as probability 
assessments of a binary yes/no event.  Section 5.2.4 showed that, when averaging 
  
each participant’s confidence estimates, confidence increased as experience 
increased.  From these results, it can be concluded that personnel become more 
confident in their estimates and that they put more trust in their own judgment as they 
gain experience.   
 When it came to the actual schedule estimates, Table 5-8 showed that, with 
only one exception, the Position demographic was the major demographic driving the 
different responses.  For both the PERT average and each element making up the 
PERT average, managers could typically be depended upon to provide smaller 
duration estimates of work activities than did the technicians.  This leads to the 
conclusion that managers believed things could be done faster than the technicians 
believed they could be done.   
When comparing personality traits, Table 5-8 showed the Level of Formal 
Education (LoE) and Years of Experience (YoE) demographics began to play a larger 
role.  The conclusion is that, while stakeholders with certain personality traits tend 
to respond in certain ways, the primary driver in scheduling practices is the 
management/technical divide.  
There appear to be some weak correlations between personality traits and scheduling 
practices.  Given all of this, if a Decision Maker learns the demographic of a 
stakeholder, she can update her belief about the expected opinions of that stakeholder, 
and based on the correlations seen in this research, have an idea of what to expect 
from that stakeholder’s duration estimates.    
 Outside the scheduling estimates, Section 4.1 summarized several thoughts 
subjects had regarding general reasons why projects fall behind.  One of the factors 
mentioned frequently by members of both the management and technical 
  
demographics was the inability of project staff to focus on one particular project.  
When staff were re-directed to other projects, it affected the original duration 
estimates which were made assuming a certain level of availability of project 
personnel (Gould 2005, 251).  Other factors mentioned included aggressive schedules 
and poor planning, which correlate to Kahneman and Tversky’s planning fallacy 
(Kahneman 2011, 246, 249).   
From a separate survey described in Section 4.2, results showed that 
management and technical subjects generally agreed that enough people were 
assigned to the projects, but once again pointed out that those assigned must be 
allowed to focus on the project at hand if schedules are to be met.  Results from this 
same survey showed that managers and technicians were also in relative agreement, 
when provided a list of activities, that each of those activities was necessary to 
successfully complete a project.  The major disagreement between the two groups 
resulted from a debate as to whether or not all activities required were provided in the 
list.   
The COA Survey described in Section 5.1 revealed a further disagreement 
between the two groups regarding what constituted “necessary” work.  Technicians 
polled were more likely to regard extra troubleshooting as risk mitigation whereas 
managers were divided as to whether extra troubleshooting constituted risk 
mitigation or gold plating.   
Based on these results, it is concluded that stakeholder perception plays a 
major role in determining the time allotted for a given activity.  During the planning 
process, these perceptions may be driving some of the aggressive schedules that were 
  
mentioned by project subjects and in some GAO reports  (GAO 1980a, 52,37, 2014, 
10) as well as some of the debates on how long an activity should take.   
Table 5-8 showed that technicians typically provided longer estimates than the 
managers.  Table 5-14 showed technicians also tended to provide estimates that fell 
into the “pessimist” pattern, where the prior distribution was right-skewed.  Based on 
the results of the COA Survey in Table 5-2, they also preferred to have extra time to 
ensure their systems were operating at full capacity.  Table 5-8 showed that 
managers, on the other hand, provided smaller estimates overall and their variances 
were smaller, meaning that their extreme estimates were clustered more tightly 
around their “most likely” estimate.  Table 5-14 showed their prior distributions, 
although still favoring the “pessimist” model, were more likely to result in either the 
“optimist” or “neutral” models, where the distributions were left-skewed or did not 
have skew.  Given that managers typically develop and drive the schedules, these 
results show that scheduling tendencies among the managers, coupled with the 
planning fallacy may be creating the aggressive schedules mentioned by many 
subjects in Section 4.1.  Based on the results of the COA survey in Table 5-1 and 
Table 5-2, matters are further complicated by the fact that a manager may not 
perceive an activity as critical to the success of the project, while a technician may 
have the opposite view.   
If a schedule is only developed to the milestone level, which, according to the 
GAO reports in Section 2.1.3 (GAO 2014, 25) and the survey results in Section 4.1, 
tends to happen, a technician may be trying to pack extra activities into the milestone 
which the manager did not account for, activities which the technician believes are 
  
critical to the success of the mission.  Combining these results leads to the conclusion 
that, because time was not allotted for these activities and the manager may not 
realize they are being accomplished, it may appear that the technician is including too 
much contingency in the estimate.  The technician, on the other hand, believes the 
activities are critical and may not realize that the manager has no knowledge of the 
activity.  This leads to a perception that the schedule is too compressed.  If this 
happens frequently, the confirmation bias may also be reinforcing the opinion that the 
schedule is always too compressed, as memories of projects completed early fade into 
the background.   
Managers, on the other hand, may be holding on to memories of successfully 
completed projects which reduce their perceived contingency required to successfully 
complete a project on time (Kahneman 2011, 80–81).  Ultimately, it is critical for 
project stakeholders to maintain open lines of communication to ensure that plans are 
communicated clearly (Kremer 2017b).  This can help ensure that perceptions of 
required tasks are in-line with one another which will lead to overall better 
scheduling.    
8.1.2 Aggregating Estimates 
 Even with good communication, there will still be different opinions about 
how long an activity should take.  Given the diversity of project stakeholders, Section 
3.5.3 described a specific application of Morris’ method to aggregate expert opinion 
as applied to the PERT methodology.  This allowed for better incorporation of those 
diverse opinions by calculating a single posterior distribution that encompassed the 
assessments of all stakeholders (Morris 1977, 680–86).  The procedure begins by 
  
obtaining the three standard duration estimates, “best case”, “worst case”, and “most 
likely”, used in the PERT method from both the DM and the Experts (Malcolm et al. 
1959, 650–52).  In the PERT methodology, these estimates would be modeled by a 
beta distribution  (Clark 1962, 406).  The new method, as described in Section 3.5.1, 
models these estimates according to one of three different distributions:  the GEV 
Max distribution for the pessimists, the GEV Min distribution for the optimists, and 
the Normal distribution for the neutrals.  In Section 3.5.1, equations were developed 
to convert the estimates provided into the parameters that would eventually describe 
the shape of the prior distribution once the overall model was selected.  In this case, 
the estimate distribution was treated not as a frequency distribution of potential 
activity durations, but as a model of the stakeholder’s beliefs about duration (Dennis 
V. Lindley 1983, 4). 
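For comparison, the classical PERT treatment that the new method departs from can be sketched as follows.  This is a minimal illustration of the standard PERT approximations for the mean and variance of a beta-distributed activity, not the equations developed in Section 3.5.1; the function name and the example estimates (in days) are hypothetical.

```python
def pert_mean_variance(best_case, most_likely, worst_case):
    """Classical PERT approximation for one activity duration.

    mean     = (best + 4 * most_likely + worst) / 6
    variance = ((worst - best) / 6) ** 2
    """
    if not best_case <= most_likely <= worst_case:
        raise ValueError("expected best_case <= most_likely <= worst_case")
    mean = (best_case + 4.0 * most_likely + worst_case) / 6.0
    variance = ((worst_case - best_case) / 6.0) ** 2
    return mean, variance


# Hypothetical three-point estimate, in days.
mean, variance = pert_mean_variance(2.0, 4.0, 12.0)
print(mean, variance)
```

The new method replaces this single beta assumption with a GEV Max, GEV Min, or Normal prior, depending on the shape of the stakeholder's estimates.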
Once the priors were established, Section 3.5.2 described a method of 
calibrating the Expert priors, again based on the work of Morris (Morris 1977, 682–
84).  The beta distribution was chosen as a descriptive model of the calibration filter.  
This allowed the mode of the calibrated prior to remain the same, but modified the 
variance to reflect the DM’s belief regarding whether the Expert overstated, 
understated, or correctly stated his knowledge of the situation.  Tables in Appendix 
A.10 – Appendix A.12 relate the beta parameters to the likelihood of surprise for the 
three distributions used to model the Expert’s prior duration assessments.  
Once the prior distributions are established and calibrated, a posterior 
distribution is calculated, using Equation 3-28, which is the final step in Morris’ 
method (Morris 1977, 686).  Given the algebraic complexities of determining the 
  
mean and variance of this posterior distribution, Section 3.5.3 described the method 
used to approximate the results of the posterior curve as calculated by Equation 3-28.  
The three approximation models chosen were the GEV Max distribution, the GEV 
Min distribution, and the Normal distribution.  Section 3.5.3 describes the equations 
used to calculate the parameters of these approximating distributions.  These 
approximations allowed for the quick calculation of expected value and variance, 
which are required for the development of a network schedule.   
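The idea of evaluating a posterior numerically and reading off its mean and variance can be sketched as follows.  This sketch assumes a multiplicative form in which the posterior is proportional to the product of the DM's prior and the (already calibrated) Expert priors, with the Experts treated as independent; Equation 3-28 is not reproduced here, so its exact form may differ.  Normal priors are used only to keep the example short, and all names and numbers are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a Normal(mean, sd) distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def grid_posterior(priors, lower, upper, steps=4000):
    """Mean and variance of the normalized product of prior densities.

    priors is a list of callables, each returning a density at x.
    The product is evaluated at the midpoints of an evenly spaced grid.
    """
    width = (upper - lower) / steps
    xs = [lower + (i + 0.5) * width for i in range(steps)]
    weights = []
    for x in xs:
        w = 1.0
        for prior in priors:
            w *= prior(x)
        weights.append(w)
    total = sum(weights)
    mean = sum(x * w for x, w in zip(xs, weights)) / total
    variance = sum((x - mean) ** 2 * w for x, w in zip(xs, weights)) / total
    return mean, variance


# Hypothetical DM and Expert priors on an activity duration, in days.
dm_prior = lambda x: normal_pdf(x, 10.0, 2.0)
expert_prior = lambda x: normal_pdf(x, 14.0, 2.0)
post_mean, post_var = grid_posterior([dm_prior, expert_prior], 0.0, 30.0)
print(round(post_mean, 3), round(post_var, 3))
```

For two Normal priors with equal variance, the result can be checked against the known closed form: the posterior mean is the average of the two means and the posterior variance is half the common variance.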
Even with good communication, project stakeholders will have different 
opinions about how long an activity should take.  The posterior distribution derived 
using this method is reflective of the collective knowledge of all participating project 
stakeholders.  Biases are more likely to cancel each other out (Surowiecki 2005, 10) 
and each stakeholder will know that their input was included in the derivation of the 
final schedule.   
 
8.2 Future Work 
 The methods described above provide a synopsis of the diversity of project 
stakeholders and a new method for aggregating those diverse opinions.  There are 
still, however, several areas ripe for further research and development.  The sections 
below provide a summary of those areas of improvement. 
8.2.1 Participant Dependence 
For the purposes of this research, it was assumed that the Experts were all 
statistically independent when calculating the posterior.  Research has shown, 
however, that it is extremely difficult to have true independence given how humans 
  
with similar backgrounds tend to cluster together (Winkler 1968, B-65, 1981, 480; 
Clemen 1987, 373).  An avenue of further research is to determine the dependency of 
the Experts and develop a method to incorporate this dependency into the assessment 
of the posterior distribution. 
8.2.2 Research Expansion and Refinement 
 This study gathered data from a very particular niche within the workforce.  
One area of further research is to expand beyond this particular group into other 
projects, not only at Wallops Flight Facility but across NASA at large, to determine 
whether the trends noted here are local or global to NASA.  Beyond that, expansion into 
other career fields is another area of exploration.  Applying the same processes, 
personnel in areas such as construction, healthcare, and software development could 
be studied.  For example, are there differences in the perception of how long it takes 
to care for a patient between hospital administrators and the doctors who provide the 
care (Dr. Jeffrey Neely, personal communication)?  Do personnel in the construction 
industry feel the same way about the project constraints as was suggested in this 
research by personnel at NASA? 
 Beyond the potential organizational culture differences, there may be global 
cultural differences that should be considered.  This research was conducted in the 
United States, but if the assessments completed in this research were to be conducted 
elsewhere in the world, the results may differ based on cultural priorities and norms  
(Grisham 2010, 104).  Knowing these priorities and scheduling tendencies in advance 
could allow cross-cultural project teams to account for these differences when 
developing project schedules.  Although the results may differ from those found in 
  
this research, if other organizations and industries around the world struggle with 
disagreements about how long a project should take, this research provides some 
suggestions for understanding the “why” behind those differences and a method to 
merge those differences into a single estimate a project manager can use in the 
development of a schedule.  
 Additionally, further refinements to the methods used in this research could be 
used when polling a larger group.  For example, the constraints questions could be 
further specified to determine if there are consistent “break points” among the 
preferences based on specific project inputs, where one preference gives way to 
another.  A collection of actual activity durations could be compared to the estimates 
to help determine a method for accuracy calibration among the participants. 
 
8.2.3 Data for the Decision Maker  
 The two parts of this research (as described in Section 1.2) were independent of 
one another in that the model used to update the estimates did not account for the 
traits or estimating trends of the subjects providing the estimates.  One area of future 
work is to merge the two parts to develop a method that incorporates an estimator’s 
traits as data from which a decision maker can update her beliefs.  For example, if the 
decision maker knows that one group tends to estimate 
longer durations than another, this information could be used as part of the Bayesian 
updating process or perhaps as a method to calibrate various stakeholders to account 
for differences in project constraint priorities. 
  
 For companies that maintain a knowledge repository, one possible area of 
study is the development of a method for incorporation of this knowledge into the 
estimating process.  Using the Bayesian model described in this research, a repository 
of project completion times could serve as a base-rate from which the project 
manager could start her assessment.  Details specific to her project would then allow 
her to update her prior to more accurately reflect her beliefs about her specific project 
given the current project requirements and constraints.  Maintaining a record of 
different experts’ estimates versus actual completion times would also aid in the 
efforts to calibrate the experts.  Once these data are collected, another area of future 
study could be to compare project managers who use the knowledge repository with 
those who do not to determine whether the data improve the accuracy of the schedule 
estimates. 
8.2.4 Communication of Assumptions 
 For the scheduling estimates collected in this research, “project completion” was 
defined as the completion of all tasks listed in the scheduling survey.  It has been pointed out, 
however, that different team members will have different definitions of what constitutes the 
completion of the project.  These different definitions could affect resource allocation if an 
organization is involved in more than one project, as one stakeholder perceives a project to be 
complete while another may still have tasks to accomplish.  This can also lead to 
project team members continuing to ask for assistance from other members who believed 
their work on the project was finished and had allocated their time elsewhere.  One area of 
future research is to study the definition of project completion among different stakeholders 
and determine how best to incorporate those definitions when developing the project schedule 
and advertising the completion date. 
  
 One finding of this research suggested that differences in activity estimates were 
due in part to differing assumptions about what actions were required to successfully 
complete the activity.  An area for future research would be to ask participants to once again 
provide estimates, but to explicitly state all assumptions about the sub-activities that 
comprised each activity (Moder and Rodgers 1968, B-82).  This would help ensure that any 
differences seen in the estimates were driven by beliefs about how long an activity should 
take as opposed to a lack of communication regarding the sub-activities that constituted the 
activity listed in the schedule.  
 Successful completion of a project includes all aspects of project management, both 
technical and programmatic.  Based on statements made in an IG Report (Martin 2012, 13–
14) and the results seen in this study regarding project constraint preferences, it appears that 
technical and programmatic success are treated as two separate entities and that technical 
success takes precedence over programmatic success.  One area of future research is to 
determine why this perception exists, if it exists in organizations outside of NASA, and how 
to change the perception such that management of all project constraints is considered when 
evaluating project success.   
 
8.2.5 Dominating Outliers  
 It was noted on several of the example estimates that, in a group of estimates, 
an extreme outlier would often dominate the mode of the posterior distribution due to 
the differing density levels in the tails of the priors as described in Section 7.3.5.  If, 
however, multiple experts are in agreement in their opinion on a duration estimate 
with only one outlier expressing disagreement, the posterior distribution should 
reflect this.  One area of further research is to determine a method that accounts for 
  
the agreement among the experts and lessens the impact of the outlier on the posterior 
distribution. 
8.2.6 Confidence and Risk 
While not statistically correct, it is clear from Section 5.2.4 that a confidence 
rating on a single number has a meaning attached to it for many people.  One 
recommended avenue of study is to look more closely into this interpretation of 
confidence and better explore the meaning stakeholders assign to the value.  After 
requesting this information, subjects would be asked to explain exactly what that 
confidence value means and how they interpret the value.  The same could be said for 
the “Risk” project constraint discussed in Section 5.2.1.  Although not clearly 
defined, most stakeholders appeared to have some idea of a concept of risk 
independent of any other constraint (e.g. schedule risk or cost risk or quality risk).  
While it is believed that subjects perceived risk as a threat to mission success or 
personnel safety, further study into these perceptions is warranted. 
8.2.7 Approximations and Direct Calculation 
 In an effort to simplify the calculations required to determine the expected 
value and variance of the posterior distribution, approximations of the posterior curve 
resulting from Equation 3-28 were used.  This method involves calculating the PDF 
of the posterior at set intervals across the range of predicted duration estimates and 
then matching that curve with an approximation with a known equation for the mean 
and variance.  The approximations used in this case were defined over the interval 
(-∞, +∞) to accommodate a wide range of expert duration 
  
estimates  (“Generalized Extreme Value Distribution - Wikipedia” 2016, “Normal 
Distribution” 2016).  In some cases, this allowed a non-zero probability that the 
duration of an activity could be negative, which is a physical impossibility.  One 
recommendation to remedy this is to use a gamma distribution for the “pessimists” 
and a Weibull distribution for the “optimists”.  These distributions are both defined 
on [0, +∞) and would ensure that no duration estimate could be less than zero 
(“Gamma Distribution - Wikipedia” 2016, “Weibull Distribution” 2016).  This would 
require recalculating many of the equations in Sections 3.5.1 and 3.5.3 to reflect the 
new distributions with their defining parameters.  A distribution would also need to 
be found that roughly modeled the Normal distribution, but was only defined on the 
positive number line.  Perhaps a better solution would be to develop an equation that 
described the posterior curve, where the parameters were defined by the estimates 
provided by the stakeholders.  Once a method for calculating the mean and variance 
of that curve was determined, then the approximation would no longer be necessary.  
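The size of the negative-duration problem for a Normal approximation can be gauged with a short calculation.  The sketch below simply evaluates the Normal cumulative distribution at zero; the function names and the example mean and standard deviation (in days) are hypothetical and are not taken from the study data.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative probability of a Normal(mean, sd) distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def probability_negative_duration(mean, sd):
    """Probability mass a Normal approximation places below zero."""
    return normal_cdf(0.0, mean, sd)


# Hypothetical posterior approximation: mean of 3 days, standard deviation of 2 days.
p_negative = probability_negative_duration(mean=3.0, sd=2.0)
print(round(p_negative, 4))
```

When the mean is several standard deviations above zero this mass is negligible; it becomes material only for short activities with wide spreads, which is where a distribution defined on [0, +∞) would matter most.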
8.2.8 Filter Settings 
As part of the Bayesian updating process, Decision Makers (DMs) are asked 
to assess the likelihood of the Expert being surprised by the actual duration.  In 
Section 3.5.2, surprise was defined as the actual duration falling below the 0.1 fractile 
or above the 0.9 fractile.  For the purposes of this research, tables were developed 
which defined the α and β parameters required for each likelihood assessment from 
0.01 to 0.99 (see Appendix A.10 – Appendix A.12).  Based on the current procedure, 
once the DM decides on a likelihood value, she would need to reference these tables 
  
in order to choose the appropriate values for α and β.  It may be possible, however, to 
calculate these values knowing only the likelihood assessment of the DM. 
Figure 8-1 shows that the relationship between α and β is linear.   
 
Figure 8-1: Relationship of α and β for the Beta Filter 
Once α is determined, β can be easily calculated using one of the three 
equations in Table 8-1, based on the form of the prior distribution: 
Prior Distribution    Relationship of α and β
GEV Max               β = 1.7181α – 0.7181
GEV Min               β = 0.582α – 0.418
Normal                α = β
Table 8-1: Relationship of α and β for the Beta Filter 
There is also a relationship between the value of α and the likelihood of 
surprise assessed by the DM.  This relationship is not linear, but it does follow a clear 
pattern that could be described by a formula.    
  
 
Figure 8-2: Relationship of Likelihood of Surprise and α for the Beta Filter 
Although the Excel™ Trendline function provided a close approximating 
formula, when compared to the results from the table, the formula began to break 
down and some values of the likelihood were skipped altogether due to rounding.  
Further study into the relationship between α and the likelihood of surprise would 
render the tables created for this research unnecessary.  Once the DM had assessed 
her belief in the likelihood of surprise, she could use a formula to determine α, and 
then go on to use the formulas provided in Table 8-1 to compute the β parameter. 
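One way such a direct calculation might look is sketched below for the Normal prior, where α = β.  It rests on an assumption that is not confirmed in this section: that the filter is a Beta(α, β) density over the Expert's stated cumulative probability, so that the likelihood of surprise is the filter mass below 0.1 plus the mass above 0.9.  Whether this reproduces the values tabulated in Appendix A.12 has not been checked, and all function names are hypothetical.

```python
import math


def beta_pdf(u, alpha, beta):
    """Density of a Beta(alpha, beta) distribution at u, for 0 < u < 1."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    return math.exp(
        log_norm + (alpha - 1.0) * math.log(u) + (beta - 1.0) * math.log(1.0 - u)
    )


def surprise_probability(alpha, beta, low=0.1, high=0.9, steps=2000):
    """Filter mass outside [low, high]; the inside mass uses Simpson's rule."""
    width = (high - low) / steps  # steps must be even for Simpson's rule
    total = beta_pdf(low, alpha, beta) + beta_pdf(high, alpha, beta)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * beta_pdf(low + i * width, alpha, beta)
    return 1.0 - total * width / 3.0


def alpha_for_surprise(target, alpha_low=0.001, alpha_high=200.0, iterations=60):
    """Bisection for the symmetric (alpha == beta) filter with the target surprise."""
    for _ in range(iterations):
        middle = 0.5 * (alpha_low + alpha_high)
        if surprise_probability(middle, middle) > target:
            alpha_low = middle  # too much tail mass, so a larger alpha is needed
        else:
            alpha_high = middle
    return 0.5 * (alpha_low + alpha_high)


# A uniform filter (alpha = beta = 1) leaves 20% of its mass outside [0.1, 0.9].
print(round(alpha_for_surprise(0.2), 6))
```

Under this assumption no trendline fit or rounding is involved, since α is solved for directly from the DM's likelihood assessment.  For the GEV Max and GEV Min priors, the same search could in principle be run with β tied to α through the relationships in Table 8-1.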
 For this research, the assessment of the likelihood of surprise was left entirely 
to the discretion of the DM.  It was a subjective assessment based on the DM’s 
knowledge of different Experts and their propensity to overstate or understate their 
level of knowledge.  Given that objective data can be difficult to come by, one further 
area of study is to provide the DM a point of reference from which to base her 
  
likelihood assessment.  One recommendation is to use the Δ value calculated in 
Equation 3-39 and Equation 3-40 as a point of reference.  If, for example, the density 
between the Expert’s “best case” and “worst case” estimates is less than 0.997 (i.e. 
the 3σ value recommended in Section 3.5.1), then the DM should use the “overstated” 
model.  If it is above 0.997, she should use the “understated” model.  An Expert 
density of exactly 0.997 would not require any calibration.  The challenge would be 
to determine an acceptable value of Δ to use as the cut-off point and also to determine 
an appropriate way to relate the likelihood value to the value of Δ.  Another 
recommendation is to record the Expert’s self-assessed confidence intervals on the 
“best case” and “worst case” estimates and then compare those self-assessed values to 
the density between those values as calculated by Δ.  The challenge in this case would 
again be how to correlate the differences noted with a likelihood assessment.   
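The first recommendation can be sketched as a simple decision rule.  The sketch assumes a Normal prior and computes the density between the “best case” and “worst case” estimates directly from the Normal cumulative distribution; Equations 3-39 and 3-40 are not reproduced here, so the calculation of Δ in the dissertation may differ.  The tolerance around 0.997 is a hypothetical stand-in for the cut-off that the text identifies as still to be determined.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative probability of a Normal(mean, sd) distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def density_between(best_case, worst_case, mean, sd):
    """Probability mass of a Normal prior between the best and worst case."""
    return normal_cdf(worst_case, mean, sd) - normal_cdf(best_case, mean, sd)


def calibration_model(delta, reference=0.997, tolerance=0.001):
    """Suggest a filter model from the density between best and worst case.

    The tolerance is a hypothetical placeholder, not a value from the study.
    """
    if abs(delta - reference) <= tolerance:
        return "no calibration"
    return "overstated" if delta < reference else "understated"


# Hypothetical Expert prior: mean of 10 days, standard deviation of 2 days.
narrow = density_between(6.0, 14.0, 10.0, 2.0)   # estimates span +/- 2 sigma
matched = density_between(4.0, 16.0, 10.0, 2.0)  # estimates span +/- 3 sigma
print(calibration_model(narrow), calibration_model(matched))
```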
  
  
Appendices 
 
This section includes the surveys provided to each of the subjects, as well as the 
responses gathered from each of the surveys.  These are the raw data that were used 
to determine the results and conclusions presented here.  Below is a list describing the 
content of each of the appendices.  
Appendix #  Title                                                    Description
A.1         Recruitment Email                                        Email used to contact prospective participants and determine their interest/availability to participate in the study
A.2         Traits/Opinions Survey                                   Gathered the required demographic information from the subject.  Also gathered the data required to study the questions regarding constraint preferences and risk tolerances
A.3         Scheduling Survey                                        Collected data on duration estimates for specific projects at Wallops Flight Facility.  Also gathered data on opinion of the provided task list and project personnel availability
A.4         Follow-On Survey                                         Intended to gather data to compare actual durations with the estimated durations.  Information provided was not usable, so results were not included in this research
A.5         “Course of Action” Survey                                Collected data on the perceptions of managers and technicians regarding what constitutes “necessary” work
A.6         Participant List                                         Provides a summary of participants who consented to participate in the study, identified by their random identifier and demographic category
A.7         Utility results                                          Consolidates results of the Risk Tolerance questions found in the Traits/Opinions survey
A.8         AHP results                                              Consolidates results of the Project Constraint Preferences questions found in the Traits/Opinions survey
A.9         Scheduling Survey – Estimation Results and Calculations  Summarizes the results of the duration estimations, confidences and calculated means and variances for the project surveys provided
A.10        GEV Max Beta Filters                                     Table providing the required hyper-parameters to calibrate an Expert with a GEV Max prior
A.11        GEV Min Beta Filters                                     Table providing the required hyper-parameters to calibrate an Expert with a GEV Min prior
A.12        Normal Beta Filters                                      Table providing the required hyper-parameters to calibrate an Expert with a Normal prior
A.13        DesignExpert™ Experiment Settings                        Tables describing the configuration settings needed to re-create the experiments conducted here using the DesignExpert™ program
  
 
A.1 Recruitment E-mail 
All, 
 Good morning!  I was wondering if I could get your help with something.  
Some already know about this, but if you don’t, I’ve started a research project for a 
degree I’m working on that’s looking at how we here at Wallops schedule projects.  
Some have helped me out with some data gathering last winter, but I found out some 
requirements the University of Maryland has regarding research, so I need to re-do a 
few things.  All that to say: 
This research will take a look at how different people schedule project activities.  The 
basic plan of the research is to get several estimates of how long people think a 
project will take and then compare it to the actual project to see how long it actually 
took.  Hopefully, it will show who we should be listening to when it comes to 
scheduling a project.  I’ll be tracking several projects over the course of the next year 
or so and basically (for projects that are selected and that you have been assigned to 
work) I would just need you to fill out a survey for each project on how long you 
think an activity should take (bonus: since these are real-world projects, your input 
may be used in building the actual schedule for the project).  The scheduling surveys 
should take about 10 minutes to complete…maybe up to 30 minutes if you want to be 
really thorough on some of the questions.  It will basically be like when a Range 
Services Manager (RSM) would ask you how long you think something should take, 
just with a few more details.  There’s also a “Follow-On” survey where you’ll track 
how long it took to do different activities in each project along with a few other 
questions about how the project played itself out.  If you fill out your hours as the 
project goes along, it should take 5-10 minutes to complete the survey.  If you wait to 
the end, it may take a little longer since you’ll need time to go back and figure what 
you did (maybe closer to 30-60 minutes, but that’s worst-case-scenario).  There’s one 
other survey that will only be completed once which will gather some basic 
demographic information (education, years of service at Wallops, etc.).  That survey 
should take maybe 30 minutes, again depending on how thorough you want to be.   
Participation would be entirely voluntary and if you decide not to participate, it won’t 
affect anything.  All final data will be masked so your responses won’t be tied back 
specifically to you.  <Name removed> and <Name removed> have both given me 
permission to press forward with this research, so no worries there either.  
If you’re willing to help me out, shoot me an e-mail to let me know, and I’ll get in 
contact with you from there on the next few steps.   
Thanks! 
Lauren 
P.S. It would be best to only contact me by e-mail (as opposed to in person) for 
questions and for these next steps to help maintain anonymity. 
  
  
A.2 Traits/Opinions Survey 
All of the following questions are optional.  While all answers will be useful to this 
research, if any particular question or group of questions make you uncomfortable, 
you are free to leave the question blank. 
Demographics 
Please enter the 3-digit code given to you when you signed the consent form: ____ 
 
 
1. Which category best describes your current position with the company 
(Senior Management would be anyone not directly involved with the project.  
Focused more on the program level) 
 - Technical 
 -  Management 
 
 
2. How long have you been in your current position? (if you have worked in a similar 
position at a different company, include that time) 
 
 
3. What is your completed level of education? 
 - High School Diploma 
 - Associates Degree (or equivalent) 
 - Bachelors  
 - Masters 
 
 
 
 
 
Risk Tolerance 
The following 5 questions will be used to determine your risk tolerance.  Your 
answers will form points on a curve, the shape of which is a good indicator of 
whether you are risk tolerant or risk averse.   
 
For these questions, consider the following scenario.  You have been given a chance 
to play a "lottery" to win $5000.  The person offering this lottery has given you a 
choice.  You can either play the game, or he will trade you your chance at $5000 for 
straight cash that he will pay on the spot.  The questions below ask you to determine 
the lowest monetary value for which you will trade your chance at winning.  For 
example, if you had a 5% chance of winning $5000, would you trade that 5% chance 
for $15 on the spot? $20?  Remember to use the LOWEST dollar amount you would 
be willing to trade. 
 
  
1. You have a 10% chance of winning $5000.  How much money would you trade 
this chance for if you knew you could walk away with cash in-hand? 
 
2. You have a 35% chance of winning $5000.  How much money would you trade 
this chance for if you knew you could walk away with cash in-hand? 
 
3. You have a 50% chance of winning $5000.  How much money would you trade 
this chance for if you knew you could walk away with cash in-hand? 
 
4. You have a 68% chance of winning $5000.  How much money would you trade 
this chance for if you knew you could walk away with cash in-hand? 
 
5. You have an 87% chance of winning $5000.  How much money would you trade 
this chance for if you knew you could walk away with cash in-hand? 
 
 
 
 
 
Preference Analysis 
The following questions deal with what's important to you with respect to different 
aspects of project completion.  Each question asks your preference between two 
options and then using a scale of 1-9 (where 1 means no preference and 9 means a 
strong preference) to indicate how strong your preference is for your chosen option.  
As we all know, projects encounter a variety of problems as they progress.  The 
questions below seek to determine which project constraints you consider more 
important to manage.  To answer the question, type in the preference followed by the 
strength of your preference.  For example, if you have a moderate preference of 
dealing with an increase in cost over an increase in schedule, answer the question 
"Cost, 3".  Below is a table describing the meaning behind the 1-9 values. 
 
1- A and B equally important 
3- A slightly more important than B 
5- A strongly more important than B 
7- A very strongly more important than B 
9- A absolutely more important than B 
2,4,6, and 8- intermediate values between the defined numbers 
 
 
 
 
 
 
 
 
 
  
Examples: 
1. Increased Cost indicates that your overall project cost will increase, but you may 
have the advantage of decreased schedule, increased quality, etc. 
2. Increased Schedule indicates that your overall schedule will increase, but you may 
have the advantage of increased quality or decreased risk 
3. Increased Risk indicates that your overall risk of failing to meet project objectives 
will increase, but you may have the advantage of less cost or a shorter schedule 
4. Decreased Quality indicates that your project quality may decrease (example: 
antenna has a jitter, but it still "works"), but you might finish early or under budget or 
schedule 
5. Decreased Resources indicates that your project may have fewer resources, but you 
may finish under budget 
 
 
1. Would you prefer an Increased Cost or an Increased Schedule? On a scale of 1-9, 
by how much do you prefer your chosen option? 
 
2. Would you prefer an Increased Cost or Increased Risk?  On a scale of 1-9, by how 
much do you prefer your chosen option? 
 
3. Would you prefer Increased Cost or Decreased Quality?  On a scale of 1-9, by how 
much would you prefer your chosen option? 
 
4. Would you prefer Increased Schedule or Increased Risk?  On a scale of 1-9, by 
how much do you prefer your chosen option? 
 
5. Would you prefer Increased Schedule or Decreased Quality?  On a scale of 1-9, by 
how much do you prefer your chosen option? 
 
6. Would you prefer Increased Risk or Decreased Quality?  On a scale of 1-9, by how 
much do you prefer your chosen option? 
 
 
 
 
 
 
 
  
  
A.3 Scheduling Survey 
1. Below is a list of activities that must be completed in order to finish the  
___________ Project.  For each of the activities below, please write down how long 
you believe each activity will take in the “Most Likely” column based on your 
experiences at Wallops and elsewhere.  Next to your estimate, please list how 
confident you are (on a scale from 0% - 100%).  Please also write down how long you 
think the activity will take in a “best case” scenario (everything goes right) and a 
worst case scenario (everything goes wrong).   
Estimates can be in either hours or days (example: 4 hours, .5 days, 3 days, .5 hours, 
etc.) 
Assume a work week is ___ hour days, Monday - _______ 
Assume a team of ______ people  
Assume personnel are working on the project ______ % of their time 
Activity Name    Most Likely/Confidence?    Best Case    Worst Case
1. Activity 1    
2. Activity 2    
3. Activity 3    
4. Activity 4    
 
2. Do you believe more people need to be added to the project?  If so, how many 
more? 
 
3. Do you believe that each of the activities listed above needs to be completed in 
order to successfully finish this project?  If not, which ones can be removed (write 
down activity number or name) 
 
4.  Do you believe there are any additional activities required to successfully 
complete this project that are not listed above?  If so, list them below (use a second 
page if more space is required). 
 
 
 
 
 
 
 
 
 
 
 
  
5.  Given everything you know about the above project (team assigned, activities 
required) and operations at Wallops: 
For Engineering Projects: What date do you believe this project will be completed?   
For Deployments: What date do you believe the team needs to deploy in order to be 
ready for mission support? 
In both cases, please provide your “most likely” estimate with a confidence level 
(from 0% - 100%) your “best case” estimate and your “worst case” estimate  
 
                           Most Likely/Confidence?    Best Case    Worst Case
Project Completion Date
   
 
If you would like to make any comments about the reasoning behind your estimates 
or any of your other answers, please provide them on the next page. 
Comments: 
 
  
  
A.4 Follow-On Survey 
1. Now that you’ve completed the project, please list how long each of the activities 
actually took.  If you did not work on all of the activities, fill out the sheet based on 
the activities you did work on. 
Work week was:  ___ hour days, Monday - _______ 
Team consisted of:  ______ people  
You/Team was able to work on the project:  ______ % of their time 
 
Note: These “actual times” can be in hours or days (example: 4 days, 3 hrs, .5 hours, 
etc.) 
Activity Name Actual Completion Time 
1. Activity 1  
2. Activity 2  
3. Activity 3  
4. Activity 4  
5. Activity 5  
  
2. Were there any unexpected challenges?  What were they? 
 
3. Did you feel like you had enough time to complete each activity to your 
satisfaction or did you feel rushed to meet deadlines? 
 
If you would like to make any comments about the reasoning behind your estimates 
or any of your other answers, please provide them on the next page. 
Comments: 
  
  
A.5 “Course of Action” (COA) Survey 
1. Management or Technical? 
 Pick whichever option you picked for the previous survey 
 
o Management 
o Technical 
 
 
<depending on what the participant chooses, it will direct them to one of two pages as 
described below> 
 
 
 
<If the Participant chooses Management> 
 
For Management 
Your team has been testing a piece of equipment for 2 weeks.  Right now, it meets 
requirements, but is just barely within the specification.  They’ve worked with 
equipment like this before and have stated its performance could be better.  They 
think they know where the issue is and they believe an extra week of testing will get 
the system up to its full capability.  The project is currently right on schedule and this 
extra week will mean a schedule slip that will increase the overall cost of the project 
and delay the readiness review by one week.   
 
1. Do you leave the system “as is” and press forward with the readiness review 
or take the extra week to get the system up to its full capability? 
 
o Leave “as is” and press forward 
o Take the extra week to work on the system 
 
 
2. Given that the system currently meets the requirements, would you consider 
the extra week gold-plating (i.e. going unnecessarily above and beyond the 
requirement) or risk mitigation (i.e. less chance the equipment could fail)? 
 
o Gold-plating 
o Risk Mitigation 
 
3. Given your experiences throughout your professional career, why do you 
believe projects fall behind schedule? 
Please keep your responses generic (don't identify specific people or projects) and remember to also 
keep them respectful! 
 
 
  
 
<If the Participant chooses Technician> 
 
 
 
For Technicians 
You’ve been testing a piece of equipment for 2 weeks.  Right now it meets 
requirements, but is just barely within the specification.  You’ve worked with 
equipment like this before and know its performance could be better.  You think you 
know where the issue is and you believe an extra week of testing will get the system 
up to its full capability.  The project is currently right on schedule and this extra week 
will mean a schedule slip that will increase the overall cost of the project and delay 
the readiness review by one week.   
 
1. Do you leave the system “as is” and press forward with the readiness review 
or take the extra week to get the system up to its full capability? 
 
o Leave “as is” and press forward 
o Take the extra week to work on the system 
 
 
2. Given that the system currently meets the requirements, would you consider 
the extra week gold-plating (i.e. going unnecessarily above and beyond the 
requirement) or risk mitigation (i.e. less chance the equipment could fail)? 
 
o Gold-plating 
o Risk Mitigation 
 
3. Given your experiences throughout your professional career, why do you 
believe projects fall behind schedule? 
Please keep your responses generic (don't identify specific people or projects) and remember to also 
keep them respectful! 
 
 
 
 
 
 
 
 
  
 
A.6 Participant List 
Participant ID   Demographic Designator 
408 M1B 
481 M1M 
498 M1M 
164 M1M 
548 M1M 
969 M1M 
858 M1T 
838 M2B 
380 M2B 
458 M2M 
518 M2M 
148 M2T 
157 M2T 
222 M4B 
 
 
 
 
 
 
For the charts in this section: 
Position Demographic: 
M = Management 
T = Technical 
 
Years of Experience (YoE) Demographic: 
1 = 0 - 7 
2 = 8 - 14 
3 = 15 - 23 
4 = 24+ 
 
Level of Formal Education (LoE) Demographic: 
M = Master’s 
B = Bachelor’s 
T = Technical/Associate’s degree 
H = High School 
 
 
 
 
 
 
Participant ID   Demographic Designator 
127 T1B 
191 T1B 
739 T1B 
912 T1B 
399 T1H 
824 T1M 
670 T1T 
203 T2B 
712 T2B 
819 T2B 
158 T2H 
538 T2M 
661 T2T 
434 T2T 
493 T2T 
441 T2T 
857 T3B 
315 T3M 
396 T3T 
774 T4H 
619 T4H 
798 T4H 
424 T4H 
463 T4T 
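
The three-character demographic designators used throughout this appendix can be decoded mechanically from the legend in this section.  The following is a minimal sketch, not part of the original study; the function name `decode_designator` and the dictionary names are illustrative assumptions.

```python
# Lookup tables transcribed from the legend in Appendix A.6.
POSITION = {"M": "Management", "T": "Technical"}
YEARS_OF_EXPERIENCE = {"1": "0-7", "2": "8-14", "3": "15-23", "4": "24+"}
EDUCATION = {
    "M": "Master's",
    "B": "Bachelor's",
    "T": "Technical/Associate's degree",
    "H": "High School",
}


def decode_designator(code):
    """Split a designator such as 'M1M' into position, YoE, and LoE.

    Hypothetical helper for illustration; raises on malformed input.
    """
    if len(code) != 3:
        raise ValueError(f"expected 3 characters, got {code!r}")
    position, years, education = code
    return {
        "position": POSITION[position],
        "years_of_experience": YEARS_OF_EXPERIENCE[years],
        "education": EDUCATION[education],
    }


# Participant 481 is listed as M1M; participant 774 as T4H.
print(decode_designator("M1M"))
print(decode_designator("T4H"))
```

Note that the middle character is needed to resolve the overlap between the position and education codes: "M" and "T" appear in both legends, so the position in the string, not the letter alone, determines the meaning.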
  
 
A.7 Utility results 
In the chart below, the bold/underlined 3-digit numbers on the top row represent subject designators.  The first column gives the 
probability of winning $5000 that was presented to the subject.  The second column (headed “Utility”) gives the CME for the lottery 
(P[win] * 5000).  Values below the subject designators represent each participant’s “cash on the table” value for which they would 
trade their chance at winning $5000. 
P(win) Utility 458 408 481 838 157 498 164 548 148 380 969 858 222 518 
0.1 500 750 500 500 1000 100 500 20 300 5 500 501 750 350 250 
0.35 1750 2000 2000 1750 2500 250 1750 40 1500 25 2500 1751 2000 1000 1000 
0.5 2500 3000 2500 2500 5000 1000 2500 50 2000 100 3500 2501 2750 2000 2000 
0.68 3400 4500 3200 3400 5000 1500 3400 75 3000 200 4000 3401 3750 3000 3500 
0.87 4350 4950 4250 4350 5000 2000 4350 100 3500 250 0 4351 4500 4000 4000 
  
P(win) Utility 661 774 434 493 396 619 399 798 203 158 463 538 424 127 
0.1 500 100 50 500 1000 50 200 499 500 10 1000 1000 400 1000 100 
0.35 1750 350 100 1750 2500 50 350 1749 1750 50 1500 2000 1575 2000 200 
0.5 2500 500 2000 2500 3000 100 500 2499 2500 250 2000 3500 2250 4000 1000 
0.68 3400 2500 2500 3400 3500 100 1000 3300 3400 2000 3000 4000 3060 0 2500 
0.87 4350 3500 4200 4350 4000 150 1500 4351 4350 4000 4000 4500 3915 0 4000 
  
P(win) Utility 670 824 191 739 441 712 819 857 315 912      
0.1 500 400 20 500 500 250 20 25 250 50 500      
0.35 1750 2000 25 1750 1750 1000 40 100 850 250 700      
0.5 2500 4000 100 2500 2500 2500 100 200 1250 500 2500      
0.68 3400 4500 200 3400 3400 3000 200 400 2500 1000 3500      
0.87 4350 4750 1000 4350 4350 4000 500 600 3500 2000 4250      
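
The comparison between a participant’s “cash on the table” value and the CME (P[win] * 5000) can be sketched as follows.  This is an illustration only: labeling a response "risk-averse" when it falls below the CME and "risk-seeking" when it exceeds it follows the usual decision-analysis convention and is an assumption here, not necessarily the classification applied in this study.  The function names are hypothetical; the sample values are the columns for participants 458 and 157 in the first chart above.

```python
PRIZE = 5000  # lottery prize in dollars, as stated in Appendix A.7


def cme(p_win, prize=PRIZE):
    """CME as defined in the text: P[win] * prize (rounded to cents)."""
    return round(p_win * prize, 2)


def classify(cash_on_table, p_win, prize=PRIZE):
    """Compare a cash-on-the-table value with the CME.

    The three labels are an assumed convention for illustration.
    """
    expected = cme(p_win, prize)
    if cash_on_table < expected:
        return "risk-averse"
    if cash_on_table > expected:
        return "risk-seeking"
    return "risk-neutral"


p_values = [0.1, 0.35, 0.5, 0.68, 0.87]
responses = {
    "458": [750, 2000, 3000, 4500, 4950],
    "157": [100, 250, 1000, 1500, 2000],
}

for subject, values in responses.items():
    labels = [classify(v, p) for v, p in zip(values, p_values)]
    print(subject, labels)
```

Under this convention every response from participant 458 lies above the CME and every response from participant 157 lies below it.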
 
 
  
  
 
A.8 AHP Results 
Each box represents the results from a single participant.  The 3-digit number in the top row is the subject designator.  The rows 
below it list the constraints and the weight assigned to each constraint based on the results of the “Preference Analysis” section of 
the “Traits/Opinions” Survey (Appendix A.2) and Equation 3-1.  The final row is the consistency among the weights for each subject, 
calculated using Equation 3-2. 
458 Weights  408 Weights  148 Weights  380 Weights  
Cost 0.131  Cost 0.260  Cost 0.620  Cost 0.239  
Schedule 0.604  Schedule 0.183  Schedule 0.270  Schedule 0.572  
Quality 0.035  Quality 0.052  Quality 0.066  Quality 0.049  
Risk 0.230  Risk 0.505  Risk 0.044  Risk 0.140  
Consistency 0.422  Consistency 0.242  Consistency 0.206  Consistency 0.374  
            
838 Weights  518 Weights  969 Weights  858 Weights  
Cost 0.334  Cost 0.549  Cost 0.112  Cost 0.275  
Schedule 0.452  Schedule 0.266  Schedule 0.062  Schedule 0.519  
Quality 0.106  Quality 0.140  Quality 0.191  Quality 0.119  
Risk 0.108   Risk 0.044  Risk 0.635  Risk 0.088  
Consistency 0.110  Consistency 0.395  Consistency 0.088  Consistency 0.142  
            
548 Weights  498 Weights  222 Weights  164 Weights  
Cost 0.084  Cost 0.391  Cost 0.362  Cost 0.552  
Schedule 0.136  Schedule 0.208  Schedule 0.427  Schedule 0.265  
Quality 0.469  Quality 0.236  Quality 0.069  Quality 0.114  
Risk 0.311  Risk 0.165  Risk 0.142  Risk 0.069  
Consistency 0.159  Consistency 0.674  Consistency 0.033  Consistency 0.083  
           
 
 
 
 
  
 
 
661 Weights  774 Weights  493 Weights  396 Weights 
Cost 0.335  Cost 0.440  Cost 0.507  Cost 0.300 
Schedule 0.502  Schedule 0.404  Schedule 0.289  Schedule 0.602 
Quality 0.037  Quality 0.075  Quality 0.050  Quality 0.049 
Risk 0.126  Risk 0.081  Risk 0.154  Risk 0.049 
Consistency 0.127  Consistency 0.005  Consistency 0.198  Consistency 0.129 
              
399 Weights  798 Weights  203 Weights  158 Weights 
Cost 0.495  Cost 0.433  Cost 0.318  Cost 0.272 
Schedule 0.313  Schedule 0.433  Schedule 0.534  Schedule 0.600 
Quality 0.108  Quality 0.070  Quality 0.046  Quality 0.063 
Risk 0.083  Risk 0.064  Risk 0.101  Risk 0.065 
Consistency 0.059  Consistency 0.001  Consistency 0.077  Consistency 0.107 
            
463 Weights  538 Weights  434 Weights  424 Weights 
Cost 0.529  Cost 0.491  Cost 0.072  Cost 0.304 
Schedule 0.315  Schedule 0.350  Schedule 0.041  Schedule 0.514 
Quality 0.051  Quality 0.097  Quality 0.444  Quality 0.065 
Risk 0.105  Risk 0.062  Risk 0.444  Risk 0.116 
Consistency 0.086  Consistency 0.035  Consistency 0.060  Consistency 0.118 
           
127 Weights  670 Weights  824 Weights  191 Weights 
Cost 0.260  Cost 0.334  Cost 0.557  Cost 0.348 
Schedule 0.541  Schedule 0.376  Schedule 0.248  Schedule 0.498 
Quality 0.059  Quality 0.041  Quality 0.138  Quality 0.118 
Risk 0.140  Risk 0.249  Risk 0.057  Risk 0.035 
Consistency 0.271  Consistency 0.510  Consistency 0.072  Consistency 0.312 
           
  
 
           
739 Weights  441 Weights  712 Weights  819 Weights 
Cost 0.291  Cost 0.283  Cost 0.249  Cost 0.609 
Schedule 0.491  Schedule 0.581  Schedule 0.619  Schedule 0.247 
Quality 0.151  Quality 0.037  Quality 0.045  Quality 0.036 
Risk 0.067  Risk 0.099  Risk 0.088  Risk 0.108 
Consistency 0.074  Consistency 0.230  Consistency 0.197  Consistency 0.407 
           
857 Weights  315 Weights  619 Weights  912 Weights 
Cost 0.161  Cost 0.284  Cost 0.262  Cost 0.514 
Schedule 0.554  Schedule 0.494  Schedule 0.582  Schedule 0.299 
Quality 0.044  Quality 0.109  Quality 0.114  Quality 0.044 
Risk 0.241  Risk 0.114  Risk 0.043  Risk 0.143 
Consistency 0.188  Consistency 0.186  Consistency 0.386  Consistency 0.197 
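
For readers unfamiliar with how AHP weights and a consistency figure are obtained from pairwise comparisons, the following is a generic sketch.  It is not a reproduction of Equation 3-1 or Equation 3-2, which are defined in Chapter 3 and may differ: the row geometric mean approximation, the consistency ratio CR = CI/RI with Saaty's random index of 0.90 for four criteria, and the comparison matrix itself are all assumptions chosen for illustration.  The matrix is hypothetical and is not participant data.

```python
import math

# Saaty's random consistency index for a 4x4 comparison matrix
# (standard AHP table value; assumed here, not taken from this study).
RANDOM_INDEX_4 = 0.90


def ahp_weights(matrix):
    """Approximate priority weights with the row geometric mean method."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights, random_index=RANDOM_INDEX_4):
    """Consistency ratio CR = CI / RI, with CI = (lambda_max - n) / (n - 1).

    lambda_max is estimated as the mean of (A w)_i / w_i.
    """
    n = len(matrix)
    ratios = [
        sum(matrix[i][j] * weights[j] for j in range(n)) / weights[i]
        for i in range(n)
    ]
    lambda_max = sum(ratios) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / random_index


# Hypothetical pairwise comparisons (NOT participant data), in the order
# Cost, Schedule, Quality, Risk. Entry [i][j] states how strongly
# constraint i is preferred over constraint j; [j][i] is its reciprocal.
comparisons = [
    [1,     1 / 2, 5, 3],
    [2,     1,     7, 3],
    [1 / 5, 1 / 7, 1, 1 / 3],
    [1 / 3, 1 / 3, 3, 1],
]

weights = ahp_weights(comparisons)
cr = consistency_ratio(comparisons, weights)
for name, w in zip(["Cost", "Schedule", "Quality", "Risk"], weights):
    print(f"{name:9s}{w:.3f}")
print(f"Consistency {cr:.3f}")
```

As in the boxes above, the four weights sum to one, and a perfectly consistent set of comparisons yields a consistency value of zero.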
 
 
 
 
  
  
 
A.9 Scheduling Survey – Estimation Results and Calculations 
The following pages in Appendix A.9 provide the results of the Scheduling Surveys.  The chart below gives an example of how to 
interpret the surveys.  The Demographic Designator corresponds to the designators described in Section 3.2.1.  For each value, the 
number in parentheses represents the 3-digit participant designator.  For example, ML (481) is the row listing the “most likely” 
estimates for Activity 1 – Activity 11 for participant 481.  Participant 481 is a manager with 0-7 years of experience and a Master’s 
degree (M1M).   Cells with “BLNK” show where the participant did not provide an estimate or the value could not be calculated due 
to missing data.  Cells highlighted in grey indicate that responses were provided shortly after the project started, but before any real 
durations could be reported.  If the PERT duration of an activity could not be calculated for one person within a survey, that activity 
was removed from the summation for all participants for that activity and replaced with “BLNK”.  Confidence estimates were also set 
as “BLNK” for any participant that did not provide a ML estimate (even if they provided a confidence estimate).  If the BC estimate 
was larger than the ML estimate, estimates for that activity were replaced with “BLNK”. Units are in hours.  Note that given the small 
sample size and close working quarters at Wallops Flight Facility, there are five “dummy projects” inserted in this appendix to help 
protect the anonymity of the subjects.  These are fictitious projects with fictitious estimates made to resemble the actual projects.  Data 
from these projects was not used in the analyses completed in this study. 
 
Project designator 
Activity Designator 
Demographic Designator 
ML = Most Likely 
BC = Best Case 
WC = Worst Case 
(XXX) = Participant Designator 
PERT = (BC + (4*ML) + WC)/6 
Te = ∑PERT 
VAR = ((WC-BC)/6)^2 
Conf = Confidence in ML estimate 
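
The calculations behind the PERT, Te, and VAR rows can be sketched as follows, using the Survey 12 estimates for participants 399 and 481.  This is a minimal illustration and not the tool used in the study: the function names and the use of `None` for “BLNK” are assumptions.  The variance is taken here as the square of (WC-BC)/6, since that is what reproduces the tabulated VAR values (for example, BC = 3 and WC = 8 give 0.69).

```python
BLNK = None  # stand-in for a "BLNK" cell (assumed representation)


def pert(bc, ml, wc):
    """PERT duration (BC + 4*ML + WC) / 6.

    Returns BLNK if any estimate is missing or if the best case is
    larger than the most likely estimate, per the rules in A.9.
    """
    if any(v is BLNK for v in (bc, ml, wc)) or bc > ml:
        return BLNK
    return (bc + 4 * ml + wc) / 6


def variance(bc, wc):
    """Activity variance ((WC - BC) / 6) ** 2, or BLNK if data is missing."""
    if bc is BLNK or wc is BLNK:
        return BLNK
    return ((wc - bc) / 6) ** 2


def total_durations(estimates):
    """Sum PERT durations per participant (Te).

    An activity that is BLNK for any one participant is dropped from
    the summation for all participants in that survey.
    """
    perts = {p: [pert(*a) for a in acts] for p, acts in estimates.items()}
    n_activities = len(next(iter(perts.values())))
    usable = [
        i for i in range(n_activities)
        if all(values[i] is not BLNK for values in perts.values())
    ]
    return {p: sum(values[i] for i in usable) for p, values in perts.items()}


# Survey 12 estimates in hours, one (BC, ML, WC) tuple per activity A1-A3.
survey_12 = {
    "399": [(3, 4, 8), (16, 27, 40), (2, 6, 16)],
    "481": [(4, 6, 8), (2, 5, 9), (2, 5, 9)],
}

for participant, te in total_durations(survey_12).items():
    print(participant, round(te, 2))
```

With these inputs the sketch gives totals that round to the Te values of 38.8 and 16.3 hours listed for participants 399 and 481 in Survey 12.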
  
 
  Survey 1                       
    A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 
M1M ML (481) 2 5 18 8 5 2 9 5 BLNK 1 9 
M1B ML (408) 2 1 8 4 1 1 8 1 1 4 40 
                          
  BC (481) 2 4 9 8 4 1 8 4 BLNK 1 6 
  BC (408) 1 0.5 6 2 0.5 0.5 6 0.5 0.5 2 24 
                          
  WC (481) 4 9 27 12 9 4 18 9 BLNK 2 12 
  WC (408) 4 2 12 6 2 2 12 2 2 6 56 
                          
  PERT (481) 2.33 5.50 18.00 8.67 5.50 2.17 10.33 5.50 BLNK 1.17 9.00 
  PERT (408) 2.17 1.08 8.33 4.00 1.08 1.08 8.33 1.08 BLNK 4.00 40.00 
                          
  Te(481) 68.17                     
  Te(408) 71.17                     
                          
  VAR (481) 0.11 0.69 9.00 0.44 0.69 0.25 2.78 0.69 BLNK 0.03 1.00 
  Conf (481) 0.95 0.80 0.70 0.70 0.80 0.95 0.90 0.80 BLNK 0.70 0.90 
                          
  VAR (408) 0.25 0.06 1.00 0.44 0.06 0.06 1.00 0.06 0.06 0.44 28.44 
  Conf (408) 0.75 0.85 0.50 0.50 0.85 0.85 0.50 0.85 0.85 0.85 0.60 
 
 
 
 
 
 
 
  
 
  Survey 2                                     
    A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 A16 A17 A18 
                                        
T2H ML (158) 0.25 0.25 0.75 0.25 1 0.5 2 0.25 1 0.5 3 6 0.25 2 0.5 0.6 1 0.5 
M1M ML (164) 2 0.4 1.5 0.5 0.5 0.5 1 0.25 0.5 0.25 2 2 1 1 0.25 0.25 0.5 0.5 
                                        
  BC (158) 0.17 0.17 0.5 0.17 1 0.33 1.5 0.17 0.75 0.33 3 5 0.25 2 0.33 0.33 0.83 0.33 
  BC (164) 1.5 0.3 1.25 0.3 0.3 0.3 0.75 0.2 0.3 0.2 1.5 1.5 0.75 0.75 0.2 0.2 0.4 0.4 
                                        
  WC (158) 0.5 0.5 1 0.5 1.5 0.75 2.5 0.5 1.5 0.75 4 8 0.5 2.5 0.75 0.75 1.5 0.75 
  WC (164) 3 0.75 2.5 0.75 0.75 0.75 2 0.75 0.7 0.5 3 3 1.5 1.5 0.5 0.5 1 1 
                                        
  PERT (158) 0.28 0.28 0.75 0.28 1.08 0.51 2 0.28 1.04 0.51 3.17 6.17 0.29 2.08 0.51 0.58 1.06 0.51 
  PERT (164) 2.08 0.44 1.63 0.51 0.51 0.51 1.13 0.33 0.5 0.28 2.08 2.08 1.04 1.04 0.28 0.28 0.57 0.57 
                                        
  Te(158) 21.4                                   
  Te(164) 15.9                                   
                                        
  VAR (158) 0 0 0.01 0 0.01 0 0.03 0 0.02 0 0.03 0.25 0 0.01 0 0 0.01 0 
  Conf (158) 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.9 0.95 0.95 0.95 0.95 0.95 0.95 
                                        
  VAR (164) 0.06 0.01 0.04 0.01 0.01 0.01 0.04 0.01 0 0 0.06 0.06 0.02 0.02 0 0 0.01 0.01 
  Conf (164) 0.9 0.95 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.95 0.9 0.9 0.9 0.9 
                                        
 
 
 
 
 
 
  
 
  Survey 3                     
    A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 
T2B ML (819) 3 2 2 1 1 1 1 2 0.25 2 
M1M ML (548) 2 3 2 2 1 2 1 1 4 2 
                        
  BC (819) 3 2 2 1 1 1 1 2 0.25 2 
  BC (548) 1 1 1 1 0.5 1 0.5 0.5 2 1 
                        
  WC (819) 6 4 4 3 3 3 3 4 0.5 4 
  WC (548) 4 5 4 4 2 4 2 2 5 4 
                        
  PERT (819) 3.5 2.33 2.33 1.33 1.33 1.33 1.33 2.33 0.29 2.33 
  PERT (548) 2.17 3 2.17 2.17 1.08 2.17 1.08 1.08 3.83 2.17 
                        
  Te (819) 18.5                   
  Te (548) 20.9                   
                        
  VAR (819) 0.25 0.11 0.11 0.11 0.11 0.11 0.11 0.11 0 0.11 
  CONF (819) BLNK ->                   
                        
  VAR (548) 0.25 0.44 0.25 0.25 0.06 0.25 0.06 0.06 0.25 0.25 
  CONF (548) 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 
 
 
 
 
 
 
 
  
 
  Survey 4                                   
    A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 A16 A17 
M1M ML (164) 6 1 1 16 2 2 2 2 2 3 1 1 1 1 2 1 1 
T2H ML (158) 4 1 1 1 8 4 6 1.5 4 8 1 6 4 2 8 2 2 
                                      
  BC (164) 4 0.5 0.5 12 1.5 1.5 1 1.5 1.5 2 0.5 0.5 0.5 0.5 1 0.5 0.5 
  BC (158) 4 0.5 0.75 0.75 6 3 4 1 2 7 1 4 3 1 6 1 2 
                                      
  WC (164) 8 2 2 27 2.5 3 3 3 3 4 2 2 2 2 3 2 2 
  WC (158) 8 2 2 8 16 8 8 3 8 16 4 8 5 4 16 4 4 
                                      
  PERT (164) 6 1.08 1.08 17.2 2 2.08 2 2.08 2.08 3 1.08 1.08 1.08 1.08 2 1.08 1.08 
  PERT (158) 4.67 1.08 1.13 2.13 9 4.5 6 1.67 4.33 9.17 1.5 6 4 2.17 9 2.17 2.33 
                                      
  Te (164) 47.1                                 
  Te (158) 70.8                                 
                                      
  VAR (164) 0.44 0.06 0.06 6.25 0.03 0.06 0.11 0.06 0.06 0.11 0.06 0.06 0.06 0.06 0.11 0.06 0.06 
  CONF (164) 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 0.6 
                                      
  VAR (158) 0.44 0.06 0.04 1.46 2.78 0.69 0.44 0.11 1 2.25 0.25 0.44 0.11 0.25 2.78 0.25 0.11 
  CONF (158) BLNK ->                                 
 
 
 
  
 
 
  Survey 5           
       A1 A2 A3 A4 A5 
M1B ML (408) 12 6 6 6 6 
T1H ML (399) 8 16 4 4 8 
              
  BC (408) 8 4 4 4 4 
  BC (399) 4 4 2 2 4 
              
  WC (408) 29 12 12 12 12 
  WC (399) 16 24 8 8 16 
              
  PERT (408) 14.2 6.67 6.67 6.67 6.67 
  PERT (399) 8.67 15.3 4.33 4.33 8.67 
              
  Te (408) 40.8         
  Te (399) 41.3         
              
  VAR (408) 12.3 1.78 1.78 1.78 1.78 
  CONF (408) 0.5 0.8 0.8 0.8 0.7 
              
  VAR (399) 4 11.1 1 1 4 
  CONF (399) 0.5 0.75 0.7 0.5 BLNK 
 
 
 
 
 
 
 
  Survey 6         
    A1 A2 A3 A4 
T4H ML (424) 1 7 8 8 
M1M ML (548) 6 6 3 3 
            
  BC (424) 1 4 6 6 
  BC (548) 3 3 2 2 
            
  WC (424) 2 8 10 12 
  WC (548) 8 8 4 4 
            
  PERT (424) 1.17 6.67 8 8.33 
  PERT (548) 5.83 5.83 3 3 
            
  Te (424) 24.2       
  Te (548) 17.7       
            
  VAR (424) 0.03 0.44 0.44 1 
  CONF (424) BLNK ->       
            
  VAR (548) 0.69 0.69 0.11 0.11 
  CONF (548) 0.7 0.7 0.7 0.7 
  
 
 
 Survey 7             
    A1 A2 A3 A4 A5 A6 
T4T ML (463) BLNK ->           
M2T ML (148) 8 1 8 8 10 10 
                
  BC (463) 4 2 4 4 4 4 
  BC (148) 6 0.5 6 6 8 8 
                
  WC (463) 9 9 9 13.5 9 13.5 
  WC (148) 9 1.5 9 9 12 12 
                
  PERT (463) BLNK ->           
  PERT (148) 7.83 1 7.83 7.83 10 10 
                
  Te (463) BLNK           
  Te (148) 44.5           
                
  VAR (463) 0.69 1.36 0.69 2.51 0.69 2.51 
  CONF (463) BLNK->      
                
  VAR (148) 0.25 0.03 0.25 0.25 0.44 0.44 
  CONF (148) 0.9 0.9 0.85 0.85 0.85 0.85 
 
 
 
 
 
  
 
 
  Survey 8                             
    A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 
T2T ML (441) 4 4 4 6 9 2 0.5 1 2 1 1 2 2 1 
T2B ML (712) 1.5 1 1.5 1 3 6 1.5 2 2.5 0.5 0.5 2 0.5 0.5 
                                
  BC (441) 2 1 2 4 6 1 0.5 0.5 1 0.5 0.5 1 1 0.5 
  BC (712) 0.5 0.5 1 0.5 2.5 4 0.5 1.5 2 0.5 0.5 1.5 0.5 0.5 
                                
  WC (441) 18 18 9 18 18 4 9 9 18 2 2 4 9 2 
  WC (712) 3 1.5 4 1.5 4 9 4 3 4 1 1 3 2 1 
                                
  PERT (441) 6 5.83 4.5 7.67 10 2.17 1.92 2.25 4.5 1.08 1.08 2.17 3 1.08 
  PERT (712) 1.58 1 1.83 1 3.08 6.17 1.75 2.08 2.67 0.58 0.58 2.08 0.75 0.58 
                                
  Te (441) 53.3                           
  Te (712) 25.8                           
                                
  VAR (441) 7.11 8.03 1.36 5.44 4 0.25 2.01 2.01 8.03 0.06 0.06 0.25 1.78 0.06 
  CONF (441) 0.8 0.8 0.8 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.85 0.85 0.95 
                                
  VAR (712) 0.17 0.03 0.25 0.03 0.06 0.69 0.34 0.06 0.11 0.01 0.01 0.06 0.06 0.01 
  CONF (712) BLNK ->                           
 
  
  
 
  Survey 9                   
    A1 A2 A3 A4 A5 A6 A7 A8 A9 
T1B ML (912) 2 2 4 8 6 1 4 8 2 
T4H ML (619) 1 1 1 0.5 2 0.17 2.5 6 1 
T2T ML (661) 2 2 18 27 8 BLNK 9 54 2 
                      
  BC (912) 1 1 1 2 4 0.5 3 6 1 
  BC (619) 0.5 0.5 0.5 0.25 1.5 0.17 1 4 0.5 
  BC (661) 1 1 5 8 6 BLNK 9 27 1 
                      
  WC (912) 4 4 8 12 10 2 8 10 3 
  WC (619) 2 2 2 2 3 0.5 5 9 2 
  WC (661) 8 8 63 108 18 BLNK 27 90 8 
                      
  PERT (912) 2.17 2.17 4.17 7.67 6.33 BLNK 4.5 8 2 
  PERT (619) 1.08 1.08 1.08 0.71 2.08 BLNK 2.67 6.17 1.08 
  PERT (661) 2.83 2.83 23.3 37.3 9.33 BLNK 12 55.5 2.83 
                      
  Te (912) 37                 
  Te (619) 16                 
  Te (661) 146                 
                      
  VAR (912) 0.25 0.25 1.36 2.78 1 0.06 0.69 0.44 0.11 
  CONF (912) 1 1 0.5 0.5 0.9 1 0.75 0.8 1 
                      
  VAR (619) 0.06 0.06 0.06 0.09 0.06 0 0.44 0.69 0.06 
  CONF (619) 1 1 0.9 0.75 1 1 0.75 0.75 1 
                      
  VAR (661) 1.36 1.36 93.4 278 4 BLNK 9 110 1.36 
  CONF (661) 0.75 0.75 0.6 0.85 0.9 BLNK 0.95 0.7 0.9 
  
 
  Survey 10                     
    A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 
T1B ML (191) 4 16 8 4 4 16 8 4 4 32 
T3M ML (315) 20 30 20 7 10 30 20 7 20 60 
M1M ML (548) 20 10 5 7 5 10 5 7 5 15 
                        
  BC (191) 2 8 4 2 2 8 4 2 2 16 
  BC (315) 15 25 15 0.5 5 25 15 0.5 20 30 
  BC (548) 15 7 3 5 3 7 3 5 3 12 
                        
  WC (191) 12 48 24 12 12 48 24 12 12 96 
  WC (315) 40 50 50 15 15 40 50 15 50 180 
  WC (548) 30 20 7 10 10 20 7 10 7 20 
                        
  PERT (191) 5 20 10 5 5 20 10 5 5 40 
  PERT (315) 22.5 32.5 24.2 7.25 10 30.8 24.2 7.25 25 75 
  PERT (548) 20.8 11.2 5 7.17 5.5 11.2 5 7.17 5 15.3 
                        
  Te (191) 125                   
  Te (315) 259                   
  Te (548) 93.3                   
                        
  VAR (191) 2.78 44.4 11.1 2.78 2.78 44.4 11.1 2.78 2.78 178 
  CONF (191) 0.9 0.9 0.9 0.9 0.9 0.8 0.8 0.8 0.8 0.9 
                        
  VAR (315) 17.4 17.4 34 5.84 2.78 6.25 34 5.84 25 625 
  CONF (315) 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 
                        
  VAR (548) 6.25 4.69 0.44 0.69 1.36 4.69 0.44 0.69 0.44 1.78 
  CONF (548) 0.4 0.4 0.3 0.4 0.3 0.4 0.3 0.4 0.3 0.4 
  
 
 Survey 11        
  A1 A2 A3 A4 A5 A6 A7 
T2T ML (441) 18 18 3 4 18 13.5 4 
T3T ML (396) 16 16 6 6 45 16 8 
M2T ML (148) 16 16 8 8 40 16 8 
         
 BC (441) 13.5 13.5 2 3 9 9 3 
 BC (396) 12 12 4 4 36 12 6 
 BC (148) 10 10 6 6 30 12 6 
         
 WC (441) 22.5 27 9 9 36 18 9 
 WC (396) 24 24 8 8 54 24 16 
 WC (148) 26 26 16 16 60 24 16 
         
 PERT (441) 18 18.8 3.83 4.67 19.5 13.5 4.67 
 PERT (396) 16.7 16.7 6 6 45 16.7 9 
 PERT (148) 16.7 16.7 9 9 41.7 16.7 9 
         
 Te (441) 82.9       
 Te (396) 116       
 Te (148) 119       
         
 VAR (441) 2.25 5.06 1.36 1 20.3 2.25 1 
 CONF(441) 0.75 0.75 0.8 0.8 0.8 BLNK BLNK 
         
 VAR (396) 4 4 0.44 0.44 9 4 2.78 
 CONF (396) 0.85 0.85 0.85 0.85 BLNK 0.75 0.75 
         
 VAR (148) 7.11 7.11 2.78 2.78 25 4 2.78 
 CONF (148) 0.75 0.75 0.85 0.85 0.66 0.75 0.75 
 Survey 12    
  A1 A2 A3 
T1H ML (399) 4 27 6 
M1M ML (481) 6 5 5 
M1B ML (408) 12 16 16 
     
 BC (399) 3 16 2 
 BC (481) 4 2 2 
 BC (408) 8 12 12 
     
 WC (399) 8 40 16 
 WC (481) 8 9 9 
 WC (408) 16 24 24 
     
 PERT (399) 4.5 27.3 7 
 PERT (481) 6 5.17 5.17 
 PERT (408) 12 16.7 16.7 
     
 Te (399) 38.8   
 Te (481) 16.3   
 Te (408) 45.3   
     
 VAR (399) 0.69 16 5.44 
 CONF (399) 0.7 0.5 0.5 
     
 VAR (481) 0.44 1.36 1.36 
 CONF (481) 0.95 0.5 0.5 
     
 VAR (408) 1.78 4 4 
 CONF(408) 0.8 0.85 0.85 
  
 
 Survey 13                
    A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 
M2M ML (518) 4 4 2 4 8 4 8 2 1 4 2 4 6 6 12 
M1M ML (498) 16 16 0.5 1 BLNK 1 8 4 0.2 2 BLNK 4 3 8 8 
M4B ML (222) 13.5 2 7 3 13.5 5 3 2.5 2 3 2 1.5 0.75 1.5 BLNK 
M1M ML (481) 6 6 1 3 4 1 2 16 4 2 BLNK 6 2 2 6 
                                  
  BC (518) 2 3 1.5 2 6 3 4 1.5 0.5 2 1 2 4 4 8 
  BC (498) 3 8 0.2 0.25 BLNK 0.25 4 2 0.1 1.5 BLNK 3 2 4 4 
  BC (222) 9 1.5 4 2 7 2 2 1.5 1 1.5 1 0.75 0.42 0.75 BLNK 
  BC (481) 3 4 1 3 4 1 1 10 3 2 BLNK 5 1 1 6 
                                  
  WC (518) 6 5 4 6 10 5 10 4 1.5 5 3 5 8 8 16 
  WC (498) 40 24 2 4 BLNK 16 16 8 0.5 4 BLNK 8 4 12 12 
  WC (222) 27 3 9 5 18 7 4 4 2.5 5 2.5 3 1 2 BLNK 
  WC (481) 8 8 3 5 8 3 3 20 6 3 BLNK 8 4 3 10 
                                  
  PERT (518) 4 4 2.25 4 BLNK 4 7.67 2.25 1 3.83 BLNK 3.83 6 6 BLNK 
  PERT (498) 17.8 16 0.7 1.38 BLNK 3.38 8.67 4.33 0.23 2.25 BLNK 4.5 3 8 BLNK 
  PERT (222) 15 2.08 6.83 3.17 BLNK 4.83 3 2.58 1.92 3.08 BLNK 1.63 0.74 1.46 BLNK 
  PERT (481) 5.83 6 1.33 3.33 BLNK 1.33 2 15.7 4.17 2.17 BLNK 6.17 2.17 2 BLNK 
                                  
  Te (518) 48.8                             
  Te (498) 70.3                             
  Te (222) 46.3                             
  Te (481) 52.2                             
                                  
  VAR (518) 0.44 0.11 0.17 0.44 0.44 0.11 1 0.17 0.03 0.25 0.11 0.25 0.44 0.44 1.78 
  CONF (518) BLNK ->                             
  
 
  Survey 13 (cont.)                             
    A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 
  VAR (498) 38 7.11 0.09 0.39 BLNK 6.89 4 1 0 0.17 BLNK 0.69 0.11 1.78 1.78 
  CONF (498) 0.95 0.9 0.95 0.5 BLNK 0.5 0.6 0.95 0.95 BLNK BLNK 0.95 0.95 0.6 0.7 
                                  
  VAR (222) 9 0.06 0.69 0.25 3.36 0.69 0.11 0.17 0.06 0.34 0.06 0.14 0.01 0.04 BLNK 
  CONF (222) 0.75 0.75 0.75 0.8 0.9 0.9 0.7 0.7 0.8 0.85 0.8 0.9 0.9 0.8 BLNK 
                                  
  VAR (481) 0.69 0.44 0.11 0.11 0.44 0.11 0.11 2.78 0.25 0.03 BLNK 0.25 0.25 0.11 0.44 
  CONF (481) 0.9 1 0.9 0.9 0.8 0.8 0.8 0.9 0.8 0.9 BLNK 0.9 0.9 0.8 0.8 
 
 
  Survey 14                              
    A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 
T2T ML (661) 2 1.5 8 0.5 5 8 40 20 50 110 20 20 30 50 50 
T4H ML (619) 0.5 0.5 1 1 2 1 2 4 2 BLNK BLNK 4 4 1 3 
                                 
  BC (661) 1.5 0.75 5 0.25 2 3 20 6 30 110 20 20 10 30 30 
  BC (619) 0.5 0.5 0.5 0.5 1 0 1 2 1 BLNK BLNK 3 2.5 1 2 
                                 
  WC (661) 8 3 12 1 8 24 70 30 100 450 50 40 60 100 100 
  WC (619) 1 1 2 2 4 10 4 10 3 BLNK BLNK 6 8 3 5 
                                 
  PERT (661) 2.92 1.63 8.17 0.54 5 9.83 41.7 19.3 55 BLNK BLNK 23.3 31.7 55 55 
  PERT (619) 0.58 0.58 1.08 1.08 2.17 2.33 2.17 4.67 2 BLNK BLNK 4.17 4.42 1.33 3.17 
                                 
  Te (661) 309                            
  Te (619) 29.8                            
                                 
  
 
  Survey 14 (cont.)                            
    A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 
  VAR (661) 1.17 0.14 1.36 0.02 1 12.3 69.4 16 136 3211 25 11.1 69.4 136 136 
  CONF (661) 0.8 0.9 1 1 0.75 0.25 0.6 0.6 0.9 0.85 0.85 0.85 0.6 0.9 0.9 
                                 
  VAR (619) 0.01 0.01 0.06 0.06 0.25 2.78 0.25 1.78 0.11 BLNK BLNK 0.25 0.84 0.11 0.25 
  CONF (619) 0.9 0.9 0.75 0.8 0.8 0.3 0.8 0.9 1 BLNK BLNK 0.8 1 1 0.9 
 
 
  Survey 15                     
    A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 
M1M ML (548) 8 5 8 5 7 10 15 7 7 10 
T2T ML (493) 120 30 60 30 5 10 10 5 5 5 
T2B ML (203) 42 15 42 15 10 63 21 5 10 3 
M1B ML (408) 13 7 20 10 2 13 15 2 3 10 
M2B ML (838) 15 10 15 10 1 BLNK 5 1 BLNK BLNK 
                        
  BC (548) 5 2 5 2 5 7 10 5 5 7 
  BC (493) 90 20 30 15 3 6 5 3 3 3 
  BC (203) 21 10 21 10 5 42 15 3 5 1 
  BC (408) 10 5 15 7 1 10 10 1 1 5 
  BC (838) 10 5 10 5 0.5 BLNK 3 0.5 BLNK BLNK 
                       
  WC (548) 10 10 10 10 10 15 20 10 10 15 
  WC (493) 180 60 90 60 10 20 14 10 10 10 
  WC (203) 63 21 63 42 15 126 42 10 21 5 
  WC (408) 23 15 30 20 5 17 17 3 5 20 
  WC (838) 22.5 15 25 15 3 BLNK 10 3 BLNK BLNK 
                       
  
 
  Survey 15 (cont.)                   
    A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 
  PERT (548) 7.83 5.33 7.83 5.33 7.17 BLNK 15 7.17 BLNK BLNK 
  PERT (493) 125 33.3 60 32.5 5.5 BLNK 9.83 5.5 BLNK BLNK 
  PERT (203) 42 15.2 42 18.7 10 BLNK 23.5 5.5 BLNK BLNK 
  PERT (408) 14.2 8 20.8 11.2 2.33 BLNK 14.5 2 BLNK BLNK 
  PERT (838) 15.4 10 15.8 10 1.25 BLNK 5.5 1.25 BLNK BLNK 
                        
  Te (548) 55.7                   
  Te (493) 272                   
  Te (203) 157                   
  Te (408) 73                   
  Te (838) 59.3                   
                        
  VAR (548) 0.69 1.78 0.69 1.78 0.69 1.78 2.78 0.69 0.69 1.78 
  CONF (548) 0.7 0.6 0.6 0.7 0.6 0.6 0.7 0.6 0.5 0.5 
                        
  VAR (493) 225 44.4 100 56.3 1.36 5.44 2.25 1.36 1.36 1.36 
  CONF (493) 0.5 0.7 0.5 0.5 0.8 0.8 0.6 0.8 0.8 0.8 
                        
  VAR (203) 49 3.36 49 28.4 2.78 196 20.3 1.36 7.11 0.44 
  CONF (203) 0.7 0.8 0.6 0.5 0.8 0.5 0.6 0.7 0.7 0.8 
            
  VAR (408) 4.69 2.78 6.25 4.69 0.44 1.36 1.36 0.11 0.44 6.25 
  CONF (408) 0.6 0.4 0.4 0.8 0.9 0.75 0.5 0.9 0.8 0.6 
                        
  VAR (838) 4.34 2.78 6.25 2.78 0.17 BLNK 1.36 0.17 BLNK BLNK 
  CONF (838) 0.75 0.9 0.65 0.9 0.85 BLNK 0.8 0.85 BLNK BLNK 
  
 
  Survey 16             
    A1 A2 A3 A4 A5 A6 
M2M ML (518) 3 2 1 2 1 4 
T2B ML (819) 45 4 4 4 4 8 
M1M ML (498) 4 4 2 1 2 6 
M4B ML (222) 3.5 1.5 0.5 0.33 0.17 5 
M1M ML (164) 16 2 1 2 0.5 4.5 
M1M ML (481) 4 1 1 2 1 4 
                
  BC (518) 2 1 0.5 1 0.5 2 
  BC (819) 45 4 4 4 4 8 
  BC (498) 2 1 1 1 1 4 
  BC (222) 2.25 1 0.17 0.17 0.08 3 
  BC (164) 12 1.5 0.75 1.5 0.3 4 
  BC (481) 1 1 1 1 1 1 
                
  WC (518) 4 3 2 3 1.5 6 
  WC (819) 72 6 8 6 8 12 
  WC (498) 6 12 6 4 4 10 
  WC (222) 5 4 1 0.6 0.33 8 
  WC (164) 24 3 2 3 1 8 
  WC (481) 6 4 2 2 2 8 
                
  PERT (518) 3 2 1.08 2 1 4 
  PERT (819) 49.5 4.33 4.67 4.33 4.67 8.67 
  PERT (498) 4 4.83 2.5 1.5 2.17 6.33 
  PERT (222) 3.54 1.83 0.53 0.35 0.18 5.17 
  PERT (164) 16.7 2.08 1.13 2.08 0.55 5 
  PERT (481) 3.83 1.5 1.17 1.83 1.17 4.17 
        
  Survey 16 (cont.)           
    A1 A2 A3 A4 A5 A6 
  Te (518) 13.1           
  Te (819) 76.2           
  Te (498) 21.3           
  Te (222) 11.6           
  Te (164) 27.5           
  Te (481) 13.7           
                
  VAR (518) 0.11 0.11 0.06 0.11 0.03 0.44 
  CONF (518) BLNK ->           
                
  VAR (819) 20.3 0.11 0.44 0.11 0.44 0.44 
  CONF (819) BLNK ->           
                
  VAR (498) 0.44 3.36 0.69 0.25 0.25 1 
  CONF (498) 0.8 0.05 0.1 0.05 0.5 0.95 
                
  VAR (222) 0.21 0.25 0.02 0.01 0 0.69 
  CONF (222) 0.85 0.9 0.8 0.9 0.9 0.9 
                
  VAR (164) 4 0.06 0.04 0.06 0.01 0.44 
  CONF (164) 0.9 0.9 0.85 0.9 0.95 0.95 
                
  VAR (481) 0.69 0.25 0.03 0.03 0.03 1.36 
  CONF (481) 0.8 0.9 0.9 0.5 0.5 0.9 
 
  
 
 Survey 17        
    A1 A2 A3 A4 A5 A6 A7 
T4H ML (424) 2 2 2 2 4 2 6 
M1M ML (481) 5 5 5 5 5 1 2 
M1B ML (408) 1 1 1 1 BLNK 6 6 
                  
  BC (424) 1 1 1 1 2 1 4 
  BC (481) 4 4 4 4 4 1 1 
  BC (408) 0.5 0.5 0.5 0.5 BLNK 4 4 
                  
  WC (424) 4 4 4 4 8 4 6 
  WC (481) 9 9 9 9 10 2 4 
  WC (408) 2 2 2 2 BLNK 8 8 
                  
  PERT (424) 2.17 2.17 2.17 2.17 BLNK 2.17 5.67 
  PERT (481) 5.5 5.5 5.5 5.5 BLNK 1.17 2.17 
  PERT (408) 1.08 1.08 1.08 1.08 BLNK 6 6 
                  
  Te (424) 16.5             
  Te (481) 25.3             
  Te (408) 16.3             
                  
  VAR (424) 0.25 0.25 0.25 0.25 1 0.25 0.11 
  CONF (424) 1 1 1 1 1 1 1 
                  
  VAR (481) 0.69 0.69 0.69 0.69 1 0.03 0.25 
  CONF (481) 0.8 0.8 0.8 0.8 0.8 0.95 0.95 
                  
  VAR (408) 0.06 0.06 0.06 0.06 BLNK 0.44 0.44 
  CONF (408) 0.85 0.85 0.85 0.85 BLNK 0.75 0.75 
  
 
  Survey 18               
    A1 A2 A3 A4 A5 A6 A7 
M2T ML (148) 1.5 1 2 1.5 2 80 27 
M1M ML (548) 1 0.5 2 3 2 2 2 
                  
  BC (148) 1 0.75 1 1 1 40 18 
  BC (548) 1 0.25 1 2 1 1 1 
                  
  WC (148) 4 4 3 3 4 80 40 
  WC (548) 3 1 3 4 3 4 4 
                  
  PERT (148) 1.83 1.46 2 1.67 2.17 73.3 27.7 
  PERT (548) 1.33 0.54 2 3 2 2.17 2.17 
                  
  Te (148) 110             
  Te (548) 13.2             
                  
  VAR (148) 0.25 0.29 0.11 0.11 0.25 44.4 13.4 
  CONF (148) 0.9 0.9 0.9 0.8 0.8 0.9 0.75 
                  
  VAR (548) 0.11 0.02 0.11 0.11 0.11 0.25 0.25 
  CONF (548) 0.7 0.7 0.7 0.7 0.7 0.7 0.7 
 
 
 
 
 
  
 
  Survey 19                             
    A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 
T3T ML (396) BLNK ->                           
T4H ML (774) BLNK ->                           
M2T ML (148) 8 5 6 1 1 4 4 50 8 2 3 6 2 8 
M1T ML (858) 6 2 2 2 1.5 6 16 20 1.5 1 16 4 5 5 
M2T ML (157) BLNK ->                           
                                
  BC (396) 6 3 3 2 2 3 8 8 5 3 BLNK 2 2 3 
  BC (774) 8 4 4 2 2 4 10 48 8 8 8 4 4 4 
  BC (148) 6 4 5 0.5 0.75 3 2 49 6 1.5 2 5 1 7 
  BC (858) 4 1.5 1.5 1 1 4 12 16 1 0.5 12 2 4 4 
  BC (157) 4 2 1 2 0.5 1.5 4 2 1 2 1 0.7 1 1.5 
                                
  WC (396) 24 10 10 4 4 6 24 16 12 6 BLNK 8 8 9 
  WC (774) 14 8 8 4 4 6 16 BLNK 10 10 10 6 6 6 
  WC (148) 16 7 16 2 2 8 8 58 16 8 5 10 8 12 
  WC (858) 8 4 4 3 3 8 20 24 3 3 20 5 6 6 
  WC (157) 8 4 1.5 4 1 2 6 3 2 3 1.5 1 2 2 
                                
  PERT (396) BLNK ->                           
  PERT (774) BLNK ->                           
  PERT (148) 9 5.17 7.5 1.08 1.13 4.5 4.33 51.2 9 2.92 3.17 6.5 2.83 8.5 
  PERT (858) 6 2.25 2.25 2 1.67 6 16 20 1.67 1.25 16 3.83 5 5 
  PERT (157) BLNK ->                           
                                
  Te (396) BLNK                            
  Te (774) BLNK                           
  Te (148) 117                           
  Te (858) 88.9                           
  
 
  Survey 19 (cont.)                           
    A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 
  Te (157) BLNK                           
                                
  VAR (396) 9 1.36 1.36 0.11 0.11 0.25 7.11 1.78 1.36 0.25 BLNK 1 1 1 
  CONF (396) BLNK->              
                                
  VAR (774) 1 0.44 0.44 0.11 0.11 0.11 1 BLNK 0.11 0.11 0.11 0.11 0.11 0.11 
  CONF (774) BLNK->              
                                
  VAR (148) 2.78 0.25 3.36 0.06 0.04 0.69 1 2.25 2.78 1.17 0.25 0.69 1.36 0.69 
  CONF (148) 0.7 0.7 0.7 0.9 0.9 0.7 0.5 0.7 0.7 0.7 0.9 0.9 0.66 BLNK 
                                
  VAR (858) 0.44 0.17 0.17 0.11 0.11 0.44 1.78 1.78 0.11 0.17 1.78 0.25 0.11 0.11 
  CONF (858) 0.7 0.5 0.5 0.75 0.75 0.75 0.75 0.5 0.5 0.25 0.5 0.75 0.75 0.75 
                                
  VAR (157) 0.44 0.11 0.01 0.11 0.01 0.01 0.11 0.03 0.03 0.03 0.01 0 0.03 0.01 
  CONF (157) BLNK->              
 
 
 
 
 
 
  
 
  Survey 20                           
    A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 
T2B ML (203) 10 30 42 10 126 42 84 10 63 15 20 15 10 
T2T ML (493) 60 300 120 30 90 30 90 30 90 30 90 30 10 
M1M ML (969) 20 45 150 10 10 5 150 20 15 5 10 5 2 
                              
  BC (203) 5 20 30 5 105 21 63 5 42 10 10 5 5 
  BC (493) 30 180 60 10 30 14 45 14 30 14 60 20 5 
  BC (969) 15 25 100 5 5 2.5 100 10 10 2.5 5 2.5 1 
                              
  WC (203) 21 63 84 21 168 84 126 30 84 21 42 21 15 
  WC (493) 90 400 240 30 180 60 180 45 120 60 180 45 14 
  WC (969) 40 75 200 12.5 12.5 10 300 25 20 10 15 10 4 
                              
  PERT (203) 11 33.8 47 11 130 45.5 87.5 12.5 63 15.2 22 14.3 10 
  PERT (493) 60 297 130 26.7 95 32.3 97.5 29.8 85 32.3 100 30.8 9.83 
  PERT (969) 22.5 46.7 150 9.58 9.58 5.42 167 19.2 15 5.42 10 5.42 2.17 
                              
  Te (203) 502                         
  Te (493) 1026                         
  Te (969) 468                         
                              
  VAR (203) 7.11 51.4 81 7.11 110 110 110 17.4 49 3.36 28.4 7.11 2.78 
  CONF (203) 0.8 0.7 0.7 0.5 0.6 0.5 0.6 0.75 0.7 0.7 0.7 0.8 0.8 
                              
  VAR (493) 100 1344 900 11.1 625 58.8 506 26.7 225 58.8 400 17.4 2.25 
  CONF (493) 0.5 0.5 0.5 0.7 0.5 0.7 0.6 0.6 0.6 0.7 0.6 0.6 0.75 
                              
  VAR (969) 17.4 69.4 278 1.56 1.56 1.56 1111 6.25 2.78 1.56 2.78 1.56 0.25 
  CONF (969) 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 0.8 
  
 
  Survey 21                             
    A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 
T2H ML (158) 0.5 0.5 0.25 0.75 2 0.25 1 0.75 2 0.25 2 4 0.5 1 
M1M ML (548) 1 0.25 2 0.25 0.25 0.25 2 0.5 0.25 0.75 1 1 2 2 
T4H ML (424) 1.5 0.75 1.5 0.5 1 0.5 1.25 1 BLNK 1 0.5 3 2 3 
                                
  BC (158) 0.25 0.25 0.17 0.5 1.5 0.17 0.75 0.5 1.5 0.17 1.5 2.5 0.5 0.5 
  BC (548) 0.5 0.17 1.25 0.17 0.17 0.17 1.25 0.5 0.17 0.5 0.5 0.5 1.25 1.25 
  BC (424)  1.5 0.25 1 0.25 0.5  0.25  1  0.5  BLNK  0.5  0.25  2  0.5  2 
                
  WC (158) 1 1 0.5 1.5 3.5 0.5 1.5 1.5 3.5 0.5 3.5 7.5 1 2 
  WC (548) 2 1 3.5 0.5 0.5 0.5 3.5 1 0.5 1.25 1.5 1.5 3.25 3.25 
 WC (424) 2.5 2 3 1.5 1.5 1 3 1.25 BLNK 4 2 4 4 5 
                                
  PERT (158) 0.54 0.54 0.28 0.83 2.17 0.28 1.04 0.83 BLNK 0.28 2.17 4.33 0.58 1.08 
  PERT (548) 1.08 0.36 2.13 0.28 0.28 0.28 2.13 0.58 BLNK 0.79 1 1 2.08 2.08 
 PERT (424) 1.67 0.88 1.67 0.63 1.00 0.54 1.50 0.96  BLNK 1.42 0.71 3.00 2.08 3.17 
                                
  Te (158) 21.3                           
  Te (548) 16.1                           
 Te (424) 19.2              
                                
  VAR (158) 0.13 0.13 0.06 0.17 0.33 0.06 0.13 0.17 0.33 0.06 0.33 0.83 0.08 0.25 
  CONF (158) 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.95 0.9 0.95 0.95 
                                
  VAR (548) 1.5 0.83 2.25 0.33 0.33 0.33 2.25 0.5 0.33 0.75 1 1 2 2 
  CONF (548) 0.9 0.95 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.95 
                
  VAR (424) 0.17 0.29 0.33 0.21 0.17 0.13 0.33 0.13 BLNK 0.58 0.29 0.33 0.58 0.50 
  CONF (424) 1 1 1 1 1 1 1 1 BLNK 1 1 1 1 1 
  
 
  Survey 22                               
    A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 
T2T ML (441) 3 3 3 5 8 1 0.25 2 1 2 2 1 1 2 1 
T2B ML (819) 2 0.5 2 0.5 2 5 0.75 1 1.5 0.75 1 1 0.75 0.75 2 
M2T ML (157) 0.5 2 1 0.5 5 5 1 1 2 1.5 2 0.75 1 2 3 
T4H ML (774) 1 1 2 3 3 2 0.74 1 1.4 0.4 2 0.4 2 1 3 
                                  
  BC (441) 2 2 2 3.25 5 0.5 0.17 1.25 0.5 1 1 0.5 0.5 1 0.5 
  BC (819) 1.25 0.5 1.25 0.17 1.5 3 0.5 0.5 1 0.5 0.5 0.5 0.5 0.5 1.25 
  BC (157) 0.17 1.25 0.5 0.25 3 3 0.5 0.5 1.25 1 1.25 0.5 0.5 1.25 2 
  BC (774) 0.5 0.5 1.25 2 2 1.5 0.5 0.5 1 0.17 1.5 0.25 1.25 0.5 2 
                                  
  WC (441) 5.5 5.5 5.5 9 15 2 0.5 4 2 4 4 2 2 4 2 
  WC (819) 4 1 4 1 4 9 2 2 3 2 2 2 2 2 4 
  WC (157) 1 3 2 1 8 8 1.5 1.5 3 3 3 2 2 3 5 
  WC (774) 2 2 4 5.5 5.5 4 2 2 3 1 4 1 4 2 5.5 
                                  
  PERT (441) 3.25 3.25 3.25 5.38 8.67 1.08 0.28 2.21 1.08 2.17 2.17 1.08 1.08 2.17 1.08 
  PERT (819) 2.21 0.58 2.21 0.53 2.25 5.33 0.92 1.08 1.67 0.92 1.08 1.08 0.92 0.92 2.21 
  PERT (157) 0.53 2.04 1.08 0.54 5.17 5.17 1 1 2.04 1.67 2.04 0.92 1.08 2.04 3.17 
  PERT (774) 1.08 1.08 2.21 3.25 3.25 2.25 0.91 1.08 1.6 0.46 2.25 0.48 2.21 1.08 3.25 
                                  
  Te (441) 38.2                             
  Te (819) 23.9                             
  Te (157) 29.5                             
  Te (774) 26.4                             
                                  
  VAR (441) 0.58 0.58 0.58 0.96 1.67 0.25 0.06 0.46 0.25 0.5 0.5 0.25 0.25 0.5 0.25 
  CONF (441) 0.8 0.8 0.8 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.85 0.85 0.95 0.85 
  
 
  Survey 22 (cont.)                             
    A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 
  VAR (819) 0.46 0.08 0.46 0.14 0.42 1 0.25 0.25 0.33 0.25 0.25 0.25 0.25 0.25 0.46 
  CONF (819) 0.95 0.9 0.95 0.5 0.8 0.5 0.6 0.95 0.95 0.95 0.95 0.95 0.95 0.6 0.7 
                                  
  VAR (157) 0.14 0.29 0.25 0.13 0.83 0.83 0.17 0.17 0.29 0.33 0.29 0.25 0.25 0.29 0.5 
  CONF (157) 0.8 0.9 1 1 0.75 0.25 0.6 0.6 0.9 0.85 0.85 0.85 0.6 0.9 0.9 
                                  
  VAR (774) 0.25 0.25 0.46 0.58 0.58 0.42 0.25 0.25 0.33 0.14 0.42 0.13 0.46 0.25 0.58 
  CONF (774) 0.9 0.95 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.9 0.95 0.9 
 
 
 
 
 
 
 
 
 
 
 
 
  
 
  Survey 23       
    A1 A2 A3 
T1H ML (399) 3 24 5 
M2B ML (838) 4 3 3 
M1M ML (498) 10 15 15 
          
  BC (399) 2 15.5 3 
  BC (838) 2 2 2 
  BC (498) 6 9 10 
          
  WC (399) 5.5 44 9 
  WC (838) 6.5 5 4.5 
  WC (498) 16 23.5 24 
          
  PERT (399) 3.25 25.9 5.33 
  PERT (838) 4.08 3.17 3.08 
  PERT (498) 10.3 15.4 15.7 
          
  Te (399) 34.5     
  Te (838) 10.3     
  Te (498) 41.4     
          
  VAR (399) 0.58 4.76 1 
  CONF (399) 0.7 0.5 0.5 
          
  VAR (838) 0.42 0.33 0.25 
  CONF (838) 0.95 0.5 0.5 
          
  VAR (498) 1 1.42 1.5 
  CONF (498) 0.8 0.85 0.85 
  Survey 24             
    A1 A2 A3 A4 A5 A6 
T3T ML (396) 4 2 4 4 4 4 
M2T ML (157) 7 2 7 7 9 9 
                
  BC (396) 2.5 1.25 2.5 2.5 2.5 2.5 
  BC (157) 4.5 1 5 4 5.5 5.5 
                
  WC (396) 11 3 11 11 14 14 
  WC (157) 13 4 13 13 16.5 16.5 
                
  PERT (396) 4.92 2.04 4.92 4.92 5.42 5.42 
  PERT (157) 7.58 2.17 7.67 7.5 9.67 9.67 
                
  Te (396) 27.6           
  Te (157) 44.3           
                
  VAR (396) 1.42 0.29 1.42 1.42 1.92 1.92 
  CONF (396) 1 0.9 0.9 0.9 0.9 0.9 
                
  VAR (157) 1.42 0.5 1.33 1.5 1.83 1.83 
  CONF (157) 0.9 0.9 0.85 0.85 0.85 0.85 
  
 
  Survey 25                       
    A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 
T4H ML (798) 0.5 2 0.5 6 BLNK 27 3.5 2 4 27 BLNK 
  BC (798) 0.25 1 0.25 4 BLNK 18 1.5 1 2 18 BLNK 
  WC (798) 1 2.5 1 12 BLNK 36 8 4 9 36 BLNK 
                          
  PERT (798) 0.54 1.92 0.54 6.67 BLNK 27 3.92 2.17 4.5 27 BLNK 
                          
  Te (798) 74.3                     
                          
  VAR (798) 0.02 0.06 0.02 1.78 BLNK 9 1.17 0.25 1.36 9 BLNK 
  CONF (798) 0.9 0.9 0.9 0.9 BLNK 0.6 0.9 0.9 0.9 0.9 BLNK 
 
 
  Survey 26                             
    A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 
T4H ML (774) 8 4 4 2 2 4 BLNK BLNK BLNK 4 8 5 5 6 
  BC (774) 7 3 3 1 1 3 BLNK BLNK BLNK 3 7 4 4 5 
  WC (774) 14 8 8 4 4 6 BLNK BLNK BLNK 6 10 6 6 7 
                                
  PERT (774) 8.83 4.5 4.5 2.17 2.17 4.17 BLNK BLNK BLNK 4.17 8.17 5 5 6 
                                
  Te (774) 54.7                           
                                
  VAR (774) 1.36 0.69 0.69 0.25 0.25 0.25 BLNK BLNK BLNK 0.25 0.25 0.11 0.11 0.11 
  CONF (774) 0.85 1 1 1 0.8 1 BLNK BLNK BLNK 0.9 0.9 0.9 0.9 0.85 
 
  
  
 
 
  Survey 27           
    A1 A2 A3 A4 A5 
T1B BC (739) 2 3 2.5 1 4 
  WC (739) 54 18 18 54 6 
              
  VAR (739) 75.1 6.25 6.67 78 0.11 
 
 
  Survey 28                               
    A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 
T2M ML (538) 1.6 4.3 BLNK 2.5 0.5 4 3 11.3 6.2 1.5 101 46.2 5 10 15 
  BC (538) 1 3 BLNK 2 0.4 3 2 9 3 1 63 21 2 5 10 
  WC (538) 2 6 BLNK 4 1 6 4 16 12 2 168 84 10 20 30 
                                  
  PERT (538) 1.57 4.37 BLNK 2.67 0.57 4.17 3 11.7 6.63 1.5 106 48.3 5.33 10.8 16.7 
                                  
  Te (538) 223                             
                                  
  VAR (538) 0.03 0.25 BLNK 0.11 0.01 0.25 0.11 1.36 2.25 0.03 306 110 1.78 6.25 11.1 
  CONF (538) 0.7 0.7 BLNK 0.7 0.7 0.7 0.7 0.7 0.5 0.5 0.5 0.5 0.5 0.5 0.5 
 
 
 
  
  
 
 
  Survey 29                                 
    A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 A16 
T4H ML (619) 9 9 5 1 4 0.5 9 9 2 18 2.5 5.5 7 4 9 2.5 
  BC (619) 6 8 3 1 2 0.5 5 5 1 9 2 4 6 3 5 1.5 
  WC (619) 20 20 9 3 8 1 10 10 4 60 6 9 10 8 15 4 
                                    
  PERT (619) 10.3 10.7 5.33 1.33 4.33 0.58 8.5 8.5 2.17 23.5 3 5.83 7.33 4.5 9.33 2.58 
                                    
  Te (619) 108                               
                                    
  VAR (619) 5.44 4 1 0.11 1 0.01 0.69 0.69 0.25 72.3 0.44 0.69 0.44 0.69 2.78 0.17 
  CONF (619) 0.8 0.9 0.8 0.9 0.8 1 0.8 0.8 0.9 0.5 0.75 0.8 0.9 0.9 0.75 0.9 
 
 
  Survey 30                 
    A1 A2 A3 A4 A5 A6 A7 A8 
T4H ML (798) 0.5 2 0.5 4 BLNK 2 3.5 2 
  BC (798) 0.25 1 0.25 2 BLNK 1 1.5 1 
  WC (798) 1 2.5 1 6 BLNK 4 8 4 
                    
  PERT (798) 0.54 1.92 0.54 4 BLNK 2.17 3.92 2.17 
                    
  Te (798) 15.3               
                    
  VAR (798) 0.02 0.06 0.02 0.44 BLNK 0.25 1.17 0.25 
  CONF (798) 0.9 0.9 0.9 0.9 BLNK 0.9 0.9 0.9 
 
  
  
 
  Survey 31                   
    A1 A2 A3 A4 A5 A6 A7 A8 A9 
M1B ML (408) 4 2 2 2 2 1 4 4 8 
  BC (408) 2 1 1 1 1 0.5 2 2 6 
  WC (408) 6 4 4 4 4 4 6 6 12 
                      
  PERT (408) 4 2.17 2.17 2.17 2.17 1.42 4 4 8.33 
                      
  Te (408) 30.4                 
                      
  VAR (408) 0.44 0.25 0.25 0.25 0.25 0.34 0.44 0.44 1 
  CONF (408) 0.9 0.5 0.5 0.5 0.5 0.75 0.75 0.75 0.5 
 
 
  Survey 32     
    A1 A2 
T4T ML (463) 4 2 
  BC (463) 3 2 
  WC (463) 7 4 
        
  PERT (463) 4.33 2.33 
        
  Te (463) 6.67   
        
  VAR (463) 0.44 0.11 
  CONF (463) 0.8 0.9 
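
The PERT, Te, and VAR rows in these survey tables are consistent with the standard three-point PERT formulas, PERT = (BC + 4·ML + WC)/6 and VAR = ((WC − BC)/6)², with Te taken as the sum of the PERT values over a subject's activities. The Python sketch below reproduces the Survey 32 rows for subject 463 under that assumption. The function and variable names are illustrative only and are not taken from the research.

```python
def pert_mean(bc, ml, wc):
    """Three-point PERT expected duration from best, most likely, worst case."""
    return (bc + 4.0 * ml + wc) / 6.0


def pert_variance(bc, wc):
    """Standard PERT variance: square of one sixth of the BC-to-WC range."""
    return ((wc - bc) / 6.0) ** 2


# Survey 32, subject 463: (BC, ML, WC) per activity, copied from the table.
survey_32 = {"A1": (3.0, 4.0, 7.0), "A2": (2.0, 2.0, 4.0)}

means = {act: pert_mean(bc, ml, wc) for act, (bc, ml, wc) in survey_32.items()}
variances = {act: pert_variance(bc, wc) for act, (bc, ml, wc) in survey_32.items()}
te = sum(means.values())  # total expected duration across activities

for act in survey_32:
    print(f"{act}: PERT = {means[act]:.2f}, VAR = {variances[act]:.2f}")
print(f"Te = {te:.2f}")
```

Some VAR rows elsewhere in this appendix (Survey 22, for example) match (WC − BC)/6 rather than its square, so the variance function above should be checked against each table before it is reused.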
 
  
  
 
 
 Survey 33                     
    A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 
M1M ML (548) 2 2 10 2 2 2 4 8 4 3 
  BC (548) 1 1 3 1 1 1 2 4 2 1 
  WC (548) 4 4 18 4 4 4 5 10 5 4 
                        
  PERT (548) 2.17 2.17 10.2 2.17 2.17 2.17 3.83 7.67 3.83 2.83 
                        
  Te (548) 39.2                   
                        
  VAR (548) 0.25 0.25 6.25 0.25 0.25 0.25 0.25 1 0.25 0.25 
  CONF (548) 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 0.7 
 
  Survey 34                 
    A1 A2 A3 A4 A5 A6 A7 A8 
T2T ML (661) 4 8 18 2 6 4 BLNK 9 
  BC (661) 1 6 9 0.5 4 3 BLNK 6.75 
  WC (661) 8 18 36 4 18 8 BLNK 13.5 
                    
  PERT (661) 4.17 9.33 19.5 2.08 7.67 4.5 BLNK 9.38 
                    
  Te (661) 56.6               
                    
  VAR (661) 1.36 4 20.3 0.34 5.44 0.69 BLNK 1.27 
  CONF (661) 0.9 0.9 0.85 0.9 0.9 0.9 BLNK 0.9 
 
  
  
 
 
  Survey 35                     
    A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 
T3M ML (315) 5 3 9 10 5 3 2 4 6 15 
  BC (315) 3 2 6 6 3 2 1 2.5 4 10 
  WC (315) 9 5.5 16.5 18 9 5.5 3.5 8 11 27.5 
                        
  PERT (315) 5.33 3.25 9.75 10.7 5.33 3.25 2.08 4.42 6.5 16.3 
                        
  Te (315) 66.8                   
                        
  VAR (315) 1 0.58 1.75 2 1 0.58 0.42 0.92 1.17 2.92 
  CONF (315) 0.95 0.9 0.8 0.7 0.95 0.9 0.9 0.8 0.75 0.9 
 
 
 
 
 
 
 
 
 
 
  
 
A.10 GEV Max Beta Filters 
1/B(α, β) α β LoS 
423.037 3.866 5.925 0.01 
80.943 3.021 4.473 0.02 
31.982 2.558 3.676 0.03 
17.070 2.251 3.149 0.04 
10.727 2.028 2.767 0.05 
7.469 1.858 2.474 0.06 
5.576 1.723 2.242 0.07 
4.369 1.612 2.052 0.08 
3.546 1.519 1.892 0.09 
2.961 1.440 1.756 0.10 
2.527 1.372 1.638 0.11 
2.192 1.311 1.535 0.12 
1.929 1.258 1.443 0.13 
1.717 1.210 1.361 0.14 
1.542 1.167 1.286 0.15 
1.396 1.127 1.219 0.16 
1.274 1.092 1.157 0.17 
1.170 1.059 1.101 0.18 
1.078 1.028 1.048 0.19 
1.00 1.00 1.00 0.20 
0.931 0.974 0.955 0.21 
0.869 0.949 0.913 0.22 
0.815 0.926 0.874 0.23 
0.766 0.905 0.837 0.24 
0.721 0.885 0.802 0.25 
0.681 0.866 0.769 0.26 
0.645 0.848 0.738 0.27 
0.612 0.831 0.709 0.28 
0.580 0.814 0.681 0.29 
0.553 0.799 0.655 0.30 
0.526 0.784 0.629 0.31 
0.501 0.770 0.605 0.32 
0.480 0.757 0.583 0.33 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1/B(α, β) α β LoS 
0.458 0.744 0.561 0.34 
0.439 0.732 0.540 0.35 
0.421 0.721 0.520 0.36 
0.403 0.709 0.501 0.37 
0.387 0.699 0.482 0.38 
0.371 0.688 0.465 0.39 
0.357 0.679 0.448 0.40 
0.343 0.669 0.432 0.41 
0.330 0.660 0.416 0.42 
0.318 0.651 0.401 0.43 
0.306 0.643 0.386 0.44 
0.295 0.635 0.372 0.45 
0.284 0.627 0.359 0.46 
0.274 0.619 0.346 0.47 
0.264 0.612 0.333 0.48 
0.255 0.605 0.321 0.49 
0.245 0.598 0.309 0.50 
0.237 0.591 0.298 0.51 
0.229 0.585 0.287 0.52 
0.220 0.579 0.276 0.53 
0.213 0.573 0.266 0.54 
0.205 0.567 0.256 0.55 
0.198 0.561 0.246 0.56 
0.190 0.556 0.236 0.57 
0.184 0.550 0.227 0.58 
0.177 0.545 0.218 0.59 
0.171 0.540 0.210 0.60 
0.165 0.535 0.202 0.61 
0.158 0.531 0.193 0.62 
0.153 0.526 0.186 0.63 
0.147 0.521 0.178 0.64 
0.141 0.517 0.170 0.65 
0.136 0.513 0.163 0.66 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1/B(α, β) α β LoS 
0.131 0.509 0.156 0.67 
0.125 0.505 0.149 0.68 
0.121 0.501 0.143 0.69 
0.115 0.497 0.136 0.70 
0.111 0.493 0.130 0.71 
0.106 0.490 0.124 0.72 
0.101 0.486 0.118 0.73 
0.097 0.483 0.112 0.74 
0.092 0.480 0.106 0.75 
0.087 0.476 0.100 0.76 
0.083 0.473 0.095 0.77 
0.079 0.470 0.090 0.78 
0.075 0.467 0.085 0.79 
0.071 0.464 0.080 0.80 
0.067 0.461 0.075 0.81 
0.063 0.459 0.070 0.82 
0.059 0.456 0.065 0.83 
0.056 0.453 0.061 0.84 
0.051 0.451 0.056 0.85 
0.048 0.448 0.052 0.86 
0.044 0.446 0.048 0.87 
0.040 0.443 0.043 0.88 
0.037 0.441 0.039 0.89 
0.033 0.439 0.035 0.90 
0.029 0.436 0.031 0.91 
0.027 0.434 0.028 0.92 
0.023 0.432 0.024 0.93 
0.019 0.430 0.020 0.94 
0.017 0.428 0.017 0.95 
0.013 0.426 0.013 0.96 
0.010 0.424 0.010 0.97 
0.006 0.422 0.006 0.98 
0.003 0.420 0.003 0.99 
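
The first column of this table can be checked against the beta function, B(α, β) = Γ(α)Γ(β)/Γ(α + β), on the reading that it holds the reciprocal 1/B(α, β), which is the normalizing constant of a beta density. The Python sketch below recomputes that constant for three rows copied from the table. The function name is illustrative, and the reciprocal reading is an assumption inferred from the tabulated values rather than something stated in this appendix.

```python
import math


def inverse_beta_function(alpha, beta):
    """Return 1/B(alpha, beta), using log-gamma for numerical stability."""
    log_b = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return math.exp(-log_b)


# (tabulated first column, alpha, beta, LoS) copied from the table above.
rows = [
    (423.037, 3.866, 5.925, 0.01),
    (1.00, 1.00, 1.00, 0.20),
    (0.245, 0.598, 0.309, 0.50),
]

for tabulated, alpha, beta, los in rows:
    computed = inverse_beta_function(alpha, beta)
    print(f"LoS {los:.2f}: tabulated {tabulated:.3f}, computed {computed:.3f}")
```

Because α and β are tabulated to three decimals, small differences between the computed and tabulated constants are expected. The beta function is symmetric in its arguments, so swapping α and β leaves the constant unchanged.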
 
 
 
 
 
 
  
  
 
 
 
A.11 GEV Min Beta Filters 
1/B(α, β) α β LoS 
423.037 5.925 3.866 0.01 
80.943 4.473 3.021 0.02 
31.982 3.676 2.558 0.03 
17.070 3.149 2.251 0.04 
10.727 2.767 2.028 0.05 
7.469 2.474 1.858 0.06 
5.576 2.242 1.723 0.07 
4.369 2.052 1.612 0.08 
3.546 1.892 1.519 0.09 
2.961 1.756 1.440 0.10 
2.527 1.638 1.372 0.11 
2.192 1.535 1.311 0.12 
1.929 1.443 1.258 0.13 
1.717 1.361 1.210 0.14 
1.542 1.286 1.167 0.15 
1.396 1.219 1.127 0.16 
1.274 1.157 1.092 0.17 
1.170 1.101 1.059 0.18 
1.078 1.048 1.028 0.19 
1.00 1.00 1.00 0.20 
0.931 0.955 0.974 0.21 
0.869 0.913 0.949 0.22 
0.815 0.874 0.926 0.23 
0.766 0.837 0.905 0.24 
0.721 0.802 0.885 0.25 
0.681 0.769 0.866 0.26 
0.645 0.738 0.848 0.27 
0.612 0.709 0.831 0.28 
0.580 0.681 0.814 0.29 
0.553 0.655 0.799 0.30 
0.526 0.629 0.784 0.31 
0.501 0.605 0.770 0.32 
0.480 0.583 0.757 0.33 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1/B(α, β) α β LoS 
0.458 0.561 0.744 0.34 
0.439 0.540 0.732 0.35 
0.421 0.520 0.721 0.36 
0.403 0.501 0.709 0.37 
0.387 0.482 0.699 0.38 
0.371 0.465 0.688 0.39 
0.357 0.448 0.679 0.40 
0.343 0.432 0.669 0.41 
0.330 0.416 0.660 0.42 
0.318 0.401 0.651 0.43 
0.306 0.386 0.643 0.44 
0.295 0.372 0.635 0.45 
0.284 0.359 0.627 0.46 
0.274 0.346 0.619 0.47 
0.264 0.333 0.612 0.48 
0.255 0.321 0.605 0.49 
0.245 0.309 0.598 0.50 
0.237 0.298 0.591 0.51 
0.229 0.287 0.585 0.52 
0.220 0.276 0.579 0.53 
0.213 0.266 0.573 0.54 
0.205 0.256 0.567 0.55 
0.198 0.246 0.561 0.56 
0.190 0.236 0.556 0.57 
0.184 0.227 0.550 0.58 
0.177 0.218 0.545 0.59 
0.171 0.210 0.540 0.60 
0.165 0.202 0.535 0.61 
0.158 0.193 0.531 0.62 
0.153 0.186 0.526 0.63 
0.147 0.178 0.521 0.64 
0.141 0.170 0.517 0.65 
0.136 0.163 0.513 0.66 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1/B(α, β) α β LoS 
0.131 0.156 0.509 0.67 
0.125 0.149 0.505 0.68 
0.121 0.143 0.501 0.69 
0.115 0.136 0.497 0.70 
0.111 0.130 0.493 0.71 
0.106 0.124 0.490 0.72 
0.101 0.118 0.486 0.73 
0.097 0.112 0.483 0.74 
0.092 0.106 0.480 0.75 
0.087 0.100 0.476 0.76 
0.083 0.095 0.473 0.77 
0.079 0.090 0.470 0.78 
0.075 0.085 0.467 0.79 
0.071 0.080 0.464 0.80 
0.067 0.075 0.461 0.81 
0.063 0.070 0.459 0.82 
0.059 0.065 0.456 0.83 
0.056 0.061 0.453 0.84 
0.051 0.056 0.451 0.85 
0.048 0.052 0.448 0.86 
0.044 0.048 0.446 0.87 
0.040 0.043 0.443 0.88 
0.037 0.039 0.441 0.89 
0.033 0.035 0.439 0.90 
0.029 0.031 0.436 0.91 
0.027 0.028 0.434 0.92 
0.023 0.024 0.432 0.93 
0.019 0.020 0.430 0.94 
0.017 0.017 0.428 0.95 
0.013 0.013 0.426 0.96 
0.010 0.010 0.424 0.97 
0.006 0.006 0.422 0.98 
0.003 0.003 0.420 0.99 
 
 
 
 
 
 
 
  
  
 
A.12 Normal Beta Filters 
1/B(α, β) α β LoS 
61.960 3.467 3.467 0.01 
24.305 2.866 2.866 0.02 
14.047 2.521 2.521 0.03 
9.500 2.279 2.279 0.04 
7.001 2.093 2.093 0.05 
5.455 1.943 1.943 0.06 
4.418 1.818 1.818 0.07 
3.674 1.710 1.710 0.08 
3.117 1.615 1.615 0.09 
2.695 1.532 1.532 0.10 
2.355 1.456 1.456 0.11 
2.084 1.388 1.388 0.12 
1.862 1.326 1.326 0.13 
1.676 1.269 1.269 0.14 
1.518 1.216 1.216 0.15 
1.384 1.167 1.167 0.16 
1.268 1.121 1.121 0.17 
1.166 1.078 1.078 0.18 
1.078 1.038 1.038 0.19 
1.00 1.00 1.00 0.20 
0.930 0.964 0.964 0.21 
0.868 0.930 0.930 0.22 
0.812 0.898 0.898 0.23 
0.761 0.867 0.867 0.24 
0.716 0.838 0.838 0.25 
0.674 0.810 0.810 0.26 
0.635 0.783 0.783 0.27 
0.599 0.757 0.757 0.28 
0.568 0.733 0.733 0.29 
0.538 0.709 0.709 0.30 
0.511 0.687 0.687 0.31 
0.485 0.665 0.665 0.32 
0.461 0.644 0.644 0.33 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1/B(α, β) α β LoS 
0.439 0.624 0.624 0.34 
0.418 0.604 0.604 0.35 
0.399 0.585 0.585 0.36 
0.381 0.567 0.567 0.37 
0.363 0.549 0.549 0.38 
0.347 0.532 0.532 0.39 
0.333 0.516 0.516 0.40 
0.317 0.499 0.499 0.41 
0.304 0.484 0.484 0.42 
0.292 0.469 0.469 0.43 
0.279 0.454 0.454 0.44 
0.268 0.440 0.440 0.45 
0.257 0.426 0.426 0.46 
0.246 0.412 0.412 0.47 
0.236 0.399 0.399 0.48 
0.227 0.387 0.387 0.49 
0.217 0.374 0.374 0.50 
0.209 0.362 0.362 0.51 
0.200 0.350 0.350 0.52 
0.193 0.339 0.339 0.53 
0.184 0.327 0.327 0.54 
0.177 0.316 0.316 0.55 
0.170 0.306 0.306 0.56 
0.163 0.295 0.295 0.57 
0.157 0.285 0.285 0.58 
0.150 0.275 0.275 0.59 
0.144 0.265 0.265 0.60 
0.139 0.256 0.256 0.61 
0.132 0.246 0.246 0.62 
0.127 0.237 0.237 0.63 
0.122 0.228 0.228 0.64 
0.117 0.220 0.220 0.65 
0.112 0.211 0.211 0.66 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1/B(α, β) α β LoS 
0.107 0.203 0.203 0.67 
0.102 0.195 0.195 0.68 
0.098 0.187 0.187 0.69 
0.093 0.179 0.179 0.70 
0.089 0.171 0.171 0.71 
0.084 0.163 0.163 0.72 
0.081 0.156 0.156 0.73 
0.077 0.149 0.149 0.74 
0.073 0.142 0.142 0.75 
0.069 0.135 0.135 0.76 
0.065 0.128 0.128 0.77 
0.062 0.121 0.121 0.78 
0.059 0.115 0.115 0.79 
0.055 0.108 0.108 0.80 
0.052 0.102 0.102 0.81 
0.049 0.096 0.096 0.82 
0.045 0.089 0.089 0.83 
0.042 0.083 0.083 0.84 
0.039 0.077 0.077 0.85 
0.036 0.072 0.072 0.86 
0.033 0.066 0.066 0.87 
0.030 0.060 0.060 0.88 
0.028 0.055 0.055 0.89 
0.025 0.049 0.049 0.90 
0.022 0.044 0.044 0.91 
0.020 0.039 0.039 0.92 
0.017 0.034 0.034 0.93 
0.015 0.029 0.029 0.94 
0.012 0.024 0.024 0.95 
0.010 0.019 0.019 0.96 
0.007 0.014 0.014 0.97 
0.005 0.009 0.009 0.98 
0.003 0.005 0.005 0.99 
 
 
  
 
A.13 DesignExpert™ Experiment Settings 
The tables below show the configurations used to set up the experiment runs in the 
DesignExpert™ software.  After selecting the “Optimal (Custom)” analysis option 
and setting the number of factors and their levels, the information in the tables below 
can be used to configure the experiment as was done in this research.  In these tables, 
delta represents the smallest change detected by the software, sigma is the standard 
deviation among the collected weights, and power is the probability of successfully 
detecting an effect of that size when it is present.  The recommended power is 
80%. 
 
Note that the power levels the software reports for the constraints may not match the 
values shown below, because they depend on the final samples used in the design matrix.  
The values below are those the program calculated when the experiment was completed 
for this research. 
 
When populating the run-sheet, the runs will need to be adjusted to match the data 
actually collected in this research.  The runs suggested by DesignExpert™ are based 
on the D-optimality criterion and do not match the demographics of the subjects who 
provided information.  The ANOVA completed on the data is based on the run-sheet 
(i.e., the actual data collected from the subjects).  
 
Project Constraint Analysis – by Demographic  
Design Parameter Selected Setting 
Effects Analyzed Main Effects  
   A: Position 
   B: Years of Experience 
   C: Level of Formal Education 
Interaction 
   AB: Management|Years of Experience 
Exchange: Coordinate 
Optimality D 
Blocks 1 
Model Points 11 
Additional Model Points 2 
Lack-of-Fit points 6 
Replicate Points 17    
 
Constraint Delta Sigma Delta/Sigma Power A Power B Power C 
Cost 0.27 0.15 1.80 99.9% 83.1% 81.8% 
Schedule 0.29 0.16 1.81 99.9% 83.6% 82.4% 
Quality 0.19 0.1 1.90 99.9% 86.8% 85.7% 
Risk 0.24 0.13 1.85 99.9% 84.9% 83.7% 
 
Table A-1: DOE Experiment Set-up – Project Constraints 
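
The Delta/Sigma column in Table A-1 is the ratio of the two preceding columns. The Python sketch below recomputes it from the tabulated delta and sigma values as a consistency check. The dictionary layout and function name are illustrative assumptions, and the power columns are not recomputed because they depend on the design matrix inside DesignExpert™.

```python
import math

# (delta, sigma, tabulated delta/sigma) copied from Table A-1.
table_a1 = {
    "Cost": (0.27, 0.15, 1.80),
    "Schedule": (0.29, 0.16, 1.81),
    "Quality": (0.19, 0.10, 1.90),
    "Risk": (0.24, 0.13, 1.85),
}


def signal_to_noise(delta, sigma):
    """Ratio of the change to be detected to the standard deviation."""
    return delta / sigma


for constraint, (delta, sigma, tabulated) in table_a1.items():
    ratio = signal_to_noise(delta, sigma)
    matches = math.isclose(ratio, tabulated, abs_tol=0.006)
    print(f"{constraint}: computed {ratio:.4f}, tabulated {tabulated:.2f}, match={matches}")
```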
  
 
Risk Aversion 
Design Parameter Selected Setting 
Effects Analyzed Main Effects  
   A: Position 
   B: Years of Experience 
   C: Level of Formal Education 
Interaction 
   AB: Management|Years of Experience 
Exchange: Coordinate 
Optimality D 
Blocks 1 
Model Points 11 
Additional Model Points 3 
Lack-of-Fit points 6 
Replicate Points 18    
 
Constraint Delta Sigma Delta/Sigma Power A Power B Power C 
Utility 2150 1280 1.680 99.9% 81.3% 83.9% 
 
Table A-2: DOE Experiment Set-up – Risk Aversion 
 
 
 
Confidence 
Design Parameter Selected Setting 
Effects Analyzed Main Effects  
   A: Position 
   B: Years of Experience 
   C: Level of Formal Education 
Exchange: Coordinate 
Optimality D 
Blocks 1 
Model Points 8 
Additional Model Points 2 
Lack-of-Fit points 5 
Replicate Points 11 
 
Constraint Delta Sigma Delta/Sigma Power A Power B Power C 
Confidence 0.23 0.115 2 99.9% 81.8% 81.8% 
 
 
Table A-3: DOE Experiment Set-up – Confidence Analysis 
 
  
 
Skew Analysis 
Design Parameter Selected Setting 
Effects Analyzed Main Effects  
   A: Position 
   B: Years of Experience 
   C: Level of Formal Education 
Exchange: Coordinate 
Optimality D 
Blocks 1 
Model Points 8 
Additional Model Points 2 
Lack-of-Fit points 7 
Replicate Points 12 
 
Constraint Delta Sigma Delta/Sigma Power A Power B Power C 
(ML−BC)/[(ML−BC)+(WC−ML)] 0.19 0.105 1.80952 99.9% 83.2% 83.2% 
 
Table A-4: DOE Experiment Set-up – Duration Estimate Skew 
 
 
Outlying Estimate Analysis 
Design Parameter Selected Setting 
Effects Analyzed Main Effects  
   A: Position 
   B: Years of Experience 
   C: Level of Formal Education 
Exchange: Coordinate 
Optimality D 
Blocks 1 
Model Points 8 
Additional Model Points 3 
Lack-of-Fit points 6 
Replicate Points 12 
 
Constraint Delta Sigma Delta/Sigma Power A Power B Power C 
BC/(ML+BC) 0.07 0.0396 1.7677 99.8% 80.8% 80.8% 
WC/(ML+WC) 0.18 0.0985 1.8274 99.9% 82.5% 82.5% 
 
Table A-5: DOE Experiment Set-up – Outlying Estimate Analysis 
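
The responses analyzed in Tables A-4 and A-5 are ratios built from a subject's best-case (BC), most likely (ML), and worst-case (WC) duration estimates. The Python sketch below evaluates them for one activity. The outlying-estimate ratios BC/(ML+BC) and WC/(ML+WC) follow Table A-5 directly, while the skew ratio (ML−BC)/((ML−BC)+(WC−ML)) is an assumed reading of the expression in Table A-4. The function names and the choice of example activity are illustrative.

```python
def best_case_ratio(bc, ml):
    """BC/(ML+BC): equals 0.5 when the best case matches the most likely value."""
    return bc / (ml + bc)


def worst_case_ratio(wc, ml):
    """WC/(ML+WC): equals 0.5 when the worst case matches the most likely value."""
    return wc / (ml + wc)


def skew_ratio(bc, ml, wc):
    """Assumed skew measure: share of the BC-to-WC range lying below ML."""
    return (ml - bc) / ((ml - bc) + (wc - ml))


# Example input: Survey 32, subject 463, activity A1 (BC = 3, ML = 4, WC = 7).
bc, ml, wc = 3.0, 4.0, 7.0

print(f"BC/(ML+BC) = {best_case_ratio(bc, ml):.4f}")
print(f"WC/(ML+WC) = {worst_case_ratio(wc, ml):.4f}")
print(f"skew ratio = {skew_ratio(bc, ml, wc):.4f}")
```

Under this reading, a skew ratio below 0.5 indicates that the most likely value sits closer to the best case than to the worst case, as it does in this example.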
  
  
 
Bibliography 
“About GAO.” 2015. Accessed February 17. http://www.gao.gov/about/index.html. 
Alpert, Marc, and Howard Raiffa. 1982. “A Progress Report on the Training of 
Probability Assessors.” In Judgment Under Uncertainty:  Heuristics and 
Biases. New York, NY: Cambridge University Press. 
Ariely, Dan. 2009a. The Upside of Irrationality: The Unexpected Benefits of Defying 
Logic at Work and at Home. 1 edition. Harper. 
———. 2009b. Predictably Irrational, Revised and Expanded Edition: The Hidden 
Forces That Shape Our Decisions. 1 Exp Rev edition. HarperCollins e-books. 
Arkes, Hal R. 1985. “The Psychology of Sunk Cost.” Organizational Behavior and 
Human Decision Processes 35 (1): 124–40. doi:10.1016/0749-5978(85)90049-4. 
Baecher, Gregory. 1999. “Expert Elicitation in Geotechnical Risk Assessments.” 
USACE Draft Report. College Park, MD: Department of Civil Engineering, 
University of Maryland. 
“Bayes’ Theorem.” 2017. Wikipedia. 
https://en.wikipedia.org/w/index.php?title=Bayes%27_theorem&oldid=768607734. 
Bennett, F., M. Lu, and S. AbouRizk. 2001. “Simplified CPM/PERT Simulation 
Model.” Journal of Construction Engineering and Management 127 (6): 513–14. 
doi:10.1061/(ASCE)0733-9364(2001)127:6(513). 
Benson, P. George, Shawn P. Curley, and Gerald F. Smith. 1995. “Belief Assessment: 
An Underdeveloped Phase of Probability Elicitation.” Management Science 
41 (10): 1639–53. 
Berlin, Isaiah, Henry Hardy, and Michael Ignatieff. 2013. The Hedgehog and the 
Fox: An Essay on Tolstoy’s View of History. 2nd ed. Princeton: Princeton 
University Press. 
“Beta Distribution.” 2016. Wikipedia. 
https://en.wikipedia.org/w/index.php?title=Beta_distribution&oldid=753618637. 
“Beta Function.” 2016. Wikipedia. 
https://en.wikipedia.org/w/index.php?title=Beta_function&oldid=749020939. 
“Binomial Distribution.” 2016. Wikipedia. 
https://en.wikipedia.org/w/index.php?title=Binomial_distribution&oldid=753619524. 
Bram, Uri. 2011. Thinking Statistically. 3 edition. Capara Books. 
Brenner, Lyle A., Derek J. Koehler, Varda Liberman, and Amos Tversky. 1996. 
“Overconfidence in Probability and Frequency Judgments: A Critical 
Examination.” Organizational Behavior and Human Decision Processes 65 
(3): 212–19. doi:10.1006/obhd.1996.0021. 
Budescu, David V., and Adrian K. Rantilla. 2000. “Confidence in Aggregation of 
Expert Opinions.” Acta Psychologica 104 (3): 371–98. 
doi:10.1016/S0001-6918(00)00037-8. 
Buehler, Roger, Dale Griffin, and Michael Ross. 1994. “Exploring the ‘Planning 
Fallacy’: Why People Underestimate Their Task Completion Times.” Journal 
of Personality and Social Psychology 67 (3): 366–81. 
doi:10.1037/0022-3514.67.3.366. 
  
 
Chaloner, Kathryn M., and George T. Duncan. 1983. “Assessment of a Beta Prior 
Distribution: PM Elicitation.” Journal of the Royal Statistical Society. Series 
D (The Statistician) 32 (1/2): 174–80. doi:10.2307/2987609. 
Clark, Charles E. 1962. “The PERT Model for the Distribution of an Activity Time.” 
Operations Research 10 (3): 405–6. 
Clemen, Robert T. 1986. “Calibration and the Aggregation of Probabilities.” 
Management Science 32 (3): 312–14. 
———. 1987. “Combining Overlapping Information.” Management Science 33 (3): 
373–80. 
Davidson, Lynn B., and Dale O. Cooper. 1980. “Implementing Effective Risk 
Analysis at Getty Oil Company.” Interfaces 10 (6): 62–75. 
Dawes, Robyn M. 1979. “The Robust Beauty of Improper Linear Models in Decision 
Making.” American Psychologist 34 (7): 571–82. 
doi:10.1037/0003-066X.34.7.571. 
Dawes, Robyn M., and Bernard Corrigan. 1974. “Linear Models in Decision 
Making.” Psychological Bulletin 81 (2): 95–106. doi:10.1037/h0037613. 
DeGroot, Morris H., and Stephen E. Fienberg. 1983. “The Comparison and 
Evaluation of Forecasters.” Journal of the Royal Statistical Society. Series D 
(The Statistician) 32 (1/2): 12–22. doi:10.2307/2987588. 
DesignExpert (version 9.0.6.2). 2015. Stat-Ease, Inc. 
Einhorn, Hillel J. 1974. “Expert Judgment: Some Necessary Conditions and an 
Example.” Journal of Applied Psychology 59 (5): 562–71. 
doi:10.1037/h0037164. 
Einhorn, Hillel J., and Robin M. Hogarth. 1978. “Confidence in Judgment: Persistence 
of the Illusion of Validity.” Psychological Review 85 (5): 395–416. 
doi:10.1037/0033-295X.85.5.395. 
“Euler–Mascheroni Constant.” 2016. Wikipedia. 
https://en.wikipedia.org/w/index.php?title=Euler%E2%80%93Mascheroni_constant&oldid=745226377. 
Farr, Michael. 2012. “PMP Exam Power Prep: Course Slides and Practice Exams.” 
CMF Solutions and ESI. 
French, S. 1986. “Calibration and the Expert Problem.” Management Science 32 (3): 
315–21. 
French, Simon. 1980. “Updating of Belief in the Light of Someone Else’s Opinion.” 
Journal of the Royal Statistical Society. Series A (General) 143 (1): 43–48. 
doi:10.2307/2981768. 
———. 1985. “Group Consensus Probability Distributions: A Critical Survey.” In 
Bayesian Statistics 2. New York, NY: Elsevier Science Publishers. 
“Gamma Distribution - Wikipedia.” 2016. Accessed May 16. 
https://en.wikipedia.org/wiki/Gamma_distribution. 
GAO. 1976. “Space: Acquisition and Utilization of Wind Tunnels by the National 
Aeronautics and Space Administration.” PSAD-76-133. Washington, D.C. 
http://www.gao.gov/products/PSAD-76-133. 
———. 1977a. “Space: NASA’s Resource Data Base and Techniques for Supporting, 
Planning, and Controlling Programs Need Improvement.” PSAD-77-78. 
Washington, D.C. http://www.gao.gov/products/PSAD-77-78. 
  
 
———. 1977b. “Space: National Aeronautics and Space Administration Should 
Provide the Congress with More Information on the Pioneer Venus Project.” 
PSAD-77-65. Washington, D.C. http://www.gao.gov/products/PSAD-77-65. 
———. 1977c. “Space: Status and Issues Pertaining to the Proposed Development of 
the Space Telescope Project.” PSAD-77-98. Washington, D.C. 
http://www.gao.gov/products/PSAD-77-98. 
———. 1977d. “Space Transportation System: Past, Present, Future.” PSAD-77-113. 
Washington, D.C. http://www.gao.gov/products/PSAD-77-113. 
———. 1980a. “Space: A Look at NASA’s Aircraft Energy Efficiency Program.” 
PSAD-80-50. Washington, D.C. http://www.gao.gov/products/PSAD-80-50. 
———. 1980b. “Space: The Federal Weather Program Must Have Stronger Central 
Direction.” LCD-80-10. Washington, D.C. 
http://www.gao.gov/products/LCD-80-10. 
———. 1982. “Government Operations: GAO Position on Several Issues Pertaining 
to Air Force Consolidated Space Operations Center Development.” 
MASAD-82-45. Washington, D.C. 
http://www.gao.gov/products/MASAD-82-45. 
———. 1988a. “Space Exploration: NASA’s Deep Space Missions Are Experiencing 
Long Delays.” GAO/NSIAD-88-128BR. Washington, D.C. 
http://www.gao.gov/products/NSIAD-88-128BR. 
———. 1988b. “Space Station: NASA Efforts To Establish a Design-To-Life-Cycle 
Cost Process.” GAO/NSIAD-88-147. Washington, D.C. 
http://www.gao.gov/products/NSIAD-88-147. 
———. 1989. “Weather Satellites: Cost Growth and Development Delays Jeopardize 
U.S. Forecasting Ability.” GAO/NSIAD-89-169. Washington, D.C. 
http://www.gao.gov/products/NSIAD-89-169. 
———. 1991a. “Space Station: NASA’s Search for Design, Cost, and Schedule 
Stability Continues.” GAO/NSIAD-91-125. Washington, D.C. 
http://www.gao.gov/products/NSIAD-91-125. 
———. 1991b. “Weather Satellites: Action Needed to Resolve Status of the U.S. 
Geostationary Satellite Program.” GAO/NSIAD-91-252. Washington, D.C. 
http://www.gao.gov/products/NSIAD-91-252. 
———. 1991c. “Weather Satellites: The U.S. Geostationary Satellite Program Is at a 
Crossroad.” GAO/T-NSIAD-91-49. Washington, D.C. 
http://www.gao.gov/products/T-NSIAD-91-49. 
———. 1992a. “Space: NASA’s Development of EOSDIS.” GAO/IMTEC-92-42R. 
Washington, D.C. http://www.gao.gov/products/IMTEC-92-42R. 
———. 1992b. “Weather Forecasting: Cost Growth and Delays in Billion-Dollar 
Weather Service Modernization.” GAO/IMTEC-92-12FS. Washington, D.C. 
http://www.gao.gov/products/IMTEC-92-12FS. 
———. 1993a. “NASA Program Costs: Space Missions Require Substantially More 
Funding Than Initially Estimated.” GAO/NSIAD-93-97. Washington, D.C. 
http://www.gao.gov/products/NSIAD-93-97. 
———. 1993b. “Space Station: Program Instability and Cost Growth Continue 
Pending Redesign.” GAO/NSIAD-93-187. Washington, D.C. 
http://www.gao.gov/products/NSIAD-93-187. 
  
 
———. 1994a. “NASA: Major Challenges for Management.” GAO/T-NSIAD-94-18. 
Washington, D.C. http://www.gao.gov/products/T-NSIAD-94-18. 
———. 1994b. “Space Shuttle: NASA’s Plans for Repairing or Replacing a 
Damaged or Destroyed Orbiter.” GAO/NSIAD-94-197. Washington, D.C. 
http://www.gao.gov/products/NSIAD-94-197. 
———. 1997. “NASA: Major Management Challenges.” GAO/T-NSIAD-97-178. 
Washington, D.C. http://www.gao.gov/products/T-NSIAD-97-178. 
———. 1998. “Space Surveillance: DOD and NASA Need Consolidated 
Requirements and a Coordinated Plan.” GAO/NSIAD-98-42. Washington, 
D.C. http://www.gao.gov/products/NSIAD-98-42. 
———. 2001. “Space Station: Inadequate Planning and Design Led to Propulsion 
Module Project Failure.” GAO-01-633. Washington, D.C. 
http://www.gao.gov/products/GAO-01-633. 
———. 2002a. “Space Station: Actions Under Way to Manage Cost, but Significant 
Challenges Remain.” GAO-02-735. Washington, D.C. 
http://www.gao.gov/products/GAO-02-735. 
———. 2002b. “Space Transportation: Challenges Facing NASA’s Space Launch 
Initiative.” GAO-02-1020. Washington, D.C. 
http://www.gao.gov/products/GAO-02-1020. 
———. 2003. “NASA: Major Management Challenges and Program Risks.” GAO-
03-849T. Washington, D.C. http://www.gao.gov/products/GAO-03-849T. 
———. 2004. “NASA: Lack of Disciplined Cost-Estimating Processes Hinders 
Effective Program Management.” GAO-04-642. Washington, D.C. 
http://www.gao.gov/products/GAO-04-642. 
———. 2006a. “NASA: Implementing a Knowledge-Based Acquisition Framework 
Could Lead to Better Investment Decisions and Project Outcomes.” GAO-06-
218. Washington, D.C. http://www.gao.gov/products/GAO-06-218. 
———. 2006b. “NASA: Sound Management and Oversight Key to Addressing Crew 
Exploration Vehicle Project Risks.” GAO-06-1127T. Washington, D.C. 
http://www.gao.gov/products/GAO-06-1127T. 
———. 2006c. “NASA’s James Webb Space Telescope: Knowledge-Based 
Acquisition Approach Key to Addressing Program Challenges.” GAO-06-
634. Washington, D.C. http://www.gao.gov/products/GAO-06-634. 
———. 2006d. “National Aeronautics and Space Administration: Long-Standing 
Financial Management Challenges Threaten the Agency’s Ability to Manage 
Its Programs.” GAO-06-216T. Washington, D.C. 
http://www.gao.gov/products/GAO-06-216T. 
———. 2006e. “Next Generation Air Transportation System: Preliminary Analysis 
of the Joint Planning and Development Office’s Planning, Progress, and 
Challenges.” GAO-06-574T. Washington, D.C. 
http://www.gao.gov/products/GAO-06-574T. 
———. 2006f. “Polar-Orbiting Operational Environmental Satellites: Cost Increases 
Trigger Review and Place Program’s Direction on Hold.” GAO-06-573T. 
Washington, D.C. http://www.gao.gov/products/GAO-06-573T. 
———. 2007. “NASA: Challenges in Completing and Sustaining the International 
Space Station.” GAO-07-1121T. Washington, D.C. 
http://www.gao.gov/products/GAO-07-1121T. 
———. 2008. “NASA: Ares I and Orion Project Risks and Key Indicators to 
Measure Progress.” GAO-08-186T. Washington, D.C. 
http://www.gao.gov/products/GAO-08-186T. 
———. 2009a. “Geostationary Operational Environmental Satellites: Acquisition Is 
Under Way, but Improvements Needed in Management and Oversight.” 
GAO-09-323. Washington, D.C. http://www.gao.gov/products/GAO-09-323. 
———. 2009b. “NASA: Assessments of Selected Large-Scale Projects.” GAO-09-
306SP. Washington, D.C. http://www.gao.gov/products/GAO-09-306SP. 
———. 2009c. “NASA: Projects Need More Disciplined Oversight and Management 
to Address Key Challenges.” GAO-09-436T. Washington, D.C. 
http://www.gao.gov/products/GAO-09-436T. 
———. 2010. “NASA: Key Management and Program Challenges.” GAO-10-387T. 
Washington, D.C. http://www.gao.gov/products/GAO-10-387T. 
———. 2011. “NASA: Issues Implementing the NASA Authorization Act of 2010.” 
GAO-11-216T. Washington, D.C. http://www.gao.gov/products/GAO-11-
216T. 
———. 2012. “NASA: Assessments of Selected Large-Scale Projects.” GAO-12-
207SP. Washington, D.C. http://www.gao.gov/products/GAO-12-207SP. 
———. 2013. “James Webb Space Telescope: Actions Needed to Improve Cost 
Estimate and Oversight of Test and Integration.” GAO-13-4. Washington, 
D.C. http://www.gao.gov/products/GAO-13-4. 
———. 2014. “Space Launch System: Resources Need to Be Matched to 
Requirements to Decrease Risk and Support Long Term Affordability.” GAO-
14-631. Washington, D.C. http://www.gao.gov/products/GAO-14-631. 
———. 2017. “NASA Commercial Crew Program: Schedule Pressure Increases as 
Contractors Delay Key Events.” GAO-17-137. Washington, D.C. February 16. 
http://www.gao.gov/products/GAO-17-137. 
Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and 
Donald B. Rubin. 2013. Bayesian Data Analysis. 3 edition. 
Chapman and Hall/CRC. 
“Generalized Extreme Value Distribution - Wikipedia.” 2016. Accessed August 23. 
https://en.wikipedia.org/wiki/Generalized_extreme_value_distribution. 
Genest, Christian, and Mark J. Schervish. 1985. “Modeling Expert Judgments for 
Bayesian Updating.” The Annals of Statistics 13 (3): 1198–1212. 
Goldratt, Eliyahu M. 1997. Critical Chain. The North River Press Publishing 
Corporation. http://www.amazon.com/Critical-Chain-Eliyahu-M-
Goldratt/dp/0884271536/ref=sr_1_1_twi_pap_1?ie=UTF8&qid=1458348235
&sr=8-1&keywords=Critical+Chain. 
Golenko-Ginzburg, Dimitri. 1988. “On the Distribution of Activity Time in PERT.” 
The Journal of the Operational Research Society 39 (8): 767–71. 
doi:10.2307/2583772. 
Gould, Frederick. 2005. Managing the Construction Process: Estimating, Scheduling, 
and Project Control. 3rd ed. Upper Saddle River, New Jersey: Pearson 
Education Inc. https://www.amazon.com/Managing-Construction-Process-
Estimating-
Scheduling/dp/013113406X/ref=sr_1_1?ie=UTF8&qid=1491012734&sr=8-
1&keywords=Managing+the+Construction+Process+Third+Edition. 
Grisham, Thomas W. 2010. International Project Management: Leadership in 
Complex Environments. Hoboken, N.J.: Wiley. 
Grubbs, Frank E. 1962. “Attempts to Validate Certain PERT Statistics or ‘Picking on 
PERT.’” Operations Research 10 (6): 912–15. 
Hammond, Kenneth R. 1996. Human Judgment and Social Policy: Irreducible 
Uncertainty, Inevitable Error, Unavoidable Injustice. New York: Oxford 
University Press. 
Harrison, J. Michael. 1977. “Independence and Calibration in Decision Analysis.” 
Management Science 24 (3): 320–28. 
Heath, Chip, and Rich Gonzalez. 1995. “Interaction with Others Increases Decision 
Confidence but Not Decision Quality: Evidence against Information 
Collection Views of Interactive Decision Making.” Organizational Behavior 
and Human Decision Processes 61 (3): 305–26. doi:10.1006/obhd.1995.1024. 
Hogarth, Robin M. 1975. “Cognitive Processes and the Assessment of Subjective 
Probability Distributions.” Journal of the American Statistical Association 70 
(350): 271–89. doi:10.2307/2285808. 
Howard, Ron. 1995. Apollo 13. Adventure, Drama, History. 
Hubbard, Douglas W. 2009. The Failure of Risk Management: Why It’s Broken and 
How to Fix It. Hoboken, New Jersey: John Wiley & Sons, Inc. 
http://www.amazon.com/Failure-Risk-Management-Why-Broken-
ebook/dp/B0026LTMAU/ref=tmm_kin_swatch_0?_encoding=UTF8&qid=14
56584143&sr=8-1. 
———. 2010. How to Measure Anything: Finding the Value of Intangibles in 
Business. 2 edition. Hoboken, N.J: Wiley. 
Jeffreys, Harold. 1983. Theory of Probability. Oxford: Clarendon 
Press. 
Jenner, Lynn. 2015. “Sounding Rockets Overview.” Text. NASA. March 6. 
http://www.nasa.gov/mission_pages/sounding-rockets/missions/index.html. 
Johnson, D. 1998. “The Robustness of Mean and Variance Approximations in Risk 
Analysis.” The Journal of the Operational Research Society 49 (3): 253–62. 
doi:10.2307/3010474. 
———. 2002a. “Triangular Approximations for Continuous Random Variables in 
Risk Analysis.” The Journal of the Operational Research Society 53 (4): 457–
67. 
Johnson, David. 1997. “The Triangular Distribution as a Proxy for the Beta 
Distribution in Risk Analysis.” Journal of the Royal Statistical Society. Series 
D (The Statistician) 46 (3): 387–98. 
Johnson, Timothy R., David V. Budescu, and Thomas S. Wallsten. 2001. “Averaging 
Probability Judgments: Monte Carlo Analyses of Asymptotic Diagnostic 
Value.” Journal of Behavioral Decision Making 14 (2): 123–40. 
doi:10.1002/bdm.369. 
Kahneman, Daniel. 2011. Thinking, Fast and Slow. Reprint edition. Farrar, Straus and 
Giroux. 
Kahneman, Daniel, and Amos Tversky. 1979. “Prospect Theory: An Analysis of 
Decision under Risk.” Econometrica 47 (2): 263–91. doi:10.2307/1914185. 
Kane, Robert L. 1995. “Creating Practice Guidelines: The Dangers of Over-Reliance 
on Expert Judgment.” Journal of Law, Medicine and Ethics 23: 62. 
Keefer, Donald L., and Samuel E. Bodily. 1983. “Three-Point Approximations for 
Continuous Random Variables.” Management Science 29 (5): 595–609. 
Keefer, Donald L., and William A. Verdini. 1993. “Better Estimation of PERT 
Activity Time Parameters.” Management Science 39 (9): 1086–91. 
Kremer, Steven. 2013a. “Research Range Services 2013 Annual Report.” Annual 
Report. Wallops Flight Facility: NASA. 
http://www.nasa.gov/centers/wallops/home/#.U9wrXSiwXvc. 
———. 2013b. “Wallops Range User’s Handbook.” 840-HDBK-0003. Wallops 
Flight Facility: NASA. http://sites.wff.nasa.gov/multimedia/docs/wffruh.pdf. 
———. 2015. “Research Range Services 2015 Annual Report.” Wallops Flight 
Facility: NASA. 
———. 2017a. “Chapter 5 Comments,” January 3. 
———. 2017b. “Ch6 - RE: Research Project - Fighting down Panic :),” February 6. 
———. 2017c. “RE: Research Project-  Fighting down Panic :),” February 6. 
Lichtenstein, Sarah, Baruch Fischhoff, and Lawrence Phillips. 1977. “Calibration of 
Probabilities: The State of the Art.” In Decision Making and Change in 
Human Affairs. The Netherlands: D. Reidel Publishing Company. 
Lindley, D. V. 1982. “The Improvement of Probability Judgements.” Journal of the 
Royal Statistical Society. Series A (General) 145 (1): 117–26. 
doi:10.2307/2981425. 
Lindley, D. V., A. Tversky, and R. V. Brown. 1979. “On the Reconciliation of 
Probability Assessments.” Journal of the Royal Statistical Society. Series A 
(General) 142 (2): 146–80. doi:10.2307/2345078. 
Lindley, Dennis V. 1983. “Theory and Practice of Bayesian Statistics.” Journal of the 
Royal Statistical Society. Series D (The Statistician) 32 (1/2): 1–11. 
doi:10.2307/2987587. 
Malcolm, D. G., J. H. Roseboom, C. E. Clark, and W. Fazar. 1959. “Application of a 
Technique for Research and Development Program Evaluation.” Operations 
Research 7 (5): 646–69. 
Mamet, David. 2015. “David Mamet Quotes at BrainyQuote.com.” BrainyQuote. Accessed 
July 11. 
http://www.brainyquote.com/quotes/quotes/d/davidmamet478663.html. 
Mantel, Samuel J., Jr., Jack R. Meredith, Scott M. Shafer, and Margaret M. Sutton. 
2004. Core Concepts, with CD: Project Management in Practice. 2 edition. 
Hoboken, NJ: Wiley. 
Marquand, Richard. 1983. Star Wars: Episode VI - Return of the Jedi. Action, 
Adventure, Fantasy. 
Martin, Paul K. 2012. “NASA’s Challenges to Meeting Cost, Schedule, and 
Performance Goals.” Audit IG-12-021. NASA. 
http://oig.nasa.gov/audits/reports/FY12/IG-12-021.pdf. 
MATLAB (version 9.1.0.441655). 2016. Natick, MA: MathWorks, Inc. 
Meehl, Paul E. 1954. Clinical versus Statistical Prediction: A Theoretical Analysis 
and a Review of the Evidence. Minneapolis, MN: Jones Press, Inc. 
Megill, Robert. 1971. An Introduction to Risk Analysis. Petroleum Publishing 
Company. 
Microsoft. 2017. “Use a PERT Analysis to Estimate Task Durations - Project.” 
Accessed April 13. https://support.office.com/en-us/article/Use-a-PERT-
analysis-to-estimate-task-durations-864b5389-6ae2-40c6-aacc-0a6c6238e2eb. 
“MinStableDistribution—Wolfram Language Documentation.” 2017. Accessed 
March 7. 
https://reference.wolfram.com/language/ref/MinStableDistribution.html. 
Moder, Joseph J., and E. G. Rodgers. 1968. “Judgment Estimates of the Moments of 
Pert Type Distributions.” Management Science 15 (2): B76–83. 
Montgomery, Douglas C. 2008. Design and Analysis of Experiments. 7 edition. 
Hoboken, NJ: Wiley. 
Morris, Peter A. 1974. “Decision Analysis Expert Use.” Management Science 20 (9): 
1233–41. 
———. 1977. “Combining Expert Judgments: A Bayesian Approach.” Management 
Science 23 (7): 679–93. 
———. 1983. “An Axiomatic Approach to Expert Resolution.” Management Science 
29 (1): 24–32. 
———. 1986. “Observations on Expert Aggregation.” Management Science 32 (3): 
321–28. 
Mosleh, A., V. M. Bier, and G. Apostolakis. 1988. “A Critique of Current Practice for 
the Use of Expert Opinions in Probabilistic Risk Assessment.” Reliability 
Engineering & System Safety 20 (1): 63–85. doi:10.1016/0951-
8320(88)90006-3. 
Mumpower, Jeryl L., and Thomas R. Stewart. 1996. “Expert Judgement and Expert 
Disagreement.” Thinking & Reasoning 2 (2/3): 191–212. 
doi:10.1080/135467896394500. 
Murphy, Allan H., and Robert L. Winkler. 1977. “Reliability of Subjective 
Probability Forecasts of Precipitation and Temperature.” Journal of the Royal 
Statistical Society. Series C (Applied Statistics) 26 (1): 41–47. 
doi:10.2307/2346866. 
NASA. 2014. “NASA Space Flight Program and Project Management Handbook.” 
NASA/SP-2014-3705. Washington, D.C. 
http://ntrs.nasa.gov/archive/nasa/casi.ntrs.nasa.gov/20150000400.pdf. 
———. 2015a. “NASA Space Flight Program and Project Management 
Requirements W/Change 1-13.” Accessed July 16. 
http://nodis3.gsfc.nasa.gov/npg_img/N_PR_7120_005E_/N_PR_7120_005E_
.pdf. 
———. 2015b. “NPR 7120.5C NASA Program and Project Management Processes 
and Requirements.” Accessed August 1. 
http://nodis3.gsfc.nasa.gov/displayCA.cfm?Internal_ID=N_PR_7120_005C_
&page_name=main. 
“NASA Sounding Rockets Annual Report 2013.” 2013. Annual Report NP-2013-11-
078-GSFC. Wallops Flight Facility: NASA. 
http://sites.wff.nasa.gov/code810/files/Sounding%20Rockets%20Annual%20
Report%202013_sm.pdf. 
NIST. 2017a. “1.3.6.7.1. Cumulative Distribution Function of the Standard Normal 
Distribution.” Accessed March 5. 
http://www.itl.nist.gov/div898/handbook/eda/section3/eda3671.htm. 
———. 2016b. “NIST/SEMATECH e-Handbook of Statistical Methods.” Accessed 
December 2. 
http://www.itl.nist.gov/div898/handbook/eda/section3/eda366g.htm. 
“NIST/SEMATECH E-Handbook of Statistical Methods.” 2016. Accessed December 
2. http://www.itl.nist.gov/div898/handbook/eda/section3/eda366h.htm. 
“Normal Distribution.” 2016. Wikipedia. 
https://en.wikipedia.org/w/index.php?title=Normal_distribution&oldid=75291
7181. 
Önkal, Dilek, J. Frank Yates, Can Simga-Mugan, and Şule Öztin. 2003. “Professional 
vs. Amateur Judgment Accuracy: The Case of Foreign Exchange Rates.” 
Organizational Behavior and Human Decision Processes 91 (2): 169–85. 
doi:10.1016/S0749-5978(03)00058-X. 
Pearson, E. S., and J. W. Tukey. 1965. “Approximate Means and Standard Deviations 
Based on Distances between Percentage Points of Frequency Curves.” 
Biometrika 52 (3/4): 533–46. doi:10.2307/2333703. 
Pickard, William F. 2004. “Inverse Statistical Estimation via Order Statistics: A 
Resolution of the Ill-Posed Inverse Problem of PERT Scheduling.” Inverse 
Problems 20 (5): 1565. doi:10.1088/0266-5611/20/5/014. 
PMI. 2013. A Guide to the Project Management Body of Knowledge (PMBOK® 
Guide). Fifth Edition, Kindle Version. Newtown Square, Pa: Project 
Management Institute. 
R Core Team. 2014. R: A Language and Environment for Statistical Computing. 
Vienna, Austria: R Foundation for Statistical Computing. http://www.R-
project.org/. 
Raiffa, Howard. 1968. Decision Analysis: Introductory Lectures on Choices Under 
Uncertainty. Reading, Mass.: Longman Higher Education. 
Regnier, Eva. 2005a. “Hidden Assumptions in Project Management Tools.” DRMI 
Newsletter, no. 11 (January): 1–4. 
———. 2005b. “Activity Completion Times in PERT and Scheduling Network 
Simulation, Part II.” DRMI Newsletter, no. 12 (April): 1, 4–9. 
Roberts, Harry V. 1965. “Probabilistic Prediction.” Journal of the American 
Statistical Association 60 (309): 50–62. doi:10.2307/2283136. 
Roebber, Paul, and Lance Bosart. 2014. “The Complex Relationship between 
Forecast Skill and Forecast Value: A Real-World Analysis.” Weather and 
Forecasting 11 (4). Accessed February 23. 
http://journals.ametsoc.org/doi/abs/10.1175/1520-
0434(1996)011%3C0544%3ATCRBFS%3E2.0.CO%3B2. 
Rowe, Gene, and George Wright. 2001. “Differences in Expert and Lay Judgments of 
Risk: Myth or Reality?” Risk Analysis 21 (2): 341–56. doi:10.1111/0272-
4332.212116. 
Ruland, William. 1978. “The Accuracy of Forecasts by Management and by Financial 
Analysts.” The Accounting Review 53 (2): 439–47. 
Savage, Leonard J. 1971. “Elicitation of Personal Probabilities and Expectations.” 
Journal of the American Statistical Association 66 (336): 783–801. 
doi:10.2307/2284229. 
Schervish, Mark J. 1984. “Combining Expert Judgments.” Technical Report 294. 
Pittsburgh, PA: Department of Statistics, Carnegie Mellon University. 
———. 1986. “Comments on Some Axioms for Combining Expert Judgments.” 
Management Science 32 (3): 306–12. 
Selvidge, J. E. 1980. “Assessing the Extremes of Probability Distributions by the 
Fractile Method.” Decision Sciences 11 (3): 493–502. doi:10.1111/j.1540-
5915.1980.tb01154.x. 
Shanteau, James. 1992. “The Psychology of Experts: An Alternative View.” In 
Expertise and Decision Support. New York: Plenum Press. 
Shih, N.-H. 2005. “Estimating Completion-Time Distribution in Stochastic Activity 
Networks.” The Journal of the Operational Research Society 56 (6): 744–49. 
Silver, Nate. 2012. The Signal and the Noise: Why So Many Predictions Fail — but 
Some Don’t. 1 edition. New York: Penguin Press. 
Sniezek, Janet A., and Rebecca Henry. 1990. “Revision, Weighting, and Commitment 
in Consensus Group Judgment.” Organizational Behavior and Human 
Decision Processes 45 (1): 66–84. doi:10.1016/0749-5978(90)90005-T. 
“Statistical Distributions.” 2016. Accessed August 23. 
http://people.stern.nyu.edu/adamodar/New_Home_Page/StatFile/statdistns.ht
m. 
Steyn, Herman. 2001. “An Investigation into the Fundamentals of Critical Chain 
Project Scheduling.” International Journal of Project Management 19 (6): 
363–69. doi:10.1016/S0263-7863(00)00026-0. 
Surowiecki, James. 2005. The Wisdom of Crowds. Reprint edition. New York: 
Anchor. 
Tetlock, Philip. 2005. Expert Political Judgment. Kindle Edition. Princeton, New 
Jersey: Princeton University Press. 
https://www.amazon.com/dp/B00C4UT1A4/ref=dp-kindle-
redirect?_encoding=UTF8&btkr=1. 
Trumbo, D, C Adams, M Milner, and L Schipper. 1962. “Reliability and Accuracy in 
the Inspection of Hard Red Winter Wheat.” Cereal Science Today 7. 
Tsai, Claire I., Joshua Klayman, and Reid Hastie. 2008. “Effects of Amount of 
Information on Judgment Accuracy and Confidence.” Organizational 
Behavior and Human Decision Processes 107 (2): 97–105. 
doi:10.1016/j.obhdp.2008.01.005. 
Tversky, Amos. 1974. “Assessing Uncertainty.” Journal of the Royal Statistical 
Society. Series B (Methodological) 36 (2): 148–59. 
———. 1975. “A Critique of Expected Utility Theory: Descriptive and Normative 
Considerations.” Erkenntnis 9 (2): 163–73. 
Tversky, Amos, and Daniel Kahneman. 1974. “Judgment under Uncertainty: 
Heuristics and Biases.” Science, New Series, 185 (4157): 1124–31. 
———. 1981. “The Framing of Decisions and the Psychology of Choice.” Science, 
New Series, 211 (4481): 453–58. 
Tversky, Amos, and Eldar Shafir. 1992. “Choice under Conflict: The Dynamics of 
Deferred Decision.” Psychological Science 3 (6): 358–61. 
Tversky, Amos, and Peter Wakker. 1995. “Risk Attitudes and Decision Weights.” 
Econometrica 63 (6): 1255–80. doi:10.2307/2171769. 
Ward, Dan. 2015. “Ward.pdf.” Accessed July 9. 
http://www.dau.mil/pubscats/ATL%20Docs/Sep-Oct11/Ward.pdf. 
waynehale. 2015. “Ten Years After Columbia: STS-112, the Harbinger.” Wayne 
Hale’s Blog. Accessed August 1. 
https://waynehale.wordpress.com/2012/12/03/ten-years-after-columbia-sts-
112-the-harbinger/. 
“Weibull Distribution.” 2016. Wikipedia. 
https://en.wikipedia.org/w/index.php?title=Weibull_distribution&oldid=7579
39623. 
Weiss, David, and James Shanteau. 2014. “Empirical Assessment of Expertise.” 
Accessed February 23. 
https://www.researchgate.net/publication/10614553_Empirical_Assessment_o
f_Expertise. 
West, Mike, and Jo Crosse. 1992. “Modelling Probabilistic Agent Opinion.” Journal 
of the Royal Statistical Society. Series B (Methodological) 54 (1): 285–99. 
Whittlesea, Bruce W. A. 1990. “Illusions of Immediate Memory: Evidence of an 
Attributional Basis for Feelings of Familiarity and Perceptual Quality.” 
Journal of Memory and Language 29 (6): 716–32. 
doi:10.1016/0749-596X(90)90045-2. 
Winkler, Robert L. 1968. “The Consensus of Subjective Probability Distributions.” 
Management Science 15 (2): B61–75. 
———. 1981. “Combining Probability Distributions from Dependent Information 
Sources.” Management Science 27 (4): 479–88. 
———. 1986. “Expert Resolution.” Management Science 32 (3): 298–303. 
Winston, Wayne L. 2003. Operations Research: Applications and Algorithms. 4 
edition. Belmont, CA: Cengage Learning. 
Yates, J. Frank. 1990. Judgment and Decision Making. Englewood Cliffs, N.J: 
Prentice Hall College Div. 
Zajonc, Robert B. 1968. “Attitudinal Effects of Mere Exposure.” Journal of 
Personality and Social Psychology 9 (2, Pt.2): 1–27. doi:10.1037/h0025848. 
Zio, E. 1996. “On the Use of the Analytic Hierarchy Process in the Aggregation of 
Expert Judgments.” Reliability Engineering & System Safety 53 (2): 127–38. 
doi:10.1016/0951-8320(96)00060-9.