ABSTRACT

Title of dissertation: ESTIMATION OF EXPECTED RETURNS, TIME CONSISTENCY OF A STOCK RETURN MODEL, AND THEIR APPLICATION TO PORTFOLIO SELECTION

Huaqiang Ma, Doctor of Philosophy, 2010

Dissertation directed by: Prof. Dilip B. Madan, AMSC / Department of Finance

Longer-horizon returns are modeled by two approaches, which have different impacts on skewness and excess kurtosis. The Lévy approach, which treats the random variable at a longer horizon as the cumulative sum of i.i.d. random variables from shorter horizons, tends to decrease skewness and excess kurtosis along the time horizon at a faster rate than the real data imply. The scaling approach, on the other hand, keeps skewness and excess kurtosis constant along the time horizon. A combination of the two approaches may therefore perform better than either one alone. This empirical work employs the mixed approach to study returns at five time scales, from one hour to two weeks. At all time scales, the mixed model outperforms the other two in terms of the KS test and several statistical distances.

Traditionally, the expected return is estimated from historical data through the classic asset pricing models and their variations. However, because realized returns are so volatile, decades of data or more are required to attain relatively accurate estimates. Furthermore, extrapolating the expected return from historical data is questionable because the return is determined by future uncertainty. Therefore, instead of using historical data, the expected return should be estimated from data representing future uncertainty, such as the option prices used in our method. A numeraire portfolio links option prices to the expected return through its striking feature: the price of any contingent claim, when denominated in this portfolio, is the conditional expectation of its denominated future payoff under the physical measure.
The numeraire portfolio contains information about the expected return. Therefore, in this study, the expected returns are estimated from option calibration through the numeraire portfolio pricing method. The results are compared with the realized returns through a linear regression model, which shows that the difference between the two returns is insensitive to the major risk factors. This demonstrates that the numeraire portfolio pricing method provides a good estimator of the expected return.

Modern portfolio theory is well developed. However, various aspects of its implementation are questioned; for example, the expected return is not properly estimated from historical data, and the return distribution is assumed to be Gaussian, which does not reflect the empirical facts. The results from the first two studies can be applied to this problem. The portfolio constructed with the estimated expected return is superior to the reference portfolios whose expected returns are estimated from historical data. Furthermore, this portfolio also outperforms the market index, SPX.

ESTIMATION OF EXPECTED RETURNS, TIME CONSISTENCY OF A STOCK RETURN MODEL, AND THEIR APPLICATION TO PORTFOLIO SELECTION

By Huaqiang Ma

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park, in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2010

Advisory Committee:
Professor Dilip B. Madan, Chair
Professor Benjamin Kedem
Professor Mark Loewenstein
Professor Tobias von Petersdorff
Professor Victor M. Yakovenko

© Copyright by Huaqiang Ma 2010

Acknowledgements

First and foremost, I would like to thank my advisor, Professor Dilip B. Madan. Through the years of my PhD study, Dr. Madan was always there offering his help.
His breadth and depth of knowledge opened my eyes to the world of mathematical finance; his insight lit the way when I got lost in the mist; his intolerance for ambiguity sharpened my mind; and his sense of humor soothed my nerves when I was struggling in the research. To me, Dr. Madan is also a role model. Most people only enjoy the time after work. Very few people enjoy their life at work, and Dr. Madan is one of them. His passion for knowledge inspired and motivated me during my PhD study, and will inspire and motivate me through my life.

I want to thank Professor Mark Loewenstein for his generous advice on my thesis. I would like to extend my gratitude to Professor Benjamin Kedem, Professor Tobias von Petersdorff, and Professor Victor M. Yakovenko for agreeing to serve on the PhD advisory committee. I also want to thank Professor Michael Fu for organizing our weekly financial seminar, which broadened my knowledge of mathematical finance.

I would like to thank my fellow classmates. Samvit Prakash, Bing Zhang, and Christian Silva guided me with their own experience in math finance. I thank Yun Zhou, Linyan Cao, Guoyuan Liu, and Geping Liu for numerous hours of discussion of the problems in the research.

I would like to express my gratitude to the staff members. Ms. Alverda McCoy was always there willing to help. Mr. William (Bill) Schildknecht was so kind as to offer me teaching opportunities in and out of the math department. Mr. Chuck LaHaie helped me with financial data.

My sincere thanks go to my friends, especially Wei Hu, Yiling Luo, Baozhong Mao, Hua Sheng, Anshuman Sinha, and Min Sun. Your sincere help and encouragement helped me get through the hard times in my PhD study.

Finally, I would like to thank my parents. They always stood by me with their endless love and support. No words can express my gratitude.

Contents

1 Empirical Study of A Stock Return Model
1.1 Introduction
1.2 Two Approaches to Model Stock Market Returns
1.2.1 Lévy Processes: Definition, Properties, and Lévy Market Models
1.2.2 Scaling Property and Self-similarity
1.3 Modeling Stock Returns with Lévy and Scaling
1.3.1 Preliminary
1.3.2 Self-decomposable Laws
1.3.3 Mixed Model
1.4 Related Methods and Techniques
1.4.1 Variance Gamma Process and the Associated Law
1.4.2 Stock Price Dynamics with the VG Process
1.4.3 Maximum Likelihood Estimation (MLE)
1.4.4 Fast Fourier Transform (FFT)
1.4.5 VG Random Number Simulation
1.5 Numerical Implementation and Results
1.5.1 Data, Sketch of the Procedure, and Brief Introduction of the Statistical Analysis
1.5.2 Statistical Estimation at the Unit Time
1.5.3 Statistical Estimation at Longer Horizons
1.5.4 Statistical Analysis for Model Performance Comparison
1.6 Conclusion
2 Estimating Expected Return By Numeraire-Portfolio Method
2.1 Introduction
2.2 Pricing with Physical Measure
2.2.1 A Simple Example
2.2.2 The Numeraire Portfolio
2.2.3 Pricing Under the Physical Measure
2.3 Estimating Expected Returns Via Numeraire-portfolio Pricing Method
2.3.1 Idea of the Expected Return Estimation
2.3.2 Multivariate Random Number Simulation Via FGC
2.4 Numerical Implementation and Results
2.4.1 Estimating Expected Returns
2.4.2 Statistical Analysis
2.5 Conclusion
3 A New Approach to Portfolio Selection
3.1 Introduction
3.2 Portfolio Evaluation - Acceptability Indices
3.2.1 Acceptance Sets and Coherent Risk Measure
3.2.2 Acceptability Indices
3.2.3 WVAR Acceptability Indices
3.2.4 Bid and Ask Prices
3.3 Numerical Implementation and Results
3.3.1 Trading Strategy
3.3.2 Procedure and Results
3.4 Conclusion

List of Figures

1-1 VG fit to WMT at 1-hour time scale
1-2 Statistical fit to WMT at 2hr timescale
1-3 Statistical fit to WMT at 3hr timescale
1-4 Statistical fit to WMT at 1d timescale
1-5 Statistical fit to WMT at 1w timescale
1-6 Statistical fit to WMT at 2w timescale
1-7 Proportion of stocks with p-value greater than certain level (2-hour timescale)
1-8 Proportion of stocks with p-value greater than certain level (3-hour timescale)
1-9 Proportion of stocks with p-value greater than certain level (1-day timescale)
1-10 Proportion of stocks with p-value greater than certain level (1-week timescale)
1-11 Proportion of stocks with p-value greater than certain level (2-week timescale)
2-1 A single-period binomial model
2-2 The fitted option data of SPX on July 11, 2007, with one-month maturity, RMSE=2.01
2-3 The fitted option data of HPQ on July 11, 2007, with one-month maturity, RMSE=0.0617
2-4 Estimated return vs. realized return for SPX (January 1999 to October 2009)
3-1 Graphic illustration of the Representation Theorem
3-2 Cumulative returns of the five portfolios ( = 0.05)
3-3 Cumulative returns of the five portfolios ( = 0.10)
3-4 Cumulative returns of the five portfolios ( = 0.15)
3-5 Cumulative returns of the five portfolios ( = 0.20)
3-6 Cumulative returns of the five portfolios ( = 0.25)
3-7 Estimated-return portfolio at different risk level (AIX=MINMAXVAR)

List of Tables

1.1 Statistics of the Estimated VG Parameters (at unit time = 1 hour)
1.2 Statistics of the Estimated VGMixed Parameter c
1.3 Statistics of the Estimated VGMixed Parameter
1.4 Mean of the statistical distances of the three models at different timescales
1.5 Std. of the statistical distances of the three models at different timescales
2.1 Percentage of positive risk premium of each stock in 130 days
2.2 Mean and std. of the estimated return and realized return (annualized)
2.3 t-test of each i (indi. = 0 represents i = 0, indi. = 1 represents i ≠ 0)
2.4 F test of (indi. = 0 represents = 0, indi. = 1 represents ≠ 0)
3.1 Mean of portfolio return at different risk level (January 1999 - October 2009)
3.2 Std. of portfolio return at different risk level (January 1999 - October 2009)

Chapter 1

Empirical Study of A Stock Return Model

1.1 Introduction

Stock markets are complex dynamic systems with many interacting elements. The interaction of these elements can be observed as price fluctuations through time. Because of the complexity of stock markets, price fluctuations exhibit statistical properties that financial models can reproduce. Sophisticated models are capable of describing these properties dynamically; that is, they can provide statistical fit and analysis for stock returns at various time scales, from hourly to daily returns, and even to monthly and yearly returns. Such models are time consistent along the time horizon. The statistical accuracy of financial models is necessary and useful in many areas. For example, in risk management, a proper model can help reduce severe losses. With incentives from both academic and practical purposes, many financial models have been developed.

The stochastic approach has been widely implemented to model this complex dynamic system.
Dating back to 1900, Louis Bachelier [2] first proposed that the stock price behaves as a random walk and modeled the price at time $t$ as $S_t = S_0 + \sigma W_t$, where $W_t$ is Brownian motion and $\sigma$ is the volatility. In this model, the price fluctuation $\Delta S_t = S_t - S_0$ follows Brownian motion, which has independent and identically distributed increments with Gaussian distributions. To ensure a positive price, the log-price fluctuation $\ln(S_t/S_0) = \ln S_t - \ln S_0$, instead of the price fluctuation, is modeled to follow Brownian motion [2]. The stochastic process $S_t$ is then said to follow geometric Brownian motion, with dynamics $dS_t = \mu S_t\,dt + \sigma S_t\,dW_t$, which has the analytic solution $S_t = S_0 \exp\{(\mu - \sigma^2/2)t + \sigma W_t\}$. This model was also employed in option pricing by Black and Scholes [7] and Merton [58], and the work earned the latter two the Nobel Prize in Economics in 1997.

Sophisticated financial models can reproduce some, if not all, of the important statistical properties of the stock markets. The main objective is to study the probability density function (pdf) of the increments at various time scales and to compare the implied statistical properties with the empirical facts. The increments in the stock markets are the log-price fluctuations (or log-price returns) $\ln(S_{t+\tau}/S_t)$ at some given time scale $\tau$. The Brownian motion model misses some of the important statistical properties of the stock markets. Compared to Brownian motion, the empirical distribution of the log-price returns has more mass at the origin and in the tails; that is, the stock markets have more events of small returns and of big returns (either losses or gains) than the model predicts. This empirical fact is called heavy tails (or fat tails) and can be measured quantitatively by excess kurtosis. The excess kurtosis of the Gaussian distribution is zero, while the markets always show positive excess kurtosis.
The other discrepancy is that Brownian motion has a continuous sample path, whereas the path of stock prices may show discontinuities (or jumps).

Lévy processes, which were introduced by Paul Lévy in the early 20th century, are stochastic processes with independent and stationary increments. They relax one assumption of Brownian motion, namely Gaussian increments. This relaxation provides the flexibility to choose among more distributions for the increments, as long as they are infinitely divisible, and these distributions are more leptokurtic (positive excess kurtosis) than the Gaussian one. Furthermore, discontinuities may appear in the paths of Lévy processes. Merton [59] was one of the first to propose a Lévy process (Brownian motion plus a compound Poisson process, or jump-diffusion process) to model asset returns. Later, numerous Lévy models were proposed; among them we cite the Variance Gamma model by Madan and Seneta [49] [48], the NIG model by Barndorff-Nielsen [3], the Generalized Hyperbolic model by Eberlein and Prause [24] and Prause [67], the Meixner model by Schoutens [73], and the CGMY model by Carr et al. [11]. Because of the flexibility of choosing from various distributions, Lévy models can capture some of the important empirical facts, such as jumps and fat tails.

The statistical fit is usually performed at some fixed time scale, and little work has been done on statistical analysis along the time horizon. One example is the work of Eberlein and Ozkan [23], who investigated the time consistency of Lévy models by employing a Lévy distribution model at different time scales, from hourly to daily returns. For Lévy processes, skewness decreases at the rate of the reciprocal of the square root of the time horizon, and excess kurtosis decreases at the rate of the reciprocal of the time horizon. However, empirical studies [17] [22] show that skewness and excess kurtosis in the actual data decrease more slowly than Lévy processes predict.

Besides Lévy models, self-similarity or scaling is applied in financial markets.
In a stochastic process, the scaling property means that the distribution of increments at any time scale can be obtained from that at another time scale by rescaling the random variable at that time scale. Mandelbrot [52] was the first to introduce this concept into financial markets, where he considered cotton price returns as having the scaling property. Other works include Mantegna and Stanley [54], Cont, Bouchaud and Potters [18], Mandelbrot [53], Peters [63], Cont [17], and Galloway and Nolder [31]. Under the assumption of scaling behavior, the distributions at larger timescales can be derived from those at smaller ones, which are easier to estimate because the data are sufficient. As with Lévy processes, however, the implication of the scaling property, namely constant skewness and excess kurtosis at all timescales, is not consistent with the empirical results either.

These two approaches have different impacts on skewness and excess kurtosis along the time horizon. The Lévy approach implies a faster decay than the market exhibits, while the scaling approach has constant skewness and excess kurtosis at all horizons. Thus, Eberlein and Madan [22] proposed a model that mixes the two approaches, which provides the freedom to let the term structure of skewness and excess kurtosis follow a pattern similar to that of the markets. It should be pointed out that this mixed approach is not associated with any process. Instead, it only uses the distributional properties taken from these two kinds of processes. In this model, the random variables of the increments of the stock returns at various time horizons are decomposed into two independent parts: one comes from the increments of a Lévy process, and the other comes from those of a scaling process.

In this study, we start the statistical parameter estimation from a short horizon, e.g., one hour, because of the abundant data.
A base distribution is chosen, which must be infinitely divisible and have the self-decomposability characteristic (SDC) because of the requirements of Lévy processes and of the decomposition of the random variables. In law, the distribution is decomposed into two parts: one is a fraction of the random variable itself, and the other is a residual component. The distributions at longer horizons are driven by these two components, each represented by a parameter: one gives the proportion of the Lévy component; the other is the scaling coefficient of the residual component. These two parameters are estimated at the longer horizons, namely two hours, three hours, one day, one week, and two weeks in this study. For comparison, the statistical estimations are also conducted with the associated Lévy model and scaling model at the same timescales, and statistical analyses, including the Kolmogorov-Smirnov (KS) test, the Kolmogorov distance, the $\chi^2$-distance, and the $L_1$ and $L_2$ distances, are performed. All these statistical analyses indicate that the scaling model outperforms the Lévy model at longer horizons, while the mixed model dominates both models at all horizons. Furthermore, the averages of the two parameters over 500 individual stocks are both around 0.4 along the time horizon. Thus, it is adequate to assume the value 0.4 for these two parameters at even longer time scales, such as half a year, one year or longer, where statistical parameter estimation is not feasible due to the lack of data.

The remainder of this chapter is organized as follows. Section 1.2 introduces and discusses the two approaches to modeling financial markets, the Lévy process approach and the scaling approach. Section 1.3 reviews the empirical behavior of skewness and excess kurtosis in stock markets, explains why the above two approaches are questionable, and develops the mixed approach to model the distribution of returns along the time horizon. Section 1.4 describes the methods used for statistical estimation. Section 1.5 presents the results. Section 1.6 concludes the study and suggests further work.

1.2 Two Approaches to Model Stock Market Returns

1.2.1 Lévy Processes: Definition, Properties, and Lévy Market Models

• Lévy Processes and Infinitely Divisible Distributions

Relaxing one assumption of Brownian motion, the Gaussian increments, yields Lévy processes, which provide more flexibility for building continuous-time stochastic models.

Definition 1 Lévy Process

A stochastic process $X = \{X_t : t \ge 0\}$ on $(\Omega, \mathcal{F}, P)$ with values in $\mathbb{R}^d$ is said to be a Lévy process if:

(1) $X$ has independent increments: that is, for any $0 \le t_1 < t_2 < \dots < t_n < \infty$, the increments $X_{t_2} - X_{t_1}, \dots, X_{t_n} - X_{t_{n-1}}$ are independent.

(2) $X$ has stationary increments: that is, the law of $X_t - X_s$ is the same as that of $X_{t-s}$, where $0 \le s < t < \infty$.

(3) $X_t$ is stochastically continuous: that is, for all $\varepsilon > 0$ and $t > 0$,
$$\lim_{h \to 0} P(|X_{t+h} - X_t| \ge \varepsilon) = 0.$$

(4) $X_0 = 0$ almost surely.

(5) $X_t$ has the càdlàg property: that is, its paths are right-continuous with left limits.

The third condition, stochastic continuity, does not imply continuous sample paths. It only means that at any given (deterministic) time $t$, the probability that a jump occurs is zero. The sample path may still be discontinuous at random times. Thus, a Lévy process is capable of capturing random jumps occurring in the financial markets.

The first two conditions are the main features of Lévy processes, and the third one follows from the first two (Keller [37]). Jacod and Shiryaev [36] name processes satisfying conditions (1) and (2) "processes with stationary independent increment (PIIS)."

Given a random variable $Y$ with probability distribution $F$, if we set $X_t \stackrel{law}{=} Y$ for a given $t > 0$, then we can construct a Lévy process along the time horizon as follows:
1.
For the sample path at $t, 2t, 3t, \dots, nt, \dots$ with $n > 1$, write $X_{nt} = X_t + (X_{2t} - X_t) + \dots + (X_{nt} - X_{(n-1)t})$, whose distribution is the same as that of the sum of the $n$ i.i.d. random variables $X_t, X_{2t} - X_t, \dots, X_{nt} - X_{(n-1)t}$ because of the "stationary independent increment" property;
2. Within the interval $[0, t]$, choose any integer $m > 1$ and a step $\Delta$ such that $m\Delta = t$. Then $X_t = X_{\Delta} + (X_{2\Delta} - X_{\Delta}) + \dots + (X_{m\Delta} - X_{(m-1)\Delta})$; that is, $X_t$ is divided into $m$ i.i.d. parts and has the same law as the sum of $m$ i.i.d. random variables, whose distribution can be derived from that of $X_t$.

In this procedure, the distribution of $X_t$ can be "divided" indefinitely, since $m > 1$ can be arbitrarily large. This property is called "infinite divisibility."

Definition 2 Infinite Divisibility

Let $X$ be a random variable with distribution $F$. The probability distribution $F$ on $\mathbb{R}^d$ is infinitely divisible if for any integer $n > 1$ there exist $n$ i.i.d. random variables $X_1, X_2, \dots, X_n$ such that
$$X \stackrel{law}{=} X_1 + X_2 + \dots + X_n. \tag{1.1}$$

The following proposition shows the one-to-one relationship between Lévy processes and infinitely divisible distributions:

Proposition 3 Lévy Processes and Infinitely Divisible Distributions

For any infinitely divisible distribution $F$, there exists a Lévy process $\{X_t : t \ge 0\}$ such that the law of $X_1$ is $F$. Conversely, given a Lévy process $\{X_t : t \ge 0\}$, the distribution of $X_t$ is infinitely divisible for every $t > 0$.

Compared to Brownian motion, there is considerable flexibility in choosing distributions for the increments of Lévy processes, with the only constraint being that the distributions must be infinitely divisible.

Usually the pdf of $X_t$ in a Lévy process is not easy to obtain [36]. Instead we study the characteristic function of $X_t$. Let $\phi_t(u)$ or $\phi_{X_t}(u)$ be the characteristic function of $X_t$, $E[e^{iuX_t}]$. Define $\psi_t(u) = \psi_{X_t}(u) = \ln \phi_t(u)$ as the characteristic exponent.
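Definition 2 can be checked numerically on a law whose characteristic function is known in closed form. The following is a minimal Python sketch under stated assumptions: it uses the Poisson distribution purely as an illustration (it is not a law used in this study), relies on the standard fact that the characteristic function of a Poisson variable with intensity $\lambda$ is $\exp(\lambda(e^{iu} - 1))$, and the function names `poisson_cf` and `divisibility_error` are hypothetical.

```python
import cmath


def poisson_cf(u, lam):
    """Characteristic function E[exp(i*u*X)] of X ~ Poisson(lam)."""
    return cmath.exp(lam * (cmath.exp(1j * u) - 1.0))


def divisibility_error(lam, n, grid):
    """Largest gap, over the grid, between the characteristic function of X
    and that of a sum of n i.i.d. Poisson(lam / n) parts, as in Eq. (1.1)."""
    return max(
        abs(poisson_cf(u, lam) - poisson_cf(u, lam / n) ** n) for u in grid
    )


if __name__ == "__main__":
    # Illustrative grid and intensity; both are arbitrary choices.
    grid = [k * 0.1 for k in range(-50, 51)]
    for n in (2, 10, 1000):
        print(n, divisibility_error(3.0, n, grid))
```

The sum of $n$ independent parts has a characteristic function equal to the $n$-th power of the part's characteristic function, so a gap at the level of floating-point rounding is consistent with (1.1) for each $n$ tried.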
The characteristic function of a Lévy process is given by the following proposition:

Proposition 4 Characteristic Function of Lévy Processes

Let $\{X_t : t \ge 0\}$ be a Lévy process on $\mathbb{R}^d$ and let its characteristic exponent at $t = 1$ be $\psi$. Then $\psi$ is a continuous function $\psi : \mathbb{R}^d \to \mathbb{C}$ such that
$$E[e^{iuX_t}] = e^{t\psi(u)}, \quad u \in \mathbb{R}^d. \tag{1.2}$$

By this proposition, we can build a Lévy process from any infinitely divisible distribution through its characteristic function. Therefore, the law of $X_t$ is determined by the law of $X_1$; both are infinitely divisible.

• Properties of Lévy Processes

Brownian motion is a well-known Lévy process with Gaussian increments. Another simple and common Lévy process is the compound Poisson process $\{X_t : t \ge 0\}$, which is defined as $X_t = \sum_{i=1}^{N_t} Y_i$, where $\{N_t : t \ge 0\}$ is a Poisson process and the $Y_i$ are i.i.d. random variables independent of $N_t$. Other Lévy processes can be built from these simple bricks.

Proposition 5 Lévy-Itô Decomposition

Let $\{X_t : t \ge 0\}$ be a Lévy process on $\mathbb{R}^d$. It can be decomposed into three parts:
$$X_t = X_t^B + X_t^C + \lim_{\varepsilon \downarrow 0} \tilde{X}_t^{\varepsilon}, \tag{1.3}$$
where
$$X_t^B = rt + AW_t^{(d)},$$
$$X_t^C = \int_{|x| \ge 1,\; s \in [0,t]} x\,N(ds, dx),$$
$$\tilde{X}_t^{\varepsilon} = \int_{\varepsilon \le |x| < 1,\; s \in [0,t]} x\,\tilde{N}(ds, dx).$$

$X_t^B$ is a $d$-dimensional continuous Gaussian process with drift $r$ and covariance matrix $A$, and $W_t^{(d)}$ is a $d$-dimensional Brownian motion; $X_t^C$ is a compound Poisson process with jump size $|x| \ge 1$, and $N(ds, dx)$ is a Poisson random measure on $\mathbb{R}^+ \times (\mathbb{R}^d \setminus \{0\})$; $\tilde{X}_t^{\varepsilon}$ is a compensated compound Poisson process with $\tilde{N}(ds, dx) = N(ds, dx) - \nu(dx)\,ds$, where $\nu$ is a jump intensity (also called the Lévy measure) on $\mathbb{R}^d \setminus \{0\}$ given by $\nu(dx) = E[N(1, dx)]$. The measure $\nu$ also satisfies $\int_{\mathbb{R}^d \setminus \{0\}} (1 \wedge |x|^2)\,\nu(dx) < \infty$. $(r, A, \nu)$ is called the Lévy characteristic triplet or Lévy triplet.

The Lévy-Itô decomposition states that the sample path of any Lévy process consists of three parts: a diffusion process with drift, $X_t^B$; a "large jump" process with jump size greater than one, $X_t^C$; and a "small jump" process with jump size less than one, $\tilde{X}_t^{\varepsilon}$.
As there can be infinitely many small jumps around zero and their sum may not converge, we have to compensate the compound Poisson process of small jumps to make it a martingale so that it does not explode. That is how we obtain the third part $\tilde{X}_t^{\varepsilon}$ in the decomposition (Proposition 2.16 in [19]).

With the Lévy-Itô decomposition formula, it is easy to derive the characteristic function of Lévy processes, which is given in the next theorem:

Theorem 6 Lévy-Khinchin Representation

Let $\{X_t : t \ge 0\}$ be a Lévy process. Its characteristic function satisfies
$$E[e^{iuX_t}] = e^{t\psi(u)}, \quad u \in \mathbb{R}^d, \tag{1.4}$$
where
$$\psi(u) = iru - \tfrac{1}{2}uAu + \int_{\mathbb{R}^d \setminus \{0\}} \left(e^{iux} - 1 - iux\,1_{|x| \le 1}\right)\nu(dx). \tag{1.5}$$

The Lévy-Khinchin representation explicitly links Lévy processes to infinitely divisible distributions. Given an infinitely divisible distribution $F$ with characteristic exponent (1.5), a Lévy process $X_t$ can be generated whose law at $t = 1$ is $F$. Thus, we can study any Lévy process through its corresponding infinitely divisible distribution.

• Distributional Property of Lévy Process: Tail Behavior

The Lévy-Khinchin representation enables us to study the tail behavior of the distribution of a Lévy process through its associated infinitely divisible distribution $F$, which is characterized by a Lévy triplet $(r, A, \nu)$. We cite the following proposition from Cont [19], Proposition 3.13.

Proposition 7 Moments and Cumulants of a Lévy process

Let $\{X_t : t \ge 0\}$ be a Lévy process on $\mathbb{R}$ with characteristic triplet $(r, A, \nu)$. The $n$-th absolute moment of $X_t$, $E[|X_t|^n]$, is finite for some $t$ or, equivalently, for every $t > 0$ if and only if $\int_{|x| \ge 1} |x|^n\,\nu(dx) < \infty$. In this case the moments of $X_t$ can be computed from its characteristic function by differentiation. In particular, the cumulants of $X_t$ are:
$$E[X_t] = t\left(r + \int_{|x| \ge 1} x\,\nu(dx)\right) \tag{1.6}$$
$$c_2(X_t) = \mathrm{Var}\,X_t = t\left(A + \int_{\mathbb{R}} x^2\,\nu(dx)\right) \tag{1.7}$$
$$c_n(X_t) = t\int_{\mathbb{R}} x^n\,\nu(dx) \quad \text{for } n \ge 3 \tag{1.8}$$

Skewness and excess kurtosis of the increment $X_t$ (equivalently, by stationarity, of $X_{s+t} - X_s$) can be derived from this proposition.
$$\mathrm{skewness}(X_t) = \frac{c_3(X_t)}{c_2(X_t)^{3/2}} = \frac{\mathrm{skewness}(X_1)}{\sqrt{t}} \tag{1.9}$$
$$\mathrm{excess\ kurtosis}(X_t) = \frac{c_4(X_t)}{c_2(X_t)^{2}} = \frac{\mathrm{excess\ kurtosis}(X_1)}{t} \tag{1.10}$$

We can conclude that skewness decreases at the rate of $t^{-1/2}$ and excess kurtosis decreases at the rate of $t^{-1}$. This proposition also implies that the distributions of the increments of Lévy processes with a nonzero Lévy measure are leptokurtic (excess kurtosis is positive), since $c_4(X_t) > 0$.

1.2.2 Scaling Property and Self-similarity

Traditionally, scaling phenomena are observed and studied in the physical sciences. In the 1990s, the availability of high-frequency data and computer technology made it possible to investigate scaling behavior in economic systems. Empirical studies show that asset prices exhibit similar statistical properties at different time scales, which has brought interest in applying the scaling property in economics.

In mathematics, scaling behavior is associated with stochastic processes exhibiting the self-similarity property, which is defined below:

Definition 8 Self-Similarity and Self-Similar Processes

A stochastic process $\{X_t : t \ge 0\}$ is said to be self-similar if
$$X_{\lambda t} \stackrel{law}{=} \lambda^{\gamma} X_t, \tag{1.11}$$
where $\lambda > 0$ is a scaling factor and $\gamma$ is called the self-similarity exponent. Such a process is also called a self-similar process or a $\gamma$-self-similar process.

A well-known self-similar process is Brownian motion, with self-similarity exponent $\gamma = 1/2$. Some other Lévy processes are also self-similar, and they are named $\alpha$-stable Lévy processes. However, the self-similarity property does not appear solely in Lévy processes. It also exists in processes with dependent increments. For example, fractional Brownian motions, which have correlated increments, also show self-similarity.

If we take the unit time for $X_t$ in (1.11), we have
$$X_t \stackrel{law}{=} t^{\gamma} X_1, \quad \forall\, t > 0, \tag{1.12}$$
which indicates that the law of $X_t$ at time $t$ can be obtained from the law at the unit time, scaled by the coefficient $t^{\gamma}$.
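The Lévy term structure in (1.9) and (1.10) can be contrasted numerically with the scaling relation (1.12). The following minimal Python sketch is an illustration under stated assumptions: it uses a gamma process (a Lévy process chosen only because its cumulants are available in closed form, $c_n(X_t) = t\,a\,(n-1)!\,\theta^n$ for shape rate $a$ and scale $\theta$; it is not the model of this study), and all function and parameter names are hypothetical. Under (1.12), $X_t$ is a positive multiple of $X_1$ in law, and skewness and excess kurtosis are unchanged by multiplication by a positive constant, so the scaling term structure is flat.

```python
import math


def gamma_process_cumulants(t, shape_rate, scale):
    """First four cumulants of X_t for a gamma process (illustrative choice):
    c_n(X_t) = t * shape_rate * (n - 1)! * scale**n, i.e. linear in t."""
    return [
        t * shape_rate * math.factorial(n - 1) * scale ** n for n in range(1, 5)
    ]


def skew_and_excess_kurtosis(cumulants):
    """Skewness c3 / c2^(3/2) and excess kurtosis c4 / c2^2."""
    _, c2, c3, c4 = cumulants
    return c3 / c2 ** 1.5, c4 / c2 ** 2


def levy_term_structure(skew1, exkurt1, t):
    """Eqs. (1.9)-(1.10): decay like t^(-1/2) and t^(-1)."""
    return skew1 / math.sqrt(t), exkurt1 / t


def scaling_term_structure(skew1, exkurt1, t):
    """Eq. (1.12): X_t is t^gamma * X_1 in law, so both quantities stay constant."""
    return skew1, exkurt1


if __name__ == "__main__":
    # Arbitrary illustrative parameters for the unit-time law.
    skew1, exkurt1 = skew_and_excess_kurtosis(gamma_process_cumulants(1.0, 2.0, 0.5))
    for t in (1.0, 4.0, 25.0):
        direct = skew_and_excess_kurtosis(gamma_process_cumulants(t, 2.0, 0.5))
        print(t, direct, levy_term_structure(skew1, exkurt1, t),
              scaling_term_structure(skew1, exkurt1, t))
```

The values computed directly from the cumulants at horizon $t$ should agree with `levy_term_structure`, while `scaling_term_structure` stays at the unit-time values for every $t$.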
Now let us study how the properties of self-similar processes, including the characteristic function, distribution function, tail behavior, and moments, behave along the time horizon.

• Characteristic function

$$X_t \stackrel{law}{=} t^{\gamma} X_1 \tag{1.13}$$
$$\Longleftrightarrow\quad E[e^{iuX_t}] = E[e^{iut^{\gamma}X_1}]$$
$$\Longleftrightarrow\quad \phi_{X_t}(u) = \phi_{X_1}(t^{\gamma}u) \quad \text{or} \quad \psi_{X_t}(u) = \psi_{X_1}(t^{\gamma}u)$$

• Distribution function

cdf:
$$X_t \stackrel{law}{=} t^{\gamma} X_1 \;\Longleftrightarrow\; P(X_t \le x) = P(t^{\gamma}X_1 \le x) \;\Longleftrightarrow\; F_{X_t}(x) = F_{X_1}\!\left(\frac{x}{t^{\gamma}}\right) \tag{1.14}$$

pdf: differentiating (1.14), we get
$$f_{X_t}(x) = \frac{1}{t^{\gamma}}\,f_{X_1}\!\left(\frac{x}{t^{\gamma}}\right).$$
At the center $x = 0$,
$$f_{X_t}(0) = \frac{1}{t^{\gamma}}\,f_{X_1}(0). \tag{1.15}$$

• Center and Tail Behavior

Several authors have used (1.15) to test self-similarity on stock returns and to estimate $\gamma$ around the center of the density function. Mantegna and Stanley [54] studied the S&P 500 Index and concluded that $\gamma \approx 0.71$ and that a self-similar Lévy process (an $\alpha$-stable process) describes the dynamics of the pdf well at zero. However, this model fails in the tails. Later, power-law distributions, along with self-similar processes, were proposed by numerous authors [18] [32] [30] to model the tail behavior of stock returns. If a self-similar process has a power-law tail at $X_1$,
$$P(|X_1| > x) \sim \frac{1}{x^{\beta}},$$
then at other time scales
$$P(|X_t| > x) = P(|X_1| > x\,t^{-\gamma}) \sim \frac{t^{\gamma\beta}}{x^{\beta}}, \quad \text{for } t > 0,$$
which means the tail still follows a power law with the same exponent $\beta$, multiplied by the scaling coefficient $t^{\gamma\beta}$.

• Moments, variance, skewness, and kurtosis

Using Eq. (1.12), it is straightforward to derive the moments at $t > 0$ from those at $t = 1$:

Moment: $E[X_t] = t^{\gamma}E[X_1]$, $\quad E[X_t^n] = t^{n\gamma}E[X_1^n]$

Variance: $\mathrm{Var}[X_t] = t^{2\gamma}\,\mathrm{Var}(X_1)$

Skewness: $\mathrm{skew}(X_t) = \dfrac{E[(X_t - EX_t)^3]}{[\mathrm{Var}(X_t)]^{3/2}} = \mathrm{skew}(X_1)$

Kurtosis: $\mathrm{kurt}(X_t) = \dfrac{E[(X_t - EX_t)^4]}{[\mathrm{Var}(X_t)]^{2}} = \mathrm{kurt}(X_1)$

We can tell that the skewness and excess kurtosis of self-similar processes do not change along the time horizon.

1.3 Modeling Stock Returns with Lévy and Scaling

1.3.1 Preliminary

As discussed in the previous section, the term structures of skewness and excess kurtosis (the relationship between skewness or excess kurtosis and the time horizon) exhibit different patterns under different models.
If we assume the price is moved by independent news and is the result of the accumulation of these independent, identically distributed shocks, then the stock price is driven by a Lévy process. The skewness and excess kurtosis of the price fluctuations then decay as $1/\sqrt{t}$ and $1/t$, respectively. If the stock markets, as complex dynamic systems, exhibit scaling behavior as numerous authors have indicated, then the skewness and excess kurtosis of the price fluctuations remain constant at all time scales. These two postulations have been investigated by numerous authors. The empirical studies show that the term structures of skewness and excess kurtosis lie between these two approaches, that is, skewness and excess kurtosis decay more slowly than under Lévy but faster than under scaling.

Thus, it is natural to propose a model that combines these two ideas: at a chosen time scale, called the unit time, the random variable of the log-price increment (or price fluctuation, or log return) is split into two components. One evolves as the accumulation of i.i.d. random variables, which is the Lévy component, and the other behaves as a scaling random variable along the time horizon. Again, we point out that this construction concerns only the distributions of the stock returns at various time horizons, which do not necessarily have to be associated with any stochastic process.

1.3.2 Self-decomposable Laws

The first step in modeling is to split the random variable at the unit time, which is related to a family of limit laws and its associated property, self-decomposability.

It is known that stock prices are moved by many pieces of information or noise. If these pieces of information are considered as a sequence of independent random variables (not necessarily identically distributed) $\{Z_i : i = 1, 2, \ldots\}$, then the price fluctuation $P_t$ is the consequence of the impacts of all the $Z_i$. Let $S_n = \sum_{i=1}^{n} Z_i$ and consider the normalized sum $a_n S_n + b_n$.
Lévy [42] and Khinchin [40] studied the asymptotic behavior of $a_n S_n + b_n$ and defined a family of laws called "class L." A law belongs to class L if there exist sequences of constants $a_n$, the scaling coefficients, and $b_n$, the centering constants, such that the distribution of $a_n S_n + b_n$ converges to the law of a random variable $X$. The class L laws are therefore limit laws. The Central Limit Theorem, which says that the distribution of the normalized sum of a large number of i.i.d. random variables converges to a Gaussian distribution, is a special case of class L. As $P_t$, the price change within a time horizon $t$, is the outcome of many independent random variables appearing in $t$, it can be approximated by a random variable $X$ whose law is of class L.

Sato [72] studied another class of random variables, those with the self-decomposability property, which is defined below.

Definition 9 Self-decomposable Laws
A random variable $X$ is self-decomposable if for all $c \in (0, 1)$
\[ X \overset{law}{=} cX + X^{(c)}, \tag{1.16} \]
where $X^{(c)}$ is a random variable independent of $X$.

This means a self-decomposable random variable $X$ can be decomposed into a fraction of itself and another independent random variable. Furthermore, Sato [72] shows that the random variable $X$ is self-decomposable if and only if it has a class L distribution. Thus, we can study the properties of the price fluctuation $P_t$ through the self-decomposable laws, which are easier to handle than class L.

Self-decomposable distributions belong to the family of infinitely divisible laws [41]. Their characteristic functions are given by the following proposition [72]:

Proposition 10 Characteristic Function of Self-decomposable Laws
The characteristic function of a self-decomposable random variable $X$ is
\[ E[e^{iuX}] = \exp\left( i r u - \frac{1}{2}\sigma^2 u^2 + \int_{\mathbb{R}} \left( e^{iux} - 1 - iux\,\mathbf{1}_{|x| \le 1} \right) \frac{g(x)}{|x|}\, dx \right), \tag{1.17} \]
where $r$ and $\sigma$ are constants, $\sigma^2 \ge 0$, $\int_{\mathbb{R}} (|x|^2 \wedge 1)\, \frac{g(x)}{|x|}\, dx < \infty$, and $g(x)$ is an increasing function for $x < 0$ and a decreasing function for $x > 0$.
The Lévy measure of the self-decomposable laws has the form $\frac{g(x)}{|x|}\,dx$, with the constraints on $g(x)$ indicated above. Such a function $g(x)$ is called the self-decomposability characteristic (SDC) of the random variable $X$ [12].

1.3.3 Mixed Model

Let the log-price change (or the log return) $X = \ln S_t - \ln S_0$ be a self-decomposable random variable within some chosen unit time ($t = 1$), e.g., one hour or one day. By Eq. (1.16), we have
\[ X \overset{law}{=} cX + X^{(c)}. \]
The log return $Y_t$ at other time scales $t$ is developed from the two components at the unit time, that is, $Y_t$ is decomposed into $cX(t)$, which runs as a Lévy process started from $cX$, and $t^{H} X^{(c)}$, which is scaled from $X^{(c)}$:
\[ Y_t = cX(t) + t^{H} X^{(c)}. \tag{1.18} \]
The characteristic function of $Y_t$ can be derived as follows:
\[ E[e^{iuY_t}] = E[e^{iu(cX(t) + t^{H} X^{(c)})}] = E[e^{iucX(t)}]\, E[e^{iu t^{H} X^{(c)}}]. \]
From
\[ E[e^{iuX}] = E[e^{iu(cX + X^{(c)})}] = E[e^{iucX}]\, E[e^{iuX^{(c)}}], \]
we get
\[ E[e^{iuX^{(c)}}] = \frac{E[e^{iuX}]}{E[e^{iucX}]} = \frac{\exp(\psi(u))}{\exp(\psi(cu))} = \exp(\psi(u) - \psi(cu)), \]
where $\psi(\cdot)$ is the characteristic exponent. So
\[ E[e^{iuY_t}] = \exp\{ t\,\psi(cu) + \psi(u t^{H}) - \psi(c u t^{H}) \}. \tag{1.19} \]
The following proposition [22] provides the term structure of variance, skewness, and excess kurtosis in this model:

Proposition 11 Variance, Skewness, and Kurtosis of the Mixed Model
Let $\mathrm{Var}(X)$, $\mathrm{Skew}(X)$, $\mathrm{Kurt}(X)$ be the variance, skewness, and excess kurtosis at the unit time $t = 1$ of a self-decomposable random variable $X$ defined by (1.16). Then the variance, skewness, and excess kurtosis of $Y_t$ defined by Eq. (1.18) are:
\[ \mathrm{Var}(Y_t) = \mathrm{Var}(X)\left( c^2 t + (1 - c^2)\, t^{2H} \right) \tag{1.20} \]
\[ \mathrm{Skew}(Y_t) = \frac{\mathrm{Skew}(X)}{\sqrt{t}} \left[ \frac{c^3 + (1 - c^3)\, t^{3H - 1}}{\left( c^2 + (1 - c^2)\, t^{2H - 1} \right)^{3/2}} \right] \tag{1.21} \]
\[ \mathrm{Kurt}(Y_t) = \frac{\mathrm{Kurt}(X)}{t} \left[ \frac{c^4 + (1 - c^4)\, t^{4H - 1}}{\left( c^2 + (1 - c^2)\, t^{2H - 1} \right)^{2}} \right] \tag{1.22} \]

Remarks:
(1) For $0 < c < 1$, the bracketed factors in (1.21) and (1.22) modify the pure Lévy rates $1/\sqrt{t}$ and $1/t$; thus, skewness decays at a rate between $1/\sqrt{t}$ and 0, and excess kurtosis decays at a rate between $1/t$ and 0.
(2) It can be seen from this proposition that $Y_t$ follows a Lévy process when $c = 1$ and a scaling process when $c = 0$.
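As a rough numerical illustration of Proposition 11, the term structure in (1.20) to (1.22) can be evaluated directly. The sketch below is only an assumption-laden example: the function name `mixed_term_structure` and the argument name `H` for the scaling exponent are hypothetical labels, and the unit-time moments are arbitrary numbers rather than estimates from this study.

```python
import math


def mixed_term_structure(var1, skew1, exkurt1, c, H, t):
    """Evaluate (1.20)-(1.22): moments of Y_t = c X(t) + t^H X^(c).

    var1, skew1, exkurt1 are the unit-time (t = 1) variance, skewness and
    excess kurtosis of X; c is the Levy fraction, H the scaling exponent.
    (All names are illustrative, not taken from the original code.)
    """
    var_t = var1 * (c**2 * t + (1.0 - c**2) * t ** (2.0 * H))
    base = c**2 + (1.0 - c**2) * t ** (2.0 * H - 1.0)
    skew_t = (skew1 / math.sqrt(t)) * (c**3 + (1.0 - c**3) * t ** (3.0 * H - 1.0)) / base**1.5
    kurt_t = (exkurt1 / t) * (c**4 + (1.0 - c**4) * t ** (4.0 * H - 1.0)) / base**2
    return var_t, skew_t, kurt_t


if __name__ == "__main__":
    # Illustrative unit-time moments (assumed values, not data from the text).
    for c in (1.0, 0.4, 0.0):
        print(c, mixed_term_structure(1e-4, -0.5, 3.0, c, 0.4, 10.0))
```

With `c = 1` the function reproduces the Lévy rates of (1.9) and (1.10), and with `c = 0` the skewness and excess kurtosis stay at their unit-time values, which matches Remark (2).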
1.4 Related Methods and Techniques

The experimental procedure requires several methods and techniques, including the law at the unit time, maximum likelihood estimation, the fast Fourier transform (FFT), and simulation. They are used in the statistical parameter estimation and analysis both in this chapter and in the other two chapters. This section provides a brief review of these methods.

1.4.1 Variance Gamma Process and the Associated Law

• Variance Gamma Process as a Time-changed Brownian Motion

Brownian motion captures the essential features of the stock markets but also misses some important empirical facts. The dynamics of the markets are not homogeneous through time: sometimes the markets are very active, while at other times they are relatively slow. Time, instead of being considered a steadily increasing process, can be viewed as a randomly changing time that is "economically relevant." Thus, we have a generalized version of Brownian motion with random time, which provides more flexibility to describe the log stock prices. If this random time follows a gamma process, the time-changed Brownian motion is called the Variance Gamma process [49] [48] (see footnote 1).

First, we define the gamma process.

Definition 12 Gamma Process
A gamma process $\gamma(t; \mu, \nu)$ is a Lévy process with independent gamma increments, where $\mu$ is the mean rate and $\nu$ is the variance rate. The increment $g_h = \gamma(t + h; \mu, \nu) - \gamma(t; \mu, \nu)$ is a gamma random variable with probability density function (pdf)
\[ f_h(g) = \frac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\, g^{\alpha - 1} e^{-g/\beta}, \tag{1.23} \]
where $\alpha = \mu^2 h / \nu$, $\beta = \nu / \mu$, and $g \ge 0$.

The gamma process is a pure jump Lévy process, i.e., it has no diffusion part $W(t)$. The mean of $g_h$ is $\mu h$ and the variance is $\nu h$. The characteristic function of the gamma process $\gamma(t)$ is
\[ \phi_{\gamma(t)}(u) = \left( \frac{1}{1 - iu\,\nu/\mu} \right)^{\mu^2 t / \nu}. \tag{1.24} \]
If the calendar time $t$ in Brownian motion is replaced with a random time $\gamma(t)$, the expectation of $\gamma(t)$ should equal $t$, that is, $E[\gamma(t)] = t$.
Thus, the gamma process must increase with a unit mean rate, which means $\mu = 1$, and $\gamma(t; 1, \nu)$ is used to model time in this time-changed Brownian motion.

Footnote 1: Most of the results in this section, unless otherwise indicated, come from Ref. [48].

Given a gamma process with a unit mean rate, we can build a Variance Gamma process.

Definition 13 Variance Gamma Process
Let $b(t; \theta, \sigma)$ be a Brownian motion with drift $\theta$ and standard deviation $\sigma$, $b(t; \theta, \sigma) = \theta t + \sigma W(t)$. Let $\gamma(t; 1, \nu)$ be an independent gamma process with unit mean rate. Then the Variance Gamma (VG) process is defined as
\[ X(t; \sigma, \nu, \theta) = b(\gamma(t; 1, \nu); \theta, \sigma) = \theta\,\gamma(t; 1, \nu) + \sigma W(\gamma(t; 1, \nu)), \tag{1.25} \]
a time-changed Brownian motion with gamma random time.

• Properties of the VG Process

The random variable $X(t) = \theta\gamma(t) + \sigma W(\gamma(t))$ of the VG process over the time interval $t$ contains two independent random parts: a Gaussian random variable $W(t)$ and a gamma random variable $\gamma(t)$. This independence lets us conveniently use the conditional expectation method to derive its pdf and characteristic function.

The pdf of $X(t)$ can be obtained from a normal density function conditioned on a gamma random variable. Integrating out the gamma part gives the density function of $X(t)$:
\[ f_{X(t)}(x) = \int_0^{\infty} \frac{1}{\sigma\sqrt{2\pi g}} \exp\left( -\frac{(x - \theta g)^2}{2\sigma^2 g} \right) \frac{g^{t/\nu - 1} \exp(-g/\nu)}{\nu^{t/\nu}\, \Gamma(t/\nu)}\, dg. \tag{1.26} \]
Similarly, the characteristic function of $X(t)$ can be obtained by conditional expectation on the gamma random variable $\gamma(t)$:
\[ \phi_{X(t)}(u) = \left( \frac{1}{1 - i\theta\nu u + (\sigma^2 \nu / 2)\, u^2} \right)^{t/\nu}. \tag{1.27} \]
Because the characteristic function has a much simpler expression than the density function, it is used most of the time.

Another representation of the VG process interprets it as the difference of two independent gamma processes,
\[ X(t) = \gamma_p(t) - \gamma_n(t), \tag{1.28} \]
where $\gamma_p(t)$ is a gamma process representing positive jumps and $\gamma_n(t)$ is an independent gamma process representing negative jumps. The VG process is the net effect of these two processes, which implies that the VG process is also a pure jump Lévy process.
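The characteristic function (1.27) is simple enough to evaluate in a few lines. The following is a minimal sketch, not the code used in this study: the function names are hypothetical, the parameter values are arbitrary, and the moments are recovered by central finite differences of the characteristic function at zero purely as a sanity check.

```python
import cmath


def vg_cf(u, t, sigma, nu, theta):
    """VG characteristic function (1.27) for real u.

    The real part of the base is at least 1 for real u, so the principal
    branch of the complex logarithm is continuous here.
    """
    base = 1.0 - 1j * theta * nu * u + 0.5 * sigma**2 * nu * u**2
    return cmath.exp(-(t / nu) * cmath.log(base))


def vg_mean_var_from_cf(t, sigma, nu, theta, h=1e-3):
    """Approximate E[X(t)] and Var[X(t)] from derivatives of the cf at 0."""
    up = vg_cf(h, t, sigma, nu, theta)
    down = vg_cf(-h, t, sigma, nu, theta)
    mean = ((up - down) / (2j * h)).real          # phi'(0) = i E[X]
    second = -((up - 2.0 + down) / h**2).real     # phi''(0) = -E[X^2]
    return mean, second - mean**2


if __name__ == "__main__":
    # Illustrative parameters (assumed, not estimates from the text).
    print(vg_mean_var_from_cf(t=2.0, sigma=0.2, nu=0.5, theta=-0.1))
```

For these inputs the recovered mean and variance should be close to $\theta t$ and $(\sigma^2 + \nu\theta^2)\,t$, and `vg_cf(u, 2, ...)` equals `vg_cf(u, 1, ...)` squared, as expected for a Lévy process.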
The Lévy measure (or Lévy density) can be determined from the characteristic function,
\[ k_X(dx) = \begin{cases} C\, \dfrac{\exp(Gx)}{|x|}\, dx, & x < 0, \\[2mm] C\, \dfrac{\exp(-Mx)}{x}\, dx, & x > 0, \end{cases} \tag{1.29} \]
where
\[ C = \frac{1}{\nu} > 0 \tag{1.30} \]
\[ G = \left( \sqrt{\frac{\theta^2 \nu^2}{4} + \frac{\sigma^2 \nu}{2}} - \frac{\theta\nu}{2} \right)^{-1} > 0 \tag{1.31} \]
\[ M = \left( \sqrt{\frac{\theta^2 \nu^2}{4} + \frac{\sigma^2 \nu}{2}} + \frac{\theta\nu}{2} \right)^{-1} > 0. \tag{1.32} \]
From (1.29) we can see that the first part ($x < 0$) is the Lévy measure of the negative jumps ($\gamma_n$) with parameters $C$, $G$, and the second part ($x > 0$) is the Lévy measure of the positive jumps ($\gamma_p$) with parameters $C$, $M$.

The central moments, skewness, and kurtosis at the unit time can also be derived. The results are listed below:
\[ \text{mean} = \theta \tag{1.33} \]
\[ \text{variance} = \sigma^2 + \nu\theta^2 \tag{1.34} \]
\[ \text{skewness} = \theta\nu\,(3\sigma^2 + 2\nu\theta^2) / (\sigma^2 + \nu\theta^2)^{3/2} \tag{1.35} \]
\[ \text{kurtosis} = 3\left( 1 + 2\nu - \nu\sigma^4 (\sigma^2 + \nu\theta^2)^{-2} \right), \tag{1.36} \]
which show that the VG process has the flexibility to control skewness and excess kurtosis, unlike Brownian motion, for which these values are fixed.

The Lévy measure of the VG process (1.29) indicates that the VG random variable has a self-decomposable distribution (Proposition 10). Thus, the VG random variable is a candidate building block at the unit time in the mixed model.

1.4.2 Stock Price Dynamics with the VG Process

• Preliminary - Stock Market Models

The stock price can be modeled as a stochastic process
\[ S_t = S_0 \exp(\mu t + X_t), \tag{1.37} \]
where $S_t$ and $S_0$ are the stock prices at times $t$ and 0, respectively, $X_t$ represents a stochastic process, and $\mu$ is the mean rate of return of the stock.

However, the stock price $S_t$ in (1.37) is not a martingale under the statistical probability measure (usually called the "physical measure"). The following proposition provides a way to make $S_t$ a martingale [Theorems 2.5.1 and 2.5.3 in [72]].

Proposition 14 Let $\{X_t : t \ge 0\}$ be a real-valued process with independent increments. If $E[e^{aX_t}] < \infty$ for some real-valued $a$, then $\left\{ \dfrac{e^{aX_t}}{E[e^{aX_t}]} \right\}_{t \ge 0}$ is a martingale.
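Let $a = 1$ and assume $E[e^{X_t}] < \infty$. Eq. (1.37) can then be rewritten as
\[ S_t = S_0\, \frac{\exp(\mu t + X_t)}{E[e^{X_t}]}. \tag{1.38} \]
Taking the expectation of $S_t$, we get $E[S_t] = S_0 e^{\mu t}\, E\!\left[ \dfrac{e^{X_t}}{E[e^{X_t}]} \right] = S_0 e^{\mu t}$, or $S_0 = E[S_t]\, e^{-\mu t}$, which means that $S_t e^{-\mu t}$, the stock price discounted by its drift term $e^{\mu t}$, is a martingale under the physical measure.

• VG Stock Market Model

Letting $X_t$ in (1.38) be the VG random variable and $\{X_t : t \ge 0\}$ the associated VG process, we obtain the VG stock market model. By the VG characteristic function (1.27),
\[ E[\exp(X_{VG}(t))] = \phi_{X_{VG}(t)}(-i) = \left( \frac{1}{1 - i\theta\nu(-i) + (\sigma^2\nu/2)(-i)^2} \right)^{t/\nu} = \exp\left( -\frac{t}{\nu} \ln\left( 1 - \theta\nu - \frac{\sigma^2\nu}{2} \right) \right). \]
Denoting $\omega = \dfrac{1}{\nu} \ln\left( 1 - \theta\nu - \dfrac{\sigma^2\nu}{2} \right)$, Eq. (1.38) becomes
\[ S_t = S_0 \exp\{ \mu t + X_{VG}(t) + \omega t \}, \tag{1.39} \]
which is the stochastic process for the stock price. The log-price return $\ln\left( \dfrac{S_t}{S_0} \right)$ follows $\mu t + X_{VG}(t) + \omega t$.

Using the density function (1.26) of the VG random variable $X_{VG}(t)$ and integrating out the gamma random variable, we can derive the density function (pdf) of the log-price return $\ln\left( \dfrac{S_t}{S_0} \right)$.

Proposition 15 Density Function (pdf) of VG Log-price Returns
Let $r(t) = \ln\left( \dfrac{S(t)}{S(0)} \right)$ be the log-price return, and let $S_t$ follow Eq. (1.39). Under the physical measure, the pdf of $r(t)$ is given by
\[ f(r(t)) = \frac{2 \exp\left( \theta z / \sigma^2 \right)}{\nu^{t/\nu} \sqrt{2\pi}\, \sigma\, \Gamma(t/\nu)} \left( \frac{z^2}{2\sigma^2/\nu + \theta^2} \right)^{\frac{t}{2\nu} - \frac{1}{4}} K_{\frac{t}{\nu} - \frac{1}{2}}\!\left( \frac{1}{\sigma^2} \sqrt{ z^2 \left( \frac{2\sigma^2}{\nu} + \theta^2 \right) } \right), \tag{1.40} \]
where $K$ is the modified Bessel function of the second kind, and $z = r(t) - \mu t - \dfrac{t}{\nu} \ln\left( 1 - \theta\nu - \dfrac{\sigma^2\nu}{2} \right)$.

As $\ln\left( \dfrac{S_t}{S_0} \right) = \mu t + X_{VG}(t) + \omega t$, the characteristic function of the log-price return $r(t)$ is easier to obtain:
\[ E[e^{iur(t)}] = E\left[ e^{iu(\mu t + X_{VG}(t) + \omega t)} \right] = e^{iu(\mu + \omega)t}\, \phi_{VG}(u) = \exp\{ iu(\mu + \omega)t \} \left( \frac{1}{1 - iu\theta\nu + \frac{\sigma^2\nu}{2} u^2} \right)^{t/\nu}. \tag{1.41} \]
The characteristic function has a much simpler form than the density function. As these two functions are in one-to-one correspondence through the Fourier transform, the characteristic function is used most of the time.

1.4.3 Maximum Likelihood Estimation (MLE)

A probability distribution can model the log stock returns over a fixed interval, e.g., daily returns, hourly returns, etc.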
Given a set of sample points, maximum likelihood estimation (MLE) can be employed to estimate the model parameters by fitting the data to the statistical model.

Let $x_1, x_2, \ldots, x_n$ be $n$ i.i.d. sample data points collected from a population. The pdf of a proposed distribution is $f(x; \vec{\theta})$, where $\vec{\theta} = \{\theta_1, \theta_2, \ldots, \theta_k\}$ is the parameter set. The likelihood function is defined as
\[ L(\vec{\theta} \mid x) = \prod_{i=1}^{n} f(x_i; \vec{\theta}). \tag{1.42} \]
The method of maximum likelihood estimates $\vec{\theta}$ by finding the values $\hat{\theta}$ of the parameter set that maximize $L(\vec{\theta} \mid x)$.

Eq. (1.42) can be rewritten in logarithmic form,
\[ \log L(\vec{\theta} \mid x) = \sum_{i=1}^{n} \log f(x_i; \vec{\theta}), \]
which is called the log-likelihood function. Maximization is then conducted on this function instead. Due to the complexity of the pdf formulas, only a numerical procedure is feasible for MLE in most cases.

When the closed form of the pdf is not known or is too complicated to use in MLE, the characteristic function is employed. In this situation, the conversion from the characteristic function to the pdf is performed by the Fourier transform. A useful technique for the Fourier transform is discussed in the next section.

1.4.4 Fast Fourier Transform (FFT)

The numerical MLE method is realized through an optimization procedure, which requires the calculation of the likelihood function at each iteration. If the pdf is not known or not computationally feasible, then the values of the pdf are obtained from the characteristic function through the Fourier transform, $f(x; \vec{\theta}) = \dfrac{1}{2\pi} \int_{-\infty}^{\infty} e^{-iux} \phi_X(u)\, du$. The fact that $f(x; \vec{\theta})$ is a real-valued function implies that $\phi_X(u)$ has an even real part and an odd imaginary part, which reduces the transform formula to $f(x; \vec{\theta}) = \dfrac{1}{\pi} \int_0^{\infty} \mathrm{Re}\left[ e^{-iux} \phi_X(u) \right] du$. The latter can be numerically calculated by
\[ f(x_k; \vec{\theta}) \approx \frac{1}{\pi} \sum_{j=1}^{N} e^{-iu_j x_k}\, \phi_X(u_j)\, \eta, \tag{1.43} \]
where $\eta = \Delta u$, $k = 0, 1, \ldots, N - 1$, and the real part of the sum is taken.

Eq. (1.43) is a discrete Fourier transform (DFT) and requires $O(N^2)$ operations: $N$ computations are needed for each $x_k$, and there are $N$ values of $x_k$. The fast Fourier transform (FFT) is an efficient algorithm to compute the DFT.
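To make the inversion sum (1.43) concrete, here is a minimal sketch that evaluates it directly at a single point, without the FFT. It is an illustration under stated assumptions: the function name `pdf_from_cf`, the default grid (`eta`, `n`), and the half weight on the first term (a trapezoidal refinement not present in (1.43)) are all choices made for this example, and the standard normal characteristic function is used only because its density is known in closed form.

```python
import cmath
import math


def pdf_from_cf(cf, x, eta=0.05, n=400):
    """Approximate f(x) = (1/pi) * int_0^inf Re[e^{-iux} cf(u)] du.

    Uses the grid u_j = eta * (j - 1), j = 1..n, as in (1.43). The first
    term gets weight 1/2 (trapezoidal rule), which is an assumption of
    this sketch rather than part of (1.43).
    """
    total = 0.0
    for j in range(1, n + 1):
        u = eta * (j - 1)
        weight = 0.5 if j == 1 else 1.0
        total += weight * (cmath.exp(-1j * u * x) * cf(u)).real
    return total * eta / math.pi


def normal_cf(u):
    """Characteristic function of N(0, 1), used here as a known test case."""
    return cmath.exp(-0.5 * u * u)


if __name__ == "__main__":
    print(pdf_from_cf(normal_cf, 0.0), 1.0 / math.sqrt(2.0 * math.pi))
```

Each call costs $O(N)$ operations, so evaluating the density on $N$ grid points this way costs $O(N^2)$, which is the cost the FFT formulation is meant to avoid. Any other characteristic function of a real argument, such as (1.27) or (1.41), could be passed in place of `normal_cf`.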
The FFT produces exactly the same result as the DFT but requires only $O(N \log N)$ operations. There are several FFT algorithms. The one used in the Matlab code is based on FFTW [81], and the details of this algorithm can be found in Ref. [78].

The ready-to-use formula for the numerical computation of the pdf can be derived from Eq. (1.43), following the procedure of Carr and Madan [13]. In this formula, $N$ is usually a power of 2. Given the step size $\eta$, the upper limit of $u$ is $a = N\eta$ and the grid points are $u_j = \eta(j - 1)$. Let $\lambda$ be the grid spacing of $x$; then $x$ ranges from $-b$ to $b$, where $b = \dfrac{N\lambda}{2}$, and the grid points are $x_k = -b + \lambda(k - 1)$ for $k = 1, 2, \ldots, N$. With the above setting, Eq. (1.43) becomes
\[ f(x_k; \vec{\theta}) = \frac{1}{\pi} \sum_{j=1}^{N} e^{-i\eta(j-1)(-b + \lambda(k-1))}\, \phi_X(u_j)\, \eta = \frac{1}{\pi} \sum_{j=1}^{N} e^{-i\lambda\eta(j-1)(k-1)}\, e^{ibu_j}\, \phi_X(u_j)\, \eta. \tag{1.44} \]
The formula of the standard discrete Fourier transform is
\[ Z(k) = \sum_{j=1}^{N} e^{-\frac{2\pi i}{N}(j-1)(k-1)}\, z(j). \tag{1.45} \]
Comparing Eq. (1.44) with Eq. (1.45), we get
\[ \lambda\eta = \frac{2\pi}{N}. \]
With properly chosen values of $\eta$ and $N$, Eq. (1.44) can be used immediately in the numerical procedure. It can be further refined by incorporating Simpson's rule,
\[ f(x_k; \vec{\theta}) = \frac{1}{\pi} \sum_{j=1}^{N} e^{-\frac{2\pi i}{N}(j-1)(k-1)}\, e^{ibu_j}\, \phi_X(u_j)\, \frac{\eta}{3}\left( 3 + (-1)^{j} - \delta_{j-1} \right), \tag{1.46} \]
where $\delta_{j-1}$ is the Kronecker delta, equal to one at $j = 1$ and zero elsewhere. Comparing Eq. (1.45) and Eq. (1.46), we have
\[ z(j) = \frac{1}{\pi}\, e^{ibu_j}\, \phi_X(u_j)\, \frac{\eta}{3}\left( 3 + (-1)^{j} - \delta_{j-1} \right), \tag{1.47} \]
which is the input to the FFT calculation.

1.4.5 VG Random Number Simulation

Simulation of VG random numbers is needed for the goodness-of-fit analysis of the VG model in our work. A VG random number can be simulated directly from the definition of the VG process. Recall that the VG process $X(t)$ is a Brownian motion with drift whose time follows a gamma process that is independent of the Brownian motion (Definition 13),
\[ X(t) = \theta\gamma(t) + \sigma W(\gamma(t)). \tag{1.48} \]
To generate the random number $X(t)$ at time $t$, we can simulate two independent random variables separately: a standard normal random variable $W(1)$ and a gamma random variable $\gamma(t)$.
Conditional on $\gamma(t)$, $W(\gamma(t))$ in (1.48) is a Gaussian random variable, so it can be written as $W(\gamma(t)) = \sqrt{\gamma(t)}\, W(1)$ in law. As $W(1)$ and $\gamma(t)$ are independent, the product of $\sqrt{\gamma(t)}$ and the simulated normal number gives us $W(\gamma(t))$. Thus, the random number $X(t)$ in (1.48) is obtained. The above procedure is summarized below:

Algorithm 16 VG Random Number Simulation
Step 1. Generate a standard normal random number $z \sim N(0, 1)$.
Step 2. Generate a gamma random number $g \sim \mathrm{Gamma}(t/\nu, \nu)$ (shape $t/\nu$, scale $\nu$).
Step 3. The VG random number $X(t)$ at time $t$ is given by $X(t) = \theta g + \sigma\sqrt{g}\, z$.

1.5 Numerical Implementation and Results

1.5.1 Data, Sketch of the Procedure, and Brief Introduction of the Statistical Analysis

The 495 largest stocks are chosen from the S&P 500 and S&P MidCap 400 [83]. The data are log-price returns from January 2, 2003 to December 29, 2006, and include nonoverlapping one-hour, two-hour, three-hour, daily, weekly, and biweekly returns.

The procedure is sketched below:

• Step 1. Fit the log return data at the unit time, which is chosen to be one hour, for each stock using the Variance Gamma distribution.

• Step 2. Fit the log return data at longer horizons for each stock using three different models: the VG iid model (in which the random variable at a longer horizon is the sum of i.i.d. random variables from the unit time), the VG scaling model (with the VG law at the unit time), and the VG mixed model. The longer horizons are two hours, three hours, one day, one week, and two weeks.

• Step 3. Assess the statistical goodness of fit of each model. The statistical analyses include the Kolmogorov-Smirnov test, the Kolmogorov distance, the modified Kolmogorov distance, the $\chi^2$-distance, and the $L_1$ and $L_2$ distances, which are briefly described below.

• Kolmogorov-Smirnov test (KS-test)
The KS-test is a statistical hypothesis test to determine whether two data sets differ significantly.
The null hypothesis is that the two data sets are from the same continuous distribution, and the alternative hypothesis is that they come from different distributions. The significance level is usually taken as 5%. The advantage of the KS-test is that it makes no assumptions about the distributions of the data. Its disadvantage is that it is less sensitive than other tests.

• Statistical distances measuring how close two probability distributions are
We have two probability distributions: one is the empirical distribution obtained from the data; the other is the fitted probability distribution. If the two distributions are similar, then their cdf curves are graphically close. The distances between these two curves can be used to measure how good the fit is. Let $F_1$ and $F_2$ be the cdfs of the two distributions. Several distances are defined as follows.

• Kolmogorov distance
\[ \mathrm{dist}_K(F_1, F_2) = \sup_{x \in \mathbb{R}} |F_1(x) - F_2(x)| \]
measures the largest distance between the two cdfs.

• Modified Kolmogorov distance
\[ \mathrm{dist}_{\tilde{K}}(F_1, F_2) = \sup_{|x| > \varepsilon} |F_1(x) - F_2(x)| \]
The empirical cdf has a large jump at $x = 0$, which is called the 0-return effect. To eliminate this effect, the distance is not measured in a small neighborhood of 0.

• $\chi^2$-distance
Denote by $n_1$, $n_2$ the $m$-dimensional frequency vectors from samples of the two distributions. The $\chi^2$-distance is calculated by
\[ \mathrm{dist}_{\chi^2}(F_1, F_2) = \sum_{i=1}^{m} \left( \frac{n_{1i}}{\|n_1\|} - \frac{n_{2i}}{\|n_2\|} \right)^{2} \Big/ (n_{1i} + n_{2i}), \]
where $n_{1i}$, $n_{2i}$ are the elements of the vectors $n_1$, $n_2$.

• $L_p$-distance
\[ \mathrm{dist}_{L_p}(F_1, F_2) = \left( \int_{\mathbb{R}} |F_1(x) - F_2(x)|^{p}\, dx \right)^{1/p}, \quad p = 1, 2, \ldots \]

1.5.2 Statistical Estimation at the Unit Time

We first estimate the VG parameters $(\sigma, \nu, \theta)$ for the hourly demeaned returns using MLE. The KS-test is then performed on the observed data and on data simulated from the fitted VG model. The significance level is $\alpha = 0.05$. Among the 495 stocks, the VG model provides a good fit for only about one quarter (126) of the stocks, those with p-value greater than 5%.
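The simulation-based comparison described above can be sketched in a few lines. The example below is an illustration only: it draws VG random numbers following the three steps of Algorithm 16 and computes the Kolmogorov distance between two empirical cdfs. The function names, the seed, and the parameter values are hypothetical, and no p-value is computed, so this is not a full KS-test.

```python
import math
import random


def simulate_vg(n, t, sigma, nu, theta, rng):
    """Draw n VG random numbers X(t) following Algorithm 16."""
    samples = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)              # Step 1: standard normal
        g = rng.gammavariate(t / nu, nu)     # Step 2: gamma, shape t/nu, scale nu
        samples.append(theta * g + sigma * math.sqrt(g) * z)  # Step 3
    return samples


def kolmogorov_distance(a, b):
    """sup_x |F_a(x) - F_b(x)| between the empirical cdfs of two samples."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    dist = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        dist = max(dist, abs(i / len(a) - j / len(b)))
    return dist


if __name__ == "__main__":
    rng = random.Random(12345)  # fixed seed, chosen arbitrarily
    x1 = simulate_vg(20000, 1.0, 0.2, 0.5, -0.1, rng)
    x2 = simulate_vg(20000, 1.0, 0.2, 0.5, -0.1, rng)
    print(kolmogorov_distance(x1, x2))
```

Two samples drawn from the same VG law should give a small distance, while replacing one of them with observed returns would give the statistic that the KS-test compares against its critical value.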
Further estimations at longer horizons are conducted only on these 126 stocks.

The statistics of the estimated parameters are presented in Table 1.1, including the mean, standard deviation, minimum, maximum, first quartile, third quartile, and median. To illustrate the statistical fit graphically, the fitted pdf and its empirical counterpart are shown in Figure 1-1 for WMT (Walmart).

                $\sigma$   $\nu$      $\theta$
mean            0.4473     1.32E-04    0.0397
std.            0.0936     2.27E-05    0.3282
min.            0.2788     8.73E-05   -0.5000
quantile 1/4    0.3694     1.13E-04   -0.1886
median          0.4436     1.31E-04    0.0763
quantile 3/4    0.5124     1.48E-04    0.2940
max             0.6713     2.11E-04    0.5000
Table 1.1: Statistics of the Estimated VG Parameters (at unit time = 1 hour)

Figure 1-1: VG fit to WMT at 1-hour time scale

1.5.3 Statistical Estimation at Longer Horizons

The distribution of the mixed model is based on the distribution at the unit time, which means the estimated parameters of the base distribution $(\hat{\sigma}, \hat{\nu}, \hat{\theta})$ are used at longer horizons. The remaining two parameters, $c$ and $H$, are to be estimated, where $c$ represents the proportion of the random variable behaving like a Lévy process, and $H$ represents the scaling exponent of the remaining component of the random variable. Maximum likelihood estimation is performed on the nonoverlapping demeaned log return data at each longer horizon, namely two hours, three hours, one day, one week, and two weeks. The summary statistics of the estimated $(\hat{c}, \hat{H})$ are presented in Table 1.2 ($\hat{c}$) and Table 1.3 ($\hat{H}$).

The parameters of the VG mixed model are hard to estimate at even longer horizons, such as half a year, because of the lack of data. The average values of $c$ and $H$ are around 0.4 in our estimation. Thus, $c$ and $H$ may reasonably be assumed to equal 0.4 at horizons where estimation is not feasible due to lack of data.

The performance of the VG mixed model is compared to that of two other models, the VG iid model and the VG scaling model.
The distribution of the VG iid model at a longer horizon $t$ is that of the accumulated i.i.d. VG variables up to time $t$, so it is known once the distribution at the unit time is given. In the VG scaling model, the random variable $X_t$ is $t^{H'} X_1$, a scaled version of $X_1$ from the unit time $t = 1$. Thus, we need to estimate $H'$, the scaling exponent at time $t$.

Sample graphs of the fitted and empirical pdfs of these three models at various time horizons for WMT are shown in Figure 1-2 to Figure 1-6.

                2 hours   3 hours   1 day    1 week   2 weeks
mean            0.6404    0.7204    0.3390   0.4274   0.4713
std.            0.1712    0.1122    0.1116   0.1249   0.1421
min.            0.0064    0.3985    0.0002   0.0504   0.0002
quantile 1/4    0.5882    0.6518    0.2716   0.3436   0.3959
median          0.6591    0.7420    0.3334   0.4310   0.5025
quantile 3/4    0.7665    0.7955    0.4198   0.5165   0.5764
max             0.9144    0.9792    0.5759   0.7142   0.6927
Table 1.2: Statistics of the Estimated VG Mixed Parameter $c$

                2 hours   3 hours   1 day    1 week    2 weeks
mean            0.4966    0.5117    0.3277   0.3344    0.2671
std.            0.0617    0.0968    0.0536   0.111     0.1648
min.            0.2074    0.3043    0.0456   4.3e-06   1.9e-08
quantile 1/4    0.4674    0.4666    0.3074   0.3258    0.0536
median          0.5005    0.5028    0.3413   0.3634    0.3451
quantile 3/4    0.5363    0.5439    0.3621   0.3998    0.3864
max             0.6359    1.0000    0.4239   0.4681    0.4782
Table 1.3: Statistics of the Estimated VG Mixed Parameter $H$

Figure 1-2: Statistical fit to WMT at 2hr timescale
Figure 1-3: Statistical fit to WMT at 3hr timescale
Figure 1-4: Statistical fit to WMT at 1d timescale
Figure 1-5: Statistical fit to WMT at 1w timescale
Figure 1-6: Statistical fit to WMT at 2w timescale

1.5.4 Statistical Analysis for Model Performance Comparison

Several statistical analyses, described in Section 1.5.1, are conducted to compare the performance of the three models, namely VG mixed, VG scaling, and VG iid. The statistics of the five distances are presented in Table 1.4 (mean) and Table 1.5 (std.).

The KS-test is also employed at the longer horizons to test and compare the three models.
It examines whether the observed data and the data simulated from the fitted model belong to the same distribution. The p-values are obtained and graphed in Figure 1-7 to Figure 1-11, in which the x-axis is the p-value and the y-axis is the proportion of stocks whose p-value exceeds the corresponding value on the x-axis. The graphs show that the VG scaling model performs better than the VG iid model, and that the VG mixed model outperforms both models at all horizons. At the longest horizon (2-week), the VG iid model performs better than the VG scaling model, which also confirms the empirical fact that the return distribution asymptotically approaches the Gaussian, which has i.i.d. increments along the time horizon.

Mean             dist_K   dist_K~   dist_chi2   dist_L1   dist_L2
2h-VG Mixed      0.0064   0.0047    0.0001      0.0008    4.00E-06
2h-VG Scaling    0.0122   0.0060    0.0002      0.0013    7.00E-06
2h-VG iid        0.0160   0.0137    0.0003      0.0018    9.00E-06
3h-VG Mixed      0.0096   0.0065    0.0002      0.0011    6.00E-06
3h-VG Scaling    0.0170   0.0101    0.0004      0.0019    1.40E-05
3h-VG iid        0.0204   0.0182    0.0005      0.0024    1.20E-05
1d-VG Mixed      0.0144   0.0118    0.0007      0.0024    7.00E-06
1d-VG Scaling    0.0247   0.0224    0.0011      0.0038    1.40E-05
1d-VG iid        0.1157   0.1157    0.0083      0.0265    3.04E-04
1w-VG Mixed      0.0250   0.0239    0.0027      0.0070    1.60E-05
1w-VG Scaling    0.0346   0.0342    0.0036      0.0094    1.90E-05
1w-VG iid        0.1143   0.1143    0.0164      0.0369    2.73E-04
2w-VG Mixed      0.0388   0.0381    0.0053      0.0121    2.80E-05
2w-VG Scaling    0.0503   0.0502    0.0065      0.0155    3.30E-05
2w-VG iid        0.1140   0.1140    0.0204      0.0411    2.31E-04
Table 1.4: Mean of the statistical distances of the three models at different timescales

Std.             dist_K   dist_K~   dist_chi2   dist_L1    dist_L2
2h-VG Mixed      0.0024   0.0021    0.0001      2.00E-04   2.00E-06
2h-VG Scaling    0.0052   0.0022    0.0001      4.00E-04   5.00E-06
2h-VG iid        0.0056   0.0054    0.0001      6.00E-04   5.00E-06
3h-VG Mixed      0.0037   0.0028    0.0001      4.00E-04   4.00E-06
3h-VG Scaling    0.0060   0.0041    0.0001      4.00E-04   1.00E-05
3h-VG iid        0.0064   0.0063    0.0002      7.00E-04   7.00E-06
1d-VG Mixed      0.0062   0.0046    0.0003      8.00E-04   4.00E-06
1d-VG Scaling    0.0094   0.0089    0.0004      1.00E-03   1.10E-05
1d-VG iid        0.0129   0.0129    0.0017      3.20E-03   2.29E-04
1w-VG Mixed      0.0103   0.0095    0.0012      2.30E-03   9.00E-06
1w-VG Scaling    0.0131   0.0129    0.0013      2.60E-03   1.30E-05
1w-VG iid        0.0211   0.0211    0.0041      6.40E-03   1.66E-04
2w-VG Mixed      0.0124   0.0124    0.0025      4.30E-03   1.60E-05
2w-VG Scaling    0.0175   0.0177    0.0026      4.80E-03   2.00E-05
2w-VG iid        0.0228   0.0228    0.0063      8.90E-03   1.34E-04
Table 1.5: Std. of the statistical distances of the three models at different timescales

Figure 1-7: Proportion of stocks with p-value greater than certain level (2-hour timescale)
Figure 1-8: Proportion of stocks with p-value greater than certain level (3-hour timescale)
Figure 1-9: Proportion of stocks with p-value greater than certain level (1-day timescale)
Figure 1-10: Proportion of stocks with p-value greater than certain level (1-week timescale)
Figure 1-11: Proportion of stocks with p-value greater than certain level (2-week timescale)

1.6 Conclusion

This chapter investigates the performance of three stock return models: the VG iid model, the VG scaling model, and a mixed version of the two, the VG mixed model. The first two approaches have different effects on skewness and excess kurtosis along the time horizon. The empirical study shows that skewness and excess kurtosis in Lévy models decline much faster than in the observed data as time increases, while they stay constant in the scaling models. A strategy of combining the two approaches is proposed [22]: at a short horizon called the unit time, e.g., one hour, we split the random variable of the log return into two components, one a fraction of itself and the other the remaining part. The first component follows the VG iid model along the time horizon, and the second one follows the VG scaling model.

Statistical estimation and analysis are conducted.
All the statistical analyses show that the VG mixed model outperforms the other two models at all longer horizons. Furthermore, both estimated coefficients in the VG mixed model, $c$ and $H$, have an average value of about 0.4 across horizons. The mixed model provides a practical method to construct return distributions at longer horizons, which has many applications in the financial industry.

Several questions remain for future study.

1. In this study, the VG distribution fits only one-fourth of the stocks at the unit time of one hour. Other self-decomposable distributions should be explored to obtain a better statistical fit at this unit time.

2. The value 0.4 is assumed for $c$ and $H$ when estimation is not feasible due to the lack of data. However, other values that perform better than 0.4 should be sought. One possible way is to combine data from different horizons to estimate $c$ and $H$, which may be more accurate than assuming 0.4.

3. The mixed model provides a strategy to build return distributions at longer horizons. The probability measure obtained from the time series of stock returns is called the physical measure $P$. The option surface contains the information to construct another return distribution, called the risk-neutral measure $Q$. How the ratio $P/Q$ behaves along the time horizon has long interested researchers. This question is not easy to answer because of the difficulty of obtaining the physical measure at longer horizons. The mixed model provides a possible way to study this topic.

Chapter 2

Estimating Expected Return By Numeraire-Portfolio Method

2.1 Introduction

The expected return, which predicts a risky asset's future performance, is one of the most important quantities in finance. The estimation of expected returns is crucial to many investment decisions, e.g., portfolio selection. Much research has been done to analyze and model expected returns through various risk factors. However, few studies have been done to estimate this important number.
Furthermore, there is no universally accepted estimation method.

One widely used method applies classic asset pricing models, mainly the Capital Asset Pricing Model (CAPM) and the Fama-French three-factor model, to estimate expected returns from historical data. In these models (and their variations), the expected return is affected by one or more risk factors, measured by betas. In the estimation procedure, the beta(s) are first estimated using a simple OLS regression on historical data. Then the expected return is obtained as the product of the estimated beta(s) and the associated risk premium. However, realized returns are so volatile that a huge amount of data is required to obtain relatively precise estimates. A detailed discussion can be found in [6]. An empirical study by Bartholdy and Peare [4] also indicates that neither of the two popular models provides an accurate fit: the regressions in the method can explain on average only 5% of the differences in returns.

Numerous authors, including Breeden [8], Lucas [46], Mehra and Prescott [57], and Rubinstein [71], demonstrate that expected returns are determined by future uncertainty and investors' preferences, rather than implied by realized returns. Therefore, a discounted cash flow model, which links expected returns to future cash flows (uncertainty), is proposed. The estimator is originally derived from the Dividend Discount Model by Preinreich [68], which says that an asset's current price is its future payoffs discounted by the expected return. Edwards and Bell [25] and Ohlson [62] improved the model and derived the Edwards-Bell-Ohlson equation, which, along with its modified versions, has been implemented by numerous authors, for example Claus and Thomas [15] and Philips [64]. However, this method is not robust for assets with dividends or earning growth rates.
For other estimation approaches, we refer to Welch's paper [79], which reviews the existing estimates of expected returns. That paper also reports an interesting survey of a group of academic financial economists and their forecasts of the equity premium.

In this chapter, we propose a novel estimation approach, which also extracts the expected return from future uncertainty, here represented by option prices. Besides classic risk-neutral pricing, an option can be priced by an alternative method based on the so-called numeraire portfolio [45]. The numeraire portfolio is a self-financing, positive portfolio that maximizes the expected log utility at the terminal time. It exists if, and only if, there is no arbitrage opportunity. A striking feature of this portfolio is that the price process of any asset in the same market, if denominated by this portfolio, is a martingale under the physical measure. The numeraire portfolio therefore provides a pricing method for contingent claims, which is proposed by Platen [9] [65] [66]. More explicitly, an option's price, denominated by the numeraire portfolio, is the expectation of its numeraire-denominated terminal payoff under the physical measure. The physical measure in turn contains the expected return through the stochastic stock price model. The numeraire pricing method thus links the expected return to future uncertainty, and it leads to a new method to estimate the expected return.

The numeraire portfolio is required in this method. However, its composition is not as easy to determine as its existence. Long [45] demonstrates that it is a levered position in the market portfolio. Furthermore, empirical studies [29] [69] [70] suggest that the market portfolio and the numeraire portfolio can be proxied by value-weighted or equal-weighted portfolios, such as the S&P 500 and NYSE indices.
Option calibration is conducted to estimate the parameters, which include the desired expected return. A simulation technique is employed in the calibration procedure. As the stock and the numeraire portfolio (or its proxy) are correlated, the bivariate random variable is simulated through the full-rank Gaussian copula (FGC) [39] [50] [51], which transforms the marginal samples to standard normal random variables, constructs the dependence structure from the resulting bivariate normal random variable, and then transforms the simulated standard normal numbers back to the desired bivariate random numbers. The calibration procedure requires a stock price model for long horizons (one month in this study), for which we use the VG mixed model by Eberlein and Madan [22].

The expected returns of the first 50 stocks in the S&P 500 are estimated once every month from January 1999 to October 2009. Unlike the realized return or its sample mean, nearly 95% of the estimated expected returns are positive. The statistics of these estimates are more stable than those of the realized returns. A simple linear regression model further shows that the estimated returns and the realized returns have the same mean for nearly 80% of the stocks. The results indicate that the estimated return can serve as an estimator of the expected return, and that it is superior to the estimators based on historical data.

The rest of the chapter is organized as follows. Section 2.2 introduces the numeraire portfolio and its pricing method. Section 2.3 describes the estimation procedure using the numeraire-portfolio method. The numerical implementation and results are presented in Section 2.4. Section 2.5 concludes.

2.2 Pricing with Physical Measure

2.2.1 A Simple Example

In this section, we present a simple example to illustrate how to price an asset (see footnote 1). Consider the single-period binomial model in Figure 2-1. We are interested in valuing a stock A whose current price is $S_0 = \$100$.
At time $t = 1$, its price will be either \$110 or \$95, each with 50% probability. To simplify the situation, the interest rate is assumed to be zero.

Figure 2-1: A single-period binomial model.

In this example, the probability at $t = 1$ is the real-world probability, which is called the "physical measure." Simply taking the expectation under the physical measure, $E[S_1 \mid t_0] = 110 \times \frac{1}{2} + 95 \times \frac{1}{2} = 102.5$, does not give us $S_0 = \$100$, the stock price at $t = 0$. If the price were \$102.50, nobody would buy this stock, since people could invest this amount at time 0 in the risk-free money market and get back \$102.50 at time 1 with certainty. So the actual price is lower than the expected price under the physical measure, and the extra amount of \$2.50 is the risk premium, a compensation for the uncertainty that people bear in holding this risky asset.

Footnote 1: Please note that this is a rough example, for illustration purposes only.

One approach to obtaining the price is to take the expectation under another probability measure called the "risk-neutral measure." The idea was first proposed by Cox and Ross [20] in 1976, and it is now the widely used method for pricing derivatives. Let $P^Q(S_1 = 110) = \frac{1}{3}$ and $P^Q(S_1 = 95) = \frac{2}{3}$. Taking the expectation under this measure, we get $E^Q[S_1 \mid t_0] = 110 \times \frac{1}{3} + 95 \times \frac{2}{3} = 100$, which is the actual price at time 0. This new measure is not the actual probability measure, but it is equivalent to the physical measure. Harrison and Kreps [34] name this risk-neutral measure an "equivalent martingale measure" because under this measure the asset price processes are martingales.

Can we still obtain the actual price when taking the expectation under the physical measure? In other words, under what condition is the price process a martingale under the physical measure? Let us assume a portfolio with value \$100 at time 0, and with value \$105 in the up state and \$99.75 in the down state at time 1. Now divide stock A's prices by the values of this portfolio and take the expectation under the physical measure. The expected denominated price at time 1 is $E[S^d_1 \mid t_0] = \frac{110}{105} \times \frac{1}{2} + \frac{95}{99.75} \times \frac{1}{2} = 1$, which equals the denominated price at time 0, $S^d_0 = \frac{100}{100} = 1$. Thus, the stock's denominated price is a martingale under the physical measure. Such a portfolio was found by Long [45] and named the numeraire portfolio.

Again, please keep in mind that this is a rough example to illustrate the idea of asset pricing; nothing further is implied. The structure and properties of the numeraire portfolio are discussed in the following section.

2.2.2 The Numeraire Portfolio

• The Setting

In this section, the basic definitions and assumptions are set up. The numeraire portfolio is discussed within this context.

A single-period model of an asset market is considered. We assume no transaction costs and no restrictions on short sales. $N$ tradable assets exist in the market, with price $S_{ti}$ for asset $i$ at time $t$, where $i = 1, \dots, N$ and $t = 0, 1$. To simplify the situation, the asset prices are adjusted values that reflect dividends and splits. All the assets are assumed to have strictly positive prices, i.e., $S_{ti} > 0$. It is also reasonable to assume that all prices are bounded, denoted as $P(S_{ti} < D;\ i = 1, \dots, N;\ t = 0, 1) = 1$, which means that all prices are almost surely less than a finite number $D$. Let $R_{ti}$ be the rate of return of asset $i$ from time $t-1$ to $t$. Thus, $R_{ti}$ is also bounded. We collect the prices and rates of return of the $N$ assets at time $t$ in the $N \times 1$ vectors $S_t$ and $R_t$.

Portfolios can be constructed using the $N$ assets. Denote by $\eta_{ti}$ the number of units of asset $i$ held at time $t$, and by $\eta_t$ the associated $1 \times N$ composition vector. We also assume finite portfolios, i.e., $\eta_{ti}$ is a finite number for all $i$ and $t$. The market value of the portfolio $\eta$ at time $t$ is denoted as $V^{\eta}_t$, which equals $\eta_t S_t$ at time $t$.
There are some specific requirements for the portfolios in our context: they are self-financing and always have positive value. In a self-financing portfolio, the purchase of new assets must be funded by the sale of its own assets, expressed mathematically as $\eta_{t-1} S_t = \eta_t S_t$ for all $t \ge 1$. Because of self-financing, only portfolios with positive values can survive in the market: once a portfolio's value falls below zero and there is no exogenous infusion, the portfolio is valueless. We assume that there exists at least one self-financing portfolio whose value is always positive. In that case, we have at least one portfolio that performs well enough to serve as a numeraire portfolio, which will be defined later.

Last, we define arbitrage, termed "profit opportunities" in Long's paper [45]. Roughly speaking, arbitrage is the opportunity to get something from nothing, or a "free lunch." In our case, a portfolio with an arbitrage opportunity has zero initial cost ($t = 0$), has a nonnegative terminal value with probability one ($t = 1$, or in the more general case $t = T$ where $T \ge 1$), and has a strictly positive terminal value with positive probability. Mathematically it is defined as follows:

(1) $V^{\eta}_0 = 0$; (2) $P\{V^{\eta}_1 \ge 0\} = 1$; (3) $P\{V^{\eta}_1 > 0\} > 0$.

• The Numeraire Portfolio: Definition

Within the above setting, let us find a portfolio with the maximal expected log return at the terminal time $t = 1$. The initial value of all portfolios is set to 1, i.e., $V_0 = 1$. The composition of the portfolio at time $t$ is $\eta^*_t$, a $1 \times N$ vector. In a single-period model, we select the portfolio at $t = 0$ and hold the position until $t = 1$. Therefore, $\eta^*_t$ is the same at $t = 0$ and $1$ and can be simplified as $\eta^*$. Correspondingly, $\eta^*_i$ is the number of shares of asset $i$. The portfolio value is still denoted as $V^*_t$, as it equals $\eta^* S_t$, which still depends on $t$. Under the physical measure, this maximization problem can be formulated as below:

$$\max_{\eta^*} E_0[\ln \eta^* S_1] \quad \text{or} \quad \max_{\eta^*} E_0[\ln V^*_1], \qquad \text{s.t. } \eta^* S_0 = 1.$$

Using the Lagrangian method with multiplier $\lambda$,

$$\frac{\partial \left( E_0[\ln \eta^* S_1] - \lambda(\eta^* S_0 - 1) \right)}{\partial \eta^*_i} = 0 \quad \text{for each } i,\ i = 1, \dots, N.$$

Then

$$E_0\left[\frac{S_{1i}}{V^*_1}\right] - \lambda S_{0i} = 0, \quad \text{where } V^*_1 = \eta^* S_1,\ i = 1, \dots, N. \qquad (2.1)$$

Multiplying the $i$-th equation by $\eta^*_i$ and summing over the $N$ equations gives

$$E_0\left[\frac{\eta^* S_1}{V^*_1}\right] = \lambda\, \eta^* S_0.$$

Because $\eta^* S_0 = 1$ and $V^*_1 = \eta^* S_1$, we have $\lambda = 1$. Eq (2.1) then becomes

$$E_0\left[\frac{S_{1i}}{V^*_1}\right] = S_{0i} = \frac{S_{0i}}{V^*_0}.$$

By the second-order condition, the maximum is achieved if the portfolio value $V^*_1$ is positive. The above result means that each asset, denominated by a portfolio with maximized expected log return, is a martingale under the physical measure. Furthermore, this conclusion can be extended to the multi-period discrete-time case by the following theorem (see footnote 2).

Theorem 17 In a discrete-time market with $N$ assets, $\eta^* = \{\eta^*_t : t \ge 0\}$ is a positive self-financing portfolio, where $\eta^*_t$ is a $1 \times N$ vector representing the portfolio's composition at time $t$, $t = 0, \dots, T$. If this portfolio maximizes $E_0[\ln V^*_T]$ at $T$, then for any asset $i$ in this market

$$\frac{S_{it}}{V^*_t} = E_t\left[\frac{S_{iT}}{V^*_T}\right], \quad \text{for } 0 \le t < T, \qquad (2.2)$$

where $S_{it}$ is the price of asset $i$ at $t$, and $V^*_t$ is the value of the portfolio $\eta^*$ at $t$.

Footnote 2: The idea of the proof can be found on page 54 in [45].

Long [45] defines this kind of denominating portfolio as the "numeraire portfolio".

Definition 18 Numeraire Portfolio
In a discrete-time market, a numeraire portfolio $\eta^*$ with value $V^*$ is a self-financing portfolio with maximized terminal expected log return. When each asset in the market is denominated by this portfolio, it is a martingale under the physical measure:

$$\frac{S_{it}}{V^*_t} = E_t\left[\frac{S_{iT}}{V^*_T}\right], \quad \text{for } 0 \le t < T,\ i = 1, \dots, N.$$

The numeraire portfolio is obtained by maximizing the expected log return, which is also called the expected growth (or expected growth rate in the continuous-time model). Kelly [38] proposed an investment portfolio, named the growth-optimal portfolio (GOP), by maximizing the expected growth rate of portfolios. Thus, the numeraire portfolio is the growth-optimal portfolio, and both are optimal for log-utility investors when the initial portfolio value is set to one.
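The first-order condition (2.1) can be checked numerically in a small market. The following sketch is an illustration under stated assumptions, not a construction taken from [45]: the market (one stock moving from 100 to 110 or 95 with equal physical probability, plus a riskless bond with zero interest rate), the helper names, and the golden-section search are all hypothetical choices. It maximizes the expected log of terminal wealth for a unit-cost portfolio and then evaluates the physical expectation of each asset's denominated payoff.

```python
import math

# Hypothetical one-period market (assumption for illustration only).
probs = (0.5, 0.5)                       # physical probabilities (up, down)
stock0, stock1 = 100.0, (110.0, 95.0)    # stock price at t = 0 and t = 1
bond0, bond1 = 1.0, (1.0, 1.0)           # riskless bond, zero interest rate


def terminal_values(w):
    """Terminal value of a unit-cost portfolio with fraction w in the stock."""
    return tuple(w * s / stock0 + (1.0 - w) * b / bond0
                 for s, b in zip(stock1, bond1))


def expected_log(w):
    """Objective E_0[ln V_1] under the physical measure."""
    return sum(p * math.log(v) for p, v in zip(probs, terminal_values(w)))


def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximize a unimodal function on [lo, hi] by golden-section search."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if f(c) < f(d):
            a = c          # the maximum lies in [c, b]
        else:
            b = d          # the maximum lies in [a, d]
    return 0.5 * (a + b)


# Terminal wealth stays positive for -10 < w < 20, so search strictly inside.
w_star = golden_section_max(expected_log, -9.0, 19.0)
v1_star = terminal_values(w_star)

# Physical expectation of each asset's payoff denominated by the optimizer.
stock_denominated = sum(p * s / v for p, s, v in zip(probs, stock1, v1_star))
bond_denominated = sum(p * b / v for p, b, v in zip(probs, bond1, v1_star))

print(w_star, v1_star, stock_denominated, bond_denominated)
```

In this toy market the optimizer holds a levered stock position (a stock fraction of about 5), and both denominated expectations match the time-0 prices, 100 for the stock and 1 for the bond, up to numerical tolerance.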
Let $\hat{R}_{ti}$ be the rate of return of the denominated asset $i$ from $t-1$ to $t$,

$$\hat{R}_{ti} = \frac{\hat{S}_{ti}}{\hat{S}_{t-1,i}} - 1 = \frac{S_{ti}/V^*_t}{S_{t-1,i}/V^*_{t-1}} - 1.$$

Taking the conditional expectation on both sides, we get

$$E_{t-1}[\hat{R}_{ti}] = \frac{E_{t-1}\left[S_{ti}/V^*_t\right]}{S_{t-1,i}/V^*_{t-1}} - 1.$$

The numeraire property implies that $E_{t-1}[\hat{R}_{ti}] = 0$: the best conditional forecast of the rate of return of any denominated asset is zero. This impressive feature implies a relationship with the market portfolio of the Capital Asset Pricing Model (CAPM), which provides information on the composition and proxies of the numeraire portfolio. Details are discussed below in the subsection on composition and proxies.

• The Numeraire Portfolio: Existence

The numeraire portfolio has the striking properties described in the previous section. Now comes the question of under what conditions such a portfolio exists.

Theorem 19 Existence and Uniqueness of Numeraire Portfolio
In a market, all the portfolios are assumed to have bounded values. A numeraire portfolio exists if and only if no arbitrage opportunity exists in the market. If there are two numeraire portfolios, then they have the same rates of return.

We sketch the idea of the proof to better understand the conditions for existence. The detailed proof can be found on page 53 in [45].

First, if there exists a numeraire portfolio $\eta^*$, then its definition implies that any other portfolio $\eta$ satisfies

$$\hat{V}^{\eta}_0 = E_0\left[\hat{V}^{\eta}_1\right], \qquad (2.3)$$

where $\hat{V}^{\eta}_0 = V^{\eta}_0 / V^*_0$ and $\hat{V}^{\eta}_1 = V^{\eta}_1 / V^*_1$ are the denominated values of the portfolio $\eta$. If $\eta$ contains an arbitrage opportunity, then by the arbitrage definition we have

$$\hat{V}^{\eta}_0 = 0 \quad \text{and} \quad E_0\left[\hat{V}^{\eta}_1\right] > 0,$$

which contradicts Eq (2.3). Thus, the existence of a numeraire portfolio excludes arbitrage.

Secondly, we demonstrate the "if" part. The numeraire portfolio is derived from the maximization of $E_0[\ln V^*_1]$ with the constraint $V^*_0 = 1$. In the setting above, we assumed that the prices of all assets are bounded and that there exists at least one self-financing portfolio with always positive values.
The solution of the maximization problem exists only under the no-arbitrage condition and the assumptions listed above. Thus, there exists a numeraire portfolio.

Last, we show the uniqueness of the rates of return. If we have two numeraire portfolios A and B with different rates of return $R_{At}$ and $R_{Bt}$, then they can be denominated by each other,

$$E_0\left[\frac{1 + R_{At}}{1 + R_{Bt}} - 1\right] = E_0\left[\frac{1 + R_{Bt}}{1 + R_{At}} - 1\right] = 0.$$

The above equations hold if and only if $R_{At} = R_{Bt}$. However, the uniqueness of the rate of return does not imply a unique composition of the numeraire portfolio. Vasicek [77] provides an example in which, with redundant assets, there exist numeraire portfolios with different compositions.

• The Numeraire Portfolio: Composition and Proxies

Earlier, we described an impressive property of the numeraire portfolio: zero is the best conditional forecast of the rate of return of any numeraire-denominated asset. This property connects the numeraire portfolio with the market portfolio in the CAPM, which is a portfolio consisting of all assets in the market, with weights proportional to their market values. Denote by $R_i$ and $R^*$ the rates of return of asset $i$ and the numeraire portfolio $\eta^*$, respectively. Then we have

$$1 + \hat{R}_i = \frac{S_{1i}/V^*_1}{S_{0i}/V^*_0} = \frac{S_{1i}/S_{0i}}{V^*_1/V^*_0} = \frac{1 + R_i}{1 + R^*}, \qquad (2.4)$$

which has conditional expectation 1 because $E_0[\hat{R}_i] = 0$. This implies that if asset $i$ has a high rate of return, it also has a high covariance with the numeraire portfolio's rate of return. This relationship is similar to the relationship between individual assets and the market portfolio in the CAPM. Thus, the numeraire portfolio is similar to the market portfolio. Furthermore, Long [45] indicates that the numeraire portfolio is a levered position in the mean-variance efficient portfolio. Roll [70] states that the mean-variance efficient portfolio $P$ can serve as the market portfolio in the CAPM equation,

$$E[R_i] = R_f + \beta_{iP}\left[E[R_P] - R_f\right],$$

where the original CAPM equation is

$$E[R_i] = R_f + \beta_{im}\left[E[R_m] - R_f\right],$$

where $m$ represents the market portfolio, and the mean-variance efficient portfolio is the market portfolio. Thus, the numeraire portfolio is also a levered position in the market portfolio.

Eq (2.4) shows that the numeraire-denominated rate of return of asset $i$ is

$$\hat{R}_i = \frac{1 + R_i}{1 + R^*} - 1.$$

As discussed previously, this rate of return has mean zero. This property can be used to test and compare different proxies for the numeraire portfolio. Let $\hat{R}_{iP} = \frac{1 + R_i}{1 + R_P} - 1$ be the proxy-denominated returns, where $R_P$ is the rate of return of the proxy. Roll [69] uses the market portfolio proxy, the S&P 500, for the numeraire portfolio, Fama and MacBeth [29] choose the NYSE equal-weighted portfolio, and Long [45] picks the NYSE value-weighted portfolio. Hotelling $T^2$ hypothesis tests are employed with the null hypothesis that the expected proxy-denominated return equals zero. None of the tests rejects zero expected returns, with sufficiently high p-values. Long [45] also finds that the proxy-denominated returns have means close to zero with small standard deviations. These empirical tests suggest that value-weighted or equal-weighted portfolios, such as the S&P 500 and NYSE indices, can serve as proper proxies for the numeraire portfolio.

2.2.3 Pricing Under the Physical Measure

As discussed in Section 2.2.2, any asset in a market, when denominated by a numeraire portfolio, is a martingale, and so is any portfolio that is a linear combination of the assets in this market. This numeraire-denominated portfolio process is called the fair price process by Bühlmann and Platen [9].

Definition 20 Fair Price Process
In a no-arbitrage market, let $\eta^* = \{\eta^*_t : t \ge 0\}$ be the numeraire portfolio with value process $V^* = \{V^*_t : t \ge 0\}$, and let $\eta = \{\eta_t : t \ge 0\}$ be a self-financing portfolio with price process $V^{\eta} = \{V^{\eta}_t : t \ge 0\}$. If the numeraire-denominated price of this portfolio satisfies

$$\frac{V^{\eta}_t}{V^*_t} = E_t\left[\frac{V^{\eta}_s}{V^*_s}\right], \quad 0 \le t < s < T;\ t, s, T \in \mathbb{N},$$

then $V^{\eta}$ is called a fair price process, and this market is a fair market.

In a fair market, consider a contingent claim $H = \{H_t : 0 \le t \le T\}$, which is $\mathcal{F}_t$-measurable and has nonnegative payoff $H_t$ on or before maturity $T$.
Given a numeraire portfolio in this market, a fair price for this contingent claim can be defined [9].

Definition 21 Numeraire Pricing
Given a numeraire portfolio $\eta^* = \{\eta^*_t : t \ge 0\}$ in a no-arbitrage market, the fair price at time $t$ of a contingent claim $H = \{H_t : 0 \le t \le T\}$ is defined by

$$V^H_t = V^*_t\, E^P_t\left[\frac{H_T}{V^*_T}\right], \quad 0 \le t < T, \qquad (2.5)$$

where $H_T$ is the contingent claim's terminal payoff and $P$ is the physical measure.

The numeraire-denominated price $\hat{V}^H_t = V^H_t / V^*_t$ is called the numeraire-portfolio price (or "benchmarked price" in [9]), which is a martingale under the physical measure $P$.

Numeraire pricing requires the existence of the physical measure and a numeraire portfolio. As discussed in Theorem 19, the numeraire portfolio exists if there is no arbitrage opportunity, and the market indices are proper proxies. If the physical measure is easy to obtain, then this method is feasible and easy to implement. Another advantage is that the asset's expected return is contained in the pricing formula. Thus, the numeraire-portfolio pricing method provides a possible way to estimate expected returns, which are not easy to obtain. Details are discussed in the next section.

This pricing method is also named "real world pricing" in [9] because the pricing formula (2.5) uses the physical measure (or the real-world probability measure). According to Cochrane [16], the price in Eq (2.5) is the conditional expectation of the final payoff discounted by the stochastic factor $V^*_t / V^*_T$. Alternatively, we have the widely used risk-neutral pricing method, which values a financial asset by the expected future payoff, discounted by a risk-free asset $B_t$, under the risk-neutral measure. Platen [66] claims that risk-neutral pricing is a special case of numeraire pricing.

Let $\{B_t : t \ge 0\}$ be the riskless savings account (or a risk-free bond). Starting from Eq (2.5),

$$V^H_0 = V^*_0\, E^P_0\left[\frac{H_T}{V^*_T} \cdot \frac{B_T}{B_0} \cdot \frac{B_0}{B_T}\right] = E^P_0\left[H_T \cdot \frac{B_T/V^*_T}{B_0/V^*_0} \cdot \frac{B_0}{B_T}\right] = B_0\, E^P_0\left[\frac{H_T}{B_T}\, \Lambda_{T,0}\right],$$

where $\Lambda_{T,0} = \frac{B_T/V^*_T}{B_0/V^*_0} = \frac{\hat{B}_T}{\hat{B}_0}$. Here $\hat{B}_0$ and $\hat{B}_T$ are the numeraire-denominated values of the risk-free bond. $B_0$ and $V^*_0$ equal 1 as initially set. In a no-arbitrage market, any numeraire-denominated asset is a martingale under the physical measure $P$. Therefore, $E_0[\Lambda_{T,0}] = E_0\left[B_T / V^*_T\right] = 1$. As the risk-free bond $B_t$ and the numeraire portfolio value $V^*_t$ are positive, $B_T / V^*_T$ is also a nonnegative random variable. Thus, $\Lambda_{T,0}$ is a Radon-Nikodym derivative, which transforms the physical measure $P$ to another measure $Q$ by the following formula:

$$V^H_0 = E^P_0\left[\frac{H_T}{B_T}\, \Lambda_{T,0}\right] = E^Q_0\left[\frac{H_T}{B_T}\right],$$

which is the risk-neutral pricing formula. Thus, $Q$ is the risk-neutral measure.

2.3 Estimating Expected Returns Via Numeraire-portfolio Pricing Method

2.3.1 Idea of the Expected Return Estimation

Consider a European call option on asset $i$ with terminal payoff $H_{Ti} = (S_{Ti} - K)^+$, where $S_{Ti}$ is the asset price and $K$ is the strike. Let $V^* = \{V^*_t : t \ge 0\}$ be the price of the proxy of the numeraire portfolio. Using the numeraire pricing formula (2.5), the price of this call option is

$$V^H_{0i} = V^*_0\, E^P_0\left[\frac{(S_{Ti} - K)^+}{V^*_T}\right], \qquad (2.6)$$

where the conditional expectation is under the physical measure of the stock and the proxy. The associated probability distribution is the bivariate distribution of the random variables $(S_{Ti}, V^*_T)$. Recall from Section 1.4.2 that the model of the asset price is given by Eq (1.39) (see footnote 3),

$$S_T = S_0 \exp\{\mu T + X(T) + \omega T\}, \qquad (2.7)$$

where $\mu$ is the mean rate of return, $\mu T$ is the expected return over the time interval $0$ to $T$, and $\omega$ is a "convexity correction" that makes the expected rate of return equal $\mu$ under the physical measure.

Footnote 3: Although Eq (1.39) is the asset price model using the VG distribution, it is applicable to all other proper laws for $X(t)$.

As discussed in Theorem 19, the existence of the numeraire portfolio relies on the assumption that all portfolios in the market are bounded. If the assets are modeled by Eq (2.7), we need to check that the portfolios constructed from assets with the dynamics of Eq (2.7) are bounded.
Let the value of such a portfolio be $\sum_i \eta_{ti} S_{ti}$ at time $t$, where $S_{ti}$ is asset $i$'s price and $\eta_{ti}$ is the number of its shares. By Eq (2.7) and the Cauchy-Schwarz inequality, we have

$$\ln\left(\sum_i \eta_{ti} S_{ti}\right) = \ln\left(\sum_i \eta_{ti} S_{0i} \exp\{\mu_i t + X_i(t) + \omega_i t\}\right)$$
$$\le \ln\left(\sqrt{\sum_i (\eta_{ti})^2}\; \sqrt{\sum_i \left(S_{0i} \exp(\mu_i t + \omega_i t)\right)^2}\; \sqrt{\sum_i e^{2X_i(t)}}\right)$$
$$= \frac{1}{2}\ln\left(\sum_i (\eta_{ti})^2\right) + \frac{1}{2}\ln\left(\sum_i \left(S_{0i} \exp(\mu_i t + \omega_i t)\right)^2\right) + \frac{1}{2}\ln\left(\sum_i e^{2X_i(t)}\right).$$

For the last term, Jensen's inequality gives

$$E\left[\ln\left(\sum_i e^{2X_i(t)}\right)\right] \le \ln E\left[\sum_i e^{2X_i(t)}\right] = \ln\left(\sum_i E\, e^{2X_i(t)}\right) = \ln\left(\sum_i \phi_i(-2\mathrm{i})\right) < \infty,$$

where $\phi_i$ is the characteristic function of $X_i(t)$ and $\mathrm{i}$ in its argument is the imaginary unit. The sum $\sum_i \left(S_{0i} \exp(\mu_i t + \omega_i t)\right)^2$ is finite. If $\sum_i (\eta_{ti})^2$ is bounded, which is a reasonable constraint, then all the portfolios in this model setting are bounded. Thus, the numeraire portfolio exists when the asset prices in the market follow the dynamics of Eq (2.7).

To estimate the mean rate of return $\mu$ using Eq (2.6), we need to calibrate the option prices of asset $i$, which is conducted by minimizing an average error between the market prices and the model prices $V^H_{0i}$. Several such error measures have been defined, which are summarized in Schoutens's book [74].

• Average Pricing Error (APE)
$$\mathrm{APE} = \frac{1}{\bar{P}_R} \sum_{i=1}^{N} \frac{|P_R - P_M|}{N}$$

• Average Relative Percentage Error (ARPE)
$$\mathrm{ARPE} = \frac{1}{N} \sum_{i=1}^{N} \frac{|P_R - P_M|}{P_R}$$

• Average Absolute Error (AAE)
$$\mathrm{AAE} = \sum_{i=1}^{N} \frac{|P_R - P_M|}{N}$$

• Root-mean-square Error (RMSE)
$$\mathrm{RMSE} = \sqrt{\sum_{i=1}^{N} \frac{|P_R - P_M|^2}{N}} \qquad (2.8)$$

where $P_R$ is the market price, $\bar{P}_R$ is the mean of the market prices, $P_M$ is the model price, and $N$ is the number of options. The model parameters in Eq (2.7) can be estimated by minimizing one of these errors.

The asset prices $(S_{Ti}, V^*_T)$ can be obtained either by analytical calculation or by simulation. In our work, $S_{Ti}$ and $V^*_T$ are simulated by a technique called full-rank Gaussian copula (FGC), which will be introduced in the next section.

2.3.2 Multivariate Random Number Simulation Via FGC

Simulation of multivariate random numbers requires the associated multivariate distribution function, which does not always have a closed form.
Also, dependence is complicated in non-Gaussian distributions, and covariance may not be a proper measure of it. Copulas provide a general approach to constructing dependence structures and formulating a multivariate distribution from arbitrary marginal distributions, where the dependence structure is independent of the marginal distributions. We briefly introduce the definition and some important properties of copulas below. One type of copula and its application in simulation are also described. Details on copulas can be found in [61].

Definition 22 Copula
An $n$-dimensional copula $C$ is a multivariate joint distribution function defined on $[0,1]^n$ with mapping $[0,1]^n \to [0,1]$, which has the following properties:
(1) $C$ is grounded (see footnote 4) and $n$-increasing;
(2) $C(1, \dots, 1, u_i, 1, \dots, 1) = u_i$, $u_i \in [0,1]$, $i = 1, \dots, n$;
(3) $C(u_1, \dots, u_n) = 0$ if at least one $u_i$ equals zero.

Footnote 4: A function $f(x_1, x_2, \dots, x_n)$ is called grounded if $f(a_1, \dots) = f(\dots, a_2, \dots) = \dots = f(\dots, a_n) = 0$, where $a_i$ is the least element in the domain of $x_i$.

This definition indicates that a copula $C$, as a multivariate distribution function, has uniform marginal distributions. How is a copula related to other arbitrary multivariate distribution functions? Sklar [76] provides an answer, which is the foundation of most applications of copulas.

Theorem 23 Sklar's Theorem
Let $F$ be an $n$-dimensional multivariate distribution function with continuous marginal distributions $F_1, F_2, \dots, F_n$. Then there exists a unique $n$-dimensional copula $C$ defined on $[0,1]^n$ such that

$$F(x_1, x_2, \dots, x_n) = C(F_1(x_1), F_2(x_2), \dots, F_n(x_n)).$$

Sklar's theorem states that, given a joint law $F$ and the corresponding marginal laws, there exists a copula $C$ that describes the dependence structure, and that this copula does not contain any information about the form of the marginal laws. Thus, the joint law $F$ can be constructed from the marginal laws $F_1, F_2, \dots, F_n$ and the dependence structure $C$ separately.
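A worked consequence may make the separation of marginals and dependence more concrete. It is a standard fact, stated here under the added assumption (not required by Theorem 23) that $F$, $C$, and the marginals $F_i$ have densities $f$, $c$, and $f_i$. Differentiating Sklar's representation with respect to each $x_i$ gives:

```latex
f(x_1,\dots,x_n)
  = \frac{\partial^n F(x_1,\dots,x_n)}{\partial x_1 \cdots \partial x_n}
  = c\bigl(F_1(x_1),\dots,F_n(x_n)\bigr)\,\prod_{i=1}^{n} f_i(x_i),
\qquad
c(u_1,\dots,u_n) = \frac{\partial^n C(u_1,\dots,u_n)}{\partial u_1 \cdots \partial u_n}.
```

The product of marginal densities carries all the information about the individual laws, while the copula density $c$ carries only the dependence; for independent components $c \equiv 1$.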
Instead of being expressed in terms of the variables $(x_1, x_2, \dots, x_n)$, Sklar's theorem has an equivalent form expressed in terms of the probability levels $u_i = F_i(x_i)$.

Corollary 24 An Equivalent Form of Sklar's Theorem
Let $F$ be an $n$-dimensional multivariate distribution function with continuous marginal distributions $F_1, F_2, \dots, F_n$. Then there exists a unique $n$-dimensional copula $C$ defined on $[0,1]^n$ such that

$$C(u_1, u_2, \dots, u_n) = F\left(F_1^{-1}(u_1), F_2^{-1}(u_2), \dots, F_n^{-1}(u_n)\right),$$

where $u_i \in [0,1]$ for $i = 1, 2, \dots, n$.

The copula also has a useful invariance property given by Embrechts et al. [27].

Theorem 25 Invariant Property of Copula
Let $(x_1, \dots, x_n)$ be a continuous $n$-dimensional random vector with copula $C$, and let $h_1(x_1), \dots, h_n(x_n)$ be strictly increasing continuous functions on the ranges of $x_1, \dots, x_n$. Then the $n$-dimensional random vector $(h_1(x_1), \dots, h_n(x_n))$ also has the same copula $C$.

This invariance property provides a powerful way to construct multivariate distributions. If the distribution of the random vector $\vec{X}$ is not easy to obtain, we can transform it to a new random vector whose dependence structure is easy to build. The only requirement for this procedure is that the transform function is strictly increasing.

Now let us introduce one copula, the Gaussian copula proposed by Li [43], which is widely used in financial modeling. The Gaussian copula has the same structure as the cumulative distribution function (cdf) of standard multivariate Gaussian random variables.

Definition 26 Gaussian Copula
The Gaussian copula function is

$$C_G(u_1, \dots, u_n) = \Phi_A\left(\Phi_1^{-1}(u_1), \dots, \Phi_n^{-1}(u_n)\right), \quad u_i \in [0,1] \text{ for } i = 1, \dots, n,$$

where $\Phi_A$ is the multivariate Gaussian cdf with mean zero and correlation matrix $A$, and $\Phi_i$ is the univariate standard Gaussian cdf.

Malevergne and Sornette [50] provide a method to build the joint distribution using the Gaussian copula. The idea is briefly described here; details can be found in [50].
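The invariance property of Theorem 25 can be illustrated with a dependence measure that, for continuous variables, depends only on the copula: Kendall's tau. The sketch below is an illustration under assumptions of our own (a simulated bivariate Gaussian sample with correlation 0.6, the two transforms, and the helper name `kendall_tau` are not from the text). Because strictly increasing transforms preserve the ordering of every pair of observations, the statistic is unchanged, whereas the linear correlation would generally change.

```python
import math
import random


def kendall_tau(xs, ys):
    """Kendall's tau: (concordant - discordant) pairs / all pairs (no ties)."""
    n = len(xs)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            concordant = (xs[i] < xs[j]) == (ys[i] < ys[j])
            score += 1 if concordant else -1
    return score / (n * (n - 1) / 2)


# Assumed sample: correlated standard Gaussian pairs with correlation 0.6.
rng = random.Random(0)
rho = 0.6
x1, x2 = [], []
for _ in range(300):
    z1 = rng.gauss(0.0, 1.0)
    z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    x1.append(z1)
    x2.append(z2)

# Strictly increasing transforms h1(x) = exp(x) and h2(x) = x^3 + x.
y1 = [math.exp(v) for v in x1]
y2 = [v ** 3 + v for v in x2]

tau_before = kendall_tau(x1, x2)
tau_after = kendall_tau(y1, y2)
print(tau_before, tau_after)
```

The transformed pair has very different marginal laws (lognormal and heavy-tailed), yet the two statistics coincide, which is the practical content of the theorem.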
Let $\vec{X}$ be an $n$-dimensional random vector with marginal cdf $F_i(x_i)$ and pdf $f_i(x_i)$ for $X_i$, and let $\vec{Z}$ be an $n$-dimensional standard Gaussian random vector linked to $\vec{X}$ by the conservation of probability

$$f_i(x_i)\, dx_i = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z_i^2}{2}\right) dz_i.$$

Integrating this equation gives $F_i(x_i) = \Phi(z_i)$, where $\Phi$ is the cdf of $Z_i$, so that

$$z_i = \Phi^{-1}(F_i(x_i)). \qquad (2.9)$$

The map in Eq (2.9) is strictly increasing. Thus, by Theorem 25, $\vec{X}$ and $\vec{Z}$ have the same copula $C$. $\vec{Z}$ has a simple and well-defined dependence structure, which is the covariance matrix $A$. The copula of $\vec{Z}$ is determined by its multivariate Gaussian cdf and is therefore the Gaussian copula. The joint distribution can then be easily obtained by combining this Gaussian copula with the marginal distributions of $\vec{X}$.

We are interested in applying the copula method to simulating multi-asset returns. A model of dependence, termed the full-rank Gaussian copula (FGC), is employed. It was proposed and studied by Malevergne and Sornette [51], and later summarized by Madan and Khanna [39].

FGC can be a very useful tool to simulate multivariate non-Gaussian random numbers in finance. Empirical studies ([17] [54] [32], and many others) indicate that the distributions of returns have power-law tails, whose variance and covariance are either not well-defined, or exist in principle but are hard to estimate accurately because of the poor convergence of the sample estimators. These multivariate random variables $\vec{X}$ can first be transformed to standard Gaussian variables $\vec{Z}$, which have a well-defined dependence structure, the covariance matrix $A$ with possibly full rank. The correlation can then be estimated more accurately than by direct estimation on the original random samples of $\vec{X}$. Next, multivariate Gaussian random numbers can be simulated using the estimated covariance matrix $\hat{A}$. Finally, the simulated Gaussian random numbers are transformed back to obtain a random sample of the original random vector $\vec{X}$. The algorithm is summarized below:

Algorithm 27 Multivariate Simulation Using FGC

Step 1. Calculate the cdf values of the sample of $X_i$ with the marginal cdf $F_i(x_i)$:
$$P(X_i \le x) = F_i(x).$$

Step 2. Transform the marginal distribution $F_i(x)$ to the standard Gaussian variable
$$z_i = \Phi^{-1}(F_i(x_i)),$$
where $\Phi$ is the cdf of the standard normal $Z_i$.

Step 3. Estimate the covariance matrix $A$ with the transformed sample. The estimated covariance matrix is denoted as $\hat{A}$.

Step 4. Simulate multivariate Gaussian variables with $\hat{A}$. The simulated random numbers are denoted as $\tilde{Z}$.

Step 5. Convert each $\tilde{Z}_i$ to $\tilde{X}_i$ by
$$\tilde{X}_i = F_i^{-1}(\Phi(\tilde{Z}_i)).$$
$\tilde{X}$ are the simulated random numbers.

Remark:
1. By Theorem 25, any multivariate random variable $\vec{Y}$, if connected with $\vec{X}$ by strictly increasing functions $Y_i = g_i(X_i)$, can be simulated by Algorithm 27. We can start from random samples of $\vec{X}$ in Step 1 and obtain the random numbers of $\vec{Y}$ in Step 5;
2. Algorithm 27 also works in the special case when $X$ is a univariate random variable.

2.4 Numerical Implementation and Results

The expected returns are estimated for the S&P 500 Index and the first 50 stocks of the S&P 500 from January 1999 to October 2009. The estimation is conducted once every month, on a Wednesday in the middle of that month, for a total of 130 months. These 130 days are termed the estimation days. The S&P 500 is chosen as the proxy of the numeraire portfolio. The stock and option data used to estimate the expected returns are obtained from WRDS. The price data of eight sectors of the S&P 500 are obtained from Reuters to perform statistical analysis.

2.4.1 Estimating Expected Returns

The expected return is estimated by calibrating one-month options (see footnote 5), which is realized by minimizing one of the error measures given in Section 2.3.1. RMSE (Eq 2.8) is chosen in our study,

$$\mathrm{RMSE} = \sqrt{\sum_{i=1}^{N} \frac{|P_R - V^H_{0i}|^2}{N}},$$

where $V^H_{0i}$ is the option's model price (Eq 2.6), $V^H_{0i} = V^*_0\, E^P_0\left[\frac{(S_{Ti} - K)^+}{V^*_T}\right]$, in which $S_{Ti}$ and $V^*_T$ are modeled by Eq (2.7), $S_T = S_0 \exp\{\mu T + X(T) + \omega T\}$. The FGC simulation technique is employed in the calibration.
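The five steps of Algorithm 27 can be sketched for the bivariate case as follows. This is a minimal illustration under assumptions that differ from the study: the marginal cdfs $F_i$ are replaced by empirical cdfs (the study uses fitted model marginals), the toy input sample is simulated, and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard Gaussian: cdf Phi and quantile Phi^{-1}


def to_gaussian_scores(sample):
    """Steps 1-2: empirical cdf values u = rank / (n + 1), then z = Phi^{-1}(u)."""
    n = len(sample)
    order = sorted(range(n), key=lambda k: sample[k])
    z = [0.0] * n
    for rank, k in enumerate(order, start=1):
        z[k] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return z


def correlation(a, b):
    """Step 3 (bivariate case): sample correlation of the Gaussian scores."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def empirical_quantile(sorted_sample, u):
    """Inverse empirical cdf: the order statistic at probability level u."""
    idx = min(len(sorted_sample) - 1, int(u * len(sorted_sample)))
    return sorted_sample[idx]


def simulate_fgc(x1, x2, n_sim, rng):
    """Steps 3-5: estimate rho, draw Gaussian pairs, map back to the marginals."""
    rho = correlation(to_gaussian_scores(x1), to_gaussian_scores(x2))
    s1, s2 = sorted(x1), sorted(x2)
    scale = math.sqrt(max(0.0, 1.0 - rho * rho))
    pairs = []
    for _ in range(n_sim):
        z1 = rng.gauss(0.0, 1.0)                       # Step 4
        z2 = rho * z1 + scale * rng.gauss(0.0, 1.0)
        pairs.append((empirical_quantile(s1, STD_NORMAL.cdf(z1)),   # Step 5
                      empirical_quantile(s2, STD_NORMAL.cdf(z2))))
    return pairs, rho


# Toy input (assumption): dependent pairs with non-Gaussian marginals.
rng = random.Random(1)
x1, x2 = [], []
for _ in range(500):
    g1 = rng.gauss(0.0, 1.0)
    g2 = 0.7 * g1 + math.sqrt(1.0 - 0.7 ** 2) * rng.gauss(0.0, 1.0)
    x1.append(math.exp(g1))   # skewed marginal
    x2.append(g2 ** 3)        # heavy-tailed marginal

pairs, rho_hat = simulate_fgc(x1, x2, 1000, rng)
print(rho_hat, pairs[:3])
```

With empirical marginals the simulated values are resampled order statistics of the input; using fitted parametric marginals instead would only change `to_gaussian_scores` and `empirical_quantile`, while the Gaussian dependence step stays the same.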
To simulate $S_{Ti}$ (stock $i$) and $V^*_T$ (proxy of the numeraire portfolio) in Eq (2.6), a distribution model at horizon $T$ is required. Because of its better performance at longer horizons, the VG mixed model is employed in this study, as $T$ is one month. The inputs of this model are the VG parameters $(\sigma_i, \nu_i, \theta_i)$ of the marginal distribution of each stock $i$ at the unit time, which is one day in this study. On each estimation day, these parameters are estimated from the four years of daily stock price data prior to that day. The expected return $\mu_i$ of stock $i$ and the other parameters $(c_i, \gamma_i)$ of the VG mixed model can be estimated through iterations of simulating $(S_{Ti}, V^*_T)$ in the optimization. As the index $V^*_T$ also appears in Eq (2.6), we first need to estimate its parameters $(\mu^*, c^*, \gamma^*)$ with its option pricing formula $V^H_0 = V^*_0\, E^P_0\left[\frac{(V^*_T - K)^+}{V^*_T}\right]$.

Footnote 5: The actual maturity varies from four to five weeks (roughly one month), depending on the number of days between the estimation day and the next available maturity.

The estimation is conducted on 130 estimation days from January 1999 to October 2009. On each estimation day the following procedure is employed. The S&P 500 (SPX) is used as the proxy of the numeraire portfolio.

1. Estimate the VG parameters $(\sigma, \nu, \theta)$ for SPX and the 50 stocks using the four years of daily asset price data prior to the estimation day.

2. Estimate $(\mu^*, c^*, \gamma^*)$ for SPX from the calibration of SPX one-month option data. The VG mixed model is employed to model the price of SPX.

3. Estimate $(\mu_i, c_i, \gamma_i)$ for each stock $i$ from the calibration of its one-month option data.

Figure 2-2 and Figure 2-3 present two sample calibration results. The former is from SPX option data on July 11, 2007, with an RMSE of 2.01; the latter is from HPQ option data on the same day, with an RMSE of 0.098. Both calibrations use option data with a single maturity of one month.
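The objective evaluated inside each calibration iteration can be sketched as follows: a Monte Carlo version of the pricing formula (2.6) on simulated pairs $(S_{Ti}, V^*_T)$, followed by the error measures of Section 2.3.1. This is a hedged illustration, not the study's code: the function names are hypothetical, and the two equally weighted scenarios used as a demonstration are an assumed toy market (stock 110 or 95, numeraire value 1.5 or 0.75, initial numeraire value 1), not data from the study.

```python
import math


def numeraire_call_price(v0, stock_T, numeraire_T, strike):
    """Monte Carlo estimate of Eq (2.6): V0 * mean(max(S_T - K, 0) / V_T)."""
    payoffs = [max(s - strike, 0.0) / v for s, v in zip(stock_T, numeraire_T)]
    return v0 * sum(payoffs) / len(payoffs)


def aae(market, model):
    """Average Absolute Error."""
    return sum(abs(pr - pm) for pr, pm in zip(market, model)) / len(market)


def ape(market, model):
    """Average Pricing Error: AAE divided by the mean market price."""
    return aae(market, model) / (sum(market) / len(market))


def arpe(market, model):
    """Average Relative Percentage Error."""
    return sum(abs(pr - pm) / pr for pr, pm in zip(market, model)) / len(market)


def rmse(market, model):
    """Root-mean-square Error, Eq (2.8)."""
    return math.sqrt(sum((pr - pm) ** 2 for pr, pm in zip(market, model))
                     / len(market))


def calibration_objective(v0, stock_T, numeraire_T, strikes, market_prices):
    """RMSE between market prices and numeraire-method model prices."""
    model = [numeraire_call_price(v0, stock_T, numeraire_T, k) for k in strikes]
    return rmse(market_prices, model)


# Toy scenarios (assumption): two equally likely states under P.
stock_T = [110.0, 95.0]
numeraire_T = [1.5, 0.75]
price_k100 = numeraire_call_price(1.0, stock_T, numeraire_T, 100.0)
price_k0 = numeraire_call_price(1.0, stock_T, numeraire_T, 0.0)
objective = calibration_objective(1.0, stock_T, numeraire_T,
                                  [0.0, 100.0], [100.0, 10.0 / 3.0])
print(price_k100, price_k0, objective)
```

In a full calibration, an optimizer would regenerate `stock_T` and `numeraire_T` for each trial parameter set and minimize `calibration_objective`; the choice of optimizer is left open here.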
Among all the estimated returns of the 51 assets on the 130 days, 94.52% of the $\hat{\mu}$ and 73.73% of the $\hat{\mu} - r_f$ (risk premium, where $r_f$ is the risk-free rate) are positive. The percentage of positive risk premium ($\hat{\mu} - r_f$) for each asset is presented in Table 2.1. As an example, the estimated return $\hat{\mu}$ and the realized return $\tilde{\mu}$ of SPX are compared in Figure 2-4.

Figure 2-2: The fitted option data of SPX on July 11, 2007, with one-month maturity, RMSE=2.01

Figure 2-3: The fitted option data of HPQ on July 11, 2007, with one-month maturity, RMSE=0.0617

Figure 2-4: Estimated return $\hat{\mu}_{SPX}$ vs. realized return $\tilde{\mu}_{SPX}$ for SPX (January 1999 to October 2009)

Symbol  positive %    Symbol  positive %    Symbol  positive %
SPX     89.2          DOW     64.6          OXY     63.1
ABT     64.6          DD      68.5          ORCL    92.3
MO      53.8          EMR     71.5          PNC     76.9
AXP     88.5          XOM     63.1          PEP     59.2
AMGN    86.2          F       64.6          PFE     64.6
AAPL    93.1          GE      72.3          PG      54.6
BAC     67.7          HAL     80.8          SLB     78.5
BA      83.1          HPQ     90.0          TGT     89.2
CVS     82.3          HD      82.3          TXN     90.0
CAT     70.8          INTC    82.3          MMM     63.1
CVX     56.2          JNJ     47.7          UNP     79.2
CSCO    82.3          LLY     64.6          UTX     76.2
C       74.6          LOW     88.5          UNH     85.4
KO      60.0          MCD     70.0          VZ      64.6
CL      65.4          MRK     67.7          WMT     75.4
COP     59.2          MSFT    78.5          WAG     80.0
DIS     78.5          NKE     80.8          WFC     74.6

Table 2.1: Percentage of positive risk premium of each stock in 130 days

From Table 2.1 we can tell that the estimated risk premiums ($\hat{\mu} - r_f$) are positive most of the time, which is consistent with the argument of a positive risk premium for risky assets in all the financial models. Figure 2-4 also shows that the estimated expected returns are more stable than the observed ones for SPX. Other stocks have similar results. The mean and the standard deviation of the estimated return $\hat{\mu}$ and the observed return $\tilde{\mu}$ for each asset are displayed in Table 2.2, which shows that the estimated returns $\hat{\mu}$ have lower standard deviations than the observed ones.

2.4.2 Statistical Analysis

The previous section shows that the expected return estimated by the numeraire-portfolio method is better than the realized return and its sample mean.
Further investigation is required to test how good it is.

Name   mean         std.         mean          std.          Name   mean         std.         mean          std.
       ($\hat\mu$)  ($\hat\mu$)  ($\tilde\mu$) ($\tilde\mu$)        ($\hat\mu$)  ($\hat\mu$)  ($\tilde\mu$) ($\tilde\mu$)
AAPL   0.088        0.055        0.451         1.655         LLY    0.055        0.049        -0.02         0.869
ABT    0.048        0.039        0.076         0.772         LOW    0.085        0.070        0.132         1.113
AMGN   0.079        0.053        0.130         1.250         MCD    0.053        0.045        0.075         0.876
AXP    0.114        0.134        0.121         1.295         MMM    0.052        0.046        0.092         0.780
BA     0.065        0.048        0.129         1.051         MO     0.033        0.053        0.157         0.917
BAC    0.066        0.120        0.053         1.670         MRK    0.053        0.052        0.041         0.953
C      0.096        0.200        -0.009        1.894         MSFT   0.073        0.054        0.092         1.040
CAT    0.060        0.058        0.125         1.224         NKE    0.067        0.051        0.262         1.085
CL     0.049        0.044        0.057         0.664         ORCL   0.102        0.073        0.217         1.436
COP    0.046        0.052        0.165         0.919         OXY    0.052        0.067        0.298         0.971
CSCO   0.097        0.068        0.073         1.286         PEP    0.045        0.037        0.089         0.667
CVS    0.062        0.047        0.066         0.995         PFE    0.055        0.047        -0.017        0.832
CVX    0.042        0.048        0.109         0.735         PG     0.035        0.047        0.073         0.802
DD     0.059        0.054        0.016         1.015         PNC    0.080        0.077        0.024         1.191
DIS    0.076        0.065        0.072         0.991         SLB    0.060        0.053        0.194         1.115
DOW    0.061        0.080        0.103         1.503         TGT    0.086        0.062        0.120         1.066
EMR    0.065        0.057        0.070         0.783         TXN    0.089        0.054        0.075         1.294
F      0.060        0.131        -0.055        1.819         UNH    0.070        0.052        0.240         1.227
GE     0.076        0.082        0.019         1.052         UNP    0.065        0.058        0.110         0.898
HAL    0.070        0.064        0.188         1.518         UTX    0.068        0.055        0.148         0.874
HD     0.077        0.062        0.045         1.116         VZ     0.054        0.062        -0.008        0.749
HPQ    0.081        0.053        0.199         1.190         WAG    0.061        0.043        0.086         0.837
INTC   0.088        0.062        0.028         1.300         WFC    0.075        0.096        0.089         1.213
JNJ    0.034        0.037        0.095         0.601         WMT    0.066        0.048        0.051         0.728
KO     0.045        0.040        0.059         0.730         XOM    0.051        0.05         0.117         0.677

Table 2.2: Mean and std. of the estimated return $\hat{\mu}$ and realized return $\tilde{\mu}$ (annualized)

Let $R_{i,t+1}$ be asset $i$'s return from $t$ to $t+1$ and $\hat{\mu}_{i,t}$ be the associated estimated expected return from $t$ to $t+1$ using the numeraire-portfolio method. $R_{i,t+1}$ is $\mathcal{F}_{t+1}$-measurable and $\hat{\mu}_{i,t}$ is $\mathcal{F}_t$-measurable, as it is obtained at time $t$ from the option prices, which are $\mathcal{F}_t$-measurable.
If $\hat{\mu}_{i,t}$ properly estimates the expectation of $R_{i,t+1}$, then the conditional expectation of the difference of these two returns is zero, i.e., $E_t[R_{i,t+1} - \hat{\mu}_{i,t}] = 0$, or furthermore, $E[R_{i,t+1} - \hat{\mu}_{i,t}] = E[E_t(R_{i,t+1} - \hat{\mu}_{i,t})] = 0$.

The generalized method of moments (GMM) [33] can be employed to test this hypothesis. In the hypothesis test of GMM, the null hypothesis is $H_0 : E[u] = 0$, where $u$ is the orthogonality condition. In our study, let $z_t$, which is $\mathcal{F}_t$-measurable, be the vector of instrumental variables that may affect $R_{i,t+1}$ or $R_{i,t+1} - \hat{\mu}_{i,t}$. Thus, the null hypothesis becomes $E[(R_{i,t+1} - \hat{\mu}_{i,t}) z_t] = 0$.

Asset $i$'s return $R_{i,t+1}$ can be similarly modeled by the classical asset return models, such as the Capital Asset Pricing Model (CAPM) [44] [60] [75] and the Fama-French three-factor model [28], which establish the relation between assets' expected returns and their risk attributes. In these models, an asset's expected return and the return itself are linearly determined by various factors which represent the different risks this asset is exposed to.

The CAPM measures the asset's return with its sensitivity to one factor, systematic risk or market risk:

$E[R_{i,t}] = r_{f,t} + \beta_{iM}(E(R_{M,t}) - r_{f,t})$,

where:
• $R_{M,t}$ is the market return from $t-1$ to $t$ with expectation $E(R_{M,t})$
• $r_{f,t}$ is the risk-free rate
• $\beta_{iM}$ measures the sensitivity of asset $i$'s expected return to the market return, and $\beta_{iM} = \frac{\mathrm{Cov}(R_i, R_M)}{\mathrm{Var}(R_M)}$

In the CAPM, only a single factor $\beta_{iM}$ is used to measure asset $i$'s expected return, which oversimplifies the complicated situations in the market. The Fama-French three-factor model introduces two more risk factors, firm size and book-to-market ratio, which are represented by two classes of stocks: small cap stocks and value stocks (stocks with a high book-to-market ratio, BTM). These stocks tend to outperform the market.
Fama and French include these two factors in their model to adjust for this outperformance tendency:

$R_{i,t} = r_{f,t} + \beta_{iM}(R_{M,t} - r_{f,t}) + \beta_{iS}\, SMB_t + \beta_{iH}\, HML_t + \varepsilon_{i,t}$,

where, in addition to the CAPM:
• SMB represents "Small Minus Big stocks," which is the excess return of small stocks over big ones
• HML represents "High BTM Minus Low BTM," which is the excess return of high BTM stocks over low BTM ones
• $\beta_{iS}$ and $\beta_{iH}$ measure the sensitivities to firm size and book-to-market ratio

Besides the risk factors in these two traditional models, other factors have been proposed in academia and industry. MSCI Barra [82] suggests numerous factors in their industrial models. We choose two factors in our analysis: the asset's own performance and the influence from the asset's sector (see footnote 6). The third one is the market risk in the CAPM and the Fama-French model. The returns associated with the three factors are also considered as the instrumental variables $z_t$. $R_{i,t+1}$ is $\mathcal{F}_{t+1}$-measurable. $\hat{\mu}_{i,t}$ and the three returns, namely the market return $R_{M,t}$, asset $i$'s return $R_{i,t}$, and the sector return $R_{S,t}$, are $\mathcal{F}_t$-measurable. At time $t$, if the null hypothesis is chosen to be

$E_t[(R_{i,t+1} - \hat{\mu}_{i,t}) z_t] = 0$,   (2.10)

then the linear regression model assumed in our study is

$R_{i,t+1} - \hat{\mu}_{i,t} = \beta_0 + \beta_{iM}(R_{M,t} - r_{f,t}) + \beta_{ii}(R_{i,t} - r_{f,t}) + \beta_{iS}(R_{S,t} - r_{f,t}) + \varepsilon_{i,t+1}$,   (2.11)

where:
• $R_{M,t}$ is the market return at $t$
• $R_{S,t}$ is the sector's return at $t$
• $r_{f,t}$ is the risk-free rate at $t$
• $\beta_{ii}$ reflects asset $i$'s exposure to its own performance
• $\beta_{iS}$ measures asset $i$'s sensitivity to its sector
• $\beta_{iM}$ and $\varepsilon_{i,t+1}$ are the same as those in the CAPM and the Fama-French model
• $\beta_0$ is the intercept.

Footnote 6: The category an asset belongs to, such as Exxon in the energy sector.

If $\hat{\mu}_{i,t}$ properly measures asset $i$'s return, then all the betas in Eq (2.11) should be zero.
Under this scenario, the null hypothesis (2.10) can be rewritten as

$E_t[z_t(y_{t+1} - x_t \beta)] = 0$,

where:
• $z_t$ is the column return vector $(1, R_{M,t} - r_{f,t}, R_{i,t} - r_{f,t}, R_{S,t} - r_{f,t})'$
• $y_{t+1} = R_{i,t+1} - \hat{\mu}_{i,t}$
• $\beta$ is the column risk factor vector $(\beta_0, \beta_{iM}, \beta_{ii}, \beta_{iS})'$

This is equivalent to the ordinary least squares (OLS) model [80] $y = X\beta + \varepsilon$. The estimator $\hat{\beta} = (X'X)^{-1}X'y$ is the same as the OLS estimator. After $\beta$ is estimated through OLS, the F-test in OLS can be performed, where the null hypothesis is $H_0 : \beta = 0$. The acceptance of this null hypothesis implies the acceptance of the GMM null hypothesis (2.10).

Linear regressions are employed on two pairs of returns using Eq (2.11): one pair is the estimated returns $\hat{\mu}_{i,t}$ and the associated realized returns $R_{i,t+1}$; the other is the estimated returns $\hat{\mu}_{i,t}$ and the sample mean $\bar{R}_{i,t+1}$. Both regressions are conducted for 50 stocks using 130 data points from the 130 estimation days from January 1999 to October 2009. Let $t_i$ be the number of days to the next available maturity from the estimation day $i$ on which $\hat{\mu}_{i,t}$ is estimated. $R_{i,t+1}$ is the associated realized return during this time period, and $R_{i,t}$, $r_{f,t}$, $R_{S,t}$ and $R_{M,t}$ are the realized returns during the time period going back $t_i$ days from the estimation day $i$. $\bar{R}_{i,t+1}$ is the sample mean calculated from the daily returns during the time period going forward $t_i$ days from the estimation day $i$. All these returns are annualized.

Two hypothesis tests are conducted to test the beta values:

t test: determines the significance of each individual beta, with the null hypothesis $H_0 : \beta_i = 0$ and the alternative hypothesis $H_1 : \beta_i \neq 0$, where $\beta_i$ is $\beta_0$, $\beta_{iM}$, $\beta_{ii}$, or $\beta_{iS}$.

F test: tests the overall significance of all the betas, with the null hypothesis $H_0 : \beta_0 = \beta_{iM} = \beta_{ii} = \beta_{iS} = 0$ and the alternative hypothesis $H_1$ : at least one beta is not equal to zero.

Both tests are performed at the 5% significance level (95% confidence level). The test results are displayed in Table 2.3 and 2.4.
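The regression in Eq (2.11) and the two test statistics can be sketched as follows. This is an illustrative NumPy-only version, not the code used to produce Tables 2.3 and 2.4; the function name `ols_tests` and the argument layout are assumptions. It returns the OLS estimate, the t statistic of each beta, and the F statistic for the joint null that all four betas (including the intercept) are zero.

```python
import numpy as np


def ols_tests(y, factors):
    """OLS fit of y on an intercept plus the factor columns.

    y       : array of length n, here R_{i,t+1} - mu_hat_{i,t}.
    factors : array of shape (n, 3) with the excess returns of the
              market, the asset itself, and its sector.
    Returns (beta, t_stats, f_stat) with beta ordered as
    (beta_0, then one beta per factor column).
    """
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(factors, dtype=float)])
    n, k = X.shape

    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid
    sigma2 = rss / (n - k)  # unbiased residual variance

    # t statistic of each coefficient: beta_j / se(beta_j)
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)
    t_stats = beta / np.sqrt(np.diag(cov_beta))

    # F statistic for H0: all k coefficients are zero, so the
    # restricted model is y = eps and its residual sum of squares is y'y.
    f_stat = ((y @ y - rss) / k) / sigma2
    return beta, t_stats, f_stat
```

Under the usual OLS assumptions the t statistics are compared with a Student t distribution with $n - k$ degrees of freedom and the F statistic with an $F(k, n - k)$ distribution; p-values can be obtained from those distributions, for example with `scipy.stats.t.sf` and `scipy.stats.f.sf`.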
The results are also summarized below:

t test: The hypothesis tests for both pairs indicate that, for a large portion of stocks, the null hypothesis $\beta_i = 0$ is not rejected:
$R_{i,t+1} - \hat{\mu}_{i,t}$: 76% for $\tilde{\beta}_{ii}$, 90% for $\tilde{\beta}_{iM}$, 72% for $\tilde{\beta}_{iS}$, and 80% for $\beta_0$.
$\bar{R}_{i,t+1} - \hat{\mu}_{i,t}$: 76% for $\tilde{\beta}_{ii}$, 84% for $\tilde{\beta}_{iM}$, 76% for $\tilde{\beta}_{iS}$, and 66% for $\beta_0$.

F test: The F test shows that the p-value of 34 out of 50 stocks is greater than 0.05, which means that for these stocks the null hypothesis that all the betas are zero cannot be rejected at the 5% significance level.

Similar results are also attained using the Fama-French three-factor model. Thus, the numeraire-portfolio method provides a good approach to estimate expected returns.

        $\beta_{ii}$      $\beta_{iM}$      $\beta_{iS}$      $\beta_0$
Name    indi.  p-value    indi.  p-value    indi.  p-value    indi.  p-value
AAPL    0      0.138      0      0.799      0      0.854      1      0.004
ABT     0      0.612      0      0.328      0      0.513      0      0.348
AMGN    0      0.534      0      0.410      0      0.269      0      0.240
AXP     0      0.360      0      0.338      1      0.039      0      0.423
BA      1      0.005      0      0.422      1      0.033      0      0.371
BAC     0      0.052      0      0.887      0      0.309      0      0.804
C       0      0.770      0      0.867      0      0.707      0      0.366
CAT     0      0.195      0      0.157      0      0.067      0      0.260
CL      0      0.949      1      0.004      1      0.001      0      0.152
COP     0      0.543      0      0.632      0      0.678      0      0.111
CSCO    1      0.049      0      0.151      1      0.012      0      0.692
CVS     0      0.579      0      0.697      0      0.975      0      0.355
CVX     0      0.471      0      0.373      0      0.069      0      0.067
DD      1      0.006      0      0.122      1      0.001      0      0.992
DIS     1      0.042      0      0.300      0      0.881      0      0.405
DOW     0      0.506      0      0.333      0      0.367      0      0.963
EMR     0      0.944      0      0.334      0      0.811      0      0.633
F       0      0.875      0      0.573      0      0.471      0      0.117
GE      1      0.040      0      0.976      0      0.151      0      0.937
HAL     0      0.791      0      0.080      0      0.562      1      0.027
HD      1      0.018      0      0.290      1      0.006      0      0.966
HPQ     0      0.292      0      0.524      1      0.039      0      0.359
INTC    0      0.629      1      0.009      0      0.058      0      0.675
JNJ     0      0.488      0      0.099      1      0.001      0      0.137
KO      0      0.281      0      0.543      0      0.210      0      0.288
LLY     0      0.984      0      0.533      1      0.032      0      0.075
LOW     0      0.211      0      0.594      0      0.211      0      0.241
MCD     0      0.769      0      0.518      0      0.271      0      0.102
MMM     0      0.268      0      0.693      0      0.588      0      0.183
MO      1      0.010      0      0.845      1      0.014      1      0.001
MRK     0      0.665      0      0.576      0      0.725      0      0.703
MSFT    0      0.152      0      0.055      0      0.580      0      0.511
NKE     1      0.001      0      0.383      0      0.900      1      0.000
ORCL    1      0.000      1      0.033      0      0.385      0      0.314
OXY     0      0.085      0      0.626      0      0.779      1      0.000
PEP     1      0.003      0      0.735      0      0.189      1      0.029
PFE     1      0.025      0      0.935      1      0.025      0      0.958
PG      0      0.859      0      0.075      1      0.014      1      0.012
PNC     0      0.187      0      0.828      0      0.260      0      0.223
SLB     0      0.401      0      0.067      0      0.177      0      0.059
TGT     0      0.640      0      0.730      0      0.747      0      0.273
TXN     0      0.707      0      0.547      0      0.225      0      0.806
UNH     0      0.331      0      0.906      0      0.698      1      0.015
UNP     0      0.635      1      0.004      1      0.007      1      0.048
UTX     0      0.052      0      0.842      0      0.386      1      0.007
VZ      1      0.016      1      0.012      1      0.014      0      0.907
WAG     0      0.123      0      0.671      0      0.618      0      0.704
WFC     0      0.343      0      0.081      0      0.075      0      0.301
WMT     0      0.530      0      0.061      0      0.441      0      0.495
XOM     0      0.140      0      0.891      0      0.352      0      0.304

Table 2.3: t-test of each $\beta_i$ (indi. = 0 represents $\beta_i = 0$, indi. = 1 represents $\beta_i \neq 0$)

Name    indi.  p-value    Name    indi.  p-value    Name    indi.  p-value
AAPL    0      0.292      F       0      0.912      OXY     0      0.171
ABT     0      0.791      GE      0      0.146      PEP     1      0.028
AMGN    0      0.497      HAL     0      0.345      PFE     0      0.056
AXP     0      0.169      HD      1      0.008      PG      1      0.027
BA      1      0.020      HPQ     0      0.102      PNC     0      0.227
BAC     0      0.182      INTC    0      0.079      SLB     0      0.171
C       0      0.820      JNJ     1      0.002      TGT     0      0.714
CAT     0      0.320      KO      0      0.633      TXN     0      0.589
CL      1      0.001      LLY     1      0.009      UNH     0      0.796
COP     0      0.876      LOW     1      0.006      UNP     1      0.044
CSCO    0      0.097      MCD     0      0.417      UTX     0      0.238
CVS     0      0.930      MMM     0      0.213      VZ      1      0.013
CVX     0      0.105      MO      1      0.022      WAG     1      0.050
DD      1      0.014      MRK     0      0.792      WFC     0      0.339
DIS     0      0.169      MSFT    0      0.115      WMT     1      0.003
DOW     0      0.695      NKE     1      0.005      XOM     0      0.473
EMR     0      0.313      ORCL    1      0.003

Table 2.4: F test of $\beta$ (indi. = 0 represents $\beta = 0$, indi. = 1 represents $\beta \neq 0$)

2.5 Conclusion

Expected returns are determined by future uncertainty, which can be represented by option prices. The numeraire-portfolio pricing method links expected returns to option prices. This method states that the numeraire-denominated option price is the conditional expectation of the numeraire-denominated terminal payoff under the physical measure, which contains the information of the expected return. The expected return is estimated by the option calibration, and a statistical analysis of the results is performed. Two advantages of this method are summarized below:

1. Stocks are riskier than riskless assets such as the money market account.
Therefore, their expected returns should be higher than the risk-free rate. Otherwise, it does not make sense to invest in them. Realized returns, which represent the history, are so volatile that they may not accurately reveal expected returns. For example, they can be outperformed by the risk-free assets for a relatively long period: the stock market's return was on average less than that of the risk-free asset for eleven years, from 1973 to 1984 [35]. Furthermore, conditions in markets may change over time in the long run. Therefore, past average returns may not represent the current situation [6].

Expected returns estimated by the numeraire-portfolio method use option prices with a short maturity, e.g., one month. Thus, the price information revealed is "local" and it represents future uncertainty. The results in this study show that the risk premiums are positive most of the time and more stable than the realized returns. Furthermore, the OLS regressions indicate that the difference between the estimated returns and the realized returns is indifferent to two major risk factors for a large portion of assets. This result indicates that the estimated returns properly estimate the expected returns.

2. The traditional estimation using the CAPM and the Fama-French model requires the input of betas, the risk factors. However, there is no uniformly accepted agreement on which betas should be chosen. Academicians and practitioners try to "fine gold" by data mining [6]. The uncertain input in the numeraire-portfolio method is the proxy of the numeraire portfolio. Numerous empirical studies show that market indices or equal-weighted/value-weighted portfolios can serve as good proxies. Thus, there is less uncertain input in the numeraire-portfolio approach than in the traditional methods.

Therefore, our study demonstrates that the numeraire-portfolio approach provides a good estimation for expected returns.

Future study may include the generalization of estimating the expected return.
The option is one example of an instrument that represents future uncertainty. The idea for estimating the expected return has two steps: first, we need to find an instrument that can reveal future uncertainty; second, this instrument must contain the information of the expected return. Futures may be one of the candidates that satisfy the above criteria.

Chapter 3

A New Approach to Portfolio Selection

3.1 Introduction

In financial markets, investors often face the question of how to allocate their wealth among various assets, and in what sense. Modern portfolio theory (MPT), first articulated by Markowitz [55] [56], provides selection principles for maximizing a portfolio's expected return when fixing its variance, or minimizing the variance for a fixed level of expected return. These two principles formulate the efficient frontier, from which investors can choose their preferred portfolio with the optimal combination of gain (the expected return) and risk (defined as the standard deviation of return), where investors' preference is the trade-off between gain and risk. Another important concept is diversification. Because assets are imperfectly correlated with one another, the variance of a properly constructed portfolio can be smaller than the weighted average of its assets' variances. Thus, investors can reduce risk with a diversified portfolio instead of investing in an individual asset. In modern portfolio theory, asset returns are assumed to be multivariate Gaussian random variables. To optimize a portfolio, investors first need to estimate each asset's expected return, its variance, and its correlation to other assets.

Portfolio selection is well developed both in theory and implementation. We refer to Elton and Gruber's review paper [26], which provides literature on each topic in modern portfolio theory.

However, various aspects of modern portfolio theory are questioned.
As discussed in Chapter 1, empirical studies indicate that an individual asset's return is not normally distributed, which makes correlation complicated. Under this situation, covariance may not properly measure correlation. Another issue is in the implementation procedure. The return in the model input is the expected return, which is the prediction of the asset's future return. In practice, the expected return is estimated from historical data, which does not necessarily provide a good prediction. Furthermore, as discussed in Chapter 2, the historical data is very volatile and cannot give a relatively precise estimate.

In this chapter, we propose some alternative approaches to these questions. A non-Gaussian law, the VG mixed distribution described in Chapter 1, is employed to model the marginal distributions of asset returns. This model captures well the skewness and excess kurtosis patterns exhibited in the data. The joint law is formulated by FGC, a simulation technique proposed by Malevergne and Sornette [50] [51], and summarized by Madan and Khanna [39]. The FGC transforms all marginal random numbers to standard normal random numbers, and then constructs the dependence structure by the covariance matrix of the multivariate normal distribution, which is well defined to measure the correlation. Last, the expected return estimated by the numeraire-portfolio method introduced in Chapter 2 is employed. The estimation is conducted on option prices, which can be viewed as the prediction of the asset's future values. It is also demonstrated in Chapter 2 that this estimator is more stable than that from the historical data. Thus, the expected return estimated by the numeraire-portfolio method is expected to serve as a better and more precise estimator.

Criteria in portfolio optimization are other issues to consider, which include the mathematical formulation for the optimization and the measures for portfolio evaluation. Traditionally, the utility function is the objective function to be maximized in the optimization [56]. A variety of measures have been discussed to evaluate portfolios. The paper by Biglova et al. [5] provides a good review and also compares various measures (or risk estimations) that are all in the form of ratios between the expected return and certain risk measures.

New criteria are proposed in this study. We construct a portfolio from the buyer's side. To be competitive, the buyer should offer the highest price at which the residual cash flow remains acceptable; this price is called the bid price [47]. It is defined based on the acceptability index developed by Cherny and Madan [14] and is given by a distorted expectation of the terminal payoff (Section 3.2.4). Different risk levels lead to different bid prices. Given a risk level, the buyer can reach his or her highest profit by maximizing the bid price, which also depends on the composition of the portfolio. Thus, the bid price, or the distorted expectation, serves as the objective function, and the associated acceptability index evaluates the portfolio's performance.

The outline of the rest of the chapter is as follows. Section 3.2 introduces the criteria of the portfolio optimization, the bid price, and the acceptability index. In Section 3.3, the optimization problem is formulated and the numerical implementation and results are presented. Section 3.4 concludes with a summary.

3.2 Portfolio Evaluation - Acceptability Indices

In this section, we start with the traditional utility function, which leads to the concept of an acceptance set and the associated coherent risk measure, which measures the risk level of the acceptance set. The acceptability indices, in the sense of the coherent risk measure, are introduced, along with examples. Finally, given a fixed acceptability level, the bid and ask prices are described, which will be employed as the objective function in the portfolio optimization problem.
3.2.1 Acceptance Sets and Coherent Risk Measure

In the classical portfolio optimization problem, an investor allocates wealth by maximizing the portfolio's utility function. If he or she starts from a position with a zero-cost portfolio, any positions that will increase the portfolio's terminal expected utility are acceptable to this investor. These positions form a convex set that contains the nonnegative terminal cash flows. Every investor has his or her own acceptable set depending on the preference, or the utility function. The set that is accepted by all investors is the intersection of all these sets, which is a convex cone. It is called the acceptance set [1], and it also includes the nonnegative terminal cash flows. The acceptance sets are studied by Artzner et al. [1] and Carr, Geman, and Madan [10].

The model is set up for the random variable $X$, the terminal cash flow of zero-cost trades, on a probability space $(\Omega, \mathcal{F}, P)$. The risk measure $\rho(X)$ for the acceptance sets, defined as a mapping from the set of risks to the real line $\mathbb{R}$, is connected with the acceptance set by a nonempty set of probability measures, denoted as $D$, which are equivalent to $P$ [1] [10]. This risk measure is called a coherent risk measure [1]. Delbaen [21] further indicates that the coherent risk measure has the form

$\rho(X) = -\inf_{Q \in D} E^Q[X]$,   (3.1)

and a trade $X$ is acceptable when $\rho(X) \le 0$.

3.2.2 Acceptability Indices

Based on the axioms of the coherent risk measure in [1], Cherny and Madan [14] define the "index of acceptability" $\alpha$, a mapping from the class of bounded random variables $X$ to the extended positive real line $\mathbb{R}_+ = [0, \infty]$, which has the following four properties:

1. Monotonicity: If $X$ is dominated by the random variable $Y$, $X \le Y$, then $\alpha(X) \le \alpha(Y)$.
2. Scale Invariance: $\alpha(X)$ stays the same when $X$ is scaled by a positive number, $\alpha(cX) = \alpha(X)$ for $c > 0$.
3. Quasi-concavity: If $\alpha(X) \ge \gamma$ and $\alpha(Y) \ge \gamma$, then $\alpha(X + Y) \ge \gamma$.
4. Fatou Property (Convergence): Let $\{X_n\}$ be a sequence of random variables with $|X_n| \le 1$ that converges in probability to a random variable $X$. If $\alpha(X_n) \ge x$, then $\alpha(X) \ge x$.

$\alpha(X)$ can be considered as a degree measuring the quality of the terminal cash flow $X$: the bigger the value of $\alpha(X)$, the closer $X$ is to an arbitrage. $\alpha(X) = +\infty$ represents arbitrage, and all random variables in the acceptable cone are nonnegative.

Under the above four conditions, a basic representation theorem is derived by Cherny and Madan [14], which connects acceptability indices $\alpha(X)$ to families of probability measures.

Theorem 28 (Representation Theorem of Acceptability Indices) Let $L^\infty = L^\infty(\Omega, \mathcal{F}, \tilde{P})$ be the space of the bounded random variables $X$. $\alpha(X)$ is an acceptability index, i.e., a map $\alpha : L^\infty \to [0, \infty]$ that satisfies conditions 1-4, if and only if there exists a family of subsets $\{D_\gamma : \gamma \ge 0\}$ of $\tilde{P}$ such that

$\alpha(X) = \sup\{\gamma \in \mathbb{R}_+ : \inf_{Q \in D_\gamma} E^Q[X] \ge 0\}$,   (3.2)

and $\{D_\gamma : \gamma \ge 0\}$ is an increasing family of sets of probability measures, i.e., $D_\gamma \subseteq D_{\gamma'}$ for $\gamma \le \gamma'$.

Remark:
(1) The probability measures in $D_\gamma$ are absolutely continuous with respect to the original probability measure $P$ for $X$, and each $Q \in D_\gamma$ is equivalent to $P$.
(2) Eq (3.2) indicates that $\alpha(X) = \gamma$ is the largest value that keeps the expectation of $X$ nonnegative under all probability measures in $D_\gamma$. This is roughly illustrated in Figure 3-1.
(3) Recall that the coherent risk measure $\rho(X)$ is related to the acceptance sets through Eq (3.1), with $X$ acceptable if $\rho(X) \le 0$. Then, the acceptability index $\alpha(X)$ is linked with $\rho(X)$ by $\alpha(X) = \sup\{\gamma \in \mathbb{R}_+ : \rho_\gamma(X) \le 0\}$. Thus, $\alpha(X)$ is the largest risk level at which the cash flow $X$ is acceptable, and the risk level is $\gamma$.
(4) The sets of probability measures $\{D_\gamma : \gamma \ge 0\}$ can be considered as pricing kernels, which will be discussed later.

Figure 3-1: Graphic illustration of the Representation Theorem

3.2.3 WVAR Acceptability Indices

There are many acceptability indices that satisfy the four conditions in Section 3.2.2. The weighted VAR (WVAR) acceptability indices [14] are used in this study because of their computational feasibility.
The WVAR has the following form:

$\mathrm{WVAR}_\gamma(X) = \int_{\mathbb{R}} x\, d\Psi^\gamma(F_X(x))$,   (3.3)

where $F_X(x)$ is the cdf of the random variable $X$. $\{\Psi^\gamma : \gamma \ge 0\}$ is a set of increasing concave continuous functions $\Psi^\gamma : [0, 1] \to [0, 1]$, where $\Psi^\gamma(0) = 0$ and $\Psi^\gamma(1) = 1$. Furthermore, $\Psi^\gamma(y)$ increases in $\gamma$ for a fixed value of $y$. Thus, $\Psi^\gamma(y)$ can be viewed as a function that distorts the cdf $y = F_X(x)$, adding more weight to the losses, which correspond to the region where $F_X(x)$ is close to 0, i.e., where $X$ takes large negative values. $\Psi^\gamma(F_X(x))$ is again a probability distribution function, and WVAR can be viewed as a distorted expectation of the cash flow $X$.

Applying Eq (3.3) to Eq (3.2), the WVAR acceptability index $\alpha(X)$ is defined as

$\alpha(X) = \sup\{\gamma \in \mathbb{R}_+ : \int_{\mathbb{R}} x\, d\Psi^\gamma(F_X(x)) \ge 0\}$,   (3.4)

where $\alpha(X)$ is the biggest $\gamma$ such that the distorted expectation is still nonnegative. The expectation of $X$ is taken under a new probability measure $Q \in D_\gamma$ obtained by the measure change $\frac{dQ}{dP} = \Psi^{\gamma\prime}(F_X(X))$, where $P$ is the original probability measure of $X$. $D_\gamma$ is the set of probability measures discussed in Section 3.2.2. When $\gamma$ increases, $\Psi^\gamma$ also increases, which distorts the cdf $F_X$ more and more to the left, or in other words, gives more weight to the losses. Under this situation, if the distorted expectation of $X$ still remains positive, the trading strategy $X$ is more acceptable, as it can survive worse situations, and thus it has a better performance. Therefore, $\alpha(X) = \gamma$ is a performance measure for trade $X$: the higher $\gamma$, the better the performance of trading strategy $X$. $\gamma$ is also seen as a "stress level" for the cash flow $X$: the higher $\gamma$, the more stressed $X$ is.

The computation of the distorted expectation is relatively simple. Given a sample $x_1, x_2, \ldots, x_N$ of the cash flow $X$, the numerical formula is

$\int_{\mathbb{R}} x\, d\Psi^\gamma(F_X(x)) \approx \sum_{i=1}^{N} x_{(i)} \left[ \Psi^\gamma\left(\frac{i}{N}\right) - \Psi^\gamma\left(\frac{i-1}{N}\right) \right]$,   (3.5)

where $\{x_{(i)}\}$ are the sample values sorted in increasing order and $\frac{i}{N}$ is the value of the empirical distribution function at $x_{(i)}$, with $\frac{i}{N} - \frac{i-1}{N} = \frac{1}{N}$ for all $i = 1, 2, \ldots, N$.

Four WVAR indices are provided in [14], namely MINVAR, MAXVAR, MINMAXVAR, and MAXMINVAR. We first look at a simple case.
Let $Y \overset{law}{=} \min\{x_1, x_2, \ldots, x_{\gamma+1}\}$, where $x_1, x_2, \ldots, x_{\gamma+1}$ are $\gamma + 1$ independent draws from $X$. If the cdf of $X$ is $z = F_X(x)$, then by order statistics, the probability function of $Y$ is given by

$\Psi^\gamma(z) = 1 - (1 - z)^{1+\gamma}$, where $z \in [0, 1]$, $\gamma \ge 0$.   (3.6)

In this case, $\gamma + 1$ is the largest number of draws such that the expected value of the minimum of these draws still remains positive. A larger value of $\gamma$ represents a better performance of trade $X$. This acceptability index is termed MINVAR, as it is related to the minimum of a number of independent draws.

Now let us check how the distortion in MINVAR reweights the losses and gains. Differentiating the distortion function (3.6) gives

$\frac{d\Psi^\gamma(F_X(x))}{dx} = (\gamma + 1)(1 - F_X(x))^\gamma f_X(x)$,

where $f_X(x)$ is the pdf of $X$. This derivative shows that the MINVAR distortion adds more weight to large losses (when $F_X(x)$ is close to 0) and reduces the weight of large gains (when $F_X(x)$ is close to 1). However, large losses cannot be reweighted to infinitely large levels, since the weight is bounded above by $\gamma + 1$. Thus, a modified MINVAR is considered:

$\Psi^\gamma(z) = 1 - \left(1 - z^{\frac{1}{1+\gamma}}\right)^{1+\gamma}$,

with its derivative

$\frac{d\Psi^\gamma(F_X(x))}{dx} = \left(1 - F_X(x)^{\frac{1}{1+\gamma}}\right)^\gamma F_X(x)^{-\frac{\gamma}{1+\gamma}} f_X(x)$.

Under this situation, the large losses can be reweighted up to infinity (the weight is unbounded as $F_X(x) \to 0$) and the large gains can be reweighted down to zero (as $F_X(x) \to 1$). This index is called MINMAXVAR, which will be implemented in this study. Details of the other two indices, MAXVAR and MAXMINVAR, can be found in [14].

3.2.4 Bid and Ask Prices

Given a fixed acceptability index and level, we study the corresponding acceptable price for the terminal cash flow $X$, either from the seller's or the buyer's view, namely the ask price and the bid price [47], respectively.

Let us derive a trading strategy from the buyer's side. If $b$ is the bid price, then the buyer's residual cash flow is $X - b$. To be a competitive buyer in the market, he or she should offer as much as possible.
If the residual cash flow $X - b$ is acceptable (at acceptability level $\gamma$) with a certain acceptability index $\alpha$, then the competitive bid price is the maximal such price and is defined as

$b_\gamma(X) = \sup\{b : \alpha(X - b) \ge \gamma\}$.   (3.7)

For the bid price, by Theorem 28, we have

$\inf_{Q \in D_\gamma} E^Q[X - b] \ge 0 \iff b \le \inf_{Q \in D_\gamma} E^Q[X]$.

By Eq (3.7), $b = \inf_{Q \in D_\gamma} E^Q[X]$. Therefore, the bid price is the infimum of the expectation of $X$ over all $Q \in D_\gamma$ with a fixed level $\gamma$.

If WVAR is chosen to be the acceptability index, by Eq (3.4), we have

$\int_{\mathbb{R}} x\, d\Psi^\gamma(F_{X-b}(x)) \ge 0 \iff -b + \int_{\mathbb{R}} x\, d\Psi^\gamma(F_X(x)) \ge 0$.

By Eq (3.7), the competitive bid price is

$b_\gamma(X) = \int_{\mathbb{R}} x\, d\Psi^\gamma(F_X(x))$,   (3.8)

the distorted expectation of the terminal cash flow $X$.

Similarly, suppose the seller has the residual cash flow $a - X$ with acceptability level $\gamma$ under the acceptability index $\alpha$, where $a$ is the ask price. The competitive ask price is the minimal price, defined by

$a_\gamma(X) = \inf\{a : \alpha(a - X) \ge \gamma\}$.

With a similar procedure, given the acceptability level $\gamma$, we can derive the competitive ask price

$a = \sup_{Q \in D_\gamma} E^Q[X]$,

and

$a_\gamma(X) = -\int_{\mathbb{R}} x\, d\Psi^\gamma(F_{-X}(x))$

if WVAR is used as the acceptability index.

3.3 Numerical Implementation and Results

3.3.1 Trading Strategy

A trading strategy over a single period is implemented. As a buyer, we construct a stock-only portfolio at time 0 with a bid price $b$ and maturity $t$. All the payoff or cash flow $X$ is delivered at the maturity. The resulting residual cash flow $X - b$ is set to be at acceptability level $\gamma$ with WVAR as the acceptability index $\alpha$. A buyer maximizes the distorted expectation, or the bid price $b$, which is given by Eq (3.8) derived in Section 3.2.4. Different weights of assets result in different trading strategies or final cash flows $X$. Thus, this maximized distorted expectation depends on the weights of the assets. The optimal weights lead to a maximal distorted expectation for the buyer.

Let $w = (w_1, \ldots, w_i, \ldots, w_n)$ and $R = (R_1, \ldots, R_i, \ldots, R_n)$ be the weight and return vectors of the portfolio, where $n$ is the total number of stocks.
The cash ?ow x is de?ned as x = w R, which is the portfolio?s return. The portfolio optimization problem is formulated as follows: max w Z R xd (FX(x)); (3.9) s:t: n i=1 wi = 1; 1  wi  1: Objective function in (3.9) can be numerically computed by Eq (3.5), in which N samples of x are obtained by simulation. x = wR = n i=1 wiRi, where Ri has the VG 104 mixed distribution [22]. The stock?s log return is given by Eq (1.39) reformulated as ri = ln StiS 0i = it + X(i)VGMixed(t) + !it; (3.10) where i is the expected return which can be estimated by the numeraire-portfolio method. The stock?s return Ri then equals to exp(ri)1. 3.3.2 Procedure and Results The results and data obtained from Chapter 2 are employed in this study. The trading strategy is implemented on the 130 estimation days in Chapter 2, ranging from January 1999 to October 2009. On each day, a portfolio containing the ?rst 50 stocks from the S&P 500 (SPX) is constructed, and the holding period is the same as the time-to-maturity of the options used in the calibration. The FGC in Algorithm 27 is implemented to simulate the multivariate random variables r = (r1;:::;ri;:::;rn) with VG mixed distribution as the marginal law for each ri. The input of FGC, the VG parameters (i;i;i) from the marginal daily return, the expected return i estimated from the numeraire-portfolio method (bi), and the parameters (ci; i) in the VG mixed distribution at time horizon t, are all obtained from the results in Chapter 2. Thus, the only parameters left in the objective function (3.9) are the weights, which are attained from optimization. The optimal portfolio is named the estimated-return portfolio (ERP). Two reference portfolios are constructed to compare to the estimated-return port- folio. 
They use the same input parameters as the estimated-return portfolio except for the expected return $\mu_i$: the one using the realized return $R_i$ is named the realized-return portfolio (RRP); the other, using the sample mean $\bar{R}_i$, is named the mean-return portfolio (MRP). On each of the 130 estimation days, these two portfolios are obtained by optimizing the objective function (3.9). The actual return of an optimized portfolio at maturity is calculated as $\sum_{i=1}^{n} \hat{w}_i f_i$, where $\hat{w}_i$ is stock $i$'s weight in the optimized portfolio and $f_i$ is stock $i$'s actual return over the period from the estimation day to maturity. The actual return is calculated for each of the three optimized portfolios.

Besides the comparison with the two reference portfolios, we are also interested in comparing the performance of the estimated-return portfolio with the market index, which is SPX in this study. SPX's return must be adjusted before the comparison because of the leverage effect. In the estimated-return portfolio, the weight of each stock ranges from $-1$ to $1$ and there are 50 stocks in the portfolio, whereas the weight in SPX can be considered as 1. Thus, the estimated-return portfolio is leveraged compared to SPX. To bring them to the same leverage level, the return of SPX is multiplied by the following ratios:

$$l_{pos} = \sum_{i=1}^{n} \hat{w}_i^{+} \quad \text{and} \quad l_{neg} = \sum_{i=1}^{n} \hat{w}_i^{-},$$

where

$$\hat{w}_i^{+} = \begin{cases} \hat{w}_i, & \text{if } \hat{w}_i > 0 \\ 0, & \text{otherwise} \end{cases} \qquad \hat{w}_i^{-} = \begin{cases} \hat{w}_i, & \text{if } \hat{w}_i < 0 \\ 0, & \text{otherwise.} \end{cases}$$

This gives two leveraged returns for SPX,

$$R^{+}_{SPX} = l_{pos} R_{SPX}, \qquad R^{-}_{SPX} = |l_{neg}| R_{SPX},$$

where $R_{SPX}$ is SPX's actual return over the period from the estimation day to maturity. These two leveraged indices are named the positive-leveraged SPX and the negative-leveraged SPX.

The three optimized portfolios are constructed at different risk levels, namely $\gamma = 0.05$, $0.10$, $0.15$, $0.20$, and $0.25$. MINMAXVAR is employed as the acceptability index.
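As a rough illustration of how the distorted expectation in (3.8) and (3.9) might be evaluated from simulated cash flows, the following Python sketch uses the MINMAXVAR distortion $\Psi^\gamma(y) = 1 - (1 - y^{1/(1+\gamma)})^{1+\gamma}$ of Cherny and Madan [14] together with an order-statistics approximation of the integral. The function names and the discretization are assumptions made for illustration; Eq (3.5) is not reproduced in this section, so the sketch should not be read as the exact estimator used in the study.

```python
def minmaxvar(y, gamma):
    """MINMAXVAR distortion: 1 - (1 - y^(1/(1+gamma)))^(1+gamma), for y in [0, 1]."""
    return 1.0 - (1.0 - y ** (1.0 / (1.0 + gamma))) ** (1.0 + gamma)


def distorted_expectation(samples, gamma):
    """Approximate the integral of x dPsi(F_X(x)) from samples.

    Assumed discretization: sort the N samples and weight the k-th smallest
    by Psi(k/N) - Psi((k-1)/N), so the weights sum to one.
    """
    xs = sorted(samples)
    n = len(xs)
    total = 0.0
    prev = 0.0
    for k, x in enumerate(xs, start=1):
        cur = minmaxvar(k / n, gamma)
        total += x * (cur - prev)
        prev = cur
    return total


def bid_price(samples, gamma):
    """Bid price as the distorted expectation of the cash flow X."""
    return distorted_expectation(samples, gamma)


def ask_price(samples, gamma):
    """Ask price as minus the distorted expectation of -X."""
    return -distorted_expectation([-x for x in samples], gamma)


if __name__ == "__main__":
    cash_flows = [-1.0, 0.0, 2.0]  # hypothetical simulated cash flows
    for level in (0.05, 0.10, 0.15, 0.20, 0.25):
        print(level, bid_price(cash_flows, level), ask_price(cash_flows, level))
```

At $\gamma = 0$ the distortion is the identity and the estimate reduces to the sample mean; for $\gamma > 0$ the concave distortion puts more weight on the smallest outcomes, so the bid falls below the mean and the ask rises above it.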
At each risk level, cumulative returns are calculated to compare the performance of the five portfolios, namely the estimated-return portfolio, the realized-return portfolio, the mean-return portfolio, the positive-leveraged SPX, and the negative-leveraged SPX. The cumulative returns are graphed in Figure 3-2 to Figure 3-6, one figure per risk level. The cumulative returns of the estimated-return portfolio at the different risk levels are shown in Figure 3-7 to check the effect of the risk level on its performance. The statistics of the returns of each portfolio at every risk level are displayed in Tables 3.1 and 3.2. All returns are annualized before the analysis.

It is observed from the figures, and confirmed by the tables, that the estimated-return portfolio is superior to the two reference portfolios, and that its performance is even better than SPX at all risk levels except $\gamma = 0.05$. Furthermore, Table 3.2 shows that the return of the estimated-return portfolio is less volatile than that of SPX, which suggests that it is more efficient than SPX in the mean-variance sense.
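The leverage adjustment applied to SPX before this comparison can be sketched in a few lines of Python. The function names and the example weights are hypothetical; the sketch only restates the definitions of $l_{pos}$, $l_{neg}$ and the actual portfolio return $\sum_i \hat{w}_i f_i$ given earlier.

```python
def leverage_ratios(weights):
    """Return (l_pos, l_neg): the sums of the positive and of the negative weights."""
    l_pos = sum(w for w in weights if w > 0)
    l_neg = sum(w for w in weights if w < 0)
    return l_pos, l_neg


def leveraged_spx_returns(weights, spx_return):
    """Scale the SPX return by l_pos and by |l_neg| to match the portfolio's leverage."""
    l_pos, l_neg = leverage_ratios(weights)
    return l_pos * spx_return, abs(l_neg) * spx_return


def actual_portfolio_return(weights, actual_returns):
    """Actual return of an optimized portfolio: sum of weight times actual stock return."""
    return sum(w * f for w, f in zip(weights, actual_returns))


if __name__ == "__main__":
    w_hat = [0.8, -0.3, 0.5]   # hypothetical optimized weights summing to one
    f = [0.04, -0.02, 0.01]    # hypothetical actual stock returns
    print(actual_portfolio_return(w_hat, f))
    print(leveraged_spx_returns(w_hat, 0.02))
```

Because the weights sum to one while individual weights may be negative, $l_{pos}$ is at least one whenever short positions are present, which is why the unadjusted index is not a like-for-like benchmark.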
Figure 3-2: Cumulative returns of the five portfolios ($\gamma = 0.05$)

Figure 3-3: Cumulative returns of the five portfolios ($\gamma = 0.10$)

Figure 3-4: Cumulative returns of the five portfolios ($\gamma = 0.15$)

Figure 3-5: Cumulative returns of the five portfolios ($\gamma = 0.20$)

Figure 3-6: Cumulative returns of the five portfolios ($\gamma = 0.25$)

Figure 3-7: Estimated-return portfolio at different risk levels (AIX = MINMAXVAR)

                    ERP       RRP       MRP       SPX+      SPX-
$\gamma = 0.05$   -0.0176   -0.031    -0.0319   0.0156    0.015
$\gamma = 0.10$   -0.0175   -0.0873   -0.0824   0.0156    0.0174
$\gamma = 0.15$    0.0078   -0.2722   -0.2272   0.0156    0.0226
$\gamma = 0.20$    0.0264   -0.4262   -0.3671   0.0156    0.0193
$\gamma = 0.25$    0.0394   -0.3704   -0.3829   0.0156    0.0252

Table 3.1: Mean of portfolio return at different risk levels (January 1999 - October 2009)

                    ERP       RRP       MRP       SPX+      SPX-
$\gamma = 0.05$    0.2895    0.4373    0.4159   0.5964    0.5666
$\gamma = 0.10$    0.2740    0.9312    0.9438   0.5964    0.5406
$\gamma = 0.15$    0.2797    2.4156    2.5823   0.5964    0.4539
$\gamma = 0.20$    0.3155    3.4108    3.5483   0.5964    0.3989
$\gamma = 0.25$    0.3599    3.9118    3.9843   0.5964    0.3429

Table 3.2: Std. of portfolio return at different risk levels (January 1999 - October 2009)

3.4 Conclusion

A new method is proposed for the classic portfolio selection problem. Several new approaches are employed in this method: portfolios are constructed in a non-Gaussian environment; the FGC technique is employed to construct the complicated dependence structure; a new estimator for the expected return is used, which is expected to provide a better and more precise estimate; finally, new criteria, the distorted expectation and the acceptability index, are employed to formulate the optimization mathematically and to evaluate the portfolio performance.

Three kinds of portfolios are built with the same setting in the portfolio selection procedure, except for the estimator used for the expected return input. These portfolios are compared to two leveraged versions of SPX. The comparison is conducted at each of the five risk levels for the five portfolios.
We observed that the estimated-return portfolio outperforms the two reference portfolios, which show consistent losses. This indicates that the return estimated by the numeraire-portfolio method is an effective estimator of the asset's expected return, compared to the estimators based on historical data. Furthermore, this estimator may also serve as a good input to the portfolio optimization at higher acceptable levels ($\gamma$ near or above $0.20$), because the portfolio performs similarly to the market index at these levels.

Future work following this study can test the portfolio performance under various scenarios, including different optimization criteria (e.g., a utility function as the objective function) and a wider variety of portfolio components, such as options and bonds. The purpose is to find out whether, in more general scenarios, the numeraire-portfolio method provides an effective estimator of the expected return in portfolio selection.

Bibliography

[1] Artzner, P., F. Delbaen, J. Eber, and D. Heath (1999). Coherent Measures of Risk, Mathematical Finance, 9: 203–228.
[2] Bachelier, L. (1900). Théorie de la spéculation, Annales de l'Ecole Normale Supérieure, 17: 21–86. (English translation: Cootner, P., ed. 1964, The random character of stock market prices, MIT Press: Cambridge, MA).
[3] Barndorff-Nielsen, O. E. (1997). Normal inverse Gaussian distributions and stochastic volatility modelling, Scandinavian Journal of Statistics, 2: 41–68.
[4] Bartholdy, J. and P. Peare (2005). Estimation of expected return: CAPM vs. Fama and French, International Review of Financial Analysis, 14: 407–427.
[5] Biglova, A., S. Ortobelli, S. Rachev and S. Stoyanov (2004). Different approaches to risk estimation in portfolio theory, Journal of Portfolio Management, 31: 103–112.
[6] Black, F. (1995). Estimating expected return, Financial Analysts Journal, 51: 168–171.
[7] Black, F. and M.
Scholes (1973). The pricing of options and corporate liabilities, Journal of Political Economy, 81: 637–654.
[8] Breeden, D. (1979). An intertemporal asset pricing model with stochastic consumption and investment opportunities, Journal of Financial Economics, 7: 265–296.
[9] Bühlmann, H. and E. Platen (2002). A discrete time benchmark approach for finance and insurance, Working paper.
[10] Carr, P., H. Geman, and D. B. Madan (2001). Pricing and hedging in incomplete markets, Journal of Financial Economics, 62: 131–67.
[11] Carr, P., H. Geman, D. B. Madan, and M. Yor (2002). The fine structure of asset returns: An empirical investigation, Journal of Business, 75: 305–332.
[12] Carr, P., H. Geman, D. B. Madan, and M. Yor (2007). Self-decomposability and option pricing, Mathematical Finance, 17: 31–57.
[13] Carr, P. and D. B. Madan (1998). Option valuation using the fast Fourier transform, Journal of Computational Finance, 2: 61–73.
[14] Cherny, A. and D. B. Madan (2009). New measures for performance evaluation, Review of Financial Studies, 22: 2571–2606.
[15] Claus, J. and J. Thomas (2001). Equity premia as low as three percent? Evidence from analysts' earnings forecasts for domestic and international stock markets, Journal of Finance, 56: 1629–1666.
[16] Cochrane, J. H. (2001). Asset Pricing, Princeton University Press.
[17] Cont, R. (2001). Empirical properties of asset returns: stylized facts and statistical issues, Quantitative Finance, 1: 223–236.
[18] Cont, R., J.-P. Bouchaud, and M. Potter (1997). Scaling in stock market data: stable laws and beyond, in Scale Invariance and Beyond, Dubrulle, B., F. Graner, and D. Sornette, eds., Springer: Berlin.
[19] Cont, R. and P. Tankov (2004). Financial modelling with jump processes, Chapman & Hall/CRC.
[20] Cox, J. C. and S. A. Ross (1976). The valuation of options for alternative stochastic processes, Journal of Financial Economics, 3: 145–166.
[21] Delbaen, F. (2002).
Coherent risk measures on general probability spaces, in K. Sandmann and P. Schönbucher (eds.), Advances in Finance and Stochastics: Essays in Honor of Dieter Sondermann, Berlin: Springer, 1–37.
[22] Eberlein, E. and D. B. Madan (2010). The distribution of returns at longer horizons, Working paper.
[23] Eberlein, E. and F. Özkan (2003). Time consistency of Lévy models, Quantitative Finance, 3: 40–50.
[24] Eberlein, E. and K. Prause (2000). The generalized hyperbolic model: financial derivatives and risk measures, Mathematical Finance-Bachelier Congress.
[25] Edwards, E. O. and P. W. Bell (1961). The theory and measurement of business income, New York, John Wiley and Sons.
[26] Elton, E. J. and M. J. Gruber (1997). Modern portfolio theory, 1950 to date, Journal of Banking & Finance, 21: 1743–1759.
[27] Embrechts, P., F. Lindskog, and A. McNeil (2003). Modelling dependence with copulas and applications to risk management, in Rachev, S. (Eds.) Handbook of Heavy Tailed Distributions in Finance, Elsevier, Chapter 8: 329–384.
[28] Fama, E. F. and K. R. French (1993). Common risk factors in the returns on stocks and bonds, Journal of Financial Economics, 33: 3–56.
[29] Fama, E. F. and J. D. MacBeth (1974). Long-term growth in a short-term market, Journal of Finance, 29: 857–885.
[30] Gabaix, X., P. Gopikrishnan, V. Plerou, and H. E. Stanley (2003). A theory of power-law distributions in financial market fluctuations, Nature, 423: 267–270.
[31] Galloway, M. L. and C. A. Nolder (2007). Option pricing with selfsimilar additive processes, Working paper.
[32] Gopikrishnan, P., V. Plerou, L. A. Nunes Amaral, M. Meyer, and H. E. Stanley (1999). Scaling of the distribution of fluctuations of financial market indices, Physical Review E, 60: 5305–5316.
[33] Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators, Econometrica, 50: 1029–1054.
[34] Harrison, J. M. and D. M. Kreps (1979).
Martingales and arbitrage in multiperiod securities markets, Journal of Economic Theory, 20: 381–408.
[35] Ibbotson Associates (1995). Stocks, bonds, bills and inflation, Yearbook, Ibbotson Associates, Chicago, Ill.
[36] Jacod, J. and A. N. Shiryaev (1987). Limit theorems for stochastic processes, Berlin: Springer-Verlag.
[37] Keller, U. (1997). Realistic modelling of financial derivatives, Dissertation, Mathematische Fakultät der Albert-Ludwigs-Universität Freiburg im Breisgau.
[38] Kelly, J. R., Jr. (1956). A new interpretation of information rate, Bell System Technical Journal, 35: 917–926.
[39] Khanna, A. and D. B. Madan (2009). Non Gaussian models of dependence in returns, Working paper.
[40] Khintchine, A. Y. (1938). Limit laws of sums of independent random variables, ONTI, Moscow, Russia.
[41] Knight, F. B. (2001). On the path of an inert object impinged on one side by a Brownian particle, Probability Theory Related Fields, 121: 577–598.
[42] Lévy, P. (1937). Théorie de l'Addition des Variables Aléatoires, Paris: Gauthier-Villars.
[43] Li, D. X. (2000). On default correlation: a copula function approach, Journal of Fixed Income, 9: 43–54.
[44] Lintner, J. (1965). The valuation of risk assets and the selection of risky investments in stock portfolios and capital budgets, Review of Economics and Statistics, 47: 13–37.
[45] Long, J. B. (1990). The numeraire portfolio, Journal of Financial Economics, 26: 29–69.
[46] Lucas, R. E., Jr. (1978). Asset prices in an exchange economy, Econometrica, 46: 1429–1445.
[47] Madan, D. B. (2010). Pricing and hedging basket options to prespecified levels of acceptability, Quantitative Finance, 10: 607–615.
[48] Madan, D., P. Carr, and E. Chang (1998). The variance gamma process and option pricing, European Economic Review, 2: 79–105.
[49] Madan, D. and E. Seneta (1990). The variance gamma (VG) model for share market returns, Journal of Business, 63: 511–524.
[50] Malevergne, Y. and D. Sornette (2004).
Multivariate Weibull distributions for asset returns: I, Finance Letters, 2: 16–32.
[51] Malevergne, Y. and D. Sornette (2005). High-order moments and cumulants of multivariate Weibull asset return distributions: Analytical theory and empirical tests: II, Finance Letters, 3: 54–63.
[52] Mandelbrot, B. (1963). The variation of certain speculative prices, Journal of Business, 36: 394–419.
[53] Mandelbrot, B. (1997). Fractals and scaling in finance: discontinuity, concentration, risk: selecta volume E.
[54] Mantegna, R. N. and H. E. Stanley (1995). Scaling behaviour in the dynamics of an economic index, Nature, 376: 46–49.
[55] Markowitz, H. M. (1952). Portfolio selection, Journal of Finance, 7: 77–91.
[56] Markowitz, H. M. (1959). Portfolio selection: efficient diversification of investments, John Wiley & Sons, New York.
[57] Mehra, R. and E. C. Prescott (1985). The equity premium: a puzzle, Journal of Monetary Economics, 15: 145–161.
[58] Merton, R. C. (1973). Theory of rational option pricing, Bell Journal of Economics and Management Science, 4: 141–183.
[59] Merton, R. C. (1976). Option pricing when underlying stock returns are discontinuous, Journal of Financial Economics, 3: 125–144.
[60] Mossin, J. (1966). Equilibrium in a Capital Asset Market, Econometrica, 34: 768–783.
[61] Nelsen, R. B. (1999). An introduction to copulas, Springer-Verlag, New York.
[62] Ohlson, J. (1995). Earnings, book values and dividends in security valuation, Contemporary Accounting Research, 11: 661–687.
[63] Peters, E. E. (1999). Complexity, risk and financial markets, Wiley, New York.
[64] Philips, T. K. (2003). Estimating expected returns, Journal of Investing, 12: 49–57.
[65] Platen, E. and D. Heath (2006). A Benchmark Approach to Quantitative Finance, Springer Finance, Springer.
[66] Platen, E. (2009). A benchmark approach to investing and pricing, Working paper.
[67] Prause, K. (1999).
The generalized hyperbolic model: Estimation, financial derivatives, and risk measures, Doctoral Thesis, University of Freiburg.
[68] Preinreich, G. (1938). Annual survey of economic theory: The theory of depreciation, Econometrica, 6: 219–241.
[69] Roll, R. (1973). Evidence on the "Growth-Optimum" model, Journal of Finance, 28: 551–66.
[70] Roll, R. (1977). A critique of the asset pricing theory's tests Part I: On past and potential testability of the theory, Journal of Financial Economics, 4: 129–176.
[71] Rubinstein, M. (1976). The valuation of uncertain income streams and the pricing of options, Bell Journal of Economics, 7: 407–425.
[72] Sato, K. (1999). Lévy processes and infinitely divisible distributions, Cambridge: Cambridge University Press.
[73] Schoutens, W. (2001). The Meixner processes in finance, EURANDOM Report 2001–2001, EURANDOM, Eindhoven.
[74] Schoutens, W. (2003). Lévy processes in finance: pricing financial derivatives, John Wiley & Sons Inc.
[75] Sharpe, W. F. (1964). Capital asset prices: A theory of market equilibrium under conditions of risk, Journal of Finance, 19: 425–442.
[76] Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges, Publications de l'Institut de Statistique de l'Université de Paris, 8: 229–231.
[77] Vasicek, O. (1977). An equilibrium characterization of the term structure, Journal of Financial Economics, 5: 177–188.
[78] Walker, J. S. (1996). Fast Fourier Transforms, CRC Press, Boca Raton, Florida.
[79] Welch, I. (2000). Views of financial economists on the equity premium and on professional controversies, The Journal of Business, 73: 501–537.
[80] http://en.wikipedia.org/wiki/Generalized_method_of_moments
[81] http://www.fftw.org
[82] http://www.mscibarra.com
[83] http://wrds.wharton.upenn.edu