ABSTRACT Title of Document: CARBON CYCLE DATA ASSIMILATION USING A COUPLED ATMOSPHERE- VEGETATION MODEL AND THE LOCAL ENSEMBLE TRANSFORM KALMAN FILTER Ji Sun Kang, Doctor of Philosophy, 2009 Directed By: Professor Eugenia Kalnay Department of Atmospheric and Oceanic Science We develop and test new methodologies to best estimate CO2 fluxes on the Earth?s surface by assimilating observations of atmospheric CO2 concentration, using the Local Ensemble Transform Kalman Filter. We perform Observing System Simulation Experiments and assimilate simultaneously atmospheric observations and atmospheric carbon observations, but no surface fluxes of carbon. For the experiments, we modified an atmospheric general circulation model to transport atmospheric CO2 and coupled this model with a dynamical terrestrial carbon model and a simple physical land model. The state vector of the model prognostic variables was augmented by the diagnosed carbon fluxes CF, so that the carbon fluxes were updated by the background error covariance with other variables. We designed three types of analysis systems: a C-univariate system where CF errors are coupled only with CO2, a multivariate system where all the error covariances are coupled, and a one-way multivariate analysis where the wind is included in the carbon error covariance, but there is no feedback on the winds. With perfect model experiments, the one-way multivariate analysis has the best results in CO2 analysis. For the imperfect model experiments, we applied techniques of model bias correction and adaptive inflation. With those, we obtained a high-quality analysis of surface CO2 fluxes. Furthermore, the adaptive inflation technique also provides a good estimate of observation errors. A new approach in the multivariate data assimilation with ?variable localization?, where the error correlations between unrelated variables are zeroed-out further improved the multivariate analyses surface CO2 fluxes. We note that with the simultaneous assimilation of winds and carbon variables, we are able to transport atmospheric CO2 with winds as well as, for the first time, couple their error covariances. As a result, the multivariate systems perform well, and do not require any kind of a-priori information that should be pre-calculated by independent observations or model simulations. The many new techniques that we developed and tested put us on a solid basis to tackle the assimilation of real atmospheric and CO2 observations, a project being carried out collaboratively by Dr. Junjie Liu under the direction of Prof. Inez Fung at UC Berkeley. CARBON CYCLE DATA ASSIMILATION USING A COUPLED ATMOSPHERE- VEGETATION MODEL AND THE LOCAL ENSEMBLE TRANSFORM KALMAN FILTER By Ji Sun Kang Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park, in partial fulfillment of the requirements for the degree of Doctor of Philosophy 2009 Advisory Committee: Professor Eugenia Kalnay, Chair/Advisor Professor Ning Zeng, Co-Chair Professor Inez Fung Professor Brian Hunt Professor Kayo Ide Professor Ross Salawitch ? Copyright by Ji Sun Kang 2009 ii Acknowledgements Faculty, friends, and family members have helped me to complete this dissertation. I would like to express my gratitude to these individuals for their support and assistance. First of all, I would like to gratefully and sincerely thank my advisor, Prof. Eugenia Kalnay, for her guidance, understanding, and friendship during my graduate studies. She supported and encouraged me to be able to enjoy research. I have learned abundant things, which is beyond description, from her. Also, my thanks go to Prof. Ning Zeng who serves as co-chair of my dissertation and provides the VEGAS and SLand models for this study. I am grateful to Prof. Kayo Ide for invaluable discussion on my work. I truly appreciate Prof. Inez Fung who helped me understand the issue on the carbon cycle at the early stage of this work. I gladly express my gratitude to Prof. Brian Hunt and Prof. Ross Salawitch for insightful discussion. I thank to members in the ?Weather-Chaos Group?, especially Dr. Takemasa Miyoshi, David Kuhl, Steven Greybush. I also want to thank Dr. Junjie Liu for providing the source code of LETKF for the meteorological fields and cheering me up during my hard times. I am thankful to Dr. Hong Li and Prof. Shu-Chih Yang for helping me make steady progress toward the current point. I am indebted to Dr. Jin- Ho Yoon especially for kindly answering my questions about a carbon cycle modeling with VEGAS. Besides, I was also greatly inspired by the students that I met in the classes of our department. iii Finally, and most importantly, I want to thank my family. My parents, Nam- Hyun Kang and Hee-Ja Ahn, and my brother, Tae-Jun Kang, always trust and encourage me. I am sure that all I have done here would not have been possible without their support. I am blessed with a good family. iv Table of Contents Acknowledgements ....................................................................................................... ii Table of Contents ......................................................................................................... iv List of Tables ............................................................................................................... vi List of Figures ............................................................................................................. vii Chapter 1: Introduction ................................................................................................. 1 1.1. Problems in the estimation of surface CO2 fluxes ............................................. 5 1. 2. SPEEDY-C and SPEEDY-VEGAS ................................................................ 13 1.3. LETKF for carbon cycle data assimilation ...................................................... 14 1.3.1. Formulation of LETKF ............................................................................. 14 1.3.2. Carbon cycle data assimilation: multivariate vs. univariate analyses ....... 16 1.3.3. Analysis in a presence of model error: bias correction and adaptive inflation ............................................................................................................... 17 1.4. Outline of the thesis ......................................................................................... 19 Chapter 2: Carbon Cycle Data Assimilation in the Perfect Model Simulation Using SPEEDY-C ................................................................................................................. 20 2.1. Introduction ...................................................................................................... 20 2.2. Model: SPEEDY-C .......................................................................................... 21 2.3. Three types of data assimilation techniques .................................................... 23 2.3.1. Carbon-univariate data assimilation ......................................................... 23 2.3.2. One-way multivariate data assimilation .................................................... 24 2.3.3. Multivariate data assimilation ................................................................... 26 2.4. Experimental Design ........................................................................................ 27 2.4.1. ?ALL LEVELS? experiment .................................................................... 30 2.4.2. ?OCO + AIRS? experiment ...................................................................... 31 2.4.3. ?OCO? experiment.................................................................................... 32 2.5 Results ............................................................................................................... 32 2.5.1. Performance of SPEEDY-C ...................................................................... 32 2.5.2. Analysis of CO2 variables with LETKF ................................................... 34 2.5.2.1. ?ALL LEVELS? experiment ................................................................. 34 2.5.2.2. ?OCO + AIRS? experiment and ?OCO? experiment ............................ 42 2.6. Summary and discussions ................................................................................ 44 Chapter 3: Coupling SPEEDY-C to VEGAS with SLand .......................................... 46 3.1. Introduction ...................................................................................................... 46 3.2. Methods............................................................................................................ 48 3.2.1. Interface among atmosphere, vegetation, and land models ...................... 48 3.2.2. Additional boundary conditions ................................................................ 50 3.2.3. Adjustment of land-sea mask .................................................................... 50 3.2.4. Soil moisture and tropical rainfall over land ............................................. 51 3.3. Spin-up ............................................................................................................. 54 3.4. Results .............................................................................................................. 55 3.4.1. Offline Land-Vegetation spin-up run........................................................ 55 3.4.2. Fully coupled atmosphere-vegetation-land spin-up run ........................... 55 v 3.5. Summary .......................................................................................................... 62 Chapter 4: Imperfect Model Simulation: Bias Correction, Adaptive Inflation and Estimation of Observation Errors ............................................................................... 64 4.1. Introduction ...................................................................................................... 64 4.2. Experimental Design ........................................................................................ 65 4.3. Bias correction ................................................................................................. 66 4.3.1. Methodology ............................................................................................. 66 4.3.2. Results from the bias correction experiment ............................................ 69 4.4. Adaptive inflation ............................................................................................ 74 4.4.1. Methodology for variable having observation .......................................... 75 4.4.2. Adaptive inflation for a variable having no observation .......................... 79 4.4.3. Results from the adaptive inflation technique .......................................... 82 4.5. Summary .......................................................................................................... 92 Chapter 5: Application of Adaptive Inflation and Estimation of Observation Errors to the Perfect Model Simulation ..................................................................................... 94 5.1. Introduction ...................................................................................................... 94 5.2. Experimental Design ........................................................................................ 94 5.3. Results .............................................................................................................. 95 5.4. Summary and discussion................................................................................ 108 Chapter 6: New Approach for Multivariate Data Assimilation in Carbon Cycle Data Assimilation .............................................................................................................. 109 6.1. Introduction .................................................................................................... 109 6.2. New approach for multivariate data assimilation .......................................... 110 6.2.1. New 1-way multivariate data assimilation with variable localization .... 111 6.2.2. New multivariate data assimilation with variable localization ............... 111 6.3. Experimental Design ...................................................................................... 113 6.4. Results ............................................................................................................ 113 6.4.1. Perfect model simulation with variable localization ............................... 114 6.4.2. Imperfect model simulation with variable localization .......................... 117 6.5. Summary and discussion................................................................................ 127 Chapter 7: Summary and Lessons Learned ............................................................. 130 7.1. Development of SPEEDY-C and SPEEDY-VEGAS .................................... 130 7.2. C-univariate vs. multivariate data assimilation with a fixed inflation factor . 131 7.3. Bias correction, adaptive inflation and observation error estimation ............ 133 7.4. Variable localization in the multivariate data assimilation ............................ 134 7.5. Future plans .................................................................................................... 137 Bibliography ............................................................................................................. 141 vi List of Tables Table 4. 1. Estimated observation error standard deviations, using the OMB2 method (results after two months of analysis). ............................................................... 83 Table 4. 2. Estimated observation error standard deviations, using the AMB*OMB method. (results after two months of analysis) .................................................. 83 Table 5. 1. Estimated observation error standard deviations in C-univariate analysis result after two months of analysis). ................................................................ 103 Table 5. 2. Same as Table 5.1, except for one-way multivariate analysis (result after two months of analysis). .................................................................................. 103 Table 6. 1. Estimated observation error standard deviations in new one-way multivariate analysis ........................................................................................ 123 Table 6. 2. Same as Table 6.1, except for C-univariate data assimilation ............... 123 Table 6. 3. Same as Table 6.1, except for multivariate data assimilation ................ 123 Table 7. 1. Comparison of all methods: C-univariate (C-uni), Multivariate (Multi), Multivariate with a variable localization between winds and surface CO2 fluxes (Multi-L), 1-way multivariate (1way), 1-way multivariate with a variable localization (1way-L). W indicates the wind fields, and O includes temperature, specific humidity, and surface pressure, C is the variable of atmospheric CO2 and CF the surface CO2 fluxes. The contents of the table present the error information used for the analysis of each variable (W, O, C, CF) in each method (C-uni, Multi, Multi-L, 1way, 1way-L). .......................................................... 136 vii List of Figures Figure 1. 1. Time Series of atmospheric CO2 concentration: (Samiento and Gruber, 2002; Barnola, 1999, and Keeling et al., 2000) ................................................... 1 Figure 1. 2. Time series of Growth rate in atmospheric CO2 (red) for past 25 years, and increase rate of released CO2 by fossil fuel emission (yellow). Shaded green portion stands for the total surface CO2 uptake and blue vertical shading indicates El Nino years. ....................................................................................... 2 Figure 1. 3. FLUXNET network which measures the exchanges of CO2 between terrestrial ecosystems and the atmosphere at flux tower sites (http://www. fluxnet.ornl. gov/fluxnet/index.cfm). ................................................................... 6 Figure 1. 4. Sampling location of GLOBALVIEW- CO2 (http://www.esrl.noaa.gov/gmd/ ccgg/globalview/co2/co2_sites.html; CO2 concentration measurements near the surface) .................................................... 6 Figure 2. 1. Schematic plots of the background error covariance matrix for (a) C- univariate, (b) 1-way multivariate, (c) multivariate data assimilations. (C: atmospheric CO2, CF: surface CO2 fluxes)........................................................ 25 Figure 2. 2. (a) A true state of surface CO2 fluxes which includes only anthropogenic emission as a constant forcing with time, (b) Initial condition of surface CO2 fluxes .................................................................................................................. 28 Figure 2. 3. Distribution of rawinsonde observation network. .................................. 30 Figure 2. 4. Representative vertical averaging kernels for column CO2 soundings using near IR absorption of reflected sunlight in the 1.61-lm CO2 band (blue) and thermal IR emission near 14.3 lm (red). Thermal IR soundings are less sensitive to near-surface CO2 because of the small surface?atmosphere temperature contrast. (Crisp et al., 2004) ........................................................... 31 Figure 2. 5. Comparison of SPEEDY-C with NCAR CCM: annual mean of atmospheric CO2 concentration in the surface layer for third year in (a) SPEEDY-C, and (b) NCAR CCM, and the vertical cross section of zonal mean at the beginning of 3rd year simulation in (c) SPEEDY-C, and (d) NCAR CCM. The experiment starts from zero state of CO2 in the atmosphere with a fossil fuel emission of 6 PgC/yr. (unit: ppmv) ............................................................ 33 Figure 2. 6. RMS error of analysis from three types of data assimilation: uncoupled (green), multivariate (blue), and one-way multivariate (red) data assimilation for (a) U (m/s), (b) V (m/s), (c) T (K), (d) q (kg/kg), (e) atmospheric CO2 (ppmv) on the lowest layer of model, and (f) surface CO2 fluxes (10-8 kg/m2/s). .......... 35 Figure 2. 7. (a) True state of atmospheric CO2 on the lowest layer of model after two months from the start of analysis, and the resultant analysis of it from (b) uncoupled data assimilation under ALL LEVELS experiment, (c) multivariate data assimilation under ALL LEVELS experiment, (d) one-way multivariate data assimilation under ALL LEVELS experiment, (e) one-way multivariate data assimilation under OCO+AIRS experiment, and (f) one-way multivariate data assimilation under OCO experiment. The number in the left ?bottom of each figure is RMS error [unit: ppmv] ............................................................. 36 viii Figure 2. 8. (a) True state of surface CO2 fluxes after two months from the start of analysis, and the resultant analysis of it from (b) uncoupled data assimilation under ALL LEVELS experiment, (c) multivariate data assimilation under ALL LEVELS experiment, (d) one-way multivariate data assimilation under ALL LEVELS experiment, (e) one-way multivariate data assimilation under OCO+AIRS experiment, and (f) one-way multivariate data assimilation under OCO experiment. The number in the left ?bottom of each figure is RMS error [unit: 10-9kg/m2/s] .............................................................................................. 37 Figure 2. 9. RMS errors for the first 10 days of (a) CO2 concentration on the lowest layer and (b) surface CO2 fluxes from the C-univariate (uncoupled) data assimilation with 5% (red: control), 15% (green), and 30% (blue) of multiplicative inflation. ...................................................................................... 38 Figure 2. 10. Ensemble spread of atmospheric CO2 analysis on the lowest layer of model from (a) the uncoupled data assimilation, (b) the multivariate data assimilation, and (c) the one-way multivariate data assimilation, after two months of analysis under ALL LEVELS experiments [unit: ppmv] ................. 40 Figure 2. 11. Ensemble spread of CO2 fluxes analysis at the surface from (a) the uncoupled data assimilation, (b) the multivariate data assimilation, and (c) the one-way multivariate data assimilation, two months of analysis under ALL LEVELS experiments [unit: 10-9kg/m2/s] .......................................................... 41 Figure 3. 1. Schematic diagram of interface among the coupled components of atmosphere (SPEEDY-C), vegetation (VEGAS), and land surface (SLand). Variables in the interface are described in section 3.2.1. (Prec-precipitation, Tairs-temperature near the surface, qairs-specific humidity near the surface, VsE-wind speed near the surface, Rsnet-net shortwave radiation at the surface, Evap-evaporation, FTs-sensible heat, Ts-surface temperature at layer 1 (top layer), Swet-soil wetness, Ts2-soil temperature at layer 2, Runf-runoff, gf- growth factor, vegc-vegetation cover, Zrough-roughness length, ali-leaf area index, FSWds-downward shortwave radiation at the surface, NEPa-surface CO2 fluxes between atmosphere and land, vegcmc-annual mean of vegetation cover) ............................................................................................................................ 49 Figure 3. 2. Additional boundary forcing for VEGAS: (a) ice cover data with 1??1? resolution (Peltier, 1994), (b) a gradient of topography data with 1??1? resolution (GLOBE task team, 1999), (c) ice cover data interpolated to SPEEDY-grid T30 resolution, and (d) a gradient of topography interpolated to SPEEDY grid system. ........................................................................................ 51 Figure 3. 3. Annual mean of soil wetness in (a) VEGAS-SLand (LV) offline simulation forced by SPEEDY climatology, (b) the prescribed boundary condition used in the original version of SPEEDY. (unit: dimentionless) ........ 52 Figure 3. 4. Time series of global total of major variables in VEGAS during offline spinup simulation with SPEEDY climatology: (a) GPP (Gross Primary Productivity: black), NPP (Net Primary Productivity: green), and NEP (Net Ecosystem Productivity: yellow), (b) Cleaf (leaf carbon: black), Croot (root carbon: green), and Cwood (wood carbon: yellow), (c) Csfast (fast soil ix carbon:black), Csmed (intermediate soil carbon:green), and Csslow (slow soil carbon: yellow), (d) Cb (total biosphere carbon, i.e. soil carbon + vegetation carbon: black) , Cvege (vegetation carbon: green), and Csoil (soil carbon: yellow). Values are averaged annually. ............................................................ 56 Figure 3. 5. Annual mean fields of NEP (kg/m2/yr), NPP (kg/m2/yr), GPP (kg/m2/yr), Ra (autotrophic respiration, kg/m2/yr), Rh (heterotrophic respiration, kg/m2/yr), and Cb (kg/m2) for the last year of 600-year Land-Vegetation offline spin-up. 57 Figure 3. 6. Same as Figure 3.5 except for Cvege, Csoil, Cleaf, Csfast, Cwood, and Csslow (unit: kg/m2). ......................................................................................... 58 Figure 3. 7. Time series of monthly mean of global total (a) GPP, (b) NPP, and (c) NEP for last 20 years? fully coupled spin-up. .................................................... 60 Figure 3. 8. (a) Annual mean of precipitation from SPEEDY climatology (nine-year mean) and (b) last 10-year mean precipitation of coupled SPEEDY-VEGAS- SLand simulation for 30 years (unit: mm/d). ..................................................... 61 Figure 3. 9. Seasonal mean of NEP (surface carbon flux) in (a) DJF, (b) JJA, (c) MAM, and (d) SON of last year in the fully coupled atmosphere-vegetation- land spin-up. (positive: carbon sources, negative: carbon sinks) ....................... 62 Figure 4. 1. (a) True state of surface CO2 fluxes at the initial time step, (b) initial condition for surface CO2 fluxes. (unit: 10-9kg/m2/s). (positive: CO2 sources, negative: CO2 sinks) .......................................................................................... 67 Figure 4. 2. Schematic plot to describe the low-dimensional correction of model bias: blue arrow stands for nature (or reanalysis), and green arrows indicate every 6 hour forecast stating from the nature run. The departures of 6-hour forecasts from the nature run happen to be caused by the discrepancy between the forecast and the nature runs. The two-month averaged field of those departures is considered as the model bias, and hence it is subtracted from the background at every analysis step. ........................................................................................ 68 Figure 4. 3. Estimated model bias from the low-dimensional correction in (a) wind (m/s) (shading: divergence, unit: 10-7/s), (b) T(K), (c) q (g/kg) on the lowest layer, and (d) surface pressure (Pa), for two months of analysis period (model minus nature, positive: forecast overestimates, negative: forecast underestimates) .................................................................................................. 69 Figure 4. 4. Difference of the climatologies between the forecast model (SPEEDY-C) and the nature run (SPEEDY-C coupled with VEGAS-SLand); (a) soil moisture (mm), (b) evaporation (W/m2), and (c) precipitation (mm/d). (positive: the forecast has larger values than the nature run, negative: the forecast has less values than the nature) ....................................................................................... 71 Figure 4. 5. RMS errors of (a) U (m/s), (b) V (m/s), (c) T (K), (d) q (kg/kg), (e) atmospheric CO2 (ppmv) on the bottom layer, (f) surface CO2 fluxes (10-8 kg/m2/s) in the analysis. Red lines indicates the control run, and blue lines result from the bias correction. .......................................................................... 72 Figure 4. 6. The spatial distribution of errors in the analysis (analysis minus truth): zonal wind (m/s) (a) without bias correction, (b) with bias correction, atmospheric CO2 (ppmv) on the bottom layer (c) without bias correction, (d) x with bias correction, and surface CO2 fluxes (10-9 kg/m2/s) (e) without bias correction, and (f) with bias correction, after two month of data assimilation. . 73 Figure 4. 7. Spread of background ensemble in (a) zonal wind (U), (b) meridional wind (V), and (c) atmospheric CO2 concentration after three weeks of data assimilation under the experiment using a single inflation for the atmospheric CO2 at the lowest layer. ..................................................................................... 78 Figure 4. 8. RMS errors of (a) U, (b) V, (c) T, (d) q, (e) atmospheric CO2 at the level of ?=0.95, and (f) surface CO2 fluxes in the analysis. (blue: with bias correction and adaptive inflation with the OMB2 method, red: with bias correction and adaptive inflation with the AMB*OMB method, green: with bias correction, but no adaptive inflation) ......................................................................................... 84 Figure 4. 9. Same as Figure 4.8, except for surface CO2 flux fields. (unit: 10-9 kg/m2/s) .............................................................................................................. 85 Figure 4. 10. RMS error for two months of analysis: (a) global total, (b) Northern Hemisphere, (c) Southern Hemisphere, and (d) Tropics (20?S ~ 20?N). Yellow bar indicates CTRL experiment, green bar results from the bias correction experiment, blue bar is from the experiment of the bias correction plus adaptive inflation of the OMB2 method, and red bar presents the result of the bias correction plus adaptive inflation of the AMB*OMB method. (unit: m/s for U and V, K for T, g/kg for q, ppmv for atmospheric CO2 (C), 10-8kg/m2/s for surface CO2 fluxes (Cflx)) ................................................................................. 86 Figure 4. 11. Time series of resultant adaptive inflations ( 1?? ) through the OMB2 method for (a) meteological variables for all vertical levels, (b) atmospheric CO2 on the bottom layer (red: over land, blue: over ocean), and (c) atmospheric CO2 on upper levels ........................................................................................... 88 Figure 4. 12. Same as Figure 4.11, except for AMB*OMB method. ........................ 89 Figure 4. 13. Resultant adaptive inflation ( 1?? ) for surface CO2 fluxes for two months of analysis period, coupled with (a) the OMB2 method, and (b) the AMB*OMB method. ......................................................................................... 91 Figure 5. 1. RMS errors of (a) U (m/s), (b) V (m/s), (c) T (K), (d) q (kg/kg), (e) atmospheric CO2 concentration (ppmv) at the level of ?=0.95, and (f) surface CO2 fluxes (*10-8kg/m2/s). (blue: C-univariate analysis with adaptive inflation, red: one-way multivariate analysis with adaptive inflation) .............................. 96 Figure 5. 2. Atmospheric CO2 on the bottom layer: (a) True state, and analysis from (b) one-way multivariate, (c) C-univariate data assimilation with adaptive inflation, (d) one-way multivariate, (d) C-univariate data assimilation without adaptive inflation, after two months of analysis ................................................ 98 Figure 5. 3. Same as Figure 5.2, except for surface CO2 fluxes. ............................... 99 Figure 5. 4. Time series of estimated adaptive inflation for atmospheric CO2 on each vertical layer in (a) C-univariate data assimilation and (b) one-way multivariate data assimilation through OMB2 method. (lev1: ?=0.950, lev2: ?=0.835, lev3: ?=0.685, lev4: ?=0.510, lev5: ?=0.340, lev6: ?=0.200, lev7: ?=0.080) 101 Figure 5. 5. Time series of estimated adaptive inflation for meteorological variables for each vertical layer. ...................................................................................... 102 xi Figure 5. 6. Time series of estimated observation error for atmospheric CO2 in (a) C- univariate, (b) one-way multivariate data assimilation for every vertical level. .......................................................................................................................... 105 Figure 5. 7. Same as Figure 5.6, except for (a) zonal wind, (b) specific humidity, and (c) surface pressure .......................................................................................... 106 Figure 5. 8. Time series of adaptive inflation for surface CO2 fluxes in (a) C- univariate, (b) one-way multivariate data assimilation. ................................... 107 Figure 6. 1. Schematic plots of background error covariance matrix in (a) C- univariate, (b) previous one-way multivariate, (c) previous multivariate, (d) new one-way multivariate, and (e) new multivariate data assimilation. ................. 112 Figure 6. 2. RMS errors of (a) U (m/s), (b) V (m/s), (c) T (K), (d) q (kg/kg), (e) atmospheric CO2 on the lowest layer, (f) surface CO2 fluxes. Green indicates C- univariate data assimilation, red results from new one-way multivariate analysis, and blue from new multivariate analysis. ........................................................ 115 Figure 6. 3. surface CO2 fluxes: (a) True state, analysis from (b) the C-univariate, (c) the new one-way multivariate, and (d) the new multivariate data assimilation after two months of analysis. (unit: 10-9 kg/m2/s) ........................................... 116 Figure 6. 4. (a) True state of surface CO2 fluxes after two months of analysis, and the analysis of surface CO2 fluxes in (b) C-univariate , and (c) one-way multivariate data assimilation with the CO2 observation at every four grid point (6.3% coverage). ......................................................................................................... 118 Figure 6. 5. Atmospheric CO2 on the bottom layer: (a) truth, analysis in (b) C- univariate, (c) old one-way multivariate, (d) new one-way multivariate, (e) old multivariate, (f) new multivariate data assimilation. ....................................... 119 Figure 6. 6. Same as Figure 6.5, except for surface CO2 fluxes. ............................. 121 Figure 6. 7. Time series of resultant adaptive inflations ( 1?? ) for (a) meteorological variables for all vertical levels, (b) atmospheric CO2 on the bottom layer, and (c) atmospheric CO2 on upper levels, in the C-univariate data assimilation ......... 124 Figure 6. 8. Time series of resultant adaptive inflations ( 1?? ) for (a) atmospheric CO2 on the bottom layer, and (b) atmospheric CO2 on upper levels, in the new one-way multivariate analysis. ......................................................................... 125 Figure 6. 9. Same as Figure 6.7, except for the new multivariate data assimilation. .......................................................................................................................... 126 Figure 7. 1. Surface CO2 fluxes: (a) true state, (b) one-way multivariate data assimilation with an adaptive inflation of the OMB2 method but no bias correction for atmospheric CO2, and (c) same as (b) but with a bias correction for atmospheric CO2 using the low-dimensional method. ............................... 138 1 Chapter 1: Introduction The atmospheric CO2 concentration has increased from about 280 parts per million (ppm) in the beginning of industrial age to more than 380 ppm today (Figure 1.1). Since the released CO2 in the atmosphere traps long-wave radiation emitted from the Earth?s surface, the global surface temperature has increased as much as 0.6 ? 0.2?C during the last century (Houghton et al, 2001). Thus, an estimation of future atmospheric CO2 concentration has been highlighted as essential for the projection of the future climate. Figure 1. 1. Time Series of atmospheric CO2 concentration: (Samiento and Gruber, 2002; Barnola, 1999, and Keeling et al., 2000) From the comparison of growth rates between the fossil fuel emissions and atmospheric CO2 concentration, it has been found that only about half of the emitted CO2 remains in the atmosphere and the rest of it sequestered by the land and the Time series of atmospheric CO2 (ppm) 2 ocean (Sarmiento et al., 2002). Moreover, the increase rate of atmospheric CO2 has a significant variability on interannual timescales (Figure 1.2). That means the capacity of the land and ocean CO2 uptakes are varying substantially with time and are strongly connected with the climate. Figure 1. 2. Time series of Growth rate in atmospheric CO2 (red) for past 25 years, and increase rate of released CO2 by fossil fuel emission (yellow). Shaded green portion stands for the total surface CO2 uptake and blue vertical shading indicates El Nino years. This is a highly nonlinear problem which has complex interactions and feedbacks among all the components of land, ocean and atmosphere: the CO2 remaining in the atmosphere is determined by the uptake of the land and the ocean, the atmospheric CO2 concentration changes the global temperature due to its radiative properties, and the global warming caused by the increased CO2 can have an obvious influence on the capacity of CO2 uptake over the land and the ocean through biogeochemical processes. 3 So far, many studies have been done with in-situ measurements in order to understand the global carbon cycle. As one of those studies, Manning and Keeling (2006) estimated global oceanic and land biotic carbon sinks from the measurement of atmospheric O2/N2 ratio and CO2 concentration over the period of 1989-2003. Observations come from the three stations of Scripps Institution of Oceanography global flask sampling network, which have the longest records in the network. The O2/N2 ratio and CO2 concentration can characterize the fossil fuel combustion, terrestrial sinks and ocean uptake based on the knowledge in the chemical processes. This allows calculating how much of anthropogenic CO2 emission is sequestered by land and by ocean quantitatively. The resultant 10-year oceanic and land biotic sinks during each period of 1990-2000 and 1993-2003 show that the ocean uptake has relatively constant value around 2.0 PgC/yr whereas the land biotic carbon sink has much greater natural variability, from 1.2 PgC/yr to 0.5 PgC/yr. These values are global total estimates. In addition, there are studies to estimate the time-averaged CO2 fluxes over the ocean (Mikaloff Fletcher et al., 2007, Gruber et al., 2009). As a most recent work, Gruber et al. (2009) estimates the contemporary net air-sea CO2 flux using an inversion of interior ocean carbon observations using 10 ocean general circulation models (Mikaloff Fletcher et al., 2006, 2007). The spatial distribution of oceanic CO2 fluxes has been estimated reasonably for 23 oceanic regions: the outgassing in the tropics, uptake in midlatitudes, and relatively small fluxes in the high latitudes even though the uncertainty in the Southern Ocean is high. 4 On the other hand, the terrestrial carbon uptake still remains highly uncertain in terms of its spatial and temporal variations according to the climate. However, Tans et al. (1990) has highlighted a terrestrial CO2 uptake in the northern hemisphere to explain the north-south gradient of atmospheric CO2 concentration. Moreover, there is much research to emphasize understanding CO2 sequestration over the land (Bousquet et al., 2000; Gurney et al., 2002; DeFries et al., 2002; Rodenbeck et al., 2003; Friedlingstein et al., 2006) in order to figure out the interannual variability of atmospheric CO2 (Figure 1.2) and to project the potential reservoir of CO2 in the future. One of the most important issues on the carbon cycle is the temporal and the spatial pattern of CO2 sources and sinks at the Earth?s surface. It is necessary to see those patterns with a finer resolution enough to understand the interaction and feedback between the climate change and the biogeochemical processes. This is an essential question in order to understand the changes in surface CO2 fluxes under the current climate and to project the future climate. Thus, the purpose of this study is to explore the feasibility of estimating surface CO2 fluxes by assimilating remotely sensed atmospheric CO2 observations using one of the advanced data assimilation methods, the Local Ensemble Transform Kalman Filter (LETKF; Hunt et al., 2007). We investigate the analysis system to estimate surface CO2 fluxes and atmospheric CO2 concentration as well as atmospheric variables simultaneously on a fine temporal and spatial scale. Because this is the first test of a new methodology, this work is limited to simulated observations for an Observing System Simulation Experiment (OSSE). Only if the 5 results are promising in this simple approach, we will start working with real observations. 1.1. Problems in the estimation of surface CO2 fluxes There is a direct observation network of surface CO2 fluxes, FLUXNET (Figure 1.3), a global collection of micro-meteorological flux measurement sites. The flux tower sites measure the exchanges of carbon dioxide between terrestrial ecosystems and the atmosphere. However, the observed fluxes are representative of areas ranging from square meters to square kilometers. Besides, there are no standard methods for aggregating flux data into various temporal scales (daily, monthly, or annual time periods) so that there are various methods used at each site for temporal scaling (http://www.fluxnet.ornl.gov/fluxnet/index.cfm). For these reasons, this dataset was unlikely to be directly used for the global analysis of surface CO2 fluxes with an atmospheric global circulation model, although it could provide good information for the validation of the resultant analysis. Recently, Stockli et al. (2008) attempted to use this dataset to a land-surface model CLM3.5 which has been coupled with NCAR CAM3.5 model. Data from 15 sites are used to improve the model, and this study suggests that FLUXNET is a valuable tool to develop and validate land surface model. In addition, atmospheric CO2 concentrations have not been observed densely enough to estimate the global distribution of CO2 sources and sinks. In situ measurements of CO2 concentrations at ground stations have been used to monitor the carbon dioxide in the atmosphere (GLOBALVIEW- CO2 data from ESRL/NOAA; 6 Figure 1. 3. FLUXNET network which measures the exchanges of CO2 between terrestrial ecosystems and the atmosphere at flux tower sites (http://www. fluxnet.ornl. gov/fluxnet/index.cfm). Figure 1. 4. Sampling location of GLOBALVIEW- CO2 (http://www.esrl.noaa.gov/gmd/ ccgg/globalview/co2/co2_sites.html; CO2 concentration measurements near the surface) 7 Figure 1.4). The network of surface stations provides high precision information about variations of CO2 fluxes in the global scale. This dataset has been used for a number of previous studies on the carbon cycle and hence it has contributed to extend our understanding on this field. However, the observations are spatially too sparse and temporally heterogeneous for representing the regional variations of surface CO2 fluxes and understanding surface CO2 sinks and sources in finer scales. Recently, more CO2 estimations have become available through remote sensing measurements such as the Atmospheric Infrared Sounder (AIRS; Chahine et al., 2008), the Scanning Imaging Absorption Spectrometer for Atmospheric Chartography (SCIAMACHY; Buchwitz et al., 2005), the CO2-dedicated Orbiting Carbon Observatory (OCO; Crisp et al., 2004) and the Greenhouse gases Observing Satellite (GOSAT; Maksyutov et al., 2008, Hamazaki, 2008). Both OCO and GOSAT were designed specifically to estimate the total column of CO2 mixing ratio which has a high sensitivity to the CO2 near the surface. Unfortunately, the launch of OCO failed, while GOSAT was successfully launched and should start distributing data in mid 2009. These observations are expected to provide valuable information on global CO2, much more comprehensive than the measurements (such as ground- based flask data) available so far. In early studies on CO2 sources and sinks, atmospheric inversion methods have been applied using an atmospheric transport model. The point-based measurement of ESRL/NOAA (GLOBALVIEW- CO2) has been used for inferring surface CO2 fluxes within a Bayesian framework (Gurney et al., 2004; R?denbeck et al., 2003). The atmospheric transport model uses wind fields from a given reanalysis, 8 so that the impact of possible wind errors in the CO2 transport is effectively not considered. Since the available spatial coverage of concentration data is not enough to constrain the flux estimates, a-priori information about the fluxes has been pre- calculated from other sources of information, such as independent measurements or model simulations. Then, the system determines the surface CO2 fields minimizing the difference between modeled and observed concentrations and between predicted fluxes and their prior estimates. With this optimization, it is able to reduce the uncertainties in the surface CO2 flux fields from the a-priori estimates. The spatial and the temporal resolution of the resulting estimates are still limited, due to the ill- posedness of the problem and to the limited number of available measurements (Gurney et al., 2004). In an alternative approach to inversion methods, data assimilation techniques have recently been used to optimize the use of the observations for the analysis of surface CO2 fluxes on the globe. Two advanced methods for data assimilation, 4- dimensional variational data assimilation (4D-Var) and Ensemble Kalman Filter (EnKF) are being used or considered for use in operational numerical weather prediction centers. Both of them have been considered for the estimation of carbon surface fluxes, and this thesis is devoted to the use of a particularly efficient EnKF method, the LETKF. As described below, the applications of data assimilation so far have been ?univariate?, i.e., they perform the analysis of only carbon variables and assume the winds are given from an independent analysis. By contrast, in our approach we propose to perform ?multivariate? data assimilation, assimilating simultaneously both the carbon variables and the standard atmospheric variables 9 including winds, an approach that we will show allows estimating wind uncertainties and ?errors of the day? and therefore improves the carbon variables analysis. Peters et al. (2005) applied an ensemble Kalman filter technique (EnSRF; Whitaker and Hamill, 2002) for estimating weekly CO2 fluxes on the surface. They assimilated ESRL/NOAA GLOBALVIEW-CO2 data, and used the forward integration of TM5 chemistry transport model for the CO2 forecast. The transport model was ran offline (univariately) with the meteorological fields from the European Centre for Medium Range Weather Forecast (ECMWF) model. The state vector of the analysis contains surface CO2 fluxes at multiple time steps since observed CO2 concentration variations contain a history of sources and sinks. Unknown surface fluxes are optimized with atmospheric observations, linked together through the atmospheric transport model, which is the observation operator within an ensemble- based data assimilation system. That is, the Kalman gain matrix determines the surface CO2 flux fields to minimize the combination of the observation errors in the surface flask measurements and the model errors. Here, the model errors are assumed to be caused by the errors in the surface CO2 flux fields, which force the transport model, while the system assumes a ?perfect? transport. Then, the Kalman gain matrix is used to update only the mean state vector of surface CO2 fluxes. Instead of taking account of uncertainties from the ensemble background, the covariance structure of surface CO2 fluxes is prescribed as a 3D variational technique. Thus, their method could not take the advantage of ensemble forecast of estimating ?errors of the day? with respect to the surface CO2 fluxes. This means that their approach misses an important advantage of ensemble Kalman filter techniques. Although they update a 10 prior estimate from the analysis, the system was initialized by a flux estimate from the CASA biosphere model (Randerson et al., 1997) over the land and the Takahashi et al. (2002) ocean fluxes. This work was the first trial to use an ensemble based data assimilation technique for estimating surface CO2 fluxes, and the results provided satisfactory flux estimates for the relatively large regional scales resolved by the surface flask measurement including aircraft measurements. Another study with the Ensemble Transform Kalman filter technique (ETKF: Bishop et al., 2001) has been examined by Feng et al. (2008). This uses a chemistry transport model and scene-dependent averaging kernels of the OCO measurements (Crisp et al., 2004) as an observation operator to assimilate the 8-day mean CO2 observation from the OCO. This is an Observing System Simulation Experiment (OSSE) so that it assimilates simulated observations. They made the state vector consist of the surface CO2 fluxes of 144 regions on the globe at the current assimilation time step but also at the previous 11 time steps, similar to the approach taken by Peters et al. (2005). The basic concept of optimization is also like that in Peters et al. (2005), calculating the adjustment to the background based on the difference between model and observations and their uncertainties within the ensemble-based data assimilation framework. However, the analysis uncertainties have been estimated by the ensemble Transform Kalman Filter algorithm, which is different from Peters et al. (2005) that used prescribed analysis uncertainties. In this experiment, they start from a-priori fluxes estimated as having similar distribution but values 80% higher than the ?true value?, and could reduce uncertainties in the 11 flux estimates by 20-70% on the 144 regions overall, compared to the a priori uncertainties. On the other hand, Baker et al. (2006) developed a four dimensional variational data assimilation approach, 4D-Var, to address the problem of surface CO2 flux estimation, and also tested it through the use of OSSEs. They optimized time-varying boundary values of surface CO2 fluxes over a long measurement span by the adjoint-based iterative descent method. The cost function was defined by the combination of a difference between the simulated CO2 concentration and the observed CO2 over the analysis window, a-priori errors of atmospheric CO2 and CO2 fluxes, and dynamic constraint between the CO2 concentration and the flux. As a result, they could correct the errors in a priori estimate well when the observation errors and a priori flux covariance are assumed to be known perfectly, and the transport model errors are ignored. From experiments they performed to test the sensitivity to the observation error and a prior constraint on the fluxes, they concluded that the accuracy of ?bottom-up? transport model is important even where there are dense observations of CO2. Recently, the European GEMS (Global Monitoring for Environment and Security) project has been building a comprehensive monitoring and forecasting system for atmospheric composition on both global and regional scales (Hollingsworth et al., 2008). For the data assimilation of carbon cycle in this project, it introduced a two-step process such as the atmospheric analysis first and then a flux inversion based on 4D-Var approach. 12 Since the observation density of atmospheric CO2 concentration should increase with upcoming satellite data in addition to increasing surface measurements, it is not computationally feasible to use the direct inversion modeling approach that has contributed to the understanding CO2 sources and sinks on the global scale at the earlier stage (Gurney et al. 2004; R?denbeck et al., 2003). As the next promising methods, either ensemble Kalman filter or variational data assimilation approaches should be the technique of choice due to the computational efficiency in addition to many other advantages. So far, previous studies from both advanced data assimilation methods have used a-priori estimates of surface CO2 fluxes, and none of them deals with the transport error of atmospheric CO2 concentration forced by surface CO2 fluxes. For our work, we introduced a new technique for the carbon cycle data assimilation using the Local Ensemble Transform Kalman Filter (Hunt et al., 2007), in a way that does not need transport inversion or a-priori information. In our approach, we assimilate simultaneously all the atmospheric variables and CO2 variables instead of using the reanalysis data of atmospheric variables when assimilating CO2. From the background state of ensembles, we can deal with ?errors of the day? and further allow the error covariance among the dynamical variables to reflect the distribution of uncertainties caused by each of variables. The details of the new method are described in Section 1.3. This work is a part of the project on ?Carbon Data Assimilation with a Coupled Ensemble Kalman Filter? supported by the Climate Change Prediction Program in Department of Energy. The objective of the project is to estimate surface 13 CO2 fluxes by assimilating atmospheric CO2 observation from space, and the research has been organized into two components: one is a simulation (OSSE) approach to develop and to test various new approaches in data assimilation using the LETKF coupled to a small primitive equations global atmospheric model, and the other is an application of methodologies tested by the simulation approach coupling the LETKF to a higher resolution Community Atmospheric Model (CAM) and with real, not simulated observations. The work of this thesis is devoted to the simulation component in this project. Thus, all the experiments here are Observing System Simulation Experiments (OSSEs). Dr. Junjie Liu is carrying out the real model, CAM3.5, real observations component, AIRS and GOSAT, of the project under the direction of Prof. Inez Fung (UC Berkeley). 1. 2. SPEEDY-C and SPEEDY-VEGAS In the OSSEs, there should be a long model integration, known as a ?nature run? assumed to be the ?truth?. From the nature run, we can simulate the observations which will be assimilated in the analysis. On the other hand, we need to make forecasts to create the background for the analysis. The same model can be used for both the nature run and the forecast model, or it is possible to use a different model for the forecast from that for the nature run. If we use the same model for both the nature run and the forecast, and the forecast starts from the perturbed initial conditions that are not the same as the nature run, then the departure of the forecast from the nature run can be attributed as coming only from the initial conditions. On the other hand, when we use a different model for the forecast from that for the nature 14 run, model errors should be considered as well in addition to the errors caused by initial condition. These experimental designs provide a very good tool to assess the performance of a new data assimilation method because in an OSSE, unlike in real life, we know the truth and we can control the errors. In this study, we modified an intermediate-complexity atmospheric general circulation model, SPEEDY (Molteni, 2003), to simulate atmospheric CO2 concentration with a given forcing of surface CO2 fluxes. For the perfect model experiments, the modified SPEEDY is used for both the nature run and the ensemble forecast. Then, we coupled a dynamic terrestrial carbon model, VEGAS (Zeng et al., 2005), and a physical land surface model, SLand (Zeng et al., 2000) to the SPEEDY with atmospheric CO2 prognostic. The coupled atmosphere-vegetation-land model is used for the nature run while the modified version of SPEEDY continues to make the ensemble forecasts in the imperfect model experiment. The forecast model does not make any changes in surface CO2 fluxes since the surface CO2 fluxes are the forcing term constant with time. The nature run, however, calculates the surface CO2 fluxes every six hours through the interaction among the atmosphere, the land, and the vegetation. Thus, the changes in the surface CO2 flux analysis only come from the data assimilation, not from the forecast model. 1.3. LETKF for carbon cycle data assimilation 1.3.1. Formulation of LETKF LETKF (Hunt, 2005, Hunt et al., 2007) is an advanced ensemble Kalman filter data assimilation scheme. It is a square-root ensemble filter in which the observations 15 are assimilated simultaneously to update the ensemble mean while the ensemble perturbations are updated by transforming the forecast perturbations through a transform matrix term as in Bishop et al. (2001). The analysis is done independently at every grid point using observations from a local region, so this scheme is expected to be efficient for parallel computing systems. y[g]b(i) = H[g]x[g]b(i) , i=1, ? , k (1.1) { }bgibgbg ][)( ][][ yyY ?= (1.2) { }bgibgbg ][)( ][][ xxX ?= (1.3) Here, b(i)[g]x is the i-th member of ensemble forecast, [g]H the observation operator, and b(i)[g]y the i-th member of background observation ensemble. Subscript [g] indicates that the values are estimated globally and the bars above the vectors represent the mean of ensembles. Let the number of ensemble forecast be k, the number of observations l, the dimension of state vector m. First, the analysis system calculates the ensemble forecast on the observation locations using H, the global observation operator (Equation 1.1), and then computes the observation increment for every ensemble member, b[g]Y (Equation 1.2). On the other hand, the deviation of each ensemble forecast from their mean is calculated (Equation 1.3). These processes are done globally initially before going to the computation for each local patch. Now, the analysis mean state ( xa ) and the analysis error covariance ( aX ) are calculated by the ensemble forecast and the observation located within each of the local patch (Equation 1.4-1.6). )(~ bobba yyKXxx ?+= (1.4) 16 111 )(])1()[(~ ??? ?+= RYIYRYK TbbTb k (1.5) 2 1 ]~)1[( aba k PXX ?= (1.6) [ ] 11 )1()(~ ?? ?+= IYRYP kbba (1.7) Here, K~ is the Kalman gain matrix in ensemble space, and R is the observation error covariance, and aP~ is the analysis error covariance matrix in the ensemble space. This is the system to analyze the state vector x which contains normally the meteorological variables such as wind, temperature, humidity and surface pressure. 1.3.2. Carbon cycle data assimilation: multivariate vs. univariate analyses The state vector in the analysis is augmented by adding the surface CO2 fluxes, which are then updated through the background error covariance, an approach similar to parameter estimation (Baek et al., 2006). In the formulation of LETKF, background error of surface CO2 fluxes are not involved explicitly in calculating the Kalman gain matrix, K~ , since they are not observed (Equation 1.5). But, background errors of surface CO2 fluxes result in background error of atmospheric CO2 concentration. That is, the background errors of atmospheric CO2 concentration are partially a result of the errors in surface CO2 forcing. Thus, K~ is determined in a way to minimize the errors of other dynamic variables including the atmospheric CO2 as well as the CO2 flux error. According to choices of variables which are included into the state vector, xb , with surface CO2 fluxes, various ways to estimate surface CO2 fluxes are possible. When we make an analysis state vector with only the atmospheric CO2 concentration 17 in addition to the surface CO2 fluxes, the Kalman gain is calculated by only assimilating atmospheric CO2 observations univariately. As mentioned before, the errors of surface CO2 fluxes implicitly affect the computation of the Kalman gain matrix. Then, the surface CO2 fluxes are updated by multiplying a background error covariance matrix of atmospheric CO2 and surface CO2 fluxes to the Kalman gain matrix (Equation 1.4). If the analysis state vector is designed to also include the wind fields, in addition to atmospheric CO2 and surface CO2 fluxes, then the analysis can reflect the background error covariance among those variables to estimate surface CO2 fluxes multivariately. In our analysis system, atmospheric variables are assimilated simultaneously using the simulated rawinsonde observations as a version of LETKF used in Liu (2007). In order to see how the background error covariance of the atmospheric variables with the surface CO2 fluxes effects on the analysis of surface CO2 fluxes, we designed various data assimilation techniques into the LETKF framework and those are introduced in Chapters 2 and 6. 1.3.3. Analysis in a presence of model error: bias correction and adaptive inflation Since there is no model to represent the true state of atmospheric conditions perfectly, it is necessary to deal with the errors of forecast model in reality. The model bias is due to the discrepancies of a forecast model such as a coarse resolution, imperfect parameterization, etc. This study applies the low-dimensional method introduced by Danforth et al. (2007) to the bias correction of the atmospheric variables. 18 In practice, the ensemble forecast tends to underestimate the uncertainty in its state estimate because of model errors and nonlinearities. This leads to the underestimation of the uncertainties from the forecasts. As a result, the analysis overfits the background state estimate and gives too little weight to the observations. This inconsistency becomes larger over time, so that the information of observations is less and less used by the analysis and, eventually, it leads to an analysis that has little relationship with the observations, known as ?filter divergence?. Thus, it is necessary to inflate the background covariance (or the analysis covariance) during each data assimilation cycle to increase a model error covariance. For the covariance inflation, multiplicative inflation has been applied in this work (Anderson and Anderson, 1999). It is carried out by multiplying the background perturbation from the ensemble mean by a constant factor larger than one. It is common to tune the inflation parameter manually in order to decide a reasonable value for the analysis system. But this tuning is expensive, and becomes further infeasible if the inflation factor should depend on the region or variable. Thus, Li et al. (2009) introduced a method for adaptive inflation estimation. Since the estimation of adaptive inflation is dependent on the observation errors, the paper also estimates the observation error simultaneously. The methodology presented in this paper has been found to be essential to the carbon cycle data assimilation because it is necessary to deal with the inflation for the atmospheric CO2 separately from the meteorological variables. Furthermore, we also consider the inflation for the surface CO2 fluxes which do not have observations. The adaptive inflation of Li?s paper is connected to the 19 existence of observations so it cannot be applied to estimate the inflation for the surface CO2 flux forecast without observations. Thus, we applied a simple inflation method to surface CO2 fluxes. It is basically to make the ensemble spread of analysis correspond to the background ensemble spread at each analysis time (similar to Zhang et al., 2004). 1.4. Outline of the thesis Chapter 2 has the description of the model which we modified for this study and the three types of data assimilations for the carbon cycle in the LETKF framework. To test the performance of a new data assimilation system, the experiments are first done under the simple scenario given by the perfect model assumption. Next, Chapter 3 shows how a coupled atmosphere-vegetation-land model was constructed to estimate the time-varying surface CO2 fluxes over the land. In Chapter 4, the imperfect model experiments are carried out with a method for bias correction, and an adaptive inflation and observation error estimation. The advanced adaptive inflation techniques are also applied to the perfect model simulation in Chapter 5. From the findings in Chapter 5, we introduce a new multivariate data assimilation system in Chapter 6 and see the effect of ?variable localization? in the LETKF data assimilation. Chapter 7 has a summary and lessons we learned, and discusses some future research directions. 20 Chapter 2: Carbon Cycle Data Assimilation in the Perfect Model Simulation Using SPEEDY-C 2.1. Introduction In reality, no forecast model is good enough to completely ignore model error, and we will have to address this serious issue, especially for the carbon cycle. But, in this chapter, we want to address the pure performance of a new analysis system for CO2 variables with no model error or bias. To do this, we run ?identical twin? Observing System Simulation Experiments (OSSEs) using a single model with CO2. One run, called the ?nature run?, serves as the ?truth? for the experiment. Since we will use the same model for the truth and for the forecast, there is no model error. A second run, using an ensemble data assimilation system, can then be compared to the truth. Thus, we build one forecast system for CO2 and use it to create nature run as well as to run the ensemble forecast for a data assimilation so that it allows us to avoid the effects of model error for the moment. In order to simulate the CO2 concentration in the atmosphere, we modified an intermediate-complexity atmospheric general circulation model, SPEEDY (Molteni, 2003). Next, we investigated a new analysis system for the carbon cycle and tested it under the perfect model simulation to assess the performance clearly in the absence of model error. 21 Section 2.2 introduces the model we chose and a detailed description of modification we have done. Section 2.3 introduces the three types of data assimilation we have tested in LETKF framework. Section 2.4 describes the experimental design. The results are shown in Section 2.5. Finally, there is summary of Chapter 2 in Section 2.6. 2.2. Model: SPEEDY-C The SPEEDY model (Molteni, 2003) is a global atmospheric, primitive equations general circulation model (AGCM). Its simplified physical parameterization schemes are computationally efficient, but maintain the basic characteristics of a state-of-the-art AGCM with complex physics. The version used for this study has triangular truncation T30 (corresponding to about 400 km horizontal resolution) with 7 sigma levels. The original version of SPEEDY has five dynamical variables: zonal (U) and meridional (V) wind components, temperature (T), specific humidity (q), and surface pressure (Ps). To use the model for this study, we added two variables: one is atmospheric carbon dioxide (CO2) which is treated as a tracer, so that it is affected only by the two processes of advection and diffusion, and the other is a surface flux of carbon dioxide (CF) which is a source and sink of the atmospheric carbon. Basically, CF is not changed in the model and only plays the role of forcing the atmospheric CO2. Later, it will be updated only by the analysis step of data assimilation. Chemical processes for the atmospheric carbon dioxide have been ignored since CO2 is one of the inert gases in the atmosphere. Moreover, there is no 22 feedback between the integrated CO2 and radiative properties. Thus, the model reads the surface CO2 fluxes as a forcing and allows it to be transported and mixed (Equation 2.1). From now on, the SPEEDY model that contains these carbon-related variables will be referred to as ?SPEEDY-C?. CFCOCO 22 =?+?? )(t (2.1) Equation (2.1) shows the way to calculate the tendency of atmospheric CO2 in SPEEDY-C, where )( 2CO? represents the atmospheric 3-dimensional transport and mixing, and the forcing term, CF, on the right-hand side of Equation (2.1) indicates the surface fluxes of CO2. In reality, the forcing should include fossil fuel emission, land surface fluxes, ocean fluxes, and fluxes due to land use changes. In this chapter, since we are testing the ability of data assimilation to estimate surface CO2 fluxes, we choose a very simple scenario: the source of surface CO2 fluxes is only caused by fossil fuel emissions, which we assume to be constant in time (Andres et al., 1996). Due to a problem with SPEEDY dynamics, based on a spectral discretization, the total amount of atmospheric CO2 is not conserved exactly by atmospheric transports, and there is a small but significant sink of CO2 concentrated in the Southern Hemisphere stratosphere. Since lack of conservation is a well known generic problem, especially for spectral models, and it is desirable to conserve total CO2, we opted for making a simple correction. After the model reads the surface CO2 flux fields, they are converted to the atmospheric CO2 and then transported and mixed in the atmosphere. Thus, we could calculate how much the total amount of atmospheric CO2 should be with the given forcing of surface CO2 flux fields. We also computed the actual amount of global atmospheric CO2 in the SPEEDY-C 23 simulation. From this, we could estimate the ratio of the total amount of atmospheric CO2 which the model should have to what the model actually has. By multiplying the atmospheric CO2 by this ratio at every grid point and every time step, we can get an increase of simulated atmospheric CO2 concentration with a given forcing. Although this is not an ideal correction, it maintains conservation of total CO2, but with a small global redistribution. 2.3. Three types of data assimilation techniques So far, estimations of surface fluxes of carbon have been made univariately, with inversion methods or with data assimilation systems that assume that the wind is given by a reanalysis, and do not couple CO2 and wind errors (Chapter 1). Here we will compare such univariate approach with a multivariate approach in which the estimated errors of CO2 are coupled with the other atmospheric estimated errors within the background error covariance. As far as we know, this is the first time that this has been done, even in simulation mode. Since the CO2 errors estimated in the EnKF data assimilation may have large sampling errors, we found it desirable to create a ?one-way? multivariate system in which CO2 errors do not provide feedback to the winds. 2.3.1. Carbon-univariate data assimilation For the carbon-univariate (C-univariate) data assimilation, atmospheric CO2 concentration and surface CO2 fluxes are updated only by these two variables, and are not affected by other atmospheric variables. That is, there are two separate 24 analysis systems: one for the atmospheric variables, and the other for the CO2-related variables. These two systems never talk to each other during the analysis (similar to the univariate CO2 assimilation that assumes winds are given by another reanalysis). The system for the atmospheric variables has dynamic variables (state vector) Ps)q,T,V,(U,x =1 , while the one for the CO2 variables has CF),(COx 2=2 as a state vector in the analysis cycle (Equation 1.1-1.6.). Figure 2.1 is a schematic plot to show the background error covariance matrices used for those analyses. Diagonal components of those matrices indicate the error variance of each variable while the off-diagonal components are the correlation between the variables. Black boxes indicate that there is no correlation allowed between the variables. From this, the pink box of Figure 2.1(a) allows only the background error covariance between atmospheric CO2 and surface CO2 fluxes to produce their analyses while the errors of all atmospheric variables are coupled in a green box of the plot. As indicated above, this approach is similar to the ?carbon-univariate? approaches that have been used so far to perform carbon data assimilation (Peters et al., 2005; Baker et al., 2006; Feng et al., 2008) or inversions (Bousquet et al. 2000; Gurney et al. 2004; R?denbeck et al., 2003). 2.3.2. One-way multivariate data assimilation Next, we consider a one-way multivariate data assimilation in which the atmospheric CO2 concentration and surface CO2 fluxes are updated by these two carbon variables as well as the wind fields, while the wind field in addition to other atmospheric variables such as temperature, specific humidity and surface pressure is 25 CF C U V T q Ps Ps q T V U C C F Update U, V, T, q, Ps, C, CF Ps q T V U C C F CF C U V T q Ps Update U, V, T, q, Ps Update C & CF (a) (b) (c) CF C U V T q PsC F C U V T q Ps Ps q T V U C C F Update C & CF Figure 2. 1. Schematic plots of the background error covariance matrix for (a) C- univariate, (b) 1-way multivariate, (c) multivariate data assimilations. (C: atmospheric CO2, CF: surface CO2 fluxes) 26 not affected by these two carbon-related variables. As indicated above, this is done to minimize spurious feedback due to sampling errors in the estimation of CO2 errors. This method also has two analysis systems, and the system for the atmospheric variables is exactly same as one in the C-univariate data assimilation. For the CO2 variables, however, we made the state vector of CF),COV,(U,x 22 = to allow the flow-dependent errors estimated for the wind fields to provide feedback to the CO2 variables. However, the wind field from the analysis with 2x is discarded and we update only CO2 and CF from the system of 2x . That is, the pink box of Figure 2.1(b) includes the background error of wind but we only save the analyses of CO2 and CF from the pink box. The wind fields are updated by the green box as in the C-univariate analysis. Thus, information from the wind field is given to the carbon-related variables but the information from CO2 variables does not cause any change in the analysis of the wind field as well as other atmospheric variables. This method was designed because the atmospheric CO2 is transported and diffused by the wind field, but is not influenced by the other atmospheric variables in the forecast model, and at the same time it prevents sampling errors in the CO2 estimation from contaminating the winds. 2.3.3. Multivariate data assimilation In this method (which is the standard approach that would be taken in EnKF systems), all the dynamical variables are included in one vector x so that the analysis of every variable is determined by the background error covariance among all variables. In other words, there is only one analysis system and the state vector 27 is CF),COPs,q,T,V,(U,x 2= . Thus, the background error covariance matrix has the shape of Figure 2.1(c). With this methodology, we allow the atmospheric CO2 to be analyzed by the updated background error of wind field simultaneously, not using reanalysis winds like most of the other previous studies. Furthermore, we can assimilate simultaneously all the atmospheric variables and CO2 variables. This is because we do not use any inversion to calculate the back trajectory of atmospheric CO2 concentration in order to estimate surface CO2 fluxes using the wind fields. Most of research on this issue uses the inversion method (Enting, 2002) which requires the wind fields, mainly from a reanalysis, in order to estimate surface CO2 fluxes, whereas we do assimilate atmospheric CO2 concentration and other meteorological variables simultaneously so that we calculate the Kalman gain matrix with a background error of all the dynamic variables including surface CO2 fluxes. 2.4. Experimental Design Under the perfect model assumption, the SPEEDY-C was used to create a ?nature? (truth) run. We create observations from this nature run by adding random perturbations. At the same time, SPEEDY-C with six prognostic variables including atmospheric CO2 concentration is used as the forecast model in which the CF over land is updated only by the analysis. Again, the forecast model does not have a dynamical forecast equation for CF, so the forecast of CF is persistence (starting from 28 Figure 2. 2. (a) A true state of surface CO2 fluxes which includes only anthropogenic emission as a constant forcing with time, (b) Initial condition of surface CO2 fluxes 29 the analysis value). All the experiments here only include fossil fuel emission (Andres et al., 1996) with a total of 6 PgC/yr as a constant forcing with time. The spatial distribution of this forcing is shown in Figure 2.2(a). This is what we want to estimate through the analysis. The initial conditions of (U, V, T, q, Ps, CO2) for the 20-ensemble forecast are created by adding random perturbations to a state in the truth run which were chosen randomly in time. The standard deviation of the random perturbations used for the initialization depend on the scale of each variable: 1 m/s for U and V, 1 K for T, 0.1 g/kg for q, 1 hPa for Ps, and 1.0 ppmv for CO2. The initial condition of the surface CO2 fluxes has been generated separately as follows: from 20 fields of CO2 concentration in the lowest three layers at arbitrary times, we subtract the one-day prior state of CO2 concentration, and then convert the units of the field from the ppmv/day to the kg/m2/s (Figure 2.2). This approach was found necessary because initializing the fluxes with random numbers (as we first attempted to do) failed to converge to satisfactory results. As suggested by Zupanski et al., 2006, when the spatial scale of initial perturbation is too small to represent physically meaningful signals, EnKF can result in erroneous solutions. To avoid underestimating the uncertainty in the ensemble space, a constant 5% multiplicative inflation was applied. This value had been previously found to be optimal for the atmospheric variables data assimilation (Liu, 2007). The assimilation cycle time is every six hours. The observations for all the experiments were simulated by adding random perturbations to the ?nature? run, with the same magnitude of random perturbations as those used for the initial conditions. For the atmospheric variables, observations 30 have the spatial distribution of the rawinsonde network, where coverage of grid points is about 9 % globally (Figure 2.3). Atmospheric CO2 concentration is observed at every other grid so that the coverage is about 25% in the horizontal, an optimistic assumption because we wanted to explore the potential of EnKF first in a favorable scenario. According to the vertical resolution of CO2 observations, we performed three kinds of experiments: ?ALL LEVELS?, ?OCO + AIRS?, and ?OCO? experiments with different vertical resolusions. Figure 2. 3. Distribution of rawinsonde observation network. 2.4.1. ?ALL LEVELS? experiment At first, the atmospheric CO2 concentration was assumed to be observed at every vertical level. For this case, all three data assimilation approaches introduced in Section 2.3 were tested. 31 2.4.2. ?OCO + AIRS? experiment Since the OCO instrument was known to be most sensitive to the CO2 concentration in the lower troposphere (Crisp et al., 2004) and CO2 retrieval from AIRS (Maddy et al., 2008) has instead the largest sensitivity in the middle troposphere near 7~9 km in the vertical (Figure 2.4), the experiments in this section were designed to have the CO2 observation at only two layers, the lowest layer (?=0.95) and at the mid-troposphere (?=0.34). For this case, only the one-way multivariate data assimilation was performed. These experiments were performed before the OCO launch failed, but since we will still have access to GOSAT measurements (Hamazaki et al., 2008) and other prospective satellite measurements which have a sensor most sensitive to atmospheric CO2 near the surface, this experimental set-up is still important. Figure 2. 4. Representative vertical averaging kernels for column CO2 soundings using near IR absorption of reflected sunlight in the 1.61-lm CO2 band (blue) and thermal IR emission near 14.3 lm (red). Thermal IR soundings are less sensitive to near-surface CO2 because of the small surface?atmosphere temperature contrast. (Crisp et al., 2004) 32 2.4.3. ?OCO? experiment In order to see how important it is to have good measurements near the surface for the analysis of surface CO2 fluxes, we only assimilate the CO2 observations on the lowest layer where OCO was known to be most sensitive. For this, only the one-way multivariate data assimilation was done. 2.5 Results We want to point out that the initial condition of surface CO2 fluxes used here does not include any a-priori information. Figure 2.2 shows that the initial surface CO2 fluxes have not only a different spatial pattern but also an inconsistent magnitude compared to the true state. This is in contrast from other previous studies which require a reasonable initial estimation for surface CO2 fluxes. 2.5.1. Performance of SPEEDY-C In order to see whether SPEEDY-C simulates atmospheric CO2 reasonably well, we made a comparison with the results of an experiment made with NCAR CCM (Community Climate Model) provided by Dr. Fung. First, one needs to point out that NCAR CCM is a much more sophisticated model than SPEEDY-C in terms of physics and dynamics, and its resolution is also higher. CCM has a spectral resolution of T42 (2.8??2.8?) in the horizontal and 18 layers in the vertical. Considering these differences, the results shown in Figure 2.5 suggest that carbon simulations with SPEEDY-C are sufficiently realistic for this study. 33 Figure 2.5 shows the result of an experiment which has only fossil fuel emission with 6 PgC/yr and has been started from zero state of atmospheric CO2. The top panels are an annual mean of atmospheric CO2 over the third year on the surface layer from SPEEDY-C (left) and NCAR CCM (right), and bottom panels are a vertical cross section of zonal mean CO2 concentration at the beginning of third year (left: SPEEDY-C, right: NCAR CCM). From this figure, we can see the spatial distribution of CO2 simulated by SPEEDY-C generally agrees with that of NCAR CCM even though the mixing tends to be stronger in SPEEDY-C with a deeper surface layer. Also, SPEEDY-C represents the well-mixed CO2 within deeper surface (a) (b) (c) (d) 5 6 7 8 9 Figure 2. 5. Comparison of SPEEDY-C with NCAR CCM: annual mean of atmospheric CO2 concentration in the surface layer for third year in (a) SPEEDY-C, and (b) NCAR CCM, and the vertical cross section of zonal mean at the beginning of 3rd year simulation in (c) SPEEDY-C, and (d) NCAR CCM. The experiment starts from zero state of CO2 in the atmosphere with a fossil fuel emission of 6 PgC/yr. (unit: ppmv) 34 layer. Also, SPEEDY-C represents the well-mixed CO2 within a hemisphere and a large gradient between hemispheres as in NCAR CCM. As a general understanding of CO2 transport in the atmosphere, we also confirmed that CO2 mixing within a hemisphere takes about three months and it takes about a year to have significant CO2 mixing between hemispheres. 2.5.2. Analysis of CO2 variables with LETKF 2.5.2.1. ?ALL LEVELS? experiment The three types of data assimilation introduced in Section 2.3 are examined in the ALL LEVELS experiments to allow comparing the performance of each data assimilation scheme. Figure 2.6 shows the global RMS error of the dynamic variables from three of the analyses. The standard atmospheric variables are analyzed similarly well with the rawinsonde distribution of observations through the three data assimilation methods, while the results of CO2 variables vary for each of schemes. As we expected, the results of the atmospheric variables from the C-univariate data assimilation are exactly the same as those from the one-way multivariate data assimilation. For the carbon-related variables, the carbon-univariate data assimilation fails to analyze both CO2 and CF and we have ?filter divergence?. Since the C-univariate data assimilation for the CO2 variables has only these two variables in the background error covariance matrix while there is no observation for CF, the system seems to suffer from the lack of information so that the analysis of atmospheric CO2 35 Figure 2. 6. RMS error of analysis from three types of data assimilation: uncoupled (green), multivariate (blue), and one-way multivariate (red) data assimilation for (a) U (m/s), (b) V (m/s), (c) T (K), (d) q (kg/kg), (e) atmospheric CO2 (ppmv) on the lowest layer of model, and (f) surface CO2 fluxes (10-8 kg/m2/s). 36 Figure 2. 7. (a) True state of atmospheric CO2 on the lowest layer of model after two months from the start of analysis, and the resultant analysis of it from (b) uncoupled data assimilation under ALL LEVELS experiment, (c) multivariate data assimilation under ALL LEVELS experiment, (d) one-way multivariate data assimilation under ALL LEVELS experiment, (e) one-way multivariate data assimilation under OCO+AIRS experiment, and (f) one-way multivariate data assimilation under OCO experiment. The number in the left ?bottom of each figure is RMS error [unit: ppmv] 37 Figure 2. 8. (a) True state of surface CO2 fluxes after two months from the start of analysis, and the resultant analysis of it from (b) uncoupled data assimilation under ALL LEVELS experiment, (c) multivariate data assimilation under ALL LEVELS experiment, (d) one-way multivariate data assimilation under ALL LEVELS experiment, (e) one-way multivariate data assimilation under OCO+AIRS experiment, and (f) one-way multivariate data assimilation under OCO experiment. The number in the left ?bottom of each figure is RMS error [unit: 10-9kg/m2/s] 38 concentration performs poorly even with observations at every other grid point. That is, the carbon-related system of CO2 and CF does not have enough constraints so that a bad analysis of one variable can cause negative feedback to the other variable when one of them goes wrong (for example, as shown in Figures 2.7(b) and 2.8(b), when the errors in CO2 fluxes overwhelm the analysis). Moreover, the current experiments use a fixed inflation factor with time so the analysis system cannot use the observation information flexibly. For these reasons, the analyses of both CO2 variables diverge at the end. We tried to increase inflation for the C-univariate data assimilation in order to stabilize the filter, but the analyses of CO2 variables actually diverged even faster with larger inflation factors (Figure 2.9). That is because the large inflation factor gives less contribution of forecast to the analysis while the observation is insufficient to constrain CO2 variables. Thus, the analysis system for CO2 variables has difficulties in combining information from both forecast and observation with a large inflation factor. Figure 2. 9. RMS errors for the first 10 days of (a) CO2 concentration on the lowest layer and (b) surface CO2 fluxes from the C-univariate (uncoupled) data assimilation with 5% (red: control), 15% (green), and 30% (blue) of multiplicative inflation. 39 On the other hand, the multivariate data assimilation has better performance than the C-univariate one in terms of both RMS error (Figure 2.6) and spatial distribution (Figures 2.7 and 2.8). Since the errors of all variables are coupled, the analyses of CO2 variables can have more constraint from the observation of other variables. Thus, the analyses of CO2 variables converge to the true state unless the estimate of surface CO2 fluxes gets far from the true state. Indeed, we have more stable and better performance on both atmospheric CO2 concentration and surface CO2 fluxes from the multivariate data assimilation than from the uncoupled one. The analysis of surface CO2 fluxes, however, has rather large spurious negative values over the eastern US (Figure 2.8(c)) and these fluxes result in the degradation of the atmospheric CO2 analysis (Figure 2.7(c)). The performance of one-way multivariate data assimilation is optimal for the CO2 variables (Figure 2.6). The negative values of surface CO2 fluxes shown in the multivariate data assimilation disappear in this scheme so that the analyses of CO2 variables have fairly good agreement with the true state overall (Figure 2.7(d) and Figure 2.8(d)). This result reveals that the multivariate data assimilation allows undesirable sampling error feedback between the CO2 variables and the atmospheric variables. The atmospheric CO2 is influenced only by the wind field and there is no relationship of CO2 with temperature, humidity and surface pressure field in our nature run. Thus, the coupled error covariances between these irrelevant meteorological variables and CO2 variables could make spurious values in the multivariate data assimilation, while the one-way multivariate data assimilation 40 Figure 2. 10. Ensemble spread of atmospheric CO2 analysis on the lowest layer of model from (a) the uncoupled data assimilation, (b) the multivariate data assimilation, and (c) the one-way multivariate data assimilation, after two months of analysis under ALL LEVELS experiments [unit: ppmv] 41 Figure 2. 11. Ensemble spread of CO2 fluxes analysis at the surface from (a) the uncoupled data assimilation, (b) the multivariate data assimilation, and (c) the one-way multivariate data assimilation, two months of analysis under ALL LEVELS experiments [unit: 10-9kg/m2/s] 42 allows the wind field to give essential information to CO2 variables and is not affected by their sampling errors. Figure 2.10 and Figure 2.11 show the ensemble spread of the analysis for atmospheric CO2 concentration and surface CO2 fluxes. Since the uncoupled data assimilation diverges, the ensemble spread has a large value without overall physical meaning. On the other hand, both multivariate data assimilations have a larger spread over the ocean rather than over the land because the observations of atmospheric variables are based on the rawinsonde distribution, which is mainly over the land. In this context, the multivariate data assimilation has less spread over the land than the one-way multivariate because the uncertainties of CO2 variables are linked to those of not only wind fields but also the other atmospheric variables in the multivariate data assimilation. In addition, the area where the atmospheric transport is active, over North Atlantic Ocean, is emphasized with the large spread of both atmospheric CO2 and surface CO2 fluxes. 2.5.2.2. ?OCO + AIRS? experiment and ?OCO? experiment We examined only the one-way multivariate data assimilation for these two experiments since we found it was the optimal way to analyze CO2 according to the results from ALL LEVELS experiments. Since we use the same observations of the atmospheric variables, and in the one-way multivariate approach the analysis of the standard atmospheric variables is not changed by assimilating atmospheric CO2, the results of atmospheric variables in the analysis are the same in these two experiments as those shown in the Section 2.5.2.1. Figure 2.7 (e)-(f) and Figure 2.8 (e)-(f) show 43 that both of OCO+AIRS and OCO experiments have comparable results with the ALL LEVELS experiment in the one-way multivariate data assimilation scheme. The RMS error for the atmospheric CO2 concentration becomes a little larger with 0.026 ppmv and 0.028 ppmv at the end of two-month analysis under OCO+AIRS experiment and OCO experiment, respectively, than that under the ALL LEVEL experiment. This is really a minor degradation when one considers the observation error of 1.0 ppmv. The accuracy of surface CO2 fluxes is not degraded visibly either. Indeed, the RMS error is only 2.2?10-10 kg/m2/s larger in OCO experiment than the ALL LEVEL experiment after two months of data assimilation. This means that the observation of CO2 concentration near the surface plays a very important role in estimating surface CO2 fluxes. From the little difference between the results of ?OCO+AIRS? and ?OCO? experiments, one may conclude that AIRS CO2 observations may not be useful for estimating surface CO2 fluxes. However, this may be also due to the systematic shortcomings of the forecast model we used. Since SPEEDY-C has only seven vertical layers and the parameterization of convection scheme is relatively simple, the forecast model can underestimate the potential impact of AIRS CO2 retrievals, which have a strong sensitivity in the upper troposphere, not near the surface. Indeed, the simulated AIRS CO2 observations do not significantly improve the atmospheric CO2 analysis at the levels where the observations are not made in our experiments. With the realistic system of LETKF/CAM3.5, however, an experiment with AIRS CO2 retrievals shows some improvement of analysis and forecast in the atmospheric CO2 fields not only at the level of highest averaging kernel but also extending to other 44 vertical layers (Liu et al., 2009). CAM3.5 is more sophisticated model with higher resolution compared to SPEEDY-C: 2.5?? 1.9? horizontal resolution and 26 vertical levels up to 3.5 hPa. Improved deep convection schemes (e.g., Neale et al. 2008) could also improve the vertical mixing of atmospheric CO2 in the forecast so that the AIRS observation may constrain the atmospheric CO2 better. Pak and Prather (2001) suggested that satellite observations of CO2 in the upper troposphere could provide a major constraint for the net carbon fluxes over the tropical land within their inversion method. In summary, although AIRS CO2 information has little effect on constraining the surface fluxes in the SPEEDY model, it may be more effective in a more advanced system. Thus, from the ?OCO+AIRS? and ?OCO? experiments, we cannot conclude that the AIRS CO2 retrieval is not able to improve the CO2 analysis near the surface, but we only stress that the instrument which has higher sensitivity of CO2 near the surface can be more useful for analyzing the atmospheric CO2 near the surface and surface CO2 fluxes. 2.6. Summary and discussions First, we developed a model, SPEEDY-C, to simulate atmospheric CO2 by modifying an atmospheric GCM of intermediate complexity, the SPEEDY model. We confirmed that the performance of SPEEDY-C in transporting carbon is reasonable compared to the results of the NCAR CCM. The comparison supports the use of SPEEDY-C for the OSSEs in this study given that it is a very fast model, and that we can address many questions about data assimilation methods for atmospheric 45 carbon and surface fluxes that would be unfeasible using a high-resolution, computationally expensive GCM. We then investigated three types of analyses by building different groups of state vectors in the Local Ensemble Transform Kalman Filter (LETKF) formulation and testing them through OSSEs: carbon-univariate, multivariate, and one-way multivariate data assimilation. Multivariate CO2 data assimilation experiments were performed for the first time, and the results indicate that multivariate EnKF assimilation is much more effective in estimating both atmospheric CO2 and surface carbon fluxes, even in the absence of observations or prior estimations of surface fluxes. Of the two multivariate schemes applied here, one-way multivariate data assimilation has better results than the fully multivariate analysis because it permits the error statistics of only the relevant variables to interact with in terms of CO2 analyses. According to the ?OCO+AIRS? and the ?OCO? experiments, it can be concluded that the surface CO2 fluxes can be estimated reasonably if the atmospheric CO2 concentration can be observed near the surface. 46 Chapter 3: Coupling SPEEDY-C to VEGAS with SLand 3.1. Introduction Human activities have increased the emission of CO2 into the atmosphere since the industrial revolution. About half of released CO2 is absorbed at the surface by land or ocean, and the rest of it remains in the atmosphere (Figure 1.2). The amount of CO2 uptake at the surface, however, has significant temporal variability with respect to the climate whereas the anthropogenic emission does not fluctuate much with the seasons, but rather increases monotonically. For example, one can easily see that the global CO2 growth rate during El Ni?o (La Ni?a) years becomes larger (smaller) in Figure 1.2. This relation between the variability of atmospheric CO2 and ENSO has been confirmed by many previous studies (Bacastow, 1976; Keeling and Revelle, 1985; Braswell et al., 1997; Rayner et al., 1999; Jones et al., 2001; Zeng et al., 2005). Thus, CO2 absorption capacity of land and ocean causes this difference between the emission and the mean and variability of atmospheric CO2 concentration, shown as the green shading in Figure 1.2. And the amount of uptake by land and ocean surface exhibits both seasonal and interannual variability, which is obviously related to the climate. Thus, the response of land and ocean uptakes to a given climate condition is very important to project future climate in terms of how much land and ocean would uptake atmospheric CO2 released from human activities, and how long these reservoirs can contain it. From a number of studies based on the ground based 47 measurements, the regional distribution of surface CO2 fluxes had been estimated on continental scales and there has been progress in understanding the global and regional carbon cycles. But, we still need more detailed understanding of where and how much the atmospheric CO2 sinks and releases by land and ocean take place even under the current climate. It is not a trivial problem to understand because the climate and the CO2 exchange process over land and ocean are linked to each other in rather complex ways of interactions and feedbacks. In other words, the problem is highly nonlinear. In the previous studies on the oceanic CO2 uptake, many of them agree on relatively modest contribution of ocean to the variability of atmospheric CO2 (Knorr, 2000; Bousquet et al., 2000; Feely et al., 2002; Rodenbeck et al., 2003). On the other hand, the atmospheric CO2 uptake by the land surface is responsible for most of the variability of atmospheric CO2 (Bousquet et al., 2000; Gurney et al., 2002; DeFries et al., 2002; Rodenbeck et al., 2003; Friedlingstein et al., 2006). The surface CO2 fluxes over the land change with larger amplitudes depending on the climate conditions, whereas the oceanic CO2 fluxes are considered to have less variability in the interannual time scales. Based on this finding we decided to develop a coupled atmosphere-vegetation model in order to create a more realistic ?nature? with time-varying surface CO2 fluxes over land. Therefore, we coupled the SPEEDY-C model with the dynamic terrestrial carbon model VEgetation-Global-Atmosphere-Soil, VEGAS, (Zeng et al, 2005), which is turn coupled to the physical land surface model Simple-Land, SLand (Zeng et al., 2000). With this system, we expect to simulate surface CO2 fluxes that 48 evolve seasonally and interact with climate anomalies. For the oceanic CO2 fluxes, we have used the prescribed monthly means estimated by Takahashi et al. (2002) with a global mean rate of -2 PgC/yr. It is reasonable assumption because the variability of surface CO2 flux is dominant over the land on the interannual time scale. This coupled system should not only create a more realistic nature run with variable surface carbon fluxes, but also make running an imperfect model OSSE possible. In a simulation mode of assimilation experiments, we can use the coupled system as the nature while the uncoupled SPEEDY-C model continues to be used for the ensemble forecast. Thus, this coupled system allows us to deal with the model error and to see the performance of data assimilation in the case where the model error is one of the serious problems, as it is in the real world. Section 3.2 describes the way we coupled SPEEDY-C with VEGAS and SLand, and there is a short summary on a spin-up run in Section 3.3. Results are shown in Section 3.4 and we summarize and discuss the coupling work in Section 3.5. 3.2. Methods 3.2.1. Interface among atmosphere, vegetation, and land models SPEEDY-C is coupled with VEGAS-SLand as described in schematic Figure 3.1. Details of each model are described in Zeng et al. (2000) for SLand, Qian (2008) for VEGAS, and Molteni (2003) for SPEEDY. In this section, the interface among 49 Figure 3. 1. Schematic diagram of interface among the coupled components of atmosphere (SPEEDY-C), vegetation (VEGAS), and land surface (SLand). Variables in the interface are described in section 3.2.1. (Prec-precipitation, Tairs-temperature near the surface, qairs-specific humidity near the surface, VsE-wind speed near the surface, Rsnet-net shortwave radiation at the surface, Evap-evaporation, FTs-sensible heat, Ts- surface temperature at layer 1 (top layer), Swet-soil wetness, Ts2-soil temperature at layer 2, Runf-runoff, gf-growth factor, vegc-vegetation cover, Zrough-roughness length, ali-leaf area index, FSWds-downward shortwave radiation at the surface, NEPa-surface CO2 fluxes between atmosphere and land, vegcmc-annual mean of vegetation cover) the coupled components necessary to develop a coupled system is discussed. First, SPEEDY-C gives SLand the information of precipitation, the temperature, specific humidity, and wind speed near the surface, and the net shortwave radiation at the surface. In turn, SLand provides SPEEDY-C with evaporation, sensible heat, surface temperature, and soil wetness over the land. We still calculate the surface fluxes over ocean with the formulation in the original version of SPEEDY because VEGAS- SLand is designed only for the land surface processes. In the interface between SLand and VEGAS, SLand provides VEGAS with soil temperature, runoff, and soil wetness while VEGAS returns the CO2 dependent growth factor, vegetation cover, roughness 50 length, and leaf area index to SLand. Lastly, in the interface between atmosphere and vegetation, SPEEDY-C provides temperature near surface and downward shortwave radiation at the surface to VEGAS, and VEGAS calculates the surface CO2 fluxes between the atmosphere and vegetation and updates the vegetation cover annually. In order to accelerate the spin-up process, the time step of the vegetation model is set up as one day while SPEEDY and SLand have the same time step of 20 minutes. Thus, the input variables of atmosphere and land models to the VEGAS are averaged at every 00Z daily. 3.2.2. Additional boundary conditions Since VEGAS-SLand requires the boundary forcing of topographic gradient and ice cover, those fields are obtained by interpolating a fine-grid data (GLOBE task team, 1999; Peltier, 1994) to the SPEEDY-C resolution (Figure 3.2). 3.2.3. Adjustment of land-sea mask SPEEDY has a fractional land-sea mask whereas VEGAS-SLand use an integer mask. Both models use unity for the land and zero for the ocean surface. But SPEEDY has fractional numbers around the coastline between land and ocean, proportional to the ratio of land and ocean in terms of area. Thus, we made VEGAS- SLand accept any point where the land-sea mask of SPEEDY is not zero as a land surface point. Furthermore, SLand computes the variables only over land and VEGAS only over land without ice so that the index of land for SLand computation and that for 51 VEGAS have been calculated according to the land-sea mask computed in a way described above and ice-cover boundary information. Then, these indices are given for the input of VEGAS and SLand systems to recognize whether the grid point belongs to land with no ice, ocean, or ice-covered land. Figure 3. 2. Additional boundary forcing for VEGAS: (a) ice cover data with 1??1? resolution (Peltier, 1994), (b) a gradient of topography data with 1??1? resolution (GLOBE task team, 1999), (c) ice cover data interpolated to SPEEDY-grid T30 resolution, and (d) a gradient of topography interpolated to SPEEDY grid system. 3.2.4. Soil moisture and tropical rainfall over land The original version of SPEEDY uses a seasonally changing climatology for the soil moisture fields. Before performing the coupling, we compared the prescribed soil wetness of original SPEEDY with that from SLand-VEGAS offline run forced by 52 Figure 3. 3. Annual mean of soil wetness in (a) VEGAS-SLand (LV) offline simulation forced by SPEEDY climatology, (b) the prescribed boundary condition used in the original version of SPEEDY. (unit: dimentionless) 53 a climatology of atmospheric variables from SPEEDY, and found that the soil wetness of SPEEDY is significantly different from that of SLand (Figure 3.3). By definition, soil wetness (Swet) is the relative soil water saturation as a ratio of modeled soil moisture (mm) to the maximum value, which is 500 mm in SLand and 350mm in SPEEDY, so that Swet varies from 0 to 1. The prescribed boundary forcing in SPEEDY has highly saturated soil moisture over the tropical land whereas SLand-Vegas has a maximum value of soil wetness less than unity. Zeng et al. (2008) validated the modeled soil moisture by SLand on seasonal, interannual and longer timescales. However, it turned out that the SPEEDY is tuned in such a way that it requires large areas where the soil moisture is saturated in order to maintain a realistic precipitation over the tropical land, but this large soil moisture is not realistic. Indeed, with the modeled soil moisture calculated by SLand, the fully coupled atmosphere-vegetation-land model did not have rainfall over the tropical land and those areas became dry like deserts. Thus, we had to tune the modeled soil moisture from SLand before giving it to the atmosphere and vegetation components. We made the soil wetness saturated over the tropics to let SPEEDY produce realistic precipitation by multiplying the soil moisture with Gaussian weights dependent of latitude between -20?S and 20?N. This resulted in a pattern of soil moisture similar to the original SPEEDY climatology and produced reasonable ranges of rainfall over the tropical land. 54 3.3. Spin-up It is necessary to obtain an equilibrium state of the coupled system through the spin-up run. Here, it is assumed the equilibrium state is reached when the annual mean of NEP (Net Ecosystem Productivity) converges to zero. Because the SLand- VEGAS is computationally economic compared to SPEEDY and we need to run the soil and vegetation models for hundreds of years in order to have the convergence of vegetation and soil variables to the proper states, an offline spin-up run of SLand- VEGAS was done first with the atmospheric forcing of SPEEDY climatology and then we run a fully coupled system. We ran the SPEEDY for nine years to get seasonally varying climatology of variables such as precipitation, temperature, specific humidity, wind, and radiation at the surface, which were then used to force the Vegetation-Land model. Under the given SPEEDY climatology, the SLand-VEGAS offline simulation continued for 600 years. During the first 200 years, an ?accelerator? factor was used to help the soil carbon pool reach the equilibrium state relatively fast. From the late states of the variables of SLand-VEGAS offline run, we let the fully coupled system run for 30 years where the prescribed ocean fluxes of CO2 have been included with a magnitude of -2 PgC/yr (Takahashi et al. 2002) and with no anthropogenic fluxes. 55 3.4. Results 3.4.1. Offline Land-Vegetation spin-up run Figure 3.4 shows that the vegetation-land model has reached an equilibrium state of Gross Primary Productivity (GPP), Net Primary Productivity (NPP), Net Ecosystem Productivity (NEP), Carbon in vegetation, and carbon in soil. The global total GPP converges to 110 PgC/yr, NPP to 55 PgC/yr, vegetation carbon pool to 550 PgC, and soil carbon to about 2000 PgC. These are comparable with those resulted from Zeng et al. (2005) which includes the experiment forced by the observed record of real atmospheric variables: The global total GPP is 122 PgC/yr, NPP is 58 PgC/yr, vegetation carbon pool is 550 PgC, and soil carbon is 1850 PgC. Figures 3.5 and 3.6 show the spatial pattern of NEP, NPP, GPP, respiration, carbon in different vegetation pools, and soil resulting from the offline spin-up run. These results are also reasonable compared to the results in Zeng et al. (2005) which used the same vegetation and land model with real atmospheric observations. 3.4.2. Fully coupled atmosphere-vegetation-land spin-up run The global total GPP is about 90 PgC/yr, NPP about 40 PgC/yr at the end of 30-year run. NEP converges to zero annual mean (Figure 3.7). Compared to the result of NPP from the offline simulation, NPP from the coupled spin-up simulation has a relatively smaller value. We could find the reason of this from the difference of precipitation over land between the offline simulation and coupled run (Figure 3.8). The climatology of SPEEDY used for the forcing in the offline spin-up has generally larger precipitation over land than the precipitation calculated by interactive mode of 56 (c) (d) (e) (a) 0 600yr 0 600yr 0 600yr 0 600yr (b) c d Figure 3. 4. Time series of global total of major variables in VEGAS during offline spinup simulation with SPEEDY climatology: (a) GPP (Gross Primary Productivity: black), NPP (Net Primary Productivity: green), and NEP (Net Ecosystem Productivity: yellow), (b) Cleaf (leaf carbon: black), Croot (root carbon: green), and Cwood (wood carbon: yellow), (c) Csfast (fast soil carbon:black), Csmed (intermediate soil carbon:green), and Csslow (slow soil carbon: yellow), (d) Cb (total biosphere carbon, i.e. soil carbon + vegetation carbon: black) , Cvege (vegetation carbon: green), and Csoil (soil carbon: yellow). Values are averaged annually. 57 Figure 3. 5. Annual mean fields of NEP (kg/m2/yr), NPP (kg/m2/yr), GPP (kg/m2/yr), Ra (autotrophic respiration, kg/m2/yr), Rh (heterotrophic respiration, kg/m2/yr), and Cb (kg/m2) for the last year of 600-year Land-Vegetation offline spin-up. 58 Figure 3. 6. Same as Figure 3.5 except for Cvege, Csoil, Cleaf, Csfast, Cwood, and Csslow (unit: kg/m2). 59 coupled simulation. Especially over the southern America, there is large dry area in the coupled simulation compared to the climatology of SPEEDY. Thus, this environment in the coupled system could not have as much vegetation as the atmospheric condition with SPEEDY climatology so that NPP converged to a lower level in the coupled spin-up simulation than in the offline run. Vegetation carbon pool has a value of 380 PgC, and soil carbon is about 1780 PgC after 30 years. Figure 3.9 shows the spatial distribution of NEP in each season. In general, the results indicate that the vegetation uptakes atmospheric CO2 during growing season whereas the land surface releases CO2 into the atmosphere during the vegetation decaying season. Although it is necessary to deal with details of the results more carefully in order to use the nature run for other applications, for our purposes we intend to use the coupled system in a simulation where the only requirement is that the ?nature? produce surface carbon fluxes with a reasonable seasonal variability, and that the ?forecast? model be significantly different, as it happens in the real world, and the coupled system that we developed satisfies this requirement. 60 Figure 3. 7. Time series of monthly mean of global total (a) GPP, (b) NPP, and (c) NEP for last 20 years? fully coupled spin-up. 61 Figure 3. 8. (a) Annual mean of precipitation from SPEEDY climatology (nine-year mean) and (b) last 10-year mean precipitation of coupled SPEEDY-VEGAS-SLand simulation for 30 years (unit: mm/d). 62 Figure 3. 9. Seasonal mean of NEP (surface carbon flux) in (a) DJF, (b) JJA, (c) MAM, and (d) SON of last year in the fully coupled atmosphere-vegetation-land spin-up. (positive: carbon sources, negative: carbon sinks) 3.5. Summary We coupled SPEEDY-C with a terrestrial dynamic carbon model (VEGAS) and a simple physical land model (SLand). Among the components of atmosphere, land, and vegetation, the heat, water, and energy fluxes are obtained through coupling. Then, we made a spin-up run to get an equilibrium state of land and vegetation with the SPEEDY-C atmosphere through an offline simulation of land and vegetation and a fully coupled simulation of three components in order. Since the coupled model produces reasonable CO2 fluxes over land, it is possible to use it as 63 ?nature?, since it has a more realistic carbon cycle with time-varying CO2 fluxes which we will try to estimate through EnKF data assimilation using the SPEEDY-C model. The coupled system can also be used for other climate studies associated with CO2 and dynamic vegetation, although users must be aware that the soil moisture was tuned in the tropics. 64 Chapter 4: Imperfect Model Simulation: Bias Correction, Adaptive Inflation and Estimation of Observation Errors 4.1. Introduction In Chapter 2, we have seen the performance of LETKF on estimating surface CO2 fluxes under the perfect model scenario. Without any direct observation or a- priori information of surface carbon fluxes, data assimilation could produce a reasonable estimate of these fluxes in the multivariate analysis system. Now, it is necessary to consider a more realistic case in which the model error cannot be ignored. Chapters 2 and 3 show we can use for this purpose two different models: the SPEEDY-C and the SPEEDY-VEGAS. Thus, we are able to do OSSEs (Observing System Simulation Experiments) under the imperfect model assumption by using SPEEDY-C for the forecast model and SPEEDY-VEGAS for the nature run. In order to deal with model errors, we applied two additional techniques: a bias correction and an adaptive inflation. First, we implemented a simple model bias correction which is similar to that introduced by Danforth et al. (2007). Next, we applied an advanced method, a simultaneous estimation of adaptive inflation and observation errors, introduced by Li et al. (2009, hereafter referred as LI09). With this method, we could estimate an adaptive inflation as well as the observation errors within the data assimilation (?online?). Both methods are introduced in Section 4.3 65 which also includes discussions on the difficulties of estimating inflation for a variable for which there are no remote observations such as surface CO2 fluxes. This chapter is organized as follows: The experimental design is discussed in Section 4.2. Section 4.3 describes a method to correct the model bias and shows the results from that. Next, an adaptive inflation technique applied to our case is presented in Section 4.4 and the results are shown. Lastly, Section 4.5 has a summary of this chapter. 4.2. Experimental Design The coupled atmosphere-land-vegetation model, SPEEDY-VEGAS, introduced in Chapter 3 is now used for the nature run while the SPEEDY-C in Chapter 2 is used for the ensemble forecast. Since the analysis occurs every six hours, we forced the nature run to update surface CO2 fluxes over the land every six hours by changing the time step of vegetation model from one day to six hour when we created the nature run. In the nature run, CO2 fluxes over land are calculated by the coupled model, whereas those over the ocean are prescribed monthly (Takahashi et al., 2002) with a rate of -2 PgC/yr. In the imperfect model simulation, there is no fossil fuel emission in the nature run. Initial conditions of (U, V, T, q, Ps, CO2) for the 20-ensemble forecast were created by adding 20 random perturbations to the 20 states which were chosen randomly in time from the one-year nature run, as in the perfect model simulation. For the surface CO2 fluxes, the states at another 20 timesteps from the nature run were chosen, and random perturbations were added to them. The magnitudes of the 66 random perturbations added to (U, V, T, q, Ps, CO2) are the same as those used for the perfect model experiment in Chapter 2. For the surface CO2 fluxes, a standard deviation of the random perturbation is 1.0?10-10 kg/m2/s. Again, the initial conditions of surface CO2 fluxes do not have any a-priori information as shown in Figure 4.1. For the imperfect model simulation, we set a fixed multiplicative inflation of 8% for all the dynamic variables for the control run and the bias correction experiment without an adaptive inflation technique. Later, for the experiments including adaptive inflation and observation errors estimation, the initial guesses of observation errors were given as twice the true values (Table 4.1), and the inflation started from 10% at the initial time. The way observations of (U, V, T, q, Ps, CO2) were generated is same as the case of perfect model simulation (Chapter 2). Here we only examined ?ALL LEVEL? experiment described in Section 2.4.1, and only the 1- way multivariate data assimilation has been examined since we found it had an optimal performance in Chapter 2. 4.3. Bias correction 4.3.1. Methodology Under the imperfect model simulation, one can expect a significant difference of climatologies between the forecast run and the nature run. With an imperfect model, as when the true atmosphere is compared with a model, the difference in climatology is not associated with forecast errors due to the errors in the initial 67 Figure 4. 1. (a) True state of surface CO2 fluxes at the initial time step, (b) initial condition for surface CO2 fluxes. (unit: 10-9kg/m2/s). (positive: CO2 sources, negative: CO2 sinks) 68 Figure 4. 2. Schematic plot to describe the low-dimensional correction of model bias: blue arrow stands for nature (or reanalysis), and green arrows indicate every 6 hour forecast stating from the nature run. The departures of 6-hour forecasts from the nature run happen to be caused by the discrepancy between the forecast and the nature runs. The two-month averaged field of those departures is considered as the model bias, and hence it is subtracted from the background at every analysis step. condition, but it indicates a presence of systematic errors in the forecast model. It is caused by the model deficiencies such as inaccurate forcings and parameterizations (Danforth et al., 2007). We applied the method of low-dimensional correction in Danforth et al. (2007) for estimating and correcting model bias. Figure 4.2 schematically describes the simple way model bias is estimated. We made a series of 6 hour forecasts which restart from the initial conditions of the nature run every six hours. Thus, the time average of these departures can be considered as an estimate of the forecast bias. We calculate this model bias over a period of two months. The model bias is then subtracted from the 6 hour forecast before every analysis step. The bias correction is applied to only the atmospheric variables, not to the CO2 variables. We do not know in the real world the truth (?nature?), but the extensive experience with atmospheric data assimilation and the reanalyses data sets assure we have good enough estimates of the 4D-state of the atmospheric variables u, v, T, q and ps to correct their biases with this method. On the other hand we do not yet have ?CO2 reanalyses? that we could use to correct the model bias of CO2 forecasts. 69 4.3.2. Results from the bias correction experiment Figure 4. 3. Estimated model bias from the low-dimensional correction in (a) wind (m/s) (shading: divergence, unit: 10-7/s), (b) T(K), (c) q (g/kg) on the lowest layer, and (d) surface pressure (Pa), for two months of analysis period (model minus nature, positive: forecast overestimates, negative: forecast underestimates) Figure 4.3 shows that major features of the model bias appear over the tropics where soil moisture fields were increased in the SPEEDY-VEGAS over the tropical land in the coupled system in order to improve the precipitation pattern as described in Section 3.2.4. The differences in soil moisture dominate the differences between the nature run. The forecast model has biases of relative divergence and higher surface pressure over the equatorial land compared to the nature run. Moreover, there 70 is lower temperature and higher precipitation over the land near 20? S. If one compares the climatologies of soil moisture, evaporation, and precipitation in the forecast model with those in the nature run (Figure 4.4), the model bias can be explained as follows: High (low) soil moisture on the equatorial land ignites the strong (weak) evaporation and convection, which corresponds to the strong (weak) convergence on the surface layer. That causes more (less) precipitation over the area. The strong (weak) evaporation can explain the low (high) temperature, high (low) humidity on the lowest layer. After correcting for this estimated model bias, we obtained a remarkable improvement in the analysis of variables that have observations such as wind, temperature, humidity and even atmospheric CO2 concentration (Figure 4.5). It is encouraging that we obtained an improvement on the atmospheric CO2 analysis without correcting its bias. This is because the atmospheric CO2 transport is predicted better after the model bias of wind is corrected. Since the atmospheric CO2 is also linked to the surface CO2 fluxes, however, the analysis of the atmospheric CO2 can get a negative effect from a poor analysis of the surface CO2 flux fields. Indeed, the analysis of surface CO2 fluxes still diverged with time even though they are better due to the indirect effect from the bias correction of wind. Figure 4.6 shows the spatial distribution of analysis errors in the zonal wind, the atmospheric CO2 on the lowest layer, and the surface CO2 flux fields without and with bias correction. Bias correction clearly gets rid of the most of the mean analysis errors in the wind fields. Other meteorological variables also benefit from the bias correction remarkably as 71 (a) (b) (c) Figure 4. 4. Difference of the climatologies between the forecast model (SPEEDY-C) and the nature run (SPEEDY-C coupled with VEGAS-SLand); (a) soil moisture (mm), (b) evaporation (W/m2), and (c) precipitation (mm/d). (positive: the forecast has larger values than the nature run, negative: the forecast has less values than the nature) 72 Figure 4. 5. RMS errors of (a) U (m/s), (b) V (m/s), (c) T (K), (d) q (kg/kg), (e) atmospheric CO2 (ppmv) on the bottom layer, (f) surface CO2 fluxes (10-8 kg/m2/s) in the analysis. Red lines indicates the control run, and blue lines result from the bias correction. 73 Figure 4. 6. The spatial distribution of errors in the analysis (analysis minus truth): zonal wind (m/s) (a) without bias correction, (b) with bias correction, atmospheric CO2 (ppmv) on the bottom layer (c) without bias correction, (d) with bias correction, and surface CO2 fluxes (10-9 kg/m2/s) (e) without bias correction, and (f) with bias correction, after two month of data assimilation. 74 indicated by the RMS errors (Figure 4.5). Since the atmospheric CO2 is observed every other grid point, the analysis without bias correction is not too bad in general, but the bias correction of wind fields obviously improves the analysis of atmospheric CO2 fields. In the Figures 4.6 (e) and (f), the global maps of surface CO2 flux analysis have very noisy errors all over the region even after the bias correction although the spatial phases of positive and negative signals seem generally matched with true state and the strength of these noisy signals is slightly weaker. 4.4. Adaptive inflation So far, a multiplicative covariance inflation of 8% (obtained by tuning the inflation parameter in the atmospheric analysis system) has been used in order to prevent the analysis system from underestimating the background error covariance, which is fixed in time. With a constant multiplicative covariance inflation, the bias correction made a significant improvement on meteorological variables in Section 4.3. However, the improvement in the CO2 analysis was only marginal in the imperfect model experiment. Now, we implement an adaptive inflation technique which can reflect the observations efficiently according to the quality of the analysis and the background in time. Also, we deal with the inflation of the atmospheric CO2 separately from that of other meteorological variables, in addition to using a different inflation on each vertical layer. The adaptive inflation technique allows us to avoid tuning these inflation factors, since it would be very expensive or even infeasible to find reasonable levels of all these inflation factors by manually tuning. 75 4.4.1. Methodology for variable having observation The basic idea on the adaptive inflation comes from LI09. Let?s assume that the background error covariance, bP , and the observation error covariance, R , are correctly specified. If the errors of the background and the observations are not correlated, then one can write an equation that relates the observational increments (that we can measure) and the error covariances as follows: +>=< ?? TbT bobo HHPdd R (4.1) where bo?d is the difference between observations and the corresponding background at the observation space and the brackets indicate an average over many cases. However, we know that it is necessary to inflate the bP since it tends to be underestimated in practice. Thus, bP in Equation (4.1) should be multiplied by the inflation factor, ? (larger than unity). From this, and transposing (4.1) the adaptive inflation can be obtained as )( )( Tb bo T bo Tr Tr HHP Rdd ?=? ?? (4.2) However, we cannot use this equation to estimate ? because (4.1) or (4.2) are based on the assumption that the observation error covariance is accurate, something not true in practice. If R is not known precisely, the use of (4.2) fails because its errors are compensated by the value of ? . Another diagnostic on background errors comes from the combination of bo?d and analysis-minus-background ba?d (Desroziers et al., 2005). From this method, the relationship TbT boba HHPdd >=< ?? (4.3) 76 has been derived by Desroziers et al. (2005), and Equation (4.3) produces another alternative formulation of an estimate of the inflation: )( Tb bo T ba Tr HHP dd ??=? (4.4) after multiplying the background error covariance matrix by the inflation factor and transposing as before. Equation (4.2) and (4.4) will be referred to as OMB2 and AMB*OMB estimations, respectively. In addition, LI09 pointed out that it is necessary to have a correct observation error covariance, R , for an accurate estimate of adaptive inflation from those methods. Thus, LI09 calculates the observation errors simultaneously from the relationship for R proven by Desroziers et al. (2005): Rdd >=< ?? T boao (4.5) where ao?d ( bo?d ) are the difference between the observation and analysis (background) in observation space. Taking the transpose of Equation (4.5) allows estimating the observation errors as follows: ? = ?? ??== ip j i b j o j a j o jibo T aoio pyyyyp 1 2 /))((/)()()~( dd? (4.6) where ojy is the value of observation j and ajy , bjy are their analysis and background at the observation space j. Equation (4.6) calculates the variance of any subset of observations i with pi observations, and LI09 called OMA*OMB estimation. LI09 estimated the adaptive inflation factor and observation error variance simultaneously with temporal smoothing (Kalnay 2003, Appendix C) in order to reduce the problem of sampling error. That is, one can tune a ?forgetting? parameter 77 ? > 1.0 which inhibits temporally drastic changes in the estimated adaptive inflation and observation errors. Thus, the final estimates are determined by a balance between the estimates at the previous and the current analysis time steps according to the magnitude of ?. Since the ratio of the weight for the current estimate to that for the previous one is ? -1, the resultant estimates of adaptive inflation and observation errors forget the previous estimates less as ? decreases. LI09 mentioned that the final estimate is not sensitive to this forgetting parameter, and our work also confirmed that. While LI09 tested for the atmospheric variables and estimated both inflation and observation errors assuming one value of inflation for all model grids, we applied this technique to our carbon cycle data assimilation system and estimated the adaptive inflation and observation errors on each vertical layer separately. In addition, for the atmospheric CO2 on the lowest layer, the inflation over the land and the ocean were estimated separately. That is because the observation network of wind fields is dense over the land (Figure 2.3) so that it tends to suppress the ensemble spread of not only the wind field but also the atmospheric CO2 over land (Figure 4.7) if we use the same inflation for all grid points at the lowest layer?s atmospheric CO2. The spread in Figure 4.7 represents the value after multiplying the inflation factor to the background ensemble spread, which is used directly for the analysis cycle, in a test experiment using a single value of inflation for the atmospheric CO2 on the lowest layer. Figure 4.7(c) indicates that there is a lack of ensemble spread over the land in atmospheric CO2 background. This result can be explained as follows: atmospheric CO2 is transported by the wind fields so that the improved wind analysis reduces the 78 Figure 4. 7. Spread of background ensemble in (a) zonal wind (U), (b) meridional wind (V), and (c) atmospheric CO2 concentration after three weeks of data assimilation under the experiment using a single inflation for the atmospheric CO2 at the lowest layer. 79 uncertainty of atmospheric CO2. However, the wind observations are located mainly over land. Therefore, the analysis cycle of one-way multivariate data assimilation reduces the ensemble spread of atmospheric CO2 analysis more over land than over the oceans when we use the horizontal average of adaptive inflation. However, CO2 observations, which have a uniform distribution in the horizontal, should be reflected in the analysis not only over the ocean but also over the land. That is because the lowest layer?s CO2 concentration is directly related to the surface CO2 fluxes and we have to resolve the dominant variation of surface CO2 fluxes over land as precisely as possible. 4.4.2. Adaptive inflation for a variable having no observation Now, we need to take account of the inflation factor for a variable which has no observation such as the surface CO2 fluxes in our case. That is because the method prescribed in Section 4.4.1 is based on measuring observational increments, and therefore is only valid for variables that are observed, but one important component of state vector in this study, namely surface CO2 fluxes, does not have observations (at least no remotely sensed observations). We conducted first several sensitivity experiments with different fixed inflations for the surface CO2 fluxes. We found that a relatively large inflation, compared to the inflation for other dynamic variables of analysis, made the analysis diverge and only smaller values such as less than 5% kept the analysis stable. We now give a mathematical argument in support of this experimental result, namely that unobserved variables should have a smaller inflation than observed variables. 80 Consider the simplest case of a state vector with just two components: one observed ( 1bx ), the other not observed ( 2bx ). State vector bx includes 1bx and 2bx , and the observation operator H should be H=(1 0) so that 11 : oobb yyy ==Hx since 2bx is not observed. Let ( ) T 21 ??=? be the vector of inflation factors (>1), then ( ) 21121211 bobo yy ?? ?+=? . We could calculate the inflation factor, 1? , and observation error, 21o? , by the method in Section 4.4.1, but the equations do not give any information about 2bx and 2? . Within the LETKF, )HxK(yxx boba ?=? where 1)( ?+= RHHPHPK TbTb and ??? ? ??? ?= 2 221 21 2 1 bbb bbbb ??? ???P (4.7) Then, ?? ? ? ??? ?= 21 2 1 bb bTb ?? ?HP and K = ?1?b1( )2 ?1?b1 ?2 ?b2 ? ? ? ? ? ? ? ? ? o12 + ?1?b1( )2 . Assume that 21o? is perfectly known and small compared to ?1?b1( )2 , then we can approximate ( ) 11111 bbobo yyHxy ????=? . From the equation (4.7), xa1 ? xb1 xa2 ? xb2 ? ?? ? ?? = ?1?b1( )2 / ? o12 + ?1?b1( )2??? ??? ?1?b1 ?2 ?b2 / ? o12 + ?1?b1( )2??? ??? ? ? ? ? ? ? ? ? ? ? yo1 ? yb1( ) (4.8) 81 If the filtering is working properly, we can also assume that 22222 bbaba xx ???? ???=? with 1