Linear optimal estimation

Optimal estimation was developed for use with terrestrial retrievals where satellite observations are used to improve upon the measurements already provided by other sources. Although the necessary a priori information is not available for planetary observations, the technique is relatively easy to understand and if used correctly the solutions from this model are effectively identical to a number of other retrieval methods that have been developed specifically for planetary retrievals.

Suppose that we again represent a set of measurements by the measurement vector ym and atmospheric conditions by the state vector xn. For terrestrial applications, the properties of the atmosphere are not a complete mystery and from models and previous measurements, we have some a priori knowledge of the state of the atmosphere to within some initial error. We may then use this profile and the assumed constraints on it as the first guess and find the solution to the state vector xn which minimizes the modified cost function

0 =(ym - F(Xn))TS-1 (ym - F(Xn)) + (Xn - Xq)tS-1 (Xn - Xq) (7.20)

where ym is the measured spectrum;

F(Xn) is the spectrum calculated with the forward model;

Se is the measurement covariance matrix, which contains both estimated random and systematic measurement errors, as well as the estimated accuracy of the radiative transfer model used;

Xn is the model state vector;

X0 is the a priori state vector;

SX is the a priori covariance matrix, which contains assumed errors on the a priori state vector, together with vertical correlations.

The cost function is simply a combination of how closely the synthetic spectrum matches the measured spectrum and how far the solution deviates from the assumed a priori state vector. The optimal solution to the state vector is thus found that maximizes the closeness of fit to the measured spectrum without deviating too greatly from the a priori state vector, and the degree to which the final solution will depart from the a priori solution will depend upon the relative size of the errors contained within the a priori, and measurement covariance matrices. For linear models such as temperature retrievals, for which the synthetic spectrum is calculated as yn — y0 = K(Xn — xq), the optimal solution is found to be (Rodgers, 2000)

X = Xq + SXKt(KSxKt + S£)-1(ym - yo - K(Xq - Xn)). (7.21)

The error covariance matrix of the optimal solution to the state vector is then found to be

Since the optimal estimation technique was developed for inverting Earth observation data, where a priori knowledge of the expected atmospheric profiles is mostly good, how can such a model be used for inverting spectra from other planetary atmospheres, where we have very little a priori knowledge? While we can apply some a priori information, such as the known saturated vapor pressure profiles of certain gaseous constituents, the a priori profiles and covariance matrices for planetary retrievals are really little more than first guesses. However, it is possible to formalize the technique for use with planetary retrievals, as is discussed by Irwin et al. (2008). First of all the measurement covariance matrix, Se, may be calculated accurately from the known noise performance of the instrument and/or noting the standard variation of measured radiances. Errors in the forward model may also be estimated and added to Se. Once Se is set, the retrieval model may be tuned by considering Equation (7.21). If the errors in the a priori covariance matrix are very large, then KSXKT dominates Se in the first bracket, and an unwanted "exact" solution is obtained. Conversely, if the errors in the a priori covariance matrix are very small, then Se dominates KSXKT in the first bracket and a solution is achieved that deviates very little from the a priori estimate. Hence, it can be seen that the relative size of the errors in Se and SX may be used to tune the retrieval, with the best trade-off achieved when errors in Se and SX have approximately equal weight. How then may we construct a suitable a priori covariance matrix that incorporates sufficient vertical smoothing? The normal procedure is to make reasonable guesses of the expected errors at each altitude of the profile to be retrieved and set the diagonal elements of the a priori covariance matrix to the square of these errors. Off-diagonal elements may then be set assuming a certain degree of cross-correlation according to schemes such as (Rodgers, 2000):

where pt and pj are the pressures at the i th and j th atmospheric levels; and c is a "correlation length'', here equivalent to the number of scale heights over which we can assume the profile to be reasonably correlated. A value of c = 1.5 is commonly used (Irwin et al, 2008). The retrieval must then be tuned to ensure that the a priori profile provides sufficient smoothing. Tests are performed with a range of a priori errors, usually by simply scaling the a priori covariance matrix. If the errors are too large, the solution is unconstrained and the retrieval does everything it can to minimize the difference between measured and modeled spectra leading to ill-conditioning and large vertical oscillations in the retrieved vertical profile. If, instead, the a priori errors are set to be very small, then the solution is over-constrained and the retrieved profile differs little from the a priori profile. By tuning the a priori errors we search for intermediate conditions where KSXKT and Se contribute similarly and thus where the solution is constrained quasi-equally by the data and by the a priori profile. This leads to solutions that match the measured spectra well, but which still have sufficient smoothing, supplied by the a priori covariance matrix SX, to be well conditioned.

7.10.3 Nonlinear optimal estimation

The linearity of temperature retrievals comes from the fact that the elements of the K-matrix (the rate of change of spectral radiances with temperature) are not strongly dependent on the temperature profile. Hence, once the K-matrix is calculated, the best-fit temperature profile is found in a single step. This situation does not apply for composition retrievals, since small changes in the composition profile strongly affect atmospheric transmission, and thus the K-matrix. Hence, composition retrievals are highly nonlinear and computationally expensive since they must be performed itera-tively, and the K-matrix recalculated at every step in the worst case.

Extending the principles of optimal estimation to such nonlinear cases the difference between the spectrum computed from the nth trial measurement vector xn and that measured is used to calculate a new estimate of the trial vector xn+1 through the equation

Xn+1 = X0 + SxKT(KnSxKT + Se)_1(ym - yn - Kn(x0 - Xn)) (7.24)

where Kn is the matrix of functional derivatives (for the n th iteration), or the Jacobian (i.e., the rate of change of radiance with state vector elements for all the wavelengths in the spectrum). It should be noted that the a priori vector, x0, is used at each iteration to ensure that a priori constraints remain applied throughout the retrieval. In practice, Kn can vary greatly between iterations and the simple iteration scheme of Equation (7.25) can become unstable. In the NEMESIS optimal estimation retrieval model (Irwin et al., 2008) a modified iteration scheme, based on the MarquardtLevenberg principal (e.g., Press et al., 1992) is used where the actual modified state vector used in the next iteration, xn+1, is calculated from Xn+1 and Xn as xn+1 = Xn + ^f+X. (7.25)

The parameter A is initially set to 1.0. If the spectrum calculated from xn+i is found to reduce the cost function then xn is set to xn+1, A is multiplied by a factor of 0.3, and the next iteration started. If, however, the spectrum calculated from xn+1 is found to increase the cost function then xn is left unchanged, the parameter A increased by a factor of 10, and a new vector xn+1 calculated. The choice of multiplication parameters (0.3, 10) is somewhat arbitrary, although it is important to ensure that they are not reciprocals of one another, as this can lead to endless loops. To ensure smooth convergence, the lower number (0.3) is chosen to be greater than the reciprocal of the larger number so that A does not decrease too quickly. As the retrieval approaches its final solution, A ^ 0 and the model tends to the optimal estimate, whose error can be estimated from Equation (7.22).

For cases where the Kn matrix does not change very rapidly with the state vector, inversion is approximately linear and convergence is achieved in 2-3 steps. However, for volume mixing ratio retrievals, where Kn can vary greatly between iterations, convergence can be slower, requiring perhaps 10-20 steps.

It is interesting to compare and contrast the optimal estimation approach as applied to planetary applications with the constrained inversion technique (Conrath et al., 1998; Hanel et al., 2003) used by many research groups. In this approach, the nonlinear iterative solution may be written, using our optimal estimation formalism, as

Xn+1 = X0 + SXKT(KnSxKT + 7S£)-1 (ym - yn - Kn(x0 - Xn)) (7.26)

where Sx is now the a priori correlation matrix (which provides vertical smoothing); Se and Kn are as before; and 7 is an adjustable parameter used to fine-tune the balance between measurement and a priori constraint. When retrieving a single profile, the a priori correlation matrix Sx is equal to our a priori covariance matrix Sx, but with each row divided by the value of the diagonal component. Thus, comparing Equation

(7.26) with Equation (7.24) we can see that the two methods are essentially identical when it comes to retrieving continuous vertical profiles.