ℓ(φ) = A0 + Σ_{j=1}^{M} [ Aj cos(2π jφ) + Bj sin(2π jφ) ],

where M is restricted such that the representation is reasonably realistic (M = 2; in contact systems, perhaps M = 4 or more depending on perturbations). If M is too small, the fit error will be larger, and the weight of the light curve in the modeling process will be lower than that of other light curves. This is not a good idea if the perturbations themselves are to be modeled, say, with star spot simulations, because the leverage for spot parameter determinations will then be reduced. If the modeler is not convinced of the physical existence of spots but nevertheless wishes to model the system as well as possible, the only alternative at present is to rectify the light curve for those perturbations by subtraction or division of the "unexplained" sine term components of the Fourier representation, as prescribed by the Russell-Merrill method in Sect. 6.2.1.

### 4.1.1.6 Simultaneous Fitting

EBs can be very rich data sources. Often several photometric passbands, such as B and V or u, v, b, and y, are available. Ideal observation sets include, in addition, one or two radial velocity curves. It is important that all these data be included in the analysis to yield consistent results. Simultaneous data fitting does not mean that all observations must have been made at the same time. If polarization data or apsidal motion data are included, then these data were taken at different times, possibly separated by decades. Simultaneous data fitting means that all available observations are used simultaneously in one least-squares fit. To be sure, combining all these data from different sources requires careful weighting (see Sect. 4.1.1.5). But if a consistent least-squares solution in such a situation can be found, it deserves respect! The real binary system has a definite inclination i, mass ratio q, and so on, so all the data should yield the nonwavelength-dependent properties of the model identically within the error limits. From the physical point of view, model fits in separate passbands yielding different values for i or q are difficult to interpret. At best they indicate a problem with the model or some of the fixed parameters. From the numerical point of view, the "separate curve" approach involves too many free parameters. Simultaneous fitting improves the confidence with which parameters are determined. Of course, with more constraints, or fewer free parameters, the fit of the individual curves may look less attractive. A typical light curve fit in different passbands (say, u, v, b, and y), done separately, assigns each curve its own set of free parameters.

In the separate curve fitting, each curve has six adjustable parameters. The result may be four nice-looking fits produced by a total of 24 parameters. In a simultaneous fit (representing the correct physics), there would be only 5 + 4 · 1 = 9 free parameters (five global and four curve-dependent ones). This demonstrates how simultaneous fitting can drastically reduce the number of free parameters.
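The bookkeeping above can be sketched in code. The following toy example (not from any light curve program; the model and all names are invented for illustration) fits two synthetic curves that share two global parameters a and b, standing in for quantities like i and q, while each curve has its own offset c, standing in for a passband luminosity. One least-squares fit adjusts 2 + 2 · 1 = 4 parameters instead of the 2 · 3 = 6 of two separate fits.

```python
# Illustrative sketch: simultaneous fitting of two synthetic curves
# that share global parameters (invented toy model, not EB physics).
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(42)
t = np.linspace(0.0, 2.0, 50)

# "True" system: a and b are global; c differs per curve.
a_true, b_true, c_true = 2.0, 1.3, [0.5, 1.1]

def model(a, b, c):
    return a * np.exp(-b * t) + c

curves = [model(a_true, b_true, c) + 0.01 * rng.standard_normal(t.size)
          for c in c_true]

# Simultaneous fit: the residuals of both curves are stacked into one
# vector, so a and b are constrained by all the data at once.
def residuals(p):
    a, b, c1, c2 = p
    return np.concatenate([curves[0] - model(a, b, c1),
                           curves[1] - model(a, b, c2)])

fit = least_squares(residuals, x0=[1.0, 1.0, 0.0, 0.0])
a_fit, b_fit, c1_fit, c2_fit = fit.x
```

The essential design point is the single stacked residual vector: per-curve weights, as discussed in Sect. 4.1.1.5, would simply multiply the corresponding residual segments.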

A remarkable analysis of an eclipsing X-ray binary is described in Sect. 7.3.1. The analysis by Wilson & Terrell (1994) includes B and V light curves, an optical (He I) radial velocity curve, pulse arrival times, and estimates of the X-ray eclipse duration. This analysis is remarkable both for the rich physics and for the advantages of simultaneous fitting that it illustrates.

For a systematic discussion of simultaneous fitting of curves (or multi-experiment analysis) from a mathematical point of view, the reader is referred to Schlöder & Bock (1983) and Schlöder (1988). These references also contain efficient algorithms to solve such problems.

## 4.2 A Brief Review of Nonlinear Least-Squares Problems

Ab origine (From the beginning)

The inverse problem formulated in Sect. 4.1 is a typical nonlinear least-squares problem which in turn is an optimization problem with a special structure. More details are given in Appendix A. Here we provide a brief overview of nonlinear unconstrained and constrained least-squares.

### 4.2.1 Nonlinear Unconstrained Least-Squares Methods

Legendre (1805) first published the method of least-squares, applying it to data from the 1795 French meridian arc survey as an example. In 1809, Gauß in Theoria Motus Corporum Coelestium derived the justification for the method in terms of the normal error law, showed how to obtain the errors of the estimated parameters, and also how nonlinear problems could be linearized, so that the method could be applied to the problem of nonlinear parameter estimation. He also claimed he had been using the method since 1795, 10 years before Legendre's work was published. It appears that Gauß had indeed been using the method as he claimed, but had not appreciated its wider importance until Legendre's publication. For a more detailed discussion of this priority conflict, we refer the reader to Stigler (1986) or Schneider (1988).

Since the time of Gauß, numerical methods for solving several types of least-squares problems, e.g., those with probabilistic constraints [Eichhorn (1978) and Eichhorn & Standish (1981)] and those involving differential equations [Bock (1981), Kallrath et al. (1993), and Kallrath (1999)], have been developed and improved, and there is still much active research in that area. For a review of the methods of least-squares as known and used in astronomy, we refer to Jefferys (1980, 1981) and Eichhorn (1993).

A popular method, sometimes also called "damped least-squares" or DLS for short, is the Levenberg-Marquardt algorithm proposed independently by Levenberg (1944) and Marquardt (1963). It modifies the eigenvalues of the normal equation matrix and tries to reduce the influence of eigenvectors related to small eigenvalues [cf. Dennis & Schnabel (1983)].
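A minimal sketch of the idea follows (the function names and the test model are invented for illustration, and real implementations add scaling and more careful damping control). Adding λI to the normal-equation matrix JᵀJ raises its small eigenvalues, which suppresses the influence of the corresponding eigenvector directions; λ is relaxed after successful steps and increased after rejected ones.

```python
# Minimal Levenberg-Marquardt (damped least-squares) sketch.
import numpy as np

def levenberg_marquardt(resid, jac, x0, lam=1e-3, tol=1e-10, max_iter=100):
    x = np.asarray(x0, dtype=float)
    cost = 0.5 * np.sum(resid(x) ** 2)
    for _ in range(max_iter):
        r, J = resid(x), jac(x)
        A = J.T @ J + lam * np.eye(x.size)   # damped normal-equation matrix
        dx = np.linalg.solve(A, -J.T @ r)
        new_cost = 0.5 * np.sum(resid(x + dx) ** 2)
        if new_cost < cost:                   # accept step, relax damping
            x, cost, lam = x + dx, new_cost, lam / 10.0
        else:                                 # reject step, damp harder
            lam *= 10.0
        if np.linalg.norm(dx) < tol:
            break
    return x

# Fit y = p0 * exp(p1 * t) to exact data: the kind of exponential
# nonlinearity that DLS is said to handle nicely.
t = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * t)
resid = lambda p: p[0] * np.exp(p[1] * t) - y
jac = lambda p: np.column_stack([np.exp(p[1] * t),
                                 p[0] * t * np.exp(p[1] * t)])
p_fit = levenberg_marquardt(resid, jac, x0=[1.0, 0.0])
```

For λ → 0 the step reduces to the Gauß-Newton step; for large λ it turns toward a short steepest-descent step, which is the damping behavior described above.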

Since its original invention by Levenberg (1944) over half a century ago, there have been numerous comparisons of DLS with other methods for nonlinear least-squares, such as simple step-cutting (solving a line-search problem and reducing the step-size if necessary) and derivative-free methods. Although there was a time when the Broyden-Fletcher-Goldfarb-Shanno secant method was thought to be superior, there now seems to be general agreement that DLS is similarly effective, and much simpler to code. For this reason, it can be found in commercially available software packages. DLS handles problems with exponential or logarithmic nonlinearities nicely and works well on problems that have only power-law nonlinearities. To achieve global convergence (i.e., convergence even for ill-chosen initial points), it is necessary to choose the damping strategy carefully.

Damped (step-size cutting) Gauß-Newton algorithms that control the damping by natural level functions [Deuflhard & Apostolescu (1977, 1980); Bock (1987)] seem to be superior to Levenberg-Marquardt-type schemes and can be more easily extended to nonlinear constrained least-squares problems. To avoid confusion about the use of the word "damping," we clarify that we use it for both the damped Levenberg-Marquardt algorithm (which modifies the system matrix) and step-cutting algorithms.

### 4.2.2 Nonlinear Constrained Least-Squares Methods

A common basic feature and limitation of least-squares methods used in astronomy, though seldom explicitly noted, is that they require some explicit model to be fitted to the data. Most models are based on physical laws or include geometric properties, and very often lead to differential equations which may, however, not be solvable in a closed analytical form. Thus, such models do not lead to explicit functions that we want to fit to data. Rather, we need to fit an implicit model (represented by a system of differential equations or another implicit model). Therefore, it seems desirable to develop least-squares algorithms that use the differential equations as constraints or side conditions to determine the solution implicitly. By a multiple shooting approach (Bock 1987), such differential equation-based side conditions can be discretized and represented as equality constraints. The demand for, and the applications of, such techniques are widespread in science, especially in the rapidly growing fields of nonlinear dynamics in physics and astronomy, nonlinear reaction kinetics in chemistry (Bock 1981), nonlinear models in materials science (Kallrath et al. 1999) or biology (Baake & Schlöder 1992), and nonlinear systems describing ecosystems in biology or the environmental sciences.
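As a hedged illustration of what "fitting an implicit model" means in practice, the toy example below estimates the rate constant k of dy/dt = −k y by solving the differential equation numerically inside every residual evaluation. This is a single-shooting analogue, far simpler than the multiple shooting approach cited above; the model, data, and names are invented for the sketch.

```python
# Parameter estimation for a model defined only implicitly by an ODE:
# each residual evaluation requires a numerical ODE solution.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import least_squares

t_obs = np.linspace(0.0, 3.0, 15)
k_true, y0 = 0.8, 5.0
y_obs = y0 * np.exp(-k_true * t_obs)     # synthetic exact observations

def residuals(p):
    k = p[0]
    # The "model function" is the ODE solution, not a closed formula.
    sol = solve_ivp(lambda t, y: -k * y, (0.0, 3.0), [y0],
                    t_eval=t_obs, rtol=1e-10, atol=1e-12)
    return sol.y[0] - y_obs

fit = least_squares(residuals, x0=[0.3])
k_fit = fit.x[0]
```

In a multiple shooting formulation the integration interval would instead be split into segments with their own initial values, and the continuity conditions between segments would enter the problem as the equality constraints discussed next.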

Formally, we want to be able to solve a least-squares problem with nc constraints as a constrained optimization problem of the type

min_x { ||F1(x)||² | F2(x) = 0 or F2(x) ≥ 0, F2(x) ∈ IR^nc } .   (4.2.1)

Using the nomenclature of Appendix A.3 and the definition (4.1.9) of the residual vector including weights, F1(x) follows as

F1(x) = R(x),  R(x) = Y − F(x) = √W [o − c(x)].   (4.2.2)

This usually large, constrained, structured nonlinear problem is solved by a damped generalized Gauß-Newton method; cf. Bock (1987). Here we describe only the equality constrained case. Starting with an initial guess x0, the variables are iterated via

x_{k+1} = x_k + α_k Δx_k   (4.2.3)

with a damping factor α_k restricted to the range 0 < α_min ≤ α_k ≤ 1. In order to compute the increment Δx_k, we substitute x in (4.2.1) by x_k + Δx_k and linearize the problem around x_k. Then Δx_k is the solution of the linear, equality constrained least-squares problem

min_{Δx_k} { ||J1(x_k) Δx_k + F1(x_k)||² | J2(x_k) Δx_k + F2(x_k) = 0 }   (4.2.4)

with the Jacobian matrices (see Appendix A.1)

J_i(x_k) := ∂F_i/∂x (x_k),  i = 1, 2.   (4.2.5)

Under appropriate assumptions about the regularity of the Jacobians J_i, there exists a unique solution Δx_k of the linear problem and a unique linear mapping J_k^+ [called generalized inverse as introduced by Bock (1981)] obeying the relations

Δx_k = −J_k^+ F1(x_k),  J_k^+ J_k J_k^+ = J_k^+,  J_k^T := [J1(x_k)^T, J2(x_k)^T].   (4.2.6)

The solution Δx_k of the linear problem follows uniquely from the Kuhn-Tucker conditions (Kuhn & Tucker 1951)

J1(x_k)^T [J1(x_k) Δx_k + F1(x_k)] + J2(x_k)^T λ_c = 0,  J2(x_k) Δx_k + F2(x_k) = 0,

where λ_c ∈ IR^nc is a vector of Lagrange multipliers.
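For illustration only, the linear equality-constrained subproblem can be solved directly from its Kuhn-Tucker system, yielding both the increment Δx and the multipliers. This dense sketch is not one of the structure-exploiting factorizations of Bock (1981); the matrices below are arbitrary invented data.

```python
# Solve min ||J1 dx + F1||^2 subject to J2 dx + F2 = 0 via the
# Kuhn-Tucker system: stationarity plus the linearized constraints.
import numpy as np

def constrained_linear_lsq(J1, F1, J2, F2):
    n, nc = J1.shape[1], J2.shape[0]
    KKT = np.block([[J1.T @ J1, J2.T],
                    [J2,        np.zeros((nc, nc))]])
    rhs = np.concatenate([-J1.T @ F1, -F2])
    sol = np.linalg.solve(KKT, rhs)
    return sol[:n], sol[n:]          # increment dx, multipliers lambda

# Invented example: with J2 = [1, 1] and F2 = [-1], the constraint
# forces dx[0] + dx[1] = 1.
J1 = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
F1 = np.array([0.5, -0.3, 0.2])
J2 = np.array([[1.0, 1.0]])
F2 = np.array([-1.0])
dx, lam = constrained_linear_lsq(J1, F1, J2, F2)
```

The returned pair satisfies both Kuhn-Tucker blocks: the gradient of the objective at dx is balanced by J2ᵀλ, and the linearized constraint holds exactly.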

For the numerical solution Δx_k of the linear constrained problem (4.2.4), several structure-exploiting methods, for example the one by Bock (1981), have been developed which compute special factorizations of J1 and J2 and thus implicitly, but not explicitly, the generalized inverse J_k^+. The availability of the Jacobians J1 and J2 allows rank checks in each iteration and automatic detection of violations of the regularity assumptions. In this case, automatic regularization and computation of a relaxed solution is possible (Bock 1981).

The iteration (4.2.3) can be forced to converge globally to a stationary point of the problem if the damping factors α_k are chosen appropriately. In the treatment of a large number of practical problems, strategies based on natural level functions have proven to be very successful [cf. Bock (1987), the brief summary in Kallrath et al. (1993), or Appendix A]. In the region of local convergence of the full step method, the algorithm converges linearly to a solution that is stable against statistical variations in the observations. An iterate, x_k, is accepted as solution x* of the nonlinear constrained problem if a scaled norm of the increments Δx_k is below a user-specified tolerance. As the Jacobians and their decompositions are available in each iteration, covariance and correlation matrices are easily computable (Bock 1987) for the full variable vector x*.
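As a sketch of the last point, restricted to the simple unconstrained, unit-weight case (the helper name and example data are invented): with the Jacobian J at the solution available, the parameter covariance matrix follows from σ²(JᵀJ)⁻¹, and the correlation matrix from its normalization by the parameter standard deviations.

```python
# Covariance and correlation matrices from the Jacobian at the solution
# (unconstrained, unit-weight sketch).
import numpy as np

def covariance_and_correlation(J, residuals):
    m, n = J.shape
    sigma2 = residuals @ residuals / (m - n)   # variance of unit weight
    cov = sigma2 * np.linalg.inv(J.T @ J)
    d = np.sqrt(np.diag(cov))
    corr = cov / np.outer(d, d)
    return cov, corr

# Invented example: Jacobian of the linear model y = p0 + p1 * t,
# with some leftover misfit playing the role of the final residuals.
t = np.linspace(0.0, 1.0, 10)
J = np.column_stack([np.ones_like(t), t])
residuals = 0.05 * np.cos(7.0 * t)
cov, corr = covariance_and_correlation(J, residuals)
```

In the constrained case described in the text, the factorizations of J1 and J2 computed during the iteration take the place of the plain normal-equation inverse used here.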

## 4.3 Least-Squares Techniques Used in Eclipsing Binary Data Analysis

Inter nos; inter vivos (Between ourselves; between the living)

This section describes the least-squares methods used and implemented in several light curve programs.

### 4.3.1 A Classical Approach: Differential Corrections

Differential corrections can be understood as an undamped Gauß-Newton method or, in accordance with what has been said above, as an undamped Levenberg-Marquardt algorithm. Thus, it is a special case of the techniques described in the previous section. However, as the Differential Corrections method is widespread within the EB community and has achieved venerable status - it was first used to determine the parameters of EBs by Wyse (1939) and Kopal (1943) - it deserves detailed discussion.

The nonlinear least-squares problem is linearized in the following way: starting from the necessary condition ∇f(x) = 0 and an initial solution x_k ∈ IR^n present at the beginning of the kth iteration, a correction vector Δx_k is defined such that x_k + Δx_k obeys the necessary condition. With the unweighted residual vector d(x) ∈ IR^N, the least-squares function (4.1.11) leads to the necessary condition

∇f(x) = 0  ⇔  2 A(x) W d(x) = 0,  A ∈ M(n, N),   (4.3.1)

where M(n, N) denotes the set of matrices with n rows and N columns, and A is given by

A_{iν}(x) := ∂d_ν(x)/∂x_i = −∂c_ν(x)/∂x_i,  A ∈ M(n, N).   (4.3.2)

Replacing the unknown solution x by x_k + Δx_k yields the necessary condition

or component-wise
