## The NFL Theorems A Little Mathematics

A series of NFL theorems pertains to various situations. The original theorems (Wolpert 1996a, 1996b) dealt with problems of supervised learning. Later they were extended to optimization problems associated with search algorithms (Wolpert and Macready 1997).

The first NFL theorem for search pertains to fixed (time-independent) fitness landscapes, while the second is for time-dependent landscapes. Although there is a substantial difference between the two, their principal meaning can be understood by reviewing only the first.

Imagine a finite set X called the search space and a fitness function f that assigns a value to each point of X; the values of f are within a range denoted Y. Altogether the search points and their fitness values form the fitness landscape. We consider algorithms that explore X one point at a time. At each step, the algorithm decides which point to examine next, depending on the points that have been examined already and their fitness values, but it does not know the fitness of any other points. This decision might even be made at random or partly at random.

After an algorithm has iterated the search m times, we have a time-ordered set (a sample) denoted dm, which comprises m measured values of f within the range Y. Let P be the conditional probability of having obtained a given sample dYm after m iterations for given f, Y, and m. Then, according to the first NFL theorem, where aj and a2 are two different algorithms. The summation is performed over all possible fitness functions. Equation (1) means that, in probabilistic terms, the results of a search, if averaged over all possible fitness landscapes, are the same for any pair of algorithms.

The equation for the second NFL theorem—for time-dependent landscapes—differs in two respects. First, it contains one more factor affecting the algorithm's behavior: an evolution operator, which is a rule reflecting how the landscape evolves from iteration to iteration. Second, the probabilities of obtaining a given sample are averaged over all possible evolution operators rather than over all possible fitness functions. For the purpose of this chapter, it is sufficient to refer to the first NFL theorem only.

Equation (1) also says that, if the performance of an algorithm aj is superior to that of another algorithm a2 when applied to a specific class of fitness functions, it is necessarily inferior to the performance of a2 on some other class of fitness functions. (In my previous analogy the mountain-climbing algorithm denoted COA performed better than the HOA on one type of physical relief but worse than the HOA on another.)

Algorithms do not incorporate any prior knowledge of the properties of

the fitness function. They are therefore called black-box algorithms. They operate on a fitness landscape without prior knowledge of the landscape's relief, probing point after point, either deterministically or stochastically (according to a rule which has some random component).

Importantly, the NFL theorems do not say anything about the performance of any two algorithms on any particular landscape. If, in equation (1), we remove the summation symbols, the sign of equality must be replaced with an inequality. In other words, except for rare special cases,