Updating Quasi-Newton Matrices With Limited Storage

We study how to use the BFGS quasi-Newton matrices to precondition minimization methods for problems where storage is critical. We give an update formula which generates matrices using information from the last m iterations, where m is any number supplied by the user. The quasi-Newton matrix is updated at every iteration by dropping the oldest information and replacing it by the newest information. It is shown that the matrices generated have some desirable properties. The resulting algorithms are tested numerically and compared with several well-known methods.

1. Introduction.

For the problem of minimizing an unconstrained function f of n variables, quasi-Newton methods are widely employed [4]. They construct a sequence of matrices which in some way approximate the hessian of f (or its inverse). These matrices are symmetric; therefore, it is necessary to have n(n + 1)/2 storage locations for each one. For large dimensional problems it will not be possible to retain the matrices in the high speed storage of a computer, and one has to resort to other kinds of algorithms. For example, one could use the methods (Toint [15], Shanno [12]) which preserve the sparsity structure of the hessian, or conjugate gradient (CG) methods, which only have to store 3 or 4 vectors. Recently, some CG algorithms have been developed which use a variable amount of storage and which do not require knowledge of the sparsity structure of the problem [2], [7], [8]. A disadvantage of these methods is that after a certain number of iterations the quasi-Newton matrix is discarded, and the algorithm is restarted using an initial matrix (usually a diagonal matrix).

We describe an algorithm which uses a limited amount of storage and where the quasi-Newton matrix is updated continuously. At every step the oldest information contained in the matrix is discarded and replaced by new information; in this way we hope to have a more up-to-date model of our function (a sketch of this storage scheme is given at the end of this section). We will concentrate on the BFGS method since it is considered to be the most efficient. We believe that similar algorithms cannot be developed for the other members of the Broyden β-class [1].

Let f be the function to be minimized, g its gradient and h its hessian. We define
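To make the storage scheme concrete, here is a minimal sketch in Python (not part of the original paper). It assumes the correction pairs s_k = x_{k+1} - x_k and y_k = g_{k+1} - g_k are kept in a window of the last m iterations, and it applies the implied inverse hessian approximation to a gradient by the now-standard two-loop recursion; the function name, the pair notation, and the choice of initial matrix gamma*I are illustrative assumptions, not notation taken from the text above.

```python
import numpy as np
from collections import deque

def lbfgs_direction(grad, pairs, gamma=1.0):
    """Search direction -H*grad, where H is the inverse hessian
    approximation defined by the stored (s, y) pairs and the
    initial matrix H0 = gamma*I (two-loop recursion sketch)."""
    q = grad.copy()
    alphas = []
    for s, y in reversed(pairs):                # newest pair first
        rho = 1.0 / y.dot(s)
        alpha = rho * s.dot(q)
        alphas.append(alpha)
        q -= alpha * y
    r = gamma * q                               # apply H0
    for (s, y), alpha in zip(pairs, reversed(alphas)):  # oldest first
        rho = 1.0 / y.dot(s)
        beta = rho * y.dot(r)
        r += (alpha - beta) * s
    return -r

# Rolling storage: with maxlen=m, appending the newest pair
# automatically drops the oldest one, as described in the text.
m = 5
pairs = deque(maxlen=m)
# At each iteration, after the line search produces x_next, g_next:
#   pairs.append((x_next - x, g_next - g))
#   d = lbfgs_direction(g_next, pairs)
```

The deque realizes exactly the drop-oldest, append-newest behavior described above, so the storage cost of this sketch is 2mn numbers rather than the n(n + 1)/2 locations needed for a full quasi-Newton matrix.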