Temporal Difference Learning in Continuous Time and Space

A continuous-time, continuous-state version of the temporal difference (TD) algorithm is derived in order to facilitate the application of reinforcement learning to real-world control tasks and neurobiological modeling. An optimal nonlinear feedback control law was also derived using the derivatives of the value function. The performance of the algorithms was tested in a task of swinging up a pendulum with limited torque. Both the "critic" that specifies the paths to the upright position and the "actor" that works as a nonlinear feedback controller were successfully implemented by radial basis function (RBF) networks.
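
To make the setup concrete, below is a minimal illustrative sketch, not the paper's exact implementation, of the three ingredients the abstract names: a value function represented by a Gaussian RBF network, a continuous-time TD error (Euler-discretized), and a feedback control law obtained from the derivatives of the value function, applied to pendulum swing-up with limited torque. All numerical parameters (time constant tau, learning rate alpha, torque limit u_max, the reward r = cos(theta), and the smooth tanh saturation of the control) are assumptions for illustration.

    import numpy as np

    # --- Pendulum with limited torque (assumed parameters) ---
    m, l, g, mu = 1.0, 1.0, 9.8, 0.01   # mass, length, gravity, friction
    u_max, dt = 5.0, 0.02               # torque limit, Euler step

    def step(x, u):
        th, om = x
        dom = (-mu * om + m * g * l * np.sin(th) + u) / (m * l**2)
        th, om = th + om * dt, om + dom * dt
        return np.array([((th + np.pi) % (2 * np.pi)) - np.pi, om])

    # --- Gaussian RBF network for the value function V(x) = w . phi(x) ---
    th_c = np.linspace(-np.pi, np.pi, 12)
    om_c = np.linspace(-8.0, 8.0, 12)
    centers = np.array([[a, b] for a in th_c for b in om_c])
    sigma = np.array([th_c[1] - th_c[0], om_c[1] - om_c[0]])

    def phi(x):
        d = (centers - x) / sigma
        return np.exp(-0.5 * np.sum(d * d, axis=1))

    def grad_V(w, x):
        # dV/dx = sum_i w_i dphi_i/dx for Gaussian basis functions
        return (w * phi(x)) @ ((centers - x) / sigma**2)

    # --- Continuous-time TD learning with a value-gradient controller ---
    tau, alpha = 1.0, 0.1               # value horizon, learning rate (assumed)
    w = np.zeros(len(centers))
    for episode in range(200):
        x = np.array([np.pi, 0.0])      # start hanging straight down
        for t in range(int(20.0 / dt)):
            # Torque enters the dynamics only through the omega equation, so
            # a greedy controller follows the sign/size of dV/d(omega).
            u = u_max * np.tanh(grad_V(w, x)[1] / (m * l**2))
            x_new = step(x, u)
            r = np.cos(x[0])            # assumed reward: height of the tip
            V, V_new = w @ phi(x), w @ phi(x_new)
            # Euler-discretized continuous TD error:
            # delta = r - V/tau + dV/dt
            delta = r - V / tau + (V_new - V) / dt
            w += alpha * delta * phi(x)
            x = x_new

In this sketch the "critic" is the RBF value function trained by the TD error, and the role of the "actor" is played by the greedy control law computed directly from the critic's gradient; the paper additionally implements the actor itself as a separate RBF network.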