XAI: an incremental method for Value Function approximation

The topic of this paper is the approximation of the Value Function in Reinforcement Learning. We present a method for modeling the Value Function that allocates memory resources while the agent explores its environment. The model of the Value Function is based on a Radial Basis Function Network with Gaussian units. The model is built incrementally by creating new units on-line when the system enters unknown regions of the state space. The parameters of the model are adapted using gradient descent and the Sarsa(λ) algorithm. The method does not require a model of the environment, nor does it involve the estimation of one. The performance of the proposed method is demonstrated on two well-known benchmark problems: the Acrobot and the Bioreactor. Both problems are simulated in a real-valued state space and discrete time.
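The mechanism described above — growing a Gaussian RBF network on-line and adapting its weights by temporal-difference learning — can be illustrated with a minimal sketch. All names and parameter values here (`sigma`, `min_activation`, the growth threshold rule) are illustrative assumptions, not the paper's actual implementation; for brevity the sketch applies a TD(λ)-style update to a state-value function, whereas the paper uses Sarsa(λ) on state-action values.

```python
import numpy as np

class IncrementalRBF:
    """Sketch of an incrementally grown Gaussian RBF value-function model.

    A new unit is created when the current state activates no existing
    unit above `min_activation` (hypothetical threshold; the paper's
    novelty criterion may differ).
    """
    def __init__(self, sigma=0.5, min_activation=0.5,
                 alpha=0.1, gamma=0.99, lam=0.9):
        self.sigma = sigma                  # width of Gaussian units
        self.min_activation = min_activation
        self.alpha, self.gamma, self.lam = alpha, gamma, lam
        self.centers = []                   # Gaussian unit centres
        self.weights = []                   # linear output weights
        self.traces = []                    # eligibility traces

    def _phi(self, s):
        """Activations of all Gaussian units for state s."""
        if not self.centers:
            return np.zeros(0)
        d = np.linalg.norm(np.array(self.centers) - np.asarray(s), axis=1)
        return np.exp(-(d ** 2) / (2 * self.sigma ** 2))

    def maybe_grow(self, s):
        """Add a unit centred on s if the state is 'unknown'."""
        phi = self._phi(s)
        if phi.size == 0 or phi.max() < self.min_activation:
            self.centers.append(np.asarray(s, dtype=float))
            self.weights.append(0.0)
            self.traces.append(0.0)

    def value(self, s):
        return float(np.dot(self._phi(s), self.weights))

    def td_update(self, s, r, s_next):
        """One gradient step on the TD(lambda) error for transition (s, r, s')."""
        delta = r + self.gamma * self.value(s_next) - self.value(s)
        phi = self._phi(s)
        # decay traces, then accumulate current feature activations
        self.traces = [self.gamma * self.lam * e + p
                       for e, p in zip(self.traces, phi)]
        self.weights = [w + self.alpha * delta * e
                        for w, e in zip(self.weights, self.traces)]
```

Because the model starts empty and only grows where the agent actually goes, memory use tracks the visited region of the state space rather than requiring a fixed grid laid out in advance.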