High Quality Thermostat Control by Reinforcement Learning - A Case Study

Martin Riedmiller
Institut für Logik, Komplexität und Deduktionssysteme
Universität Karlsruhe, D-76128 Karlsruhe, Germany
e-mail: riedml@ira.uka.de

Abstract: Temperature control is an important issue in many manufacturing processes. The requirements for high precision, fast reaction to disturbances, and time delays of varying length due to the considerably changing characteristics of the respective production processes make it a challenging application field for the improvement and development of reinforcement learning techniques. This article shows some first results on the application of a neural reinforcement learning controller to a thermostat control problem. Open problems are discussed and some ideas for further research directions are presented.

To appear in: Proceedings of CONALD '98, CMU, Pittsburgh

I. A Thermostat Controller

In many manufacturing applications it is important to keep a liquid (water, oil, a chemical substance) at a certain temperature. One reason may be that a chemical reaction has the desired outcome only if the temperature is kept within (very) tight bounds. This is the case, for example, in wafer production processes, but many more industrial applications exist. These applications vary considerably with respect to the quality and the amount of the liquids used, resulting in a broad range of different process characteristics. This variety makes it very difficult and costly to design a controller that shows good control behavior in every application situation. Reinforcement learning seems to be a promising approach to overcome this problem by learning to adapt the control law to varying scenarios.

A. System description

The following hardware structure is a common apparatus for liquid temperature control with a very broad application range (figure 1): a heating device is used to directly heat a liquid within a smaller internal tank (about 1 liter).
This liquid is then pumped through a tube that passes through a larger external tank, thereby emitting energy and thus heating the liquid in the external tank (typically 10 to 60 liters). The temperature of the liquid in the external tank can thus be controlled by first heating the internal liquid.

[Figure 1: Typical hardware structure to control the liquid temperature in the external tank. Labeled components: heater power, heating temperature T_heat, internal tank temperature T_int, pump, external tank temperature T_ext.]

The temperature of the external liquid depends on many parameters: the quality of the internal and the external liquid, the amount of internal liquid that is pumped through the tube per minute, the sizes of the internal and the external tank, the environment temperature, external disturbances, the quality of the tube, and so on.
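The coupled internal/external tank dynamics described above can be sketched as a pair of first-order heat-balance equations. The following is a minimal illustrative simulation, not the model used in the paper: all coefficients, the setpoint, and the class and variable names are assumptions chosen only to make the qualitative behavior (fast internal heating, slow coupled external heating, heat loss to the environment) visible. A learning controller would replace the simple on/off baseline policy shown at the end.

```python
class TwoTankPlant:
    """Hypothetical two-tank thermal model (illustrative coefficients only)."""

    def __init__(self, t_env=20.0, dt=1.0):
        self.t_env = t_env   # environment temperature (deg C)
        self.dt = dt         # Euler integration step
        self.t_int = t_env   # internal-tank liquid temperature
        self.t_ext = t_env   # external-tank liquid temperature

    def step(self, power):
        """Advance one step; `power` in [0, 1] drives the heater."""
        # Heater warms the small internal tank directly; the pumped
        # liquid couples the internal tank to the external one.
        d_int = 0.5 * power - 0.01 * (self.t_int - self.t_ext)
        # External tank gains heat from the tube and loses heat
        # to the environment.
        d_ext = 0.002 * (self.t_int - self.t_ext) \
            - 0.001 * (self.t_ext - self.t_env)
        self.t_int += self.dt * d_int
        self.t_ext += self.dt * d_ext
        return self.t_ext

# Simple on/off (thermostat) baseline toward an assumed 40 deg C setpoint.
plant = TwoTankPlant()
setpoint = 40.0
for _ in range(5000):
    u = 1.0 if plant.t_ext < setpoint else 0.0
    plant.step(u)
```

Because the external temperature reacts only indirectly (via the internal tank) to heater commands, the on/off baseline overshoots the setpoint and oscillates, which illustrates why the delayed, plant-dependent dynamics make this a hard control problem.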
