Learning Sequential Decision Tasks

This paper presents SANE, a new approach for learning and performing sequential decision tasks. Compared to problem-general heuristics, SANE forms more effective decision strategies because it learns to use domain-specific information. SANE evolves neural networks with genetic algorithms and can learn in a wide range of domains with minimal reinforcement. SANE's evolution algorithm, called symbiotic evolution, is more powerful than standard genetic algorithms because diversity pressures are inherent in the evolution. SANE is shown to be effective in two sequential decision tasks. As a value-ordering method in constraint satisfaction search, SANE required only one-third of the backtracks of a problem-general heuristic. As a filter for minimax search, SANE formed a network capable of focusing the search away from misinformation, creating stronger play.
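The abstract names symbiotic evolution without spelling out its mechanics. The following is a minimal sketch, assuming the neuron-level formulation: each individual in the population is a single hidden neuron, networks are assembled from random subsets of neurons, and a neuron's fitness is the average score of the networks it took part in. The task in `evaluate_network`, every parameter name, and the selection and mutation settings are illustrative assumptions, not details taken from the text above.

```python
import random


def evaluate_network(neurons):
    """Toy stand-in for a real task (an assumption, not from the paper):
    score a network by how close its summed weights come to a target."""
    target = 1.0
    total = sum(sum(neuron) for neuron in neurons)
    return -abs(total - target)  # 0.0 is a perfect score


def symbiotic_evolution(pop_size=40, neuron_len=4, net_size=5,
                        trials=200, generations=30, seed=0):
    """Evolve a population of neurons; all parameters are hypothetical."""
    rng = random.Random(seed)
    # Each individual is one neuron, encoded as a list of weights.
    population = [[rng.uniform(-1, 1) for _ in range(neuron_len)]
                  for _ in range(pop_size)]
    best = float("-inf")

    for _ in range(generations):
        fitness_sum = [0.0] * pop_size
        counts = [0] * pop_size

        # Build many networks from random neuron subsets and credit
        # each participating neuron with the network's score.
        for _ in range(trials):
            members = rng.sample(range(pop_size), net_size)
            score = evaluate_network([population[i] for i in members])
            best = max(best, score)
            for i in members:
                fitness_sum[i] += score
                counts[i] += 1

        # A neuron's fitness is its average over the networks it joined;
        # neurons that were never sampled rank last.
        avg = [fitness_sum[i] / counts[i] if counts[i] else float("-inf")
               for i in range(pop_size)]
        ranked = sorted(range(pop_size), key=lambda i: avg[i], reverse=True)
        elite = [population[i] for i in ranked[:pop_size // 4]]

        # Refill the population with one-point crossover plus mutation.
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, neuron_len)
            child = a[:cut] + b[cut:]
            if rng.random() < 0.1:
                j = rng.randrange(neuron_len)
                child[j] += rng.gauss(0, 0.3)
            children.append(child)
        population = elite + children

    return population, best


if __name__ == "__main__":
    neurons, best_score = symbiotic_evolution()
    print(len(neurons), best_score)
```

Because a neuron is rewarded only through the networks it joins, no single neuron can solve the task alone, which is one plausible reading of the claim that diversity pressures are inherent in the evolution.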
