Sample Complexity of Policy Search with Known Dynamics

We consider methods that try to find a good policy for a Markov decision process by choosing one from a given class. The policy is chosen based on its empirical performance in simulations. We are interested in conditions on the complexity of the policy class that ensure the success of such simulation-based policy search methods. We show that under bounds on the amount of computation involved in computing policies, transition dynamics, and rewards, uniform convergence of empirical estimates to true value functions occurs. Previously, such results were derived by assuming boundedness of pseudodimension and Lipschitz continuity. Both those assumptions and ours are stronger than boundedness of the usual combinatorial complexity measures. We show, via minimax inequalities, that this strengthening is essential: boundedness of the pseudodimension or the fat-shattering dimension alone is not sufficient.
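To make the target guarantee concrete, it can be stated schematically as follows; the notation is illustrative rather than drawn from the abstract itself. Write $\Pi$ for the policy class, $V(\pi)$ for the true value of a policy $\pi$, and $\hat{V}_n(\pi)$ for its empirical value estimated from $n$ simulated trajectories. Uniform convergence then means that for any $\epsilon > 0$ and $\delta \in (0,1)$ there is a sample size $n(\epsilon, \delta)$ such that
\[
\Pr\!\left( \sup_{\pi \in \Pi} \bigl| \hat{V}_n(\pi) - V(\pi) \bigr| > \epsilon \right) \le \delta .
\]
In particular, on the event that the supremum is at most $\epsilon$, the policy maximizing the empirical estimate $\hat{V}_n$ has true value within $2\epsilon$ of the best value achievable in $\Pi$.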