On Evaluating Agent Performance in a Fixed Period of Time (Extended Version)
[1] H. Robbins,et al. On optimal stopping rules for $S_{n}/n$ , 1965 .
[2] Anton Schwartz,et al. A Reinforcement Learning Method for Maximizing Undiscounted Rewards , 1993, ICML.
[3] Martin L. Puterman,et al. Markov Decision Processes: Discrete Stochastic Dynamic Programming , 1994 .
[4] Dimitri P. Bertsekas,et al. Dynamic Programming and Optimal Control, Two Volume Set , 1995 .
[5] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res.
[6] David L. Dowe,et al. A computational extension to the Turing test , 1997 .
[7] Prasad Tadepalli,et al. Model-Based Average Reward Reinforcement Learning , 1998, Artif. Intell.
[8] David L. Dowe,et al. A Non-Behavioural, Computational Extension to the Turing Test , 1998 .
[9] José Hernández-Orallo,et al. Beyond the Turing Test , 2000, J. Log. Lang. Inf.
[10] J. Hernández-Orallo. Constructive reinforcement learning , 2000 .
[11] José Hernández-Orallo,et al. Thesis: Computational measures of information gain and reinforcement in inference processes , 2000, AI Commun.
[12] Manindra Agrawal,et al. PRIMES is in P , 2004 .
[13] Sridhar Mahadevan,et al. Average reward reinforcement learning: Foundations, algorithms, and empirical results , 2004, Machine Learning.
[14] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.
[15] Sridhar Mahadevan,et al. Average Reward Reinforcement Learning: Foundations, Algorithms, and Empirical Results , 2005, Machine Learning.
[16] Marcus Hutter. General Discounting Versus Average Reward , 2006, ALT.
[17] M. Tomasello,et al. Humans Have Evolved Specialized Skills of Social Cognition: The Cultural Intelligence Hypothesis , 2007, Science.
[18] Shane Legg,et al. Universal Intelligence: A Definition of Machine Intelligence , 2007, Minds and Machines.
[19] José Hernández-Orallo. A (hopefully) Unbiased Universal Environment Class for Measuring Intelligence of Biological and Artificial Systems , 2009, AGI 2010.
[20] Martin Herdegen. Optimal Stopping and Applications Example 2 : American options , 2009 .
[21] U. Rieder,et al. Markov Decision Processes , 2010 .
[22] José Hernández-Orallo,et al. Measuring universal intelligence: Towards an anytime intelligence test , 2010, Artif. Intell.
[23] José Hernández-Orallo,et al. On the Computational Measurement of Intelligence Factors , 2011 .