Detecting Rewards Deterioration in Episodic Reinforcement Learning
[1] Shie Mannor, et al. A Nonparametric Sequential Test for Online Randomized Experiments, 2016, WWW.
[2] Renato Paes Leme, et al. Bandits with adversarial scaling, 2020, ICML.
[3] Odalric-Ambrym Maillard, et al. Distribution-dependent and Time-uniform Bounds for Piecewise i.i.d. Bandits, 2019, ArXiv.
[4] Andreas Krause, et al. Multi-Player Bandits: The Adversarial Case, 2019, J. Mach. Learn. Res.
[5] Diane J. Cook, et al. A survey of methods for time series change point detection, 2017, Knowledge and Information Systems.
[6] S. Pocock. Group sequential methods in the design and analysis of clinical trials, 1977.
[7] R. Lund, et al. Changepoint Detection in Periodic and Autocorrelated Time Series, 2007.
[8] Convergence Theorem for Finite Markov Chains, 2017.
[9] G. Wahba, et al. Multivariate Bernoulli distribution, 2012, arXiv:1206.1874.
[10] Craig MacDonald, et al. Sequential Testing for Early Stopping of Online Experiments, 2015, SIGIR.
[11] Philip Bachman, et al. Deep Reinforcement Learning that Matters, 2017, AAAI.
[12] Javier García, et al. A comprehensive survey on safe reinforcement learning, 2015, J. Mach. Learn. Res.
[13] J. Westgard, et al. Combined Shewhart-cusum control chart for improved quality control in clinical chemistry, 1977, Clinical Chemistry.
[14] Gabriel Dulac-Arnold, et al. Challenges of Real-World Reinforcement Learning, 2019, ArXiv.
[15] Lantao Yu, et al. MOPO: Model-based Offline Policy Optimization, 2020, NeurIPS.
[16] H. Hotelling. The Generalization of Student's Ratio, 1931.
[17] Vitaly Levdik, et al. Time Limits in Reinforcement Learning, 2017, ICML.
[18] John F. Canny, et al. Measuring the Reliability of Reinforcement Learning Algorithms, 2019, ICLR.
[19] D. L. DeMets, et al. Interim analysis: the alpha spending function approach, 1994, Statistics in Medicine.
[20] James Bergstra, et al. Autoregressive Policies for Continuous Control Deep Reinforcement Learning, 2019, IJCAI.
[21] P. O'Brien, et al. A multiple testing procedure for clinical trials, 1979, Biometrics.
[22] Gregory Ditzler, et al. Learning in Nonstationary Environments: A Survey, 2015, IEEE Computational Intelligence Magazine.
[23] Shie Mannor, et al. Concept Drift Detection Through Resampling, 2014, ICML.
[24] Pierre-Yves Oudeyer, et al. A Hitchhiker's Guide to Statistical Comparisons of Reinforcement Learning Algorithms, 2019, RML@ICLR.
[25] D. A. Evans, et al. An approach to the probability distribution of cusum run length, 1972.
[26] Marion R. Reynolds, et al. Cusum Charts for Monitoring an Autocorrelated Process, 2001.
[27] Cristiano Cervellera, et al. QuantTree: Histograms for Change Detection in Multivariate Data Streams, 2018, ICML.
[28] N. L. Johnson, et al. Multivariate Analysis, 1958, Nature.
[29] Vikash Kumar, et al. Multi-Agent Manipulation via Locomotion using Hierarchical Sim2Real, 2019, CoRL.
[30] R. Bellman. A Markovian Decision Process, 1957.
[31] Omar Besbes, et al. Stochastic Multi-Armed-Bandit Problem with Non-stationary Rewards, 2014, NIPS.
[32] Ufuk Topcu, et al. Safe Reinforcement Learning via Shielding, 2017, AAAI.
[33] Sebastian Junges, et al. Safety-Constrained Reinforcement Learning for MDPs, 2015, TACAS.
[34] Eric Moulines, et al. On Upper-Confidence Bound Policies for Switching Bandit Problems, 2011, ALT.
[35] Emmanuel Yashchin. On the Analysis and Design of CUSUM-Shewhart Control Schemes, 1985, IBM J. Res. Dev.
[36] Gábor Orosz, et al. End-to-End Safe Reinforcement Learning through Barrier Functions for Safety-Critical Continuous Control Tasks, 2019, AAAI.
[37] S. M. Williams, et al. Quality control: an application of the cusum, 1992, BMJ.
[38] Anupam Gupta, et al. Better Algorithms for Stochastic Bandits with Adversarial Corruptions, 2019, COLT.
[39] K. Hong. Conditional Value at Risk (CoVaR), 2010.
[40] Lorenzo Strigini, et al. Assessing the Safety and Reliability of Autonomous Vehicles from Road Testing, 2019, IEEE International Symposium on Software Reliability Engineering (ISSRE).
[41] Yutaka Matsuo, et al. Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization, 2020, ICLR.
[42] W. Fuller, et al. Distribution of the Estimators for Autoregressive Time Series with a Unit Root, 1979.
[43] Alex Graves, et al. Asynchronous Methods for Deep Reinforcement Learning, 2016, ICML.
[44] Ofir Nachum, et al. A Lyapunov-based Approach to Safe Reinforcement Learning, 2018, NeurIPS.
[45] Tom Schaul, et al. Rainbow: Combining Improvements in Deep Reinforcement Learning, 2017, AAAI.
[46] Shiyu Zhou, et al. Cycle-based signal monitoring using a directionally variant multivariate control chart system, 2005.
[47] B. Efron. Second Thoughts on the Bootstrap, 2003, Statistical Science.
[48] E. S. Pearson, et al. On the Problem of the Most Efficient Tests of Statistical Hypotheses, 1933.
[49] Heinz Koeppl, et al. Correlation Priors for Reinforcement Learning, 2019, NeurIPS.
[50] Erwan Lecarpentier, et al. Non-Stationary Markov Decision Processes: a Worst-Case Approach using Model-Based Reinforcement Learning, 2019, NeurIPS.
[51] S. S. Wilks. The Large-Sample Distribution of the Likelihood Ratio for Testing Composite Hypotheses, 1938.
[52] Jonathan P. How, et al. Quickest change detection approach to optimal control in Markov decision processes with model changes, 2017, American Control Conference (ACC).
[53] Laurenz Wiskott. Lecture Notes on Reinforcement Learning, 2018.
[54] Yuval Tassa, et al. MuJoCo: A physics engine for model-based control, 2012, IEEE/RSJ International Conference on Intelligent Robots and Systems.
[55] Ludmila I. Kuncheva, et al. Change Detection in Streaming Multivariate Data Using Likelihood Detectors, 2013, IEEE Transactions on Knowledge and Data Engineering.
[56] M. Mohri, et al. Bandit Problems, 2006.
[57] Minitab. Statistical Methods for Quality Improvement, 2001.
[58] Pablo Hernandez-Leal, et al. A Survey of Learning in Multiagent Environments: Dealing with Non-Stationarity, 2017, ArXiv.
[59] R. Rockafellar, et al. Optimization of conditional value-at-risk, 2000.
[60] Dirk P. Kroese, et al. Why the Monte Carlo method is so important today, 2014.
[61] E. S. Page. Continuous Inspection Schemes, 1954.
[62] J. M. Robins, et al. Marginal Mean Models for Dynamic Regimes, 2001, Journal of the American Statistical Association.
[63] R. Khan, et al. Sequential Tests of Statistical Hypotheses, 1972.
[64] Lihong Li, et al. Adversarial Attacks on Stochastic Bandits, 2018, NeurIPS.