Policy-Gradient Algorithms for Partially Observable Markov Decision Processes
[1] Ronald A. Howard,et al. Dynamic Programming and Markov Processes , 1960 .
[2] R. Bellman,et al. Polynomial approximation—a new computational technique in dynamic programming: Allocation processes , 1962 .
[3] Andrew J. Viterbi,et al. Error bounds for convolutional codes and an asymptotically optimum decoding algorithm , 1967, IEEE Trans. Inf. Theory.
[4] V. Strassen. Gaussian elimination is not optimal , 1969 .
[5] Stephen M. Pollock,et al. A Simple Model of Search for a Moving Target , 1970, Oper. Res..
[6] Edward J. Sondik,et al. Toward an Integrated Methodology for the Analysis of Health-Care Systems , 1971, Oper. Res..
[7] E. J. Sondik,et al. The Optimal Control of Partially Observable Markov Decision Processes. , 1971 .
[8] Edward J. Sondik,et al. The Optimal Control of Partially Observable Markov Processes over a Finite Horizon , 1973, Oper. Res..
[9] R. Bakis. Continuous speech recognition via centisecond acoustic states , 1976 .
[10] D. Rubin,et al. Maximum likelihood from incomplete data via the EM algorithm (with discussion) , 1977 .
[11] Harold J. Kushner,et al. Stochastic approximation methods for constrained and unconstrained systems , 1978 .
[12] Edward J. Sondik,et al. The Optimal Control of Partially Observable Markov Processes over the Infinite Horizon: Discounted Costs , 1978, Oper. Res..
[13] J. Douglas Faires,et al. Numerical Analysis , 1981 .
[14] Nils J. Nilsson,et al. Principles of Artificial Intelligence , 1980, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[15] John G. Proakis,et al. Digital Communications , 1983 .
[16] Peter W. Glynn,et al. Stochastic approximation for Monte Carlo optimization , 1986, WSC '86.
[17] Geoffrey E. Hinton,et al. Learning internal representations by error propagation , 1986 .
[18] Alan Weiss,et al. Sensitivity analysis via likelihood ratios , 1986, WSC '86.
[19] John N. Tsitsiklis,et al. The Complexity of Markov Decision Processes , 1987, Math. Oper. Res..
[20] A. Poritz,et al. Hidden Markov models: a guided tour , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.
[21] Raj Reddy,et al. Large-vocabulary speaker-independent continuous speech recognition: the sphinx system , 1988 .
[22] Richard Lippmann,et al. Review of Neural Networks for Speech Recognition , 1989, Neural Computation.
[23] A. Nadas,et al. A generalization of the Baum algorithm to rational objective functions , 1989, International Conference on Acoustics, Speech, and Signal Processing.
[24] Lawrence R. Rabiner,et al. A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.
[25] Keiji Kanazawa,et al. A model for reasoning about persistence and causation , 1989 .
[26] Ronald J. Williams,et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks , 1989, Neural Computation.
[27] Alexander H. Waibel,et al. Modular Construction of Time-Delay Neural Networks for Speech Recognition , 1989, Neural Computation.
[28] Alan Weiss,et al. Sensitivity Analysis for Simulations via Likelihood Ratios , 1989, Oper. Res..
[29] Harvey F. Silverman,et al. Combining hidden Markov model and neural network classifiers , 1990, International Conference on Acoustics, Speech, and Signal Processing.
[30] Douglas B. Paul,et al. Speech Recognition Using Hidden Markov Models , 1990 .
[31] Jürgen Schmidhuber,et al. Reinforcement Learning in Markovian and Non-Markovian Environments , 1990, NIPS.
[32] S. Young. Competitive training in hidden Markov models , 1990 .
[33] Yariv Ephraim,et al. Estimation of hidden Markov model parameters by minimizing empirical error rate , 1990, International Conference on Acoustics, Speech, and Signal Processing.
[34] Gerald Tesauro,et al. Neurogammon: a neural-network backgammon program , 1990, 1990 IJCNN International Joint Conference on Neural Networks.
[35] Peter W. Glynn,et al. Likelihood ratio gradient estimation for stochastic systems , 1990, CACM.
[36] John S. Bridle,et al. Alpha-nets: A recurrent 'neural' network architecture with a hidden Markov model interpretation , 1990, Speech Commun..
[37] Berndt Müller,et al. Neural networks: an introduction , 1990 .
[38] D. Van Compernolle,et al. TDNN labeling for a HMM recognizer , 1990, International Conference on Acoustics, Speech, and Signal Processing.
[39] Alex Waibel,et al. Connectionist speaker normalization and its applications to speech recognition , 1991, Neural Networks for Signal Processing Proceedings of the 1991 IEEE Workshop.
[40] Richard Lippmann,et al. Neural Network Classifiers Estimate Bayesian a posteriori Probabilities , 1991, Neural Computation.
[41] W. Lovejoy. A survey of algorithmic methods for partially observed Markov decision processes , 1991 .
[42] J. S. Bridle,et al. An Alphanet approach to optimising input transformations for continuous speech recognition , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.
[43] Régis Cardin,et al. Developments in High-Performance Connected Digit Recognition , 1992 .
[44] Jürgen Schmidhuber,et al. Learning Complex, Extended Sequences Using the Principle of History Compression , 1992, Neural Computation.
[45] Yves Normandin,et al. Hidden Markov models, maximum mutual information estimation, and the speech recognition problem , 1992 .
[46] Long Lin,et al. Memory Approaches to Reinforcement Learning in Non-Markovian Domains , 1992 .
[47] Yoshua Bengio,et al. Global optimization of a neural network-hidden Markov model hybrid , 1992, IEEE Trans. Neural Networks.
[48] Dana Ron,et al. The Power of Amnesia , 1993, NIPS.
[49] Mei-Yuh Hwang,et al. Shared-distribution hidden Markov models for speech recognition , 1993, IEEE Trans. Speech Audio Process..
[50] Hervé Bourlard,et al. Connectionist Speech Recognition: A Hybrid Approach , 1993 .
[51] J. Bruce Millar,et al. Two schemes of phonetic feature extraction using artificial neural networks , 1993, EUROSPEECH.
[52] Jonathan G. Fiscus,et al. Darpa Timit Acoustic-Phonetic Continuous Speech Corpus CD-ROM {TIMIT} | NIST , 1993 .
[53] R. I. Bahar,et al. Algebraic decision diagrams and their applications , 1993, Proceedings of 1993 International Conference on Computer Aided Design (ICCAD).
[54] Qiang Huo,et al. The gradient projection method for the training of hidden Markov models , 1993, Speech Commun..
[55] S. Haykin,et al. Neural Networks: A Comprehensive Foundation , 1994 .
[56] Nelson Morgan. Big dumb neural nets: a working brute force approach to speech recognition , 1994, Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN'94).
[57] Daniel S. Weld,et al. A Probablistic Model of Action for Least-Commitment Planning with Information Gathering , 1994, UAI.
[58] Yoshua Bengio,et al. An Input Output HMM Architecture , 1994, NIPS.
[59] Gerald Tesauro,et al. TD-Gammon, a Self-Teaching Backgammon Program, Achieves Master-Level Play , 1994, Neural Computation.
[60] Leslie Pack Kaelbling,et al. Acting Optimally in Partially Observable Stochastic Domains , 1994, AAAI.
[61] Daw-Tung Lin,et al. The Adaptive Time-Delay Neural Network: Characterization and Applications to Pattern Recognition, Prediction and Signal Processing , 1994 .
[62] Michael I. Jordan,et al. Reinforcement Learning Algorithm for Partially Observable Markov Decision Problems , 1994, NIPS.
[63] Michael I. Jordan,et al. Learning Without State-Estimation in Partially Observable Markovian Decision Processes , 1994, ICML.
[64] T. R. Anderson,et al. Auditory models with Kohonen SOFM and LVQ for speaker independent phoneme recognition , 1994, Proceedings of 1994 IEEE International Conference on Neural Networks (ICNN'94).
[65] Anthony J. Robinson,et al. An application of recurrent nets to phone probability estimation , 1994, IEEE Trans. Neural Networks.
[66] Michael I. Jordan,et al. Reinforcement Learning with Soft State Aggregation , 1994, NIPS.
[67] Tom Michael Mitchell. Learning Analytically and Inductively , 1995 .
[68] Günther Ruske,et al. Discriminative training for continuous speech recognition , 1995, EUROSPEECH.
[69] P. Glynn,et al. Likelihood ratio gradient estimation for stochastic recursions , 1995, Advances in Applied Probability.
[70] Reid G. Simmons,et al. Probabilistic Robot Navigation in Partially Observable Environments , 1995, IJCAI.
[71] Stuart J. Russell,et al. Approximating Optimal Policies for Partially Observable Stochastic Domains , 1995, IJCAI.
[72] Lai-Wan Chan,et al. An RNN based speech recognition system with discriminative training , 1995, EUROSPEECH.
[73] Leslie Pack Kaelbling,et al. Learning Policies for Partially Observable Environments: Scaling Up , 1997, ICML.
[74] Nevin L. Zhang. Efficient planning in stochastic domains through exploiting problem characteristics , 1995 .
[75] Illah R. Nourbakhsh,et al. DERVISH - An Office-Navigating Robot , 1995, AI Mag..
[76] Yochai Konig,et al. REMAP: recursive estimation and maximization of a posteriori probabilities in transition-based speech recognition , 1996 .
[77] Andrew W. Moore,et al. Reinforcement Learning: A Survey , 1996, J. Artif. Intell. Res..
[78] Yoshua Bengio,et al. Input-output HMMs for sequence processing , 1996, IEEE Trans. Neural Networks.
[79] Wenju Liu,et al. Planning in Stochastic Domains: Problem Characteristics and Approximation , 1996 .
[80] Andrew McCallum,et al. Reinforcement learning with selective perception and hidden state , 1996 .
[81] Mike Schuster,et al. Bi-directional recurrent neural networks for speech recognition , 1996 .
[82] Craig Boutilier,et al. Computing Optimal Policies for Partially Observable Decision Processes Using Compact Representations , 1996, AAAI/IAAI, Vol. 2.
[83] Mei-Yuh Hwang,et al. Speech recognition using hidden Markov models: A CMU perspective , 1990, Speech Communication.
[84] G. Casella,et al. Rao-Blackwellisation of sampling schemes , 1996 .
[85] Rafal Salustowicz,et al. Probabilistic Incremental Program Evolution , 1997 .
[86] B. Greer,et al. High Performance Software on Intel Pentium Pro Processors or Micro-Ops to TeraFLOPS , 1997, ACM/IEEE SC 1997 Conference (SC'97).
[87] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[88] Richard Washington,et al. BI-POMDP: Bounded, Incremental, Partially-Observable Markov-Model Planning , 1997, ECP.
[89] Shigenobu Kobayashi,et al. Reinforcement Learning in POMDPs with Function Approximation , 1997, ICML.
[90] Rafal Salustowicz,et al. Probabilistic Incremental Program Evolution , 1997, Evolutionary Computation.
[91] Milos Hauskrecht,et al. Incremental Methods for Computing Bounds in Partially Observable Markov Decision Processes , 1997, AAAI/IAAI.
[92] Richard M. Stern,et al. The 1997 CMU Sphinx-3 English Broadcast News Transcription System , 1997 .
[93] Mikko Kurimo,et al. Training mixture density HMMs with SOM and LVQ , 1997, Comput. Speech Lang..
[94] Ronen I. Brafman,et al. A Heuristic Variable Grid Solution Method for POMDPs , 1997, AAAI/IAAI.
[95] Louis C. W. Pols,et al. Psycho-acoustics and Speech Perception , 1997 .
[96] Sarel van Vuuren,et al. Improved neural network training of inter-word context units for connected digit recognition , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).
[97] Jack J. Dongarra,et al. Automatically Tuned Linear Algebra Software , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[98] John Loch,et al. Using Eligibility Traces to Find the Best Memoryless Policy in Partially Observable Markov Decision Processes , 1998, ICML.
[99] Shigenobu Kobayashi,et al. Reinforcement learning for continuous action using stochastic gradient ascent , 1998 .
[100] Akira Hayashi,et al. A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory , 1998, NIPS.
[101] Mark D. Pendrith,et al. An Analysis of Direct Reinforcement Learning in Non-Markovian Domains , 1998, ICML.
[102] Xavier Boyen,et al. Tractable Inference for Complex Stochastic Processes , 1998, UAI.
[103] Eric A. Hansen,et al. Solving POMDPs by Searching in Policy Space , 1998, UAI.
[104] Mithuna Thottethodi,et al. Tuning Strassen's Matrix Multiplication for Memory Efficiency , 1998, Proceedings of the IEEE/ACM SC98 Conference.
[105] Andrew W. Moore,et al. Gradient Descent for General Reinforcement Learning , 1998, NIPS.
[106] Li Deng,et al. A dynamic, feature-based approach to the interface between phonology and phonetics for speech modeling and recognition , 1998, Speech Commun..
[107] Satinder P. Singh,et al. Experimental Results on Learning Stochastic Memoryless Policies for Partially Observable Markov Decision Processes , 1998, NIPS.
[108] David Haussler,et al. Exploiting Generative Models in Discriminative Classifiers , 1998, NIPS.
[109] Ilse C. F. Ipsen,et al. The Idea Behind Krylov Methods , 1998 .
[110] Balaraman Ravindran,et al. Improved Switching among Temporally Abstract Actions , 1998, NIPS.
[111] Anne Condon,et al. On the Undecidability of Probabilistic Planning and Infinite-Horizon Partially Observable Markov Decision Problems , 1999, AAAI/IAAI.
[112] Kee-Eung Kim,et al. Solving POMDPs by Searching the Space of Finite Policies , 1999, UAI.
[113] Brian Sallans,et al. Learning Factored Representations for Partially Observable Markov Decision Processes , 1999, NIPS.
[114] Leslie Pack Kaelbling,et al. Learning Policies with External Memory , 1999, ICML.
[115] David A. McAllester,et al. Approximate Planning for Factored POMDPs using Belief State Simplification , 1999, UAI.
[116] H. Ney. The Use of the Maximum Likelihood Criterion in Language Modelling , 1999 .
[117] Terrence L. Fine,et al. Feedforward Neural Network Methodology , 1999, Information Science and Statistics.
[118] Peter L. Bartlett,et al. Neural Network Learning: Theoretical Foundations , 1999 .
[119] John J. Grefenstette,et al. Evolutionary Algorithms for Reinforcement Learning , 1999, J. Artif. Intell. Res..
[120] Jesse Hoey,et al. SPUDD: Stochastic Planning using Decision Diagrams , 1999, UAI.
[121] Yishay Mansour,et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation , 1999, NIPS.
[122] Daphne Koller,et al. Reinforcement Learning Using Approximate Belief States , 1999, NIPS.
[123] Enrico Gobbetti,et al. Encyclopedia of Electrical and Electronics Engineering , 1999 .
[124] Jean-Paul Haton,et al. Connectionist and Hybrid Models for Automatic Speech Recognition , 1999 .
[125] Kee-Eung Kim,et al. Learning Finite-State Controllers for Partially Observable Environments , 1999, UAI.
[126] Mike Schuster,et al. On supervised learning from sequential data with applications for speech recognition , 1999 .
[127] Sebastian Thrun,et al. Monte Carlo POMDPs , 1999, NIPS.
[128] Richard S. Sutton,et al. Open Theoretical Questions in Reinforcement Learning , 1999, EuroCOLT.
[129] Thomas G. Dietterich. An Overview of MAXQ Hierarchical Reinforcement Learning , 2000, SARA.
[130] Craig Boutilier,et al. Value-Directed Belief State Approximation for POMDPs , 2000, UAI.
[131] Daphne Koller,et al. Policy Iteration for Factored MDPs , 2000, UAI.
[132] Douglas Aberdeen,et al. 92¢/MFlops/s, Ultra-Large-Scale Neural-Network Training on a PIII Cluster , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[133] Milos Hauskrecht,et al. Value-Function Approximations for Partially Observable Markov Decision Processes , 2000, J. Artif. Intell. Res..
[134] Raymond P. LeBeau,et al. High-Cost CFD on a Low-Cost Cluster , 2000, ACM/IEEE SC 2000 Conference (SC'00).
[135] Geoffrey J. Gordon. Reinforcement Learning with Function Approximation Converges to a Region , 2000, NIPS.
[136] Thomas G. Dietterich,et al. A POMDP Approximation Algorithm That Anticipates the Need to Observe , 2000, PRICAI.
[137] Sridhar Mahadevan,et al. Hierarchical Memory-Based Reinforcement Learning , 2000, NIPS.
[138] Doina Precup,et al. Temporal abstraction in reinforcement learning , 2000, ICML 2000.
[139] Leslie Pack Kaelbling,et al. Adaptive Importance Sampling for Estimation in Structured Domains , 2000, UAI.
[140] Peter L. Bartlett,et al. Reinforcement Learning in POMDP's via Direct Gradient Ascent , 2000, ICML.
[141] Judy Goldsmith,et al. Nonapproximability Results for Partially Observable Markov Decision Processes , 2011, Universität Trier, Mathematik/Informatik, Forschungsbericht.
[142] Andrew McCallum,et al. Maximum Entropy Markov Models for Information Extraction and Segmentation , 2000, ICML.
[143] Alain Dutech,et al. Solving POMDPs Using Selected Past Events , 2000, ECAI.
[144] Michael I. Jordan,et al. PEGASUS: A policy search method for large MDPs and POMDPs , 2000, UAI.
[145] Kee-Eung Kim,et al. Approximate Solutions to Factored Markov Decision Processes via Greedy Search in the Space of Finite State Controllers , 2000, AIPS.
[146] P. Lanzi,et al. Adaptive Agents with Reinforcement Learning and Internal Memory , 2000 .
[147] J. Tsitsiklis,et al. Gradient-Based Optimization of Markov Reward Processes: Practical Variants , 2000 .
[148] Zhengzhu Feng,et al. Dynamic Programming for POMDPs Using a Factored State Representation , 2000, AIPS.
[149] Katia P. Sycara,et al. Evolutionary Search, Stochastic Policies with Memory, and Reinforcement Learning with Hidden State , 2001, ICML.
[150] Peter L. Bartlett,et al. Infinite-Horizon Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[151] Weihong Zhang,et al. Speeding Up the Convergence of Value Iteration in Partially Observable Markov Decision Processes , 2001, J. Artif. Intell. Res..
[152] Shie Mannor,et al. Learning Embedded Maps of Markov Processes , 2001, ICML.
[153] Sebastian Thrun,et al. Integrating value functions and policy search for continuous Markov Decision Processes , 2001, NIPS 2001.
[154] Craig Boutilier,et al. Value-directed sampling methods for monitoring POMDPs , 2001, UAI 2001.
[155] Nicolas Meuleau,et al. Exploration in Gradient-Based Reinforcement Learning , 2001 .
[156] Craig Boutilier,et al. Vector-space Analysis of Belief-state Approximation for POMDPs , 2001, UAI.
[157] Ronald E. Parr,et al. Solving Factored POMDPs with Linear Value Functions , 2001 .
[158] Lex Weaver,et al. The Optimal Reward Baseline for Gradient-Based Reinforcement Learning , 2001, UAI.
[159] Jürgen Schmidhuber,et al. Market-Based Reinforcement Learning in Partially Observable Worlds , 2001, ICANN.
[160] Andrew Tridgell,et al. Reinforcement learning and chess , 2001 .
[161] Lex Weaver,et al. A Multi-Agent Policy-Gradient Approach to Network Routing , 2001, ICML.
[162] Olivier Buffet,et al. Multi-Agent Systems by Incremental Gradient Reinforcement Learning , 2001, IJCAI.
[163] Carlos Guestrin,et al. Max-norm Projections for Factored MDPs , 2001, IJCAI.
[164] Andrew W. Moore,et al. Direct Policy Search using Paired Statistical Tests , 2001, ICML.
[165] Yuefan Deng,et al. New trends in high performance computing , 2001, Parallel Computing.
[166] P. N. Paraskevopoulos,et al. Modern Control Engineering , 2001 .
[167] Sridhar Mahadevan,et al. Continuous-Time Hierarchical Reinforcement Learning , 2001, ICML.
[168] Peter L. Bartlett,et al. Experiments with Infinite-Horizon, Policy-Gradient Estimation , 2001, J. Artif. Intell. Res..
[169] Jack J. Dongarra,et al. Automated empirical optimizations of software and the ATLAS project , 2001, Parallel Comput..
[170] Douglas Aberdeen,et al. Emmerald: a fast matrix–matrix multiply using Intel's SSE instructions , 2001, Concurr. Comput. Pract. Exp..
[171] Christian R. Shelton,et al. Policy Improvement for POMDPs Using Normalized Importance Sampling , 2001, UAI.
[172] Christian R. Shelton,et al. Importance sampling for reinforcement learning with multiple objectives , 2001 .
[173] Andrew McCallum,et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data , 2001, ICML.
[174] Douglas Aberdeen,et al. Scalable Internal-State Policy-Gradient Methods for POMDPs , 2002, ICML.
[175] John K. Slaney,et al. Anytime State-Based Solution Methods for Decision Processes with non-Markovian Rewards , 2002, UAI.
[176] Gerald DeJong,et al. Reinforcement Learning and Shaping: Encouraging Intended Behaviors , 2002, ICML.
[177] Peter L. Bartlett,et al. Estimation and Approximation Bounds for Gradient-Based Reinforcement Learning , 2000, J. Comput. Syst. Sci..
[178] Leonid Peshkin,et al. Learning from Scarce Experience , 2002, ICML.
[179] Peter L. Bartlett,et al. Model Selection and Error Estimation , 2000, Machine Learning.
[180] Lawrence K. Saul,et al. Markov Processes on Curves , 2000, Machine Learning.
[181] Sridhar Mahadevan,et al. Hierarchical Multiagent Reinforcement Learning , 2004 .
[182] Ronald J. Williams,et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.
[183] Yishay Mansour,et al. A Sparse Sampling Algorithm for Near-Optimal Planning in Large Markov Decision Processes , 1999, Machine Learning.
[184] Richard S. Sutton,et al. Reinforcement Learning: An Introduction , 1998, IEEE Trans. Neural Networks.