Optimal learning: computational procedures for Bayes-adaptive Markov decision processes