Reinforcement learning with hidden states