Infinite-Mixture Policies in Reinforcement Learning