On the Theory of Reinforcement Learning with Once-per-Episode Feedback