Learning Adaptive Control Flow in Transformers for Improved Systematic Generalization