Confidence Based Dual Reinforcement Q-Routing: An adaptive online network routing algorithm

This paper describes and evaluates the Confidence-based Dual Reinforcement Q-Routing algorithm (CDRQ-Routing) for adaptive packet routing in communication networks. CDRQ-Routing is based on the Q-learning framework of Q-Routing. The main contribution of this work is the increased quantity and improved quality of exploration in CDRQ-Routing. These lead to faster adaptation and better learned routing policies than Q-Routing, the state-of-the-art adaptive Bellman-Ford Routing, and non-adaptive shortest-path routing. Experiments over several network topologies show that, at different loads, CDRQ-Routing learns superior policies significantly faster than Q-Routing. Moreover, CDRQ-Routing learns policies that sustain higher load levels than Q-Routing. Analysis shows that the overhead due to exploration is insignificant compared to the improvements in CDRQ-Routing.
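The Q-learning framework mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation. Each node keeps a table of estimated delivery times `Q[dest][neighbor]`. When a packet hops from `x` to `y`, `x` moves its estimate toward "time queued at x + transmission time + y's best remaining estimate". This is the standard Q-Routing update. Several parts are assumptions for illustration: that "dual" refers to a second, backward update in which `y` learns about reaching the packet's source through `x` using an estimate carried by the packet; which queue time each update uses; and the confidence-based learning rate `max(c_est, 1 - c_old)`. The class `QRoutingNode` and the functions `hop` and `confidence_rate` are hypothetical names.

```python
from collections import defaultdict


class QRoutingNode:
    """Hypothetical node holding a Q-table: q[dest][neighbor] = estimated delivery time."""

    def __init__(self, name, neighbors, eta=0.5, init=0.0):
        self.name = name
        self.neighbors = list(neighbors)
        self.eta = eta  # default learning rate (assumed value)
        # Unseen destinations start with every neighbor estimated at `init`.
        self.q = defaultdict(lambda: {n: init for n in self.neighbors})

    def best_estimate(self, dest):
        """min over neighbors z of Q(dest, z); zero if this node is the destination."""
        if dest == self.name:
            return 0.0
        return min(self.q[dest].values())

    def next_hop(self, dest):
        """Greedy policy: the neighbor with the lowest estimated delivery time."""
        return min(self.q[dest], key=self.q[dest].get)

    def update(self, dest, neighbor, queue_time, transmit_time, remaining, eta=None):
        """Move Q(dest, neighbor) toward queue_time + transmit_time + remaining."""
        rate = self.eta if eta is None else eta
        old = self.q[dest][neighbor]
        target = queue_time + transmit_time + remaining
        self.q[dest][neighbor] = old + rate * (target - old)
        return self.q[dest][neighbor]


def confidence_rate(c_old, c_est):
    """ASSUMED confidence-based learning rate: large when the incoming
    estimate is trusted (high c_est) or the stored one is not (low c_old)."""
    return max(c_est, 1.0 - c_old)


def hop(x, y, source, dest, queue_x, queue_y, transmit):
    """One packet hop x -> y with a forward and a backward update (assumed form)."""
    # Forward exploration: x refines its estimate of reaching `dest` via y.
    fwd = x.update(dest, y.name, queue_x, transmit, y.best_estimate(dest))
    # Backward exploration (assumption): y refines its estimate of reaching
    # `source` via x, using x's best estimate that travels with the packet.
    bwd = y.update(source, x.name, queue_y, transmit, x.best_estimate(source))
    return fwd, bwd
```

For example, with `x = QRoutingNode("x", ["y", "w"])`, `y = QRoutingNode("y", ["x", "d"])`, `y.q["d"]["d"] = 2.0` and `x.q["s"]["w"] = 3.0`, the call `hop(x, y, "s", "d", queue_x=1.0, queue_y=0.5, transmit=1.0)` returns `(2.0, 2.25)`. To sketch the confidence-based variant, pass `eta=confidence_rate(c_old, c_est)` to `update` in place of the fixed rate. Here `c_old` and `c_est` are hypothetical confidence values in `[0, 1]` that would be stored alongside each Q-value.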