GQ(λ): A general gradient algorithm for temporal-difference prediction learning with eligibility traces