Benchmarking of Policy Gradient Methods