Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution