On the randori training dynamics