Layerwise Bregman Representation Learning with Applications to Knowledge Distillation