Learning algorithms and probability distributions in feed-forward and feed-back networks

Learning algorithms have been used both on feed-forward deterministic networks and on feed-back statistical networks to capture input-output relations and do pattern classification. These learning algorithms are examined for a class of problems characterized by noisy or statistical data, in which the networks learn the relation between input data and probability distributions of answers. In simple but nontrivial networks the two learning rules are closely related. Under some circumstances the learning problem for the statistical networks can be solved without Monte Carlo procedures. The usual arbitrary learning goals of feed-forward networks can be given useful probabilistic meaning.

Learning algorithms enable model "neural networks" to acquire capabilities in tasks such as pattern recognition or continuous input-output control. Feed-forward networks of analog units having sigmoid input-output response have been studied extensively (1-4). These networks are multilayer perceptrons with the two-state threshold units of the original perceptron (5-8) replaced by analog units having a sigmoid response. Another kind of network (9, 10) is based on symmetrical connections, an energy function (11), two-state units, and a random process to generate a statistical equilibrium probability of being in various states. Its connection to the physics of a coupled set of two-level units in equilibrium with a thermal bath (like a magnetic system of Ising spins with arbitrary exchange) led it to be termed a Boltzmann network.

These networks appear rather different. One is deterministic, the other statistical; one is discrete, the other continuous; one has a one-way flow of information (feed-forward) in operation, the other a two-way flow of information (symmetrical connections).
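Before turning to the learning algorithms, the two architectures can be sketched in a few lines. The following Python is a minimal illustration, not the paper's construction; the function names, the 0/1 state convention, and the use of a logistic sigmoid are assumptions made for concreteness.

```python
import math
import random

def sigmoid(u):
    """Smooth analog response replacing the perceptron's hard threshold."""
    return 1.0 / (1.0 + math.exp(-u))

def feed_forward_layer(inputs, weights, biases):
    """Deterministic one-way pass: each analog unit applies a sigmoid
    to the weighted sum of its inputs plus a bias."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def boltzmann_sweep(states, weights, T, rng=random.random):
    """One Gibbs-sampling sweep over two-state (0/1) units with symmetric
    weights.  Repeated sweeps approach the Boltzmann equilibrium
    distribution exp(-E/T) for the energy E = -1/2 sum_ij w_ij s_i s_j."""
    n = len(states)
    for i in range(n):
        # Field on unit i from the other units (symmetric w, w[i][i] = 0);
        # turning unit i on lowers the energy by this amount.
        field = sum(weights[i][j] * states[j] for j in range(n) if j != i)
        p_on = sigmoid(field / T)  # equilibrium probability that s_i = 1
        states[i] = 1 if rng() < p_on else 0
    return states
```

Note that the same sigmoid appears in both: deterministically as the analog unit's response, and statistically as the probability that a two-state unit is on at temperature T, which is one face of the close relation developed below.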
The learning algorithms therefore appear quite different, so much so that comparisons of the computational effort needed to learn a given task for these two kinds of networks have sometimes been made. This paper shows that variants of each of these two classes of networks, adapted to emphasize the meaning of the actual procedure employed, often have very closely related learning algorithms and properties. This view finds useful meaning, in terms of probabilities, for a parameter that has appeared arbitrary in analog perceptron learning algorithms. Some three-layer statistical networks can be solved by gradient descent without the necessity of statistical averaging on a computer.

Task

Consider a set of instances a of a problem. For each instance, input data consist of analog input values I_k (k = 1,