Why the logistic function? A tutorial discussion on probabilities and neural networks

This paper presents a tutorial introduction to the logistic function as a statistical object. Beyond the discussion of the whys and wherefores of the logistic function, I also hope to illuminate the general distinction between the \generative/causal/class-conditional" and the \discriminative/diagnostic/ predictive" directions for the modeling of data. Crudely put, the belief network community has tended to focus on the former while the neural network community has tended to focus on the latter (although there are numerous papers in both communities going against their respective grains). It is the author's view that these two directions are two sides of the same coin, a corollary of which is that the two network-based communities are in closer contact than one might otherwise think. To illustrate some of the issues involved, I discuss the simplest nonlinear neural network|a logistic function of a linear combination of the input variables (also known in statistics as a logistic regression). The logistic function has had a lengthy history in classical statistics and in neural networks. In statistics it plays a leading role in the methodology of logistic regression, where it makes an important contribution to the literature on classi cation. The logistic function has also appeared in many guises in neural network research. In early work, in which continuous time formalisms tended to dominate, it was justi ed via its being the solution to a particular di erential equation. In later work, with the emphasis on discrete time, it was generally used more heuristically as one of the many possible smooth, monotonic \squashing" functions that map real values into a bounded interval. More recently, however, with the increasing focus on learning, the probabilistic properties of the logistic function have begun to