The Pattern Classification Problem

The Learning Problem

Introduction

In this section we describe the basic model of learning we use in this part of the book. This model is applicable to neural networks with one output unit that computes either the value 0 or 1; that is, it concerns the types of neural network used for binary classification problems. Later in the book we develop more general models of learning applicable to many other types of neural network, such as those with a real-valued output.

The definition of learning we use is formally described using the language of probability theory. For the moment, however, we move towards the definition in a fairly non-technical manner, providing some informal motivation for the technical definitions that will follow.

In very general terms, in a supervised learning environment, neural network ‘learning’ is the adjustment of the network's state in response to data generated by the environment. We assume that this data is generated by some random mechanism, an assumption that is reasonable for many applications. The method by which the state of the network is adjusted in response to the data constitutes a learning algorithm. That is, a learning algorithm describes how to change the state in response to training data. We assume that the ‘learner’ knows little about the process generating the data. This is a reasonable assumption for many applications of neural networks: if it is known that the data is generated according to a particular type of statistical process, then in practice it might be better to take advantage of this information by using a more restricted class of functions rather than a neural network.
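To make the informal picture concrete, here is a minimal sketch of the three ingredients just described: a network with a single output unit computing 0 or 1, an environment that generates labelled data by a random mechanism, and a learning algorithm that maps training data to a network state. Everything here is an illustrative assumption rather than the model defined later: the network is a single linear threshold unit whose state is a weight vector and a threshold, the environment draws inputs uniformly from [-1, 1]^2 and labels them with a hypothetical target function, and the learning algorithm is the classical perceptron rule, swapped in only as one example of "how to change the state in response to training data". The names `output`, `sample_environment` and `perceptron_learner` are hypothetical.

```python
import random
from typing import Callable, List, Tuple

Example = Tuple[Tuple[float, float], int]  # (input x, label y in {0, 1})
State = Tuple[List[float], float]          # (weights w, threshold theta)


def output(state: State, x: Tuple[float, float]) -> int:
    """The single output unit: returns 1 if w.x >= theta, else 0."""
    w, theta = state
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0


def sample_environment(n: int, target: Callable[[Tuple[float, float]], int],
                       rng: random.Random) -> List[Example]:
    """Hypothetical random mechanism: inputs uniform on [-1, 1]^2, labelled by `target`.

    The learner never sees `target` or the input distribution, only the sample.
    """
    data = []
    for _ in range(n):
        x = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        data.append((x, target(x)))
    return data


def perceptron_learner(data: List[Example], epochs: int = 100) -> State:
    """One possible learning algorithm (perceptron rule): training data -> network state."""
    w, theta = [0.0, 0.0], 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            err = y - output((w, theta), x)  # -1, 0 or +1
            if err != 0:
                mistakes += 1
                # Move the weights towards (or away from) x and shift the threshold.
                w = [wi + err * xi for wi, xi in zip(w, x)]
                theta -= err
        if mistakes == 0:  # every training example is classified correctly
            break
    return w, theta


if __name__ == "__main__":
    rng = random.Random(0)
    target = lambda x: 1 if x[0] + x[1] >= 0 else 0  # hypothetical environment labelling
    train = sample_environment(200, target, rng)
    state = perceptron_learner(train)
    test = sample_environment(1000, target, rng)
    acc = sum(output(state, x) == y for x, y in test) / len(test)
    print(f"state={state}, test accuracy={acc:.3f}")
```

For instance, `perceptron_learner([((1, 1), 1), ((-1, -1), 0)])` makes one correction on the second example and returns the state `([1.0, 1.0], 1.0)`, which classifies both training points correctly. The point of the sketch is the shape of the interface, a function from a randomly generated sample to a state, not the particular update rule.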