SOLVING THE MYSTERY OF NEURAL NETWORKS
Suzanne M. Rodriguez, Ph.D.
Expert systems and neural networks were among the first branches of artificial intelligence (AI) technology to graduate from the research laboratory to commercial use. Of the two, neural networks deviate more markedly from traditional, sequential data processing. Because neural networks are so different from the tools most of us are trained to use, they retain an aura of mystery. This article is intended to dispel some of the mystery surrounding neural network technology. The discussion includes examples of our clients' neural network applications, an introduction to neural networks and a description of the application development process.
When would I use a neural network?
Many problems of prediction and classification require knowledge about the relationship between inputs and outputs. These broad classes of problems, incarnated in various forms in different industries, offer numerous opportunities to apply neural network technology. The examples in this section briefly describe some of the applications requested by our past and present clients. The descriptions have been limited in compliance with their respective confidentiality agreements.
Sales Forecasting
Clients in the retail and distribution industries frequently request sales forecasting networks, both to support inventory and purchasing decisions and to identify opportunities for increased sales. Neural networks use historical data to learn the relationship between information such as seasonality, price and sales levels. The relationships found in the data are in turn used to predict sales levels for various combinations of seasonality and price. Sales forecasting applications emphasize the ability of neural networks to discover relationships in point-of-sale data that experts cannot quantify because of the data's size and complexity. Expertise is vital in determining relevant inputs to a neural network. However, one advantage of neural network technology is that it can be applied in situations where expertise is relatively limited.
Prioritizing Sales Calls
A client in the medical distribution field, Veratex Corporation, used a neural network to leverage limited personnel resources. The company had accumulated a large database of dormant customer accounts. While they suspected that many accounts would be reactivated following a call from one of their teleservicers, the database held far more accounts than the available sales staff could contact. A neural network (designed and run on the AS/400) was used to prioritize the list of dormant accounts. The network was trained using active customer account data to recognize the characteristics of the most profitable customers. The neural network rated dormant accounts based on what it had learned about the relationship between customer characteristics and account profitability. Dormant accounts with high ratings were contacted first, while those with very low ratings were not contacted at all. More inactive accounts were "reactivated as a result of the neural network project than ever before" (Mary Lamphier, Vice President of MIS, Veratex Corporation). A similar conceptual approach could be used in mail order or collections environments to make the best use of available resources.
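The prioritization step described above can be sketched in a few lines of Python. This is only an illustration, not the client's implementation: the account fields, the cutoff value and the toy_rating function (a stand-in for a trained network) are all hypothetical.

```python
def prioritize(accounts, rate, min_rating=0.2):
    """Return accounts sorted by rating, highest first, dropping very low ratings."""
    rated = [(rate(account), account) for account in accounts]
    kept = [pair for pair in rated if pair[0] >= min_rating]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [account for _, account in kept]


def toy_rating(account):
    # Hypothetical stand-in for a trained network: maps an account to a 0..1 rating.
    return min(1.0, account["past_annual_orders"] / 50.0)


dormant = [
    {"id": "A", "past_annual_orders": 5},
    {"id": "B", "past_annual_orders": 40},
    {"id": "C", "past_annual_orders": 20},
]
call_list = prioritize(dormant, toy_rating)
```

In this sketch the highest-rated accounts come first in call_list, and any account rated below min_rating is left off the list entirely.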
Credit Application Approval
Neural networks are well suited to a number of situations requiring approval decisions, whether for loans, leases, or credit cards. Networks can be trained to simply classify an application as acceptable (a yes or no decision) or to predict a value such as the revenue that will be generated. Networks with multiple outputs can be used to provide a simple reason code along with a credit evaluation, but more detailed explanations are beyond the current limits of neural network technology.
The example shown in Figure 1 is a simplified rendition of a neural network designed to evaluate mortgage loan applications. The neural network learned the relationship between attributes of a loan application (input layer) and the credit risk assigned by a loan officer (output layer). In reality, a larger number of variables were used as inputs to the evaluation process. The application also made use of multiple networks that specialized in different subproblems, as shown in Figure 2. The output values from the specialized networks were provided as inputs to the network that made the final evaluation. For our client's application, the neural network was teamed with an expert system to explain the network recommendation.
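The idea of specialized networks feeding a final network can be sketched as simple function composition. The subproblems, field names and decision rules below are hypothetical placeholders (the article does not name the actual subproblems); each function stands in for a trained network.

```python
def evaluate(application, specialists, final_network):
    """Run each specialized network, then pass their outputs to the final network."""
    intermediate = [network(application) for network in specialists]
    return final_network(intermediate)


# Hypothetical stand-ins for trained specialist networks.
def capacity_net(app):
    return 1.0 if app["income"] >= 3 * app["payment"] else 0.0


def collateral_net(app):
    return 1.0 if app["loan"] <= 0.8 * app["property_value"] else 0.0


def final_net(scores):
    # Hypothetical final evaluation: average of the specialist outputs.
    return sum(scores) / len(scores)


application = {"income": 6000, "payment": 1500,
               "loan": 180000, "property_value": 200000}
rating = evaluate(application, [capacity_net, collateral_net], final_net)
```

The point of the design is that the final network never sees the raw application fields handled by the specialists, only their output values.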
Neural Network vs. Expert System Applications
Some problems, such as loan approval decisions, can be approached as either expert system or neural network applications. In the mortgage example provided above, both technologies were employed in order to take advantage of their respective strengths. In contrast to expert systems, neural networks can be developed where expertise is limited or the experts are unable to explain their reasoning process. Because of their speed, neural networks may be appropriate where rule-based processing is too slow. Developing a neural network, however, requires far more data than building an expert system. In general, the more complex the data and the more accurate the required response, the more data is required. Typical applications require hundreds to thousands of training examples. Neural networks alone are inappropriate in situations requiring detailed explanations of the output, which is where expert systems excel. The technologies do not compete so much as they complement one another. Each technology mimics a different type of human problem-solving behavior. Expert systems embody conscious, methodical reasoning while neural networks represent instantaneous, unconscious pattern recognition.
What is a neural network?
Artificial neural networks are inspired by their biological counterpart: the brain. Human intelligence is the result of many individual neurons operating in parallel and influencing one another through interconnections. Although an individual neuron represents an extremely small piece of information, it is connected to numerous other small, related pieces of information distributed across other neurons. This network enables complex information to be maintained in the brain as a whole. The connections between neurons are essential because they determine how individual pieces of information are related to one another. The connections also vary in strength -- making some information more important or influential than other information. From this perspective, learning is a process of forming the right connections and adjusting the strength of the connections. What we learn (knowledge) is stored in the connections.
Artificial neural networks are also interconnected collections of simple, independent processors (or nodes). While loosely modeled after the brain, the details of neural network design are not guided by biology. There is insufficient knowledge of how the brain operates to provide a blueprint. Instead, for over 20 years researchers have been experimenting with different types of nodes, different patterns of interconnection, and different algorithms for adjusting connections (also called learning rules). Many different neural network models have been explored. The backpropagation network is the model most often employed in commercial applications. It relies on pattern recognition to categorize or predict values based on the input provided.
As shown in Figure 1, a backpropagation network connects a set of input nodes to a set of output nodes. Usually, information travels from the input nodes to the output nodes through a set of intermediate nodes located in the hidden layers. Each input node represents the value of a small item of information (e.g., applicant's income). The strength of the connection between an input node and a hidden layer node is represented by a number (or connection weight). This number determines how the value of the input node impacts the value of the hidden layer node. The value of each hidden layer node is determined by both the values of the input layer nodes and the connection weights. Similarly, the values of the hidden layer nodes and the strength of their respective connections to the output node determine the final rating of credit risk. The connections in the network are feedforward only, meaning that a hierarchy exists. The nodes in each layer are connected only to nodes in the next layer -- feeding forward towards the output layer.
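The feedforward calculation described above can be sketched as follows. The sigmoid squashing function and the bias terms are common choices assumed here for illustration; the article does not specify them, and the weights shown are arbitrary rather than trained values.

```python
import math


def sigmoid(x):
    # Assumed squashing function; maps any value into the range 0..1.
    return 1.0 / (1.0 + math.exp(-x))


def layer(values, weights, biases):
    """Compute one layer. weights[j][i] connects incoming value i to node j."""
    return [sigmoid(sum(w * v for w, v in zip(row, values)) + bias)
            for row, bias in zip(weights, biases)]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Feed inputs forward through one hidden layer to the output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)


# Two scaled inputs (e.g., income and debt), two hidden nodes, one output node.
risk = forward([0.3, 0.7],
               hidden_w=[[0.5, -0.4], [-0.3, 0.8]], hidden_b=[0.0, 0.1],
               output_w=[[1.2, -0.7]], output_b=[0.05])
```

Each node only needs the values of the previous layer and its own connection weights, which is why the calculation proceeds strictly layer by layer.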
Once the strengths of the connections in Figure 1 are properly adjusted, the neural network will have learned the relationship between attributes of a loan application and the credit risk assigned by a loan officer. The connection weights are adjusted from random starting values during training. This process presents a series of examples (e.g., numerous loan applications) to the network. Backpropagation networks require supervised training. Each set of inputs must be accompanied by the desired output (e.g., a credit risk score). Initially, the network output differs from the desired response because the connections do not capture the relationship between the inputs and the output. An algorithm called the generalized delta rule is used to change the connection weights after each example so that the network output more closely resembles the correct response. (The term "backpropagation" refers to the manner in which errors are propagated or distributed back from the output layer towards the input layer.) After repeated exposure to the examples in the training set, the connection weights have been incrementally adjusted so that the network output is very similar (or identical) to the desired response. Training is complete when the network has learned the relationship between the inputs and outputs. The connection weights are no longer adjusted and the network is said to be locked.
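A minimal sketch of supervised training by backpropagation is shown below, under several assumptions that are not from the article: sigmoid nodes, a bias weight on every node, one hidden layer with three nodes, a learning rate of 0.5 and a four-example toy data set of scaled income and debt values. It illustrates the idea only; commercial tools add many refinements.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train(examples, n_hidden=3, rate=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in = len(examples[0][0])
    # Random starting weights; the last weight in each row is a bias weight.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
           for _ in range(n_hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def run(x):
        h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_h]
        out = sigmoid(sum(w * v for w, v in zip(w_o, h + [1.0])))
        return out, h

    def error():
        return sum((t - run(x)[0]) ** 2 for x, t in examples) / len(examples)

    error_before = error()
    for _ in range(epochs):
        for x, target in examples:
            out, h = run(x)
            # Error signal at the output node.
            d_out = (target - out) * out * (1.0 - out)
            # Propagate the error back to each hidden node through its connection.
            d_h = [d_out * w_o[j] * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            # Adjust the connection weights after each example.
            for j, v in enumerate(h + [1.0]):
                w_o[j] += rate * d_out * v
            for j in range(n_hidden):
                for i, v in enumerate(x + [1.0]):
                    w_h[j][i] += rate * d_h[j] * v
    return (lambda x: run(x)[0]), error_before, error()


# Hypothetical examples: ([scaled income, scaled debt], desired risk score).
examples = [([0.9, 0.1], 0.0), ([0.8, 0.3], 0.0),
            ([0.3, 0.8], 1.0), ([0.2, 0.9], 1.0)]
network, error_before, error_after = train(examples)
```

Comparing error_before with error_after shows how far repeated exposure to the examples has moved the network output towards the desired responses. Once train returns, the weights are no longer adjusted, which corresponds to the locked network described above.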
Because of the architecture of backpropagation networks, all of the inputs to the network are considered in parallel. Using our example, a loan applicant's income is not considered before current debt. All of the loan application characteristics are considered jointly. When presented with an application containing several "borderline" values, the network could correctly assess a high credit risk even though the application might have passed a series of tests for individual risk factors. This parallel nature allows neural networks to be more tolerant of partially missing or incorrect inputs than sequential approaches. The knowledge stored in the network weights enables the network to generalize from situations it encountered during training to completely new combinations of inputs encountered during use. Once trained, neural networks produce a solution very rapidly, even when large numbers of nodes are involved.
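The "borderline" case can be illustrated with a deliberately simplified comparison. The joint score below is a single weighted sum standing in for a trained network, and the field names, limits, weights and cutoff are all hypothetical.

```python
def passes_each_test(application, limits):
    """Sequential approach: check every risk factor against its own limit."""
    return all(application[name] <= limit for name, limit in limits.items())


def joint_score(application, weights):
    """Joint approach: consider all risk factors together (simplified stand-in)."""
    return sum(weight * application[name] for name, weight in weights.items())


application = {"debt_to_income": 0.38, "loan_to_value": 0.78,
               "credit_utilization": 0.48}
limits = {"debt_to_income": 0.40, "loan_to_value": 0.80,
          "credit_utilization": 0.50}
weights = {"debt_to_income": 1.0, "loan_to_value": 1.0,
           "credit_utilization": 1.0}
high_risk_cutoff = 1.5

passed_individually = passes_each_test(application, limits)
flagged_jointly = joint_score(application, weights) > high_risk_cutoff
```

Here every factor sits just under its individual limit, so the application passes each test in sequence, yet the combined score exceeds the cutoff and the application is flagged as high risk.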
How would I develop a neural network application?
Five types of activities take place during neural network development: project definition, knowledge engineering, network training, network validation and network integration. Over the course of the project, the emphasis on each of the activities shifts from the first towards the last, but they do not constitute separate and well-defined stages. In our experience, a traditional waterfall approach to system development is inappropriate when building intelligent applications. Developing an intelligent system is a learning process for all of the project participants. One of the best ways to facilitate and measure that learning is to follow a rapid prototyping approach to systems development. A production neural network is almost always the result of several iterations, each building upon the results of prior iterations.
Project Definition & Knowledge Engineering
Project definition and early knowledge engineering determine what problem the system is intended to address and whether data is available. Even the project definition may evolve over the course of the project as more is learned about the problem and the data. Subsequent knowledge engineering efforts are directed towards identifying data sources, determining what information is relevant and deciding how the information should be represented. Gathering and preparing the data may take a considerable amount of time. A project team member who is knowledgeable about potential data sources and willing to cross organizational boundaries is a tremendous asset.
Network Training
You may be happy to learn that training a neural network does not necessarily require an intimate understanding of the mathematics underlying the technology. Several neural network development tools on the market include the most widely used network models (such as backpropagation). We typically train networks on desktop computers to take advantage of the GUI development interface even if the finished network will reside on another type of computer.
Training is not generally a one-step process. Different variations of the network are trained in order to determine the best network architecture, training parameters and data representations. The various networks are compared on the basis of their performance on the training data and on a separate set of data called a holdback set. The holdback set is used to verify that the network is able to generalize to new situations. It is possible for a network to overtrain, which means that it reflects relationships that exist only in the training data.
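The comparison of candidate networks on training and holdback data can be sketched as follows. The two candidates are hypothetical stand-ins rather than trained networks: one simply memorizes the training examples (an extreme case of overtraining), the other applies a general rule. The error measure, mean absolute error, is also an assumption.

```python
def mean_abs_error(model, examples):
    return sum(abs(model(x) - y) for x, y in examples) / len(examples)


def compare(candidates, training, holdback):
    """Return the name of the best candidate on holdback data, plus a full report."""
    report = {name: (mean_abs_error(model, training),
                     mean_abs_error(model, holdback))
              for name, model in candidates.items()}
    best = min(report, key=lambda name: report[name][1])
    return best, report


training = [(1, 2.1), (2, 3.9), (3, 6.2)]
holdback = [(4, 8.1), (5, 9.8)]

memory = dict(training)
candidates = {
    # Perfect on the training data, useless on anything new.
    "memorizer": lambda x: memory.get(x, 0.0),
    # Slightly imperfect on the training data, but generalizes.
    "general": lambda x: 2.0 * x,
}
best, report = compare(candidates, training, holdback)
```

The memorizer has the lower training error but a far higher holdback error, which is the signature of overtraining, so the comparison selects the general candidate.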
Network Validation and Integration
During network validation, network performance is evaluated against a third set of data put aside for final performance evaluation. Performance is evaluated in absolute terms, against competing methods, or against some other performance benchmark to determine whether the network is ready for production use. In-depth statistical analysis of network errors can also be used to uncover strengths and weaknesses in network performance that guide the next iteration of knowledge engineering and training. Once the network is successfully validated, it can be embedded in the production environment, together with the expert system or conventional code needed to interface with data sources and users (network integration).
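Setting aside a third data set and checking the network against a benchmark might look like the sketch below. The set sizes, the error measure and both models are assumptions made for illustration; the network and benchmark functions are stand-ins, not trained systems.

```python
import random


def three_way_split(examples, n_train, n_holdback, seed=0):
    """Shuffle, then split into training, holdback and validation sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    training = shuffled[:n_train]
    holdback = shuffled[n_train:n_train + n_holdback]
    validation = shuffled[n_train + n_holdback:]  # untouched until final evaluation
    return training, holdback, validation


def mean_abs_error(model, examples):
    return sum(abs(model(x) - y) for x, y in examples) / len(examples)


def beats_benchmark(network, benchmark, validation):
    return mean_abs_error(network, validation) < mean_abs_error(benchmark, validation)


examples = [(x, 2.0 * x) for x in range(10)]
training, holdback, validation = three_way_split(examples, n_train=6, n_holdback=2)


def network(x):
    return 2.0 * x + 0.1  # hypothetical stand-in for a trained network


def benchmark(x):
    return 9.0            # hypothetical competing method: a constant forecast


ready = beats_benchmark(network, benchmark, validation)
```

Keeping the validation set out of both training and candidate selection means the final comparison reflects data the network has never influenced or been influenced by.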
I hope that I have succeeded in removing some of the mystery that surrounds neural network technology. I must admit, however, that the more I learn about neural networks the more I appreciate the many mysteries that remain. Although each element of a network is deceptively simple in isolation, the learning behavior of the network as a whole is a strange and wonderful thing to behold. Neural networks are a perfect example of the old adage that the whole is more than the sum of its parts.
© 2015 Churchill Systems Inc.