Statistical and Numerical Approaches for Modeling and Optimizing Laser Micromachining Process-Review

Shadi M. Karazi , ... Khaled Y. Benyounis , in Reference Module in Materials Science and Materials Engineering, 2019

3.4. The Back-Propagation Algorithm

The back-propagation algorithm is the most common supervised learning algorithm. Its concept is to adjust the weights of the ANN so as to minimize the error between the actual output and the predicted output, using a function based on the delta rule. It works backwards from the output layer, adjusting the weights accordingly to reduce the average error across all layers. This process is repeated until the output error is minimized. The basic back-propagation algorithm adjusts the weights in the steepest descent direction [22–24].

Using this algorithm, network training consists of three stages: (a) feed-forward of the input training pattern; (b) calculation and back-propagation of the associated error; and (c) adjustment of the weights. The backward pass propagates the error starting from the output layer, and the process continues until the minimum error is reached. In the weight-update phase, the input activation level and the output delta are multiplied to obtain the weight gradient; each weight is then moved in the direction opposite to the gradient by subtracting a fraction of the gradient from it [25]. Since normalization of the input and output data reduces the chance of convergence to a local minimum on the error surface, normalization makes convergence easier to achieve [26].
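As a minimal illustration of the delta-rule step described above (not taken from the source; the data, learning rate, and dimensions are arbitrary), a single linear neuron can be trained by repeatedly subtracting a fraction of the error gradient from each weight:

```python
# Delta-rule update for one linear neuron: w <- w - eta * dE/dw,
# with E = 0.5 * (target - output)^2 and output = w . x.
def delta_rule_step(w, x, target, eta=0.1):
    output = sum(wi * xi for wi, xi in zip(w, x))
    error = target - output
    # dE/dw_i = -(error) * x_i, so stepping against the gradient
    # adds eta * error * x_i to each weight.
    return [wi + eta * error * xi for wi, xi in zip(w, x)]

w = [0.0, 0.0]
for _ in range(100):
    w = delta_rule_step(w, x=[1.0, 2.0], target=1.0)
```

Each step moves the weights in the steepest-descent direction, so the output error shrinks geometrically toward zero.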


URL:

https://www.sciencedirect.com/science/article/pii/B9780128035818116509

Knowledge Based Modeling

J.G. Lenard , ... L. Cser , in Mathematical and Physical Simulation of the Properties of Hot Rolled Products, 1999

9.6.6 Using neural networks to predict parameters in hot working of aluminum alloys

The ability of an artificial neural network model, using a back-propagation learning algorithm, to predict the flow stress, roll force and roll torque obtained during hot compression and rolling of aluminum alloys is studied. The well-trained neural network models are shown to provide fast, accurate and consistent results, making them superior to other predictive techniques.

Back propagation neural networks: The multi-layered feedforward back-propagation algorithm is central to much work on modeling and classification by neural networks. This technique is currently one of the most frequently used supervised learning algorithms. Supervised learning implies that a good set of data or pattern associations is needed to train the network. Input-output pairs are presented to the network, and the weights are adjusted to minimize the error between the network output and the actual value. The knowledge that a neural network possesses is stored in these weights. The back propagation model, presented in Figures 9.5 and 9.6, has three layers of neurons: an input layer, a hidden layer and an output layer. The back-propagation training algorithm is an iterative gradient algorithm, designed to minimize the mean square error between the predicted output and the desired output; it requires continuously differentiable non-linearities. The algorithm for training a back-propagation network is summarized as follows.

initialize weights and thresholds: set all weights and thresholds to small random values, and present the input and desired output;

compute the output of each node in the hidden layer;

compute the output of each node in the output layer;

compute the output layer error between the target and the observed output;

compute the hidden layer error;

adjust the weights and thresholds in the output layer;

adjust the weights and thresholds in the hidden layer; and

repeat the above steps on all pattern pairs until the output layer error is within the specified tolerance for each pattern and for each neuron.
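The eight steps above can be sketched for a small network with one hidden layer of logistic nodes (an illustrative NumPy version, not the chapter's code; the AND pattern set, layer sizes, learning rate, and tolerance are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Step 1: initialize weights and thresholds to small random values.
W1 = rng.uniform(-0.5, 0.5, (2, 3))   # input -> hidden weights
b1 = rng.uniform(-0.5, 0.5, 3)        # hidden thresholds
W2 = rng.uniform(-0.5, 0.5, (3, 1))   # hidden -> output weights
b2 = rng.uniform(-0.5, 0.5, 1)        # output threshold

# Present input and desired output (pattern pairs for logical AND).
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [0.], [0.], [1.]])
eta = 0.5

for epoch in range(20000):
    # Steps 2-3: compute the output of each hidden and output node.
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    # Step 4: output-layer error (target minus observed, scaled by f').
    d_out = (T - y) * y * (1.0 - y)
    # Step 5: hidden-layer error, back-propagated through W2.
    d_hid = (d_out @ W2.T) * h * (1.0 - h)
    # Steps 6-7: adjust weights and thresholds in both layers.
    W2 += eta * h.T @ d_out
    b2 += eta * d_out.sum(axis=0)
    W1 += eta * X.T @ d_hid
    b1 += eta * d_hid.sum(axis=0)
    # Step 8: repeat until every pattern is within tolerance.
    if np.all(np.abs(T - y) < 0.05):
        break
```

Full-batch updates are used here; the loop stops once the output of every neuron is within the specified tolerance for every pattern pair.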

Deformation resistance of the material: Cylindrical samples of the Al 1100-H14 alloy, 20 mm in diameter and 30 mm in height, were used to determine the metal's resistance to deformation. The specimens were machined from plates with the longitudinal direction parallel to the rolling direction. The flat ends of each specimen were machined to a depth of 0.1–0.2 mm to retain the lubricant, boron nitride. A type K thermocouple in an INCONEL shield, with an outside diameter of 1.54 mm and 0.26 mm thermocouple wires, was embedded centrally in each specimen. The chemical composition of the material is given in Table 9.13. The compression tests were carried out on a servohydraulic testing system at different true, constant strain rates and temperatures, as presented in Table 9.14. Temperatures were set to 400, 450, and 500 °C and the strain rate ranged from 0.97 to 11.53 s−1.

Table 9.13. The chemical composition of the material (weight %)

Mn Si Zn Cu Al
0.05 1.00 0.1 0.05 remainder

Table 9.14. Experimental matrix used in flow stress evaluation tests

temperature (°C) \ strain rate (s−1) 0.97 2.97 5.04 7.58 11.53
400 °C A1 A2 A3 A4 A5
450 °C B1 B2 B3 B4 B5
500 °C C1 C2 C3 C4 C5

Hot rolling of an aluminum alloy: A commercially available 3000 aluminum was used in this phase of the study. Strips of 6.12–6.16 mm thickness, 50–52 mm width and 310 mm length were prepared, cut along the rolling direction of the plates. The chemical composition of the strips is given in Table 9.15. The surface roughness of the strips, as delivered, was of the order of 0.3 μm. Each strip had a type K (chromel-alumel) thermocouple embedded to a depth of 15 mm in its tail end.

Table 9.15. The chemical composition of the material (weight %)

Mn Fe Si Mg Zn Cu Al
1.00 0.63 0.20 0.005 0.016 0.097 remainder

Lubricants with different emulsion concentrations were used in the hot rolling tests. The lubricants were made from four base oils, natural or semi-synthetic in kind, referred to as natural A, natural B, semi-synthetic A and semi-synthetic B. With the natural oils, the lubricants were prepared with water and 1% and 3% of oil by volume; with the semi-synthetic oils, they were prepared with 1% and 10% of oil by volume. The A lubricants were designed for low-friction applications and the B lubricants for higher friction. Both are based on synthetic esters. The viscosities of the oils at 40 °C and 100 °C are given in Table 9.16.

Table 9.16. The properties of the lubricants

Lubricant Density (g/ml) Viscosity at 40 °C (mm2/s) Viscosity at 100 °C (mm2/s)
Natural A 0.904 37.9 5.5
Natural B 0.914 76.3 8.6
Semi-synthetic A 0.886 28.5 5.5
Semi-synthetic B 0.883 29.6 5.7

A two-high mill with rolls of 250 mm diameter and 100 mm length, driven by a 42 kW DC motor, is used. The rolls are hardened to Rc = 52 and are ground circumferentially to a surface finish of Ra = 0.18 μm. Two industrial hot-air guns are used to heat the rolls to approximately 90 °C. The roll force is measured by two load cells located under the bearing blocks of the lower roll. Two torque transducers in the drive spindles measure the roll torque of the upper and lower rolls. The roll speed is measured by a digital shaft encoder installed in the top drive spindle. In order to measure the forward slip, two photosensitive diodes are installed at the exit, a known distance apart; their signals allow the determination of the exit strip velocity, leading to the original definition of the forward slip. Two reductions of nominally 15 and 35% magnitude are used. The rolling speed is varied from a low of 20 rpm to 160 rpm, giving surface velocities of 0.26 and 2.1 m/s, the higher value of which is near commercial operating speeds.

Flow stress prediction: The purpose of modeling is to develop an effective representation of material behavior at high temperatures. From each set of compression test data, 15 data points were picked at equal strain intervals of 0.05, giving a total of 195 training data sets for the 13 conditions. Conditions B2 and B4 were set aside as network generalization test data and the remaining patterns were used to train the network. All input and output values were normalized into the range [−0.9, 0.9] to avoid premature saturation of the sigmoid function.
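Such a normalization is a simple linear mapping into [−0.9, 0.9]; a sketch is shown below (the flow-stress values are hypothetical, for illustration only):

```python
import numpy as np

def normalize(v, lo=-0.9, hi=0.9):
    """Linearly map raw values into [lo, hi] to avoid sigmoid saturation."""
    vmin, vmax = v.min(), v.max()
    return lo + (hi - lo) * (v - vmin) / (vmax - vmin)

stress = np.array([45.0, 52.0, 61.0, 70.0])  # hypothetical flow-stress values (MPa)
scaled = normalize(stress)                   # now spans [-0.9, 0.9]
```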

The learning rate and momentum rate were both initially set to 0.5, then incremented to 0.9. The learning rate of 0.9, with a momentum rate of 0.7, resulted in the fastest convergence. The logistic sigmoid function with a constant steepness factor of 0.5 was chosen as the activation function. Both one hidden layer and two hidden layer networks were examined to investigate the effect of extra hidden layers. The error measure for network performance evaluation was the mean relative error of all the training data points.

It was found that the two-hidden-layer topology has no advantage over one hidden layer for equal numbers of total processing nodes. Increasing the number of hidden nodes to five increased the accuracy; however, a further increase in the number of hidden nodes had no considerable benefit. The network with five hidden nodes was therefore concluded to be the most efficient design. Training results after 5,000 iterations are shown in Table 9.17.

Table 9.17. Mean relative error in the prediction of the flow stress – training data

temperature (°C) mean relative error (%)
400 2.584
450 2.218
500 1.799

The plot in Figure 9.11 shows the comparison of the experimental values of flow stress with those predicted by the use of the fully trained neural network, at the strain rate of 5.04 s−1. It is found that the predicted flow curves follow the experimental flow curves very closely. The mean relative error was calculated to be less than 1.9%.

Figure 9.11. The predictions of the neural network

Similar accuracy was also found at other strain rates. This clearly indicates that the network was able to learn the training data set accurately. The main quality indicator of a neural network is its generalization ability, that is, its ability to predict accurately the output for unseen test data. The network was trained using data at 400 °C and 500 °C, and then tested against the data at 450 °C, as shown in Figure 9.12. Good predictive ability was observed. The mean relative error was found to be 2.83%. This error is less than the errors that usually arise in flow stress measurements due to unavoidable variations in temperature, strain rate, and interfacial frictional resistance.

Figure 9.12. The predictions of the neural network on unseen data

Roll force prediction: A database of the roll separating forces and roll torques during hot rolling of the 3000 aluminum alloy strips was developed, as described above. In the first step of neural network modeling, the effect of the lubricant type was excluded, and a network was trained to predict roll forces for a given lubricant. The model inputs were reduction, roll speed, strip temperature, and emulsion concentration. The network was trained using 45 data points and learned the force variation very closely. The predictive ability of the network was then tested; the results of these computations are shown using the solid diamonds. The network roll force prediction for the Natural-A lubricant is shown in Figure 9.13. The maximum relative error during testing was approximately 10%, with a mean relative error of 4.1%. Again, this level of error is satisfactory and smaller than the errors that normally arise due to experimental variations and the accuracy of instrumentation.

Figure 9.13. Testing and training data for roll force

After successful development of the model for a given lubricant, the lubricant type was incorporated into the force prediction model. A configuration of one hidden layer with 8 nodes, five input nodes and one output node, with a learning rate of 0.7 and a momentum rate of 0.7, was found to perform best. After 10,000 iterations, the network converged to a solution and further iterations had an insignificant effect on error reduction. The results are given in Figure 9.14 for two sets of reductions, 15% and 35%, at various rolling velocities. The emulsion contained 3%, by volume, of the Natural-A lubricant.

Figure 9.14. Testing and training data for 15% and 35% reductions

The trained network predicted the roll force for 66% of the conditions within a 5% relative-error band and 95% within a 10% band; only three conditions were predicted with errors up to a maximum of 13%. As observed, the predictive ability of the network is satisfactory. The average error obtained while testing the network is 4.46% and 3.77%, respectively, for the two reductions, lower than is usually obtained using the more classical modeling methods.

Roll torque prediction: A network with one hidden layer was applied to the prediction of roll torque, with five input parameters and one output parameter. The experimental matrix included 66 data points, of which 45 were used for network training and 21 were randomly selected to test the performance of the trained neural network. The learning rate and momentum rate were both initially set to 0.3 and then increased to 0.9; one hidden layer with 8 nodes was found to perform best, with a learning rate of 0.7 and a momentum rate of 0.7 as the optimal condition. The training and testing results after 15,000 iterations produced relative errors within a 10% error band, as shown in Figure 9.15. The network also predicted the test data well, with errors less than 10%. The points with high relative error are the training data.

Figure 9.15. Testing and training data for roll torque

Traditional models: Empirical models, one-dimensional models and finite-element based models have been used in a large number of publications to predict roll forces and torques, in addition to other variables of importance in the flat rolling process. Their predictions have been shown to be reasonably accurate and consistent, provided that the metal's resistance to deformation, as well as the boundary and initial conditions, has been described in an adequate manner.

The material's flow strength may be measured with good accuracy and may be modeled using well established constitutive models. The problems encountered concern the boundary conditions, specifically the coefficients of friction and heat transfer at the roll/workpiece contact surface, which are notoriously difficult to measure. Using inappropriate magnitudes inevitably results in poor predictions. This is one of the major advantages of the neural network approach: neither of these parameters needs to be known to produce an accurate model. The small price to pay for the accuracy is the lack of an empirical relation which may be used in the future; however, as long as a computer and the software are available, such a relation is no longer necessary.


URL:

https://www.sciencedirect.com/science/article/pii/B9780080427010500097

Volume 3

B.K. Lavine , T.R. Blank , in Comprehensive Chemometrics, 2009

3.18.4.1 Backpropagation

Backpropagation of error is an effective method for training the weights of feed-forward neural networks that contain hidden layers.8 The backpropagation of error paradigm is based on the assumption that all weights contribute to some portion of the output error, and that weight corrections should be proportional to the output error contributed by each weight. In the first backpropagation step, the assigned error associated with the output layer weights connected to output node n is obtained by scaling the output error with the value of the derivative of the transfer function at the sum of inputs, xn, to the output node

(5) δn = (tn − hn) f′(xn)

In Equation (5), tn is the target value of output node n, and hn is the actual output at the same node. (In a classification problem, the output nodes consist of 0s and 1s arranged carefully to signal a corresponding group membership.) In the second backpropagation step, the errors associated with hidden layer weights connected to hidden layer node j are determined by scaling the weighted sum of the scaled output errors with the value of the derivative of the transfer function at the sum of inputs, xj, to the jth hidden layer node.

(6) δj = [ Σn δn wjn ] f′(xj)

In the third step, weights are updated by gradient descent in the form of the Widrow–Hoff delta rule.

(7) Δwjn = η δn gj

The change in each connection weight between hidden layer node j and output layer node n is determined by a universal learning coefficient, η, the activation gj of node j in the hidden layer, and δn, the scaled error associated with weights connected to the node in the output layer. Corrections to the hidden layer weights are made analogously using the hidden layer error terms and activations from the connected input layer nodes.

Given the logistic sigmoid transfer function, the output at hidden node j, gj, is computed using the sum of weighted inputs to perceptron j, where xj = Σi ai wij.

(8) gj = f(xj) = 1 / (1 + exp(−xj))

The scaled error associated with the node in the output layer is

(9) δn = (tn − hn) f′(xn) = −(∂Ep/∂hn)(∂hn/∂xn)

The derivative f′(xn) is simply the derivative of the transfer function in the output layer, which is equal to 1 for a linear transfer function and hn(1 − hn) for the logistic sigmoid transfer function. The activation of the jth node in the hidden layer, gj, can also be expressed as

(10) gj = ∂xn/∂wjn

Substitution of Equations (9) and (10) into Equation (7) gives

(11) Δwjn = −η (∂E/∂hn)(∂hn/∂xn)(∂xn/∂wjn) = −η ∂E/∂wjn

This corresponds to gradient descent in p-dimensional weight space with a fixed universal learning coefficient η. Two successive applications of the chain rule defined in Equations (9) and (10) yield the same result for correction of the weights, wji, in the hidden layer. These weights are obtained by replacing δn with δj, and gj with ai in Equation (7).
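The convenient form of the sigmoid derivative used in this derivation, f′(xn) = hn(1 − hn), can be checked numerically with a finite difference (an illustrative snippet, not from the source):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

x = 0.7
h = sigmoid(x)
# A central finite difference approximates f'(x); for the logistic
# sigmoid it should agree with the closed form h * (1 - h).
eps = 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2.0 * eps)
analytic = h * (1.0 - h)
```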

It is important to note that in Equation (6) the output layer weights, wjn, are factors in the calculation of the scaled error δj, which is used to determine hidden layer weight corrections. This means that scaled error terms and the associated weight corrections tend to become larger as the output layer weights grow. It has been suggested that all output weights should be initialized to a small constant value to facilitate equivalent learning rates in the hidden layer weights in the early stages of the optimization process.9

One of the main problems with gradient descent optimized networks is that long training sessions are often required to find an acceptable weight solution because of the well-known difficulties inherent in gradient descent optimization. A plot of the weight coordinates versus the cost function gives a high-dimensional error surface that must be minimized during the iterative optimization of the weights. The location of the minimum of this error surface corresponds to the desired weight solution. When using gradient descent optimization, slow learning occurs when the error gradient, ∂E/∂w, is small.

Large, flat regions of the error surface tend to slow the optimization process dramatically. In these regions, the learning coefficient, η, should be increased to compensate for the small magnitude of the gradient, but a smaller learning coefficient may be essential for convergence when steps are being taken in the vicinity of narrow minima. An empirically derived training schedule of learning coefficients can improve the convergence of gradient descent by starting with large coefficients and shrinking them as the optimization proceeds, but unless the desired minimum is symmetric with respect to the weight axes, a scheduled learning coefficient that is both optimal and universal will be, at best, only a compromise between the best coefficients for each weight axis. A modification that helps to resolve this dilemma involves the use of momentum, α, which adds a fraction of the previous weight correction to the gradient when generating the new weight correction.

(12) Δwjn(t) = −η (∂E/∂wjn) + α Δwjn(t − 1)

The momentum term allows for faster convergence with the use of smaller learning coefficients. Including momentum in weight updates has been shown to contribute to more rapid convergence of the network training process by increasing the effective learning coefficient by a factor of 1/(1–α) in regions of relatively constant gradient, and by damping the oscillatory behavior of weight updates at the bottom of steep valleys, where gradient components alternate in sign. 10 Even with the momentum modification, the speed of convergence is typically dependent on the choice of the learning coefficient and the initial starting weights.
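The claimed factor of 1/(1 − α) can be seen by iterating the momentum update of Equation (12) with a constant gradient: the step settles at −ηg/(1 − α), an effective learning coefficient of η/(1 − α) (a sketch with arbitrary values):

```python
# Momentum update dw(t) = -eta*g + alpha*dw(t-1), with a constant gradient g.
eta, alpha, g = 0.01, 0.9, 1.0
dw = 0.0
for _ in range(200):
    dw = -eta * g + alpha * dw
# dw now approximates -eta * g / (1 - alpha): here ten times the plain step.
```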


URL:

https://www.sciencedirect.com/science/article/pii/B9780444527011000260

Handbook of Chemometrics and Qualimetrics: Part B

B.G.M. Vandeginste , ... J. Smeyers-Verbeke , in Data Handling in Science and Technology, 1998

44.5.7.2 Local minima

The back-propagation strategy is a steepest gradient method, a local optimization technique. It therefore suffers from the major drawback of such methods, namely that it can become locked in a local optimum. Many variants have been developed to overcome this drawback [20–24]; none of them, however, really solves the problem.

A way to obtain an idea of the robustness of the obtained solution is to retrain the network with different weight initializations. The results of the different training sessions can be used to define a range around the performance curve, as shown in Fig. 44.17. This procedure can also be used to compare different networks [20].
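The retraining procedure can be sketched as follows; `train_once` is a hypothetical stand-in for the actual MLF training routine, reduced here to a single linear neuron trained by gradient descent so the example stays self-contained:

```python
import numpy as np

def train_once(X, T, seed, steps=500, eta=0.1):
    """Train from one random weight initialization; return the final MSE."""
    rng = np.random.default_rng(seed)
    w = rng.uniform(-0.5, 0.5, X.shape[1])   # random initial weights
    for _ in range(steps):
        residual = T - X @ w
        w += eta * X.T @ residual / len(T)   # gradient-descent step
    return float(np.mean((T - X @ w) ** 2))

# Toy data; in practice these would be the training patterns.
X = np.array([[0., 1.], [1., 0.], [1., 1.]])
T = np.array([1., 1., 0.])

errors = [train_once(X, T, seed) for seed in range(5)]
spread = max(errors) - min(errors)           # range around the performance curve
```

For this convex toy problem every initialization reaches essentially the same error, so the range is negligible; for a real MLF network different runs may end in different local optima, and the spread of `errors` defines the band around the performance curve.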

Fig. 44.17. Determining the number of hidden units for an MLF network. The whiskers represent the range of the error with different random weight initializations or by cross-validation.


URL:

https://www.sciencedirect.com/science/article/pii/S0922348798800543

22nd European Symposium on Computer Aided Process Engineering

Arend Dubbelboer , ... Peter M.M. Bongers , in Computer Aided Chemical Engineering, 2012

3.2 Building the Neural Network

A back propagation gradient descent algorithm was chosen to train the network. The architecture of the model was adjusted with a hands-on approach. One hidden layer with 5 neurons was found sufficient to predict the target values with improved accuracy, i.e., RMSE < 0.11 (the root-mean-squared error reported by Almeida-Rivera et al. [10]).

The noise on the input and target variables was assumed to be normally distributed, with the same estimated uncertainty for all attributes. This uncertainty was included in the training of the neural network; the strategy is outlined in Figure 2. After each iteration of the loop, new input variables and new target values were generated around the measured averages.
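The per-iteration resampling can be sketched as follows (the averages and the uncertainty below are placeholder values, not taken from the study):

```python
import numpy as np

rng = np.random.default_rng(42)

# Measured averages (hypothetical numbers) and an assumed common uncertainty.
x_mean = np.array([1.2, 3.4, 0.8])   # input attributes
t_mean = np.array([0.5])             # target value
sigma = 0.05                         # same estimated uncertainty for all attributes

def resample_pattern():
    """Draw a new training pattern around the measured averages,
    as done before each training iteration."""
    x = x_mean + rng.normal(0.0, sigma, x_mean.shape)
    t = t_mean + rng.normal(0.0, sigma, t_mean.shape)
    return x, t

xs = np.array([resample_pattern()[0] for _ in range(2000)])
```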

Figure 2. The training of a neural network with uncertainty in the input and target values


URL:

https://www.sciencedirect.com/science/article/pii/B9780444595195501398

Casting, Semi-Solid Forming and Hot Metal Forming

M. Zhan , ... H. Yang , in Comprehensive Materials Processing, 2014

5.20.4.3.3 BP Neural Network Model Validation and Application on Ti-Alloy (61)

A BP neural network model with an improved algorithm, implemented in the FORTRAN programming language, has been developed to predict the grain size of the primary α phase in the TA15 Ti-alloy over the deformation temperature range 860–980 °C and strain rate range 0.001–10 s−1. Training and testing data were taken from hot simulation experiments and quantitative metallurgical tests.

Large variations in the data make it difficult to adjust the weights from the input layer to the hidden layer, slowing convergence and reducing the accuracy of the network. To overcome these shortcomings, a normalization pretreatment was applied to the sample data.

The optimum number of hidden layer nodes was determined by trial and error to be nine. In the study, the initial weights and thresholds are given by a random function returning values in the interval (0, 1); an S-type (sigmoid) activation function was adopted for the hidden layer and a linear transfer function for the output layer. The learning rate was set at η = 0.5, with φ = 1.2 and β = 0.6; the additional momentum factor α is set to 0.9 or 0 according to requirements, with a precision of 0.02.

Table 6 shows the network outputs (predictions), the corresponding targets (experimental data), and the relative error between them. The results showed that the relative error was small, the minimum being 0.4% with the maximum at only 3.8%. It is obvious that the predicted values from the trained neural network outputs track the targets very well (61).

Table 6. Comparison of the predicted results of primary α phase with the experiment

Temperature (°C) Strain rate (s−1) Strain Experimental value (μm) Prediction value (μm) Relative error (%)
860 0.1 0.598 7.33 7.61 3.8
920 1 0.163 8.03 8.06 0.4
980 0.01 0.357 7.31 7.41 1.4

Based on the user routines supplied by the DEFORM software, interface subroutines have been developed, and code to run the BP neural network model has been compiled into the software. A simulation module for grain size prediction during isothermal compression of TA15 alloy has been established and proved to be feasible. Detailed procedure and parameters of TA15 Ti-alloy and simulation were presented in Ref. (61).

Figures 21 and 22 present predictions of grain size for the TA15 alloy. After forming, the predicted average grain size of the samples is 7.03 μm, while experimental results indicate a value of 7.35 μm, a relative error of 4.4%. For the central region of the sample, the prediction is 6.65 μm and the experimental results yield 6.85 μm, a relative error of 3.0% (Figure 21). In Figure 22 the predicted average grain size is 7.35 μm, while the experimental result gives 7.43 μm, a relative error of 1.1%. For the central region, the predicted result is 6.58 μm and the experimental result is 6.91 μm, a relative error of 4.8%. It was found that the model had a high prediction precision and could meet current practical needs.

Figure 21. Average grain size of primary α at deformation temperature 980 °C, strain rate 0.001 s−1, and reduction 45%.

Reproduced from Sun, Z. C.; Yang, H.; Tang, Z. Microstructural Evolution Model of TA15 Titanium Alloy Based on BP Neural Network Method and Application in Isothermal Deformation. Comp. Mater. Sci. 2010, 50, 308–318.

Figure 22. Average grain size of primary α at deformation temperature 860 °C, strain rate 10 s−1, reduction 15%.

Reproduced from Sun, Z. C.; Yang, H.; Tang, Z. Microstructural Evolution Model of TA15 Titanium Alloy Based on BP Neural Network Method and Application in Isothermal Deformation. Comp. Mater. Sci. 2010, 50, 308–318.

The module developed was used to simulate the isothermal extrusion process of TA15 Ti-alloy and the grain size and volume fraction of primary α were predicted (Figure 23).

Figure 23. The grain size at different reductions (T = 950 °C, pressing speed 1 mm s−1). (a) 14 mm; (b) 20 mm; and (c) 28 mm.

Reproduced from Sun, Z. C.; Yang, H.; Tang, Z. Microstructural Evolution Model of TA15 Titanium Alloy Based on BP Neural Network Method and Application in Isothermal Deformation. Comp. Mater. Sci. 2010, 50, 308–318.


URL:

https://www.sciencedirect.com/science/article/pii/B978008096532100529X

Multivariate analysis of data in sensory science

Knut Kvaal , Jean A. McEwan , in Data Handling in Science and Technology, 1996

2.1 Neural Networks

"Neural computing is the study of networks of adaptable nodes, which through a process of learning from task examples, store experimental knowledge and make it available for use." (Aleksander and Morton, 1990).

Neural computing is not a topic immediately associated with sensory science, yet its potential, at least from a theoretical point of view, may have far reaching consequences. First it would be useful to have a look at what neural networks are all about. Inspiration from biological neuron activity, together with a mathematical analogy, led a group of researchers to explore the possibility of programming a computer to adopt the functionality of the brain (Neural Ware, 1991).

Considering human processing and response (behaviour), it can readily be seen that the brain is constantly learning and updating information on all aspects of a person's experiences, whether these be active or passive. If a person places his hand on a hot plate, he learns that the result is pain. This response is recorded and his future behaviour with hot plates will be influenced by this learning. There are many such examples, and the reader interested in human processing and cognition should refer to one of the many textbooks on this subject (e.g., Stillings et al., 1987).

It is important to mention that the neural network philosophy, based on biological modelling of the brain, is something of an artefact. We emphasise that the neural network is a mathematical and statistical method of visualisation based on some fundamental ideas. In this chapter we also restrict ourselves to network topologies based on function mapping or pattern recognition. Discussion will be restricted to the so-called feed-forward layer nets, in which the information flow between the different neurones always moves towards the output. In feed-forward nets, each neurone operates independently, receiving input and sending its locally calculated output to the neurones in the next layer. The training process forces the connection weights to be adjusted to minimise the prediction errors. With all these neurones processing simultaneously and independently, a computer capable of parallel task processing is needed; on a sequential computer such as the PC, neurone activity must be simulated sequentially, so each neurone's activity is calculated in order from input to output.

In order to translate the functionality of the brain into a computer environment, it is first necessary to break the processing of information into a number of levels and components. The first level is the input, which may have several components. For example, an individual is given some chocolate from which he perceives a number of sensory attributes. The chocolate and the individual form the stimulus, and for the sake of argument it will be assumed that the sensory attributes are the input variables, as these can be recorded in the physical world.

At the output level, that is the observable response or behaviour, there is one component, called acceptability, which can also be measured. The hidden layers process the information initiated at the input. The fundamental building block in a neural network is the neurone. The neurone receives input from the neurones in an earlier layer and adds the inputs after having weighted them. The response of the neurone is the result of a non-linear treatment in different regions of the input space. The neurones in the hidden layer may be identified as feature detectors. Several hidden layers may exist, but in practice only one is sufficient. This is represented in Figure 1b. The next problem is how to join the levels of the network. In the human brain there is a complex network of connections between the different levels, and the complexity of their use will depend on the amount and type of information processing required.

Figure 1. (a) A diagram illustrating the structure of a simple neural network. The fundamental building block in a neural network is the neurone. (b) The connections between the different layers in the neural network. The input paths to processing elements in the hidden layers are combined in the form of a weighted summation. (c) A sigmoid transfer function is then used to get to the output level.

So far the concepts of input, output and hidden layers have been explained. The next concept is that of the neurone as the processing element. Each neurone has one or more input paths called dendrites. The input paths to processing elements in the hidden layers are combined in the form of a weighted summation (Figure 1a), sometimes referred to as the internal activation. A transfer function is then used to get to the output level. This transfer function is usually linear, sigmoid or hyperbolic. The sigmoid transfer function is illustrated in Figure 1c. This transfer function behaves linearly or non-linearly depending on the range of the input: it acts as a threshold when low-level input values are presented, as a saturating function when high-level input values are presented, and as a linear function in between. In this way a very flexible function mapping is achieved.
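The three regimes of the sigmoid transfer function described above can be seen directly by evaluating the function at a few points (a minimal sketch, using the standard logistic form of the sigmoid):

```python
import numpy as np

def sigmoid(x):
    """Logistic (sigmoid) transfer function, mapping any input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# Near zero the function is approximately linear; for large negative inputs
# it saturates towards 0 (threshold behaviour), and for large positive
# inputs it saturates towards 1.
print(sigmoid(0.0))    # 0.5, centre of the linear region
print(sigmoid(-10.0))  # close to 0
print(sigmoid(10.0))   # close to 1
```

The near-linear region around zero is what lets a sum of sigmoids approximate smooth functions, while the saturating tails provide the threshold-like behaviour.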

The feed-forward neural network in Figure 1 is defined by an equation of the form

(1) y = f( Σ_i b_i f( Σ_j w_ij x_j + a_1 ) + a_2 ) + e

where y is the output variable, the x_j are the input variables, e is a random error term, f is the transfer function, and b_i, w_ij, a_1 and a_2 are constants to be determined. The constants w_ij are the weights by which each input element is multiplied before the contributions are added in node i of the hidden layer. In this node, the sum over j of all elements w_ij x_j is used as input to the transfer function f. The result is in turn multiplied by a weight constant b_i before the summation over i. The constants b_i are thus the weights by which each output from the hidden layer is multiplied before the contributions are added in the output neurone. Finally, the sum over i is used as input to the transfer function f. More than one hidden layer can be used, resulting in a similar but more complicated function. The constants a_1 and a_2 act as bias signals to the network; they play the same role as the intercept constants in linear regression.
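The feed-forward mapping of Eq. (1) can be sketched in a few lines (variable names mirror the equation; the weight values here are arbitrary illustrations, not fitted constants):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def feed_forward(x, W, b, a1, a2):
    """Evaluate y = f( sum_i b_i * f( sum_j w_ij x_j + a1 ) + a2 ).

    W has one row of weights w_ij per hidden neurone i; a1 and a2 are
    the bias signals. The random error term e of Eq. (1) is omitted.
    """
    hidden = sigmoid(W @ x + a1)      # inner sum over j, one value per hidden node i
    return sigmoid(b @ hidden + a2)   # outer sum over i in the output neurone

x = np.array([0.2, 0.7])                      # input pattern
W = np.array([[0.5, -0.3], [0.1, 0.8]])       # input-to-hidden weights (illustrative)
b = np.array([1.0, -1.0])                     # hidden-to-output weights (illustrative)
y = feed_forward(x, W, b, a1=0.1, a2=0.0)
print(y)  # a single scalar output in (0, 1)
```

Because the sigmoid output lies in (0, 1), the predicted response is bounded, which is one reason the input and output data are normalised before training.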

2.1.1 Learning and backpropagation

The word backpropagation originates from the special learning rule developed by several workers (Hertz et al., 1991, page 115). The method optimises a cost function (error function) of the squared differences between the predicted output and the desired output. In short, information flows from the input towards the output, and the error propagates back from the output to the input. The error in the output layer is calculated as the difference between the actual and the desired output. This error is transferred to the neurones in the middle layer, whose errors are calculated as the weighted sum of the error contributions from the nearest layer. The derivative of the transfer function with respect to the input is used to calculate the so-called deltas, which in turn are used to update the weights. The derivative of the transfer function is close to zero for very small and for very large summed inputs; in this way the derivative stabilises the learning process.
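The delta computation described above can be sketched for a single pattern in a one-hidden-layer network with sigmoid units (the weights below are arbitrary illustrations; the sigmoid derivative f'(net) = f(net)(1 − f(net)) is used for the deltas):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Forward pass through one hidden layer (illustrative weights).
x = np.array([0.5, -0.2])
W_hid = np.array([[0.4, 0.1], [-0.3, 0.7]])   # input-to-hidden weights
W_out = np.array([0.6, -0.4])                  # hidden-to-output weights
h = sigmoid(W_hid @ x)
y = sigmoid(W_out @ h)

# Backward pass: the output delta uses the error times the sigmoid derivative.
target = 1.0
delta_out = (target - y) * y * (1.0 - y)

# The hidden-layer deltas are the weighted error from the layer above,
# again scaled by the local sigmoid derivative.
delta_hid = h * (1.0 - h) * (W_out * delta_out)

# The weight gradients multiply each delta by its input activation.
grad_out = delta_out * h
grad_hid = np.outer(delta_hid, x)
```

Note how a saturated neurone (h near 0 or 1) yields a near-zero derivative and hence a near-zero delta, which is the stabilising effect mentioned above.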

The backpropagation algorithm minimises the error along the steepest-descent path, which may introduce problems with local minima. Finding the global minimum of the error of a model is equivalent to estimating an optimal set of weights; learning in a neural network is thus the process of estimating the set of weights that minimises the error. A trained neural network has the ability to predict responses to a new pattern. The training process may be performed using different learning rules. This chapter focuses on the backpropagation delta rule, in which the errors for each layer propagate backwards through the network and the weights are updated on the basis of these errors.

The weights are calculated in an iterative process. They are given random initial values; by presenting a pattern to the network, the weights are updated by computing the layer errors and the weight changes. The learning process stops when the network has reached a proper minimum error. The process is controlled by two learning constants, the learning rate (lrate) and the momentum, both chosen between 0 and 1; small values slow down the learning, and typical values are around 0.5. The lrate controls the size of the update applied to the new weight change. The momentum acts as a stabiliser that takes the previous weight changes into account and thereby reduces oscillation of the weights. The learning, through estimation of the weights, is described for each layer by

(2) W_new = W_old + lrate * dW_new + momentum * dW_old

where W_new are the updated weights, W_old are the weights before updating, dW_new are the new delta weights calculated by the backpropagation learning rule and dW_old are the old delta weights. The error is calculated as the difference between the actual and calculated outputs. The weights may be updated after each pattern presentation or after all the patterns have been presented to the network (an epoch).
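The update rule above can be sketched directly (illustrative values; lrate and momentum are both set to the typical 0.5):

```python
import numpy as np

def update_weights(W_old, dW_new, dW_old, lrate=0.5, momentum=0.5):
    """Momentum weight update: W_new = W_old + lrate*dW_new + momentum*dW_old."""
    return W_old + lrate * dW_new + momentum * dW_old

W = np.array([0.2, -0.1])           # weights before the update
dW_new = np.array([0.04, -0.02])    # new delta weights from backpropagation
dW_old = np.array([0.01, 0.03])     # previous delta weights (momentum term)
W = update_weights(W, dW_new, dW_old)
print(W)  # W is now [0.225, -0.095]
```

When successive delta weights point in opposite directions, the momentum term partially cancels them, which is how it damps oscillation of the weights.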

There are many modifications of this rule. One approach is to vary the learning constants so as to speed up the learning process; such self-adapting constants are of great value in reducing the computing time during the learning phase. Each weight may also have its own learning rate and momentum term. This approach, together with the self-adapting learning rates, speeds up the learning, so that choosing proper starting values becomes less important (the delta-bar-delta rule). For a more extensive discussion of the mathematics of the backpropagation algorithm, the reader should see Chapter 6 of Hertz et al. (1991).

2.1.2 Local and Global Minima of the Error

One of the major disadvantages of the backpropagation learning rule is its tendency to get stuck in local minima. The error is a function of all the weights in a multidimensional space; this may be visualised, in three dimensions, as an error surface forming a landscape of hills and valleys. There is no proof that the global minimum of the error surface has been reached: if the error surface is rugged, starting with different randomised weights leads to different minima. It is important to consider this when analysing the final minimum. The learning is therefore run repeatedly from different starting locations to show that the minimum reached is reasonable.
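The restart strategy can be sketched with an artificial one-dimensional "error surface" standing in for the network error (the surface and all constants here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def error(w):
    """A toy rugged error surface with several local minima."""
    return np.sin(3 * w) + 0.1 * w**2

def gradient_descent(w, lrate=0.01, steps=500):
    """Steepest descent from a single starting weight."""
    for _ in range(steps):
        grad = 3 * np.cos(3 * w) + 0.2 * w   # analytic derivative of error(w)
        w -= lrate * grad
    return w

# Restart the learning from several random weights and keep the best
# minimum found; different starts settle in different valleys.
starts = rng.uniform(-4, 4, size=10)
finals = [gradient_descent(w) for w in starts]
best = min(finals, key=error)
```

Each run descends monotonically into the valley nearest its start, so comparing the final errors across restarts gives some confidence that the retained minimum is reasonable.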

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/S0922348796800281

Volume 4

P.K. Hopke , in Comprehensive Chemometrics, 2009

4.03.5.2 Neural Networks

The commonly used back-propagation neural networks can be used to solve the CMB problem with the possibility of incorporating nonlinearities into the system. Song and Hopke 78 demonstrated the utility of neural network approaches by analyzing the three data sets of Currie et al. 74 with better results than had been previously reported using regression methods. One of the advantages of the neural network approach was that by using all of the possible emission sources that may not necessarily be active for a specific ambient data set, the network was able to identify the active sources and simultaneously quantify their source contributions.

The neural network appeared to deal better with the collinearity of the sources, and it enabled the resolution of sources that were too collinear to be determined with regression methods. The mass balance problem is actually a linear one, but there is a high level of noise in the data (5–15%). In this case, the model error from fitting sigmoid functions to the linear model was no longer significant, and the ANN was able to give satisfactory predictions of source contributions through its adaptive generalization based on the training-set examples. Long et al. 79 reached a similar conclusion in spectroscopic analysis when high levels of noise were present.

However, to avoid the problem of needing specific source profiles, it is possible to perform mixture resolution with factor analysis.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780444527011000089

Mathematical Modeling and Optimization of Injection Molding of Plastics

Salah A. Elsheikhi , Khaled Y. Benyounis , in Reference Module in Materials Science and Materials Engineering, 2017

4.2.2.1 Applications

ANN in the form of BPNN has been integrated with other approaches, such as GA, expected improvement (EI), RBF, and particle swarm optimization (PSO), to complete the optimization process. Integrations of BPNN with these approaches have been widely used to optimize different objectives, such as warpage, shrinkage, dimensions, mechanical properties, cycle time and cooling time, and product cost (Changyu et al., 2007; Chen et al., 2009; Shi et al., 2010; Yin et al., 2011; Yang et al., 2012; Tzeng et al., 2012; Meiabadia et al., 2013; Chen and Lin, 2013; Wang et al., 2013; Dang, 2014; Fittipaldi et al., 2015). Table 4 summarizes selected literature studies of ANN applications in IM.

Table 4. Summary of selected literature optimization survey using artificial neural network (ANN)

References Design and software used Input/factors Output/responses
Ozcelik and Erzurumlu (2006) Integrated BPNN/GA, Moldflow Melt temperature, mold temperature, packing pressure, packing pressure time, and cooling time Minimizing warpage
Changyu et al. (2007) Integrated BPNN/GA, Matlab Melt temperature, mold temperature, injection time, packing time, and holding pressure Minimizing shrinkage
Chen et al. (2009) Integrated BPNN/GA, Visual Basic Melt temperature, injection velocity, injection pressure, velocity pressure switch, packing pressure and packing time Optimizing product's dimensions (length and weight)
Shi et al. (2010) Integrated BPNN/EI, Moldflow Melt temperature, mold temperature, injection time, packing pressure, packing time and cooling time Minimizing warpage
Yin et al. (2011) Integrated BPNN/GA, Moldflow Melt temperature, mold temperature, packing pressure, packing time and cooling time Minimizing warpage and clamping force
Yang et al. (2012) Combining BPNN/GA and BPNN/SAA, Matlab Mixture ratios for short glass fiber and polytetrafluoroethylene reinforced polycarbonate composites Maximizing ultimate strength, flexural strength and impact resistance
Tzeng et al. (2012) Integrated BPNN/GA, Matlab Neural Network Toolbox Nozzle temperature, melt temperature, packing pressure, packing time, and mold temperature Maximizing of ultimate strength, flexure strength, and impact resistance
Meiabadia et al. (2013) Integrated ANN/GA, Nero solution software and Matlab Melt temperature, mold temperature, packing pressure, and packing time Maximizing injection pressure, desired part weight, and minimizing cycle time
Chen and Lin (2013) Integrated BPNN/GA, Matlab Melt temperature, injection velocity, packing pressure, packing time and cooling time Minimizing warpage
Wang et al. (2013) Integrated PSO/BPNN Net weight of finished product, weight of runner, machine selection, projection area, size of molded part, height of finished product, expenses of material, process and maintenance, profit tax, and injection cost Minimizing product cost
Dang (2014) Integrated RBF/ANN, ISight software Melt temperature, mold temperature, injection time, packing time and packing pressure Minimizing warpage and cooling time
Fittipaldi et al. (2015) BPNN, Matlab Neural Network Toolbox Melt temperature, mold temperature, packing time and injection rate Maximizing tensile strength

Abbreviations: BPNN, back propagation neural network; EI, expected improvement; GA, genetic algorithm; PSO, particle swarm optimization; SAA, simulated annealing algorithm.

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780128035818041357

Deep learning in QSPR modeling for the prediction of critical properties

Yang Su , Weifeng Shen , in Applications of Artificial Intelligence in Process Systems Engineering, 2021

2.4 Deep neural network

A DNN combining Tree-LSTM and BPNN is developed in this work. The Tree-LSTM neural network is employed for depicting molecular tree data-structures with the canonical molecular signatures while the BPNN is used to correlate properties.

The Child-sum Tree-LSTM can be applied to a dependency tree while the N-ary Tree-LSTM is applied to a constituency tree [39]; the mathematical models of the two Tree-LSTM variants are listed in Table 2. Unlike in the standard LSTM, the gating vectors and memory-cell updates of a Tree-LSTM depend on the states of the child units. Additionally, instead of a single forget gate, the Tree-LSTM unit contains one forget gate f_jk for each child k, which allows the Tree-LSTM to incorporate information selectively from each child. Since the components of the Child-sum Tree-LSTM unit are calculated from the sum of the child hidden states h_k, the Child-sum Tree-LSTM is well suited to trees with a high branching factor or whose children are unordered. The vector h̃_j is the sum of the hidden states of all child nodes under the current node j in the Child-sum Tree-LSTM model. The N-ary Tree-LSTM model can be utilized in tree structures where the branching factor is at most N and the children are ordered from 1 to N. For any node j, the hidden state and memory cell of its k-th child are written as h_jk and c_jk, respectively. The introduction of separate parameter matrices for each child k allows the N-ary Tree-LSTM model to learn more fine-grained conditioning on the states of a unit's children than the Child-sum Tree-LSTM can.

Table 2. The transition equations of the Child-sum Tree-LSTM and the N-ary Tree-LSTM [36].

Child-sum Tree-LSTM:
(2) h̃_j = Σ_{k∈C(j)} h_k
(3) i_j = σ(W^(i) x_j + U^(i) h̃_j + b^(i))
(4) f_jk = σ(W^(f) x_j + U^(f) h_k + b^(f))
(5) o_j = σ(W^(o) x_j + U^(o) h̃_j + b^(o))
(6) u_j = tanh(W^(u) x_j + U^(u) h̃_j + b^(u))
(7) c_j = i_j ⊙ u_j + Σ_{k∈C(j)} f_jk ⊙ c_k
(8) h_j = o_j ⊙ tanh(c_j)

N-ary Tree-LSTM:
(9) i_j = σ(W^(i) x_j + Σ_{l=1..N} U_l^(i) h_jl + b^(i))
(10) f_jk = σ(W^(f) x_j + Σ_{l=1..N} U_kl^(f) h_jl + b^(f))
(11) o_j = σ(W^(o) x_j + Σ_{l=1..N} U_l^(o) h_jl + b^(o))
(12) u_j = tanh(W^(u) x_j + Σ_{l=1..N} U_l^(u) h_jl + b^(u))
(13) c_j = i_j ⊙ u_j + Σ_{l=1..N} f_jl ⊙ c_jl
(14) h_j = o_j ⊙ tanh(c_j)

Note: the sum h̃_j of Eq. (2) has no counterpart in the N-ary Tree-LSTM unit.

The performance evaluation of the two Tree-LSTM models on semantic classification indicated that both are superior to the sequential LSTM model and are able to provide better classification capability [36]. Therefore, the N-ary Tree-LSTM network is employed in this work to depict molecules, and the input variables are vectors converted by the embedding algorithm. In the QSPR model, the variable x_j is the input vector representing a substring of a bond ("A" or "-B"), and the vector h_j is the output vector representing a molecular structure. The vector h_j is finally associated with the properties by the BPNN, which involves an input layer, a hidden layer and an output layer. For the other variables and functions in Table 2, W^(i,o,u,f), U^(i,o,u,f) and b^(i,o,u,f) are parameters that need to be learned, and σ represents the sigmoid activation function. For example, the model can learn parameters W^(i) such that the components of the input gate i_j have values close to 1 (i.e., "open") when an important atom is given as input, and values close to 0 (i.e., "closed") when the input is a less important atom. Taking acetaldoxime as an example again, the computing graph of the neural network is presented in Fig. 5. It can be observed that the Tree-LSTM network mimics the topological structure of the acetaldoxime molecule; if other molecular structures are learned, the Tree-LSTM network varies the computing graph automatically. The BPNN accepts the output vectors from the Tree-LSTM network and correlates them with the property values. In this way, a DNN is built from the Tree-LSTM network and the BPNN.
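The Child-sum transition equations of Table 2 can be sketched for a single unit as follows (a minimal numpy sketch; the parameter dictionary, dimensions and random values are illustrative, not the parameters of the actual QSPR model):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def child_sum_tree_lstm(x_j, children, p, dim):
    """One Child-sum Tree-LSTM unit, following Eqs. (2)-(8) of Table 2.

    x_j is the input vector of node j, children is a list of (h_k, c_k)
    pairs from the child nodes, and p holds the W, U and b parameters
    of the i, f, o and u gates (key names here are illustrative).
    """
    h_tilde = sum((h for h, _ in children), np.zeros(dim))           # Eq. (2)
    i = sigmoid(p['Wi'] @ x_j + p['Ui'] @ h_tilde + p['bi'])         # Eq. (3)
    o = sigmoid(p['Wo'] @ x_j + p['Uo'] @ h_tilde + p['bo'])         # Eq. (5)
    u = np.tanh(p['Wu'] @ x_j + p['Uu'] @ h_tilde + p['bu'])         # Eq. (6)
    c = i * u                                                        # Eq. (7), input part
    for h_k, c_k in children:
        f_k = sigmoid(p['Wf'] @ x_j + p['Uf'] @ h_k + p['bf'])       # Eq. (4), one forget gate per child
        c = c + f_k * c_k                                            # Eq. (7), child part
    h = o * np.tanh(c)                                               # Eq. (8)
    return h, c

rng = np.random.default_rng(1)
dim = 4
p = {k: rng.normal(scale=0.1, size=(dim, dim))
     for k in ('Wi', 'Ui', 'Wf', 'Uf', 'Wo', 'Uo', 'Wu', 'Uu')}
p.update({k: np.zeros(dim) for k in ('bi', 'bf', 'bo', 'bu')})

# Two leaves feed their (h, c) states into a parent node, mimicking a
# small molecular tree.
leaf1 = child_sum_tree_lstm(rng.normal(size=dim), [], p, dim)
leaf2 = child_sum_tree_lstm(rng.normal(size=dim), [], p, dim)
h_root, c_root = child_sum_tree_lstm(rng.normal(size=dim), [leaf1, leaf2], p, dim)
```

The root vector h_root plays the role of the molecular representation h_j that the BPNN would then correlate with the property values.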

Fig. 5

Fig. 5. The computational graph of the neural network describing the molecule acetaldoxime and predicting properties.

Moreover, in this work the aim of the DNN is to predict a numeric value rather than a classification, so there is no need to employ the activation function "softmax" [43]. The regularization technique "dropout" [44] is introduced into the BPNN to reduce overfitting. Huber loss [45] is adopted as the loss function in the training process, which differs from the frequently used classification scheme of Tree-LSTM networks. Details of the DNN are provided in Tables 3 and 4.
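The Huber loss mentioned above is quadratic for small residuals and linear for large ones, which makes the regression less sensitive to outlying property values (a sketch; the transition threshold delta = 1.0 here is illustrative):

```python
import numpy as np

def huber_loss(y_true, y_pred, delta=1.0):
    """Huber loss: 0.5*r^2 for |r| <= delta, delta*(|r| - 0.5*delta) otherwise."""
    r = np.abs(y_true - y_pred)
    quadratic = 0.5 * r**2
    linear = delta * (r - 0.5 * delta)
    return np.where(r <= delta, quadratic, linear)

# A small residual falls in the quadratic regime, a large one in the linear.
print(huber_loss(np.array([0.0, 0.0]), np.array([0.5, 3.0])))  # 0.125 and 2.5
```

Compared with a pure squared error, the linear tail caps the gradient contributed by a single bad training example.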

Table 3. The structural parameters of the DNN.

Names of the DNN structural parameters Values
Shape of embedding vectors (50,1)
Shape of parameters of Tree-LSTM (128,128)
Shape of output vectors of Tree-LSTM (128,1)
Layer number of the BPNN 3

Table 4. The hyper parameters of training the DNN.

Names of the hyper parameters Values
Learning rate 0.02 (the first 200 epochs); 0.0001 (others)
L2 weight decay 0.00001
Batch size of training set 200
Batch size of testing set 200

The regularization technique "dropout" is used to reduce overfitting in the proposed DNN. Dropout is easily implemented by randomly selecting nodes of the neural network to be dropped out with a given probability (e.g., 20%) in each weight-update cycle. With cross-validation, the expected probability lies between 5% and 25%.
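The node-dropping step can be sketched as follows (a minimal sketch of the common "inverted" dropout variant, in which the surviving activations are rescaled during training so that nothing needs to change at prediction time; the source does not specify which variant is used):

```python
import numpy as np

rng = np.random.default_rng(42)

def dropout(activations, p_drop=0.2, training=True):
    """Zero each node with probability p_drop during training and rescale
    the survivors so the expected activation is unchanged."""
    if not training:
        return activations  # dropout is disabled at prediction time
    mask = rng.random(activations.shape) >= p_drop
    return activations * mask / (1.0 - p_drop)

a = np.ones(10)
print(dropout(a, p_drop=0.2))  # roughly 8 of 10 entries survive, scaled to 1.25
```

A fresh random mask is drawn in each weight-update cycle, so a different thinned network is trained at every step.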

Read full chapter

URL:

https://www.sciencedirect.com/science/article/pii/B9780128210925000127