The process used in this project is the inverted pendulum system. Because the inverted pendulum is highly nonlinear and open-loop unstable, standard linear techniques cannot model its nonlinear dynamics.
The inverted pendulum has long served as a laboratory idealization of unstable mechanical systems, and it is also a good example for the application of neural networks to control systems. The main task of this project is to design a neural network controller which keeps the pendulum system stabilized.
System identification is the procedure that develops models of a dynamic system based on the input and output signals from the system. The input and output data must show some of the dynamics of the process. The parameters of the model are adjusted until the output from the model is similar to the output of the real system.
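As a minimal sketch of this idea, the Python example below fits a first-order discrete model y[k+1] = a*y[k] + b*u[k] to recorded input-output data by least squares. The model structure, the test input and the "true" values a = 0.9 and b = 0.5 are assumptions chosen for illustration; they are not taken from the pendulum or from the identification methods used later in the report.

```python
import math

def simulate_plant(a, b, inputs, y0=0.0):
    """Generate outputs of the assumed plant y[k+1] = a*y[k] + b*u[k]."""
    outputs = [y0]
    for u in inputs:
        outputs.append(a * outputs[-1] + b * u)
    return outputs

def fit_first_order(inputs, outputs):
    """Least-squares estimate of (a, b) from input-output data.

    Solves the 2x2 normal equations for the model
    y[k+1] = a*y[k] + b*u[k].
    """
    past = outputs[:-1]   # y[k]
    nxt = outputs[1:]     # y[k+1]
    syy = sum(y * y for y in past)
    syu = sum(y * u for y, u in zip(past, inputs))
    suu = sum(u * u for u in inputs)
    sny = sum(n * y for n, y in zip(nxt, past))
    snu = sum(n * u for n, u in zip(nxt, inputs))
    det = syy * suu - syu * syu
    a_hat = (sny * suu - snu * syu) / det
    b_hat = (syy * snu - syu * sny) / det
    return a_hat, b_hat

# Hypothetical test input that excites the plant at two frequencies.
inputs = [math.sin(0.3 * k) + 0.5 * math.sin(1.1 * k) for k in range(200)]
outputs = simulate_plant(0.9, 0.5, inputs)
a_hat, b_hat = fit_first_order(inputs, outputs)
print("estimated a, b:", a_hat, b_hat)
```

The input has to excite the dynamics of the process: with a constant or zero input the normal equations become poorly conditioned and the parameters cannot be separated.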
The simulation work in this project is based on the tutorial Mastering Simulink by James B. Dabney and Thomas L. Harman. This book covers the important capabilities of Simulink, including subsystems, masking, callbacks, S-Functions and debugging. The theoretical development is based on material taught at the University of the West of England (UWE) by Professor Q. Zhu, which provides the background for this research. Additional web resources are listed in the references.
Chapter 1 details the research on the inverted pendulum system. The linear and nonlinear dynamic equations of the system are derived, and Simulink models of the linear and nonlinear pendulum systems are then created. This chapter is based on theory from Wikipedia and on the tutorial Mastering Simulink by James B. Dabney and Thomas L. Harman.
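Chapter 2 covers the theory, structure and operation of artificial neural networks.
Chapter 3 details the stabilization of the inverted pendulum with conventional controllers.
Chapter 4 details the neural controller in Simulink.
Chapter 5 compares the results with previous studies.
The final chapter gives the conclusion.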
2. CHAPTER 1 - INTRODUCTION TO INVERTED PENDULUM
2.1 What is an inverted pendulum?
An inverted pendulum is a pendulum whose mass is above its pivot point. In this system the pendulum is mounted on a moving cart, and a servomotor drives the translational motion of the cart through a belt mechanism. A normal pendulum is stable when hanging downwards, whereas an inverted pendulum naturally tends to fall away from the upright vertical position, which is an unstable equilibrium. The inverted pendulum must therefore be actively balanced to remain upright.
2.2 Mathematical modeling of inverted pendulum
The inverted pendulum is a classic control problem. The process is nonlinear and unstable, with one input signal and several output signals. The aim is to balance a pendulum vertically on a motor-driven cart.
The dynamical equations for the inverted pendulum system will be derived using Lagrange Equations. By using this method, it is possible to derive dynamical system equations for a complicated mechanical system such as the inverted pendulum. The Lagrange equations use the kinetic and potential energy in the system to determine the dynamical equations of the cart-pole system.
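where M is the mass of the cart
m is the mass of the pole
l is the length of the pole
f is the control force
y is the horizontal position of the cart
θ is the angle of the pole measured from the upright vertical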
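The kinetic energy of the system is the sum of the kinetic energies of the two masses. The kinetic energy T1 of the cart is
$T_1 = \frac{1}{2} M \dot{y}^2$ (Eq.1)
The pole can move in both the horizontal and vertical directions, so the kinetic energy of the pole is
$T_2 = \frac{1}{2} m (\dot{y}_2^2 + \dot{z}_2^2)$ (Eq.2)
From the free body diagram, y2 and z2 are equal to
$y_2 = y + l\sin\theta$ (Eq.3)
$\Rightarrow \dot{y}_2 = \dot{y} + l\dot{\theta}\cos\theta$ (Eq.4)
$z_2 = l\cos\theta$ (Eq.5)
$\Rightarrow \dot{z}_2 = -l\dot{\theta}\sin\theta$ (Eq.6)
The total kinetic energy, T, of the system is equal to
$T = T_1 + T_2 = \frac{1}{2} M \dot{y}^2 + \frac{1}{2} m (\dot{y}_2^2 + \dot{z}_2^2)$ (Eq.7)
Equations 4 and 6 are substituted into Equation 7 to give Equation 8
$T = \frac{1}{2} M \dot{y}^2 + \frac{1}{2} m (\dot{y}^2 + 2\dot{y}\dot{\theta} l\cos\theta + l^2\dot{\theta}^2)$ (Eq.8)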
The potential energy, V, of the system is stored in the pendulum, so
$V = m g z_2 = m g l\cos\theta$ (Eq.9)
The Lagrange function is
$L = T - V = \frac{1}{2}(M+m)\dot{y}^2 + m l\cos\theta\,\dot{y}\dot{\theta} + \frac{1}{2} m l^2\dot{\theta}^2 - m g l\cos\theta$ (Eq.10)
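The generalized coordinates of the system are y and θ, so the Lagrange equations are
$\frac{d}{dt}\frac{\partial L}{\partial \dot{y}} - \frac{\partial L}{\partial y} = f$ (Eq.11)
$\frac{d}{dt}\frac{\partial L}{\partial \dot{\theta}} - \frac{\partial L}{\partial \theta} = 0$ (Eq.12)
The partial derivatives are
$\frac{\partial L}{\partial \dot{y}} = (M+m)\dot{y} + m l\cos\theta\,\dot{\theta}$ (Eq.13)
$\frac{\partial L}{\partial y} = 0$ (Eq.14)
$\frac{\partial L}{\partial \dot{\theta}} = m l\cos\theta\,\dot{y} + m l^2\dot{\theta}$ (Eq.15)
$\frac{\partial L}{\partial \theta} = m g l\sin\theta - m l\dot{y}\dot{\theta}\sin\theta$ (Eq.16)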
Equations 13-16 are substituted into the Lagrange equations (Eq. 11-12), which results in the nonlinear dynamical equations for the inverted pendulum system shown below
$(M+m)\ddot{y} + m l\cos\theta\,\ddot{\theta} - m l\dot{\theta}^2\sin\theta = f$ (Eq.17)
$m l\cos\theta\,\ddot{y} + m l^2\ddot{\theta} - m g l\sin\theta = 0$ (Eq.18)
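To make the nonlinear model concrete, the following Python sketch solves Eq. 17 and Eq. 18 for the two accelerations and integrates them with zero control force. It is an illustration only and not the Simulink model used in the project: the Runge-Kutta integrator, the step size, the initial angle of 0.01 rad and the value g = 9.81 m/s^2 are assumptions, and the function names are hypothetical.

```python
import math

# Cart mass, pole mass and pole length as used in the report's model.
# g = 9.81 m/s^2 is an assumed value for this sketch.
M, m, l, g = 1.2, 0.11, 0.4, 9.81

def accelerations(theta, theta_dot, f):
    """Solve Eq. 17 and Eq. 18 for the cart and pole accelerations."""
    s, c = math.sin(theta), math.cos(theta)
    y_ddot = (f + m * l * theta_dot ** 2 * s - m * g * s * c) / (M + m * s * s)
    theta_ddot = (g * s - y_ddot * c) / l
    return y_ddot, theta_ddot

def derivatives(state, f):
    """Time derivative of the state [y, y_dot, theta, theta_dot]."""
    _, y_dot, theta, theta_dot = state
    y_ddot, theta_ddot = accelerations(theta, theta_dot, f)
    return [y_dot, y_ddot, theta_dot, theta_ddot]

def rk4_step(state, f, dt):
    """One classical fourth-order Runge-Kutta step with constant force f."""
    def shifted(k, scale):
        return [x + scale * ki for x, ki in zip(state, k)]
    q1 = derivatives(state, f)
    q2 = derivatives(shifted(q1, dt / 2), f)
    q3 = derivatives(shifted(q2, dt / 2), f)
    q4 = derivatives(shifted(q3, dt), f)
    return [x + dt / 6 * (a + 2 * b + 2 * c + d)
            for x, a, b, c, d in zip(state, q1, q2, q3, q4)]

def time_to_fall(theta0, dt=0.001, t_max=5.0):
    """Time at which the uncontrolled pole first passes horizontal."""
    state, t = [0.0, 0.0, theta0, 0.0], 0.0
    while t < t_max:
        if abs(state[2]) > math.pi / 2:
            return t
        state = rk4_step(state, 0.0, dt)
        t += dt
    return None

fall_time = time_to_fall(0.01)
print("pole passes horizontal after", round(fall_time, 2), "s")
```

With no control force, even a small initial angle grows until the pole falls, which illustrates why the open-loop system is unstable.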
Both equations are nonlinear. However, to balance the inverted pendulum, that is, to keep the pendulum close to upright, these equations can be linearized about the upright position.
2.3 Simulink model of inverted pendulum
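It is possible to linearize these equations by approximating $\cos\theta \approx 1$ and $\sin\theta \approx 0$, which also removes the gravity term and the $\dot{\theta}^2$ term. The two linear equations are
$(M+m)\ddot{y} + m l\ddot{\theta} = f$
$m l\ddot{y} + m l^2\ddot{\theta} = 0$
The second equation gives $\ddot{\theta} = -\ddot{y}/l$. Substituting this into the first equation gives
$\ddot{\theta} = -\frac{f}{M l}$ (Eq.19)
$\ddot{y} = \frac{f}{M}$ (Eq.20)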
Based on the two linear system equations, a Simulink model of the inverted pendulum system is constructed as shown below
The two models are set up using a mask. The mask makes it possible to change the values of m, l, g, etc. for different simulations.
% Initialize the system parameters and obtain the model characteristics
M = 1.2;   % cart mass (kg)
m = 0.11;  % pendulum mass (kg)
l = 0.4;   % pendulum length (m)
g = 9.81;  % gravitational acceleration (m/s^2)
However, in open loop the pendulum falls over too quickly. In order to model the inverted pendulum, it is necessary to first stabilize it using a feedback controller.
The next chapter in the report discusses the theory and operation of artificial neural networks.
3. CHAPTER 2 - ARTIFICIAL NEURAL NETWORKS
3.1 What is an Artificial Neural Network?
An artificial neural network (ANN), sometimes called a "neural network" (NN), is a mathematical model which simulates the structure or functional aspects of biological neural networks.
Biological neural networks consist of cells known as neurons that transmit electrical impulses throughout the central nervous system. Individual neurons consist of dendrites, a soma, an axon and a myelin sheath. Dendrites receive signals from other neurons. The soma is the cell body, which protects the neuron nucleus. Axons act as terminals for electrical impulses, with the myelin sheath acting as an insulator. Certain neurons perform specific tasks, such as transmitting signals from sensory or motor organs to the brain. Multiple neurons transmitting data for a specific purpose form a neural network. Scientists continue to improve ANN models that reproduce the behaviour of biological neurons, enabling inventors to create machines that perform humanlike tasks.
3.2 Advantages of ANNs
a. They are nonlinear
A single neuron is not necessarily nonlinear, but when neurons are put together in a network, the network is able to represent nonlinear processes. This nonlinearity is arguably the most important ability of a neural network.
b. They can relate input to output
Another strength of neural networks is their ability to be trained. Given a set of input-output examples and a training algorithm, a neural network can learn the mapping between them.
c. Neural networks are composed of elements operating in parallel. Parallel processing allows increased speed of calculation compared to slower sequential processing.
d. Artificial neural networks have memory. The memory in a neural network corresponds to the weights of the neurons. Neural networks can be trained offline and then transferred into a process where adaptive learning takes place. In our case, a neural network controller could be trained offline to control an inverted pendulum system, for example in the Simulink environment. After training, the network weights are set and the ANN is placed in a feedback loop with the actual process. The network then adapts its weights to improve performance as it controls the pendulum system.
3.3 Types of learning
There are three major learning paradigms, each corresponding to a particular abstract learning task: supervised learning, unsupervised learning and reinforcement learning. In supervised learning, the output from the neural network is compared with a set of targets, and the error signal is used to update the weights in the network. Reinforcement learning is similar to supervised learning, but no targets are given; instead, the algorithm receives a grade of the ANN's performance. Unsupervised learning updates the weights based on the input data only, so the ANN learns to cluster different input patterns into different classes.
3.4 Neural Network Structure
There are 3 main types of ANN structures: single layer feed forward network, multi-layer feed forward network and recurrent networks.
3.4.1 The perceptron
The perceptron is the simplest form of a neural network used for the classification of patterns said to be linearly separable. Basically, it consists of a single neuron with adjustable synaptic weights and bias.
“The perceptron is a program that learn concepts, i.e. it can learn to respond with True (1) or False (0) for inputs we present to it, by repeatedly "studying" examples presented to it.
The Perceptron is a single layer neural network whose weights and biases could be trained to produce a correct target vector when presented with the corresponding input vector. The training technique used is called the perceptron learning rule. The perceptron generated great interest due to its ability to generalize from its training vectors and work with randomly distributed connections. Perceptrons are especially suited for simple problems in pattern classification."
Professor Jianfeng Feng, Centre for Scientific Computing, Warwick University, England.
According to the diagram, the inputs to the perceptron are individually weighted and then summed. The perceptron computes its output as a function F of the sum. The activation function, F, is needed to introduce nonlinearities into the network. This is what makes multi-layer networks powerful in representing nonlinear functions.
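The weighted sum, activation function and learning rule described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions that are not in the report: a hard-limit (step) activation for F, a learning rate of 1, zero initial weights, and the logical AND function as the linearly separable training set.

```python
def step(z):
    """Hard-limit activation F: outputs 1 for a positive sum, else 0."""
    return 1 if z > 0 else 0

def predict(weights, bias, inputs):
    """Weight each input, sum, add the bias and apply the activation."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def train_perceptron(samples, rate=1.0, max_epochs=100):
    """Perceptron learning rule: correct the weights after each mistake."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            if error != 0:
                mistakes += 1
                weights = [w + rate * error * x for w, x in zip(weights, inputs)]
                bias += rate * error
        if mistakes == 0:  # every sample classified correctly
            break
    return weights, bias

# Logical AND: a linearly separable problem chosen for illustration.
and_samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
weights, bias = train_perceptron(and_samples)
predictions = [predict(weights, bias, inputs) for inputs, _ in and_samples]
print(predictions)
```

Because AND is linearly separable, the rule stops after a finite number of passes. The same loop never settles on a problem such as XOR, which is the motivation for the multilayer networks in the next section.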
3.4.2 The Multilayer perceptron
Multilayer perceptrons have been applied successfully to solve some difficult and diverse problems by training them in a supervised manner with a highly popular algorithm known as the error back propagation algorithm.
Neural networks can have several layers. There are two main types of multi-layer networks: feed forward and recurrent.
In feed forward networks the signals travel from input to output only; there is no feedback between the layers.
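A feed forward pass through a small two-layer network can be sketched as follows. The weights are picked by hand rather than trained, and the step activation and the XOR task are assumptions chosen for illustration; back propagation itself would need a differentiable activation and is not shown.

```python
def step(z):
    """Hard-limit activation: 1 for a positive sum, else 0."""
    return 1 if z > 0 else 0

def neuron(inputs, weights, bias):
    """Single neuron: weighted sum of the inputs plus bias, then activation."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def xor_network(x1, x2):
    """Signals flow input -> hidden layer -> output, with no feedback."""
    hidden_or = neuron([x1, x2], [1.0, 1.0], -0.5)    # fires if x1 OR x2
    hidden_and = neuron([x1, x2], [1.0, 1.0], -1.5)   # fires if x1 AND x2
    # Output fires when OR is on but AND is off, which is XOR.
    return neuron([hidden_or, hidden_and], [1.0, -1.0], -0.5)

outputs = [xor_network(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
print(outputs)
```

XOR is not linearly separable, so a single perceptron cannot represent it, but one hidden layer of two neurons is enough.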
A multilayer perceptron has three distinctive characteristics:
(Neural Networks A comprehensive Foundation_Simon Haykin)
4. CHAPTER 3 - STABILIZATION OF INVERTED PENDULUM
A full state feedback controller is developed to stabilize the linear pendulum system. The linear system could have been stabilized using many different methods (PID, etc.). The full state feedback controller stabilizes the system by placing the closed-loop poles in the stable region. The Simulink model with the controller is shown below.
The linear pendulum system is simulated, and the angle of the pendulum is shown in Figure 10. The controller keeps the pendulum angle stable, so the pendulum can be simulated for longer times.
Developing a controller for the nonlinear pendulum is more difficult. Linear control techniques such as PID and full-state feedback were tested but had no success in controlling the nonlinear pendulum. A feedback linearization controller was therefore developed to control the nonlinear pendulum system. Feedback linearization cancels the nonlinearity in the pendulum system so that the closed-loop system is more linear.
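The following equations form the control law developed for the inverted pendulum controller. The first four equations (Eq. 21-24) are entered into the main equation (Eq. 25), which calculates the force u required to keep the pendulum stable. Here x denotes the cart position.
$h_1 = \frac{3}{4l} g\sin\theta$ (Eq.21)
$h_2 = \frac{3}{4l}\cos\theta$ (Eq.22)
$f_1 = m\left(l\sin\theta\,\dot{\theta}^2 - \frac{3}{8} g\sin 2\theta\right)$ (Eq.23)
$f_2 = M + m\left(1 - \frac{3}{4}\cos^2\theta\right)$ (Eq.24)
$u = \frac{f_2}{h_2}\left[h_1 + k_1(\theta - \theta_d) + k_2\dot{\theta} + c_1(x - x_d) + c_2\dot{x}\right] - f_1$ (Eq.25)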
For the simulations, M, m, l and g are set to the values of the pendulum model. The following numeric values are used: M = 1.2 kg, m = 0.1 kg, l = 0.4 m, g = 9.81 m/s^2, k1 = 25, k2 = 10, c1 = 1, c2 = 2.6. Also xd = 0 m and θd = 0 rad, which are the desired position of the cart and the desired angle of the pendulum respectively. A Simulink model of the above control law was developed and is shown below
The following diagram shows the set-up of the non-linear pendulum with controller
Figure 12. Simulink diagram of the nonlinear system with controller
Figure 13 shows the closed-loop pendulum angle plotted in MATLAB. The closed-loop response is stable, which shows that the control law is working.
Figure 13. Simulation of the nonlinear pendulum with controller
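The control law of Eq. 21-25 can also be tried outside Simulink. The Python sketch below implements the law with the numeric values quoted above and closes the loop around an assumed plant. The plant is an assumption made for this sketch and is not taken from the report: it is the cart-pole form that the law cancels exactly, f2*x'' - f1 = u and θ'' = h1 - h2*x'', for which substituting Eq. 25 leaves θ'' = -k1(θ - θd) - k2θ' - c1(x - xd) - c2x'. The integrator, step size, initial angle of 0.1 rad and function names are likewise hypothetical.

```python
import math

# Numeric values quoted in the text for the control-law simulation.
M, m, l, g = 1.2, 0.1, 0.4, 9.81
k1, k2, c1, c2 = 25.0, 10.0, 1.0, 2.6
theta_d, x_d = 0.0, 0.0

def law_terms(theta, theta_dot):
    """Auxiliary terms h1, h2, f1, f2 of Eq. 21-24."""
    h1 = 3.0 / (4.0 * l) * g * math.sin(theta)
    h2 = 3.0 / (4.0 * l) * math.cos(theta)
    f1 = m * (l * math.sin(theta) * theta_dot ** 2
              - 3.0 / 8.0 * g * math.sin(2.0 * theta))
    f2 = M + m * (1.0 - 0.75 * math.cos(theta) ** 2)
    return h1, h2, f1, f2

def control_force(state):
    """Eq. 25 for the state [x, x_dot, theta, theta_dot]."""
    x, x_dot, theta, theta_dot = state
    h1, h2, f1, f2 = law_terms(theta, theta_dot)
    v = (h1 + k1 * (theta - theta_d) + k2 * theta_dot
         + c1 * (x - x_d) + c2 * x_dot)
    return f2 / h2 * v - f1

def derivatives(state):
    """Closed loop around the ASSUMED plant that Eq. 21-25 cancel."""
    _, x_dot, theta, theta_dot = state
    h1, h2, f1, f2 = law_terms(theta, theta_dot)
    u = control_force(state)
    x_ddot = (f1 + u) / f2
    theta_ddot = h1 - h2 * x_ddot
    return [x_dot, x_ddot, theta_dot, theta_ddot]

def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, scale):
        return [s + scale * ki for s, ki in zip(state, k)]
    q1 = derivatives(state)
    q2 = derivatives(shifted(q1, dt / 2))
    q3 = derivatives(shifted(q2, dt / 2))
    q4 = derivatives(shifted(q3, dt))
    return [s + dt / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, q1, q2, q3, q4)]

def simulate(state, t_end, dt=0.005):
    """Integrate the closed loop from the given state up to t_end seconds."""
    for _ in range(int(round(t_end / dt))):
        state = rk4_step(state, dt)
    return state

final_state = simulate([0.0, 0.0, 0.1, 0.0], 15.0)
print("final cart position and angle:", final_state[0], final_state[2])
```

Under these assumptions the angle and the cart position both return to their set points. Against a different plant model the cancellation is only approximate, so the gains may need retuning.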
The linear and nonlinear models of the cart-pole system have been developed and simulated. It was found that the system is open-loop unstable. Accurate system identification requires a stable process, so standard feedback controllers were developed and tested.
5. CHAPTER 4 - NEURAL CONTROL OF INVERTED PENDULUM IN SIMULINK
The main task of this project is to design a controller which keeps the pendulum system inverted. There are a few important points to remember when designing a controller for the inverted pendulum: the system is open-loop unstable, nonlinear and multi-output. To show the advantages of using neural control in this project, a comparison between standard PID control and neural control is made.
Nonlinear system: Standard linear PID controllers cannot be used for this system because they cannot map the complex nonlinearities in the pendulum process. ANNs have been shown to be capable of identifying complex nonlinear systems, so they should be well suited for generating the complex internal mapping from inputs to control actions.
Multi-output system: The inverted pendulum has four outputs, so four PID controllers would be needed for full state feedback control. Neural networks have a big advantage here due to their parallel nature: one ANN could be used instead of four PIDs.
Open-loop unstable: The inverted pendulum is open-loop unstable. As soon as the system is simulated the pendulum falls over. Neural networks take time to train, so the pendulum system has to be stabilized before a neural network can be trained.
Before the actual neuro-controller is developed in MATLAB, the main types of neuro-control are discussed. The five types of neural network control methods that have been researched are supervised control, model reference control, direct inverse control, internal model control and unsupervised control.
5.1 Supervised Control
It is possible to teach a neural network the correct actions by using an existing controller or human feedback. This type of control is called supervised control. But why would we want to copy an existing controller that already does the job? Most traditional controllers (feedback linearization, rule-based control) are designed around an operating point, which means that the controller operates correctly only if the plant operates around that point. These controllers can fail if there is uncertainty or change in the plant. The advantage of neural control is that, if an uncertainty in the plant occurs, the ANN is able to adapt its parameters and keep controlling the plant when other controllers would fail. In supervised control, a teacher provides correct actions for the neural network to learn. In offline training the targets are provided by an existing controller, and the neural network adjusts its weights until the output from the ANN is similar to that of the controller.
When the neural network is trained, it is placed in the feedback loop. Because the ANN is trained using the targets of the existing controller, it should be able to control the process.
At this stage, there is an ANN which controls the process similarly to the existing controller. The real advantage of neuro-control is the ability to adapt online (Fig. 55). An error signal (desired signal minus real output signal) is calculated and used to adjust the weights online.
If a large disturbance or uncertainty occurs in the process, the large error signal is fed back into the ANN, which adjusts the weights so that the system remains stable.
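The two phases described in this section, offline training against an existing controller followed by online adaptation driven by an error signal, can be sketched with a single linear neuron (an Adaline) trained by the Widrow-Hoff LMS rule. Everything specific in the Python sketch is an assumption for illustration: the teacher is a hypothetical linear state-feedback law, the states are random samples rather than simulated pendulum states, and the error is taken between the target control action and the neuron output rather than from the closed-loop response.

```python
import random

def lms_update(weights, inputs, target, rate):
    """One Widrow-Hoff (LMS) step: move the weights along error * input."""
    output = sum(w * x for w, x in zip(weights, inputs))
    error = target - output
    return [w + rate * error * x for w, x in zip(weights, inputs)]

def teacher(state, gains):
    """Hypothetical existing controller: a linear state-feedback law."""
    return sum(k * s for k, s in zip(gains, state))

def adapt(weights, gains, steps, rate, rng):
    """Present random states and pull the neuron towards the teacher."""
    for _ in range(steps):
        state = [rng.uniform(-1.0, 1.0) for _ in gains]
        weights = lms_update(weights, state, teacher(state, gains), rate)
    return weights

rng = random.Random(0)  # fixed seed so the run is repeatable

# Offline phase: copy the existing controller (gains are illustrative).
offline_gains = [25.0, 10.0, 1.0, 2.6]
offline_weights = adapt([0.0] * 4, offline_gains, 3000, 0.05, rng)

# Online phase: the desired mapping shifts (a stand-in for a plant change)
# and the same update rule keeps adjusting the weights from the error.
online_gains = [30.0, 10.0, 1.0, 2.6]
online_weights = adapt(offline_weights, online_gains, 3000, 0.05, rng)
print(offline_weights, online_weights)
```

A network with frozen weights would keep producing the offline mapping after the change, whereas the adaptive neuron follows the new one. That difference is the argument for online adaptation made above.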
5.2 Model Reference Control
In the diagram above (Fig. 55) the error signal is generated by subtracting the output signal from the desired system response. In model reference control the desired closed-loop response is specified through a stable reference model (Fig. 56) [19]. The control system attempts to make the process output similar to the reference model output.
6. CHAPTER 5 - COMPARISON WITH PREVIOUS STUDIES
7. CONCLUSION
7.1 Summary
This research has applied artificial neural networks to the identification and control of the inverted pendulum. Before identification techniques could be tested, a model representing the inverted pendulum was developed in Simulink. Some of the modelling and control techniques involved in the project are linear, so a linearized version of the inverted pendulum was also developed. Open-loop identification was initially tested, but it was found that the inverted pendulum is open-loop unstable. One of the requirements for accurate identification is experimental input-output data that shows the dynamics of the system. It was therefore decided that system identification would be performed in closed loop, so stabilizing feedback controllers had to be developed for the linear and nonlinear inverted pendulum. A simple full-state feedback controller stabilized the linear pendulum, and a control law was developed to stabilize the nonlinear pendulum. The closed-loop data is stable and the inverted pendulum can be simulated for longer times, so more data can be collected.
To achieve a better approximation of the inverted pendulum, the nonlinear system must be used. The linear identification techniques were applied to the nonlinear pendulum system and were found to be inadequate in modelling the nonlinear nature of the system. The nonlinear nature of neural networks gives them an advantage over linear models in the prediction of non-linear systems. Before the inverted pendulum system is identified, the process is stabilized using the control law. The control law removes some of the nonlinearities from the process so a detuned control law was used which allows the process to exhibit more of its dynamics. This improves the quality of the data used in the system identification.
Initially single-input single-output networks were developed, the input being the control force and the output the pendulum angle. The first type of neural network to be developed was the feedforward network. Feedforward networks with a range of hidden layer neurons were tested. The feedforward networks modelled the inverted pendulum well. The MSE between the process and the neural model is low, and the model predicts the dynamics of the pendulum angle. In open-loop identification, increasing the number of hidden layer neurons has a direct influence on the accuracy of the model. In the closed-loop case, it was found that using a detuned controller had more of an influence on the model accuracy than increasing the number of hidden layer neurons.
The main task in the project was to design a controller which keeps the pendulum system inverted. The four main types of neural control (supervised, unsupervised, direct inverse and internal model control) were researched to determine which control technique would be the most efficient to implement. The earliest application of neural networks to the inverted pendulum is by Widrow and Smith [25] and Widrow [26]. They used traditional control methods to derive a control law to stabilize the linearized system. They then trained a neural network to mimic the output of the control law. It was decided that supervised control would be the least complex to implement. It was not possible to develop direct inverse control because this control method requires that the process to be controlled is already open-loop stable. The unsupervised control technique developed by Anderson was too complex for the project time frame. The first neuro-controller was developed by training a feedforward network to model the control law. Elman networks were also used to model the control law but were not as accurate. When the training was finished, the neural network was exported into Simulink and placed in the feedback loop instead of the existing control law. The neural network controlled the inverted pendulum similarly to the control law.
An experiment was set up which creates a disturbance to the process during the simulation. The neural network lost control of the inverted pendulum because it was unable to adjust its weights to counteract this disturbance. This problem was solved by using the adaptive neural toolbox, which makes online neural learning possible. Two types of neural network were used: Adaline and multi-layered perceptron (MLP). The ANN was trained offline using the control law. The advantage of using this type of network is that, if a disturbance occurs during operation, the error signal is fed back into the Adaline, which adjusts the weights of the network and counteracts the disturbance. The Adaline adaptive block is designed for approximating 'almost linear' functions. It was found that the Adaline could approximate the control law very accurately. It was decided to test some of the identification and control techniques on the real-time inverted pendulum rig. The real-time inverted pendulum is also open-loop unstable. The real-time kernel (RTK) uses standard PID controllers to stabilize the system. Online identification was possible using the adaptive neural toolbox. It was not possible to develop a neural controller for the real-time system, but significant progress was made.
7.2 Future work
The results from the Elman networks were not as accurate as those from the feedforward networks. The dynamic Elman networks should have been more accurate when modelling a dynamic system such as the inverted pendulum, and this could be investigated. When modelling the inverted pendulum, closed-loop identification must be used. One of the faults of closed-loop identification is that the controller removes some of the dynamics of the process. More research is needed in developing models from closed-loop data. The neural network controllers developed in the project were all based on the traditional control law. When training an ANN using supervised learning there must be an existing controller to copy, and in order to develop a control law the dynamics of the process must be known. If it is not possible to develop a control law, or the dynamics of the process are not known, then a neural network cannot be trained in this way. A solution to this problem is to develop an unsupervised controller. Unsupervised control does not require an accurate model of the system dynamics or of the system's desired behaviour. The only feedback signal to the controller is a failure signal when the pendulum falls past a certain angle. The controller must learn through experience by trying various actions. The work done by Anderson [23] in unsupervised control gives practical guidelines for developing such a controller. Future research could therefore address unsupervised control of the inverted pendulum. Supervised control with neural networks has been studied extensively, whereas unsupervised control is a more difficult but interesting problem.
References
Dabney, J. B. and Harman, T. L., Mastering Simulink.
http://www.codeproject.com/KB/recipes/NeuralNetwork_1.aspx?msg=2353454