ANN-Based Control of a Wheeled Inverted Pendulum System Using an Extended DBD Learning Algorithm

Abstract

This paper presents a dynamic model for a self-balancing vehicle using the Euler-Lagrange approach. The design and deployment of an artificial neural network (ANN) in a closed-loop control is described. The ANN is characterized by integration of the extended delta-bar-delta (EDBD) algorithm, which accelerates the adjustment of synaptic weights. The results of the control strategy in the dynamic model of the robot are also presented.

Keywords

Self-balancing Robot, Extended Delta-bar-delta Algorithm, Artificial Neural Network, Euler-Lagrange

1. Introduction

The development of heuristic control algorithms for intelligent robots is common practice today. Since these control strategies do not require the dynamic model of any given system, previous knowledge of such a system is sufficient for implementing control algorithms. The adaptive properties of ANNs give rise to the possibility for achieving control strategies for systems with unknown and random disturbances. The self-organization of ANNs, i.e., the ability to modify the entire network to fulfil a specific task, enables the algorithm's capability of adequately reacting to unexpected circumstances [10].

Similarities between wheeled inverted pendulum (WIP) systems and human posture have become the primary reason for the study of these systems in recent years [9]. In 2001, the Segway PT was commercially released as a personal transport vehicle. The Segway uses the WIP principle and is capable of reaching a speed of 20.1 km/h [5]. Most approaches proposed for WIP systems use model-free control strategies, in which the dynamic model serves only to evaluate the behaviour of the control algorithm and is therefore not part of the control strategy [15, 13].

Dynamic models as part of the controller's algorithm and the combination of theoretical models with a heuristic approach have been studied in the past. A control strategy based on two decoupled state-space controls that are obtained from a mathematical model which, in turn, is obtained from the physical characteristics of the robotic platform, has been reported [8], where one control strategy is used for pitch and the other for yaw. Additionally, a Takagi-Sugeno fuzzy control was designed and implemented [1] for a WIP system. This strategy combines heuristic knowledge and information from the dynamic model in order to maintain a robot in the vertical position. Furthermore, this approach allows for motion of the WIP over ascending slopes. Other approaches have also considered this issue [2, 3].

A controller consisting of two ANNs with a radial basis function in the control loop, one for pitch and the other for yaw, has been presented [6]. This work demonstrates the adequate performance of the controller for low velocities. A neural adaptive output feedback control has also been reported [4]. This framework incorporates a linear dynamic compensator in order to increase the vertical stability of a WIP robot, while allowing for the tracing of paths.

Several techniques have been proposed for deriving the dynamic equations of motion in WIP systems, including Newton equations [8], Lagrangian equations [4, 12], non-holonomic constraints [13, 7] and the addition of these constraints alongside Boltzmann-Hamel equations [14]. In the present work, a dynamic model is obtained using the Euler-Lagrange equations.

2. Dynamic Model

In order to obtain the dynamic model of the proposed system it was assumed that:

The robot moves over a flat surface.

The wheels are perpendicular to the ground.

The robot's body behaves as a rigid body.

According to the free-body diagrams depicted in Fig. 1 and in Fig. 2, the self-balancing robot consists of three elements: the two wheels and the robot's body. The chosen parameters for the dynamic model are summarized in Table 1.

Table 1.

System parameters

Variable	Description
m₁	Mass of the left wheel
m₂	Mass of the right wheel
m₃	Mass of the robot's body
I₁	Moment of inertia caused by m₁
I₂	Moment of inertia caused by m₂
I₃	Moment of inertia caused by m₃
W	Distance between wheels
hc	Distance from the wheels' axis to the centre of mass m₃
r	Wheel radius (distance from the (x, y) plane to the centre of mass of the wheels)
g	Acceleration of gravity

Figure 1.

Side view of the free-body diagram

Figure 2.

Top view of the free-body diagram

The proposed modelling method, an Euler-Lagrange approach, defines a set of differential equations describing the time evolution of a mechanical system under holonomic constraints [9]. This methodology requires the definition of the generalized coordinates vector, the calculation of the kinetic energy of the system, definition of the potential energy, computation of the Lagrangian and finally, establishment of the differential equations.

2.1. Generalized coordinates vector

Information-carrying variables of the degrees-of-freedom define the generalized coordinates vector. In our study, the coordinates vector is defined as:

q (t) = [\begin{matrix} θ \\ ϕ \\ d \end{matrix}],

(1)

where θ represents the angle with respect to the vertical component, φ is the rotational angle of the robot with respect to the reference plane and d is the displacement of the robot.

2.2. Kinetic energy

Each body mass in the robot has both translational and rotational motion, so its kinetic energy has a translational term and a rotational term. Hence, the kinetic energy can be established as:

K_{m_{i}} = \frac{1}{2} m_{i} v_{m_{i}}^{2} + \frac{1}{2} I_{m_{i}} ω_{m_{i}}^{2},

(2)

where m_i represents each body mass in the robot, $v_{m_{i}}$ represents the velocity of the corresponding mass, $I_{m_{i}}$ represents its moment of inertia and $ω_{m_{i}}$ is its angular velocity.

The overall kinetic energy of the system is the sum of the kinetic energies of the body masses. Therefore

K_{T} = K_{m_{1}} + K_{m_{2}} + K_{m_{3}},

(3)

where $K_{m_{1}} = K_{m_{2}}$ . Hence

K_{m_{1,2}} = \frac{1}{2} m_{1,2} (d^{2} {\dot{ϕ}}^{2} + {\dot{d}}^{2} + \dot{d} W \dot{ϕ} + \frac{1}{4} W^{2} {\dot{ϕ}}^{2}) + \frac{1}{2} I_{1,2} {\dot{ϕ}}^{2}

(4)

and

\begin{array}{l} K_{m_{3}} = \frac{1}{2} m_{3} (d^{2} {\dot{ϕ}}^{2} + 2 d {\dot{ϕ}}^{2} h c \sin (θ) + 2 h c^{2} {\dot{ϕ}}^{2} + \\ {\dot{d}}^{2} - h c^{2} {\dot{ϕ}}^{2} \cos {(θ)}^{2} + 2 \dot{d} h c \cos (θ) \dot{θ}) + \frac{1}{2} I_{m_{3}} \dot{ϕ} \end{array}

(5)

2.3. Potential energy

Since the potential energy of the proposed configuration depends on the vertical position and mass of each body, the gravitational force acting on a mass m can be expressed as $F_{y} (y) = - m g$ . Therefore:

U = m_{i} g y

(6)

where U is the potential energy, m_i is the corresponding mass, g is the acceleration of gravity and y is the distance from the origin of the reference frame to the centre of mass. Similar to kinetic energy, the total potential energy is the sum of the potential energies of the body masses. Hence, for our proposed configuration:

U_{T} (t) = U_{m_{1}} (t) + U_{m_{2}} (t) + U_{m_{3}} (t)

(7)

where $U_{m_{1}} = U_{m_{2}}$ . Hence

U_{m_{1,2}} (t) = m_{1,2} g r

(8)

and

U_{m_{3}} (t) = m_{3} g (r + h c \cos (θ (t)))

(9)

2.4. Lagrangian

The Lagrangian is defined as the difference between the total kinetic energy (3) and the total potential energy (7). Hence:

L = K_{T} (t) - U_{T} (t)

(10)

Consequently,

\begin{matrix} L = (\frac{1}{2} m_{3} h c^{2} (1 - \cos {(θ)}^{2}) + m_{3} d h c \sin (θ) + \frac{1}{2} \\ (m_{1} + m_{2} + m_{3}) d^{2} + \frac{1}{8} W^{2} (m_{1} + m_{2})) {\dot{ϕ}}^{2} + \frac{1}{2} \dot{ϕ} \\ (W (m_{1} + m_{2}) \dot{d} + I_{m 1} + I_{m 2} + I_{m 3}) + \frac{1}{2} (m_{1} + m_{2} + m_{3}) {\dot{d}}^{2} \\ + \frac{1}{2} m_{3} h c (2 \cos (θ) \dot{d} \dot{θ} + h c {\dot{θ}}^{2}) - g m_{3} h c \cos (θ) \\ - g r (m_{1} + m_{2} + m_{3}) \end{matrix}

2.5. Euler-Lagrange equation

Once the Lagrangian has been defined, the Euler-Lagrange equation for each element in the generalized coordinates vector is required. Therefore:

\frac{d}{d t} (\frac{\partial L (q, \dot{q})}{\partial \dot{q}_{i}}) - \frac{\partial L (q, \dot{q})}{\partial q_{i}} = τ_{i}

(11)

where $i = 1, …, n$ and τ_i is the torque in each actuator.

According to (1), the WIP robot has three generalized coordinates, which lead to three differential equations. These equations are obtained by substituting each element of (1) for q and $\dot{q}$ in (11). Hence:

\begin{array}{l} \frac{d}{d t} (\frac{\partial L (θ, \dot{θ})}{\partial \dot{θ}}) - \frac{\partial L (θ, \dot{θ})}{\partial θ} = \\ (h c \ddot{θ} + \ddot{d} \cos (θ) + \sin (θ) \dot{θ} (m_{3} h c \sin (θ) {\dot{ϕ}}^{2} \\ + d (m_{1} + m_{2} + m_{3}) {\dot{ϕ}}^{2} - \dot{d}) - \cos (θ) {\dot{ϕ}}^{2} \\ (d + h c \sin (θ)) - g \sin (θ) - \sin (θ) \dot{θ} \dot{d}) m_{3} h c . \end{array}

(12)

Then,

\begin{array}{l} \frac{d}{d t} (\frac{\partial L (ϕ, \dot{ϕ})}{\partial \dot{ϕ}}) - \frac{\partial L (ϕ, \dot{ϕ})}{\partial ϕ} = \\ (m_{3} h c^{2} (1 - \cos {(θ)}^{2}) + d (2 m_{3} h c \sin (θ) + m_{1} \\ + m_{2} + m_{3}) + W^{2} (m_{1} + m_{2})) \ddot{ϕ} + \frac{1}{2} W \\ (m_{1} + m_{2}) \ddot{d} + 2 ((m_{3} h c \sin (θ) + d (m_{1} + m_{2} + m_{3})) \dot{d} \\ + \cos (θ) m_{3} h c \dot{θ} (d + h c \sin (θ))) \dot{ϕ} . \end{array}

(13)

Also,

\begin{array}{l} \frac{d}{d t} (\frac{\partial L (d, \dot{d})}{\partial \dot{d}}) - \frac{\partial L (d, \dot{d})}{\partial d} = \\ (m_{1} + m_{2} + m_{3}) \ddot{d} + \frac{1}{2} W (m_{1} + m_{2}) \ddot{ϕ} \\ + m_{3} h c \cos (θ) \ddot{θ} - (m_{3} h c \sin (θ) + \\ d (m_{1} + m_{2} + m_{3})) {\dot{ϕ}}^{2} - m_{3} h c \sin (θ) {\dot{θ}}^{2} . \end{array}

(14)

Arranging the generalized acceleration and velocity vectors, equations (12), (13) and (14) can be expressed as follows:

M (q (t)) \ddot{q} (t) + C (q (t), \dot{q} (t)) \dot{q} (t) + g (q (t)) = τ

(15)

where $M (q (t))$ is the inertia matrix, $C (q (t), \dot{q} (t))$ is the matrix of centrifugal and Coriolis forces, $g (q (t))$ is the gravitational force vector and τ represents the external force vector. This vector is generally associated with the control strategy, since the vector elements represent the applied torques in the robot links.

For the WIP system, matrices in (15) are established as follows:

M (q (t)) = [\begin{matrix} m_{1,1} & m_{1,2} & m_{1,3} \\ m_{2,1} & m_{2,2} & m_{2,3} \\ m_{3,1} & m_{3,2} & m_{3,3} \end{matrix}]

(16)

where,

\begin{matrix} m_{1,1} = 2 I_{m_{3}} + m_{3} h c^{2} \cos {(θ)}^{2} \\ m_{1,2} = 0 \\ m_{1,3} = m_{3} h c \cos (θ) \\ m_{2,1} = 0 \end{matrix}

m_{2,2} = \begin{array}{l} - m_{3} h c^{2} \cos {(θ)}^{2} + 2 m_{3} d h c \sin (θ) \\ - m_{3} h c^{2} \cos {(ϕ)}^{2} + (m_{1} + m_{2} + m_{3}) d^{2} \\ + \frac{1}{4} W^{2} (m_{1} + m_{2}) + 2 I_{m_{1}} + 2 I_{m_{2}} + 2 m_{3} h c^{2} \end{array}

m_{2,3} = \frac{- W}{2} (m_{1} + m_{2})

m_{3,1} = m_{3} h c \cos (θ)

m_{3,2} = \frac{- W}{2} (m_{1} + m_{2})

m_{3,3} = m_{1} + m_{2} + m_{3}

then,

C (q (t), \dot{q} (t)) = [\begin{matrix} c_{1,1} & c_{1,2} & c_{1,3} \\ c_{2,1} & c_{2,2} & c_{2,3} \\ c_{3,1} & c_{3,2} & c_{3,3} \end{matrix}]

(17)

where,

c_{1,1} = - m_{3} h c^{2} \cos (θ) \sin (θ) \dot{ϕ}

c_{1,2} = - m_{3} h c \dot{ϕ} \cos (θ) (d + h c \sin (θ))

c_{1,3} = 0

c_{2,1} = 2 \dot{ϕ} m_{3} h c \cos (θ) (h c \sin (θ) + d)

c_{2,2} = m_{3} h c^{2} \dot{ϕ} \sin (ϕ) \cos (ϕ)

c_{2,3} = 2 d \dot{ϕ} (m_{1} + m_{2} + m_{3}) + 2 m_{3} h c \sin (ϕ)

c_{3,1} = - m_{3} h c \sin (θ) \dot{θ}

c_{3,2} = - m_{3} h c \sin (θ) \dot{ϕ} - d (m_{1} + m_{2} + m_{3}) \dot{ϕ}

c_{3,3} = 0

and

g (q (t)) = [\begin{matrix} g_{1,1} \\ g_{2,1} \\ g_{3,1} \end{matrix}]

(18)

where,

g_{1,1} = - m_{3} h c \sin (θ) g

g_{2,1} = 0

g_{3,1} = 0
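In a simulation, the model (15) is typically solved for the generalized accelerations at every time step. The following is a minimal sketch of that step, assuming the matrices M, C and g have already been evaluated numerically for the current state; the function names `solve3` and `forward_dynamics` are hypothetical and are not taken from the original work.

```python
def solve3(A, b):
    """Solve the 3x3 linear system A x = b by Gaussian elimination with partial pivoting."""
    n = 3
    # Work on an augmented copy so the caller's lists are not modified.
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def forward_dynamics(M, C, g, qdot, tau):
    """Return the accelerations from M*qddot + C*qdot + g = tau, as in (15)."""
    rhs = [tau[i] - sum(C[i][j] * qdot[j] for j in range(3)) - g[i]
           for i in range(3)]
    return solve3(M, rhs)
```

The accelerations returned by `forward_dynamics` can then be integrated with any standard scheme to obtain the velocities and positions of θ, φ and d.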

2.6. Dynamic model verification

A number of properties must be fulfilled in order to validate the proposed model [9, 11]. This section describes these properties.

The inertia matrix $M (q (t))$ is an $n \times n$ symmetric positive-definite matrix whose elements are functions of $q (t)$ . Equation (16) shows the symmetry of the matrix, while equation (19) states that the determinant of (16) remains positive and continuous along a given path; this is illustrated in Fig. 3.

\det (M (q (t))) > 0

(19)

Figure 3.

Value of the determinant of $M (q (t))$ along a given path
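The symmetry and positivity checks described above can be reproduced numerically. The sketch below builds $M (q (t))$ with the entries transcribed as printed in (16) and evaluates its leading principal minors, which is Sylvester's criterion and is slightly stronger than checking the determinant alone. The parameter values in `PARAMS` and the test configuration are hypothetical, since the text does not list numeric values for the robot.

```python
import math

# Hypothetical parameter values in SI units (not taken from the paper).
PARAMS = {"m1": 0.5, "m2": 0.5, "m3": 5.0,
          "I1": 0.0025, "I2": 0.0025, "I3": 0.1,
          "W": 0.4, "hc": 0.2}


def inertia_matrix(theta, phi, d, p=PARAMS):
    """Build M(q) with the entries transcribed as printed in (16)."""
    m1, m2, m3 = p["m1"], p["m2"], p["m3"]
    hc, W = p["hc"], p["W"]
    total = m1 + m2 + m3
    m11 = 2 * p["I3"] + m3 * hc**2 * math.cos(theta)**2
    m13 = m3 * hc * math.cos(theta)
    m22 = (-m3 * hc**2 * math.cos(theta)**2
           + 2 * m3 * d * hc * math.sin(theta)
           - m3 * hc**2 * math.cos(phi)**2
           + total * d**2
           + 0.25 * W**2 * (m1 + m2)
           + 2 * p["I1"] + 2 * p["I2"] + 2 * m3 * hc**2)
    m23 = -0.5 * W * (m1 + m2)
    return [[m11, 0.0, m13],
            [0.0, m22, m23],
            [m13, m23, total]]


def det3(A):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (A[0][0] * (A[1][1] * A[2][2] - A[1][2] * A[2][1])
            - A[0][1] * (A[1][0] * A[2][2] - A[1][2] * A[2][0])
            + A[0][2] * (A[1][0] * A[2][1] - A[1][1] * A[2][0]))


def is_symmetric(A, tol=1e-12):
    return all(abs(A[i][j] - A[j][i]) <= tol
               for i in range(3) for j in range(3))


def leading_minors(A):
    """Leading principal minors; all positive means positive-definite."""
    return [A[0][0],
            A[0][0] * A[1][1] - A[0][1] * A[1][0],
            det3(A)]


M = inertia_matrix(theta=0.05, phi=0.0, d=0.0)
print(is_symmetric(M), leading_minors(M))
```

Evaluating `leading_minors` at each sample of a simulated trajectory gives a curve comparable to the one in Fig. 3, under the stated parameter assumptions.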

The inverse of the inertia matrix $M^{- 1} (q (t))$ exists and is positive-definite. Using $\det (M^{- 1} (q (t)))$ , it is shown that $M^{- 1} (q (t))$ is positive and continuous along a given path (see Fig. 4).

Figure 4.

Value of the determinant of $M^{- 1} (q (t))$ along a given path

The matrix $C (q (t), \dot{q} (t))$ is not unique: the velocity-dependent terms of the dynamic equations have no particular form, so they can be factored with respect to $\dot{q}$ in more than one way, and the factorization is chosen by the control strategy designer. Equation (20) states an alternative to matrix $C (q (t), \dot{q} (t))$ for the dynamic model of the WIP.

C_{a} (q (t), \dot{q} (t)) = [\begin{matrix} c_{a_{1,1}} & c_{a_{1,2}} & c_{a_{1,3}} \\ c_{a_{2,1}} & c_{a_{2,2}} & c_{a_{2,3}} \\ c_{a_{3,1}} & c_{a_{3,2}} & c_{a_{3,3}} \end{matrix}]

(20)

where,

c_{a_{1,1}} = \begin{array}{l} m_{3} h c \sin (θ) (m_{3} {\dot{ϕ}}^{2} (h c \sin (θ) + d) \\ + d {\dot{ϕ}}^{2} (m_{1} + m_{2}) - \dot{d}) \end{array}

c_{a_{1,2}} = - m_{3} h c \dot{ϕ} \cos (θ) (d + h c \sin (θ))

c_{a_{1,3}} = 0

c_{a_{2,1}} = 2 \dot{ϕ} h c \cos (θ) (m_{3} h c \sin (θ) + m_{3} d)

c_{a_{2,2}} = 0

c_{a_{2,3}} = 2 d \dot{ϕ} (m_{1} + m_{2} + m_{3}) + 2 m_{3} \dot{ϕ} h c \sin (θ)

c_{a_{3,1}} = - m_{3} h c \sin (θ) \dot{θ}

c_{a_{3,2}} = - m_{3} h c \sin (θ) \dot{ϕ} - d (m_{1} + m_{2} + m_{3}) \dot{ϕ}

c_{a_{3,3}} = 0

Every element in (17) is multiplied by a component in the generalized velocities vector; therefore, $C (q (t), 0) = 0$ , ∀ vector $q (t) \in ℜ^{n}$ .

Regardless of how $C (q (t), \dot{q} (t))$ is obtained, the following condition should always be satisfied:

\dot{q} {(t)}^{T} [N (q (t), \dot{q} (t))] \dot{q} (t) = 0 \quad \forall q (t), \dot{q} (t) \in ℜ^{n}

(21)

where $N (q (t), \dot{q} (t)) = \frac{1}{2} \dot{M} (q (t)) - C (q (t), \dot{q} (t))$ . Fig. 5 depicts the value of $N (q (t), \dot{q} (t))$ along a defined path of the WIP robot.

Figure 5.

Behaviour along a defined path at a 10 second duration

It can be observed that the boundary conditions force a non-zero value at the beginning of the simulation. Immediately after this condition passes, a zero value can be observed.

The gravitational forces vector $g (q (t))$ is an $n \times 1$ vector, whose elements depend only on $q (t)$ . This vector can be derived from the gravitational potential energy. Hence:

\frac{\partial U_{T} (q)}{\partial q (t)} = g (q (t))

Therefore,

\begin{array}{l} \frac{\partial U_{T} (q)}{\partial q (t)} = \frac{\partial (m_{1} g r + m_{2} g r + m_{3} g (r + h c \cos (θ (t))))}{\partial q (t)} \\ = [\begin{matrix} - m_{3} h c \sin (θ) g \\ 0 \\ 0 \end{matrix}] \end{array}

(22)
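The relation between the total potential energy (7) to (9) and the first element of the gravitational vector (18) can be checked with a central finite difference. This is only an illustrative sketch; the numeric parameter values are hypothetical and the function names are not part of the original work.

```python
import math

G = 9.81  # acceleration of gravity in m/s^2


def potential_energy(theta, m1, m2, m3, r, hc, g=G):
    """Total potential energy: the two wheels (8) plus the body (9)."""
    return (m1 + m2) * g * r + m3 * g * (r + hc * math.cos(theta))


def gravity_theta_numeric(theta, m1, m2, m3, r, hc, h=1e-6):
    """Central-difference approximation of dU_T/dtheta."""
    up = potential_energy(theta + h, m1, m2, m3, r, hc)
    down = potential_energy(theta - h, m1, m2, m3, r, hc)
    return (up - down) / (2.0 * h)


def gravity_theta_closed(theta, m3, hc, g=G):
    """Closed-form first element of g(q), as in (18)."""
    return -m3 * hc * math.sin(theta) * g


# Hypothetical values: masses in kg, lengths in m, angle in rad.
numeric = gravity_theta_numeric(0.087, 0.5, 0.5, 5.0, 0.1, 0.2)
closed = gravity_theta_closed(0.087, 5.0, 0.2)
print(numeric, closed)
```

Because U_T does not depend on φ or d, the same finite difference taken with respect to those coordinates returns zero, which matches the remaining elements of (18).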

Once the model has been verified using the aforementioned properties, it can be stated that the mathematical approximation obtained by the Euler-Lagrange approach is appropriate for the control strategy proposed in the following section.

3. Control Design

A multi-variable recurrent neural network, in which recurrence is set in the hidden layer, is proposed as a control strategy for the WIP system. Fig. 6 shows the general outline of the control strategy. Parameters y_d and y_r represent the desired and actual angular position of the WIP system, respectively. These two parameters are one-dimensional vectors. In Fig. 6, the error $e (k)$ represents the difference between the desired position $y_{d} (k)$ and the actual position $y_{r} (k)$ . The outputs $u_{1}$ and $u_{2}$ of the ANN are also one-dimensional unitless vectors that control the WIP system. These outputs are used to control the torque in each wheel of the WIP robot.

Figure 6.

Control strategy outline with the ANN

The ANN consists of four inputs (the most recent $e (k)$ and three past measurements $e (k - 1)$ , $e (k - 2)$ and $e (k - 3)$ ), three neurons in the hidden layer with a sigmoid activation function and two neurons in the output layer with a linear activation function. There are three types of synaptic weights: $W_{i j}$ , which is linked to the number of inputs and neurons in the hidden layer, $W d_{j}$ , which are the weights related to recurrence in the hidden layer and $W_{j k}$ , which are the weights related to the hidden layer and the output layer.

Figure 7.

General scheme of the ANN

Inputs I_i are defined as:

I_{1} = e (t), I_{2} = e (t - 1),

(23)

I_{3} = e (t - 2), I_{4} = e (t - 3)

(24)

At the input of each neuron, a weighted sum is performed by multiplying the input variable by its synaptic weight. This weighted sum, denoted by $s_{j} (t)$ , is obtained by:

s_{j} (t) = W d_{j} X_{j} (t - 1) + \sum W_{i j} I_{i} (t),

(25)

where $X_{j} (t - 1)$ is the output of the j -th neuron in the hidden layer at a previous time. This output is given by the sigmoid activation function:

X_{j} (t) = \frac{1}{1 + e^{- s_{j} (t)}} .

(26)

Since no limit in the output values is required, a linear activation function in the output layer is specified as:

μ_{k} (t) = \sum W_{j k} (t) X_{j} (t) .

(27)
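Equations (25) to (27) describe one forward pass of the network. A minimal sketch of that pass is given below, assuming the weights are stored as nested lists with `W_ij[i][j]` (4 × 3), `Wd[j]` (3) and `W_jk[j][k]` (3 × 2); this storage layout and the function names are illustrative choices rather than the authors' implementation.

```python
import math


def sigmoid(s):
    """Sigmoid activation of the hidden layer, as in (26)."""
    return 1.0 / (1.0 + math.exp(-s))


def forward(inputs, x_prev, W_ij, Wd, W_jk):
    """One time step of the recurrent network.

    inputs : the four errors [e(t), e(t-1), e(t-2), e(t-3)]
    x_prev : hidden-layer outputs X_j(t-1) from the previous step
    Returns the new hidden outputs X_j(t) and the network outputs mu_k(t).
    """
    n_hidden = len(Wd)
    n_out = len(W_jk[0])
    # Weighted sum (25): recurrent term plus the input contributions.
    s = [Wd[j] * x_prev[j]
         + sum(W_ij[i][j] * inputs[i] for i in range(len(inputs)))
         for j in range(n_hidden)]
    x = [sigmoid(v) for v in s]
    # Linear output layer (27).
    mu = [sum(W_jk[j][k] * x[j] for j in range(n_hidden))
          for k in range(n_out)]
    return x, mu


# Example call with zero input and recurrent weights and unit output weights.
W_ij = [[0.0] * 3 for _ in range(4)]
Wd = [0.0] * 3
W_jk = [[1.0, 1.0] for _ in range(3)]
x, mu = forward([0.1, 0.0, 0.0, 0.0], [0.0] * 3, W_ij, Wd, W_jk)
print(x, mu)
```

The hidden outputs returned by one call are passed back as `x_prev` on the next call, which is what makes the hidden layer recurrent.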

Backpropagation learning requires an error function that depends on the synaptic weights; concurrently, this function must be capable of evaluating the performance of the network. An iterative minimization of such a function will result in a general method for optimizing the synaptic weights. The proposed function is:

E (t) = \frac{1}{2} {[y_{d} - y_{r} (t)]}^{2}

(28)

The chosen optimization method uses gradient descent, as this takes the first derivative of the error function $E (t)$ with respect to the synaptic weights matrix in order to update said weights. Gradient descent obtains a linear approximation of the error function [10] by using the following:

E (W + Δ W) \approx E (W) + Δ W \frac{\partial E (t)}{\partial W (t)}

(29)

The update of the synaptic weights is performed by using the following:

Δ W = - η \frac{\partial E (t)}{\partial W (t)} .

(30)

Equation (30) is substituted in $W (t) = W (t - 1) + Δ W (t)$ to define the final weight-updating function as follows:

W (t) = W (t - 1) + [- η \frac{\partial E (t)}{\partial W (t)}]

(31)

where η represents the learning factor and $\frac{\partial E (t)}{\partial W (t)}$ is obtained by using the chain rule on each synaptic weight matrix. The partial derivative of $E (t)$ with respect to $W_{j k}$ is:

\frac{\partial E}{\partial W_{j k}} = \frac{\partial E}{\partial e (t)} \frac{\partial e (t)}{\partial μ_{k} (t)} \frac{\partial μ_{k} (t)}{\partial W_{j k}} = - e (t) X_{j} (t)

(32)

The weight-updating function for the weights $W_{j k}$ between the hidden layer and the output layer is:

W_{j k} (t) = W_{j k} (t - 1) + η e (t) X_{j} (t) .

(33)
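As an illustration of (33), the sketch below applies one gradient-descent step to the weights $W_{j k}$. The nested-list layout `W_jk[j][k]` and the function name are assumptions made for the example; the learning factor 0.04 is the value quoted in the text for the fixed-rate simulation.

```python
def update_output_weights(W_jk, x, error, eta):
    """Apply (33): W_jk(t) = W_jk(t-1) + eta * e(t) * X_j(t).

    W_jk  : weights indexed as W_jk[j][k]
    x     : hidden-layer outputs X_j(t)
    error : scalar error e(t) = y_d - y_r(t)
    eta   : learning factor
    """
    return [[w + eta * error * x[j] for w in row]
            for j, row in enumerate(W_jk)]


W_jk = [[0.1, -0.1], [0.0, 0.0], [0.2, 0.2]]
W_new = update_output_weights(W_jk, x=[0.5, 0.5, 1.0], error=0.1, eta=0.04)
print(W_new)
```

Both output neurons receive the same increment for a given hidden neuron, because a single error e(t) drives the update.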

The recurrent synaptic weights $W d_{j}$ related to the gradient descent are updated using:

\begin{array}{l} \frac{\partial E}{\partial W d_{j}} = \frac{\partial E}{\partial e (t)} \frac{\partial e (t)}{\partial y_{r} (t)} \frac{\partial y_{r} (t)}{\partial X_{j} (t)} \frac{\partial X_{j} (t)}{\partial s_{j} (t)} \frac{\partial s_{j} (t)}{\partial W d_{j}} \\ = - e (t) X_{j} (t) W_{j k} δ_{j} (t) \end{array}

(34)

By substituting (34) in (31) the update of $W d_{j}$ is obtained. Therefore:

W d_{j} (t) = W d_{j} (t - 1) + η e (t) X_{j} (t) W_{j k} (t) δ_{j} (t)

(35)

Gradient descent for $W_{i j}$ is expressed as:

\begin{array}{l} \frac{\partial E}{\partial W_{i j}} = \frac{\partial E}{\partial e (t)} \frac{\partial e (t)}{\partial μ_{k} (t)} \frac{\partial μ_{k}}{\partial X_{j}} \frac{\partial X_{j}}{\partial s_{j}} \frac{\partial s_{j}}{\partial W_{i j}} \\ = - e (t) W_{j k} ρ_{i j} (t) \end{array}

(36)

Finally, the update of $W_{i j}$ is defined by:

W_{i j} (t) = W_{i j} (t - 1) + η e (t) W_{j k} ρ_{i j} (t) .

(37)

The response of θ, the variable to be controlled, after applying the ANN algorithm is depicted in Fig. 8. The initial conditions are: $θ = 0.053 \, \mathrm{rad}$ , $\dot{θ} = 0.07 \, \mathrm{rad/s}$ and $η = 0.04$ .

Figure 8.

Simulation response of $θ (t)$ and $φ (t)$ of the WIP robot

The main objective of a control loop for a WIP system is to maintain a vertical position ( $θ (t) = 0 \forall t$ ), regardless of all external and internal conditions. The negative growth of θ in Fig. 8 suggests that the control strategy has no effect on the behaviour of the system.

After carrying out a strict analysis of the performance of the ANN, compensation of the error produced by θ is observed. However, when the error sign changes from positive to negative, the compensation of the control loop is too slow, because the learning factor is too small. In other words, when a sudden change in θ occurs, the network fails to adapt and hence, is unable to control the system.

3.1. Extended delta-bar-delta algorithm

The delta-bar-delta algorithm is a heuristic approach that can be used to improve the convergence speed of the weights in ANNs. The weights are updated by:

W (t + 1) = W (t) + α (t) δ (t)

(38)

where $α (t)$ is the learning coefficient assigned to each connection and $δ (t)$ is the gradient component of the weight change. Here, $δ (t)$ is employed to heuristically implement decrements or increments of the learning coefficients for each connection. The weighted average $\bar{δ} (t)$ is formulated as follows:

\bar{δ} (t) = (1 - θ) δ (t) + θ \bar{δ} (t - 1)

(39)

where θ is the convex weighting factor. The learning coefficient change is given as:

Δ α (t) = \begin{cases} ν & \text{if } \bar{δ} (t - 1) δ (t) > 0 \\ - ϕ α (t) & \text{if } \bar{δ} (t - 1) δ (t) < 0 \\ 0 & \text{otherwise} \end{cases}

(40)
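The rule in (40) can be sketched for a single connection as follows. The function name and the numeric values in the example call are hypothetical, and the running average is assumed to take the usual recursive form, in which the previous average $\bar{δ} (t - 1)$ is blended with the current gradient component $δ (t)$.

```python
def dbd_step(alpha, delta, delta_bar_prev, nu, phi, theta):
    """One delta-bar-delta update of a single learning coefficient.

    alpha          : current learning coefficient of the connection
    delta          : current gradient component delta(t)
    delta_bar_prev : running average delta_bar(t-1)
    nu, phi        : constant increment and decrement factors
    theta          : convex weighting factor of the running average
    Returns (new_alpha, new_delta_bar).
    """
    product = delta_bar_prev * delta
    if product > 0:
        # Same sign as the recent average: increase additively.
        d_alpha = nu
    elif product < 0:
        # Sign change: decrease in proportion to the current value.
        d_alpha = -phi * alpha
    else:
        d_alpha = 0.0
    delta_bar = (1.0 - theta) * delta + theta * delta_bar_prev
    return alpha + d_alpha, delta_bar


print(dbd_step(0.1, 0.5, 0.2, nu=0.01, phi=0.4, theta=0.5))
print(dbd_step(0.1, -0.5, 0.2, nu=0.01, phi=0.4, theta=0.5))
```

The first call keeps the gradient sign and raises the coefficient by ν, while the second call flips the sign and cuts the coefficient by the fraction φ.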

where ν is the constant learning coefficient increment factor and ϕ is the constant learning coefficient decrement factor. However, knowing how to choose the heuristic parameters is not a straightforward task. Therefore, implementing this algorithm online is not feasible [16].

The extended delta-bar-delta (EDBD) algorithm is an extension of the delta-bar-delta (DBD) algorithm. In the EDBD, the changes in weights are calculated from:

Δ w (t + 1) = α (t) δ (t) + μ (t) Δ w (t)

(41)

and the weights are then found as:

w (t + 1) = w (t) + Δ w (t)

(42)

In Eq. 41, $α (t)$ and $μ (t)$ are the learning and momentum coefficients, respectively. The learning coefficient change is given as:

Δ α (t) = \begin{cases} ν_{α} \exp (- γ_{α} | \bar{δ} (t) |) & \text{if } \bar{δ} (t - 1) δ (t) > 0 \\ - ϕ_{α} α (t) & \text{if } \bar{δ} (t - 1) δ (t) < 0 \\ 0 & \text{otherwise} \end{cases}

(43)

where ν_α is the constant learning coefficient scale factor, $\exp$ is the exponential function, φ_α is the constant learning coefficient decrement factor and γ_α is the constant learning coefficient exponential factor. In order to prevent oscillations in the weight space, ceilings are placed on the individual learning and momentum coefficients.

An extended delta-bar-delta algorithm is then used to solve the adaptation issue of the ANN. The algorithm adds a learning factor for each synaptic weight, allowing for an adaptation when the control variable presents a sign change. The simple DBD algorithm provides a suitable speed-up. However, this presents some disadvantages. For instance, when using momentum alongside DBD, the algorithm may diverge dramatically. When k is small, the learning rate may increase. Therefore, when the algorithm decreases exponentially, it may not be able to handle sudden sign changes. The EDBD algorithm is implemented by adding (31) and the momentum terms [17]. The proposed equation is then:

W_{i j} (t) = W_{i j} (t - 1) - η_{i j} (t) \frac{\partial E (t)}{\partial W_{i j} (t)} + μ_{i j} (t) Δ W_{i j} (t - 1)

(44)

where i is the neuron number in the previous layer and j is the neuron number in the consecutive layer. Hence,

η_{i j} (t) = \min [η_{\max}, η_{i j} (t - 1) + Δ η_{i j} (t - 1)]

μ_{i j} (t) = \min [μ_{\max}, μ_{i j} (t - 1) + Δ μ_{i j} (t - 1)]

Parameters $η_{\max}$ and $μ_{\max}$ represent the limit values of $η_{i j} (t)$ and $μ_{i j} (t)$ . As long as said variables are below these limits, they are defined by $η_{i j} (t - 1) + Δ η_{i j} (t - 1)$ and $μ_{i j} (t - 1) + Δ μ_{i j} (t - 1)$ , where

Δ η_{i j} (t) = \begin{cases} κ_{l} \exp (- γ_{l} | {\bar{δ}}_{i j} (t) |) & \text{if } {\bar{δ}}_{i j} (t - 1) δ_{i j} (t) > 0 \\ - ϕ_{l} η_{i j} (t) & \text{if } {\bar{δ}}_{i j} (t - 1) δ_{i j} (t) < 0 \\ 0 & \text{otherwise} \end{cases}

Δ μ_{i j} (t) = \begin{cases} κ_{m} \exp (- γ_{m} | {\bar{δ}}_{i j} (t) |) & \text{if } {\bar{δ}}_{i j} (t - 1) δ_{i j} (t) > 0 \\ - ϕ_{m} μ_{i j} (t) & \text{if } {\bar{δ}}_{i j} (t - 1) δ_{i j} (t) < 0 \\ 0 & \text{otherwise} \end{cases}

κ_l, φ_l, γ_l, κ_m, φ_m and γ_m are parameters defined by the control designer.
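The EDBD update for a single synaptic weight can be sketched as below. Several points are assumptions made for the example rather than details confirmed by the text: the gradient component is taken as δ = −∂E/∂W, the momentum term multiplies the previous weight change as in (41), the running average uses the recursive form with weighting factor θ_δ, and the default values are the ones listed for the $W_{i j}$ weights in Table 2. The function names are hypothetical.

```python
import math


def coefficient_increment(coef, delta, delta_bar, delta_bar_prev,
                          kappa, gamma, phi):
    """Increment of a learning or momentum coefficient (piecewise EDBD rule)."""
    product = delta_bar_prev * delta
    if product > 0:
        # Consistent gradient sign: exponentially scaled increase.
        return kappa * math.exp(-gamma * abs(delta_bar))
    if product < 0:
        # Sign change: proportional decrease.
        return -phi * coef
    return 0.0


def edbd_step(w, dw_prev, grad, eta, mu, delta_bar_prev,
              kappa_l=1.35, gamma_l=80.0, phi_l=0.4,
              kappa_m=1.35, gamma_m=80.0, phi_m=0.4,
              eta_max=7.0, mu_max=7.0, theta_delta=0.5):
    """One EDBD update of a single weight.

    Returns (w_new, dw, eta_next, mu_next, delta_bar).
    """
    delta = -grad  # assumed sign convention: descent direction
    delta_bar = (1.0 - theta_delta) * delta + theta_delta * delta_bar_prev
    # Weight change with momentum, as in (41).
    dw = eta * delta + mu * dw_prev
    w_new = w + dw
    d_eta = coefficient_increment(eta, delta, delta_bar, delta_bar_prev,
                                  kappa_l, gamma_l, phi_l)
    d_mu = coefficient_increment(mu, delta, delta_bar, delta_bar_prev,
                                 kappa_m, gamma_m, phi_m)
    # Ceilings keep both coefficients bounded.
    eta_next = min(eta_max, eta + d_eta)
    mu_next = min(mu_max, mu + d_mu)
    return w_new, dw, eta_next, mu_next, delta_bar


print(edbd_step(w=0.1, dw_prev=0.0, grad=-0.01, eta=0.04, mu=0.0,
                delta_bar_prev=0.02))
```

Each synaptic weight carries its own `eta`, `mu`, `dw_prev` and `delta_bar_prev`, so in a full controller this step would be applied element by element to $W_{i j}$ , $W d_{j}$ and $W_{j k}$ with their respective parameter sets.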

4. Controller Simulation with a Dynamic Model

A set of initial values for the system parameters was defined and then tuned heuristically. Table 2 summarizes the resulting parameters.

Table 2.

Parameters of the EDBD algorithm for the ANN

Parameter	Description	Value
$κ_{l_{j k}}$	Required to calculate $η_{j k}$	1.35
$γ_{l_{j k}}$	Required to calculate $η_{j k}$	85
$φ_{l_{j k}}$	Required to calculate $η_{j k}$	0.4
$κ_{l_{i j}}$	Required to calculate $η_{i j}$	1.35
$γ_{l_{i j}}$	Required to calculate $η_{i j}$	80
$φ_{l_{i j}}$	Required to calculate $η_{i j}$	0.4
$κ_{l_{d}}$	Required to calculate η_d	0.7
$γ_{l_{d}}$	Required to calculate η_d	95
$φ_{l_{d}}$	Required to calculate η_d	0.4
$κ_{m_{j k}}$	Required to calculate $μ_{j k}$	1.35
$γ_{m_{j k}}$	Required to calculate $μ_{j k}$	80
$φ_{m_{j k}}$	Required to calculate $μ_{j k}$	0.4
$κ_{m_{i j}}$	Required to calculate $μ_{i j}$	1.35
$γ_{m_{i j}}$	Required to calculate $μ_{i j}$	80
$φ_{m_{i j}}$	Required to calculate $μ_{i j}$	0.4
$κ_{m_{d}}$	Required to calculate μ_d	1.35
$γ_{m_{d}}$	Required to calculate μ_d	80
$φ_{m_{d}}$	Required to calculate μ_d	0.4
$η_{\max_{j k}}$	Maximum value of $η_{j k}$	7.0
$μ_{\max_{j k}}$	Maximum value of $μ_{j k}$	7.0
$η_{\max_{i j}}$	Maximum value of $η_{i j}$	7.0
$μ_{\max_{i j}}$	Maximum value of $μ_{i j}$	7.0
$η_{\max_{d}}$	Maximum value of η_d	7.0
$μ_{\max_{d}}$	Maximum value of μ_d	7.0
θ_δ	Required to calculate δ	0.5

Synaptic weights $W_{i j}$ , W_d and $W_{j k}$ are randomly initialized at $0.1$ and $- 0.1$ , θ is set at $0.087 \, \mathrm{rad}$ and θ̇ is set at $0.07 \, \mathrm{rad/s}$ .

The simulation results in Fig. 9 show that the controller is capable of maintaining the value of θ between $- 0.03$ and $0.03 \, \mathrm{rad}$ . Rotation φ is retained at zero at all times; since the displacements that keep the vertical position of the robot are linear, no rotation is experienced by the system. Fig. 10 shows an oscillatory displacement d from 0 to $0.055 \, \mathrm{m}$ .

Figure 9.

Simulation response of the ANN controller for θ and φ

Figure 10.

System displacement response of the ANN controller

The control output signal is the same for both output neurons because there is only one variable to control and as a consequence, only one error parameter $e (t)$ is defined. The simulation depicted in Fig. 11, shows a significant variation of μ_k generated by the speed of the learning algorithm. Even when such variation is taken into account, the ANN is capable of controlling the self-balancing vehicle.

Figure 11.

Torque applied to the system

Fig. 12 shows the performance of the ANN during the first second; the average torque value is within $- 0.25$ and $0.25 \, \mathrm{N \, m}$ and, considering both outputs of the network, the torque required for keeping the WIP system in a vertical position is within $- 0.5$ and $0.5 \, \mathrm{N \, m}$ .

Figure 12.

Applied torque to the system, first second

Several simulation runs were carried out, changing the initial value of θ from $0.017$ to $0.0873 \, \mathrm{rad}$ with a fixed value of $\dot{θ} = 0.07 \, \mathrm{rad/s}$ . The ANN controller was able to maintain the vertical position of the system within $- 0.03 \, \mathrm{rad} \leq θ \leq 0.03 \, \mathrm{rad}$ . A photograph of the self-balancing vehicle is shown in Fig. 13.

Figure 13.

Photograph of the self-balancing vehicle

5. Conclusions

A control strategy that does not require the dynamic model of a WIP system was proposed. The reported control strategy focuses on a small variation range for the variable to be controlled. This control strategy can be combined with other strategies to work within a wider range. Whereas conventional PID-based controllers may experience a small but noticeable oscillation around the desired set point, the simulations showed that external disturbances can be successfully compensated for by using a dynamic learning factor for the synaptic weights. This dynamic factor significantly improves the adaptation process at each iteration of the ANN controller. This improvement was possible through the addition of the EDBD algorithm.

6. Acknowledgements

The authors acknowledge the financial support provided by the Mexican National Council for Scientific and Technological Development and the Department of Applied Research at the Centre for Engineering and Industrial Development. Additionally, this work was partially funded by the CONACYT grant 163660.

References

1. Jian-Xin, Zhao-Qin, Tong Heng (2013). Design and Implementation of a Takagi-Sugeno-Type Fuzzy Logic Controller on a Two-Wheeled Mobile Robot. IEEE Transactions on Industrial Electronics, IEEE, 60:5717–5728.

2. Fukushima, Muro, Matsuno (2014). Sliding-Mode Control for Transformation to an Inverted Pendulum Mode of a Mobile Robot With Wheel-Arms. IEEE Transactions on Industrial Electronics, IEEE, 62(7):4257–4266.

3. Uyanik, Morgul, Saranli (2015). Experimental Validation of a Feed-Forward Predictor for the Spring-Loaded Inverted Pendulum Template. IEEE Transactions on Robotics, IEEE, 31(1):208–216.

4. Zhijun, Chenguang (2012). Neural-Adaptive Output Feedback Control of a Class of Transportation Vehicles Based on Wheeled Inverted Pendulum Models. IEEE Transactions on Control Systems Technology, IEEE, 20:1583–1591.

5. Segway Inc. (2001). http://www.segway.com/about-segway/segway-milestones.php. Segway Inc., New Hampshire. Accessed on 15 Jun 2013.

6. Ching-Chih, Hsu-Chih, Shui-Chun (2010). Adaptive Neural Network Control of a Self-Balancing Two-Wheeled Scooter. IEEE Transactions on Industrial Electronics, IEEE, 57:1420–1428.

7. Kaustubh, Juame, Sunil (2005). Velocity and Position Control of a Wheeled Inverted Pendulum by Partial Feedback Linearization. IEEE Transactions on Robotics, IEEE, 21:505–513.

8. Grasser, D'Arrigo, Colombi, Rufer A.C. (2002). JOE: A Mobile, Inverted Pendulum. IEEE Transactions on Industrial Electronics, IEEE, 49:107–114.

9. Zhijun, Chenguang (2013). Advanced Control of Wheeled Inverted Pendulum Systems. Springer-Verlag, London.

10. Flórez-López (2008). Redes Neuronales Artificiales. Fundamentos teóricos y aplicaciones prácticas. Netbiblo, SL.

11. Kelly, Santibáñez (2003). Robótica Automática. Control de movimiento de robots manipuladores. Pearson Educación, Madrid.

12. Chenxi, Tao, Kui (2013). Balance Control of Two-wheeled Self-balancing Robot Based on Linear Quadratic Regulator and Neural Network. Fourth International Conference on Intelligent Control and Information Processing (ICICIP), IEEE.

13. Qin, Liu, Zang (2011). Balance Control of Two-Wheeled Self-Balancing Mobile Robot Based on TS Fuzzy Model. The 6th International Forum on Strategic Technology, IEEE.

14. Petrov, Parent (2010). Dynamic Modeling and Adaptive Motion Control of a Two-Wheeled Self-Balancing Vehicle for Personal Transport. Annual Conference on Intelligent Transportation Systems, IEEE.

15. Ching-Chih, Shang-Yu, Shih-Min (2010). Trajectory Tracking of a Self-Balancing Two-Wheeled Robot Using Backstepping Sliding-Mode Control and Fuzzy Basis Function Networks. The 2010 IEEE/RSJ International Conference on Intelligent Robots and Systems, IEEE.

16. Genevieve (1999). Neural Networks Course. Lecture Notes. https://www.willamette.edu/gorr/classes/cs449/Momentum/deltabardelta.html#top. Accessed on 15 Jun 2013.

17. Minai, Williams (1990). Back-propagation Heuristics: A Study of the Extended Delta-Bar-Delta Algorithm. IEEE/INNS International Joint Conference on Neural Networks, IEEE.

18. Barrera-Navarro, Noriega-Ponce (2011). Diseño y simulación de un controlador neuronal multivariable. 4° Congreso Internacional de Ingenierías Mecánica, Eléctrica, Electrónica, Mecatrónica y Computación (CIMEEM2011), 69:449–455.