Distributed parallel computing of the recursive eigenvalue search in the context of transfer matrix method for multibody systems

Abstract

The modeling and solving a transcendental eigenvalue problem are important issues in the transfer matrix method for linear multibody systems. Based on the recursive eigenvalue search algorithm for transfer matrix method for linear multibody system, the distributed parallel approach for assembling overall transfer matrix and searching eigenvalues is proposed. This is achieved based on Message Parallel Interface. The influence of the CPU core number as well as the distributed network environment on the final computational time is analyzed through numerical examples of both a non-uniform beam and a multiple launch rocket system. The results indicate that the computational time is significantly reduced by the proposed parallel computing method, so that the computational efficiency on optimization and design of complex multibody systems can be improved.

Keywords

Multibody system dynamics transfer matrix method for multibody systems eigenvalue distributed parallel computing Message Parallel Interface

Introduction

The demand for multibody system dynamics (MSD) computation is increasing in various fields, such as vehicles, ships, aeronautics, and astronautics weapons, while the MSD models become more and more complicated.¹ Typically, the solution of a complex multibody system will cost a plenty of time, thus the conventional static design pattern can hardly meet the requirement of dynamics design. In light of such concern, the computational speed needs to be accelerated. Although this can be achieved by raising the level of hardware, the computational speed of a single-core CPU is limited due to manufacturing process,² thus resulting in the emergence of the multicore CPU handling a problem simultaneously. At the same time, the computer cluster system can be constructed using relatively inexpensive computers connected via the high-speed local area network (LAN), which is more cost-effective compared with supercomputers.³ These provide the foundation for the software technology such as the parallelization of numerical algorithms.⁴ To shorten the computing time, many commercial and open source dynamics software including ADAMS, ANSYS, MBDyn, and so on have introduced the idea of parallel computing. A lot of research work in parallel computing^4–9 is also investigated by scholars. D Negrut et al.⁵ concentrated on the use of commodity parallel computing in many-body frictional-contact dynamics, fluid–solid interaction analysis, and proximity computation. F González et al.⁶ evaluated two non-intrusive parallelization techniques for MSD, namely, parallel sparse linear equation solvers and OpenMP. OA Bauchau⁷ used domain decomposition techniques to achieve parallel processing of finite element method for flexible multibody systems.

The transfer matrix method for multibody systems (MSTMM) is a rather novel method for MSD. Compared with ordinary dynamics methods, MSTMM has the following characteristics: no global dynamics equation of the system is required, low matrix order is involved, and fast computing speed and highly stylized programming are achieved.¹⁰ Its basic idea is to (1) break up the whole system into several elements, (2) use a matrix representing the mechanical properties of each element, (3) assemble these matrices by applying the automatic deduction theorem of the overall transfer equation of multibody systems,¹¹ and (4) solve the overall transfer equation of the system.

The solution of the overall transfer equation is one of a key technique in MSTMM. D Bestle et al.¹² proposed a recursive eigenvalue search algorithm for the transfer matrix method for linear multibody systems (linear MSTMM). Although this algorithm has a linear convergence rate, as is the same with those strategies based on sign change, it can solve both real and complex eigenvalue problems of a system and even find double roots. Moreover, this algorithm is more competitive and efficient in choosing step size than other strategies based on sign change.

To further improve the computational efficiency, a parallel algorithm for assembling the overall transfer matrix as well as recursively searching the eigenvalue is proposed in this article, and the distributed parallel computing environment based on the Message Parallel Interface (MPI) is built. A damped non-uniform beam with complex eigenvalues and a multiple launch rocket system (MLRS) with real eigenvalues are taken as numerical examples to verify the proposed method. This article provides a feasible way to further improve the computational speed of linear MSTMM.

Basic idea of linear MSTMM

In the inertial coordinate system, r , $θ$ , m , and q are used to represent the linear displacement, angular displacement, internal moment, and internal force, respectively, of one point fixed on an element vibrating in space relative to its equilibrium position, where $r = [\begin{matrix} x & y & z \end{matrix}]^{T}$ , $θ = [\begin{matrix} θ_{x} & θ_{y} & θ_{z} \end{matrix}]^{T}$ , $m = [\begin{matrix} m_{x} & m_{y} & m_{z} \end{matrix}]^{T}$ , and $q = [\begin{matrix} q_{x} & q_{y} & q_{z} \end{matrix}]^{T}$ . The state vector of the connection point is defined as $z = [\begin{matrix} r^{T} & θ^{T} & m^{T} & q^{T} \end{matrix}]^{T}$ under physical coordinates or $Z = [\begin{matrix} R^{T} & Θ^{T} & M^{T} & Q^{T} \end{matrix}]^{T}$ under modal coordinates by introducing modal transformation $z = Z e^{i ω t}$ , where $R = [\begin{matrix} X & Y & Z \end{matrix}]^{T}$ , $Θ = [\begin{matrix} Θ_{x} & Θ_{y} & Θ_{z} \end{matrix}]^{T}$ , $M = [\begin{matrix} M_{x} & M_{y} & M_{z} \end{matrix}]^{T}$ , and $Q = [\begin{matrix} Q_{x} & Q_{y} & Q_{z} \end{matrix}]^{T}$ are the modal coordinates arrays corresponding to r , $θ$ , m , and q , and $ω$ is the natural frequency. Here, $Z_{j, I}$ and $Z_{j, O}$ are used to represent the state vectors of the input end I and output end O on element j, respectively. $Z_{j, I}$ and $Z_{j, O}$ possess the relationship $Z_{j, O} = U_{j} Z_{j, I}$ , where $U_{j}$ is the transfer matrix of element j. Specially, if the input or output end of an element j is a boundary end, the state vector will be denoted as $Z_{j, 0}$ , where “0” represents the boundary. Rui et al.¹⁰ provide a library for the transfer matrices for a variety of typical elements.

For clarity, an example with a one-degree-of-freedom (1-DOF) lumped mass connected with a rod is shown in Figure 1.

Figure 1.

The 1-DOF system with a rod and a lumped mass.

The rod with length l and the lumped mass with mass m are numbered as 1 and 2, respectively. The state vector of the inner connection point between elements 1 and 2 is defined as $Z_{2, 1} = [\begin{matrix} X & Q \end{matrix}]_{2, 1}^{T}$ , where X and Q denote the displacement and internal force related to this point. Analogously, the boundary state vector of the fixed end of the rod and the free end of the lumped mass are denoted as $Z_{1, 0} = [\begin{matrix} 0 & Q \end{matrix}]_{1, 0}^{T}$ and $Z_{2, 0} = [\begin{matrix} X & 0 \end{matrix}]_{2, 0}^{T}$ , respectively, after taking the boundary condition into consideration. The transfer matrices of the rod and the lumped mass take the form¹⁰

U_{1} = [\begin{matrix} \cos β l & - \frac{\sin β l}{β E A} \\ β E A \sin β l & \cos β l \end{matrix}], U_{2} = [\begin{matrix} 1 & 0 \\ m ω^{2} & 1 \end{matrix}]

(1)

where $β = \sqrt{ρ ω^{2} / E}$ , and $ρ$ , E, and A represent the density, Young’s modulus, and sectional area of the rod, respectively.

Then the transfer equation of the system can be obtained as

Z_{2, 0} = U_{2} Z_{2, 1} = U_{2} U_{1} Z_{1, 0} = : U_{all} Z_{1, 0}

(2)

where

U_{all} = U_{2} U_{1} = [\begin{matrix} \cos β l & - \frac{\sin β l}{β E A} \\ m ω^{2} \cos β l + β E A \sin β l & \cos β l - \frac{m ω^{2} \sin β l}{β E A} \end{matrix}]

(3)

is the overall transfer matrix of the system.

The non-trivial solution of equation (2) leads to system characteristic equation as

Δ (ω) = \det {\bar{U}}_{all} = \det U_{all} (2, 2) = | \cos β l - \frac{m ω^{2} \sin β l}{β E A} | = 0

(4)

Then the eigenvalue $ω$ of the system can be acquired by solving equation (4).

For a multibody system in a more generic topology, the overall transfer equation and the overall transfer matrix can be obtained by simply following the automatic deduction theorem of the overall transfer equation for multibody systems¹¹ taking a general form

U_{all} Z_{all} = 0

(5)

where $Z_{all}$ contains the state vectors of all the boundary points and cutting points and $U_{all}$ is the overall transfer matrix of the system.

By eliminating zero elements corresponding to the known system boundary conditions in $Z_{all}$ and removing the corresponding columns in $U_{all}$ , equation (5) can be rewritten as

{\bar{U}}_{all} {\bar{Z}}_{all} = 0

(6)

where the coefficient matrix ${\bar{U}}_{all}$ is a square matrix. For a given linear multibody system with determined dynamics parameters, the value of ${\bar{U}}_{all}$ only depends on the vibration frequencies $ω$ or complex numbers $s = - δ \pm i ω$ summarizing damping and frequency. To satisfy equation (6), the coefficient matrix ${\bar{U}}_{all}$ needs to be singular resulting in the system characteristic equation as

\det {\bar{U}}_{all} = 0

(7)

The eigenvalues of a multibody system can be obtained by solving equation (7), which will be discussed in section “Recursive eigenvalue search algorithm.” It is worth mentioning that the order of the coefficient matrix ${\bar{U}}_{all}$ is very low, which will not cost too much time in computation of determinant $\det {\bar{U}}_{all}$ . From equation (4), it can also be found that the characteristic equation is a nonlinear or even transcendental function w.r.t. the eigenvalue. This indicates some popular eigensolvers, such as Lanczos, QR, and Arnoldi, cannot be applied in the context of linear MSTMM where the characteristic equation has no special structure.

Recursive eigenvalue search algorithm

For an undamped system, the solution of equation (7) can be achieved by searching the zeros of the equation $Δ (ω) = \det {\bar{U}}_{all}$ , where $ω$ is the real eigenvalues of the system. Common zero search strategy such as the bisection method is typically based on the sign change of $Δ (ω)$ , however, flawed to search double roots. And for a damped system, solving equation (7) indicates to solve $Δ (s) = 0$ , where $s = - δ \pm i ω$ is the complex eigenvalue. Therefore, solution of equation (7) changes to a two-dimensional variable problem $Δ (δ, ω) = 0$ , which cannot be simply solved by search strategies based on sign change.

The basic idea of the recursive eigenvalue search algorithm for linear MSTMM is switching from zero search for $Δ$ to minimization of its absolute value $| Δ |$ , which is equally well applicable to both the real and the complex cases. It is much easier to recognize craters in $| Δ |$ than sign changes in $Δ$ as conical breaking-ins of $| Δ (ω) |$ or $| Δ (s) |$ only happen in the neighborhood of singularities of the overall transfer matrix.¹² Here, the real eigenvalue search is taken as an example to describe the basic process of this algorithm. For a one-dimensional variable continuous function $y = f (x)$ , x represents $ω$ , and f represents $| Δ |$ . N sample points $x_{i} (i = 1, 2, \dots, N)$ are chosen on the search region $[x_{1}, x_{N}]$ of interest, and the corresponding function value $f (x_{i})$ is denoted by $f_{i} (i = 1, 2, \dots, N)$ . If $f_{i} < f_{i - 1}$ and $f_{i} < f_{i + 1}$ , there must be a local minimum in the interval $[x_{i - 1}, x_{i + 1}]$ . All such intervals may be considered as interesting regions. Furthermore, N sample points on each region are chosen again for more detailed scanning to get more accurate local minimum. When the precision reaches the requirement, the resulting local minimum could be considered as the solution of $f (x) = 0$ . During programming, the recursive method is used to achieve the above process, as it is named recursive eigenvalue search algorithm. Moreover, to ease the algorithm, and to reduce the comparison times while not missing the potential candidates, the local minimum condition is weakened as $f_{i} < f_{i - 1}$ and $f_{i} \leq f_{i + 1}$ . The logical array L_i is defined as

L_{i} = {\begin{matrix} 1 & f_{i} \leq f_{i + 1} \\ 0 & else \end{matrix}

(8)

where 1 means true, while 0 means false. Consequently, the local minimum conditions $f_{i} < f_{i - 1}$ and $f_{i} \leq f_{i + 1}$ can be rewritten as $M_{i} = \neg L_{i - 1} \land L_{i}$ shown in Figure 2. There “¬” and “∧” means logical operation “NOT” and “AND,” respectively.

Figure 2.

Logical arrays of minimization strategy.

For each iteration in the one-dimensional minimum search in Bestle et al.,¹² a user-defined resolution $ε$ corresponds to the necessary number $N_{ε}$ of sample points required for an interval of interest, whereas the user-provided sample size N is used on such region of interest. The iteration will terminate if $N_{ε}$ on this region of interest is no greater than N, indicating that resolution $ε$ is achieved within this iteration. The algorithm recursively sweeps through all regions with potential minima till all local minima are found. As for two-dimensional minimum search, the scanning is done on a regular grid and the function comparison is evaluated in both the x- and y-directions. In correspondence, the user may provide different sampling numbers $N_{x}$ , $N_{y}$ and precision tolerances dx, dy in two search directions, and the algorithm stops only if both resolutions are achieved.

As discussed in the preceding section, the recursive eigenvalue search algorithm is more competitive in choosing step size than some strategies based on sign change. The following numerical example will provide some insights. If the number of rods and lumped masses in Figure 1 increases to 10, respectively, the system will become as shown in Figure 3, where the masses of the lumped masses are denoted as $m_{2}, m_{4}, \dots, m_{20}$ .

Figure 3.

The 1-DOF system with 10 rods and 10 lumped masses.

The overall transfer equation of the system can be obtained as

Z_{20, 0} = U_{20} U_{19}, \dots, U_{2} U_{1} Z_{1, 0} = U_{all} Z_{1, 0}

(9)

with the characteristic equation

Δ (ω) = det {\bar{U}}_{all} = det U_{all} (2, 2) = 0

(10)

The variation trend of $Δ (ω)$ w.r.t. $ω$ is shown in Figure 4 for aluminum rods with $ρ = 2700 kg / m^{3}$ , $E = 0.69 \times 10^{11} N / m^{2}$ , $l = 1 m$ , $A = 10^{- 4} m^{2}$ , and the lumped masses being $m_{2} = m_{4} = \dots m_{20} = 0.1 kg$ , respectively. The following transformation is made when the figure is plotted

\bar{Δ} (ω) = {\begin{matrix} Δ (ω) & for | Δ (ω) | \leq 1 \\ sign [Δ (ω)] [1 + \log_{10} | Δ (ω) |] & for | Δ (ω) | > 1 \end{matrix}

(11)

Figure 4.

Variation in $\bar{Δ} (ω)$ w.r.t. $ω$ with same lumped masses.

If one sets the sampling size $N = 50$ and the absolute resolution $ε = 10^{- 6}$ on the region $[0, 10^{4}]$ , the eigenvalues of the system will be captured by both recursive eigenvalue search algorithm and the bisection method as depicted in Figure 4, where zeros are well distributed and isolated with a clear sign change. In this case, both methods work well.

If the lumped masses become different from each other with different mass values shown in Table 1, the variation trend of $Δ (ω)$ w.r.t. $ω$ will become as shown in Figure 5, where there exists two roots in a narrow region. This makes the search algorithm based on sign change fail to find these roots without a high amount of enumeration effort. In contrast, the recursive eigenvalue search algorithm still works well with the same sampling size $N = 50$ , which proves that the recursive search algorithm has good robustness in choosing sample points.

Table 1.

Values of lumped masses.

Lumped masses	m ₂	m ₄	m ₆	m ₈	m ₁₀	m ₁₂	m ₁₄	m ₁₆	m ₁₈	m ₂₀
Mass (kg)	0.25	0.093	0.55	1.00	0.50	0.50	0.60	0.15	0.15	1.00

Figure 5.

Variation in $\bar{Δ} (ω)$ w.r.t. $ω$ with different lumped masses.

Principle and realization of distributed parallel computing technique

The basic idea of parallel computing is to separate a task into several parts and every part is solved by an individual processor parallel to reduce the computational time.⁴ Furthermore, the distributed parallel computing technique not only divides the problem into parts but also assigns tasks to multiple computers interconnected by network so that the computers solve the tasks simultaneously. Then each computer uploads the results to the host to get a final result.¹³

Construction of distributed parallel computing environment

As shown in Figure 6, to construct an MPI distributed parallel environment is to connect computers to the same LAN via Ethernet switches or other network equipment. In this article, four computers connected by a 100 Mb/s Ethernet switch are used to set up the distributed parallel computing environment. Each computer has an i7 quad-core processor with the frequency of 3.06 GHz. The parallel software used is MPICH2.

Figure 6.

Distributed parallel computing environment.

Parallel computing algorithm for linear MSTMM

The master–slave mode is used as task allocation mechanism in this article. The master process is mainly responsible for the global division of tasks, which sends tasks to the slave processes and receives the results from the slaves. Meanwhile, the slave processes are responsible for computing the tasks assigned by the master process and returning the computational results to the master process. The major task of the eigenvalue computation using linear MSTMM is to assemble the overall transfer matrix and solving the characteristic equation. Thereby, the parallel algorithms can be designed in these two aspects.

Parallel computing algorithm for assembling the overall transfer matrix

Based on the automatic deduction theorem of the overall transfer equation of multibody systems, the assembly of the overall transfer matrix is a reassembling process after multiplying transfer matrices of each element in proper sequence according to the system topology.

As a chain system shown in Figure 7, the overall transfer matrix can be written as $Z_{1, 0} = Z U_{all} Z_{n, 0}$ , where $U_{all} = U_{1} U_{2} U_{3} \dots U_{n}$ . The successive multiplication of n matrices can be distributed to m processes concurrently, including one master process numbered as 0 and m − 1 slave processes numbered from 1 to m − 1. As shown in Table 2, the multiplication of the first p (p = ceil(n/m)) matrices is allocated to the master process 0. In a similar manner, the multiplications of each follow-up p matrices are allocated to the slave processes from 1 to m − 2, respectively. Eventually, the remaining multiplications of n − (m − 1)p matrices are allocated to the last slave process m − 1. The slave processes send each computing result ${\bar{U}}_{i} (i = 1, 2, \dots, m - 1)$ to the master process. Then, the overall transfer matrix $U_{all} = {\bar{U}}_{0} {\bar{U}}_{1} \dots {\bar{U}}_{m - 1}$ can be obtained by the master process.

Figure 7.

Topology figure of a chain system.

Table 2.

Matrix multiplication assignment.

Process	Assignment
Master (process 0)	${\bar{U}}_{0} = U_{1} \dots U_{p}$
Slave (process 1)	${\bar{U}}_{1} = U_{p + 1} \dots U_{2 p}$
…	…
Slave (process m − 2)	${\bar{U}}_{m - 2} = U_{(m - 2) p + 1} \dots U_{(m - 1) p}$
Slave (process m − 1)	${\bar{U}}_{m - 1} = U_{(m - 1) p + 1} \dots U_{n}$

Multibody systems with general topology could eventually be treated as a tree system. Taking a tree system as an example shown in Figure 8, the overall transfer equation can be obtained as $U_{all} Z_{all} = 0$ , where

U_{all} = [\begin{matrix} - I & T_{1 - 18} & T_{2 - 18} & T_{3 - 18} & T_{4 - 18} \\ O & G_{1 - 9} & G_{2 - 9} & O & O \\ O & O & O & G_{3 - 10} & G_{4 - 10} \\ O & G_{1 - 15} & G_{2 - 15} & G_{3 - 15} & G_{4 - 15} \end{matrix}], Z_{all} = [\begin{matrix} Z_{18, 0} \\ Z_{1, 0} \\ Z_{2, 0} \\ Z_{3, 0} \\ Z_{4, 0} \end{matrix}]

(12)

{\begin{matrix} T_{1 - 18} = U_{18} U_{17} U_{16} U_{15} U_{13} U_{12} U_{11} U_{9} U_{5} U_{1} \\ T_{2 - 18} = U_{18} U_{17} U_{16} U_{15} U_{13} U_{12} U_{11} U_{9, 6} U_{6} U_{2} \\ T_{3 - 18} = U_{18} U_{17} U_{16} U_{15, 14} U_{14} U_{10} U_{7} U_{3} \\ T_{4 - 18} = U_{18} U_{17} U_{16} U_{15, 14} U_{14} U_{10, 8} U_{8} U_{4} \end{matrix}

(13)

\begin{matrix} {\begin{matrix} G_{1 - 9} = - H_{9} U_{5} U_{1} \\ G_{2 - 9} = H_{9, 6} U_{6} U_{2} \end{matrix} \\ {\begin{matrix} G_{3 - 10} = - H_{10} U_{7} U_{3} \\ G_{4 - 10} = H_{10, 8} U_{8} U_{4} \end{matrix} \end{matrix}, {\begin{matrix} G_{1 - 15} = - H_{15} U_{13} U_{12} U_{11} U_{9} U_{5} U_{1} \\ G_{2 - 15} = - H_{15} U_{13} U_{12} U_{11} U_{9, 6} U_{6} U_{2} \\ G_{3 - 15} = H_{15, 14} U_{14} U_{10} U_{7} U_{3} \\ G_{4 - 15} = H_{15, 14} U_{14} U_{10, 8} U_{8} U_{4} \end{matrix}

(14)

Figure 8.

Topology figure of a tree system.

The detailed meaning of the above symbols can be found in Rui et al.¹¹ As shown in equations (12)–(14), the overall transfer matrix $U_{all}$ can be obtained by computing n sub-matrices T and G , which could be generally denoted by $M_{i} (i = 1, 2, \dots, n)$ . $M_{i}$ can be allocated to m processes concurrently. For one master process and m − 1 slave processes, the matrix multiplication assignment is shown in Table 3. The master process and each of the first m − 2 slave processes have to compute p (p = ceil(n/m)) matrices, respectively. At the same time, the remaining n − (m − 1)p matrices are assigned to the last slave process m − 1. Finally, the slave processes send each computing result to the master process, and then the master process assembles $U_{all}$ to obtain the overall transfer matrix.

Table 3.

Matrix computation assignment.

Process	Assignment
Master (process 0)	$M_{i} (i = 1, \dots, p)$
Slave (process 1)	$M_{i} (i = p + 1, \dots, 2 p)$
…	…
Slave (process m − 2)	$M_{i} (i = (m - 2) p + 1, \dots, (m - 1) p)$
Slave (process m − 1)	$M_{i} (i = (m - 1) p + 1, \dots, n)$

Parallel recursive eigenvalue search algorithm

The first step of the eigenvalue search algorithm is to compute the function value $f (x_{i}) (i = 1, 2, \dots, N)$ on the N sample points $x_{i}$ on the region $[x_{1}, x_{N}]$ . This step can be parallelized first. Function evaluations $f (x_{i})$ can be allocated to m processes in parallel. As shown in Table 4, every p (p = ceil(N/m)) sample points are assigned to master process 0 and slave processes 1∼m − 2 to compute, respectively. And N − (m − 1)p sample points are assigned to the slave process m − 1.

Table 4.

Function value computation assignments.

Process	Assignment
Master (process 0)	$f (x_{i}) (i = 1, \dots, p)$
Slave (process 1)	$f (x_{i}) (i = p + 1, \dots, 2 p)$
…	…
Slave (process m − 2)	$f (x_{i}) (i = (m - 2) p + 1, \dots, (m - 1) p)$
Slave (process m − 1)	$f (x_{i}) (i = (m - 1) p + 1, \dots, N)$

After the slave processes return each function value $f (x_{i})$ to the master process, the master process is then able to compute the local minima on the region $[x_{1}, x_{N}]$ according to the algorithm given in section “Recursive eigenvalue search algorithm.”

Assuming there are k local minima $f_{j} (x_{i}) (j = 1, \dots, k)$ and the corresponding subintervals are $[x_{i - 1}, x_{i + 1}]_{j} (j = 1, \dots, k)$ , the final result can be obtained by further iterations on these k subintervals till the required accuracy. The computation on these k subintervals is independent from each other and can also be performed in parallel. These k subintervals can be allocated to m processes. As shown in Table 5, each p (p = ceil(k/m)) subintervals are assigned to the master process 0 and slave processes 1∼m − 2, respectively. And the remaining k − (m − 1)p subintervals are assigned to the slave process m − 1.

Table 5.

Assignments of subintervals.

Process	Assignment
Master (process 0)	$[x_{i - 1}, x_{i + 1}]_{j} (j = 1, \dots, p)$
Slave (process 1)	$[x_{i - 1}, x_{i + 1}]_{j} (j = p + 1, \dots, 2 p)$
…	…
Slave (process m − 2)	$[x_{i - 1}, x_{i + 1}]_{j} (j = (m - 2) p + 1, \dots, (m - 1) p)$
Slave (process m − 1)	$[x_{i - 1}, x_{i + 1}]_{j} (j = (m - 1) p + 1, \dots, k)$

The whole process of the eigenvalue search is shown in Figure 9.

Figure 9.

Flowchart of parallel eigenvalue search.

Example of a non-uniform beam

Figure 10(a) shows a non-uniform beam with length L. More detailed geometry parameters of the cross section are marked in Figure 10(a). The cross-sectional area is A(x), and the sectional moment of inertia is I(x), where

A (x) = A_{0} χ^{2}, I (x) = I_{0} χ^{4}

(15)

A ₀ and I₀ are the cross-sectional area and the sectional moment of inertia at x = 0, respectively, and $χ = x (α - 1) / L + 1$ and $α = b_{L} / b_{0} = h_{L} / h_{0}$ .

Figure 10.

Non-uniform beam model.

The non-uniform beam can be discretized into n segments of uniform beam elements with length $l = L / n$ and height $h_{k} = α_{k} h_{0} (k = 1, 2, \dots, n)$ shown in Figure 10(b), where $α_{k} = (α - 1) k / n + 1$ . The total volume of the beam segments from 1 to k is $V_{k} = A_{0} L_{k} (α_{k}^{2} + α_{k} + 1) / 3$ , where $L_{k} = kL / n$ is the length of the corresponding beam segments. Therefore, the equivalent volume of each uniform beam element is $V_{k}^{e} = V_{k} - V_{k - 1}$ , the equivalent cross-sectional area is $A_{k}^{e} = V_{k}^{e} / l$ , and the equivalent sectional moment of inertia is $I_{k}^{e} = I_{0} (\sqrt{A_{k}^{e} / A_{0}})^{4}$ .

Taking the uniform beam segment as a Euler–Bernoulli beam with free and damped transverse vibration, its dynamics equation can be obtained as¹⁴

\bar{m} \frac{\partial^{2} y}{\partial t^{2}} + η \frac{\partial y}{\partial t} + EI \frac{\partial^{4} y}{\partial x^{4}} = 0

(16)

where $\bar{m} = ρ A$ is the line mass density, and $ρ$ , A, $η$ , E, and I are the density, cross-sectional area, damping coefficient, Young’s modulus, and sectional moment of inertia, respectively.

By applying linear MSTMM, the transfer matrix of the beam can be obtained as

U = [\begin{matrix} S (λ l) & \frac{T (λ l)}{λ} & \frac{U (λ l)}{EI λ^{2}} & \frac{V (λ l)}{EI λ^{3}} \\ λ V (λ l) & S (λ l) & \frac{T (λ l)}{EI λ} & \frac{U (λ l)}{EI λ^{2}} \\ EI λ^{2} U (λ l) & EI λ V (λ l) & S (λ l) & \frac{T (λ l)}{λ} \\ EI λ^{3} T (λ l) & EI λ^{2} U (λ l) & λ V (λ l) & S (λ l) \end{matrix}]

(17)

where $λ = \sqrt[4]{((- (\bar{m} s^{2} + η s)) / (EI))}$ , and $S (x)$ , $T (x)$ , $U (x)$ , and $V (x)$ are the Крылов functions

{\begin{matrix} S (x) = \frac{\cosh x + \cos x}{2}, T (x) = \frac{\sinh x + \sin x}{2} \\ U (x) = \frac{\cosh x - \cos x}{2}, V (x) = \frac{\sinh x - \sin x}{2} \end{matrix}

(18)

If each beam segment from left to right is numbered as 1 to n, respectively, and the transfer direction is defined from left to right, then the overall transfer equation of the non-uniform beam can be acquired as

Z_{n, 0} = U_{n} U_{n - 1} \dots U_{1} Z_{1, 0} = U_{all} Z_{1, 0}

(19)

where $Z_{1, 0}$ and $Z_{n, 0}$ are the boundary state vectors at the leftmost and rightmost sides of the beam, respectively, and $U_{i} (i = 1, 2, \dots, n)$ is the transfer matrix of each beam segment, and

U_{all} = [\begin{matrix} u_{11} & u_{12} & u_{13} & u_{14} \\ u_{21} & u_{22} & u_{23} & u_{24} \\ u_{31} & u_{32} & u_{33} & u_{34} \\ u_{41} & u_{42} & u_{43} & u_{44} \end{matrix}]

(20)

is the overall transfer matrix of the system with the matrix element $u_{ij} (i = 1, 2, 3, 4; j = 1, 2, 3, 4)$ .

By substituting the boundary conditions $Z_{n, 0} = [\begin{matrix} Y & Θ_{z} & 0 & 0 \end{matrix}]_{n, 0}^{T}$ and $Z_{1, 0} = [\begin{matrix} 0 & 0 & M_{z} & Q_{y} \end{matrix}]_{1, 0}^{T}$ into equation (19), the characteristic equation could be obtained by

Δ (s) = | \begin{matrix} u_{33} & u_{34} \\ u_{43} & u_{44} \end{matrix} | = 0

(21)

where $s = - δ + i ω$ is the complex variable.

The proposed method can be applied to solve equation (21). The search accuracy is set to be 10⁻⁴, and the dynamics parameters are given as follows: L = 4 m, b₀ = 0.04 m, h₀ = 0.04 m, b_L = 0.02 m, h_L = 0.02 m, ρ = 2730 kg/m³, I₀ = 2.133 × 10⁻⁷ m⁴, E = 6.9 × 10¹⁰ Pa, and $η = 40 kg / (ms)$ . Figure 11 shows the three-dimensional (3D) plot of $\log_{10} | Δ (s) |$ with the number of segments n = 50. Table 6 shows the computational results of eigenvalue by MSTMM, which is the same as those obtained by direct variable separation method (DVSM) similar to Joshi.¹⁵ The computational time by a single process of MSTMM and DVSM with different numbers of segments is shown in Figure 12, where MSTMM certainly costs less computational time that increases linearly with the number of segments.

Figure 11.

The eigenvalues log₁₀|Δ(s)| of non-uniform beam.

Table 6.

Comparison of the computational results of eigenvalue between MSTMM and DVSM.

Modal order	$- δ + i ω$ (MSTMM)	$- δ + i ω$ (DVSM)
1	−13.1872 + 10.4693i	−13.1872 + 10.4693i
2	−10.6857 + 69.8781i	−10.6857 + 69.8781i
3	−10.0808 + 175.8220i	−10.0808 + 175.8220i
4	−9.8454 + 332.8125i	−9.8454 + 332.8125i
5	−9.7312 + 541.7148i	−9.7312 + 541.7148i
6	−9.6679 + 802.6656i	−9.6679 + 802.6656i
7	−9.6294 + 1115.7216i	−9.6294 + 1115.7216i

MSTMM: transfer matrix method for multibody systems; DVSM: direct variable separation method.

Figure 12.

Comparison of computational time.

Typically, there are three indexes to describe the computational efficiency of parallel computing: computational time t, speedup ratio S_p, and parallel efficiency F_p, where S_p is defined as the ratio of the computational time with a single process (T₁) to that with p processes (T_p), namely, S_p = T₁/T_p and F_p = S_p/p. For a number of segments n = 100, the three indexes obtained by applying the proposed distributed parallel computing algorithm are listed in Table 7. Obviously, the parallel computing can accelerate the computational speed. From scheme no. 2–5 in Table 7, it is known that the computational speed of one PC with dual-core is slightly faster than two PCs with single-core; meanwhile, one PC with quad-core is slightly faster than four PCs all with single-core. This indicates that network delay has an impact on distributed parallel computing. Therefore, good network transmission speed should be ensured when using distributed parallel computing technique. Figure 13 shows how these indexes vary with CPU core numbers. Furthermore, with the increasing number of CPU cores, the computational time t is on the decline, but the trend is leveling off; meanwhile, the S_p increases and F_p decreases. This may be attributed to the increase in the communication overhead between different processes. Although the computation amount of a single process decreases since the number of CPU cores increases, the proportion of communication overhead relative to the computation amount for each process will increase, leading to the decrease in computational efficiency. To clarify, the computation scale is reduced by cutting down the number of segments of the beam to three and reducing the size of sample points. This will result in the preceding three indexes shown in Figure 14 and the proportion of average communication time t_c of each process to computational time t shown in Figure 15, respectively, where t_c only contains the time for each process to receive and send data using the MPI functions MPI_RECV, MPI_SEND, and MPI_BARRIER. From Figures 14 and 15, it can be found that S_p increases slowly and F_p becomes very low when the number of CPU cores rises. This is because the proportion of the communication time relative to the computational time enlarges and cannot be ignored. Under these circumstances, it has less effect to simply increase the number of CPU cores for improving the computational efficiency.

Table 7.

Parallel computational time, speedup ratio, and parallel efficiency.

Scheme no.	Number of process	Computational time t (s)	Speedup ratio S_p	Parallel efficiency F_p
1	1 (Single PC)	20.8462	1.0000	1.0000
2	2 (Single PC)	10.4343	1.9978	0.9989
3	4 (Single PC)	5.2757	3.9514	0.9878
4	1 + 1 (Distributed)	10.4576	1.9934	0.9967
5	1 + 1 + 1 + 1 (Distributed)	5.3131	3.9235	0.9809
6	2 + 2 + 2 + 2 (Distributed)	3.0752	6.7787	0.8473
7	3 + 3 + 3 + 3 (Distributed)	2.3135	9.0106	0.7509
8	4 + 4 + 4 +4 (Distributed)	1.7736	11.7537	0.7346

Figure 13.

Computational time, speedup ratio, and parallel efficiency with different CPU core numbers.

Figure 14.

Computational time, speedup ratio, and parallel efficiency of the reduced computation scale.

Figure 15.

Proportion of average communication time t_c to computational time t.

Engineering example of an MLRS

An MSD model of MLRS with 18 launch tubes is shown in Figure 16. According to the natural properties of each MLRS components, these components can be regarded as dynamics elements including lumped masses, rigid bodies, elastic beams, springs, torsional springs, and so on and are sequentially numbered. The corresponding topology figure of the dynamics model is shown in Figure 17. The body elements labeled 2, 5, 8, 11, 14, and 17 are the wheels; 19 the vehicle body; 21 the revolving part; 23 the pitch part; 27 + 5l and 28 + 5l the segments of the launch tube; 26 + 5l the tail of the launch tube, where l stands for the launch tube in the firing state and $l = 1, 2, \dots, 18$ . The hinge elements are 24 + 5l and 25 + 5l connecting the launch tube to the pitch part; 22 connecting the revolving part to the pitch part; 20 connecting the vehicle body to the revolving part; 3, 6, 9, 12, 15, and 18 connecting the wheels to the vehicle body; 1, 4, 7, 10, 13, and 16 connecting the wheels to the ground. The system has eight boundary points in total, where $Z_{28 + 5 l, 0}$ and $Z_{26 + 5 l, 0}$ are the free ends at the front and tail of the launch tube, respectively; $Z_{1, 0}$ , $Z_{4, 0}$ , $Z_{7, 0}$ , $Z_{10, 0}$ , $Z_{13, 0}$ , and $Z_{16, 0}$ are the connection ends of hinges 1, 4, 7, 10, 13, and 16 connected to the ground. The boundary $Z_{28 + 5 l, 0}$ is considered as the root of the system, and all other boundaries ( $Z_{26 + 5 l, 0}$ , $Z_{1, 0}$ , $Z_{4, 0}$ , $Z_{7, 0}$ , $Z_{10, 0}$ , $Z_{13, 0}$ , and $Z_{16, 0}$ ) are considered as the treetop of the system. The transfer direction of the system is from the treetop to the root as the arrow direction shown in Figure 17.

Figure 16.

MSD model of MLRS.

Figure 17.

Topology figure of MSD model of MLRS.

Based on the topology of the MLRS dynamics model, the overall transfer equation and the overall transfer matrix of the MLRS given by equations (22)–(24) can be obtained by applying the automatic deduction theorem of the overall transfer equation of multibody systems¹¹

U_{all} Z_{all} = 0

(22)

\begin{matrix} Z_{all} = & [Z_{28 + 5 l, 0}^{T}, Z_{1, 0}^{T}, Z_{4, 0}^{T}, Z_{7, 0}^{T}, Z_{10, 0}^{T}, Z_{13, 0}^{T}, \\ Z_{16, 0}^{T}, Z_{26 + 5 l, 0}^{T}, Z_{23, 25 + 5 l}^{T}]^{T} \end{matrix}

(23)

U_{all} = [\begin{matrix} \begin{matrix} - I_{12} \\ O \\ O \\ O \\ O \\ O \\ O \\ O \\ O \end{matrix} & \begin{matrix} T_{1 - 28 + 5 l} \\ G_{1 - 19} \\ G_{1 - 19} \\ G_{1 - 19} \\ G_{1 - 19} \\ G_{1 - 19} \\ G_{1 - 23} \\ G_{1 - 27 + 5 l} \\ G_{1 - 28 + 5 l} \end{matrix} & \begin{matrix} T_{4 - 28 + 5 l} \\ G_{4 - 19} \\ O \\ O \\ O \\ O \\ G_{4 - 23} \\ G_{4 - 27 + 5 l} \\ G_{4 - 28 + 5 l} \end{matrix} & \begin{matrix} T_{7 - 28 + 5 l} \\ O \\ G_{7 - 19} \\ O \\ O \\ O \\ G_{7 - 23} \\ G_{7 - 27 + 5 l} \\ G_{7 - 28 + 5 l} \end{matrix} & \begin{matrix} T_{10 - 28 + 5 l} \\ O \\ O \\ G_{10 - 19} \\ O \\ O \\ G_{10 - 23} \\ G_{10 - 27 + 5 l} \\ G_{10 - 28 + 5 l} \end{matrix} & \begin{matrix} T_{13 - 28 + 5 l} \\ O \\ O \\ O \\ G_{13 - 19} \\ O \\ G_{13 - 23} \\ G_{13 - 27 + 5 l} \\ G_{13 - 28 + 5 l} \end{matrix} & \begin{matrix} T_{16 - 28 + 5 l} \\ O \\ O \\ O \\ O \\ G_{16 - 19} \\ G_{16 - 23} \\ G_{16 - 27 + 5 l} \\ G_{16 - 28 + 5 l} \end{matrix} & \begin{matrix} T_{26 + 5 l - 28 + 5 l} \\ O \\ O \\ O \\ O \\ O \\ O \\ G_{26 + 5 l - 27 + 5 l} \\ G_{26 + 5 l - 28 + 5 l} \end{matrix} & \begin{matrix} T_{23 - 28 + 5 l} C + T_{25 + 5 l - 28 + 5 l} \\ O \\ O \\ O \\ O \\ O \\ G_{23 - 23} C \\ G_{23 - 27 + 5 l} C \\ G_{23 - 28 + 5 l} C + G_{25 + 5 l - 28 + 5 l} \end{matrix} \end{matrix}]

(24)

where $Z_{23, 25 + 5 l}$ means the state vector of the connection point between body 23 and hinge 25 + 5l in the closed loop, $U_{all}$ is the overall transfer matrix, $I_{12}$ is a 12 × 12 identity matrix, $T_{i - 28 + 5 l} (i = 1, 4, 7, 10, 13, 16, 23, 25 + 5 l, 26 + 5 l)$ , and $G_{i - j} (i = 1, 4, 7, 10, 13, 16, 23, 25 + 5 l, 26 + 5 l; j = 19, 23, 27 + 5 l, 28 + 5 l)$ are the coefficient matrices in the main transfer equation and geometry equations of the overall transfer equation of a system, respectively.¹¹ The transfer matrix of each element could be found in Rui et al.^10,16

Eliminating all zeros in $Z_{all}$ corresponding to the known variables in the system boundary conditions, and removing the columns associated with these zeros in $U_{all}$ , equation (22) reduced to

{\bar{U}}_{all} {\bar{Z}}_{all} = 0

(25)

The characteristic equation can be obtained as

\det {\bar{U}}_{all} = 0

(26)

The eigenvalues of the MLRS can be solved by applying the proposed method in this article. Furthermore, the computational time, speedup ratio, and parallel efficiency of the fully loaded MLRS with its first 50 computed eigenvalues by applying the suggested method are shown in Table 8. Moreover, the corresponding curves with different CPU core numbers are shown in Figure 18. Similar to the non-uniform beam, the parallel computing technique substantially increases the computational speed of the eigenvalues of the MLRS.

Table 8.

Parallel computational time, speedup ratio, and parallel efficiency.

Scheme no.	Number of process	Computational time t (s)	Speedup ratio S_p	Parallel efficiency F_p
1	1 (Single PC)	4.2457	1.0000	1.0000
2	2 (Single PC)	2.1295	1.9938	0.9969
3	4 (Single PC)	1.0862	3.9089	0.9772
4	1 + 1 (Distributed)	2.1366	1.9872	0.9936
5	1 + 1 + 1 + 1 (Distributed)	1.0883	3.9012	0.9753
6	2 + 2 + 2 + 2 (Distributed)	0.6562	6.4701	0.8088
7	3 + 3 + 3 + 3 (Distributed)	0.6205	6.8426	0.5702
8	4 + 4 + 4 + 4 (Distributed)	0.5193	8.1755	0.5110

Figure 18.

Computational time, speedup ratio, and parallel efficiency with different CPU core numbers.

During the design period, optimization on the parameters of MLRS such as connecting stiffness among MLRS components is an important step of the dynamics design. The firing interval of the MLRS is 0.6 s, namely, the launching frequency is about 1.67 Hz. The natural frequency of MLRS should keep away from the launching frequency to avoid resonances. According to the existing empirical natural frequency of other MLRSs, taking the first 10 eigenvalues of the fully loaded MLRS as optimization objective, and taking the stiffness among components of MLRS as design variables, the genetic algorithm is applied to optimize the dynamics performance of the MLRS. The optimization flow chart is shown in Figure 19. Figure 20 shows the first 10 eigenvalues of the MLRS varying with the genetic generations during the optimization process. The estimation evolution of the best fitness is shown in Figure 21. The eigenvalues obtained by computation and modal test are shown in Table 9. It can be seen that the computational results are in good agreement with those obtained by the experiment.

Figure 19.

The optimization flow chart of MLRS.

Figure 20.

The eigenvalues of MLRS.

Figure 21.

Convergence of the best fitness.

Table 9.

Comparison between the computational eigenvalues and the experimental values of MLRS.

Modal order (fully loaded)	Computational eigenvalues (Hz)	Experimental eigenvalues (Hz)	Relative error (%)
1	3.00	3.00	0.00
2	3.56	3.73	−4.56
3	5.17	5.23	−1.24
4	5.98	5.74	4.23
5	6.21	6.24	−0.53
6	7.19	6.73	6.83
7	8.77	8.71	0.64
8	9.79	10.60	−7.59
9	11.31	10.81	4.68
10	15.37	14.56	5.56

Table 10 shows the computational time by applying the proposed method and by a single process, respectively, which reveals the value of applying the proposed method to the dynamics optimization of complex multibody systems.

Table 10.

Comparison of computational time.

Number of process	Computational time t (h)
Single process	3.59
4 + 4 + 4 + 4 (Distributed)	0.44

Conclusion

The parallel algorithms for assembling the overall transfer matrix and recursive eigenvalue search are proposed based on MPI parallel computing theory in this article. The eigenvalues of a non-uniform beam and an MLRS are computed using this method. The influence of the CPU core number and the network environment on the computational time is analyzed. This study shows the following points:

The proposed parallel computing algorithm can significantly increase the computational speed to search eigenvalues and efficiently reduce the time cost of the dynamics optimization of complex multibody systems.

The distributed parallel computing environment based on MPI has strong scalability and high performance. It makes it possible to use ordinary computers to construct a computer cluster so that hardware resources can be fully utilized to achieve higher computational efficiency. Moreover, in order to achieve higher computational efficiency, good network environment is essential because the network environment has a certain impact on computational time.

With the increase in the number of CPU cores, the computational efficiency decreases due to the increase in the communication between processes. Therefore, computational time cannot be reduced infinitely by continually increasing the number of CPU cores. Appropriate number of CPU cores should be selected according to the computation scale.

Footnotes

Acknowledgements

The authors owe special thanks to associate professor Bin He, lecturer Wei Zhu, and PhD students Wenbing Tang and Jianshu Zhang for providing important and constructive proposals to this article.

Academic Editor: Mario L Ferrari

Declaration of conflicting interests

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding

The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by Research Fund for the Doctoral Program of Higher Education of China (no. 20133219110037), Defense Industrial Technology Development Program of China (no. A2620133008), and Fundamental Research Funds for the Central Universities (no. 30915011326).

References

Shabana

AA.

Dynamics of multibody system. Cambridge: Cambridge University Press, 2009.

Manferdelli

The many-core inflection point for mass market computer systems. CTWatch Q 2007; 3: 11–17.

Wang

Xin

Wang

How to deploy and apply a high performance cluster computing system. Comput Eng Appl 2001; 32: 157–159.

Negrut

Serban

Mazhar

. Parallel computing in multibody system dynamics: why, when, and how. J Comput Nonlin Dyn 2014; 9: 1–12.

Negrut

Tasora

Mazhar

. Leveraging parallel computing in multibody dynamics. Multibody Syst Dyn 2012; 27: 95–117.

González

Luaces

Lugrís

. Non-intrusive parallelization of multibody system dynamic simulations. Comput Mech 2009; 44: 493–504.

Bauchau

OA.

Parallel computation approaches for flexible multibody dynamics simulations. J Frankl Inst 2010; 347: 53–68.

Xiong

Qin

Ding

. Parallel cell mapping method for global analysis of high-dimensional nonlinear dynamical systems. J Appl Mech 2015; 82: 111010-1–111010-12.

Wang

Jin

A method for large-scale parallel tetrahedral mesh generation. J Vib Shock 2014; 33: 102–107.

10.

Rui

Yun

. Transfer matrix method of multibody system and its applications. Beijing, China: Science Press, 2008.

11.

Rui

Zhang

Zhou

Automatic deduction theorem of overall transfer equation of multibody system. Adv Mech Eng 2014; 2014: 378047-1–378047-12.

12.

Bestle

Abbas

Rui

Recursive eigenvalue search algorithm for transfer matrix method of linear flexible multibody systems. Multibody Syst Dyn 2014; 32: 429–444.

13.

Yang

Rui

Liu

. Study on distributed parallel computing of transfer matrix method for multibody systems. J Vib Eng 2014; 27: 9–15.

14.

Frederico

Matt

Simulation of the transverse vibrations of a cantilever beam with an eccentric tip mass in the axial direction using integral transforms. Appl Math Model 2013; 37: 9338–9354.

15.

Joshi

Free vibration characteristics of variable mass rockets having large axial thrust/acceleration. J Sound Vib 1995; 187: 727–736.

16.

Rui

Wang

. Simulation and test methods of launch dynamics of multiple launch rocket system. Beijing, China: National Defense Industry Press, 2003.