Modeling and solving the transcendental eigenvalue problem are important issues in the transfer matrix method for linear multibody systems. Based on the recursive eigenvalue search algorithm for the transfer matrix method for linear multibody systems, a distributed parallel approach for assembling the overall transfer matrix and searching for eigenvalues is proposed, implemented with the Message Passing Interface. The influence of the number of CPU cores and of the distributed network environment on the computational time is analyzed through numerical examples of a non-uniform beam and a multiple launch rocket system. The results indicate that the proposed parallel computing method significantly reduces the computational time, thereby improving the computational efficiency of the optimization and design of complex multibody systems.
The demand for multibody system dynamics (MSD) computation is increasing in various fields, such as vehicles, ships, aeronautics, astronautics, and weapons, while MSD models become more and more complicated.1 Typically, the solution of a complex multibody system takes a great deal of time, so the conventional static design pattern can hardly meet the requirements of dynamics design. In light of this concern, the computational speed needs to be accelerated. Although this can be achieved by upgrading hardware, the computational speed of a single-core CPU is limited by the manufacturing process,2 which led to the emergence of multicore CPUs that handle a problem simultaneously. At the same time, computer cluster systems can be constructed from relatively inexpensive computers connected via a high-speed local area network (LAN), which is more cost-effective than a supercomputer.3 These developments provide the foundation for software technologies such as the parallelization of numerical algorithms.4 To shorten the computing time, many commercial and open-source dynamics software packages, including ADAMS, ANSYS, and MBDyn, have introduced parallel computing. Much research on parallel computing4–9 has also been carried out. D Negrut et al.5 concentrated on the use of commodity parallel computing in many-body frictional-contact dynamics, fluid–solid interaction analysis, and proximity computation. F González et al.6 evaluated two non-intrusive parallelization techniques for MSD, namely, parallel sparse linear equation solvers and OpenMP. OA Bauchau7 used domain decomposition techniques to achieve parallel processing of the finite element method for flexible multibody systems.
The transfer matrix method for multibody systems (MSTMM) is a rather novel method for MSD. Compared with ordinary dynamics methods, MSTMM has the following characteristics: no global dynamics equation of the system is required, low matrix order is involved, and fast computing speed and highly stylized programming are achieved.10 Its basic idea is to (1) break up the whole system into several elements, (2) use a matrix representing the mechanical properties of each element, (3) assemble these matrices by applying the automatic deduction theorem of the overall transfer equation of multibody systems,11 and (4) solve the overall transfer equation of the system.
The solution of the overall transfer equation is one of the key techniques in MSTMM. D Bestle et al.12 proposed a recursive eigenvalue search algorithm for the transfer matrix method for linear multibody systems (linear MSTMM). Although this algorithm has a linear convergence rate, the same as strategies based on sign changes, it can solve both real and complex eigenvalue problems of a system and even find double roots. Moreover, this algorithm is more competitive and efficient in choosing the step size than strategies based on sign changes.
To further improve the computational efficiency, a parallel algorithm for assembling the overall transfer matrix and for recursively searching for the eigenvalues is proposed in this article, and a distributed parallel computing environment based on the Message Passing Interface (MPI) is built. A damped non-uniform beam with complex eigenvalues and a multiple launch rocket system (MLRS) with real eigenvalues are taken as numerical examples to verify the proposed method. This article provides a feasible way to further improve the computational speed of linear MSTMM.
Basic idea of linear MSTMM
In the inertial coordinate system, r, θ, m, and q are used to represent the linear displacement, angular displacement, internal moment, and internal force, respectively, of a point fixed on an element vibrating in space relative to its equilibrium position. The state vector of a connection point is defined by stacking these quantities under physical coordinates, or by the corresponding modal coordinate arrays X, Θ, M, and Q under modal coordinates after introducing a modal transformation, where ω is the natural frequency. Z_j,I and Z_j,O denote the state vectors of the input end I and output end O of element j, respectively, and are related by the transfer equation Z_j,O = U_j Z_j,I, where U_j is the transfer matrix of element j. In particular, if the input or output end of element j is a boundary end, the state vector is denoted Z_j,0, where "0" represents the boundary. Rui et al.10 provide a library of transfer matrices for a variety of typical elements.
For clarity, an example with a one-degree-of-freedom (1-DOF) lumped mass connected with a rod is shown in Figure 1.
The 1-DOF system with a rod and a lumped mass.
The rod with length l and the lumped mass with mass m are numbered 1 and 2, respectively. The state vector of the inner connection point between elements 1 and 2 is defined as Z = [X, Q]^T, where X and Q denote the displacement and internal force at this point. Analogously, the boundary state vectors of the fixed end of the rod and the free end of the lumped mass are denoted Z_1,0 and Z_2,0, respectively, after the boundary conditions are taken into account. The transfer matrices of the rod and the lumped mass take the form10
where β = ω√(ρ/E), and ρ, E, and A represent the density, Young's modulus, and cross-sectional area of the rod, respectively.
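Equation (1) did not survive the source conversion. For reference, a commonly used form of these two transfer matrices for longitudinal rod vibration and a lumped mass (a sketch only, since sign conventions differ between references and the paper's exact equation is unavailable) is:

```latex
\mathbf{U}_1 =
\begin{bmatrix}
\cos\beta l & \dfrac{\sin\beta l}{EA\beta} \\[4pt]
-\,EA\beta\sin\beta l & \cos\beta l
\end{bmatrix},
\qquad
\mathbf{U}_2 =
\begin{bmatrix}
1 & 0 \\
-\,m\omega^{2} & 1
\end{bmatrix},
\qquad
\beta = \omega\sqrt{\rho/E}
```

As a sanity check, with these forms the fixed–free rod alone yields the classical characteristic equation cos βl = 0, and adding the tip mass yields the well-known frequency equation βl tan βl = ρAl/m.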
Then the transfer equation of the system can be obtained as
where
is the overall transfer matrix of the system.
The non-trivial solution of equation (2) leads to the characteristic equation of the system as
Then the eigenvalue of the system can be acquired by solving equation (4).
For a multibody system with a more general topology, the overall transfer equation and the overall transfer matrix can be obtained simply by following the automatic deduction theorem of the overall transfer equation for multibody systems,11 which takes the general form
where Z_all contains the state vectors of all boundary points and cutting points, and U_all is the overall transfer matrix of the system.
By eliminating the zero elements corresponding to the known system boundary conditions in the state-vector array and removing the corresponding columns of the overall transfer matrix, equation (5) can be rewritten as
where the coefficient matrix is a square matrix. For a given linear multibody system with determined dynamics parameters, its value depends only on the vibration frequency or, for damped systems, on a complex number combining damping and frequency. To satisfy equation (6), the coefficient matrix must be singular, which yields the system characteristic equation as
The eigenvalues of a multibody system can be obtained by solving equation (7), which will be discussed in section "Recursive eigenvalue search algorithm." It is worth mentioning that the order of the coefficient matrix is very low, so computing its determinant does not cost much time. From equation (4), it can also be seen that the characteristic equation is a nonlinear, or even transcendental, function of the eigenvalue. This means that popular eigensolvers, such as Lanczos, QR, and Arnoldi, cannot be applied in the context of linear MSTMM, where the characteristic equation has no special structure.
Recursive eigenvalue search algorithm
For an undamped system, equation (7) can be solved by searching for the zeros of Δ(ω) = 0, where the zeros ω are the real eigenvalues of the system. Common zero-search strategies such as the bisection method are typically based on sign changes of Δ and therefore fail when searching for double roots. For a damped system, solving equation (7) means solving Δ(s) = 0, where s is a complex eigenvalue. The solution of equation (7) then becomes a two-dimensional problem in the real and imaginary parts of s, which cannot be solved simply by search strategies based on sign changes.
The basic idea of the recursive eigenvalue search algorithm for linear MSTMM is to switch from a zero search for Δ to minimization of its absolute value |Δ|, which applies equally well to the real and the complex case. It is much easier to recognize craters in |Δ| than sign changes in Δ, since conical dips of |Δ| occur only in the neighborhood of singularities of the overall transfer matrix.12 Here, the real eigenvalue search is taken as an example to describe the basic process of the algorithm. For a one-dimensional continuous function f(x), x represents ω and f represents |Δ|. N sample points x_i are chosen on the search region of interest, and the corresponding function values are denoted f_i. If f_{i−1} > f_i and f_i < f_{i+1}, there must be a local minimum in the interval [x_{i−1}, x_{i+1}]. All such intervals are treated as regions of interest. On each such region, N sample points are chosen again for a more detailed scan yielding a more accurate local minimum. When the required precision is reached, the resulting local minimum is taken as a solution of the characteristic equation. In the implementation, this process is realized recursively, hence the name recursive eigenvalue search algorithm. Moreover, to simplify the algorithm and to reduce the number of comparisons without missing potential candidates, the local minimum condition is weakened to f_{i−1} ≥ f_i and f_i ≤ f_{i+1}. The logical array L_i is defined as
where 1 means true and 0 means false. Consequently, the local minimum conditions f_{i−1} ≥ f_i and f_i ≤ f_{i+1} can be rewritten as shown in Figure 2, where "¬" and "∧" denote the logical operations NOT and AND, respectively.
Logical arrays of minimization strategy.
In each iteration of the one-dimensional minimum search in Bestle et al.,12 a user-defined resolution determines the number of sample points that would be necessary to resolve an interval of interest, while the user-provided sample size N is actually applied to such a region. The iteration terminates when the number of points required on the current region of interest is no greater than N, indicating that the resolution has been achieved within this iteration. The algorithm recursively sweeps through all regions with potential minima until all local minima are found. For the two-dimensional minimum search, scanning is performed on a regular grid and function values are compared in both the x- and y-directions. Correspondingly, the user may provide different sampling numbers N_x, N_y and precision tolerances dx, dy for the two search directions, and the algorithm stops only when both resolutions are achieved.
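As a concrete illustration, the one-dimensional scan-and-refine procedure described above can be sketched in a few lines of Python (a minimal sketch with hypothetical names, not the authors' implementation; here f stands for the magnitude |Δ| of the characteristic determinant):

```python
def recursive_min_search(f, a, b, N=100, dx=1e-6):
    """Recursively scan [a, b]: sample N points, keep every interval
    around a weakened local minimum f[i-1] >= f[i] <= f[i+1], and
    rescan each such interval until its sample spacing reaches dx."""
    h = (b - a) / (N - 1)
    xs = [a + i * h for i in range(N)]
    fs = [f(x) for x in xs]
    minima = []
    for i in range(1, N - 1):
        if fs[i - 1] >= fs[i] <= fs[i + 1]:
            if h <= dx:   # resolution reached: accept the sample point
                minima.append(xs[i])
            else:         # refine the bracketing interval recursively
                minima.extend(
                    recursive_min_search(f, xs[i - 1], xs[i + 1], N, dx))
    return minima
```

For an undamped system, the returned points approximate the real eigenfrequencies; the two-dimensional (damped) variant scans a regular grid and applies the same comparison in both directions.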
As discussed in the preceding section, the recursive eigenvalue search algorithm is more robust in choosing the step size than strategies based on sign changes. The following numerical example provides some insight. If the numbers of rods and lumped masses in Figure 1 are both increased to 10, the system becomes as shown in Figure 3, where the masses of the lumped masses are denoted m_2, m_4, …, m_20.
The 1-DOF system with 10 rods and 10 lumped masses.
The overall transfer equation of the system can be obtained as
with the characteristic equation
The variation of |Δ| with respect to ω is shown in Figure 4 for aluminum rods with given density, Young's modulus, length, and cross-sectional area, and with identical lumped masses. The following transformation is applied when the figure is plotted
Variation in |Δ| w.r.t. ω with identical lumped masses.
If one sets the sampling size N and the absolute resolution on the region of interest, the eigenvalues of the system are captured both by the recursive eigenvalue search algorithm and by the bisection method, as depicted in Figure 4, where the zeros are well separated and isolated with clear sign changes. In this case, both methods work well.
If the lumped masses differ from each other, with the mass values shown in Table 1, the variation of |Δ| with respect to ω becomes as shown in Figure 5, where two roots lie in a narrow region. This makes search algorithms based on sign changes fail to find these roots without a large amount of enumeration effort. In contrast, the recursive eigenvalue search algorithm still works well with the same sampling size N, which demonstrates the robustness of the recursive search algorithm in choosing sample points.
Values of lumped masses.

Lumped mass | m2   | m4    | m6   | m8   | m10  | m12  | m14  | m16  | m18  | m20
Mass (kg)   | 0.25 | 0.093 | 0.55 | 1.00 | 0.50 | 0.50 | 0.60 | 0.15 | 0.15 | 1.00
Variation in |Δ| w.r.t. ω with different lumped masses.
Principle and realization of distributed parallel computing technique
The basic idea of parallel computing is to divide a task into several parts, each of which is solved in parallel by an individual processor, so as to reduce the computational time.4 The distributed parallel computing technique goes further: it not only divides the problem into parts but also assigns the tasks to multiple computers interconnected by a network so that they are solved simultaneously. Each computer then uploads its result to the host to obtain the final result.13
Construction of distributed parallel computing environment
As shown in Figure 6, an MPI distributed parallel environment is constructed by connecting computers to the same LAN via Ethernet switches or other network equipment. In this article, four computers connected by a 100 Mb/s Ethernet switch are used to set up the distributed parallel computing environment. Each computer has a quad-core Intel i7 processor with a clock frequency of 3.06 GHz. The parallel software used is MPICH2.
Distributed parallel computing environment.
Parallel computing algorithm for linear MSTMM
The master–slave mode is used as the task allocation mechanism in this article. The master process is mainly responsible for the global division of tasks: it sends tasks to the slave processes and receives their results. Meanwhile, the slave processes are responsible for computing the tasks assigned by the master process and returning the computational results to it. The major tasks of the eigenvalue computation using linear MSTMM are assembling the overall transfer matrix and solving the characteristic equation, so parallel algorithms can be designed for these two aspects.
Parallel computing algorithm for assembling the overall transfer matrix
Based on the automatic deduction theorem of the overall transfer equation of multibody systems, the assembly of the overall transfer matrix consists of multiplying the transfer matrices of the elements in the proper sequence, according to the system topology, and reassembling the results.
For the chain system shown in Figure 7, the overall transfer matrix can be written as the ordered product of the n element transfer matrices. This successive multiplication of n matrices can be distributed over m processes concurrently, comprising one master process numbered 0 and m − 1 slave processes numbered 1 to m − 1. As shown in Table 2, the multiplication of the first p (p = ceil(n/m)) matrices is allocated to master process 0. In a similar manner, the multiplications of each subsequent group of p matrices are allocated to the slave processes 1 to m − 2, respectively. The remaining multiplications of n − (m − 1)p matrices are allocated to the last slave process, m − 1. The slave processes send their partial results to the master process, which then obtains the overall transfer matrix.
Topology figure of a chain system.
Matrix multiplication assignment.

Process               | Assignment
Master (process 0)    | matrices 1 to p
Slave (process 1)     | matrices p + 1 to 2p
…                     | …
Slave (process m − 2) | matrices (m − 2)p + 1 to (m − 1)p
Slave (process m − 1) | matrices (m − 1)p + 1 to n
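The master–slave splitting of the matrix product can be mimicked sequentially. The sketch below (hypothetical names, plain Python standing in for the MPI ranks) partitions the ordered product of n small transfer-style matrices among m "processes" and lets the "master" combine the partial products, exactly as in the assignment scheme above:

```python
from math import ceil

IDENT = [[1.0, 0.0], [0.0, 1.0]]

def matmul2(a, b):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def chunked_product(mats, m):
    """Split the ordered product mats[0]*mats[1]*...*mats[n-1] over m
    'processes': each rank multiplies its own slice of p = ceil(n/m)
    matrices (the last rank takes the remainder), then the master
    combines the partial products in rank order."""
    n = len(mats)
    p = ceil(n / m)
    partials = []
    for rank in range(m):            # stand-in for the MPI ranks
        acc = [row[:] for row in IDENT]
        for M in mats[rank * p:(rank + 1) * p]:
            acc = matmul2(acc, M)
        partials.append(acc)         # "send" the partial result
    result = [row[:] for row in IDENT]
    for part in partials:            # master assembles in order
        result = matmul2(result, part)
    return result
```

Because matrix multiplication is associative, the chunked product equals the sequential one; the parallel gain comes from evaluating the partial products on different ranks.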
Multibody systems with general topology can ultimately be treated as tree systems. Taking the tree system shown in Figure 8 as an example, the overall transfer equation can be obtained in the form of equation (5), where
Topology figure of a tree system.
The detailed meaning of the above symbols can be found in Rui et al.11 As shown in equations (12)–(14), the overall transfer matrix can be obtained by computing n sub-matrices of the types T and G, which can be computed independently and allocated to m processes concurrently. For one master process and m − 1 slave processes, the matrix computation assignment is shown in Table 3. The master process and each of the first m − 2 slave processes compute p (p = ceil(n/m)) matrices each, while the remaining n − (m − 1)p matrices are assigned to the last slave process, m − 1. Finally, the slave processes send their results to the master process, which assembles them into the overall transfer matrix.
Matrix computation assignment.

Process               | Assignment
Master (process 0)    | sub-matrices 1 to p
Slave (process 1)     | sub-matrices p + 1 to 2p
…                     | …
Slave (process m − 2) | sub-matrices (m − 2)p + 1 to (m − 1)p
Slave (process m − 1) | sub-matrices (m − 1)p + 1 to n
Parallel recursive eigenvalue search algorithm
The first step of the eigenvalue search algorithm is to compute the function values at the N sample points on the region of interest. This step can be parallelized first: the function evaluations are allocated to m processes in parallel. As shown in Table 4, p (p = ceil(N/m)) sample points each are assigned to master process 0 and slave processes 1 to m − 2, and the remaining N − (m − 1)p sample points are assigned to slave process m − 1.
Function value computation assignments.

Process               | Assignment
Master (process 0)    | sample points 1 to p
Slave (process 1)     | sample points p + 1 to 2p
…                     | …
Slave (process m − 2) | sample points (m − 2)p + 1 to (m − 1)p
Slave (process m − 1) | sample points (m − 1)p + 1 to N
After the slave processes return their function values to the master process, the master process can compute the local minima on the region according to the algorithm given in section "Recursive eigenvalue search algorithm."
Assuming there are k local minima with corresponding subintervals, the final result is obtained by further iterations on these k subintervals until the required accuracy is reached. The computations on these subintervals are independent of each other and can also be performed in parallel by allocating them to m processes. As shown in Table 5, p (p = ceil(k/m)) subintervals each are assigned to master process 0 and slave processes 1 to m − 2, and the remaining k − (m − 1)p subintervals are assigned to slave process m − 1.
Assignments of subintervals.

Process               | Assignment
Master (process 0)    | subintervals 1 to p
Slave (process 1)     | subintervals p + 1 to 2p
…                     | …
Slave (process m − 2) | subintervals (m − 2)p + 1 to (m − 1)p
Slave (process m − 1) | subintervals (m − 1)p + 1 to k
The whole process of the eigenvalue search is shown in Figure 9.
Flowchart of parallel eigenvalue search.
Example of a non-uniform beam
Figure 10(a) shows a non-uniform beam of length L; the detailed geometry parameters of the cross section are marked in the figure. The cross-sectional area is A(x) and the sectional moment of inertia is I(x), where A0 and I0 denote their values at x = 0, respectively.
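The expressions for A(x) and I(x) were lost in the source conversion. Assuming the rectangular cross section tapers linearly from (b0, h0) at x = 0 to (bL, hL) at x = L, a consistent reconstruction is:

```latex
b(x) = b_0 + (b_L - b_0)\,\frac{x}{L}, \qquad
h(x) = h_0 + (h_L - h_0)\,\frac{x}{L},
\qquad
A(x) = b(x)\,h(x), \qquad
I(x) = \frac{b(x)\,h^{3}(x)}{12}
```

This reconstruction reproduces the value I0 = b0·h0³/12 = 2.133 × 10⁻⁷ m⁴ quoted with the numerical parameters later in this section.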
Non-uniform beam model.
The non-uniform beam can be discretized into n uniform beam segments whose lengths and heights are shown in Figure 10(b). Equating the volume of beam segments 1 to k with that of the corresponding portion of the original beam yields the equivalent volume of each uniform beam element, from which its equivalent cross-sectional area and equivalent sectional moment of inertia follow.
Treating each uniform beam segment as a Euler–Bernoulli beam in free, damped transverse vibration, its dynamics equation can be obtained as14
where ρA is the mass per unit length, and ρ, A, c, E, and I are the density, cross-sectional area, damping coefficient, Young's modulus, and sectional moment of inertia, respectively.
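The displayed equation here did not survive conversion; the standard form of the free, damped Euler–Bernoulli beam equation consistent with the symbol list above (offered as a sketch, not necessarily the paper's exact equation (16)) is:

```latex
\rho A\,\frac{\partial^{2} y}{\partial t^{2}}
+ c\,\frac{\partial y}{\partial t}
+ EI\,\frac{\partial^{4} y}{\partial x^{4}} = 0
```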
By applying linear MSTMM, the transfer matrix of the beam can be obtained as
where S, T, U, and V are the Krylov functions
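The Krylov functions referred to above (the displayed equation was lost in conversion) are, in their standard Krylov–Duncan form:

```latex
S(x) = \tfrac{1}{2}\,(\cosh x + \cos x), \qquad
T(x) = \tfrac{1}{2}\,(\sinh x + \sin x),
\qquad
U(x) = \tfrac{1}{2}\,(\cosh x - \cos x), \qquad
V(x) = \tfrac{1}{2}\,(\sinh x - \sin x)
```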
If each beam segment from left to right is numbered as 1 to n, respectively, and the transfer direction is defined from left to right, then the overall transfer equation of the non-uniform beam can be acquired as
where Z_1,0 and Z_n,0 are the boundary state vectors at the leftmost and rightmost ends of the beam, respectively, U_i is the transfer matrix of beam segment i, and
is the overall transfer matrix of the system with the matrix element .
By substituting the boundary conditions at the two ends into equation (19), the characteristic equation can be obtained as
where s is the complex eigenvalue variable.
The proposed method can be applied to solve equation (21). The search accuracy is set to 10^−4, and the dynamics parameters are given as follows: L = 4 m, b0 = 0.04 m, h0 = 0.04 m, bL = 0.02 m, hL = 0.02 m, ρ = 2730 kg/m³, I0 = 2.133 × 10^−7 m⁴, and E = 6.9 × 10^10 Pa. Figure 11 shows the three-dimensional (3D) plot of log10|Δ(s)| with the number of segments n = 50. Table 6 shows the eigenvalues computed by MSTMM, which agree with those obtained by a direct variable separation method (DVSM) similar to Joshi.15 The computational times of a single MSTMM process and of DVSM for different numbers of segments are shown in Figure 12; MSTMM clearly costs less computational time, and its time increases linearly with the number of segments.
3D plot of log10|Δ(s)| for the non-uniform beam.
Comparison of the computational results of eigenvalues between MSTMM and DVSM.

Modal order | Eigenvalue s (MSTMM)   | Eigenvalue s (DVSM)
1           | −13.1872 + 10.4693i    | −13.1872 + 10.4693i
2           | −10.6857 + 69.8781i    | −10.6857 + 69.8781i
3           | −10.0808 + 175.8220i   | −10.0808 + 175.8220i
4           | −9.8454 + 332.8125i    | −9.8454 + 332.8125i
5           | −9.7312 + 541.7148i    | −9.7312 + 541.7148i
6           | −9.6679 + 802.6656i    | −9.6679 + 802.6656i
7           | −9.6294 + 1115.7216i   | −9.6294 + 1115.7216i

MSTMM: transfer matrix method for multibody systems; DVSM: direct variable separation method.
Comparison of computational time.
Typically, three indexes describe the efficiency of parallel computing: the computational time t, the speedup ratio Sp, and the parallel efficiency Fp, where Sp is defined as the ratio of the computational time with a single process (T1) to that with p processes (Tp), namely Sp = T1/Tp, and Fp = Sp/p. For n = 100 segments, the three indexes obtained with the proposed distributed parallel computing algorithm are listed in Table 7. Clearly, parallel computing accelerates the computation. Schemes no. 2–5 in Table 7 show that one dual-core PC is slightly faster than two single-core PCs, and likewise one quad-core PC is slightly faster than four single-core PCs. This indicates that network delay affects distributed parallel computing, so a good network transmission speed should be ensured when using the distributed parallel computing technique. Figure 13 shows how these indexes vary with the number of CPU cores. With an increasing number of CPU cores, the computational time t declines, but the trend levels off; meanwhile, Sp increases and Fp decreases. This can be attributed to the growing communication overhead between processes: although the computation per process decreases as the number of CPU cores increases, the proportion of communication overhead relative to the computation of each process grows, lowering the computational efficiency. To clarify this effect, the computation scale is reduced by cutting the number of beam segments down to three and reducing the number of sample points.
This results in the three indexes shown in Figure 14 and in the proportion of the average communication time tc of each process to the computational time t shown in Figure 15, where tc contains only the time each process spends receiving and sending data via the MPI functions MPI_RECV, MPI_SEND, and MPI_BARRIER. Figures 14 and 15 show that Sp increases only slowly and Fp becomes very low as the number of CPU cores rises, because the proportion of communication time relative to computational time grows and can no longer be ignored. Under these circumstances, simply increasing the number of CPU cores has little effect on the computational efficiency.
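The three indexes are straightforward to compute. The snippet below (plain Python, with values taken from Table 7) expresses the definitions Sp = T1/Tp and Fp = Sp/p in code:

```python
def parallel_metrics(t1, tp, p):
    """Speedup ratio Sp = T1/Tp and parallel efficiency Fp = Sp/p."""
    sp = t1 / tp
    fp = sp / p
    return sp, fp

# Scheme no. 6 of Table 7: eight processes on four dual-core PCs,
# against the single-process baseline of scheme no. 1
sp, fp = parallel_metrics(20.8462, 3.0752, 8)
# sp ≈ 6.78 and fp ≈ 0.85, matching the tabulated Sp = 6.7787, Fp = 0.8473
```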
Parallel computational time, speedup ratio, and parallel efficiency.

Scheme no. | Number of processes         | Computational time t (s) | Speedup ratio Sp | Parallel efficiency Fp
1          | 1 (Single PC)               | 20.8462                  | 1.0000           | 1.0000
2          | 2 (Single PC)               | 10.4343                  | 1.9978           | 0.9989
3          | 4 (Single PC)               | 5.2757                   | 3.9514           | 0.9878
4          | 1 + 1 (Distributed)         | 10.4576                  | 1.9934           | 0.9967
5          | 1 + 1 + 1 + 1 (Distributed) | 5.3131                   | 3.9235           | 0.9809
6          | 2 + 2 + 2 + 2 (Distributed) | 3.0752                   | 6.7787           | 0.8473
7          | 3 + 3 + 3 + 3 (Distributed) | 2.3135                   | 9.0106           | 0.7509
8          | 4 + 4 + 4 + 4 (Distributed) | 1.7736                   | 11.7537          | 0.7346
Computational time, speedup ratio, and parallel efficiency with different CPU core numbers.
Computational time, speedup ratio, and parallel efficiency of the reduced computation scale.
Proportion of average communication time tc to computational time t.
Engineering example of an MLRS
An MSD model of an MLRS with 18 launch tubes is shown in Figure 16. According to their physical nature, the MLRS components can be modeled as dynamics elements including lumped masses, rigid bodies, elastic beams, springs, and torsional springs, and are numbered sequentially. The corresponding topology figure of the dynamics model is shown in Figure 17. The body elements labeled 2, 5, 8, 11, 14, and 17 are the wheels; 19 is the vehicle body; 21 the revolving part; 23 the pitch part; 27 + 5l and 28 + 5l the segments of the launch tube; and 26 + 5l the tail of the launch tube, where l stands for the launch tube in the firing state. The hinge elements are 24 + 5l and 25 + 5l, connecting the launch tube to the pitch part; 22, connecting the revolving part to the pitch part; 20, connecting the vehicle body to the revolving part; 3, 6, 9, 12, 15, and 18, connecting the wheels to the vehicle body; and 1, 4, 7, 10, 13, and 16, connecting the wheels to the ground. The system has eight boundary points in total: the free ends at the front and tail of the launch tube, and the connection ends of hinges 1, 4, 7, 10, 13, and 16 to the ground. One boundary is considered the root of the system, and all other boundaries are considered treetops. The transfer direction of the system is from the treetops to the root, following the arrows in Figure 17.
MSD model of MLRS.
Topology figure of MSD model of MLRS.
Based on the topology of the MLRS dynamics model, the overall transfer equation and the overall transfer matrix of the MLRS given by equations (22)–(24) can be obtained by applying the automatic deduction theorem of the overall transfer equation of multibody systems11
where the state vector in the closed loop is that of the connection point between body 23 and hinge 25 + 5l, U_all is the overall transfer matrix, I_12 is a 12 × 12 identity matrix, and the remaining blocks are the coefficient matrices of the main transfer equation and the geometrical equations of the overall transfer equation of the system.11 The transfer matrix of each element can be found in Rui et al.10,16
By eliminating all zeros in the state-vector array corresponding to the known variables in the system boundary conditions and removing the associated columns of the overall transfer matrix, equation (22) reduces to
The characteristic equation can be obtained as
The eigenvalues of the MLRS can be solved with the method proposed in this article. The computational time, speedup ratio, and parallel efficiency for computing the first 50 eigenvalues of the fully loaded MLRS are shown in Table 8, and the corresponding curves for different CPU core numbers are shown in Figure 18. As with the non-uniform beam, the parallel computing technique substantially increases the computational speed of the eigenvalue search.
Parallel computational time, speedup ratio, and parallel efficiency.

Scheme no. | Number of processes         | Computational time t (s) | Speedup ratio Sp | Parallel efficiency Fp
1          | 1 (Single PC)               | 4.2457                   | 1.0000           | 1.0000
2          | 2 (Single PC)               | 2.1295                   | 1.9938           | 0.9969
3          | 4 (Single PC)               | 1.0862                   | 3.9089           | 0.9772
4          | 1 + 1 (Distributed)         | 2.1366                   | 1.9872           | 0.9936
5          | 1 + 1 + 1 + 1 (Distributed) | 1.0883                   | 3.9012           | 0.9753
6          | 2 + 2 + 2 + 2 (Distributed) | 0.6562                   | 6.4701           | 0.8088
7          | 3 + 3 + 3 + 3 (Distributed) | 0.6205                   | 6.8426           | 0.5702
8          | 4 + 4 + 4 + 4 (Distributed) | 0.5193                   | 8.1755           | 0.5110
Computational time, speedup ratio, and parallel efficiency with different CPU core numbers.
During the design period, optimization of MLRS parameters such as the connecting stiffness among components is an important step of the dynamics design. The firing interval of the MLRS is 0.6 s, that is, the launching frequency is about 1.67 Hz. The natural frequencies of the MLRS should stay away from the launching frequency to avoid resonance. Based on the empirical natural frequencies of existing MLRSs, the first 10 eigenvalues of the fully loaded MLRS are taken as the optimization objective and the stiffnesses among components as the design variables, and a genetic algorithm is applied to optimize the dynamics performance of the MLRS. The optimization flow chart is shown in Figure 19. Figure 20 shows the first 10 eigenvalues of the MLRS varying with the genetic generations during the optimization process, and the evolution of the best fitness is shown in Figure 21. The eigenvalues obtained by computation and by modal test are compared in Table 9; the computational results agree well with the experimental ones.
The optimization flow chart of MLRS.
The eigenvalues of MLRS.
Convergence of the best fitness.
Comparison between the computational eigenvalues and the experimental values of MLRS.

Modal order (fully loaded) | Computational eigenvalues (Hz) | Experimental eigenvalues (Hz) | Relative error (%)
1                          | 3.00                           | 3.00                          | 0.00
2                          | 3.56                           | 3.73                          | −4.56
3                          | 5.17                           | 5.23                          | −1.24
4                          | 5.98                           | 5.74                          | 4.23
5                          | 6.21                           | 6.24                          | −0.53
6                          | 7.19                           | 6.73                          | 6.83
7                          | 8.77                           | 8.71                          | 0.64
8                          | 9.79                           | 10.60                         | −7.59
9                          | 11.31                          | 10.81                         | 4.68
10                         | 15.37                          | 14.56                         | 5.56
Table 10 shows the computational time with the proposed method and with a single process, which demonstrates the value of applying the proposed method to the dynamics optimization of complex multibody systems.
Comparison of computational time.

Number of processes          | Computational time t (h)
Single process               | 3.59
4 + 4 + 4 + 4 (Distributed)  | 0.44
Conclusion
Parallel algorithms for assembling the overall transfer matrix and for the recursive eigenvalue search are proposed in this article, based on MPI parallel computing. The eigenvalues of a non-uniform beam and an MLRS are computed using this method, and the influence of the number of CPU cores and of the network environment on the computational time is analyzed. This study shows the following points:
The proposed parallel computing algorithm can significantly increase the speed of the eigenvalue search and effectively reduce the time cost of the dynamics optimization of complex multibody systems.
The distributed parallel computing environment based on MPI has strong scalability and high performance. It makes it possible to construct a computer cluster from ordinary computers, so that hardware resources can be fully utilized to achieve higher computational efficiency. Moreover, a good network environment is essential, since network delay has a measurable impact on the computational time.
With an increasing number of CPU cores, the computational efficiency decreases due to the growing communication between processes. Computational time therefore cannot be reduced indefinitely by continually adding CPU cores; an appropriate number of CPU cores should be selected according to the computation scale.
Acknowledgements
The authors owe special thanks to associate professor Bin He, lecturer Wei Zhu, and PhD students Wenbing Tang and Jianshu Zhang for providing important and constructive proposals to this article.
Academic Editor: Mario L Ferrari
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was supported by Research Fund for the Doctoral Program of Higher Education of China (no. 20133219110037), Defense Industrial Technology Development Program of China (no. A2620133008), and Fundamental Research Funds for the Central Universities (no. 30915011326).
References
1. Shabana AA. Dynamics of multibody systems. Cambridge: Cambridge University Press, 2009.
2. Manferdelli J. The many-core inflection point for mass market computer systems. CTWatch Q 2007; 3: 11–17.
3. Wang W, Xin X and Wang R. How to deploy and apply a high performance cluster computing system. Comput Eng Appl 2001; 32: 157–159.
4. Negrut D, Serban R and Mazhar H. Parallel computing in multibody system dynamics: why, when, and how. J Comput Nonlin Dyn 2014; 9: 1–12.
10. Rui X, Yun L and Lu Y. Transfer matrix method of multibody system and its applications. Beijing, China: Science Press, 2008.
11. Rui X, Zhang J and Zhou Q. Automatic deduction theorem of overall transfer equation of multibody system. Adv Mech Eng 2014; 2014: 378047.
12. Bestle D, Abbas L and Rui X. Recursive eigenvalue search algorithm for transfer matrix method of linear flexible multibody systems. Multibody Syst Dyn 2014; 32: 429–444.
13. Yang H, Rui X and Liu Y. Study on distributed parallel computing of transfer matrix method for multibody systems. J Vib Eng 2014; 27: 9–15.
14. Frederico C and Matt T. Simulation of the transverse vibrations of a cantilever beam with an eccentric tip mass in the axial direction using integral transforms. Appl Math Model 2013; 37: 9338–9354.
15. Joshi A. Free vibration characteristics of variable mass rockets having large axial thrust/acceleration. J Sound Vib 1995; 187: 727–736.
16. Rui X, Lu Y and Wang G. Simulation and test methods of launch dynamics of multiple launch rocket system. Beijing, China: National Defense Industry Press, 2003.