Abstract
Modular self-reconfigurable robots (SRRs) have redundant degrees of freedom and various configurations. There are two hard problems imposed by SRR features: locomotion planning and the discovery of multiple locomotion patterns. Most of the current research focuses on solving the first problem, using evolutionary algorithms based on the philosophy of searching-for-the-best. The main problem is that the search can fall into a local optimum in the case of a complex non-linear problem. Another drawback is that the searched result lacks diversity in the behaviour space, which is inappropriate in addressing the problem of discovering multiple locomotion patterns. In this paper, we present a new strategy that evolves an SRR's controller by searching for behavioural diversity. Instead of converging on a single optimal solution, this strategy discovers a vast variety of different ways to realize robot locomotion. Optimal motion is sparse in the behaviour space, and this method can find it as a by-product through a diversity-keeping mechanism. A revised particle swarm optimization (PSO) algorithm, driven by behaviour sparseness, is implemented to evolve locomotion for a variety of configurations whose efficiency and flexibility is validated. The results show that this method can not only obtain an optimized robot controller, but also find various locomotion patterns.
Keywords
1. Introduction
The major features of self-reconfigurable robots (SRRs) are adaptability and robustness, owing to their modularity [20]. Many kinds of SRR prototypes have been developed and investigated since the first reconfigurable robot, CEBOT, was proposed [4, 3]. This research field has made enormous progress in whole-body locomotion and reconfiguration planning. Whole-body locomotion planning is complicated considering the diverse configurations of SRRs and super-redundant degrees-of-freedom (DOF). Locomotion planning is a complex non-linear problem and there is no analytical solution. It is hard to resolve it using conventional methods. Bio-inspired algorithms, like genetic algorithms (GAs), are usually employed to generate robot locomotion automatically. In this way, quasi-optimal motion can be obtained in a short time.
Leading SRR researchers employ evolutionary algorithms to generate robot locomotion automatically. In these studies, the motion controller is specified as a vector of real-valued parameters. The M-TRAN research group used genetic algorithms to evolve a locomotive gait for various configurations and implemented a quasi-optimal gait for four- and six-legged robots, etc., [14, 7]. The Roombots group applied particle swarm optimization to optimize the central pattern generator (CPG) parameters of a robot controller [18, 16]. To eliminate the reality gap, some scholars have used on-line optimization to generate robot locomotion, such as the Yamor group [17] and the ATRON group [1].
The above-mentioned algorithms use a heuristic strategy with an objective function, also known as objective-based (OB) search algorithms. Here, the parameters to be optimized are encoded into a ‘gene’ and the parameter variation space is called the ‘gene space’. If a vector representing behaviour characterization is defined, its variation space is called the ‘behaviour space’. The mapping between the gene space and the behaviour space is non-linear, complex and difficult to find the law for the locomotion planning problem [15]. These algorithms search for the best solutions directly and are likely to fall into a local optimum when dealing with complicated problems. Moreover, they always acquire just one locomotion pattern in one evolutionary run. To resolve the local optimum problem, some mechanisms, like fitness value-sharing, are employed to maintain the diversity of the gene population [6, 11, 12]. Some investigators prevent the premature convergence of the algorithm by maintaining behavioural diversity [13]. It is intuitive and meaningful to maintain the diversity of the behaviour space. If the searched diverse behaviours can be used appropriately, many significant solutions can be found in just one evolutionary run.
This paper presents a new search strategy for addressing UBot SRR [19] locomotion problems: namely, motion planning and multiple patterns. This strategy discards the philosophy of searching-for-the-best, i.e., taking objective-function value as the fitness value. However, it employs behaviour sparseness for fitness and searches for behavioural diversity. The performance of the algorithm is evaluated and compared with an objective-based algorithm in a newly developed UBotSim simulator. The results show that the new algorithm is better than the objective-based algorithm and that it has the advantage of finding many meaningful solutions in one evolutionary process.
The present paper is structured as follows: Section 2 introduces the search strategy based on behavioural sparseness and the evolutionary locomotion framework that is used; Section 3 describes the UBot SRR module and the 3D dynamics simulator, UBotSim; Section 4 evolves the locomotion for a worm-like configuration and some other typical configurations; Section 5 describes the algorithm's suitability to discover multiple locomotion patterns for SRRs; Section 6 discusses the differences between two algorithms and the future work to be investigated. Finally, a conclusion is given.
2. The evolutionary computation framework
There are many bio-inspired evolutionary algorithms that can generate optimal or else quasi-optimal motion automatically. Generally, in these algorithms an objective is defined in terms of ‘fitness' and the search procedure will ensure that the fitness value is as big as possible until the algorithm converges or else maximum generation is reached. Taking the objective as fitness directly would make the search fall into a local optimum in non-linear problems. Recent studies have shown that the search will be effective if it discards an objective-based mechanism and maintains the diversity of the behaviour space [9].
2.1 A search strategy based on behaviour sparseness
Behaviour sparseness indicates how different a behaviour is within a population. A behaviour sparseness-based (BSB) search explores the behaviour space and maintains behavioural diversity by searching for new behaviours but not better behaviours. The emergent new behaviours will be stored as the searched solutions, i.e., the solution is a by-product of the search. This methodology has been applied to a maze robot and a simple humanoid planning simulation and it exhibited better performance compared with objective-based algorithms in trap problems and complex non-linear problems [9].
SRRs have variable configurations and super-redundant DOF, such that the gene space is changeable and vast for locomotion evolution. Using BSB search, the behaviour space will be fixed and the diversity-maintenance mechanism can improve the algorithm's performance by avoiding premature convergence. On the other hand, by-product solutions might also include multiple patterns of locomotion, since they are different behaviours in the behaviour space.
This algorithm uses behaviour sparseness as fitness but not as the objective. A robot behavioural characterization and behaviour sparseness computation strategy are defined in order to use this new algorithm.
2.1.1 Behaviour characterization
Usually, the behaviour characterization is represented as a real-number vector demonstrating the locomotive features of robot behaviour. As given in expression (1), the element ei is a feature of robot behaviour:
where the behaviour characterization is directly related to specific mission requirements, the behavioural vector directly determines the behaviour space, and the behaviour space has a crucial impact on the search performance. There is no universal definition to determine a robot's behavioural characterization.
2.1.2 Behavioural sparseness
To drive the searching process, it is necessary to calculate the sparseness of an individual behaviour in the population, i.e., ascertain how different the behaviour is. Next, according to the sparseness value, the search can explore the sparse region of the behaviour space and find new behaviours.
The method for calculating behavioural distance will impact the exploration of the behaviour space [5]. The Euclidean distance between the behavioural vectors is defined as the behavioural distance. The average distance to the k-nearest neighbours of this behaviour can be used to indicate its sparseness in the population in a simple manner. Equation (2) shows the calculation strategy:
where dij is the distance between behaviour bi and behaviour
2.2 A procedure of locomotion evolution
The BSB search algorithm is realized for other basic searching methods. PSO was used as the foundation in this paper, and we call it BSB-PSO. A behaviour sparseness metric is employed as the fitness function. Solutions with high sparseness values will be stored in an archive as the searched result.
2.2.1 Objective-based PSO
In 1995, Kennedy and Eberhart proposed PSO [2, 8]. It begins from a population of random solutions. Similar to genetic algorithms, it finds the optimal solution by iteration based on fitness values. For objective-based PSO (OB-PSO), the fitness value is the indicator of the objective. In this research, the speed of the robot is taken as the objective. Its evolution process is illustrated in Figure 1.

Locomotion evolution using objective-based PSO
In PSO, the controller parameters are encoded into a particle. After the fitness values are obtained from the UBotSim simulator, the population of candidate solutions (particles) is refreshed according to Equation (3):
Here, vi is the particle's velocity vector, w is the weight, xi(t) is the current position of the particle,
2.2.2 Behavioural sparseness-based PSO
The flow chart for BSB-PSO is shown in Figure 2. The main differences compared to OB-PSO are the calculation strategy for the fitness value and the introduction of a solutions archive. The comparison of the two algorithms is illustrated in Table 1.
Comparison of the two search algorithms

Locomotion evolution using BSB-PSO
Firstly, we load the robot with a certain configuration into the UBotSim simulator and set the controller parameters as well as the variation scope. Next, we initialize a population of particles randomly. Each particle corresponds to a virtual robot in the simulator. The virtual robot moves dynamically and its position-information can be obtained. Using this information, the robot's behaviour characterization-vector can be established. Afterwards, each robot's behavioural sparseness is computed using Equation (2). If the sparseness value is higher than the established threshold value, the particle parameter is added to the archive as a searched solution. The population is refreshed according to Equation (3).
We iterate the calculation procedure until the maximum evolution generation is reached or else the convergence criterion is satisfied. Finally, a group of individuals is acquired in the archive as the searching result. Generally, the searched solutions are listed in order, according to objective values such as the locomotion speed.
3. The UBot system
3.1 The UBot SSR module
UBot is a hybrid modular SRR that can adopt various configurations by connecting and disconnecting basic modules [21]. Figure 3 shows the module structure and the DOF-configuration. There are two parts (P00 and P10) in a given module connected by an L-shaped axis. Two rotary DOF (J00 and J10) are mounted on the axis. Each locomotive joint can independently rotate within the range −90° to 90°. There are four connecting faces on the module. Each face of an active module has a hooking mechanism for docking and un-docking.

Structural characteristics of UBot module
3.2 The UBotSim dynamics simulator.
Locomotion evolution in this article is executed using off-line simulation. Robot locomotion information is obtained from the dynamics simulator. Virtual 3D dynamics simulation software for the UBot SRR was developed, as illustrated in Figure 4. OGRE is used as a graphics engine and PhysX as a physics engine.

The virtual dynamics simulator-UBotSim
A variety of configurations can be constructed by connecting basic UBot modules. Some typical configurations are shown in Figure 5, such as four-legged, short-cross, long-cross and snake-like configurations. As each module has two rotary DOF, the configuration composed of many modules has super-redundancy features and the gene space formed by the joint controller parameters is huge. Changing configurations and super-redundancy features make locomotion generation difficult to implement for SRRs.

Typical configurations of UBot
4. Evolving locomotion for UBot SRRs
The locomotion evolution framework for the UBot SRR is established as above. In this section, the factors affecting the algorithm's performance will be investigated and a comparison with an objective-based algorithm will be provided.
For correlation between controllers, various definitions of behaviour characterization will be considered for the presented algorithm. All these factors impact upon the gene space or the behaviour space. Worm-like configurations with different numbers of modules (as shown in Figure 6) are studied in detail. The locomotion evolution of other typical configurations is also investigated (as shown in Figure 5).

Worm-like configurations with different numbers of modules
4.1 The controller type
4.1.1 The independent module controller
Rhythmic movement is realized by means of the coordinated movement of each locomotive joint. Generally, the joint drive-function is a wave signal. Here, the output of sinusoidal oscillators is employed to control the module joint. The controller is given as in (4):
where i is the module ID, and p1
i
, p2i and p3
i
are evolutionary parameters to be optimized. The parameter-variation range is set as
In this case, there is no correlation between each module controller. The Offset, Amplitude and Phase parameters of each module are independent. As such, it covers all possible movement patterns. However, the gene space will be expanded with any increase in module numbers.
4.1.2 The wave pattern controller
By introducing the correlation between module controllers, certain specific motions can be performed and the gene space can be significantly decreased. Learning from the locomotion gait of earthworms and caterpillars, a wave pattern controller is established. The Offset and Amplitude of each joint and the Phase Difference between adjacent module are the same. This type of controller is given as Equation (5). The gene space will not change as the number of modules increases:
Here, only three parameters need to be optimized: p1, p2 and p3, i.e., all p1 i = p1, all p2 i = p2 and p3 i =i * p3. The parameters' variation range is as above.
4.2 Definitions of behavioural characterization
The controller-type determines the features of the gene space. In addition, the behavioural characterization vector determines the features of the behaviour space. The definition of behaviour characterization will impact upon the search results directly for the BSB search algorithm. This algorithm has an intuitive exploration and exploitation mechanism in the behaviour space. The definition must have relations with the final searching objective in order to ensure that the desired behaviours can be included in the solutions archive. Since this search algorithm is not objective-based, the definition of behaviour characterization tends to be flexible.
The goal of locomotion-evolution in this paper is to find a longer moving-distance simulated in the UBotSim simulator over 10 seconds. Various definitions of behaviour characterization are listed as follows:
where xk and yk are plane coordinates of the robot's centre of gravity relative to the robot's initial centre of gravity at the k-th sample second. The total sample time is n=10s.
where xk and yk are explained as above. This definition constrains the behaviour vector element to the same order of magnitude (it might also be known as the projection of average speed for each second).
This behaviour vector is especially intuitive, i.e., the controller which makes the robot end-position as sparse as possible in a plane space will be preferred.
In this way, the controller that results in a different distance will be preferred but not the farthest. However, objective-based algorithms search for the farthest distance.
4.3 Evolutionary results and analysis
4.3.1 Evolving locomotion for worm-like configurations
OB-PSO and BSB-PSO are executed in the UBotSim simulator. Three types of worm configurations were used to investigate the factors affecting algorithm performance, e.g., controller-type and behaviour characterization. Each evolutionary setup was run 30 times. The evolution results for the three worm configurations are shown in Figure 7. The red bar represents the results of OB-PSO.

Evolution results of the worm configuration
In the case of an independent controller type, it can be observed that BSB-PSO has equivalent searching performance to OB-PSO. This indicates that BSB-PSO is effective and practical in large search-spaces. It also demonstrates the flexibility of defining behavioural characterization in searching for high-speed controllers. However, the objective-based algorithm has only one objective definition -distance - over 10 s.
In the case of the wave pattern controller, it might be noticed that BSB-PSO is significantly better than OB-PSO given two definitions of behaviour characterization for the evolution of an eight-module worm configuration (student T-test, p<0.01). These are: centre trajectory behaviour (behaviour 1) and speed projection behaviour (behaviour 2). However, the other two behaviour characterization definitions do not show significantly better performance. This illustrates that, in some searching tasks, BSB-PSO is better than OB-PSO, and that the selection for behaviour characterization is significant. The other two evolutions for four- and 16-module worms do not show big differences for various behaviour definitions.
Figure 8 and Figure 9 display the evolution procedure curves and final results analysis of an eight-module worm configuration under a wave pattern controller setup. Taking motions moving further than 3 m as useful solutions, on average they can be found with (>500), (147,189,319,376) generations (each generation with 20 particles) for OB-PSO and BSB-PSO (behaviours 1-4) separately. It is evident from these two figures that behaviour 1 and behaviour 2 give better average results and that they exhibit significantly better results, making the movement distance greater than 3 m in the 30 run-times' evolution.

Evolution procedure of an eight-module worm configuration under a wave pattern controller setup

Results for an eight-module worm configuration under a wave pattern controller setup
It was discovered that solutions involving a long distance comprise dynamic rolling patterns by checking the simulation results of an eight-module worm configuration under a wave pattern controller. From the controller parameters of dynamic rolling locomotion, we found that one or two gene parameters were at the edge of variation pattern controller setup, the end position (behaviour 3) and distance (behaviour 4) do not exhibit a significant advantage over OB-PSO. This illustrates that these two behaviour characterizations are not adequate to find the parameters or milestone-parameters generating dynamic rolling locomotion.
The four-module worm does not have enough modules to shape the rolling motion. The 16-module worm has a potential rolling pattern, but it falls easily and is very hard to maintain dynamic locomotion. As such, their motion patterns comprise mostly worm-like motions. In addition, their mapping between the gene space and the behaviour space is not complicated with respect to the eight-module configuration. In these searching tasks, the two algorithms do not exhibit any significant difference.
4.3.2 Evolving locomotion for typical configurations
Depending upon the various behaviour definitions, locomotion evolution for several typical configurations is performed. Figure 5 shows snake-like, four-legged, long-cross and short-cross configurations.
The leg modules of four-legged and long-cross configurations are set as locomotive modules in the simulation. Each locomotive module uses one joint to oscillate. All the modules of the short-cross and snake-like configurations use two joints to oscillate. The rotational axes are illustrated in Figure 5 with coloured arrows.
Equation (4) is used as the controller of a four-legged configuration and a long-cross configuration. Equation (10) is employed as the controller of a short-cross and a snake-like configuration, introducing another module joint based on Equation (5):
Here, p4 is a coupling parameter. The variation range is set as

Evolution results of the investigated typical configurations
For the four-legged and long-cross configurations, the gene space is very large. BSB-PSO does not exhibit advantages in looking for high-speed solutions. BSB-PSO exhibits significantly better performance compared to OB-PSO for a snake-like configuration (behaviour 1 and 2, student test, p<0.01 and p<0.05). The results show that the BSB-PSO algorithm can be applied to evolve the locomotion for any configuration. At the same time, BSB-PSO has many by-products which might be suitable for solving problems involving finding multiple locomotion patterns for SRRs.
5. Automatic discovery of multiple locomotion patterns
Since a multi-module configuration for the UBot is super-redundant, it generally has multiple locomotion patterns. Here, an eight-module worm configuration is used to illustrate the effectiveness of the BSB algorithm in discovering multiple locomotion patterns. Expression (6) is employed to depict the behaviour-characterization and to differentiate behaviours. The idea is that the trajectory of two locomotion patterns should be different, hence, the Euclidean distance between them should be significant in the behaviour space. A behaviour with a large average distance relative to the others might represent a new locomotion pattern. Both of the joints of each module are locomotive and controlled by Equation (10). In all, 30-run evolutions are performed for PSO, SPSO (species-based PSO)[10] and BSB-PSO, separately, with each run involving 500 generations (iterations). Here, PSO and SPSO belong to the scope of OB-PSO, which takes its objective as the fitness.
Here, the particle (gene) is a 4D real number vector, [p1,p2,p3,p4], where
range. In our opinion, it is the searching mechanism that created the difference.
OB-PSO refreshes the particle parameters by following the best particles of a population and the best particle itself achieved so far, i.e., searching for the best so that it can easily fall into the local optimum for non-linear complex problems. However, BSB-PSO searches for different behaviours that have a high sparseness value in the behaviour space. Even the locomotion of falling down for a few seconds can be recognized as a different behaviour and stored. Such kinds of behaviours could be the milestones to achieving the optimal goal. Dynamic rolling locomotion is an example which is unstable and which can easily fall during the locomotion procedure. Since behaviours 1 and 2 have encoded history information, a gait that is good in the beginning but which then falls (perhaps the milestone to the optimal solutions) will show up as a decent gait as opposed to a terrible one. Meanwhile, behaviour 3 and 4 do not have sufficient historical information - they lack the motivation to find the milestone.
Behaviour characterization determines the size and features of the behaviour space. Not all behavioural definitions are suitable for a search task. In the evolution for an eight-module worm configuration under a wave pattern controller setup, the end position (behaviour 3) and distance (behaviour 4) do not exhibit a significant advantage over OB-PSO. This illustrates that these two behaviour characterizations are not adequate to find the parameters or milestone-parameters generating dynamic rolling locomotion.
Table 2 show the results. BSB-PSO searches as at a higher speed than OB-PSO. At the same time, the resultant diversity of BSB-PSO outperforms PSO and SPSO. The results of the objective-based algorithm (PSO and SPSO) support the final goal, including a longer distance travelled in 10 s. As such, its top 20 results stored in the solution list exhibit highly similar moving distances and locomotion patterns. However, the BSB algorithm is notable for its focused searching for different behaviours with high sparseness in the behaviour space. Hence, this algorithm can find many locomotion patterns, some of which may be totally new or else hard to imagine for a human.
Evolutionary results and diversity of the searched solutions
By checking the results, we count the number of locomotion patterns in each evolution. If the searched solutions do not include meaningful motion patterns in one evolution, we classify it as a ‘0 pattern’. As shown in Table 3, BSB-PSO finds 11 out of 30 runs that exhibit more than three motion patterns. Meanwhile, PSO and SPSO find 1 and 4 out of 30 runs separately. Thus, BSB-PSO is more powerful in searching diverse behaviours than both PSO and SPSO.
Statistics for 30-run evolutions for different algorithms
Multiple locomotion patterns can be found in one evolution run using BSB-PSO. In fact, some results contain all the locomotion patterns, as illustrated in Figure 11: motion like a caterpillar, roll dynamically or statically, moving modules from backwards to forwards, twist-scrolling, sidewinding, etc. Moreover, most solutions from OB-PSO (PSO and SPSO) converged on a few patterns, such as rolling and udulatory motion. Gait snapshots of twist-scrolling are shown in Figure 12(a). This pattern is mostly found in BSB-PSO, while SPSO only converges on this pattern four runs (out of 30 runs). Figure 12(b) shows circular movement, which is only found in BSB-PSO. This is because the speed of this pattern is low and OB-PSO does not reward it. Furthermore, BSB-PSO rewards different gaits, and circular movement was taken as a different pattern and stored in the results. Many simulation videos can be seen on the website: http://i.youku.com/hitrobot.

Multiple locomotion patterns of a worm configuration

Snapshots of the selected motion patterns
6. Discussion and future work
The BSB search takes behaviour sparseness as a fitness value. It discards the philosophy of searching-for-the-best. Instead, it searches for behavioural diversity. In the gene space, the solutions that satisfy certain objectives are rare, and their corresponding behaviours are also sparse. Based on this mechanism, the solution fulfilling the final objective can be included in the archive. For this paper - and only if searching for high speed - performing some iterations of OB-PSO after BSB-PSO may yield solutions with higher speed. This will constitute our future work, i.e., combining BSB-PSO with OB-PSO more efficiently.
The behaviour definition and behavioural distance metric also have a great impact on the BSB search algorithm. In future, we will study the rules for selecting behaviour definitions according to evolutionary tasks. On the other hand, the BSB search found a wide range of new behaviours; this is very useful for addressing multiple locomotion patterns for SRRs. In Section 5, the results demonstrate that multiple locomotion patterns can be discovered for eight-module worm configurations. Such diversity of behaviours is well-maintained based on the definition of behaviour; however, locomotion patterns are hard to separate. The next step is to automatically identify and classify multiple locomotion patterns for UBot SRRs.
SRRs have variable configurations. Considering configuration as part of behaviour characterization and co-evolution could be executed in the future. This may allow for the discovery of more and different configurations with vivid motion patterns beyond our imagination.
7. Conclusion
This paper analysed the characterizations of locomotion evolution for a UBot SRR. A PSO algorithm is revised based on behaviour sparseness. In our developed dynamic simulator, the feasibility of locomotion evolution is proven for a SRR using a modified algorithm. The comparative results showed that, for some searching tasks, the algorithm is more robust in avoiding a local optimum. The flexibility of the behaviour-definition for the algorithm is also verified. The results show that searching for high-speed motion and discovering multiple locomotion patterns at the same time can be addressed well by this algorithm.
Footnotes
8. Acknowledgements
This research is supported by the National Natural Science Foundation of China (Grant No. 60273316).
