Abstract
Wireless ranging measurements have been proposed to enable multiple Micro Air Vehicles (MAVs) to localize with respect to each other. However, the high-dimensional relative states are only weakly observable from a scalar distance measurement. Hence, relative localization and control performance degrade under unobservable conditions, which can be identified using Lie derivatives. This paper presents a nonlinear model predictive control (NMPC) method that maximizes the determinant of the observability matrix to generate optimal control inputs, while satisfying constraints including multi-robot tasks, input limits, and state bounds. Simulation results validate the localization and control efficacy of the proposed NMPC method for range-based multi-MAV systems with weak observability: it converges faster and localizes more accurately than previously proposed random motions. A real-world experiment with two Crazyflies illustrates the optimal states and control behaviours generated by the proposed NMPC.
Introduction
The use of multiple aerial robots has been studied extensively in recent years for more complicated tasks and challenging environments. 1 For example, predictive control has been proposed for flying a swarm of five quadrotors among cluttered obstacles. 2 In confined outdoor spaces, multiple drones have been controlled with an evolutionary optimization method for flocking flight. 3 Multiple flying robots can also coordinate using simultaneous localization based on ranging measurements to beacons. 4 These recent studies represent the state of the art in aerial swarms. However, most of them rely on external positioning systems such as indoor OptiTrack, 2 outdoor GPS 3 or beacons. 4 To remove the dependence on external infrastructure, onboard sensors have been deployed to develop autonomous drone swarms. For example, the 3D relative direction can be estimated with sound-based microphone arrays, which allows leader-follower flight of micro aerial vehicles. 5 An array of infrared sensors can also enable relative positioning and inter-robot spatial coordination. 6 However, these sensor arrays are too heavy and power-hungry for tiny flying robots. In 7, multiple fully distributed and autonomous tiny flying robots explore unknown environments with a finite state machine. However, the relative localization is not very accurate because it uses signal strength directly, which may not be sufficient for precise cooperative tasks.
Vision is the most widely used solution for multi-robot relative localization. In outdoor flocking, multiple drones localize each other with deep neural networks and cameras for safe navigation in 8. This requires heavy AI hardware to run the deep network, as is also the case for 9,10. Marker-based localization requires only simple computation, such as recognizing black circles 11 or April tags. 12 However, these visual methods are easily affected by a limited field of view or by lighting conditions, which can lead to detection failure and loss of localization.
Wireless ranging sensors provide omni-directional and low-cost ranging measurements, and have recently been used frequently for relative localization. This approach was initially proposed in 13, which still used Bluetooth so that the hardware would fit on tiny MAVs. In 14, an ultra-wideband (UWB) based cooperative relative localization method was proposed to estimate neighbouring drones' positions from distance and self-displacement measurements under a common orientation. Furthermore, 15 removes the orientation assumption and achieves relative localization purely from distance measurements and an acceleration model. However, these experiments assumed a high-order dynamic model and had a low ranging frequency, which is not efficient for a large number of tiny robots.
In 16, a simplified velocity model and a robust ranging protocol are designed for multiple tiny flying robots with self-regulated localization convergence. However, the initialization procedure with random velocity inputs is not efficient. This paper therefore uses nonlinear MPC to design the multi-robot controller by maximizing the task performance and the degree of observability, while satisfying constraints such as input velocity bounds and state bounds.
Some related papers discuss the control of bearing-based or range-based multi-robot systems. 17 Most of them use persistent excitation, prescribing specific active control patterns to maintain observability, which is neither flexible nor optimal with respect to other tasks or constraints.
The main contribution of this paper is to leverage weak observability theory to optimize the multi-robot control inputs, which, to the best of the authors' knowledge, has not yet been presented. Specifically, the proposed NMPC framework maximizes the nonlinear observability condition derived from Lie derivatives, which couples the velocity inputs and the relative states. This leads to faster localization convergence and higher estimation accuracy even after convergence, compared to random control inputs.
The rest of the paper is organized as follows. Section 2 states the problem including the range-based multi-MAV model, weak observability condition, and the problem definition. Section 3 proposes the nonlinear MPC method with the cost function and corresponding constraints. Section 4 gives the testing results of the proposed control with Acados, an integrated nonlinear MPC tool. The conclusion is discussed in Section 5.
Preliminaries
This section briefly introduces the multi-MAV kinematic model and relative Kalman filter. Based on the relative model and distance observations, the observability matrix is determined with Lie derivatives. Finally, the control problem is defined by considering both the model and observability.
Relative multi-MAV model
The model of twin MAVs is described in this subsection, since the relative localization is distributed and triggered by ranging events between any two MAVs. The simulated relative model has been tested in real experiments in our previous work, so the gap to the real-world multi-robot system is small. For details, see 16.
For simplicity, we assume the yaw rate of both robots to be zero. This assumption does not influence the 3D movements of each robot. The control input vector

Diagram of the twin-MAV kinematic model and the two coordinate frames. Body frames and horizontal frames are shown with blue axes and red axes, respectively. Both frames are fixed to the robot, while the horizontal frames always have a vertical Z-axis. The background image shows previous experiments of multi-MAV relative localization without optimal control.
The nonlinear relative kinematic model can be derived from Newton formulas by considering the states
Relative estimation
This subsection briefly reviews the Extended Kalman filter (EKF) for the relative localization. The discrete prediction is formulated as:
The final state estimate is obtained from the distance measurement as shown below:
Remark 1
The kinematic model and EKF-based relative localization have been validated in real-world experiments. 16
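As an illustration of the prediction and update steps reviewed above, the following is a minimal sketch of a range-only EKF. It is not the filter used in this paper: the planar state `[x, y, psi]` (relative position and relative yaw), the zero-yaw-rate velocity model, the function names, and all numerical values are assumptions made for the example.

```python
import math

def mat_mul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def ekf_predict(x, P, vi, vj, Q, dt):
    """Propagate the assumed model p_dot = R(psi) v_j - v_i, psi_dot = 0."""
    c, s = math.cos(x[2]), math.sin(x[2])
    x_pred = [x[0] + dt * (c * vj[0] - s * vj[1] - vi[0]),
              x[1] + dt * (s * vj[0] + c * vj[1] - vi[1]),
              x[2]]
    # Jacobian of the discrete model with respect to the state.
    F = [[1.0, 0.0, dt * (-s * vj[0] - c * vj[1])],
         [0.0, 1.0, dt * (c * vj[0] - s * vj[1])],
         [0.0, 0.0, 1.0]]
    FPFt = mat_mul(mat_mul(F, P), transpose(F))
    P_pred = [[FPFt[i][j] + Q[i][j] for j in range(3)] for i in range(3)]
    return x_pred, P_pred

def ekf_update(x, P, z, r_var):
    """Correct the state with one scalar distance measurement z."""
    d = math.hypot(x[0], x[1])
    if d < 1e-9:
        return x, P  # measurement Jacobian is undefined at zero range
    H = [x[0] / d, x[1] / d, 0.0]  # gradient of the range w.r.t. the state
    PHt = [sum(P[i][k] * H[k] for k in range(3)) for i in range(3)]
    S = sum(H[i] * PHt[i] for i in range(3)) + r_var  # innovation variance
    K = [v / S for v in PHt]  # Kalman gain
    x_new = [x[i] + K[i] * (z - d) for i in range(3)]
    HP = [sum(H[k] * P[k][j] for k in range(3)) for j in range(3)]
    P_new = [[P[i][j] - K[i] * HP[j] for j in range(3)] for i in range(3)]
    return x_new, P_new
```

Because the measurement is a scalar, the gain reduces to a vector divided by a scalar innovation variance, so no matrix inversion is needed. Note that the yaw entry of `H` is zero: a single range measurement carries no direct yaw information, and yaw is only corrected through the cross-covariance built up during prediction.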
Observability constraint
The observability of nonlinear systems can be analyzed by Lie derivatives. 18
The corresponding observability matrix is defined as
Therefore, the relative states are observable only when the observability matrix
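To make the role of the Lie derivatives concrete, the following worked example derives the observability determinant for a simplified planar model. The model and notation are assumptions for illustration and may differ from the formulation used in this paper: relative position p = (x, y) and relative yaw ψ, zero yaw rates, constant planar velocity inputs v_i and v_j, and half the squared distance as the measurement function.

```latex
\[
\begin{aligned}
\mathbf{s} &= [x,\; y,\; \psi]^\top, \qquad
\dot{\mathbf{p}} = R(\psi)\mathbf{v}_j - \mathbf{v}_i =: \mathbf{w}, \qquad
\dot{\psi} = 0, \qquad
h = \tfrac{1}{2}\,\mathbf{p}^\top\mathbf{p}, \\
L^0 h &= \tfrac{1}{2}\,\mathbf{p}^\top\mathbf{p}, \qquad
L^1 h = \mathbf{p}^\top\mathbf{w}, \qquad
L^2 h = \mathbf{w}^\top\mathbf{w}, \\
\mathcal{O} &=
\begin{bmatrix} \nabla L^0 h \\ \nabla L^1 h \\ \nabla L^2 h \end{bmatrix}
=
\begin{bmatrix}
x & y & 0 \\
w_x & w_y & \mathbf{p}^\top R'(\psi)\mathbf{v}_j \\
0 & 0 & 2\,\mathbf{w}^\top R'(\psi)\mathbf{v}_j
\end{bmatrix}, \\
\det\mathcal{O} &= 2\,\bigl(\mathbf{w}^\top R'(\psi)\mathbf{v}_j\bigr)\,(x\,w_y - y\,w_x)
= -2\,\bigl(\mathbf{v}_i^\top R'(\psi)\mathbf{v}_j\bigr)\,(x\,w_y - y\,w_x).
\end{aligned}
\]
```

Here R'(ψ) denotes dR/dψ, and the last equality uses the fact that (R v_j)ᵀ R' v_j = 0. Under these assumptions the determinant vanishes when either MAV is stationary, when the two velocities are parallel in a common frame, when the relative velocity points along the line of sight, or when the two MAVs coincide.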
Problem statement
The optimal control problem
Methodology
This section proposes a nonlinear model predictive controller for solving the optimization problem described in (7). The cost function is then extended to multi-robot tasks such as formation control and motion tracking. Finally, the solver settings for the nonlinear program (NLP) are presented.
Nonlinear MPC
A natural way to solve this NLP is MPC, which achieves the objective by minimizing a cost function over a finite horizon. Hence, the nonlinear MPC for the proposed problem is designed as follows
The overall objective function
The initial-state constraint depends on the estimated state, which is inaccurate before the localization converges. Hence, the initial value must be bounded to avoid singularities when solving the NLP. A saturation function is employed to limit the initial value, as shown in (8c). This is reasonable because many robust nonlinear MPC methods for systems with uncertain states prove stability by assuming bounds on the uncertain state.
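The saturation of the initial state can be sketched as an element-wise clamp applied to the EKF estimate before it is passed to the solver. The function name and the bound values below are hypothetical and are not the parameters used in this paper.

```python
def saturate(x_est, lower, upper):
    """Clamp each element of the estimated state to [lower, upper]."""
    return [min(max(v, lo), hi) for v, lo, hi in zip(x_est, lower, upper)]

# Hypothetical bounds on [x (m), y (m), yaw (rad)], for illustration only.
LOWER = [-3.0, -3.0, -3.14]
UPPER = [3.0, 3.0, 3.14]

# An unconverged estimate far outside the flight area is pulled back to the bounds.
x0 = saturate([5.0, -7.0, 0.3], LOWER, UPPER)
```

Clamping only changes the value handed to the optimizer as its initial condition; the estimator itself keeps running on the unclamped state.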
Cost functions
A nonlinear least squares (NLS) method is deployed for minimizing the objective function of (8), which is written as:
To maximize the observability with the NLS method, the observability objective (7) is reformulated as the following cost function.
The coordination cost of
Since the coordination task is inaccurate before the localization convergence, the weight
Sometimes, a penalty cost can be introduced to smooth the control inputs as follows:
Acados solver
The nonlinear MPC solver used in this paper is Acados, an open-source, high-performance library for fast optimal control. 19 It provides a Python interface and is finely tuned for multiple CPU architectures. CasADi is employed for model definition and differentiation, handling the constraints and model calculations. 20 The solver settings are summarized as follows. First, the continuous optimization problem is discretized by the multiple shooting method. The resulting NLP is solved by sequential quadratic programming (SQP) with the real-time iteration (RTI) scheme, using a Gauss-Newton Hessian approximation. The quadratic programs (QPs) in SQP are solved with partial condensing and HPIPM (a high-performance quadratic programming framework), which is built on the linear algebra library BLASFEO. Overall, this solver has a competitive computation speed compared to other state-of-the-art NMPC solvers.
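The solver choices listed above can be collected in a small configuration sketch. The option names and string values follow the Acados Python interface (`acados_template`) as we understand it and should be checked against the installed version; the helper `apply_solver_options` and the stand-in object are hypothetical and exist only so that the example runs without Acados installed.

```python
from types import SimpleNamespace

# Solver choices named in the text, keyed by the corresponding
# `solver_options` attribute of an acados OCP description.
SOLVER_OPTIONS = {
    "nlp_solver_type": "SQP_RTI",             # SQP with real-time iteration
    "hessian_approx": "GAUSS_NEWTON",         # Gauss-Newton Hessian approximation
    "qp_solver": "PARTIAL_CONDENSING_HPIPM",  # partial condensing + HPIPM
}

def apply_solver_options(ocp, options=SOLVER_OPTIONS):
    """Copy each option onto ocp.solver_options and return ocp."""
    for name, value in options.items():
        setattr(ocp.solver_options, name, value)
    return ocp

# Stand-in for an OCP object; with acados installed, an
# acados_template.AcadosOcp instance would be passed instead.
demo_ocp = SimpleNamespace(solver_options=SimpleNamespace())
apply_solver_options(demo_ocp)
```

The horizon length, the prediction time, and the cost and constraint definitions would still have to be set on the OCP object before a solver can be generated.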
Testing Results
This section shows the improvement of the proposed nonlinear MPC on the relative localization performance compared to the stochastic initialization procedure studied in 16. The statistics of the localization errors and convergence speed are analyzed to validate the efficiency of the proposed controller. In addition, the adaptive formation flight of multiple MAVs is studied as an example application.
Simulation set-up
The following simulation experiments are conducted on a Dell Latitude 7480 laptop with an i7-6600U CPU with 4 cores at 2.60GHz and 8GB of RAM. For the simulation experiments, the corresponding EKF parameters are chosen as
As for the parameters of the proposed nonlinear MPC, the horizon is set to
For the constraint settings, the saturation parameters for the initial state vector are chosen as
Improvement on relative localization
This subsection compares stochastic initialization with nonlinear MPC to verify that the proposed controller, using only the observability cost, achieves better localization performance than stochastic initialization. In this subsection, the multi-robot task cost
Figure 2 shows the relative localization performance with the same initial relative states and the same EKF parameters. Note that the three initial states are completely unknown to both the EKF and the controllers. Additionally, the maximum velocity for both controllers is set to 2 m/s. Comparing Figure 2 and Figure 3, the relative positioning with the optimal controller converges faster (about 5s) than with the random controller (about 9s). In particular, the observability-optimized NMPC achieves finite-time convergence of the relative yaw, while the random control leads to overshoot, as shown in the third subplot of Figure 2. Therefore, the proposed controller with observability consideration excites all relative states, which become more observable even with unknown initial state errors.

Relative state from EKF estimation and ground-truth between two MAVs under the random velocity inputs. The data consists of 2-axis relative positions and 1-axis relative orientation.
In addition, after the localization has converged in Figure 3, the optimal controller automatically generates a periodic motion pattern similar to manually designed persistent excitation motions. Moreover, even with incorrect relative states, the MAVs still avoid each other, as shown in Figure 3, because the observability cost penalizes near-collision situations, in which the observability determinant approaches zero.

Relative state from EKF estimation and ground-truth between two MAVs under the nonlinear MPC controller. The data consists of 2-axis relative positions and 1-axis relative orientation.
To validate the general efficacy of the proposed NMPC, we gather more statistics on the performance. As shown in Figure 4 and Figure 5, 30 random simulation experiments are conducted for each controller. In each simulation run, the initial positions of both robots are generated randomly. Moreover, the velocity and distance measurement noise are also generated randomly. Both figures indicate that the proposed NMPC controller generally has a faster localization convergence speed.

30 simulation experiments of the stochastic controller from 16 with random initial MAV positions. This figure shows the estimation errors of 2-axis relative positions and 1-axis relative orientation. Note that yaw error with -2

30 simulation experiments of the optimal controller with random initial MAV positions. This figure shows the estimation errors of 2-axis relative positions and 1-axis relative orientation.
Figure 6 compares the convergence times of the two controllers over 30 random tests. The average convergence time of the NMPC is smaller than that of the random controller on all axes. In addition, the NMPC with observability constraints has a lower maximum convergence time than the random control inputs.

The statistics of convergence time of three-dimensional relative localization under 30 random tests. Blue: the proposed nonlinear MPC; Green: the stochastic control.
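The criterion used to measure convergence time is not spelled out in this section. One plausible definition, given here only as an assumption, is the first time instant after which the absolute estimation error stays below a threshold for the rest of the run. The function name and the threshold are hypothetical.

```python
def convergence_time(times, errors, threshold):
    """Return the first time after which |error| stays below threshold.

    Returns None if the error is still at or above the threshold at the
    final sample, i.e. the estimate never settles within the run.
    """
    last_violation = -1
    for k, e in enumerate(errors):
        if abs(e) >= threshold:
            last_violation = k
    if last_violation + 1 >= len(errors):
        return None
    return times[last_violation + 1]

# Toy error trace: the error dips below 1.0 at t = 1 but leaves the band
# again at t = 2, so the settling time is t = 3.
t_conv = convergence_time([0, 1, 2, 3, 4], [2.0, 0.5, 1.5, 0.2, 0.1], 1.0)
```

Using the last violation rather than the first crossing avoids reporting an early convergence time for estimates that overshoot and leave the error band again.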
Another interesting result is the localization accuracy after estimation convergence. Figure 7 shows the distributions of the position estimation errors of the two controllers in the last 5 seconds of the 30 random tests. The proposed NMPC has lower average position estimation errors than the stochastic controller. Therefore, the behaviours after convergence still matter for the localization performance. To study it, the control input

Localization error of the two controllers after estimation convergence. Each distribution has a total of 15000 data points over the 30 random tests, which are taken from the last 5 seconds, when all estimators have converged.

The control inputs are generated by the proposed NMPC with observability optimization. These sequences show the velocity input values corresponding to the simulation in Figure 3.
Formation control with NMPC
This subsection uses the NMPC controller for multi-robot tasks. Examples of formation flight and dynamic motion tracking are given below. At the beginning, a constant relative position is set in the task cost
The corresponding control results are shown in the following figures. In Figure 9 we can see that the proposed NMPC has fast and stable tracking performance given the formation and dynamic tracking tasks at

Relative localization and ground-truth between two MAVs under the proposed optimal control method and formation tracking multi-robot tasks. The target relative position is constant before
To view the motion of each MAV in the world-frame, the trajectories of both MAVs are plotted in Figure 10. For the formation flight during 10–15s, both MAVs move slowly with constant relative positions. During the interval from 15–20s, both MAVs move to achieve circle tracking and keep optimizing the observability according to the asynchronous behaviours. In addition, from trajectories after 15s in Figure 9 we can see that introducing the multi-robot task cost eliminates the transients as shown in Figure 8.

The world-frame trajectories of both MAVs under the proposed optimal control method and formation tracking multi-robot tasks. The time range of the data is between 10s and 20s.
Real-world flight experiment
This subsection implements the proposed controller on two commercial Crazyflie 2 quadrotors, which are shown in Figure 1. The experimental setup is as follows: 1) the two drones perform onboard relative localization; 2) a ground laptop receives the estimated relative positions and yaw from both drones via two Crazyradio dongles; and 3) the laptop runs the proposed NMPC and sends the computed velocity commands to both drones via Crazyradio in real time. All parameters of the experimental NMPC remain the same as those in the simulation.
From Figure 11, we can see that after take-off at 0s, both drones initially move erratically because the NMPC control is based on incorrect relative positions. From 0s to 3s, the left drone shows a significant motion deviation while the onboard relative localization is converging. From 3s on, each drone moves along a circular trajectory, in synchrony with the other. Without a converged relative position, the calculated velocities lead to erratic flight or even a crash of both drones.

The optimal motion trajectories of two Crazyflies under the proposed NMPC controller. On each trajectory, ten stars mark the drone position over time after take-off at 0s.
As can be seen from the flight after 3s, the velocity commands calculated by the NMPC for each drone remain as large as possible, i.e., 2m/s. In addition, comparing paired points with the same timestamps shows that the two drones move in orthogonal directions. These two behaviours excite the motion in all dimensions, so that the relative estimate converges quickly in all dimensions. They also increase the time variation of the ranging measurements, which further contributes to the system observability.
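The benefit of fast, orthogonal motion can be checked numerically on a simplified model. The sketch below is an illustration under stated assumptions, not the observability matrix used in this paper: it assumes a planar relative state `(x, y, psi)`, zero yaw rates, constant velocity inputs, and half the squared distance as the measurement, and it stacks the gradients of the zeroth, first, and second Lie derivatives. All function names and numbers are hypothetical.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def observability_det(x, y, psi, vi, vj):
    """Determinant of the Lie-derivative observability matrix (assumed model)."""
    c, s = math.cos(psi), math.sin(psi)
    rvj = (c * vj[0] - s * vj[1], s * vj[0] + c * vj[1])    # R(psi) v_j
    drvj = (-s * vj[0] - c * vj[1], c * vj[0] - s * vj[1])  # dR/dpsi v_j
    w = (rvj[0] - vi[0], rvj[1] - vi[1])                    # relative velocity
    O = [
        [x, y, 0.0],                                         # grad of h
        [w[0], w[1], x * drvj[0] + y * drvj[1]],             # grad of L1 h
        [0.0, 0.0, 2.0 * (w[0] * drvj[0] + w[1] * drvj[1])],  # grad of L2 h
    ]
    return det3(O)

# Orthogonal velocities at the 2 m/s limit versus parallel velocities.
det_orthogonal = observability_det(1.0, 0.0, 0.0, (2.0, 0.0), (0.0, 2.0))
det_parallel = observability_det(1.0, 0.0, 0.0, (2.0, 0.0), (1.0, 0.0))
```

In this toy setting the parallel case gives a zero determinant, while the orthogonal case at maximum speed gives a large one, which is consistent with the behaviour observed in the flight experiment.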
Conclusions
This paper proposes a novel nonlinear MPC controller with an observability cost to improve range-based multi-MAV relative localization. Simulation results demonstrate its faster localization convergence and lower estimation errors with respect to previously studied stochastic motion. Experimental flight tests validate the optimal trajectory of orthogonal motions expected by the simulation analysis. Future work involves the implementation of this controller on a larger number of drones with fully onboard computation.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
