Abstract
This paper proposes an approach method that enables an autonomous underwater vehicle (AUV) to accurately approach an unmanned aerial vehicle (UAV)-deployed station in the final stage of a mission. AUVs are increasingly used as fully autonomous underwater survey platforms, yet their operation is still constrained by the need for human or vessel support during deployment and recovery. To remove this constraint, the UAV deploys an underwater station that provides a positioning reference and acts as a recovery system. Using onboard acoustic positioning and communication devices, the AUV and the station mutually transmit signals, allowing the AUV to estimate its state relative to the station and guide its approach. Sea experiments evaluated the resulting approach accuracy relative to the station, showing that the AUV can converge to the station vicinity with the precision required for subsequent recovery. This study validates only the pre-recovery approach phase to the recovery station. Full docking and recovery are beyond the scope of this article. The proposed method provides fundamental technology for a reliable close-range approach, which is a key prerequisite for fully autonomous recovery without human or vessel assistance.
Introduction
As interest in sustainable ocean environmental management grows, unmanned aerial vehicles (UAVs) and autonomous underwater vehicles (AUVs) are increasingly utilized as key technological foundations in the fields of ocean research and environmental monitoring.
Several recent observation technology studies have pointed out that UAVs can play an important role in unmanned observation of marine regions.1,2 Multicopter UAVs, in particular, are frequently used for environmental monitoring in coastal areas and coastal waters due to their ability to quickly capture images over a wide area, and are particularly effective in detecting floating debris and analyzing its spatiotemporal distribution. 3 Furthermore, review studies on risk assessment of contact between marine organisms and floating debris, as well as monitoring methods for coastal areas using UAVs, are progressing, demonstrating the expanding potential applications of these technologies.4–7
On the other hand, AUVs are suitable for detailed data acquisition and long-term observation in underwater environments, and their use is advancing in scenarios where human operations are challenging, such as surveys in deep-sea or shallow-water areas that are difficult for a support vessel to access. One study has introduced filtering methods that combine inertial and acoustic data to achieve more reliable localization in large and deep-sea environments. 8 Another study has developed vision-based AUV systems that use computer vision and machine learning for coral reef inspection and geotagging, enabling efficient autonomous monitoring. 9
Traditionally, these unmanned vehicles were typically operated individually, but in recent years, coordinated control technology using multiple AUVs has gained attention, and research utilizing swarm robotics and autonomous navigation algorithms has become active. For example, proposals include relative navigation using acoustic communication between multiple AUVs, 10 obstacle avoidance control using reinforcement learning, 11 control algorithms for stable formation navigation in dynamic environments, 12 and area exploration based on swarm optimization. 13
This trend toward multi-vehicle coordination has further developed, and research is now expanding into cross-domain collaboration between different platforms (air, water surface, and underwater). By linking UAVs and AUVs, it is possible to divide roles, with UAVs performing wide-area exploration and communication relay, and AUVs performing detailed underwater observation.14,15 Additionally, by adding unmanned surface vehicles (USVs) as relay nodes, multi-layer collaboration between UAVs, USVs, and AUVs has made it possible to autonomously execute complex missions.16,17 There are also review papers on communication technologies using UAVs 18 and autonomous operation technologies for UAVs and AUVs. 19 The deployment of the AUV by the UAV has also been proposed. 20
Research is also progressing on aerial deployment of AUVs from UAVs as a physical deployment and recovery method, as well as on the evaluation of the impact of such deployment, 21 and a framework for aerial deployment of miniature AUVs has been proposed. 22 In addition, hybrid unmanned aerial underwater vehicles (hybrid UAV/AUV) capable of moving continuously between air and water are gaining attention, with research being conducted on their structural design, control methods, and medium transition technologies.23–26
As a supporting technology for these collaborations, the construction of a highly reliable and efficient communication network is also important. Research is being conducted on hybrid communication configurations combining RF communication and optical wireless communication,27,28 the design of an integrated communication protocol (IMC), 29 and communication framework design for UAV–USV formations. 30 In addition, one review paper covers communication technologies for underwater sensor networks, 31 and another reviews UAV networks from a cyber-physical systems perspective, laying the groundwork for UAV network infrastructures. 32
Additionally, as applications of unmanned vehicle groups to environmental tasks, methods for UAVs and USVs to collaborate in manipulating objects on the water surface 33 and proposals for autonomous systems where a UAV and an AUV coordinate to assist in marine debris collection operations 34 have been presented. Diverse application developments are progressing, including evaluations of USV development history and marine environmental monitoring by robotic swarms.35,36
This study proposes an approach method of the AUV for cooperative survey with the UAV. The AUV is an underwater observation platform with a built-in energy source and a computer. This research project is developing a fully automated method to realize ocean surveys without human or ship support, in which a UAV transports and deploys the AUV to the target area and performs the entire process from survey and recovery to return to port (Figure 1). The elemental methods are (1) transport and deployment of the AUV to the target area by the UAV (Figure 1(A) and (B)), (2) ocean survey of the AUV based on the UAV (Figure 1(C)), and (3) recovery and return to the port (Figure 1(D) and (E)). Since the UAV can measure its position using the signals from the Global Navigation Satellite System (GNSS), the use of the UAV as a survey station makes it possible for the AUV to conduct surveys while also knowing its position using GNSS coordinates.

The overall concept of the cooperative survey by a UAV and an AUV.
In particular, this study develops an approach method with accurate state estimation based on a UAV to recover an AUV (shown in Figure 1 with a red frame). This research establishes a fundamental technology for automating the recovery of the AUV, which has until now been carried out with the assistance of humans and vessels. Finally, the collaboration between the UAV and the AUV will make it possible to survey environments that are inaccessible to vessels.
The elemental methods of this research are (1) state estimation of an AUV based on a UAV and (2) an approach to a recovery station (Figure 2). The AUV performs relative positioning based on the recovery station deployed underwater from the UAV using the acoustic devices mounted on both the AUV and the recovery station. The AUV approaches the recovery station based on its relative positioning results. The recovery station is deployed by suspending it from the UAV using a single tether. Because wind, waves, and currents may cause a small horizontal offset from the UAV's vertical line, the proposed operation keeps the tether length as short as necessary to reduce the effective suspension length and minimize horizontal offset. This article assumes that the recovery station operates in the vicinity of the UAV's vertical line. The scope of this study is limited to the pre-recovery approach phase to the recovery station. Full docking and recovery are outside the scope of this article.

The elements of the proposed method.
The contributions of this study are summarized as follows:
(1) This article proposes an AUV state-estimation method based on mutual UAV–AUV relative positioning, aiming to achieve the accuracy required for the pre-docking phase.
(2) This article proposes an approach method that guides the AUV toward the recovery station based on the relative positioning.
(3) Real sea experiments using a UAV and an AUV are conducted to validate the approach performance and to evaluate the state estimation accuracy against the accuracy requirement for the pre-docking phase toward automatic recovery.
Proposed method
State estimation based on mutual positioning
This study proposes an approach method for an AUV based on mutual positioning with a reference and recovery station deployed from a UAV (UAV-ST). In the previous study, only positioning information from an AUV was used, 37 but by performing mutual positioning between an AUV and a UAV and aggregating the positioning information, accurate state estimation can be performed in real time. At the same time, an AUV approaches the UAV-ST in real time based on its own positioning results.
Figure 3 illustrates a concept of state estimation, assuming the AUV is equipped with minimal navigational sensors such as an acceleration sensor, an attitude sensor, a magnetometer, and a pressure sensor. Depth z can be directly measured by the pressure sensor, and roll angle

State estimation based on mutual positioning.
The AUV corrects the state distribution based on mutual acoustic positioning by the UAV-ST deployed from the UAV. In this article, mutual positioning is defined as a bidirectional transmission and information-sharing scheme in which the AUV and the UAV-ST alternate acoustic transmissions. The UAV-ST obtains the relative positioning measurement and sends it to the AUV, and the AUV updates its state using both its own measurement and the measurement communicated from the UAV-ST, distinguishing the method from conventional transponder positioning.
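As a minimal sketch of the alternating transmission scheme defined above, the following Python fragment builds a transmit schedule in which the AUV and the UAV-ST take turns, and checks how often each side transmits. The slot interval of 6 s is the value used in the sea trial. The names `Ping`, `alternating_schedule`, and `per_side_period` are hypothetical and are not taken from the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Ping:
    """One acoustic transmit slot (hypothetical helper type)."""
    time_s: float
    sender: str  # "AUV" or "UAV-ST"


def alternating_schedule(interval_s: float, n_pings: int, first: str = "AUV") -> list:
    """Return n_pings transmit slots in which the two sides alternate."""
    order = ("AUV", "UAV-ST") if first == "AUV" else ("UAV-ST", "AUV")
    return [Ping(k * interval_s, order[k % 2]) for k in range(n_pings)]


def per_side_period(schedule: list, sender: str) -> float:
    """Time between two consecutive transmissions of the same side."""
    times = [p.time_s for p in schedule if p.sender == sender]
    if len(times) < 2:
        raise ValueError("need at least two transmissions from this sender")
    return times[1] - times[0]


if __name__ == "__main__":
    schedule = alternating_schedule(interval_s=6.0, n_pings=6)
    for ping in schedule:
        print(f"t = {ping.time_s:4.1f} s  sender = {ping.sender}")
    print("UAV-ST period [s]:", per_side_period(schedule, "UAV-ST"))
```

With 6 s slots, each side transmits once every 12 s in this sketch, which agrees with the 12 s update interval of the UAV-ST-referenced positioning reported in the experiments.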
Time interval of acoustic positioning is
The AUV's velocity at time
Since the measured position is subject to fluctuations caused by measurement noise, the following smoothing process is applied. The index set of samples within the most recent T seconds is defined as follows:
The smoothed position is then computed as the simple average over this time window.
Finally, the velocity is estimated from the smoothed positions as follows:
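The three smoothing steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the window is taken as half-open (samples newer than T seconds before the current time, up to and including it), the velocity is taken as a finite difference between two consecutive smoothed positions, and the function names are hypothetical.

```python
import numpy as np


def window_indices(times, t_now, T):
    """Indices of samples with t_now - T < t <= t_now (assumed window form)."""
    times = np.asarray(times, dtype=float)
    return np.flatnonzero((times > t_now - T) & (times <= t_now))


def smoothed_position(times, positions, t_now, T):
    """Simple average of the measured positions inside the time window."""
    idx = window_indices(times, t_now, T)
    if idx.size == 0:
        raise ValueError("no positioning samples inside the window")
    return np.asarray(positions, dtype=float)[idx].mean(axis=0)


def velocity_from_smoothed(p_prev, p_curr, dt):
    """Finite-difference velocity between two smoothed positions (assumption)."""
    return (np.asarray(p_curr, dtype=float) - np.asarray(p_prev, dtype=float)) / dt


if __name__ == "__main__":
    # Illustrative values only: positions measured every 12 s, T = 24 s.
    times = [0.0, 12.0, 24.0, 36.0]
    positions = [[20.0, 0.0], [18.0, 0.0], [16.0, 0.0], [14.0, 0.0]]
    p_prev = smoothed_position(times, positions, t_now=24.0, T=24.0)
    p_curr = smoothed_position(times, positions, t_now=36.0, T=24.0)
    print(velocity_from_smoothed(p_prev, p_curr, dt=12.0))
```

With a 12 s update interval, a half-open window of T = 24 s contains exactly two samples, so each smoothed position is the mean of the two most recent measurements.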
The velocity estimated here can be affected when the positioning results include large noise, for example, when outliers are present, which may cause deviations in the estimated value. Outliers are defined as abnormal acoustic positioning results that deviate markedly from the expected measurement behavior. They are assumed to originate from multipath due to reflections from the seafloor and the sea surface, as well as from failures in acoustic signal processing. To mitigate this effect, a velocity-based gating step is introduced. If the velocity is greater than or equal to
The AUV modifies the state distribution based on the above distance, direction, and velocity information. Derive the likelihood
The state distribution is also corrected based on the likelihood
State initialization and correction
The UAV deploys the UAV-ST underwater, but the position of the AUV relative to it is unknown. Based on the relative distance
There are n particles, and the estimated positions of the particles are replaced using the following equation. The ratio of replaced particles to be set among the n particles is
This method can be used not only for initializing the position but also for correcting it when the position is lost. The particles are added constantly, so that the discrepancy between the estimation and the positioning result does not increase.
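The particle replacement described above can be illustrated with a short Python sketch. It assumes two-dimensional horizontal particle positions in the station-centered frame (X north, Y east), a bearing measured clockwise from north, and Gaussian scatter around the position implied by the measured range and bearing. The function name, the scatter model, and all parameter values are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np


def reinject_particles(particles, rel_range, bearing_rad, ratio, sigma, rng):
    """Replace a fraction `ratio` of the particles with samples scattered
    around the position implied by the measured range and bearing."""
    particles = np.array(particles, dtype=float, copy=True)
    n = particles.shape[0]
    m = int(round(ratio * n))
    if m == 0:
        return particles
    # Position implied by the measurement (X north, Y east; assumed convention).
    measured = rel_range * np.array([np.cos(bearing_rad), np.sin(bearing_rad)])
    # Choose which particles to replace, without repetition.
    idx = rng.choice(n, size=m, replace=False)
    particles[idx] = measured + rng.normal(0.0, sigma, size=(m, 2))
    return particles


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    initial = np.zeros((2000, 2))  # all particles start at the origin
    updated = reinject_particles(initial, rel_range=20.0, bearing_rad=np.pi / 2,
                                 ratio=0.1, sigma=0.5, rng=rng)
    replaced = np.any(updated != 0.0, axis=1)
    print("replaced particles:", int(replaced.sum()))
    print("mean of replaced particles:", updated[replaced].mean(axis=0))
```

Because only a fraction of the particles is replaced at each update, the remaining particles keep the motion history, while the injected ones pull the distribution back toward the measurement when the estimate has drifted.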
Approach method
The AUV obtains the distance
The AUV controls the yaw direction so that the direction
The AUV does not control itself using the estimated state. Instead, it approaches the UAV-ST by controlling surge force and yaw based on the distance and azimuth obtained from acoustic relative positioning. The acoustic positionings from the AUV are obtained every 2
Based on the above, the AUV approaches the target recovery device, the UAV-ST.
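The guidance logic of this section can be sketched as a small Python function that maps the acoustically measured distance and relative azimuth to surge and yaw commands. The control law shown here is an assumption made for illustration: a proportional yaw command, a constant surge force applied only while the azimuth error is within a tolerance, and a stop radius near the station. The gains, thresholds, and function names are hypothetical and do not come from the authors' implementation.

```python
import math


def wrap_to_pi(angle_rad: float) -> float:
    """Wrap an angle to the interval [-pi, pi)."""
    return (angle_rad + math.pi) % (2.0 * math.pi) - math.pi


def approach_command(distance_m: float,
                     rel_azimuth_rad: float,
                     k_yaw: float = 0.5,
                     surge_force_n: float = 10.0,
                     azimuth_tol_rad: float = math.radians(30.0),
                     stop_radius_m: float = 1.0):
    """Return (surge, yaw) commands from one acoustic fix.

    All gains and thresholds are illustrative assumptions.
    """
    err = wrap_to_pi(rel_azimuth_rad)
    # Turn toward the direction of acoustic arrival.
    yaw_cmd = k_yaw * err
    # Move forward only when roughly facing the station and still far from it.
    aligned = abs(err) <= azimuth_tol_rad
    far = distance_m > stop_radius_m
    surge_cmd = surge_force_n if (aligned and far) else 0.0
    return surge_cmd, yaw_cmd


if __name__ == "__main__":
    print(approach_command(20.0, 0.0))                 # facing the station
    print(approach_command(20.0, math.radians(90.0)))  # station abeam
    print(approach_command(0.5, 0.0))                  # inside the stop radius
```

Keeping the surge command at zero until the heading error is small is one simple way to avoid driving away from the station between sparse acoustic updates.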
Implementation
The proposed method is implemented on a Blue AUV (Figure 4) and a UAV of PRODRONE Co., Ltd 41 (PRODRONE, Figure 5). The Blue AUV is a BlueROV2 sold by Blue Robotics 42 that runs an autonomous navigation program to enable autonomous operation. The Blue AUV is equipped with an acceleration sensor, an attitude sensor, a magnetometer, a depth sensor, a camera, and an acoustic positioning and communication device. Tables 1 and 2 show the specifications of the Blue AUV and the UAV, respectively. Figure 6 shows the acoustic positioning system (UAV-ST) deployed from the UAV underwater. The dimensions of the UAV-ST are 0.4 × 0.2 × 0.2 m, which is the minimum size adopted because the primary focus of this study is on a close-range approach rather than recovery. Even if the frame were enlarged to a practical size for docking, such as 0.6 × 0.5 × 0.4 m, the additional mass would remain minor because a lightweight aluminum structure is used, and therefore, the increase would not hinder the UAV from transporting the UAV-ST. The UAV is also connected to the acoustic positioning system by a communication cable and can receive the system's depth, heading, and acoustic positioning results in real time.

Blue AUV used in the sea trial.

UAV used in the sea trial.

Acoustic positioning system (UAV-ST).
Specifications of Blue AUV.
Specifications of the UAV.
Experiments
A sea trial using a UAV and an AUV was conducted in Hiratsuka New Port, Japan, with the UAV-ST suspended from the UAV and operated under manual control. First, the UAV flew to the exploration area, and upon arrival at the target point, it hovered in place and deployed the UAV-ST underwater. After the UAV-ST was deployed underwater, the AUV began autonomous navigation with depth control at a depth of 1 m. The AUV and the UAV-ST alternately transmitted positioning signals at 6-second intervals to measure their relative positions (

The procedure of the sea experiment.

The map of Hiratsuka Fishing Port.

The scene of the sea experiment.
The AUV's state was estimated through post-processing using the acoustic positioning and navigation sensor measurements acquired during the experiment. The parameters of the particle filter are shown in Table 3. The number of particles was set to 2000.
Parameters of the particle filter.
Since the UAV-ST-referenced acoustic relative positioning is updated every 12 s, the update interval is relatively sparse. To limit the smoothing delay under such sparse updates, the moving-average window size is set to 2 samples for velocity estimation. This corresponds to a smoothing time parameter of
Figure 10 shows the distances obtained by the UAV-ST and the AUV using acoustic positioning. The blue and light blue dots represent the distance acquired by the UAV-ST and the AUV, respectively. The distance decreases with time, indicating that the AUV is approaching the UAV-ST. Figure 11 shows the relative azimuth of the acoustic positioning system acquired by the AUV. After that, the AUV approaches while controlling its own azimuth in the direction of the acoustic arrival while keeping it within

Distance measured by the UAV-ST and the AUV.

Relative azimuth measured by the AUV.
Figures 12 to 14 show the results of one of five trials.

Estimated trajectory of the AUV and the AUV's position measured by the UAV-ST.

Standard deviations of X and Y positions of the AUV.

Standard deviations of X and Y velocities of the AUV.
Figure 12 shows the estimated position of the AUV, which was estimated based on the relative positioning measurements acquired by the AUV and the UAV-ST through alternating acoustic positioning and the measurements of the AUV's onboard navigation sensors (acceleration sensor and magnetometer). The blue dots represent the estimated position, and the red dots represent the positioning results of the UAV-ST. The origin is the position of the UAV-ST, the X-axis represents the north direction, and the Y-axis represents the east direction. At the beginning, the AUV initializes its position at the origin, but it gradually corrects its position based on the mutual positioning results, and it navigates in the direction of the arrow to approach the UAV-ST.
Figures 13 and 14 show the standard deviation of the estimated positions and estimated velocities of the AUV. The standard deviations are gradually decreasing, and the estimation is converging.
Figure 15 shows the time series of the station-referenced residual between the AUV position acquired by the UAV-ST through acoustic relative positioning and the estimated AUV position at the time of acquisition. Box plots summarizing the five trials are also provided. The blue plots represent the residual in the X-direction, and the red plots represent the residual in the Y-direction. Initially, the residual is about 20 m. A transient spike is observed around 50 s, which is attributed to the filter initialization phase, where the initial uncertainty is large and several acoustic updates are required before the estimation stabilizes. The residual decreases over time and, in the terminal phase, settles to about 1–2 m in both the X and Y directions. In particular, the X-direction residual shows a small spread of at most 0.2 m with no noticeable temporal variation, while the Y-direction residual exhibits a larger but still limited spread of at most 1.0 m with only minor temporal variation. Since an independent absolute ground-truth reference was not available in the sea trials, the reported 1–2 m values are defined as station-referenced measurement–estimate residuals. They are used to evaluate convergence and repeatability rather than absolute positioning accuracy. Because the residual does not increase with time, eventually settles within a bounded range, and does not exhibit large degradation across trials, these results support that the estimation is non-divergent.

Time series of the station-referenced residual between the estimated AUV position and the acoustic measurement.
Discussion
Robustness evaluation
This subsection evaluates the robustness of the proposed method against outliers of positioning measurements by varying the preprocessing window size and by examining the resulting changes in estimation and approach accuracy.
Since the UAV-ST-referenced acoustic relative positioning is updated every 12 s, moving-average smoothing is applied with window sizes of two and three samples to limit the smoothing delay under sparse updates. Accordingly, the smoothing time parameter is set to
State estimation in the post-processing was conducted using the same data described in the Experiments section. Table 4 shows parameter settings for outlier measurements. Five trials were performed under each of the following conditions:
Condition 1: No outliers, window size W = 2 (same condition as in the Experiments section)
Condition 2: No outliers, window size W = 3
Condition 3: Outliers included, window size W = 2
Condition 4: Outliers included, window size W = 3
Parameter settings for outlier measurements.
The time-series station-referenced residual between the estimated AUV position and the acoustic measurement is shown in Figure 15 for Condition 1 and in Figure 16 for Conditions 2 to 4. In Figure 16, the top, middle, and bottom panels correspond to Conditions 2 to 4, respectively. When the window size was 2, the residual did not increase even when outliers occurred; instead, it remained stable and converged. In contrast, when the window size was 3, there was a period during which the residual slightly increased, but it eventually converged to 1–2 m. This is because the UAV-ST-referenced acoustic positioning is updated at 12-second intervals, and increasing the window size gradually introduces a delay in smoothing, which can lead to estimation errors.
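The delay argument can be made concrete with a short worked calculation. It assumes that the smoothing time parameter equals the window size times the update interval, that the update interval is 12 s, and that the samples are uniformly spaced. The symbols $\Delta t$ and $\tau_{\mathrm{lag}}$ are notation introduced here for illustration. Under these assumptions, a simple moving average over $W$ samples represents the position at the mean sample time, which lags the newest sample by $(W-1)\Delta t/2$:

```latex
\begin{align*}
T &= W\,\Delta t, \qquad \tau_{\mathrm{lag}} = \frac{(W-1)\,\Delta t}{2}, \qquad \Delta t = 12~\mathrm{s},\\
W = 2 &: \quad T = 24~\mathrm{s}, \quad \tau_{\mathrm{lag}} = 6~\mathrm{s},\\
W = 3 &: \quad T = 36~\mathrm{s}, \quad \tau_{\mathrm{lag}} = 12~\mathrm{s}.
\end{align*}
```

Under these assumptions, going from two to three samples doubles the averaging lag, which is consistent with the transient increase in the residual observed for W = 3.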

Time series comparison of the station-referenced residual between the estimated AUV position and the acoustic measurement under different preprocessing conditions. The top, middle, and bottom panels correspond to Conditions 2, 3, and 4, respectively.
Scalability evaluation
This subsection investigates scalability to longer-range and deeper-water scenarios by introducing range-dependent degradations in acoustic measurements and analyzing how the approach performance changes under these conditions.
The sea trials in this study were conducted in shallow water (approximately 1 m depth) over a short range of 25 m. In such environments, acoustic measurements can be affected by strong multipath from surface and bottom reflections, leading to large errors and intermittent observations. In deeper-water and longer-range operations, the propagation conditions change, and additional factors may become dominant, such as depth-dependent sound-speed structure, higher attenuation, and increased intermittency. To discuss scalability beyond the tested range, an additional analysis introduces range-dependent degradations associated with reduced SNR (signal-to-noise ratio), namely increased measurement error, decreased positioning success rate, and an increased outlier rate. These degradations are modeled as an approximation using first-order (linear) functions of range to represent monotonic deterioration trends, rather than a physics-accurate propagation model or a precise extrapolation.
Let
The success probability
The outlier occurrence probability
The measurements
The positioning measurements used for the observations when the outlier occurs are computed in the same way as the Robustness evaluation subsection. Table 5 shows parameter settings for the distance-based measurement model. Parameter settings for outlier measurements are the same as those in Table 4. The moving-average parameter T for the velocity was set to 24.0 s.
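A minimal Python sketch of a first-order (linear) range-dependent degradation model of this kind is given below. It is an illustration only: the coefficient names and values are hypothetical placeholders rather than the settings in Table 5, the outlier is modeled as an additional large Gaussian offset purely for demonstration, and the degradation is applied to a reference position instead of the recorded sea-trial measurements.

```python
import numpy as np


def linear_model(base: float, slope: float, range_m: float, lo: float, hi: float) -> float:
    """First-order function of range, clipped to [lo, hi]."""
    return float(np.clip(base + slope * range_m, lo, hi))


def degraded_measurement(ref_pos, rng,
                         sigma0=0.5, k_sigma=0.01,   # noise std [m] and growth per metre
                         p0=1.0, k_p=0.002,          # success probability and decay per metre
                         q0=0.0, k_q=0.001,          # outlier probability and growth per metre
                         outlier_sigma=20.0):        # spread of outliers [m] (assumed form)
    """Return a degraded position measurement, or None on positioning failure.

    All coefficient values are illustrative assumptions.
    """
    ref_pos = np.asarray(ref_pos, dtype=float)
    r = float(np.linalg.norm(ref_pos))
    sigma = linear_model(sigma0, k_sigma, r, 0.0, np.inf)   # error grows with range
    p_success = linear_model(p0, -k_p, r, 0.0, 1.0)         # success rate falls with range
    p_outlier = linear_model(q0, k_q, r, 0.0, 1.0)          # outlier rate grows with range
    if rng.random() >= p_success:
        return None                                         # no fix obtained this cycle
    z = ref_pos + rng.normal(0.0, sigma, size=ref_pos.shape)
    if rng.random() < p_outlier:
        z = z + rng.normal(0.0, outlier_sigma, size=ref_pos.shape)
    return z


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for r in (25.0, 100.0, 300.0):
        fixes = [degraded_measurement([r, 0.0], rng) for _ in range(1000)]
        rate = sum(f is not None for f in fixes) / len(fixes)
        print(f"range {r:5.1f} m: empirical success rate {rate:.2f}")
```

Clipping the probabilities to [0, 1] keeps the linear functions valid at ranges beyond the point where they would otherwise leave the admissible interval.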
Parameter settings for the distance-based measurement model.
State estimation in the post-processing was conducted using the same data described in the Experiments section. Five trials were performed.
Figure 17 shows the time-series station-referenced residual in simulation using the distance-based measurement model. Because positioning failures, measurement errors, and disturbances are more pronounced at long range, the residual is large immediately after the start, and the variability across trials is also large. However, the positioning results become more stable over time, and accordingly, the estimation results also stabilize; the final residual in the vicinity of the UAV-ST converges to 1–2 m. This indicates that even when approaching from a long distance, the estimation gradually converges, enabling the AUV to approach the vicinity of the UAV-ST with stable accuracy. Validation in real-world deep-water and long-range conditions is left for future work.

Time series of the station-referenced residual between the estimated AUV position and the acoustic measurement in simulation using the distance-based measurement model.
Transition evaluation
This subsection evaluates the feasibility of transitioning from acoustic guidance to vision-based guidance in the pre-recovery phase by assessing whether the achieved terminal accuracy satisfies the visual acquisition conditions near the UAV-ST.
In this study, the approach performance in the pre-docking phase is evaluated using acoustic relative positioning. In the subsequent phase, a staged guidance scheme is planned, where the acoustic approach is followed by vision-based docking using optical markers. Moreover, prior work using the same AUV platform has demonstrated successful optical-marker docking to a hovering-type AUV underwater. 43 Therefore, the key focus of this study is to guide the AUV into the vicinity of the UAV-ST and satisfy the marker acquisition conditions required for vision-based docking, after which the established optical-marker docking method can be applied.
Even if the UAV-ST is horizontally offset from the UAV nadir due to tether motion, the approach objective in this study is defined with respect to the UAV-ST itself. Therefore, a static offset from the UAV does not directly degrade the ability of the AUV to approach the UAV-ST. The relevant concern is time-varying tether motion, which can introduce additional geometric uncertainty in acoustic relative measurements. Although tether motion was not directly measured in the present trials, a design-oriented bound can be discussed.
For docking operations, maintaining the tether inclination within
A feasibility-based acquisition rate is evaluated using the distribution of terminal residuals and the camera field-of-view geometry. A trial is counted as visually acquirable at a distance R when
Comparison with prior studies
This subsection compares the performance of the proposed method with prior work using the same evaluation metrics.
For performance comparison, several previous studies are referenced. Using the acoustic virtual long baseline (LBL) method, the AUV's positioning accuracy has been reported to be within 2 m, demonstrating consistently high performance. 44 In addition, a representative DVL/INS-based approach can exhibit a final error on the order of several meters relative to the recovery station due to drift accumulation. 45 Thus, its performance is primarily characterized by long-term drift rather than instantaneous accuracy. On the other hand, the proposed method aims to suppress the terminal station-referenced residual by positioning based on the recovery station, and achieves a terminal residual on the order of 1–2 m even without onboard INS/DVL, relying only on lower-accuracy navigation sensors. This comparison is made in terms of the final residual relative to the station and is not intended as a direct comparison of instantaneous positioning error.
Electromagnetic (EM)-guidance docking with sub-decimeter accuracy has also been reported, but this is achieved in the close-range final stage using a dedicated EM docking station with large coils. 46 Accordingly, it is not directly comparable to the 1–2 m accuracy reported here, which corresponds to an acoustic-based approach. The proposed approach targets robust mid- to long-range guidance with minimal additional hardware, where performance is constrained by acoustic propagation effects such as multipath and sound-speed variability. For the close-range final docking stage, the system is intended to transition to a vision-based approach using an optical marker to obtain higher precision. As described above, the achieved final accuracy indicates that the AUV can be guided into the near-field region where optical localization becomes feasible, enabling a transition from an acoustic-based approach to vision-based docking.
Because prior studies differ in sensors, ranges, and ground-truth availability, an additional controlled quantitative comparison is provided by reprocessing the same sea-trial log under a conventional one-sided positioning configuration that uses only the AUV-side positioning updates. To emulate this configuration, state re-estimation was performed using only the AUV-side positioning measurements. To reduce the effect of filter initialization, the UAV-ST-side measurements were used only for the first three measurements and were disabled thereafter. Figure 18 shows the time series of the station-referenced residual between the estimated AUV position and the acoustic measurement for one-sided positioning. As a result, the median terminal residual exceeded 10 m, indicating that reaching the station vicinity is difficult under one-sided positioning. In addition, the feasibility of transitioning to vision-based localization at

Time series of the station-referenced residual between the estimated AUV position and the acoustic measurement for one-sided positioning.
Conclusion
This study developed an approach method that enables an AUV to autonomously approach a UAV-deployed recovery station (UAV-ST) in real sea environments. After the UAV deployed the UAV-ST, the AUV navigated autonomously and successfully approached the UAV-ST using acoustic positioning updates. In post-processing, the AUV's state was estimated from measurements of the acoustic positioning and onboard sensors. The station-referenced measurement–estimate residual was shown to remain within 1–2 m in the vicinity of the UAV-ST. In addition, scalability was examined through a simulation that introduced distance-dependent positioning success rates and measurement noises as a first-order (linear) approximation model. The results indicate that the proposed approach remains feasible under degraded measurement conditions. This study validates the pre-recovery approach phase. Future work will build on this approach to develop a docking method that enables fully automatic AUV recovery by the UAV.
Footnotes
Acknowledgements
This study was conducted with PRODRONE Co., Ltd, which operated the UAV in the sea experiment. We thank all the staff of the company. We also thank Kenji Kouno at Institute of Industrial Science, the University of Tokyo, who supported the sea experiments. This work was supported by the grant of the Specialists Center of Port and Airport Engineering, Japan (grant number 2023-19-5). We would like to express our deepest appreciation to them.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Specialists Center of Port and Airport Engineering (grant number 2023-19-5).
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Data availability statement
No data are publicly available.
