Leveling the Playing Field: Carefully Comparing Classical and Learned Controllers for Quadrotor Trajectory Tracking

University of Pennsylvania GRASP Lab

Abstract

Learning-based control approaches like reinforcement learning (RL) have recently produced a slew of impressive results for tasks like quadrotor trajectory tracking and drone racing. Naturally, it is common to demonstrate the advantages of these new controllers against established methods like analytical controllers. We observe, however, that reliably comparing the performance of such very different classes of controllers is more complicated than might appear at first sight. As a case study, we take up the problem of agile tracking of an end-effector for a quadrotor with a fixed arm. We develop a set of best practices for synthesizing the best-in-class RL and geometric controllers (GC) for benchmarking. In the process, we resolve widespread RL-favoring biases in prior studies that provide asymmetric access to: (1) the task definition, in the form of an objective function, (2) representative datasets, for parameter optimization, and (3) “feedforward” information, describing the desired future trajectory. The resulting findings are the following: our improvements to the experimental protocol for comparing learned and classical controllers are critical, and each of the above asymmetries can yield misleading conclusions. Prior works have claimed that RL outperforms GC, but we find the gaps between the two controller classes are much smaller than previously published when accounting for symmetric comparisons. Geometric control achieves lower steady-state error than RL, while RL has better transient performance, resulting in GC performing better in relatively slow or less agile tasks, but RL performing better when greater agility is required. Finally, we open-source implementations of geometric and RL controllers for these aerial vehicles, implementing best practices for future development.

Prior Comparisons in the Literature

We performed a partial literature review of prior comparisons between RL and geometric controllers for quadrotor trajectory tracking. The table below summarizes the components of the RL and geometric controllers used in each paper, as well as the asymmetries in task, data, and feedforward information.

Partial Literature Review

A $\checkmark$ indicates that a controller included this component, $\sim$ indicates a suboptimal implementation, and $\unicode{10007}$ indicates that the component was absent. Asymmetries arise when the compared model classes were granted unequal access to task, data, or feedforward information.

Correcting Asymmetries

We correct for asymmetries in task, data, and feedforward information by giving the RL and geometric controllers equal access to all three components during evaluation: each controller is optimized on the same task objective, tuned on the same task-aligned data in simulation, and supplied with the same feedforward information about the desired future trajectory.
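
A minimal sketch of this symmetric protocol is shown below, assuming hypothetical controller and simulator interfaces (controller.act, sim.reset, sim.step, and a ref_traj with an ee_pos field). This is not the released API, only an illustration that both controllers are scored by the same objective, rolled out on the same trajectory data, and given the same feedforward lookahead.

# Sketch of the symmetric evaluation protocol (hypothetical interfaces, not the released API).
import numpy as np

def tracking_cost(position_errors):
    """Shared task objective: mean squared end-effector position error."""
    return float(np.mean(np.sum(position_errors ** 2, axis=-1)))

def rollout(controller, sim, ref_traj, horizon=10):
    """Roll out one controller on one reference trajectory.

    Both GC and RL receive the same feedforward window: the next `horizon`
    reference states, not just the current setpoint.
    """
    errors = []
    state = sim.reset()
    for t in range(len(ref_traj) - horizon):
        feedforward = ref_traj[t : t + horizon]      # identical lookahead for both controllers
        action = controller.act(state, feedforward)  # GC or RL behind the same interface
        state = sim.step(action)
        errors.append(state.ee_pos - ref_traj[t].ee_pos)
    return tracking_cost(np.array(errors))

def evaluate(controller, sim, trajectories):
    """Average the shared objective over the same trajectory dataset for every controller."""
    return float(np.mean([rollout(controller, sim, ref) for ref in trajectories]))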

Asymmetries

We show that any one of these asymmetries can lead to misleading conclusions about the relative performance of RL and geometric controllers. Applying our corrective protocols improves the performance of both controllers and yields a more accurate comparison between them.

Best-in-Class Comparison

Having corrected for asymmetries, we compare the performance of the best-in-class RL and geometric controllers. We train and evaluate the GC and RL controllers on the task of trajectory tracking for both a quadrotor and an aerial manipulator with a fixed-arm end-effector.

[Videos: GC and RL controllers tracking trajectories with the quadrotor and the aerial manipulator.]

The performance of the two controller classes is much more similar than previously reported.

[Table: trajectory tracking results for GC and RL.]

There is no clear winner between RL and GC.

Trajectory Tracking

Digging deeper into the trajectory-tracking performance of the two controllers, we find insights into where each controller excels.

[Table: trajectory tracking results.]

We find that RL achieves better transient performance, while GC achieves lower steady-state error. The GC controller converges to perfect tracking given enough time, but the RL controller reaches low end-effector tracking error sooner.
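
The distinction can be made concrete with simple metrics. Below is a minimal sketch, not the paper's exact evaluation code, that separates transient behavior (settling time) from steady-state error given a time series of tracking-error norms.

# Sketch of transient vs. steady-state metrics (illustrative, not the paper's exact definitions).
import numpy as np

def settling_time(err_norms, dt, tol=0.05):
    """Time until the error norm stays below `tol` for the remainder of the trace."""
    for k in range(len(err_norms)):
        if np.all(err_norms[k:] < tol):
            return k * dt
    return len(err_norms) * dt  # never settled within the trace

def steady_state_error(err_norms, tail_fraction=0.2):
    """Mean error over the final portion of the trace."""
    tail = err_norms[int((1 - tail_fraction) * len(err_norms)):]
    return float(np.mean(tail))

# Example with a synthetic exponentially decaying error plus a 1 cm residual offset.
dt = 0.01
t = np.arange(0.0, 5.0, dt)
err = 0.5 * np.exp(-2.0 * t) + 0.01
print(settling_time(err, dt), steady_state_error(err))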

Agile Behaviors

We evaluate the controllers on the task of ball catching, which requires agile behavior: the task imposes a tight time limit, and the end-effector must reach the goal catch location precisely.

[Figure and table: ball catching results, with videos of the GC and RL controllers.]

We find that RL performs better on this ball catching task, owing to its ability to move the end-effector close to the catch location quickly, while GC is slightly slower but converges to the catch location exactly.

Sim2Real Considerations

We also examine common sim2real factors, including domain randomization (DR) and realistic motor dynamics in simulation, and find that our results hold.

[Figures: performance under domain randomization (DR) and under realistic motor dynamics.]

GC is more robust to domain randomization due to the model-based nature of the controller, while RL and GC perform similarly under realistic motor dynamics.
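
For concreteness, a minimal sketch of these two sim2real ingredients is given below, with assumed models and illustrative nominal values rather than the paper's simulator code: domain randomization samples physical parameters around their nominal values, and a first-order lag filters commanded rotor speeds into realized ones.

# Sketch of domain randomization and first-order motor dynamics (assumed models, illustrative values).
import numpy as np

rng = np.random.default_rng(0)

def randomize_params(nominal_mass=0.8, nominal_inertia=(4e-3, 4e-3, 7e-3), scale=0.1):
    """Sample mass [kg] and diagonal inertia [kg m^2] within +/- `scale` of nominal."""
    mass = nominal_mass * rng.uniform(1.0 - scale, 1.0 + scale)
    inertia = np.array(nominal_inertia) * rng.uniform(1.0 - scale, 1.0 + scale, size=3)
    return mass, inertia

def motor_step(omega, omega_cmd, dt, tau=0.05):
    """First-order lag: rotor speed tracks the commanded speed with time constant `tau`."""
    return omega + (dt / tau) * (omega_cmd - omega)

# Example: a step command reaches about 63% of its value after one time constant.
omega, dt = 0.0, 0.001
for _ in range(50):  # 50 ms, i.e. one time constant
    omega = motor_step(omega, 1000.0, dt)
print(round(omega, 1))  # ~636 rad/s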

BibTeX


@inproceedings{kunapuli2025leveling,
  title={Leveling the Playing Field: Carefully Comparing Classical and Learned Controllers for Quadrotor Trajectory Tracking},
  author={Pratik Kunapuli and Jake Welde and Dinesh Jayaraman and Vijay Kumar},
  year={2025},
  booktitle={Robotics: Science and Systems (RSS)},
  url={https://pratikkunapuli.github.io/rl-vs-gc/},
}