Tracking controllers enable robotic systems to accurately follow planned reference trajectories. In particular, reinforcement learning (RL) has shown promise in the synthesis of controllers for systems with complex dynamics and modest online compute budgets. However, the poor sample efficiency of RL and the challenges of reward design make training slow and sometimes unstable, especially for high-dimensional systems.
In this work, we leverage the inherent Lie group symmetries of robotic systems with a floating base to mitigate these challenges when learning tracking controllers. We model a general tracking problem as a Markov decision process (MDP) that captures the evolution of both the physical and reference states.
Next, we prove that symmetry in the underlying dynamics and running costs leads to an MDP homomorphism, a mapping that allows a policy trained on a lower-dimensional "quotient" MDP to be lifted to an optimal tracking controller for the original system.
We compare this symmetry-informed approach to an unstructured baseline, using Proximal Policy Optimization (PPO) to learn tracking controllers for three systems: the Particle (a forced point mass), the Astrobee (a fully-actuated space robot), and the Quadrotor (an underactuated system). Results show that the symmetry-aware approach both accelerates training and reduces tracking error after the same number of training steps.
Given a system with a symmetry group $\mathcal{G}$ acting via $\Phi_g$, we can relate states and references that are equivalent under the action of some $g \in \mathcal{G}$. Exploiting this equivalence can improve the sample efficiency of RL algorithms: instead of training in the full state space $\mathcal{S}$, we can train in the quotient space $\mathcal{S} / \mathcal{G}$, a lower-dimensional space that still captures the essential dynamics of the system. Since the reward function is also invariant under the group action, a policy learned in the quotient space can be lifted to the full state space; moreover, the optimal policy in the quotient space lifts to an optimal policy in the full state space.
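For intuition, the projection and lifting can be sketched for the simplest of the three systems, the Particle, where the symmetry group is the group of translations of $\mathbb{R}^n$ and the quotient coordinates are just the tracking error. The function names and the PD stand-in policy below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

# Minimal sketch for the Particle (forced point mass), whose tracking
# problem is invariant under translations g in R^n. All names here are
# hypothetical, not taken from the paper's codebase.

def project_to_quotient(state, reference):
    """Map a (state, reference) pair to translation-invariant quotient
    coordinates by expressing the state relative to the reference."""
    p, v = state
    p_ref, v_ref = reference
    return np.concatenate([p - p_ref, v - v_ref])

def lift_policy(quotient_policy):
    """Lift a policy trained on the quotient MDP to the full MDP by
    evaluating it on the projected (invariant) coordinates."""
    def full_policy(state, reference):
        return quotient_policy(project_to_quotient(state, reference))
    return full_policy

def pd_quotient_policy(z):
    # Stand-in for a trained network: a PD law on the tracking error.
    e, e_dot = np.split(z, 2)
    return -4.0 * e - 2.0 * e_dot

policy = lift_policy(pd_quotient_policy)

state = (np.array([1.0, 2.0]), np.array([0.5, 0.0]))
reference = (np.array([0.0, 0.0]), np.array([0.0, 0.0]))
u = policy(state, reference)

# Equivariance check: translating the state and the reference by the
# same g leaves the commanded force unchanged.
g = np.array([5.0, -3.0])
u_shifted = policy((state[0] + g, state[1]), (reference[0] + g, reference[1]))
assert np.allclose(u, u_shifted)
```

Because every translated copy of a (state, reference) pair collapses to the same quotient point, experience gathered anywhere in $\mathbb{R}^n$ updates the same region of the learned policy, which is the source of the sample-efficiency gain.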
Training and evaluation curves for the Particle system.
Training and evaluation curves for the Astrobee system.
Training and evaluation curves for the Quadrotor system.
@misc{welde2024leveragingsymmetryacceleratelearning,
title={Leveraging Symmetry to Accelerate Learning of Trajectory Tracking Controllers for Free-Flying Robotic Systems},
author={Jake Welde and Nishanth Rao and Pratik Kunapuli and Dinesh Jayaraman and Vijay Kumar},
year={2024},
eprint={2409.11238},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2409.11238}
}