SAFE-GIL: SAFEty Guided Imitation Learning for Robotic Systems

Under Review


Abstract

Behavior cloning (BC) is a widely used approach in imitation learning, where a robot learns a control policy by observing an expert supervisor. However, the learned policy can make errors that lead to safety violations, which limits its utility in safety-critical robotics applications. While prior works have tried to improve a BC policy via additional real or synthetic action labels, adversarial training, or runtime filtering, none of them explicitly focus on reducing the BC policy's safety violations during training time. We propose SAFE-GIL, a design-time method for learning safety-aware behavior cloning policies. SAFE-GIL deliberately injects adversarial disturbance into the system during data collection to guide the expert towards safety-critical states. This disturbance injection simulates the potential policy errors that the system might encounter at test time. Because the training data more closely captures expert behavior in safety-critical states, our approach yields policies that remain safe despite policy errors at test time. We further develop a reachability-based method to compute this adversarial disturbance. We compare SAFE-GIL with various behavior cloning techniques and online safety-filtering methods in three domains: autonomous ground navigation, aircraft taxiing, and aerial navigation on a quadrotor testbed. Our method demonstrates a significant reduction in safety failures, particularly in low data regimes where the likelihood of learning errors, and therefore safety violations, is higher.
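For concreteness, the data-collection loop can be summarized in a few lines of Python. This is a minimal sketch, not the paper's implementation; expert_policy, adversarial_disturbance, and env are hypothetical stand-ins for the expert supervisor, the reachability-based disturbance (illustrated in the experiments below), and the system interface.

# Minimal sketch of SAFE-GIL data collection (illustrative, not the
# authors' code). The disturbance perturbs where the system goes, but
# the recorded labels remain the expert's clean actions.
def collect_demonstration(env, expert_policy, adversarial_disturbance, horizon=200):
    dataset = []
    obs, state = env.reset()
    for _ in range(horizon):
        u_expert = expert_policy(obs)        # expert supervises as usual
        d = adversarial_disturbance(state)   # worst-case perturbation (hypothetical helper)
        dataset.append((obs, u_expert))      # label with the clean expert action
        obs, state = env.step(u_expert, d)   # disturbance steers toward risky states
    return dataset

The BC policy is then trained on the resulting dataset exactly as in standard behavior cloning.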

Experiments

We demonstrate the robustness of the policies learned under SAFE-GIL in two simulation case studies (state-based autonomous navigation and camera-based aircraft taxiing) and on a hardware testbed (aerial navigation). The case studies differ in dynamics, observation space, and available compute, demonstrating the safety benefits of SAFE-GIL across a range of settings.

Navigation Using a State-Based Policy

A wheeled robot needs to navigate in a 2D space to reach a goal position without colliding with obstacles in the environment. The navigation task is to be performed autonomously at test time, starting from various initial states.

Safety and performance tradeoff

Fig. Top row: Computed BRT and disturbance for θ = 0. Middle row: Demonstration trajectories with (orange) and without (blue) disturbance injection. Bottom row: BC and SAFE-GIL policy rollouts. Right column: Mean collision rate (top) and cost of safe trajectories (bottom) vs. the number of demonstrations. SAFE-GIL results in a significant safety improvement.
Fig. A slice of the value function of the unicycle dynamics with disturbance bound d̄ = 0.5, at the 0° heading indicated by the arrow. The optimal disturbance d*(x) pushes the system towards -∇V*(x). Contours of the value function are plotted on the (px, py) plane.
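Assuming the disturbance enters the dynamics additively with a norm bound d̄ (as in this unicycle example), the adversarial disturbance admits a simple closed form along the negative value gradient. The sketch below is illustrative, not the paper's exact implementation; grad_V is a hypothetical interpolator over the precomputed value function grid.

import numpy as np

def adversarial_disturbance(state, grad_V, d_bar=0.5):
    """Worst-case disturbance d*(x) = -d_bar * grad V(x) / ||grad V(x)||.

    Illustrative sketch assuming an additive, 2-norm-bounded disturbance;
    grad_V returns the gradient of the precomputed HJ value function,
    e.g., interpolated from a grid.
    """
    g = np.asarray(grad_V(state))
    norm = np.linalg.norm(g)
    if norm < 1e-8:              # flat region: no informative direction
        return np.zeros_like(g)
    return -d_bar * g / norm     # push the system toward the unsafe set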

Effect of adversarial disturbance

Fig. Mean collision rate (left) and cost of safe rollouts (right) vs. the number of demonstrations. Adversarial noise injection leads to a significant safety improvement over random noise.

Mitigating Covariate Shift

Fig. SAFE-GIL can be combined with other imitation learning approaches to obtain complementary safety and performance advantages.
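Since SAFE-GIL only changes how demonstrations are collected, it composes naturally with label-side methods. Below is a sketch of one such combination, using a generic DAgger-style aggregation loop as the companion approach; this pairing is an illustrative choice, not prescribed by the paper, and train_bc and rollout_and_relabel are hypothetical helpers.

# Illustrative combination of SAFE-GIL data collection with a
# DAgger-style aggregation loop, reusing collect_demonstration from
# the earlier sketch.
def safe_gil_plus_dagger(env, expert_policy, adversarial_disturbance,
                         train_bc, rollout_and_relabel, n_iters=5):
    # Seed the dataset with SAFE-GIL demonstrations (safety-critical coverage).
    data = collect_demonstration(env, expert_policy, adversarial_disturbance)
    policy = train_bc(data)
    for _ in range(n_iters):
        # DAgger step: roll out the learner, relabel visited states with
        # expert actions, and aggregate (covariate-shift mitigation).
        data += rollout_and_relabel(env, policy, expert_policy)
        policy = train_bc(data)
    return policy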

Aircraft Taxiing Using a Vision-Based Policy

An aircraft needs to taxi on the runway based on RGB image observations obtained through a camera mounted on the plane’s right wing.
Fig. An example RGB image from the right wing of the aircraft, captured in the X-Plane flight simulator.
Fig. Top: Expert demonstrations with and without disturbance injection. Middle: BC and SAFE-GIL rollouts from the same initial state; BC fails to keep the aircraft on the runway. Bottom: Mean excursion rate (left) and mean squared distance from the centerline (middle) vs. the number of demonstrations. The safety value distribution of the collected demonstrations (right) is shifted towards lower values for SAFE-GIL.
Fig. Expert demonstrations (left) and imitation policy rollouts (right).

Safety Filtering

Fig. Left: Mean excursion rate vs. the number of demonstrations. Right: SAFE-GIL, BC+Filter, and BC+Vision Filter (More Data) trajectories from the same initial state. States where the safety filter engages are shown in red.
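For context, the filtering baselines follow the standard least-restrictive pattern: the BC policy acts freely while the safety value is comfortably positive, and a safe controller overrides it near the unsafe boundary. A minimal sketch, with hypothetical value_fn and safe_controller:

# Least-restrictive value-based safety filter (a common baseline
# pattern; names are illustrative, not the paper's API).
def filtered_action(state, obs, bc_policy, value_fn, safe_controller, eps=0.1):
    if value_fn(state) > eps:        # comfortably inside the safe set
        return bc_policy(obs)        # let the learned policy act
    return safe_controller(state)    # near the boundary: safe override

Unlike such runtime filters, SAFE-GIL requires no value computation or override at deployment, since the safety-critical supervision is baked into the training data.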

Quadrotor Navigation Using a Hardware Testbed

A Crazyflie 2.1 quadrotor needs to reach a goal location without collisions. Human-controlled demonstrations are imitated by a neural network policy running onboard the heavily resource-constrained Crazyflie, using an 8-pixel row of depth measurements for obstacle sensing and position and velocity estimates from an optical flow camera.
Fig. Setting with two obstacles in the first row and three obstacles in the second row.
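To give a sense of scale for the onboard policy, a network along the following lines fits comfortably on the Crazyflie's microcontroller. The layer sizes and the exact state features here are assumptions for illustration, not the deployed architecture.

import torch
import torch.nn as nn

class QuadrotorPolicy(nn.Module):
    """Illustrative small policy: an 8-pixel depth row plus state
    estimates in, control commands out. Sizes are assumptions, not
    the deployed network."""
    def __init__(self, depth_pixels=8, state_dim=6, action_dim=3, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(depth_pixels + state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, depth_row, state):
        # Concatenate depth measurements with position/velocity estimates.
        return self.net(torch.cat([depth_row, state], dim=-1))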

BibTeX


        @misc{ciftci2024safegilsafetyguidedimitation,
          title={SAFE-GIL: SAFEty Guided Imitation Learning for Robotic Systems}, 
          author={Yusuf Umut Ciftci and Darren Chiu and Zeyuan Feng and Gaurav S. Sukhatme and Somil Bansal},
          year={2024},
          eprint={2404.05249},
          archivePrefix={arXiv},
          primaryClass={cs.RO},
          url={https://arxiv.org/abs/2404.05249}, 
        }