SAFE-GIL: SAFEty Guided Imitation Learning for Robotic Systems

2025 IEEE International Conference on Robotics and Automation (ICRA)


SAFE-GIL paper cover illustration showing safety-guided imitation learning concept

Abstract

Behavior cloning (BC) is a widely used approach in imitation learning, where a robot learns a control policy by observing an expert supervisor. However, the learned policy can make errors that lead to safety violations, which limits its utility in safety-critical robotics applications. While prior works have tried improving a BC policy via additional real or synthetic action labels, adversarial training, or runtime filtering, none of them explicitly focus on reducing the BC policy's safety violations during training time. We propose SAFE-GIL, a design-time method to learn safety-aware behavior cloning policies. SAFE-GIL deliberately injects adversarial disturbance into the system during data collection to guide the expert towards safety-critical states. This disturbance injection simulates potential policy errors that the system might encounter at test time. By ensuring that training more closely replicates expert behavior in safety-critical states, our approach results in safer policies despite policy errors at test time. We further develop a reachability-based method to compute this adversarial disturbance. We compare SAFE-GIL with various behavior cloning techniques and online safety-filtering methods in three domains: autonomous ground navigation, aircraft taxiing, and aerial navigation on a quadrotor testbed. Our method demonstrates a significant reduction in safety failures, particularly in low data regimes where the likelihood of learning errors, and therefore safety violations, is higher.
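The data-collection idea from the abstract can be sketched as a simple rollout loop: the expert acts on the (perturbed) state while a bounded adversarial disturbance is injected, steering demonstrations toward safety-critical regions. This is a minimal illustration only; `expert_policy`, `step_dynamics`, and `adversarial_disturbance` are placeholder names, not the paper's actual API.

```python
import numpy as np

def collect_demo(expert_policy, step_dynamics, adversarial_disturbance,
                 x0, horizon, d_bar=0.5):
    """Roll out the expert while injecting a bounded adversarial disturbance.

    The disturbance (||d|| <= d_bar) stands in for the policy errors the
    learned controller might make at test time, so the dataset contains
    expert recoveries from safety-critical states.
    All callable arguments are placeholders for illustration.
    """
    states, actions = [x0], []
    x = x0
    for _ in range(horizon):
        u = expert_policy(x)                    # expert reacts to perturbed state
        d = adversarial_disturbance(x, d_bar)   # worst-case bounded disturbance
        x = step_dynamics(x, u, d)              # system evolves under u and d
        states.append(x)
        actions.append(u)
    return np.array(states), np.array(actions)
```

The recorded state-action pairs are then used for standard behavior cloning; only the data collection differs from vanilla BC.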

Experiments

We demonstrate the robustness of the policy learned under SAFE-GIL on two simulation case studies (state-based autonomous navigation and camera-based aircraft taxiing) and on a hardware testbed (aerial navigation). The case studies differ in dynamics, observation space, and compute resources, demonstrating the safety improvement of SAFE-GIL across settings.

Navigation Using a State-Based Policy

A wheeled robot needs to navigate in a 2D space to reach a goal position without colliding with obstacles in the environment. The navigation task is performed autonomously at test time, starting from various initial states.

Safety and performance tradeoff

SAFE-GIL results showing computed BRT, disturbance, demonstration trajectories, and policy rollouts with collision rate comparison
Fig. Top row: Computed BRT and disturbance for θ = 0. Middle row: Demonstration trajectories with (orange) and without (blue) disturbance injection. Bottom row: BC and SAFE-GIL policy rollouts. Right column: mean collision rate (top) and cost of safe trajectories (bottom) vs. number of demonstrations. SAFE-GIL results in a significant safety improvement.
Value function visualization for unicycle dynamics showing contours on position plane
Fig. A slice of the value function of the unicycle dynamics with d̄ = 0.5, for the 0° heading indicated by the arrow. The optimal disturbance d*(x) pushes the system towards -∇V*(x). Contours of the value function are plotted on the (px, py) plane.
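For a disturbance that enters the dynamics additively with magnitude bound d̄, pushing the system towards -∇V*(x) as in the figure amounts to scaling the negated, normalized value gradient. A minimal sketch of that computation, assuming an additive disturbance and a numerically evaluated gradient (neither of which is spelled out here):

```python
import numpy as np

def optimal_disturbance(grad_V, d_bar=0.5, eps=1e-8):
    """Bounded adversarial disturbance aligned with -grad V(x).

    Pushes the state toward lower safety value, i.e. toward the BRT.
    Assumes the disturbance is additive and norm-bounded by d_bar;
    grad_V is the gradient of the safety value function at the state.
    """
    g = np.asarray(grad_V, dtype=float)
    norm = np.linalg.norm(g)
    if norm < eps:
        return np.zeros_like(g)   # flat value function: no descent direction
    return -d_bar * g / norm      # steepest-descent direction, scaled to d_bar
```

For non-additive disturbances the worst case instead maximizes the Hamiltonian over the disturbance set, but the gradient of V* plays the same role.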

Effect of adversarial disturbance

Comparison of mean collision rate and cost between adversarial and random noise injection
Fig. Mean collision rate (left) and cost of safe rollouts (right) vs. number of demonstrations. Adversarial noise injection leads to a significant safety improvement over random noise.

Mitigating Covariate Shift

SAFE-GIL combined with DAgger showing complementary safety and performance advantages
Fig. SAFE-GIL can be combined with other imitation learning approaches to obtain complementary safety and performance advantages.

Aircraft Taxiing Using a Vision-Based Policy

An aircraft needs to taxi on the runway based on RGB image observations obtained through a camera mounted on the plane's right wing.
RGB image from aircraft right wing camera in X-Plane flight simulator showing runway view
Fig. An example RGB image from the right wing of the aircraft, captured in the X-Plane flight simulator.
Aircraft taxiing results comparing expert demonstrations, BC and SAFE-GIL rollouts, and excursion rate metrics
Fig. Top: Expert demonstrations with and without disturbance injection. Middle: BC and SAFE-GIL rollouts from the same initial state; BC fails to keep the aircraft on the runway. Bottom: mean excursion rate (left) and mean squared distance from the centerline (middle) vs. number of demonstrations. The safety value distribution of the collected demonstrations (right) is shifted towards lower values for SAFE-GIL.
Comparison of expert demonstrations and imitation policy rollouts for aircraft taxiing
Fig. Expert demonstrations (left) and imitation policy rollouts (right).

Safety Filtering

Safety filtering comparison showing mean excursion rates and trajectory comparisons with filter engagement points
Fig. Left: Mean excursion rates vs. number of demonstrations. Right: SAFE-GIL, BC+Filter, and BC+Vision Filter (More Data) trajectories from the same initial state. States where the safety filter engages are denoted in red.

Quadrotor Navigation: Hardware experiment

A Crazyflie 2.1 quadrotor needs to reach a goal location without collisions. Human-controlled demonstrations are imitated by a neural network policy running onboard the heavily resource-constrained Crazyflie, using an 8-pixel row of depth measurements for obstacles and position and velocity estimates from an optical flow camera.
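To fit the Crazyflie's onboard compute, the policy must be very small. The sketch below shows a plausibly sized network over the stated observation (8 depth pixels plus position and velocity estimates); the layer sizes, state dimension, and 2-D action output are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def tiny_policy(obs, weights):
    """Forward pass of a small MLP policy sized for onboard inference.

    obs: 8 depth pixels + 6 state estimates (position, velocity) = 14 inputs.
    Returns a 2-D command. Sizes are illustrative assumptions.
    """
    W1, b1, W2, b2 = weights
    h = np.tanh(obs @ W1 + b1)       # one hidden layer keeps memory tiny
    return np.tanh(h @ W2 + b2)      # bounded outputs for safe actuation

def init_weights(in_dim=14, hidden=32, out_dim=2, seed=0):
    """Small random initialization; weights would come from BC training."""
    rng = np.random.default_rng(seed)
    return (rng.standard_normal((in_dim, hidden)) * 0.1, np.zeros(hidden),
            rng.standard_normal((hidden, out_dim)) * 0.1, np.zeros(out_dim))
```

A network of this size needs only a few kilobytes of weights, which is what makes onboard deployment on a microcontroller-class platform feasible.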
Crazyflie quadrotor hardware experiment setup showing obstacle avoidance scenarios with two and three obstacles
Fig. Settings with two obstacles (first row) and three obstacles (second row).

BibTeX


        @inproceedings{Ciftci2024SAFEGILSG,
          title={SAFE-GIL: SAFEty Guided Imitation Learning for Robotic Systems},
          author={Yusuf Umut Ciftci and Darren Chiu and Zeyuan Feng and Gaurav S. Sukhatme and Somil Bansal},
          booktitle={2025 IEEE International Conference on Robotics and Automation (ICRA)},
          year={2025},
          pages={3559-3566}
        }