SoftMimic: Learning Compliant Whole-body Control from Examples


Gabriel B. Margolis*     Michelle Wang*     Nolan Fey     Pulkit Agrawal




Abstract


We introduce SoftMimic, a framework for learning compliant whole-body control policies for humanoid robots from example motions. Imitating human motions with reinforcement learning allows humanoids to quickly learn new skills, but existing methods incentivize stiff control that aggressively corrects deviations from a reference motion, leading to brittle and unsafe behavior when the robot encounters unexpected contacts. In contrast, SoftMimic enables robots to respond compliantly to external forces while maintaining balance and posture. Our approach leverages an inverse kinematics solver to generate an augmented dataset of feasible compliant motions, which we use to train a reinforcement learning policy. By rewarding the policy for matching compliant responses rather than rigidly tracking the reference motion, SoftMimic learns to absorb disturbances and generalize to varied tasks from a single motion clip. We validate our method through simulations and real-world experiments, demonstrating safe and effective interaction with the environment.


Overview


Traditional motion tracking controllers are stiff, unsafe, and unable to generalize.

Policies trained to rigidly track a reference motion treat any deviation as an error to correct. When making unexpected contact with the world, they respond with large, uncontrolled forces, leading to brittle and potentially dangerous behavior.



Our Compliant Motion Augmentation Approach

Instead of training an RL agent to strictly track motions under any perturbation, we want it to respond to external forces with a controllable force-displacement relationship. A key challenge is balancing compliant behaviors with motion imitation objectives. Rather than tuning this balance through competing rewards, we first generate a dataset of feasible and stylistically desirable compliant motions using an offline IK solver, providing a fine-grained specification that simplifies task prioritization. The policy then learns to reproduce these compliant behaviors while observing only the original reference, forcing it to implicitly infer external forces and react appropriately.
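To make the augmentation step concrete, here is a minimal sketch of generating one compliant variant of a reference trajectory under a sampled external force. The helper `solve_ik`, its `posture_prior` argument, the Gaussian force sampling, and the linear spring displacement model are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def augment_with_compliance(q_ref, ee_ref, solve_ik, stiffness, rng):
    """Generate one compliant variant of a reference motion (hypothetical sketch).

    q_ref     : (T, n_joints) reference joint positions
    ee_ref    : (T, 3) reference end-effector positions
    solve_ik  : assumed IK solver mapping an end-effector target (plus a
                posture prior) to a feasible whole-body joint configuration
    stiffness : commanded stiffness k [N/m] of the linear spring model
    """
    # Sample a hypothetical external force, held constant over the clip.
    force = rng.normal(scale=20.0, size=3)  # [N]

    q_soft = np.empty_like(q_ref)
    for t in range(len(q_ref)):
        # Linear spring model: at equilibrium, a compliant end effector
        # displaces by F / k away from its reference position.
        ee_target = ee_ref[t] + force / stiffness
        # Offline IK finds a feasible pose that realizes the displaced
        # end effector while staying close to the reference posture.
        q_soft[t] = solve_ik(ee_target, posture_prior=q_ref[t])
    return q_soft, force
```

During training, the policy would then be rewarded for matching the augmented motion while observing only the original reference, so it must infer the applied force from its own state rather than being told about it.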

SoftMimic System Diagram


Result: Controllable Stiffness

Our policy can be commanded to behave with a specific stiffness. At low stiffness, it interacts gently and safely with its environment; at high stiffness, it firmly resists external forces to maintain its posture.


A teleoperator can command the robot to be stiff or compliant at deployment time. Below, the operator adjusts the joystick in the bottom left corner to increase and decrease stiffness while the reference posture remains unchanged.
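As a rough mental model of what the stiffness command means, assume a linear spring relation between a sustained external force and the resulting deflection (an illustrative assumption, not the exact learned behavior):

```python
def expected_deflection(force_n, stiffness_n_per_m):
    """Steady-state deflection implied by a linear stiffness command.

    A policy commanded with stiffness k should deflect by roughly
    F / k under a sustained external force F.
    """
    return force_n / stiffness_n_per_m

# Example: the same 10 N push against two commanded stiffnesses.
print(expected_deflection(10.0, 50.0))    # low stiffness: ~0.20 m, yields softly
print(expected_deflection(10.0, 1000.0))  # high stiffness: ~0.01 m, holds posture
```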



Result: Safety and Stability

SoftMimic policies softly absorb unexpected contacts, whereas traditional motion tracking policies apply large, unregulated forces that can damage the environment. Below, a traditional motion tracking baseline (left) and SoftMimic (right) are commanded to raise their arms next to a delicate Lego structure.



Result: Task Generalization from a Single Motion

Compliance enables a single reference motion to generalize to a range of task variations. Here, a motion reference sized for picking a 20 cm-wide box enables the robot to pick boxes of various sizes by compliantly adjusting its grip. The policy was never trained on boxes, only on generalized external forces.


The same policy can also recover from a variety of failure cases without specialized training. These represent scenarios where, for example, a high-level planner or teleoperator is unaware that a box has been misplaced and attempts to pick it anyway.



Result: Compliant Locomotion and Manipulation

A SoftMimic policy trained with a walking reference can comply with a payload and with human interactions while maintaining balance.


A SoftMimic policy trained with a pouring reference maintains a smooth pour while its other hand is significantly displaced. A stiff baseline jitters when displaced the same distance, spilling the contents.



Paper


SoftMimic: Learning Compliant Whole-body Control from Examples
Gabriel B. Margolis*, Michelle Wang*, Nolan Fey, and Pulkit Agrawal
Preprint, 2025
paper / bibtex


Website template adapted from here.