Simulation and Sim-to-Real
Training a reinforcement learning policy on a physical robot for a walking task might require millions of trial steps. At one step per second, a single million steps takes about 11.6 days of continuous operation, and that is for one training run. Physics simulators solve this by running thousands of virtual robots in parallel, at hundreds of times real speed, generating millions of steps in hours on a GPU cluster. But simulation is an approximation of reality. The policy trained in simulation must still run on a robot made of metal, rubber, and electronics subject to friction, temperature, wear, and all the imprecision of the physical world. Bridging this gap, so that simulation-trained policies work on real hardware, is one of the central engineering challenges of modern robot learning.
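The arithmetic behind this comparison can be checked with a short sketch. The step budget for the real robot follows the text; the simulator figures (4096 parallel environments, 100 steps per second each, one billion total steps) are illustrative assumptions, not measurements of any particular simulator.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def wall_clock_days(total_steps, steps_per_second):
    """Days of continuous operation needed to collect total_steps."""
    return total_steps / steps_per_second / SECONDS_PER_DAY

# Physical robot: one million steps at one step per second.
real_days = wall_clock_days(1_000_000, 1.0)

# Simulator (assumed numbers): 4096 parallel environments, each running
# 100 steps per second, collecting one billion steps in total.
sim_steps_per_second = 4096 * 100
sim_hours = wall_clock_days(1_000_000_000, sim_steps_per_second) * 24

print(f"real robot, 1e6 steps: {real_days:.1f} days")
print(f"simulator, 1e9 steps: {sim_hours:.2f} hours")
```

Under these assumptions the physical robot needs roughly 11.6 days for a million steps, while the simulated fleet collects a thousand times more data in under an hour.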
Physics Simulators: What They Model and What They Omit
Modern robot physics simulators (MuJoCo, IsaacGym/IsaacSim, Gazebo, PyBullet, and CARLA for autonomous driving) model three things: rigid body dynamics, contact and collision physics, and sensor rendering.
Rigid body dynamics tracks how robot links and objects move under applied forces and torques, governed by the Newton-Euler equations integrated forward in time. This part of simulation is well understood and highly accurate for most robot structures.
Contact and collision physics is harder. When a gripper contacts an object, micro-deformations, friction, and the geometry of contact edges all affect outcomes. Simulators approximate contact with simplified models (spring-dampers, Coulomb friction) that differ from real contact behavior. This is the primary source of the sim-to-real gap in manipulation tasks.
Sensor rendering requires additional models to simulate what a camera, LiDAR, or force-torque sensor would observe. Simulated RGB images are rendered by graphics engines that do not capture real sensor noise, lens distortion, bloom, or the subtle texture differences between real surfaces and their simulated counterparts. Simulated LiDAR may miss the multi-path reflections and beam divergence of real sensors.
The result is a simulation fidelity spectrum. Proprioceptive signals (joint angles, joint velocities) simulate very accurately because they depend only on rigid body dynamics. Contact forces simulate moderately well. Visual sensor output simulates poorly. This is why RL locomotion controllers (which depend mostly on proprioceptive feedback) transfer more readily to hardware than manipulation policies (which depend heavily on contact and vision).
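To make the "spring-damper plus Coulomb friction" approximation concrete, here is a minimal sketch of a point mass dropped onto flat ground while sliding sideways. It is a toy penalty-based model written for illustration, not the contact solver used by any of the simulators named above, and the stiffness, damping, and friction values are assumed.

```python
import math

GRAVITY = 9.81  # m/s^2

def contact_forces(height, vz, vx, stiffness=5000.0, damping=50.0, mu=0.8):
    """Return (normal, friction) forces for a point touching ground at height 0.

    Normal force: spring-damper on penetration depth, clamped so the
    ground can only push, never pull. Friction: Coulomb, opposing sliding.
    """
    penetration = -height
    if penetration <= 0.0:
        return 0.0, 0.0  # not in contact
    normal = max(0.0, stiffness * penetration - damping * vz)
    if abs(vx) > 1e-6:
        friction = -mu * normal * math.copysign(1.0, vx)
    else:
        friction = 0.0
    return normal, friction

def simulate(mass=1.0, dt=1e-3, steps=3000):
    """Semi-implicit Euler integration of the dropped, sliding mass."""
    z, vz, x, vx = 0.5, 0.0, 0.0, 1.0
    for _ in range(steps):
        normal, friction = contact_forces(z, vz, vx)
        vz += (normal / mass - GRAVITY) * dt
        new_vx = vx + (friction / mass) * dt
        # Friction can stop sliding but must not reverse it.
        vx = 0.0 if vx * new_vx < 0.0 else new_vx
        z += vz * dt
        x += vx * dt
    return z, vz, x, vx

z, vz, x, vx = simulate()
print(f"rest height {z:.5f} m, vertical speed {vz:.5f} m/s, slide distance {x:.3f} m")
```

Even this tiny model shows where the gap comes from: the mass comes to rest slightly below the ground plane, at a depth set entirely by the chosen stiffness, and the bounce and slide distance depend on damping and friction values that a real surface does not expose directly.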
Walking depends mainly on joint angles and velocities — quantities that simulators model accurately. Grasping depends on the exact contact geometry between fingertip and object surface — quantities that simulators approximate poorly. This explains why sim-to-real transfer is more mature for locomotion than for dexterous manipulation.
Domain Randomization: Teaching Robustness Through Variety
If the simulation will never perfectly match reality, one strategy is to make the policy robust to a wide range of simulation parameters, so wide that reality falls somewhere within the distribution. This is domain randomization.
In domain randomization, simulation parameters are randomly sampled from broad distributions during each training episode:
- Mass and inertia of robot links: ±30% of nominal
- Friction coefficients on surfaces: uniform sample from [0.3, 1.5]
- Motor torque constants: ±20%
- Camera position and orientation: small random offsets
- Object visual appearance: random textures, lighting conditions, colors
- Sensor noise: random Gaussian noise added to observations
A policy trained across this enormous variety of simulated worlds cannot rely on any single set of physical parameters being correct. Instead, it learns behaviors that work despite parameter uncertainty, and these behaviors tend to work on real hardware because real hardware parameters fall within the randomized range.
OpenAI's Dactyl project (2019) used extensive domain randomization to train a robotic hand to manipulate a Rubik's cube entirely in simulation. The randomized parameters included 100 different physical attributes. When transferred to the real Shadow Hand hardware, the policy generalized successfully: the first time a physical robot hand solved a Rubik's cube with a learned policy.
Domain randomization does have limits. If the real robot's parameters fall outside the randomization range, the policy will fail. And overly aggressive randomization can prevent the policy from converging at all, because the task becomes too hard under extreme parameter combinations.
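A per-episode sampler for the randomization ranges listed above might look like the following sketch. The mass, friction, and torque ranges come from the text; the camera offset bound (2 cm), the noise range, and all names such as `EpisodeParams` are hypothetical choices made for illustration. Applying the sampled values to an actual simulator is left out because that step is simulator-specific.

```python
import random
from dataclasses import dataclass

@dataclass
class EpisodeParams:
    mass_scale: float        # multiplier on nominal link mass and inertia
    friction: float          # surface friction coefficient
    torque_scale: float      # multiplier on nominal motor torque constant
    camera_offset_m: tuple   # (x, y, z) camera position offset, assumed bound
    obs_noise_std: float     # std of Gaussian observation noise, assumed range

def sample_episode_params(rng):
    """Draw one set of physical parameters for a training episode."""
    return EpisodeParams(
        mass_scale=rng.uniform(0.7, 1.3),      # +/-30% of nominal
        friction=rng.uniform(0.3, 1.5),
        torque_scale=rng.uniform(0.8, 1.2),    # +/-20%
        camera_offset_m=tuple(rng.uniform(-0.02, 0.02) for _ in range(3)),
        obs_noise_std=rng.uniform(0.0, 0.05),
    )

def add_sensor_noise(observation, std, rng):
    """Add zero-mean Gaussian noise to each element of an observation."""
    return [value + rng.gauss(0.0, std) for value in observation]

rng = random.Random(0)
samples = [sample_episode_params(rng) for _ in range(1000)]
noisy = add_sensor_noise([0.1, -0.4, 1.2], samples[0].obs_noise_std, rng)
print(samples[0])
print(noisy)
```

In a training loop, `sample_episode_params` would be called at every environment reset, so the policy never sees the same physical world twice. Widening a range makes the policy more robust but the task harder, which is the trade-off described in the last paragraph above.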
System Identification and Adaptive Methods
An alternative to randomization is system identification: carefully measuring the real robot's physical parameters and configuring the simulation to match them as closely as possible. Techniques include:
- Measuring mass and inertia by observing how the robot responds to known torques.
- Estimating friction by recording joint motion under gravity and fitting a friction model.
- Identifying motor characteristics (torque constant, back-EMF) from electrical measurements under controlled loads.
System identification produces a 'digital twin': a simulation tuned to match a specific physical robot. Policies trained on a well-tuned digital twin can transfer with much less domain randomization because the simulation is already accurate. Boston Dynamics uses digital twins extensively in their development pipeline.
A more powerful approach is adaptive transfer: train the policy in simulation with a learned context variable that captures the current physical parameters. At deployment, the robot quickly infers its context variable by observing its own behavior for a few seconds, then conditions the policy on it. This idea is closely related to meta-RL; a well-known instance is rapid motor adaptation (RMA, developed by researchers at UC Berkeley, CMU, and Facebook AI Research, 2021). RMA-trained Unitree A1 quadrupeds adapted their gait to mud, sand, and carried payloads within seconds of deployment, without any prior real-world training on those surfaces.
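The friction-fitting step of system identification can be sketched as a small least-squares problem. This example assumes a common two-term joint friction model, torque = viscous * velocity + coulomb * sign(velocity), and assumes that paired velocity and friction-torque samples are already available; on real hardware those samples would have to be derived from recorded joint motion. The "measurements" here are synthetic, generated from made-up ground-truth coefficients.

```python
import random

def sign(x):
    return (x > 0) - (x < 0)

def fit_friction(velocities, torques):
    """Least-squares fit of torque = viscous * v + coulomb * sign(v).

    Solves the 2x2 normal equations directly with Cramer's rule.
    """
    s_vv = s_vs = s_ss = s_vt = s_st = 0.0
    for v, t in zip(velocities, torques):
        s = sign(v)
        s_vv += v * v
        s_vs += v * s
        s_ss += s * s
        s_vt += v * t
        s_st += s * t
    det = s_vv * s_ss - s_vs * s_vs
    if abs(det) < 1e-12:
        raise ValueError("data does not separate viscous and Coulomb terms")
    viscous = (s_vt * s_ss - s_vs * s_st) / det
    coulomb = (s_vv * s_st - s_vs * s_vt) / det
    return viscous, coulomb

# Synthetic data standing in for measurements (assumed true values).
rng = random.Random(0)
true_viscous, true_coulomb = 0.12, 0.35
velocities = [rng.uniform(-2.0, 2.0) for _ in range(500)]
torques = [true_viscous * v + true_coulomb * sign(v) + rng.gauss(0.0, 0.01)
           for v in velocities]

viscous, coulomb = fit_friction(velocities, torques)
print(f"fitted viscous={viscous:.3f}, coulomb={coulomb:.3f}")
```

The fitted coefficients would then be written into the simulator's joint friction settings. The fit only works if the data covers a range of speeds in both directions; samples at a single speed cannot distinguish the two terms, which is why the function checks the determinant.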
Major robotics vendors now provide digital twin environments alongside their hardware. NVIDIA Isaac Sim, ABB RobotStudio, and FANUC's virtual robot controller let engineers train and verify policies before the physical robot is even built. This shifts robotics development from hardware-first to simulation-first and shortens iteration cycles, because design changes can be tested without waiting for hardware.
A team trains a manipulation policy in IsaacGym with perfect contact simulation and transfers it to a real robot. The policy fails immediately on contact tasks but works on non-contact reaching motions. What does this pattern reveal?
Rapid Motor Adaptation (RMA) allows a robot to adapt to new terrain conditions within seconds of deployment. The key mechanism that enables this is:
Evaluate the Reality Gap
- This thought experiment develops your intuition for simulation fidelity by analyzing specific robot tasks.
- Step 1: For each robot task below, rate the expected sim-to-real gap on a scale of 1 (small gap, transfers easily) to 5 (large gap, transfers poorly). Justify each rating in one sentence.
  - Task A: A robot arm reaches from a start pose to a target pose in free space
  - Task B: A gripper grasps a smooth aluminum cylinder
  - Task C: A gripper grasps a crumpled paper bag
  - Task D: A quadruped walks on flat tiled floor
  - Task E: A quadruped walks on wet grass
  - Task F: A robot reads a label on a bottle using a camera
- Step 2: For each task you rated 4 or 5, propose one domain randomization strategy and one system identification strategy that could reduce the gap.
- Step 3: Task F involves visual perception. Why does photorealistic rendering of simulated environments not fully close the visual sim-to-real gap? What information does a real camera capture that a render engine does not faithfully reproduce?
- Step 4: A startup wants to build a simulation-first development pipeline for a medical robot that inserts IV catheters. What would you need to simulate accurately for the policy to transfer, and what aspects might still require real-world validation before deployment?