
· 2 min read
Haoran Geng
Yuyang Li

These are the editors' words from Haoran and Yuyang, explaining their ideas behind creating this website. If you find what you want here, or just simply like it, please consider giving a star to its repo on GitHub!

Why Simulately?

At the intersection of robotics and computer vision, physics simulators serve as crucial tools for validating algorithms and testing models before they're deployed in real-world robots. They cover a wide range of functions, from rendering photorealistic perception data with ground-truth segmentation masks to spawning GPU-parallelized environments for efficient data sampling in reinforcement learning. Consequently, an array of physics simulators has emerged, each with its unique set of offerings. However, several challenges persist:

  1. Some required features turn out to be missing partway through development with an initially chosen simulator.
  2. Simulation results are inconsistent across different simulators, adding another layer of complexity.
  3. Some simulators suffer from poor documentation, leaving users in the dark about how to implement the desired functions using the provided APIs.
  4. Troubleshooting can be a nightmare, with solutions either hard to come by, incomplete or entirely absent from online resources.
  5. The list goes on and on and on and on... Don't stop believing

Addressing these hurdles for the benefit of the wider community, particularly researchers who could better spend their time on more meaningful pursuits, we've created Simulately. Our mission extends beyond creating a repository for our experiences in research and development. We aim to foster a collaborative platform where learnings and insights are shared, and where others are encouraged to do the same. Welcome to Simulately, your comprehensive guide to navigating the world of robotics simulators.

· 4 min read
Haoran Geng

In this blog, we discuss what a physics simulator is and why it is important for robotics. Some of the text was written with help from GPT-4. If you find what you want here, or just simply like it, please consider giving a star to its repo on GitHub!

What is a Simulator?

A physics simulator is a computer program or software that models the behavior of physical systems according to the laws and principles of physics. These simulators are used in a wide range of applications, from video games and movies to engineering and scientific research. The goal of a physics simulator is to predict or replicate how physical objects interact and evolve over time in a given environment.

There are several key features and components commonly found in physics simulators:

  1. Collision Detection and Response: Determines when two or more objects collide and dictates how they should respond, such as bouncing off each other or deforming upon impact.
  2. Rigid Body Dynamics: Simulates the motion of solid objects that don't change shape. This includes things like velocity, acceleration, and forces like friction and gravity.
  3. Soft Body Dynamics: Simulates the motion and deformation of objects that are not rigid, like cloth or jelly.
  4. Particle Systems: Simulates systems of many small particles, which could be anything from water spray to smoke or fire.
  5. Fluid Dynamics: Models the behavior of liquids and gases, capturing phenomena like flow, turbulence, and wave propagation.
  6. Constraint Solvers: Manage constraints or restrictions on how objects can move, like a door that can swing only in one direction.

Different physics simulators may emphasize or specialize in one or more of these areas depending on their intended application. For instance:

  • Video games might use simplified or approximated physics to ensure fast real-time performance.
  • Animation software for films might have detailed soft body dynamics to model realistic clothing or hair movement.
  • Engineering software might focus on accurate rigid body dynamics to predict the behavior of machinery or structures.

Using physics simulators, researchers and developers can test scenarios in a virtual environment before implementing them in the real world, which can save time, money, and reduce risks.

Why Are Physics Simulators Crucial for Robotics?

In the field of robotics, physics simulators play an indispensable role. They allow engineers and researchers to model and test robotic systems in a controlled virtual environment before physical prototypes are built. This is particularly crucial in robotics due to the complexity and cost of robotic systems. Simulators provide a safe and cost-effective way to explore the behavior of robots in various scenarios, including challenging or hazardous environments. By using these tools, developers can optimize the design and functionality of robots, ensuring that they perform as intended in the real world. This includes testing the robot's ability to navigate terrain, manipulate objects, and interact safely and effectively with humans and other machines. Furthermore, simulators are essential for training artificial intelligence systems in robotics, offering a diverse range of scenarios for machine learning algorithms to learn from, without the risks and costs associated with real-world testing.

What Do We Want to Do?

Our project is designed to cater to both beginners and seasoned developers in the field of robotics and simulation. For beginners, we aim to create a single, comprehensive website that serves as a one-stop resource. This platform will feature easy-to-follow tutorials for those just starting their journey in physics simulations and robotics. Additionally, we plan to offer a series of blogs that break down complex concepts into digestible, easy-to-learn formats, fostering a friendly learning environment for newcomers.

For the more experienced developers, our project takes a deeper dive. We intend to provide a thorough summary of related works and benchmarks in the field, giving professionals a quick yet comprehensive overview of the current state of the art. Our platform will also feature a collection of useful toolkits, designed to streamline the development process and enhance efficiency. Furthermore, a key component of our project is a deep comparison of all available simulators, offering detailed insights and evaluations to assist developers in choosing the right tools for their specific needs.

In essence, our goal is to bridge the gap between beginners and experts in the world of robotics and physics simulations, creating a harmonious community where knowledge and resources are shared efficiently and effectively.

· 7 min read
Yuzhe Qin

Physical simulations are a crucial tool in many fields, from game development and computer graphics to robotics and prototype modeling. One fundamental aspect of these simulations is the concept of rotation. Be it planets whirling around a star in a space simulation, joints operating in a humanoid robot, or an animated character performing a thrilling parkour backflip, rotations are indeed everywhere. This blog post seeks to unravel the complexities of 3D rotations and acquaint you with the diverse rotation representations used in physical simulations.

Challenges of 3D Rotations

3D rotations are crucial for modeling the orientation of objects in space. They enable us to visualize and manipulate 3D models mathematically. However, handling rotations in 3D space can be quite tricky, and many bugs in simulations can be traced back to mismanaged rotations. The complexities arise from the nature of 3D rotation itself: it isn't commutative (the order in which rotations are applied matters), and interpolation isn't straightforward (calculating a rotation halfway between two given rotations is complex). Additionally, 3D rotations form a group structure known as the Special Orthogonal Group, SO(3), which isn't a typical Euclidean space where we can perform standard linear operations.

Rotation Representations

1. Rotation Matrices

Rotation matrices are 3x3 matrices that signify a rotation around the origin in 3D space. They provide an intuitive approach to understanding rotation, with each column (or row, depending on convention) of the matrix representing the new directions of the original axes after the rotation.

However, rotation matrices come with their set of limitations. The degree of freedom for rotation in an $n$-dimensional space is $\frac{n(n-1)}{2}$. Thus, the 3D rotation resides in a 3-dimensional space (while 2D rotation resides in a 1-dimensional space). This means that 3D rotation matrices consume more memory (9 floating point numbers) than necessary, and maintaining the orthogonality and normalization of the rotation matrix during numerical operations can be computationally burdensome. In practical applications, the majority of libraries, including the simulators we've discussed, employ quaternions as their core representation for rotations.

2. Quaternions

Quaternions are a type of mathematical object that extend complex numbers. They consist of one real component and three imaginary components, often denoted as $w + xi + yj + zk$. Quaternions have emerged as an extremely effective method of representing rotations in 3D space for computation.

Unlike rotation matrices, quaternions require only four floating point numbers, can be easily interpolated using techniques like Spherical Linear Interpolation (SLERP), and they bypass the gimbal lock problem. However, they are not as intuitive as the other methods, and comprehending how they work requires some mathematical background. Also, quaternions have a double covering problem: each 3D rotation can be represented by two different quaternions, one and its negation. In other words, a quaternion $q$ and its negative $-q$ represent the same 3D rotation.
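As a quick sanity check of the double-cover property, the minimal sketch below uses SciPy (which expects quaternions in $(x, y, z, w)$ order) to show that a quaternion and its negation produce the same rotation matrix; the example quaternion is arbitrary.

```python
import numpy as np
from scipy.spatial.transform import Rotation as R

# A quaternion in SciPy's (x, y, z, w) convention: 90-degree rotation about z.
q = np.array([0.0, 0.0, np.sin(np.pi / 4), np.cos(np.pi / 4)])

# q and -q encode the same rotation (double cover of SO(3)).
print(np.allclose(R.from_quat(q).as_matrix(), R.from_quat(-q).as_matrix()))  # True
```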

3. Euler Angles

Euler angles represent a rotation as three successive angular rotations around the axes of a coordinate system. The rotations can be applied about the axes in any order (XYZ, ZYX, etc.), and this order makes a difference; Euler angles are also subject to the well-known "gimbal lock" problem.

Gimbal lock occurs when two of the rotation axes align, causing a loss of one degree of freedom. This can lead to unexpected behavior in simulations. Moreover, the same 3D rotation can map to multiple sets of Euler angles. Euler angles also have issues with interpolation, as interpolating between two sets of Euler angles will not produce a smooth rotation.

4. Axis-Angle Representation

The Axis-Angle representation is another way to understand 3D rotations. In this representation, a 3D rotation is characterized by a single rotation about a specific axis. The amount of rotation is given by the angle, and the direction of rotation is specified by the unit vector along this axis.

This representation is simple and intuitive, but it's not easy to concatenate multiple rotations. It also has representational singularities: the axis is undefined for a zero rotation, and at 180 degrees an axis and its negation describe the same rotation. Nevertheless, it's very useful in some scenarios, such as generating a random rotation or rotating an object around a specific axis.

Conversion Between Representations

Now, let's discuss the conversion between a rotation matrix and other common rotation representations: Euler angles, quaternions, and the axis-angle representation.

1. Rotation Matrix to Euler Angles

The process of extracting Euler angles from a rotation matrix depends on the Euler angles convention. For the XYZ convention (roll, pitch, yaw), the extraction is:

roll = atan2(R[2, 1], R[2, 2])
pitch = atan2(-R[2, 0], sqrt(R[0, 0]^2 + R[1, 0]^2))
yaw = atan2(R[1, 0], R[0, 0])

where $R[i, j]$ denotes the element at the $i$-th row and the $j$-th column of the rotation matrix $R$.
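A minimal NumPy sketch of this extraction, assuming the XYZ (roll-pitch-yaw) convention above, i.e. $R = R_z(\text{yaw}) R_y(\text{pitch}) R_x(\text{roll})$, and a rotation away from the pitch = ±90° singularity:

```python
import numpy as np

def euler_xyz_from_matrix(R):
    """Extract (roll, pitch, yaw) from a rotation matrix, XYZ (roll-pitch-yaw) convention."""
    roll = np.arctan2(R[2, 1], R[2, 2])
    pitch = np.arctan2(-R[2, 0], np.sqrt(R[0, 0]**2 + R[1, 0]**2))
    yaw = np.arctan2(R[1, 0], R[0, 0])
    return roll, pitch, yaw

# Round-trip check: build R = Rz(yaw) @ Ry(pitch) @ Rx(roll) and recover the angles.
def rot_x(a): return np.array([[1, 0, 0], [0, np.cos(a), -np.sin(a)], [0, np.sin(a), np.cos(a)]])
def rot_y(a): return np.array([[np.cos(a), 0, np.sin(a)], [0, 1, 0], [-np.sin(a), 0, np.cos(a)]])
def rot_z(a): return np.array([[np.cos(a), -np.sin(a), 0], [np.sin(a), np.cos(a), 0], [0, 0, 1]])

roll, pitch, yaw = 0.3, -0.5, 1.2
R = rot_z(yaw) @ rot_y(pitch) @ rot_x(roll)
print(euler_xyz_from_matrix(R))  # approximately (0.3, -0.5, 1.2)
```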

2. Rotation Matrix to Quaternions

The conversion from a rotation matrix $R$ to a quaternion $q = (w, x, y, z)$ can be computed (assuming $1 + \mathrm{trace}(R) > 0$, so that $w \neq 0$) as:

w = sqrt(1 + R[0, 0] + R[1, 1] + R[2, 2]) / 2
x = (R[2, 1] - R[1, 2]) / (4 * w)
y = (R[0, 2] - R[2, 0]) / (4 * w)
z = (R[1, 0] - R[0, 1]) / (4 * w)

3. Rotation Matrix to Axis-Angle

For converting a rotation matrix to the axis-angle representation, the axis $a = (a_x, a_y, a_z)$ and the angle $\theta$ can be calculated (for $\sin \theta \neq 0$) as:

θ = acos((trace(R) - 1) / 2)
a_x = (R[2, 1] - R[1, 2]) / (2 * sin(θ))
a_y = (R[0, 2] - R[2, 0]) / (2 * sin(θ))
a_z = (R[1, 0] - R[0, 1]) / (2 * sin(θ))

where trace(R) is the sum of the elements on the main diagonal of R.
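Both extractions above can be sketched and sanity-checked in NumPy. This assumes a rotation away from the degenerate cases just mentioned ($w \approx 0$ for the quaternion branch, $\sin\theta \approx 0$ for the axis-angle one), and uses SciPy only for cross-checking:

```python
import numpy as np
from scipy.spatial.transform import Rotation

def quat_from_matrix(R):
    """Rotation matrix -> quaternion (w, x, y, z); simple branch, assumes 1 + trace(R) > 0."""
    w = np.sqrt(1.0 + R[0, 0] + R[1, 1] + R[2, 2]) / 2.0
    x = (R[2, 1] - R[1, 2]) / (4.0 * w)
    y = (R[0, 2] - R[2, 0]) / (4.0 * w)
    z = (R[1, 0] - R[0, 1]) / (4.0 * w)
    return np.array([w, x, y, z])

def axis_angle_from_matrix(R):
    """Rotation matrix -> (axis, angle); assumes 0 < angle < pi."""
    theta = np.arccos((np.trace(R) - 1.0) / 2.0)
    axis = np.array([R[2, 1] - R[1, 2],
                     R[0, 2] - R[2, 0],
                     R[1, 0] - R[0, 1]]) / (2.0 * np.sin(theta))
    return axis, theta

# Cross-check against SciPy (xyzw quaternion order, axis*angle "rotation vectors").
r = Rotation.from_rotvec([0.1, 0.7, -0.4])
R = r.as_matrix()
print(quat_from_matrix(R), r.as_quat())   # same rotation; note wxyz vs xyzw ordering (signs may differ)
axis, theta = axis_angle_from_matrix(R)
print(axis * theta, r.as_rotvec())        # both approximately [0.1, 0.7, -0.4]
```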

Common Issues and Bugs

Different Simulator, Different Rotation Convention

Both Euler Angles and Quaternions adhere to multiple conventions. Various software libraries utilize different conventions, which can potentially lead to errors when these libraries are used in tandem, a situation that occurs quite frequently.

For instance, some libraries represent a quaternion as $(w, x, y, z)$, positioning the real part as the first element, while others represent it as $(x, y, z, w)$. The following table illustrates the convention adopted by some widely used software and simulators.

| Quaternion Convention | Simulator/Library |
| --- | --- |
| wxyz | MuJoCo, SAPIEN, CoppeliaSim, IsaacSim, Gazebo, Blender, Taichi, Transforms3d, Eigen, PyTorch3D, USD |
| xyzw | IsaacGym, ROS 1&2, IsaacSim Dynamic Control Extension, PhysX, SciPy, Unity, PyBullet |
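When passing a quaternion between libraries from the two rows above, the fix is simply a reordering of components. A small sketch (the MuJoCo/SciPy pairing here is only an example based on the table; the quaternion value is arbitrary):

```python
import numpy as np

q_wxyz = np.array([0.9239, 0.0, 0.3827, 0.0])   # e.g. read from a wxyz library such as MuJoCo

# wxyz -> xyzw (e.g. for SciPy or PyBullet), and back.
q_xyzw = np.roll(q_wxyz, -1)
q_back = np.roll(q_xyzw, 1)

print(q_xyzw)                        # [0.     0.3827 0.     0.9239]
print(np.allclose(q_back, q_wxyz))   # True
```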

Beyond quaternion conventions, it's essential to recognize that several popular game engines, including Unity and Unreal Engine 4, operate within a left-handed coordinate framework. Within this system, the positive x-axis extends to the right, the positive y-axis ascends upwards, and the positive z-axis stretches forward. These game engines are not only pivotal in game development but also serve as simulators in various research domains.

Conversely, the majority of simulation engines adopt a right-handed coordinate system. The distinction between left-handed and right-handed coordinate systems is a critical aspect to consider during development.

When integrating different libraries and tools, this variation in coordinate system conventions can lead to discrepancies in spatial calculations and visual representations. As such, maintaining consistency across these systems is key to ensuring accurate and reliable outcomes in your projects.

Conclusion

Understanding 3D rotation representations and their conversion plays a pivotal role in creating sophisticated and realistic physical simulations. While this tutorial provides a comprehensive overview of the primary rotation representations, it's up to developers to determine which representation best suits their specific use-cases and computational constraints.

· 3 min read
Yang You

In this blog, we delve into the mechanics of differentiable simulators and explore why they are sometimes a more advantageous choice compared to reinforcement learning (RL) methods.

What is a Differentiable Simulator?

Imagine a robot in an environment where, in each state $s \in \mathcal{S}$, the agent can execute an action $a \in \mathcal{A}$ leading to a subsequent state $s' \in \mathcal{S}$. We can describe this transition with the function $f: \mathcal{S} \times \mathcal{A} \to \mathcal{S}$. In conventional non-differentiable simulators, this function $f$ is often treated as a black box, with observed rewards $r \in \mathcal{R}$ serving as the primary signal for RL-based learning.

Contrastingly, in differentiable simulators, the function $f$ is perceived as an end-to-end differentiable operator. This implies that if we define some loss on the output state, $l(s')$, it becomes feasible to compute the gradient with respect to the input state and actions, such as $\frac{\partial l(s')}{\partial s}$ and $\frac{\partial l(s')}{\partial a}$. This capability enables the optimization of an entire sequence of actions using the chain rule.

Why are Differentiable Simulators Effective for Policy Learning?

As Yann LeCun insightfully noted at NeurIPS 2016, "If intelligence is a cake, the bulk of the cake is unsupervised learning, the icing on the cake is supervised learning, and the cherry on the cake is reinforcement learning" (source). The key takeaway here is the direct supervisory gradient flow with respect to the actions we aim to optimize, in contrast to the indirect reward-based approach of the REINFORCE algorithm in RL.

For instance, consider a task where the goal is to maneuver an object to a target position using a pusher. In an RL scenario, this would require extensive exploration, potentially involving thousands of trials before significant progress is made. Conversely, in a differentiable simulator, each iteration of gradient descent inherently contributes to progress toward the goal.
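To make this concrete, here is a deliberately tiny, hypothetical "simulator": a point mass whose next position is the current position plus the commanded action. Because the transition is written in PyTorch, the loss at the final state can be backpropagated through every step to optimize the whole action sequence directly. This is a sketch of the idea, not any particular differentiable simulator's API:

```python
import torch

# Toy differentiable "simulator": s' = f(s, a) = s + a * dt (a point mass pushed each step).
dt, horizon = 0.1, 20
goal = torch.tensor([1.0, 0.5])

actions = torch.zeros(horizon, 2, requires_grad=True)   # the action sequence we optimize
optimizer = torch.optim.Adam([actions], lr=0.1)

for it in range(200):
    optimizer.zero_grad()
    state = torch.zeros(2)                    # initial state s_0
    for t in range(horizon):
        state = state + actions[t] * dt       # differentiable transition f
    loss = ((state - goal) ** 2).sum()        # l(s_T): squared distance to the goal
    loss.backward()                           # gradients flow through all simulation steps
    optimizer.step()

print(loss.item())  # decreases toward ~0 as the action sequence is optimized
```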

Miscellaneous Considerations

While differentiable simulators present certain advantages over RL, they are not without their limitations and challenges, such as:

  • Invalid Gradients in Certain Scenarios: Consider a scenario involving a rigid rolling pin used to flatten dough. If the initial sequence of actions fails to make contact with the dough, the resulting gradient will be zero throughout. To address this, some studies, like PlasticineLab, suggest incorporating a contact loss based on proximity to the target object. Others propose 'softening' rigid tools by increasing their influence radius, allowing objects to be affected without direct contact.

  • Limited Efficiency in Long-Horizon Tasks: As discussed in this paper, the dependency of differentiable physics on local gradients poses significant challenges. The loss landscape in these scenarios is often complex and riddled with potentially misleading local optima, which can diminish the reliability of this method for certain tasks.

· 10 min read
Mingtong Zhang
Haoran Geng

Robotics, as an interdisciplinary field, relies on foundational principles of physics, mathematics, control systems, and computer science to create intelligent machines. This guide will introduce the key concepts essential for robotics, covering transformation, dynamics, etc.

Rodrigues' Rotation Formula

Rodrigues' rotation formula provides a simple and efficient way to rotate a vector in 3D around a specified axis by a given angle. Given a unit vector $\mathbf{u}$ representing the axis of rotation and an angle $\theta$, the formula for rotating a vector $\mathbf{v}$ is given by:

$$\mathbf{v}_{\text{rot}} = \mathbf{v} \cos \theta + (\mathbf{u} \times \mathbf{v}) \sin \theta + \mathbf{u} (\mathbf{u} \cdot \mathbf{v}) (1 - \cos \theta)$$

Here, $\mathbf{v}_{\text{rot}}$ is the rotated vector.

This formula elegantly decomposes the rotated vector into three components: a projection along the rotation axis that remains unchanged, a component perpendicular to the axis that rotates in a plane, and a cross-product term to handle rotation-induced perpendicularity.
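A direct NumPy transcription of the formula (a minimal sketch; $\mathbf{u}$ is assumed to be a unit vector):

```python
import numpy as np

def rodrigues_rotate(v, u, theta):
    """Rotate vector v about the unit axis u by angle theta (radians)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return (v * np.cos(theta)
            + np.cross(u, v) * np.sin(theta)
            + u * np.dot(u, v) * (1.0 - np.cos(theta)))

# Example: rotating the x-axis by 90 degrees about z gives the y-axis.
print(rodrigues_rotate([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], np.pi / 2))  # approximately [0, 1, 0]
```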

Prove by Exponential Map

The exponential map approach leverages the mathematical relationship between rotation matrices and the Lie algebra associated with SO(3) (the special orthogonal group in 3D).

  1. Lie Algebra and Exponential Map: In this context, any rotation can be represented as an exponential of a skew-symmetric matrix formed from the axis of rotation $\mathbf{u}$. Specifically, the rotation matrix $\mathbf{R}$ for a rotation by an angle $\theta$ around the axis $\mathbf{u}$ is given by:

    $$\mathbf{R}(\theta) = \exp(\theta [\mathbf{u}]_\times)$$

    Here, $[\mathbf{u}]_\times$ denotes the skew-symmetric matrix of the unit vector $\mathbf{u}$:

    $$[\mathbf{u}]_\times = \begin{bmatrix} 0 & -u_z & u_y \\ u_z & 0 & -u_x \\ -u_y & u_x & 0 \end{bmatrix}$$
  2. Exponential Series Expansion: We expand $\exp(\theta [\mathbf{u}]_\times)$ using its Taylor series; because $[\mathbf{u}]_\times^3 = -[\mathbf{u}]_\times$ for a unit vector, the series collapses to (a numerical check is sketched after this list):

    $$\exp(\theta [\mathbf{u}]_\times) = \mathbf{I} + \sin(\theta) [\mathbf{u}]_\times + (1 - \cos(\theta)) [\mathbf{u}]_\times^2$$

    where $\mathbf{I}$ is the identity matrix. Applying this matrix to the vector $\mathbf{v}$, we derive the components of Rodrigues' formula:

    $$\mathbf{v}_{\text{rot}} = (\mathbf{I} + \sin(\theta) [\mathbf{u}]_\times + (1 - \cos(\theta)) [\mathbf{u}]_\times^2) \mathbf{v}$$
  3. Interpretation: This expression matches the form of Rodrigues' rotation formula, breaking the rotation into linear and cross-product terms with respect to the axis $\mathbf{u}$ and angle $\theta$.
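The collapsed series in step 2 can be checked numerically against a general-purpose matrix exponential. A small sketch using SciPy's `expm`; the axis and angle are arbitrary:

```python
import numpy as np
from scipy.linalg import expm

u = np.array([1.0, 2.0, 2.0]) / 3.0           # unit axis
theta = 0.8
K = np.array([[0.0, -u[2], u[1]],
              [u[2], 0.0, -u[0]],
              [-u[1], u[0], 0.0]])             # skew-symmetric matrix [u]_x

R_expm = expm(theta * K)                       # exp(theta [u]_x), computed numerically
R_closed = np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

print(np.allclose(R_expm, R_closed))           # True
```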

Prove by Geometry

The geometric proof of Rodrigues' rotation formula involves decomposing the vector $\mathbf{v}$ into parallel and perpendicular components relative to the axis of rotation $\mathbf{u}$.

  1. Decomposition of $\mathbf{v}$: We first decompose $\mathbf{v}$ into two components:

    • Parallel component: $\mathbf{v}_{\parallel} = (\mathbf{u} \cdot \mathbf{v}) \mathbf{u}$
    • Perpendicular component: $\mathbf{v}_{\perp} = \mathbf{v} - \mathbf{v}_{\parallel}$
  2. Rotation of the Perpendicular Component: The perpendicular component $\mathbf{v}_{\perp}$ lies in the plane orthogonal to $\mathbf{u}$. When rotating $\mathbf{v}_{\perp}$ around $\mathbf{u}$ by an angle $\theta$, we obtain:

    $$\mathbf{v}_{\perp, \text{rot}} = \mathbf{v}_{\perp} \cos \theta + (\mathbf{u} \times \mathbf{v}_{\perp}) \sin \theta$$

    Since $\mathbf{v}_{\perp} = \mathbf{v} - (\mathbf{u} \cdot \mathbf{v}) \mathbf{u}$ and $\mathbf{u} \times \mathbf{v}_{\perp} = \mathbf{u} \times \mathbf{v}$ (the parallel component contributes nothing to the cross product), we can substitute and simplify to get:

    $$\mathbf{v}_{\text{rot}} = \mathbf{v}_{\parallel} + \mathbf{v}_{\perp} \cos \theta + (\mathbf{u} \times \mathbf{v}) \sin \theta$$
  3. Combine Components: Adding the parallel component (which remains unchanged during rotation) and the rotated perpendicular component gives the full Rodrigues' formula:

    $$\mathbf{v}_{\text{rot}} = (\mathbf{u} \cdot \mathbf{v}) \mathbf{u} + (\mathbf{v} - (\mathbf{u} \cdot \mathbf{v}) \mathbf{u}) \cos \theta + (\mathbf{u} \times \mathbf{v}) \sin \theta$$

This geometric decomposition intuitively explains how the rotation occurs in three dimensions around the axis $\mathbf{u}$ by an angle $\theta$, preserving length and orthogonality.

Forward and Inverse Kinematics

Kinematics is a fundamental concept in robotics that deals with the motion of robot parts without considering the forces that cause the motion. Two critical types of kinematics are forward kinematics (FK) and inverse kinematics (IK), which define how robots move and interact with their environments.

Forward Kinematics (FK)

Forward kinematics involves computing the position and orientation of a robot's end-effector (e.g., the tip of a robotic arm or gripper) given specific joint angles or displacements. The goal is to determine where the end-effector will be in the robot's workspace when the individual joints are moved in a predefined way. Forward kinematics typically uses transformation matrices, such as Denavit-Hartenberg (DH) parameters, to map joint positions to the corresponding end-effector position and orientation in 3D space.

Example:

Consider a simple two-link robotic arm with two rotational joints. Given the angles of these joints, the forward kinematics computation can determine the precise position of the end-effector relative to the robot's base. This is often represented as a chain of matrix transformations:

  1. Compute Transformations for Each Joint: Each joint's rotation or translation is represented by a transformation matrix.
  2. Chain the Transformations: Multiply the transformation matrices to derive the overall pose (position and orientation) of the end-effector.

Forward kinematics is typically straightforward to compute and results in a unique solution for a given set of joint positions.
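For the two-link planar arm example above, forward kinematics is just a product of homogeneous transforms. A minimal NumPy sketch with made-up link lengths, not tied to any particular library:

```python
import numpy as np

def planar_transform(theta, length):
    """Homogeneous transform: rotate by theta, then translate by `length` along the rotated x-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, length * c],
                     [s,  c, length * s],
                     [0,  0, 1]])

def forward_kinematics(q1, q2, l1=1.0, l2=0.7):
    """End-effector pose of a 2-link planar arm as a 3x3 homogeneous transform."""
    T = planar_transform(q1, l1) @ planar_transform(q2, l2)
    return T   # T[:2, 2] is the end-effector position

T = forward_kinematics(np.deg2rad(30), np.deg2rad(45))
print(T[:2, 2])   # end-effector (x, y) = (l1*cos(q1) + l2*cos(q1+q2), l1*sin(q1) + l2*sin(q1+q2))
```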

Common Packages for Forward Kinematics:

  • ROS (Robot Operating System): ROS offers libraries such as tf and robot_state_publisher to compute FK for robot models described in URDF (Unified Robot Description Format).

  • MoveIt!: This is a powerful motion planning framework in ROS that provides FK capabilities among many other functions.

  • PyBullet: This physics engine for simulating robots offers FK functions for articulated robots.

  • Drake: Developed by MIT, this robotics library includes both FK and IK solvers and is particularly powerful for optimization-based approaches.

Inverse Kinematics (IK)

Inverse kinematics works in the reverse direction: it determines the joint angles or movements needed to place the end-effector at a desired position and orientation. Unlike forward kinematics, IK can be more challenging because there may be multiple possible solutions, no solution, or constraints due to the robot's physical limits, joint boundaries, and obstacles in the environment.

Example:

Suppose you would like to move the end-effector of the same two-link robotic arm to a specific point in space. The inverse kinematics algorithm calculates the angles for each joint required to reach that position. However, depending on the arm's configuration, there could be multiple sets of angles (known as solutions) that achieve the desired end-effector position.

Challenges in Inverse Kinematics:

  • Multiple Solutions: There can be more than one valid way to position the joints for a given end-effector location, especially in complex or redundant systems.
  • No Solutions: Certain desired positions may be outside the robot's reachable workspace, leading to situations where no joint configuration can achieve the target.
  • Constraints and Singularities: Physical constraints (like joint limits) and singularities (positions where the robot loses degrees of freedom or encounters instability) can complicate the solution process.

Inverse kinematics often requires numerical methods, such as Jacobian-based approaches, gradient descent, or optimization algorithms, to find feasible joint angles that meet a target end-effector pose. In some cases, closed-form solutions may exist, providing exact solutions without iterative calculations.
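As an illustration of the Jacobian-based approach mentioned above, here is a minimal iterative IK sketch for the same hypothetical two-link planar arm. It uses plain (damped) pseudoinverse steps; a production solver would add joint limits and better handling of singularities:

```python
import numpy as np

l1, l2 = 1.0, 0.7   # made-up link lengths

def fk(q):
    """End-effector position of the 2-link planar arm."""
    return np.array([l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),
                     l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])])

def jacobian(q):
    """2x2 Jacobian of the end-effector position with respect to the joint angles."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])

def ik(target, q=np.array([0.3, 0.3]), iters=100, step=0.5):
    """Iteratively move the joints so the end-effector reaches the target position."""
    for _ in range(iters):
        error = target - fk(q)
        q = q + step * np.linalg.pinv(jacobian(q)) @ error
    return q

q = ik(np.array([1.2, 0.6]))
print(fk(q))   # approximately [1.2, 0.6], since the target is reachable
```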

Dynamics in Robotics

Dynamics in robotics deals with understanding the forces and torques that cause motion. Unlike kinematics, which focuses on motion without considering what causes it, dynamics considers the effect of physical quantities like mass, inertia, and external forces. Understanding dynamics is essential for accurately controlling and simulating the behavior of robots, especially when interacting with their environments.

Forward Dynamics

Forward dynamics determines how a robot moves in response to applied forces and torques. Given a set of joint torques or forces, forward dynamics computes the resulting joint accelerations, velocities, and positions over time. This process requires solving the equations of motion that describe the robot's behavior.

Example:

Consider a robotic arm with several joints, each affected by gravity, friction, and external forces like contact with an object. Given the forces applied to the arm's joints, forward dynamics calculates how the arm will accelerate, which in turn determines how it moves through space.

Applications:

  • Simulation: Forward dynamics is widely used in robotic simulators to predict how robots will move and interact with their environments based on physical forces.
  • Motion Planning: By understanding how a robot will respond to forces, planners can generate physically feasible motions.
  • Control: Robots can use forward dynamics to predict future states and adjust control inputs accordingly.

Inverse Dynamics

Inverse dynamics works in the opposite direction: it computes the required forces and torques at each joint to achieve a specified motion. Given a desired trajectory of joint positions, velocities, and accelerations, inverse dynamics calculates the forces or torques necessary to produce that motion. This process is critical for controlling robots accurately and efficiently.

Example: Suppose a robotic arm needs to lift a heavy object along a predefined trajectory. The inverse dynamics approach calculates the torques that each joint motor must exert to follow the trajectory while overcoming gravity, inertia, and any applied loads.

Applications:

  • Control Systems: Inverse dynamics is commonly used in controllers, such as computed torque control, to ensure that the robot follows desired paths accurately.
  • Trajectory Optimization: By determining the forces and torques required to achieve a motion, engineers can optimize trajectories for efficiency or specific objectives, such as minimizing energy consumption.
  • Physical Interaction: Robots interacting with their environment—like pushing, pulling, or carrying objects—rely on inverse dynamics to calculate the appropriate force exertion.

Equations of Motion

The dynamics of a robot are typically described using the Newton-Euler equations or Lagrangian mechanics:

  • Newton-Euler Formulation: This approach uses Newton's laws of motion and Euler's equations for rotational motion to describe the dynamics of each link in the robot's structure. It is particularly effective for calculating joint forces and torques in a recursive manner.

Key Equations:

Linear Motion (Newton's Second Law):

$$\mathbf{F} = m \mathbf{a}$$

where $\mathbf{F}$ is the net force acting on the body (link), $m$ is the mass of the body, and $\mathbf{a}$ is the linear acceleration of the body.

Rotational Motion (Euler's Equations):

$$\boldsymbol{\tau} = \mathbf{I} \dot{\boldsymbol{\omega}} + \boldsymbol{\omega} \times (\mathbf{I} \boldsymbol{\omega})$$

where $\boldsymbol{\tau}$ is the net torque acting on the body, $\mathbf{I}$ is the inertia tensor of the body, $\boldsymbol{\omega}$ is the angular velocity, and $\dot{\boldsymbol{\omega}}$ is the angular acceleration.

The Newton-Euler method typically involves two passes for computation:

  • Forward Recursion: Calculates the velocities and accelerations of each link, starting from the base (root) and moving outward to the end-effector.
  • Backward Recursion: Computes forces and torques required at each joint, starting from the end-effector and moving back toward the base.
  • Lagrangian Formulation: This method uses energy functions (kinetic and potential energy) to derive the equations of motion. The Lagrangian approach often leads to more compact expressions and is useful for analytical modeling of complex systems.
$$L = K - P$$

where $K$ is the kinetic energy and $P$ is the potential energy. The equations of motion are derived using the Euler-Lagrange equation:

$$\frac{d}{dt} \left( \frac{\partial L}{\partial \dot{q}_i} \right) - \frac{\partial L}{\partial q_i} = \tau_i$$

where $q_i$ are the generalized coordinates (e.g., joint positions), $\dot{q}_i$ are the generalized velocities (time derivatives of $q_i$), and $\tau_i$ are the generalized forces or torques.

Key Equations for Energy Calculation:

  • Kinetic Energy:
$$K = \frac{1}{2} \sum_{i=1}^{n} m_i \mathbf{v}_i^T \mathbf{v}_i + \frac{1}{2} \sum_{i=1}^{n} \boldsymbol{\omega}_i^T \mathbf{I}_i \boldsymbol{\omega}_i$$

where $m_i$ is the mass of the $i$-th link, $\mathbf{v}_i$ is the linear velocity of the $i$-th link, $\boldsymbol{\omega}_i$ is the angular velocity of the $i$-th link, and $\mathbf{I}_i$ is the inertia tensor of the $i$-th link.

  • Potential Energy:
$$P = \sum_{i=1}^{n} m_i g h_i$$

where $g$ is the gravitational acceleration and $h_i$ is the height of the $i$-th link in the gravitational field.
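As a tiny worked example of this machinery, consider a simple pendulum (a point mass $m$ on a massless rod of length $l$). With $K = \frac{1}{2} m l^2 \dot{\theta}^2$ and $P = -m g l \cos\theta$, the Euler-Lagrange equation gives the forward dynamics $\ddot{\theta} = -\frac{g}{l}\sin\theta + \frac{\tau}{m l^2}$. The sketch below, with made-up parameters, integrates it numerically:

```python
import numpy as np

# Simple pendulum: forward dynamics derived from the Euler-Lagrange equation.
m, l, g = 1.0, 0.5, 9.81                    # mass [kg], rod length [m], gravity [m/s^2]
dt, steps = 0.001, 2000

theta, theta_dot = np.deg2rad(60.0), 0.0    # initial angle and angular velocity
tau = 0.0                                    # applied joint torque

for _ in range(steps):
    theta_ddot = -(g / l) * np.sin(theta) + tau / (m * l**2)   # equation of motion
    theta_dot += theta_ddot * dt                                # semi-implicit Euler integration
    theta += theta_dot * dt

print(np.rad2deg(theta))   # pendulum angle after 2 seconds of simulated time
```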

Challenges in Dynamics

  • Nonlinearity: Robot dynamics are inherently nonlinear, particularly for systems with many degrees of freedom or under the influence of significant external forces.
  • Complexity and Computation: Computing dynamics can be computationally expensive, especially for high-degree-of-freedom robots or real-time applications.
  • Contact and Friction: Dynamics calculations must account for interactions with the environment, such as contact forces and friction, which can introduce discontinuities and complexity in modeling.

· 9 min read
Mingtong Zhang

Reinforcement Learning is a subfield of machine learning that focuses on training agents to make decisions in environments by maximizing a reward signal. This guide will introduce the key concepts essential for Reinforcement Learning, covering Value/Policy Iteration, Q-Learning and Variants, Policy Gradient Methods, etc. We refer to OpenAI Spinning Up and CMU 16-831 for more details.

Recap of Markov Decision Processes

A Markov Decision Process (MDP) provides the mathematical framework for modeling decision-making in situations where outcomes are partly random and partly under the control of a decision-maker. An MDP is defined by the following components:

  1. State Space ($\mathcal{S}$): The set of all possible states in the environment

    • Each state $s \in \mathcal{S}$ contains all relevant information needed to make decisions
    • The Markov property states that the future only depends on the current state, not the history
  2. Action Space ($\mathcal{A}$): The set of all possible actions available to the agent

    • Actions can be discrete (finite set) or continuous
    • The available actions may depend on the current state
  3. Transition Function $P(s'|s,a)$: The probability of transitioning to state $s'$ when taking action $a$ in state $s$

    • $P: \mathcal{S} \times \mathcal{A} \times \mathcal{S} \rightarrow [0,1]$
    • Must satisfy $\sum_{s' \in \mathcal{S}} P(s'|s,a) = 1$ for all $s, a$
    • Captures the dynamics of the environment
  4. Reward Function $R(s,a,s')$: The immediate reward received when transitioning from $s$ to $s'$ via action $a$

    • $R: \mathcal{S} \times \mathcal{A} \times \mathcal{S} \rightarrow \mathbb{R}$
    • Defines the goal/objective for the agent
    • Can also be written as $R(s,a)$ or $R(s)$ depending on dependencies
  5. Discount Factor ($\gamma$): A value in $[0,1]$ that determines the importance of future rewards

    • $\gamma = 0$: Myopic evaluation (only immediate rewards matter)
    • $\gamma \rightarrow 1$: Far-sighted evaluation (future rewards are important)
    • Ensures convergence of infinite horizon problems

The goal in an MDP is to find an optimal policy $\pi^*(s)$ that maximizes the expected cumulative discounted reward:

$$V^*(s) = \max_\pi \mathbb{E}_\pi \left[ \sum_{t=0}^{\infty} \gamma^t R(s_t,a_t,s_{t+1}) \mid s_0 = s \right]$$

Key Functions:

  • Value Function $V(s)$: Expected return starting from state $s$
  • Action-Value Function $Q(s,a)$: Expected return starting from state $s$, taking action $a$
  • Policy $\pi(a|s)$: Probability distribution over actions given the current state

The optimal policy and value function are related through the Bellman optimality equations:

$$V^*(s) = \max_a \sum_{s'} P(s'|s,a)\left[R(s,a,s') + \gamma V^*(s')\right]$$
$$Q^*(s,a) = \sum_{s'} P(s'|s,a)\left[R(s,a,s') + \gamma \max_{a'} Q^*(s',a')\right]$$

Value/Policy Iteration

Value iteration and policy iteration are fundamental algorithms in reinforcement learning for solving Markov Decision Processes (MDPs). These algorithms aim to find optimal policies in environments with known transition dynamics and reward functions.

Value Iteration

Value iteration works by iteratively updating state values using the Bellman optimality equation:

$$V_{k+1}(s) = \max_a \sum_{s'} P(s'|s,a)\left[R(s,a,s') + \gamma V_k(s')\right]$$

where:

  • $V_k(s)$ is the value of state $s$ at iteration $k$
  • $P(s'|s,a)$ is the transition probability from state $s$ to $s'$ under action $a$
  • $R(s,a,s')$ is the immediate reward
  • $\gamma$ is the discount factor
  • $\max_a$ finds the action that maximizes expected future value

The algorithm continues until the value function converges (i.e., the maximum change in value between iterations is below a threshold ε). Once converged, the optimal policy can be extracted by:

$$\pi^*(s) = \arg\max_a \sum_{s'} P(s'|s,a)\left[R(s,a,s') + \gamma V^*(s')\right]$$
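A compact sketch of value iteration on a tiny, made-up two-state MDP (the transition and reward tables are arbitrary example values, chosen only to make the update concrete):

```python
import numpy as np

# Toy MDP: 2 states, 2 actions. P[s, a, s'] and R[s, a, s'] are arbitrary example values.
P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.9, 0.1], [0.2, 0.8]]])
R = np.array([[[1.0, 0.0], [0.0,  2.0]],
              [[0.5, 0.0], [0.0, -1.0]]])
gamma, eps = 0.9, 1e-8

V = np.zeros(2)
while True:
    # Q[s, a] = sum_s' P[s, a, s'] * (R[s, a, s'] + gamma * V[s'])
    Q = np.sum(P * (R + gamma * V), axis=2)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < eps:
        break
    V = V_new

policy = Q.argmax(axis=1)   # greedy policy extracted from the converged values
print(V, policy)
```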

Policy Iteration

Policy iteration alternates between two steps:

  1. Policy Evaluation: For a fixed policy π, compute the value function by solving:

    $$V^\pi(s) = \sum_{s'} P(s'|s,\pi(s))\left[R(s,\pi(s),s') + \gamma V^\pi(s')\right]$$

    This is typically done iteratively until convergence.

  2. Policy Improvement: Update the policy to be greedy with respect to the current value function:

    $$\pi_{\text{new}}(s) = \arg\max_a \sum_{s'} P(s'|s,a)\left[R(s,a,s') + \gamma V^\pi(s')\right]$$

The algorithm continues alternating between these steps until the policy no longer changes. Policy iteration is guaranteed to converge to the optimal policy in finite MDPs.
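Policy iteration can be sketched on the same kind of tiny, made-up MDP (tables below are arbitrary example values); here the policy-evaluation step solves a small linear system exactly instead of iterating:

```python
import numpy as np

P = np.array([[[0.8, 0.2], [0.1, 0.9]],
              [[0.9, 0.1], [0.2, 0.8]]])       # P[s, a, s'], arbitrary example values
R = np.array([[[1.0, 0.0], [0.0,  2.0]],
              [[0.5, 0.0], [0.0, -1.0]]])       # R[s, a, s']
gamma, n_states = 0.9, 2

policy = np.zeros(n_states, dtype=int)
while True:
    # Policy evaluation: solve (I - gamma * P_pi) V = r_pi exactly.
    P_pi = P[np.arange(n_states), policy]                              # shape (S, S')
    r_pi = np.sum(P_pi * R[np.arange(n_states), policy], axis=1)
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)

    # Policy improvement: act greedily with respect to the evaluated value function.
    Q = np.sum(P * (R + gamma * V), axis=2)
    new_policy = Q.argmax(axis=1)
    if np.array_equal(new_policy, policy):
        break
    policy = new_policy

print(policy, V)
```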

Comparison and Practical Considerations

  • Value iteration typically requires fewer iterations than policy iteration but does more computation per iteration
  • Policy iteration may converge in fewer iterations but requires solving a system of linear equations in each evaluation step
  • Modified Policy Iteration (MPI) combines aspects of both by performing partial policy evaluation steps
  • In practice, both algorithms may be impractical for large state spaces, leading to approximate versions using function approximation

Q-Learning and Variants

Q-Learning is a model-free reinforcement learning algorithm that learns an action-value function Q(s,a) representing the expected return of taking action a in state s. The core update equation is:

$$Q(s_t,a_t) \leftarrow Q(s_t,a_t) + \alpha\left[r_t + \gamma \max_a Q(s_{t+1},a) - Q(s_t,a_t)\right]$$
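A minimal tabular sketch of this update on a made-up 5-state chain (move left or right, reward only at the right end); the environment and hyperparameters are toy stand-ins:

```python
import numpy as np

n_states, n_actions = 5, 2              # states 0..4; action 0 = left, 1 = right
alpha, gamma, epsilon = 0.1, 0.95, 0.2
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

for episode in range(1000):
    s = int(rng.integers(n_states - 1))     # random start state (never the goal)
    for _ in range(50):
        # Epsilon-greedy action selection.
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0

        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
        if r > 0:
            break

print(Q.argmax(axis=1))   # greedy policy: "right" (1) in states 0-3; state 4 is the goal and stays unvisited
```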

Deep Q-Network (DQN) extends this by using neural networks to approximate the Q-function, adding experience replay and target networks for stability.

Policy Gradient Methods

Policy gradient methods directly optimize a policy π(a|s) to maximize expected return. The key insight is the policy gradient theorem:

$$\nabla_\theta J(\pi_\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[\sum_t \nabla_\theta \log \pi_\theta(a_t|s_t) R(\tau)\right]$$
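A bare-bones illustration of this estimator in PyTorch, on a contrived one-step task (two actions, where action 0 pays reward 1 and action 1 pays 0); real implementations subtract a baseline and batch over full trajectories:

```python
import torch

logits = torch.zeros(2, requires_grad=True)       # policy parameters theta
optimizer = torch.optim.Adam([logits], lr=0.1)

for step in range(200):
    dist = torch.distributions.Categorical(logits=logits)
    action = dist.sample()
    reward = 1.0 if action.item() == 0 else 0.0    # toy one-step "environment"

    # REINFORCE: minimizing -log pi(a) * R(tau) ascends the expected return.
    loss = -dist.log_prob(action) * reward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(torch.softmax(logits, dim=0))   # probability mass shifts toward action 0
```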

Common algorithms include:

  • REINFORCE: The simplest policy gradient algorithm
  • PPO (Proximal Policy Optimization): Uses clipped surrogate objective for stable updates
  • TRPO (Trust Region Policy Optimization): Ensures policy updates stay within trust regions

Actor-Critic Methods

Actor-critic methods combine policy gradient approaches with value function approximation. The actor (policy) learns to take actions while the critic (value function) evaluates those actions. This architecture addresses key limitations of pure policy gradient and value-based methods:

  • Reduced Variance: The critic provides a lower-variance baseline for policy updates compared to Monte Carlo returns
  • Online Updates: Unlike REINFORCE, actor-critic methods can learn online without waiting for episode completion
  • Structured Exploration: The actor can learn sophisticated exploration strategies while the critic evaluates their effectiveness

The general update equations for actor-critic methods are:

Critic Update:

$$\delta_t = r_t + \gamma V(s_{t+1}) - V(s_t) \quad \text{(TD error)}$$
$$\theta_v \leftarrow \theta_v + \alpha_v \delta_t \nabla_{\theta_v} V(s_t)$$

Actor Update:

$$\theta_\pi \leftarrow \theta_\pi + \alpha_\pi \delta_t \nabla_{\theta_\pi} \log \pi_{\theta_\pi}(a_t|s_t)$$
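The two updates above, written out for a single transition in PyTorch. This is a schematic sketch with stand-in networks and a fake transition, not a full training loop; the critic update is realized as the usual squared-TD-error loss, which moves the parameters in the same direction as the rule above:

```python
import torch
import torch.nn as nn

obs_dim, n_actions, gamma = 4, 3, 0.99
actor = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, n_actions))
critic = nn.Sequential(nn.Linear(obs_dim, 32), nn.Tanh(), nn.Linear(32, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

# One fake transition (s, a, r, s') standing in for real environment data.
s, s_next = torch.randn(obs_dim), torch.randn(obs_dim)
r = torch.tensor(1.0)
dist = torch.distributions.Categorical(logits=actor(s))
a = dist.sample()

# Critic update: TD error delta = r + gamma * V(s') - V(s), minimized as delta^2.
delta = (r + gamma * critic(s_next).detach() - critic(s)).squeeze()
critic_loss = delta.pow(2)
critic_opt.zero_grad()
critic_loss.backward()
critic_opt.step()

# Actor update: ascend delta * grad log pi(a|s), using the TD error as the learning signal.
actor_loss = -delta.detach() * dist.log_prob(a)
actor_opt.zero_grad()
actor_loss.backward()
actor_opt.step()
```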

Key algorithms include:

A2C/A3C (Advantage Actor-Critic)

  • Uses advantage function $A(s,a) = Q(s,a) - V(s)$ to reduce variance
  • A3C runs multiple parallel actors for better exploration and stability
  • Synchronous (A2C) vs asynchronous (A3C) variants trade off between computational efficiency and stability
  • Typical architecture uses shared lower layers between actor and critic networks

DDPG (Deep Deterministic Policy Gradient)

  • Designed specifically for continuous action spaces
  • Uses the deterministic policy gradient theorem: $\nabla_{\theta_\pi} J(\pi_\theta) = \mathbb{E}_{s\sim\rho^\pi}\left[\nabla_a Q(s,a)\big|_{a=\pi(s)} \nabla_{\theta_\pi} \pi_{\theta_\pi}(s)\right]$
  • Employs target networks and experience replay for stability
  • Adds exploration noise (typically Ornstein-Uhlenbeck process) to the deterministic policy
  • Architecture includes:
    • Actor network: $\mu(s|\theta^\mu)$
    • Critic network: $Q(s,a|\theta^Q)$
    • Target networks: $\mu'$ and $Q'$ with soft updates

SAC (Soft Actor-Critic)

  • Maximizes both expected return and policy entropy
  • Objective: $J(\pi) = \sum_t \mathbb{E}_{(s_t,a_t)\sim\rho_\pi}\left[r(s_t,a_t) + \alpha H(\pi(\cdot|s_t))\right]$
  • Uses two Q-functions to mitigate overestimation bias
  • Automatically tunes the temperature parameter $\alpha$ for optimal entropy
  • Key components:
    • Soft value function: $V(s_t) = \mathbb{E}_{a_t\sim\pi}\left[Q(s_t,a_t) - \alpha\log\pi(a_t|s_t)\right]$
    • Soft Q-function: $Q(s_t,a_t) = r_t + \gamma \mathbb{E}_{s_{t+1}}\left[V(s_{t+1})\right]$
    • Policy: $\pi(a_t|s_t) \propto \exp\left(\frac{1}{\alpha}Q(s_t,a_t)\right)$

TD3 (Twin Delayed DDPG)

  • Addresses overestimation bias in DDPG using double Q-learning
  • Delays policy updates to reduce variance
  • Uses clipped double Q-learning and target policy smoothing
  • Particularly effective for complex continuous control tasks

Implementation considerations:

  • Network architectures typically use separate networks for actor and critic
  • Learning rates often differ between actor and critic ($\alpha_\pi < \alpha_v$)
  • Gradient clipping and batch normalization can improve stability
  • Experience replay buffer size and batch size significantly impact performance

Advanced RL Algorithms

Modern RL research has produced several sophisticated algorithms that address different challenges in reinforcement learning:

Model-Based RL

  • Learns an explicit model of environment dynamics: $s_{t+1} = f(s_t, a_t)$
  • Uses learned model for planning and policy optimization
  • Key advantages:
    • Improved sample efficiency through model-based planning
    • Better generalization to new scenarios
  • Notable algorithms:
    • PETS (Probabilistic Ensembles with Trajectory Sampling)
    • MBPO (Model-Based Policy Optimization)
    • PlaNet (Deep Planning Network)

Hierarchical RL

  • Decomposes complex tasks into hierarchical structure of subtasks
  • Uses multiple levels of temporal abstraction
  • Components typically include:
    • High-level policy (meta-controller): Selects subtasks/goals
    • Low-level policy (controller): Executes primitive actions
  • Popular frameworks:
    • Options Framework
    • HAM (Hierarchical Abstract Machines)
    • MAXQ Decomposition

Multi-Agent RL

  • Handles systems with multiple interacting agents
  • Key challenges:
    • Non-stationarity due to changing agent behaviors
    • Credit assignment across agents
    • Coordination and communication
  • Major approaches:
    • Independent Learning: Each agent learns independently
    • Centralized Training with Decentralized Execution (CTDE)
    • Fully Centralized: Joint action-value function
  • Notable algorithms:
    • QMIX: Monotonic value function factorization
    • MADDPG: Multi-agent extension of DDPG
    • MAPPO: Multi-agent PPO variant

Meta-RL

  • Learns to quickly adapt to new tasks using previous experience
  • Key components:
    • Meta-learner: Learns general learning strategy
    • Task-specific learner: Adapts to specific tasks
  • Common approaches:
    • Recurrent architectures (RL²)
    • Gradient-based meta-learning (MAML)
    • Context-based adaptation (PEARL)
  • Applications:
    • Few-shot task adaptation
    • Continuous learning and adaptation
    • Transfer learning across domains

Offline RL

  • Learns optimal policies from fixed datasets without environment interaction
  • Addresses challenges like:
    • Distribution shift
    • Out-of-distribution actions
    • Conservative policy updates
  • Key algorithms:
    • Conservative Q-Learning (CQL)
    • Behavior Regularized Actor Critic (BRAC)
    • Implicit Q-Learning (IQL)
  • Applications:
    • Healthcare
    • Robotics from demonstrations
    • Industrial process control

Each approach offers distinct advantages and faces unique challenges:

  • Sample Efficiency: Model-based methods typically require fewer interactions
  • Stability: Offline RL provides stable learning from fixed datasets
  • Scalability: Multi-agent methods face exponential complexity with agent count
  • Generalization: Meta-RL enables quick adaptation but requires diverse training tasks
  • Computational Cost: Hierarchical methods reduce search space but increase architectural complexity

· 4 min read
Mingtong Zhang

Sensors and perception are essential components in robotics, enabling robots to gather information about their environment and interpret it to make intelligent decisions. This section provides an overview of key topics in sensors and perception, highlighting the role of various sensors, data processing techniques, and the integration of sensory data for robotic systems.

Sensors for Robotics

Robots rely on different types of sensors to perceive their environment, including:

  • Proprioceptive Sensors: Measure internal states of the robot (e.g., joint angles, motor currents).

    • Encoders: Measure rotational or linear position, often used to determine joint positions.
    • Inertial Measurement Units (IMUs): Measure acceleration and rotational rates, used for balancing and motion tracking.
  • Exteroceptive Sensors: Measure external data from the environment.

    • Vision Sensors (Cameras): Capture images or video. Types include:
      • RGB Cameras: Capture color images.
      • Depth Cameras: Provide distance measurements to objects (e.g., RGB-D cameras like Microsoft Kinect).
      • Stereo Cameras: Use two cameras to infer depth through disparity.
    • LIDAR (Light Detection and Ranging): Emits laser pulses to measure distances to objects, creating 3D maps of environments.
    • Ultrasonic Sensors: Use sound waves to measure distances, often used for obstacle avoidance.
    • Infrared Sensors: Detect heat signatures or proximity, useful for close-range detection.
    • Touch Sensors: Detect physical contact or pressure, critical for robotic manipulation and tactile feedback.
    • Force/Torque Sensors: Measure forces and torques applied to a robot’s body or end-effector, providing tactile information and feedback during manipulation.
    • Radar: Measures distance and velocity of objects using radio waves.

Perception Systems

Robotic perception involves processing raw sensor data to extract useful information about the environment, enabling decision-making and task execution. Key aspects of perception systems include:

  • Preprocessing: Raw sensor data often needs to be filtered and processed to remove noise and improve reliability. Common techniques include:

    • Noise Filtering: Techniques like Kalman filters or particle filters are used to estimate the true state of the system in the presence of noisy measurements.
    • Sensor Fusion: Combines data from multiple sensors to produce more accurate and robust estimates (e.g., combining IMU data with GPS for precise localization).
  • Feature Extraction: Identifying key features or patterns from sensor data, such as edges in images, keypoints, or texture information.

    • Image Processing Techniques: Techniques such as edge detection, corner detection, and template matching.
    • 3D Point Cloud Processing: For LIDAR or depth camera data, methods include segmentation, registration, and feature extraction to understand 3D structures.
  • Object Recognition and Detection: Identifying and localizing objects in the environment, often using computer vision and machine learning techniques.

    • Classical Methods: Histogram of Oriented Gradients (HOG), Scale-Invariant Feature Transform (SIFT), etc.
    • Deep Learning-Based Approaches: Convolutional Neural Networks (CNNs) for object detection (e.g., YOLO, SSD) and instance segmentation (e.g., Mask R-CNN).
  • Scene Understanding: Building a semantic understanding of the environment, which can include tasks such as object segmentation, scene segmentation, and recognizing spatial relationships among objects.

    • Semantic Segmentation: Assigning a label to each pixel in an image.
    • SLAM (Simultaneous Localization and Mapping): Constructing a map of the environment while tracking the robot’s position within it.

Integration of Sensory Data

Combining sensory information to create a coherent and actionable understanding of the environment is crucial for robotic perception. Techniques for integrating sensory data include:

  • Sensor Fusion Algorithms: Integrating data from multiple sensors to improve accuracy and robustness (e.g., combining LIDAR and camera data for robust object detection).
  • Probabilistic Methods: Using probabilistic models, such as Bayesian networks, to account for uncertainty and variability in sensor data.
  • Graph-Based Representations: Representing sensory data and relationships between observations in graph structures to enable efficient querying and reasoning.

Challenges in Robotic Perception

Robotic perception systems face several challenges, including:

  • Environmental Variability: Changes in lighting, weather, or other environmental factors can affect sensor performance.
  • Data Noise and Uncertainty: Real-world sensor data is often noisy and uncertain, requiring robust algorithms for processing.
  • High-Dimensional Data: Sensors like cameras and LIDAR generate high-dimensional data, making processing computationally intensive.
  • Real-Time Requirements: Perception systems must often operate in real-time, balancing computational demands with speed.

Applications in Robotics

  • Navigation and Mapping: Using sensors to build maps, localize within them, and navigate through environments (e.g., autonomous vehicles, drones).
  • Robotic Manipulation: Leveraging tactile, force, and vision sensors to interact with objects precisely and adaptively.
  • Human-Robot Interaction (HRI): Using visual and auditory sensors to understand and respond to human actions and gestures.

By integrating various sensors and leveraging advanced perception algorithms, robots can perceive their environment accurately, enabling complex tasks ranging from autonomous navigation to fine-grained manipulation and human interaction.