
Quasi-static contact-rich manipulation using dynamic movement primitives and haptic potential map

Published online by Cambridge University Press:  29 August 2025

Huu-Thiet Nguyen
Affiliation:
School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore, Singapore
Lin Yang
Affiliation:
School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore, Singapore
Chen Lv
Affiliation:
School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore, Singapore
Domenico Campolo*
Affiliation:
School of Mechanical and Aerospace Engineering, Nanyang Technological University, Singapore, Singapore
*
Corresponding author: Domenico Campolo; Email: d.campolo@ntu.edu.sg

Abstract

Robots need a sense of touch to handle objects effectively, and force sensors provide a straightforward way to measure touch or physical contact. However, contact force data are typically sparse and difficult to analyze, as they appear only during contact and are often affected by noise. Many researchers have therefore relied on vision-based methods for robotic manipulation. However, vision has limitations, such as occlusions that block the camera’s view, making it ineffective or insufficient for dexterous tasks involving contact. This article presents a method for robotic systems operating under quasi-static conditions to perform contact-rich manipulation using only force/torque measurements. First, the interaction forces/torques between the manipulated object and its environment are collected in advance. A potential function is then constructed from the collected force/torque data using Gaussian process regression with derivatives. Next, we develop haptic dynamic movement primitives (Haptic DMPs) to generate robot trajectories. Unlike conventional DMPs, which primarily focus on kinematic aspects, our Haptic DMPs incorporate force-based interactions by integrating the constructed potential energy. The effectiveness of the proposed method is demonstrated through numerical tasks, including the classical peg-in-hole problem.

Information

Type
Research Article
Creative Commons
Creative Commons License: CC BY
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press

Impact Statement

Visual simultaneous localization and mapping (SLAM) has been intensively studied and widely adopted over the years, becoming a key component of mobile robot navigation. However, when it comes to robotic manipulation—where navigation involves moving the arm and end-effector through contact-rich environments to accomplish tasks—vision-only approaches can easily fall short. Humans, on the other hand, can perform dexterous manipulation tasks using only haptic sensing. Inspired by the environment mapping technique in visual SLAM, this article proposes to establish a potential energy distribution via premeasured contact forces and torques. Using this distribution, the robot’s trajectory can be generated, helping it navigate the contact-rich environment and accomplish the task. The approach has the potential to change the way contact-rich tasks are executed in the future.

1. Introduction

In robotics, physical contact (or simply contact) occurs when a part of the robot or the object it handles touches other objects, humans, or the environment. Depending on the application, this contact can be accidental or intentional. Avoiding unwanted contact is crucial in applications such as collision avoidance (Koptev et al., Reference Koptev, Figueroa and Billard2021), whereas in other cases, contact is either inevitable or deliberately introduced (Suomalainen et al., Reference Suomalainen, Karayiannidis and Kyrki2022). In robotic assembly, maintaining contact with objects is essential for executing tasks such as insertion and drilling (Suomalainen et al., Reference Suomalainen, Karayiannidis and Kyrki2022; Yang et al., Reference Yang, Turlapati, Lu, Lv and Campolo2025; Yang et al., Reference Yang, Nguyen, Yu, Lv and Campolo2025). Robotic maneuvering that involves contact requires careful trajectory planning to ensure successful task execution. However, traditional planning methodologies often focus on avoiding direct contact, aiming for collision-free navigation (LaValle, Reference LaValle2006). In contact-rich manipulation, while preventing accidental collisions is crucial, occasional or deliberate contact is essential for task completion. Proper contact modeling can enhance contact-rich manipulation planning, but modeling contact between objects remains challenging due to its complex and nonlinear nature (Wirnshofer et al., Reference Wirnshofer, Schmitt, Meister, Wichert and Burgard2019). Therefore, it is important to establish a systematic framework for modeling and planning in contact-rich manipulation, enabling robots to effectively navigate contact-rich environments during manipulation tasks.

Vision-based approaches for robotic mapping, planning (Garg et al., Reference Garg, Sünderhauf, Dayoub, Morrison, Cosgun, Carneiro, Wu, Chin, Reid and Gould2020), and manipulation (Ehsani et al., Reference Ehsani, Han, Herrasti, VanderBilt, Weihs, Kolve, Kembhavi and Mottaghi2021) have attracted much attention from roboticists, aligning with the rapid development of computer vision applications in the last decade. The use of cameras has also led to the emergence and growth of visual simultaneous localization and mapping (Kazerouni et al., Reference Kazerouni, Fitzgerald, Dooly and Toal2022; Kok et al., Reference Kok, Solin and Schön2024). However, vision systems are prone to failure in many scenarios, such as poor lighting conditions or visual obstructions. These drawbacks become more apparent during contact, as the geometries of objects can cast shadows on themselves and obstruct the cameras’ field of view. In such cases, other sensing techniques, such as haptics, may help. Humans also rely on haptics, or the sense of touch, to handle objects, especially when visibility is limited.

For contact-rich robotic manipulation, haptic information plays an essential role in perception, planning, and control (Diana and Marescaux, Reference Diana and Marescaux2015; Dawson-Elli and Adamczyk, Reference Dawson-Elli and Adamczyk2020). Various devices provide haptic feedback, including tactile sensors and force/torque sensors. Tactile sensors are designed to mimic the sense of touch in human skin, often incorporating specialized materials and techniques (Girão et al., Reference Girão, Ramos, Postolache and Pereira2013). They are mostly used for in-hand robotic perception, allowing robots to recognize objects through touch, similar to how humans explore their environment. For example, Bhattacharjee et al. (Reference Bhattacharjee, Shenoi, Park, Rehg and Kemp, n.d.) combined tactile sensing with vision to label pixels on visible surfaces using hidden Markov models. Another study (Strub et al., Reference Strub, Wörgötter, Ritter and Sandamirskaya, n.d.) used tactile data from a two-fingered robotic hand manipulating a polygon to infer the object’s shape. On the other hand, force/torque sensors measure the magnitude of contact forces and torques. Unlike tactile sensors, which typically generate high-dimensional outputs, force/torque sensors provide much lower-dimensional data (typically six-dimensional) and are commonly integrated into robot arms by manufacturers. Force measurements are particularly important in force-based control strategies, such as impedance control (Caccavale et al., Reference Caccavale, Natale, Siciliano and Villani1999), and in manipulation tasks that require precise force regulation, such as robotic assembly of delicate objects. However, force feedback is typically sparse, as it is only available upon contact. This contrasts with vision-based data, where cameras can continuously capture a scene from a wide range of distances. Moreover, force data are often noisy due to factors such as torque ripple, mechanical vibrations, and electrical interference (Katsura et al., Reference Katsura, Matsumoto and Ohnishi2007; Turlapati et al., Reference Turlapati, Gurnani, Ariffin, Kana, Wong, Han and Campolo2024). Therefore, constructing an environment map solely from force feedback is a challenging task. Some recent studies have explored the use of force/torque sensors to infer geometric information. For instance, Turlapati and Campolo (Reference Turlapati and Campolo2022) estimate object kinematics by integrating haptic feedback at contact points with an initial vision-based pose estimate. Another work (Suresh et al., Reference Suresh, Bauza, Yu, Mangelson, Rodriguez and Kaess2021) employs Gaussian process (GP) implicit surfaces on contact data to infer object geometry from planar pushing. While these approaches demonstrate the potential of haptics for geometric reasoning, a fully realized environment map derived purely from force/torque sensors, particularly for general manipulation tasks involving motion planning, has yet to be developed.

When a map is created that captures all objects in the robot’s operating space with which the robot can interact, trajectory planning can be performed for a manipulation task. Dynamic movement primitives (DMPs) (Ijspeert et al., Reference Ijspeert, Nakanishi, Hoffmann, Pastor and Schaal2013) have been widely adopted as an effective method for robot trajectory generation. DMPs model complex movements as nonlinear dynamical systems, incorporating stable behavior while allowing flexibility to follow learnt trajectories. DMPs have demonstrated robustness and generalizability properties, especially in the context of learning from demonstration (LfD) (Saveriano et al., Reference Saveriano, Abu-Dakka, Kramberger and Peternel2023). By adding a coupling term, DMPs can also be adapted for learning the feedback force during interactions. For instance, Gams et al. (Reference Gams, Nemec, Ijspeert and Ude2014) showed that this coupling term allows DMPs to handle interaction forces from the environment or another robot arm. However, their work did not address more complex tasks, such as assembly, which may involve multiple contact points or intermittent contact. It has also been shown that assembly tasks can be accomplished through DMPs and LfD (Wang et al., Reference Wang, Beltran-Hernandez, Wan and Harada2022; Zhao et al., Reference Zhao, Chen, Li and Ding2023). While LfD is an effective method for quick imitation of a task, the underlying policy, intention, or skill is difficult to extract from LfD, making it challenging to generalize to a broader class of similar tasks. This challenge arises because LfD primarily replicates movements from kinematic demonstrations, rather than teaching a systematic strategy for why one trajectory is chosen over another. Another approach for learning the parameters of the DMPs is through black-box optimization (BBO) (Stulp and Sigaud, Reference Stulp and Sigaud2013). A recent study demonstrated that a peg-in-hole task can be accomplished by combining DMPs with BBO algorithms (Yang et al., Reference Yang, Ariffin, Lou, Lv and Campolo2023, Reference Yang, Turlapati, Lu, Lv and Campolo2025).

In modeling manipulation tasks, a commonly used method is the quasi-static assumption, where the system is assumed to be in equilibrium and inertial effects are minimal (Whitney, Reference Whitney1982; Katayama et al., Reference Katayama, Taniai and Tanaka2022). Under this assumption, Campolo and Cardin (Reference Campolo and Cardin, n.d., Reference Campolo and Cardin2025) introduced a potential-based framework for quasi-static manipulation that conceptualizes it as a planning problem on an implicit manifold. In this framework, the configuration space is split into two types of variables: directly controllable variables (or control inputs) and indirectly controllable variables (or internal states). The core concept revolves around the system’s total potential energy, represented as a function of these variables. Under quasi-static conditions, the planning problem is reduced to finding trajectories on an equilibrium manifold (EM), a set of points where the gradient of the potential energy with respect to the internal states equals zero. Using this framework, contact-rich manipulation can be smoothly analyzed in the space of directly controllable variables. The internal states then adjust to follow these variables along the EM. This approach differs from traditional methods that rely on predefined discrete stages or contact modes, such as the four-stage framework (approaching, searching, aligning, and inserting) with a different number of contact points (e.g., no contact, two-point contact, and three-point contact) (Lee et al., Reference Lee, Choi, Park, Jang, Park and Bae2022).

This study employs the framework where the potential energy is assumed to comprise two parts: (i) internal control energy, which results from the displacement between the desired (directly controllable) and actual (indirectly controllable) positions of the object, and (ii) external interaction energy, which arises from interactions such as elastic contact with the environment or gravitational forces. The external potential is built from premeasured forces/torques via GP regression (GPR) with derivatives (Solak et al., Reference Solak, Murray-Smith, Leithead, Leith and Rasmussen2002). Haptic-DMPs are proposed to generate manipulation trajectories based on the constructed potential function, guiding the robot to accomplish the manipulation tasks. In summary, the contributions of this study are as follows:

  • - The development of a haptic potential map of the environment based on observations of contact forces at various positions in space, using GPR with derivatives. The use of GPR to derive energy from forces makes the model resemble a physics-based machine.

  • - The development of haptic DMPs (Haptic-DMPs) that account for elastic interactions between the robot and the object, as well as between the object and the environment. Haptic-DMPs solve ordinary differential equations (ODEs) to determine the object’s path from the control trajectory.

  • - A systematic framework for trajectory generation in contact-rich manipulation tasks through the combination of GPR, Haptic-DMPs, and BBO.

2. Quasi-static mechanical manipulation

Consider a mechanical system governed solely by conservative forces (an example of such a system is shown in Figure 1). We adopt the framework proposed by Campolo and Cardin (Reference Campolo and Cardin, n.d., Reference Campolo and Cardin2025), in which the configuration space is divided into directly controllable and indirectly controllable variables. With $ \boldsymbol{z}\in \mathcal{Z}\subset {\mathrm{\mathbb{R}}}^N $ denoting the internal states (indirectly controllable) and $ \boldsymbol{u}\in \mathcal{U}\subset {\mathrm{\mathbb{R}}}^K $ the control inputs (directly controllable), the total potential energy of the system is

(2.1) $$ W\left(\boldsymbol{z},\boldsymbol{u}\right):\mathcal{Z}\times \mathcal{U}\to \mathrm{\mathbb{R}} $$

Figure 1. A robot manipulating an object while trying to maintain the object’s contact with the wall. Our framework considers both the indirectly controllable variable (state $ \boldsymbol{z} $ ) and the directly controllable variable (control input $ \mathbf{u} $ ). In this specific problem, $ \mathbf{u}={\left[{u}_x,{u}_y\right]}^T $ is the desired position of the robot’s end effector, $ \boldsymbol{z}={\left[{z}_x,{z}_y,{z}_{\theta}\right]}^T $ is the pose of the object (its $ xy $ -coordinates and $ \theta $ -rotation), and $ \boldsymbol{c}\left(\boldsymbol{z}\right) $ is the function that converts the pose of the object to the location of the contact point with the robot’s end effector.

This total potential can be partitioned into $ {W}_{ctrl}\left(\boldsymbol{z},\boldsymbol{u}\right) $ —the elastic energy arising from the displacement between the control inputs $ \boldsymbol{u} $ and the object’s states $ \boldsymbol{z} $ , and $ U\left(\boldsymbol{z}\right) $ —the energy resulting from the interaction between the object and its environment.

(2.2) $$ W\left(\boldsymbol{z},\boldsymbol{u}\right)={W}_{ctrl}\left(\boldsymbol{z},\boldsymbol{u}\right)+U\left(\boldsymbol{z}\right) $$

The term $ {W}_{ctrl}\left(\boldsymbol{z},\boldsymbol{u}\right) $ in Equation (2.2) can be described as

(2.3) $$ {W}_{ctrl}\left(\boldsymbol{z},\boldsymbol{u}\right)=\frac{1}{2}\parallel \boldsymbol{u}-\boldsymbol{c}\left(\boldsymbol{z}\right){\parallel}_{\boldsymbol{K}}^2\triangleq \frac{1}{2}{\left(\boldsymbol{u}-\boldsymbol{c}\left(\boldsymbol{z}\right)\right)}^T\boldsymbol{K}\left(\boldsymbol{u}-\boldsymbol{c}\left(\boldsymbol{z}\right)\right) $$

where $ \boldsymbol{K} $ is the stiffness matrix of the virtual spring and $ \boldsymbol{c}\left(\boldsymbol{z}\right):\mathcal{Z}\to \mathcal{U} $ is a conversion function.

The term $ U\left(\boldsymbol{z}\right) $ includes components such as gravitational potential $ {U}_g $ and elastic contact potential $ {U}_c $ . This study focuses on contact interactions, so we consider scenarios where $ {U}_g $ hardly changes during tasks and can thus be treated as negligible. For example, in tasks where manipulated objects are restricted to a horizontal surface, the vertical coordinate $ {z}_z $ does not change. This allows $ {U}_g $ , typically expressed as $ {mgz}_z+\zeta $ (where $ m $ is the mass of the object, $ g $ is the gravitational acceleration, and $ \zeta $ is a constant offset), to be set to zero by choosing an appropriate constant $ \zeta $ . In contrast, the contact potential $ {U}_c $ depends on various factors, such as the materials of the object and its environment and their shapes or geometries, making it very difficult to compute $ {U}_c\left(\boldsymbol{z}\right) $ explicitly. Therefore, we rely on GPR to approximate $ {U}_c\left(\boldsymbol{z}\right) $ , as presented in the next section. For simplicity, throughout the rest of the article, $ U $ refers to $ {U}_c $ .
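To make the decomposition concrete, the following minimal Python sketch evaluates Equation (2.2) for a planar object, assuming $ \boldsymbol{c}\left(\boldsymbol{z}\right)=\boldsymbol{z} $ (the gripper grasps the object at its center) and a hypothetical one-sided wall potential standing in for $ U\left(\boldsymbol{z}\right) $ ; in the actual framework, $ U\left(\boldsymbol{z}\right) $ is regressed from force/torque data as described in Section 3.

```python
# Minimal sketch of the energy decomposition in Equation (2.2).
# Assumptions (not from the article): c(z) = z, the stiffness values, and a
# hypothetical one-sided wall potential in place of the regressed U(z).
import numpy as np

K = np.diag([100.0, 100.0])  # virtual-spring stiffness matrix (assumed)
K_WALL = 1e4                 # wall stiffness of the toy environment (assumed)

def W_ctrl(z, u):
    """Control energy, Equation (2.3), with c(z) = z."""
    d = u - z
    return 0.5 * d @ K @ d

def U(z):
    """Toy interaction potential: a one-sided spring penalizing penetration
    past a wall at z_x = 1 (hypothetical stand-in for the regressed map)."""
    return 0.5 * K_WALL * max(0.0, z[0] - 1.0) ** 2

def W(z, u):
    """Total potential, Equation (2.2)."""
    return W_ctrl(z, u) + U(z)

print(W(np.array([1.05, 0.0]), np.array([1.2, 0.0])))  # contact engaged
```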

2.1. Equilibrium manifold

Once the potential function $ W\left(\boldsymbol{z},\boldsymbol{u}\right) $ is explicitly defined, we can consider quasi-static manipulation as maneuvering on the so-called EM, which contains all mechanical equilibria $ {\boldsymbol{z}}^{\ast } $ as the solutions of $ {\nabla}_{\boldsymbol{z}}W\left(\boldsymbol{z},\boldsymbol{u}\right)=\mathbf{0} $ . In other words,

(2.4) $$ {\nabla}_{\boldsymbol{z}}W\left({\boldsymbol{z}}_m^{\ast },\boldsymbol{u}\right)=\mathbf{0}\in {\mathrm{\mathbb{R}}}^N. $$

We define $ {\nabla}_qW\equiv {\left[{\partial}_{q_1}W,\dots, {\partial}_{q_a}W\right]}^T $ , where the nabla (column) operator is defined as $ {\nabla}_q={\left[{\partial}_{q_1},\dots, {\partial}_{q_a}\right]}^T $ . We also define the shorthand $ {\nabla}_{\boldsymbol{z}}^2\equiv {\nabla}_{\boldsymbol{z}}{\nabla}_{\boldsymbol{z}}^T $ for Hessians and mixed-derivative operators. Here, the subscript $ m $ represents possibly different solutions for a single value of $ \boldsymbol{u} $ . This means that for the same control input, the object can have different positions or states in equilibrium. As such, the EM is defined when $ \boldsymbol{u} $ is seen as a parameter, that is,

(2.5) $$ {\mathcal{M}}^{eq}\hskip0.3em \triangleq \hskip0.3em \left\{\left(\boldsymbol{z},\boldsymbol{u}\right)\in \mathcal{Z}\times \mathcal{U}\;|\;{\nabla}_{\boldsymbol{z}}W\left(\boldsymbol{z},\boldsymbol{u}\right)=\mathbf{0}\right\} $$

is a smooth embedded submanifold in the ambient space $ \mathcal{Z}\times \mathcal{U} $ . The state transitions are controlled purely by the robotic agent $ \boldsymbol{u} $ . Meanwhile, a point is strictly stable when its Hessian is positive definite, that is, $ {\left.{\nabla}_{\boldsymbol{zz}}^2W\right|}_{\ast}\succ 0 $ .
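As a one-dimensional illustration of these definitions (a toy example, not from the article), consider $ W\left(z,u\right)={\left({z}^2-1\right)}^2/4- uz $ : a single control input $ u $ admits several equilibria $ {z}_m^{\ast } $ , and only those with a positive Hessian are strictly stable.

```python
# Toy illustration of Equations (2.4)-(2.5): for W(z, u) = (z^2 - 1)^2/4 - u*z,
# one control input u admits several equilibria z*_m; strict stability
# requires a positive (in general, positive-definite) Hessian.
import numpy as np

def hess_W(z):
    return 3.0 * z**2 - 1.0  # d^2 W / dz^2

u = 0.1
# Equilibria solve grad_z W = z^3 - z - u = 0 (a cubic in z).
roots = np.roots([1.0, 0.0, -1.0, -u])
for z_star in np.sort(roots[np.isreal(roots)].real):
    print(f"z* = {z_star:+.3f}, strictly stable: {hess_W(z_star) > 0}")
```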

3. GPR with derivatives for mapping the environment

In this section, we map the environment by means of potential energy. More specifically, we aim to approximate $ U\left(\boldsymbol{z}\right) $ in Equation (2.2) from observations of contact forces/torques. Consider a general case: an unknown but differentiable scalar function $ f:{\mathrm{\mathbb{R}}}^N\to \mathrm{\mathbb{R}} $ ; $ \boldsymbol{z}\mapsto f\left(\boldsymbol{z}\right) $ for which readings are available at specific locations. Standard regression can be used to estimate the function $ y=f\left(\boldsymbol{z}\right) $ from its observed values. In some scenarios, however, the observations of $ y $ are difficult or even infeasible to obtain, while observations of the derivatives $ \partial y/\partial {z}_i $ are abundant. The contact energy $ U\left(\boldsymbol{z}\right) $ in Equation (2.2) is such a case: direct measurement is infeasible, but its derivatives, the forces and torques (collectively denoted as $ \boldsymbol{F} $ ), are measurable, noting that $ -\nabla U\left(\boldsymbol{z}\right)=\boldsymbol{F} $ in conservative systems. In this case, GPR can be used. We first review standard GPR based on function value measurements and then adapt it to handle derivative measurements.

A GP is normally described as

(3.1) $$ f\left(\boldsymbol{z}\right)\sim \mathcal{GP}\left(\mu \left(\boldsymbol{z}\right),\kappa \left(\boldsymbol{z},{\boldsymbol{z}}^{\prime}\right)\right) $$

where $ \mu :{\mathrm{\mathbb{R}}}^N\to \mathrm{\mathbb{R}} $ is the mean function, and $ \kappa :{\mathrm{\mathbb{R}}}^N\times {\mathrm{\mathbb{R}}}^N\to \mathrm{\mathbb{R}} $ is the covariance function (or kernel) $ \kappa \left(\boldsymbol{z},{\boldsymbol{z}}^{\prime}\right)=\operatorname{cov}\left(f\left(\boldsymbol{z}\right),f\left({\boldsymbol{z}}^{\prime}\right)\right) $ . Additionally, for any finite set of input points, the corresponding function values follow a multivariate normal (Gaussian) distribution (Williams and Rasmussen, Reference Williams and Rasmussen2006). Using the concept of GP, the GPR can be formulated as follows:

Given $ S $ noisy observations $ {y}_1,{y}_2,\dots, {y}_S $ of $ f\left(\boldsymbol{z}\right) $ at corresponding points $ {\boldsymbol{z}}_1,{\boldsymbol{z}}_2,\dots, {\boldsymbol{z}}_S $ , we want to estimate the unknown function $ f\left(\boldsymbol{z}\right) $ by regression. In this case, we can consider that $ f\left(\boldsymbol{z}\right) $ follows a GP with zero mean, that is, $ f\left(\boldsymbol{z}\right)\sim \mathcal{GP}\left(0,\kappa \left(\boldsymbol{z},{\boldsymbol{z}}^{\prime}\right)\right) $ . Denoting $ \boldsymbol{y}={\left[{y}_1\hskip0.5em {y}_2\hskip0.5em \dots \hskip0.5em {y}_S\right]}^T\in {\mathrm{\mathbb{R}}}^{S\times 1} $ , the predicted value $ {y}^{\ast } $ at a new input $ {\boldsymbol{z}}^{\ast } $ is then given by its mean $ \overline{y^{\ast }} $ and variance $ \unicode{x1D54D}\left[{y}^{\ast}\right] $ , or $ {y}^{\ast}\sim \mathcal{N}\left(\overline{y^{\ast }},\unicode{x1D54D}\left[{y}^{\ast}\right]\right) $ , calculated as

(3.2) $$ \overline{y^{\ast }}={\boldsymbol{k}}^{\ast T}{\left(\boldsymbol{K}+{\sigma}_n^2\boldsymbol{I}\right)}^{-1}\boldsymbol{y} $$
(3.3) $$ \unicode{x1D54D}\left[{y}^{\ast}\right]={k}^{\ast \ast }-{\boldsymbol{k}}^{\ast T}{\left(\boldsymbol{K}+{\sigma}_n^2\boldsymbol{I}\right)}^{-1}{\boldsymbol{k}}^{\ast } $$

Here, $ \boldsymbol{K}=\left[\begin{array}{c}{\kappa}_{i{i}^{\prime }}\end{array}\right]\in {\mathrm{\mathbb{R}}}^{S\times S} $ with $ {\kappa}_{i{i}^{\prime }}=\kappa \left({\boldsymbol{z}}_i,{\boldsymbol{z}}_{i^{\prime }}\right),1\le i,{i}^{\prime}\le S $ is the covariance matrix; $ {\boldsymbol{k}}^{\ast }=\left[\begin{array}{c}{\kappa}_i^{\ast}\end{array}\right]\in {\mathrm{\mathbb{R}}}^{S\times 1} $ with $ {\kappa}_i^{\ast }=\kappa \left({\boldsymbol{z}}^{\ast },{\boldsymbol{z}}_i\right)\in \mathrm{\mathbb{R}},1\le i\le S $ ; $ {k}^{\ast \ast }=\kappa \left({\boldsymbol{z}}^{\ast },{\boldsymbol{z}}^{\ast}\right)\in \mathrm{\mathbb{R}} $ ; $ {\sigma}_n^2 $ is the variance of the noise in the observations; and $ \boldsymbol{I}\in {\mathrm{\mathbb{R}}}^{S\times S} $ is the identity matrix.

Remark 1: For simplicity, this study assumes a zero mean function a priori. This assumption is particularly suitable when no prior knowledge of the function to be approximated is available. However, if $ f $ follows Equation (3.1) with a nonzero prior mean function $ \mu $ , a change of variable to $ {f}^{\prime }=f-\mu $ ensures that $ {f}^{\prime } $ follows $ \mathcal{GP}\left(0,\kappa \right) $ .
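For reference, a minimal NumPy sketch of the posterior equations (3.2) and (3.3) is given below, assuming a zero prior mean (per Remark 1) and a Gaussian kernel with arbitrary illustrative hyperparameters.

```python
# Minimal sketch of standard GPR prediction, Equations (3.2)-(3.3).
# Kernel hyperparameters and the toy data are assumptions for illustration.
import numpy as np

def kernel(A, B, sigma_k=1.0, ell=0.5):
    """Gaussian kernel between row-stacked inputs A (SxN) and B (TxN)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return sigma_k**2 * np.exp(-d2 / (2.0 * ell**2))

def gpr_predict(Z, y, z_star, sigma_n=0.05):
    """Posterior mean and variance of y* at a single query point z_star."""
    Ky = kernel(Z, Z) + sigma_n**2 * np.eye(len(Z))
    k_star = kernel(Z, z_star[None, :])               # S x 1 cross-covariance
    mean = k_star.T @ np.linalg.solve(Ky, y)          # Equation (3.2)
    var = (kernel(z_star[None, :], z_star[None, :])
           - k_star.T @ np.linalg.solve(Ky, k_star))  # Equation (3.3)
    return mean.item(), var.item()

# Toy usage: noisy samples of f(z) = sin(z_1).
rng = np.random.default_rng(0)
Z = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(Z[:, 0]) + 0.05 * rng.standard_normal(30)
print(gpr_predict(Z, y, np.array([0.5])))
```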

3.1. GPR with derivatives

An important and favorable property of GPs is their closure under linear operations (De Roos et al., Reference De Roos, Gessner and Hennig2021). As differentiation is a linear operator, the derivative of a GP is also a GP (Solak et al., Reference Solak, Murray-Smith, Leithead, Leith and Rasmussen2002). Taking into account the derivatives of $ f\left(\boldsymbol{z}\right) $ , the covariance functions in Equation (3.1) should satisfy the equations below for each element $ {z}_i $ of vector $ \boldsymbol{z} $ and each element $ {z}_j^{\prime } $ of vector $ {\boldsymbol{z}}^{\prime } $

(3.4) $$ \operatorname{cov}\left(\frac{\partial f\left(\boldsymbol{z}\right)}{\partial {z}_i},f\left({\boldsymbol{z}}^{\prime}\right)\right)=\frac{\partial \kappa \left(\boldsymbol{z},{\boldsymbol{z}}^{\prime}\right)}{\partial {z}_i},\hskip0.48em \operatorname{cov}\left(\frac{\partial f\left(\boldsymbol{z}\right)}{\partial {z}_i},\frac{\partial f\left({\boldsymbol{z}}^{\prime}\right)}{\partial {z}_j^{\prime }}\right)=\frac{\partial^2\kappa \left(\boldsymbol{z},{\boldsymbol{z}}^{\prime}\right)}{\partial {z}_i\partial {z}_j^{\prime }} $$

Applying these relations over all components of $ \boldsymbol{z} $ yields the gradient vector $ \nabla $ and the Hessian matrix $ {\nabla}^2 $ . Now, we can rewrite Equation (3.1) as

(3.5) $$ \left[\begin{array}{c}f\left(\boldsymbol{z}\right)\\ {}\nabla f\left(\boldsymbol{z}\right)\end{array}\right]\sim \mathcal{GP}\left(\left[\begin{array}{c}\mu \left(\boldsymbol{z}\right)\\ {}\nabla \mu \left(\boldsymbol{z}\right)\end{array}\right],\left[\begin{array}{cc}\kappa \left(\boldsymbol{z},{\boldsymbol{z}}^{\prime}\right)& {\nabla}_{{\boldsymbol{z}}^{\prime}}^T\kappa \left(\boldsymbol{z},{\boldsymbol{z}}^{\prime}\right)\\ {}{\nabla}_{\boldsymbol{z}}\kappa \left(\boldsymbol{z},{\boldsymbol{z}}^{\prime}\right)& {\nabla}^2\kappa \left(\boldsymbol{z},{\boldsymbol{z}}^{\prime}\right)\end{array}\right]\right) $$

With this in mind, now assume that we have $ S $ scalar (noisy) readings $ \left\{{y}_1,\dots, {y}_S\right\} $ at $ S $ locations $ \left\{{\boldsymbol{z}}_1,\dots, {\boldsymbol{z}}_S\right\} $ , and $ M $ (noisy) gradients $ \left\{\nabla {y}_1^{\diamond },\dots, \nabla {y}_M^{\diamond}\right\} $ at $ M $ locations $ \left\{{\boldsymbol{z}}_1^{\diamond },\dots, {\boldsymbol{z}}_M^{\diamond}\right\} $ . The extended observation vector is now $ \overset{\smile }{\boldsymbol{y}}\hskip0.3em \triangleq \hskip0.3em {\left[\begin{array}{ccccccc}{y}_1& {y}_2& \dots & {y}_S& \nabla {y}_1^{\diamond }& \dots & \nabla {y}_M^{\diamond}\end{array}\right]}^T\in {\mathrm{\mathbb{R}}}^{\left(S+ MN\right)\times 1} $ . The covariance matrix is now built as follows

(3.6) $$ \overset{\smile }{\boldsymbol{K}}=\left[\begin{array}{cc}{\boldsymbol{K}}_{11}& {\boldsymbol{K}}_{12}\\ {}{\boldsymbol{K}}_{21}& {\boldsymbol{K}}_{22}\end{array}\right]\in {\mathrm{\mathbb{R}}}^{\left(S+ MN\right)\times \left(S+ MN\right)},\mathrm{where} $$
$$ {\displaystyle \begin{array}{l}{\boldsymbol{K}}_{11}=\left[\begin{array}{c}{\kappa}_{i{i}^{\prime }}\end{array}\right]\in {\mathrm{\mathbb{R}}}^{S\times S}\hskip0.3em \mathrm{with}\hskip0.3em {\kappa}_{i{i}^{\prime }}=\kappa \left({\boldsymbol{z}}_i,{\boldsymbol{z}}_{i^{\prime }}\right)\in \mathrm{\mathbb{R}},1\le i,{i}^{\prime}\le S,\\ {}{\boldsymbol{K}}_{12}=\left[\begin{array}{c}{\boldsymbol{\kappa}}_{ij}\end{array}\right]\in {\mathrm{\mathbb{R}}}^{S\times MN}\hskip0.3em \mathrm{with}\hskip0.3em {\boldsymbol{\kappa}}_{ij}={\nabla}_{{\boldsymbol{z}}_j^{\diamond}}^T\kappa \left({\boldsymbol{z}}_i,{\boldsymbol{z}}_j^{\diamond}\right)\in {\mathrm{\mathbb{R}}}^{1\times N},1\le i\le S,1\le j\le M,\\ {}{\boldsymbol{K}}_{21}=\left[\begin{array}{c}{\boldsymbol{\kappa}}_{ji}\end{array}\right]\in {\mathrm{\mathbb{R}}}^{MN\times S}\hskip0.3em \mathrm{with}\hskip0.3em {\boldsymbol{\kappa}}_{ji}={\nabla}_{{\boldsymbol{z}}_i}\kappa \left({\boldsymbol{z}}_j^{\diamond },{\boldsymbol{z}}_i\right)\in {\mathrm{\mathbb{R}}}^{N\times 1},1\le j\le M,1\le i\le S,\\ {}{\boldsymbol{K}}_{22}=\left[\begin{array}{c}{\boldsymbol{\kappa}}_{j{j}^{\prime }}\end{array}\right]\in {\mathrm{\mathbb{R}}}^{MN\times MN}\hskip0.3em \mathrm{with}\hskip0.3em {\boldsymbol{\kappa}}_{j{j}^{\prime }}={\nabla}^2\kappa \left({\boldsymbol{z}}_j^{\diamond },{\boldsymbol{z}}_{j^{\prime}}^{\diamond}\right)\in {\mathrm{\mathbb{R}}}^{N\times N},1\le j,\hskip0.3em {j}^{\prime}\le M.\end{array}} $$

Assuming that the prior unknown function $ y=f\left(\boldsymbol{z}\right) $ follows a GP with zero mean, that is, $ \mu \left(\boldsymbol{z}\right)=0 $ and $ \nabla \mu \left(\boldsymbol{z}\right)=\mathbf{0} $ , similar to Equation (3.2), the mean of the posterior $ {y}^{\ast } $ can be calculated as

(3.7) $$ \overline{y^{\ast }}={\overset{\smile }{\boldsymbol{k}}}^{\ast T}{\left(\overset{\smile }{\boldsymbol{K}}+{\sigma}_n^2\boldsymbol{I}\right)}^{-1}\overset{\smile }{\boldsymbol{y}},\mathrm{with} $$
(3.8) $$ {\overset{\smile }{\boldsymbol{k}}}^{\ast T}=\left[\begin{array}{cc}{\boldsymbol{k}}_1^{\ast T}& {\boldsymbol{k}}_2^{\ast T}\end{array}\right]\in {\mathrm{\mathbb{R}}}^{1\times \left(S+ MN\right)} $$

where $ {\boldsymbol{k}}_1^{\ast T}=\left[\begin{array}{c}{\kappa}_i^{\ast}\end{array}\right]\in {\mathrm{\mathbb{R}}}^{1\times S} $ with $ {\kappa}_i^{\ast }=\kappa \left({\boldsymbol{z}}_i,{\boldsymbol{z}}^{\ast}\right)\in \mathrm{\mathbb{R}},1\le i\le S $ , and

$ {\boldsymbol{k}}_2^{\ast T}=\left[\begin{array}{c}{\boldsymbol{\kappa}}_j^{\ast}\end{array}\right]\in {\mathrm{\mathbb{R}}}^{1\times MN} $ with $ {\boldsymbol{\kappa}}_j^{\ast }={\nabla}_{{\boldsymbol{z}}_j^{\diamond}}^T\kappa \left({\boldsymbol{z}}_j^{\diamond },{\boldsymbol{z}}^{\ast}\right)\in {\mathrm{\mathbb{R}}}^{1\times N},1\le j\le M $ .

Given a fixed set of observations, it should follow that $ \overset{\smile }{\boldsymbol{\alpha}}\hskip0.3em \triangleq \hskip0.3em {\left(\overset{\smile }{\boldsymbol{K}}+{\sigma}_n^2\boldsymbol{I}\right)}^{-1}\overset{\smile }{\boldsymbol{y}}\in {\mathrm{\mathbb{R}}}^{\left(S+ MN\right)\times 1} $ is a constant vector. We can rewrite Equation (3.7) as

(3.9) $$ \overline{y^{\ast }}={\overset{\smile }{\boldsymbol{k}}}^{\ast T}\times \overset{\smile }{\boldsymbol{\alpha}} $$

Taking derivatives with respect to $ {\boldsymbol{z}}^{\ast } $ gives $ \overline{\nabla_{{\boldsymbol{z}}^{\ast }}{y}^{\ast }}={\nabla}_{{\boldsymbol{z}}^{\ast }}{\overset{\smile }{\boldsymbol{k}}}^{\ast T}\times \overset{\smile }{\boldsymbol{\alpha}} $ and $ \overline{\nabla_{{\boldsymbol{z}}^{\ast}}^2{y}^{\ast }}={\nabla}_{{\boldsymbol{z}}^{\ast}}^2{\overset{\smile }{\boldsymbol{k}}}^{\ast T}\otimes \hskip0.3em \overset{\smile }{\boldsymbol{\alpha}} $ . Simplifying the notation, we obtain

(3.10) $$ \overline{\nabla {y}^{\ast }}=\nabla {\overset{\smile }{\boldsymbol{k}}}^{\ast T}\times \overset{\smile }{\boldsymbol{\alpha}}\in {\mathrm{\mathbb{R}}}^{N\times 1} $$
(3.11) $$ \overline{\nabla^2{y}^{\ast }}={\nabla}^2{\overset{\smile }{\boldsymbol{k}}}^{\ast T}\otimes \hskip0.3em \overset{\smile }{\boldsymbol{\alpha}}\in {\mathrm{\mathbb{R}}}^{N\times N} $$

where $ \nabla {\overset{\smile }{\boldsymbol{k}}}^{\ast T}\in {\mathrm{\mathbb{R}}}^{N\times \left(S+ MN\right)} $ , and $ {\nabla}^2{\overset{\smile }{\boldsymbol{k}}}^{\ast T}\in {\mathrm{\mathbb{R}}}^{N\times N\times \left(S+ MN\right)} $ (Dattorro, Reference Dattorro2010). Here, $ \otimes $ denotes tensor multiplication, which contracts the third dimension of $ {\nabla}^2{\overset{\smile }{\boldsymbol{k}}}^{\ast T} $ with the first dimension of $ \overset{\smile }{\boldsymbol{\alpha}} $ .
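The kernel-derivative identities in Equation (3.4) have closed forms for the Gaussian kernel used later in Equation (5.1); the sketch below implements them and verifies the gradient identity against finite differences (hyperparameter values are assumed).

```python
# Closed-form derivatives of the Gaussian kernel, as required by Eq. (3.4):
#   grad_z kappa(z, z')        = -(z - z') / ell^2 * kappa
#   d^2 kappa / dz dz' (N x N) = kappa * (I/ell^2 - r r^T), r = (z - z')/ell^2
import numpy as np

sigma_k, ell = 1.0, 0.7  # assumed hyperparameters

def kappa(z, zp):
    return sigma_k**2 * np.exp(-np.sum((z - zp) ** 2) / (2 * ell**2))

def dkappa_dz(z, zp):
    """cov(grad f(z), f(z')): an N-vector."""
    return -(z - zp) / ell**2 * kappa(z, zp)

def d2kappa_dz_dzp(z, zp):
    """cov(grad f(z), grad f(z')): an N x N matrix."""
    r = (z - zp) / ell**2
    return kappa(z, zp) * (np.eye(len(z)) / ell**2 - np.outer(r, r))

# Finite-difference check of the gradient identity.
z, zp, eps = np.array([0.3, -0.2]), np.array([0.1, 0.4]), 1e-6
fd = np.array([(kappa(z + eps * e, zp) - kappa(z - eps * e, zp)) / (2 * eps)
               for e in np.eye(2)])
print(np.allclose(fd, dkappa_dz(z, zp), atol=1e-8))  # expect True
```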

3.2. Approximation of U(z) using GPR with derivatives

In this work, the force/torque observations are obtained by maneuvering the object within the environment and recording the corresponding positions and forces/torques. This data collection involves instances where the object is either in free space, that is, no contact, or in contact with the environment. These force and torque observations represent derivatives of $ U\left(\boldsymbol{z}\right) $ , as $ -\nabla U\left(\boldsymbol{z}\right)=\boldsymbol{F} $ . The energy function $ U\left(\boldsymbol{z}\right) $ is then constructed using GPR. The incorporation of derivatives makes the GPR resemble a physics-based model (Cross et al., Reference Cross, Rogers, Pitchforth, Gibson, Zhang and Jones2024). Using Equation (3.5), $ U\left(\boldsymbol{z}\right) $ is modeled as a GP with zero mean,

(3.12) $$ \left[\begin{array}{c}U\left(\boldsymbol{z}\right)\\ {}\nabla U\left(\boldsymbol{z}\right)\end{array}\right]\sim \mathcal{GP}\left(\left[\begin{array}{c}0\\ {}\mathbf{0}\end{array}\right],\left[\begin{array}{cc}\kappa \left(\boldsymbol{z},{\boldsymbol{z}}^{\prime}\right)& {\nabla}_{{\boldsymbol{z}}^{\prime}}^T\kappa \left(\boldsymbol{z},{\boldsymbol{z}}^{\prime}\right)\\ {}{\nabla}_{\boldsymbol{z}}\kappa \left(\boldsymbol{z},{\boldsymbol{z}}^{\prime}\right)& {\nabla}^2\kappa \left(\boldsymbol{z},{\boldsymbol{z}}^{\prime}\right)\end{array}\right]\right) $$

Using Equations (3.9), (3.10), and (3.11), we can estimate the function $ U\left(\boldsymbol{z}\right) $ and its first- and second-order gradients $ \nabla U\left(\boldsymbol{z}\right) $ , $ {\nabla}^2U\left(\boldsymbol{z}\right) $ respectively.
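A compact one-dimensional sketch of this procedure is given below: the potential is never observed directly, and only noisy forces $ F=- dU/ dz $ enter the regression (the derivative-only case of Equation (3.6), that is, $ S=0 $ ). The ground-truth wall potential is a hypothetical stand-in for a real contact environment.

```python
# Derivative-only GPR (Section 3.2): regress U(z) from noisy force samples,
# using dU/dz = -F. The "true" one-sided wall potential below is hypothetical.
import numpy as np

sigma_k, ell, sigma_n = 2.0, 0.15, 0.02  # assumed hyperparameters

def kappa(a, b):          # Gaussian kernel (broadcasts over arrays)
    return sigma_k**2 * np.exp(-(a - b) ** 2 / (2 * ell**2))

def dk_da(a, b):          # d kappa / d a
    return -(a - b) / ell**2 * kappa(a, b)

def d2k_da_db(a, b):      # d^2 kappa / da db
    return kappa(a, b) / ell**2 * (1 - (a - b) ** 2 / ell**2)

# Hypothetical ground truth: one-sided contact spring with a wall at z = 0.5.
k_wall = 50.0
U_true = lambda z: 0.5 * k_wall * np.maximum(0.0, z - 0.5) ** 2
F_true = lambda z: -k_wall * np.maximum(0.0, z - 0.5)   # F = -dU/dz

rng = np.random.default_rng(1)
Zm = np.linspace(0.0, 1.0, 40)                          # probe locations
dU_obs = -F_true(Zm) + sigma_n * rng.standard_normal(40)

# K22 block of Equation (3.6): covariances among the gradient observations.
A, B = np.meshgrid(Zm, Zm, indexing="ij")
alpha = np.linalg.solve(d2k_da_db(A, B) + sigma_n**2 * np.eye(40), dU_obs)

def U_hat(z_star):        # posterior mean of U, Equation (3.9)
    return dk_da(Zm, z_star) @ alpha

print(U_hat(0.8), U_true(0.8))  # regressed vs true potential in contact
```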

Remark 2: The current study represents an initial exploration in which we primarily utilize the posterior predictive mean of GPR to map the environment and derive a control policy for robot manipulation tasks. Incorporating the posterior predictive covariance within this framework can enhance uncertainty-aware decision-making in these tasks and remains an open area for further investigation.

Remark 3: A key limitation of conventional GPR is its computational complexity, which scales poorly with the number of observations and may become a bottleneck in large-scale scenarios. The current framework focuses on offline training of the GPR model using pre-collected data, followed by an offline policy search based on the posterior predictive mean using Haptic-DMPs (as depicted in Figure 2). Policy execution, however, can be performed in real time. Reducing computational complexity remains an important consideration, and approaches such as scalable GPR (Liu et al., Reference Liu, Ong, Shen and Cai2020) offer potential solutions.

Figure 2. Planning framework with GPR, Haptic-DMPs, and BBO. BBO selects the optimal parameter $ \boldsymbol{\Theta} $ from $ R $ random deviations from the optimal parameter $ {\boldsymbol{\Theta}}_{i-1}^{\ast } $ of the previous iteration.

4. Haptic-DMPs

With the contact potential $ U\left(\boldsymbol{z}\right) $ serving as an environment map and now approximated using GPR on contact force/torque observations (left part of Figure 2), we are ready to plan the robot’s motion. In this section, we introduce Haptic-DMPs and the BBO technique used to train their parameters (right part of Figure 2). The key idea is to plan within the space of directly controllable variables while ensuring that the entire system (both $ \boldsymbol{u} $ and $ \boldsymbol{z} $ ) remains on the implicitly defined EM, following an approach similar to our previous work on continuation (Yang et al., Reference Yang, Nguyen, Lv, Campolo and Cardin2025). Before introducing Haptic-DMPs, we briefly revisit the standard DMP formulation.

4.1. Dynamic movement primitives

DMPs have been introduced (Ijspeert et al., Reference Ijspeert, Nakanishi, Hoffmann, Pastor and Schaal2013) as an attractor for a simple second-order linear system, such as the damped spring model $ \tau \ddot{y}={\alpha}_z\left({\beta}_z\left(g-y\right)-\dot{y}\right)+f $ , where $ y $ is the position and $ g $ is the goal position. With $ z $ denoting the velocity (not to be confused with the state $ \boldsymbol{z} $ of Section 2), the dynamic equation can be rewritten in first-order form as

(4.1) $$ \left\{\begin{array}{c}\tau \dot{z}={\alpha}_z\left({\beta}_z\left(g-y\right)-z\right)+f\\ {}\tau \dot{y}=z\end{array}\right. $$

where $ \tau >0 $ is a time scaling constant, $ {\alpha}_z $ and $ {\beta}_z $ are positive constants, and $ f $ is a forcing term. $ f $ can be chosen as a linear combination of $ P $ nonlinear radial basis functions:

(4.2) $$ f(x)=\frac{\sum_{i=1}^P{\Psi}_i(x){\theta}_i}{\sum_{i=1}^P{\Psi}_i(x)}x $$

with $ {\theta}_i $ being adjustable weights (or parameters), and the basis functions $ {\Psi}_i(x)=\exp \left(-{h}_i{\left(x-{c}_i\right)}^2\right) $ , where $ {h}_i $ and $ {c}_i $ are constants that determine the widths and centers of the basis functions, and $ x $ is a phase variable, defined as the solution of the canonical system $ \tau \dot{x}=-{\alpha}_xx $ .

The weights $ \boldsymbol{\Theta} ={\left[{\theta}_1,\dots, {\theta}_P\right]}^T $ of the DMP can be learnt or calculated such that the resulting trajectory meets the task requirements.
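The following short sketch integrates the system of Equations (4.1) and (4.2) in discrete time; the weights and basis-function widths below are illustrative placeholders rather than learnt values.

```python
# Minimal discrete-time DMP, Equations (4.1)-(4.2). The weights theta and the
# basis widths are placeholder assumptions; with theta = 0 the system simply
# converges to the goal g. Here zv is the velocity state (the scalar z of
# Equation (4.1)), renamed to avoid a clash with the object state z.
import numpy as np

tau, alpha_z, beta_z, alpha_x = 1.0, 25.0, 25.0 / 4.0, 3.0
P = 10
c = np.exp(-alpha_x * np.linspace(0.0, 1.0, P))   # basis centers in phase space
h = 1.0 / np.diff(c, append=c[-1] * 0.5) ** 2     # basis widths (heuristic)
theta = np.zeros(P)                               # placeholder weights

def forcing(x):
    """Equation (4.2): normalized weighted sum of Gaussian bases, scaled by x."""
    psi = np.exp(-h * (x - c) ** 2)
    return (psi @ theta) / (psi.sum() + 1e-12) * x

y, zv, x, g, dt = 0.0, 0.0, 1.0, 1.0, 0.001
for _ in range(int(1.5 / dt)):
    zv += dt / tau * (alpha_z * (beta_z * (g - y) - zv) + forcing(x))
    y += dt / tau * zv
    x += dt / tau * (-alpha_x * x)                # canonical system
print(y)  # close to the goal g = 1.0
```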

4.2. Haptic-DMPs for quasi-static systems

For a quasi-static system with the total energy described by Equation (2.2), we will perform motion planning for directly controllable variables $ \boldsymbol{u} $ . This approach is necessary because the EM is implicitly defined, meaning that the state variables $ \boldsymbol{z}\left(\boldsymbol{u}\right) $ are not explicitly known. In real-world scenarios, this reflects the fact that the exact state $ \boldsymbol{z} $ of objects cannot be determined before issuing the control command $ \boldsymbol{u} $ . Therefore, a method is required to explore and compute $ \boldsymbol{z}(t) $ .

Given any control state $ {\boldsymbol{u}}_0\in {\mathrm{\mathbb{R}}}^K $ , our goal is to compute the equilibrium state $ {\boldsymbol{z}}_0^{\ast } $ satisfying the quasi-static condition $ {\nabla}_{\boldsymbol{z}}W\left({\boldsymbol{z}}_0^{\ast },{\boldsymbol{u}}_0\right)=\mathbf{0} $ . Starting from an initial guess $ {\boldsymbol{z}}_0 $ , $ {\boldsymbol{z}}_0^{\ast } $ can be computed by an ODE using Newton’s method (or natural gradient) as below:

(4.3) $$ \left\{\begin{array}{c}\frac{d}{dt}\boldsymbol{z}(t)=-\eta {\left({\nabla}_{\boldsymbol{z}\boldsymbol{z}}^2W\right)}^{-1}{\nabla}_{\boldsymbol{z}}W\\ {}\boldsymbol{z}(0)={\boldsymbol{z}}_0\end{array}\right. $$

where $ \eta $ is a positive scalar representing the step size in the ODE (Schneebeli and Wihler, Reference Schneebeli and Wihler2011), influencing the convergence of the ODE. If $ \eta $ is too small, the ODE may not converge in a given time. Conversely, if $ \eta $ is too large, the ODE could jump to a different branch of the EM. This ODE is represented by the red vector from $ \left({\boldsymbol{z}}_0,{\boldsymbol{u}}_0\right) $ to $ \left({\boldsymbol{z}}_0^{\ast },{\boldsymbol{u}}_0\right) $ in Figure 3(a).
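A minimal sketch of this Newton flow is shown below, using the toy planar wall potential introduced earlier (an assumption in place of the regressed $ U\left(\boldsymbol{z}\right) $ ) and simple Euler integration of Equation (4.3).

```python
# Euler integration of the Newton flow, Equation (4.3), pulling an initial
# guess z0 to the equilibrium z0* for a fixed control u0. The piecewise wall
# potential (wall at z_x = 1) is an assumed stand-in for the regressed U(z).
import numpy as np

K0, k_wall = np.diag([100.0, 100.0]), 1e4
u0 = np.array([1.2, 0.0])

def grad_W(z):            # grad_z W = K0 (z - u0) + grad U
    g = K0 @ (z - u0)
    g[0] += k_wall * max(0.0, z[0] - 1.0)
    return g

def hess_W(z):            # Hessian of W with respect to z
    H = K0.copy()
    if z[0] > 1.0:
        H[0, 0] += k_wall
    return H

z, eta, dt = np.array([0.0, 0.5]), 5.0, 0.01
for _ in range(500):
    z = z - eta * dt * np.linalg.solve(hess_W(z), grad_W(z))  # Equation (4.3)
print(z)  # the spring pulls z toward u0; the wall resists past z_x = 1
```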

Figure 3. Illustrations of Haptic-DMPs: the left figure illustrates the concept described in Equation (4.6); the right figure explains how Haptic-DMPs compute the control policy $ \mathbf{u}(t) $ and the object’s state $ \mathbf{z}(t) $ simultaneously given an initial position $ {\mathbf{u}}_0 $ and a target position $ {\mathbf{u}}_T $ . Once the computation is complete, each policy returns a haptic cost, as defined in Equation (4.7).

Since all valid states must remain on this manifold, finding a control policy requires exploration within it. However, we can only directly control $ \boldsymbol{u} $ , while $ \boldsymbol{z}\left(\boldsymbol{u}\right) $ remains implicit. Therefore, we need an approach that enables evolution on the EM. For nonconstant $ \boldsymbol{u}(t) $ , differentiating the equilibrium condition $ {\nabla}_{\boldsymbol{z}}W\left(\boldsymbol{z},\boldsymbol{u}\right)=\mathbf{0} $ with respect to time $ t $ gives

(4.4) $$ {\nabla}_{\boldsymbol{zz}}^2 W\delta \boldsymbol{z}+{\nabla}_{\boldsymbol{uz}}^2 W\delta \boldsymbol{u}=\mathbf{0} $$
(4.5) $$ \Rightarrow \delta \boldsymbol{z}=-{\left({\nabla}_{\boldsymbol{zz}}^2W\right)}^{-1}{\nabla}_{\boldsymbol{uz}}^2 W\delta \boldsymbol{u} $$

which describes the tangent variation. This behavior is represented by the blue vector in Figure 3(a), which lies in the tangent space at the point $ \left({\boldsymbol{z}}_0^{\ast },{\boldsymbol{u}}_0\right) $ , depicted as the blue plane in Figure 3(a). By combining both behaviors described in Equations (4.3) and (4.5), we can determine $ \boldsymbol{z}(t) $ using the following equation, ensuring that the state remains on the EM, which is represented by the curved surface in Figure 3(a).

(4.6) $$ d\boldsymbol{z}=-{\left({\nabla}_{\boldsymbol{z}\boldsymbol{z}}^2W\right)}^{-1}{\nabla}_{\boldsymbol{uz}}^2 Wd\boldsymbol{u}-\eta {\left({\nabla}_{\boldsymbol{z}\boldsymbol{z}}^2W\right)}^{-1}{\nabla}_{\boldsymbol{z}} Wdt $$

In this equation, the first term slides the state $ \boldsymbol{z} $ within the tangent space of the EM, and the second term pulls $ \boldsymbol{z} $ toward the EM in the normal direction.

Using Equation (4.6), the Haptic-DMPs are then introduced based on the following differential equations:

(4.7a) $$ \tau \dot{\boldsymbol{u}}=\boldsymbol{v}, $$
(4.7b) $$ \tau \dot{\boldsymbol{v}}={\alpha}_v\left({\beta}_v\left({\boldsymbol{u}}_T-\boldsymbol{u}\right)-\boldsymbol{v}\right)+\boldsymbol{f}\left(\boldsymbol{x}\right), $$
(4.7c) $$ \dot{\boldsymbol{z}}=-{\left({\nabla}_{\boldsymbol{z}\boldsymbol{z}}^2W\right)}^{-1}{\nabla}_{\boldsymbol{uz}}^2W\boldsymbol{v}-\eta {\left({\nabla}_{\boldsymbol{z}\boldsymbol{z}}^2W\right)}^{-1}{\nabla}_{\boldsymbol{z}}W, $$
(4.7d) $$ \dot{\phi}=\sqrt{{\boldsymbol{v}}^T{\boldsymbol{G}}_m^2\left(\boldsymbol{u}\right)\boldsymbol{v}} $$

with

(4.8) $$ {\boldsymbol{G}}_m\left(\boldsymbol{u}\right)\hskip0.3em \triangleq \hskip0.3em {\nabla}_{\boldsymbol{uu}}^2W-{\nabla}_{\boldsymbol{uz}}^2W{\left({\nabla}_{\boldsymbol{zz}}^2W\right)}^{-1}{\nabla}_{\boldsymbol{zu}}^2W $$

Here, the first two equations—Equations (4.7a) and (4.7b)—correspond to conventional DMPs, but for the control input $ \boldsymbol{u} $ , while Equation (4.7c) resembles Equation (4.6) above. Equations (4.7c) and (4.8) involve computations of different gradients, including $ {\nabla}_{\boldsymbol{zz}}^2W\left(\boldsymbol{z},\boldsymbol{u}\right) $ , $ {\nabla}_{\boldsymbol{uz}}^2W\left(\boldsymbol{z},\boldsymbol{u}\right) $ , $ {\nabla}_{\boldsymbol{z}}W\left(\boldsymbol{z},\boldsymbol{u}\right) $ , and $ {\nabla}_{\boldsymbol{uu}}^2W\left(\boldsymbol{z},\boldsymbol{u}\right) $ . More specifically, we have

(4.9a) $$ {\nabla}_{\boldsymbol{zz}}^2W\left(\boldsymbol{z},\boldsymbol{u}\right)={\nabla}_{\boldsymbol{zz}}^2{W}_{ctrl}\left(\boldsymbol{z},\boldsymbol{u}\right)+{\nabla}^2U\left(\boldsymbol{z}\right) $$
(4.9b) $$ {\nabla}_{\boldsymbol{uz}}^2W\left(\boldsymbol{z},\boldsymbol{u}\right)={\nabla}_{\boldsymbol{uz}}^2{W}_{ctrl}\left(\boldsymbol{z},\boldsymbol{u}\right) $$
(4.9c) $$ {\nabla}_{\boldsymbol{z}}W\left(\boldsymbol{z},\boldsymbol{u}\right)={\nabla}_{\boldsymbol{z}}{W}_{ctrl}\left(\boldsymbol{z},\boldsymbol{u}\right)+\nabla U\left(\boldsymbol{z}\right) $$
(4.9d) $$ {\nabla}_{\boldsymbol{uu}}^2W\left(\boldsymbol{z},\boldsymbol{u}\right)={\nabla}_{\boldsymbol{uu}}^2{W}_{ctrl}\left(\boldsymbol{z},\boldsymbol{u}\right) $$

The gradients related to $ {W}_{ctrl}\left(\boldsymbol{z},\boldsymbol{u}\right) $ can be calculated from its explicit function, such as Equation (2.3), while the gradients of $ U\left(\boldsymbol{z}\right) $ can be computed as discussed in Section 3.2. In Equation (4.7d), $ {\boldsymbol{G}}_m\left(\boldsymbol{u}\right) $ is the control Hessian, and $ \phi $ is referred to as the haptic cost, which will be used in the optimization process later on. This haptic cost is equivalent to the accumulated control energy required along the trajectory when the robot follows a control policy $ \boldsymbol{u}(t) $ (Campolo and Cardin, Reference Campolo and Cardin2025). A higher $ \phi $ value indicates that the robot requires greater control force during manipulation. For many manipulation tasks, the goal is to minimise this total control energy, making $ \phi $ a suitable candidate for the cost function in our BBO algorithm, as presented next.
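A condensed sketch of the full system of Equations (4.7a)-(4.7d) is given below, with the forcing term set to zero, the quadratic control energy of Equation (2.3) with $ \boldsymbol{c}\left(\boldsymbol{z}\right)=\boldsymbol{z} $ , and the same hypothetical wall potential standing in for the GPR-regressed map; note that away from contact $ {\boldsymbol{G}}_m $ vanishes, so the haptic cost accrues only while the object interacts with the environment.

```python
# Condensed Haptic-DMP integrator, Equations (4.7a)-(4.7d) with (4.8)-(4.9).
# Assumptions: forcing term f(x) = 0, c(z) = z, and a piecewise wall potential
# (wall at z_x = 1) in place of the GPR-regressed U(z).
import numpy as np

K0, k_wall = np.diag([100.0, 100.0]), 1e4
tau, alpha_v, beta_v, eta, dt = 1.0, 25.0, 6.25, 50.0, 0.001
u0, uT = np.array([0.0, 0.5]), np.array([1.2, 0.0])  # target behind the wall

def gradU(z):
    return np.array([k_wall * max(0.0, z[0] - 1.0), 0.0])

def hessU(z):
    return np.diag([k_wall if z[0] > 1.0 else 0.0, 0.0])

u, v, z, phi = u0.copy(), np.zeros(2), u0.copy(), 0.0
for _ in range(int(3.0 / dt)):
    Wzz = K0 + hessU(z)                         # Equation (4.9a)
    Wuz = -K0                                   # Equation (4.9b)
    Wz = K0 @ (z - u) + gradU(z)                # Equation (4.9c)
    Gm = K0 - K0 @ np.linalg.solve(Wzz, K0)     # Equation (4.8), with Wuu = K0
    dv = alpha_v * (beta_v * (uT - u) - v) / tau          # Eq. (4.7b), f = 0
    dz = (-np.linalg.solve(Wzz, Wuz @ v)
          - eta * np.linalg.solve(Wzz, Wz))               # Eq. (4.7c)
    phi += dt * np.sqrt(v @ Gm @ Gm @ v)                  # Eq. (4.7d)
    u, v, z = u + dt * v / tau, v + dt * dv, z + dt * dz  # Eq. (4.7a)
print(u, z, phi)  # z stops just past the wall while u reaches uT; phi > 0
```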

To learn the weights (parameters) $ \boldsymbol{\Theta} $ of Haptic-DMPs, a BBO algorithm is employed (Stulp and Sigaud, Reference Stulp and Sigaud2013; Yang et al., Reference Yang, Ariffin, Lou, Lv and Campolo2023; Yang et al., Reference Yang, Turlapati, Lv and Campolo2025). As shown in Figure 2, in the $ {i}^{\mathrm{th}} $ iteration, we perform $ R $ rollouts (indexed by $ r $ ), where each rollout ( $ {\boldsymbol{\Theta}}_i^r $ ) involves sampling from a Gaussian distribution $ \mathcal{N} $ as described in Equation (4.10). The distribution is centered around the optimal parameters $ {\boldsymbol{\Theta}}_{i-1}^{\ast } $ obtained from the $ {\left(i-1\right)}^{\mathrm{th}} $ iteration

(4.10) $$ {\boldsymbol{\Theta}}_i^r\sim \mathcal{N}\left({\boldsymbol{\Theta}}_{i-1}^{\ast },{\sigma}_{\Theta}\right) $$

Here, $ {\sigma}_{\Theta} $ denotes the variance of exploration. Once the new parameter $ {\boldsymbol{\Theta}}_i^r $ for a rollout $ r $ is generated, Haptic-DMPs, as shown in Equation (4.7), compute a corresponding control policy $ {\boldsymbol{u}}_i^r(t) $ , given the initial pose $ {\boldsymbol{u}}_0 $ and the estimated target $ {\boldsymbol{u}}_T $ . The haptic cost $ {\phi}_i^r $ for the rollout is then obtained. The parameter $ {\sigma}_{\Theta} $ can also be interpreted as the exploration rate. A larger $ {\sigma}_{\Theta} $ promotes broader exploration of $ {\boldsymbol{u}}_i^r(t) $ , while a smaller $ {\sigma}_{\Theta} $ confines the exploration, keeping it closer to the optimal policy of the previous iteration. Subsequently, the BBO algorithm selects the rollout with the lowest haptic cost, which can be expressed as:

(4.11) $$ {r}^{\ast }=\underset{r}{\arg \min}\left({\phi}^r\right) $$

Thus, the optimal parameter for the Haptic-DMP in this iteration is updated as $ {\boldsymbol{\Theta}}_i^{\ast }={\boldsymbol{\Theta}}_i^{r^{\ast }} $ , which is called “select and hold” in Figure 2. This “select and hold” step also retains the minimal cost from the previous iteration to guide new explorations in a direction that aims to reduce the cost. This iterative process continues until $ {\phi}_i^{\ast } $ converges.
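The loop below sketches this procedure, with a hypothetical `rollout_cost` standing in for a full Haptic-DMP rollout (in the framework, it would integrate Equation (4.7) with the sampled weights and return the accumulated haptic cost $ \phi $ ).

```python
# Schematic BBO loop, Equations (4.10)-(4.11) with "select and hold".
# rollout_cost is a hypothetical stand-in: here a toy quadratic cost; in the
# framework it integrates the Haptic-DMPs and returns the haptic cost phi.
import numpy as np

rng = np.random.default_rng(0)

def rollout_cost(theta):
    return np.sum((theta - 3.0) ** 2)   # toy surrogate for phi

P, R, sigma_theta = 10, 10, 1.5         # weights, rollouts, exploration rate
theta_star = np.zeros(P)
phi_star = rollout_cost(theta_star)
for i in range(50):
    # R rollouts sampled around the previous optimum, Equation (4.10).
    thetas = theta_star + sigma_theta * rng.standard_normal((R, P))
    costs = np.array([rollout_cost(th) for th in thetas])
    r_star = int(np.argmin(costs))      # Equation (4.11)
    if costs[r_star] < phi_star:        # "select and hold" keeps the best
        theta_star, phi_star = thetas[r_star], costs[r_star]
print(phi_star)  # converges toward the minimum of the toy cost
```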

5. Case studies

In this section, we present two numerical examples (Figure 4) to verify our framework. The first example is a relatively straightforward task: disk-in-hole insertion. This choice aids the visualization of our results: we want to show the regressed contact potential $ U\left(\boldsymbol{z}\right) $ in a three-dimensional plot, and hence the dimension of $ \boldsymbol{z} $ should be at most two. As such, we choose a centrally symmetric object whose contact force is independent of its orientation. The second example is peg-in-hole insertion, a classical contact-rich task. The overall diagram of our proposed framework is shown in Figure 2, and the presentation of both tasks in this section follows this framework.

Figure 4. A robot manipulating an object into a hole. Our framework considers both indirectly controllable variable (state $ \mathbf{z} $ ) and directly controllable variable (control input $ \mathbf{u} $ ).

5.1. Disk-in-hole manipulation

5.1.1. Model description

As shown in Figure 4(a), we define the world in $ xy $ -coordinates, and a robot gripper grasps a disk at its center. As the robot uses an impedance controller, the control input $ \boldsymbol{u} $ specifies the desired position of the gripper. Therefore, the configuration space is $ {\mathrm{\mathbb{R}}}^2\times {\mathrm{\mathbb{R}}}^2 $ , with $ \boldsymbol{z}=\left({z}_x,{z}_y\right) $ and $ \boldsymbol{u}=\left({u}_x,{u}_y\right) $ . A virtual spring with stiffness $ {\boldsymbol{K}}_0 $ connects the robot control $ \boldsymbol{u} $ and the disk position $ \boldsymbol{z} $ . As such, the control energy $ {W}_{ctrl}\left(\boldsymbol{z},\boldsymbol{u}\right) $ (Equation (2.3)) between the robot and the disk simplifies to $ {W}_{ctrl}\left(\boldsymbol{z},\boldsymbol{u}\right)=\frac{1}{2}{\left(\boldsymbol{u}-\boldsymbol{z}\right)}^T{\boldsymbol{K}}_0\left(\boldsymbol{u}-\boldsymbol{z}\right) $ .

5.1.2. Data collection in Drake

In the Drake simulator (Tedrake, Reference Tedrake2019), we uniformly distribute the variable $ \boldsymbol{z} $ within a specified range that encompasses the entire hole, as illustrated in Figure 5. We record all pairs ( $ \boldsymbol{z} $ , $ \boldsymbol{F}\left(\boldsymbol{z}\right) $ ). For instances where the disk is not in contact with the hole, the recorded force is a zero vector ( $ \boldsymbol{F}=\mathbf{0} $ ), indicating that the absence of contact results in negligible contact energy. Conversely, we catalogue the instances of contact along with the corresponding contact forces, which are then transformed into the world frame.
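The sketch below illustrates this grid sampling; `measure_contact_force` is a hypothetical stand-in for the simulator query (or, on hardware, a force/torque sensor reading), and the grid bounds and resolution are arbitrary.

```python
# Sketch of grid-sampled data collection: record (z, F(z)) pairs, including
# zero forces in free space. measure_contact_force is a hypothetical stand-in
# for the Drake contact query used in the article; bounds/resolution assumed.
import numpy as np

def measure_contact_force(z):
    """Hypothetical probe: zero in free space, outward force in contact."""
    k_wall = 1e4
    return np.array([-k_wall * max(0.0, z[0] - 1.0), 0.0])

zx = np.linspace(0.8, 1.2, 21)
zy = np.linspace(-0.2, 0.2, 21)
dataset = [(np.array([x, y]), measure_contact_force(np.array([x, y])))
           for x in zx for y in zy]   # pairs (z, F(z)); zeros are kept
print(len(dataset))  # 441 observations on the 21 x 21 grid
```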

Figure 5. A grid sampling to gather observation data. In simulation, the disk intersects with the hole to estimate contact force, where the contact is captured by a hydroelastic contact model (Tedrake, Reference Tedrake2019). For the disk-in-hole task, we sample at different $ \left({z}_x,{z}_y\right) $ .

5.1.3. Regression using GPR with derivatives

The dataset comprises the grid of object positions $ \boldsymbol{z} $ together with the corresponding measured forces. We then regress the entire environment using GPR (Equation (3.12)) with a Gaussian (or exponentiated quadratic) kernel defined as

(5.1) $$ \kappa \left(\boldsymbol{z},{\boldsymbol{z}}^{\prime}\right)={\sigma}_{\kappa}^2\exp \left(-\frac{\parallel \boldsymbol{z}-{\boldsymbol{z}}^{\prime }{\parallel}^2}{2{\mathrm{\ell}}^2}\right) $$

where $ {\sigma}_{\kappa } $ is the amplitude coefficient, while $ \mathrm{\ell} $ denotes the kernel width. As depicted in Figure 6(a), the geometry of the contact potential resembles a hole, where the contact potential approaches zero in noncontact regions.

Figure 6. Regressed potential $ U\left(\mathbf{z}\right) $ and gradient $ {\nabla}_zU\left(\mathbf{z}\right) $ .

In Figure 6(b), we present the vector field obtained from both observations and regression. Given that the vectors represent physical forces, it is evident that the forces point outward at the boundary of the hole. Outside the hole, where no contact occurs, the observed forces are zero, while the regressed forces remain close to zero. However, due to the properties of Gaussian kernels and the sparsity of the observed data, the regressed force becomes relatively significant near the boundary, even in the absence of actual contact. This phenomenon can be mitigated by incorporating more observations into the regression, particularly near the hole surface.

5.1.4. Trajectory planning using Haptic-DMPs and BBO

As shown in Figure 2, the control policy $ \boldsymbol{u}(t) $ is parameterized by Haptic-DMPs (Equations (4.7a)–(4.7c)), and the cost function is $ \phi $ as in Equation (4.7d). We set $ R=10 $ rollouts for each iteration. We iterate the framework until the cost converges. The results are shown in Figures 7 and 8. The gray curves represent nonoptimal rollouts $ {\boldsymbol{u}}^r $ during iteration $ i $ , the red curve indicates the optimal policy in this iteration with the parameter $ {\boldsymbol{\Theta}}_i^{\ast } $ , and the blue curve shows the corresponding trajectory of the disk.

Figure 7. Iterations during BBO: the red curve denotes the optimal control policy $ \mathbf{u}(t) $ in this iteration, the blue one denotes $ \mathbf{z}(t) $ , and the gray curves represent the nonoptimal explorations in each iteration of BBO.

Figure 8. The optimal policy $ \mathbf{u}(t) $ implemented in Drake. For the disk-in-hole task, the optimal policy achieves a contactless insertion.

In the early iteration (Figure 7(a)), the control policy $ \boldsymbol{u}(t) $ closely resembles a straight line toward the target. As a result, the cost is quite high, and the disk repeatedly makes contact with the boundary of the hole. However, after several iterations (Figure 7(b)), the BBO framework successfully identifies the optimal insertion policy, allowing the disk to be inserted into the hole with minimal or no contact, thereby significantly reducing the cost.

5.1.5. Verification in the simulator

After Haptic-DMPs compute the optimal control policy $ \boldsymbol{u}(t) $ , we implement this policy in the Drake simulator. The resulting trajectory of the disk is illustrated in Figure 8. The control policy $ \boldsymbol{u} $ is shown in blue, and the simulated trajectory of the disk $ \boldsymbol{z}(t) $ is shown in black. This test verifies the performance of our algorithm.

5.2. Peg-in-hole manipulation

For the peg-in-hole task shown in Figure 4(b), the variables are $ \boldsymbol{z}=\left({z}_x,{z}_y,{z}_{\theta}\right) $ and $ \boldsymbol{u}=\left({u}_x,{u}_y,{u}_{\theta}\right) $ . The steps taken are the same as in the disk-in-hole example. We continue to use a grid map to sample the dataset, but with one more degree of freedom (DOF) for rotation.

5.2.1. Data collection and regression

Similar to the disk-in-hole task, we regress the contact potential $ U\left(\boldsymbol{z}\right) $ for the peg-in-hole task. Since the state $ \boldsymbol{z}\in SE(2) $ , we extract slices at different fixed angles $ {z}_{\theta } $ to visualize the regressed potential function. From Figure 9, we observe that when the peg’s rotation is fixed, the potential function resembles a hole aligned with the peg’s orientation. Specifically, when $ {z}_{\theta }={0}^{\circ } $ , $ U\left(\boldsymbol{z}\right) $ exhibits a hole-like shape analogous to the actual hole geometry. Additionally, the rotation of the peg does not affect the magnitude of the potential, as the contact potential is determined solely by the contact forces derived from the training data.

Figure 9. A sliced observation of the regressed potential function $ U\left(\mathbf{z}\right) $ on a grid map.

5.2.2. Trajectory planning using Haptic-DMPs and BBO

Similarly, we apply the BBO algorithm to the peg-in-hole example and plot the pose of the peg during each iteration to illustrate its rotation. In the early iteration (Figure 10(a)), the control policy starts as a straight-line trajectory, resulting in a high cost due to inefficient alignment and frequent contact with the hole’s boundary. However, after several iterations (Figure 10(b)), the BBO algorithm adjusts the policy, guiding the peg away from the hole initially before smoothly inserting it, utilizing the chamfer to achieve insertion, significantly reducing the cost.

Figure 10. Iterations during BBO: the red curve denotes the optimal control policy $ \mathbf{u}(t) $ in this iteration, the blue one denotes $ \mathbf{z}(t) $ , and the gray curves represent the nonoptimal explorations in each iteration of BBO.

5.2.3. Verification in the simulator

The resulting trajectory of the peg simulated by Drake is illustrated in Figure 11. It can be seen that the peg occasionally makes contact with the hole’s corners and slides along the chamfer to complete the insertion. This behavior occurs because, due to the sparsity of the observation data, the GPR provides a very smooth function approximation, which may not precisely capture sharp corners. Despite the approximation’s imperfections, the task is still successfully completed with the help of impedance control, ensuring that the peg aligns correctly and completes the insertion.

Figure 11. The optimal policy $ \mathbf{u}(t) $ implemented in Drake. For the peg-in-hole task, the optimal policy slides along the chamfer to achieve the insertion task.

To validate the robustness of our framework, we also vary the initial position of the peg. We conducted 10 trials with randomly assigned initial positions, and in all trials the peg was successfully inserted into the hole, as shown in Figure 12.

Figure 12. Variation of the initial position of the peg $ {\mathbf{z}}_0 $ .

5.3. Further parameter analysis

As shown in Figure 2, our framework follows a block structure, allowing us to visualize and analyze each component separately.

5.3.1. Hyperparameters of GPR

The hyperparameters of GPR, such as the amplitude coefficient of the kernel $ {\sigma}_{\kappa } $ (as defined in Equation (5.1)) and the observation noise variance $ {\sigma}_n $ (as defined in Equation (3.2)), influence the regressed contact potential $ U\left(\boldsymbol{z}\right) $ . We vary these hyperparameters and analyze how $ U\left(\boldsymbol{z}\right) $ changes as a result.

As shown in Figure 13, increasing $ {\sigma}_n $ has minimal impact on the results, whereas increasing $ {\sigma}_{\kappa } $ significantly affects the regressed potential. Across these variations, the recovered hole geometry remains consistent, although the absolute value of $ U\left(\boldsymbol{z}\right) $ changes and the function becomes steeper, which corresponds to a stiffer contact. Even with these altered settings, the BBO still converges successfully and discovers an insertion policy.
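Such a sweep can be reproduced with the `regress_potential` sketch from Section 5.2.1; below, a synthetic quadratic bowl $ U=\tfrac{1}{2}\Vert \boldsymbol{z}\Vert^2 $ (with gradient $ \boldsymbol{z} $) stands in for real contact data, and the swept hyperparameter values are arbitrary placeholders.

```python
import numpy as np

# Reuses regress_potential from the earlier GPR sketch; synthetic data.
Z = np.stack(np.meshgrid(np.linspace(-1, 1, 6),
                         np.linspace(-1, 1, 6)), -1).reshape(-1, 2)
G = Z.copy()                                # grad of 0.5*|z|^2 is z
Zq = np.array([[0.0, 0.0], [0.5, 0.5]])    # query poses

for sig_k in (0.5, 1.0, 2.0):
    for sig_n in (1e-3, 1e-2, 1e-1):
        U = regress_potential(Zq, Z, G, sig_k=sig_k, ell=0.5, sig_n=sig_n)
        print(f"sigma_k={sig_k}, sigma_n={sig_n}: U(query) = {np.round(U, 3)}")
```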

Figure 13. Effect of varying GPR hyperparameters on the regressed contact potential $ U\left(\mathbf{z}\right) $ .

5.3.2. Exploration rate of BBO

To analyze the impact of the exploration rate $ {\sigma}_{\Theta} $ (Equation (4.10)), we vary its value while keeping the initial position constant. The cost variation over iterations during the BBO process is shown in Figure 14, and a brief comparison sketch follows the list below.

  • Smaller values of $ {\sigma}_{\Theta} $ (e.g., 1): The cost decreases slowly and becomes trapped in a local minimum. Physically, this corresponds to the peg stalling in front of the chamfer of the hole.

  • Moderate values of $ {\sigma}_{\Theta} $ (e.g., 1.5 and 3): A larger $ {\sigma}_{\Theta} $ accelerates the cost reduction. Although $ {\sigma}_{\Theta}=3 $ leads to a faster decrease than $ {\sigma}_{\Theta}=1.5 $ , both eventually converge to the same value, indicating a successful convergence of the BBO algorithm.

  • Larger values of $ {\sigma}_{\Theta} $ (e.g., 5): While the cost converges the fastest, the final converged cost is higher than that of $ {\sigma}_{\Theta}=1.5 $ and $ {\sigma}_{\Theta}=3 $ , suggesting suboptimal performance due to excessive exploration.
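This comparison can be sketched with the `run_bbo` function from Section 5.2.2; the quadratic cost here is a stand-in for the haptic cost $ \phi $ , and the parameter dimension is a placeholder.

```python
import numpy as np

# Toy comparison of exploration rates, reusing run_bbo from the earlier
# sketch; a quadratic cost stands in for the haptic cost phi.
phi = lambda theta: float(np.sum((theta - 1.0) ** 2))

for sigma in (1.0, 1.5, 3.0, 5.0):
    np.random.seed(0)                       # same exploration noise per run
    theta_star = run_bbo(phi, theta0=np.zeros(20), sigma_theta=sigma)
    print(f"sigma_Theta={sigma}: final cost {phi(theta_star):.4f}")
```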

Figure 14. Haptic cost $ \phi $ with respect to iterations with different exploration rates.

These robustness tests reveal the varying impact of the different hyperparameters and underline the need for careful selection when tuning the framework for a specific task.

5.4. Further discussion

Capturing the environment in finer detail, such as sharp corners, requires additional observation data. However, this introduces a trade-off, as larger datasets increase computational complexity (see Remark 3). This challenge can be addressed by employing scalable or incremental GPR methods.

In our simulations, Drake is used to collect force/torque data. In real-world scenarios, data collection can similarly be achieved by using a robot’s end-effector to tap around its environment and record forces and torques with a force/torque sensor. An environment map can then be constructed using GPR with derivatives, and a control policy can subsequently be derived using Haptic-DMPs and BBO.
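A minimal sketch of such a tapping procedure is given below; `robot` and `ft_sensor` are hypothetical placeholder interfaces, not a real driver API, and the settle time is an assumption.

```python
import numpy as np

def collect_wrench_map(robot, ft_sensor, poses, settle_time=0.5):
    """Move the end-effector through sampled poses and record the wrench.

    poses: (N, 3) array of (z_x, z_y, z_theta) samples, as in Section 5.
    Returns a list of (pose, wrench) pairs for the GPR-with-derivatives step.
    """
    data = []
    for z in poses:
        robot.move_to(z)              # hypothetical position command
        robot.wait(settle_time)       # let the quasi-static contact settle
        w = ft_sensor.read()          # hypothetical (F_x, F_y, tau) reading
        data.append((np.asarray(z), np.asarray(w)))
    return data
```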

6. Conclusion

In this article, we have shown that by using only premeasured forces and torques (such as those from a simulator) as inputs to GPR with derivatives, it is possible to create an environment map from an energy-based perspective. This environmental energy is then employed in a mechanical framework operating under quasi-static conditions. Haptic-DMPs have been proposed to plan trajectories that navigate contact-rich manipulation tasks. Through numerical simulations of two classical assembly tasks, disk-in-hole and peg-in-hole, it has been shown that the proposed method finds feasible paths that successfully accomplish these tasks. Directions for further work include incorporating prior knowledge about the environment into the GPR, reducing computational complexity through scalable GPR techniques, and enabling incremental updates to the GPR. These extensions would significantly broaden the applicability of the framework in real-world experiments.

Data availability statement

The code is available at https://github.com/lllllyang/haptic_DMP.git and can also be found in Nguyen et al. (2025).

Acknowledgments

This research is supported by the National Research Foundation, Singapore, under the NRF Medium-Sized Centre scheme (CARTIN). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not reflect the views of the National Research Foundation, Singapore.

Author contribution

Conceptualization: H.T.N., L.Y., and D.C. Methodology: H.T.N., L.Y., and D.C. Software: H.T.N. and L.Y. Validation: H.T.N. and L.Y. Formal analysis: H.T.N. and L.Y. Investigation: H.T.N. and L.Y. Resources: D.C. Data curation: H.T.N. and L.Y. Writing—original draft: H.T.N. and L.Y. Writing—review and editing: H.T.N., L.Y., C.L., and D.C. Visualization: H.T.N. and L.Y. Supervision: D.C. and C.L. Project administration: D.C. Funding acquisition: D.C. All authors have read and agreed to the published version of the manuscript.

Funding statement

This research was supported by grants from the National Research Foundation, Singapore, under the NRF Medium-Sized Centre scheme (CARTIN).

Competing interests

The authors declare none.

Ethical standard

The research meets all ethical guidelines, including adherence to the legal requirements of the study country.

Footnotes

H.-T.N. and L.Y. contributed equally to this work.

References

Bhattacharjee, T, Shenoi, AA, Park, D, Rehg, JM and Kemp, CC (2015) Combining tactile sensing and vision for rapid haptic mapping. In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, pp. 1200–1207.
Caccavale, F, Natale, C, Siciliano, B and Villani, L (1999) Six-DOF impedance control based on angle/axis representations. IEEE Transactions on Robotics and Automation 15(2), 289–300.
Campolo, D and Cardin, F (2025) A geometric framework for quasi-static manipulation of a network of elastically connected rigid bodies. Applied Mathematical Modelling 143, 116003.
Campolo, D and Cardin, F (2023) Quasi-static mechanical manipulation as an optimal process. In 2023 62nd IEEE Conference on Decision and Control (CDC). IEEE, pp. 4753–4758.
Cross, EJ, Rogers, TJ, Pitchforth, DJ, Gibson, SJ, Zhang, S and Jones, MR (2024) A spectrum of physics-informed Gaussian processes for regression in engineering. Data-Centric Engineering 5, e8.
Dattorro, J (2010) Convex Optimization & Euclidean Distance Geometry. Lulu.com.
Dawson-Elli, AR and Adamczyk, PG (2020) Design and validation of a lower-limb haptic rehabilitation robot. IEEE Transactions on Neural Systems and Rehabilitation Engineering 28(7), 1584–1594.
De Roos, F, Gessner, A and Hennig, P (2021) High-dimensional Gaussian process inference with derivatives. In International Conference on Machine Learning. PMLR, pp. 2535–2545.
Diana, M and Marescaux, J (2015) Robotic surgery. Journal of British Surgery 102(2), e15–e28.
Ehsani, K, Han, W, Herrasti, A, VanderBilt, E, Weihs, L, Kolve, E, Kembhavi, A and Mottaghi, R (2021) ManipulaTHOR: A framework for visual object manipulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4497–4506.
Gams, A, Nemec, B, Ijspeert, AJ and Ude, A (2014) Coupling movement primitives: Interaction with the environment and bimanual tasks. IEEE Transactions on Robotics 30(4), 816–830.
Garg, S, Sünderhauf, N, Dayoub, F, Morrison, D, Cosgun, A, Carneiro, G, Wu, Q, Chin, T-J, Reid, I, Gould, S, et al. (2020) Semantics for robotic mapping, perception and interaction: A survey. Foundations and Trends® in Robotics 8(1–2), 1–224.
Girão, PS, Ramos, PMP, Postolache, O and Pereira, JMD (2013) Tactile sensors for robotic applications. Measurement 46(3), 1257–1271.
Ijspeert, AJ, Nakanishi, J, Hoffmann, H, Pastor, P and Schaal, S (2013) Dynamical movement primitives: Learning attractor models for motor behaviors. Neural Computation 25(2), 328–373.
Katayama, S, Taniai, T and Tanaka, K (2022) Quasistatic contact-rich manipulation via linear complementarity quadratic programming. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, pp. 203–210.
Katsura, S, Matsumoto, Y and Ohnishi, K (2007) Modeling of force sensing and validation of disturbance observer for force control. IEEE Transactions on Industrial Electronics 54(1), 530–538.
Kazerouni, IA, Fitzgerald, L, Dooly, G and Toal, D (2022) A survey of state-of-the-art on visual SLAM. Expert Systems with Applications 205, 117734.
Kok, M, Solin, A and Schön, TB (2024) Rao-Blackwellized particle smoothing for simultaneous localization and mapping. Data-Centric Engineering 5, e15.
Koptev, M, Figueroa, N and Billard, A (2021) Real-time self-collision avoidance in joint space for humanoid robots. IEEE Robotics and Automation Letters 6(2), 1240–1247.
LaValle, SM (2006) Planning Algorithms. Cambridge University Press.
Lee, D-H, Choi, M-S, Park, H, Jang, G-R, Park, J-H and Bae, J-H (2022) Peg-in-hole assembly with dual-arm robot and dexterous robot hands. IEEE Robotics and Automation Letters 7(4), 8566–8573.
Liu, H, Ong, Y-S, Shen, X and Cai, J (2020) When Gaussian process meets big data: A review of scalable GPs. IEEE Transactions on Neural Networks and Learning Systems 31(11), 4405–4423.
Nguyen, H-T, Yang, L, Lv, C and Campolo, D (2025) Replication data for: Quasi-static contact-rich manipulation using dynamic movement primitives and haptic potential map. https://doi.org/10.5281/zenodo.15920444.
Saveriano, M, Abu-Dakka, FJ, Kramberger, A and Peternel, L (2023) Dynamic movement primitives in robotics: A tutorial survey. The International Journal of Robotics Research 42(13), 1133–1184.
Schneebeli, HR and Wihler, TP (2011) The Newton–Raphson method and adaptive ODE solvers. Fractals 19(01), 87–99.
Solak, E, Murray-Smith, R, Leithead, W, Leith, D and Rasmussen, C (2002) Derivative observations in Gaussian process models of dynamic systems. Advances in Neural Information Processing Systems 15.
Strub, C, Wörgötter, F, Ritter, H and Sandamirskaya, Y (2014) Using haptics to extract object shape from rotational manipulations. In 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems. IEEE, pp. 2179–2186.
Stulp, F and Sigaud, O (2013) Robot skill learning: From reinforcement learning to evolution strategies. Paladyn, Journal of Behavioral Robotics 4(1), 49–61.
Suomalainen, M, Karayiannidis, Y and Kyrki, V (2022) A survey of robot manipulation in contact. Robotics and Autonomous Systems 156, 104224.
Suresh, S, Bauza, M, Yu, K-T, Mangelson, JG, Rodriguez, A and Kaess, M (2021) Tactile SLAM: Real-time inference of shape and pose from planar pushing. In 2021 IEEE International Conference on Robotics and Automation (ICRA). IEEE, pp. 11322–11328.
Tedrake, R, et al. (2019) Drake: Model-based design and verification for robotics.
Turlapati, SH and Campolo, D (2022) Towards haptic-based dual-arm manipulation. Sensors 23(1), 376.
Turlapati, SH, Gurnani, J, Ariffin, MZB, Kana, S, Wong, AHY, Han, BS, Campolo, D, et al. (2024) Identification of intrinsic friction and torque ripple for a robotic joint with integrated torque sensors with application to wheel-bearing characterization. Sensors 24(23), 7465.
Wang, Y, Beltran-Hernandez, CC, Wan, W and Harada, K (2022) An adaptive imitation learning framework for robotic complex contact-rich insertion tasks. Frontiers in Robotics and AI 8, 777363.
Whitney, DE, et al. (1982) Quasi-static assembly of compliantly supported rigid parts. Journal of Dynamic Systems, Measurement, and Control 104(1), 65–77.
Williams, CK and Rasmussen, CE (2006) Gaussian Processes for Machine Learning, Vol. 2. Cambridge, MA: MIT Press.
Wirnshofer, F, Schmitt, PS, Meister, P, Wichert, G and Burgard, W (2019) State estimation in contact-rich manipulation. In 2019 International Conference on Robotics and Automation (ICRA). IEEE, pp. 3790–3796.
Yang, L, Ariffin, MZ, Lou, B, Lv, C and Campolo, D (2023) A planning framework for robotic insertion tasks via hydroelastic contact model. Machines 11(7), 741.
Yang, L, Nguyen, H-T, Yu, D, Lv, C and Campolo, D (2025) Haptic rapidly-exploring random trees: A sampling-based planner for quasi-static manipulation tasks. arXiv preprint arXiv:2506.00351.
Yang, L, Turlapati, SH, Lu, Z, Lv, C and Campolo, D (2025) A planning framework for stable robust multi-contact manipulation. arXiv preprint arXiv:2504.02516.
Yang, L, Nguyen, H-T, Lv, C, Campolo, D and Cardin, F (2025) An energy-based numerical continuation approach for quasi-static mechanical manipulation. Data-Centric Engineering 6, e18.
Yang, L, Turlapati, SH, Lv, C and Campolo, D (2025) Planning for quasi-static manipulation tasks via an intrinsic haptic metric: A book insertion case study. IEEE Robotics and Automation Letters 10(6), 6111–6118.
Zhao, H, Chen, Y, Li, X and Ding, H (2023) Robotic peg-in-hole assembly based on reversible dynamic movement primitives and trajectory optimization. Mechatronics 95, 103054.