

Project Overview

In this project, we aim to explore dexterous manipulation skills on a robotic hand. A common approach in reinforcement learning is to first learn a rough policy in simulation and then fine-tune it on the real robot. However, this approach fails in dexterous tasks, where simulators are often incapable of capturing the contact dynamics and stochasticity, resulting in a policy that is unusable on the real robot. Recently, there have been significant developments in policy search methods such as PI2 and PoWER [1][2]. The applicability of these methods to highly dynamic manipulation tasks remains an open problem.
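To make the flavor of these policy search methods concrete, the sketch below shows a simplified PI2-style update: sample noisy rollouts around the current policy parameters, weight each rollout by the exponentiated negative cost, and move the parameters toward the probability-weighted average of the exploration noise. This is a minimal illustration, not the full PI2 formulation of [1]; the cost function, parameter dimensionality, and hyperparameter values are hypothetical.

```python
import numpy as np

def pi2_update(theta, rollout_cost, rng, n_rollouts=20, noise_std=0.1, temperature=0.1):
    """One simplified PI2-style policy improvement step.

    theta        : current policy parameters (1-D array)
    rollout_cost : function mapping a parameter vector to a scalar cost
    rng          : numpy random Generator used for exploration noise
    """
    eps = rng.normal(0.0, noise_std, size=(n_rollouts, theta.size))  # exploration noise
    costs = np.array([rollout_cost(theta + e) for e in eps])
    # Normalize costs to [0, 1], then exponentiate so low-cost rollouts dominate.
    s = (costs - costs.min()) / (costs.max() - costs.min() + 1e-12)
    w = np.exp(-s / temperature)
    w /= w.sum()
    # Probability-weighted average of the noise moves theta toward cheap rollouts.
    return theta + w @ eps

# Toy usage: drive the parameters toward a known target vector.
rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 0.5])          # hypothetical optimum
theta = np.zeros(3)
cost = lambda th: float(np.sum((th - target) ** 2))
for _ in range(100):
    theta = pi2_update(theta, cost, rng)
```

Because the update is a weighted average rather than a gradient step, it needs no differentiable model of the robot, which is exactly why such methods are attractive when simulators fail to match real dynamics.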

Preliminary Work

As preliminary work, we developed a method that enables environmental force estimation on our hand, which has no tactile or force sensors [3]. To this end, we utilized multi-joint robot dynamics and disturbance-observer-based friction identification to account for torques that arise from gravity, stiction, and viscous friction. With these torques effectively compensated, the disturbance observer units allow us to estimate environmental interaction forces. To validate the method, we conducted experiments on the robot hand, which has no haptic sensing capability. The results showed that the force estimates were in good agreement with actual sensor measurements. To further demonstrate the method's effectiveness, a compliant contact detection task was implemented on the robot. The outcome of this experiment indicated that the environmental force estimation performance was sufficient to facilitate the task; as such, our method may eliminate the need for expensive force sensors at the fingertips or joints in dexterous manipulation.

Work in Progress

As the task, we targeted a ball swapping task, in which the goal of the robot is to swap the positions of two balls. Our rationale for selecting this task rests on two factors. First, the task is inherently dexterous and dynamic. Second, it is cyclic, so the policy can be continually adjusted and its success evaluated.

Below, videos of an open-loop trajectory are given. As can be seen, open-loop trajectories often fail to execute the task as expected. Moreover, generating a custom smooth trajectory by hand is difficult. The research question is therefore: starting from an initial policy, is it possible to arrive at a robust policy for a dexterous manipulation task within a realistically small number of iterations?
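One standard way to obtain a smooth initial trajectory to seed the policy search is a minimum-jerk profile, which reaches the goal with zero velocity and acceleration at both endpoints. The sketch below is a generic illustration of that technique, not the trajectory representation used in this project; the start, goal, and duration values are hypothetical.

```python
import numpy as np

def minimum_jerk(x0, xf, T, n=101):
    """Minimum-jerk trajectory for one joint, from x0 to xf over T seconds.

    Uses the closed-form fifth-order polynomial 10s^3 - 15s^4 + 6s^5,
    which has zero velocity and zero acceleration at both endpoints.
    Returns time stamps t and positions x sampled at n points.
    """
    t = np.linspace(0.0, T, n)
    s = t / T                                   # normalized time in [0, 1]
    shape = 10 * s**3 - 15 * s**4 + 6 * s**5    # smooth 0 -> 1 profile
    return t, x0 + (xf - x0) * shape

# Usage: a 2-second joint motion from 0.0 rad to 1.0 rad.
t, x = minimum_jerk(0.0, 1.0, 2.0, n=201)
```

Such a profile could serve as the open-loop starting point that the policy search then perturbs and improves iteration by iteration.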