Convergent Human Learning for Robot Skill Generation
Duration: 03/09/2012 – 02/09/2016
This project is funded by the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 321700 (Converge).
Robot programming is one of the bottlenecks for moving robots from factories into our daily lives. When a new skill is desired, expert knowledge is needed to implement it on the target robot. There is a growing effort to develop systems that can learn by themselves or by observing a demonstrator. When direct measurement of the motor variables of the demonstrator (e.g. joint angles) is available, the problem can be solved by designing a mapping from the observed values to the target robot actuators. However, due to kinematic and dynamic differences between the observer and the demonstrator, this is not always trivial. Another approach is kinesthetic demonstration, i.e. actively moving the robot through via points to reach a desired behavior, which however is not applicable to tasks with non-negligible dynamics. Recently, a robot skill generation framework was proposed that circumvents these limitations by relying on the sensorimotor learning capacity of the central nervous system (Oztop, Lin et al. 2006, Oztop, Lin et al. 2007, Babic, Hale et al. 2011, Moore and Oztop 2012). In this framework, the operator is put in the control loop of a robotic system, where (s)he controls the robot in real-time. The operator then ‘learns’ to make the robot perform a given task. After the human becomes an expert in this task, the signals entering and leaving the robot are used to construct an autonomous controller. The key point of this framework is that it shifts the work from the cognitive system of an expert to the sensorimotor system of a layperson.
The Converge project aims to improve the efficacy of this framework by allowing the human and the robot to learn simultaneously and to work together as a team. For this, several human-in-the-loop learning setups have been developed, and two main research directions are pursued: (1) simultaneous learning for autonomy, and (2) human-robot shared control to surpass the performance of the individual agents alone. For the former, simultaneous and sequential learning experiments with a cart-pole system for the swing-up and balance task have been conducted. The results indicate that within the simultaneous learning framework, convergent learning is possible with dynamic control sharing, and the obtained autonomous controllers perform better than those from the usual sequential human-in-the-loop learning (Zamani and Oztop 2015). Although simultaneous learning initially feels harder for naïve subjects, they can still generate policies with improved performance. With longer practice, it is possible to exploit the simultaneous learning framework to generate successful autonomous policies that can do the full ‘swing-up and pole balance’ task. The same framework is also applied to the so-called ‘ball and beam’ task, where the goal is to speed up a ball placed on a rail (by raising or lowering one end) such that it never hits either end. The results show that a smooth shift from full human guidance to shared control and finally to autonomous control can be achieved effectively. A similar transition from full teleoperation to full autonomy is also demonstrated on the ‘ball balancing’ task (described below), executed through an anthropomorphic robotic arm with dynamic control sharing.
In the second focus area of shared control, almost exclusively anthropomorphic robots are used. In particular, ‘ball balancing’ is studied in detail for its intuitive nature. The ball balancing task requires moving the robot hand so as to balance a ball placed on a tray attached to the end effector of the robot. First, an autonomous robot policy is synthesized using the human-in-the-loop learning (sequential learning) framework and kept constant in order to study shared control, i.e. the human adaptation and the performance of the human-robot system as a whole. To address a richer class of problems, the ball balancing task is defined as balancing the ball at a given location on the tray. Therefore, the correct policy is goal dependent, and thus effective shared control requires this goal to be shared between the human and the robot. We have studied the case where the robot is uninformed of the goal and has to infer it by observing its partner’s (i.e. the human operator’s) movements. In the experiments conducted, the robot is given a simple human intention estimation mechanism and a constant control sharing weight. The results indicate that human adaptation creates a shared control system that emergently exploits the best parts of the human and the robot control. To be concrete, when the robot and human are coupled, a symbiotic system is formed that can balance the ball faster than the human alone and with a higher accuracy than the robot alone (Amirshirzad, Kaya et al. 2016).
When a human operator is involved in a shared control scenario such as the ball balancing task, (s)he needs to interact with a non-stationary system that tries to predict the human goal and interferes with the control based on its prediction. It is therefore not clear how long it will take the human to adapt to the system and achieve a high performance. To investigate human adaptation, extensive experiments have been performed using the ball balancing task. One group tele-operated a robot that does not interfere with the control (human control condition); the other group controlled a robot that performs intention inference and contributes to the net control command as described above (shared control condition). The results indicate that human learning proceeds faster in the shared control condition, as measured by the performance criteria of task completion time, length of the ball trajectory, and positional error of the ball. This is interesting because, even though subjects initially had to deal with a non-stationary partner, they soon learned to exploit the robot partner to achieve a high overall performance.
One effective shared control mechanism that can be used to generate autonomous control policies is ‘heterogeneous control sharing’, where the control channels of the robot and the human are chosen to be orthogonal. In the previous human-robot control settings, the robot and the human commanded the same control channel, so a ‘weight’ or arbitration between the human and robot became necessary. No such weighting mechanism is needed in heterogeneous control sharing, making it a good choice for robot skill synthesis when such a control split can be made. To realize a heterogeneous control sharing system, the so-called ‘ball swapping task’ (Moore and Oztop 2012) was chosen, which requires a robot hand with dexterous fingers to swap the positions of a pair of balls held by the hand. The robot finger movements are given the basic autonomy of following an open-loop sine wave with a constant phase difference between the fingers. This alone is not sufficient to swap the balls. The human entered the control loop by commanding the robot arm (which has the robot hand mounted as its end-effector), generating the desired position and orientation commands for the hand. Although initially it was not clear whether a human operator could learn to control the robot arm in response to finger and ball states to achieve ball swapping, within a few days an arm policy was discovered by the human operator that facilitates the ball swap, which was then used to synthesize a fully autonomous ball swapping performance.
Overall, in this project several aspects of human-in-the-loop robot control systems are studied. In particular, simultaneous learning of the robot and the human for the aim of autonomous policy synthesis, and shared control for obtaining a synergistic performance that surpasses what can be achieved by the individual agents alone, have been addressed. It is becoming more and more clear that for a robot-enabled society, equipping robots with mechanisms that enable them to operate symbiotically with humans is at least as critical as robot autonomy and dexterity. This work made a step towards developing technologies that target not only autonomous skill generation but also shared control mechanisms that synergistically combine the skills of robots and humans for better performance. The study of human sensorimotor adaptation for robot control and robot learning is a rich research venue that may potentially lead to paradigm shifts in how robots are programmed, tested and deployed in everyday life. In particular, we may see robot training as a new employment area where employees use their sensorimotor adaptation capacity to train robots. In fact, expert crane operators, who must tune their sensorimotor skills to perform precisely and robustly, are always in high demand. Likewise, the need for master remote flight operators for unmanned aerial vehicles is on the rise. It is not difficult to imagine that these expert operators of today will be transformed into the ‘robot trainers’ of the near future; this is the socio-economic impact of the research direction to which the Converge project has contributed.
Human-robot interfacing for shared control and simultaneous learning
In the basic human-in-the-loop control framework that is studied (see Figure 1), the human controls the robot by sending commands in real-time to perform a given task. In the simultaneous learning experiments, the robot learns by observing the human actions, and starts injecting its own autonomous signals into the net control. The combination of the two signals is performed adaptively by monitoring the performance of the human-robot system.
In the shared control experiments, the robot estimates the human intention (if the task necessitates it) and takes a share in the control by sending commands that assist the human in achieving the task goal. The net control command that the robot receives is specified according to the targeted task. In the experiments, conic and superimposed combinations of the control signals have been used.
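The signal-combination step can be sketched as follows. This is a minimal illustration with assumed variable names, showing a weighted (convex) blend, as used in the ball balancing experiments, and plain superposition:

```python
import numpy as np

# Minimal sketch of two ways of combining the human and robot commands.
# Variable names and dimensions are illustrative, not from the project code.

def convex_blend(u_human, u_robot, w):
    """Weighted combination: w in [0, 1] arbitrates between the agents."""
    return (1.0 - w) * np.asarray(u_human) + w * np.asarray(u_robot)

def superimposed(u_human, u_robot):
    """Superimposed combination: the two commands are simply summed."""
    return np.asarray(u_human) + np.asarray(u_robot)
```

With w = 0 the system is fully tele-operated and with w = 1 it is fully autonomous, so sweeping w from 0 to 1 realizes the transition from teleoperation to autonomy.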
In the shared control systems, the focus is on eliciting an improved performance of the coupled human-robot system. For the simultaneous learning system, the focus is on effective skill transfer, which may be decomposed into the performance of the final autonomous controller and the ease with which humans can teach robots.
Ball Balancing Task
The framework described above is implemented on the ‘ball balancing’ task. The goal in this task is to bring the ball to a specific point on the tray and keep it balanced there. As can be seen in the figure below, the tray is attached to the end-effector of a six-degree-of-freedom robotic arm. The robot is allowed to change the tray tilt using its two wrist joints. These joints are commanded in real-time by a human placed in the control loop. As the robot interface, the human uses a standard computer mouse; the horizontal and vertical displacements of the mouse are linearly mapped to the desired angular movements of the robot joints.
Goal estimation starts a brief period after the human operator begins the ball balancing task. The ball positions in a moving window are used to estimate the goal of the human operator. The ball position distribution over the window is modeled as a Gaussian; consequently, the mean indicates the estimated goal and the variance indicates the confidence of the estimate. For this simple task, straightforward goal estimation seems sufficient; however, in more complex tasks other intention estimation methods, such as internal simulation, may be needed. The movie clip on the left shows the change in the estimated probability density function as the shared control task is being executed.
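The windowed Gaussian estimate can be sketched as below (the window length and all names are assumptions, not taken from the project code):

```python
import numpy as np
from collections import deque

# Sketch of the goal estimator: the ball positions in a moving window
# are modeled as a Gaussian; the mean is the estimated goal and the
# variance reflects the confidence of the estimate.

class GoalEstimator:
    def __init__(self, window=100):
        self.positions = deque(maxlen=window)  # moving window of samples

    def update(self, ball_xy):
        """Add the latest ball position (x, y) on the tray."""
        self.positions.append(np.asarray(ball_xy, dtype=float))

    def estimate(self):
        """Return (estimated goal, per-axis variance)."""
        p = np.array(self.positions)
        return p.mean(axis=0), p.var(axis=0)
```

A low variance means the ball has been hovering around one spot, so the robot can weight its goal-directed contribution with more confidence.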
Robot commands are generated by an autonomous controller obtained using an expert demonstration (i.e. human-robot skill synthesis via sequential learning). The controller is not required to perform at a very high level, as the aim is to see how the human and robot skills can complement each other in a shared control scenario. Consequently, a close-to-linear function of the current state of the system (i.e. ball position, ball velocity, joint angular positions and joint angular velocities) is synthesized to serve as the autonomous controller driving the robot actions. The following video illustrates task execution by the synthesized controller.
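A hedged sketch of how such a close-to-linear controller could be fit from demonstration data by least squares follows; the 8-D state layout mirrors the text, but the dimensions and the data are synthetic, not the project’s actual logs:

```python
import numpy as np

# Hedged sketch: fitting a linear policy from expert demonstration data
# by least squares. The state holds ball position/velocity and joint
# angles/velocities, as in the text; the data here are synthetic.

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                       # logged states
K_true = rng.normal(size=(8, 2))                    # unknown expert mapping
U = X @ K_true + 0.01 * rng.normal(size=(500, 2))   # logged human commands

# Least-squares fit: find K minimizing ||X K - U||
K, *_ = np.linalg.lstsq(X, U, rcond=None)

def policy(state):
    """Autonomous command (two wrist-joint targets) for an 8-D state."""
    return np.asarray(state) @ K
```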
With this shared control setup, several directions can be followed, such as adapting the control sharing to maximize human-robot system performance, or focusing on how the human adapts to a robotic system that changes its behavior based on its prediction of the human intent. We experimented with the former but directed our focus to the latter in an extensive experiment.
Experiments were conducted under two conditions: human control and shared control. In the human control condition, the robot is passive and is simply tele-operated by the human. In the shared control condition, the synthesized controller output based on the estimated goal is combined with the human control (a fixed convex combination is used in these experiments). For the experiments, eighteen naive human subjects volunteered to perform the ball balancing task under either the human control or the shared control condition. Three success measures were defined to evaluate the task performance of the human-robot system: (1) task completion time; (2) length of the trajectory of the ball movement on the tray; (3) positional error, i.e. the distance of the final ball position from the target position. Four target positions with equal distance to the tray center were marked on the tray. At the beginning of an experimental trial, the ball is positioned at the center of the tray with zero velocity. After the subjects receive the go signal, they are to bring the ball to one of the four marked target positions on the tray. Each experimental session includes four sub-sessions, and a sub-session is made of four experimental trials. For each subject, four experimental sessions were conducted on separate but consecutive days.
The aforementioned success measures for each subject were recorded in all the trials. To measure the learning rate we fitted a linear trend line on the three recorded success measures in ordered trials for each subject. In the following figures typical progress of two subjects is shown, one from the shared control group, and one from the human control group.
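The trend-line analysis can be sketched as follows; the completion times below are synthetic, for illustration only:

```python
import numpy as np

# Sketch of the learning-rate analysis: fit a linear trend to a success
# measure over ordered trials; the slope serves as a learning-rate proxy.
# The completion times below are synthetic, for illustration only.

completion_time = np.array([32.0, 28.5, 27.0, 24.2, 22.8, 21.0, 20.5, 19.1])
trials = np.arange(1, len(completion_time) + 1)

slope, intercept = np.polyfit(trials, completion_time, deg=1)
# A negative slope indicates improvement across trials; a steeper
# (more negative) slope indicates faster learning.
```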
Overall, the subjects who performed the task under the shared control condition appear to have a higher performance on the last day, i.e. when subjects learn to control an adaptive robot they eventually perform better with it. Importantly, human learning appears to be faster with an adaptive robot. This suggests that humans can quickly model their partner’s ability and exploit it for better performance.
Pole Swing Up and Balance With Cart-Pole through Simultaneous Learning
The following figure shows the first hardware setup used to test algorithms for Converge. In this setup, the human operator controls the cart to swing the pole up to the upright position and keep it there. As the human learns this task, the ‘machine’ also learns to mimic the human and shares the control through a weighting scheme. Eventually the machine acquires the control policy from the human and becomes the sole controller for the task. At this point human guidance is no longer needed, and the task skill has been synthesized on the robot. The video below shows such a skill transfer session.
Ball and Beam Speed-up Task
In the ‘ball and beam’ setup, a metal ball can freely roll on a track (i.e. the beam). One side of the beam is fixed and the other side is connected to a servo motor via a lever. By controlling the position of the servo -which changes the angle of the beam- the ball can be made to move to a desired position on the beam. We designed a simple task (called ‘speed up’ or ‘maximum kinetic energy’) to analyze the algorithms we develop for simultaneous learning. In this task the goal of the demonstrator is to roll the ball back and forth with the maximum possible speed without hitting the two ends of the beam. The demonstrator action (mouse movements) is converted to servo motor voltages via a fixed gain. When tested, the implemented system indicates that we can transfer human skill to the machine in short time scales (see the video clip).
The plot on the left shows that in less than five minutes an autonomous controller could be obtained that performs the task quite well without human guidance. The plot shows the motor output generated during a simultaneous learning experiment (human, robot, and net command). As can be seen, as time passes the control shifts from human to robot. This shift of control from human to robot is facilitated by automatic tuning of the mixing weight, based on a local success measure rather than on the prediction error of the model of the human policy.
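A minimal sketch of success-based weight tuning in this spirit follows; the update rule and constants are assumptions, not the exact scheme used in the experiment:

```python
# Minimal sketch of success-driven mixing-weight tuning: the robot's
# share w grows while a local success measure stays high, gradually
# shifting control from the human to the robot. Update rule and
# constants are illustrative assumptions.

def update_weight(w, success, threshold=0.8, rate=0.01):
    """Nudge the robot's share up on success, down otherwise."""
    if success > threshold:
        return min(1.0, w + rate)
    return max(0.0, w - rate)

def net_command(u_human, u_robot, w):
    """Blend the two scalar commands with the current robot share w."""
    return (1.0 - w) * u_human + w * u_robot
```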
Teaching Dynamic Manipulation Tasks with Human Guidance
In more common robotic settings, such as factory environments, the typical approach for teaching a robot to perform a task is through consecutive point-to-point movements. Recently, with technological advances and the increased accessibility of compliant robots, kinesthetic teaching -where the demonstrator teaches the task by moving the robot links manually- has also become commonplace.
Although these methods are applicable to most tasks, it is well known that they:
- do not perform well on highly dynamic tasks
- do not generalize easily
- are not efficient, and usually require redundant steps caused by the constraints of the input
In this work, we present our methodology for human-to-robot skill transfer on a dynamic manipulation task. To meet these goals we chose a ball swapping task, wherein a robot hand attached to a robot arm is required to swap two balls resting on it. In our earlier experiments (without the arm), we observed that it is not possible to generate a robust policy without changing the orientation of the hand. The natural question is whether, given a basic trajectory for the hand fingers, it would be feasible to find an autonomous controller for the arm with the help of human guidance. The task is inherently dynamic, dexterous, and stochastic. It also has many degrees of freedom, which makes it difficult to learn without human guidance.
Our setup consists of a Kuka-R9000 arm, a Gifu Hand-III -an anthropomorphic robot hand- and an OptiTrack optical tracking system. In our task, the fingers are given simple sine wave trajectories with equal phase differences between consecutive fingers. During training, a full cycle of the movement is set to 4 seconds, as this is slow enough for the demonstrator to teach and fast enough to capture the dynamic interactions. The amplitudes and phases are minimally tuned, based on the assumption that the human demonstrator will be able to exploit any given decent hand movement. The human hand movement is captured by the optical tracker and mapped to the end-effector position and orientation. In the control loop, the hand runs a PD controller at ~500 Hz, the arm runs at 250 Hz, and the optical cameras run at 250 Hz. Captured movements are low-pass filtered to prevent jerks.
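The open-loop finger autonomy can be sketched as below; the 4-second cycle is from the text, while the finger count, amplitude, and joint mapping are assumptions:

```python
import numpy as np

# Sketch of the open-loop finger trajectories: sine waves with equal
# phase differences between consecutive fingers and a 4-second cycle
# (from the text). Finger count and amplitude are assumptions.

N_FINGERS = 4
CYCLE = 4.0        # seconds per full movement cycle
AMPLITUDE = 0.3    # rad, illustrative

def finger_targets(t):
    """Desired finger joint offsets (rad) at time t (seconds)."""
    phases = 2 * np.pi * np.arange(N_FINGERS) / N_FINGERS
    return AMPLITUDE * np.sin(2 * np.pi * t / CYCLE + phases)
```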
Although the human demonstrators become proficient in relatively few trials, the task is still difficult to repeat robustly. Owing to the complex dynamics of the system, even the playback of a recorded session occasionally fails unpredictably.
For this reason, we first played back the recorded sessions to choose appropriate cycles. There are two further properties required to achieve the desired behaviour. The first is that the start and end configurations be in close proximity, so that the cyclic movements are continuous without gaps in between (small jumps are still handled by the autonomous controller). The second is that the first and second derivatives be as small as possible. After repeatedly running the sliced cycles, we picked the best one according to our subjectively desired behaviour.
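The two selection criteria can be turned into a simple cost over candidate cycles, sketched below; the weights are assumptions, and the actual selection was partly subjective, as noted:

```python
import numpy as np

# Illustrative cost for ranking candidate cycles by the two criteria:
# (1) a small gap between start and end configurations, and (2) small
# first and second derivatives. The weights are assumptions.

def cycle_cost(traj, dt, w_gap=1.0, w_vel=0.1, w_acc=0.1):
    """traj: (T, D) array of hand poses sampled every dt seconds."""
    gap = np.linalg.norm(traj[-1] - traj[0])   # start/end mismatch
    vel = np.diff(traj, axis=0) / dt           # first derivative
    acc = np.diff(vel, axis=0) / dt            # second derivative
    return (w_gap * gap
            + w_vel * np.mean(np.linalg.norm(vel, axis=1))
            + w_acc * np.mean(np.linalg.norm(acc, axis=1)))
```

Lower-cost cycles are smoother and closer to closed, so they make better candidates for the final autonomous policy.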
Though the users were free to operate in 6-DoF (position and orientation of the hand), examining the acquired data we found that the demonstrators preferred easier, almost one-dimensional movements. Speculating that the demonstrators intended to perform a 1-D movement, we applied principal component analysis and found that the first component, explaining 93% of the variance, is able to perform the task quite well.
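The dimensionality analysis can be sketched with PCA via the singular value decomposition (a generic sketch, not the project’s analysis code):

```python
import numpy as np

# Generic PCA sketch (via SVD) for logged 6-DoF hand commands: the
# explained-variance ratio of the first component measures how close
# the demonstrated movement is to one-dimensional.

def first_component(X):
    """X: (T, 6) array of pose commands. Returns (direction, ratio)."""
    Xc = X - X.mean(axis=0)            # center the data
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)    # variance ratio per component
    return Vt[0], explained[0]
```

Projecting the logged commands onto the returned direction gives the 1-D control signal that, per the analysis above, sufficed to perform the task.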
Generalization of the movement pattern was another main goal of the study. Rhythmic movement primitives are often used to generate cyclic attractor patterns. With this method, the trajectories are encoded as time-independent primitives whose amplitudes can be scaled and translated while preserving the general pattern.
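In this spirit, a cyclic trajectory can be encoded against a phase variable, here with a small Fourier basis, so that amplitude and offset can be rescaled while the pattern is preserved. This is a generic sketch; the primitive representation used in the project may differ:

```python
import numpy as np

# Generic sketch of a rhythmic (cyclic) primitive: one cycle is fit
# against a phase variable with a Fourier basis, making the encoding
# time-independent; amplitude and offset can then be rescaled while
# the overall pattern is preserved.

def fit_primitive(traj, n_harmonics=5):
    """Fit Fourier coefficients of one cycle of a 1-D trajectory."""
    T = len(traj)
    phase = 2 * np.pi * np.arange(T) / T
    basis = [np.ones(T)]
    for k in range(1, n_harmonics + 1):
        basis += [np.cos(k * phase), np.sin(k * phase)]
    B = np.stack(basis, axis=1)
    coeffs, *_ = np.linalg.lstsq(B, traj, rcond=None)
    return coeffs

def play(coeffs, phase, amplitude=1.0, offset=0.0):
    """Evaluate the primitive at a phase in [0, 2*pi)."""
    n = (len(coeffs) - 1) // 2
    val = 0.0
    for k in range(1, n + 1):
        val += coeffs[2 * k - 1] * np.cos(k * phase)
        val += coeffs[2 * k] * np.sin(k * phase)
    return amplitude * val + coeffs[0] + offset
```

Because evaluation depends only on the phase, the same coefficients can replay the cycle at any speed, amplitude, or offset.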