
However, when we want an agent to actually stack Lego blocks, the same ingenuity can pose an issue. These behaviours demonstrate the ingenuity and power of algorithms to find ways to do exactly what we tell them to do. From this perspective, specification gaming is a good sign - the agent has found a novel way to achieve the specified objective. Whether or not the agent solves the task by exploiting a loophole is unimportant in this context. For example, when we use Atari games as a benchmark for training RL algorithms, the goal is to evaluate whether our algorithms can solve difficult tasks.

Within the scope of developing reinforcement learning (RL) algorithms, the goal is to build agents that learn to achieve the given objective. We can consider specification gaming from two different perspectives. Source: Data-Efficient Deep Reinforcement Learning for Dexterous Manipulation (Popov et al, 2017) This behaviour achieved the stated objective (high bottom face of the red block) at the expense of what the designer actually cares about (stacking it on top of the blue one). Instead of performing the relatively difficult maneuver of picking up the red block and placing it on top of the blue one, the agent simply flipped over the red block to collect the reward. The agent was rewarded for the height of the bottom face of the red block when it is not touching the block. In a Lego stacking task, the desired outcome was for a red block to end up on top of a blue block. In this post, we review possible causes for specification gaming, share examples of where this happens in practice, and argue for further work on principled approaches to overcoming specification problems. These behaviours are common, and we have collected around 60 examples so far (aggregating existing lists and ongoing contributions from the AI community). For example, a reinforcement learning agent can find a shortcut to getting lots of reward without completing the task as intended by the human designer. This problem also arises in the design of artificial agents. In the real world, when rewarded for doing well on a homework assignment, a student might copy another student to get the right answers, rather than learning the material - and thus exploit a loophole in the task specification. Readers may have heard the myth of King Midas and the golden touch, in which the king asks that anything he touches be turned to gold - but soon finds that even food and drink turn to metal in his hands.

We have all had experiences with specification gaming, even if not by this name. Specification gaming is a behaviour that satisfies the literal specification of an objective without achieving the intended outcome.
