  • Poster presentation
  • Open access

Biologically plausible reinforcement learning of continuous actions

Humans and animals can perform very precise movements to obtain rewards. For instance, it is no problem at all to pick up a mug of coffee from your desk while you are working. Unfortunately, it is unknown how exactly the non-linear mapping between sensory inputs (e.g. your mug on the retina) and the correct motor actions (e.g. a set of joint angles) is learned by the brain. Here we show how a biologically plausible learning scheme can learn to perform non-linear transformations from sensory inputs to continuous actions based on reinforcement learning.

To arrive at our novel scheme, we built on the idea of attention-gated reinforcement learning (AGREL) [1], a biologically plausible learning scheme that explains how networks of neurons can learn to perform non-linear transformations from sensory inputs to discrete actions (e.g. pressing a button) based on reinforcement learning [2]. We recently showed that the AGREL learning scheme can be generalized to perform multiple simultaneous discrete actions [3], and we now show how this scheme can be further generalized to continuous action spaces. The key idea is that motor areas have feedback connections to earlier processing layers which inform the network about the selected action. Synaptic plasticity is constrained to those synapses that were involved in the decision, and it follows a simple Hebbian rule gated by a globally available neuromodulatory signal that codes reward prediction errors. In our novel scheme, motor units are situated in a population coding layer that encodes the outcome of the decision process as a bump of activation [4]. This contrasts with our earlier work, in which single motor units code for actions [1, 3]. We show that the synaptic updates perform stochastic gradient descent on the prediction error that results from the combined action-value prediction of all the motor units that encoded the decision. Unlike other reinforcement-learning-based approaches, e.g. [5], our learning rule is powerful enough to learn tasks that require non-linear transformations. The distribution of population centers in the motor layer can also be adapted automatically to task demands, yielding more representational power where actions need to be precise.
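The abstract does not spell out the update equations, so the following is only a minimal sketch of the mechanism as described: a population-coded motor layer, a Gaussian activation bump around the selected unit, a globally broadcast reward prediction error, and Hebbian plasticity gated by that error and by feedback from the units that encoded the decision. All specifics (layer sizes, epsilon-greedy exploration, symmetric feedback weights, the bump width, and the toy reward function) are our own assumptions, not taken from the paper.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy sizes (assumed, not from the paper)
    n_in, n_hid, n_motor = 2, 20, 21
    centers = np.linspace(0.0, 1.0, n_motor)  # preferred actions tiling [0, 1]
    sigma, eps, lr = 0.08, 0.1, 0.05          # bump width, exploration, learning rate

    W = rng.normal(0, 0.3, (n_hid, n_in))     # input -> hidden weights
    V = rng.normal(0, 0.3, (n_motor, n_hid))  # hidden -> motor weights
    # Feedback weights are assumed symmetric: V.T carries the gating signal.

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def trial(x, reward_fn):
        h = sigmoid(W @ x)                    # hidden activations
        q = V @ h                             # per-unit action-value predictions

        # Epsilon-greedy choice of a winning motor unit
        win = rng.integers(n_motor) if rng.random() < eps else int(np.argmax(q))

        # The decision is encoded as a normalized bump of activation
        bump = np.exp(-0.5 * ((centers - centers[win]) / sigma) ** 2)
        bump /= bump.sum()

        action = float(bump @ centers)        # decoded continuous action
        Q = float(bump @ q)                   # combined value prediction of the bump

        delta = reward_fn(action) - Q         # global reward prediction error

        # Hebbian updates gated by delta; plasticity is confined to synapses
        # of the units participating in the bump (feedback gating)
        fb = V.T @ bump                       # attentional feedback onto hidden layer
        V += lr * delta * np.outer(bump, h)
        W += lr * delta * np.outer(fb * h * (1.0 - h), x)
        return action

    # Toy direct-reward task: reward peaks when the action matches a
    # non-linear function of the input (an illustrative target, assumed here)
    for t in range(5000):
        x = rng.random(n_in)
        target = 0.5 + 0.4 * np.sin(np.pi * x[0]) * x[1]
        trial(x, lambda a: 1.0 - abs(a - target))

Because delta multiplies a Hebbian pre/post term confined to the units that encoded the decision, each weight change in this sketch approximates a stochastic gradient step on the squared prediction error between reward and the combined value prediction Q, which is the property the abstract claims for its updates.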

We show that the novel scheme can learn to perform non-linear transformations from sensory inputs to motor outputs in a variety of direct reward tasks. The model can explain how visuomotor coordinate transformations might be learned by reinforcement learning rather than by the semi-supervised learning used in [6]. It might also explain how humans learn to weigh the accuracy of a movement against the potential rewards and punishments for making inaccurate movements, as in the visually guided movement task described in [7]; a toy version of that trade-off is sketched below.
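The sketch below is not the authors' model; it only simulates the kind of task structure described in [7] (a reward circle partly overlapped by a penalty circle, with Gaussian motor noise), using payoffs and geometry assumed for illustration, to show that the gain-maximizing aim point shifts away from the penalty region.

    import numpy as np

    # Geometry and payoffs are assumed for illustration only
    def expected_gain(aim_x, n=200_000, noise=0.04):
        rng = np.random.default_rng(42)       # fixed seed: common random numbers
        pts = rng.normal([aim_x, 0.0], noise, (n, 2))  # noisy movement endpoints
        hit_reward = np.linalg.norm(pts - [0.0, 0.0], axis=1) < 0.05
        hit_penalty = np.linalg.norm(pts - [-0.06, 0.0], axis=1) < 0.05
        return np.mean(100.0 * hit_reward - 500.0 * hit_penalty)

    # Sweep aim points along the x-axis: the optimum lies to the right of the
    # reward centre, trading a lower hit rate for fewer penalty hits
    best_aim = max(np.linspace(0.0, 0.05, 11), key=expected_gain)

A learner that maximizes expected reward under its own motor noise would converge on such a shifted aim point, which is the behaviour the model would need to reproduce.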

References

  1. Roelfsema PR, van Ooyen A: Attention-gated reinforcement learning of internal representations for classification. Neural Comput. 2005, 17: 2176-2214. doi:10.1162/0899766054615699.

  2. Sutton RS, Barto AG: Reinforcement Learning: An Introduction. 1998, MIT Press.

  3. Rombouts JO, van Ooyen A, Roelfsema PR, Bohte SM: Biologically plausible multi-dimensional reinforcement learning in neural networks. ICANN. 2012, 443-450.

  4. Zhang K: Representation of spatial orientation by the intrinsic dynamics of the head-direction cell ensemble: a theory. J Neurosci. 1996, 16: 2112-2126.

  5. Ognibene D, Rega A, Baldassarre G: A model of reaching that integrates reinforcement learning and population encoding of postures. From Animals to Animats 9. 2006, 381-393.

  6. Ghahramani Z, Wolpert DM, Jordan MI: Generalization to local remappings of the visuomotor coordinate transformation. J Neurosci. 1996, 16: 7085-7096.

  7. Trommershäuser J, Maloney LT, Landy MS: Statistical decision theory and the selection of rapid, goal-directed movements. J Opt Soc Am A Opt Image Sci Vis. 2003, 20: 1419-1433. doi:10.1364/JOSAA.20.001419.


Author information

Correspondence to Jaldert O Rombouts.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Rombouts, J.O., Roelfsema, P.R. & Bohte, S.M. Biologically plausible reinforcement learning of continuous actions. BMC Neurosci 14 (Suppl 1), P28 (2013). https://doi.org/10.1186/1471-2202-14-S1-P28
