An anonymous reader quotes a report from MIT Technology Review: In a paper published in Nature today, DeepMind, Alphabet's AI subsidiary, has once again used lessons from reinforcement learning to propose a new theory about the reward mechanisms within our brains. The hypothesis, supported by initial experimental findings, could not only improve our understanding of mental health and motivation. It could also validate the current direction of AI research toward building more human-like general intelligence. At a high level, reinforcement learning follows the insight derived from Pavlov's dogs: it's possible to teach an agent to master complex, novel tasks through only positive and negative feedback. An algorithm begins learning an assigned task by randomly predicting which action might earn it a reward. It then takes the action, observes the real reward, and adjusts its prediction based on the margin of error. Over millions or even billions of trials, the algorithm's prediction errors converge to zero, at which point it knows precisely which actions to take to maximize its reward and so complete its task.
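The predict, observe, adjust loop described above can be sketched in a few lines of Python. This is a minimal illustration and not DeepMind's code: it tracks the predicted reward for a single action only, and the function name `learn_reward_prediction`, the `learning_rate` parameter, and the constant reward of 1.0 are all assumptions made for the example.

```python
def learn_reward_prediction(rewards, learning_rate=0.1, initial_prediction=0.0):
    """Track one reward prediction and nudge it by each prediction error."""
    prediction = initial_prediction
    errors = []
    for reward in rewards:
        # Prediction error: the observed reward minus the predicted reward.
        error = reward - prediction
        # Move the prediction a fraction of the way toward what was observed.
        prediction += learning_rate * error
        errors.append(error)
    return prediction, errors


# Hypothetical task: the action always pays a reward of 1.0.
prediction, errors = learn_reward_prediction([1.0] * 200)
print(round(prediction, 3), abs(errors[-1]) < 1e-6)
```

With a constant reward, each error is 0.9 times the previous one, so the errors shrink toward zero as the prediction approaches the true reward. Real tasks have noisy rewards and many actions, which is why they can need millions of trials.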
It turns out the brain's reward system works in much the same way -- a discovery made in the 1990s, inspired by reinforcement-learning algorithms. When a human or animal is about to perform an action, its dopamine neurons make a prediction about the expected reward. Once the actual reward is received, they then fire off an amount of dopamine that corresponds to the prediction error. A better reward than expected triggers a strong dopamine release, while a worse reward than expected suppresses the chemical's production. The dopamine, in other words, serves as a correction signal, telling the neurons to adjust their predictions until they converge to reality. The phenomenon, known as reward prediction error, works much like a reinforcement-learning algorithm. DeepMind's new paper builds on an improved reinforcement-learning algorithm that changes the way rewards are predicted. "Whereas the old approach estimated rewards as a single number -- meant to equal the average expected outcome -- the new approach represents them more accurately as a distribution," the report says. This suggests a new hypothesis: that dopamine neurons also predict rewards in the same distributional way.
After testing this theory, DeepMind found "compelling evidence that the brain indeed uses distributional reward predictions to strengthen its learning algorithm," reports MIT Technology Review.
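To make the difference between a single-number prediction and a distributional one concrete, here is a rough Python sketch. It assumes one possible mechanism, not spelled out in the summary above: a population of predictors that each weigh positive and negative prediction errors differently, so that some settle on optimistic estimates and others on pessimistic ones. The function name, the `asymmetries` values, and the alternating 0/1 reward sequence are hypothetical choices for illustration.

```python
def learn_distributional_predictions(rewards, asymmetries, learning_rate=0.01):
    """Learn one reward prediction per unit, each with its own error asymmetry."""
    predictions = [0.0 for _ in asymmetries]
    for reward in rewards:
        for i, tau in enumerate(asymmetries):
            error = reward - predictions[i]
            # A unit with high tau reacts more to better-than-expected rewards,
            # so it ends up optimistic; a low tau makes the unit pessimistic.
            scale = tau if error > 0 else 1.0 - tau
            predictions[i] += learning_rate * scale * error
    return predictions


# Hypothetical task: the reward is 0 half the time and 1 the other half.
rewards = [0.0, 1.0] * 2000
asymmetries = [0.1, 0.5, 0.9]
predictions = learn_distributional_predictions(rewards, asymmetries)
print([round(p, 2) for p in predictions])
```

The balanced unit (0.5) settles near the average reward, which is all the single-number approach would report. The other two units settle near 0.1 and 0.9, so the population as a whole reflects the spread of outcomes and not only their mean.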