What is it about?
We all make choices on a daily basis. It is well established that the value of choice options is flexibly and continuously updated by external reinforcement, such as social kudos or money. But does learning stagnate when feedback is not available, for instance when practicing an instrument at home? Our work builds on recent empirical evidence indicating that the subjective feeling of confidence acts as an internally generated reinforcement learning signal when external feedback is unavailable. A tentative conclusion from these earlier studies is that internal feedback relies on a similar neural machinery and computational logic as learning from decision-contingent external reward or feedback.

In the present study, we sought to test the generalizability of such confidence-based learning signals in a key domain of reward-based learning: instrumental conditioning. We reasoned that if choice confidence acts as a reinforcement signal in the absence of feedback, it should also affect the subjective value of chosen options. Indeed, the idea that the mere act of choosing impacts the subjective value of choice options has been considered before. The most prominent example is Leon Festinger's cognitive dissonance theory, which posits that the act of choosing influences subjective values as a form of post-hoc rationalization.

To experimentally probe such value-based learning in the absence of external feedback, and the role of confidence therein, we designed a value-based decision-making task in which participants had to learn the value of initially neutral stimuli in phases with and without external reward feedback, while reporting their subjective confidence after each choice. In agreement with our hypothesis, we found signatures of confidence-based learning, including an increase in subjective confidence and choice consistency in phases without feedback, both pointing to a self-reinforcement of chosen options.
To better understand the mechanisms of value learning in the absence of feedback, we devised a family of computational models in which learning is driven by confidence prediction errors (analogous to reward prediction errors). A statistical model comparison demonstrated that these confidence-based learning models outperformed classical reinforcement learning models (which would predict either no change in subjective values or a devaluation over time). Intriguingly, an analysis of the fitted computational parameters showed that individuals with more volatile reward-based learning also showed more volatile confidence-based learning, suggesting a common underlying trait.
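The core idea of a confidence prediction error can be illustrated with a minimal sketch. This is not the exact model from the paper, just a generic Rescorla-Wagner-style update in which, on feedback-free trials, the reported confidence plays the role that reward plays in standard reinforcement learning; the learning rate and the numbers below are illustrative assumptions.

```python
def update_value(value, alpha, confidence):
    """One learning step without external feedback.

    The confidence prediction error (CPE) is the difference between the
    confidence reported on this trial and the option's current value;
    the value then moves toward the reported confidence, analogous to
    how a reward prediction error drives value toward obtained reward.
    """
    cpe = confidence - value        # confidence prediction error
    return value + alpha * cpe      # value update scaled by learning rate


# Toy run: an initially neutral option (value 0.5) is repeatedly chosen
# with fairly high confidence (0.8). Its subjective value drifts upward,
# i.e., the choice self-reinforces even though no reward is delivered.
value = 0.5
for _ in range(10):
    value = update_value(value, alpha=0.3, confidence=0.8)
print(round(value, 3))  # → 0.792
```

The same update with reward in place of confidence recovers the classical reinforcement learning rule, which is what makes the formal parallel between external and internal feedback easy to state.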
Why is it important?
Together, our findings provide evidence for a fundamental parallel between external reward-based and internal confidence-based feedback in human instrumental conditioning.
Read the Original
This page is a summary of: The value of confidence: Confidence prediction errors drive value-based learning in the absence of external feedback, PLoS Computational Biology, October 2022, PLOS.