Introduction

The noradrenergic system supports attention, working memory and learning functions1,2,3,4,5,6,7,8,9,10. These higher cognitive functions are critically supported by bi-directional projections between noradrenergic neurons in the locus coeruleus and neurons in the anterior cingulate and prefrontal cortex11,12,13,14,15. Optogenetic evidence has directly implicated noradrenergic activation in supporting adaptive behaviors. Noradrenergic neurons in the locus coeruleus projecting to the frontal cortex of mice show higher firing rates when a choice leads to an unexpected outcome that requires adjusting behavior14. Silencing the noradrenergic projections to the frontal cortex in mice impairs behavioral switching14, while enhancing noradrenergic neuron activation in the locus coeruleus can increase the speed of behavioral switching in mice16. These recent insights suggest that noradrenergic modulation of prefrontal circuits might be a key mechanism in the nonhuman primate (NHP) brain for the flexible switching of complex cognitive states such as the updating of attention sets.

There are various ways in which noradrenergic modulation could support the flexible switching of cognitive states. One possibility is that tonic noradrenergic enhancement modulates arousal17 and increases exploratory behaviors18. However, while noradrenergic modulation can have positive effects by ‘releasing’ behavior from following expected values in order to seek novel and potentially more valuable information19, stimulation of noradrenergic neurons may also shift behavior to randomness15 or trigger behavioral shifts away from still valuable foraging patches, reflecting maladaptive or suboptimal behaviors20. Another means for noradrenaline to support the flexible switching of cognitive states is by enhancing the signal-to-noise ratio (SNR) of neuronally encoded salient events including enhanced representations of task-relevant stimuli21,22,23,24,25,26,27. For example, noradrenergic stimulation of the lateral prefrontal cortex can enhance persistent delay firing of neurons when they encode their preferred spatial location in short-term memory28. However, persistent working memory representations in lateral prefrontal cortex cannot easily explain cognitively flexible behavior, which is characterized by rapidly changing working memory representations in response to varying task demands and reward contingencies29,30. These are processes more closely associated with medial and orbital prefrontal circuits than with lateral prefrontal cortex31.

In order to shed light on how noradrenergic modulation improves cognitive flexibility and neural signaling, we set out to test whether stimulating α2A noradrenergic receptors is sufficient to improve cognitive flexibility and enhance neuronal encoding of learning-relevant task variables in the anterior cingulate cortex (ACC), the dorsolateral prefrontal cortex (dlPFC, areas 9/46d and 9/46v), and the head of the caudate nucleus (striatum). We investigated these questions by administering the α2A receptor-specific drug Guanfacine in NHPs performing a feature-based attentional set shifting task with multiple stimulus dimensions while we recorded single neuron activity in ACC, dlPFC, and striatum. NHP lesion studies have documented that these areas are necessary for adaptive learning, and each area contains neurons with activity correlating with efficient learning of abstract feature values during set shifting32,33,34,35,36,37. The task reversed color-reward associations of an attention set (Fig. 1A) and dissociated covert feature-based attention to the color of a stimulus from spatial attention to the peripheral stimulus locations and from the motor requirements to program a directional saccade (Fig. 1B). We chose Guanfacine for four main reasons. First, it is a highly selective α2A receptor agonist with 15–60× higher affinity for the α2A over the α2B and α2C receptor subtypes38,39. Secondly, Guanfacine is unlikely to have off-target effects. It has a strong affinity towards post-synaptic α2A receptors, having 10× less impact on pre-synaptic α2A receptors, measured as suppression of LC firing rate, when compared to clonidine40. Thirdly, the mechanisms by which Guanfacine enhances working memory in neurons of the dlPFC are well documented, providing a reference when evaluating its effects on flexible learning28,41,42,43,44,45. Fourthly, previous attempts to resolve the role of Guanfacine in modulating flexibility have remained inconclusive46.

Fig. 1: Task paradigm, anatomical recording locations, and behavioral results.

A Monkeys performed a feature-based reversal learning task that rewarded one of two colors in blocks, reversing the rewarded color without cue when at least 30 trials were completed and a 90% performance criterion (over the preceding 12 trials) was reached. B In each trial, monkeys fixated and covertly processed the color and motion onset of two peripheral stimuli. One or both stimuli transiently dimmed at an unpredictable time, and subjects had to ignore the direction of motion of the distractor while making a saccade in the direction of the target stimulus. C Reconstructed electrode recording locations in dlPFC, ACC and striatum. D Performance for learned blocks (first block excluded) in Guanfacine and vehicle sessions (shading denotes SE). Learning curves were smoothed (4-trial window) from trial 4 onwards. Horizontal dashed line shows chance probability. Vertical dashed lines represent the median first trial at which the criterion proportion of correct responses of ≥0.7 (over 10 successive trials) was reached for Guanfacine (12) and vehicle (16). E Proportion correct in trials before and after error trials (ECn analysis) with Guanfacine (red; n = 29 sessions) and vehicle (blue; n = 76 sessions). Right panel: proportion correct after the first error in a block. Error bars are SE of sample mean differences. F ECn and CCn (proportion correct after correct trials) analysis results showing the difference in Guanfacine (n = 29 sessions) - vehicle (n = 76 sessions) for the first 9 trials in a block (left) and for trials 10 onwards (right). Error bars are SE of sample mean differences. *p = 0.022, post-hoc Tukey’s test. Source data are provided as a Source Data file.

We found that α2AR stimulation improves subjects’ use of error feedback to adjust behavior and speeds up reversal learning. These behavioral improvements were paralleled by enhanced encoding of negative prediction errors in ACC and of positive prediction errors in the striatum. These findings illustrate that the α2A receptor system is part of the causal chain for enhancing flexible reversal learning and show that the ACC-striatum circuit translates α2A stimulation into improved behavior.

Results

Two rhesus monkeys performed the attention set-shifting task for 42 (monkey H, 17 with Guanfacine injections) and 66 (monkey K, 12 with Guanfacine injections) electrophysiological recording sessions. A total of 1157 single units (619 and 538 units from monkey H and K) were collected across the ACC (area 24, n = 456 units), dlPFC (areas 8a/46d/46v, n = 410 units), and in the anterior striatum (n = 291 units) (Fig. 1C).

Guanfacine enhances reversal learning and post-error adjustment

We quantified the speed of reversal learning on days with vehicle and Guanfacine administration as the first trial in a block at which performance exceeded the learning criterion (0.70 proportion correct over the subsequent 10 trials). Reversal learning took on average 16 trials and occurred 3.2 trials earlier with Guanfacine than with vehicle (permutation testing: p = 0.003; Fig. S1d), which was evident in each monkey (Fig. S1a) and also when estimating learning speed with an ideal observer statistic (Fig. S1b). Faster learning depended on reversing a previously learned color-reward association as it started with the first reversal block of a session (Fig. S1c). Guanfacine did not alter the overall proportion of reversal blocks that were learned (n.s.). These results support the behavioral improvements of reversal learning with Guanfacine that we have established in a dose-finding study47.
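
The learning-speed measure described above can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the function name `first_criterion_trial` is hypothetical, and it assumes that the 10-trial window starts at the candidate trial itself and that the criterion is met at a proportion correct of 0.70 or higher (the figure caption states ≥0.7).

```python
from typing import Optional, Sequence


def first_criterion_trial(
    outcomes: Sequence[int], window: int = 10, criterion: float = 0.70
) -> Optional[int]:
    """Return the 1-based index of the first trial t for which the proportion
    correct over trials t .. t+window-1 reaches the criterion.

    `outcomes` holds 1 for a correct (rewarded) trial and 0 for an error.
    Returns None if the criterion is never reached in the block.
    Assumption: the window includes trial t itself.
    """
    for start in range(len(outcomes) - window + 1):
        proportion_correct = sum(outcomes[start:start + window]) / window
        if proportion_correct >= criterion:
            return start + 1
    return None


# Hypothetical block of 15 trials after a reversal.
block = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
learning_trial = first_criterion_trial(block)
```

Comparing the per-block values of this index between Guanfacine and vehicle sessions would give the kind of difference in learning speed reported in the text.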

Faster learning could be achieved by adjusting performance after nonrewarded (error) trials or after rewarded (correct) choices. We tested for changes in performance accuracy post-error and post-correct trials early or later in a block and found a significant main effect of drug condition (F(1,412) = 4.89; p = 0.028), time-in-a-block (F(1,412) = 62.54; p < 0.001) and accuracy post-error vs post-correct trials (F(1,412) = 266.48; p < 0.001) with a significant interaction between drug condition and time in the block (F(1,412) = 6.63; p = 0.010). These results reflect that early in the block (first 9 trials), Guanfacine significantly improved post-error performance for 2 trials after an error relative to vehicle (6.9%; Tukey’s: p = 0.022) (Fig. 1E), but did not alter post-correct performance after rewarded trials relative to vehicle (3.2%; Tukey’s: p = 0.809) (Fig. 1F). The post-error enhancement of performance with Guanfacine was already observable after the first error trial in a new reversal block (Fig. 1E, right). Later in the block (from trial 10 onwards), Guanfacine did not affect performance after error trials (2.3%; Tukey’s: p = 0.963) or correct trials (1.5%; Tukey’s: p = 0.997). Guanfacine increased the rate at which subjects broke fixation during the covert attention epoch of the task (Fig. S2a), but did not affect other error types (Fig. S2a, b). Beyond accuracy, Guanfacine did not affect how fast subsequent trials were initiated (Fig. S2c) and modestly reduced reaction times in one of the animals (Fig. S2d) (see Supplemental Results).
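
The post-error and post-correct comparison can be illustrated with a simplified sketch. It assumes accuracy is measured on the trial a fixed lag after every trial of a given outcome; the exact ECn/CCn definitions used for Fig. 1E, F (for example, how intervening trials are treated) are not specified in this section, and the function name `accuracy_after` is hypothetical.

```python
from typing import Optional, Sequence


def accuracy_after(
    outcomes: Sequence[int], trigger: int, lag: int = 1
) -> Optional[float]:
    """Proportion correct on the trial `lag` positions after each trial whose
    outcome equals `trigger` (0 = error, 1 = correct).

    Returns None when no trial of the requested type has a follower.
    """
    following = [
        outcomes[i + lag]
        for i in range(len(outcomes) - lag)
        if outcomes[i] == trigger
    ]
    if not following:
        return None
    return sum(following) / len(following)


# Hypothetical outcome sequence from the first trials of a block.
early_trials = [0, 1, 0, 0, 1, 1, 1, 0, 1]
post_error = accuracy_after(early_trials, trigger=0)    # after errors
post_correct = accuracy_after(early_trials, trigger=1)  # after correct trials
```

Computing these values separately for the first 9 trials and for trials 10 onwards, per session, would yield the cells of the drug × time-in-block × trial-type design described above.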

Pupils constrict with Guanfacine

We analyzed the pupil diameter of the monkeys to infer the effect of Guanfacine on noradrenergic neuron activity in the locus coeruleus (LC). During the first three blocks of each session, when Guanfacine concentrations were highest, monkeys had a more constricted pupil diameter compared to vehicle (monkey H: p < 0.001; monkey K: p = 0.010; Fig. S1d). This finding suggests Guanfacine reduced LC activity as expected from physiological studies that showed α2A-noradrenergic receptors are expressed pre-synaptically on LC terminals where they act as inhibitory auto-receptors that decrease noradrenergic release40,48.

Guanfacine does not alter overall firing rates or firing variability

Guanfacine’s behavioral effects might relate to overall firing rate changes in ACC or dlPFC based on studies indicating Guanfacine increases excitability in prefrontal pyramidal neurons28,49,50. Since α2A adrenoceptors are likely differentially expressed in pyramidal neurons and interneurons51,52,53,54,55, we first split the neurons into narrow and broad spiking (NS and BS) neurons. Previous work has shown that fast spiking interneurons have narrower spike widths37,56, and that neurons with narrower spike widths suppress multiunit activity, consistent with being inhibitory33. NS neurons in dlPFC and ACC can thus be considered putative interneurons and are well separated from BS neurons (Fig. 2A). In the striatum, NS neurons are mostly putative fast spiking interneurons and are well separated from striatal BS neurons, which encompass mostly medium spiny projection neurons (Fig. 2B). Overall, we found only marginal and non-systematic changes in firing rates with Guanfacine across neurons (Fig. S3a). Changes in firing rate were limited to BS neurons that showed reduced firing in dlPFC in the attention epoch (p = 0.034) and in the striatum in the feedback epoch (p = 0.008). Guanfacine did not alter firing variability (coefficient of variation, CV), or the regularity of local interspike intervals (measured as the Local Variability, LV, see Methods) in the attention or feedback epoch (Fig. S3b). We also tested whether Guanfacine affected correlations of spike counts between pairs of neurons and found no changes in dlPFC or striatum, but moderately reduced correlations in ACC during the feedback (p < 0.001) and attention epoch (p < 0.001) (Fig. S3c). The reduced firing correlations are consistent with enhanced phasic noradrenergic activation in the ACC (see Supplemental Discussion).
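
The two variability measures named above can be computed from a neuron's interspike intervals (ISIs) as in the following sketch. It assumes the commonly used definition of Local Variability, LV = 3/(n − 1) · Σ((Iᵢ − Iᵢ₊₁)/(Iᵢ + Iᵢ₊₁))², and a CV based on the population standard deviation; the Methods of the study may define either quantity slightly differently, and the function names are hypothetical.

```python
import statistics
from typing import Sequence


def coefficient_of_variation(isis: Sequence[float]) -> float:
    """CV of interspike intervals: standard deviation divided by the mean.
    Assumption: population standard deviation (pstdev) is used."""
    return statistics.pstdev(isis) / statistics.fmean(isis)


def local_variability(isis: Sequence[float]) -> float:
    """Local Variability (LV) of an ISI sequence.

    Compares each interval only with its successor, so slow drifts in
    firing rate affect LV much less than they affect CV.
    """
    n = len(isis)
    if n < 2:
        raise ValueError("LV needs at least two interspike intervals")
    terms = [((a - b) / (a + b)) ** 2 for a, b in zip(isis, isis[1:])]
    return 3.0 * sum(terms) / (n - 1)


# Hypothetical ISIs in seconds.
example_isis = [0.10, 0.12, 0.08, 0.30, 0.11]
cv = coefficient_of_variation(example_isis)
lv = local_variability(example_isis)
```

Under this definition a perfectly regular spike train has LV = 0 and a Poisson-like train has LV close to 1, which is why the measure is used to index firing regularity independently of rate.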

Fig. 2: Narrow spiking neurons show stronger firing correlations with outcome variables than broad spiking neurons in dlPFC, ACC and striatum.

A Waveform parameters (left) and normalized action potential shapes (middle) of narrow (red) and broad (blue) spiking neurons recorded in dlPFC and ACC. Right: Time to repolarization and peak-to-trough duration distinguish narrow from broad spiking neurons. B Waveform parameters (left) and normalized action potentials for narrow (red) and broad (blue) spiking neurons recorded in the striatum (middle) distinguished by their spikes’ peak width at half peak and initial slope of valley decay (right). C Differences in correlations (Guanfacine minus vehicle) of firing rate and different outcome variables (y-axis) for narrow spiking neurons recorded in the dlPFC (six leftmost columns), the ACC (middle columns) and the striatum (rightmost columns). Thick outlines of cells denote p < 0.05 significance, green squares signify a trend at p < 0.075; two-sided permutation test. Numbers within cells denote the number of neurons in the Guanfacine condition. DL during learning, AL after learning. D The same as C but for broad spiking neurons. Source data are provided as a Source Data file.

Guanfacine enhances neural encoding of trial outcomes and prediction errors

Reversing attention sets relies on recognizing when previously rewarded choices lead to unexpected errors indicating a reversal. We tested this by first quantifying how Guanfacine modulated the encoding of erroneous versus correct outcomes. The overall firing rate correlations with trial outcomes increased after outcome onset and were stronger in the Guanfacine than in the vehicle condition in dlPFC and striatum, and with a statistical trend in the ACC (Fig. 3A; permutation test, dlPFC: p < 0.05 during 0.5–0.6 s; striatum: p < 0.05 during 0.05–0.3 s; ACC: p < 0.075 during 0.4–0.45 s). Enhanced outcome encoding with Guanfacine reflected an on average increased firing to erroneous outcomes in the dlPFC, ACC, and striatum (average firing to error for Guanfacine and vehicle: 2.8 (±0.5) Hz/2.7 (±0.2) Hz, 3.3 (±0.6) Hz/3.9 (±0.3) Hz, and 3.4 (±1.0) Hz/3.1 (±0.4) Hz, and to correct outcomes: 2.6 (±0.5) Hz/2.4 (±0.2) Hz, 3.3 (±0.7) Hz/3.6 (±0.3) Hz, 3.1 (±0.7) Hz/2.9 (±0.4) Hz for ACC, dlPFC, and striatum, respectively). We quantified the difference in [firing rate × outcome] correlations for Guanfacine minus vehicle for outcomes of trials during learning and after learning was completed, and for the current and previous trials. We found that Guanfacine increased the correlations with trial outcome of the current trial in each of the recorded brain areas more prominently when learning was completed, i.e., in trials after the learning criterion was reached (dlPFC, ACC and striatum, each: p < 0.05) (Fig. 3B). This finding may indicate enhanced monitoring of outcomes after learning but cannot account for the enhanced learning speed.
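
The group comparisons above rely on two-sided permutation tests. The sketch below shows one generic way such a test can be set up, assuming each neuron contributes one correlation coefficient and that the test statistic is the difference in group means obtained under random relabeling; the statistic and number of permutations actually used in the study are not stated in this section, and all names and values are hypothetical.

```python
import random
from statistics import fmean
from typing import Sequence, Tuple


def permutation_test(
    group_a: Sequence[float],
    group_b: Sequence[float],
    n_perm: int = 5000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Two-sided permutation test on the difference of group means.

    Returns (observed difference, p-value). Group labels are shuffled
    n_perm times; the p-value is the fraction of shuffles whose absolute
    difference is at least as large as the observed one (with +1 correction).
    """
    rng = random.Random(seed)
    observed = fmean(group_a) - fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = fmean(pooled[:n_a]) - fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical per-neuron |r| values for firing rate vs. trial outcome.
guanfacine_r = [0.21, 0.18, 0.25, 0.30, 0.16, 0.22]
vehicle_r = [0.12, 0.15, 0.10, 0.19, 0.14, 0.11]
difference, p_value = permutation_test(guanfacine_r, vehicle_r)
```

Because the null distribution is built from the data themselves, this approach makes no assumption about the shape of the distribution of correlation coefficients.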

Fig. 3: Guanfacine increases correlations of firing rate with reward prediction errors in anterior cingulate cortex and striatum.

A Firing rate correlations with trial outcome (correct vs. error) increase with Guanfacine and vehicle. Horizontal bars denote statistical differences (two-sided permutation test) at p < 0.05 (solid black) and as a trend (p < 0.075, grey dashed). B Differences in correlations (Guanfacine minus vehicle) of firing rate in the feedback epoch for different outcome variables (y-axis) for neurons recorded in the dlPFC (six leftmost columns), the ACC (middle columns) and the striatum (rightmost columns). Stronger colors mean higher (more positive or more negative) correlation coefficients with Guanfacine than vehicle. Black squares denote p < 0.05 significance, green squares signify a trend at p < 0.075; two-sided permutation test. Numbers within cells denote the number of neurons in the Guanfacine condition. DL during learning, AL after learning. C Same format as A for firing correlations with reward prediction error variables. Source data are provided as a Source Data file.

To test more directly for correlations of firing rate with a learning signal, we calculated the outcome prediction error (PE) for each trial, which reflects how unexpected an outcome was. PEs serve as learning signals in reinforcement learning (RL) models to reduce the error in predicting the value of stimulus features in subsequent trials. To quantify whether PEs are possible learning signals in the current task, we fit behavior with various RL models and compared the results to a fit using a rule-based Win-Stay/Lose-Switch (WS/LS) model that does not use PEs. WS/LS is a parsimonious strategy when adjusting behavior with deterministic reward contingencies57,58: it uses only the outcome of the past trial, choosing the stimulus with the same color again when it was previously rewarded (Win-Stay) and switching to the other color otherwise (Lose-Switch). The WS/LS model was compared to RL models that (1) represented the expected values of stimulus colors and updated them using PEs, or (2) represented stimulus colors and additionally the stimulus locations and directions of motion and enhanced learning by decaying values of nonchosen stimulus features, or (3) additionally weighted feature dimensions by inferring a dimension weight for color, location and motion direction (see Methods). Comparing how the models fit the behavioral data showed that each monkey’s performance was best accounted for by the RL model that weighted the feature dimensions, decayed non-chosen feature values, and updated values using PEs (Supplementary Results). The model was best in predicting performance of the feature-based reversal learning task in a previous study47 and was used to estimate positive, negative, and signed PEs encoded by neurons in dlPFC, ACC, and the striatum during learning, with stronger PE signaling in these areas predicting stronger encoding of the new target feature after learning37.
We used the PEs of the best-performing RL model to test whether Guanfacine enhances attentional set shifting by modulating PE encoding. We found that Guanfacine enhanced how strongly firing rates in the feedback epoch of the task correlated with negative PEs in the ACC (positive correlations: r = 0.05, p = 0.039, stat. power: 0.8881), with negative PEs during trials prior to reaching learning criterion (signed correlations: r = 0.11, p = 0.029, stat. power: 1; negative correlations: r = 0.13, p = 0.016, stat. power: 0.5947), with positive PEs in the striatum (negative correlations: r = 0.14, p = 0.049, stat. power: 0.4715), with positive PEs during trials prior to reaching learning criterion (r = 0.09, p = 0.016, stat. power: 1), with signed PEs across all trials in the striatum (negative correlations: r = 0.14, p = 0.024, stat. power: 0.7274), and with signed PEs restricted to the trials prior to reaching learning criterion (signed correlations: r = 0.09, p = 0.014, stat. power: 1; negative correlations: r = 0.18, p < 0.001, stat. power: 0.7415) (Fig. 3C). Neurons that encode PEs respond to the difference between the received and the expected outcome, suggesting that PE encoding utilizes outcome information but can also occur irrespective of outcomes. Consistent with this notion and with results from a prior study37, we found that amongst the neurons that significantly correlated with any of the PE variables, 28% did not significantly correlate with outcome, while the other 72% did.
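
To make the contrast between the rule-based and PE-based accounts concrete, the sketch below implements a Win-Stay/Lose-Switch choice rule and the simplest of the RL variants described above (model 1: color values updated by a prediction error). It is an illustration only: it omits the value decay and dimension weighting of the best-fitting model, and the learning rate, color labels and function names are hypothetical.

```python
from typing import Dict, Tuple


def win_stay_lose_switch(
    prev_choice: str, prev_rewarded: bool, colors: Tuple[str, str] = ("A", "B")
) -> str:
    """Repeat the previous color after a reward, otherwise switch colors."""
    if prev_rewarded:
        return prev_choice
    return colors[1] if prev_choice == colors[0] else colors[0]


def update_value(
    values: Dict[str, float], chosen: str, reward: float, learning_rate: float = 0.3
) -> Tuple[Dict[str, float], float]:
    """Update the expected value of the chosen color with a prediction error.

    PE = received outcome - expected value of the chosen color.
    A negative PE follows an unexpected error, a positive PE an
    unexpected reward. Returns (updated values, PE).
    """
    prediction_error = reward - values[chosen]
    updated = dict(values)
    updated[chosen] = values[chosen] + learning_rate * prediction_error
    return updated, prediction_error


# First trial after an uncued reversal: color A was valuable but is now unrewarded.
values = {"A": 0.8, "B": 0.2}
values_after, pe = update_value(values, chosen="A", reward=0.0)
next_choice_wsls = win_stay_lose_switch("A", prev_rewarded=False)
```

In this toy example the unexpected error yields a large negative PE, the kind of signal whose encoding in ACC was enhanced with Guanfacine, whereas the WS/LS rule switches colors without representing any graded expectation.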

The effects of Guanfacine on PE signaling and on outcome encoding were primarily a modulation of the strength of encoding rather than a change of the type of task variables that were encoded (Figs. S4, S5a). We tested this by rank ordering the proportion of neurons encoding task variables in the Guanfacine condition and comparing the rank ordered proportions of neurons encoding task variables with the vehicle condition. During the outcome epoch there were significant correlations of the type of variables encoded in dlPFC (p = 0.006; Tau = 0.477), the ACC (p = 0.014; Tau = 0.425), and the striatum (p = 0.014; Tau = 0.425) (Fig. S5a). Significant correlations indicate that the experimental conditions (Guanfacine versus vehicle) were similar with regard to the variables encoded, suggesting that differences between conditions are primarily variations in the strength of encoding the same type of variables. However, there were relative changes of encoding ranks in the striatum, where the proportion of neurons encoding outcomes irrespective of learning status increased with Guanfacine, whereas comparatively fewer neurons encoded outcomes selectively prior to reaching learning criterion (Fig. S5a).
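
The rank-order comparison above can be illustrated with a rank correlation between the proportions of neurons encoding each task variable in the two conditions. The sketch assumes that "Tau" refers to Kendall's rank correlation and implements the simple tau-a variant (tied pairs count as neither concordant nor discordant); the study may have used a tie-corrected variant, and the proportions shown are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) pairs / all pairs."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical proportions of neurons encoding four task variables.
guanfacine_props = [0.30, 0.25, 0.20, 0.10]
vehicle_props = [0.28, 0.26, 0.12, 0.15]
tau = kendall_tau_a(guanfacine_props, vehicle_props)
```

A tau near 1 means the same variables are the most and least prevalent in both conditions, which is the pattern interpreted above as a change in encoding strength rather than in which variables are encoded.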

Guanfacine enhances outcome encoding particularly for putative interneurons

Previous studies have shown that reversal learning recruits proportionally more narrow spiking (NS) neurons in ACC, dlPFC and striatum32,33,37. We successfully separated NS from BS neurons (Fig. 2A, B), which allowed testing whether Guanfacine’s effects on neural coding varied by functional cell type. We found that NS neurons (Fig. 2C), but not BS neurons (Fig. 2D), encoded outcomes more strongly in the Guanfacine condition during the feedback epoch in each of the three recorded brain areas. With Guanfacine, NS neurons encoded outcomes more strongly in dlPFC (signed correlations: p = 0.016, stat. power: 1; positive correlations: p = 0.004, stat. power: 0.6751), in ACC (negative correlations: p = 0.027, stat. power: 0.4855), and in the striatum (signed correlations: p = 0.001, stat. power: 1; positive correlations: p < 0.001, stat. power: 0.7187; negative correlations: p = 0.006, stat. power: 0.6672) (Fig. 2C). Outcome encoding was stronger overall, and was also evident when the analysis was restricted to trials after learning criterion was reached, with stronger outcome encoding after learning in dlPFC (positive correlations: p = 0.002, stat. power: 0.7265), ACC (positive correlations: p = 0.030, stat. power: 0.4147), and striatum (signed correlations: p = 0.010, stat. power: 1; positive correlations: p = 0.014, stat. power: 0.6395). In contrast to NS neurons, coding of outcomes by BS neurons was unchanged in ACC and striatum. In the dlPFC, BS neurons showed reduced encoding of trial outcomes (p = 0.016, stat. power: 0.5673) (Fig. 2D). The effect of Guanfacine on NS neurons in the feedback epoch was specific to the encoding of outcomes and was not apparent for other stimulus features or PEs (Fig. S6a–d). The differential effect of neuron type on the firing correlations with outcome was not accounted for by differences in firing rates of NS and BS neurons. In the striatum, outcome encoding was more likely for NS neurons across the whole range of firing rates, while in the dlPFC and ACC, firing rate did impact how likely neurons were to be significantly correlated with outcome, but this was evident for both NS and BS neurons (Supplementary Results, Fig. S7).

Guanfacine enhances the strength and not the prevalence of coding during the feedback epoch

In addition to modulating PEs and outcome encoding, Guanfacine also increased the encoding of the chosen stimulus color in the dlPFC (p = 0.003, stat. power: 0.7619), the chosen location in the striatum (p = 0.019, stat. power: 0.8618) and the target location in the ACC (p = 0.024, stat. power: 0.9207) in the feedback epoch of the task (Fig. 4). These effects were evident in example neurons, e.g., for the encoding of the target color and the chosen color (Fig. 4A), and statistically reliable at the population level (Fig. 4B). Guanfacine also reduced encoding of the motion direction of the chosen stimulus in dlPFC (p = 0.032, stat. power: 1). These changes in Guanfacine compared to vehicle sessions primarily affected the strength of encoding without apparent changes in the relative ranking of which task or model variables were encoded in dlPFC, ACC, and striatum (see above, Fig. S5a).

Fig. 4: Guanfacine affects the encoding of the target stimulus, and the encoding of color, motion direction and location of the chosen stimulus in the feedback epoch.

A Example neurons showing the firing rate differences for choosing color A vs. B (leftmost three columns) and the (rewarded) target versus (unrewarded) distractor color (three rightmost columns). Top panels show firing rates averaged across all trials, while bottom panel colormaps show firing rates across trials (y-axis) and time to reward (x-axis). B Differences in correlations (Guanfacine minus vehicle) of firing rate and six variables about the chosen stimulus (color, location, motion direction) and target features (color) (y-axis) for neurons recorded in the dlPFC (six leftmost columns), the ACC (middle columns) and the striatum (rightmost columns). Thick outlined squares denote significant (p < 0.05) differences; two-sided permutation test. Numbers within cells denote the number of neurons in the Guanfacine condition. Source data are provided as a Source Data file.

Guanfacine enhances neural encoding of task-relevant color in ACC during the attention epoch

A prior study suggested that Guanfacine can enhance the neural representations of attended, short-term memorized stimulus locations in the dlPFC during a delay period prior to making a choice28. In our feature-based reversal task, prior to making a choice, subjects covertly shifted attention to one of two stimuli and sustained covert attention to the color and location of that stimulus until the go-cue (a transient stimulus dimming) occurred. During this sustained attention epoch we found that Guanfacine significantly enhanced neuronal encoding of the target color in the ACC specifically for NS neurons (overall: p = 0.006, stat. power: 1; for positive correlations: p < 0.001, stat. power: 0.9999; for negative correlations: p = 0.021, stat. power: 0.5671; Fig. S8a), but not for BS neurons (n.s.; Fig. S8b). In addition to increased strength of encoding the target color in ACC, we also found in the attention epoch that striatum neurons were more likely to fire more strongly when the previous trial had been correct (Fig. S9b) and when, during learning, the correct stimulus of the current trial had a higher choice probability (Fig. S9c), but these changes did not translate into higher encoding strength across the population. In dlPFC, the choice probability and value of the correct stimulus (Fig. S9c) and the target color and location (Fig. S9a) were the four variables that were most likely encoded in both Guanfacine and vehicle conditions, but Guanfacine altered how likely single neurons were to encode, during learning, the value of the incorrect stimulus (more likely) and the value of the correct stimulus (less likely) (Fig. S9c). These effects did not translate into population-wide changes in the encoding strength, but indicate that Guanfacine altered how task variables are encoded during the attention epoch. This conclusion is supported by the lack of a correlation of the rank-ordered, best-encoded task variables between Guanfacine and vehicle conditions in dlPFC (p = 0.970 n.s.; Tau = −0.007), the ACC (p = 0.112 n.s.; Tau = −0.281), and the striatum (p = 0.175 n.s.; Tau = 0.242) (Fig. S5b).

Discussion

Here, we reported the behavioral and neural activity signatures of noradrenergic α2A receptor-specific enhancement of flexible learning. We developed a feature-based attentional set-shifting task that required reversing the target color of an attention set through trial-and-error learning in the presence of interference from other visual stimulus features (location and direction of motion) that were randomly linked to reward. A previous behavioral study suggested that Guanfacine facilitates reversing attention sets through two mechanisms: enhancing the weight with which prediction errors are used to update expected values, and reducing the influence of non-attended features47. Here, we first confirmed in two subjects the pro-cognitive behavioral effect of Guanfacine during learning and clarified that an increased speed of reversing attention sets is linked to enhanced post-error adjustment of performance (Fig. 1D, E). Secondly, we found neuronal correlates of faster set shifting in the ACC, where Guanfacine enhanced negative PE signaling, and in the striatum, where Guanfacine enhanced positive PE signaling (Fig. 3C). Enhanced encoding of PEs was accompanied by increased firing rate differences between erroneous and correct outcomes in ACC, dlPFC, and striatum (Figs. 2, 3). These findings document a neuronal modulation of outcome and prediction error signaling that is associated with faster set shifting by facilitating the updating of expected values after a reversal. Consistent with this scenario, we found that Guanfacine increased neuronal encoding of the target feature color in both dlPFC and ACC (Fig. 4), specifically during the attention epoch of the task when attention is covertly focused onto the target stimulus.

Taken together, these findings suggest that noradrenergic signaling in all three recorded brain areas, ACC, dlPFC, and striatum, supports cognitive flexibility by enhancing prediction error signaling and improving the representation of the target feature of the currently active attention set. This conclusion is consistent with recent causal manipulation studies in rodents14,16, and calls for a refinement of frameworks that propose a more general involvement of noradrenergic signaling to either enhance working memory for task-relevant (spatial) stimulus representations28, or track the uncertainty of an environment (including the unexpectedness of outcomes) and facilitate exploratory behaviors by reducing uncertainty59,60.

Guanfacine enhances prediction error signaling in ACC and striatum

Our findings of enhanced PE encoding and faster reversal learning support conclusions from studies in mice documenting that noradrenergic neurons in the locus coeruleus causally support flexible behavioral switching of stimulus-reward associations14,16. In these studies, activity of locus coeruleus neurons was either necessary for faster learning14 or was predictive of faster behavioral switches16. The results reported here suggest that these noradrenergic signals in the LC are translated into stronger prediction error encoding along the ACC-striatum pathway. According to this suggestion, noradrenergic inputs affect those ACC and striatum neurons that encode prediction errors and can be boosted either by activating the locus coeruleus (in the rodent studies), or by increasing α2A noradrenergic activation pharmacologically with Guanfacine to facilitate synaptic activity of neurons encoding PEs in the ACC and striatum. Our results predict that this modulation is specific to the subset of neurons in ACC and striatum that encode outcomes and PEs, because Guanfacine neither affected the overall firing rates of neurons (Fig. S3a, b), nor changed the prevalence of neurons encoding other task variables during the processing of outcomes (Figs. S4, S5a). According to this interpretation, Guanfacine modulates the intrinsic neuronal signaling of each of these brain areas without recruiting additional neuronal populations.

One caveat to this scenario is that our study does not distinguish whether increases of prefrontal α2A receptor activation with Guanfacine relate to a potential reduction of tonic LC activity expected with Guanfacine (Supplementary Discussion). There are lines of indirect evidence, however, that suggest that the systemic administration of Guanfacine in our study drove the observed behavioral enhancement and neuronal results primarily via cortical α2A receptors rather than via reduced LC tone. First, we observed pro-cognitive effects in the absence of overall firing rate changes (Fig. S3a), and without altering the prevalence of neurons encoding other task variables during the processing of outcomes (Figs. S4, S5). Second, we observed on average reduced pair-wise spike count correlations with Guanfacine, which is opposite to what would be expected from reduced LC activity, but consistent with increased phasic noradrenergic activation61. Third, another α2 agonist, Clonidine, which has lower affinity towards the α2A receptor but 10× the potency of Guanfacine in suppressing LC activity38,40 has weaker pro-cognitive effects when administered systemically62,63,64. Furthermore, the administration of α2A receptor antagonists has been shown to attenuate Guanfacine’s (and Clonidine’s) pro-cognitive effects50 while stimulating LC in a rodent study has been reported to enhance prediction error signaling in the visual and motor cortices of rodents65. In summary, while we cannot rule out a potential role of reduced LC activity from systemic Guanfacine administration, our results predict that this modulation is specific to the subset of neurons in ACC and striatum that encode outcomes and PEs.

Guanfacine enhances signal-to-noise ratio of neural signaling

Guanfacine enhanced the correlation of neural firing with task-relevant variables without increasing the number of neurons with firing correlations. This pattern of results reflects an enhanced neuronal SNR for coding task-relevant variables, which included better encoding of the color feature of the target stimulus. This conclusion corroborates and extends the seminal NHP study of Guanfacine in dlPFC by Arnsten, Wang and colleagues28. In their study locally injected Guanfacine enhanced persistent delay period neural firing in dlPFC for those spatial locations that were preferred by the recorded neurons. Enhanced firing for preferred over nonpreferred spatial locations indexes an enhanced SNR of short-term memory representations. Our study extends these insights from the dlPFC and performance of the delayed-match-to-sample task to the ACC and striatum and a task requiring feature-based attentional set-shifting performance that does not involve short-term memory as the stimulus color remained visible until a decision was made. The conclusions from both studies complement each other, suggesting that α2A receptor activation has a similar gain-enhancing effect in different brain systems. Gain modulation has been linked to noradrenergic activation, capable of potentiating responses of activated neurons that are already recruited by ongoing task demands1,2,66. Taken together the evidence from Wang et al., and our study predicts that α2A receptors may enhance information processing beyond the ACC, dlPFC and striatum also for other brain areas and for task variables preferentially encoded by an area. In the dlPFC the enhanced SNR firing could be traced back to the activation of post-synaptic α2A receptors on dendritic spines of basal dendrites of pyramidal cells in the dlPFC, which increased persistent firing during a delay through a reduced intracellular cAMP signaling67,68. 
Whether such a postsynaptic α2AR mechanism is specific to the dlPFC and working memory processes, whether it extends to the ACC and flexible learning, or whether it also applies to noradrenergic modulation in posterior parietal or visual cortices is an important question for future research.

Guanfacine’s effect on narrow spiking outcome signaling

Our electrophysiological recordings allowed distinguishing narrow spiking (NS) from broad spiking (BS) neurons (Fig. 2A). The narrow action potential of NS neurons characterizes neurons expressing K+ channels (KV3 family), which are highly expressed on PV+ basket cells that have a fast-spiking phenotype69,70,71,72,73,74,75, but also occur in non-fast spiking neuron types including in an estimated ~87% of all SST+ interneurons and in ~20% of VIP+ interneurons76,77,78, as well as in a subgroup of pyramidal cells that has been studied in primate motor cortex79. In the prefrontal and ACC, NS neurons will thus contain diverse interneuron types (and a minority of pyramidal cells) which have a net suppressive influence on the multiunit spiking activity of the local circuit, indicating their inhibitory role33. We therefore denote NS neurons as putative interneurons without implying a specific molecular interneuron type.

We found that Guanfacine enhanced outcome encoding specifically for these putative interneurons in each of the three recorded brain areas but had no effect on BS neurons (Fig. 2B, C). The clarity of this finding was unexpected and is intriguing, suggesting that the α2AR agonist may selectively modulate outcome signaling of putative interneurons. This interpretation is consistent with studies documenting that adrenoceptor expression and modulation are stronger for interneurons than for pyramidal cells, with α2 and β adrenoceptors enhancing their inhibitory actions while α1 adrenoceptors decrease their inhibitory actions in prefrontal cortex51,52,53,54,55, as well as in sensory and sensorimotor cortices80,81,82,83,84. Thus, Guanfacine may enhance the inhibitory tone in neural circuits. Our findings suggest that this effect was not translated into changes in overall firing, because NS neurons had unaltered firing with Guanfacine, and BS cells showed only marginal and non-systematic reductions in firing. An alternative scenario is that Guanfacine enhanced the neuronal gain of inhibitory interneurons66, which could explain stronger outcome encoding of NS neurons not only during set shifting but also after the reversal of the attention set was completed. However, we cannot be certain whether this mechanism could also underlie enhanced learning, as we did not have enough NS neurons to analyze correlations of NS neurons only in trials during learning. Future studies will need to test specifically whether the α2A receptor is critical for the prominent role of putative NS interneurons to predict learning success and encode prediction errors in ACC, dlPFC and striatum32,33,37.

Alpha 2A receptor specific improvement of flexible set shifting

Our study documents that Guanfacine enhances flexible reversal learning of attention sets, which adds clarity to the potential of Guanfacine as a highly selective α2AR acting drug to enhance attentional functions1,2,46. Our results indicate that the attentional benefit of Guanfacine includes a better recognition of erroneously attended stimuli as indicated by enhanced post-error improvement (Fig. 1E, F). The post-error adjustment effect was limited to the exploratory learning period immediately after a reversal. During the plateau performance period, Guanfacine modulated neither post-error adjustment nor overall accuracy levels. Thus, Guanfacine specifically improved the rate of learning to reverse the attention sets, consistent with it creating a new attentional state31. This conclusion also resonates with prior modeling studies of Guanfacine47 and with human psychopharmacological studies that have associated the action of NE/dopamine reuptake inhibitors with the modulation of learning rates based on environmental uncertainty85,86,87.

We found the neural correlates of enhanced set shifting in those neural circuits in the ACC and striatum of NHPs that also are known to support guided exploratory behavior and information seeking19,88,89. Activity in the ACC, in particular, correlates with successfully exploring alternative options and gathering information about the value of options19. In the feature-based set-shifting task used here, guided exploration during the reversal period will help to overcome previous response biases when updating attentional sets31.

Previous studies with α2AR agonists in NHPs have overwhelmingly described working memory improvements (reviewed in ref. 47), while rodent studies have included tasks testing reversal learning and attention set shifting and have produced complex results (e.g., 90). While our study documents that α2A stimulation is sufficient to enhance learning flexibility, its means to modulate circuits might also involve interactions with other neuromodulatory systems. For example, the noradrenergic system interacts with dopamine55,68,91, and with 5-HT and acetylcholine92,93,94. It is, therefore, possible that enhanced learning flexibility in our study is not only triggered by activating α2A noradrenergic receptors but also mediated by dopamine95,96,97 (Supplementary Discussion). A similar caveat should be mentioned about the primary site of action in our study. While noradrenergic effects on prefrontal cortex are well documented (see above), the possible functional roles of noradrenergic receptors in the striatum are largely unknown8. One reason for the scarcity of functional insights about striatal noradrenaline is that in NHPs the anterior striatum receives only sparse noradrenergic projections from the locus coeruleus98, and has severalfold lower α2A receptor densities compared to the lateral PFC and ACC (autoradiography-estimated receptor densities range from ~300 to 700 fmol/mg protein in the ACC/dlPFC compared to ~50–100 fmol/mg protein in the caudate head99,100). However, in rodent striatum noradrenergic projections account for increased noradrenergic release in response to locus coeruleus stimulation101,102 and following mild stressors103. Moreover, striatal noradrenergic release accompanies enhanced cortico-striatal functional connectivity101. It is therefore possible that in our study, Guanfacine might have directly modulated the responsiveness of striatal neurons to outcomes.
However, our study does not rule out that the firing changes in the striatum are inherited from already modulated afferent inputs from the ACC and dlPFC, which have higher α2A receptor densities and receive richer noradrenergic projections in NHP98,99,100,104.

Possible implications beyond reversing attention sets

Previous studies suggest that noradrenergic tone modulates how optimally subjects exploit foraging patches. Noradrenergic tone can predict whether mice show exploratory versus exploitative behavior18, and overactivation of the locus coeruleus increased exploratory behaviors sub-optimally, causing animals to leave reward-depleting patches earlier than is optimal (ref. 20). The mechanisms for suboptimal exploratory foraging behavior are largely unknown, but our study provides some clues that noradrenergic modulation of foraging will involve the speed with which expected values are updated. With a good tonic noradrenergic activity state, prediction errors are efficiently signaled, and object values effectively updated. We expect that at non-optimal tonic concentrations of noradrenaline these value updating processes are impaired, leading either to slower recognition of when object values change in an environment or to faster updating of values and thus faster shifting of behavior. We therefore predict that the faster leaving of depleting patches with enhanced locus coeruleus tonic activity reflects a more rapid updating of value expectations leading to an earlier disengagement from the current patch20.

In summary, our findings illustrate that Guanfacine can improve adaptive, flexible updating of attention sets by increasing the efficiency of prediction error signaling in anterior cingulate and lateral prefrontal cortex, and in the striatum. These insights suggest that α2A receptor activation across the fronto-striatal network is a key player in mediating noradrenergic regulation of behavioral flexibility.

Methods

Experimental animals

All animal care and experimental protocols were approved by the York University Council on Animal Care and were in accordance with the Canadian Council on Animal Care guidelines. The data were collected at the institute in which the ethical approval was granted. Data were recorded from two male rhesus macaques (Macaca mulatta) aged 7 and 9 years. During periods of experimental recordings, the subjects' access to fluid was restricted to a minimum of 20 ml/kg per day, which they could earn through performance of the reversal learning task and which was supplemented otherwise. Animals were provided fresh fruits and vegetables daily and had unrestricted access to chow. Fluid intake, body weight, and mental and physical hygiene were monitored daily. Animals were pair-housed in enclosures according to Canadian Council on Animal Care guidelines. Before the electrophysiological recording experiment started, the animals were implanted with a custom-made titanium headpost and a custom PEEK recording chamber. All surgical procedures were performed under general anesthesia using Isoflurane (1–4%) and Ketamine HCl (10–15 mg/kg) together with Acepromazine (0.5 mg/kg). Underneath each chamber a craniotomy of 15–20 mm diameter was used to allow access to lateral prefrontal cortex, the ACC and the anterior striatum. For each experimental recording session, the animals were seated in a custom-made primate chair and placed in a dark, sound-attenuated booth such that their eyes were 65 cm away from a 21" LCD monitor with an 85 Hz refresh rate.

Experimental control, including stimulus presentation, eye position monitoring and reward delivery, was done through MonkeyLogic (open-source software https://www.brown.edu/Research/monkeylogic/). Eye positions were calibrated and tracked monocularly using a video-based eye tracking system (Eyelink 1000, Osgoode, Ontario, Canada; 500 Hz sampling). Eye calibration occurred daily using a 9-point fixation pattern and was monitored throughout each session. Liquid reward, controlled by an air-pressure mechanical valve system (Neuronitek, London, Ontario, Canada), was delivered via a sipper tube.

Behavioral task paradigm

The animals performed a feature-based reversal learning task as previously described37,47. Subjects learned through trial-and-error which one of two grating stimuli was deterministically rewarded in any given block lasting at least 30 trials or until a 90% performance criterion over the preceding 12 trials was reached (Fig. 1A). Each grating stimulus was defined in any given trial by a combination of three features: location (left vs right), color (monkey Ha: red vs green; monkey Ke: cyan vs yellow), and motion direction of the stimulus grating (up vs down). The two stimuli always contained opposite (i.e., mutually exclusive) values for each of these three dimensions. Only color was indicative of reward value and thus is referred to as the attention cue in the main text, while location and motion were randomly associated with reward.

Trials proceeded through a motion and color cue period before a transient dimming of the target stimulus instructed the subjects to make a saccadic choice, while a dimming of distractor stimuli had to be ignored (Fig. 1B). In particular, a trial started with the appearance of a gray central fixation point, which the subjects had to fixate. After 0.5–0.9 s, two black/white gratings appeared to the left and right of the central fixation point. Following another 0.4 s, the two stimulus gratings gained a color or had their gratings drift in opposite directions (up or down), followed after 0.5–0.9 s by the onset of the second stimulus feature such that both stimuli eventually had both color and motion. After 0.4–1 s, the two stimuli dimmed simultaneously for 0.3 s or one stimulus dimmed first followed by the other separated by 0.55 s. The dimming represented a go-cue to make a saccade from the central fixation point to one of two response targets displayed above and below the central fixation point. Breaking fixation from the central fixation point before the dimming event terminated the trial. In order to acquire reward, subjects had to make either an upward or downward saccade to one of two targets in the direction matching the upward or downward motion of the stimulus grating with the rewarded color (a congruent saccade) and fixate on the target for 50 ms, no later than 0.55 s after the dimming of that stimulus. Errors were categorized either as target errors when a congruent saccade was made in response to the go-cue (dimming) of the object with the un-rewarded color, or as timing errors when an incongruent saccade was made in response to either object’s go-cue. When animals made an incorrect (incongruent) saccade to the simultaneous dimming of both objects, a target error was noted, indicating that the wrong object’s motion direction was attended. 
In addition, we identified a false-start error when a saccade was made earlier than 50 ms following a dimming event, indicating an anticipatory response; and we categorized fixation errors when animals failed to maintain fixation on a target for at least 50 ms.

The feature-based reversal learning task was designed as a multidimensional learning task similar to previous studies30,105,106 while additionally dissociating learning of feature-based attentional allocation from action and location information. Correct task performance required separable cognitive processes. Monkeys had to (1) maintain central fixation while covertly attending to the peripheral stimulus that matched the rewarded color, (2) wait for the attended stimulus to transiently change luminance, and then (3) use the motion direction of the stimulus to program a saccade to the upward or downward response target that matched the motion direction of the attended stimulus. The luminance change time, location and motion direction of the stimuli varied randomly from trial to trial. The three processes differ from previously used object reversal learning tasks (e.g., 57,58): subjects could not directly reach to the reward-relevant object, but had to covertly attend to it to interpret its motion direction; subjects had to monitor for a Go-stimulus (luminance change) that happened at unexpected times and could occur simultaneously in the distracting and the target object, requiring interference control; and subjects had to map the motion direction of the attended stimulus to an action covertly, while avoiding interference of the other motion direction in the non-attended stimulus. These task requirements made the credit assignment of a received outcome difficult because an erroneous choice could be due to multiple types of choice-relevant information including attending a non-rewarded color, covertly attending the wrong stimulus location, responding to the luminance change of the non-rewarded object, interpreting the motion direction of the attended stimulus wrongly, or programming a wrong saccade direction. 
We have shown previously that in monkeys performing the task, neurons in the ACC, lateral prefrontal cortex and striatum encode how unexpected a reward outcome is with regard to the color, location and motion direction of the stimulus37. While the result of that study shows that prediction error information is encoded for multiple decision variables, it also documented that the prediction error coding for the reward-relevant color information was strongest and had the highest prevalence37.

Statistical measure of learning

To identify learned blocks and the individual trial at which statistically reliable learning could be said to have occurred in each block, we used an ideal observer expectation maximization (EM) algorithm107,108. Briefly, this framework utilized a state equation to represent the internal learning process as a hidden Markov (or latent) process, in which the latent variable was updated with each trial. This provided an estimate of the probability of a correct choice taking into account all trials within the block (Fig. 3A, bottom). The learning trial was then defined as the trial during which the lower 95% confidence bound exceeded chance (p = 0.5) and did not drop back down below chance for the rest of the block.
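The learning-trial criterion can be illustrated with a minimal sketch. It assumes that the lower 95% confidence bound of the estimated probability of a correct choice has already been computed for each trial of a block (the state-space estimation itself is not shown). The function name `learning_trial` and the example values are hypothetical and not taken from the original analysis code.

```python
def learning_trial(lower_bound, chance=0.5):
    """Return the 1-based trial at which the lower confidence bound rises above
    chance and stays above it for the rest of the block, or None if no such
    trial exists. A value equal to chance is treated as not above chance."""
    trial = None
    for idx, value in enumerate(lower_bound, start=1):
        if value > chance:
            if trial is None:
                trial = idx  # candidate learning trial
        else:
            trial = None  # bound fell back to or below chance: reset
    return trial


# Hypothetical lower confidence bounds for a six-trial block.
example = [0.30, 0.55, 0.45, 0.60, 0.70, 0.80]
print(learning_trial(example))
```

In this example the bound first exceeds chance on trial 2 but drops again on trial 3, so the learning trial is the later crossing that is sustained until the end of the block.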

Drug dosing

Guanfacine was purchased (Guanfacine hydrochloride; Sigma-Aldrich, St. Louis, MO) and prepared with sterile water vehicle (0.1 mL volume) immediately before blinded IM injections. Subjects received Guanfacine (0.075 mg/kg) or sterile water vehicle injections close to 2 h before the start of the first trial (mean: 135 ± SE 2 min). Each week contained at most a single Guanfacine administration day which was always either on Thursday or Friday while vehicle data was collected on either Tuesdays or Wednesdays; animals still trained every day. The experimenter was blinded to the injection condition and schedule until the study completion. In total, we recorded 17 and 12 Guanfacine days for monkey Ha and monkey Ke respectively. The 0.075 mg/kg dose of Guanfacine was selected because it was previously shown to be the dose that robustly enhanced performance in this task47.

Electrophysiological recording procedures and unit isolation

Reversal learning performance was stable for both monkeys across electrophysiological recordings as seen through the session-wise median learning speed (Fig. S1e, f). There was no significant correlation (slope) between the session-wise median learning speed and session number from the first to the last Guanfacine administration session for either monkey (Fig. S1e, f, red line). While neither monkey had previously been trained on any other behavioral task, monkey Ha was a veteran of this task while monkey Ke had just completed their training prior to the start of recordings.

Single contact tungsten electrodes (FHC, Bowdoinham, ME; 1.2-2.2 MOhm impedance electrodes) were used for extracellular recordings. They were loaded into up to 4 software-controlled precision micro-drives (NAN Instruments Ltd, Israel) and lowered into the brain through a 20 × 25 mm rectangular recording chamber guided by MR images. Single units were recorded in the dlPFC (area 46), the ACC (area 24), and the head of the caudate nucleus109 (Fig. 1C). Recordings in dlPFC, ACC, and striatum were from the same locations in the Guanfacine and vehicle session with no apparent anatomical bias of sampling across conditions, which was ensured also by the double-blinded, random assignment of the weekly Guanfacine injection day (see above). Data amplification, filtering and acquisition were done with a multi-channel acquisition processor (Neuralynx). Spiking activity was obtained following a 300–8000 Hz passband filter and further amplification and digitization at 40 kHz sampling rate. After the initial acquisition of highly isolated waveforms in the regions of interest, electrodes were left to stabilize for 30–60 min before the start of the task. Sorting and isolation of single unit activity was performed manually offline with the Plexon Offline Sorter, based on principal component analysis of the spike waveforms. In order to maximize statistical power in neural analyses, an extended dataset previously recorded from monkey Ha without any injections was also considered and pooled with the vehicle data for the neuronal analysis (data were not pooled for the behavioral comparison of vehicle and Guanfacine) and referred to as ‘non-drug data’. Although behavioral performance in these sessions was superior to the vehicle sessions, virtually all relevant behavioral trends and results remained consistent (data not shown).

Putative cell type classification

Highly isolated single units were classified based on the properties of their action potential waveforms using previously published methods37,110. All waveforms from highly isolated single units were normalized, aligned to their threshold crossing and averaged. Each averaged waveform was interpolated by being fit to a cubic spline and up-sampled from 40 kHz to 400 kHz. For cortical neurons (from dlPFC and ACC), the peak-to-trough duration and time for 15% repolarization (waveform peak amplitude decayed by 15%) were calculated. For striatal neurons, the peak width (at 50% of peak amplitude) and initial-slope-of-valley decay (percentage fall-off of the action potential 0.26 ms after peak amplitude) were calculated. With these variables, the first principal component was computed and used to classify each neuron as broad or narrow spiking (Fig. 2A, B). For cortical neurons, this broadly maps onto putative pyramidal neurons and interneurons respectively, while for striatal neurons, this broadly maps onto putative medium spiny neurons and interneurons respectively.

Analysis of the temporal correlation of firing rate and outcome

To compare the time course of the firing rate correlations with outcomes in the Guanfacine and vehicle conditions (Fig. 3A) we used a permutation approach. For 200 ms wide windows, stepped over the data every 50 ms relative to the feedback onset, we calculated the firing rate correlation with the trial outcome (correct versus error) for the Guanfacine and vehicle condition, as well as for n = 5000 randomizations for which the condition label (Guanfacine and vehicle) was randomly shuffled. We calculated the p-value as the proportion of randomly shuffled differences that were at least as large as the observed, true difference.
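The sliding-window permutation approach can be sketched as follows. This is a minimal illustration, not the original analysis code: it assumes that one correlation value per neuron is available for each condition within a window, and it takes the p-value to be the fraction of label shuffles whose mean difference is at least as large as the observed one. The function names, the random seed and the example values are hypothetical.

```python
import random


def sliding_windows(start_ms, stop_ms, width_ms=200, step_ms=50):
    """Return (start, end) pairs of all windows that fit inside [start_ms, stop_ms]."""
    return [(s, s + width_ms)
            for s in range(start_ms, stop_ms - width_ms + 1, step_ms)]


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_perm=5000, seed=0):
    """One-sided p-value for mean(group_a) - mean(group_b), obtained by
    shuffling the condition labels n_perm times."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        shuffled_diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if shuffled_diff >= observed:
            count += 1
    # Add-one correction so that the estimate is never exactly zero.
    return (count + 1) / (n_perm + 1)


# Hypothetical per-neuron outcome correlations in one 200 ms window.
drug = [0.90, 0.80, 0.85, 0.95, 0.90]
vehicle = [0.10, 0.20, 0.15, 0.05, 0.10]
print(sliding_windows(0, 400))
print(permutation_p_value(drug, vehicle))
```

The same test would be repeated for every window returned by `sliding_windows`, yielding a time course of p-values relative to feedback onset.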

Multi-linear regression

Spike trains were transformed into spike-density functions smoothed with a Gaussian kernel with a standard deviation of 50 ms. Only correct (rewarded) and incorrect choice trials were analyzed; incorrect trials were defined as unrewarded trials where either the unrewarded object was chosen or any choice was made during the dimming (go-cue) of the unrewarded object (either before or after the dimming of the rewarded object). The average trial-wise activity during the epoch of interest (0.05–1 s during the feedback epoch and 0.05–0.7 s during the attention cue onset epoch) of each neuron was regressed to 18 variables that were classified as either stimulus variables, outcome variables or latent model variables. The 6 binary stimulus variables were the color (color 1 vs color 2), motion (up vs down) and location (left vs right) of the chosen stimulus and the color, motion and location of the rewarded stimulus. The outcome variables were trial outcomes (binary: rewarded or unrewarded), trial outcomes during learning only (see Statistical measure of learning above), trial outcomes after learning (i.e., after the trials-to-criterion was reached), prior trial outcome for correct trials (binary: rewarded trial preceded by a rewarded trial or an error trial), prior trial outcome for error trials (binary: error trial preceded by a rewarded trial or an error trial) and error trial order during learning (non-binary: ranking errors in descending order until the statistically defined learning trial). The latent model variables were all non-binary but different depending on the epoch in question. During the feedback epoch, the latent model variables were signed PEs, positive PEs, negative PEs, and the same three variables for trials during learning only. 
During the attention cue onset epoch, the latent model variables were the choice probability of the chosen stimulus, value of the chosen rewarded stimulus, value of the chosen unrewarded stimulus and the same three variables but for trials during learning only.
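The spike-density transformation that precedes the regression can be sketched as follows. This is a minimal illustration that assumes spike times in milliseconds and a unit-area Gaussian kernel with a standard deviation of 50 ms; the function name `spike_density`, the 1 ms sampling step and the example spike time are assumptions made for this sketch.

```python
import math


def spike_density(spike_times_ms, t_start_ms, t_stop_ms, sigma_ms=50.0, dt_ms=1.0):
    """Return (times, rate_hz), where rate_hz is the sum of unit-area Gaussian
    kernels centred on each spike, expressed in spikes per second."""
    n_samples = int(round((t_stop_ms - t_start_ms) / dt_ms)) + 1
    times = [t_start_ms + i * dt_ms for i in range(n_samples)]
    norm = 1.0 / (sigma_ms * math.sqrt(2.0 * math.pi))
    rate_hz = []
    for t in times:
        total = sum(math.exp(-0.5 * ((t - s) / sigma_ms) ** 2)
                    for s in spike_times_ms)
        rate_hz.append(1000.0 * norm * total)  # spikes/ms converted to spikes/s
    return times, rate_hz


# A single hypothetical spike at 500 ms within a 1 s epoch.
times, rate = spike_density([500.0], 0.0, 1000.0)
print(max(rate))
```

Averaging `rate` over the samples that fall into an epoch of interest gives the trial-wise activity value that would enter the regression as the dependent variable.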

A single neuron may have multiple significant regressions and for each significant regression (neurons had to be isolated for at least 30 trials), a correlation coefficient was computed. These coefficients were then averaged for Guanfacine and non-drug neurons and their difference plotted separately for signed, positive and negative correlation coefficients for each brain region and also split by putative cell types.

Statistical analysis

Statistical comparison between Guanfacine and non-drug neurons was done through bootstrapping with shuffled condition labels (5000 permutations). Only comparisons with at least 3 neurons in both Guanfacine and non-drug categories were statistically tested. Post-hoc statistical power was analyzed and reported in the main text for significant differences in the correlation strengths between the Guanfacine and non-drug conditions. Power analysis was based on t-test power tables using the mean of the Fisher z-transformed correlation coefficients and standard deviation of the non-drug condition using the ‘sampsizepwr’ function in MATLAB (test type used: ‘t2’).

For each neuron, the strongest regression (significant regression with the highest R², i.e., the highest explained activity) was also identified. Then the variables that best explained activity in each brain region were ranked based on the proportion of neurons that had the highest R² value per variable111. This ranking was done separately for Guanfacine and non-drug days which were then compared using Kendall’s tau correlation. A statistically significant correlation indicates that the rank ordering of the encoded variables is similar between conditions.
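The rank comparison can be illustrated with a short sketch of Kendall's tau. The tau-b variant, which corrects for ties, is used here as an assumption, since the text does not state which variant was applied; the significance test is not shown. The function name and the example proportions are hypothetical.

```python
import math


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equally long sequences (handles ties)."""
    n = len(x)
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i in range(n):
        for j in range(i + 1, n):
            dx = x[i] - x[j]
            dy = y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: excluded from all counts
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    denom = math.sqrt((concordant + discordant + tied_x_only)
                      * (concordant + discordant + tied_y_only))
    return (concordant - discordant) / denom if denom else float("nan")


# Hypothetical proportions of neurons best explained by four variables.
non_drug = [0.40, 0.30, 0.20, 0.10]
guanfacine = [0.35, 0.20, 0.30, 0.15]
print(kendall_tau_b(non_drug, guanfacine))
```

A tau close to 1 indicates that the variables are ordered similarly in both conditions, whereas a tau close to -1 indicates a reversed ordering.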

Model variables

We estimated latent variables underlying learning performance including the outcome prediction error and the expected stimulus values using a hybrid Bayesian-reinforcement learning model that was used and validated in previous studies modeling learning the relevance of features using multidimensional stimuli30,37,47,106. This model accounted for the learning behavior of both monkeys in the current study with the lowest (i.e., best) negative log-likelihood and the lowest (i.e., best) Akaike Information Criterion when compared to alternative models that implement a color-based Win-Stay/Lose-Switch rule, a color-based reinforcement learning (RL) model, and an attention-augmented RL model using a selective forgetting of non-chosen stimulus features.

The hybrid Bayesian-reinforcement learning model is called the Feature-Dimension Weighted-Decay Reinforcement Learning model. It combines (1) the Bayesian weighting of reward probabilities for the different feature dimensions of the stimuli used in the task (color, motion direction, stimulus location), (2) a decay process for values of non-chosen features, which implements selective forgetting that has been interpreted as an attentional mechanism47,106, and (3) the updating of expected values of features using prediction errors. This model was introduced before to account for behavioral adjustments of choices among stimuli with multiple feature dimensions47. The model represents the stimuli used in the feature-based reversal learning task in terms of their stimulus dimensions (color, motion, location) and features (color A, color B, downward motion, upward motion, left, right). The likely target feature is estimated using Bayesian inference about which stimulus feature dimension f (color, motion or location) is the likely target dimension via \(p\left(f | \mathscr{D}_{1:t}\right)\) to obtain a dimension-weighted representation for each stimulus. For tracking target feature probability, the feature dimension is denoted as d (1: location, 2: direction of motion, 3: color). For each d, the feature \(f_{d}\) takes two values, 1 and 2. For instance, \(f_{3}=1\) indicates the first color. We then calculate the probability for the rewarded stimulus (the target) to have dimension d, \(p_{d}=p\left(d | \mathscr{D}_{1:t}\right)=\sum_{f_{d}=1,2}p\left(f_{d} | \mathscr{D}_{1:t}\right)\). This defines a feature dimension weight \(\phi_{d}=\frac{p_{d}^{\alpha}}{\sum_{d'}p_{d'}^{\alpha}}\), with exponent α and normalized to yield a sum across dimensions equal to one. The predicted reward value of a feature is then denoted by \(W_{f_{d}}\) and scaled by the dimensional weight \(\phi_{d}\).

The value of the specific stimulus i is given by the sum across all weighted feature values that are part of the stimulus

$${V}_{i}={\sum}_{d}{\phi }_{d}{W}_{{f}_{d}}$$
(1)
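The dimension weighting and Eq. (1) can be illustrated with a minimal sketch. It assumes that the posterior probabilities p_d for the three dimensions are already available (the Bayesian inference step is not shown). The function names and all numerical values are hypothetical.

```python
def dimension_weights(p_dim, alpha):
    """phi_d = p_d**alpha / sum over d' of p_d'**alpha; the weights sum to one."""
    powered = [p ** alpha for p in p_dim]
    total = sum(powered)
    return [v / total for v in powered]


def stimulus_value(feature_values, weights):
    """Eq. (1): V_i = sum over dimensions d of phi_d * W_{f_d}."""
    return sum(phi * w for phi, w in zip(weights, feature_values))


# Hypothetical posteriors for location, motion direction and color.
p_dim = [0.2, 0.2, 0.6]
phi = dimension_weights(p_dim, alpha=2.0)
# Hypothetical feature values W of one stimulus (location, motion, color).
print(phi, stimulus_value([0.5, 0.5, 1.0], phi))
```

With an exponent above one the weights are sharpened towards the most likely target dimension, whereas an exponent of zero yields equal weights for all dimensions.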

The choice of which stimulus is selected on a given trial is implemented with a softmax rule using a Boltzmann function with parameter β:

$$P\left({C}_{t+1}=i\right)=\frac{\exp (\beta {V}_{i,t})}{{\sum}_{j}\exp (\beta {V}_{j,t})}$$
(2)
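A minimal sketch of the softmax rule in Eq. (2) is given below. Subtracting the maximum before exponentiation is a standard numerical safeguard added here and does not change the resulting probabilities; the function name and the example values are hypothetical.

```python
import math


def softmax_choice_probs(values, beta):
    """Eq. (2): P(C = i) = exp(beta * V_i) / sum over j of exp(beta * V_j)."""
    scaled = [beta * v for v in values]
    shift = max(scaled)  # subtracting the maximum avoids overflow
    exps = [math.exp(s - shift) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Two stimuli with hypothetical values 1.0 and 0.0.
print(softmax_choice_probs([1.0, 0.0], beta=2.0))
```

Larger values of β make choices more deterministic in favor of the higher-valued stimulus, while β = 0 yields random choices.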

Following a choice the model updated the stimulus values of the chosen stimulus by a feature-specific outcome prediction error, PE = \(\left({R}_{t}-{W}_{{f}_{d},t}\right)\), scaled by learning rate \(\eta\) according to:

$${W}_{{f}_{d},t+1}={W}_{{f}_{d},t}+\eta \left({R}_{t}-{W}_{{f}_{d},t}\right)$$
(3)

Positive PEs ranged in value from 0 to 1, negative PEs ranged in value from −1 to 0. Feature values of the unchosen stimulus were scaled down (decayed) by \((1-\omega )\):

$${W}_{{f}_{d},t+1}=(1-\omega ){W}_{{f}_{d},t}$$
(4)
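Eqs. (3) and (4) can be combined into a single trial-wise update, sketched below. The dictionary representation of the feature values, the feature labels and the parameter values are assumptions made for illustration only.

```python
def update_feature_values(W, chosen, unchosen, reward, eta, omega):
    """Apply Eq. (3) to the features of the chosen stimulus and Eq. (4) to the
    features of the unchosen stimulus. Returns the new values and the PEs."""
    new_W = dict(W)
    prediction_errors = {}
    for f in chosen:
        pe = reward - W[f]           # feature-specific prediction error
        prediction_errors[f] = pe
        new_W[f] = W[f] + eta * pe   # Eq. (3)
    for f in unchosen:
        new_W[f] = (1.0 - omega) * W[f]  # Eq. (4)
    return new_W, prediction_errors


# Hypothetical starting values for the six features.
W = {"red": 0.5, "up": 0.5, "left": 0.5, "green": 0.5, "down": 0.5, "right": 0.5}
new_W, pes = update_feature_values(
    W, chosen=["red", "up", "left"], unchosen=["green", "down", "right"],
    reward=1.0, eta=0.2, omega=0.1)
print(new_W["red"], new_W["green"], pes["red"])
```

Because rewards are coded as 0 or 1 and values stay within this range, a rewarded trial yields a PE between 0 and 1 and an unrewarded trial a PE between −1 and 0, matching the ranges stated above.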

We compared the fit of the learning behavior of the Feature-Dimension Weighted-Decay Reinforcement Learning model with three simpler models. The first alternative model was the Color-Selective Win-Stay/Lose-Switch model that represents stimuli of the feature-based reversal learning task by their color, which was the dimension that was associated with reward. The choice of which stimulus is selected is based on the outcome of the previous trial. When the previous outcome was rewarded, the same color is chosen (Win-Stay) and when the previous outcome was non-rewarded the alternative color is chosen (Lose-Switch). The second alternative model was a Color-Selective Reinforcement Learning model that represents stimuli as the feature values of their color. We label the two colors as feature 1 and 2 and their corresponding values are denoted as Vi. After choosing a stimulus with one of the colors and receiving an outcome R (1/0 for rewarded/non rewarded) the value updating is done according to

$${V}_{i,t+1}={V}_{i,t}+\eta ({R}_{t}-{V}_{i,t}),$$
(5)

for the color i that belongs to the chosen stimulus. This equation ensures that when the received reward differs from the expected (predicted) reward, the value is updated toward the received reward. This implements the delta rule of classical prediction error learning, with η representing the learning rate. The choice \(C_t\) of a stimulus is made by a softmax rule as described in Eq. (2).
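
The Win-Stay/Lose-Switch rule described above is simple enough to state in a few lines. The sketch below assumes colors are coded 0 and 1 and that the rule is evaluated against an observed sequence of chosen colors and outcomes; the function name and coding are illustrative.

```python
def wsls_predictions(colors, rewards):
    """Predict the color chosen on trials 2..T from trials 1..T-1.

    colors: chosen color per trial, coded 0 or 1 (assumed coding).
    rewards: outcome per trial, 1 (rewarded) or 0 (non-rewarded).
    """
    predictions = []
    for color, reward in zip(colors[:-1], rewards[:-1]):
        # Win-Stay: repeat the color; Lose-Switch: pick the other color.
        predictions.append(color if reward else 1 - color)
    return predictions

pred = wsls_predictions(colors=[0, 0, 1, 1], rewards=[1, 0, 1, 0])
# pred == [0, 1, 1]
```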

As a third alternative model, we considered a Feature-Selective Decay Reinforcement Learning model. This RL model represents the values of features from all three feature dimensions of the stimuli: their location (left (L) versus right (R)), direction of motion (up (U) or down (D)) and color (1 or 2). We label the six features with the indices 1 to 6; the corresponding values are thus denoted as \(V_i\). A presented stimulus has a value for each of the three feature dimensions and thus possesses three feature value combinations (FVCs); the other stimulus has the remaining FVCs.

The model reduces the values of the FVCs of the stimulus that was not chosen. Feature values belonging to the chosen stimulus are updated according to Eq. (5). The feature values i of the non-chosen stimulus decay according to

$${V}_{i,t+1}=(1-\omega ){V}_{i,t}.$$
(6)

The decay parameter is denoted by ω. The choice of a stimulus is made by the softmax rule (Eq. (2)). After receiving an outcome R (1/0 for rewarded/non-rewarded), value updating is done with the delta rule (Eq. (5)) for all FVCs i that belong to the chosen stimulus.

We fit the models separately for each monkey using the sequence of choices across trials of the feature-based reversal learning task. We optimized the model fits by minimizing the negative log-likelihood over all trials and computed the Akaike Information Criterion (AIC) to account for the different numbers of free parameters when evaluating differences in model fit.
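
The fit criterion can be sketched as follows, assuming each model supplies the probability it assigned to the choice the animal actually made on every trial. The optimizer is not specified in the text and is therefore omitted; the function names and the clipping constant are illustrative assumptions.

```python
import numpy as np

def negative_log_likelihood(choice_probs):
    """Sum of -log p(observed choice) over all trials."""
    p = np.asarray(choice_probs, dtype=float)
    p = np.clip(p, 1e-12, 1.0)  # avoid log(0); the threshold is an assumption
    return float(-np.sum(np.log(p)))

def aic(nll, n_params):
    """AIC = 2k - 2 ln(L) = 2k + 2 * NLL, with k free parameters."""
    return 2.0 * n_params + 2.0 * nll

# Made-up probabilities a model assigned to the observed choices on four trials.
nll = negative_log_likelihood([0.5, 0.6, 0.7, 0.8])
score = aic(nll, n_params=2)
```

Lower AIC indicates a better trade-off between fit and the number of free parameters.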

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.