Introduction
Internalizing symptoms (anxiety, depression, somatic symptoms, and withdrawal) are among the most common mental health problems in childhood and adolescence (Achenbach, 1991). In children and adolescents (aged 7–17 years) in Germany, the prevalence of clinically relevant depressive and anxiety symptoms at the time of data collection for this study was 15% each (Reiß et al., 2023). After the onset of puberty, girls have an increased risk of developing internalizing symptoms compared to boys (Hayward & Sanborn, 2002). Studies show that internalizing symptoms affect almost all areas of life (Wille et al., 2008), have negative effects on later developmental outcomes (Musliner et al., 2016), and carry a high risk of chronicity (Leadbeater et al., 2012).
Self-regulation facets
In recent years, the development of internalizing symptoms has been increasingly associated with self-regulation (SR), which can be defined as the ability to control and modulate one’s cognitions, behaviors, emotions, and physiological responses to attain future benefits (Bailey & Jones, 2019; Nigg, 2017; Robson et al., 2020). As a multidimensional construct, SR encompasses various interrelated facets (review: Nigg, 2017). In this study, emotional reactivity was investigated as a basal temperament-related aspect of SR that encompasses emotional responses to events in terms of response threshold, latency, amplitude, rise time to peak intensity, and recovery time (Rothbart & Derryberry, 1981). Further SR facets examined in this study include core executive functions (EF) (Diamond, 2013; Miyake et al., 2000), which can be considered basal cognitive facets of SR: working memory updating (the ability to mentally retain and manipulate information), cognitive flexibility/set-shifting (the ability to switch between different tasks, perspectives, or rules), and inhibition (the ability to suppress primary behavioral impulses in favor of a less dominant response). Inhibitory control (the ability to curb impulsive behaviors when instructed), a closely associated but more temperament-related aspect of SR, was also assessed. Additionally, a more complex and later emerging EF, planning behavior (skills like goal setting, strategy development, and action organization), was examined. The aforementioned core EFs are often referred to as cognitive-driven “cool” EFs in contrast to emotion-driven “hot” EFs, a distinction supported by neural and behavioral evidence (Zelazo, 2020). Our study also examined one hot EF, affective decision-making, which involves integrating emotions and cognitive processes in decision-making.
How can relations between internalizing symptoms and SR facets be explained?
Even though numerous studies show that internalizing symptoms and SR facets are associated, the direction of these relations and their explanation remain controversial. Possible explanations can be subdivided using a taxonomy by Tackett (2006) that distinguishes between vulnerability (or predisposition) models, scar (or complication) models, spectrum models, and pathoplasty (or exacerbation) models. Vulnerability models propose that lower levels of SR facets increase the risk for subsequent internalizing symptoms, while scar models suggest the opposite, namely that preexisting internalizing symptoms impair SR facets. In contrast, spectrum models view SR facets and internalizing symptoms as part of a shared continuum, attributing them to common underlying causes. Pathoplasty models posit that SR facets influence the course, characteristics, and severity of internalizing symptoms. In this study, we decided to estimate a random intercept cross-lagged panel model (RI-CLPM) (Hamaker et al., 2015), which allows the first three models to be examined simultaneously. The investigation of pathoplasty models, in contrast, requires person-centered analytical methods, for example, latent profile or trajectory analysis, as we employed in a previous study (Klinge et al., 2023). Accordingly, the following sections outline the research on the first three models.
Vulnerability models: lower SR facets as risk factors for internalizing symptoms
Vulnerability models assume that lower SR facets increase the risk of later internalizing symptoms, which is supported by multiple longitudinal studies during childhood and adolescence (e.g., Kertz et al., 2015; Nelson et al., 2018). Most evidence exists regarding lower cool EF (for a meta-analysis, see Yang et al., 2022) and higher emotional reactivity (e.g., Carthy et al., 2010) or, as a related construct, negative emotionality (for a meta-analysis, see Kostyrka-Allchorne et al., 2020) as preceding higher internalizing symptoms. Few to no associations have been reported between lower hot EF and later internalizing symptoms (Yang et al., 2022), possibly due to a smaller number of studies. However, it should be critically noted that most studies did not control for previous manifestations of internalizing symptoms. Therefore, it remains an open question whether SR facets actually increase the vulnerability for internalizing symptoms or whether their relation can be better explained by other models.
Scar models: lower SR facets as consequences of internalizing symptoms
In opposition to vulnerability models, scar models assume that preexisting internalizing symptoms impair SR facets (Tackett, 2006), possibly mediated through alterations of still developing brain structures (Berl et al., 2006). Although scarcely studied, supporting evidence exists. For example, Donati et al. (2021) report that internalizing symptoms in early adolescence were associated with impaired working memory in mid-to-late adolescence, but not vice versa.
Recently, some studies also corroborated bidirectional associations between constructs similar or related to those investigated in this study, supporting both the vulnerability and scar models equally. Halse et al. (2022) established bidirectional cross-lagged associations between symptoms of multiple psychiatric disorders (including affective disorders) during childhood and parent-reported EF. Additionally, bidirectional cross-lagged associations between social anxiety symptoms and behavioral inhibition have been reported (Goldsmith et al., 2021).
Spectrum models: lower SR facets and internalizing symptoms on a continuum
Contrary to the previous models, spectrum models view internalizing symptoms and lower SR facets as overlapping constructs that develop along the same continuum and may therefore be due to common causes. Such models have especially been proposed to explain the relation between temperament-related SR and psychopathology (Stifter & Dollar, 2016). Potential common causes of internalizing symptoms and SR facets include genetics (Friedman & Miyake, 2017; Musci et al., 2015) as well as environmental factors such as child maltreatment (Cicchetti & Toth, 2005; Kavanaugh et al., 2017; Klein et al., 2019) or harsh parenting (Halse et al., 2019), to name a few.
(Potential) common causes that are hypothesized to equally influence development over time can be controlled for in cross-lagged panel models (CLPM) by additionally implementing random intercepts (RI) of each investigated construct (Hamaker et al., 2015; Usami et al., 2019). A correlation of the RIs of internalizing symptoms and SR facets, that is, a correlation of these variables at the between-person level, can therefore be interpreted as an indicator that these could be due to (potential) common causes, supporting spectrum models. Consistent with this, two studies report that inhibitory control (Maasalo et al., 2020) and self-control problems (Kim et al., 2022) showed no significant cross-lagged associations with internalizing symptoms but were correlated with these at the between-person level.
Research gaps
Which explanatory model best explains the relations between internalizing symptoms and SR facets is still unclear. The described results suggest that multiple models may simultaneously apply or that models differ depending on the SR facets investigated. However, there is a lack of prospective studies on children and adolescents using appropriate methods to control for prior outcome values, separate within- and between-person variance, and thus allow possible associations to be modeled according to different theoretical models simultaneously. The few existing studies using such methods focused on only one SR facet (Kim et al., 2022; Maasalo et al., 2020), the emergence of specific social anxiety (Goldsmith et al., 2022) or psychopathology in general (Halse et al., 2022) rather than investigating multiple facets of SR and their specific relations with internalizing symptoms.
Another research gap exists regarding gender differences in the relations between internalizing symptoms and SR facets. So far, these have been demonstrated only in a few individual studies. For instance, Schwartz et al. (1999) reported that the association between inhibited temperament at age two and symptoms of generalized anxiety disorder in adolescence was moderated by gender with girls showing a stronger association than boys. Rudolph et al. (2017) reported a significant three-way interaction of cognitive control, negative emotionality and gender in the prediction of depressive symptoms: deficits in cognitive control predicted higher depressive symptoms in girls with high, but not with average or low negative emotionality. However, this interaction was nonsignificant for boys. Similarly, Agoston and Rudolph (2016) found that high EF deficits combined with childhood peer stress predicted depression in early adolescent girls, but not in boys. Furthermore, Han et al. (2012) showed that boys with depression perform worse in affective decision-making tasks than girls with depression. Apart from these individual studies, however, most studies focus solely on gender differences in either internalizing symptoms or SR facets, but not on gender differences in their associations.
Current study and hypotheses
To address these research gaps, the current study examined the relations between internalizing symptoms and seven different SR facets over three measurement time points. To account for the possible associations between internalizing symptoms and SR facets according to vulnerability, scar, and spectrum models, an RI-CLPM (Hamaker et al., 2015) was estimated, containing all eight constructs at once. This model distinguishes within- and between-person variance and controls for the stability and covariances of these constructs and for prior outcome values.
First, we expect higher internalizing symptoms to be concurrently associated with lower SR facets at each measurement time point. Second, we expect that lower SR facets at one measurement time point are associated with higher internalizing symptoms at a later measurement time point (according to vulnerability models). Third, we expect that higher internalizing symptoms at one measurement time point are associated with lower SR facets at a later measurement time point (according to scar models). Additionally, we will examine, in an exploratory manner, associations between internalizing symptoms and SR facets at the between-person level (according to spectrum models) as well as gender and age differences in association patterns. To ensure transparency and reduce publication bias, the study’s hypotheses and analysis plan were preregistered; see doi:10.17605/OSF.IO/36HMV.
Methods
Sample and procedure
Data were collected between 2011 and 2015 at three measurement time points (t1, t2, t3) by multiple informants (children, parents, teachers) and by using multiple methods (behavioral measures, questionnaires) in a large community-based longitudinal study on intrapersonal developmental risk factors in childhood and adolescence conducted at the University of Potsdam, Germany (for an overview of the whole study, see study protocol by Warschburger et al., 2023). Participants and their families were recruited from 120 classes in 33 public primary schools from rural and urban areas in the Federal State of Brandenburg, Germany, to obtain a sample that was as representative as possible. To keep data collection economical by examining several participants at the same location, entire primary school classes were recruited wherever possible. A total of 1657 children (52.2% girls), aged between 6 and 11 years (M age = 8.36, SD = 0.95), and 1340 (80.7%) parents participated at t1. On average, 9.14 months (SD = 1.80) after t1, 1612 children (51.9% girls; M age = 9.11 years, SD = 0.93; 97.5% retention rate) and 1197 (72.1%) parents participated again at t2. On average, 23.83 months (SD = 1.66) after t2, 1534 children (51.7% girls; M age = 11.06 years, SD = 0.92; 92.6% retention rate) and 1070 (64.5%) parents participated again at t3. For each child, one parent questionnaire was completed at each measurement time point by mothers (70.9%–79.2%), by both parents together (12.5%–20.8%), by fathers (7.1%–7.4%), or by other caregivers (0.8%–1.5%). To account for missing data, we implemented full information maximum likelihood estimation as proposed by Little and Rubin (1987), allowing us to include children who provided data at least once. Table 1 presents further sociodemographic characteristics of the sample and their parents at t1.
Participation in the study was voluntary, and parents gave informed consent. Assessments were approved by the Research Ethics Board at the University of Potsdam and the Ministry of Education, Youth, and Sport in Brandenburg, Germany. Children completed standardized tests in two 1-hr sessions and received small gifts afterward. Parents and teachers filled out online or printed questionnaires. For reasons of economy, each measure was assessed by one informant and one method. Teachers received a €5 contribution to the class fund for each completed questionnaire.
Measures
Internalizing symptoms were reported by parents on the emotional problems scale of the Strengths and Difficulties Questionnaire (SDQ; Goodman, 2001) at t1, t2, and t3. This scale consists of five items (e.g., “Often unhappy, down-hearted, or tearful”), each rated on a 3-point scale from not true (0) to certainly true (2). A sum score was calculated with higher values indicating higher internalizing symptoms. The parent-reported emotional problems scale of the SDQ demonstrates strong concurrent validity with the internalizing problems scale of the Child Behavior Checklist (r = .77) and satisfactory criterion validity (AUC = .69) with respect to emotional disorders according to the ICD-10 criteria assessed by psychiatrists in a sample of 5- to 17-year-olds (Becker et al., 2004). Adequate reliability was confirmed at all measurement time points (α = .66–.72).
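The scoring described above can be sketched as follows. This is a minimal illustration, not the authors' scoring script: the function names and the ratings are hypothetical, and Cronbach's α is computed with the standard formula using sample variances.

```python
from statistics import variance


def sum_score(item_ratings):
    """Sum five item ratings, each coded 0 (not true) to 2 (certainly true)."""
    if len(item_ratings) != 5 or any(r not in (0, 1, 2) for r in item_ratings):
        raise ValueError("expected five ratings, each 0, 1, or 2")
    return sum(item_ratings)


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of per-child item-rating lists."""
    k = len(rows[0])                                   # number of items
    item_vars = [variance(col) for col in zip(*rows)]  # variance of each item
    total_var = variance([sum(r) for r in rows])       # variance of sum scores
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings for four children (illustration only, not study data).
ratings = [
    [0, 0, 1, 0, 0],
    [1, 1, 2, 1, 1],
    [2, 2, 2, 1, 2],
    [0, 1, 0, 0, 1],
]
scores = [sum_score(r) for r in ratings]
alpha = cronbach_alpha(ratings)
```

With five items rated 0 to 2, sum scores can range from 0 to 10.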
Emotional reactivity was reported by parents on 10 items of the emotional control scale of the Behavior Rating Inventory of Executive Function (Gioia et al., 2000) at t1, t2, and t3. Items could be scored on an adapted 5-point scale from never (1) to always (5). A mean score was calculated with higher values indicating higher emotional reactivity and thus lower SR. High reliability was confirmed at all measurement time points (α = .91–.92).
Working memory updating was measured behaviorally by the sum of correctly answered sequences (max. 16) in the digit span backward task (ZN-R) of the German version of the Wechsler Intelligence Scale for Children – Fourth Edition (Petermann & Petermann, 2008) at t1, t2, and t3. In this task, children were asked to repeat digit spans backward, increasing the span lengths from 2 to 8 in a maximum of eight trials, each consisting of two equal-length sequences. When a child incorrectly repeated both sequences during one trial, the test was stopped. The task shows adequate retest reliability (r = .62; Piovesana et al., 2014).
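The stopping rule and sum score described above could be sketched like this. The function name and input format (one pair of correct/incorrect flags per trial) are assumptions made for illustration; the official test manual remains the authority on scoring.

```python
def digit_span_backward_score(trials):
    """Count correctly repeated sequences until the discontinuation rule applies.

    trials: up to eight (first_correct, second_correct) pairs in the order
    administered. Testing stops after a trial in which both sequences fail.
    """
    score = 0
    for first, second in trials[:8]:
        score += int(first) + int(second)
        if not first and not second:
            break  # both sequences of this trial were wrong: stop the test
    return score


# Hypothetical protocol: the fourth trial is never reached.
example = [(True, True), (True, False), (False, False), (True, True)]
example_score = digit_span_backward_score(example)
```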
Cognitive flexibility/set-shifting was measured behaviorally by the number of correct responses on randomly occurring switch trials (22 out of 46 trials) of the Cognitive Flexibility Task (Roebers & Kauer, 2009) at t1 and t2. The task shows an acceptable split-half reliability (r = .64; Röthlisberger et al., 2013). At t3, a computerized version of the Dimensional Change Card Sort Task (Qu et al., 2013) was used, also measuring cognitive flexibility/set-shifting by the number of correct responses on randomly occurring switch trials (12 out of 48). The task shows a good internal consistency (α = .85–.90; Friedman et al., 2016).
Inhibition was measured behaviorally by the interference score of the Fruit-Stroop task (Roebers et al., 2011), an adapted version of the Stroop task (Archibald & Kerns, 1999) at t1, t2, and t3. In this task, four pages of 25 stimuli were presented, each under different conditions (pages 1 and 2: no interference, pages 3 and 4: interference). Children were asked to name the colors of all 25 stimuli per page. Based on the time (in seconds) they needed, an interference score was calculated (Roebers et al., 2011). To ensure that higher values indicate better inhibition, the interference score was z-standardized and inverted. The task shows a good split-half reliability (r = .78; Röthlisberger et al., 2011).
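The final transformation step (z-standardizing and inverting) might look as follows. The interference scores are taken as given, since their exact formula is defined in the cited task description and is not reproduced here. Using the sample standard deviation is an assumption, and the input values are hypothetical.

```python
from statistics import mean, stdev


def inverted_z_scores(interference_scores):
    """z-standardize raw interference scores and flip the sign.

    After inversion, higher values indicate better inhibition.
    Assumes the sample standard deviation (n - 1 in the denominator).
    """
    m = mean(interference_scores)
    sd = stdev(interference_scores)
    return [-(x - m) / sd for x in interference_scores]


# Hypothetical raw interference scores for three children.
raw = [10.0, 12.0, 14.0]
inverted = inverted_z_scores(raw)
```

The child with the largest raw interference score ends up with the lowest inverted value.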
Inhibitory control was reported by parents on six items of the inhibitory control scale of the Temperament in Middle Childhood Questionnaire (Simonds et al., 2007) at t1, t2, and t3. Items could be scored on a 5-point scale from not at all (1) to a lot (5). A mean score was calculated with higher values indicating higher inhibitory control. Adequate reliability was confirmed at all measurement time points (α = .67–.71).
Planning behavior was reported by teachers on eight items of the planning/organizing scale of the Behavior Rating Inventory of Executive Function (Gioia et al., 2000) at t1, t2, and t3. Items could be scored on an adapted 5-point scale ranging from never (1) to always (5). A mean score was calculated with higher values indicating better planning behavior. High reliability was confirmed at all measurement time points (α = .93–.95).
Affective decision-making was measured behaviorally by the net-score difference between advantageous and disadvantageous choices in the Hungry Donkey Task (Crone & van der Molen, 2004), a computer-based, age-appropriate version of the Iowa Gambling Task (Bechara et al., 1994) at t1, t2, and t3. In this task, children performed 50 test trials in which they could choose between four doors with different win/lose contingencies. The task shows adequate reliability (α = .75–.78; Orm et al., 2022).
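A net score of this kind can be illustrated with a short sketch. The door labels and the assignment of doors "C" and "D" as advantageous are assumptions made for the example; the text above does not specify which doors were advantageous.

```python
def net_score(choices, advantageous=("C", "D")):
    """Advantageous minus disadvantageous choices across all test trials.

    choices: sequence of door labels, one per trial.
    advantageous: labels treated as advantageous (assumed for illustration).
    """
    n_adv = sum(1 for c in choices if c in advantageous)
    n_disadv = len(choices) - n_adv
    return n_adv - n_disadv


# Hypothetical session of 50 trials: 30 advantageous, 20 disadvantageous picks.
session = ["C"] * 18 + ["A"] * 12 + ["D"] * 12 + ["B"] * 8
session_net = net_score(session)
```

With 50 trials, the net score ranges from -50 (only disadvantageous choices) to +50 (only advantageous choices).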
Statistical analyses
To test our hypotheses, we estimated an RI-CLPM (Hamaker et al., 2015) with single indicators using Mplus (version 8.8; Muthén & Muthén, 1998–2012). The maximum likelihood estimator with robust standard errors was applied. Missing data were addressed by full information maximum likelihood estimation. The RI-CLPM is an extension of the traditional CLPM and separates between-person from within-person variance by computing an RI for each variable (Hamaker et al., 2015). RIs load equally across all measurement occasions of the corresponding variable and thus represent a child’s individual mean value for that variable over time. Consequently, RIs account for the stability of the construct and control all time-invariant confounders/common causes assumed to influence development consistently across time points (Usami et al., 2019). By doing so, RIs capture trait-like differences at the between-person level (Hamaker et al., 2015). The stable relations between constructs that persist across multiple time points, and that might be explained by shared time-invariant confounders/common causes, are reflected in the correlations among RIs. However, the variance of RIs cannot be statistically decomposed into specific variables that could represent time-invariant confounders/common causes, as they are an aggregate measure of all (unobserved) confounders/common causes at once.
Variance in the manifest variables that is not bound by the RIs feeds into latent within-person factors, which are connected by autoregressive and cross-lagged paths as well as covariances of their residuals. This part of the model appears to correspond to traditional CLPMs, but with the important difference that it represents within-person dynamics. Autoregressive paths therefore do not reflect the stability of the ranking of participants from one measurement time to the next as they would in traditional CLPMs, but rather within-person carry-over effects (Hamaker et al., 2015). For example, if the autoregressive parameter in an RI-CLPM is positive, this means that children who scored above their mean score at one time point are also likely to score above their mean score at the following time point. As a result, autoregressive effects are typically smaller in RI-CLPMs than in traditional CLPMs (Mulder & Hamaker, 2021). Similarly, the cross-lagged parameters in an RI-CLPM provide information on how children’s deviation from their own mean score on internalizing symptoms at a given time point is influenced by their deviation from their own mean score on a particular SR facet at an earlier time point, and vice versa.
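For orientation, the decomposition described above can be written out for the bivariate case, following the general form of the RI-CLPM. The symbols are illustrative notation chosen here, not taken from the study's model syntax: $x_{it}$ stands for internalizing symptoms and $y_{it}$ for one SR facet of child $i$ at time point $t$. The estimated model contained eight constructs rather than two.

```latex
\begin{align}
  x_{it} &= \mu_{t} + \kappa_{i} + p_{it}, &
  y_{it} &= \pi_{t} + \omega_{i} + q_{it}, \\
  p_{it} &= \alpha_{t}\, p_{i,t-1} + \beta_{t}\, q_{i,t-1} + u_{it}, &
  q_{it} &= \delta_{t}\, q_{i,t-1} + \gamma_{t}\, p_{i,t-1} + v_{it}.
\end{align}
```

Here $\mu_{t}$ and $\pi_{t}$ are grand means, $\kappa_{i}$ and $\omega_{i}$ are the random intercepts (between-person level), and $p_{it}$ and $q_{it}$ are the within-person deviations. In this notation, a nonzero $\operatorname{Cov}(\kappa_{i}, \omega_{i})$ corresponds to spectrum models, the cross-lagged path $\beta_{t}$ (SR facet to later internalizing symptoms) to vulnerability models, and $\gamma_{t}$ (internalizing symptoms to later SR facet) to scar models. The autoregressive paths $\alpha_{t}$ and $\delta_{t}$ capture within-person carry-over, and $\operatorname{Cov}(u_{it}, v_{it})$ the residual concurrent association.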
Model fit was evaluated by the comparative fit index (CFI), Tucker–Lewis index (TLI), root mean square error of approximation (RMSEA), and standardized root mean square residual (SRMR). A model is considered to show a good fit when CFI ≥ .95, TLI ≥ .95, RMSEA ≤ .05, and SRMR ≤ .10 (Beauducel & Wittmann, 2005). The Satorra–Bentler chi-square difference test (Satorra & Bentler, 2010) was used to determine whether additional model constraints led to a significant deterioration of the model fit.
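The cutoffs above can be expressed as a simple check. The function name is hypothetical; the two sets of index values are the ones reported in the Results for the baseline RI-CLPM and for the alternative CLPM without RIs.

```python
def shows_good_fit(cfi, tli, rmsea, srmr):
    """Apply the cutoffs CFI >= .95, TLI >= .95, RMSEA <= .05, SRMR <= .10."""
    return cfi >= 0.95 and tli >= 0.95 and rmsea <= 0.05 and srmr <= 0.10


# Fit indices as reported in the Results section.
baseline_ok = shows_good_fit(cfi=0.996, tli=0.981, rmsea=0.019, srmr=0.013)
clpm_without_ri_ok = shows_good_fit(cfi=0.963, tli=0.910, rmsea=0.042, srmr=0.024)
```

Requiring all four criteria jointly is one possible reading of the rule; the text does not state how disagreement between indices was handled.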
In all models, grand means of manifest variables were constrained to be equal as they were z-standardized in advance, and gender was included as a covariate. Models for subsamples (age and gender groups) were compared using multigroup analyses.
Before estimating the RI-CLPM, weak factorial measurement invariance was confirmed separately for all questionnaire-based measures (see Supplementary Table S1). The test of measurement invariance was not feasible for behavioral measures as these measures are not test-equivalent due to the varying test procedure (e.g., termination criteria for the digit span backward test) at different measurement time points. However, since we work with single indicators to reduce model complexity (that is, each latent within-person variable is assigned exactly one observed indicator at each measurement time point), the test of measurement invariance is not a prerequisite (Hamaker et al., 2015).
Results
Unstandardized means and standard deviations of all examined variables are depicted in Table 2. Bivariate correlations between internalizing symptoms and SR facets are shown in Supplementary Table S2.
Note (table footnote 1): Higher values indicate lower inhibition (non-inverted interference score).
Model fitting of the RI-CLPM
First, a baseline RI-CLPM was estimated that included internalizing symptoms and seven SR facets assessed at three measurement time points. All parameters were estimated freely, resulting in a very good model fit: χ2(60) = 97.19, p = .002, CFI = .996, TLI = .981, RMSEA = .019, 90% CI [0.01, 0.03], SRMR = .013. The fit of an alternative CLPM without RIs was significantly worse than that of the baseline model (χ2(124) = 487.64, p < .001, CFI = .963, TLI = .910, RMSEA = .042, 90% CI [0.04, 0.05], SRMR = .024; Δχ2 = 370.35, df = 64, p < .001), showing that the implementation of RIs was justified.
Aiming for parsimony, we then constrained covariances of the residuals of all possible pairs of variables at t2 to be equal to their respective covariances at t3. The model showed no significant deterioration compared to the baseline model (χ2(88) = 134.20, p < .001, CFI = .995, TLI = .984, RMSEA = .018, 90% CI [0.01, 0.02], SRMR = .015; Δχ2 = 38.24, df = 28, p = .094) and thus served as the final model for further analyses.
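As a plausibility check on the reported comparisons, the degrees of freedom of each difference test equal the difference in model degrees of freedom, whereas the reported Δχ2 values do not equal the simple difference of the two chi-square values. That is expected when robust (scaled) chi-squares are compared with a scaled difference test, which additionally uses scaling correction factors not reported here. The sketch below only does the arithmetic; the dictionary keys are hypothetical labels.

```python
# (chi-square, degrees of freedom) as reported for each model.
models = {
    "baseline": (97.19, 60),
    "clpm_without_ri": (487.64, 124),
    "equal_residual_covariances": (134.20, 88),
}


def naive_difference(nested, comparison):
    """Unscaled chi-square difference and df difference between two models."""
    chi_nested, df_nested = models[nested]
    chi_comp, df_comp = models[comparison]
    return round(chi_nested - chi_comp, 2), df_nested - df_comp


diff_no_ri = naive_difference("clpm_without_ri", "baseline")
diff_constrained = naive_difference("equal_residual_covariances", "baseline")
```

The df differences (64 and 28) match the reported tests, while the naive chi-square differences (390.45 and 37.01) deviate from the reported scaled values (370.35 and 38.24).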
Parameter estimates of the final RI-CLPM
The results of the final RI-CLPM are depicted in Figure 1. Parameter estimates of all included paths are provided in Supplementary Table S3. At the within-person level, internalizing symptoms were concurrently moderately positively associated with emotional reactivity at all measurement time points (rs = .26–.29, ps < .001). Cross-lagged paths were significant only between individual SR facets, predominantly concerning EFs between t1 and t2, but not between SR facets and internalizing symptoms. However, there were two marginally significant cross-lagged paths from internalizing symptoms at t2 to inhibitory control at t3 (b = −.13, p = .052) and from inhibition at t2 to internalizing symptoms at t3 (b = −.11, p = .083). At the between-person level, the RI of internalizing symptoms was moderately negatively associated with the RIs of working memory updating (r = −.29, p < .001) and inhibitory control (r = −.29, p < .001), strongly negatively associated with the RI of planning behavior (r = −.49, p < .001), and strongly positively associated with the RI of emotional reactivity (r = .59, p < .001).
Multigroup analyses
In multigroup analyses investigating age and gender differences in association patterns, neither model converged because the variance of one or more RIs was nonsignificant in one of the subgroups. Therefore, the multigroup analyses were conducted without RIs, as the distinction between within- and between-person variance is not necessary when investigating group differences.
Age differences
Based on a median split of age at t1 (Md = 8.40), the total sample was separated into a younger (range: 6.23–8.39) and an older age group (range: 8.40–11.33). A model in which associations could differ between these age groups showed a very good model fit: χ2(232) = 685.67, p < .001, CFI = .953, TLI = .877, RMSEA = .049, 90% CI [0.04, 0.05], SRMR = .029. However, a model in which parameters were constrained to be equal did not fit significantly worse: χ2(424) = 892.08, p < .001, CFI = .951, TLI = .931, RMSEA = .037, 90% CI [0.03, 0.04], SRMR = .040; Δχ2 = 217.72, df = 192, p = .098. According to the principle of parsimony, no age differences can be assumed.
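The grouping rule implied by the reported ranges (ages below the median in the younger group, ages at or above it in the older group) can be sketched as follows. The function name and the ages are hypothetical.

```python
from statistics import median


def median_split(ages):
    """Split ages into a younger (< median) and an older (>= median) group."""
    md = median(ages)
    younger = [a for a in ages if a < md]
    older = [a for a in ages if a >= md]
    return md, younger, older


# Hypothetical ages at t1 (illustration only, not study data).
md, younger, older = median_split([6.23, 7.50, 8.40, 9.10, 11.33])
```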
Gender differences
A model in which associations could differ between gender groups obtained a very good model fit: χ2(200) = 585.07, p < .001, CFI = .959, TLI = .887, RMSEA = .048, 90% CI [0.04, 0.05], SRMR = .028. However, a model in which parameters were constrained to be equal between both groups did not fit significantly worse: χ2(384) = 760.41, p < .001, CFI = .960, TLI = .943, RMSEA = .034, 90% CI [0.03, 0.04], SRMR = .036; Δχ2 = 182.03, df = 184, p = .527. Therefore, no gender differences can be assumed.
Discussion
In this large community-based longitudinal study, we estimated an RI-CLPM containing internalizing symptoms and seven SR facets assessed at three measurement time points in middle childhood to examine their associations according to vulnerability, scar, and spectrum models. Our first hypothesis was partially confirmed, as we found internalizing symptoms and emotional reactivity to be moderately concurrently associated at all measurement time points. However, our second and third hypotheses were not supported, as no cross-lagged paths between internalizing symptoms and SR facets reached significance. Instead, at the between-person level, internalizing symptoms were significantly negatively associated with working memory updating, inhibitory control, and planning behavior, and most strongly (and positively) associated with emotional reactivity.
Our results are consistent with two studies also reporting no significant cross-lagged associations, but instead finding associations between the RIs of internalizing symptoms and lower inhibitory control (Maasalo et al., 2020) or self-control problems (Kim et al., 2022). Contrary to our primary findings, but corresponding with our two marginally significant cross-lagged paths, Halse et al. (2022) reported significant cross-lagged paths in both directions between affective disorders and four different EFs, including inhibition. However, they measured core EFs using parent and teacher reports that have little overlap with performance-based tests (Toplak et al., 2013) as used in our study. In addition, focusing on psychopathology in general, they included other psychiatric disorders characterized by externalizing symptoms (e.g., attention-deficit/hyperactivity disorder) and constrained cross-lagged paths of the same direction to be equal. Since externalizing symptoms usually show stronger associations with SR (Robson et al., 2020), this could also explain the differences from our results. We did not identify any age or gender differences in association patterns, which is consistent with Halse et al. (2022), who investigated a similarly aged sample.
Are lower SR facets risk factors or consequences of internalizing symptoms, or do both have common causes?
Associations primarily emerged at the between-person level, while there were almost none at the within-person level (cross-lagged paths and concurrent associations). This indicates that stable between-person differences may be better suited to explain the relations between SR facets and internalizing symptoms than within-person dynamics. Therefore, our results support spectrum models, according to which internalizing symptoms and SR facets may develop along the same continuum and thus potentially share common causes. This result is also compatible with studies identifying several SR facets, for example, EF deficits and negative affectivity, as transdiagnostic factors for psychopathology (Lynch et al., Reference Lynch, Sunderland, Newton and Chapman2021).
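For readers less familiar with the distinction between the two levels, the decomposition underlying a random-intercept cross-lagged panel model can be sketched for a single pair of constructs. The notation below is an assumption introduced here for illustration (it follows the common textbook formulation, not necessarily the parameter labels of our model, which contained eight constructs): x denotes internalizing symptoms and y one SR facet for child i at time point t.

```latex
\begin{aligned}
x_{it} &= \mu_{x,t} + \kappa_{x,i} + x^{*}_{it}, \\
y_{it} &= \mu_{y,t} + \kappa_{y,i} + y^{*}_{it}, \\
x^{*}_{it} &= \alpha_{x}\, x^{*}_{i,t-1} + \beta_{x}\, y^{*}_{i,t-1} + u_{it}, \\
y^{*}_{it} &= \alpha_{y}\, y^{*}_{i,t-1} + \beta_{y}\, x^{*}_{i,t-1} + v_{it}.
\end{aligned}
```

Here, the \kappa terms are the random intercepts (stable between-person differences), and the starred terms are each child's time-specific deviations from his or her own expected score. In this simplified reading, a nonzero covariance between \kappa_{x,i} and \kappa_{y,i} corresponds to the between-person association emphasized by spectrum models, \beta_{x} (SR facet to later symptoms) to a vulnerability effect, \beta_{y} (symptoms to later SR facet) to a scar effect, and the covariance of the residuals u_{it} and v_{it} to a concurrent within-person association.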
Spectrum models have been discussed in the existing literature especially in the context of temperament-related SR facets (Stifter & Dollar, Reference Stifter, Dollar and Cichetti2016), which is also confirmed in our study, as two temperament-related SR facets (emotional reactivity and inhibitory control) show significant between-person correlations with internalizing symptoms. However, it should be noted that both temperament-related SR facets and internalizing symptoms were reported by parents. Shared informant and method variance could therefore have contributed to the strength of their association. Furthermore, we were able to show that spectrum models may also explain the association of internalizing symptoms with two cognitive SR facets that were assessed behaviorally and by teacher report, respectively: working memory updating and planning behavior. This is a new finding, as most studies assume relations in terms of vulnerability models (e.g., Yang et al., Reference Yang, Shields, Zhang, Wu, Chen and Romer2022) without considering different explanatory models or controlling for prior outcome values.
Although our results primarily support spectrum models, it should be noted that we also found two marginally significant cross-lagged paths: one from internalizing symptoms at t2 to inhibitory control at t3 and one from inhibition at t2 to internalizing symptoms at t3. There are several reasons why these and other cross-lagged paths that would have supported vulnerability and scar models may not have reached significance. For example, it is possible that changes in internalizing symptoms and SR facets influence each other over shorter intervals of time (e.g., weeks or days) than were examined in this study. The concurrent within-person associations between internalizing symptoms and emotional reactivity that emerged in addition to their strong between-person associations could reflect such short-term within-person reciprocal effects. Kim et al. (Reference Kim, Richards and Oldehinkel2022), who found a similar pattern in their study on internalizing symptoms and self-control, also discussed this possibility. Additionally, the marginal significance or nonsignificance of cross-lagged parameters could indicate the need for higher power to detect these effects. Considering Cohen (Reference Cohen1994), who warns against equating p with the probability of rejecting the null hypothesis, our results do not suggest that vulnerability or scar models should be discarded, especially since they are also supported by preexisting literature (e.g., Halse et al., Reference Halse, Steinsbekk, Hammar and Wichstrom2022). Further studies are needed to disentangle the effects between internalizing symptoms and SR facets according to different theoretical models. We also recommend investigating potential short-term dynamics between SR facets and internalizing symptoms by implementing experience sampling methods that involve several daily surveys over a period of 1–2 weeks (Boemo et al., Reference Boemo, Nieto, Vazquez and Sanchez-Lopez2022).
Why are some SR facets significantly associated with internalizing symptoms while others are not?
In addition to comparing explanatory models, our study raises the question of why some SR facets are significantly associated with internalizing symptoms while others are not. As shown in Table 2, SR facets with significant associations with internalizing symptoms had stable means across the three measurement time points. In this regard, they resembled internalizing symptoms, which also showed stable mean values despite high variance indicating interindividual heterogeneity. This suggests, in accordance with spectrum models, that key developmental processes related to both internalizing symptoms and significantly related SR facets may have already been completed. In contrast, SR facets without significant associations showed more change in mean scores toward improvement in SR over time, possibly requiring longer developmental trajectories to exhibit clear associations with internalizing symptoms. This is in accordance with evidence that SR facets mature at different rates (for a meta-review, see Wesarg-Menzel et al., Reference Wesarg-Menzel, Ebbes, Hensums, Wagemaker, Zaharieva, Staaks, Van den Akker, Visser, Hoeve, Brummelman, Dekkers, Schuitema, Larsen, Colonnesi, Jansen, Overbeek, Huizenga and Wiers2023; for a systematic review on cool and hot EF, see García et al., Reference García, Merchán, Phillips-Silver and González2021). For example, García et al. (Reference García, Merchán, Phillips-Silver and González2021) report that interference inhibition, which is measured by the Stroop task (as used in our study), seems to continue to improve beyond the age of 12. The authors argue that this type of inhibition requires a higher cognitive load than response inhibition, which is often measured by Go/No-go tasks and reaches a performance level similar to that of adults at the age of 12.
This perspective furthermore aligns with meta-analyses showing that EF deficits in individuals with depressive symptoms are not detectable in adolescence and early adulthood (Goodall et al., Reference Goodall, Fisher, Hetrick, Phillips, Parrish and Allott2018; Dotson et al., Reference Dotson, McClintock, Verhaeghen, Kim, Draheim, Syzmkowicz, Gradone, Bogoian and Wit2020), but are detectable in adulthood (Rock et al., Reference Rock, Roiser, Riedel and Blackwell2014; Snyder, Reference Snyder2013). Consequently, future research should examine how the maturation processes of specific SR facets influence their relations to internalizing symptoms during different developmental periods. Regardless of this, it is also possible that the significant SR facets, especially emotional reactivity, show greater relevance and construct proximity to internalizing symptoms than the nonsignificant ones.
Further research and clinical implications
While our results indicate that higher internalizing symptoms and lower SR facets may share common causes, our study design does not allow us to identify these, as RIs represent an aggregate measure of all (unobserved) confounders/common causes. In addition to genetic factors, many environmental factors could be influential, such as child maltreatment (Cicchetti & Toth, Reference Cicchetti and Toth2005; Kavanaugh et al., Reference Kavanaugh, Dupont-Frechette, Jerskey and Holler2017; Klein et al., Reference Klein, Schlesier-Michel, Otto, White, Andreas, Sierau, Bergmann, Perren and Von Klitzing2018), harsh parenting (Halse et al., Reference Halse, Steinsbekk, Hammar, Belsky and Wichstrøm2019), or parents’ insufficient containment of their infants’ affects (Bion, Reference Bion1962). Future studies are needed to understand which specific common causes can account for the stable between-person associations of internalizing symptoms and lower SR facets in order to develop tailored prevention and intervention measures.
The results of our study suggest that prevention and intervention measures should, where possible, target both internalizing symptoms and the enhancement of SR, with emotional reactivity playing a significant role. Given that environmental factors can be considered common causes of higher internalizing symptoms and lower SR, it is also advisable to develop measures that support families and school professionals in managing children characterized by a higher symptom burden, more intense and prolonged negative emotions, and lower SR skills. In particular, the use of adaptive emotion regulation strategies and increased co-regulation of children by their parents could be helpful in regulating their heightened emotional reactivity.
Strengths and limitations
This study examined associations between internalizing symptoms and multiple SR facets by simultaneously modeling vulnerability, scar, and spectrum models. The large community-based sample showed very high retention rates and allowed us to employ an RI-CLPM containing eight constructs at once. This sophisticated model distinguished within- and between-person effects and controlled for the stability of constructs, the covariances of multiple SR facets, and prior outcome values. Data were collected from different informants and by using multiple methods including well-validated behavioral measures and questionnaires.
Nonetheless, some limitations must be noted. First, for economic reasons, each measure was assessed by only one informant and one method. As internalizing symptoms and temperament-related SR facets were assessed via parent report, shared informant and method variance may have contributed to the strength of their associations. However, we also found significant associations between internalizing symptoms and two cognitive SR facets, which were measured behaviorally and by teacher report, respectively. Further studies are needed to examine how these associations may be influenced by measurement methods and whether they persist when other informants and methods are included. This could be particularly important regarding core EFs, as parent and teacher reports are known to show little overlap with behavioral measures (Toplak et al., Reference Toplak, West and Stanovich2013). It can be assumed that the former primarily measure a child's average, everyday performance, while the latter measure the child's potential maximum performance.
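To illustrate what distinguishing within- and between-person effects means in practice, the following minimal sketch simulates purely hypothetical data in which two constructs share correlated stable traits but have no within-person cross-lagged effects. All names, sample sizes, and parameter values are assumptions made for illustration, and observed person means are used only as a crude proxy for latent random intercepts; this is not the estimation procedure used in this study.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation of two equally long lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def simulate(n_children=2000, n_waves=3, rho_between=0.6, seed=1):
    """Hypothetical data: correlated stable traits, independent wave-specific noise."""
    rng = random.Random(seed)
    symptoms, reactivity = [], []
    for _ in range(n_children):
        # Stable traits (random intercepts) that correlate at rho_between.
        trait_sym = rng.gauss(0, 1)
        trait_rea = rho_between * trait_sym + sqrt(1 - rho_between ** 2) * rng.gauss(0, 1)
        # Wave-specific deviations are independent across constructs,
        # i.e., there is no within-person coupling by construction.
        symptoms.append([trait_sym + rng.gauss(0, 1) for _ in range(n_waves)])
        reactivity.append([trait_rea + rng.gauss(0, 1) for _ in range(n_waves)])
    return symptoms, reactivity


def decompose(series):
    """Split each child's scores into a person mean and deviations from it."""
    means = [sum(row) / len(row) for row in series]
    devs = [[v - m for v in row] for row, m in zip(series, means)]
    return means, devs


sym, rea = simulate()
sym_mean, sym_dev = decompose(sym)
rea_mean, rea_dev = decompose(rea)

# Between-person level: correlation of person means across children.
between_r = pearson(sym_mean, rea_mean)

# Within-person level: does a child's deviation in symptoms at wave t
# relate to that child's deviation in reactivity at wave t + 1?
lag_x = [d[t] for d in sym_dev for t in range(len(d) - 1)]
lag_y = [d[t + 1] for d in rea_dev for t in range(len(d) - 1)]
within_lagged_r = pearson(lag_x, lag_y)

print("between-person r:", round(between_r, 2))
print("within-person lagged r:", round(within_lagged_r, 2))
```

In such data, a clear between-person correlation coexists with a within-person lagged correlation near zero, which is the pattern a model separating the two levels is designed to reveal; a conventional analysis that does not separate them could mistake the stable trait overlap for a lagged effect.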
Second, internalizing symptoms were only assessed via parent reports. Since all children at t1 and the majority at t2 were younger than 11 years, the self-rating version of the SDQ was not yet suitable. In addition, the RI-CLPM is based on covariances between constructs, so potential underestimation of internalizing symptoms, which has been observed in parent reports during adolescence (Klasen et al., Reference Klasen, Petermann, Meyrose, Barkmann, Otto, Haller, Schlack, Schulte-Markwort and Ravens-Sieberer2016), would not have significantly affected the parameters.
Third, cross-lagged paths did not reach significance, but two marginally significant cross-lagged effects in the direction of vulnerability and scar models were found. This may have been due to the long intervals between measurement time points, or greater statistical power may have been required to detect them. As Cohen (Reference Cohen1994) cautioned, p-values should not be conflated with the probability of rejecting the null hypothesis. Therefore, it remains possible that vulnerability and scar models, alongside spectrum models, are valid explanations for the associations between internalizing symptoms and SR facets, requiring further studies on this topic.
Fourth, the generalizability of our results is limited by the homogeneity of our sample. Further studies should investigate whether our results can be replicated in samples with greater ethnic and socioeconomic diversity to avoid bias in interpretation.
Finally, time-variant confounders such as stressful life events (Grant et al., Reference Grant, Compas, Thurm, McMahon, Gipson, Campbell, Krochock and Westerholm2006) could not be controlled for, as this would have required at least one additional measurement time point for model identification (Usami et al., Reference Usami, Murayama and Hamaker2019).
Conclusion
While we identified concurrent associations between internalizing symptoms and emotional reactivity at all measurement time points, we did not observe any significant cross-lagged associations between internalizing symptoms and SR facets. However, at the between-person level, we established that internalizing symptoms were moderately associated with lower working memory updating and lower inhibitory control and strongly associated with lower planning behavior and higher emotional reactivity. These findings suggest that not only temperament-related but also cognitive SR facets may develop along the same continuum as internalizing symptoms and therefore may share common causes, as assumed by spectrum models. Nonetheless, vulnerability and scar models may be simultaneously valid alongside spectrum models, as indicated by two marginally significant cross-lagged effects and previous research. Prevention and intervention measures should, where possible, target both internalizing symptoms and the enhancement of SR, with emotional reactivity playing a significant role. Further studies are needed to explore potential common causes, investigate bidirectional effects at shorter time intervals, account for different maturation rates of SR facets, and examine the influence of time-variant confounders.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0954579424001937.
Acknowledgments
We thank all participating children and families.
Funding statement
This work is funded by the German Research Foundation (KL 2338/2-1) as part of a research group (FOR 5034; grant number 426314138). The first three measurement points used in this study were funded by the German Research Foundation as part of the research training group GRK 1668.
Competing interests
None.