Introduction
The increasing levels of demand, intensity, and stress that characterize work activity today, accentuated by the changes in work associated with the post-COVID-19 pandemic period, have significantly increased negative perceptions and experiences related to work activity and, with them, the need for recovery from work (De Bloom, 2020; Wentz et al., 2020). Work-related fatigue, work stress, personal resource depletion, and work exhaustion are basic processes that occupational health research has analyzed for decades (Bennett et al., 2016; Van Veldhoven, 2008) and that are directly related to the processes of recovery from work (Sonnentag et al., 2022). The recovery concept refers to “unwinding and restoration processes during which a person’s strain level that has increased as a reaction to a stressor or any other demand returns to its prestressor level” (Sonnentag et al., 2017, p. 366). Accumulating research (Sonnentag et al., 2021; Steed et al., 2021; Wendsche & Lohmann-Haislah, 2017) has shown that effective recovery from work has positive consequences for well-being and health, as well as continued performance and job and life satisfaction. In contrast, inadequate recovery leads to the reverse consequences.
Difficulty or inability to recover from work often stems from adverse working conditions (de Croon et al., 2006), such as excessive workloads, extended workdays, and work schedules (Jansen et al., 2003); emotional work demands (Xanthopoulou et al., 2018); low job autonomy or difficulty engaging in job crafting behaviors (Shi et al., 2021); high stress levels (Sonnentag, 2018); physical occupational activity (Stevens et al., 2020); or abusive supervision (Tu & Chi, 2023). Unsurprisingly, absent or insufficient recovery from work can have serious consequences for mental and physical health in both the short and medium terms (de Croon et al., 2006) and increases the risk of long-term or work-incapacitating illnesses (Peters et al., 2022).
One of the indicators for identifying perceptions of difficulties in recovery is the measure of the need for recovery from work, which aims to quantify the difficulties experienced by individuals in achieving effective recovery from work (de Croon et al., 2006). More precisely, the construct of the need for recovery after work describes “the extent that the work task induces a collection of symptoms, characterized by temporary feelings of overload, irritability, social withdrawal, lack of energy for new effort, and reduced performance” (van Veldhoven, 2008, p. 3). Specifically, in the whole process of strain–stress consequences/effects, need for recovery refers to a “very early stage of a long-term strain process” (Sonnentag & Zijlstra, 2006, p. 331) or, in other words, a “precursor of prolonged fatigue or psychological distress” (Jansen et al., 2002, p. 324). The feeling of needing to recover is experienced to a greater extent in the latter part of the workday and especially in the hours immediately after the end of the workday, which in turn can lead to difficulties in effective recovery (Chan et al., 2022; van Veldhoven, 2008; Wentz et al., 2020). In fact, the higher the level of fatigue and stress experienced, and consequently the greater the need to recover from work, the more the recovery process tends to be deficient and ineffective, an experience that has led to the identification of the so-called recovery paradox (Sonnentag, 2018).
In sum, if the goal of occupational health is to achieve effective recovery from work that reduces the levels of daily stress and the negative consequences of the workers’ work activity and, subsequently, decreases the risk of long-term adverse effects in both their health and future permanence in the labor market (Gommans et al., 2015), correctly identifying and measuring the need for recovery as a subjective experience before possible recovery activities is crucial in facilitating the effectiveness of these nonwork activities.
The measurement of the need for recovery began in the 1980s in the Netherlands with the construction of different scales (van Veldhoven, 2008). Between 1992 and 1994, this author refined an instrument composed of 11 dichotomous (yes/no) items, with questions on the severity and duration of the symptoms experienced, considered as indicators that the person has not fully recovered from the effort of the work activity (Van Veldhoven & Meijman, 1994). The Need for Recovery (NFR) scale was subsequently translated into English (de Croon et al., 2006; Sluiter et al., 1999; Van Veldhoven & Broersen, 1999), Portuguese (Moriguchi et al., 2010), Italian (Pace et al., 2013), Taiwanese (Lin et al., 2015), Danish (Gupta et al., 2018), and French (Dupret et al., 2018). Stevens et al. (2019) reported that the full version of the NFR exhibits adequate psychometric properties in terms of internal consistency, construct validity, and test–retest reliability. However, some aspects of this work are somewhat unclear. For example, their analyses suggested potential multidimensional aspects depending on the method employed (factor analysis versus IRT), as well as certain limitations regarding the clarity of discriminant validity and the interpretation of confirmatory results.
These discrepancies highlight the importance of continuing to evaluate these properties, especially in specific contexts of cultural validation, such as this study.
Also, from the original scale, the Danish NFR scale (Stevens et al., 2019) reduced the original 11 items to 9 items, with a Likert-type measurement of five response anchors (1 = never, 5 = always). These same authors present a short version of the NFR consisting of three items, which is claimed to have good psychometric properties, primarily based on an intraclass correlation coefficient (ICC) value as evidence of criterion validity (Stevens et al., 2020). However, this approach does not comprehensively address the essential phases of the recommended psychometric validation process, such as the analysis of factorial structure, convergent and discriminant validity, and reliability. In this regard, while the studies by Stevens et al. (2019, 2020) represent an initial step in the validation of the NFR, particularly in its shorter version, they exhibit certain methodological limitations that should be addressed in line with best practices described in the literature (Boateng et al., 2018), ensuring a more rigorous validation process aligned with international standards. Although other instruments, such as the Maslach Burnout Inventory (MBI; Maslach & Jackson, 1981), and other procedures, such as interviews, exist, the NFR was selected for its specificity in assessing recovery after the workday. Unlike the MBI, which measures broader dimensions such as burnout, the NFR focuses exclusively on temporary symptoms of fatigue and disengagement.
Furthermore, this instrument has been validated in multiple contexts, and along with advantages such as its ease of administration, it becomes a practical and effective tool.
In general, the Danish NFR scale in its two versions has shown evidence of convergent and discriminant validity through its associations with subjective indicators of fatigue, workability (physical and mental), and exhaustion (i.e., burnout), and high criterion validity with classic work outcomes and health variables, such as perceived health, stress, vitality, anxiety symptoms, and depressive symptoms (Dupret et al., 2018; Moriguchi et al., 2010; Stevens et al., 2019). Although the predictive value of the NFR scale for sickness absence has been demonstrated in some studies (e.g., de Croon et al., 2003), no data are yet available to assess its ability to predict challenging and long-term indicators, such as long-term absenteeism, retirement intention, or disability retirement (Stevens et al., 2019).
To have a reliable and valid measure of the need for recovery from work in Spanish, this study set out to carry out its adaptation and validation in the Spanish working population. For this purpose, the Danish NFR scale (Stevens et al., 2019, 2020), originally developed in English and composed of nine items, was used in its culturally adapted and translated version, with the five response anchors specified later in the “Method” section.
The purpose of this study is to analyze the factor structure of the Danish NFR using an exploratory factor analysis (EFA) followed by a confirmatory factor analysis (CFA) and then to analyze the internal consistency of the scale and the measurement invariance in two samples on the NFR. Finally, to provide validity evidence based on relationships with other variables, measures of perceived work stress, perceived general health, and positive and negative affect were included.
As already mentioned, there is evidence of a direct relationship between the need for recovery and work stress (Jansen et al., 2002; Sonnentag, 2018; Sonnentag et al., 2010). Therefore, it is to be expected that the more work-related stress experienced during the workday, the greater the difficulty in recovering (i.e., the greater the need for recovery) after the workday is over.
Conversely, previous research has found that an increased need for recovery has a negative relationship with general health (Sluiter et al., 1999, 2003; Wentz et al., 2020), since this need is a direct indicator of work-related fatigue (Jansen et al., 2002), associated with adverse work factors acting as an antecedent. Therefore, a greater need for recovery after work (which in turn is associated with greater work stress) will result in the individual experiencing poorer general health.
Based on these theoretical relationships, and to examine the validity of the NFR scale through its associations with other constructs, we proposed the following hypothesis:
H1: There will be a positive relationship between the need for recovery and job stress levels, and a negative relationship between the need for recovery and general health levels.
Finally, previous evidence shows that psychological distancing is an antecedent of affective states, both high levels of positive affect (Rodríguez-Muñoz et al., 2018; Sonnentag et al., 2008; Wendsche & Lohmann-Haislah, 2017) and low levels of negative affect (Feuerhahn et al., 2014; Sonnentag et al., 2008; Wendsche & Lohmann-Haislah, 2017). These results have been confirmed in both correlational and experimental studies (Sonnentag & Niessen, 2020), demonstrating the strength of this relationship. Similarly, we postulate that an increased need for recovery will be antecedent to inverse affective states (i.e., high negative affect and low positive affect), as is also postulated by the recovery paradox hypothesis (Sonnentag, 2018). Based on this evidence and rationale, and to test the predictive validity of the NFR scale, we postulate the following hypothesis:
H2: There will be a positive relationship between the need for recovery and negative affect levels, and a negative relationship between the need for recovery and positive affect levels.
Method
This validation study used data from a larger study comprising two workplace interventions: a randomized controlled trial on internal recovery from work in the private sector and a nonrandomized trial on external recovery from work in the public sector. The study was registered at ClinicalTrials.gov (NCT05557266). Ethical approval was provided by the Ethical Committee of the Universidad Nacional de Educación a Distancia (UNED, Spain), 10-PSI-2021.
Participants
We used convenience (nonprobability) sampling to collect the two subsamples. The sample consisted of 445 participants, of whom 201 (45.17%) belonged to the public sector (Civil Guard) and 244 (54.83%) to the private sector (telecommunications and civil works). The participants’ mean age was 39.18 years (SD = 10.68), with a significant difference between groups in pretest measures: public sector participants were younger (M = 31.94, SD = 8.40) than private sector participants (M = 45.15, SD = 8.42), t(443) = −16.48, p < .01. Regarding the NFR scores, the total mean was 22.32 (SD = 5.45), and scores were significantly lower in the public sector group (M = 20.37, SD = 5.08) than in the private sector group (M = 23.92, SD = 5.23), t(443) = −7.22, p < .001. The effect size was moderate to large, with Cohen’s d = 0.69, 95% CI (0.50, 0.88). Additionally, the gender distribution differed significantly between groups: 82.04% of public sector participants were men, compared with 31.38% of private sector participants, χ2(1) = 114.64, p < .01. To be eligible, workers had to be office based and work at least 35 hours per week. Workers with any severe physical or mental pathology were excluded.
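The group comparison reported above can be checked from the summary statistics alone. The following Python sketch is illustrative only (the study itself was analyzed in R); it assumes a pooled-variance Student's t test and a large-sample normal approximation for the confidence interval of Cohen's d, neither of which is stated in the text. The function names are hypothetical.

```python
import math

def pooled_sd(n1, sd1, n2, sd2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def t_and_d(n1, m1, sd1, n2, m2, sd2):
    """Student's t (equal variances assumed) and Cohen's d from summary statistics."""
    sp = pooled_sd(n1, sd1, n2, sd2)
    diff = m1 - m2
    t = diff / (sp * math.sqrt(1 / n1 + 1 / n2))
    d = diff / sp
    # Assumption: large-sample standard error of d, normal-approximation 95% CI
    se_d = math.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    ci = (d - 1.96 * se_d, d + 1.96 * se_d)
    return t, d, ci

# NFR totals as reported: public sector (n = 201) vs. private sector (n = 244)
t_nfr, d_nfr, ci_nfr = t_and_d(201, 20.37, 5.08, 244, 23.92, 5.23)
print(round(t_nfr, 2), round(abs(d_nfr), 2), [round(abs(x), 2) for x in ci_nfr])
```

Under these assumptions the sketch returns values in line with the reported t(443) = −7.22 and d = 0.69; the sign of d simply reflects which group is entered first.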
Instruments
Need for Recovery Scale (NFR; de Croon et al., 2006).
The level of need for recovery was evaluated using the Spanish version of the NFR scale, which, in a preliminary version, consisted of eight items rated on a 5-point Likert scale (1 = never and 5 = always). The total score was obtained by summing the item values and can range from 8 to 40 points. A higher score indicates a higher degree of need for recovery after work. The internal consistency for the original scale was α = .87 (van Veldhoven & Broersen, 2003).
Due to the characteristics of both the Spanish work context and the occupational activities of the participants in the two samples, one of the nine items of the full NFR scale was eliminated (“I do not normally relax if I have only had one day without work”), since it is usual in Spain to have two consecutive days of rest in the work week (usually the weekend), so this item would not be applicable in most cases. Tables 1 and 2 show the eight items of the Danish NFR scale in its English and Spanish versions (see “Results” section).
Table 1. KMO index, homogeneity index, means, standard deviations, and correlations with confidence intervals

Note: M and SD are used to represent mean and standard deviation, respectively. Values in square brackets indicate the 95% confidence interval for each correlation. The confidence interval is a plausible range of population correlations that could have caused the sample correlation (Cumming, 2014). **p < .01. HI = homogeneity index.
Table 2. Item content and factor loadings from the exploratory and confirmatory factor analyses of the Danish need for recovery scale

Note: The table presents the content of the items in both English and Spanish, along with the standardized factor loadings obtained from the exploratory factor analysis (EFA) and confirmatory factor analysis (CFA).
Job Stress (General Nordic Questionnaire, QPSNordic; Dallner et al., 2000) in its Spanish Version (Carral & Alcover, 2019) // Stress ([Nordic Age Discrimination Scale, NADS]; Furunes & Mykletun, 2010).
A single item was used: Stress refers to a person’s situation when they feel tense, restless, nervous, or anxious or cannot sleep at night because their mind is constantly preoccupied with work-related issues. Please indicate the extent to which you currently feel this type of stress. It has five possible responses on a Likert-type scale from 1 to 5 (1 = nothing and 5 = a lot).
General Health Questionnaire (GHQ-12; Goldberg, 1978; Goldberg & Williams, 1988) in its Spanish Version (Rocha et al., 2011).
The scale consists of 12 items and is widely used as a unidimensional measure of psychological distress, despite ongoing discussions about its factorial structure. Numerous studies support its application as a unidimensional tool in practical settings, highlighting that the general factor explains most of the variance (Hystad & Johnsen, 2020). It is answered on a Likert-type scale from 1 to 4 (1 = never and 4 = always), with the total score ranging from 12 to 48. A higher score indicates a higher level of distress. The internal consistency for this sample was ω = .81.
Positive and Negative Affect Schedule (PANAS; Watson et al., 1988) in its Spanish Adaptation (Sandín et al., 1999).
The scale consists of 20 items divided into two subscales designed to measure positive affect and negative affect. The bidimensional structure of the PANAS has been validated in Spanish populations, demonstrating robust psychometric properties (López-Gómez et al., 2015; Sandín et al., 1999). Each item is rated on a Likert-type scale from 1 (very slightly or not at all) to 5 (extremely), with higher scores indicating greater levels of positive or negative affect. In this study, reliability as measured by the omega coefficient was excellent for both the negative affect subscale (ω = .96) and the positive affect subscale (ω = .93).
Procedure
The study was conducted in Spain between 2021 and 2023. This period encompasses the entire research process, including the design, translation, and adaptation of the instruments, as well as the data collection and analysis phases. The procedure can be divided into two steps. First, a translation/back-translation method integrated with expert committee review and pretesting was applied. Before each item is translated, the conceptual and content equivalence of the underlying construct should be considered (Flaherty et al., 1988). Therefore, two researchers involved in the back-translation process, who were experts in the field and fluent in both English and Spanish, agreed that the need for recovery construct underlying the NFR scale was meaningful in Spanish culture.
The translation followed the backward translation strategy (Hambleton & Patsula, 1998). In the first step, two independent translators, native English speakers proficient in Spanish, translated the original scale from English to Spanish. In the second step, they synthesized the two translations into one. In the third step, two independent translators, native Spanish speakers proficient in English, performed the backward translation. In the fourth step, a three-member expert committee compared the backward translations with the original scale and agreed on a prefinal version of the translated questionnaire. The committee members, who were experts in organizational psychology, work stress, and burnout, verified that the items were well designed to measure the intended construct and retained the original meaning. In the fifth and final step, the committee ensured good comprehension of each question of the scale and approved the final version of the Spanish NFR.
Some minor changes were made during this process to adjust the items’ ability to fit into contemporary Spanish. Thus, we used the wording «jornada laboral» when «work day» was used in English items because the exact translation for this term (día de trabajo) would have sounded too informal for the context in Spanish (items 1, 2, and 8).
Second, once the scale was fully adapted to the Spanish population, we proceeded with recruitment. Participants were recruited through the human resources departments of the participating companies and were asked to complete the newly translated NFR. They signed the informed consent form and received instructions from a trained interviewer. The scale was administered with an emphasis on the fact that responses were anonymous and that there were no right or wrong answers. The items appeared in the same order as in the English-language NFR (except for the eliminated item). There was no time limit. Any doubts about the meaning of the items that arose during administration were clarified.
Data Analysis
An exploratory and descriptive analysis was used to examine the fit of the data to a multivariate normal distribution, as well as the sociodemographic characteristics of the sample. Additionally, univariate normality was assessed for each item through skewness and kurtosis indices, which indicated that all items fell within acceptable ranges (skewness: −0.21 to 0.30; kurtosis: −1.01 to −0.56). These results suggest minimal deviations from normality at the univariate level. The dimensionality of the questionnaire was analyzed using a split-sample validation procedure in which the sample was divided into two halves by simple randomization.
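As an illustration of the two operations described in this paragraph, the Python sketch below computes item-level skewness and excess kurtosis and performs a simple random split of 445 cases. It is a minimal sketch under stated assumptions: the moment-based estimators, the seed, and the example item responses are hypothetical choices, not the authors' settings.

```python
import random
import statistics

def skewness(xs):
    """Moment-based sample skewness (an assumed estimator; others exist)."""
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / len(xs)
    m3 = sum((x - mean) ** 3 for x in xs) / len(xs)
    return m3 / m2 ** 1.5

def excess_kurtosis(xs):
    """Moment-based excess kurtosis (0 for a normal distribution)."""
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / len(xs)
    m4 = sum((x - mean) ** 4 for x in xs) / len(xs)
    return m4 / m2 ** 2 - 3.0

def split_half(ids, seed=0):
    """Shuffle case identifiers and cut them into two halves."""
    rng = random.Random(seed)  # hypothetical seed, for reproducibility only
    shuffled = list(ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical responses to one 5-point item
item = [1, 2, 2, 3, 3, 3, 4, 4, 5]
print(skewness(item), excess_kurtosis(item))

# 445 cases split into an EFA half and a CFA half
efa_ids, cfa_ids = split_half(range(445))
print(len(efa_ids), len(cfa_ids))
```

With 445 cases, integer division yields halves of 222 and 223, which matches the subsample sizes reported for the EFA and CFA.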
Given the cultural adaptation process and the potential for semantic or contextual variability, we conducted an exploratory factor analysis (EFA) with the first subsample (n = 222) using the minimum residuals method (MINRES; Barendse et al., 2015). While EFA is typically used in the development of new scales, its application in cross-cultural adaptations can serve as a conservative strategy to verify whether the hypothesized unidimensional structure is appropriate for the target population (Brown, 2015). To determine the number of factors to retain, we performed a parallel analysis (PA; Timmerman & Lorenzo-Seva, 2011), which compares the eigenvalues obtained from the actual data with those derived from simulated data. The retained factors were those whose eigenvalues exceeded the threshold from the simulated data. As the parallel analysis supported a one-factor solution, no rotation was applied in the exploratory factor analysis. Pearson’s correlation matrices were used as input for both the EFA and PA. In our case, the multivariate normality criterion, as assessed by the Royston test (p < .01), was not met, supporting the use of these nonparametric estimation methods.
Items were retained if their factor loading exceeded .30 (Lloret-Segura et al., 2014). The Kaiser–Meyer–Olkin (KMO) measure (Kaiser, 1970) was used to assess sampling adequacy, that is, whether the correlation matrix is suitable for EFA. In general, KMO values > .80 indicate satisfactory adequacy.
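To make the KMO criterion concrete, the following Python sketch computes the overall index from a correlation matrix as the ratio of squared correlations to squared correlations plus squared partial correlations. It is a minimal illustration, not the routine used in the study; the helper names and the 3 × 3 example matrix are hypothetical.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def kmo(corr):
    """Overall Kaiser-Meyer-Olkin index from a correlation matrix."""
    n = len(corr)
    inv = invert(corr)
    sum_r2 = 0.0  # squared zero-order correlations (off-diagonal)
    sum_p2 = 0.0  # squared partial correlations (off-diagonal)
    for i in range(n):
        for j in range(n):
            if i != j:
                partial = -inv[i][j] / (inv[i][i] * inv[j][j]) ** 0.5
                sum_r2 += corr[i][j] ** 2
                sum_p2 += partial ** 2
    return sum_r2 / (sum_r2 + sum_p2)

# Hypothetical correlation matrix: three items, all correlating .50
R = [[1.0, 0.5, 0.5],
     [0.5, 1.0, 0.5],
     [0.5, 0.5, 1.0]]
kmo_value = kmo(R)
print(round(kmo_value, 3))
```

The index approaches 1 when partial correlations are small relative to the zero-order correlations, which is the situation in which a common factor is plausible.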
A confirmatory factor analysis (CFA) was carried out with the second subsample (n = 223) to evaluate the factor structure obtained in the EFA (Model 1). A second model (Model 2) was estimated based on the reduced factor structure of three items (2, 6, and 7). For Model 2, the factor structure was tested while addressing potential overestimation of parameters by constraining the loading of item 2 to 1 as an anchor point, following methodological recommendations (Brown, 2015). Confirmatory factor analysis was conducted using multiple estimation methods to ensure the robustness of the findings. The robust maximum likelihood estimator (MLM) was first applied, providing robust standard errors and a Satorra–Bentler scaled chi-square. Additionally, the model was tested using WLSMV (weighted least squares mean and variance adjusted), an estimator specifically recommended for ordinal data. Although the items are ordinal, they were treated as approximately continuous given the five response categories and the minimal skewness and kurtosis. This approach is supported by prior research suggesting that robust ML methods can be appropriately applied to ordinal variables with five or more categories under these conditions (Li, 2016; Rhemtulla et al., 2012). Model fit was assessed using the comparative fit index (CFI), the Tucker–Lewis index (TLI), the root mean square error of approximation (RMSEA), and the standardized root mean square residual (SRMR), following recommendations by Brown (2015). Values above 0.90 for CFI and TLI, below 0.08 for SRMR, and between 0.05 and 0.08 for RMSEA indicate an acceptable fit.
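The fit criteria above can be expressed as a small check. The Python sketch below uses the standard unscaled formulas for RMSEA, CFI, and TLI; the study relied on a Satorra–Bentler scaled chi-square, so software output will differ somewhat. The chi-square values and the SRMR in the example are hypothetical, and treating any RMSEA up to 0.08 as acceptable is an interpretation of the stated range.

```python
import math

def fit_indices(chi2, df, chi2_base, df_base, n):
    """RMSEA, CFI, and TLI from model and baseline chi-square (unscaled formulas)."""
    excess_model = max(chi2 - df, 0.0)
    excess_base = max(chi2_base - df_base, 0.0)
    rmsea = math.sqrt(excess_model / (df * (n - 1)))
    denom = max(excess_model, excess_base)
    cfi = 1.0 - excess_model / denom if denom > 0 else 1.0
    ratio_base = chi2_base / df_base
    tli = (ratio_base - chi2 / df) / (ratio_base - 1.0)
    return {"CFI": cfi, "TLI": tli, "RMSEA": rmsea}

def acceptable_fit(cfi, tli, rmsea, srmr):
    """Apply the cutoffs stated in the text (RMSEA read as 'at most 0.08')."""
    return cfi > 0.90 and tli > 0.90 and rmsea <= 0.08 and srmr < 0.08

# Hypothetical values for a one-factor model with eight indicators:
# 36 sample moments - 16 free parameters = 20 df; baseline model has 28 df.
idx = fit_indices(chi2=40.0, df=20, chi2_base=800.0, df_base=28, n=223)
verdict = acceptable_fit(idx["CFI"], idx["TLI"], idx["RMSEA"], srmr=0.04)
print({k: round(v, 3) for k, v in idx.items()}, verdict)
```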
After analyzing the internal structure of the instrument, reliability was assessed using omega total, calculated from the same sample used for the exploratory factor analysis. Values above .70 are considered acceptable (Gadermann et al., 2012). Next, measurement invariance was assessed to determine whether the set of items measures the same construct across different groups and whether observed differences between groups are not due to biases in the items themselves (Oppong et al., 2023). This analysis was conducted using a multigroup CFA to evaluate the degree to which the instrument’s factor structure is invariant across the groups compared: sample origin, gender, and age. The grouping variables were operationalized as follows: sample origin was categorized into public sector (Guardia Civil) and private sector (telecommunications and civil works), while age was dichotomized based on the sample’s median (Mdn = 39), dividing participants into two groups: younger than 39 years and 39 years or older.
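For a one-factor model, omega can be written directly in terms of standardized loadings. The Python sketch below shows that congeneric formula; it is an assumption that this simplified form is adequate here, since omega total as computed by R packages may rest on a different factor solution. The loadings are hypothetical, and the sign flip illustrates why a reverse-worded item is aligned with the others before the coefficient is computed.

```python
def omega_total(loadings):
    """Omega for a one-factor model with standardized loadings."""
    common = sum(loadings) ** 2
    unique = sum(1.0 - lam ** 2 for lam in loadings)
    return common / (common + unique)

def reflect(loadings, reversed_items):
    """Flip the sign of loadings for reverse-worded items (0-based indices)."""
    return [-lam if i in reversed_items else lam for i, lam in enumerate(loadings)]

# Hypothetical loadings for eight items; the third item is reverse worded
hypothetical = [0.70, 0.70, -0.70, 0.70, 0.70, 0.70, 0.70, 0.70]
omega_raw = omega_total(hypothetical)
omega_aligned = omega_total(reflect(hypothetical, {2}))
print(round(omega_raw, 3), round(omega_aligned, 3))
```

Leaving the negative loading unaligned lowers the numerator and therefore understates reliability, which is why scoring direction matters for this coefficient.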
To assess invariance, four levels of measurement invariance were tested sequentially. Configural invariance: this level examines whether the same factor structure holds across groups without imposing additional constraints. This is achieved if the model fit is acceptable, and the factor loadings are significant across all groups. Metric invariance: this level evaluates whether factor loadings are equal across groups. This is tested by constraining the factor loadings to equality and comparing the fit with the configural model. Scalar invariance: at this level, the equality of item intercepts across groups is tested by further constraining the intercepts in addition to the factor loadings. The model fit is then compared to the metric invariance model. Strict invariance: this final level of invariance tests whether residual variances are equal across groups by imposing additional constraints on the residuals. The model fit is compared to the scalar invariance model. The criteria for evaluating invariance were based on the comparison of RMSEA and CFI indices, following Cheung and Rensvold’s (2002) guidelines, where ΔCFI ≤0.01 and ΔRMSEA ≤0.015 indicate invariance. Additionally, chi-square difference tests were performed to compare nested models, as recommended by Brown (2015), to further support the assessment of measurement invariance. To complement the multigroup CFA and provide additional insight into group-related effects on the latent construct, we conducted a Multiple Indicators Multiple Causes (MIMIC) model. This approach is particularly useful when the results of measurement invariance testing are inconclusive or only partially supported, as it allows for the inclusion of covariates to assess their influence on the latent factor without imposing the strict constraints of group-based models.
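The sequential decision rule can be sketched as follows. This Python example only compares fit indices between adjacent models using the ΔCFI and ΔRMSEA thresholds stated above; it does not fit any model, and the fit values are hypothetical. The small tolerance guards against floating-point rounding at the boundary.

```python
LEVELS = ["configural", "metric", "scalar", "strict"]

def invariance_steps(fits, max_dcfi=0.01, max_drmsea=0.015, tol=1e-9):
    """Compare each invariance level with the previous, less constrained one.

    fits maps a level name to a (CFI, RMSEA) pair.
    Returns a list of (level, delta_cfi, delta_rmsea, holds) tuples.
    """
    results = []
    for prev, cur in zip(LEVELS, LEVELS[1:]):
        delta_cfi = fits[prev][0] - fits[cur][0]      # drop in CFI
        delta_rmsea = fits[cur][1] - fits[prev][1]    # rise in RMSEA
        holds = delta_cfi <= max_dcfi + tol and delta_rmsea <= max_drmsea + tol
        results.append((cur, round(delta_cfi, 3), round(delta_rmsea, 3), holds))
    return results

# Hypothetical fit indices for one grouping variable
example_fits = {
    "configural": (0.975, 0.060),
    "metric": (0.972, 0.061),
    "scalar": (0.955, 0.075),
    "strict": (0.950, 0.078),
}
steps = invariance_steps(example_fits)
for row in steps:
    print(row)
```

In this hypothetical case metric invariance holds and scalar invariance does not, which is the kind of partial result for which a complementary MIMIC model is informative.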
Once the assumption of measurement invariance is assessed, a multiple regression analysis is conducted to explore the relationships between age, sex, and origin of the sample with the scores derived from the instrument based on the resulting factorial structure.
To examine the theoretical consistency of the construct in the context of occupational health, we included external variables associated with job strain and recovery. These included general mental health and job stress. While general health is a broader construct, it has been shown to reflect the accumulated fatigue and psychological strain resulting from work overload (e.g., Bridger & Brasher, 2011; Gommans et al., 2015). Predictive association analysis was performed on a specific subsample of public sector participants from the overall study (n = 201). These participants were involved in a controlled trial that included pretest and posttest measurements over a 4-week period.
All statistical analyses were conducted using R software (R Core Team, 2023) and the following packages: mvnTest (Royston, 1995), psych (Revelle, 2021a), EFA.MRFA (Timmerman & Lorenzo-Seva, 2011), psychTools (Revelle, 2021b), car (Fox & Weisberg, 2019), factoextra (Kassambara & Mundt, 2020), lavaan (Rosseel, 2012), semPlot (Epskamp, 2015), and semTools (Jorgensen et al., 2023).
Results
Exploratory Factor Analysis
The results of the PA, used to estimate the number of factors to retain, are shown in Figure 1. The abscissa shows the number of factors, and the ordinate shows the percentage of common variance explained. The similarity between the real and simulated data points to a unifactorial structure.

Figure 1. Results of the parallel analysis following the procedure of Timmerman and Lorenzo-Seva (Reference Timmerman and Lorenzo-Seva2011). The ordinate axis shows the percentage of variance explained, while the abscissa axis shows the number of factors. The convergence or similarity between the red (real data) and green (simulated data) lines indicates the presence of a factor.
The Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy showed satisfactory results, indicating the overall adequacy of the data for exploratory factor analysis (global KMO = .89). Additionally, the item-specific KMO values exceeded 0.80, confirming the adequacy of the items for the model (see Table 1). The same table shows the correlation matrix between items, their mean, and standard deviation. After confirming the dimensionality of the questionnaire through the EFA, the reliability of the total scale was evaluated. Corrected item-total correlations (HI: homogeneity indexes) for each item showed values above .72, indicating that all items contributed meaningfully to the internal consistency of the questionnaire. This is further supported by the reliability coefficients for the total scale, with Omega (ω = .89), demonstrating adequate levels of internal consistency, with values equal to or above .70 (Nunnally & Bernstein, Reference Nunnally and Bernstein1994).
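As an illustration of how an omega coefficient relates to the factor loadings of a one-factor model, the following minimal sketch computes omega from standardized loadings. It assumes a congeneric single-factor model with uncorrelated residuals and items already scored in the same direction. The loadings are hypothetical, and the function name `mcdonald_omega` is invented for this example. The coefficient reported above was obtained in R, and the exact routine is not specified in the text.

```python
def mcdonald_omega(loadings):
    """Omega for a one-factor model, from standardized loadings.

    omega = (sum of loadings)^2 / ((sum of loadings)^2 + sum of residual variances),
    where each standardized residual variance is 1 - loading^2.
    """
    common = sum(loadings) ** 2
    unique = sum(1.0 - loading ** 2 for loading in loadings)
    return common / (common + unique)


# Hypothetical standardized loadings for eight items (not the values in Table 2).
example_loadings = [0.80, 0.75, 0.40, 0.70, 0.65, 0.72, 0.68, 0.60]
omega = mcdonald_omega(example_loadings)
print(round(omega, 2))
```

A reverse-worded item with a negative loading would need to be reverse-scored first. Otherwise its sign would reduce the summed loadings and understate the coefficient.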
Table 2 (see next) presents the factor loadings obtained for the EFA. Notably, Item 3 displayed a negative factor loading, indicating a reversed relationship with the latent factor. This is consistent with its reverse wording, designed to reduce potential response biases (Podsakoff et al., Reference Podsakoff, MacKenzie, Lee and Podsakoff2003). While this approach may slightly lower the overall reliability estimates, it does not compromise the validity of the instrument or the adequacy of the model. As can be seen, all items have loadings between 0.37 and 0.83 in absolute value, above the minimum retention criterion (>.30) (Lloret-Segura et al., Reference Lloret-Segura, Ferreres-Traver, Hernández-Baeza and Tomás-Marco2014). The single extracted factor explains 47% of the variance. Therefore, the interpretability of the factorial solution supports provisionally adopting a one-factor structure.
Confirmatory Factor Analysis
Continuing with the split-sample validation procedure, a CFA was carried out on the second subsample (N = 223) to test the hypothesized one-dimensional factorial structure obtained in the EFA (Model 1). Figure 2 shows the proposed model with the standardized estimated parameters (see also Table 1).

Figure 2. The standardized parameters of the model, the covariance parameter between the latent variables, and the standard errors.
The RMSEA value of 0.08 falls at the threshold of acceptable fit. The 90% confidence interval for the RMSEA ranged from 0.05 to 0.11, suggesting that while the lower bound indicates a good fit, the upper bound reflects some degree of misfit. This result should be interpreted in conjunction with other fit indices (CFI = 0.94, TLI = 0.91, SRMR = 0.041), which collectively suggest that the model provides a reasonable representation of the data.
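For readers less familiar with the index, the sketch below shows how an RMSEA point estimate follows from the model chi-square, its degrees of freedom, and the sample size. Several assumptions apply. The chi-square value is hypothetical, chosen only so that the result lands near the reported 0.08. The degrees of freedom (20) are those of a one-factor model with eight indicators. The formula uses N − 1 in the denominator, whereas some software uses N. The function name `rmsea_point_estimate` is invented for this example.

```python
import math


def rmsea_point_estimate(chi_square, df, n):
    """RMSEA = sqrt(max(chi_square - df, 0) / (df * (n - 1)))."""
    excess = max(chi_square - df, 0.0)  # misfit beyond what df alone would produce
    return math.sqrt(excess / (df * (n - 1)))


# One factor, eight indicators: 36 observed variances/covariances minus
# 16 free parameters (8 loadings + 8 residual variances) leaves 20 df.
df_one_factor = 8 * 9 // 2 - 16

# Hypothetical chi-square, NOT taken from Table 3.
rmsea = rmsea_point_estimate(chi_square=48.4, df=df_one_factor, n=223)
print(round(rmsea, 3))
```

Because the numerator depends on the excess of the chi-square over its degrees of freedom, models with few degrees of freedom and modest samples tend to produce wide RMSEA confidence intervals. This is one reason to read the index alongside CFI, TLI, and SRMR.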
To ensure the robustness of the findings, we also conducted the CFA using the WLSMV estimator (weighted least squares mean and variance adjusted), recommended for ordinal data. The WLSMV analysis yielded similar results, with CFI = 0.95, TLI = 0.93, RMSEA = 0.08, and SRMR = 0.044. These consistent findings across estimation methods confirm the stability of the unidimensional structure of the scale.
In addition, a reduced single-factor model based on items 2, 6, and 7 was tested to explore whether a shorter version of the scale could maintain acceptable psychometric properties. However, this model showed a poorer fit, with only one acceptable index (CFI = 0.90), while other indicators, such as TLI (0.71) and RMSEA (0.29), fell outside the recommended thresholds (see Table 3).
Table 3. Evaluation of the goodness of fit of Models 1 and 2

* p < .05.
Analysis of Measurement Invariance
Measurement invariance was evaluated using multiple-group confirmatory factor analysis (MG-CFA) across the variables sample origin, sex, and age. Configural invariance was supported, indicating that the overall factor structure is consistent across groups. However, metric, scalar, and strict invariance received only partial support: some chi-square difference tests were significant, and ΔCFI values exceeded the recommended threshold of 0.01 in specific comparisons, particularly for sample origin and sex (see Table 4).
Table 4. Evaluation of measurement invariance by estimating the configural (Model 0), metric (Model 1), scalar (Model 2), and strict invariance (Model 3) models through the variables sample origin, age, and sex

Note: Values.
* Model fit is statistically significant at p < .01.
a Dichotomous variable split at the median (Mdn = 39). Group 0: values below the median (< 39); Group 1: values above the median (> 39).
The MIMIC model showed an acceptable fit: CFI = 0.883, RMSEA = 0.109, 90% CI (0.095, 0.124), and SRMR = 0.067. These results suggest that the model adequately captures the relationships between covariates and the latent variable, though the RMSEA indicates some degree of misfit.
Once metric invariance had been assessed, the predictive capacity of sex, age, and sample origin was analyzed. For this purpose, a multiple regression model was estimated with these variables as predictors and the total score on the instrument as the dependent variable. The results show statistically significant effects for age and group (sample origin): older age predicts lower scores, whereas belonging to the group of workers from private companies predicts higher scores on the instrument (see Table 5).
Table 5. Regression results using the total score of the instrument as a criterion

Note: A significant b weight indicates that the semipartial correlation is also significant. b represents unstandardized regression weights. sr2 represents the semipartial correlation squared. LL and UL indicate a confidence interval’s lower and upper limits, respectively.
**p < .01.
Validity Based on the Relationship With Other Variables
Table 6 shows the results of the analysis of validity based on the relationship with other variables. The correlation between the need for recovery and work stress was positive and statistically significant (r = .61, p < .01, 95% CI [0.55, 0.66]), which corresponds to a large effect according to Cohen’s (Reference Cohen1988) criteria. This result is theoretically consistent, as greater work stress during the day is expected to lead to more difficulty recovering at the end of the workday, and therefore to a greater need for recovery. In contrast, the association between the need for recovery and general health was negative (r = −.32, p < .01, 95% CI [−0.40, −0.23]), reflecting a moderate effect size. This result supports the idea that higher levels of recovery need are linked to poorer perceived general health.
Table 6. Means, standard deviations, and correlations with confidence intervals

Note: M and SD are used to represent mean and standard deviation, respectively. Values in square brackets indicate the 95% confidence interval for each correlation. The confidence interval is a plausible range of population correlations that could have caused the sample correlation (Cumming, Reference Cumming2014).
**p < .01.
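The confidence intervals and effect-size labels used in this section can be illustrated with a short sketch. It assumes the intervals come from the Fisher z transformation, which the text does not state explicitly, and it uses a hypothetical sample size. The function names `fisher_ci` and `cohen_label` are invented for this example. The cutoffs of .10, .30, and .50 are Cohen's (1988) conventional benchmarks for correlations.

```python
import math


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a Pearson correlation via the Fisher z transformation."""
    z = math.atanh(r)               # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)     # standard error on the z scale
    lower, upper = z - z_crit * se, z + z_crit * se
    return math.tanh(lower), math.tanh(upper)  # back-transform to the r scale


def cohen_label(r):
    """Label the magnitude of r using Cohen's (1988) conventional benchmarks."""
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"


# r values as reported in the text; n = 400 is a hypothetical sample size.
for r in (0.61, -0.32):
    lower, upper = fisher_ci(r, n=400)
    print(f"r = {r:+.2f}, 95% CI [{lower:.2f}, {upper:.2f}], {cohen_label(r)}")
```

Because the interval is built on the z scale and then back-transformed, it is not symmetric around r, and it narrows as the sample size grows.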
Additionally, the two external variables, work stress and general health, were negatively correlated (r = −.35, p < .01, 95% CI [−0.43, −0.27]), which is consistent with the previous findings. The fact that the NFR shows opposite correlations with these two variables (positive with stress and negative with health) is theoretically coherent, given the inverse relationship between perceived stress and general well-being.
Regarding affective states, the NFR was positively associated with negative affect (r = .63, p < .01, 95% CI [0.54, 0.71]), and negatively associated with positive affect (r = −.74, p < .01, 95% CI [−0.74, −0.59]). According to Cohen’s (Reference Cohen1988) benchmarks, the first represents a large effect size and the second a very large effect size. These results indicate that a greater need for recovery is related to higher levels of negative affect and lower levels of positive affect.
Together, these findings support Hypotheses H1 and H2, and provide validity evidence based on relationships with other theoretically related variables.
Discussion
This study delves into the methodological understanding of adapting psychometric instruments to specific cultural contexts, emphasizing the importance of considering local particularities in instrument validation.
The results of the adaptation and validation of the Danish NFR scale (Stevens et al., Reference Stevens, Crowley, Garde, Mortensen, Nygård and Holtermann2019) in the Spanish population allow us to conclude that the scale, in its full eight-item version after eliminating an item not applicable in our context, has adequate psychometric properties according to the usual indicators in the literature, confirming its one-dimensional structure. The factor analysis results support the predominantly unidimensional structure of the Spanish version of the Danish NFR, with acceptable data fit, underscoring the scale’s potential for assessing recovery needs in the Spanish work context. While the values obtained for internal consistency confirm the reliability of the scale, the findings regarding measurement invariance suggest that the instrument is stable across some, but not all, group comparisons. These results highlight the applicability of the scale while also pointing to areas for further exploration in future studies.
The findings of this study highlight significant differences among the analyzed subgroups and point to the need to include additional variables related to work and personal characteristics that could act as moderators or mediators. For example, factors such as workload, type of contract, or family responsibilities could more comprehensively explain the observed differences in NFR scores.
The results of this study highlight the practical relevance of the NFR scale in understanding how recovery needs relate to affective states. Specifically, higher levels of need for recovery were associated with higher negative affect and lower positive affect, with both associations reaching moderate to large effect sizes. Although no specific thresholds were defined in the hypotheses, the observed effect sizes (r = .51 and r = .71, respectively) support the theoretical consistency of these associations, in line with current recommendations for interpreting the strength of psychological relationships (Cohen, Reference Cohen1988). These findings, together with the validity evidence based on relations with other variables, reinforce the conceptual value of the NFR scale as a meaningful indicator of both psychological distress and well-being.
However, unlike the results obtained by Stevens et al. (Reference Stevens, Crowley, Garde, Mortensen, Nygård and Holtermann2019, Reference Stevens, Crowley, Rasmussen, Hallman, Mortensen, Nygård and Holtermann2020), the short version of three items does not present a better fit compared to the full version in its Spanish adaptation. The comparison of the full single-factor model with the reduced three-item version shows a better fit for the complete model, reflecting a more accurate and detailed capture of the recovery need construct and underscoring the value of a rigorous methodological approach to psychometric evaluation in the workplace. These results allow us to conclude that, provisionally, the underlying construct is more adequately captured by the full version in its Spanish adaptation. Although the cited authors (Stevens et al., Reference Stevens, Crowley, Garde, Mortensen, Nygård and Holtermann2019) suggest that even a single-item version would be able to represent the need for recovery construct, our results suggest caution in this direction.
The trend in psychology and related sciences to develop increasingly shorter scales is well known (e.g., Rammstedt & Beierlein, Reference Rammstedt and Beierlein2014; Smith et al., Reference Smith, McCarthy and Anderson2000), and especially in the field of work and organizations (e.g., van Veldhoven et al., Reference van Veldhoven, Prins, Laken and Dijkstra2015), based on criteria of convenience in application, on reducing response biases caused by participant fatigue or loss of attention caused by long questionnaires, and on obtaining psychometric properties equivalent and sometimes even superior to full versions. However, this tendency cannot lead to an oversimplification in the measurement of constructs that, though one-dimensional, require the formulation of a sufficient number of items to capture their real meaning.
In conclusion, the Spanish adaptation of the Danish NFR scale, in its full eight-item version, provides evidence of reliability and validity for the assessment of the need for recovery in the evaluated sample and context and represents a helpful complement for the measurement of stress and work recovery experiences, for which the Spanish adaptation (Sanz-Vergel et al., Reference Sanz-Vergel, Sebastián, Rodríguez-Muñoz, Garrosa, Moreno-Jiménez and Sonnentag2010) of the questionnaire developed by Sonnentag and Fritz (Reference Sonnentag and Fritz2007) was already available. The daily depletion of personal resources at work, manifested by the level of need for recovery in the last part of the workday and after its end (Bennett et al., Reference Bennett, Gabriel, Calderwood, Dahling and Trougakos2016; Wentz et al., Reference Wentz, Gyllensten, Sluiter and Hagberg2020), significantly reduces the resources available for investment in activities that facilitate recovery, at home or in other social or leisure contexts (Sonnentag et al., Reference Sonnentag, Cheng and Parker2022; Ten Brummelhuis & Bakker, Reference Ten Brummelhuis and Bakker2012), and leads to the aforementioned recovery paradox (Sonnentag, Reference Sonnentag2018). The accurate identification of difficulties in achieving effective post-workday recovery through the measurement of the need for recovery with the instrument validated in Spanish in this study may contribute to improving such recovery processes (Chawla et al., Reference Chawla, MacGowan, Gabriel and Podsakoff2020; de Croon et al., Reference de Croon, Sluiter and Frings-Dresen2006) and thus to increasing well-being and health associated with work and quality of life in general.
Limitations, Future Research, and Practical Implications
Despite its strengths, this study is not without limitations, several of which point to valuable directions for future research. First, certain contextual and demographic variables—such as number and age of children, working hours, contract type, salary, and commuting distance—were not considered in this study. These factors could influence the perception of recovery needs and should be incorporated into future studies to better contextualize individual differences.
Second, the use of a Likert scale with extreme anchors (“never” to “always”) may have limited the variability of responses. Exploring alternative response formats (e.g., “rarely” to “very frequently”) may enhance the scale’s sensitivity and discriminatory power across occupational settings.
In addition, while the NFR scale is conceptually distinct from chronic stress constructs such as burnout, some overlap—particularly with emotional exhaustion and depersonalization—may exist. The absence of these variables in this study limits the possibility of examining how the NFR construct can be empirically distinguished from related constructs such as burnout. Future research should include such measures to better delineate the nomological network of the construct.
The study also employed a single-item indicator to assess perceived work stress. Although this approach offers simplicity and has empirical precedent, it does not permit the estimation of internal consistency and may attenuate observed associations due to measurement error. Multi-item validated scales are recommended in future applications. Finally, while the test–retest reliability of the NFR scale showed moderate stability over time, the absence of an analysis of potential changes in participants between measurements makes it difficult to fully interpret these results. Future studies should aim to clarify this issue by controlling for possible changes in external or psychological factors between administrations.
In terms of practical implications, as already mentioned, the need for recovery is considered an initial or early stage in the long-term strain process (Sonnentag & Zijlstra, Reference Sonnentag and Zijlstra2006). Therefore, preventing or detecting the need for recovery at the earliest possible stage can facilitate the recovery process, particularly within work (Chan et al., Reference Chan, Howard, Eva and Tse2022; Gommans et al., Reference Gommans, Jansen, Stynen, de Grip and Kant2015), and reduce work-related stress and its associated adverse consequences (van Veldhoven, Reference van Veldhoven, Houdmont and Leka2008; Sonnentag et al., Reference Sonnentag, Kuttler and Fritz2010). Designing organizational interventions that can detect the need for recovery early on and eliminate the factors that cause it can, in turn, reduce long-term fatigue, the effects of psychological distress, and burnout (Sinval et al., Reference Sinval, Veldhoven, Oksanen, Azevedo, Atallah, Melnik and Marôco2021). In summary, these interventions aim to reduce or eliminate factors that contribute to the need for recovery, thereby enhancing well-being and health at work (Wentz et al., Reference Wentz, Gyllensten, Sluiter and Hagberg2020). They can serve as a foundation for the design of restorative work environments (Hilbert et al., Reference Hilbert, Binnewies and Berkemeyer2024) and contribute to a healthier and more sustainable work design (Parker & Knight, Reference Parker and Knight2025).
Data availability statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Acknowledgments
We thank the Servicio de Psicología of the Guardia Civil for facilitating this research by granting access to its facilities and the sample of professionals who participated.
Author contribution
F.A.B.-J.: conceptualization, formal analysis, methodology, writing—original draft, writing—review and editing. C.D.-S.: data curation, investigation, funding acquisition. R.R.-Í.: data curation, investigation, project administration. M.-Á.S.G.: conceptualization, formal analysis, methodology, funding acquisition, project administration, supervision. C.-M.A.: conceptualization, writing—original draft, funding acquisition, supervision, writing—review and editing.
Funding statement
This research was carried out within the framework of the research project funded by the Ministerio de Ciencia e Innovación/AEI/10.13039/501100011033, Grant PID2019-110490RB-I00.
Competing interests
The authors declare none.



