1. Introduction
The effects of auditory stimuli on cognitive activities have been demonstrated by numerous studies and are widely recognized. Specifically, phenomena such as the “irrelevant speech effect” (Salamé & Baddeley, 1982), in which cognitive task performance is hindered by unrelated vocal sounds, and the “Mozart effect,” in which listening to specific music is associated with improved spatial cognitive abilities, illustrate the various ways auditory stimuli can affect human activities. Regarding background music (hereafter, BGM) as a form of auditory stimulation, numerous studies have focused on diverse elements such as musical tone, melody, and tempo. These studies have demonstrated that the impact on cognitive performance varies according to the BGM and task type, with reported effects ranging from improved concentration to relaxation (Hallam et al., 2002). Based on these insights, the mechanisms by which BGM influences work efficiency and learning have increasingly drawn attention in recent years. BGM elements can be classified into musical elements (such as melody) and linguistic elements (such as lyrics).
The impact on cognitive activities differs depending on these elements, and BGM may enhance creativity by evoking positive emotions (Ritter & Ferguson, 2017). Furthermore, Martin et al. (1988) considered not only musical elements but also the impact of linguistic elements in BGM. Their experiment showed significantly lower text comprehension performance when participants listened to music with lyrics, suggesting that the lyrics had a distracting effect. Different cognitive task modes, such as calculation, memory, and reading comprehension, require different abilities. Correspondingly, the impact of BGM has been reported to vary across these modes, including creative task performance (Adaman & Blaney, 1995; Ritter & Ferguson, 2017). Adaman and Blaney (1995) conducted an experiment in which participants performed creative tasks while listening to music that induced elevated, depressive, or neutral moods. The elevated mood group demonstrated significantly higher creativity than the depressive group. Additionally, Arima and Hashimoto (2021) noted the possibility that BGM evoking positive emotions could enhance creativity. Multiple studies have thus demonstrated the effectiveness of BGM in promoting positive emotions that enhance creativity. In creativity studies, however, research focusing specifically on linguistic elements rather than musical elements remains insufficient. This study explores how linguistic elements in BGM affect creativity, with the aim of contributing to the design of sound environments that enhance creativity.
2. Research methodology
2.1. Experimental method
This study employed an experimental approach to compare creative task performance under three BGM conditions: voiced with meaning (VF), voiced without meaning (VL), and non-voiced (NV). VF presented the original song with lyrics, NV retained only the melody by removing the lyrics, and VL replaced all lyrics in the Vocaloid BGM with the syllable “la” to render them meaningless. A Vocaloid song was used for the BGM to exclude the effects of human vocal variations. Vocaloid is a singing voice synthesis technology that enables users to generate singing voices by inputting lyrics and melodies, without requiring human vocalists (Kenmochi & Ohshita, 2007).
The experiment involved 30 participants with normal hearing, aged between 20 and 24 years. All participants were design students familiar with idea generation tasks, and participation was limited to native speakers of Japanese to ensure a common understanding of the meaning of the BGM lyrics. To minimize the influence of musical style, the authors used a moderate Vocaloid song, “Lonely Universe”. For the voiced conditions, the HARUKA voice from VOCALOID6 (YAMAHA, 2025) was used. This voice is very close to human vocals and is expected to be easy to listen to even for those unaccustomed to Vocaloid songs. In both voiced conditions, the BGM voices were adjusted to maintain the original pitch and rhythm. Participants wore headphones to listen to the BGM. The volume was set at approximately 60 dBA to make it as close to optimal listening levels as possible (Hamamura et al., 2014).
Participants were asked to listen to the BGM in all three conditions at least twice every day during the two weeks prior to the experiment for familiarization. During the experiment, they were asked to work individually on an alternate uses task (AUT) (Guilford, 1967) under the three experimental conditions, VF, VL, and NV. In the AUT, participants generate as many potential uses as possible for a given task theme (e.g., using a “brick” as a step). This experiment used rope, brick, and newspaper as task themes. Participants first completed pre-experiment questionnaires and then performed a one-minute practice AUT with “spoon” as the theme. After the practice session, participants put on headphones and proceeded to the experiment. They performed the AUT three times, once under each BGM condition. Each trial lasted 7 minutes and 30 seconds, corresponding to the length of the song “Lonely Universe” played twice. Questionnaires and interviews followed each task (Figure 1). To control for order and combination effects, both the orders and the combinations of AUT task themes and BGM conditions were counterbalanced.
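One way to picture the counterbalancing described above is the following minimal Python sketch. It is an illustration under stated assumptions, not the authors' actual assignment procedure: the function name `assign_orders` and the rotation scheme are hypothetical, and the sketch only ensures that each presentation order of the BGM conditions and of the task themes is used equally often across 30 participants.

```python
from collections import Counter
from itertools import permutations

CONDITIONS = ("VF", "VL", "NV")          # BGM conditions from Section 2.1
THEMES = ("rope", "brick", "newspaper")  # AUT task themes


def assign_orders(n_participants=30):
    """Return one sequence of three (condition, theme) trials per participant.

    Hypothetical scheme: condition orders cycle through all 6 permutations,
    and theme orders are shifted by one permutation for every block of 6
    participants, so that (for up to 36 participants) no two participants
    receive the same pairing of condition order and theme order.
    """
    cond_orders = list(permutations(CONDITIONS))
    theme_orders = list(permutations(THEMES))
    schedule = []
    for i in range(n_participants):
        c = i % len(cond_orders)
        t = (i // len(cond_orders) + c) % len(theme_orders)
        schedule.append(list(zip(cond_orders[c], theme_orders[t])))
    return schedule


schedule = assign_orders()
# How often each condition order and each theme order is used.
cond_order_counts = Counter(tuple(c for c, _ in trials) for trials in schedule)
theme_order_counts = Counter(tuple(t for _, t in trials) for trials in schedule)
print(cond_order_counts)
print(theme_order_counts)
```

This scheme balances the orders only; whether each individual condition-theme combination occurs equally often is not guaranteed and would need to be checked separately.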

Figure 1. An example of the experimental procedure
2.2. Subjective questionnaires and interview
Before the experiment, the participants answered questionnaires on their personal characteristics, focusing on mind-wandering tendencies, music listening habits, and multitasking habits, which are understood to be associated with creativity and BGM. Mind-wandering tendencies were measured with the Mind Wandering Questionnaire (MWQ) for Japanese (Kajimura & Nomura, 2016). Music listening habits and multitasking habits were scored on a 5-point scale (1: never, 2: 1-2 times/week, 3: 3-4 times/week, 4: 5-6 times/week, 5: daily). The MWQ was scored on a 6-point scale (1: never, 2: rarely, 3: somewhat rare, 4: sometimes, 5: frequently, 6: always).
- Music listening/Multitasking habits
  - How often do you normally listen to music? (Music listening habits)
  - How often do you work while listening to music? (Multitasking habits)
- Mind Wandering Questionnaire (MWQ)
  - How often do you find yourself thinking about something else while listening to someone speak?
  - How often do you think about other things during work or class?
  - How often do you not pay sufficient attention to tasks?
  - How difficult is it to continue concentrating on simple or monotonous tasks?
  - How often do you find yourself thinking about other things while reading documents or books?
Immediately after each AUT task, participants were asked to complete questionnaires about task concentration and music mood regulation (Hewston et al., 2008), scored on a 7-point scale (from 0: not applicable to 6: highly applicable).
- Task concentration
  - I was able to concentrate on the task (concentration level)
  - I was distracted by the BGM (concentration disruption level)
- Music mood regulation
  - Became enjoyable
  - Provided calmness and relaxation
  - Allowed mood switching and refreshment
  - Generated energy and motivation
Following the above questionnaires, participants were also interviewed to answer the following qualitative questions. Based on the participant’s answer, the interviewer continued with further detailed follow-up questions.
- How did you come up with each response?
- To what extent was the BGM noticeable?
- Were there any changes in concentration over the course of the experiment?
2.3. Analysis method
Creativity was assessed in terms of fluency and originality (Guilford, 1950). Fluency was evaluated by the number of responses given during the AUT (Guilford, 1967), and the Consensual Assessment Technique (CAT) was adopted for originality evaluation. Three evaluators scored each response's originality on a 5-point scale (1 to 5), resulting in a total score of 3 to 15 points per response. Evaluators were graduate students in design who were unaffiliated with the experiment. The top-scoring method (Benedek et al., 2013) was used for originality analysis to ensure the independence of fluency and originality. This method analyzes only the top-scoring responses of each trial. Because extracting more than 5 responses could compromise score validity for tasks of over 5 minutes, the top 4 responses from each trial were extracted for analysis. Shapiro-Wilk tests were conducted for both fluency and originality, but normality could not be confirmed (p<0.05). Therefore, non-parametric tests were adopted for subsequent analyses. For significance tests between conditions, Holm's multiple comparison test was used for paired data, and the Steel-Dwass multiple comparison test was used for unpaired data. By comparing fluency and originality across the BGM conditions using these multiple comparison methods, the authors examined the impact of Vocaloid BGM voices on creativity from two perspectives: voiced/non-voiced and voiced with/without meaning. Subjective questionnaires and interviews were used to consider personal characteristics (music listening and multitasking habits, mind-wandering tendencies) and the influences of BGM (task concentration, mood regulation, and others).
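The scoring and correction steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, the ratings and p-values are invented example values, averaging the top four totals is an assumption (the text does not state whether they were averaged or summed), and `holm_adjust` shows only the standard Holm step-down adjustment of a set of p-values, not the underlying pairwise tests.

```python
def score_trial(ratings, top_n=4):
    """Return (fluency, originality) for one AUT trial.

    ratings: one tuple of three evaluator scores (1 to 5) per response.
    Fluency is the number of responses; originality is the mean of the
    top_n summed scores (averaging is an assumption of this sketch).
    """
    totals = [sum(r) for r in ratings]  # each total lies between 3 and 15
    fluency = len(totals)
    top = sorted(totals, reverse=True)[:top_n]
    originality = sum(top) / len(top) if top else 0.0
    return fluency, originality


def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted


# Invented example: six responses, each rated by three evaluators.
example_ratings = [(1, 2, 1), (5, 4, 5), (3, 3, 3), (2, 2, 2), (4, 4, 3), (1, 1, 1)]
fluency, originality = score_trial(example_ratings)
print(fluency, originality)

# Invented raw p-values for three pairwise comparisons (VF-VL, VF-NV, VL-NV).
print(holm_adjust([0.01, 0.04, 0.03]))
```

Because only the four highest totals enter the originality score, a participant who gives many low-rated responses is not penalized relative to one who gives few, which is the sense in which the top-scoring method separates originality from fluency.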
3. Results
3.1. Overall effects of Vocaloid BGM
The overall trends in fluency and originality across BGM conditions are shown in Figure 2. For fluency, Holm’s multiple comparison tests revealed no significant differences between conditions. For originality, Steel-Dwass multiple comparison tests indicated that originality tended to be greater under NV than under VF, although this difference did not reach the 10% level (p=0.14); a larger sample size might reveal a marginally significant trend.

Figure 2. Overall effects of Vocaloid BGM on creativity (left: fluency, right: originality)
As for the cognitive effects of BGM on concentration, while the mean concentration values increased in order from VF, VL, to NV, the mean concentration disruption values increased from NV, VL, to VF (Figure 3). Holm's multiple comparison test revealed a marginally significant difference at the 10% level for concentration between VF and NV (p=0.06) and a significant difference at the 5% level for concentration disruption between VF and NV (p<0.001), between VL and NV (p=0.03) and between VF and VL (p=0.05).

Figure 3. Effects of BGM on concentration (left) and concentration disruption (right)
The affective effects of BGM on music mood regulation were revealed with Holm's multiple comparison tests performed between BGM conditions. Statistically significant differences were found at the 5% level only in “provides calmness and relaxation” between VF and NV and between VF and VL (p=0.01).
3.2. Influence of temporal phases
The above fluency and originality results were divided into the first and second halves of the experiment, and box plots for each were made (Figure 4), with darker colors representing the first half and lighter colors representing the second half.

Figure 4. Temporal phases and creativity (left: fluency, right: originality)
Fluency was lower in the latter half than in the former half across all conditions. Holm’s multiple comparison tests were conducted between BGM conditions in both the first and second halves, and no significant differences were observed in either. Originality was greater in the latter half than in the former half across all conditions. Steel-Dwass multiple comparison tests were performed between BGM conditions in both the first and second halves. In the first half, no significant differences were found between the BGM conditions. In the latter half, a marginally significant difference was observed at the 10% level (p=0.07) between VF and VL. Additionally, a tendency that did not reach the 10% level (p=0.15) was found between VF and NV.
3.3. Influences of personal characteristics
3.3.1. Influence of participant’s habits
The relationships between music listening habits, fluency, and originality are shown in Figure 5. A small random jitter was added to each point so that overlapping points remain visible. A Spearman correlation analysis between music listening habits and fluency revealed a weak negative correlation (r=-0.27, p=0.01): participants who listened to music more frequently tended to have lower fluency. When analyzed by condition, a statistically significant weak negative correlation was observed only in NV (r=-0.23, p=0.22 for VF; r=-0.18, p=0.35 for VL; and r=-0.37, p=0.04 for NV). The Spearman correlation analysis between music listening habits and originality showed no significant correlation (r=-0.08, p=0.14). No significant correlation between multitasking habits and creativity was found under any condition.
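The rank correlation used here can be illustrated with a short Python sketch. It assumes the usual definition of Spearman's coefficient as the Pearson correlation of average ranks, which handles tied questionnaire scores; the function names and the example data are hypothetical and do not reproduce the study's values.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share one average rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks.

    Raises ZeroDivisionError if either variable is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Invented example: listening-habit scores (1 to 5) and fluency counts.
habits = [5, 5, 4, 3, 2, 1]
fluency_counts = [8, 10, 11, 12, 15, 14]
print(round(spearman(habits, fluency_counts), 2))
```

Working on ranks rather than raw values suits the 5-point habit scale, which is ordinal and produces many ties.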

Figure 5. Music listening habits and creativity (left: fluency, right: originality)
3.3.2. Influence of participant’s mind-wandering tendencies
The results of the Spearman correlation coefficient analysis of relationships between mind-wandering tendency, fluency, and originality are shown in Figure 6. In fluency, participants with greater mind-wandering tendencies trended toward greater fluency with a statistically significant weak positive correlation (r=0.29, p<0.01). A significant positive correlation was observed at the 5% level for VF, and a weak positive correlation was found at the 10% level for VL (r=0.41, p=0.02 for VF; r=0.35, p=0.06 for VL; and r=0.09, p=0.65 for NV). For originality, participants with greater mind-wandering tendencies tended to exhibit more originality with a statistically significant weak positive correlation (r=0.18, p<0.001). A weak positive correlation was statistically significant at the 5% level for NV, and at the 10% level for VF and VL (r=0.16, p=0.07 for VF; r=0.17, p=0.06 for VL; and r=0.23, p=0.01 for NV).

Figure 6. Mind-wandering tendency and creativity (left: fluency, right: originality)
4. Discussions
4.1. Effects of auditory stimuli
Background noise disrupts auditory selective attention and impairs performance on cognitive tasks, but the degree to which it is disruptive depends on the task and the individual (Hao & Conway, 2022). Previous studies have predominantly focused on the interference effects of voice. However, multiple studies in the creativity domain have also mentioned that a voice can have facilitating effects. For instance, Mehta et al. (2012) investigated the impact of environmental noise on creative task performance. When creative tasks were conducted under three noise conditions of varying volumes (high, medium, and low noise), performance was significantly higher under the medium noise condition. They proposed that moderate noise increases thought abstraction, which contributes to improved performance in creative tasks. This is based on the theory that moderate noise distracts from the task to an appropriate degree, moving thought away from concrete thinking and enabling more abstract and free thinking. Additionally, while similar effects occurred in high-noise conditions, the interference effect of reduced information processing capacity may have counteracted them. In this study, the authors proceed with the discussion considering the potential emergence of both interference and promotion effects.
As shown in Figure 2, the tendency observed between VF and NV for originality suggests, as mentioned in the research of Martin et al. (1988), the potential influence of a meaningful voice on creative tasks. Martin et al. suggest that auditory stimuli are prioritized in human processing and that, as a result, the attention allocated to the voice may interfere with information processing for the task. Such influences can be categorized into cognitive effects that impact information processing and affective effects that evoke emotions. Therefore, the following discussion considers the possibility of both effects, or potentially only one of them.
4.1.1. Cognitive effects
Concentration disruption was greatest in VF, followed by VL, and least in NV (Figure 3). This indicates that both the presence of a voice and its meaningfulness disrupt participants' concentration. The test results show that the degree of concentration disruption is greater for a meaningful voice than for a meaningless voice. This suggests that VF disrupted concentration significantly more than NV, potentially contributing to the decrease in originality observed in Figure 2. While multiple hypotheses exist regarding the mechanism by which auditory stimuli cause task interference, Mehta et al. (2012) proposed that reduced information processing capacity due to auditory stimuli leads to a decline in creativity. In the present experiment, it can be considered that some processing resources were automatically allocated to the meaningful voice, thereby reducing concentration on the creative task and causing an interference effect. Their study also suggests another line of discussion: attention diverted by moderate noise may promote creativity by increasing abstract thinking (Mehta et al., 2012; Oppenheimer, 2008). Some participants in this study answered that their attention was more diverted by the meaningful voice, yet their originality scores decreased. Consequently, there are two possibilities: one is that abstract thinking was promoted by the diverted attention, but the effect was canceled out by the interference effect (similar to the high-noise condition observed in their experiment); the other is that the results differed because of differences in tasks and auditory stimuli.
4.1.2. Affective effects
NV and VL were rated as more conducive to a sense of calmness and relaxation than VF. This finding suggests the possibility that such affective states influence originality. Numerous studies have reported that positive emotions enhance creativity. For example, in an experiment conducted by Tang et al. (2023), participants were assigned a total of six trials of design tasks under three kinds of music conditions (positive music, negative music, and non-music). The results showed that participants produced more ideas in the positive music condition and that their performance on divergent thinking measures was significantly greater. Divergent thinking, the ability to generate multiple solutions or ideas for a given problem, is considered particularly important in creative tasks. They interpreted these results as an indication that positive emotions may broaden attentional focus, enabling more extensive information processing unconstrained by conventional patterns. It is conceivable that the calmness and relaxation induced by the non-voiced BGM in this experiment may have similarly contributed to higher originality scores.
4.2. Influences on fluency and originality
Average originality scores were highest under NV, followed by VL and then VF. This sequence mirrors the levels of concentration disruption associated with each condition, as well as the degree to which each condition was rated as “calming and relaxing.” In contrast, fluency scores showed no significant differences between conditions. There are two suggested reasons for these differences between fluency and originality scores: differences in temporal phases for fluency and originality, and differences in the cognitive modes underlying fluency and originality.
4.2.1. Influence of temporal phases
Figure 4 indicates that no significant differences between any of the conditions were observed in fluency or originality during the first half of the experiment, whereas significant or near-significant differences in originality emerged during the second half. This suggests that the disparity between fluency and originality results likely arose in the latter half of the experiment. What distinguishes the first and second halves of the experiment? Oral interviews with participants revealed comments such as, “At first, I didn’t notice the lyrics, but as I struggled to come up with responses in the latter half, they started to catch my attention.” This indicates that, as the experiment progressed, the participants' attention began to shift away from the task and toward the auditory stimuli. This shift suggests that the influence of auditory conditions became more pronounced in the second half, with more cognitive resources being allocated to processing the sound. The results suggest that, as participants progressed into the latter half of the experiment, they generated fewer ideas overall but provided responses with more originality, often describing unconventional approaches. When asked how they came up with their answers during the interviews, participants' responses in the first half often included phrases like, “Because I’ve used it that way before,” reflecting general, familiar approaches. In contrast, responses in the second half were more likely to involve unique approaches derived from previously generated ideas. These observations imply that eliciting highly original responses in the latter half of the experiment is crucial to enhancing originality. As the experiment progresses, participants appear to draw more on unique perspectives and ideas, which likely contribute to the assessment of originality. However, producing such creative responses requires deeper cognitive processing. 
In the critical latter half, cognitive resources may have been diverted toward processing auditory stimuli, particularly under the VF and VL, resulting in lower originality scores under these conditions. For fluency, Figure 4 indicates that the highest scores were observed in the early stages of the experiment. This suggests that the quantity of responses generated during the initial phase is a critical factor for fluency. The lack of significant differences between conditions during this phase may be attributable to the smaller influence of auditory stimuli in the first half of the experiment.
4.2.2. Influence of cognitive modes
When participants were asked during interviews how they came up with their responses, many mentioned reasons such as, “I’ve used it that way before,” or “I remembered someone else using it that way.” These responses were likely based on retrieving prior knowledge or memories rather than creating something new, and thus reflect convergent thinking. These types of responses are naturally included in the fluency scores. In contrast, originality scores are based on the top four most-original responses, which are more likely to reflect novel ideas created by the participants. These responses align more closely with divergent thinking tasks. In the experiment of Tang et al. (2023), positive emotions did not enhance performance on convergent thinking tasks but significantly improved performance on divergent thinking tasks. Similarly, in this study, the influence of positive emotions, such as the sense of calmness and relaxation, may have been more pronounced in the evaluation of originality, which primarily consists of responses that are closer to divergent thinking.
4.2.3. Influence of personal characteristics
A negative correlation was observed between music listening habits and fluency, whereas the correlation with originality was not significant (Figure 5). This suggests that participants less familiar with music performed better on the tasks, which may seem counterintuitive at first glance. One might expect that those less accustomed to music would be more easily distracted by BGM, thus experiencing greater concentration disruption. However, this result can be explained by focusing on the performance enhancement effect caused by the positive emotions evoked by BGM. Specifically, participants who are less accustomed to listening to music may be more strongly influenced by BGM and its vocal elements, and may therefore benefit more from the performance improvement effect of positive emotions. Moreover, only in the daily music listening group did NV induce significantly higher performance than VF. This might seem to contradict the previous interpretation. Possible explanations include the following. First, participants who listen to music daily might have adaptively processed BGM's influence, potentially showing a unique response to a meaningful voice. Alternatively, the daily music-listening group had approximately 1.5 times more participants than the non-daily group, which could simply have increased statistical detection power.
Both fluency and originality showed an overall positive correlation between Mind Wandering Questionnaire (MWQ) scores and creative task performance (Figure 6). This result is consistent with Yamaoka and Yukawa's (2016) experiment, in which the middle MWQ score group showed significantly more originality than the low-score group. They pointed out the relationship between mind wandering and cognitive disinhibition, arguing that “mind wandering is a state where attention deviates from the current task, and thoughts (information) automatically arise, which is similar to a state of cognitive disinhibition.” In this experiment, the strongest positive correlation between MWQ scores and fluency was observed under the meaningful voice condition (VF), which may indicate a synergistic effect between a meaningful voice and the cognitive disinhibition aspect of mind wandering. As MWQ scores increase, cognitive filters weaken, making it more difficult to filter information. When a meaningful voice, which is a complex information source, is introduced, the abstraction of thought may be synergistically heightened, subsequently increasing fluency. Consequently, participants with greater mind-wandering tendencies might have experienced particularly enhanced fluency, creating a wider performance gap from those with weaker tendencies and thus resulting in a positive correlation.
4.3. Integrated discussion
Previous studies have reported that neither a voice nor its meaningfulness significantly influenced task performance (Boyle & Coltheart, 1996). The primary difference between these studies and the current research lies in the cognitive modes of the assigned task. The alternate uses task (AUT) used in this study especially requires divergent thinking, whereas previous studies focused on tasks demanding convergent thinking. This difference in required cognitive modes might explain the varying results. From a cognitive impact perspective, concentration disruption seems to have a relatively consistent effect across cognitive modes. However, from an affective impact perspective, an alternative explanation might be that these emotional influences are specific to creative tasks that require divergent thinking.
Considering these factors, the significant differences and tendencies observed between VF and other conditions in the experiment's latter half are more likely reflective of affective influences rather than cognitive impacts. Another distinguishing factor is the type of auditory stimulus applied. Previous research predominantly compared conversation sounds or simple voice conditions with environmental noise or silence, examining the impact of linguistic elements. These studies often had entirely different acoustic characteristics across conditions, potentially introducing acoustic influences beyond linguistic elements. In contrast, this study used BGM with a consistent melodic foundation, providing better acoustic feature control.
These findings have practical implications: the desirable acoustic environment for enhancing originality during divergent cognitive tasks may differ according to MWQ characteristics. For instance, BGM voiced without meaning is recommended in general, while BGM voiced with meaning is recommended for people with high MWQ scores.
5. Conclusion
This study investigated the effects of linguistic elements in Vocaloid BGM on creativity from two perspectives: voiced/non-voiced and voiced meaningful/meaningless. The findings are summarized as follows:
- BGM with a voice, especially with meaning, may disrupt concentration and reduce creativity.
- BGM without a meaningful voice may enhance creativity by inducing calmness and relaxation.
- These effects were more pronounced in originality than in fluency, and became more evident as the tasks progressed.
- A negative correlation exists between daily music listening habits and creative task performance under BGM.
- A positive correlation exists between mind-wandering tendencies and creative task performance, with individuals inclined to mind-wandering potentially experiencing interactive creativity promotion effects from BGM with a meaningful voice.
These findings regarding the effects of linguistic elements in BGM on creativity could contribute to designing sound environments that enhance creativity, especially by tailoring the environments to the cognitive modes of tasks and to personal characteristics: BGM voiced without meaning is recommended in general, while the BGM voiced with meaning is recommended for people with high MWQ characteristics in order to enhance originality during divergent thinking, for instance.
Including non-Japanese participants or other types of Vocaloid songs may produce different effects, and other unexamined personal characteristics could influence BGM effects. A larger sample of participants may yield results with greater statistical power.
Ethical Statement
This study was conducted under strict adherence to the ethical regulations of the affiliated institution. Prior to the experiment, a full explanation and a participant consent form approved by the research ethics committee were presented, and written consent was obtained.