
Lack of group-to-individual generalizability in people with lower urinary tract symptoms emphasizes the need for deep phenotyping and personalized treatments

Published online by Cambridge University Press:  01 August 2025

Victor P. Andreev
Affiliation:
Arbor Research Collaborative for Health, Ann Arbor, MI, USA
Caroline Smerdon
Affiliation:
Arbor Research Collaborative for Health, Ann Arbor, MI, USA
Brian Bieber
Affiliation:
Arbor Research Collaborative for Health, Ann Arbor, MI, USA
Abigail R. Smith
Affiliation:
Northwestern Medicine, Northwestern University, Chicago, IL, USA
Kathryn Flynn
Affiliation:
Medical College of Wisconsin, Milwaukee, WI, USA
J. Quentin Clemens
Affiliation:
Michigan Medicine, University of Michigan, Ann Arbor, MI, USA
David Cella
Affiliation:
Northwestern Medicine, Northwestern University, Chicago, IL, USA
Claire C. Yang
Affiliation:
University of Washington School of Medicine, Seattle, WA, USA
Ziya Kirkali
Affiliation:
National Institute of Diabetes and Digestive and Kidney Diseases, Bethesda, MD, USA
Kevin Weinfurt*
Affiliation:
Duke University School of Medicine, Durham, NC, USA
*Corresponding author: K. Weinfurt; Email: kevin.weinfurt@duke.edu

Abstract

Introduction:

Understanding how different symptoms co-occur and are correlated may provide insights into the pathophysiology of disease. The lack of group-to-individual generalizability of co-occurrence of symptoms was recently demonstrated by comparing intra-individual and inter-individual correlations in several psychological studies. Here, we investigate this phenomenon for lower urinary tract symptoms (LUTS).

Methods:

We analyzed data collected in the Symptoms of Lower Urinary Tract Dysfunction Research Network Recall Study. Participants responded to questions about their urinary symptoms for 25 consecutive days. These questions queried urologic symptoms including storage (urinary urgency, frequency, nocturia, and urinary incontinence), voiding (slow/weak stream), and post-micturition (incomplete emptying and post-micturition dribble) symptoms. We calculated Pearson correlation coefficients and cosine similarity measures and compared distributions of intra-individual and inter-individual (cohort) metrics.

Results:

Among 234 participants, distributions of intra-individual measures were 10-fold wider than those of inter-individual correlations. There are pairs of questions with distributions of correlations and cosine similarities containing individuals with extreme positive (>0.8) and extreme negative values (<–0.8). There are groups of participants with strong positive and negative correlations of urinary frequency and nocturia, urinary incontinence and weak flow, as well as strong negative and positive correlations of urinary frequency and dribbling. Information on these extreme groups is averaged out and lost in the inter-individual correlations.

Conclusions:

Lack of group-to-individual generalizability previously shown for psychological symptoms is confirmed for LUTS. Wealth of information on the co-occurrence and co-evolution of LUTS in the intra-individual correlations and cosine similarities corroborates heterogeneity of LUTS and can be useful for deep phenotyping and for identifying personalized treatments of LUTS.

Information

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (https://creativecommons.org/licenses/by-nc-sa/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the re-used or adapted article and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Association for Clinical and Translational Science

Introduction

Understanding how different symptoms co-occur and are correlated may provide insights into the pathophysiology of disease. One approach to exploring this is to analyze intra-individual correlations derived from the repeated measures of multiple symptoms in an individual. Unfortunately, multidimensional longitudinal data are rarely available, thus cohort data are typically used to calculate inter-individual correlations instead and infer conclusions about intra-individual correlations.

The lack of group-to-individual generalizability in human subjects research, which was highlighted by Molenaar [1,2], was convincingly demonstrated and emphasized for psychological studies by Fisher et al. [3], where six studies with repeated-measure design were used for comparison of inter-individual and intra-individual variances and correlations among psychological symptoms. For most of the studied variables, variance of individual symptoms was two to four times higher within individuals than within groups, while intra-individual correlations between variables had up to 4-fold wider distributions than the inter-individual (cohort-based) ones. Indeed, group-to-individual generalizability is strictly justified only for ergodic systems, which are both stationary and homogeneous. These conditions are rarely if ever met for heterogeneous, evolving systems such as cohorts of individuals followed longitudinally. Therefore, the value of correlations of variables extracted from cohort studies may be diminished when generalized to individuals and used for explanation of mechanisms of disease. However, the level of non-ergodicity of the system is likely disease-specific since the rate of progression and heterogeneity of symptoms differ by disease and from patient to patient.

Similar to psychological symptoms, lower urinary tract symptoms (LUTS) are primarily measured using patient-reported outcomes; however, group-to-individual generalizability has not been studied for these symptoms. LUTS is a general term representing a wide range of symptoms with a similarly wide range of known and unknown etiologies, high economic and social costs, and significant impact on quality of life. LUTS can include frequent urination during day and night (nocturia), urinary urgency, stress and urgency urinary incontinence, and bladder emptying symptoms such as straining, hesitancy, weak urine stream, and post-void dribbling. Many patients presenting for care report multiple symptoms [4–7], and the multiplicity of symptoms in each patient may lead to a less than satisfactory treatment outcome, as remedies for LUTS typically address a single urinary symptom or condition. The prevalence of LUTS in the United States ranges between 45% and 70% and increases with age [8,9]. The goal of this paper is to investigate the group-to-individual generalizability for LUTS by comparing intra- and inter-individual similarity measures of these symptoms and to explore whether intra-individual correlations of the symptoms can be used for improved subtyping of LUTS.

Materials and methods

Data

We analyzed data collected in the Recall Study of the NIDDK-funded multi-center study Symptoms of Lower Urinary Tract Dysfunction Research Network (LURN). LURN data included self-reported urinary and non-urinary symptoms, bladder diaries, and physical examination data for 545 women and 519 men with LUTS [10]. A separate group of participants, recruited for the Recall project [11] by screening with the LUTS Tool questionnaire [5] and described in Table 1, answered a subset of items from the Comprehensive Assessment of Self-Reported Urinary Symptoms [12] for 25 consecutive days [11]. These items covered storage (daytime frequency, nocturia, urinary urgency, and urinary incontinence), voiding (slow/weak stream), and post-micturition (incomplete emptying and post-micturition dribble) symptoms [10] (Table 2). Possible responses ranged from 0 to 4 for eight items and from 1 to 2 for two items, with higher values always reflecting higher severity of LUTS. For “How often?” questions, 0 means “never,” 1 “rarely,” 2 “sometimes,” 3 “often,” and 4 “almost always.” Answers to “How many times?” questions are binned into five groups, e.g., for Q1 (“How many times you urinate during waking hours?”), 0 means “1–3 times a day,” 1 “4–7 times,” 2 “8–10 times,” 3 “11–13 times,” and 4 “14 or more times.” The structure of the data is illustrated in Figure 1, where the excerpt of the matrix on the left represents answers to Q1 and the excerpt of the matrix on the right represents answers to Q3. Rows represent days, while columns represent participants. Note column #7 in the matrix of answers to Q3, where all the values are equal to 4, i.e., this participant gave the same answer, “4 or more times a night,” on all 25 consecutive days.

Table 1. Demographics and baseline urinary symptoms for the cohort of 234 participants. Mean (std)

LT stands for LUTS Tool question.

Table 2. Subset of ten Comprehensive Assessment of Self-Reported Urinary Symptoms questions answered by the participants during the 25 consecutive days

Methods

We calculated and compared intra- and inter-individual Pearson correlations for each pair of items (10 items, 45 pairs). As illustrated in Figure 1, intra-individual correlations between items are calculated within participant over time (columns), resulting in correlations for each participant and item pair. Conversely, inter-individual correlations are calculated within day across participants (rows), resulting in correlations between items for each day and item pair. Challenges in this approach arise when there is no variation in response over time or across participants in one or both items in a pair. As indicated above and shown in Figure 1, there are some participants for whom symptoms do not change over time, e.g., Q3 = 4 in participant #7 during all 25 days, which means that the standard deviation of Q3 for this participant is equal to zero. Pearson correlation coefficient of the samples of the variables x and y is defined as:

(1) $$r_{xy}={\sum_{i=1}^n(x_i-\bar{x})(y_i-\bar{y}) \over \sqrt{\sum_{i=1}^n(x_i-\bar{x})^2}\sqrt{\sum_{i=1}^n(y_i-\bar{y})^2}}$$

where n is the sample size, $x_i$ and $y_i$ are the individual sample values, and $\overline{x}$ and $\overline{y}$ are the sample means. If either x or y or both are constant, then $r_{xy}$ = 0/0 and is therefore undefined. In general, this situation is rare. Constant values of variables are quite unlikely when inter-individual or cohort correlations are calculated, since cohort participants are different and therefore variability of the variables describing them is almost always guaranteed. Similarly, continuous variables describing an individual’s properties, e.g., weight, blood pressure, and heart rate, change in time; therefore, their standard deviations are not equal to zero and the Pearson correlation coefficient is well defined. Unfortunately, in the case of ordinal variables describing answers to questionnaires, constant values of variables for a given individual could be quite common. In part, this is the result of the way questionnaires are constructed. For example, the relatively high number of individuals for whom Q3 = 4 at all 25 time points is the result of the available answer Q3 = 4, “4 or more times a night.” Q3 = 4 would not be that prevalent if Q3 = 5, 6, 7, etc. were allowed to describe the exact number of voids per night. Similarly, constant values of Q1 would be less common if the number of voids were recorded instead of binning the number of voids as described above. Unfortunately, binning and categorization are quite typical for questionnaires. Using other types of correlations, e.g., Spearman, does not solve this problem, since the definition of the Spearman correlation coefficient includes standard deviations of the rank variables, which would also be zero for a constant variable. In this paper, we examined and employed two approaches to deal with the situation described above. First is to define Pearson correlation for the special cases where one or both variables are constant. 
Second is to use cosine similarity measure instead of Pearson correlation.

Figure 1. Excerpts of the data matrices representing answers to questions Q1 and Q3. Columns represent patients. Rows represent consecutive days. Highlighted rows and columns illustrate how inter-individual and intra-individual correlations are calculated.

In the first approach, we defined Pearson correlation ($r_{xy}$) to be 0 if one of the items was constant and the other was not. Such a definition aligns with the intuitive interpretation of correlations, as it states that changes in one variable (not constant) are not associated with changes in another variable (constant). In the case where both items are constant, interpretations of change no longer apply, so instead we compared response ratings to the common midpoint of the range (e.g., C = Qmax/2 + 0.5, where Qmax is the maximum possible rating of the given item) and then considered two items to be similar ($r_{xy}$ = 1) if both are either above or below C, i.e., if (x-C)(y-C)>0, and dissimilar ($r_{xy}$ = −1) if their values lie on opposite sides of C, i.e., if (x-C)(y-C)<0. Note that the non-integer value of C is selected to avoid the situation (x-C)(y-C) = 0. Such a definition of the midpoint is reasonable because for most of the questions, Q1–Q3, Q5–Q6, and Q8–Q10 (C = 2.5), values 0 and 1 correspond to absence of LUTS, 2 to mild LUTS, and 3 and 4 to severe LUTS, while for Q4 and Q7 (C = 1.5), 0 and 1 mean absence and 2 means presence of LUTS.
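The special-case rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the MATLAB code used in the study: the function names are hypothetical, each item is assumed to carry its own midpoint (`c_x`, `c_y`), and the data matrices are assumed to be lists of rows (days) with one column per participant, as in Figure 1.

```python
import math

def midpoint(q_max):
    """Midpoint C = Qmax/2 + 0.5 from the text (2.5 for Qmax = 4, 1.5 for Qmax = 2)."""
    return q_max / 2 + 0.5

def pearson_with_constants(x, y, c_x=2.5, c_y=2.5):
    """Pearson correlation extended to constant series (hypothetical helper)."""
    x_const = len(set(x)) == 1
    y_const = len(set(y)) == 1
    if x_const and y_const:
        # Both constant: +1 if both lie on the same side of the midpoint, else -1.
        # The product is never 0 for integer ratings and a non-integer midpoint.
        return 1.0 if (x[0] - c_x) * (y[0] - c_y) > 0 else -1.0
    if x_const or y_const:
        # Exactly one constant: defined as 0 (no association with change).
        return 0.0
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (math.sqrt(sxx) * math.sqrt(syy))

def column(matrix, j):
    """Time series of participant j (one value per day)."""
    return [row[j] for row in matrix]

def intra_individual(q_a, q_b, c_a=2.5, c_b=2.5):
    """One correlation per participant, computed over days (columns)."""
    return [pearson_with_constants(column(q_a, j), column(q_b, j), c_a, c_b)
            for j in range(len(q_a[0]))]

def inter_individual(q_a, q_b, c_a=2.5, c_b=2.5):
    """One correlation per day, computed across participants (rows)."""
    return [pearson_with_constants(row_a, row_b, c_a, c_b)
            for row_a, row_b in zip(q_a, q_b)]
```

Calling `intra_individual` on two item matrices yields one value per participant, while `inter_individual` yields one value per day; these are the two distributions compared in the rest of the paper.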

In the second approach, we used the cosine similarity measure defined as cosine of the angle θ between two vectors x-C and y-C:

(2) $$cos\left(\theta \right)={\sum _{i=1}^{n}(x_{i}-C)(y_{i}-C) \over \sqrt{\sum _{i=1}^{n}({x_{i}}-C)^{2}}\cdot \sqrt{\sum _{i=1}^{n}({y_{i}}-C)^{2}}}$$

where C is the midpoint defined above. Note that this cosine similarity measure allows either or both variables to be constant, e.g., if y = const, eq (2) simplifies to the following, where sgn(y − C) = ±1 denotes the sign of y − C:

(3) $$cos\left(\theta \right)=\operatorname{sgn}(y-C)\cdot{\sum _{i=1}^{n}(x_{i}-C) \over \sqrt{n}\cdot \sqrt{\sum _{i=1}^{n}({x_{i}}-C)^{2}}}$$

while for both y = const and x = const, eq (3) further simplifies to:

(4) $$cos\left(\theta \right)={(x-C)(y-C) \over \sqrt{(x-C)^{2}}\cdot \sqrt{(y-C)^{2}}}=\begin{cases}\;\;\,1, & \text{if } (x-C)(y-C)\gt 0 \\ -1, & \text{if } (x-C)(y-C)\lt 0\end{cases}$$

Note that eq (4) is equivalent to our definition of Pearson correlation, $r_{xy}$, in the case where both x and y are constant.
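As a minimal sketch of eq (2), the midpoint-centered cosine similarity can be written as follows. The function name and the per-item midpoints `c_x` and `c_y` are assumptions made for illustration; the sketch relies on integer ratings and non-integer midpoints, so that neither norm can be zero.

```python
import math

def cosine_similarity(x, y, c_x=2.5, c_y=2.5):
    """Cosine of the angle between the midpoint-centered vectors x - C and y - C."""
    dx = [a - c_x for a in x]
    dy = [b - c_y for b in y]
    dot = sum(a * b for a, b in zip(dx, dy))
    norm_x = math.sqrt(sum(a * a for a in dx))
    norm_y = math.sqrt(sum(b * b for b in dy))
    # Norms are nonzero because integer ratings never equal a non-integer midpoint.
    return dot / (norm_x * norm_y)

# Both series constant and on the same side of the midpoint: similarity +1.
same_side = cosine_similarity([4] * 25, [3] * 25)
# Both series constant on opposite sides of the midpoint: similarity -1.
opposite_side = cosine_similarity([4] * 25, [1] * 25)
# One constant series (y = 4) against a varying one: a well-defined value.
mixed = cosine_similarity([0, 1, 2, 3, 4], [4] * 5)
```

Unlike the Pearson coefficient, this measure needs no special-case rules: constant series are handled by the same formula, because centering is done about the fixed midpoint rather than about the sample mean.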

Figure 2 presents some examples of typical longitudinal individual profiles of answers to the above questions. Note that both measures return the same or similar values of correlation/similarity for all examples except the lower right panel, which illustrates similarity of a constant with a partly overlapping double-step function, where $r_{xy}$ = 0, while cosine(xy) = 0.522. Given the qualitative similarity and some quantitative differences, we calculated and presented both similarity measures (cosine and correlation coefficient) in our analysis below. Distributions of the intra-individual similarity measures across individuals and distributions of cohort similarity measures across time points (days) were examined. Mean values and standard deviations of the distributions were calculated and compared. Extreme groups, defined as those with strong positive and strong negative similarities (|$r_{xy}$| > 0.8, |cosine(xy)| > 0.8) in the longitudinal symptoms data, were identified. All calculations using longitudinal symptom data were performed using MATLAB 2022a (MathWorks, Natick, MA). Testing of differences in demographics and baseline symptoms in pairwise comparison of the identified extreme groups (t-test) with false discovery rate (FDR) correction for multiple testing [13] was performed in SAS, version 9.4 (SAS Institute, Cary, NC).
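For readers unfamiliar with the FDR correction cited above, the Benjamini-Hochberg step-up procedure can be sketched as follows. This is an illustrative Python version, not the SAS procedure used in the study; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values from four pairwise t-tests.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [p <= 0.05 for p in adj]
```

A comparison is reported as significant after correction when its adjusted p-value does not exceed the chosen FDR level (0.05 in this sketch, which is an assumption).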

Figure 2. Examples of Pearson correlations and cosine similarity measures for some simple typical profiles of ordinal variables. Note the same or similar values of the two measures for 5 out of 6 examples. A substantial difference between the measures is observed only in example 6 (3rd column, 2nd row): $r_{xy}$ = 0, while cosine(xy) = 0.522, where $r_{xy}$ is the Pearson correlation coefficient of x and y.

Results and discussion

The sample in our study included 234 participants. Mean age was 58.1 years, 48% were female, 87% were white, and 67% had a college degree. More details on demographics and baseline urinary symptoms are presented in Table 1.

Comparison of intra-individual and inter-individual cosine similarity measures and Pearson correlation coefficients

First, we tested whether our data corroborate the results of Fisher et al. [3], where intra-individual correlations between variables had much broader distributions than the inter-individual ones. Figure 3 provides the comparison of intra-individual and inter-individual (cohort) cosine similarity measures (A, B) and Pearson correlation coefficients (C, D). Upper triangles of the heat maps (matrices) present the intra-individual values, while the lower triangles present the cohort values. Note that each element of the matrix represents a pair of questions, e.g., the elements in the lower-left and upper-right corners of the heat maps characterize correlation/similarity between Q1 and Q10. The mean values of intra-individual (averaged across all participants) and cohort (averaged across time points) similarity measures are close, which is reflected by nearly the same coloring of upper and lower triangles in A and C and can be observed for each pair of questions by comparing the values of symmetrically located elements of the matrices in upper and lower triangles. The mean relative difference between the intra-individual and cohort measures across all pairs of questions is 25% for the cosine similarity measure and 61% for the Pearson correlation coefficient. Unlike the mean values, the standard deviations are quite different for the distributions of intra-individual and cohort measures, as seen by different values and different coloring of the lower and upper triangles of the heat maps (B, D). On average across the pairs of questions, the intra-individual distribution of the cosine similarity measure (B) is 15-fold broader and that of the Pearson correlation (D) is 7-fold broader than the corresponding distributions of the cohort measures. 
Therefore, our results corroborate for LUTS the observations of Fisher et al. [3] made for psychological studies, where the typical distributions of intra-individual correlations were four-fold wider than inter-individual ones. This is an important generalization of Fisher’s results into the broader domain of medical studies, indicating the importance of longitudinal data and intra-individual correlations for understanding etiology and phenotypes of diseases by examining the co-occurrence and correlation of symptoms in the individuals.
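The fold-broadening comparison can be illustrated with a short Python sketch. It assumes that the reported factor is the mean, over item pairs, of the ratio of sample standard deviations (intra-individual over cohort); the text does not specify the exact averaging, so this choice, the function name, and the toy values below are assumptions.

```python
from statistics import mean, stdev

def fold_broadening(intra_by_pair, inter_by_pair):
    """Ratio of intra-individual to cohort standard deviation for each item pair."""
    ratios = {}
    for pair, intra_values in intra_by_pair.items():
        # Sample standard deviations; a zero cohort spread would need special handling.
        ratios[pair] = stdev(intra_values) / stdev(inter_by_pair[pair])
    return ratios, mean(ratios.values())

# Toy values for a single hypothetical pair: wide within-person spread,
# narrow day-to-day spread of the cohort measure.
intra = {"Q1Q3": [-1.0, 0.0, 1.0]}
inter = {"Q1Q3": [0.1, 0.2, 0.3]}
per_pair, average_fold = fold_broadening(intra, inter)
```

In this toy case the intra-individual distribution is about ten times broader than the cohort one, even though both have similar means.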

Figure 3. Comparison of intra-individual and inter-individual (cohort) similarity measures for answers to 10 questions (Q1–Q10) on urinary symptoms collected from 234 individuals during 25 consecutive days. A, C: mean values; B, D: standard deviations. A, B: cosine similarity measures; C, D: Pearson correlations. The upper triangle in each matrix represents the intra-individual measure, while the lower triangle represents the inter-individual measure. Differences in the intra- and inter-individual measures are visualized by using color in the heat maps. Note that standard deviations of intra-individual measures are much higher than those of inter-individual measures.

Examples of intra-individual and cohort distributions of similarity measures of longitudinal symptoms. Extreme groups based on similarity/dissimilarity of symptoms dynamics

Next, we compared intra-individual and cohort distributions of cosine similarities and Pearson correlations. Figure 4 demonstrates distributions of similarity measures for the Q1 (day-time urinary frequency) and Q3 (nocturia) pair of questions. Note that distributions of cohort measures, i.e., cosine similarity (panel B) and Pearson correlations (panel D), are quite narrow. In contrast, the distributions of the intra-individual similarity measures (panels A and C) are broad and vary from strong negative to strong positive. The intra-individual Pearson correlation distribution demonstrates a high peak at zero, likely determined by the high number of participants with Q3 constant and Q1 nonconstant during the 25 days of the study, and by our definition of correlation for this special case. Note that the intra-individual distribution of cosine similarities (panel A) is more uniform, with values in the (−1, 1) range indicating different levels of similarity in the dynamics of Q1 and Q3 answers/symptoms in different participants. Importantly, in both the cosine similarity (panel A) and Pearson correlation (panel C) histograms, there are participants with extreme values of the similarity measures, < −0.8 and > 0.8. For participants of these two extreme groups, the day-time frequency and nocturia are either strongly positively correlated (strongly similar), group 1, or strongly negatively correlated (strongly dissimilar), group 2. Note that the important information on the existence of these extreme groups is completely lost (averaged out) in the cohort distributions of the similarity measures.
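The selection of the two extreme groups can be sketched as a simple thresholding of the per-participant similarity values. The function name and the example values are hypothetical; only the 0.8 threshold comes from the text.

```python
from statistics import mean

def extreme_groups(similarities, threshold=0.8):
    """Return participant indices with similarity > threshold and < -threshold."""
    positive = [i for i, s in enumerate(similarities) if s > threshold]
    negative = [i for i, s in enumerate(similarities) if s < -threshold]
    return positive, negative

# Hypothetical intra-individual cosine similarities of Q1 and Q3 for six participants.
example = [0.95, -0.90, 0.10, 0.85, -0.30, 0.00]
group_1, group_2 = extreme_groups(example)
# A single summary value hides the extremes in both directions.
mean_similarity = mean(example)
```

Here participants 0 and 3 fall into group 1 and participant 1 into group 2, while the mean similarity stays close to zero, which illustrates how averaging can mask the existence of both groups.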

Figure 4. Histograms of intra-individual and inter-individual (cohort) correlations and cosine similarities of variables Q1 and Q3. Q1: “During waking hours, how many times did you typically urinate?” Q3: “During a typical night, how many times did you wake up and urinate?” Cohort correlations are weak, while intra-individual correlations and cosine similarities are strong for some individuals. Distributions of intra-individual measures are much broader than those of inter-individual measures (see standard deviations of the distributions indicated in the panels).

Figure 5 presents mean daily responses to items Q1–Q10 for two extreme groups defined as those for whom cosine(Q1Q3) > 0.8 (group 1 [n = 44], panel A) and cosine(Q1Q3) < −0.8 (group 2 [n = 14], panel B). Mean values of Q1–Q10 across the two groups are presented for all 25 days. Note that for group 1 (panel A) the day-time frequency score (bold blue line) is consistently higher than the nocturia score (bold yellow line), while for group 2 (panel B) the opposite is consistently true. Otherwise, the groups are similar, e.g., participants in both groups are incontinent 50% of the days and typically have less than one hour between the daytime voids.

Figure 5. Mean longitudinal profiles of variables Q1–Q10 for two extreme groups with strong positive (A) and strong negative (B) cosine similarities between Q1 and Q3. Q1 (day-time frequency): bold blue line. Q3 (nocturia): bold yellow line. Note that Q1 > Q3 consistently in the case of strong positive cosine similarity and that Q1 < Q3 consistently in the case of strong negative cosine similarity.

Suppl. Figures 1-10 (Supplementary material 1) provide representative examples of cohort and intra-individual similarity measures as well as longitudinal profiles of Q1–Q10 symptoms for 5 more pairs of questions (symptoms): Q1Q7, Q1Q10, Q4Q5, Q4Q6, and Q7Q8. For all these pairs, cohort distributions are narrow with predominantly weak positive or weak negative correlations and cosine similarities. Intra-individual distributions are much broader and include large extreme groups with strong negative and strong positive correlations and cosine similarities. Note that extreme groups are composed of different participants with rather low overlap between groups based on the different pairs of questions (Table 3).

Table 3. Membership overlap between the extreme groups of participants. Groups based on the similarities in the dynamics of symptoms

Q1Q3n means the group of participants for whom symptoms Q1 and Q3 have cosine similarity value < −0.8; Q1Q3p means the group of participants for whom symptoms Q1 and Q3 have cosine similarity value > 0.8; overlap between the groups is calculated as the number of common participants divided by the number of participants in the smaller group in the pairwise comparison. The mean overlap between the extreme groups is 0.31, while the maximum overlap of 0.89 is between the two extreme groups with positive cosine similarities of Q4Q6 and Q4Q5 symptoms. Since Q5 (How often did you feel a sudden need to urinate?) and Q6 (Once you noticed the need to urinate, how difficult was it to wait more than a few minutes?) both represent the level of urinary urgency, it is not surprising that these two groups have a high level of overlap.
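The overlap measure described in the note to Table 3 (common participants divided by the size of the smaller group) can be sketched as follows. The function name and participant identifiers are hypothetical, and returning 0 for an empty group is an assumption made here to avoid division by zero.

```python
def group_overlap(group_a, group_b):
    """Number of shared participants divided by the size of the smaller group."""
    a, b = set(group_a), set(group_b)
    smaller = min(len(a), len(b))
    if smaller == 0:
        # Assumption: overlap with an empty group is reported as 0.
        return 0.0
    return len(a & b) / smaller

# Hypothetical participant IDs in two extreme groups.
q4q5_positive = [3, 8, 15, 21, 42]
q4q6_positive = [3, 8, 15, 21, 42, 57, 60]
overlap = group_overlap(q4q5_positive, q4q6_positive)
```

With this normalization, a value of 1 means that the smaller group is fully contained in the larger one, regardless of how much larger the second group is.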

Comparison of demographics and baseline symptoms in the identified extreme groups based on longitudinal similarity/dissimilarity

We examined whether the above-identified extreme groups are driven by demographic differences or differences in baseline urinary symptoms. The Suppl. Tables.xlsx file (Supplementary material 2) consists of tables comparing demographics and baseline symptoms for pairs of extreme groups based on strong positive and negative similarities of the dynamics of six pairs of symptoms: Q1Q3, Q1Q7, Q1Q10, Q4Q6, Q4Q5, and Q7Q8. Significantly different variables (demographics and LUTS Tool symptoms) are highlighted in the Suppl. Tables (Supplementary material 2) and summarized in Table 4. Comparison of most demographic variables showed no difference across the extreme groups. An interesting exception is the percentage of males in the Q7Q8 groups, which proved to be significantly different; the predominantly female group has strongly dissimilar dynamics of Q7 (Did you leak urine or wet pad?) and Q8 (How often was your urine flow slow or weak?), while the predominantly male group demonstrated strong positive similarities of these symptoms’ dynamics (the significant sex difference between these groups survived FDR correction for multiple testing).

Table 4. Pairwise comparison of extreme groups based on strong negative and strong positive similarities of symptoms dynamics. Demographics and LUTS tool baseline symptoms (LT1-LT17, see Table 1). Only significant differences are indicated

Some pairs of groups (e.g., based on Q1Q3, Q1Q7) demonstrated no baseline symptom differences. Other pairs of extreme groups based on similarity measures (Q4Q5, Q4Q6, and Q7Q8) demonstrate significant differences in multiple baseline symptoms, with as many as 8 significantly different LUTS Tool symptoms for Q7Q8 groups.

The presence of extreme groups of participants with strong positive and strong negative correlations and cosine similarities in the longitudinal LUTS data indicates the existence of subtypes of LUTS with different associations and co-dependences of the symptoms. As seen in Table 4, some of these co-dependences might be associated with demographics and multiple baseline symptoms, while others might be unassociated with any of them. Therefore, these measures of co-occurrence and possible co-dependence of LUTS are complementary to the baseline data and could provide important insights for subtyping and understanding etiology of LUTS as well as for personalized evaluation and treatment decisions.

Limitations of the study

The 234 participants who answered questions about their urinary symptoms for 25 consecutive days were not among the 1064 LUTS participants for whom the wealth of self-reported urinary and non-urinary symptoms, bladder diaries, and physical examination data were collected. Therefore, it was not possible to incorporate the above measures of co-occurrence and co-dependence of symptoms into the subtyping of the 1064 LURN I participants [7,14–16]. This limitation has been overcome in the current cycle of the project (LURN II) [17], where weekly symptom data are collected during one year of follow-up for over 800 men and women with urinary urgency, allowing for incorporation of co-occurrence measures into the subtyping procedure.

Conclusion

Group-to-individual generalizability for LUTS is demonstrated to be highly questionable. Distributions of intra-individual correlations and similarity measures of LUTS assessed during 25 consecutive days were 7-fold to 15-fold broader than the distributions of the inter-individual (cohort) measures. The presence of the extreme groups of individuals with strong positive and strong negative measures of LUTS correlations and co-occurrences indicates possible mechanistic differences in the subtypes of individuals with LUTS. Incorporation of intra-individual similarity measures into the subtyping of complex diseases might contribute to better understanding of disease mechanisms and adaptation patterns and to the development of personalized treatments. Note that we do not claim the absence of group-to-individual generalizability for all medical conditions. Importantly, however, we demonstrated that the lack of group-to-individual generalizability is not limited to psychological disorders but could be important for other medical conditions as well. Performing only cross-sectional analysis of study cohorts can obscure important symptom correlations for individuals that may be critically important for personalized evaluation and treatment. Group-to-individual generalizability depends on the level of heterogeneity of the disease and of the population [18–20]. The higher the heterogeneity of the disease, the more important the subtyping and the study of intra-individual correlations of the symptoms in the longitudinal data. Subtyping of common complex diseases leads to identification of more homogeneous subgroups of individuals for whom group-to-individual generalizability is therefore more justifiable but still requires caution and would benefit from the complementary analysis of individual disease trajectories.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/cts.2025.10113.

Acknowledgements

This is publication number 44 of the Symptoms of Lower Urinary Tract Dysfunction Research Network (LURN). The following individuals were instrumental in the planning and conduct of this study at each of the participating institutions: Duke University, Durham, North Carolina (DK097780): PIs: Cindy Amundsen, MD, J. Eric Jelovsek, MD; Co-Is: Kathryn Flynn, PhD, Jim Hokanson, PhD, Aaron Lentz, MD, David Page, PhD, Nazema Siddiqui, MD, Todd Harshbarger, PhD, Michael Odom, PhD, Chad Gridley, MD, Jordan Foreman, MD, Tara Morgan, MD; Study Coordinators: Magaly Guerrero, BSc, Stephanie Yu, Annika Sinha, MD. University of Iowa, Iowa City, Iowa (DK097772): PIs: Catherine S Bradley, MD, MSCE, Karl Kreder, MD, MBA; Co-Is: Bradley A. Erickson, MD, MS, Daniel Fick, MD, Vince Magnotta, PhD, Philip Polgreen, MD, MPH; Study Coordinator: Nancy Hollenbeck. University of Chicago, Chicago, Illinois (DK097779): PIs: C. Emi Bretschneider, MD, James W Griffith, PhD, Kimberly Kenton, MD, MS, Brian Helfand, MD, PhD; Co-Is: David Cella, PhD, Julia Geynisman-Tan, MD, Alex Glaser, MD, Margaret Mueller, MD, Francesca Farina, PhD, Richard Fantus, MD, Devin Boehm, BS; Study Coordinators: Melissa Marquez, MBA, Malgorzata Antoniak, PhD, Pooja Talaty, MS, Karen John, Jinxuan Shi, MA, Tara Samsel, BS. Dr Helfand, Dr Glaser, Dr Antoniak, and Ms. Talaty are at Endeavor Health. Dr Bretschneider, Dr Cella, Dr Geynisman-Tan, Mr. Boehm, Ms. Marquez, and Ms. John are at Northwestern Medicine. University of Michigan Health System, Ann Arbor, Michigan (DK099932): PI: J Quentin Clemens, MD, FACS, MSCI; Co-Is: John DeLancey, MD, Dee Fenner, MD, Rick Harris, MD, Steve Harte, PhD, Anne P. Cameron, MD, Aruna Sarma, PhD, Giulia Ippolito, MD, Priyanka Gupta, MD, Whitney Horner, MD, Jannah Thompson, MD, Payton Schmidt, MD; Study Coordinators: Greg Mowatt, BA, Sarah Richardson, BS, Sneha Mathai, BS, Syedah Mubeenah, MBBS. University of Washington, Seattle, Washington (DK100011): PI: Claire Yang, MD; Co-Is: Anna Kirby, MD, Swati Rane Levendovszky, PhD, Sreya Gutta; Study Coordinators: Brenda Vicars, RN, Holly Covert. Washington University in St Louis, St Louis, Missouri (DK100017): PI: H. Henry Lai, MD; Co-Is: Joshua Shimony, MD, PhD, Fuhai Li, PhD; Study Coordinators: Patricia Hayden, BSN, Aleksandra Klim, RN, MHS, CCRC. Arbor Research Collaborative for Health, Data Coordinating Center (DK099879): PI: John Graff, PhD, MS; Co-Is: Victor Andreev, PhD, DSc, Brenda Gillespie, PhD; Project Manager: Jessica Durkin, M.Ed, MBA; Clinical Monitors: Melissa Sexton, BA, CCRP, Julia Nashif, BA, CCRP; Research Analysts: Brian Bieber, MPH, Nathan Goodrich, MS, Calvin Andrews, MS, Ting Lu, MS; Project Associate: Christine Remski. National Institute of Diabetes and Digestive and Kidney Diseases, Division of Kidney, Urology, and Hematology, Bethesda, Maryland: Project Scientist: Ziya Kirkali, MD; Project Officer: Christopher Mullins, PhD.

Author contributions

Victor Andreev: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Supervision, Validation, Visualization, Writing-original draft, Writing-review & editing; Caroline Smerdon: Data curation, Formal analysis, Writing-review & editing; Brian Bieber: Data curation, Formal analysis, Writing-review & editing; Abigail Smith: Data curation, Formal analysis, Methodology, Writing-review & editing; Kathryn Flynn: Data curation, Investigation, Writing-review & editing; J. Quentin Clemens: Data curation, Investigation, Writing-review & editing; David Cella: Data curation, Methodology, Writing-review & editing; Claire Yang: Data curation, Investigation, Writing-review & editing; Ziya Kirkali: Conceptualization, Investigation, Methodology, Writing-review & editing; Kevin Weinfurt: Conceptualization, Data curation, Investigation, Writing-original draft, Writing-review & editing.

Funding statement

This study is supported by the National Institute of Diabetes & Digestive & Kidney Diseases through cooperative agreements (grants DK097780, DK097772, DK097779, DK099932, DK100011, DK100017, DK099879). Research reported in this publication was supported at Northwestern University, in part, by the National Institutes of Health’s National Center for Advancing Translational Sciences, Grant Number UL1TR001422. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Competing interests

The authors declare none.

References

1. Molenaar, PCM, Campbell, CG. The new person-specific paradigm in psychology. Curr Dir Psychol Sci. 2009;18:112–117.
2. Molenaar, PCM. A manifesto on psychology as idiographic science: bringing the person back into scientific psychology, this time forever. Meas Interdiscip Res Perspect. 2004;2:201–218.
3. Fisher, AJ, Medaglia, JD, Jeronimus, BF. Lack of group-to-individual generalizability is a threat to human subjects research. Proc Natl Acad Sci U S A. 2018;115:E6106–E6115.
4. Yang, CC, Weinfurt, KP, Merion, RM, Kirkali, Z, LURN Study Group. Symptoms of lower urinary tract dysfunction research network. J Urol. 2016;196:146–152.
5. Coyne, KS, Sexton, CC, Kopp, Z, et al. Assessing patients’ descriptions of lower urinary tract symptoms (LUTS) and perspectives on treatment outcomes: results of qualitative research. Int J Clin Pract. 2010;64:1260–1278.
6. Coyne, KS, Barsdorf, AI, Thompson, C, et al. Moving towards a comprehensive assessment of lower urinary tract symptoms (LUTS). Neurourol Urodyn. 2012;31:448–454.
7. Andreev, VP, Helmuth, ME, Liu, G, et al. Subtyping of common complex diseases and disorders by integrating heterogeneous data. Identifying clusters among women with lower urinary tract symptoms in the LURN study. PLoS ONE. 2022;17:e0268547.
8. Coyne, KS, Sexton, CC, Thompson, CL, et al. The prevalence of lower urinary tract symptoms (LUTS) in the USA, the UK and Sweden: results from the epidemiology of LUTS (EpiLUTS) study. BJU Int. 2009;104:352–360.
9. Irwin, DE, Milsom, I, Hunskaar, S, et al. Population-based survey of urinary incontinence, overactive bladder, and other lower urinary tract symptoms in five countries: results of the EPIC study. Eur Urol. 2006;50:1306–1314.
10. Cameron, AP, Lewicky-Gaupp, C, Smith, AR, et al. Baseline lower urinary tract symptoms in patients enrolled in LURN: a prospective, observational cohort study. J Urol. 2018;199:1023–1031.
11. Flynn, KE, Mansfield, SA, Smith, AR, et al. Can 7 or 30-day recall questions capture self-reported lower urinary tract symptoms accurately? J Urol. 2019;202:770–778.
12. Weinfurt, KP, Griffith, JW, Flynn, KE, et al. The comprehensive assessment of self-reported urinary symptoms: a new tool for research on subtypes of patients with lower urinary tract symptoms. J Urol. 2019;201:1177–1183.
13. Benjamini, Y, Hochberg, Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B. 1995;57:289–300.
14. Andreev, VP, Liu, G, Yang, CC, et al. Symptom-based clustering of women in the symptoms of lower urinary tract dysfunction research network (LURN) observational cohort study. J Urol. 2018;200:1323–1331.
15. Liu, G, Andreev, VP, Helmuth, ME, et al. Symptom-based clustering of men in the symptoms of lower urinary tract dysfunction research network (LURN) observational cohort study. J Urol. 2019;202:1230–1239.
16. Helmuth, ME, Smith, AR, Glaser, AP, et al. Phenotyping men with lower urinary tract symptoms: results from the symptoms of lower urinary tract dysfunction research network. Neurourol Urodyn. 2025;44:178–193.
17. Cameron, AP, Yang, CC, Bradley, CS, et al. Symptoms of lower urinary tract dysfunction research network (LURN): an introduction to the urinary urgency phenotyping protocol LURN II. Neurourol Urodyn. 2024;43:1800–1808.
18. Adolf, JK, Fried, EI. Ergodicity is sufficient but not necessary for group-to-individual generalizability. Proc Natl Acad Sci U S A. 2019;116:6540–6541.
19. Cella, D, Bullinger, M, Scott, C, Barofsky, I. Group vs individual approaches to understanding the clinical significance of differences or changes in quality of life. Mayo Clin Proc. 2002;77:384–392.
20. King, MT, Dueck, AC, Revicki, DA. Can methods developed for interpreting group-level patient-reported outcome data be applied to individual patient management? Med Care. 2019;57:S38–S45.

Table 1. Demographics and baseline urinary symptoms for the cohort of 234 participants. Mean (std)

Table 2. Subset of ten Comprehensive Assessment of Self-Reported Urinary Symptoms questions answered by the participants during the 25 consecutive days

Figure 1. Excerpts of the data matrices representing answers to questions Q1 and Q3. Columns represent patients. Rows represent consecutive days. Highlighted rows and columns illustrate how inter-individual and intra-individual correlations are calculated.
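
The row-wise and column-wise calculations described in this caption can be sketched as follows. This is a minimal illustration, not the study's analysis code: the toy matrices `q1` and `q3`, the helper names `pearson`, `intra_individual`, and `inter_individual`, and the choice to compute one inter-individual coefficient per day are all assumptions made for the example.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation of two 1-D arrays; NaN if either one is constant."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    return float((xc * yc).sum() / denom) if denom > 0 else float("nan")

def intra_individual(a, b):
    """One coefficient per patient: correlate matching columns (over days)."""
    return np.array([pearson(a[:, j], b[:, j]) for j in range(a.shape[1])])

def inter_individual(a, b):
    """One coefficient per day: correlate matching rows (over patients).
    Assumption: the cohort measure is computed day by day."""
    return np.array([pearson(a[i, :], b[i, :]) for i in range(a.shape[0])])

# Hypothetical toy data, NOT study data: rows are days, columns are patients.
q1 = np.array([[1, 4, 2],
               [2, 3, 2],
               [3, 2, 3],
               [4, 1, 3]])
q3 = np.array([[1, 1, 2],
               [2, 2, 1],
               [3, 3, 2],
               [4, 4, 1]])

intra = intra_individual(q1, q3)  # length = number of patients
inter = inter_individual(q1, q3)  # length = number of days
```

In this toy example the first two patients have intra-individual correlations of 1 and -1, which illustrates how individuals can show opposite symptom dynamics that a cohort-level coefficient does not reveal.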

Figure 2. Examples of Pearson correlations and cosine similarity measures for some simple, typical profiles of ordinal variables. Note that the two measures take the same or similar values in 5 of the 6 examples. A substantial difference between the measures is observed only in example 6 (3rd column, 2nd row): rxy = 0, while cosine(xy) = 0.522, where rxy is the Pearson correlation coefficient of x and y.
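
The two measures can diverge because the Pearson coefficient centers each variable on its mean, whereas cosine similarity works on the raw values. The sketch below shows this with a pair of hypothetical ordinal profiles; they are chosen for illustration only and are not the profiles shown in the figure, and the helper names `pearson` and `cosine_similarity` are assumptions of this example.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation: cosine of the angle between mean-centered vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    denom = np.sqrt((xc ** 2).sum() * (yc ** 2).sum())
    return float((xc * yc).sum() / denom) if denom > 0 else float("nan")

def cosine_similarity(x, y):
    """Cosine of the angle between the raw (uncentered) vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    denom = np.linalg.norm(x) * np.linalg.norm(y)
    return float(x @ y / denom) if denom > 0 else float("nan")

# Hypothetical profiles (not taken from the figure).
x = [1, 2, 3]  # steadily increasing
y = [1, 2, 1]  # rises, then returns to its starting value

r = pearson(x, y)            # 0: no linear association after centering
c = cosine_similarity(x, y)  # 8 / sqrt(84), about 0.87: both are positive-valued
```

Because symptom responses of this kind are non-negative, cosine similarity can stay well above zero for a pair of profiles whose Pearson correlation is zero.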

Figure 3. Comparison of intra-individual and inter-individual (cohort) similarity measures for answers to 10 questions (Q1–Q10) on urinary symptoms collected from 234 individuals during 25 consecutive days. A, C: mean values; B, D: standard deviations. A, B: cosine similarity measures; C, D: Pearson correlations. The upper triangle in each matrix represents the intra-individual measure, while the lower triangle represents the inter-individual measure. Differences between the intra- and inter-individual measures are visualized by color in the heat maps. Note that the standard deviations of intra-individual measures are much higher than those of inter-individual measures.

Figure 4. Histograms of intra-individual and inter-individual (cohort) correlations and cosine similarities of variables Q1 and Q3. Q1: “During waking hours, how many times did you typically urinate?”; Q3: “During a typical night, how many times did you wake up and urinate?” Cohort correlations are weak, while intra-individual correlations and cosine similarities are strong for some individuals. Distributions of intra-individual measures are much broader than those of inter-individual measures (see standard deviations of the distributions indicated in the panels).

Figure 5. Mean longitudinal profiles of variables Q1–Q10 for two extreme groups with strong positive (A) and strong negative (B) cosine similarities between Q1 and Q3. Q1 (daytime frequency): bold blue line. Q3 (nocturia): bold yellow line. Note that Q1 > Q3 consistently in the case of strong positive cosine similarity and that Q1 < Q3 consistently in the case of strong negative cosine similarity.

Table 3. Membership overlap between the extreme groups of participants. Groups based on the similarities in the dynamics of symptoms

Table 4. Pairwise comparison of extreme groups based on strong negative and strong positive similarities of symptom dynamics. Demographics and LUTS tool baseline symptoms (LT1–LT17, see Table 1). Only significant differences are indicated
