Corrections for Criterion Reliability in Validity Generalization: A False Prophet in a Land of Suspended Judgment

Published online by Cambridge University Press:  10 April 2015

James M. LeBreton*, Pennsylvania State University
Kelly T. Scherer, Purdue University
Lawrence R. James, Georgia Institute of Technology

* E-mail: james.lebreton@psu.edu. Address: Department of Psychology, Pennsylvania State University, 140 Moore Building, University Park, PA 16802

Abstract

The results of meta-analytic (MA) and validity generalization (VG) studies continue to be impressive. In contrast to earlier findings that capped the variance accounted for in job performance at roughly 16%, many recent studies suggest that a single predictor variable can account for between 16 and 36% of the variance in some aspect of job performance. This article argues that this “enhancement” in variance accounted for is often attributable not to improvements in science but to a dumbing down of the standards for the values of statistics used in correction equations. With rare exceptions, applied researchers have suspended judgment about what is and is not an acceptable threshold for criterion reliability in their quest for higher validities. We demonstrate a statistical dysfunction that is a direct result of using low criterion reliabilities in corrections for attenuation. Corrections typically applied to a single predictor in a VG study are instead applied to multiple predictors. A multiple correlation analysis is then conducted on corrected validity coefficients. It is shown that the corrections often used in single predictor studies yield a squared multiple correlation that appears suspect. Basically, the multiple predictor study exposes the tenuous statistical foundation of using abjectly low criterion reliabilities in single predictor VG studies. Recommendations for restoring scientific integrity to the meta-analyses that permeate industrial–organizational (I–O) psychology are offered.
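
The mechanism the abstract describes can be illustrated with a minimal sketch. All inputs below (four predictors, observed validities of .40, predictor intercorrelations of .10, and a criterion reliability of .52) are hypothetical values chosen for illustration, not figures taken from the article, and the function names are assumptions. The sketch applies the standard correction for attenuation due to criterion unreliability, r_c = r_xy / sqrt(r_yy), to each validity and then computes the squared multiple correlation as R^2 = r' Rxx^{-1} r.

```python
import math


def correct_for_attenuation(r_xy, r_yy):
    """Correct an observed validity for unreliability in the criterion only."""
    return r_xy / math.sqrt(r_yy)


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    # Build the augmented matrix [a | b] without mutating the inputs.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def squared_multiple_correlation(validities, predictor_corr):
    """R^2 = r' Rxx^{-1} r for standardized predictors."""
    betas = solve(predictor_corr, validities)
    return sum(b * r for b, r in zip(betas, validities))


# Hypothetical inputs (assumptions, not values reported in the article).
k = 4
observed = [0.40] * k
rxx = [[1.0 if i == j else 0.10 for j in range(k)] for i in range(k)]
r_yy = 0.52  # assumed criterion reliability

corrected = [correct_for_attenuation(r, r_yy) for r in observed]
r2_observed = squared_multiple_correlation(observed, rxx)
r2_corrected = squared_multiple_correlation(corrected, rxx)

print(f"R^2 from observed validities:  {r2_observed:.3f}")
print(f"R^2 from corrected validities: {r2_corrected:.3f}")
```

Because every validity is divided by the same sqrt(r_yy), the corrected squared multiple correlation is simply the observed one divided by r_yy. Under these assumed inputs it rises from about .49 to about .95, and with uncorrelated predictors it would exceed 1.0, which is the kind of implausible result the abstract calls suspect.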

Type: Focal Article

Copyright © Society for Industrial and Organizational Psychology 2014
