
Wednesday, August 22, 2012

Successfully dealing with faking on a self-report personality test



Faking on self-report personality tests is common and a serious drawback of such tests. Many approaches have been tried to counteract this source of error; see, e.g., recent papers in the Journal of Applied Psychology (Bangerter, Roulin, & König, 2012; Fan et al., 2012).

The UPP test (Sjöberg, 2010/2012) is a self-report personality test and as such is vulnerable to faking in high-stakes testing situations. However, the test uses a simple but powerful methodology for correcting test scores for faking. It measures two social desirability (SD) dimensions separately: one overt (similar to the classical Marlowe-Crowne scale; Crowne & Marlowe, 1960) and one covert. The covert scale uses items that look like conventional personality items but were selected for their strong correlation with the overt scale. The two scales are highly correlated and give similar results when used to correct test scales for faking.
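As an illustration of how such a covert scale could be assembled, here is a minimal Python sketch; the function name, the 0.30 cutoff, and the data layout are my own assumptions for the example, not the actual UPP procedure:

```python
# Illustrative sketch only -- not the actual UPP item-selection procedure.
# Assumes `items` is a pandas DataFrame with one column per candidate item
# and `overt_sd` is a Series of overt SD scores for the same respondents;
# the 0.30 cutoff is a hypothetical choice.
import pandas as pd

def select_covert_items(items: pd.DataFrame, overt_sd: pd.Series,
                        cutoff: float = 0.30) -> list:
    """Keep items that correlate strongly with the overt SD scale."""
    corrs = items.corrwith(overt_sd)   # item-by-item Pearson correlations
    return corrs[corrs.abs() >= cutoff].index.tolist()

# The covert SD score would then simply be the mean of the selected items:
# covert_sd = items[select_covert_items(items, overt_sd)].mean(axis=1)
```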

The correction procedure uses regression models in which each test scale in turn is the dependent variable and the SD scales are the independent variables. A new model must be fitted for each test scale because the scales are related to SD in different ways, with correlations varying widely. The corrected test scales are the residuals from these regression models.
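A minimal sketch of this residual correction, assuming the scores are available as NumPy arrays (variable names are illustrative; the original analyses were presumably run in a standard statistics package):

```python
# Sketch of the faking correction: regress each test scale on the two SD
# scales and keep the residuals as the corrected scores.
import numpy as np

def sd_corrected(scale: np.ndarray, sd_overt: np.ndarray,
                 sd_covert: np.ndarray) -> np.ndarray:
    """Return the residuals of `scale` regressed on both SD scales."""
    X = np.column_stack([np.ones_like(scale), sd_overt, sd_covert])
    beta, *_ = np.linalg.lstsq(X, scale, rcond=None)   # OLS fit
    return scale - X @ beta                            # residuals

# One model per test scale, since each relates to SD differently:
# corrected = {name: sd_corrected(s, sd_overt, sd_covert)
#              for name, s in scales.items()}
```

Since OLS residuals are uncorrelated with the predictors by construction, this guarantees the zero correlation with SD noted in the next paragraph.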

This procedure gives corrected test scales that correlate zero with SD. So far, so good, but does it also work? In other words, can it be validated on empirical data? One way to validate it is to study groups tested under different levels of involvement, from incumbents, for whom test results have no consequences, to applicants, for whom the consequences are very important. In a recent study of applicants to the officers' training program in the Swedish Army, I had a chance to investigate this question using the UPP test and its SD scales. (Previous studies had given similar results.) Data were available for five groups:

A. Norm
B. Incumbents
C. Applicants (low consequences of test results)
D. Applicants (moderate consequences)
E. Applicants (high-stakes testing)

I expected SD scale values to increase in the order A to E. I also expected test scales sensitive to SD, such as emotional stability, to show the same rank order. Finally, I expected the group differences in emotional stability to vanish when the test data were corrected for faking using the two SD scales in a multiple regression model. For the results, see Figs. 1 and 2 below, and Table 1.


Fig. 1. Means of SD scales


Fig. 2. Means of emotional stability before and after SD correction



Table 1. Mean values of emotional stability (standardized scales), uncorrected and corrected data, with effect size and one-way ANOVA of group differences.

Group                                              Before correction               Corrected for SD
A. Norm                                                -0.25                           -0.05
B. Incumbents                                           0.05                            0.07
C. Applicants (low consequences of test results)        0.43                            0.28
D. Applicants (moderate consequences)                   0.56                            0.06
E. Applicants (high-stakes testing)                     0.73                            0.11
Effect size (eta²)                                      0.147                           0.006
One-way ANOVA                                      F(4,1638) = 70.693, p < 0.0005  F(4,1828) = 2.763, p = 0.026

Note that the effect size after correction was only about 5 % of its uncorrected value (0.006 vs. 0.147).

In other work on leader effectiveness, using 360-degree feedback as the criterion, I found that the validities of the test scales increased after correction for SD by the same method (Sjöberg, Bergman, Lornudd, & Sandahl, 2011); see Fig. 3.

Fig. 3. Validities of uncorrected and corrected personality scales


In conclusion, a simple correction method removed about 95 % of the variance due to SD in test responses, and the same correction increased the validity of the test scores against an external criterion.

It is often argued that SD scales really measure "personality", such as need for approval, rather than a tendency to distort responses. However, the present results speak strongly against this view. It is very plausible that different levels of consequences of testing lead to different levels of motivation for impression management, but unlikely that they would produce different levels of some personality dimension such as need for approval.

References

Bangerter, A., Roulin, N., & König, C. J. (2012). Personnel selection as a signaling game. Journal of Applied Psychology, 97, 719-738. doi:10.1037/a0026078
Crowne, D. P., & Marlowe, D. (1960). A new scale of social desirability independent of psychopathology. Journal of Consulting and Clinical Psychology, 24, 349-354.
Fan, J., Gao, D., Carroll, S. A., Lopez, F. J., Tian, T. S., & Meng, H. (2012). Testing the efficacy of a new procedure for reducing faking on personality tests within selection contexts. Journal of Applied Psychology, 97, 866-880. doi:10.1037/a0026655
Sjöberg, L. (2010/2012). A third generation personality test (SSE/EFI Working Paper Series in Business Administration No. 2010:3). Stockholm: Stockholm School of Economics.
Sjöberg, L., Bergman, D., Lornudd, C., & Sandahl, C. (2011). Sambandet mellan ett personlighetstest och 360-graders bedömningar av chefer i hälso- och sjukvården (Relationship between a personality test and 360-degree assessments of health care managers). Stockholm: Karolinska Institutet, Institutionen för lärande, informatik, management och etik (LIME).

Tuesday, August 14, 2012

Validity of integrity tests


Traditionally, the view has been that integrity tests (actually honesty tests) have very high validity, based on an early meta-analysis (Ones, Viswesvaran, & Schmidt, 1993). Skeptics have pointed out that many of the studies in that meta-analysis came directly from reports by test vendors. Yet the high validity of integrity tests has become an established truth, and the basis for an entire industry producing such tests, following Schmidt and Hunter (1998), who wrote that the g factor plus integrity is the best basis for predicting work performance. This is probably wrong.

A recent, updated meta-analysis clearly shows that the validities of integrity tests are no higher than 0.2, perhaps as low as 0.1 (Van Iddekinge, Roth, Raymark, & Odle-Dusseau, 2012a, 2012b), even after correction for measurement error in the criteria and range restriction in the test. The earlier estimates were around 0.4, i.e. higher than for standard personality tests. It now appears that the skeptics were right: the high validities came from the test providers' own data, and independent research does not confirm them. A rather high validity can be obtained against self-ratings of counterproductive work behavior, but this is not very interesting.
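For readers who want the details, these are the standard psychometric correction formulas for criterion unreliability and for direct range restriction (Thorndike's Case II); the meta-analysis may differ in detail (e.g., corrections for indirect restriction), so take this as a sketch:

```latex
% Correction for measurement error in the criterion,
% where r_{yy} is the criterion reliability:
r_c = \frac{r_{xy}}{\sqrt{r_{yy}}}

% Thorndike Case II correction for direct range restriction on the test,
% where u = SD_{unrestricted} / SD_{restricted}:
r_u = \frac{u \, r_{xy}}{\sqrt{1 + (u^2 - 1)\, r_{xy}^2}}
```

Both corrections can only increase an observed correlation, which underscores the point: even after giving integrity tests this benefit of the doubt, their validities stay at about 0.1-0.2.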

This is an example of how an early meta-analysis can result in errors. Van Iddekinge et al. have carried out a very ambitious project, and the result is clear: integrity tests seem to have little practical value. And that is before even considering that such tests can easily be faked.

References

Ones, D. S., Viswesvaran, C., & Schmidt, F. L. (1993). Comprehensive meta-analysis of integrity test validities: Findings and implications for personnel selection and theories of job performance. Journal of Applied Psychology Monograph, 78, 679-703.

Schmidt, F. L. & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124, 262-274.

Van Iddekinge, C. H., Roth, P. L., Raymark, P. H., & Odle-Dusseau, H. N. (2012a). The criterion-related validity of integrity tests: An updated meta-analysis. Journal of Applied Psychology, 97(3), 499-530. doi:10.1037/a0021196

Van Iddekinge, C. H., Roth, P. L., Raymark, P. H., & Odle-Dusseau, H. N. (2012b). The critical role of the research question, inclusion criteria, and transparency in meta-analyses of integrity test research: A reply to Harris et al. (2012) and Ones, Viswesvaran, and Schmidt (2012). Journal of Applied Psychology, 97(3), 543-549. doi:10.1037/a0026551

Friday, August 10, 2012

Optimal combination of personality and intelligence


Personality and intelligence are both related to job performance, but how should they be weighted for optimal results? The most straightforward approach is a linear combination, and indeed there is little evidence for other types of models. Once this is decided, the remaining question is what weights the two types of information should be given in order to maximize predictive efficiency. It is well known that they tend to be uncorrelated, so the crucial question is how valid each is in relation to job performance criteria. Intelligence, or GMA (the g factor), correlates around 0.6 with job performance (Schmidt & Hunter, 1998). "Personality" is a less stringent term and could mean many things; here I take it to refer to an optimal index of subscales, and such indices have been found to correlate around 0.55 with job performance (de Colli, 2011; Sjöberg, 2010; Sjöberg, Bergman, Lornudd, & Sandahl, 2011), after correction for measurement error in the criteria and range restriction in the independent variable (Schmidt, Shaffer, & Oh, 2008). Hence, intelligence and personality, in this sense, are about equally efficient as predictors, and an evidence-based strategy is to treat them that way, with equal weights.
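A quick back-of-the-envelope calculation (mine, not from the cited sources) shows why equal weighting is attractive here. For a unit-weighted composite of two standardized predictors with validities r1 and r2 and intercorrelation r12, the composite validity is:

```latex
% Validity of the composite c = z_1 + z_2 against criterion y:
r_{cy} = \frac{r_1 + r_2}{\sqrt{2 + 2\, r_{12}}}
% With r_1 = 0.60, r_2 = 0.55 and, as assumed above, r_{12} = 0:
r_{cy} = \frac{0.60 + 0.55}{\sqrt{2}} \approx 0.81
```

In practice the gain will be somewhat smaller, since the two predictors are never exactly uncorrelated, but the calculation illustrates how two roughly equally valid, independent predictors combine.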

It should be noted that the usual Big Five dimensions are much weaker predictors of job performance, as shown in a number of meta-analyses (Barrick, Mount, & Judge, 2001). To get an efficient personality predictor it is necessary to form an index based on focused and narrow scales (Bergner, Neubauer, & Kreuzthaler, 2010; Christiansen & Robie, 2011; Sjöberg, 2010/2012). Big Five personality tests are not sufficient for optimal prediction of job performance.


References

Barrick, M. R., Mount, M. K., & Judge, T. A. (2001). Personality and performance at the beginning of the new millennium: What do we know and where do we go next? International Journal of Selection and Assessment, 9, 9-30.
Bergner, S., Neubauer, A. C., & Kreuzthaler, A. (2010). Broad and narrow personality traits for predicting managerial success. [doi:10.1080/13594320902819728]. European Journal of Work and Organizational Psychology, 19, 177-199.
Christiansen, N. D., & Robie, C. (2011). Further consideration of the use of narrow trait scales. [doi:10.1037/a0023069]. Canadian Journal of Behavioural Science/Revue canadienne des sciences du comportement, 43, 183-194.
de Colli, D. (2011). Ett nytt svenskt arbetspsykologiskt test och arbetsprestation inom polisen – samtidig validitet (A new Swedish work psychology test and job performance in the police – concurrent validity). Mälardalens högskola, Akademin för hållbar samhälls- och teknikutveckling.
Schmidt, F. L., & Hunter, J. E. (1998). The validity and utility of selection methods in personnel psychology: Practical and theoretical implications of 85 years of research findings. Psychological Bulletin, 124, 262-274.
Schmidt, F. L., Shaffer, J. A., & Oh, I.-S. (2008). Increased accuracy for range restriction corrections: Implications for the role of personality and general mental ability in job and training performance. Personnel Psychology, 61, 827-868.
Sjöberg, L. (2010). Upp-testet och kundservice: Kriteriestudie (The UPP test and customer service: A criterion study). Forskningsrapport 2010:6. Stockholm: Psykologisk Metod AB.
Sjöberg, L. (2010/2012). A third generation personality test (SSE/EFI Working Paper Series in Business Administration No. 2010:3). Stockholm: Stockholm School of Economics.
Sjöberg, L., Bergman, D., Lornudd, C., & Sandahl, C. (2011). Sambandet mellan ett personlighetstest och 360-graders bedömningar av chefer i hälso- och sjukvården (Relationship between a personality test and 360-degree assessments of health care managers). Stockholm: Karolinska Institutet, Institutionen för lärande, informatik, management och etik (LIME).