Thursday, July 19, 2012

Dealing with test complexity

People have a limited ability to make complex judgments without the support of computers and explicit decision rules. This fact has been well known for many years; an often-cited classic is a paper by Miller [12]. Studies of expert judgments of many kinds, including the assessment of job applicants, have confirmed this general principle [3; 8]. There are some interesting exceptions in special cases, where the experts get fast and clear feedback based on valid theory [9]. These conditions are rarely present in the assessment of job applicants.

Judges usually come to different conclusions when the information they use is complex and extensive, which is a common situation. Furthermore, assessments tend to vary over time. Alongside these limitations in our judgment capacity, we tend to fall prey to an illusion: the more information we get, the more confident we become, but beyond a modest limit, judgments become worse as information increases. See Fig. 1.

Figure 1.  Decision quality as a function of amount of information. 

Most personality tests give a complicated picture of a person. This seems reasonable since everyone "knows" that people are complicated. Popular tests provide results for 30-40 dimensions. It is likely that such an abundance of information is popular because of the information illusion discussed above: more information makes us more confident. Research has, however, shown that explicit rules for combining information give better results. Such a rule can simply be based on the decision maker's own systematic strategy, so-called bootstrapping [7], or on explicitly judged importance weights. The use of weights is an effective way of answering the question: "How do I interpret this test result?" The alternative approach is to use a holistic evaluation based on the pattern of results. Holism has traditionally had a strong position in the interpretation of test results, but it cannot be justified on empirical and scientific grounds [14].
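To make the idea of an explicit combination rule concrete, here is a minimal sketch in Python. It is only an illustration, not the scoring method of any particular test: the scale names, the importance weights and the applicant scores are all hypothetical. Each scale is standardized within the applicant group, and the same weighted sum is then applied to every applicant.

```python
from statistics import mean, pstdev

# Hypothetical importance weights judged by the decision maker (assumption).
WEIGHTS = {"conscientiousness": 0.5, "emotional_stability": 0.3, "cooperation": 0.2}


def standardize(scores):
    """Convert raw scores on one scale to z-scores within the applicant group."""
    m, s = mean(scores), pstdev(scores)
    # A scale on which everyone scores the same carries no information.
    return [(x - m) / s if s > 0 else 0.0 for x in scores]


def composite_scores(applicants, weights):
    """Return {name: weighted sum of z-scores}, using the same rule for everyone."""
    names = list(applicants)
    totals = {name: 0.0 for name in names}
    for scale, weight in weights.items():
        z_scores = standardize([applicants[name][scale] for name in names])
        for name, z in zip(names, z_scores):
            totals[name] += weight * z
    return totals


# Made-up scores for three applicants, for illustration only.
applicants = {
    "A": {"conscientiousness": 60, "emotional_stability": 50, "cooperation": 40},
    "B": {"conscientiousness": 50, "emotional_stability": 50, "cooperation": 50},
    "C": {"conscientiousness": 40, "emotional_stability": 50, "cooperation": 60},
}

scores = composite_scores(applicants, WEIGHTS)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

The point of the sketch is that the rule is fixed in advance and applied identically to every applicant, so the result does not depend on who happens to read the profile or on which day.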

Subjective interpretation typically results in narrative texts which may be very credible, due to a number of psychological factors. Such factors have been discussed as enabling "cold reading", i.e. credible inferences about a person which lack a factual basis [13]. Historical examples show how the credibility of the Rorschach test was established by "wizards" who could seemingly produce surprisingly correct statements about a person on the basis of responses to that test [18], in spite of the fact that this test, as well as other projective techniques, has been found to lack validity [6; 10]. I give two examples of research which illustrate how illusory credibility may be established.

The Forer effect. Flattering texts which are full of statements that are generally true, and which say "both A and its opposite B", are perceived as very accurate. Forer showed this in a classic study a long time ago [5], and the results have been replicated many times [4; 16].

Forer gave a group of students a "test" which he said would reveal their personalities. After some time he returned with narrative texts said to be based on the responses to the test. Each student got his or her own text, but the texts were all the same. The students were asked to judge how well the texts described their personalities. About 90% said that the texts fitted very well. Here is what they got (typical of astrological texts):

"You have a need for other people to like and admire you, and yet you tend to be critical of yourself. While you have some personality weaknesses you are generally able to compensate for them. You have considerable unused capacity that you have not turned to your advantage. Disciplined and self-controlled on the outside, you tend to be worrisome and insecure on the inside. At times you have serious doubts as to whether you have made the right decision or done the right thing. You prefer a certain amount of change and variety and become dissatisfied when hemmed in by restrictions and limitations. You also pride yourself as an independent thinker; and do not accept others' statements without satisfactory proof. But you have found it unwise to be too frank in revealing yourself to others. At times you are extroverted, affable, and sociable, while at other times you are introverted, wary, and reserved. Some of your aspirations tend to be rather unrealistic. "

MBTI and PPA excel in using statements of this type, and they provide popular reading for those who have taken the tests. The reports are perceived to be almost perfectly accurate and to give self-insight, but they simply flatter [15] and/or confirm already existing self-beliefs. Once credibility is established, the tester can give important advice about selection, team composition and personal development. No research exists which shows such advice to be useful, but since the test report is so persuasive, the advice is probably also believed.

The "Draw-a-man" effect. The draw-a-man test is credible to many users although it has no demonstrated validity [17]. This is because of common-sense thinking about what various aspects of a drawing could mean. Examples: large muscles mean problems with male self-image, large eyes imply paranoid tendencies, etc. In addition, there is selective memory of cases which supported these speculations; the others are forgotten or explained away [1; 2].

The UPP test deals with complexity by means of aggregate variables, which are linear composites of selected subscales. Extensive research, over a period of more than 50 years, has shown that this approach is superior to subjective integration of information [8; 11]. For a review of work on UPP, click here.


[1]. Chapman, L. J., & Chapman, J. P. (1967). Genesis of popular but erroneous psychodiagnostic observations. Journal of Abnormal Psychology, 73, 193-204.

[2]. Chapman, L. J., & Chapman, J. P. (1969). Illusory correlation as an obstacle to the use of valid psychodiagnostic signs. Journal of Abnormal Psychology, 74, 271-280.

[3]. Dawes, R. M., Faust, D., & Meehl, P. E. (1989). Clinical versus actuarial judgment. Science, 243, 1668-1674.

[4]. Dickson, D. H., & Kelly, I. W. (1985). The 'Barnum Effect' in personality assessment: A review of the literature. Psychological Reports, 57, 367-382.

[5]. Forer, B. R. (1949). The fallacy of personal validation: a classroom demonstration of gullibility. Journal of Abnormal & Social Psychology, 44, 118-123.

[6]. Garb, H. N., Lilienfeld, S. O., & Wood, J. M. (2004). Projective techniques and behavioral assessment. In S. N. Haynes & E. M. Heiby (Eds.), Comprehensive handbook of psychological assessment, Vol. 3: Behavioral assessment (pp. 453-469). Hoboken, NJ, US: John Wiley & Sons Inc.

[7]. Goldberg, L. R. (1970). Man versus model of man: A rationale plus some evidence for a method of improving clinical inferences. Psychological Bulletin, 73, 422-432.

[8]. Grove, W. M., & Meehl, P. E. (1996). Comparative efficiency of informal (subjective, impressionistic) and formal (mechanical, algorithmic) prediction procedures: The clinical-statistical controversy. Psychology, Public Policy, and Law, 2, 293-323.

[9]. Kahneman, D., & Klein, G. (2009). Conditions for intuitive expertise: A failure to disagree. American Psychologist, 64, 515-526. doi:10.1037/a0016755

[10]. Lilienfeld, S. O., Wood, J. M., & Garb, H. N. (2000). The scientific status of projective techniques. Psychological Science in the Public Interest, 1, 27-66.

[11]. Meehl, P. E. (1954). Clinical versus statistical prediction: A theoretical analysis and a review of the evidence. Minneapolis: University of Minnesota Press.

[12]. Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63, 81-97.

[13]. Rowland, I. (2005). The full facts book of cold reading, 4th edition. London: Full Facts Books.

[14]. Ruscio, J. (2002). The emptiness of holism. Skeptical Inquirer, 26, 46-50.

[15]. Thiriart, P. (1991). Acceptance of personality test results. Skeptical Inquirer, 15, 166-172.

[16]. Trankell, A. (1961). Magi och förnuft i människobedömning. Stockholm: Bonnier.

[17]. Willcock, E., Imuta, K., & Hayne, H. (2011). Children’s human figure drawings do not measure intellectual ability. Journal of Experimental Child Psychology, 110, 444-452. doi:10.1016/j.jecp.2011.04.013

[18]. Wood, J. M., Nezworski, M. T., Lilienfeld, S. O., & Garb, H. N. (2003). What's wrong with the Rorschach?: Science confronts the controversial inkblot test. San Francisco, CA, US: Jossey-Bass.


  1. I think the popularity of the MBTI is not entirely due to its flattery. It does have an intuitive appeal, and it comes with an overall theory of personality as well. If you read forums on this there are plenty of people who are typed as ENTP, ESTP and similar impulsive types who seem aware of what negative outcomes impulsivity predicts. So it's more than just flattery.

  2. You may be right that there is more to it than flattery...

