Measuring the Termites' Adult IQs: The Concept Mastery IQ Test
In 1940, when the "Termites" were in their late twenties to early thirties, Dr. Terman and his associates faced the task of measuring the Termites' adult IQs. Terman et al. decided to develop a new IQ test for this purpose, called the Concept Mastery Test (CMT). This is a 190-question, high-range test that was sufficiently difficult that the highest raw score made by any Termite was 172. The average score of the Termites on this test was 96, with the men averaging 98 and the women averaging 94. Nine hundred fifty-four (954) of the original 1,528 Termites took the CMT in 1940. Dr. Quinn McNemar, who contributed to the statistical analysis of the data, estimated that the average raw score of the average adult on the CMT (later dubbed the CMT-A) would be 2, and that the Termites ranked about 2.1 standard deviations above the mean. Dr. Terman suggested that 2.5 standard deviations was more in line with the Termites' average of 96, and this is what later researchers seem to have accepted. This conclusion that the Termites' average score of 96 is 2.5 standard deviations above the average (= 2) for the general population implies a standard deviation on this test of (96 - 2)/2.5 = 94/2.5 = 37.6 raw-score points.
Fifteen sample synonym/antonym questions and ten sample analogies problems are given by Drs. Terman and Oden in their "Genetic Studies of Genius", Vol. 4, "The Gifted Child Grows Up", pg. 127. There are 120 synonym/antonym questions and 70 verbal analogies problems, for a grand total of 190 questions. The total score on the synonym/antonym subtest is the number right minus the number wrong, and the total score for the analogies subtest is the number right minus one-half the number wrong, i.e., you're penalized for bad guesses. The total time allocated to answering all 190 questions is 30 minutes, allowing about 9.5 seconds per question.
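The scoring rule described above can be sketched in a few lines of Python. This is only an illustration: the function name and the sample right/wrong counts are hypothetical, and omitted items are assumed to score zero, which the description implies but does not state.

```python
def cmt_raw_score(syn_right, syn_wrong, ana_right, ana_wrong):
    """Raw CMT score as described in the text (hypothetical helper).

    Assumption: unanswered items contribute nothing to either subtest.
    """
    synonym_score = syn_right - syn_wrong        # number right minus number wrong
    analogy_score = ana_right - 0.5 * ana_wrong  # number right minus half the number wrong
    return synonym_score + analogy_score


# A perfect paper: 120 synonym/antonym items plus 70 analogies.
print(cmt_raw_score(120, 0, 70, 0))    # 190.0

# Made-up counts, purely for illustration.
print(cmt_raw_score(70, 10, 40, 8))    # 96.0

# Time budget: 30 minutes spread over 190 questions.
seconds_per_question = 30 * 60 / 190
print(round(seconds_per_question, 1))  # 9.5
```

Because wrong answers subtract from the total, a test-taker who guesses at random on the items left over gains little or nothing on average.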
This test was validated using 81 Wilson College freshmen-and-women (I won't say fresh women - grin), 135 Stanford sophomores, and 96 Stanford seniors. The mean score made by the 954 Termites who took this test in 1940 was 96/190.
Quinn McNemar analyzed the data and concluded that the mean for the CMT, if given to the general population (in 1940), would have been about 2 correct answers, with a standard deviation of 45. That means that a raw score of 92 would have corresponded to a 2-sigma level among the general population. Consequently, the Termites' average score of 96 would lie about 2.1 sigma above the mean, which, using a standard deviation of 16 IQ points, would have corresponded to a deviation IQ of about 134. This is to be contrasted with their childhood average ratio IQ of 152. But...
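The arithmetic in this paragraph is easy to check. The sketch below uses only the figures quoted in the text (population mean 2, standard deviation 45, Termite mean 96, IQ standard deviation 16); the function and variable names are hypothetical.

```python
POPULATION_MEAN_RAW = 2   # McNemar's estimated general-population mean raw score
TERMITE_MEAN_RAW = 96     # Termites' mean raw score in 1940


def z_score(raw, mean, sd):
    """Number of standard deviations that a raw score lies above the mean."""
    return (raw - mean) / sd


def deviation_iq(z, iq_sd=16):
    """Deviation IQ for a z-score, on a scale with mean 100 and the given SD."""
    return 100 + iq_sd * z


# McNemar's estimate: raw-score standard deviation of 45.
z_mcnemar = z_score(TERMITE_MEAN_RAW, POPULATION_MEAN_RAW, 45)
print(round(z_mcnemar, 2))                # 2.09
print(round(deviation_iq(z_mcnemar), 1))  # 133.4
print(round(deviation_iq(2.1), 1))        # 133.6, i.e. "about 134" with z rounded to 2.1

# Raw-score SD implied if 96 is instead taken to be 2.5 sigma above the mean.
implied_sd = (TERMITE_MEAN_RAW - POPULATION_MEAN_RAW) / 2.5
print(implied_sd)                         # 37.6
```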
(1) Equivalent Wechsler IQ Scores
The equivalent Wechsler (deviation) "IQs"* of the Terman children, whose scores ran from 140 to 201 on the Stanford-Binet test, would have ranged from 133 to 170. Dr. Terman calculated the children's average childhood Stanford-Binet IQ to be 151, corresponding to a Wechsler "IQ" of 141, or 2.75 standard deviations above the mean. (*Of course, we know now that Wechsler "IQs" are not actually IQs, but are the natural logarithms of IQs.)
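For readers who want to reproduce these conversions, here is a minimal sketch of the ln-normal mapping as this essay uses it. It assumes a standard deviation of 0.15 for the natural logarithm of (ratio IQ / 100), the value used elsewhere in this essay, and a Wechsler standard deviation of 15. The function name is hypothetical, and the results may differ from the figures quoted above by a point because of rounding.

```python
import math

LN_SIGMA = 0.15   # assumed SD of ln(ratio IQ / 100), the value used in this essay
WECHSLER_SD = 15  # SD of the Wechsler deviation scale


def ratio_to_deviation(ratio_iq):
    """Return (z-score, deviation IQ) for a ratio IQ under the ln-normal hypothesis."""
    z = math.log(ratio_iq / 100) / LN_SIGMA
    return z, 100 + WECHSLER_SD * z


for ratio_iq in (140, 151, 201):
    z, dev = ratio_to_deviation(ratio_iq)
    print(f"ratio IQ {ratio_iq}: {z:.2f} sigma, deviation IQ about {dev:.0f}")
```

With these two constants the conversion reduces to 100 + 100 * ln(ratio IQ / 100), since 15 / 0.15 = 100.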
(2) Test-Retest Regression to the Mean
Because of the average test-retest regression to the mean that occurs on tests like the Stanford-Binet, the Stanford-Binet IQs of the Termites would have ranged from about 135 to about 196 (corresponding to Wechsler "IQs" of 130 to 167). For example, on a test like the 1937 Revision of the Stanford-Binet that had two comparable versions of the test, L and M, a child could very easily get a 138 on one of the versions and a 140 on the other. This suggests to me that something like 1,000 additional children would have scored 140 or above if the tests had been administered again a few months later. And this suggests that the Terman study captured only about 10% of the eligible children in the 140-144 IQ range.
(3) Did the Termites' IQs Decline When They Grew Up?
As adults, the Termites were given two IQ tests developed specifically for the purpose of measuring their adult IQs. Because of resource limitations, the norming of these tests was necessarily quite limited, with no IQ-equivalent scales published where I could find them. The first of these, the Concept Mastery Test-A (CMT-A), was developed and administered in 1940 for an 18-to-19-year follow-up study of the Termites. On this test, which has 190 questions, the average score of the Termites was about 96, with a standard deviation of about 30. Quinn McNemar estimated that this average score was about 2.1 standard deviations above the mean IQ of 100. However, the follow-up study in 1950 raised this estimate to 2.5 standard deviations. Dr. Terman estimated that test-retest regression to the mean would have subtracted at least 8 points from the Termites' scores on the CMT, due solely to measurement errors and not to any change in actual capability. That would have brought the Termites' average childhood score down to a Stanford-Binet score of 143, corresponding to about 2.4 standard deviations above the population mean of 100.
Conclusion: There was little or no decline in the IQ scores of the adult Termites from their childhood IQ scores. They were just as smart as adults as they had been as children. (However, as we'll see, there must have been a lot of changes at the individual level as the Termites matured.)
One interesting consideration is that of the impact of a ln-normal distribution of IQ scores upon the IQ scores of the Termites after they reached adulthood. Did their IQs drop?
For the "main group" of 621 children drawn from an estimated population of 168,000 San Francisco Bay Area schoolchildren, I calculate an average IQ of 152. (Dr. Terman arrives at an average IQ of 151 for his combined group.) Taking the natural logarithm of 1.52, we get 0.42 (deviation IQ = 142), and dividing this by a standard deviation of 0.15, we get 2.8 standard deviations above the mean for the average IQ of this group.
The IQs of a group with extreme IQ scores will tend to regress toward the mean if they are retested even on the same IQ test. Dr. Terman writes (Genetic Studies of Genius, Volume IV, The Gifted Child Grows Up, "Twenty-Five Years' Follow-Up of a Gifted Group", Terman and Oden, Stanford University Press, 1947, pg. 137),
"The reader should understand that when a group of subjects is selected on the basis of very high scores on any test with less than perfect reliability (which includes all psychological tests), and these subjects are later retested by the same test or by any other, the retest scores will show some regression toward the mean of the general population. This will occur even if the second test follows the first by only a month or a week, provided there has been no carry-over of practice effect from the first test to the second. McNemar estimates that in the case of tests as reliable as the Stanford-Binet, the regression of retest scores would be equivalent to about 4 or 5 IQ points for subjects of IQ 140 or above on the first test. That is, this amount of regression would be due to errors of measurement and would have no bearing on the question of change of mental status. If the second test, in addition to being imperfectly reliable, should not measure exactly the same mental functions as the first test, this would result in further regression having nothing to do with mental change. It is doubtful if there is any test so reliable and so nearly equivalent to the Stanford-Binet in the functions it measures that it would not show a regression of something like 8 IQ points for subjects selected as the gifted group was selected. This means that the gifted subjects could be expected to show an apparent drop of about 0.5 S. D. on the C.M. test without having regressed at all in the mental functions originally tested."
One implication of this would seem to me to be that if we were to retest this "main group" of children a few months later on a different form of the Stanford-Binet, they would average about 5 points lower in their IQ scores, with the children who scored highest on the first test scoring more than 5 points lower on the retest, since regression to the mean is more pronounced the higher the IQ becomes. So in addition to the fact that the Terman screening presumably missed 70% of the children who would have qualified with IQs of 140 or above, we have the fact that, loosely speaking, the children in the 140-144 IQ bracket would have scored in the 135 to 139 range on the retest, and other children who had scored in the 135 to 139 range would now score in the 140 to 144 tier. Of course, this is a gross oversimplification. Children would be moving up and down above and below the entry level ratio-IQ of 140, and among brackets within the group itself.
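To make the size of this effect concrete, here is a rough sketch using the standard textbook model of regression to the mean, in which the expected retest score is the population mean plus the test's reliability times the first score's distance from the mean. The reliability of 0.90 is an illustrative assumption of mine, not a figure taken from Terman or McNemar, and the function name is hypothetical.

```python
def expected_retest(score, reliability=0.90, mean=100.0):
    """Expected retest score under the textbook regression-to-the-mean model.

    Assumption: reliability=0.90 is illustrative only, not a published
    value for the Stanford-Binet.
    """
    return mean + reliability * (score - mean)


for first_score in (140, 160, 180):
    drop = first_score - expected_retest(first_score)
    print(f"first test {first_score}: expected drop of {drop:.1f} points on retest")
```

Under this assumed reliability the expected drop at a first score of 140 is 4 points, in line with the 4 or 5 points McNemar estimated, and the drop grows in proportion to the distance from the mean.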
IQ to Deviation IQ Conversion Charts in the WISC Manual
John Scoville has written in an email,
"First and more importantly, the WISC manual contains charts appropriate for converting from deviation to ratio iq and vice versa. The percentiles for various ratio iqs obtained in this manner agree (for the most part) with the Sare data."
So this is another, official set of tables relating ratio IQs to deviation "IQs", presumably further corroborating the ln-normal character of the IQ distribution curve.
One fact that is puzzling is that I would think that there would be many a statistician who would have explored this idea a long time ago. And that may yet have happened. I haven't heard of it, but then I'm not a psychometrist.
On the other hand, I've generally had to dig out for myself what's really going on. None of the irregularities in the Terman Study arithmetic has shown up in any of the literature I've found. But we'll see. Letters have been mailed, and more will follow.
Writing multi-part letters to Steve and to Patrick exploring what they've added to the discussion of the IQ distribution hasn't left me much time to continue the write-up of the evidence supporting the hypothesis that the IQ distribution is ln-normal.
The Terman Study
Suffice it to say that I think the Terman Study is seriously under-represented in its lowest three echelons (IQs 140 to 154). Whether you use the Gaussian distribution that Terman would have expected, or whether you use a ln-normal distribution, the Terman cohort should have included about 40% (if it were Gaussian) to 70% (if it is ln-normal) more Termites than it does. Clearly, the eligible children were out there. They just weren't included. Also, the bottom IQ ranges, 140-144, 145-149, and 150-154 shouldn't have had enrollment ratios like 160:150:134. At a 140 IQ, the IQ curve is falling away too fast for this. The ratios should have been more like 425:225:134.
In the lowest three brackets, the shape of his curve is a dead giveaway that something's not cricket.
The upper strata of his data fit a ln-normal curve quite well: five orders of magnitude better at the upper end than a Gaussian distribution.
The Army General Classification Test
I won't include it here just yet, but the Army General Classification Test (AGCT) data also fit well. These data were collected for 2,400,000 test-takers (presumably mostly or entirely Army recruits), so they represent the screening of a large population. For example, the AGCT assigned ratio IQs of 170 to about 1 in 4,500 test-takers, corresponding to a z-score of about 3.535 sigma, or a Wechsler IQ of about 153. That's within a fraction of a point of the ln-normal prediction.
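The comparison above can be reproduced with Python's standard library. This sketch converts the observed rarity (1 in 4,500) to a z-score with the inverse normal CDF and sets it beside the ln-normal prediction for a ratio IQ of 170, assuming the same 0.15 standard deviation for the logarithm used elsewhere in this essay. The names are hypothetical, and the z-score it yields may differ slightly from the one quoted above because of rounding.

```python
import math
from statistics import NormalDist

LN_SIGMA = 0.15  # assumed SD of ln(ratio IQ / 100), the value used in this essay
RARITY = 4500    # about 1 in 4,500 AGCT test-takers at ratio IQ 170, per the text


def wechsler(z):
    """Deviation IQ on a scale with mean 100 and standard deviation 15."""
    return 100 + 15 * z


# z-score whose upper-tail probability is 1 / RARITY.
z_observed = NormalDist().inv_cdf(1 - 1 / RARITY)

# z-score that the ln-normal hypothesis assigns to a ratio IQ of 170.
z_predicted = math.log(170 / 100) / LN_SIGMA

print(f"observed:  {z_observed:.2f} sigma, deviation IQ {wechsler(z_observed):.1f}")
print(f"predicted: {z_predicted:.2f} sigma, deviation IQ {wechsler(z_predicted):.1f}")
```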
The Sare Data
The Sare data give far and away the best fit to a ln-normal curve. The question is: where did Geoffrey Sare get his data? I've prepared a letter to the University of London, and will send it as soon as I've cleared it with others who are involved in this matter.
Children with 200+ IQs
The ln-normal formula predicts that about 1 in every 521,000 children should exhibit a ratio IQ of 200+. I'm unable to verify that directly and closely, but can only observe that in the three cases that are available to me (Ruth Duskin Feldman's "Whatever Became of the Quiz Kids", Leta Hollingworth's "Children Above 180 IQ", and Miraca Gross's "Exceptionally Gifted Children") the results are reasonable for the presumed size of the population. Among the anomalies are the fact that Miraca Gross should have found more than 1,000 children with IQs of 160+ in the large population that she screened, and instead (through no fault of hers, of course) found only 30, 3 of whom had IQs of 200+. Similarly, Leta Hollingworth ransacked the New York metropolitan area for over twenty years and was only able to find and follow 12 children with IQs of 180+. These agreements for IQs of 200+ set a lower limit on the number of children with IQs of 200+ in the three representative populations, but there's no guarantee that there couldn't be more. Only exhaustive screenings with ratio IQ tests can settle this question definitively.
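The rarity figure quoted at the start of this section follows from the upper tail of a normal distribution applied to the logarithm of the ratio IQ. Here is a minimal sketch, again assuming a standard deviation of 0.15 for ln(ratio IQ / 100); the function name is hypothetical, and the result is sensitive to the exact sigma and rounding, so expect a figure of the same order rather than an exact match.

```python
import math

LN_SIGMA = 0.15  # assumed SD of ln(ratio IQ / 100), the value used in this essay


def one_in(ratio_iq):
    """Rarity (1 in N) of ratio IQs at or above ratio_iq under the ln-normal hypothesis."""
    z = math.log(ratio_iq / 100) / LN_SIGMA
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) for a standard normal
    return 1 / upper_tail


for cutoff in (160, 180, 200):
    print(f"ratio IQ {cutoff}+: about 1 in {one_in(cutoff):,.0f}")
```

Multiplying the size of a screened population by 1 / one_in(cutoff) gives the number of children the hypothesis would lead one to expect above that cutoff.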
Pushing the Envelope
If we do, in fact, verify the validity of a ln-normal distribution as the proper distribution for IQs, the next step will be to push the limits of this discovery. For example, there's the Flynn Effect. I think I've shown that the Flynn Effect increases with IQ up to the age of 11 through 1987 (since Christopher Otway had an IQ in excess of 200 as measured on the 1973 revision of the Stanford-Binet). But Dr. Flynn's latest interpretation of the underlying mechanism for the Flynn Effect--that children with IQs at the highest level mentally mature earlier than they did in the past, and that their adult mental ages are little or no higher than they were in the past--implies that the adult IQ distribution has become strongly skewed to the right, with the average adult much closer to the brightest adult on the basis of absolute intelligence than was the case 100 years ago. Determining the shape of the IQ curve (and mental age curve) is, I should think, key to testing Dr. Flynn's hypothesis. One way to approach this might be to measure the shape of the ratio IQ distribution for African-Americans. Since the African-American mean IQ continues to fall about one standard deviation below the Caucasian IQ distribution, their ratio IQs can be measured to higher levels on existing tests. Clearly, their IQs have been affected no more and no less than those of other ethnic groups. The mean IQ of their distribution has shifted to the right by the same 3 points per decade as has the mean IQ of the Caucasians. (They're where the Caucasians were in 1952.) Then there's the matter that the Belgians have IQs that are about 13 points higher than the U.S. mean. The Belgians are where the U.S. will be in 40 more years, if the Flynn Effect continues its steady march.