Measuring the "Termites'" Adult IQs: The Concept Mastery IQ Test
In 1940, when the "Termites" were in their late twenties to early thirties,
Dr. Terman and his associates faced the task of measuring the Termites'
adult IQs. Terman et al. decided to develop a new IQ test for this purpose,
called the Concept Mastery Test (CMT). This is a 190-question, high-range
test that was sufficiently difficult that the highest raw score made by any
Termite was 172. The average score of the Termites on this test was 96, with
the men averaging 98 and the women averaging 94. Nine hundred fifty-four (954)
of the original 1,528 Termites took the CMT in 1940.
Dr. Quinn McNemar, who contributed to the statistical analysis of the data,
estimated that the average raw score of the average adult on the CMT (later
dubbed the CMT-A) would be 2, and that the Termites ranked about 2.1 standard
deviations above the mean. Dr. Terman suggested that a figure of 2.5 standard
deviations was more in line with the Termites' average of 96, and this is what
later researchers seem to have accepted. This conclusion, that the Termites'
average score of 96 is 2.5 standard deviations above the average (= 2) for the
general population, implies a standard deviation on this test of
94/2.5 = 37.6 raw-score points, since 96 lies 94 points above the population
mean of 2.
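The arithmetic behind that implied standard deviation can be checked with a few lines of Python. This is only a sketch of the calculation described above; the variable names are my own, and the inputs are the figures quoted in the text.

```python
# Raw-score figures quoted in the text above.
termite_mean = 96.0      # mean CMT raw score of the 954 Termites
population_mean = 2.0    # McNemar's estimated general-population mean
z_assumed = 2.5          # distance above the mean, in SD units, that Terman suggested

# If 96 sits 2.5 SDs above a mean of 2, one SD must span this many raw points.
implied_sd = (termite_mean - population_mean) / z_assumed
print(f"implied SD = {implied_sd:.1f} raw-score points")
```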
Fifteen sample synonym/antonym questions and ten sample analogies problems are
given by Drs. Terman and Oden in their "Genetic Studies of Genius", Vol. 4, "The
Gifted Child Grows Up", pg. 127. There are a total of 120 synonym/antonym
questions and 70 verbal analogies problems, for a grand total of 190 questions.
The total score on the synonym/antonym subtest is the number right minus the
number wrong, and the total score for the analogies subtest is the number right
minus one-half the number wrong, i.e., you're penalized for bad guesses. The
total time allocated to answering all 190 questions is 30 minutes, allowing
about 9 seconds per question.
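To make the scoring rule concrete, here is a small Python sketch. The function name and the sample answer sheet are hypothetical and used only for illustration; the rule itself (rights minus wrongs for synonyms/antonyms, rights minus half the wrongs for analogies) is the one described above.

```python
def cmt_raw_score(syn_right, syn_wrong, ana_right, ana_wrong):
    """Raw CMT score under the guessing-penalty rule described in the text."""
    synonym_score = syn_right - syn_wrong         # 120 items: rights minus wrongs
    analogy_score = ana_right - 0.5 * ana_wrong   # 70 items: rights minus half the wrongs
    return synonym_score + analogy_score

# Hypothetical answer sheet (illustrative numbers, not data from the study).
score = cmt_raw_score(syn_right=80, syn_wrong=10, ana_right=40, ana_wrong=6)
print(score)

# Time budget: 30 minutes spread over 190 questions.
seconds_per_question = 30 * 60 / 190
print(round(seconds_per_question, 1))
```

Omitted items cost nothing under this rule, which is why a cautious test-taker and a reckless guesser with the same number right can end up with different scores.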
This test was validated using 81 Wilson College freshmen-and-women (I won't say
fresh women - grin), 135 Stanford sophomores, and 96 Stanford seniors. The mean
score made by the 954 Termites who took this test in 1940 was 96/190.
Quinn McNemar analyzed the data and concluded that the mean for the CMT if
given to the general population (in 1940) would have been about 2 correct
answers, with a standard deviation of 45. That means that a raw score of 92
would have corresponded to a 2-sigma level among the general population.
Consequently, the Termites' average score of 96 would represent an IQ of about
2.1 sigma above the mean, or using their standard deviation of 16, it would have
corresponded to a deviation IQ of about 134. This is to be contrasted with their
childhood average ratio IQ of 152. But...
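McNemar's figures can be run through the usual z-score formula as a quick check. This sketch assumes nothing beyond the numbers quoted above (mean 2, standard deviation 45, and a 16-point IQ standard deviation); the variable names are mine.

```python
population_mean = 2.0   # McNemar's estimated general-population mean raw score
population_sd = 45.0    # McNemar's estimated raw-score standard deviation
termite_mean = 96.0     # Termites' mean raw score
iq_sd = 16.0            # IQ-scale standard deviation used in the text

# Distance of the Termites' mean above the population mean, in SD units.
z = (termite_mean - population_mean) / population_sd

# Equivalent deviation IQ on a mean-100, SD-16 scale.
deviation_iq = 100.0 + iq_sd * z
print(f"z = {z:.2f}, deviation IQ = {deviation_iq:.1f}")
```

Carrying the unrounded z through gives a value a little under 134; rounding z to 2.1 first gives the "about 134" quoted above.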


What This Section Concludes:
(1) Equivalent Wechsler IQ Scores
The equivalent Wechsler (deviation) "IQs"* of the Terman children (whose IQs
ran from 140 to 201 on the Stanford-Binet test) would have ranged from 133
to 170. Dr. Terman calculated the children's average childhood
Stanford-Binet IQ to be 151, corresponding to a Wechsler "IQ" of 141, or 2.75
standard deviations above the mean. (*Of course, we know now that Wechsler
"IQs" are not actually IQs, but are the natural logarithms of IQs.)
(2) Test-Retest Regression to the Mean
Because of the average test-retest regression to the
mean that occurs on tests like the Stanford-Binet, the Stanford-Binet IQs of the
Termites would have ranged from about 135 to
about 196 (corresponding to Wechsler
"IQs" of 130 to 167). For example,
on a test like the 1937 Revision of the Stanford-Binet that had two comparable
versions of the test, L and M, a child could very easily get a 138
on one of the versions and a 140 on the
other. This suggests to me that something like 1,000 additional children would
have scored 140 or above if the tests had
been administered again a few months later. And this suggests that the Terman
study captured only about 10% of the eligible children in the 140-144 IQ range.
(3) Did the Termites' IQs Decline When They Grew Up?
As adults, the Termites were given two IQ tests developed
specifically for the purpose of measuring their adult IQs. Because of resource
limitations, the norming of these tests was necessarily quite limited, with no
IQ-equivalent scales published where I could find them. The first of these, the
Concept Mastery Test-A (CMT-A), was developed and administered in 1940 for an
18-19 year follow-up study of the Termites. On this test, the average score of
the Termites was about 96, with 190 questions on the test, and with a standard
deviation of about 30. Quinn McNemar estimates that this average score was about
2.1 standard deviations above the mean IQ of 100. However, the follow-up study
in 1950 raises this estimate to 2.5 standard deviations. Dr. Terman estimates
that the test-retest regression to the mean would have subtracted at least 8
points from the Termites' scores on the CMT due solely to measurement errors
and not to any change in actual capability. That would have brought the
Termites' average childhood score down to a Stanford-Binet score of 143,
corresponding to about 2.4 standard deviations above the population mean of
100.
Conclusion:
There was little or no decline in the IQ scores of the adult Termites from
their childhood IQ scores. They were just as smart as adults as they had been
as children.
(However, as we'll see, there must have been
a lot of changes at the individual level as the Termites matured.)
Discussion
One interesting consideration is that of the impact of a ln-normal
distribution of IQ scores upon the IQ scores of the Termites after they
reached adulthood. Did their IQs drop?
For the "main group" of 621 children
drawn from an estimated population of 168,000 San Francisco Bay
Area schoolchildren, I calculate an average IQ of 152. (Dr. Terman
arrives at an average IQ of 151 for his combined group.) Taking
the natural logarithm of 1.52, we get 0.42 (deviation IQ = 142),
and dividing this by a standard deviation of 0.15, we get 2.8
sigma for the average IQ of this group.
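The conversion just described can be sketched in Python. This assumes, as the text does, that the natural logarithm of (ratio IQ / 100) has a standard deviation of 0.15; the 15-point deviation-IQ standard deviation and the function names are my own assumptions for the illustration.

```python
import math

LOG_SD = 0.15   # SD of ln(ratio IQ / 100), the figure used in the text
DEV_SD = 15.0   # assumed SD of the deviation-IQ scale

def ratio_to_sigma(ratio_iq):
    """Standard deviations above the mean under the ln-normal hypothesis."""
    return math.log(ratio_iq / 100.0) / LOG_SD

def ratio_to_deviation(ratio_iq):
    """Equivalent deviation 'IQ' on a mean-100 scale."""
    return 100.0 + DEV_SD * ratio_to_sigma(ratio_iq)

for ratio_iq in (140, 151, 152, 201):
    print(ratio_iq,
          round(ratio_to_sigma(ratio_iq), 2),
          round(ratio_to_deviation(ratio_iq), 1))
```

With these two constants the deviation score reduces to 100 + 100 ln(ratio IQ / 100), which is why the logarithm 0.42 reads directly as a deviation IQ of 142.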
The IQs of a group with extreme IQ scores will
tend to regress toward the mean if they are retested even on the
same IQ test. Dr. Terman writes (Genetic Studies of Genius,
Volume IV, The Gifted Child Grows Up, "Twenty-Five Years'
Follow-Up of a Gifted Group", Terman and Oden, Stanford
University Press, 1947, pg. 137),
" The reader should understand that when
a group of subjects is selected on the basis of very high scores
on any test with less than perfect reliability (which includes
all psychological tests), and these subjects are later retested
by the same test or by any other, the retest scores will show
some regression toward the mean of the general population.
This will occur even if the second test follows the first by only
a month or a week, provided there has been no carryover of practice
effect from the first test to the second. McNemar estimates that
in the case of tests as reliable as the Stanford-Binet, the regression
of the retest scores would be equivalent to about 4 or 5 IQ points
for subjects of IQ 140 or above on the first test. That is,
this amount of regression would be due to errors of measurement
and would have no bearing on the question of change of mental
status. If the second test, in addition to being imperfectly
reliable, should not measure exactly the same mental functions
as the first test, this would result in further regression having
nothing to do with mental change. It is doubtful if there is any
test so reliable and so nearly equivalent to the Stanford-Binet
in the functions it measures that it would not show a regression
of something like 8 IQ points for subjects selected as the gifted
group was selected. This means that the gifted subjects could
be expected to show an apparent drop of about 0.5 S. D. on the
C.M. test without having regressed at all in the mental functions
originally tested."
One implication of this would seem to me to
be that if we were to retest this "main group" of children
a few months later on a different form of the Stanford-Binet,
they would average about 5 points lower in their IQ scores, with
the children who scored highest on the first test scoring more
than 5 points lower on the retest, since regression to the mean
is more pronounced the higher the IQ becomes. So in addition to
the fact that the Terman screening presumably missed 70% of the
children who would have qualified with IQs of 140 or above, we
have the fact that, loosely speaking, the children in the 140-144
IQ bracket would have scored in the 135 to 139 range on the retest,
and other children who had scored in the 135 to 139 range would
now score in the 140 to 144 tier. Of course, this is a gross oversimplification.
Children would be moving up and down above and below the entry
level ratio IQ of 140, and among brackets within the group itself.
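The size of this effect can be illustrated with the textbook regression formula, in which the expected retest score is pulled toward the population mean in proportion to the test's reliability. The reliability of 0.90 below is purely my assumption for the sketch, chosen because it reproduces drops of about 4 or 5 points near IQ 140-150; it is not a figure taken from Terman or McNemar.

```python
POPULATION_MEAN = 100.0
ASSUMED_RELIABILITY = 0.90   # illustrative assumption, not a value from the source

def expected_retest(first_score, reliability=ASSUMED_RELIABILITY):
    """Expected retest score for a group selected on its first-test score."""
    return POPULATION_MEAN + reliability * (first_score - POPULATION_MEAN)

for first in (140, 152, 180):
    retest = expected_retest(first)
    print(f"first = {first}, expected retest = {retest:.1f}, "
          f"drop = {first - retest:.1f}")
```

Because the drop is proportional to the distance from 100, the highest scorers lose the most points on average, which is the pattern described above.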
Ratio IQ to Deviation IQ Conversion Charts in the WISC Manual
John Scoville has written in an email,
"First and more importantly, the WISC manual contains
charts appropriate for converting from deviation to ratio iq and vice versa.
The percentiles for various ratio iqs obtained in this manner agree
(for the most part) with the Sare data."
So this is another, and official, set of tables
relating ratio IQs to deviation "IQs", and presumably,
further corroborating the ln-normal character of the IQ distribution
curve.
One fact that is puzzling is that I would think
that there would be many a statistician who would have explored
this idea a long time ago. And that may yet have happened. I haven't
heard of it, but then I'm not a psychometrist.
On the other hand, I've generally had to dig
out for myself what's really going on. None of the irregularities
in the Terman Study arithmetic has shown up in any of the literature
I've found. But we'll see. Letters have been mailed, and more
will follow.
Writing long, multi-part letters to Steve and to Patrick, exploring what
they've added to the discussion of the IQ distribution, hasn't left me
much time to continue the write-up of the evidence supporting
the hypothesis that the IQ distribution is ln-normal.
The Terman Study
Suffice it to say that I think the Terman Study
is seriously underrepresented in its lowest three echelons (IQs
140 to 154). Whether you use the Gaussian distribution that Terman
would have expected, or whether you use a ln-normal distribution,
the Terman cohort should have included about 40% (if it were Gaussian)
to 70% (if it is ln-normal) more Termites than it does. Clearly,
the eligible children were out there. They just weren't included.
Also, the bottom IQ ranges, 140-144, 145-149, and 150-154, shouldn't
have had enrollment ratios like 160:150:134. At a 140 IQ, the
IQ curve is falling away too fast for this. The ratios should
have been more like 425:225:134.
In the lowest three brackets, the shape of
his curve is a dead giveaway that something's not cricket.
The upper strata of his data fit a ln-normal
curve quite well... five orders of magnitude better
at the upper end than a Gaussian distribution.
The Army General Classification Test
I won't include it here just yet but the Army
General Classification Test (AGCT) data also fit well. This data
was collected for 2,400,000 test-takers (presumably mostly or
entirely Army recruits), so it represents the screening of a large
population. For example, it assigned ratio IQs of 170 to about
1 in 4,500 test-takers, corresponding to a z-score of about 3.535
sigma, or a Wechsler IQ of about 153. That's within a fraction
of a point of the ln-normal prediction.
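As a rough cross-check of this comparison, the sketch below computes the deviation score two ways: from the observed rarity (1 in 4,500) using the inverse normal distribution, and from the ln-normal formula applied to a ratio IQ of 170. It assumes a log standard deviation of 0.15 and a 15-point deviation-IQ scale, reads "1 in 4,500" as the fraction at or above 170, and uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are mine.

```python
import math
from statistics import NormalDist

LOG_SD = 0.15   # SD of ln(ratio IQ / 100), as assumed elsewhere in this essay
DEV_SD = 15.0   # assumed SD of the deviation-IQ scale

# Route 1: start from the observed rarity and ask what z-score it implies.
rarity = 1.0 / 4500.0
z_from_rarity = NormalDist().inv_cdf(1.0 - rarity)
iq_from_rarity = 100.0 + DEV_SD * z_from_rarity

# Route 2: start from the ratio IQ and apply the ln-normal conversion.
z_lognormal = math.log(170 / 100.0) / LOG_SD
iq_from_lognormal = 100.0 + DEV_SD * z_lognormal

print(f"from rarity:    z = {z_from_rarity:.3f}, IQ = {iq_from_rarity:.1f}")
print(f"from ln-normal: z = {z_lognormal:.3f}, IQ = {iq_from_lognormal:.1f}")
```

Under these assumptions the two routes land within a point of each other on the deviation scale.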
The Sare Data
The Sare data give far and away the best fit
to a ln-normal curve. The question is: where did Geoffrey
Sare get his data? I've prepared a letter to the University of
London, and will send it as soon as I've cleared it with others
who are involved in this matter.
Children with 200+ IQs
The ln-normal formula predicts that about 1
in every 521,000 children should exhibit a ratio IQ of 200+. I'm
unable to verify that directly and closely, but can only observe
that in the three cases that are available to me (Ruth Duskin
Feldman's "Whatever Became of the Quiz Kids",
Leta Hollingworth's "Children Above 180 IQ", and Miraca
Gross' "Exceptionally Gifted Children"), the results
are reasonable for the presumed size of the population. Among
the anomalies are the fact that Miraca Gross should have found
more than 1,000 children with IQs of 160+ in the large population
that she screened, and instead (through no fault of hers, of course),
found only 30, 3 of whom had IQs of 200+. Similarly, Leta Hollingworth
ransacked the New York metropolitan area for over twenty years
and was only able to find and follow 12 children with IQs of 180+.
These agreements for IQs of 200+ set a lower limit on the number
of children with IQs of 200+ in the three representative populations,
but there's no guarantee that there couldn't be more. Only exhaustive
screenings with ratio IQ tests can settle this question definitively.
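The predicted rarity of a 200+ ratio IQ can be reproduced, at least approximately, with the standard library alone. This sketch assumes the same log standard deviation of 0.15 used earlier and computes the upper tail of the normal curve with `math.erfc`; the function name is my own.

```python
import math

LOG_SD = 0.15   # SD of ln(ratio IQ / 100), as assumed elsewhere in this essay

def one_in_n_at_or_above(ratio_iq):
    """Approximate 'one in N' rarity of a ratio IQ under the ln-normal hypothesis."""
    z = math.log(ratio_iq / 100.0) / LOG_SD
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2.0))   # P(Z >= z) for a standard normal
    return 1.0 / upper_tail

one_in = one_in_n_at_or_above(200)
print(f"ratio IQ 200+: about 1 in {one_in:,.0f}")
```

The result comes out at roughly one in half a million, in line with the figure quoted above; small differences depend on how the constants are rounded.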
Pushing the Envelope
If we do, in fact, verify the validity of a
ln-normal distribution as the proper distribution
for IQs, the next step will be to push the limits of this discovery.
For example, there's the Flynn Effect. I think I've shown that
the Flynn Effect increases with IQ up to the age of 11 through
1987 (since Christopher Otway had an IQ in excess of 200 as measured
on the 1973 revision of the Stanford-Binet). But Dr. Flynn's latest
interpretation of the underlying mechanism for the Flynn Effect (that
children with IQs at the highest level mentally mature earlier
than they did in the past, and that their adult mental ages are
little or no higher than they were in the past) implies that
the adult IQ distribution has become strongly skewed to the right,
with the average adult much closer to the brightest adult on the
basis of absolute intelligence than was the case 100 years ago.
Determining the shape of the IQ curve (and mental age curve) is,
I should think, key to testing Dr. Flynn's hypothesis. One way
to approach this might be to measure the shape of the ratio IQ
distribution for African-Americans. Since the African-American
mean IQ continues to fall about one standard deviation below the
Caucasian IQ distribution, their ratio IQs can be measured
to higher levels on existing tests. Clearly, their IQs have been
affected no more and no less than those of other ethnic groups.
The mean IQ of their distribution has shifted to the right by
the same 3 points per decade as has the mean IQ of the Caucasians.
(They're where the Caucasians were in 1952.) Then there's the
matter that the Belgians have IQs that are about 13 points higher
than the U. S. mean. The Belgians are where the U. S. will be
in 40 more years, if the Flynn Effect continues its steady march.