Measuring Adult IQs: the Terman Concept Mastery Tests (CMT-A and CMT-B)
What This Page Is About
Measuring the "Termites" Adult IQs in 1940: The Concept Mastery IQ Test-A (CMT-A)
In 1940, when the "Termites" were in their late twenties to early thirties, Dr. Terman and his associates faced the task of measuring the Termites' adult IQs. They decided to develop a new IQ test for this purpose, called the Concept Mastery Test (CMT). This is a 190-question, high-range test that was sufficiently difficult that the highest raw score made by any Termite was 172. The average score of the Termites on this test was 96, with the men averaging 98 and the women averaging 94. Nine hundred fifty-four (954) of the original 1,528 Termites took the CMT in 1940. Dr. Quinn McNemar, who contributed to the statistical analysis of the data, estimated that the average adult would earn a raw score of 2 on the CMT (later dubbed the CMT-A), and that the Termites ranked about 2.1 standard deviations above the mean. Dr. Terman suggested that a figure of 2.5 standard deviations was more in line with the Termites' average of 96, and this is what later researchers seem to have accepted. If the Termites' average score of 96 is 2.5 standard deviations above the general-population average of 2, then one standard deviation on this test is (96 - 2)/2.5 = 94/2.5 = 37.6 points of raw score.
If z-scores on this test are directly proportional to raw scores, this implies the conversion of raw scores to deviation IQs shown in Table 1 below. There's no a priori reason why IQs should be directly proportional to raw scores on an IQ test. However, for some reason, that seems to be the case with the CMT tests. Perhaps they were designed that way.
Table 1 - Table Converting CMT-A Raw Scores to IQs, Assuming an Average Adult IQ Score for the Termites That Is 2.5 Standard Deviations Above the Mean of the General Population
This is a very reasonable set of numbers.
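The linear conversion behind Table 1 can be sketched in a few lines of Python. This is only an illustration under the assumptions already stated (a population mean raw score of 2, a Termite average of 96 sitting 2.5 standard deviations above it, and the 16-point IQ standard deviation used elsewhere on this page). The function name and its parameters are hypothetical.

```python
def cmt_a_raw_to_iq(raw_score, pop_mean_raw=2.0, termite_mean_raw=96.0,
                    termite_z=2.5, iq_sd=16.0):
    """Convert a CMT-A raw score to a deviation IQ.

    Assumes z-scores are directly proportional to raw scores, which is an
    assumption of this page rather than an established property of the test.
    """
    raw_sd = (termite_mean_raw - pop_mean_raw) / termite_z  # 94 / 2.5 = 37.6
    z = (raw_score - pop_mean_raw) / raw_sd
    return 100.0 + iq_sd * z


# A few raw scores of interest: population mean, Termite mean,
# highest observed score, and the maximum possible score.
for raw in (2, 96, 172, 190):
    print(raw, round(cmt_a_raw_to_iq(raw), 1))
```

Under these assumptions the Termite average of 96 maps to an IQ of 140, and the maximum raw score of 190 maps to an IQ of 180.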
Another, Independent Approach:
Since we know the expected deviation-IQ distribution among the 260,000 people from whom the Termites were drawn, we can use this information to norm IQ tests that the Termites take, such as the CMT-A.
We can draw upon the fact that, out of 260,000 people, one would expect to find one person with a 1-in-260,000 IQ (171+), two with 1-in-130,000 IQs (169+), three with 1-in-90,000 IQs (168+), and so forth. (Of course, there would be a 1-in-2 chance of finding someone with a 1-in-520,000 IQ of 174+ in our sample, and a 1-in-4 chance of finding someone with a 1-in-1,040,000 IQ of 176+ in our sample.)
A completely independent way to arrive at this set of numbers is to assume that the two highest scorers had IQs at the 1 in 160,000-and-up level. That would correspond to a deviation IQ of 170, and would imply a standard deviation of 38.4 points of raw score.
But in order to apply this method to the Termites, I've had to make three perhaps-questionable assumptions:
(1) I'm assuming that the population from which the Termites were drawn is a randomly selected population.
This assumption may not be entirely justified. Because Dr. Terman sifted the urban populations of San Francisco, Alameda, Berkeley, Oakland, and Los Angeles, and because 35% of his enrollees came from professional families, the chances of finding somewhat higher than expected IQs might have been greater than they would have been had the children been enlisted from, e.g., rural Iowa.
(2) I'm assuming that virtually all the brightest children among the population of 260,000 children were included as Termites in the Terman Study.
I'm betting that the testers missed some children at the lower IQ levels, but that they identified and recruited those at the highest IQ levels.
(3) I'm assuming that virtually all of the brightest among the 260,000 adults were Termites, and took the CMT-A in 1940.
However, even if these assumptions aren't strictly valid, the results are relatively insensitive to them. For example, if the assumed frequencies of occurrence are off by a factor of 2, the only effect upon the range of IQs would be to shift the scale by about 2 points of IQ.
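One way to read the factor-of-2 remark is as a statement about rarity: doubling or halving the assumed frequency of occurrence of the top scorers moves the corresponding deviation IQ by only a couple of points. The sketch below checks this under a Gaussian model with a 16-point standard deviation. The function name is hypothetical, and the Gaussian model is itself one of this page's assumptions.

```python
from statistics import NormalDist


def iq_for_rarity(one_in_n, iq_sd=16.0):
    """Deviation IQ exceeded by one person in `one_in_n`.

    Assumes a Gaussian distribution with mean 100 and the given
    standard deviation.
    """
    # inv_cdf of the lower-tail probability is negative; negate it to get
    # the matching upper-tail z-score.
    z = -NormalDist().inv_cdf(1.0 / one_in_n)
    return 100.0 + iq_sd * z


for n in (130_000, 260_000, 520_000):
    print(f"1 in {n:,}: IQ {iq_for_rarity(n):.1f}")
```

Each doubling of the rarity in this range raises the corresponding IQ by roughly two points, which is why the norming is not very sensitive to the exact frequencies assumed.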
Table 2, below, shows the consequences of this independent approach to norming the CMT-A.
Both of these tables agree with the value of 2.5 standard deviations above the mean for the average CMT score of 96 for the Termites who took the CMT-A in 1940.
Table 1, above, yields a ceiling IQ for the CMT-A of 180 (5 sigma above the mean for the general population).
Table 2 implies a ceiling of 178 (4.875 sigma above the mean for the general population) on the test, although these ceilings might be expected to be subject to ceiling effects starting at a raw score of 171 (10% below the test's actual maximum raw score of 190).
In carrying out comparisons between the 1921-22 Stanford-Binet IQs and the 1940 adult CMT-A IQs, I've had to limit myself to the 643 children in Dr. Terman's "main group", drawn from 160,000 schoolchildren, versus the 954 adults who took the CMT-A in 1940. The reason I had to restrict my attention to the 643 children in the "main group" is because that's the only group for whom adequate statistics are available to me.
Male vs. Female IQs During Childhood and Adulthood
The reason I considered only adult males is because, although IQ tests are designed so that boys and girls have equally high IQs in childhood, female IQ scores, on average, drop a few points during adolescence relative to male IQ scores, apparently in response to hormonal changes. On the CMT-A, the women scored only about 2 points of IQ lower than the men, but on a test emphasizing spatial and arithmetic abilities, the disparity would probably have been greater.
One of the crucial questions this raises is: if women's IQs are falling during adolescence, what does this do to the average IQ of the population, and to the distribution of adult IQs? Does the average IQ of schoolchildren drop a few points as they grow up?
Measuring the "Termites" Adult IQs in 1952: The Concept Mastery IQ Test-T (CMT-T)
In 1951, an updated version of the CMT-A was developed to measure the Termites' IQs at mid-life (average age: 41). Ten easier synonym-antonyms and 5 easier analogies were added to replace 15 of the most difficult synonym-antonyms on the CMT-A. The 15 most difficult synonym-antonyms were eliminated because they were so seldom answered that they had little discriminatory power when the CMT-A was administered in 1940. They were replaced by easy synonym-antonyms and analogies so that the test would have more room at the bottom so that it could be given to less lofty scorers for normalization purposes. (The way I read the tea leaves, this turned out to be a mistake. After correcting for differences in test difficulty, the Termites scored, on average, about 16 points higher than they had in 1940, and that extra ceiling was needed. But Terman and company presumably didn't know that when they redesigned the test.)
Generally speaking, it looks as though the CMT-T was perhaps 15 to 20 points of raw score easier than the CMT-A for people of the age the Termites were in 1940 (average age of 29.5). This should come as no surprise, since the 15 most difficult questions, which were rarely answered anyway, were removed from the CMT-A and replaced with 15 quite easy questions.
It turns out that the ages of the test-takers are important on these CMT tests.
Using the same kinds of norming that we used for the CMT-A, we would place the ceiling of the test (a raw score of 190) at a deviation IQ of, perhaps, 168, or 4.25 standard deviations above the general population mean.
Comparing the CMT-T with the CMT-A
In order to try to find out how the two versions of the CMT compared, both the CMT-A and the CMT-T were given in rapid succession to 108 undergraduate students at Stanford, to 40 graduate students at the University of California, and to 341 Air Force captains tested by the University of California. The interval between the administration of the two versions of the CMT ranged from one day to one week.
The University of California students averaged 21 points higher on the CMT-T than they did on the CMT-A (with a 95.6 average on the CMT-T vs. a 74.6 average on the CMT-A).
The Air Force captains averaged 17.4 points higher on the CMT-T than they did on the CMT-A (with a 60.2 average on the CMT-T vs. a 42.8 average on the CMT-A).
The Termites, taking the CMT-T 12 years after they took the CMT-A, averaged about 40 points higher on the CMT-T than they did on the CMT-A (with a 136.7 average on the CMT-T vs. a 96.9 average on the CMT-A).
The Termites' spouses averaged about 33 points higher on the CMT-T than they did on the CMT-A (with a 95.3 average on the CMT-T vs. a 62.1 average on the CMT-A).
Using a "line of equivalents", Dr. Terman, et al, concluded that the Termites' raw scores (as measured by the Concept Mastery Tests) had risen by about 16 points between 1940 and 1952. With roughly 38 points of raw score per standard deviation and 16 points of IQ per standard deviation, that corresponds to an increase of about (16/38) × 16 ≈ 6.7 points of IQ. In other words, the IQs of the Termites had increased, on average, almost 7 points over a 12-year interval.
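The arithmetic in this comparison can be laid out explicitly. The sketch below tabulates the CMT-T minus CMT-A differences from the group means quoted above, then converts the 16-point raw-score gain into IQ points. The 38-point raw-score standard deviation and the 16-point IQ standard deviation are the figures used in the text. The names are hypothetical, and the "line of equivalents" itself is not reproduced here.

```python
RAW_SD = 38.0  # approximate raw-score points per standard deviation (from the text)
IQ_SD = 16.0   # IQ points per standard deviation

# (CMT-A mean, CMT-T mean) for each group, as quoted above
GROUP_MEANS = {
    "University of California students": (74.6, 95.6),
    "Air Force captains": (42.8, 60.2),
    "Termites": (96.9, 136.7),
    "Termites' spouses": (62.1, 95.3),
}


def raw_gain_to_iq_gain(raw_gain, raw_sd=RAW_SD, iq_sd=IQ_SD):
    """Convert a gain in raw score into a gain in deviation-IQ points."""
    return raw_gain / raw_sd * iq_sd


for group, (cmt_a, cmt_t) in GROUP_MEANS.items():
    print(f"{group}: {cmt_t - cmt_a:+.1f} raw points")

print(f"16 raw points = {raw_gain_to_iq_gain(16):.2f} IQ points")
```

The groups retested within a week gained about 17 to 21 raw points from the easier test alone, while the Termites gained about 40 over twelve years. That gap is the origin of the estimated 16-point real gain.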
This illuminates the distinction between fluid intelligence and crystallized intelligence. On an absolute scale, fluid intelligence decreases rapidly with increasing age, whereas crystallized intelligence can rise with advancing age.
The bar chart below shows the raw scores earned by the adult "Termites" in 1952 on the Concept Mastery Test-T (CMT-T). The average of the adult "Termite" scores on this test was 136.7 (Genetic Studies of Genius: Volume V, "The Gifted Group at Mid-Life", Thirty-Five Years' Follow-Up of the Superior Child, Lewis M. Terman and Melissa H. Oden, Stanford University Press, Stanford, California, 1959, pp. 52-63).
As may be seen from the bar charts below, there were severe ceiling effects for the CMT-T test scores made by all the groups taking the test.
(The bar charts are taken from the Prometheus Society's Membership Committee Report.)
Converting the Termites' Childhood Stanford-Binet Scores to Deviation IQs
Table 3, below, shows, in Column 2, the numbers of individuals with a deviation IQ equal to or greater than the IQ in Column 1 whom we would expect per 260,000 members of a randomly selected population. This should give us the numbers of children at or above a given deviation IQ. For example, 1% of a randomly selected population has a deviation IQ of 137 or above. Therefore, out of a population of 260,000, we would expect to find 2,600 with a deviation IQ of 137+. Similarly, since a deviation IQ of 150+ occurs among 1 in 1,100 people, we would expect to find about 236 of the 260,000 with deviation IQs of 150+. Similarly, since a deviation IQ of 160+ occurs about once in every 11,000 people, we would expect to find about 24 such individuals among our 260,000.
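The expected counts in Table 3 come straight from the upper tail of a Gaussian. A minimal sketch, assuming a mean of 100 and a standard deviation of 16, is given below. The function name is hypothetical, and the exact tail areas differ slightly from the rounded "1 in 1,100" and "1 in 11,000" rarities used in the text.

```python
import math


def expected_at_or_above(iq, population=260_000, iq_sd=16.0):
    """Expected number of people at or above `iq` in a randomly selected
    population, assuming a Gaussian IQ distribution with mean 100."""
    z = (iq - 100.0) / iq_sd
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(Z >= z)
    return population * upper_tail


for iq in (137, 150, 160):
    print(f"IQ {iq}+: about {expected_at_or_above(iq):.0f} expected")
```

The results land close to the rounded figures quoted above: roughly 2,700, 230, and 23, against 2,600, 236, and 24.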
Table 3 - Left-Hand Side: The 1st column shows deviation IQs
Right-Hand Side: The 2nd column shows the number expected at each IQ range among 260,000 people
Table 4, to the right of Table 3, shows numbers of children arranged so that they match the predicted numbers in Table 3. The purpose of this match-up is to try to relate the children's Stanford-Binet IQs with their expected deviation IQs by equating the frequencies of occurrence of the various Stanford-Binet IQs with the deviation IQs that correspond to those frequencies of occurrence.
For example, during the initial Terman screening of 260,000 schoolchildren in 1921-22, 26 children, or about one child in 10,000, were found with IQs of 180+. But a 1 in 10,000 frequency of occurrence corresponds to a deviation IQ of about 159.5, so we might associate a Stanford-Binet IQ of 180+ with a deviation IQ of 159.5+.
Similarly, 77 children were identified with Stanford-Binet IQs of 170+, or about 1 child in every 3,375. About 1 child in every 3,400 has a deviation IQ of 155+, so we might associate a deviation IQ of 155+ with a Stanford-Binet IQ of 170+.
This match-up strategy has been used to associate the deviation IQs shown in Column 1 of Table 3 with the childhood Stanford-Binet IQs of the children in Column 2 of Table 4.
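The match-up amounts to inverting the Gaussian tail: take the observed number of children at or above a Stanford-Binet level, turn it into a frequency of occurrence among the 260,000 screened, and look up the deviation IQ with that same upper-tail frequency. The sketch below assumes a mean of 100, a standard deviation of 16, and a randomly selected screening population. The function name is hypothetical.

```python
from statistics import NormalDist


def deviation_iq_for_count(count_at_or_above, population=260_000, iq_sd=16.0):
    """Deviation IQ whose Gaussian upper-tail frequency equals the observed
    frequency `count_at_or_above / population`."""
    frequency = count_at_or_above / population
    z = -NormalDist().inv_cdf(frequency)  # upper-tail z-score
    return 100.0 + iq_sd * z


# (Stanford-Binet threshold, number of children found at or above it)
for sb_iq, count in ((180, 26), (170, 77)):
    print(f"S-B {sb_iq}+ ({count} children): "
          f"deviation IQ about {deviation_iq_for_count(count):.1f}+")
```

For the two examples in the text this gives deviation IQs of about 159.5 and 155, matching the figures quoted above.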
Table 4 - Left-Hand Side: The 1st column shows the number of Termites observed (as children) with a given IQ
Right-Hand Side: The 2nd column shows the Stanford-Binet IQs of these children
Table 3 - Expected Frequencies of Occurrence Among 260,000 Randomly Selected People
(Table 3 is an abridged version of Table 1.)
Table 4, below, shows in:
Column (1) the actual CMT-A raw scores of the male Termites,
Column (2) the number of scorers in the 10-point interval, and
Column (3) the running total, starting from the highest score (of 172) that any of them got, and proceeding down to the lowest scoring range.
Table 4 - Raw Scores on the CMT-A
Now all we have to do is match up the raw scores and cumulative numbers in Table 4 with the equivalent IQs and predicted numbers in Table 1. When we do this, we get Table 5.
Table 5 - Relating Raw Scores on the CMT-A to the Expected (Gaussian) IQ Distribution among 260,000 People
The 3rd row shows the number of Termites in each IQ range.
The 4th row shows the numbers of people we would expect to find in each IQ range in such a large population.
While there was an appreciable amount of shifting in the IQs of the Termites as they grew to adulthood, the maximum downward shift that occurred was 28 points (on the part of 1 male), from a childhood deviation IQ of 132-136 (ratio IQ of 136-140) to an adult deviation IQ of 104-108. There were 2 males, presumably originally in the 132-136 range in 1921-22, who scored in the 108-112 range on the adult test, while 19 additional males placed in the 112 to 116 range as adults. It's easier to have a bad-hair day and make a lower score on an IQ test than it is to accidentally make a higher score on an IQ test, so I'm supposing that the maximum upward shift among the Termites as they grew to adulthood was no greater than 28 points, and was probably less than this. In that case, I think it unlikely that, among the 260,000 people who were initially screened by Dr. Terman in 1921-22, there were many individuals (if any) with IQs above 150 who weren't Termites.
Now by definition, in a group of about 260,000 adults we would expect to find one person with a deviation IQ of 171-172+, or two persons with deviation IQs of 169-170+. Two Termites, both males, made a raw score of 172 on the CMT-A, which, on the basis of my assumption that these scorers would have a "rarity" of 1 in 130,000, would translate into a score about 4.375 standard deviations above the general population mean. If we take a score of (172 - 2) to represent 4.375 sigma above the mean, then the standard deviation becomes about 38.857 points. If we use this standard deviation to calculate a score 2.5 standard deviations above the mean (which was the average score of all the Termites), we arrive at a score of 99: 1 point above the men's average on the CMT-A, and 3 points above the overall average of 96. And we've arrived at this simply by observing that, by definition, the two brightest people out of 260,000 should fall about 4.375 sigma above the mean.
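The same calculation written out step by step, as a small sketch. The 4.375-sigma placement of the two top scorers is this page's assumption, and the variable names are hypothetical.

```python
TOP_RAW_SCORE = 172   # highest CMT-A raw score, made by two male Termites
POP_MEAN_RAW = 2      # estimated raw score of the average adult
TOP_Z = 4.375         # assumed: a 1-in-130,000 rarity (deviation IQ of about 170)
TERMITE_MEAN_Z = 2.5  # assumed position of the Termites' average

# Raw-score points per standard deviation implied by the top scorers
raw_sd = (TOP_RAW_SCORE - POP_MEAN_RAW) / TOP_Z

# Raw score that would sit 2.5 standard deviations above the population mean
predicted_termite_mean = POP_MEAN_RAW + TERMITE_MEAN_Z * raw_sd

print(f"raw-score standard deviation: {raw_sd:.3f}")
print(f"predicted Termite average:    {predicted_termite_mean:.1f}")
```

The predicted average of about 99 is what gets compared with the observed averages of 98 for the men and 96 overall.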
Seven of the adult Termites had raw scores above 160 on the CMT-A. Using 38.857 for the standard deviation, a raw score of 160 would interpolate to about 4.066 S. D. above the mean, with an expected frequency of occurrence of 1 in 40,000, corresponding to a deviation IQ of about 165+. We would therefore expect six scores at this level or above, and instead there are seven.
Twenty-one of the adult Termites had raw scores of, or above, 150 on the CMT-A. This corresponds to about 3.8 standard deviations above the mean, with an expected frequency of occurrence of 1 in 14,000, or an expected complement of 19 individuals (compared to 21 observed).
Forty-five of the adult Termites scored at or above a CMT-A raw score of 140, corresponding to a frequency of about 1 in 5,000, or an expected occurrence in this size population of about 51 (compared to 45 observed).
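These expected-versus-observed comparisons can be reproduced approximately with a Gaussian tail calculation. The sketch below uses the 38.857-point standard deviation and the population mean raw score of 2 from the text, and the observed counts quoted above. The names are hypothetical, and because the text rounds the rarities (for example to "1 in 14,000"), the computed expectations differ from the quoted ones by a person or two.

```python
import math

RAW_SD = 38.857       # raw-score points per standard deviation (from the text)
POP_MEAN_RAW = 2.0    # estimated raw score of the average adult
POPULATION = 260_000  # size of the screened population

# CMT-A raw-score threshold -> number of adult Termites observed at that level
OBSERVED = {160: 7, 150: 21, 140: 45}


def expected_at_or_above_raw(raw_score):
    """Expected number of adults at or above `raw_score` among POPULATION,
    assuming raw scores are Gaussian with the mean and SD given above."""
    z = (raw_score - POP_MEAN_RAW) / RAW_SD
    return POPULATION * 0.5 * math.erfc(z / math.sqrt(2.0))


for raw, observed in OBSERVED.items():
    expected = expected_at_or_above_raw(raw)
    print(f"raw {raw}+: expected about {expected:.1f}, observed {observed}")
```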
We're now getting down into a scoring range in which the fact that we're dealing with only 954 of the original 1,528 Termites is showing its effects. But the principal source of this divergence in the numbers arises because the Termites were originally chosen because they all had IQs above a threshold value of something like 140. Now, though, on the CMT-A, they exhibit an average IQ of 140. While the highest IQs in the group may still reach 170 (or so I'm assuming), their lowest IQ scorers fall into the IQ 104-108 range. (The distribution of scores among the Termites is bell-shaped around 136 - 140 Stanford-Binet--whereas before, it had a sharp lower cutoff at 136.)
Ninety-seven of the Termites scored 130 or above on the CMT-A. About 130 individuals would have been expected to score at this level or above, if we had had the full 1,528 Termites available to take the CMT-A.
One hundred forty-three scored 120 or above (deviation IQ of 144). About 325 would have been expected to score this high out of the original group of Termites. We would have expected about 225, rather than the 143 observed, to score 120 or above had the rest of the original cohort been available to take this test. The difference between 325 and 225 probably represents the Termites whose IQs slipped below 144 as they grew into adulthood. Some part of this may represent the gender-specific decline in IQs as girls grow into womanhood.
This seems to suggest that a standard deviation of about 38.8 points of raw score fits the observed scores well, although we can't say much about what happens at the lower scoring levels without having the full original group present for these measurements, and without addressing the diffusion of adult IQ scores below the childhood deviation-IQ threshold of 136.
So what would be the ceiling on this test? About 4.84 standard deviations above the mean, or an IQ of about 177.5. However, ceiling effects would undermine its effectiveness well below that peak possible score.
Comparing the CMT-A IQ Scores of the Adult Termites with Their Childhood Stanford-Binet IQs
In converting the Termites' Stanford-Binet IQs to deviation IQs, I used the Termites' own data, as opposed to applying a log-normal model. Since deviation IQs are defined in terms of percentiles, the Termites' inferred deviation IQs are the IQs that would be associated with given percentiles if the distribution of IQs were Gaussian. What's startling about this chart is the way so many of the Termites' IQs have migrated below the 136 (S-B IQ of 140) cutoff IQ that originally got them into the Terman Study. It looks as though, as adults, nearly half of them wouldn't qualify for the 136-deviation-IQ (top 1%) entry level for the study.
Since these are deviation IQs, the distribution for the total populace can't change. That means that for every Termite who migrated below the deviation-IQ threshold as they grew up, there must have been another child in the same population of 260,000 school children who moved up to take his or her place. Otherwise, the shape of the distribution for the whole population would have changed at the top end.
There is one other possibility, I suppose, and that is that the surplus of very high IQs thins out as children reach adulthood. This could account for regression to the mean, and for the highly non-Gaussian (e.g., ln-normal) nature of children's IQs, a condition which would then disappear when the children reached adulthood.
A previous discussion of this topic may be found here.
Two very busy tables may complement the above graph.
Tables I and II below show the amount of regression to the mean that has occurred between the times the Termites were first tested in childhood, using the Stanford-Binet, and the time they were tested as adults using the Concept Mastery Test (CMT) - A.
Table I - Stanford-Binet IQs and their Equivalent Deviation IQs
Table II - Childhood Deviation IQs vs. Adult CMT IQs, and the corresponding declines.
The first column in Table I presents a set of childhood Stanford-Binet IQs. The second column consists of the equivalent childhood deviation IQs that would go with the frequencies observed in the 260,000-child Terman testing population. Column 3 gives the childhood deviation IQs predicted by a ln-normal distribution, in order to provide some independent measure of the appropriateness of these Terman-derived deviation-IQ assignments. What these numbers show is that the IQs of the Termite children were considerably lower if measured on a deviation-IQ scale than they would be on a Stanford-Binet scale. (Note that column 2 has been generated without any assumptions regarding possible relationships between deviation IQs and Stanford-Binet ratio IQs. It's based upon strictly empirical data.)
The first column in Table II contains the number of Termites who had childhood Stanford-Binet IQs given in column 1 of Table I. For example, 22 of the Termites who took the CMT-A in 1940 had childhood Stanford-Binet IQ scores between 135 and 140, corresponding to childhood deviation IQ scores between 132 and 136, with an assumed average deviation IQ of 134.
The second column in Table II gives the average score this group made as adults in 1940 on the CMT-A. For our example, the 22 adult test-takers averaged a raw score of 84.5 on the CMT-A.
The third column in Table II sets forth the estimated average childhood deviation IQ for each group.
The fourth column gives the adult deviation IQs corresponding to the mean CMT scores in column 2, assuming that a score of 98 is 2.5 standard deviations above the population mean of 2. For our example, a score of 84.5 is about 13.5 points of raw score below 98, or about 2.15 standard deviations above the mean, leading to an adult deviation IQ of about 134 (standard deviation = 16).
The fifth column contains the decline in IQ from childhood to adulthood. Since all these scores are deviation scores, these declines are real.
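The fourth and fifth columns of Table II can be sketched as follows, assuming as stated that a raw score of 98 sits 2.5 standard deviations above a population mean of 2, so that one standard deviation is (98 - 2)/2.5 = 38.4 raw points. The function names are hypothetical, and only the worked example from the text is used as input.

```python
POP_MEAN_RAW = 2.0
RAW_SD = (98.0 - POP_MEAN_RAW) / 2.5  # 38.4 raw points per standard deviation
IQ_SD = 16.0


def adult_iq_from_cmt(mean_raw_score):
    """Column 4: adult deviation IQ implied by a group's mean CMT-A score."""
    z = (mean_raw_score - POP_MEAN_RAW) / RAW_SD
    return 100.0 + IQ_SD * z


def iq_decline(childhood_deviation_iq, mean_raw_score):
    """Column 5: childhood deviation IQ minus adult deviation IQ.
    A positive value is a decline; a negative value is a gain."""
    return childhood_deviation_iq - adult_iq_from_cmt(mean_raw_score)


# Worked example: childhood deviation IQ of about 134, adult mean score of 84.5
print(round(adult_iq_from_cmt(84.5), 1))
print(round(iq_decline(134.0, 84.5), 1))
```

For the 84.5 example, (84.5 - 2)/38.4 is about 2.15 standard deviations, giving an adult deviation IQ of about 134.4, essentially unchanged from the assumed childhood average of 134.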
Adult Termites' IQ Distribution vs. a Gaussian IQ Distribution
Figure 1 below shows a plot of the distribution of the adult Termites' deviation IQs (standard deviation = 16), as measured by the CMT-A test, versus the Gaussian distribution of deviation IQs expected in the adult population of 160,000 of the children in the "main group" who were screened by Terman, et al, in 1921-22. I have truncated the Gaussian distribution so that it shows no scores below deviation-IQ = 136 in order to avoid swamping the Termites' data.
Figure 2 presents the full set of data.
Figure 1 tells a shocking story. The IQ distribution depicted by the Gaussian (pink) curve is, by definition, the distribution that we would expect to find in a randomly-selected population of 160,000 adults above a deviation IQ of 135. (I haven't included the rest of the Gaussian (for IQs below 136) in Figure 1 because it would swamp the rest of the data, as you can see in Figure 2 below.)
About 820 of the adults in a randomly-selected population would have deviation IQs ranging between 136 and 140, compared with only 51 of the Termites. In other words, as adults in 1940, only about 1/16th of the adults with IQs between 136 and 140 were enrollees in Dr. Terman's Study.
About 434 adults would have been expected with adult deviation IQs between 140 and 144, of whom 64 would have been Termites. In other words, about 6 out of every 7 adults drawn from the original population of 160,000 schoolchildren in Dr. Terman's "main group" who had Adult IQs between 140 and 144 wouldn't have been included in Dr. Terman's Study.
About 196 adults with deviation IQs between 144 and 148 should be found, of whom 60, or about 3 out of 10 would have been Termites.
About 110 adults with deviation IQs between 148 and 152 would be expected, and about 46 of them would be Termites.
About 36 adults with IQs between 152 and 156 would be expected, whereas there are 52 Termites with those scores. As both Figure 1 and Table 1 reveal, there is excellent agreement between the Gaussian curve and the Termites' curve above an IQ of 152. This is consistent with the idea that the Terman screening recruited all of the very brightest kids.
To sum it up, practically all the adults with IQs above 152 were Termites. Fewer than half of the adults with IQs below 152 were Termites, dropping to one gifted adult out of every 16 with IQs below 140. In other words, the majority of all the gifted adults with IQs below 152 were overlooked by the Terman study.
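The shares behind this summary follow directly from the counts quoted above for each adult deviation-IQ band. The short sketch below simply divides the number of Termites by the number of adults expected in the band. The names are hypothetical, and the expected counts are taken from the text as given rather than recomputed.

```python
# Adult deviation-IQ band -> (Termites observed, adults expected among 160,000)
BANDS = {
    "136-140": (51, 820),
    "140-144": (64, 434),
    "144-148": (60, 196),
    "148-152": (46, 110),
    "152-156": (52, 36),
}


def termite_share(termites, expected):
    """Fraction of the adults expected in a band who were Termites.
    A value above 1 means more Termites were observed than the Gaussian predicts."""
    return termites / expected


for band, (termites, expected) in BANDS.items():
    print(f"IQ {band}: {termite_share(termites, expected):.2f}")
```

The share climbs from about one in sixteen in the 136-140 band to somewhat under half in the 148-152 band, and exceeds one in the 152-156 band.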
To reiterate, the pink curve represents all the individuals who were screened for possible inclusion in the Terman study, while the blue curve shows the individuals who actually were selected for the study. The pink curve is the upper end (IQ = 136+) of the actual distribution of deviation IQs of the 160,000 adults whom Terman screened, as children, for his study back in 1921.
Figure 1 - The (Gaussian) distribution of deviation IQs among 160,000 adults versus the distribution of adult IQs among the Termites.
Figure 2, below, shows the full range of expected IQs for the upper half of 160,000 adults versus the Termites' adult IQs, as measured on the CMT-A in 1940. Table 1 shows the data in Figure 2.
Table 1. - The Distributions of Termite IQs versus the Distribution of IQs in the Entire Population
I derive two conclusions from this.
First, it's no wonder the Termites didn't include any geniuses. The very smartest members of the candidate population may have been Termites, but a large fraction of the not-as-smart but still exceedingly intelligent adults were excluded from the Termites' group.
Second, there has been quite a bit of "percolation" of IQs, with nearly half the adult Termites falling below the 140 Stanford-Binet/136 deviation-IQ threshold for the Terman Study. Perhaps a fifth of the Termites fell far enough below the threshold that mere test-retest regression doesn't explain it.
Quantitatively, there were 22 children in the Terman "main group" with childhood deviation IQs between 128 and 136. There were 160 children with IQs between 136 and 140.
Looking at the table above, there are 22 adults with deviation IQs ranging between 104 and 116, 35 adults with IQs below 120, 70 adults with IQs below 124, 100 adults with IQs below 128, and 49 adults with IQs between 128 and 132. About five points of test-retest decline would be expected on retaking the same test, and about eight points of decline would be predicted upon taking a different test.
If we were to reduce the childhood deviation IQs by eight points, we would have 22 children in the main group with IQs between 120 and 128, and 160 children with IQs between 128 and 132. Comparing that with Table 1 above, all 35 of the adult males with IQs below 120 actually lost IQ points growing into adulthood. After subtracting 8 points for test-retest and cross-test IQ losses, there were 22 children with IQs below 128, but 100 adults with IQs below 128. Obviously, there was serious decline here, too. Note that the Termites' places are taken by other adults: 6,100 of them with IQs between 120 and 128, and 38,300 with IQs between 104 and 120.
Richard M. Nixon was a Termite, so at least one of them attained a high level of eminence.