Are IQ Distributions Ln-Normal Rather Than Gaussian?

  

 

IQ Distributions aren't Gaussian Distributions!
    I'd like to offer profound thanks to Steve Coy and Patrick Wahl for correcting what I've said about non-Gaussian distributions failing to possess standard deviations.
I'm delighted to get letters correcting or questioning what I've said on this website. Please feel free to comment. 
    Discussions of IQ distributions usually show a bell-shaped curve and explain that IQs are distributed in accordance with a Gaussian or normal distribution. In fact, they're not! As Dr. Hans Eysenck observes in his book Genius (pp. 38-39):
    "Ever since Quetelet fell in love with it, and used it to define the 'average man' (Stigler, 1986), the Gaussian curve has exerted a fatal influence on social scientists. As Gabriel Lippmann told Poincaré, à propos the curve: 'les expérimentateurs s'imaginent que c'est un théorème de mathématique, et les mathématiciens que c'est un fait expérimental'. ('Experimentalists think that it is a mathematical theorem, while mathematicians believe it to be an experimental fact.') Galton's arrangement may serve well enough as a picture of the distribution of intelligence, but certainly not of eminence.
    "(Even as far as intelligence is concerned, Burt (1963) has suggested that a Pearson Type IV curve fits the case better than the normal curve!) Micceri (1993) has analysed a large sample of curves of distribution that would normally be expected to be 'normal', finding practically all of them to deviate markedly from statistical normality. As Geary (1947) said long ago: '. . . normality is a myth; there never was, and never will be, a normal distribution' (p. 241)."

Steve Coy writes,
    "It is a well known fact (unfortunately I don't have a reference for you) in real-world applied statistics that you virtually never encounter any quantity with anything very closely approaching a normal distribution *in the wings*. The wings are virtually always significantly higher than what the normal distribution predicts, more and more so the further you go out. This is because (a) the normal distribution is in a sense "as compact as it can get", for a given standard deviation. At the opposite end of the spectrum, the Chebyshev inequality tells us the maximum fraction of a distribution that can lie beyond n standard deviations from the mean. I'm virtually certain that good studies have been done of this, but I'll bet that if you took a random sample of published real world measure data *of any kind*, no matter how normal-ish the distributions seem at first glance, if you look closely at the tails, they won't be."

   
The distribution of heights is non-Gaussian. There are many more extremely tall and extremely short men than would be predicted by a Gaussian curve. 

And as for IQ:
    For openers, Gaussian distributions run from (-∞) to (+∞).
    IQs, on the other hand, start at 0, peak at 100, and extend to infinity, falling off more slowly than a Gaussian distribution as one moves out along the right-hand wing of the IQ curve.
    Log-normal distributions likewise start at 0, peak near 1, and extend to infinity, falling off more slowly than a Gaussian distribution as one moves out along the right-hand wing.
    For example, if we used the Gaussian curve as our guide, we would expect to find an IQ of 160 or above among about one in 31,600 children. In fact, we find an IQ of 160 or above occurring in about one in every 1,125 kids (see Table 1, below).
    If we used the Gaussian curve as our guide, we would expect to find an IQ of 200 or above among one in 78,000,000,000 children. In practice, we seem to find an IQ of 200 or above in about one in every 500,000 kids. And as for Marilyn vos Savant's childhood IQ of 228, don't even ask! (The odds are already on the order of one in a quadrillion by the time you reach 218!)
    These are no minor discrepancies. A Gaussian curve under-predicts the likelihood of finding someone with an IQ of 200 by a factor of about 150,000!
    Obviously, Gaussian distributions don't work well for IQs that are very far away from the population average.
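    The Gaussian rarities quoted above can be checked with a few lines of Python. This is a minimal sketch that assumes a mean of 100 and a standard deviation of 15 (the value that reproduces the 1-in-31,600 figure); `gaussian_rarity` is a hypothetical helper, not code from the original analysis.

```python
import math

def gaussian_upper_tail(z: float) -> float:
    """P(Z >= z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def gaussian_rarity(iq: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Return N such that about 1 person in N scores at or above `iq`,
    if IQs were Gaussian with the given mean and standard deviation (assumed 100, 15)."""
    return 1.0 / gaussian_upper_tail((iq - mean) / sd)

for iq in (160, 200):
    print(f"IQ >= {iq}: about 1 in {gaussian_rarity(iq):,.0f}")
```

    This gives about 1 in 31,600 at IQ 160 and roughly 1 in 76 billion at IQ 200, close to the 78 billion figure. Using `erfc` rather than `1 - cdf` keeps precision far out in the tail, where the probabilities are tiny.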

    This raises an interesting point. If most measured quantities are non-Gaussian out on their wings, this could explain the deviations of an IQ curve from a Gaussian profile. Further, if we fit the logarithms of IQs with a Gaussian distribution, the distribution of the logarithms would presumably not form a perfect Gaussian, either. On the other hand, there seems to be a close fit between a ln-normal distribution and the IQ curves that I've investigated.

    What actually happens if you try to apply a single standard deviation to the IQ curve is that the "standard deviation" stretches as you move to the right along the curve.
An IQ of 116:
    For example, if you go out along the IQ curve to the point where 84.135% of the population lies below it (one standard deviation on a Gaussian curve), you'll find yourself at IQ = 116.2. The spread, in IQ points, between the population average of 100 and the point on the curve where 84.135% of the population lies below it is thus 16.2 points of IQ. So if we try to apply a standard deviation to it, the first "standard deviation" would be 16.2 points.
    I'm wondering if this might not be why early researchers came up with a standard deviation of 16 for ratio IQs. It would have allowed them to fit their data in the neighborhood of the mean. Discrepancies farther out would have been more difficult to prove in the early days because the number of people getting higher scores falls off as the IQ rises.

An IQ of 135:
    If you go farther along the IQ curve to the point where 97.725% of the population lies below it (two standard deviations on a Gaussian curve, which this isn't), you'll find you're at an IQ of 134.986, or about 135. The difference between 116.2 and 135.0 is 18.8 points of IQ, so we might say that our second standard deviation is 18.8 points of IQ.
An IQ of 157:
    If you continue along the curve to the IQ where 99.865% of the population lies below it (three standard deviations on a Gaussian curve), you'll be at an IQ of 156.831, or about 156.8. This is 21.8 points above 135, so we might say that our third standard deviation is 21.8 points of IQ.
An IQ of 182:
    If you wander farther down the curve to the celebrated 99.997% required for entry into the four-sigma societies, you'll find yourself at an IQ of 182.2, the equivalent of four standard deviations on a Gaussian curve. The difference between 182.2 and 156.8 is 25.4 IQ points, so we might assign 25.4 points to our fourth (ersatz) "standard deviation".
An IQ of 212:
    Far down the curve comes the 1-in-3,500,000 point at which 99.99997% lies below your lofty perch, at an IQ of 211.7. (You're in rarefied company here.) Since 211.7 is 29.5 points above 182.2, we can say that our fifth (bogus) standard deviation stretches over 29.5 points of IQ.
An IQ of 246:
    Getting about as far along the IQ curve as mere mortals can go, we arrive at the one-in-a-billion level of IQ = 246.0. This sixth fictitious "standard deviation" would be about 34.3 points, or a little more than twice the value of the standard deviation in the vicinity of the mean.
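    The ladder of IQs above can be generated rather than read off a table. This is a minimal sketch assuming the ln-normal model with a log-scale standard deviation of exactly 0.15 (Mr. Scoville's fitted values range from 0.1493 to 0.1501); `percent_below` and `lognormal_iq` are hypothetical names.

```python
import math

S = 0.15  # assumed SD of ln(MA/CA); fitted values reported are 0.1493-0.1501

def percent_below(k: float) -> float:
    """Percent of a Gaussian population lying below k standard deviations."""
    return 50.0 * (1.0 + math.erf(k / math.sqrt(2)))

def lognormal_iq(k: float, s: float = S) -> float:
    """Ratio IQ whose natural log, ln(IQ/100), lies k standard deviations above 0."""
    return 100.0 * math.exp(k * s)

iqs = [lognormal_iq(k) for k in range(7)]            # k = 0..6; iqs[0] is 100
steps = [hi - lo for lo, hi in zip(iqs, iqs[1:])]    # width of each ersatz "SD"

for k in range(1, 7):
    print(f"{k} sigma: {percent_below(k):.7f}% below, "
          f"IQ {iqs[k]:.1f}, step {steps[k - 1]:.1f}")
```

    The steps come out 16.2, 18.8, 21.8, 25.4, 29.5, and 34.3 points. Because each rung is the previous one multiplied by e^0.15, successive "standard deviations" grow by a constant factor of about 1.16, which is why the stretch accelerates out along the right-hand wing.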

Would a Ln-Normal Curve Fit IQs Better Than a Gaussian Curve?
    Two years ago, Dr. Robert Dick suggested to a University of Kentucky math major, John Scoville, that he try plotting the natural logarithms of the ratios of mental ages to chronological ages to see if that would generate numbers that were closer to a Gaussian distribution. (The ratio IQ is simply the ratio of the mental age to the chronological age multiplied by 100.) For example, for a ratio of mental age to chronological age of 1.4, corresponding to an IQ of 140, the natural logarithm is 0.3365. If you multiply that natural logarithm by 100 and add 100 to it, it looks very much like an IQ of 133.65. And, in fact, it's what psychometrists are calling the deviation "IQ" these days. The question was: how well would these logarithmic numbers fit a Gaussian distribution?
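    The transformation just described is a one-liner. A minimal sketch; `deviation_iq_from_ratio` is a hypothetical name, not Mr. Scoville's code.

```python
import math

def deviation_iq_from_ratio(ratio_iq: float) -> float:
    """Log-scale score described in the text: 100 + 100 * ln(MA/CA),
    where MA/CA = ratio_iq / 100."""
    if ratio_iq <= 0:
        raise ValueError("ratio IQ must be positive")
    return 100.0 + 100.0 * math.log(ratio_iq / 100.0)

print(f"{deviation_iq_from_ratio(140):.2f}")  # 133.65
print(f"{deviation_iq_from_ratio(200):.1f}")  # 169.3
```

    Because the score is 100 times a natural logarithm, a standard deviation of s on the log scale becomes 100·s points on this scale, so s ≈ 0.15 corresponds to about 15 points.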
    The short answer is, for all intents and purposes: perfectly! In 1951, Geoffrey Thomas Sare submitted an M.A. thesis entitled "The Complexity of Gestalt as a Factor in Mental Testing: a Contribution to the Theory of Test Construction". In his thesis, Mr. Sare included a table matching IQs with their observed frequencies of occurrence: empirical data gathered over the years. When John compared his log-normal calculations to Mr. Sare's data, he found a virtually perfect match. For example, for an IQ of 200, a ln-normal distribution predicts a rarity of 1 in 521,000 children. Mr. Sare's table shows a rarity of 1 in 532,000. Compared to 1 in 78,000,000,000, that's pretty good! Even if the ln-normal calculations differed by a factor of 2 from the observed frequencies, that would still be enormously better than a Gaussian fit, and would be within the range of experimental uncertainties.
    To read off the frequency of occurrence that corresponds to a given ratio of mental age to chronological age, one takes its natural logarithm, divides this by the standard deviation of the (Gaussian) distribution of the natural logarithms of IQs, and looks up the resulting z-score in a table of the normal distribution. This standard deviation needs to be determined experimentally. This is a very important step, because what we're seeking is a natural psychological constant, akin to the Rydberg constant or, perhaps, the speed of sound under standard conditions.
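    The lookup just described (take the natural logarithm, divide by the standard deviation, then read the upper tail of the normal distribution) can be sketched as follows. This assumes ln(MA/CA) has mean 0 and a standard deviation s = 0.15; both the function and its default are illustrative assumptions, not Mr. Scoville's code.

```python
import math

def lognormal_rarity(ratio_iq: float, s: float = 0.15) -> float:
    """Return N such that about 1 child in N has a ratio IQ at or above `ratio_iq`,
    assuming ln(MA/CA) is Gaussian with mean 0 and standard deviation s."""
    z = math.log(ratio_iq / 100.0) / s           # natural log, divided by s
    tail = 0.5 * math.erfc(z / math.sqrt(2))     # upper-tail normal probability
    return 1.0 / tail

for iq in (140, 160, 200):
    print(f"IQ >= {iq}: about 1 in {lognormal_rarity(iq):,.0f}")
```

    With s = 0.15 this gives roughly 1 in 80 at IQ 140, 1 in 1,160 at IQ 160, and 1 in 520,000 at IQ 200. Moving s within the fitted range of 0.1493 to 0.1501 shifts these figures only slightly.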
    Using the Sare data, Mr. Scoville arrived at a value for σ ranging from 0.1493 to 0.1501, corresponding to a standard deviation for the deviation "IQ" of about 15 points. (David Wechsler wrought well when he defined a standard deviation of 15 points for his Wechsler Intelligence Scales.) You'll notice that I said the deviation IQ because, if this ln-normal hypothesis is correct, there can be only one distribution of deviation "IQs" for a large, unselected population (Gaussian), and one standard deviation (15) for it. (This is not to say that we can't readily construct a sample population with IQs whose natural logarithms are not randomly distributed. The usual sampling precautions would have to be exercised to ensure that a sample population was representative of the larger population for which norms are to be established.)
    John Scoville published his results on the Internet in a paper entitled "Statistical Distribution of Childhood IQ Scores". I was intrigued by his results. I had wondered how anyone could explain childhood IQs of 200, or Marilyn vos Savant's childhood IQ of 228, when such scores were so absurd if IQs followed a Gaussian distribution. I first attempted to apply them to the IQ distributions that Terman, et al, found in their screening of more than 250,000 California schoolchildren in 1921-1922.
    What I found was agreement at and above an IQ of 165 among the Sare data, Mr. Scoville's ln-normal distribution, and the Terman data. 
    Over the IQ range 150 to 164, Dr. Terman found about 50% of the children that a ln-normal distribution would predict... but bear in mind that he found only about 30% of the number of children with IQs of 140 or above that a ln-normal distribution would predict, and only about 60% as many as a Gaussian normal distribution would predict.
    Over the IQ range from 140 to 149, Dr. Terman included only about 20% of the children that a ln-normal distribution would predict, and only about 34% as many children as he would have been expecting to find, based upon a Gaussian distribution.

    Table 1, below, shows how Dr. Terman's distribution compared with the Gaussian distribution that he would have been expecting. The second column shows what a Gaussian normal distribution would have led him to expect, while the third column shows what he actually found. Look at the differences! It's obvious at a glance that his numbers don't remotely agree with the Gaussian numbers he would have been anticipating. Furthermore, the number in the 140-144 range should have been at least twice that of the 150-154 range. Instead, it's only about 19% larger. (You wonder what went through the heads of Lewis Terman and his staff when they realized that their results differed so dramatically from what they must have expected... like, "What's the scientific community going to say about this strange distribution?" and "How are our sponsors going to greet these wild anomalies?")
    The fourth column shows the numbers that a Gaussian distribution would have predicted if the total number of cases were restricted to the 621 subjects found in Dr. Terman's Main Group. It's included primarily to show the shape of an equivalent Gaussian distribution compared to the Terman distribution.

    There have been critiques of the Terman results, but so far I haven't seen anything that mentions the arithmetic anomalies that you're seeing here.

Table 1. Frequencies of IQs Found by the Terman Study Screening, vs. Frequencies Predicted by a Gaussian Distribution

IQ Range   Gaussian Prediction*   Observed by Terman   Alt. Gaussian Prediction**
140-144    637                    160                  377
145-149    264                    150                  156
150-154    100                    134                  59
155-159    34                     64                   20
160-164    11                     43                   6
165-169    3                      27                   2
170-174    0.77                   20                   0.46
175-179    0.18                   8                    0.11
180-184    0.04                   10                   0.02
185-189    0.0085                 2                    0.005
190-194    0.00125                2                    0.00074
195-199    0.00022                0                    0.00013
200+       0.00003                1                    0.00002
TOTALS     1052                   621                  621

* - This column shows the distribution of IQs that Dr. Terman must have expected to find, based upon a presumed Gaussian distribution of IQs in his subject population. (I have used a standard deviation of 16 in calculating the Gaussian predictions because that's what I think Dr. Terman would have employed.) 
** - This column shows a Gaussian distribution with the same number of subjects as Dr. Terman's "Main Group", the intent being to compare the shape of a Gaussian curve with the shape of Dr. Terman's distribution.

    In the IQ range from 140 to 144, the Terman screening found only about one-fourth as many children as a Gaussian distribution would have predicted, and this is in the IQ range where the agreement with a Gaussian prediction should have been closest!
    In the IQ range from 145 to 149, the Terman screening found only about 57% as many children as a Gaussian distribution would have predicted.
    In the IQ range from 150 to 154, the Terman screening found about 134% as many children as a Gaussian distribution would have predicted. By now, the Terman population has switched from fewer children in each IQ category than a Gaussian distribution would have predicted, to more and more children in each IQ range than a Gaussian distribution would have predicted, with a crossover in the upper 140's.
   Above this crossover IQ, the frequencies predicted by a Gaussian distribution fall off enormously more rapidly than the frequencies observed in the Terman screening.
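    The bin-by-bin percentages above follow directly from the first two data columns of Table 1. A minimal sketch covering only the bins from 140 to 169; the variable names are illustrative.

```python
# Columns from Table 1: Gaussian prediction (SD 16) and Terman's observed counts
bins     = ["140-144", "145-149", "150-154", "155-159", "160-164", "165-169"]
gaussian = [637, 264, 100, 34, 11, 3]
terman   = [160, 150, 134, 64, 43, 27]

# Observed count as a fraction of the Gaussian prediction, bin by bin
ratios = [obs / pred for obs, pred in zip(terman, gaussian)]
for label, r in zip(bins, ratios):
    print(f"{label}: Terman found {r:.0%} of the Gaussian prediction")

# First bin where the observed count exceeds the Gaussian prediction
crossover = next(label for label, r in zip(bins, ratios) if r > 1)
print("Observed exceeds Gaussian from bin", crossover)
```

    The ratio climbs from about 25% in the 140-144 bin to 900% in the 165-169 bin, passing 100% between the 145-149 and 150-154 bins.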

    One of the explanations that has been given for these anomalously high frequencies of high childhood IQs is that a few children develop mentally earlier than average, with some of them going through mental growth spurts, and that these children might account for the phenomenally high scores found among a few children. The mental growth rates of these children would later slow down, while others (e.g., late bloomers) would partially catch up with them.
    This seems plausible. Physical growth spurts occur among children, and some children reach puberty earlier than others. The only dissonance here is the 




Observation #2: 

About the Terman Study
    Because of resource limitations, Dr. Terman's identification of his "Termites" was a seat-of-the-pants operation. Fishing mostly in the cities around Stanford University, Dr. Terman found about 1 child in 262 with an IQ of 140 or above in his "main group", where he would have expected (assuming the distribution of IQs to be Gaussian or "normal") to find about 1 child in every 161 (using his standard deviation of 16) with an IQ of 140 or above. However, as we know today, about 1 child in every 80 has a ratio IQ of 140 or above. Out of an estimated 168,000 schoolchildren, Dr. Terman should have found about 2,100 children with IQs of 140 or above. Instead, he found only 621, or about 30% of the expected number. In addition, the following shortcuts were taken in the screening and IQ certification process:
(1) He relied on teachers to identify potential candidates for his 643-child "main group". This led to a pre-selection of 6% to 8% of the total school population in the cities of the San Francisco Bay Area.
(2) This first step was followed by the administration of the National Intelligence Test (a printed group test).
(3) Those children who scored at the 95th percentile or above on the National Intelligence Test were then given an abbreviated form of the 1916 Stanford-Binet test created for this purpose.
(4) The IQs of the children who scored highest and/or were the oldest were corrected by Dr. Terman in an attempt to sidestep ceiling effects.
    In addition to these artifacts of the selection process, there's also the possibility that Dr. Terman's "Termites", drawn as they were from Berkeley, San Francisco and the Bay Area, may have been a somewhat enriched sample. Thirty-five percent of the children's parents were listed as "professionals".
    All in all, there were generous opportunities for errors to creep in, and Dr. Terman's data suggest that errors did creep in.
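    The expected counts in the paragraph on Dr. Terman's screening can be reproduced as follows. This is a minimal sketch: the population of 168,000 and the 621 children come from the text, while the ln-normal line assumes s = 0.15 and a median ratio IQ of 100.

```python
import math

def upper_tail(z: float) -> float:
    """P(Z >= z) for a standard normal variable."""
    return 0.5 * math.erfc(z / math.sqrt(2))

POPULATION = 168_000   # estimated number of schoolchildren screened (from the text)
TERMAN_FOUND = 621     # children with IQ >= 140 in the main group (from the text)

p_gauss = upper_tail((140 - 100) / 16)             # Gaussian, SD 16
p_lognormal = upper_tail(math.log(1.40) / 0.15)    # ln-normal, s = 0.15 (assumed)

expected = POPULATION * p_lognormal
print(f"Gaussian:  1 in {1 / p_gauss:.0f} at IQ >= 140")
print(f"ln-normal: 1 in {1 / p_lognormal:.0f} at IQ >= 140")
print(f"Expected {expected:,.0f}; Terman found {TERMAN_FOUND / expected:.0%} of that")
```

    This gives about 1 in 161 under the Gaussian assumption, about 1 in 80 under the ln-normal one, and roughly 2,090 expected children, of whom 621 is about 30%.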
    The table below compares the distribution of IQs in Dr. Terman's "main group" with the distribution of IQs that the Sare tables would predict. 

IQ Range   Wechsler "IQ" Range   Gaussian Prediction*   Predicted by Sare   Observed by Terman   % Found by Terman
140-144    133-136               637                    1,000               160                  16%
145-149    137-140               264                    525                 150                  29%
150-154    141-143               100                    281                 134                  48%
155-159    144-146               34                     151                 64                   42%
160-164    147-149               11                     82                  43                   52%
165-169    150-152               3                      34                  27                   79%
170-174    153-155               0.77                   21                  20                   95%
175-179    156-158               0.18                   8                   8                    100%
180-184    159-161               0.04                   4                   10                   250%
185-189    162-164               0.0085                 2                   2                    100%
190-194    164-166               0.00125                1                   2                    200%
195-199    167-169               0.00022                0.664               0                    0%
200+       169+                  0.00003                0.336               1                    300%

* - This column shows the distribution of IQs that Dr. Terman must have expected to find, based upon a presumed Gaussian distribution of IQs in his subject population. I have used a standard deviation of 16 in calculating the Gaussian predictions because that's what I think Dr. Terman would have employed. 

    In the IQ 140-144 range, he only enrolled about 16% of the candidates that should have been picked up by the screening. 
    In the IQ 145-149 range, he enrolled about 29% of the candidates that should have been picked up by the screening.
    In the IQ 150-154 range, he enrolled about 48% of the potential candidates.
    In the IQ 155-159 range, he enrolled only about 42% of the available children.
    In the IQ 160-164 range, he got about 52% of what should have been out there...
    In the IQ 165-169 range, he enrolled about 79% of the potential candidates.
    In the IQ 170-174 range, he enrolled about 95% of the potential candidates.
    Above IQ 175, he got them all.

    So he got
 - only 25% of the available children in the IQ 140-154 range,
 - about 50% of the children in the IQ 155-164 range, and
 - virtually 100% of the children above IQ 164.
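    The percentages in the list above are ratios of column sums from the comparison table. A minimal sketch with the table's numbers typed in by hand; the names are illustrative.

```python
# Columns from the comparison table: Sare-predicted vs. Terman-observed counts
bins = ["140-144", "145-149", "150-154", "155-159", "160-164", "165-169",
        "170-174", "175-179", "180-184", "185-189", "190-194", "195-199", "200+"]
sare = [1000, 525, 281, 151, 82, 34, 21, 8, 4, 2, 1, 0.664, 0.336]
terman = [160, 150, 134, 64, 43, 27, 20, 8, 10, 2, 2, 0, 1]

def pct_found(start: int, stop: int) -> float:
    """Percent of Sare-predicted children that Terman enrolled over bins[start:stop]."""
    return 100 * sum(terman[start:stop]) / sum(sare[start:stop])

for i, label in enumerate(bins):
    print(f"{label}: {pct_found(i, i + 1):.0f}%")

print(f"140-154: {pct_found(0, 3):.0f}%")
print(f"155-164: {pct_found(3, 5):.0f}%")
print(f"165 and up: {pct_found(5, len(bins)):.0f}%")
```

    The grouped figures come out at about 25%, 46%, and 99%. Summing counts before dividing, rather than averaging the per-bin percentages, keeps the sparse high-IQ bins from dominating the result.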

    The IQ range from 140 to 154 had more than 70% of the Termites in it. But . . .
    Only 25% of the children with IQs below 155 who could have been in the study were included in it, whereas virtually 100% of the children in the 165-and-up range were admitted to the study.
    Unwittingly or otherwise, Dr. Terman stacked the deck.
    I fantasize that, faced with funding and staffing limitations, he cut back the number of children in the lower IQ registers. (However, it could also be that, in the teachers' selection, the brightest children stood out like red beacons, whereas the brilliant, but not most-brilliant, children weren't as obvious.)  His distribution seriously under-represents the numbers of children in the two bottom echelons of his study. (That may be how he missed the two Nobel Prize winners who were among the children he screened and rejected: Dr. William Shockley and Dr. Luis Alvarez.)
    He came up with a few more children (43 versus 37) in the IQ 170 to 200 range than the Sare data would predict, but that may have been because of statistical fluctuations, or because of his "correcting" (by as much as 14 points) high scores for ceiling effects, or because his sample was drawn from Berkeley, home of the University of California, and the other San Francisco Bay cities.

    What is crucial about these results is the close fit at the upper end of the IQ spectrum. This is the territory in which a Gaussian distribution goes completely bananas when it comes to fitting the observed IQ frequencies.

    The fifteen sample synonym/antonym questions and the ten sample analogies problems listed below are taken from Drs. Terman and Oden's "Genetic Studies of Genius", Vol. 4, "The Gifted Child Grows Up", p. 127. There are a total of 120 synonym/antonym questions and 70 verbal analogies problems, for a grand total of 190 questions. The score on the synonym/antonym subtest is the number right minus the number wrong, and the score on the analogies subtest is the number right minus one-half the number wrong; i.e., you're penalized for bad guesses. The total time allocated to answering all 190 questions is 30 minutes, allowing about 9.5 seconds per question.
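    The scoring rule just described can be written as a small function. This is a minimal sketch: `cmt_score` is a hypothetical helper, and the treatment of omitted items (no credit, no penalty) is an assumption, since the text describes only right and wrong answers.

```python
def cmt_score(syn_right: int, syn_wrong: int, an_right: int, an_wrong: int) -> float:
    """Hypothetical helper applying the CMT scoring rule described above.

    Synonym/antonym part (120 items): number right minus number wrong.
    Analogies part (70 items): number right minus half the number wrong.
    Omitted items are assumed to count neither for nor against the candidate.
    """
    if syn_right + syn_wrong > 120 or an_right + an_wrong > 70:
        raise ValueError("more answers than there are items")
    return (syn_right - syn_wrong) + (an_right - 0.5 * an_wrong)

print(cmt_score(120, 0, 70, 0))   # 190.0, a perfect paper
print(cmt_score(60, 20, 30, 10))  # 65.0
print(f"{30 * 60 / 190:.1f} seconds per item")  # 9.5
```

    A blank paper scores 0 and a perfect one 190, while a paper with every item answered wrongly would score -155, so bad guesses cost points on both subtests.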
 15 Sample Synonym/Antonym Questions from the Terman 1940 Concept Mastery Test (CMT)

#     1st word     S/A   2nd word
2.    decadence    ___   decline
7.    if           ___   although
11.   abjure       ___   renounce
24.   peculation   ___   embezzlement
33.   insouciant   ___   nonchalant
50.   choleric     ___   apathetic
58.   truculent    ___   violent
65.   cenobite     ___   anchorite
71.   ambiguous    ___   equivocal
80.   devisor      ___   assignor
86.   diatribe     ___   invective
89.   viscosity    ___   viscidity
98.   encomium     ___   eulogy
103.  sophistry    ___   casuistry
116.  abstruse     ___   recondite

    Ten sample analogy questions from the CMT

#     Statement of the Analogy                    Choices (circle one)
7.    A : C :: X :                                1. Y             2. V             3. Z
9.    Darwin : Evolution :: Einstein :            1. Relativity    2. Mathematics   3. Magnetism
14.   Square : Cube :: Circle :                   1. Sphere        2. Line          3. Round
22.   July 4 : United States :: July 14 :         1. England       2. Spain         3. France
38.   Mercury : Venus :: Earth :                  1. Mars          2. Jupiter       3. Saturn
44.   Socrates : Plato :: Samuel Johnson :        1. Swift         2. Pope          3. Boswell
47.   Analysis : Synthesis :: Differentiation :   1. Integration   2. Frustration   3. Abomination
58.   Astrology : Astronomy :: Alchemy :          1. Physics       2. Chemistry     3. Phrenology
63.   Gascon : France :: Walloon :                1. Netherlands   2. Transvaal     3. Belgium
70.   Eight : Two :: Thousand :                   1. Twenty-five   2. Twenty        3. Ten

    This test was validated using 81 Wilson College freshmen-and-women (I won't say fresh women - grin), 135 Stanford sophomores, and 96 Stanford seniors. The mean score made by the 954 Termites who took this test in 1940 was 96 out of 190.
    Quinn McNemar analyzed the data and concluded that the mean score on the CMT, had it been given to the general population (in 1940), would have been about 2, with a standard deviation of 45. That means that a raw score of 92 would have corresponded to the 2-sigma level in the general population. Consequently, the Termites' average score of 96 would represent a level about 2.1 sigma above the mean, or, using their standard deviation of 16, a deviation IQ of about 134. This is to be contrasted with their childhood average ratio IQ of 152. However, since that time, others have analyzed the results and have arrived at an average 2.5 sigma above the mean, and this seems to be the currently accepted value. This corresponds to a deviation IQ of 140 on a sigma = 16 scale, or about 146 on a ratio IQ scale... 6 points below their childhood average. Since the Termites' average score was 96, and the average for the general population was 2, this corresponds to a standard deviation of 94/2.5, or 37.6.
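    The chain of conversions in this paragraph can be checked step by step. This is a minimal sketch using only the figures quoted in the text; the ratio-IQ line assumes the ln-normal model with s = 0.15.

```python
import math

POP_MEAN, POP_SD = 2.0, 45.0  # McNemar's general-population estimates (from the text)
TERMITE_MEAN = 96.0           # mean CMT score of the Termites in 1940 (from the text)

# McNemar's scale: how far above the population mean is the Termite average?
z = (TERMITE_MEAN - POP_MEAN) / POP_SD
print(f"z = {z:.2f}; deviation IQ (SD 16) = {100 + 16 * z:.1f}")

# The later, currently accepted figure of 2.5 sigma
z_later = 2.5
print(f"deviation IQ (SD 16) = {100 + 16 * z_later:.0f}")
print(f"ratio IQ (ln-normal, s = 0.15 assumed) = {100 * math.exp(0.15 * z_later):.1f}")
print(f"implied CMT standard deviation = {(TERMITE_MEAN - POP_MEAN) / z_later:.1f}")
```

    The unrounded z of about 2.09 gives a deviation IQ of about 133.4 (rounding z to 2.1 first gives the 134 quoted); 2.5 sigma gives 140 on the SD-16 scale and about 145.5 as a ratio IQ; and 94/2.5 gives the 37.6-point standard deviation.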
    Two of the male Termites tied for the high score of 172 on the CMT-A. I'm going to assume
(1) that the Terman screening netted all of the brightest children, and
(2) that the Termites who took the CMT-A included the adult editions of the very brightest children.
    Since there were 260,000 children in the original population, I would expect to find, by definition, one child with a deviation IQ (standard deviation = 16) of 171 (one in 220,000), or two children with deviation IQs of 169 (one in 120,000). I would also expect to find those same children, as adults, among the group who took the CMT-A in 1940: in a population of 260,000 there should be, by definition(!), roughly one adult with a deviation IQ of 171, or roughly two adults with deviation IQs of 169. (That's true because deviation IQs are defined solely in terms of their frequencies of occurrence.) So I can equate the top raw score of 172, which is 170 points above the general-population mean of 2, with a z-score of 4.3725 (a deviation IQ of about 170), corresponding to a standard deviation of 170/4.3725 = 38.9. This differs by 1.3 points of raw score from the 37.6-point standard deviation that we obtained above from the Termite mean score. However, the Termites' mean of about 2.5 standard deviations above the population mean is only approximate. If I use 38.9 for the standard deviation, I get a mean for the Termites of (96 - 2)/38.9, or about 2.42 sigma, which is a deviation IQ of about 138.7.
    This would also set the ceiling of the CMT-A at approximately 4.83 standard deviations above the population mean, or about 177.
    It's worth noting that this value for the standard deviation is derived independently of the methodology used to estimate the mean and standard deviation for the average IQ of the Termites, and that it agrees quite closely with the results obtained by Terman, et al, for the mean IQ of the Termites. It might also be worth noting that this test wasn't well validated, at least in 1940.
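    The calibration from the top score can be sketched the same way. This is a minimal sketch: like the 37.6-point and 4.83-sigma figures, it measures every raw score from the general-population mean of 2, and `Z_TOP` is simply the z-score assigned to the top score in the text, not an independently derived value.

```python
POP_MEAN = 2.0       # general-population mean CMT score (McNemar, from the text)
TOP_SCORE = 172.0    # highest CMT-A score among the Termites (from the text)
CEILING = 190.0      # maximum possible score
TERMITE_MEAN = 96.0  # Termites' mean score (from the text)
Z_TOP = 4.3725       # z assigned to the top score in the text (about IQ 170 on an SD-16 scale)

# Raw-score points per standard deviation, calibrated from the top score
sd = (TOP_SCORE - POP_MEAN) / Z_TOP

# Express the test ceiling and the Termite mean on the same scale
z_ceiling = (CEILING - POP_MEAN) / sd
z_termites = (TERMITE_MEAN - POP_MEAN) / sd

print(f"implied SD = {sd:.1f}")
print(f"ceiling: {z_ceiling:.2f} sigma, deviation IQ {100 + 16 * z_ceiling:.0f}")
print(f"Termite mean: {z_termites:.2f} sigma, deviation IQ {100 + 16 * z_termites:.1f}")
```

    This gives a standard deviation of about 38.9 raw-score points, a ceiling about 4.84 sigma above the population mean (4.83 if the standard deviation is first rounded to 38.9), or a deviation IQ of about 177, and a Termite mean near 2.42 sigma, a deviation IQ of about 138.7.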


Scores    #     Cum.
170-179   2     2
160-169   5     7
150-159   14    21
140-149   24    45
130-139   52    97
120-129   46    143
110-119   60    203
100-109   64    267
90-99     51    318
80-89     60    378
70-79     49    427
60-69     30    457
50-59     35    492
40-49     13    505
30-39     19    524
20-29     2     526
10-19     1     527
0-9       0     527

 

IQ Range   #     Cum.   S. D.   Mean
170+       22    22     29.93   118.14
160-169    37    59     33.88   105.31
150-159    108   167    29.97   101.07
140-149    167   334    31.98   89.17
135-139    22    356    29.23   84.50

 

 


 

(To be continued)