‘Tests of Astrology’: scientifically objective or biased?
by Jan Ruis
Recently ‘Tests of Astrology’ by Dean, Mather, Nias & Smit was published as the successor to ‘Astrology under Scrutiny’. The authors are very ambitious: they intend to give an objective assessment and evaluation of more or less all the relevant astrological studies to date, and they have high pretensions: to express a final opinion on the significance of astrology. The previous book received substantive, fundamental criticism, as well as admiration and respect for the tremendous amount of work involved. General criticism concerned the one-sided philosophical approach to which the authors are committed. Specific criticism concerned the incorrect representation and statistical analysis of some interesting studies, including my studies on serial killers. My understanding was that this time the review of these studies would be more extensively argued, and that proves to be the case.
After studying the way that the authors of ‘Tests of Astrology’ treated the analysis and representation of several of my publications, as well as their interpretation of the Gauquelins’ results, I felt compelled to bring the following to public attention.
2.(i) High score on Mutable signs in serial killers
In my study on serial killers (2008) I found significant effects for Mutable signs, the 12th principle (Pisces, 12th house, Neptune aspects) and Moon aspects. These results confirmed the a priori hypotheses. In personal communications in 2008-2010 Dean sent me a long list of arguments attempting to disprove my 2007 article. At some point he refused further communication and told me that he would not give permission for his arguments to be raised publicly. For me, this was the reason to publish my follow-up article in 2012, discussing these arguments but without mentioning his name. To my surprise, none of his former criticisms is mentioned anywhere in the new book; instead, different arguments are formulated, mainly focusing on my results with Mutable signs. On page 291 they discuss this figure from my 2012 article:
Figure 1. The frequency of the Mutable total (= the sum of Sun through Saturn plus Ascendant in Mutable signs = sum of 8 factors per birth chart) in 77 serial killers. The histogram is made from 500 Mutable totals obtained from 500 samples of computer-generated birth charts, each sample N=77 charts, randomly drawn from the birth years of the serial killers. A normal distribution is fitted to the histogram. The mean score for samples with size N=77 is 195.5, while the sample of serial killers has a score of 235.
According to figure 1 the probability that a score of 235 is due to chance is extremely low (p = 0.00009). The result therefore is very significant: the sample of serial killers has a much higher score on Mutable than the theoretical expectation, thus confirming the hypothesis.
Dean et al. describe this figure as follows: “the mean for 500 controls randomly sampled from N = 6000 ..” This is a rather careless mistake. The histogram represents data from 500 computer-generated samples of 77 charts each, so a total of 38,500 charts, not 500 controls sampled from N=6000. It seems that this diagram was not well understood by Dean et al.
2.(i)(a) Getting rid of the high score on Mutable by using a misleading statistical trick
Dean et al. then comment: “the above p = 0.00009 seems very convincing, except it is based on an inflated N and is therefore optimistic.” They mean by this that we have 8 factors per chart while N is only 77. This argument is invalid since I used a Monte Carlo analysis with eight factors per individual as shown in Figure 1.
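The Monte Carlo logic behind Figure 1 can be sketched roughly as follows. This is only an illustration under stated assumptions: `simulate_mutable_total` is a hypothetical stand-in that treats the 8 factors of each chart as independent draws with a probability of 1/3 of falling in a Mutable sign, whereas the actual study computed planetary positions for computer-generated birth moments. The point it shows is structural: every simulated sample contains 77 charts with 8 factors each, so the spread of the reference distribution already reflects 8 factors per individual.

```python
import random
import statistics
from math import erfc, sqrt

N_CHARTS = 77      # sample size from the article
N_FACTORS = 8      # Sun through Saturn plus Ascendant
N_SAMPLES = 500    # number of simulated samples, as in Figure 1
P_MUTABLE = 1 / 3  # ASSUMPTION: placeholder probability, not from the study


def simulate_mutable_total(rng):
    """Hypothetical stand-in: Mutable total of one simulated sample of 77 charts."""
    return sum(rng.random() < P_MUTABLE for _ in range(N_CHARTS * N_FACTORS))


def normal_upper_tail(observed, totals):
    """One-sided p-value from a normal distribution fitted to simulated totals."""
    mean = statistics.fmean(totals)
    sd = statistics.stdev(totals)
    z = (observed - mean) / sd
    return mean, sd, 0.5 * erfc(z / sqrt(2))


rng = random.Random(1)
totals = [simulate_mutable_total(rng) for _ in range(N_SAMPLES)]
mean, sd, p = normal_upper_tail(235, totals)
print(f"mean={mean:.1f} sd={sd:.1f} one-sided p={p:.5f}")
```

Because the chart generator here is a toy model, its mean and p-value will not match the values in Figure 1; only the procedure is comparable.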
Dean et al. ignore this fact. Instead, they embark on a mystifying side track. They select a post hoc observation and figure from my 2012 article in which I say that serial killers with ≥3 factors in Mutable have a surplus of Mutable because there are more of them than the theoretical expectancy:


Figure 2. Percentage of persons against the score in Mutable. The minimum is zero factors and the maximum is 8 factors in Mutable per radix. The theoretical values were obtained from a control group of 6000 individuals born in approximately the same years and places as the serial killers.
The authors then use this figure to recalculate the p-level of Mutable, completely beside the hypothesis and its confirmation, and proceed with their mystification by asking “but what justifies counting charts with 3 or more factors instead of 2 or more, or 4 or more? No number was specified in advance, yet a choice made after the event (as here) violates statistical assumptions”. The answer is: nothing, dear authors, you yourselves created this subject. No number was specified in advance because it was not a part of the investigation on which you are commenting. You invented this question yourselves. How should one interpret such a misleading way of commenting on research?
Summarizing: the hypothesis in the 2008 article was not about one, two, three or four factors in Mutable, but about the total in Mutable. The original hypothesis was that serial killers have a higher score in Mutable than what can be expected by chance alone. This hypothesis is, very obviously, confirmed in Figure 1.
2.(i)(b) Faulty calculation of the p-value and effect size
Now let’s take a closer look at the way Dean et al. recalculate the p-level. They use a 2×2 contingency test (chi-square) for the difference between the number of serial killers (with ≥3 factors Mutable) and the theoretical expectation. By doing so they manage to reduce the p = 0.00009 from Figure 1 to p = 0.032. This value is wrong, as can be checked here: vassarstats.net/tab2x2.html.
We get: p = 0.021 (Pearson), p = 0.032 (with Yates correction), p = 0.031 (two-tailed Fisher), p = 0.016 (one-tailed Fisher). The p = 0.016 (one-tailed Fisher) is the correct value here because we apply a one-sided test (the hypothesis is that serial killers score higher). Besides, applying a Yates correction to reduce the p-level is unjustified. But these are not the most serious mistakes Dean et al make.
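These four values can be reproduced without an online calculator. The sketch below assumes a table of 54 versus 23 serial killers (≥3 versus <3 factors in Mutable) against 40 versus 37 for the expectation. The 40 is the expectation quoted from page 291 of the book; the observed count of 54 is not stated in the text and is inferred here as the value consistent with the quoted p-values, so treat it as an assumption.

```python
from math import comb, erfc, sqrt

# ASSUMPTION: 54 of 77 observed (inferred, not stated in the article);
# 40 of 77 is the theoretical expectation treated as if it were a sample.
a, b = 54, 23   # serial killers: >=3 factors, <3 factors
c, d = 40, 37   # expectation row, treated as a second sample
n = a + b + c + d


def chi2_p(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return erfc(sqrt(chi2 / 2))


denom = (a + b) * (c + d) * (a + c) * (b + d)
pearson = n * (a * d - b * c) ** 2 / denom
yates = n * (abs(a * d - b * c) - n / 2) ** 2 / denom


def hypergeom_pmf(k):
    """Probability of k in the top-left cell with all margins fixed."""
    return comb(a + c, k) * comb(b + d, a + b - k) / comb(n, a + b)


k_min = max(0, (a + b) - (b + d))
k_max = min(a + b, a + c)
fisher_one = sum(hypergeom_pmf(k) for k in range(a, k_max + 1))
p_obs = hypergeom_pmf(a)
# Two-tailed: sum all tables at most as probable as the observed one.
fisher_two = sum(hypergeom_pmf(k) for k in range(k_min, k_max + 1)
                 if hypergeom_pmf(k) <= p_obs * (1 + 1e-9))

print(f"Pearson p={chi2_p(pearson):.3f}  Yates p={chi2_p(yates):.3f}")
print(f"Fisher one-tailed p={fisher_one:.3f}  two-tailed p={fisher_two:.3f}")
```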
We are not testing two samples here! We compare a sample of N = 77 with a theoretical expectancy. So we must use the chi-square goodness of fit test. This gives p = 0.0014. This value is virtually the same as p = 0.002 obtained by bootstrap in my original 2008 article:
The result in Figure 3 confirms that the above p = 0.0014 is correct and that the objections of Dean et al are unjustified and that their p = 0.032 is wrong.
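For readers who want to check the goodness of fit value, a minimal sketch follows. It again assumes an observed count of 54 out of 77 with ≥3 factors in Mutable (inferred, not stated in the text) against an expectation of 40 out of 77, and it uses the closed-form tail of the chi-square distribution for one degree of freedom.

```python
from math import erfc, sqrt


def chi2_gof_two_categories(observed, expected):
    """Chi-square goodness-of-fit statistic and p-value for two categories (df = 1)."""
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi2, erfc(sqrt(chi2 / 2))


# ASSUMPTION: observed counts inferred, not taken from the article.
observed = [54, 23]   # >=3 factors, <3 factors
expected = [40, 37]   # theoretical expectation for N = 77
chi2, p = chi2_gof_two_categories(observed, expected)
print(f"chi2={chi2:.1f}  p={p:.4f}")
```

Unlike the 2×2 test, the expectation enters here as a fixed value without sampling error, which is why the same counts give a much smaller p-value.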
The reason that p = 0.00009 from Figure 1 is lower than p = 0.0014 is that Figure 3 was obtained with 500 resamples randomly drawn from a matched control group of N = 6000 real persons, thus seasonal and daily rhythms were reproduced, while Figure 1 is based on 500 resamples from artificial births which only reproduce the years of birth of the serial killers.
Dean et al make a similar error on pages 243 and 244 by treating a theoretical expectancy as a sample result in a 2×2 contingency test. Why is this wrong? Simply because of the law of large numbers. The mean converges to the theoretical value with a large number of resamples. The theoretically expected values are deterministic values: when the number of birth charts increases, the average values approach some deterministic values; in practice, I simply increase the number so much that no significant changes occur anymore (a demonstration of this is shown a few pages later in Figure 8). The theoretical value for the number of charts with ≥3 factors in Mutable (figure 2) was determined using 6000 charts: the probability of ≥3 factors in Mutable was 53%. I repeated this with 5 trials with 20,000 computer-generated charts taken from the birth years of serial killers and each time the (rounded) probability was 52%. This means that the theoretical expectation for 77 serial killers is indeed 40, the same value that Dean et al used in their 2×2 contingency table (page 291):
To treat the theoretical value as a single sample result is thus faulty statistics. The recalculation of Dean et al does not change the fact that a score of 235 in Mutable is extremely significant. The effect size phi = 0.186 in the above table is also wrong. Phi = √(χ²/N) = √(10.2/77) = 0.36.
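The effect size formula phi = √(χ²/N) can be checked numerically. In the sketch below, the goodness of fit value χ² = 10.2 with N = 77 is taken from the text; the 2×2 table (54/23 versus 40/37, total 154) is an assumption inferred from the quoted p-values and is included only to show where a phi of 0.186 would come from.

```python
from math import sqrt


def phi(chi2, n):
    """Effect size phi for a chi-square statistic with 1 degree of freedom."""
    return sqrt(chi2 / n)


# Goodness-of-fit case: values quoted in the text.
phi_gof = phi(10.2, 77)

# 2x2 case. ASSUMPTION: table inferred, not given in the article.
a, b, c, d = 54, 23, 40, 37
n = a + b + c + d
chi2_2x2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
phi_2x2 = phi(chi2_2x2, n)

print(f"phi (goodness of fit) = {phi_gof:.2f}")
print(f"phi (2x2 table)       = {phi_2x2:.3f}")
```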
More factors in Mutable also implies that there are fewer factors in the other Qualities, Cardinal and Fixed (the total score is 8 per chart). Dean et al ignore that fact. See the figure below:


Figure 4 (left) shows the scores for the 3 Qualities in serial killers together with the theoretical scores based on 600 samples of N = 77 charts, each chart obtained by independent shuffling (with replacement) of years, months, days, times and locations of the serial killers. The percentage deviation, obtained from (obs-exp) / exp, where ‘obs’ is the value of the serial killers and ‘exp’ the theoretical expectancy, is shown at the right: the surplus of Mutable is compensated for by a deficit of Fixed. The overall absolute deviation over the 3 Qualities is 36%.
The question now is whether there really is an overall significant effect for the Qualities. So we need to know what the probability is for a deviation of 36% in a sample of N = 77. This cannot be done with a chi-square test as there is the issue of multiple correlated variables. We must therefore use a bootstrap (Monte Carlo) method.
How does that work? The computer calculates 77 charts for the independently shuffled birth data of the serial killers. This is repeated 600x. The averages of the 3 Qualities in the resulting 600×77 = 46,200 charts are then calculated. These are the theoretical expectancies. For each of the 600 samples we determine the sum of the absolute %-deviations over the 3 Qualities. Then a histogram is made of the 600 deviations and a probability distribution is fitted to it. The figure below is the result.
Serial killers thus exhibit an overall significant effect with respect to the Qualities. In Dean’s words this result should read: “the results are genuinely significant”.
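The bootstrap procedure described above can be sketched as follows. Two things are assumptions of this sketch rather than the method of the study: `simulate_sample` is a hypothetical stand-in that assigns each factor to one of the 3 Qualities with equal probability instead of computing charts from shuffled birth data, and the p-value is taken as the plain fraction of simulated deviations at or above the observed one, where the article fits a probability distribution to the histogram.

```python
import random

N_CHARTS, N_FACTORS, N_SAMPLES = 77, 8, 600
QUALITIES = ("Cardinal", "Fixed", "Mutable")


def simulate_sample(rng):
    """Hypothetical stand-in: Quality totals for one sample of 77 charts.

    ASSUMPTION: each factor falls in each Quality with equal probability.
    """
    totals = dict.fromkeys(QUALITIES, 0)
    for _ in range(N_CHARTS * N_FACTORS):
        totals[rng.choice(QUALITIES)] += 1
    return totals


def summed_abs_deviation(totals, expected):
    """Sum over the 3 Qualities of |obs - exp| / exp."""
    return sum(abs(totals[q] - expected[q]) / expected[q] for q in QUALITIES)


def bootstrap_p(observed_deviation, rng):
    samples = [simulate_sample(rng) for _ in range(N_SAMPLES)]
    # Theoretical expectancy: average total per Quality over all samples.
    expected = {q: sum(s[q] for s in samples) / N_SAMPLES for q in QUALITIES}
    deviations = [summed_abs_deviation(s, expected) for s in samples]
    exceed = sum(dev >= observed_deviation for dev in deviations)
    return exceed / N_SAMPLES, deviations


# 0.36 is the overall absolute deviation of 36% reported for the serial killers.
p_value, deviations = bootstrap_p(0.36, random.Random(7))
print(f"empirical p = {p_value:.4f}")
```

Because the three totals always add up to 8 per chart, their deviations are correlated; simulating whole samples handles that automatically, which is the reason for preferring this approach over a chi-square test here.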
2.(i)(c) Ignoring the research hypothesis
Dean et al, however, continue on page 291: “So we must repeat the test for each number” (they refer to the number of factors in Mutable). But again that has nothing to do with the hypothesis. Testing whether serial killers have one or two factors in Mutable is irrelevant in this context. Moreover, the scores on 1 through 4 factors in Mutable are not independent of each other because a higher score at ≥3 factors automatically means a lower score at fewer factors. So quite apart from the fact that Dean et al use the wrong test, they use an irrelevant method. Apparently they haven’t really understood Figures 1 and 3.
Based on their otiose analysis of factors in Mutable Dean et al. conclude: “The p-levels are not significant except for p=0.03 and p=0.007 for planets in Mutable, which among this number of results are not genuinely significant..”. A vague and irrelevant remark, without further argumentation or implication and ignoring the significant result shown in Figure 1.
They finally conclude (without statistical support): “In short, nothing being tested here is genuinely significant. Hence serial killers do not confirm astrological principles.”.
And they leave it at that.
2.(ii) Ignoring other significant results in the study on serial killers
The other significant results (12th principle, Moon Aspects) are simply ignored. I am therefore reproducing these results here briefly so that readers can have a full picture.
Dean et al do not discuss these results. They end with a general discussion of serial killers, saying: “astrologers want to identify serial killers, not tell serial killers their charts tend to be higher in X than average”. This speculation about astrologers being right or wrong is irrelevant and distracting. It does not concern my research, which is purely scientific and not about possible usefulness to astrologers. Of course, I have already stated myself that serial killers cannot be identified from their chart. The fact remains that we still need a scientific explanation for the results. Could there be a selection bias causing the above significant effects (which I do not rule out, since reproducibility is low in another, although much less reliable, database), or are there genuine astrological effects (even though these effects are small and not immediately useful to astrologers)? An argument often made by Dean et al concerns effect size: because effect sizes in astrological studies are small and therefore have no practical value for the astrologer, they consider themselves justified in dismissing weak but positive effects. But even if planets have only a weak but measurable effect, we would still like to know how on earth that is possible. That’s what science is about.
2.(iii) Creating confusion about the shuffling method
On page 333 (Comment 3) Dean et al. discuss the method of shuffling birth data. They say: “but astrological effects can be very fine-grained, as when a few seconds of birth time are enough to change a house position or an Ascending sign, even though births are never timed to that precision.”. They ask: “So how good is shuffling at providing controls at this fine-grained level?”.
Dean et al mix up three things here: 1) changes in astrological positions and the (in)ability to sample them, 2) controls and theoretical expectancy, 3) variation of results within a method and variation of results between methods. They also refer to me as an astrologer, which I am not.
With respect to issue 1): rapid changes in positions are no problem for the shuffle method, in which the effects of these changes are accounted for. The positions are randomly sampled in many thousands of charts and the variations are thus included in the resulting histogram. Dean et al. seem to commit a fallacy here. In my research I use 500 resamples each of size N and the 500 outcomes are put in a histogram. The table below demonstrates the effect of repeating the bootstrap method 5 times (resample N = 77, birth data of serial killers shuffled independently):
2.(iii)(a) Eliminating significance by treating a theoretical expectancy as a one-sample result
The above findings illustrate that the 2×2 contingency test of Dean et al (p.244, Comment, line 1-4) is wrong. Their test is for comparing two samples. They should have tested the comparison of a sample and a theoretical expectation. So a chi-squared goodness of fit test, or a bootstrap method, are the correct procedures. The deviations in figures 4 through 7 are relative to the theoretical expectancy. The theoretical value (for 500 samples) does not vary significantly in repeated bootstrap trials, as demonstrated above. The tiny variation is no problem because it makes a negligible difference to the resulting bootstrap histogram and p-values. But I agree it would be more precise to give an error estimate of the p-value.
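The stability of a theoretical expectancy can be illustrated with a toy calculation for the probability of ≥3 factors in Mutable. The sketch assumes that the 8 factors of a chart are independent and each has a probability of 1/3 of being Mutable; real charts do not satisfy this (Sun, Mercury and Venus, for example, are never far apart), so the numbers are indicative only. It compares the exact binomial value with simulated estimates from increasingly large numbers of charts.

```python
import random
from math import comb

N_FACTORS = 8
P_MUTABLE = 1 / 3  # ASSUMPTION: independent factors, equal chance per Quality


def exact_prob_at_least(k_min):
    """P(at least k_min of 8 factors in Mutable) under the independence assumption."""
    return sum(
        comb(N_FACTORS, k) * P_MUTABLE**k * (1 - P_MUTABLE) ** (N_FACTORS - k)
        for k in range(k_min, N_FACTORS + 1)
    )


def simulated_prob_at_least(k_min, n_charts, rng):
    """Fraction of n_charts simulated charts with at least k_min Mutable factors."""
    hits = 0
    for _ in range(n_charts):
        mutable = sum(rng.random() < P_MUTABLE for _ in range(N_FACTORS))
        hits += mutable >= k_min
    return hits / n_charts


rng = random.Random(42)
exact = exact_prob_at_least(3)
estimates = {n: simulated_prob_at_least(3, n, rng) for n in (77, 6000, 20000)}
print(f"exact under toy model: {exact:.3f}")
for n, est in estimates.items():
    print(f"n_charts={n:>6}: estimate={est:.3f}")
```

Under this toy model the exact value is about 0.53, and the estimates from large numbers of charts settle close to it while a single sample of 77 can still deviate noticeably. That is the sense in which the expectation behaves as a fixed value rather than as a second sample.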
Then the final issue 3): as shown in Figure 8, the variation of results within a shuffle method is negligible with a very large number of samples. But different shuffle methods can indeed render different results. I repeated the shuffle procedure as shown in Figure 8, this time with month and day fixed. The theoretical expectancies were also constant over 5 trials but the means were systematically slightly higher (205.7). In this case, the explanation is that the high score of Mutable in serial killers is reproduced to some extent by fixing both their month and day of birth and only shuffling the remaining parameters. The fact that (potentially astrological) effects that are present in a research sample can be reproduced by shuffle methods was already discussed in my 2008 article, but Dean et al don’t refer to this.
Shuffle methods are in fact conservative because they always reproduce some of the variations present in the sample under study and this raises the potential p-level. For instance, in our sample of serial killers there are many more births in November and January than in the other months. Shuffling will reproduce the high frequency of these months, thereby reproducing positions of the Sun and, to some extent, Mercury and Venus. Less conservative methods include the method used to produce Figure 1 (only reproducing the years without seasonal and daily rhythms). Probably the best method, but more elaborate, is using a large sample of real persons born in the same years and places as the sample under study, a method that was used in Figure 3.
In conclusion: the authors produce an incomplete analysis of different shuffle methods and seem to imply that the differences between the methods undermine the reliability of highly significant results such as those shown in Figures 1 through 7. That is not the case: the differences are too small to eliminate the significance of these results, but they may affect the p-level slightly. Again, an error estimate of the p-level is therefore advisable.
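The difference between the two shuffle variants can be made concrete with a small sketch. The record layout, the field names and the function `shuffle_births` are hypothetical, and the three records are made-up placeholders rather than data from the study. Fields listed in `fixed` stay with their original record; all other fields are drawn independently, with replacement, from the whole sample.

```python
import random

FIELDS = ("year", "month", "day", "time", "place")


def shuffle_births(records, rng, fixed=()):
    """Return one synthetic birth per original record.

    Fields named in `fixed` are copied from the original record; every other
    field is drawn independently, with replacement, from the whole sample.
    Note: independent draws can produce impossible dates (e.g. 30 February);
    a real implementation has to handle those, and how the study did so is
    not stated in the article.
    """
    synthetic = []
    for record in records:
        synthetic.append({
            field: record[field] if field in fixed else rng.choice(records)[field]
            for field in FIELDS
        })
    return synthetic


# ASSUMPTION: made-up illustrative records, not data from the study.
records = [
    {"year": 1950, "month": 1, "day": 5, "time": "03:10", "place": "P1"},
    {"year": 1957, "month": 11, "day": 12, "time": "14:45", "place": "P2"},
    {"year": 1962, "month": 6, "day": 21, "time": "22:00", "place": "P3"},
]

all_shuffled = shuffle_births(records, random.Random(0))
month_day_fixed = shuffle_births(records, random.Random(0), fixed=("month", "day"))
```

With month and day fixed, the synthetic sample keeps the seasonal distribution of the original births exactly, which is consistent with the observation that this variant reproduces part of the Mutable surplus.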
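One simple way to attach an error estimate to an empirical Monte Carlo p-value is the binomial standard error. The sketch below is a generic illustration, not the procedure of the study: the function name is hypothetical, the add-one correction is a common convention to avoid a p-value of exactly zero, and the example counts are invented.

```python
from math import sqrt


def mc_p_value_with_error(n_exceed, n_resamples):
    """Empirical Monte Carlo p-value with its binomial standard error.

    n_exceed: number of resampled statistics at or beyond the observed one.
    The add-one correction counts the observed sample as one of the resamples.
    """
    p = (n_exceed + 1) / (n_resamples + 1)
    se = sqrt(p * (1 - p) / n_resamples)
    return p, se


# ASSUMPTION: invented example counts (1 exceedance in 500 resamples).
p, se = mc_p_value_with_error(1, 500)
print(f"p = {p:.4f} +/- {se:.4f}")
```

With 500 resamples the smallest empirical p-value this estimator can return is 1/501, so much smaller values, such as those obtained from a distribution fitted to the histogram, carry an additional uncertainty from the fit itself.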
2.(iv) Update on serial killers
In my 2008 article I used 77 male serial killers with reliable birth data that were taken from AstroDatabank for PC back in 2007. In the meantime, AstroDatabank was taken over by Astrodienst and new serial killers and other murderers have been added over the years. To check whether the significant results are reproducible, I collected an additional 23 male serial killers using the checklist (to identify serial killers) given in the 2008 article. This brings the total to 100 male serial killers. This update will be published in a research article in Correlation in the near future. In the present context it is relevant to mention that some findings, such as the surplus of Mutable and the aspects with the Moon1, confirm the previous results. These findings enhance the likelihood that we are not dealing with an artifact or a statistical fluke.
2.(v) Synastry studies
2.(v)(a) Deceptive information
On pages 242-243 and 244 Dean et al discuss my synastry articles of 1993 and 1994. They use the totals of each synastry aspect-angle (0°,60°,90°,120°,180°) as shown in Table 1 of my article. The total for each angle is the sum of all pairings from Sun through Saturn, Asc and MC mutually. About the totals in this table, I say in the article of 1994: “…the distribution of the totals of the aspects is not significant…” Dean et al do not mention my conclusion (p244: Comment). Instead, they suggest to the reader that I mentioned that the totals are significant.
Dean et al make the same error again: applying the 2×2 contingency test instead of a goodness of fit test. They conclude from their test: “…all results are non-significant..” (p244 Comment line 5). But that conclusion is already explicitly made in my article! This is deceptive information. Worse still, Dean et al say nothing about the significant results of the grand total of aspects (the sum of all pairings over all angles). They imply that this is the 120° aspect (p243 line 14, p244 Comment line 6-7). See figure 9 (left), taken from my 1994 article:
(p. 243: Comment lines 1-3; p. 244: Comment lines 1-4).
Next they discuss the control chart for the grand total of synastry aspects (but, oddly, don’t mention that this is the grand total and imply this is the 120° aspect). Figure 9 (right) shows the control chart as suggested by Ertel (Ruis, 1994/95). The birth dates of the spouses were shifted a number of days. The further the birth dates are shifted away from the true births, the less the grand total deviates from expectancy.
This finding confirms the positive results, but Dean et al miss this fact. They conclude: “… no reason to suppose that the results were other than chance (they were in any case not significant).”
2.(vi) The Gauquelins’ results
A further point concerns Dean’s attempt to explain the Gauquelins’ results. The interpretation of the ‘Gauquelin effect’ is a challenge for skeptics. In his attempt to explain it, Dean suggested the idea that parents tampered with the birth time. Incidentally, not a single piece of direct historical evidence for this idea has ever been found. It cannot be qualified as a hypothesis, for it is not falsifiable. Dean is very critical of the studies of others but remarkably uncritical regarding his own idea.
2.(vi)(a) Wrong translation
In chapter 6.9.2 excessive and suggestive almanac clippings are presented to underpin the idea that the almanacs mention that ‘Gauquelin plus zones’ are lucky zones for the planets. In the middle of page 95 a text is presented from a very old manuscript from 1493, which says: “Degré d’une étoile fixe que Bergiers appellet alKabor c’est a dire le grat chien et dient que ceux qui sont née sous la constellacion et quelle est en l’ascendant ou au milieu du ciel elle signifié bonne fortune”.
This should be translated as: “Degree of a fixed star which the Shepherds (star gazers) call Al Kalb that is to say the Greater Dog and say that those born under the constellation and which is on the ascendant or at the midheaven signifies good fortune”2.
To go from here to explain Gauquelin’s plus zones seems an impossible task, but Dean manages to do so. How?
By dividing the above sentence into pieces, by suggestion, wrong translation and omission. He translates “.. born under the constellation..” as: “born under a planet or star”. A constellation is usually a configuration of stars, not a single planet. In fact ‘constellation’ in this text refers to Canis Major which includes the brightest star in the night sky, Sirius. The constellation is not on the ecliptic and there is no reference to planets above the horizon or west of the Midheaven3. Next, Dean translates “..en l’ascendant ou au milieu du ciel..” as: “..above the horizon or culminating in the middle of the sky”. This is wrong because “en l’ascendant” means “on the ascendant”, not above (in the 12th house). And “au milieu du ciel” means at MC (not in the 9th house).
These facts are already fatal for Dean’s idea, but he distracts attention from that by suggesting that the tampering parents, four centuries later than the book, would be in conflict because they had learned that favorite positions occur before rising and before culmination (a reference for this is omitted) while the almanac from 1493 (which Dean thinks they have read) says above the ascendant and at culmination. Therefore, Dean says, they put the planets in Gauquelin’s plus zones. Apart from the fact that the almanac of 1493 doesn’t say above the ascendant, this idea is, in my opinion, nothing but fantasy. Dean’s suggestive phrases hide his inability to explain why ‘at culmination’ is interpreted by the parents as ‘after culmination’ (the 9th house). My thanks to Kyösti Tarvainen, who showed me the errors and pointed out that these errors were already present in the former book ‘Astrology under Scrutiny’. The above French phrases and translations are from Robert Currey.
2.(vi)(b) A contradiction in terms
Is Dean’s idea plausible that parents with astrological interest report an incorrect birth time to the civil registry office? It is quite conceivable that some parents report another birthday in order to avoid a date like the 13th of a month, or Friday the 13th, or December 31st or January 1st. But why would parents do that with the birth time? Perhaps in borderline cases such as a midnight birth. But that is not what Dean means. He postulates that astrologically well-trained parents immediately calculated the house position of the planets at the time of birth of their child, and as their favorite planet was not on the ascendant or MC of the newborn, they would have calculated the required time of birth so that their favorite planet is right on the ascendant or MC. And the next day they would report the incorrect birth time of their newborn to the civil registry.
But parents who have done so would have also known that their child would not be really successful because the success bringing planet just was not in the right place in their (real) birth chart. If those astrologically educated parents believed that astrology works, how can it be that they believed that a fictional self-created planet position would also work? That is a contradiction in terms.
There are other severe points of criticism. In astrology, the Sun, Venus and Jupiter have always been considered ‘benefics’ and Mars and Saturn ‘malefics’. Gauquelin found effects for Moon, Mars, Jupiter and Saturn. Why then did time-tampering parents not prefer the Sun, Venus and Jupiter and avoid Mars and Saturn on the ascendant and MC? And what about the Moon? And how do we explain that the parents did not place their favorite planets exactly on the ascendant and MC, the ‘really good place’? Their placement was most frequent in the 12th and 9th houses, while in astrological tradition the 12th house was known as a very unfavorable house for worldly affairs, a place to be avoided. Why then? Did they all make a similar miscalculation? Unlikely. And why did they choose the 9th house? And how did they manage, as a group, to distribute the faked positions over both the 12th and 9th houses?
There are other problems. Sun and Mercury showed no effect in the statistics of Gauquelin. Why would these tampering parents avoid these, especially as a strong Sun is an asset in a birth chart? An additional problem is that almanacs not only mentioned planets in houses but also signs of the ascendant. Why didn’t the parents go all the way and give their children favorable ascendants? Why not a Jupiter in the 2nd house for financial wealth? Numerous possibilities. The fact that they didn’t is in contradiction with Dean’s wild idea.
A most intriguing question: how is it possible that children with an incorrect birth time are more likely to become eminent in a particular profession later in life? Dean’s idea would mean that parents who gave their child a faked Jupiter in the 12th house had a magical force to ensure that the child would become an eminent journalist. And if those parents so badly wanted their child to become an eminent journalist and writer, why not give the child a Mercury conjunct Ascendant or MC? Mercury is good for writers according to almanacs.
In conclusion: Dean’s idea is highly unlikely.
2.(vii) Final remarks
Geoffrey Dean has set himself the task of evaluating astrological research and has, with the aid of a few others, devoted himself to that with admirable dedication for about 40 years. Dean’s statistical approach has stimulated the development of more advanced scientific methods by researchers of astrology over the years. That is Dean’s great contribution. A critical and independent analysis is of special importance in the field of astrology since a scientific theory is lacking.
One may wonder whether someone who is apparently so strongly attached to his cause and has acquired the reputation of a ‘skeptic astrology debunker’ is not at risk of losing his objectivity; it may become difficult to admit on occasion that you are wrong or don’t know the answer.
Admittedly, many astrology practitioners produce nonsense. Much published research has little or no value. But there have also been studies that are difficult to interpret, that have produced significant results which are not easily explained away.
I do not mean that those results necessarily indicate the existence of an astrological effect, but they are challenging and invite further research. The reader finds no such considerations in the book ‘Tests of Astrology’.
The authors give the impression that they are interested in one thing: ‘debunking seemingly astrological results’. I hope I have shown convincingly that they sometimes do so by mystifying and distracting from the research subject at hand, inappropriate application of statistical methods and last but not least ignoring inconvenient results.
That astrology does not fit the scientific worldview is beyond dispute. I am therefore a doubter, inclined to reject astrology on a scientific basis and yet open to convincing evidence that there may be something in it. My impression is that Dean et al are not open to this. By their attitude they put themselves in the position of the Chief Justice who gives the final verdict. They signal ‘errors’ in every published study, sometimes on dubious grounds, and what is worse: in ‘Tests of Astrology’ the reader is overwhelmed with hundreds of discussions in which the authors always have the final word, without mentioning anywhere a dialogue or reply from the researchers involved.
I hope that this intensification of the dialogue leads to more independent research and a more objective review of the results.
Jan Ruis4
1: for details: see Appendix
2: translation from Robert Currey
3: modified from Robert Currey
4: Special thanks to Jean Cremers for programming the astro-research modules, Frank Vernooij & Hans van Oosterhout from the Dutch NVWOA for helping with the manuscript, Graham Douglas for textual improvements & suggestions and Kyösti Tarvainen for his useful contributions.
Appendix


Figure 9. The Mutable total in the updated sample of 100 serial killers remains very significant with an even lower p-value than was found for 77 serial killers (Figure 3).
Figure 10. The number of serial killers with ≥3 factors in Mutable is 70, while the theoretical expectancy is 51. The probability that this can occur by chance is p = 0.00003 (chi-squared goodness of fit test), lower than the p-value for 77 serial killers (Figure 2).
Figure 11. The result for the aspects with the Moon remains highly significant with an even lower p-value than was found for 77 serial killers. In particular, Moon-Saturn aspects show a stronger effect (individually significant at p = 0.001) than is shown in Figure 7. Frequent Moon-Saturn aspects was part of the second hypothesis in the 2007 article.
Figure 12. The result for the sum of planets in the 12th house remains highly significant with a p-value virtually identical to the p-value mentioned in Figure 6.
References
Astrology under Scrutiny (2013), Astrologie in Onderzoek, Volume 15, final issue, July 2013
Currey, R. (2014), Astrology under Scrutiny: Close encounters with science, Correlation 29(2): 52-68
Ruis, J. (1993/1994), Indication for a role of synastry aspects, Correlation 12(2): 20-43
Ruis, J. (1994/1995), Shift control of synastry effect, Correlation 13(2): 38-39
Ruis, J. (2007/2008), Statistical analysis of the birth charts of serial killers, Correlation 25(2): 7-44
Ruis, J. (2012), The birth charts of male serial killers, Correlation 28(2): 8-27