Reviewer Number 1 – MAJOR COMMENTS from Kyösti Tarvainen
1.(i) The Gauquelin research: the misleading continues
Dr. Geoffrey Dean again claims (Section 6.9.2) that almanacs formerly presented the Gauquelin plus zones as fortunate places for planets. This claim is, however, based on a translation error. He has translated the French expression “en l’ascendant ou au milieu du ciel” as “above the horizon or culminating”. Note first that the French text does not say “past the culmination”, which could refer to the Gauquelin plus zone in the 9th house. Secondly, “en l’ascendant” means “at the Ascendant”, not “above the horizon” or “above the Ascendant”, which could refer to the Gauquelin plus zone in the 12th house. This translation error was already pointed out by Tarvainen (2014), but it is repeated in this new book. In all, the authors presently have no arguments with which to discredit the Gauquelin results.
1.(ii) Recent aspect studies discredited and omitted
The authors review negative aspect studies but discredit or ignore positive ones. In Section 7.6.2012.1, the authors make an effort to discredit two positive serial killer studies by Ruis (2007, 2012). (The studies also deal with factors other than aspects.) Several irrelevant viewpoints are presented, and the authors introduce a mathematical trick to discredit the positive results. We describe it here for the case of factors in Mutable signs. Normally one counts the factors in Mutable signs in the data and compares this count with that in the control group.
1.(ii)(a) But now the authors consider charts with 3 or more factors in Mutable signs. The number of such charts in the data is compared with that in the control charts. The corresponding p-value happens to be statistically significant. But because the chosen limit of 3 is arbitrary, the authors also consider the limits 2 and 4. The two corresponding p-values happen to be non-significant. Since not all of the p-values considered are significant, the authors conclude that Ruis’ results are not positive.
1.(ii)(b) But with this trick one can perhaps discredit most positive results: by choosing a large enough limit value, the number of charts reaching the limit becomes so small in both the data and the controls that the p-value is not significant. In all four cases shown in the bottom figure on page 291, the limit value of 1 also produces a large p-value. This is probably quite common, reflecting the fact that the number of remaining charts, those without the factor, is about the same in the data and the controls.
1.(ii)(c) In summary, the authors’ trick involves an arbitrary limit parameter, and in almost any application one can probably choose values for it that produce meaningless non-significant p-values, even when the proper standard analysis gives a positive result.
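The contrast between the standard analysis and the limit-based analysis can be sketched in Python. This is a minimal sketch under our own assumptions, not Ruis’ or the authors’ actual computation: it uses a one-tailed two-proportion z-test (normal approximation), treats every factor as an independent trial in the standard analysis, and the function names and per-chart counts are hypothetical.

```python
import math


def one_tailed_two_prop_p(x1, n1, x2, n2):
    """One-tailed p-value (normal approximation) for H1: x1/n1 > x2/n2."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both groups are all 0 or all 1: no difference can be detected.
        return 0.5
    z = (x1 / n1 - x2 / n2) / se
    return 0.5 * math.erfc(z / math.sqrt(2))


def standard_p(data_counts, control_counts, factors_per_chart):
    """Standard analysis: total number of factors in Mutable signs,
    data versus controls (each factor treated as an independent trial)."""
    n1 = factors_per_chart * len(data_counts)
    n2 = factors_per_chart * len(control_counts)
    return one_tailed_two_prop_p(sum(data_counts), n1, sum(control_counts), n2)


def limit_p_values(data_counts, control_counts, limits):
    """Limit-based analysis: share of charts with at least `limit` factors
    in Mutable signs, data versus controls, for each chosen limit."""
    result = {}
    for limit in limits:
        x1 = sum(1 for c in data_counts if c >= limit)
        x2 = sum(1 for c in control_counts if c >= limit)
        result[limit] = one_tailed_two_prop_p(x1, len(data_counts), x2, len(control_counts))
    return result


# Hypothetical per-chart counts of factors in Mutable signs (not Ruis' data).
data = [0] * 30 + [1] * 20 + [2] * 20 + [3] * 30
controls = [0] * 30 + [1] * 30 + [2] * 25 + [3] * 15
print(standard_p(data, controls, factors_per_chart=5))
print(limit_p_values(data, controls, limits=[1, 2, 3, 4]))
```

With these made-up counts the standard analysis gives p ≈ 0.04 and the limit 3 gives p < 0.01, while the limits 2 and 4 are non-significant and the limit 1 gives exactly 0.5, because both groups contain the same number of charts without the factor.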
1.(ii)(d) Several recent positive aspect studies are not mentioned. The authors mention the negative results concerning Venus-Saturn aspects in Peter Niehenke’s study of 3,290 persons (Niehenke, 1987). But they don’t review a much larger Venus-Saturn study by Tarvainen (2014a), in which 41,784 spouses were considered. Statistically significant results were obtained for the effects of these aspects on the age difference of the spouses and on the delay of marriage.
1.(ii)(e) The authors review, for example, Michel Gauquelin’s and Suibert Ertel’s summary aspect studies covering the whole Gauquelin professional data, but they ignore more specific, statistically significant studies of 6,285 theologians and of 1,849 famous and 910 other mathematicians (Tarvainen, 2015, 2013).
1.(ii)(f) Concerning synastry aspects between the charts of couples, the book presents over ten studies whose results are, or are claimed to be, non-significant. The number of couples ranges from approximately 200 to 3,000. But the authors don’t mention the largest synastry study, which covers 20,895 families (Tarvainen, 2011). This study gave statistically significant support for the aspects between the spouses’ charts. Astrological synastry also considers the houses in which the partner’s planets fall. In this study, the partner’s Sun fell strikingly often into the 1st, 5th and 7th houses, which according to astrology are favorable (Sakoian and Acker, 1976).
1.(iii) War on the p-value
1.(iii)(a) As there are several positive statistical studies that are difficult to discredit, the authors try to discredit two basic tools ordinarily used in these studies: the p-value and the method of forming control groups by shuffling.
1.(iii)(b) In Section 7.7.2015.2, the authors spend five pages explaining why p-values are unreliable. However, this text has not convinced even the authors themselves, as they use p-values all the time when they want to prove their points. This alone indicates that they lose their first war. Indeed, the p-value is a well-established concept in science.
1.(iii)(c) Since there are hundreds of astrological and non-astrological factors, a common difficulty in astrology (not addressed by the authors) is that we easily obtain a p-value greater than 0.05 (the limit fixed by convention for statistical significance), even when there is an effective astrological factor. Hence, after obtaining a p-value greater than 0.05, it is incorrect to conclude (as the authors also do in many cases) that there is no astrological effect. In fact, there is usually an excess of the considered factor if the p-value is below roughly 0.5.
1.(iii)(d) If there are working astrological factors, smaller p-values can be obtained by increasing the sample size. Another important technique is Fisher’s meta-analysis method, which combines several p-values (a numerical example is presented by Tarvainen, 2012).
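The point about real effects often giving p > 0.05 can be illustrated with a normal-approximation sketch. The chance rate p0 = 0.5, the true rate p1 = 0.55 and the sample sizes are hypothetical, and the functions are ours, not taken from any study cited here. The first function also shows that a one-tailed p-value is below 0.5 exactly when the observed count exceeds its chance expectation.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def one_tailed_p(hits, n, p0):
    """Normal-approximation one-tailed p-value for an excess over the chance rate p0.
    It is below 0.5 exactly when hits > n * p0, i.e. when an excess is observed."""
    z = (hits - n * p0) / (n * p0 * (1 - p0)) ** 0.5
    return 1 - STD_NORMAL.cdf(z)


def power(p0, p1, n, alpha=0.05):
    """Approximate probability that a real rate p1 > p0 yields p < alpha."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    shift = n ** 0.5 * (p1 - p0) - z_alpha * (p0 * (1 - p0)) ** 0.5
    return STD_NORMAL.cdf(shift / (p1 * (1 - p1)) ** 0.5)


for n in (100, 400, 1000):
    print(n, round(power(0.5, 0.55, n), 2))
```

With n = 100, a real excess from 50 % to 55 % reaches p < 0.05 only about a quarter of the time; with n = 1,000 it does so more than 90 % of the time.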
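Fisher’s method can be sketched with the Python standard library alone. The statistic X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom when the k tests are independent and all null hypotheses hold; for even degrees of freedom its tail probability has a closed form. The function names are ours; where SciPy is available, scipy.stats.combine_pvalues(..., method='fisher') performs the same combination.

```python
import math


def chi2_sf_even_df(x, df):
    """P(chi-square with even df > x) = exp(-x/2) * sum_{j<df/2} (x/2)**j / j!."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


def fisher_combined_p(p_values):
    """Fisher's method for k independent one-sided p-values."""
    x = -2.0 * sum(math.log(p) for p in p_values)
    return chi2_sf_even_df(x, 2 * len(p_values))


# The three computer-matching p-values quoted in 1.(iii)(h).
print(round(fisher_combined_p([0.09, 0.16, 0.26]), 3))
# Twelve individually non-significant results, each p = 0.2.
print(round(fisher_combined_p([0.2] * 12), 3))
```

The first call gives about 0.08, in line with 1.(iii)(h). The second shows that twelve p-values of 0.2, whose average is 0.2, combine to a value between 0.025 and 0.05, so an average of p-values says little about the combined evidence.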
1.(iii)(e) The authors don’t perform meta-analyses of p-values. For example, had they performed a meta-analysis of the eight synastry studies whose p-values are given in the book, they would have obtained a statistically significant p = 0.02.
1.(iii)(f) Sometimes the authors misleadingly compute average p-values. For example, in the extroversion study of V.M. Gaynor (Section 7.6.1981.1), they give an average p-value of 0.32 for the twelve cases in which the correlations between the extroversion evaluations of four personality inventories and three astrologers are considered. But when we combine the 12 individual p-values given in Gaynor’s PhD thesis (1981, p. 194) by Fisher’s method, we obtain p = 0.004, which supports the view that astrologers may be able to evaluate extroversion from astrological charts.
1.(iii)(g) An interesting meta-analysis related to extroversion can also be made concerning the computer’s abilities. In Section 7.6.1985.2, the study by Dean (1985) is reviewed. The astrologers’ task was to determine from the astrological chart whether a person is extrovert or introvert, in a sample of 120 strongly extrovert or introvert persons (as determined by a psychological test).
1.(iii)(h) For comparison, the computer made this match based on the following factors (hit rate and p-value in parentheses): preponderant element (56.7 %, p = 0.09), rising sign (55 %, p = 0.16), preponderant hemisphere (53.35 %, p = 0.26) and Sun-sign (54.15 %, p = 0.21). Here, the signs are taken as alternately extrovert and introvert starting at Aries; preponderance is calculated from all ten planets weighted as follows: Sun and Moon 3, Mercury to Mars 2, Jupiter to Pluto 1. If we approximately combine, by Fisher’s method, the p-values of the first three factors, which are almost independent, we obtain p = 0.08. Thus one interesting topic would be a study of the computer’s matching capabilities for extroversion using a larger amount of data.
1.(iii)(i) In general discussions of the p-value, it is often stressed that, in addition to the p-value, it is informative to give the effect size and a confidence interval (an interval that contains the effect size, typically with 95 % confidence). For example, an expensive medicine may have a statistically significant healing effect. But if the effect is that it extends life by only one month, the medicine may not be of interest.
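The quoted p-values for the computer’s hit rates can be checked with an exact one-tailed binomial test. We assume, since this is not stated in the text, a chance hit rate of 50 % and hit counts of 68, 66, 64 and 65 out of 120, which approximately correspond to the quoted hit rates; the function name is ours.

```python
from math import comb


def binomial_upper_tail(hits, n, p0=0.5):
    """Exact one-tailed p-value P(X >= hits) for X ~ Binomial(n, p0)."""
    return sum(comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in range(hits, n + 1))


# Assumed hit counts out of 120 (inferred from the quoted hit rates).
assumed_hits = {
    "preponderant element": 68,
    "rising sign": 66,
    "preponderant hemisphere": 64,
    "Sun-sign": 65,
}
for factor, hits in assumed_hits.items():
    print(f"{factor}: hit rate {hits / 120:.1%}, p = {binomial_upper_tail(hits, 120):.3f}")
```

Under these assumptions the results agree with the quoted values 0.09, 0.16, 0.26 and 0.21 to within about 0.01.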
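A confidence interval for a hit rate can be computed in several ways. The sketch below uses the Wilson score interval, one common choice that we picked for illustration; the studies discussed here may have used another method, and the counts in the example are hypothetical.

```python
from statistics import NormalDist


def wilson_interval(hits, n, confidence=0.95):
    """Wilson score confidence interval for a proportion hits / n."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = hits / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half_width = z / denom * (p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) ** 0.5
    return center - half_width, center + half_width


# Hypothetical matching test: 60 hits out of 100 against a chance rate of 50 %.
low, high = wilson_interval(60, 100)
print(f"hit rate 60.0 %, 95 % CI {low:.1%} to {high:.1%}")
```

Here the lower limit lies just above 50 %, so the interval conveys both that the excess is probably real and that it may be small.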
1.(iii)(j) But in astrology, the effect size is usually not as important as in many other areas. Even small effect sizes are theoretically of great interest. Furthermore, a factor whose effect size is small may be important in connection with other factors when an astrologer interprets an astrological chart.
1.(iii)(k) Since small p-values have given clear support for astrology in many cases, sceptics avoid mentioning p-values. As an example of how this reduces information, let us consider the matching tests dealing with case histories (the same ones as in Astrology under Scrutiny, 2013). From the point of view of modern astrology, they are the most important matching tests.
1.(iii)(l) It was shown by Tarvainen (2014) that the average hit rate of 56.7 % for the 12 case studies is statistically significant with p = 0.006 (the 95 % confidence interval is 52 %…61 %). The authors don’t give this exact statistical information. Instead, they draw a more or less vague “funnel plot” in Section 8.6.3 and give the results as follows: 0.011 ± 0.1200. (At least the first number is erroneous since, using a formula given in Section 8.2.1, it yields an incorrect average hit rate of 50.55 %.)
1.(iii)(m) In connection with some other positive studies, too, the authors don’t report the small p-value obtained. In some cases, they make their own misleading p-value considerations. For example, in Section 7.6.2012.2, concerning Henning’s potentials, the authors don’t mention the overall p-value of 0.03, which is obtained by combining twelve p-values. Misleadingly, they pay attention to only one low p-value, stating that such a single low value is not exceptional.
1.(iii)(n) In Section 7.1.1974.2 (Alexandra Mark’s divorce study), too, the authors have the last word in a misleading way. They give the study’s exact p-value, 0.017. But then, in an unprecedented way, they make their own approximate p-value calculation, obtaining p = 0.32. This approximate value is the last p-value mentioned for this study, which leads readers to believe that the study ended with a non-significant p-value, even though the exact p-value is significant.
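We do not reproduce Section 8.2.1 here. Assuming its formula is the binomial effect size display, hit rate = 50 % + r/2 (an assumption on our part, chosen because it turns 0.011 into the 50.55 % mentioned above), the conversion in both directions is:

```python
def hit_rate_from_r(r):
    """Binomial effect size display (assumed): hit rate = 0.5 + r / 2."""
    return 0.5 + r / 2


def r_from_hit_rate(hit_rate):
    """Inverse conversion: r = 2 * (hit rate - 0.5)."""
    return 2 * (hit_rate - 0.5)


print(f"{hit_rate_from_r(0.011):.2%}")   # effect size given in the book
print(f"{r_from_hit_rate(0.567):.3f}")  # average hit rate from Tarvainen (2014)
```

Under this assumption, the average hit rate of 56.7 % corresponds to r ≈ 0.134, about twelve times the 0.011 given in the book.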
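The remark that a single low p-value among twelve “is not exceptional” can be made precise with the minimum-p (Tippett, or Šidák) calculation sketched below. This is our formalization, not necessarily the authors’ own reasoning, and the example p-value is hypothetical. Its limitation is that it uses only the smallest of the k values, whereas a combination such as Fisher’s method uses all of them.

```python
def min_p_adjusted(p_min, k):
    """Probability that the smallest of k independent null p-values is <= p_min.
    This is Tippett's combined test; it ignores all values except the minimum."""
    if not 0 <= p_min <= 1 or k < 1:
        raise ValueError("need 0 <= p_min <= 1 and k >= 1")
    return 1 - (1 - p_min) ** k


# Hypothetical: the lowest of twelve p-values is 0.01.
print(round(min_p_adjusted(0.01, 12), 3))
```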
1.(iv) Crud factor – false argument in astrology
Since the p-value as such is a solid, informative scientific concept, we point here only to one lapse in the authors’ text concerning p-values. They claim: “If the sample size is large enough (Meehl gives a case of N = 57,000), the relation between any two variables picked at random will always give significant p values” (the so-called crud factor). That is, the authors claim that every astrological study gives significant results when the sample size is big enough.
1.(iv)(a) But psychologist Paul Meehl refers to psychology and sociology, where all variables are connected via genes and environment. The authors have misread Meehl’s interesting claim as if it also covered all astronomical variables. Paradoxically, the notion that all astronomical, psychological and sociological factors are connected would directly contradict the sceptics’ belief that astronomical and psychological factors are not connected. The fact, naturally, is that in astrological studies we don’t get significant results just by increasing the sample size.
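A small simulation illustrates the statistical side of this point: if two variables are truly independent, the share of significant results stays near the nominal 5 % however large N becomes; only a real connection makes significance grow with N. The binary “chart factor” and “trait”, their probabilities, the large-sample correlation test and the sample sizes are all hypothetical choices of ours, and the sketch does not by itself show whether any particular astronomical and psychological variables are connected.

```python
import math
import random


def two_tailed_p_no_correlation(xs, ys):
    """Large-sample test of zero correlation: under independence r * sqrt(n)
    is approximately standard normal. Returns the two-tailed p-value."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 1.0  # a constant variable carries no information
    r = sxy / math.sqrt(sxx * syy)
    return math.erfc(abs(r) * math.sqrt(n) / math.sqrt(2))


def share_significant(n, reps=200, alpha=0.05, seed=1):
    """Share of simulated studies with p < alpha when a binary chart factor
    (present with probability 1/6) and a binary trait (probability 1/2)
    are generated independently of each other."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(reps):
        xs = [1.0 if rng.random() < 1 / 6 else 0.0 for _ in range(n)]
        ys = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n)]
        if two_tailed_p_no_correlation(xs, ys) < alpha:
            significant += 1
    return significant / reps


for n in (200, 2000):
    print(n, share_significant(n))
```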
1.(v) War on shuffling
In Section 7.7.2014.2, the authors consider the serial killer study by Ruis (2007), where three control methods are used: two slightly different shuffling methods and the direct use of 6,000 real persons as control persons.
1.(v)(a) The authors make a statistical test showing that the three controls differ from each other very significantly: for one application considered, p = 0.00001. (Note that here the authors present p-values, not the effect sizes or confidence intervals they want others to use.)
1.(v)(b) According to this test, the controls differ very significantly, and the authors thereby end this discussion with the conclusion that shuffling methods are unreliable. So, did the authors win their second war? No, they simply didn’t mention the bottom line of Ruis’ considerations.
1.(v)(c) When we look at Table 3 in Ruis’ paper, we see that the considered value in the data is 235, whereas the three different control methods give the following values: 203, 199 and 207. We see that, even if the control values differ from each other (which is quite natural), the obtained excesses are of the same magnitude: 32, 36 and 28.
1.(v)(d) The ultimate purpose of a control method is to determine the p-value. The authors also omit the p-values obtained by the three methods. These p-values are of the same magnitude: 0.005, 0.002 and 0.008 (Table 3). Thus there is no essential difference in the statistical information the three control methods give; the truth is the opposite of what the authors led the reader to understand.
1.(v)(e) The authors themselves don’t mistrust the shuffling method when it gives negative results, as in Thomas Shanks’ studies in Section 7.5.1987.2. Shanks, in 1977, was possibly the first astrology researcher to use a form of the shuffling method.
1.(v)(f) In the basic shuffling method, the birth years, dates, hours and places are each put into random order and recombined to form control charts. This method is conservative: if it detects an astrological effect, the real effect is likely to be bigger. Nowadays we can verify the good properties of the shuffling method by simulations (cf. Tarvainen, 2012).
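The basic shuffling method can be sketched as follows. Each birth record is assumed to be a (year, date, hour, place) tuple; the record layout, function name and example records are hypothetical, and a real study would also have to handle impossible combinations such as 29 February landing on a non-leap year.

```python
import random


def shuffled_controls(births, seed=None):
    """Build control records by shuffling each birth-data component independently.

    births: list of (year, date, hour, place) tuples.
    The marginal distributions of years, dates, hours and places are preserved
    exactly, but the combinations of them are random."""
    rng = random.Random(seed)
    columns = [list(column) for column in zip(*births)]
    for column in columns:
        rng.shuffle(column)
    return list(zip(*columns))


births = [
    (1901, "03-14", "08:30", "Paris"),
    (1912, "07-02", "23:15", "Lyon"),
    (1925, "11-20", "04:05", "Marseille"),
    (1933, "01-09", "16:40", "Lille"),
]
for record in shuffled_controls(births, seed=42):
    print(record)
```

One would normally generate many such control samples rather than a single one, so that the count of a factor in the real data can be compared with its spread across the controls.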
1.(vi) “Divide and discredit” trick
The authors’ ambition is to show that astrology is based on artifacts. Correspondingly, one can speak of sceptics’ tricks, with which sceptics try to lead others and themselves to believe that there are no astrological effects. One such trick can be called “divide and discredit”.
1.(vi)(a) This trick is based on the fact that there is an immense number of different astrological charts. Therefore, even when the whole data give a statistically significant result, we often don’t obtain significant results for parts of the data. But sceptics use this natural situation as an opportunity to discredit positive results.
1.(vi)(b) The authors use this trick in, for example, Section 7.6.2012.2 (page 292) by considering the data in twelve groups and stating that the results give less support to astrology since the effects are not visible in all groups (even though the overall p-value is 0.001, which is not mentioned in the book).
1.(vi)(c) In the same section, they require a replication within a data sample by dividing it into two halves. However, in astrological studies a proper replication practically always requires at least as much data as the original study. The sceptics know that they will always win if the data are divided: 1) they win if the results are not positive in both parts; 2) if they are positive, the sceptics can claim that this happened by chance since the number of charts is relatively small in both samples.
1.(vi)(d) In Section 7.4.2006.1, the authors review Paul Westran’s progressed synastry study, which includes 1,300 relationships and produced statistically significant results. The authors discredit this result by observing that results were not significant in a subgroup of 447 relationships (births after 1900). This is one version of the “divide and discredit” trick: one can always find a subset of the data where the results for the whole data don’t hold.
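How much a split costs can be estimated under a simple assumption: if the full-sample result has a standardized score z, a subsample containing a fraction f of the data has an expected score of z·√f. The sketch below treats the full-sample z as the true effect, which is an idealization; z = 2.5 is a hypothetical value, and the fraction 447/1,300 is taken from Westran’s study as described above.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def subsample_power(z_full, fraction, alpha=0.05):
    """Probability that a one-tailed test on a subsample holding `fraction` of the
    data reaches p < alpha, treating the full-sample z-score as the true effect."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)
    expected_z = z_full * fraction ** 0.5
    return STD_NORMAL.cdf(expected_z - z_alpha)


z_full = 2.5  # hypothetical full-sample result, one-tailed p of about 0.006
half = subsample_power(z_full, 0.5)
print(f"each half significant: {half:.2f}, both halves: {half ** 2:.2f}")
print(f"subgroup of 447 out of 1,300: {subsample_power(z_full, 447 / 1300):.2f}")
```

With these numbers, each half reaches significance only a little more than half the time and both halves together only about 30 % of the time, even though the full sample is clearly significant.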
1.(vi)(e) The classical form of this trick is used in Section 8.6.11. Dean tested how well Liz Green’s computer interpretation of a subject’s chart fitted the subject. The interpretation contained 280 sentences, and the subject checked each sentence for accuracy. Dean then divided the 280 sentences into three sets, noticed that in no set was the number of accurate sentences significant, and concluded that the results gave no evidence that this chart interpretation is more accurate than simply guessing.
1.(vi)(f) But he would have obtained more evidence had he considered the 280 statements as a whole. There were altogether 104 hits and 77 misses (the other sentences were uncertain). A control person obtained 114 hits and 118 misses. A one-tailed Fisher’s exact test gives p = 0.057 for the accuracy of Green’s interpretation. And this was just one chart (if, for example, five charts had been interpreted with the same accuracy, the p-value would have been 0.001). But the “divide and discredit” trick apparently convinced Dean himself that there was no need to consider more charts.
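The one-tailed Fisher’s exact test quoted above can be reproduced from the hypergeometric distribution with the Python standard library. The table layout (rows: interpretation and control person; columns: hits and misses) is our reading of the figures given, and the function name is ours; where SciPy is available, scipy.stats.fisher_exact(table, alternative='greater') computes the same test.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a), where X is the upper-left cell under the hypergeometric
    null distribution with all row and column totals fixed."""
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    numerator = sum(
        comb(col1, x) * comb(n - col1, row1 - x)
        for x in range(a, min(row1, col1) + 1)
    )
    return numerator / comb(n, row1)


# Rows: Green's interpretation, control person; columns: hits, misses.
p = fisher_exact_greater(104, 77, 114, 118)
print(round(p, 3))
```

This gives approximately the p = 0.057 quoted above.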
1.(vii) Fishing – often an unproductive research methodology
In Section 8.4.3, the authors highlight, in addition to Niehenke’s study, one study each by Nora Press, Mike O’Neill and Mark McDonough as strong evidence that there is no “useful support for astrological claims”. In these three studies there were no research hypotheses; the researchers simply tested many astrological factors (100,000, 7,868 and 300,000 factors, respectively) for significance. This kind of research method is called “fishing”.
1.(vii)(a) Fishing may sometimes succeed, but the problem is that even if there is an effective astrological factor, there is a high probability that some non-effective factors obtain a greater excess by chance, unless the effective factor is exceptionally strong (as in the Gauquelins’ studies).
1.(vii)(b) If we make a replication with a new data sample, there is a high probability that some other non-working factors have a greater excess than the effective factor. Even if an effective factor is significant in the first experiment, it may not be in the second one. Nowadays we can demonstrate these difficulties of fishing with simulations; an example is given by Tarvainen (2012b). Usually, a better method is to consider the overall significance of several astrological factors selected on the basis of earlier astrological knowledge.
1.(vii)(c) It may seem convincing when 300,000 factors are studied, but the fact is that the problems of fishing increase when the number of factors increases since the probability that non-effective factors become significant by chance also increases. In all, these three fishing studies don’t provide the claimed evidence against astrology.
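The core difficulty of fishing can be quantified under simple assumptions: the test statistics are independent, each non-effective factor has z ~ N(0, 1), and the single effective factor has z ~ N(δ, 1). The probability that the effective factor comes out on top is then E[Φ(Z)^m] with Z ~ N(δ, 1), which the sketch below evaluates numerically. The value δ = 3 is hypothetical, and real astrological factors are not independent, so this only illustrates the trend.

```python
import math


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-x / math.sqrt(2))


def prob_effective_factor_ranks_first(delta, m, width=8.0, steps=16000):
    """P(the effective factor's z exceeds the z of all m non-effective factors),
    i.e. E[phi(Z) ** m] for Z ~ N(delta, 1), by the midpoint rule."""
    step = 2 * width / steps
    total = 0.0
    for i in range(steps):
        z = delta - width + (i + 0.5) * step
        density = math.exp(-0.5 * (z - delta) ** 2) / math.sqrt(2 * math.pi)
        total += density * phi(z) ** m * step
    return total


for m in (1, 100, 7_868, 100_000, 300_000):
    print(f"{m:>7} non-effective factors: {prob_effective_factor_ranks_first(3.0, m):.3f}")
```

With 100,000 independent tests at the 5 % level, about 5,000 non-effective factors would be expected to reach significance by chance alone, so the effective factor is easily lost among them.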
1.(viii) Puzzle: the computer sees faces in the clouds
Kepler regarded – and several modern astrologers regard – aspects as the strongest astrological factors. Aspects are a good subject for statistical studies since, in many cases, one doesn’t have to know the birth hour.
1.(viii)(a) In aspect studies, we can let the computer figure out mechanically the actual working maximum orb in the considered data (Tarvainen, 2013, 2014a, 2015, 2016). In ten applications, the computer has determined orbs for major and minor natal aspects, midpoints, synastry, transits and solar arcs. The estimated orbs are close to those that, nowadays, astrologers recommend.
1.(viii)(b) According to the authors, astrology is like seeing faces in the clouds. This would mean that aspect orbs are also “faces in the clouds”. How, then, can the authors explain that the computer also sees faces in the clouds, and that these closely resemble the faces astrologers see? This puzzle was presented to the authors earlier, but they ignored it and omitted to mention the successful orb estimations and the puzzle they present to sceptics.
References
Astrology under Scrutiny (2013), Astrologie in Onderzoek, Volume 15, final issue, July 2013.
Currey, R. (2014), Astrology under Scrutiny: Close encounters with science, Correlation 29(2): 52-68.
Dean, G. (1985), Can astrology predict E and N? 2. The whole chart, Correlation 5(2): 2-24.
Gaynor, V.M. (1981), Astrology and psychology: keys to understanding human personality. Ph.D. thesis, The Union for Experimenting Colleges and Universities, Cincinnati, Ohio.
Niehenke, P. (1987), Kritische Astrologie, Aurum Verlag.
Ruis, J. (2007/2008), Statistical analysis of the birth charts of serial killers, Correlation 25(2): 7-44.
Ruis, J. (2012), The birth charts of male serial killers, Correlation 28(2): 8-27.
Sakoian, F., and L.S. Acker (1976), The Astrology of Human Relationships, Harper & Row.
Tarvainen, K. (2011), Classical synastry works on the Gauquelins’ data, composite and Davison don’t, The Astrological Journal, Volume 53, Number 1, January/February 2011.
Tarvainen, K. (2012), Henning’s synthesis method shows validity of astrology in the Gauquelins’ data, Correlation 28(1): 25-43.
Tarvainen, K. (2012b), Ordinary astrology works on the Gauquelins’ data, The Astrological Journal, Volume 54, Number 2, March/April 2012.
Tarvainen, K. (2013), Favorable astrological factors for mathematicians, Correlation 29(1): 39-51.
Tarvainen, K. (2014), Positive Results in the Book Astrology under Scrutiny, Correlation 29(2): 41-47.
Tarvainen, K. (2014a), Effects of Venus/Saturn aspects in marriages, Correlation 28(2): 7-14.
Tarvainen, K. (2015), A study of major and minor aspects in theologians’ charts, Correlation 30(1): 29-36.
Tarvainen, K. (2016), On the estimation of aspect orbs by the computer. Manuscript sent to Correlation.