CORRELATION

The Astrological Association Journal of Research in Astrology

Response to Understanding Astrology
(Dean, Mather, Nias & Smit, 2023)

By Vincent Godbout, BA, BSc.

Review of “An Automated Matching Test” (2020) Correlation 32(2) pp.13-41.

The authors’ review exhibits a concerning level of incompetence.  This prompted me to draw on my experience as a retired mathematics teacher and author of various mathematics textbooks. It appears that their approach systematically undermines the validity of the results, leaving the reader to wonder whether this is negligence on their part or deliberate sabotage.

7.7.2020.5 Comparing birth charts with biographies N=41+32

It starts badly. As we will see later, N=41+32 is erroneous and demonstrates a gross attempt at p-hacking.

An extension of his 7.7.2019.2 study (p. 732).

No. It is even rather the opposite. In fact, this research was published after the characterology study, but it was developed a year before the latter.

The Expert System Debate

Then, from the very beginning, the “skeptic” writes:

Godbout calls his program an “expert system”, but it is based on tradition and is not expert in the sense of being based on a database of expert knowledge (p. 732).

“An expert system is a computer system that simulates the decision-making ability of a human expert.” The term expert system is very specific: it requires a knowledge base, an inference system and a user interface. Mastro Expert qualifies on all three counts. My computer system makes decisions similar to those of a human expert in astrology. Astrological knowledge is based on tradition and that is the most reliable source of expertise. This critical remark poses two problems.

  • First, it suggests that astrological knowledge cannot be subject to expertise since it is not validated; but it is precisely the object of this article which claims to validate astrology; therefore, a real skeptic and not a simple denier should at least suspend judgment until the end.
  • Second, the truth or falsity of a field of expertise is irrelevant. One can be an expert in almost any complex and structured field. For example, one may be an expert in ufology, one may be an expert in Atlantis, one may be an expert in unicorns, and one may even be an expert in theology.

The Personality Analysis and its Relevance on Astrological Research

Then comes a long commentary on the genesis of different scientific systems of personality analysis in order to disqualify the analyses used in my Matching Test, which are based on lists of keywords:

“To better understand the technical challenges of biographical keywords some background may be helpful. For most of astrology’s history there were no reliable methods of classifying human personality. In psychology this prevented all progress until the 1930s when UK and US psychologists developed factor analysis, a statistical technique for reducing a large number of variables to their underlying factors… But Godbout’s study does not use factor analysis, or proper controls, despite which some astrologers claim his approach is the first major advance since Vernon Clark’s pioneering studies of 1960-1961” (p. 732).

This interesting and long digression is irrelevant in the present context. “Interesting” because indeed the works of Allport, Odbert, Cattell and Eysenck are admirable and fascinating. It is possible that astrology could benefit from a similar approach. I have often dreamed of doing factor analysis with reduction into principal components for the astrological corpus. For that, however, I would need a very large experimental sample, and we research astrologers are excluded from universities and do not have the budget for it, precisely because of people like those who write Understanding Astrology. They reproach us for not doing what they prevent us from doing.

Besides, once again, all this talk about personality theories is inappropriate. The purpose of my study is not to build a theory of personality and nothing obliges me to use such a theory. If my theoretical background is weak or poorly structured, it can only disadvantage me and play against the success of my experiment. My study is a simple blind matching test inspired by Vernon Clark and Carlson. In these protocols, the astrologers were not required to know personality theories and this posed no problem to Geoffrey Dean, who was a consultant for the second study. The participating astrologers were not required to stick to a corpus of 20 “acceptable” words: in fact, nobody was concerned with what was going on in their heads, which was a black box; all that mattered was the results. For example, in a blind wine identification contest, a competitor could back up his choices with great oenological speeches, but this is not what is expected of him; he is simply asked to make the right choices, and only this performance is evaluated. The reviewer wants a chemical analysis of the wine, whereas a blind test is enough to demonstrate discernment.
That’s what I did.

Astrological Keywords

Then we can read a description of the method I used to generate the lists of astrological keywords:

By computer Godbout generates chart keywords for the signs, aspects and midpoints in each chart from a pool of 6006 keywords extracted from the books of 25 leading astrologers. Houses are not included because they are generally ignored by most users of midpoints (p. 732).

Here, big problem of comprehension. The student is on the verge of failing the course! This number is wrong. I never wrote that; I wrote that I had 5148 combined factors as well as 858 midpoints. This indeed does give a total of 6006. But these are 6006 astrological factors, and not 6006 words.  Moreover, I have given the number of keywords of the expert system: in Appendix A, under table 17, I talk about the list of 2885 words of the astrological corpus of Mastro Expert. This is a sloppy mistake and while it could just be an oversight, it indicates that the authors have not grasped the experiment.

Houses are excluded not “because they are generally ignored by most users of midpoints” as assumed by the authors. The rationale for their exclusion, as explained in the article, is the absence of a consensus on house systems and the appropriate keywords among eminent astrology writers. This lack of consensus has led to a decision to exclude houses from the analysis to ensure the accuracy and reliability of the findings.

Biographical Keywords

Godbout’s subjects are famous people with biographies taken from the 100 biographies given in Subtil and Rioux’s 2014 book Le Monde… Only those with recorded birth times (source not stated) were used… (p. 733).

It’s true that I forgot to cite my source and I regret this omission. The sole source of all my birth data is www.astro.com/astro-databank

N=41+32 Revisited

[Godbout]…divided into two sets, those with times rounded to the hour or half hour (N=32), and the rest more precise (N=41) (p. 733).

In their analysis, the authors mistakenly conflate two essential facts. It is true that the data is divided into two samples: one consisting of 41 subjects and the other with 32 subjects. However, it is crucial to understand that this division serves the purpose of replication. These 73 subjects were the only ones in the source with birth times that met Rodden’s criteria for accuracy, as explicitly explained in the paper.

The confusion on the part of the authors arises from the fact that, in the section titled “Effect of birth time rounding,” I explain that the total group could be divided into two parts. One part comprised 47 subjects with birth hours of the form xx:00 or xx:30, which were probably rounded, while the other part had 26 subjects with probably unrounded hours. Surprisingly, the overall results showed more significance for the group of 26 subjects with unrounded birth times, despite its much smaller size compared to the group with rounded times. This observation adds a level of consistency and plausibility to the research findings, as it suggests that the more precise the birth time, the more significant the result.

There are two potential reasons why the authors conflate the two independent data divisions throughout their review: intentional misrepresentation of results that conflict with their beliefs or incompetence. The former seems more likely, as it masks the fact that the results were replicated and that significance improved with increased time precision.

Then he writes:

The biographical keywords for each subject are obtained by taking all words from their biography that relate to character and standardising them like chart keywords into nouns to facilitate comparison… To avoid bias the extractions were made before birth data were entered, but of course to be sure it needs to have been made by people ignorant of how astrology might relate to the subjects (p. 733).

I agree. It would have been better to ask someone who didn’t even know astrology to do the word extraction. But it was a job of about a hundred hours and I did not have the budget to afford this expense. All I can say is that I did not know the astrological charts of the subjects. However, in several cases, I knew their sun signs, but this information is completely irrelevant in this research as I will discuss later. The best guarantee I can give of the objectivity of this extraction is to go and see the source to verify the impartiality of my work.

The Matching Test Process

The matching is performed automatically by a computer program that (1) compares each of the N sets of 124-230 chart keywords with each of the N sets of 28-96 biographical keywords, (2) records how many times a word appears in both chart and biography as a proportion of the mean number of keywords. For example for Picasso there were 96 biographical keywords, 152 astrological keywords, and 41 joint appearances, so score = 41/((96+152)/2) = 0.348 (p. 733).

Whoops! The student just sank the class. He wrote: score = 41/((96+152)/2).

This formula is incorrect and what he wrote is not equal to 0.348: 41/((96+152)/2) = 41/124 ≈ 0.331. In fact, the correct formula is: score = (41/96+41/152)/2 ≈ 0.348. A high school student knows that when you add two fractions, you cannot simply add the denominators. But since I’m a good sport, I’ll give my reviewer a retake and keep reading to the end.
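The difference between the two expressions can be checked by direct arithmetic. The short Python sketch below uses only the Picasso figures quoted by the reviewer (96 biographical keywords, 152 astrological keywords, 41 joint appearances). The function names are illustrative and are not taken from Mastro Expert.

```python
def score_mean_of_proportions(joint, n_bio, n_chart):
    """Average of the two overlap proportions: (joint/n_bio + joint/n_chart) / 2."""
    return (joint / n_bio + joint / n_chart) / 2


def score_over_mean_size(joint, n_bio, n_chart):
    """Joint count divided by the mean list size: joint / ((n_bio + n_chart) / 2)."""
    return joint / ((n_bio + n_chart) / 2)


# Picasso figures as quoted in the review
joint, n_bio, n_chart = 41, 96, 152

print(round(score_mean_of_proportions(joint, n_bio, n_chart), 3))  # 0.348
print(round(score_over_mean_size(joint, n_bio, n_chart), 3))       # 0.331
```

Only the first expression reproduces the published score of 0.348; the second reduces to 41/124.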

Then we read:

For 41 subjects with precise birth times the hits (6) were highly significant (binomial p = 0.00045), but for 32 subjects with rounded birth times the hits (3) were not significant (binomial p = 0.077) (p. 733).

No! As I explained earlier, these are not 41 subjects with specific birth times but rather the first 41 subjects from my sample who were randomly selected from the book before processing the remaining 32 subjects for a replication.

Moreover, for the 32 subjects of the replication, by giving p=0.077, he deliberately selected the only non-significant p-value in a table presenting 8 binomial p-values where the 7 others are very significant, omitting for example p = 2.89 E-04. This obvious intellectual dishonesty is appalling. I don’t know if the reviewer realizes how much he disgraces himself with such crude procedures. For a long time, I was a great admirer of Geoffrey Dean who was an inspiration to me. I miss the days when we could have good debates with bona fide skeptics. Such a level of bad faith is pathetic and childish. Looks like someone willing to do anything to save his life on a sinking boat.
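For readers who wish to check the two binomial figures under discussion, they can be recomputed in a few lines of Python. This is a minimal sketch, assuming a one-sided upper-tail test in which each subject has a 1-in-N chance of a first-rank hit under pure chance; the helper name binomial_upper_tail is illustrative.

```python
from math import comb


def binomial_upper_tail(n, p, hits):
    """P(X >= hits) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(hits, n + 1))


# Assumed null hypothesis: each subject has a 1-in-N chance of a first-rank match.
p_sample_41 = binomial_upper_tail(41, 1 / 41, 6)  # 6 first-rank hits among 41 subjects
p_sample_32 = binomial_upper_tail(32, 1 / 32, 3)  # 3 first-rank hits among 32 subjects

print(f"sample of 41: p = {p_sample_41:.2e}")
print(f"sample of 32: p = {p_sample_32:.3f}")
```

Under these assumptions the two values should agree closely with the quoted 0.00045 and 0.077.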

Continue reading this little gem:

As judged by astrologers, they do not usefully match at all, see 8.6.1 (p. 733).

Section 8.6.1 is about matching tests with astrologers. As I mention at the beginning of my article, my research is a follow-up to these apparently failed tests. Moreover, in reality, astrologers have been quite successful in matching: see Clark, Marbell and Carlson.

Then:

We should also ask about the extent to which each chart factor is known to be accurately described by its keywords under controlled conditions (p. 733).

I partly answered this question by citing the authors from whom I extracted the keywords to describe the astrological factors. All of these sources are verifiable. Furthermore, it can be verified that this extraction work was done several years before the present matching test. So, there can be no bias. Moreover, from the point of view of a “believer”, if the precision of the keywords to describe the astrological factors was doubtful, it could only play against the success of the present blind test. And for a non-believer, the question is irrelevant and it would not change anything since, questionable precision or not, it is pure random chaos which has no chance of producing meaningful matches. Why is he worrying about the accuracy of astrological keywords when, as far as he is concerned, they might as well be describing clouds in the sky? Is he now representing the astrologers’ point of view? “It’s not just all fake, but it is inaccurately fake!”

Irrelevance of a Control Group

That the study still proceeds without controls… (p. 733).

I frequently include control groups in my studies. However, I don’t see what relevance they have here. The reviewer seems to like this word, which he repeats all the time without realizing that it is not relevant. In Carlson’s experiment, there were no more control groups than in mine. A control group serves to simulate chance correctly. Here, for a subject, chance is represented by the other 72 subjects that do not correspond to his astrological chart.

Debunking a Circular Argument about Signs

… and finds (among other things) that signs are valid is good reason to question the procedure (p. 733).

This is plainly a circular argument. This confirms the authors’ overriding prejudiced belief that any study that finds astrology valid is a good reason to dismiss the procedure.

In fact, I am well aware of the research that has claimed to invalidate the signs, especially by Gauquelin. But it was always unifactorial research. The planets in signs taken individually have a weak effect size that is hard to detect, but studies have now found ways around this problem. There is now good evidence for signs from Tarvainen (2011-2022), Currey (E&N, 2017-2018, 2021a, SCOTUS, NY Suicide 2021b) and Kollerstrom (2023) using Gauquelin’s data.

Moreover, the signs do not constitute the basic factors of my analysis where I only used them with the aspects. It should be kept in mind that the present study is multifactorial using combinations of factors which, taken in isolation, are of little effect like tiny pixels in a picture. However, with multifactorial research, it is the combination of factors that adds statistical power to the effect. Obviously, this approach is a much closer simulation of the practice of astrology.

And:

Without factor analysis Godbout’s 6006 chart keywords are impossible to evaluate even before we worry about their known low or zero validity (p. 733).

Just to be clear here, factor analysis is relevant when we aim to identify a limited number of factors that can explain most of the observed variance. It’s no more relevant to my matching test than it was to the Carlson matching test. Factor analysis has absolutely nothing to do with this kind of study. The authors’ problem is that they are unable to use one of their preferred reductionist approaches to discredit the study.

Keywords Evaluation: An Insight into the Matching Process

See if you can guess who the subject is: Achievement, action, adaptation, addictions, aesthetics, affection, aggressiveness, agitation, ambition, amiability, ardour, arrogance, art, authority, beauty, benevolence, boldness, breaks, brio, caution, change, changing mood, charm, collaboration, comfort, commitment, communication, companionship, compassion, conflict, conversation, convictions, conviviality, cooperation, cordiality, courage, creation, creativity, … Obviously, nobody could guess with confidence. Nor could they if told it was a US President. Nor could they if told it was either Bush, Clinton, or Kennedy (it was Clinton.) But the computer compared huge clouds of keywords and settled on Clinton (p. 733).

Astrologers are not trained to match character traits with a birth chart due to the complexity of the variables involved. No astrologer claims to be able to blindly select charts from traits. And, as discussed in my article, it is beyond human capacity to process this indigestible number of words to arrive at the correct answer among 73 possibilities. This collection of words is not written as a descriptive biographical narrative.  However, a computer is well suited to processing that form of data and the Semantic Proximity Estimator accomplishes the task with much success. The case of Clinton exemplifies this, as we can see in Table 10 where he ranks first. We can only bow to this automatic result and we can only begin to ask real questions if this situation occurs too often.

Moreover, even supposing that he were right to say that “Obviously, nobody could guess with confidence”, precisely this observation could only play against the success of the test; the more chaos there is, the less it is possible to obtain meaningful results. Matching tests are designed precisely to avoid this kind of discussion about the sex of angels.

Next:

To the extent that leading astrologers base their interpretation of chart factor X on what is known about famous people whose charts contain X, the comparison of famous people with keywords based on such interpretations reduces to a circular process where occasional success is unremarkable (p. 733).

This specious argument is not valid for anyone who knows a little about astrology and how it developed. Almost every modern claim is echoed by an ancient one (Lilly to Ptolemy) though they were limited to the seven visible planets. Some of the authors who served as references lived before the subjects of my study. Thus, for obvious reasons of history, culture and complexity, the authors I have consulted are generally in agreement with each other and they have not created their interpretations from the biographies of Abbé Pierre, Gianni Agnelli, Arletty, Raymond Aron, Antonin Artaud, Robert Badinter, Édouard Balladur, François-Marie Banier, Raymond Barre, Maurice Béjart, Silvio Berlusconi, Berthold Brecht, Marcel Carné, Henri Cartier-Bresson, Edmonde Charles-Roux, Jacques Chirac, etc.

The Golden Rule Debate

The matching of biographical keywords to chart keywords will necessarily ignore those chart factors such as houses that are excluded, which contravenes the Golden Rule that only the whole chart will give a valid interpretation (p. 734).

This is the start of a straw man argument.  First, they suggest that I follow a “Golden Rule” that only a “whole chart” works.  False, even if a whole chart works better than individual parts.  Then they go on to define a whole chart as one that must include houses.  But why does it end at houses?  Why not the planetary Nodes, Chiron, asteroids, fixed stars, minor aspects, harmonics?

Also, his “Golden Rule” is contradicted by his own previous criticism about using Sun signs. Surely no chart is whole without inclusion of the Sun sign, but if I use Sun signs then I am deemed to be using flawed data.

If the reviewer thinks that the presence of the houses is absolutely necessary to respect the “Golden Rule” of the analysis with the whole chart, he has only to replace the word “astrology” by the word “cosmobiology” and admit that I have demonstrated the validity of the latter. This famous “Golden Rule” that he pulls out like a rabbit out of a hat is largely respected in my expert system. And above his “Golden Rule”, is Ockham’s razor: I’ve done much with as little as possible.

The Impact of Self-Fulfilling Prophecies on Astrological Matching

The popularity of astrology might generate self-fulfilling prophecies in both subjects and their biographers, thus creating artificial matches (p. 734).

In psychology, an attribution bias or attributional bias is a cognitive bias that refers to the systematic errors made when people evaluate or try to find reasons for their own and others’ behaviors. The reviewers’ argument of circularity only makes sense in simple cases like that of self-attribution in relation to the Sun signs as detected by Eysenck and Mayo in the context of their research on extraversion-introversion linked to the polarity of the Sun signs. Eysenck explained that the self-attribution argument only works for people who have a superficial knowledge of astrology and that it only applies to the Sun signs. So, indeed, self-attribution of Sun signs may be valid in certain cases or in some outcomes may not be valid at all (Currey 2023). Its Effect Size is small and is diluted to being negligible in my Matching Test since I take account of all planets, aspects, midpoints and multiple factors, the Sun sign being only a few pixels in the complexity of the factors. In other words, self-attribution has nothing to do with the complexity of the astrological analysis generated by the expert system. Therefore, in the present context, making such an argument is even more delusional than believing in astrology.

Besides, I also tested the solar sign hypothesis by trying to see if Sun signs alone, tropical as well as sidereal, yielded significant results. In both cases, the outcome was not significant.

Practical Importance vs. Statistical Significance

To focus on p values is to ignore practical importance, which is unlikely to impress working astrologers (p. 734).

There are two points here.  First, the effect size of the result (r = .63) has been shown to be strong (Currey, 2022 p.44).  Second, the practical value to working astrologers has been amply demonstrated in my second study (Godbout, 2021) where I evaluate different techniques retrospectively.

The Irrelevance of Chi-Square

despite the well-known uncertainties involved in calculating chi2 with expectancies less than 5 (p. 734).

Here the author alludes to Cochran’s frequency rule: no more than 20% of the cells may have an expected frequency below 5. But the catch is that the chi-square is not used in my research! I am more and more certain that the reviewer has not read my article thoroughly. If we want to be scrupulous and apply Cochran’s rule to the binomial law, we will observe in Tables 10, 11 and 12 that, out of a total of 24 cells, only one has a frequency lower than 5. Naturally, my critics, with their ‘infinite impartiality’, picked the worst-case example among my results!

Next a table with:

N=32 rounded… N=41 precise (p. 734).

WRONG!  I have already commented extensively on this gross and persistent error.

the expected effect size is √(61/176) = 0.59, so those above are nothing special and are not necessarily due to astrology (p. 734).

This is frankly gibberish. I would be seriously worried if it was submitted by one of my mathematics undergraduates. Drawing on my experience as a magician and the dubious methods of these authors, this is just smoke and mirrors designed to look authoritative.

Analyzing Near Misses

So near misses are identified via a thesaurus (p. 734).

Again, not correct! The term ‘near misses’ has no connection with the Thesaurus. This is part of a discussion in Appendix A that does not impact the main experiment. I was advised not to include this appendix and warned that bad-faith skeptics would surely try to use it to create confusion; but I decided to talk about it anyway because I found the subject interesting. Nothing in Appendix A about analogy is necessary to validate my results. Moreover, we notice at the end of this discussion in Tables 21, 22, and 23 that the results are hardly improved by the analysis of the analogies using the Thesaurus. So, the entire following paragraph about analogies has no purpose in this review since it relates to Appendix A, which describes a technique that had almost no effect on the results. The expression “near misses” is not at all related to synonymy or analogy; it’s simply a way of taking into account that one could consider as a success the fact of being in 2nd place or even in 3rd place or in 4th place etc. And I wrote “near misses” because success was not obtained on the first try. This is clearly explained in my text. On page 23, in the 2nd paragraph of the “Counting near misses” section, I wrote:

“But why use only the maximum scores, restricting the game to 1 chance out of 41? In matching games like this, “near misses” can also count. We can include the 2nd rank in the winning option, giving us 2 chances out of 41. We can also include the 3rd and 4th best score in the winning option, giving us respectively 3 or 4 chance out of 41 to win and so on”.

It seems to me that it could not be clearer.
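The counting rule quoted above can be sketched in a few lines of Python. This is only an illustration: the 4x4 score matrix is made up, the function names are hypothetical, and the tie rule (a tied candidate counts as ranking ahead of the true chart) is an assumption rather than something stated in the paper.

```python
def rank_of_true_match(scores, true_index):
    """Return the 1-based rank of the true candidate (1 = highest score).

    Assumed tie rule: a candidate tied with the true one counts as ahead of it.
    """
    true_score = scores[true_index]
    return 1 + sum(
        1 for i, s in enumerate(scores) if i != true_index and s >= true_score
    )


def count_hits(score_matrix, k):
    """Count subjects whose own chart ranks within the top k for their biography."""
    return sum(
        1 for i, row in enumerate(score_matrix) if rank_of_true_match(row, i) <= k
    )


# Made-up scores: row i = biography i, column j = chart j (true match on the diagonal).
toy_scores = [
    [0.35, 0.20, 0.31, 0.10],  # own chart ranked 1st
    [0.30, 0.28, 0.25, 0.12],  # own chart ranked 2nd
    [0.22, 0.27, 0.19, 0.25],  # own chart ranked 4th
    [0.15, 0.18, 0.11, 0.33],  # own chart ranked 1st
]

for k in (1, 2, 3):
    # Under pure chance each subject has k chances out of N, so k hits are expected.
    print(k, count_hits(toy_scores, k))
```

Widening the winning option from rank 1 to rank k raises the chance expectation from 1 hit to k hits, which is why each observed count must be compared with its own expected value.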

Ranks after the first Godbout also counts hits when the subject’s chart is ranked 2nd, 3rd, … 8th out of N vs the subject’s biography, which gives 2, 3, … 8 opportunities for counting hits instead of just one (p. 734).

Well, it looks like he finally got it! That’s what I call “near misses” and we’ve finally come back to the heart of the matter.

A few lines later:

although the advantage of precise birth times over rounded times is often reversed (p. 734).

But not at all! This remark is false since the reviewer confuses the original group and the replication group with the 2 groups based on birth times.

P-Hacking and Misleading N Values

Sample         Hits:    Obs   Exp    Obs   Exp    Obs   Exp    Obs   Exp    Obs   Exp    Obs   Exp    Obs   Exp
N=32 rounded              3     1      6     2      7     3     12     4     13     5     14     6     16     8
N=41 precise              6     1      7     2      7     3      9     4     11     5     15     6     17     8
p 32, 41                .08 <.001    .01  .003    .03   .03  <.001   .02   .001  .009   .001 <.001   .003  .001

The table (above) from Understanding Astrology serves as a prime illustration of p-hacking. This practice involves manipulating a sample to achieve a desired p-value, essentially engaging in statistical cheating. But the reviewer screwed up. Here, the group of 73 subjects is improperly divided into smaller subsets, using inaccurate criteria, all with the intention of generating insignificant p-values. As a matter of fact, the division he operated here is artificial and, moreover, it is false: it is not a question of 32 rounded and 41 precise as I explained previously.

The true final table of binomial results relating to the 73 cases of the study includes 8 cases, the least significant of which gives a p-value of 0.00875. The most significant case gives a p-value of 0.00025.

Why hide these results which constitute the most important part of my research? What kind of scientist would go to these lengths to suppress evidence and why?

Then, two lines later:

their interpretation is not possible without appropriate controls (here unrelated charts are not good enough because they do not allow each variable to be independently controlled (p. 734).

Why do these critics always miss the point? Did they insist on equally strict and superfluous demands in the Carlson experiment when the results appeared to invalidate astrology?

The independent control of each variable is another issue that could be the subject of further research. But this is beyond the scope of this already extensive research.

Demystifying the Erroneous Simulation

We arrive now at the pinnacle of absurdity: a simulation to demonstrate that my results are due to uncontrolled artifacts.

Create a pool of 6000 unduplicated keywords from which simulated biographies and charts can be picked at random without replacement for variously N=28-96 and N=24-230 respectively (p. 735).

It starts badly: the corpus of the expert system only includes 2885 keywords and he uses 6000. He uses 28-96 and 24-230 which are only extreme values. How can we make a valid simulation if we don’t even know the (non-uniform) distribution between these extreme values?

In fact, it is a pseudo scholarly and incomprehensible simulation extremely badly explained. I think the results from this test are largely made up knowing that it will look impressive to impressionable people and no one will bother to check it out and just give up. It recalls the heartbreaking attempts of people who are still trying to demonstrate the squaring of the circle even if Lindemann demonstrated its impossibility in 1882.

Imagine a bag in which there are 72 black marbles and one white marble. With my eyes closed, I randomly draw a marble. Of course, I have a one in 73 chance of selecting the white marble, and this can be confirmed experimentally. The reviewer is determined to prove that the probability of selecting the white marble is not 1 in 73.
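The experimental confirmation mentioned above is easy to simulate. The following Python sketch is an illustration only: the number of trials and the fixed seed are arbitrary choices, and the function name is hypothetical.

```python
import random


def estimate_white_marble_rate(n_marbles=73, trials=200_000, seed=0):
    """Estimate by simulation the chance of drawing the single white marble."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    white = 0  # arbitrary label for the one white marble
    draws = sum(1 for _ in range(trials) if rng.randrange(n_marbles) == white)
    return draws / trials


estimate = estimate_white_marble_rate()
print(f"simulated: {estimate:.4f}   exact: {1 / 73:.4f}")
```

With enough draws the simulated frequency settles near 1/73, which is the chance level against which a first-rank hit among 73 candidates is judged.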

Conclusion

I entered the ring with my boxing gloves on for a good match and find myself facing a WWF clown trying to knock me out with a chair.