Hello! I have been doing a lot of development on my News Aggregator Service for Astrology and want to share an early write-up, all the way from 2019, that still describes its most salient features. Since the fresh reboot, over 5000 astrology data points have been cultivated in one week alone.
(I was supposed to present this paper at a research conference in India, but sadly the whole conference got canceled. That is okay. I am glad I was able to make many great new improvements that are now available in the current version online here at AyurAstro.com.)
Contemporary astrologers are challenged like never before to make their work scientific. This is especially true for astrology researchers who are stewards of the keystone for legitimacy in astrology. Fortunately, current strides in artificial intelligence allow new avenues to data cultivation for the public and private astrology researcher. The author presents her freely available solution to the challenge, using in her explanation basic, intermediate, and advanced concepts.
Astrologers the world over from all times, whether Eastern, Western, Persian, Chinese, or Tibetan, face the same challenge. Event data of time, place, and date, including that of births, are mandatory for us to do our work.
Acquiring that data is more difficult than at first blush it may appear. For example, if one is interested in sports astrology and one wants to know the start of a basketball or cricket match from five years ago, how does one actually find that?
Perhaps there is an online news article from that time, but there are likely to be hidden problems. What I have found is that there is a wide variety among news publications as to how they treat such events. For example, BBC will include the time of the match in subsequent or preceding write-ups, while The New York Times will not.
Another part of the challenge is the language used in an article. For example, the article may say something like “the match happened last Sunday”. The New York Times often uses such an approach, raising the questions of which Sunday that is and what its date is. It may be obvious to the contemporaneous online reader but not to one reading years thereafter.
Then, there is a question of whether those articles are even archived and hence publicly accessible for some time later. Often, only the articles subsequent to an event are kept posted online and then only for a few years.
Finally, there is the strong possibility of an astrologer not knowing that the event exists in the first place. For example, let’s switch our focus to cyclone event data. There was a cyclone in the spring of 2019 named Cyclone Idai.
There are hundreds of cyclones per year (National Hurricane Center, 2019). Twenty years from now, will an astrologer who is interested in cyclones know to do an online search of contemporaneous literature for that particular cyclone? Will there still be material online for that cyclone?
Doing astrological research is difficult. The cyclone researcher may need to pore over tens of thousands of webpages just to get that kernel of truth of when, where, and at what time a batch of cyclone events occurred.
Reading that many documents takes time. Even if a list of ten thousand website URLs were presented instantly to the researcher, spending three minutes to scan each article would add up to 30,000 minutes, which is 500 hours or about 21 days straight of such focused activity. That is not counting the search itself for the URLs or the recording of the results.
Given these daunting facts, it is no wonder that astrological research is relatively rare and often scanty or incompletely done. Science, by contrast, demands first and foremost the relative accuracy and completeness of data. This core difference between astrological research and conventional science represents the deepest chasm between the two world perspectives.
Bridging this chasm of data completeness would do the most to unify the work of all astrologers everywhere within the paradigm of science that could be said to define our current societies.
A data challenge needs a data solution.
The Information Age in which we live deluges us with data. Fortunately, in these times, we are being presented with newly hatched tools to automate the search for and the processing and retaining of data in bulk. The newest and best methods use artificial intelligence (AI).
At this time, these AI tools are being used throughout nearly every industry, yet scarcely at all in astrology. These tools presently are also not for the mathematically faint of heart, so there is still a pretty steep learning curve for implementing them. Commercial industries find that the effort directed to the computational and mathematical gymnastics is more than worth it. The results are phenomenal, world-changing, and transforming to the industries even as the computer science in use is still relatively in its infancy.
What is needed is a means of using these tools to help the future astrology researcher who is within that exponentiating global data deluge to find and record the relevant data that was previously preserved as it happened.
Just as astrologers use books for transmission of knowledge, email for communicating with clients and each other, and software for chart generation, it is time for astrology to join the rest of professional society by investigating and employing the latest machine learning methods to generate artificial intelligence within astrology.
Event data generation is the first order of business for the personal or public research astrologer. Using artificial intelligence on global public information to cultivate event data should be the first order of business for a current computer scientist who wishes to help astrology.
There are a few steps to a computer-automated processing of current event data to help a future astrologer.
A computer program would be needed to 1. monitor daily current events as they are published in online articles. The computer program would then need to 2. linguistically digest the complex language of online news to find A. the nature of the event (for example, a new cyclone) and B. the date, time, and place of the event. After digestion, the computer program, at a minimum, would need to 3. preserve A. and B. for future use.
The basic triplicate of requirements of monitoring, digestion, and preservation of astrologically relevant data in a timely manner is in truth a cascade of separate algorithms, or sub-programs, each of which has only been developed recently, sometimes as recently as just a few months ago.
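The three steps can be laid out as a skeleton. This is only a minimal sketch of the shape of such a program, not the aggregator's actual code: the names `EventRecord`, `monitor`, `digest`, `preserve`, and `run_pipeline` are hypothetical, the "feed" is an in-memory list standing in for scraped articles, and the digestion step is a bare keyword match where real natural language processing would go.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class EventRecord:
    topic: str   # A. the nature of the event
    date: str    # B. the date, time, and place of the event
    time: str
    place: str


def monitor(feed):
    """Step 1: yield newly published articles (here, from an in-memory list)."""
    for article in feed:
        yield article


def digest(article, topics):
    """Step 2: placeholder digestion. A keyword match stands in for real NLP,
    and the date, time, and place are assumed to be already present as fields."""
    text = article["text"].lower()
    for topic in topics:
        if topic in text:
            return EventRecord(
                topic=topic,
                date=article.get("date", ""),
                time=article.get("time", ""),
                place=article.get("place", ""),
            )
    return None  # article is not about any topic of interest


def preserve(records):
    """Step 3: serialize the records so the file can be published online."""
    return json.dumps([asdict(r) for r in records], indent=2)


def run_pipeline(feed, topics):
    records = []
    for article in monitor(feed):
        record = digest(article, topics)
        if record is not None:
            records.append(record)
    return preserve(records)
```

Each function could be swapped out independently, which mirrors the description of the program as a cascade of separate algorithms.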
Monitoring can be done through a process called web scraping, wherein the texts of new articles are programmatically gleaned. However, of the millions of new articles that get published per day, which have data relevant to events of interest to the future astrologer? There is a computational cost to aggregating such monitored data. Even if the program takes but one second per article to seek out, acquire, and analyze for relevancy the text of the article, only 86400 such articles could be processed per day for this step alone. Thus, for current network, hardware, and software constraints, a necessary restriction in the numbers of internet sources and research topics is required.
Digestion is the most technically challenging part of the three-step process. The computer program would need to do nothing less than understand language itself. Take this sentence as an example: “Last Sunday night, three hours after kickoff, the Chicago Bears won the Super Bowl championship against the Denver Broncos.” The program would need to know that A. the article is about a match in American football and B. what was last Sunday’s date. The location and time of kickoff would, one hopes, similarly be determinable through other sentences of the article.
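One small piece of that problem, turning "last Sunday" into a calendar date, can be sketched once the article's publication date is known. The function name `last_weekday` is hypothetical, and the sketch assumes that "last Sunday" means the most recent Sunday strictly before the publication date, which is only one of the ways writers use the phrase.

```python
from datetime import date, timedelta

SUNDAY = 6  # Python's date.weekday() numbers Monday as 0 and Sunday as 6


def last_weekday(published: date, weekday: int) -> date:
    """Return the most recent `weekday` strictly before `published`."""
    days_back = (published.weekday() - weekday) % 7
    if days_back == 0:
        days_back = 7  # published on that weekday: go back a full week
    return published - timedelta(days=days_back)


# An article published on Wednesday, Feb 6, 2019 that says "last Sunday"
print(last_weekday(date(2019, 2, 6), SUNDAY))  # 2019-02-03
```

The harder part, recognizing that the phrase is a date expression at all and attaching it to the right event, is what the natural language processing has to supply.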
Digestion makes heavy use of something called natural language processing, a field of artificial intelligence that is still rapidly improving. The following two sections explore this step in some detail.
Finally, preservation of the data in a central, publicly accessible repository is the last, necessary step to the program. This is the easiest step of the three and typically just requires publishing the data file online.
Natural language processing (NLP) across different languages and across different subjects is a wide computer science field with many compartments. The main ones of relevance here are automatic summarization, natural language understanding, and question answering.
Automatic summarization is needed to determine whether an astrology research topic is indeed the topic of the article. Typically, this is done by condensing the article into a small summary and seeing if the topic is present therein. How does one determine the summary of an article? The more straightforward type of summarization is extractive: sentences are selected and stitched together based on statistical significance metrics and pattern-matching, which is itself non-trivial. These days, abstraction of the language content is often required as well. Machine learning based on linguistic feature extraction is at the core of this memory- and computation-heavy abstraction process. As a sub-field of NLP, automatic summarization benefits from being one of its longest-studied topics (Hahn & Mani, 2000).
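To make the extraction idea concrete, here is a minimal sketch of a frequency-based extractive summarizer followed by a topic check. It is a toy stand-in for the statistical approach described above, not the method used in the project; the stopword list, the scoring rule (average word frequency per sentence), and the function names are all assumptions chosen for illustration.

```python
import re
from collections import Counter

# A deliberately tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "was",
             "it", "on", "for", "that", "at", "as", "by", "with"}


def tokenize(text):
    """Lowercase words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def split_sentences(text):
    """Naive sentence split on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def summarize(text, max_sentences=2):
    """Keep the sentences whose words are, on average, most frequent in the article."""
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    return [s for s in sentences if s in top]  # restore original order


def mentions_topic(text, topic, max_sentences=2):
    """True if the topic word appears in the extracted summary."""
    return any(topic.lower() in s.lower() for s in summarize(text, max_sentences))
```

An article would then be passed on to the later steps only if `mentions_topic(article_text, "cyclone")` holds for one of the research topics.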
Natural language understanding is the study of how to improve comprehension of text by a computer. It is a necessary middle step to answering questions from the text. The reader may be familiar with interfaces for language understanding from Siri, Bixby, or Alexa apps on their cell phones. Such commonness belies the extraordinary technical achievement that they represent. For example, one may say “The man started on a new novel,” and “The man started on a sandwich”. The formal structure is identical even though the meanings are wildly different.
Finally, question answering is critical to the astrologer’s needs. Even with the well-understood text of an article whose topic was deemed relevant, asking questions of the article is non-trivial. Fortunately for astrologers, there are only three main questions. On what day did the event occur? At what time of day did the event occur? In what town did the event occur? However, the news article may be ambiguous on these matters or it may not include these details at all. For example, for a cyclone, what counts as the birth event? The initial early formation far off at sea? Landfall? Landfall only upon reaching a major population center? Such delicate domain-specific answering of a question has only lately, in the past few years or so, been reaching maturity.
The implementations of automatic summarization, natural language understanding, and question answering that are useful to know about for my project involve neural nets.
Neural nets, a.k.a. artificial neural networks (ANN), are a way for the computer to solve problems of NLP that is lightly based on how biological animal brains work. Their loose depiction is as follows.
Visual examples of the special aspects as end points to triangles pictorially show the language that describes the planets Jupiter, Mars, and Saturn.
My derivation of the angle values:
[Edit: this study was subsequently repeated with NFL player data, and the same negative results held and in the same way.]
It was a topic of conversation: does body mass index (BMI) or even height or weight vary among the sun signs?
Ayurveda would say that they may vary by Ascendant, not particularly the sun sign, but getting good data on the Ascendant is a notable difficulty, because the Ascendant depends on the birth time of day.
Getting tens of thousands of charts with that level of precision is nearly unheard of.
A psychology prof from France named Michel Gauquelin compiled thousands of charts in the 1950's and 1960's that included birth time and hence Ascendant information. I am a little skeptical of the quality of these charts, as many are from the 1800's and, more worryingly, many use Local Mean Time, which, if you back-engineer it, gives the birth time as being at 10 am or 11 am on the dot.
So, I feel I cannot use Gauquelin's data. I am always on the lookout, then, for credible birth or event data, scouring data science competition sites like Kaggle for the elusive ideal data set.
I came across SoFIFA.com in that way. It gives very good data on the various thousands of professional male soccer players associated with FIFA, including their birth date and height and weight. Birth time and place are not given.
However, if I could just "scrape" that data, that would give me a way to test the conjecture that BMI or even height or weight may vary by sun sign. So, that is what I did.
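Once birth date, height, and weight are in hand, the comparison itself is simple to set up. The following is a minimal sketch under stated assumptions: the record layout `(birth_date, height_cm, weight_kg)` is hypothetical and is not SoFIFA's format, and the sign boundaries are the conventional calendar dates, which can shift by a day from year to year because the Sun's actual ingress times vary.

```python
from collections import defaultdict
from datetime import date

# Conventional (month, day) on which each tropical sun sign begins.
SIGN_STARTS = [
    ((1, 20), "Aquarius"), ((2, 19), "Pisces"), ((3, 21), "Aries"),
    ((4, 20), "Taurus"), ((5, 21), "Gemini"), ((6, 21), "Cancer"),
    ((7, 23), "Leo"), ((8, 23), "Virgo"), ((9, 23), "Libra"),
    ((10, 23), "Scorpio"), ((11, 22), "Sagittarius"), ((12, 22), "Capricorn"),
]


def sun_sign(birth: date) -> str:
    """Approximate sun sign from the calendar date alone."""
    sign = "Capricorn"  # Jan 1-19 still belongs to the sign that began Dec 22
    for start, name in SIGN_STARTS:
        if (birth.month, birth.day) >= start:
            sign = name
    return sign


def bmi(weight_kg: float, height_cm: float) -> float:
    """Body mass index: weight in kilograms over height in meters squared."""
    height_m = height_cm / 100
    return weight_kg / (height_m * height_m)


def mean_bmi_by_sign(players):
    """players: iterable of (birth_date, height_cm, weight_kg) tuples."""
    groups = defaultdict(list)
    for birth, height_cm, weight_kg in players:
        groups[sun_sign(birth)].append(bmi(weight_kg, height_cm))
    return {sign: sum(values) / len(values) for sign, values in groups.items()}
```

Whether any differences between the twelve group means are meaningful would still need a proper statistical test, such as a one-way analysis of variance.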
[Edit: The latest and greatest on this project, including source files, can be seen at the publication link here.]
I have posted a few times here about a rich dataset that I have.
Data were obtained from Stanford's SNAP data repository of Amazon.com reviews that gave daily misspelling rates; astronomical data were from Wolfram's Mathematica software and its astronomy resources.
Here is what the dataset looks like.
Each of the 5296 rows represents a sequential day in a 14.5 year span of Amazon review misspelling rates during Jan 1, 2000 to Jul 1, 2014.
Across the top are the labels. In each column is a simple, stable, linear function of the right ascension (i.e., the astrological Tropical degree) of the planet, moon, or star at midnight at the start of that day in London, UK. Retrogressions of the planets are also included.
The final column is the log of difference of the misspelling rate of the day from the 27-day SNIP baseline. (The Moon's right ascension completes its cycle every 27 and change days. That is the shortest cycle for any of the right ascensions.) Thus, it is the data over time minus its background noise. The following is a graph of this column's data over time.
SNIP stands for Statistics-sensitive Non-linear Iterative Peak-clipping algorithm. This method preserves any cyclic patterns -- such as the planetary placements and retrogressions -- while discarding "background noise" in the data, which would tend to obfuscate the patterns. The SNIP method is not subjective. It comes out of processing signals within spectra and is unprejudiced.
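The core of the peak-clipping idea fits in a few lines. This is a minimal sketch of the basic iteration only, not the exact routine used for this dataset: it omits the log-log-square-root transform that is often applied beforehand, and how the 27-day baseline maps onto the `max_half_width` parameter here is an assumption.

```python
def snip_baseline(values, max_half_width):
    """Estimate a slowly varying baseline by iterative peak clipping.

    For each half-width p = 1..max_half_width, every interior point is
    replaced by the smaller of itself and the mean of its two neighbors
    p steps away, which progressively clips peaks down to the background.
    """
    baseline = list(values)
    n = len(baseline)
    for p in range(1, max_half_width + 1):
        previous = baseline[:]  # clip against the result of the prior pass
        for i in range(p, n - p):
            neighbor_mean = (previous[i - p] + previous[i + p]) / 2
            baseline[i] = min(previous[i], neighbor_mean)
    return baseline


def subtract_baseline(values, max_half_width):
    """The signal with its estimated background removed."""
    baseline = snip_baseline(values, max_half_width)
    return [v - b for v, b in zip(values, baseline)]
```

Because each point can only be lowered, the baseline never rises above the data, so the residual is never negative.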
The apparent cyclicity hidden within this data is revealed via a correlogram:
The thin bands represent the start and end of Mercury retrograde across 14.5 years with Mercury retrograde analysis being the original motivator for acquiring this data.
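For readers who have not met one before, a correlogram plots the autocorrelation of a series against the lag in days. A minimal sketch of the underlying calculation follows; it uses the common estimator that divides by the lag-zero sum of squares, and is not necessarily the exact estimator behind the figure.

```python
import math


def autocorrelation(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (lag 0 is always 1)."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    total = sum(c * c for c in centered)
    if total == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / total
        for lag in range(max_lag + 1)
    ]


# A synthetic series with a 20-day cycle peaks again near lag 20.
wave = [math.sin(2 * math.pi * t / 20) for t in range(200)]
acf = autocorrelation(wave, 25)
```

Peaks at regular lags are what reveal a hidden cycle in the data.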
For today's study, the data for the first 80% of days were developed into a training group, and that of the subsequent 20% of days were isolated as a test group for prediction.
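Because the rows are sequential days, the split is made by position rather than by random sampling, so the test group lies entirely in the future relative to the training group. A minimal sketch, with the function name being hypothetical:

```python
def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows into an earlier training part and a later test part."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


days = list(range(5296))  # stand-in for the 5296 daily rows
train, test = chronological_split(days)
print(len(train), len(test))  # 4236 1060
```

Keeping the order intact avoids letting information from later days leak into the training data.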
What was doing the training and testing? They were done entirely by an automated machine learning (AI) algorithm from BigML.com called DeepNet. DeepNet* was applied to the training set of the first 80% of days. This DeepNet was then tested or evaluated on the last 20% of days. The DeepNet is a hands-off technique offered to anyone for free.
The chart below displays ridiculously good results as given in the usual AI industry way: the error rates of DeepNet's predictions for the 20% test group (in green) are dramatically smaller than those of other standard methods of prediction (in gray), which are based on the mean (average) rate of the training data or on an approach assuming random chance. Moreover, the strong R-squared suggests good correlation of predicted misspelling rates to actual values only for the astronomical data of the DeepNet.
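The two quantities in that comparison can be computed by hand. The sketch below shows root-mean-square error and R-squared, along with the mean-of-training-data baseline; it illustrates the definitions only, and the numbers in the example are made up rather than taken from the study.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between actual and predicted values."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(actual, predicted):
    """1 minus residual sum of squares over total sum of squares."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def mean_baseline(train_targets, n_predictions):
    """Baseline that always predicts the average of the training targets."""
    mean = sum(train_targets) / len(train_targets)
    return [mean] * n_predictions


# Made-up illustration: a model's predictions versus the mean baseline
actual = [1.0, 2.0, 3.0, 4.0]
model = [1.1, 1.9, 3.2, 3.8]
baseline = mean_baseline([2.0, 3.0], len(actual))
```

A model is only interesting if its error is clearly below that of the baseline and its R-squared is clearly above zero.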
Astro-databank is the resource for researchers in astrology. It is a repository of birth information of many thousands of people and events and includes biographic data as well as the birth time, place, and date. Of high utility is the included Rodden Rating which tells us the accuracy of each chart.
An AA rating is "Data as recorded by the family or state". The expectation of many researchers is that AA data is of the highest accuracy possible and can be used freely. In the following, I show statistically that it is extremely unlikely that the AA rating charts altogether are accurate. I will be considering charts of people only and only those born at or after 1930.
From here on, I will try to state things as explicitly as possible.
The common assumption (the null hypothesis) is that AA rating charts are all of high accuracy and hence, taken altogether, exhibit behavior of high accuracy. My assertion (the alternative hypothesis) is that AA rating charts do not exhibit behavior of accurate birth data.
One behavior of accurate birth data is that the minute of birth is evenly distributed. That is to say, a birth time of 8 minutes after the hour is not expected to happen much more or less than a birth time of 9 minutes after the hour, for example.
The following is a simulation of a uniform distribution so that you know what its plot looks like.
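Such a simulation can be reproduced with a few lines of code. This is a minimal sketch, not the original simulation: the sample size and seed are arbitrary, and the chi-square statistic is included as one common way to measure departure from evenness.

```python
import random
from collections import Counter


def simulate_birth_minutes(n, seed=0):
    """Draw n birth minutes uniformly from 0..59; return the count per minute."""
    rng = random.Random(seed)
    counts = Counter(rng.randrange(60) for _ in range(n))
    return [counts.get(minute, 0) for minute in range(60)]


def chi_square_uniform(counts):
    """Chi-square statistic against equal expected counts in every bin."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


counts = simulate_birth_minutes(6000)
statistic = chi_square_uniform(counts)
```

For truly uniform minutes, the statistic follows a chi-square distribution with 59 degrees of freedom and so tends to land near 59. Birth times rounded to the hour or half hour would push it far higher.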
When it comes to modeling real-life phenomena for astrological research, earthquakes are one of the most widely studied.
And why not? After all, the exact time, place, and day are known as well as the strength of the effect (in magnitude).
However, a mapping of earthquake strength to solar system events has proven elusive. Not to be dramatic, but that was until now.
Inspired by this Kaggle post, I decided to try my hand at this perhaps age-old problem, and I found that yes, earthquake magnitude correlates with the moon phase at the time of the event. (Moon phase has been looked at quite often but not with the model I will present today.)
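For readers who want to attach a moon phase to each event themselves, a rough value can be computed from the event time alone. The sketch below is an assumption-laden approximation and not the model of this post: it uses the mean synodic month and a reference new moon of Jan 6, 2000, 18:14 UTC, so it can be off by several hours because real lunations vary in length. An ephemeris would be needed for precise work.

```python
from datetime import datetime, timezone

SYNODIC_MONTH_DAYS = 29.530588853  # mean length of one lunation
REFERENCE_NEW_MOON = datetime(2000, 1, 6, 18, 14, tzinfo=timezone.utc)


def moon_phase_fraction(when: datetime) -> float:
    """Approximate phase as a fraction of the lunation: 0 = new, 0.5 = full.

    `when` must be timezone-aware. Dates before the reference also work,
    because Python's % returns a non-negative result for a positive divisor.
    """
    days = (when - REFERENCE_NEW_MOON).total_seconds() / 86400
    return (days % SYNODIC_MONTH_DAYS) / SYNODIC_MONTH_DAYS


# The full moon of Jan 21, 2000 should come out close to 0.5
phase = moon_phase_fraction(datetime(2000, 1, 21, 4, 40, tzinfo=timezone.utc))
```

Expressing the phase as a fraction of the cycle makes it easy to bin events or to feed the value into a regression.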
First off, I went to https://earthquake.usgs.gov for the earthquake data. (Thanks to Joe Ritrovato for the link.)
I wanted to look particularly at all earthquakes of any depth between Jan 1, 1975 and Jan 1, 2005. Those years were chosen because a uniform seismograph was finally in use throughout the world by the mid-1970's, and hydraulic fracturing with its associated quakes was not yet in widespread practice. The search was further restricted to earthquakes of magnitude greater than 5.5, following this system of what counts as a serious earthquake. (Some lower limit to the magnitudes was necessitated by the search limit on the USGS site.)
Here is what my search looked like (be sure to also choose earthquakes only below the fold):
And here is what you will see if you press enter:
Many of us thrilled to the idea of vibration made visible when reading Cymatics, that gem of a book from the late 1960's.
With a few lines of code, I have decided to plot the equations which go to the heart of, and could be said to generate, these beautiful forms.
More motivated me than just the chance to look directly at and witness the imagery. I have seen some claims that the Shri Chakra could be seen from these "tonoscopes".
I decided to test a recent theoretical development of how football game winners can be seen in the day, place, and starting time of the event.
Accordingly, my assistant drew up a table of all Super Bowls so far and their event information.
I then drew up each chart and made a prediction. These were then checked against the real winners.
I should preface by saying that I am a football agnostic. I do not know much about football, only watching socially for a few minutes here and there and receiving the good-natured teasing of friends for knowing so little.
I want to admit that I have seen some games at some times, enough to develop the model, yet I feel I can judge these past charts fairly, truly without any a priori knowledge of the winner.
Thirty-five out of forty-five Super Bowls were evaluated correctly. (Five early Super Bowls did not have recorded start times available.)
Final 2-sided p-value is 0.0002.
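A p-value of this size is what an exact binomial test gives for 35 correct calls out of 45. The sketch below assumes that the null hypothesis is a 50% chance of picking each winner, which is my assumption about the test rather than something stated here, and the function name is hypothetical.

```python
from math import comb


def binomial_two_sided_p(successes: int, trials: int) -> float:
    """Exact two-sided p-value for a fair-coin null hypothesis.

    With a success probability of 0.5 the distribution is symmetric,
    so the two-sided value is twice the tail at least as extreme
    as the observed count.
    """
    k = max(successes, trials - successes)
    tail = sum(comb(trials, i) for i in range(k, trials + 1)) / 2 ** trials
    return min(1.0, 2 * tail)


p_value = binomial_two_sided_p(35, 45)
print(round(p_value, 4))  # 0.0002
```

Under that assumption the exact value is about 0.00025, which rounds to the 0.0002 reported.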
To do this project even more fairly, I would recommend the following:
Astrological research is presently a tough row to hoe.
There is no money in it, and my clients actually don’t like it.
They ask me why I do it. These are intelligent well-educated people, but they tell me they come to astrology because they are sick of science. ("One year coffee is good for you, one year it is bad for you…")
The work of astrological science is lonely, scary, and frustrating and really tough on me physically.
I do it instead out of love, out of passion, out of honestly wanting to know the answer.
That is why I say it is a hobby for me. I think that is a good thing.
I went to college full-time at fifteen, actually being able to emancipate based on my scholarship stipend. Some of my friends were getting PhDs at that age from schools like Harvard and Princeton.
One thing united us in order to work so hard and give up so much, so young. We all shared an immense personal love for science, actually being in love with Mother Nature Herself.
Then, 5 or 10 years later, we all made it into the profession, the industry, of science, and we almost all dropped out.
Research into fundamental astrology returns my gaze back to Mother Nature, and it is that worship which is really why I do it, and yet I certainly do keep all the numbers real. You must, to get really close to Her.
So, even though I do a particular methodological approach which is very big data and AI oriented, using pretty advanced mathematics, I do it for intensely personal reasons, and actually as an artistic expression, I feel.
The methodologies that I would like to see at large in the future of astrology research would be ones that would speak to our fact-based culture as a whole, and maybe along the way, some money could be sent toward astrological researchers, because a need of society at large is met, some pressing practical societal need is answered.
Richard Feynman once said: “People who wish to analyse nature without using mathematics must settle for a reduced understanding.”
So, I think we have to increase our mathematics chops, even if we are doing hermeneutics, perhaps learning from all the good work happening in the digital humanities in the past decade.
We also have an opportunity to go beyond what even regular science provides society, and that is to do our work with love, for love.
It is not just an opportunity, it is a necessity, for astrology is and we are psychology and medicine and football games and politics and money and families, everything altogether, and that is love. I know it is.
Remarks prepared for The Kepler Conference, 2017.
Click on image to see the presentation from The Kepler Conference for Astrological Research, Jan 2017.
[Edit: a crazy further reduction in RMSE was achieved by finally using a neural net. The quick write-up of that can be seen here.
A full journal article was just approved for publication (March 2018). It will be referenced in the bibliography.]
To hear the audio of the misspellings, download the original file below and mouse over on the second red graph.