I have posted a few times here about a rich dataset that I have. Here is what it looks like.
Each of the 5,296 rows represents one day, in sequence, of Amazon review misspelling rates over the 14.5-year span from Jan 1, 2000 to Jul 1, 2014.
Across the top are the labels. In each column is a simple, stable, linear function of the right ascension (the Tropical degree) of the planet, moon, or star at midnight at the start of that day in London, UK. Retrogressions of the planets are also included.
The final column is the log of the difference between the day's misspelling rate and the 27-day SNIP baseline. (The Moon's right ascension completes its cycle every 27 and change days, the shortest cycle of any of the right ascensions.) The following is a graph of this column's data over time.
For today's study, the first 80% of days were set aside as a training group, and the remaining 20% of days were held out as a test group.
An automated machine learning algorithm from BigML.com called a DeepNet was trained on the training set of the first 80% of days. This DeepNet was then evaluated on the last 20% of days.
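For anyone who wants to reproduce the split outside BigML, here is a minimal sketch of a chronological 80/20 split. It is not how BigML does it internally. The function name `chronological_split` and the rounding-down of the cut point are my assumptions, and the rows are stand-in dates, not the real spreadsheet.

```python
from datetime import date, timedelta


def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows into (train, test) without shuffling.

    Hypothetical helper: the cut point is rounded down, which is an
    assumption and may differ from the tool used in the post.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]


# Stand-in rows: one date per day, Jan 1, 2000 through Jul 1, 2014.
start = date(2000, 1, 1)
days = [start + timedelta(days=i) for i in range(5296)]

train_days, test_days = chronological_split(days)
print(len(train_days), len(test_days))
print(train_days[-1], test_days[0])
```

The point of splitting by position instead of at random is that every test day comes after every training day, so the model is never evaluated on a day that sits between two days it has already seen.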
After less than a minute of computation at default ("1-click") settings, these ridiculously good results ensued.
Let me summarize my take: using only basic astronomy data, the DeepNet predicted future misspelling rates in Amazon reviews better than either random values or the average (mean) value applied to every day.
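To make the comparison concrete, here is a rough sketch of how a model's test error can be set against a mean baseline and a random baseline. The helper names (`mse`, `r_squared`, `baseline_report`) are mine, the random baseline drawn uniformly from the training range is an assumption and may not match BigML's definition, and the numbers are made-up toy values, not my results.

```python
import random
from statistics import mean


def mse(actual, predicted):
    """Mean squared error between two equal-length sequences."""
    return mean((a - p) ** 2 for a, p in zip(actual, predicted))


def r_squared(actual, predicted):
    """1 minus residual sum of squares over total sum of squares."""
    centre = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - centre) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def baseline_report(train_target, test_target, model_predictions, seed=0):
    """Test-set MSE for the model, a mean baseline, and a random baseline."""
    rng = random.Random(seed)
    # Mean baseline: predict the training mean for every test day.
    mean_baseline = [mean(train_target)] * len(test_target)
    # Random baseline (assumed form): uniform draws over the training range.
    lo, hi = min(train_target), max(train_target)
    random_baseline = [rng.uniform(lo, hi) for _ in test_target]
    return {
        "model": mse(test_target, model_predictions),
        "mean": mse(test_target, mean_baseline),
        "random": mse(test_target, random_baseline),
    }


# Toy values for illustration only.
train_target = [0.1, -0.2, 0.3, 0.0]
test_target = [0.2, -0.1, 0.1]
model_predictions = [0.15, -0.05, 0.1]

report = baseline_report(train_target, test_target, model_predictions)
print(report)
```

A model only counts as predictive here if its error on the held-out days is clearly below both baseline errors.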
Here are the DeepNet fields in order of importance.
I am not even sure what to do next, but in case you do, here is the spreadsheet.
Please let me know what is up, and please reference this post if you use the data set.
Renay Oshop - teacher, searcher, researcher, immerser, rejoicer, enjoying the interstices between Twitter, Facebook, and journals.