In a previous post, I showed that Reddit entries were slightly, but significantly, more likely to contain spelling errors and other unusual word choices during Mercury Retrograde. I had not discounted the most common slang and swear words, which arguably carry proper meaning and are not misspelled even though they are not in the dictionary. In this post, I account for them.
I dropped links along with common slang and swear words ("ok", "lol", "bs", "gg", "awol", "4Chan", "nsfl", "nsfw", "ama", "s%#&t*", "f&%k*", etc. and ":)", ":(", ";)", ";(", ":/", ":\", etc., and their main capitalization variants). With those removed, the difference between Mercury Retrograde (MR) and Non-Mercury Retrograde (NMR) in the misspelling rate (spelling errors divided by the entry's length in words) increased to 2.14%. Moreover, this difference is highly statistically significant, with an ANOVA p-value for equality of 6.4 × 10^-14. The 95% confidence interval for the mean is ~{0.0528, 0.0531} for NMR and ~{0.0540, 0.0542} for MR.
Here is the histogram showing the separation of MR (yellow) from NMR (blue) as well as the quantile-quantile plot comparing MR to NMR upon removing slang, emojis, links, and non-English entries.
Again, the difference could still be accounted for by other strange words, like unusual last names, that suddenly became fashionable in the MR period. Keep in mind, though, that these results come from scanning evenly across 53 million entries. That is something like two thousand front page posts with full comments per day in both NMR and MR. With this much data, any such fashionable blip in MR is likely to be offset by a similar blip in NMR, although more data is always better. (Looking at all or at least more MRs and NMRs would be nice, but this is all the data I have right now.) [Edited: If you would like to see such a study across many Mercury retrograde seasons, see here.]
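As a rough illustration of the metric described above, here is a minimal Python sketch of a per-entry misspelling rate after dropping links, emoticons, and common slang. It is not the Mathematica notebook's code. The tiny `DICTIONARY`, the `EXCLUDED` and `EMOTICONS` sets, the `LINK` pattern, and the function name `misspelling_rate` are all hypothetical stand-ins, and the sketch assumes the denominator counts only the words that remain after filtering, which the post does not specify.

```python
import re

# Hypothetical stand-ins: the post does not publish its dictionary or exclusion lists.
DICTIONARY = {"the", "cat", "sat", "on", "mat", "this", "is", "a", "test"}
EXCLUDED = {"ok", "lol", "bs", "gg", "awol", "nsfl", "nsfw", "ama"}
EMOTICONS = {":)", ":(", ";)", ";(", ":/", ":\\"}
LINK = re.compile(r"https?://\S+|www\.\S+")  # assumed, simplified link pattern


def misspelling_rate(entry: str) -> float:
    """Share of an entry's words that are not in DICTIONARY,
    after dropping links, emoticons, and excluded slang."""
    text = LINK.sub(" ", entry)
    # Drop emoticon tokens before stripping punctuation from the rest.
    tokens = [t for t in text.split() if t not in EMOTICONS]
    # Keep letters and apostrophes only, lowercase, and skip empty results.
    cleaned = (re.sub(r"[^A-Za-z']", "", t) for t in tokens)
    words = [w.lower() for w in cleaned if w]
    words = [w for w in words if w not in EXCLUDED]
    if not words:
        return 0.0  # assumption: an entry with no countable words scores zero
    misspelled = sum(1 for w in words if w not in DICTIONARY)
    return misspelled / len(words)
```

For example, `misspelling_rate("lol the catt sat on the mat :) http://example.com")` leaves six countable words, one of which ("catt") is not in the toy dictionary, so the rate is 1/6.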
To give you a sense of perspective, the average number of words in an entry increased by only 0.4% during MR despite wide variation, and that increase is not statistically significant. So, again I say, a >2% increase is huge, a statistically and culturally significant result. Consider this: if you are just 2 percent more likely to make a mistake in some small thing during Mercury Retrograde, but you build on hundreds (if not thousands) of those small things per day, that effectively implies that your rate of making some bigger mistake skyrockets. For example, if you do 10 related small things in an MR day and each step is two percent more likely to cause an error than in NMR, then you are at least 21 percent more likely to commit a compounded error that day (1.02^10 ≈ 1.219), and within the three weeks of MR, you are almost sure to make at least four such serious errors. The Mathematica notebook is available for download.
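The compounding arithmetic in the example above can be checked with a short Python sketch. It assumes a simple multiplicative model in which the steps are independent and the compounded error needs every step to go wrong, so the per-step ratio is raised to the number of steps. The post does not spell out its model, and the function name `relative_increase` is made up for illustration.

```python
def relative_increase(per_step_ratio: float, steps: int) -> float:
    """Relative increase in the chance that all `steps` independent steps go wrong,
    when each step's error chance is multiplied by `per_step_ratio`."""
    return per_step_ratio ** steps - 1.0


# Ten steps, each 2 percent more error-prone than in NMR.
print(f"{relative_increase(1.02, 10):.1%}")  # 21.9%
```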
Postscript: Re-running the analysis while also dropping "OMG" and "(͡ ͜ºʖ ͡º)" increased the difference between NMR and MR misspelling rates further, to 2.83 percent. The 95% confidence intervals for the mean are {0.061656, 0.0619098} for NMR and {0.0632795, 0.0635948} for MR.
Author: Renay Oshop - teacher, searcher, researcher, immerser, rejoicer, enjoying the interstices between Twitter, Facebook, and journals.