Update: The Google Drive currently has two CSV files in the top-level folder. One is the raw dataset; the other is a deduplicated version of it. Right now, I am running a script that tries to repair the OCR noise and mistakes. Its output will also be uploaded as a separate dataset.