Update to this: the Google Drive currently has two CSV files in the top folder. One is the raw dataset. The other is a deduplicated version of it. Right now, I am running a script that tries to repair the OCR noise and mistakes. Its output will also be uploaded as a separate dataset.

submitted by /u/Ok-District-1330