{"id":36630,"date":"2025-11-18T06:27:06","date_gmt":"2025-11-18T05:27:06","guid":{"rendered":"https:\/\/www.graviton.at\/letterswaplibrary\/20000-epstein-files-in-a-single-text-file-available-to-download-100-mb\/"},"modified":"2025-11-18T06:27:06","modified_gmt":"2025-11-18T05:27:06","slug":"20000-epstein-files-in-a-single-text-file-available-to-download-100-mb","status":"publish","type":"post","link":"https:\/\/www.graviton.at\/letterswaplibrary\/20000-epstein-files-in-a-single-text-file-available-to-download-100-mb\/","title":{"rendered":"20,000 Epstein Files In A Single Text File Available To Download (~100 MB)"},"content":{"rendered":"<p><!-- SC_OFF --><\/p>\n<div class=\"md\">\n<p>I&#8217;ve processed all the text and image files (~25,000 document pages\/emails) in the individual folders released last Friday into a single two-column text file. I used Google&#8217;s Tesseract OCR library to convert the JPG images to text.<\/p>\n<p>You can download it here: <a href=\"https:\/\/huggingface.co\/datasets\/tensonaut\/EPSTEIN_FILES_20K\">https:\/\/huggingface.co\/datasets\/tensonaut\/EPSTEIN_FILES_20K<\/a><\/p>\n<p>For each document, I&#8217;ve included the full path to the original Google Drive folder from the House Oversight Committee so you can link to it and verify the contents. 
In using this dataset, please be sensitive to the privacy of the people involved (and remember that many of these people were certainly not involved in any of the actions which precipitated the investigation).<\/p>\n<\/div>\n<p><!-- SC_ON -->   submitted by   <a href=\"https:\/\/www.reddit.com\/user\/tensonaut\"> \/u\/tensonaut <\/a> <br \/> <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1p031v6\/20000_epstein_files_in_a_single_text_file\/\">[link]<\/a><\/span>   <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1p031v6\/20000_epstein_files_in_a_single_text_file\/\">[comments]<\/a><\/span><\/p>","protected":false},"excerpt":{"rendered":"<p>I&#8217;ve processed all the text and image files (~25,000 document pages\/emails) within individual folders released last 
Friday&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[85],"tags":[],"class_list":["post-36630","post","type-post","status-publish","format-standard","hentry","category-datatards","wpcat-85-id"],"_links":{"self":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/36630","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/comments?post=36630"}],"version-history":[{"count":0,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/36630\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/media?parent=36630"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/categories?post=36630"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/tags?post=36630"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}