{"id":35073,"date":"2025-08-18T17:27:59","date_gmt":"2025-08-18T15:27:59","guid":{"rendered":"https:\/\/www.graviton.at\/letterswaplibrary\/public-dataset-scraper-for-project-gutenberg-texts\/"},"modified":"2025-08-18T17:27:59","modified_gmt":"2025-08-18T15:27:59","slug":"public-dataset-scraper-for-project-gutenberg-texts","status":"publish","type":"post","link":"https:\/\/www.graviton.at\/letterswaplibrary\/public-dataset-scraper-for-project-gutenberg-texts\/","title":{"rendered":"Public Dataset Scraper For Project Gutenberg Texts"},"content":{"rendered":"<div class=\"md\">\n<p>I created a tool that extracts books and metadata from Project Gutenberg, the online repository for public domain books, with options for filtering by keyword, category, and language. It outputs structured JSON or CSV for analysis. <\/p>\n<p>Repo link: <a href=\"https:\/\/console.apify.com\/actors\/kcQs4Qdtmt3IU9qT6\/source\">Project Gutenberg Scraper<\/a>. <\/p>\n<p>Useful for NLP projects, training data, or text mining experiments.<\/p>\n<\/div>\n<p>submitted by   <a href=\"https:\/\/www.reddit.com\/user\/1maplebarplease\"> \/u\/1maplebarplease <\/a> <br \/> <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1mtp203\/public_dataset_scraper_for_project_gutenberg_texts\/\">[link]<\/a><\/span>   <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1mtp203\/public_dataset_scraper_for_project_gutenberg_texts\/\">[comments]<\/a><\/span><\/p>","protected":false},"excerpt":{"rendered":"<p>I created a tool that extracts books and metadata from Project Gutenberg, the online repository for public&#8230;<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[85],"tags":[],"class_list":["post-35073","post","type-post","status-publish","format-standard","hentry","category-datatards","wpcat-85-id"],"_links":{"self":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/35073","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/comments?post=35073"}],"version-history":[{"count":0,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/35073\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/media?parent=35073"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/categories?post=35073"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/tags?post=35073"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}