{"id":32146,"date":"2025-01-05T22:27:25","date_gmt":"2025-01-05T21:27:25","guid":{"rendered":"https:\/\/www.graviton.at\/letterswaplibrary\/%f0%9f%9a%80-content-extractor-with-vision-llm-open-source-project\/"},"modified":"2025-01-05T22:27:25","modified_gmt":"2025-01-05T21:27:25","slug":"%f0%9f%9a%80-content-extractor-with-vision-llm-open-source-project","status":"publish","type":"post","link":"https:\/\/www.graviton.at\/letterswaplibrary\/%f0%9f%9a%80-content-extractor-with-vision-llm-open-source-project\/","title":{"rendered":"\ud83d\ude80 Content Extractor With Vision LLM \u2013 Open Source Project"},"content":{"rendered":"<p><!-- SC_OFF --><\/p>\n<div class=\"md\">\n<p>I\u2019m excited to share <strong>Content Extractor with Vision LLM<\/strong>, an open-source Python tool that extracts content from documents (PDF, DOCX, PPTX), describes embedded images using <strong>Vision Language Models<\/strong>, and saves the results in clean Markdown files.<\/p>\n<p>This is an evolving project, and I\u2019d love your feedback, suggestions, and contributions to make it even better!<\/p>\n<h1>\u2728 Key Features<\/h1>\n<ul>\n<li><strong>Multi-format support<\/strong>: Extract text and images from <strong>PDF, DOCX, PPTX<\/strong>.<\/li>\n<li><strong>Advanced image description<\/strong>: Choose between <strong>local models<\/strong> (Ollama&#8217;s llama3.2-vision) and <strong>cloud models<\/strong> (OpenAI GPT-4 Vision).<\/li>\n<li><strong>Two PDF processing modes<\/strong>: <strong>Text + Images<\/strong> extracts text and embedded images; <strong>Page as Image<\/strong> preserves complex layouts with high-resolution page images.<\/li>\n<li><strong>Markdown outputs<\/strong>: Text and image descriptions are neatly formatted.<\/li>\n<li><strong>CLI interface<\/strong>: Simple command-line interface for specifying input\/output folders and file types.<\/li>\n<li><strong>Modular &amp; extensible<\/strong>: Built with SOLID principles for easy customization.<\/li>\n<li><strong>Detailed logging<\/strong>: Logs all operations with timestamps.<\/li>\n<\/ul>\n<h1>\ud83d\udee0\ufe0f Tech Stack<\/h1>\n<ul>\n<li><strong>Programming<\/strong>: Python 3.12<\/li>\n<li><strong>Document processing<\/strong>: PyMuPDF, python-docx, python-pptx<\/li>\n<li><strong>Vision Language Models<\/strong>: Ollama llama3.2-vision, OpenAI GPT-4 Vision<\/li>\n<\/ul>\n<h1>\ud83d\udce6 Installation<\/h1>\n<p>Clone the repo and install dependencies using <strong>Poetry<\/strong>. System dependencies such as <strong>LibreOffice<\/strong> and <strong>poppler<\/strong> are required for processing specific file types.<\/p>\n<p>Detailed setup instructions: <a href=\"https:\/\/github.com\/MDGrey33\/content-extractor-with-vision\">GitHub Repo<\/a><\/p>\n<h1>\ud83d\ude80 How to Use<\/h1>\n<p>Clone the repo and install dependencies. Start the Ollama server with <code>ollama serve<\/code>, then pull the model with <code>ollama pull llama3.2-vision<\/code>. Run the tool with <code>poetry run python main.py --source \/path\/to\/source --output \/path\/to\/output --type pdf<\/code>, and review the results in clean Markdown format, including extracted text and image descriptions. 
<\/p>\n<h1>\ud83d\udca1 Why Share?<\/h1>\n<p>This is a work in progress, and I\u2019d love your input to:<\/p>\n<ul>\n<li>Improve features and functionality<\/li>\n<li>Test with different use cases<\/li>\n<li>Compare image descriptions from models<\/li>\n<li>Suggest new ideas or report bugs<\/li>\n<\/ul>\n<h1>\ud83d\udcc2 Repo &amp; Contribution<\/h1>\n<p>GitHub: <a href=\"https:\/\/github.com\/MDGrey33\/content-extractor-with-vision\">Content Extractor with Vision LLM<\/a><\/p>\n<p>Feel free to open issues, create pull requests, or fork the repo for your own projects.<\/p>\n<h1>\ud83e\udd1d Let\u2019s Collaborate!<\/h1>\n<p>This tool has a lot of potential, and with your help, it can become a robust library for document content extraction and image analysis. Let me know your thoughts, ideas, or any issues you encounter!<\/p>\n<p>Looking forward to your feedback, contributions, and testing results.<\/p>\n<\/div>\n<p><!-- SC_ON -->   submitted by   <a href=\"https:\/\/www.reddit.com\/user\/Electrical-Two9833\"> \/u\/Electrical-Two9833 <\/a> <br \/> <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1hugvz9\/content_extractor_with_vision_llm_open_source\/\">[link]<\/a><\/span>   <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1hugvz9\/content_extractor_with_vision_llm_open_source\/\">[comments]<\/a><\/span><\/p>","protected":false},"excerpt":{"rendered":"<p>I\u2019m excited to share Content Extractor with Vision LLM, an open-source Python tool that extracts content 
from&#8230;<\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[85],"tags":[],"class_list":["post-32146","post","type-post","status-publish","format-standard","hentry","category-datatards","wpcat-85-id"],"_links":{"self":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/32146","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/comments?post=32146"}],"version-history":[{"count":0,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/32146\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/media?parent=32146"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/categories?post=32146"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/tags?post=32146"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}