{"id":32807,"date":"2025-02-26T20:27:27","date_gmt":"2025-02-26T19:27:27","guid":{"rendered":"https:\/\/www.graviton.at\/letterswaplibrary\/you-can-now-train-your-own-reasoning-model-with-just-5gb-vram\/"},"modified":"2025-02-26T20:27:27","modified_gmt":"2025-02-26T19:27:27","slug":"you-can-now-train-your-own-reasoning-model-with-just-5gb-vram","status":"publish","type":"post","link":"https:\/\/www.graviton.at\/letterswaplibrary\/you-can-now-train-your-own-reasoning-model-with-just-5gb-vram\/","title":{"rendered":"You Can Now Train Your Own Reasoning Model With Just 5GB VRAM"},"content":{"rendered":"<p><!-- SC_OFF --><\/p>\n<div class=\"md\">\n<p>Hey amazing people! First post here! Today, I&#8217;m excited to announce that you can now train your own reasoning model with just 5GB VRAM for Qwen2.5 (1.5B) using our open-source project Unsloth: <a href=\"https:\/\/github.com\/unslothai\/unsloth\">https:\/\/github.com\/unslothai\/unsloth<\/a> <\/p>\n<p>GRPO is the algorithm behind DeepSeek-R1 and how it was trained. You need a dataset with about 500 rows in question, answer pairs and a reward function and you can then start the whole process!<\/p>\n<p>This allows any open LLM like Llama, Mistral, Phi etc. to be converted into a reasoning model with chain-of-thought process. The best part about GRPO is it doesn&#8217;t matter if you train a small model compared to a larger model as you can fit in more faster training time compared to a larger model so the end result will be very similar! You can also leave GRPO training running in the background of your PC while you do other things!<\/p>\n<p>  Due to our newly added Efficient GRPO algorithm, this enables <strong>10x longer context<\/strong> lengths while using <strong>90% less VRAM<\/strong> vs. every other GRPO LoRA\/QLoRA (fine-tuning) implementations with 0 loss in accuracy. With a standard GRPO setup, Llama 3.1 (8B) training at 20K context length demands 510.8GB of VRAM. 
However, Unsloth\u2019s 90% VRAM reduction brings the requirement down to just 54.3GB for the same setup. We leverage our <a href=\"https:\/\/unsloth.ai\/blog\/long-context\">gradient checkpointing<\/a> algorithm, which we released a while ago. It smartly offloads intermediate activations to system RAM asynchronously while being only 1% slower. This alone <strong>shaves a whopping 372GB of VRAM<\/strong>, since we need num_generations = 8. We can reduce memory usage even further through intermediate gradient accumulation. Use our GRPO notebook with 10x longer context on Google&#8217;s free GPUs: <a href=\"https:\/\/colab.research.google.com\/github\/unslothai\/notebooks\/blob\/main\/nb\/Llama3.1_(8B)-GRPO.ipynb\">Llama 3.1 (8B) on Colab<\/a><\/p>\n<p>Our blog has more details on the algorithm, the maths behind GRPO, issues we found and more: <a href=\"https:\/\/unsloth.ai\/blog\/grpo\">https:\/\/unsloth.ai\/blog\/grpo<\/a><\/p>\n<p>GRPO VRAM breakdown:<\/p>\n<table>\n<tr><th>Metric<\/th><th>Unsloth<\/th><th>TRL + FA2<\/th><\/tr>\n<tr><td>Training Memory Cost (GB)<\/td><td>42GB<\/td><td>414GB<\/td><\/tr>\n<tr><td>GRPO Memory Cost (GB)<\/td><td>9.8GB<\/td><td>78.3GB<\/td><\/tr>\n<tr><td>Inference Cost (GB)<\/td><td>0GB<\/td><td>16GB<\/td><\/tr>\n<tr><td>Inference KV Cache for 20K context (GB)<\/td><td>2.5GB<\/td><td>2.5GB<\/td><\/tr>\n<tr><td>Total Memory Usage<\/td><td><strong>54.3GB (90% less)<\/strong><\/td><td><strong>510.8GB<\/strong><\/td><\/tr>\n<\/table>\n<p>We also spent a lot of time on our guide (with pics) covering everything about GRPO plus reward functions\/verifiers, so I&#8217;d highly recommend you read it: <a href=\"https:\/\/docs.unsloth.ai\/basics\/reasoning-grpo-and-rl\/tutorial-train-your-own-reasoning-model-with-grpo\">docs.unsloth.ai\/basics\/reasoning<\/a><\/p>\n<p>Thank you so so much for reading! 
\ud83d\ude00<\/p>\n<\/div>\n<p><!-- SC_ON -->   submitted by   <a href=\"https:\/\/www.reddit.com\/user\/yoracale\"> \/u\/yoracale <\/a> <br \/> <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1iyv8oh\/you_can_now_train_your_own_reasoning_model_with\/\">[link]<\/a><\/span>   <span><a href=\"https:\/\/www.reddit.com\/r\/datasets\/comments\/1iyv8oh\/you_can_now_train_your_own_reasoning_model_with\/\">[comments]<\/a><\/span><\/p>","protected":false},"excerpt":{"rendered":"<p>Hey amazing people! First post here! 
Today, I&#8217;m excited to announce that you can now train your&#8230;<\/p>\n","protected":false},"author":0,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[85],"tags":[],"class_list":["post-32807","post","type-post","status-publish","format-standard","hentry","category-datatards","wpcat-85-id"],"_links":{"self":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/32807","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/comments?post=32807"}],"version-history":[{"count":0,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/posts\/32807\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/media?parent=32807"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/categories?post=32807"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.graviton.at\/letterswaplibrary\/wp-json\/wp\/v2\/tags?post=32807"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}