Vietnamese Legal Documents — 518K Laws, Decrees & Circulars (1924–2026), Full Text In Markdown

Hi all, I’m releasing a dataset of 518,255 Vietnamese legal documents I collected and processed as a personal research project.

Why it matters: Vietnamese is a low-resource language in the legal NLP space. There’s no comparable open dataset of this scale for Vietnamese law.

What’s inside:

- Document types: Decisions, Official Letters, Resolutions, Circulars, Laws, …
- 2,393 unique issuing authorities
- Full text converted from HTML → Markdown
- Metadata: title, date, legal type, sector tags, issuing body, signers

Two configs (join on id):

- metadata: 9 columns, ~82 MB
- content: full text, ~3.6 GB
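For anyone wondering what the join on id might look like in practice, here is a minimal sketch using the Hugging Face `datasets` library. The config names `metadata` and `content` and the `id` key come from the description above; the split name `"train"` and the helper names `join_on_id` and `load_joined` are assumptions for illustration, so check the dataset card for the actual split and column names.

```python
from typing import Any, Dict, Iterable, Iterator

REPO_ID = "th1nhng0/vietnamese-legal-documents"

Row = Dict[str, Any]


def join_on_id(metadata_rows: Iterable[Row], content_rows: Iterable[Row]) -> Iterator[Row]:
    """Yield one merged row per content row whose id also appears in the metadata."""
    # The metadata config is small (~82 MB), so index it in memory by id.
    meta_by_id = {row["id"]: row for row in metadata_rows}
    # The content rows are consumed one at a time and never held all at once.
    for row in content_rows:
        meta = meta_by_id.get(row["id"])
        if meta is not None:
            yield {**meta, **row}


def load_joined(split: str = "train") -> Iterator[Row]:
    """Load both configs and join them. The split name is an assumption."""
    from datasets import load_dataset  # third-party: pip install datasets

    metadata = load_dataset(REPO_ID, "metadata", split=split)
    # Stream the large config (~3.6 GB) rather than downloading it up front.
    content = load_dataset(REPO_ID, "content", split=split, streaming=True)
    return join_on_id(metadata, content)
```

Calling `next(load_joined())` should give one document with its metadata and full text in a single dict, provided the split name matches. Indexing the small config and streaming the large one keeps memory use roughly at the size of the metadata.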

🔗 https://huggingface.co/datasets/th1nhng0/vietnamese-legal-documents

Happy to answer questions about the collection pipeline!

submitted by /u/Th1nhng0