Multilingual SMS Dataset for an Application, but I Can't Find One

I’m looking for a multilingual SMS dataset for an application, but I can’t find one

Hello, as mentioned in the title, I’m looking for an SMS dataset. I found a few, but these have the following issues:

Critical Issues:

Class Imbalance – Ham: 4,825 (86.59%) | Spam: 747 (13.41%) → 6.46:1

~440 duplicates in each language (7.5-8%)
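
For anyone who wants to reproduce the two critical numbers on their own copy of a dataset, here is a minimal sketch. It assumes the data is already loaded as a list of (label, text) pairs; the function name `imbalance_and_duplicates` and the light normalization (lowercasing, collapsing whitespace) are my own choices for illustration, not something taken from the dataset's tooling.

```python
from collections import Counter


def imbalance_and_duplicates(rows):
    """Summarize class balance and duplicate texts.

    rows: list of (label, text) pairs (assumed input shape).
    """
    labels = Counter(label for label, _ in rows)
    total = sum(labels.values())
    # Percentage share of each class, rounded to two decimals.
    shares = {label: round(100 * n / total, 2) for label, n in labels.items()}
    # Majority-to-minority ratio, e.g. 6.46 means 6.46:1.
    ratio = round(max(labels.values()) / min(labels.values()), 2)
    # Treat texts as duplicates if they match after lowercasing
    # and collapsing whitespace (an assumption; stricter or looser
    # matching may suit other data better).
    seen = Counter(" ".join(text.lower().split()) for _, text in rows)
    duplicates = sum(n - 1 for n in seen.values() if n > 1)
    return {
        "counts": dict(labels),
        "shares": shares,
        "ratio": ratio,
        "duplicates": duplicates,
    }


# Sanity check against the counts quoted above (4,825 vs 747).
print(round(4825 / 747, 2))                 # 6.46
print(round(100 * 747 / (4825 + 747), 2))   # 13.41
```

Running the function separately on each language column gives a per-language duplicate count to compare against the ~440 figure.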

🟡 Medium-Level Issues:

Weak Hindi translation – Mixed characters, poor transcription

Wide length distribution – Especially in Hindi (max: 1406!)

Very short messages – Especially in Hindi (95 instances)
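
The medium-level checks above can be sketched in a similar way. This is a rough illustration only: the thresholds `min_len=5` and `max_len=500` are hypothetical values I picked, and script detection via Unicode character names is a coarse heuristic. A Hindi message that mixes Devanagari and Latin letters may be perfectly legitimate (brand names, URLs), so the flag only marks a message for manual review.

```python
import unicodedata


def scripts_used(text):
    """Return the coarse set of scripts found among the letters of text.

    Only DEVANAGARI and LATIN are checked here (an assumption that
    fits a Hindi/English dataset; extend the tuple for other languages).
    """
    found = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            for script in ("DEVANAGARI", "LATIN"):
                if name.startswith(script):
                    found.add(script)
    return found


def flag_message(text, min_len=5, max_len=500):
    """Return a list of quality flags for one message.

    min_len and max_len are hypothetical thresholds in characters.
    """
    flags = []
    n = len(text.strip())
    if n < min_len:
        flags.append("too_short")
    if n > max_len:
        flags.append("too_long")
    if len(scripts_used(text)) > 1:
        flags.append("mixed_script")
    return flags


print(flag_message("ok"))                   # ['too_short']
print(flag_message("नमस्ते hello दोस्त"))      # ['mixed_script']
```

Counting how many rows receive each flag per language would show whether the problems are really concentrated in the Hindi column.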

How can I find datasets without these issues?

submitted by /u/Extension-Onion2310