Boredom Central - [Update] Emotionally-Aware VN Dialogue Dataset – Deep Context Tagging, ShareGPT-Style Structure

Hey again, everyone. Following up on my earlier posts about converting a visual novel script into a fine-tuning dataset, I've gone back and improved the format significantly thanks to feedback here.

The goal is the same: create expressive, roleplay-friendly dialogue data that captures emotion, tone, character personality, and nuance, especially for dere-type characters and NSFW/SFW variation.

Vol. 0 is SFW only.

• What’s New:

Improved JSON structure, closer to ShareGPT format

More consistent tone/emotion tagging

Added deeper context awareness (4 lines before/after)

Preserved expressive elements (onomatopoeia, stutters, laughs)

Categorized dere-type and added voice/personality cues
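The "4 lines before/after" context window can be sketched roughly like this. This is a minimal sketch, assuming the script has already been split into a flat list of dialogue lines. The function name `with_context` and the `before`/`after` keys are hypothetical, not the dataset's actual field names.

```python
from typing import Dict, List


def with_context(lines: List[str], window: int = 4) -> List[Dict[str, object]]:
    """Pair every line with up to `window` neighbouring lines on each side.

    Hypothetical helper: the key names are illustrative, not the dataset's schema.
    """
    records = []
    for i, line in enumerate(lines):
        records.append({
            "line": line,
            # max(0, ...) stops the slice from wrapping around to the end of the list
            "before": lines[max(0, i - window):i],
            # slicing past the end is safe in Python; it just returns fewer items
            "after": lines[i + 1:i + 1 + window],
        })
    return records


script = [f"line {n}" for n in range(10)]
recs = with_context(script)
print(recs[0]["before"], recs[0]["after"])  # [] ['line 1', 'line 2', 'line 3', 'line 4']
print(recs[5]["before"])                    # ['line 1', 'line 2', 'line 3', 'line 4']
```

Lines near the start or end of a scene simply get a shorter window rather than padding, which keeps the model from seeing invented context.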

• Why?

Because tagging a line as just "laughing" misses everything. Was it sarcasm? Pain? Joy? I want models to understand motivation and emotional flow, not just parrot words.

Example (the same line as before, to show the improvement):

Flat version:

{
  "instruction": "What does Maple say?",
  "output": "Oopsie! I accidentally splashed some hot water on you! Sorry about that~ Ahahah-- Owwww!!",
  "metadata": { "character": "Maple", "emotion": "laughing", "tone": "apologetic" }
}

• Updated version with context:

[
  { "from": "char_metadata", "value": { "character_name": "Azuki", "persona": "Azuki is a fiery, tomboyish...", "dere_type": "tsundere", "current_emotion": "mocking, amused, pain", "tone": "taunting, surprised" } },
  { "from": "char", "value": "You're a NEET catgirl who can only eat, sleep, and play! Huehuehueh, whooaaa!! Aagh, that's hotttt!!!" },
  { "from": "char_metadata", "value": { "character_name": "Maple", "persona": "Maple is a prideful, sophisticated catgirl...", "dere_type": "himidere", "current_emotion": "malicious glee, feigned innocence, pain", "tone": "sarcastic, surprised" } },
  { "from": "char", "value": "Oopsie! I accidentally splashed some hot water on you! Sorry about that~ Ahahah-- Owwww!!" },
  { "from": "char_metadata", "value": { "character_name": "Azuki", "persona": "Azuki is a fiery, tomboyish...", "dere_type": "tsundere", "current_emotion": "retaliatory, gleeful", "tone": "sarcastic" } },
  { "from": "char", "value": "Heh, my bad! My paw just flew right at'cha! Hahaha!" }
]
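Going from per-line records to this turn list is mostly mechanical: each line becomes a `char_metadata` turn followed by a `char` turn. The following is a minimal sketch, assuming each source record holds a `meta` dict and a `text` string (both hypothetical input keys). Wrapping the list under a `conversations` key follows the common ShareGPT convention, which is also an assumption here rather than something the post specifies.

```python
import json
from typing import Any, Dict, List


def to_turns(records: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Expand each (metadata, line) record into a char_metadata turn followed by a char turn."""
    turns: List[Dict[str, Any]] = []
    for rec in records:
        turns.append({"from": "char_metadata", "value": rec["meta"]})
        turns.append({"from": "char", "value": rec["text"]})
    return turns


# Hypothetical input shape: one record per spoken line.
records = [
    {"meta": {"character_name": "Maple", "dere_type": "himidere",
              "current_emotion": "malicious glee, feigned innocence, pain",
              "tone": "sarcastic, surprised"},
     "text": "Oopsie! I accidentally splashed some hot water on you! Sorry about that~ Ahahah-- Owwww!!"},
    {"meta": {"character_name": "Azuki", "dere_type": "tsundere",
              "current_emotion": "retaliatory, gleeful", "tone": "sarcastic"},
     "text": "Heh, my bad! My paw just flew right at'cha! Hahaha!"},
]

# "conversations" is the usual ShareGPT wrapper key (assumed, not from the post).
conversation = {"conversations": to_turns(records)}
print(json.dumps(conversation, ensure_ascii=False, indent=2))
```

`ensure_ascii=False` keeps non-ASCII characters (tildes, Japanese text, etc.) readable in the output file instead of escaping them.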

• Outcome

This dataset now lets a model:

Match dere-type voices with appropriate phrasing

Preserve emotional realism in both SFW and NSFW contexts

Move beyond basic emotion labels to expressive patterns (tsundere teasing, onomatopoeia, flustered laughter, etc.)
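One way to use the per-line metadata is to pull out every line for a given dere type, for example to check that tsundere lines actually read differently from himidere ones. Here is a minimal sketch, assuming turns follow the `char_metadata`-then-`char` pairing shown in the updated example. The helper `lines_by_dere_type` is hypothetical and doubles as a basic structure check.

```python
from collections import defaultdict
from typing import Any, Dict, List


def lines_by_dere_type(turns: List[Dict[str, Any]]) -> Dict[str, List[str]]:
    """Group each "char" line under the dere_type of the metadata turn right before it."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    meta = None
    for turn in turns:
        if turn["from"] == "char_metadata":
            meta = turn["value"]
        elif turn["from"] == "char":
            if meta is None:
                raise ValueError("char turn without a preceding char_metadata turn")
            grouped[meta.get("dere_type", "unknown")].append(turn["value"])
            meta = None  # each metadata block describes exactly one line
    return dict(grouped)


turns = [
    {"from": "char_metadata", "value": {"character_name": "Azuki", "dere_type": "tsundere"}},
    {"from": "char", "value": "You're a NEET catgirl who can only eat, sleep, and play! Huehuehueh, whooaaa!! Aagh, that's hotttt!!!"},
    {"from": "char_metadata", "value": {"character_name": "Maple", "dere_type": "himidere"}},
    {"from": "char", "value": "Oopsie! I accidentally splashed some hot water on you! Sorry about that~ Ahahah-- Owwww!!"},
    {"from": "char_metadata", "value": {"character_name": "Azuki", "dere_type": "tsundere"}},
    {"from": "char", "value": "Heh, my bad! My paw just flew right at'cha! Hahaha!"},
]

print(lines_by_dere_type(turns))
```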

It's still a work in progress (currently ~3 MB of dialogue, not yet converted to JSON, and still growing), and more feedback is welcome. I just wanted to share the next step now that the format is finally usable and consistent.

submitted by /u/Akowmako