A Workflow for Generating Labeled Object-Detection Datasets Without Manual Annotation (Experiment / Feedback Wanted)

I’m experimenting with using prompt-based object detection (open-vocabulary / vision-language models) as a way to auto-generate training datasets for downstream models like YOLO.

Instead of fixed classes, the detector takes any text prompt (e.g. “white Toyota Corolla”, “people wearing safety helmets”, “parked cars near sidewalks”) and outputs bounding boxes. Those detections are then exported as YOLO-format annotations to train a specialized model.
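
To make the export step concrete, here is a minimal sketch of the conversion I mean. The `detect(image_path, prompt)` call in the usage comment is a placeholder for whatever open-vocabulary detector you run; it's assumed to return absolute-pixel boxes with confidence scores, which then get normalized into YOLO's `class x_center y_center width height` label format:

```python
from pathlib import Path
from PIL import Image

def detections_to_yolo(image_path, detections, class_id, out_dir, min_conf=0.35):
    """Write one YOLO-format .txt label file for an image.

    `detections` is assumed to be a list of (x_min, y_min, x_max, y_max, score)
    in absolute pixels -- whatever the prompt-based detector returns.
    YOLO expects: class_id x_center y_center width height, all normalized to [0, 1].
    """
    img_w, img_h = Image.open(image_path).size
    lines = []
    for x_min, y_min, x_max, y_max, score in detections:
        if score < min_conf:  # drop low-confidence boxes before they pollute training
            continue
        x_c = (x_min + x_max) / 2 / img_w
        y_c = (y_min + y_max) / 2 / img_h
        w = (x_max - x_min) / img_w
        h = (y_max - y_min) / img_h
        lines.append(f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / (Path(image_path).stem + ".txt")).write_text("\n".join(lines))

# Usage sketch: label every image in a folder for one prompt / one class id.
# `detect()` is hypothetical -- swap in your actual model call.
# for img in Path("raw_images").glob("*.jpg"):
#     boxes = detect(img, "people wearing safety helmets")
#     detections_to_yolo(img, boxes, class_id=0, out_dir="labels")
```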

Observations so far:

  • Detection quality is surprisingly high for many niche or fine-grained prompts
  • Works well as a bootstrapping or data expansion step
  • Inference is expensive and not suitable for real-time use; this is strictly a dataset creation / offline pipeline idea

I’m trying to evaluate:

  • How usable these auto-generated labels are in practice
  • Where they fail compared to human-labeled data (rough scoring sketch after this list)
  • Whether people would trust this for pretraining or rapid prototyping
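
On the "where do they fail" question, my current plan is to hand-label a small validation slice and score the auto-generated boxes against it with simple IoU matching. A rough sketch of that comparison, assuming both label sets are in the same normalized YOLO format and using greedy one-to-one matching at IoU ≥ 0.5 (not a full mAP evaluation, just a quick per-image precision/recall check):

```python
def iou(a, b):
    """IoU of two boxes given as (x_center, y_center, w, h), YOLO-normalized."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    ix = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def score_auto_labels(auto_boxes, human_boxes, thresh=0.5):
    """Greedily match auto boxes to human boxes; return (precision, recall) for one image."""
    matched = set()
    tp = 0
    for ab in auto_boxes:
        best_j, best_iou = None, thresh
        for j, hb in enumerate(human_boxes):
            if j in matched:
                continue
            v = iou(ab, hb)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    precision = tp / len(auto_boxes) if auto_boxes else 1.0
    recall = tp / len(human_boxes) if human_boxes else 1.0
    return precision, recall
```

Aggregating these per-image numbers over a few hundred hand-checked images should give a decent picture of whether the auto labels mostly miss objects (low recall) or hallucinate them (low precision).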

Demo / tool I’m using for the experiment (please don’t hammer it; it will crash if bombarded with requests):

Detect Anything

I’m mainly looking for feedback, edge cases, and pointers to similar projects. If you’ve tried similar approaches before, I’d be very interested to hear what worked (or didn’t).

