A Workflow for Generating Labeled Object-Detection Datasets Without Manual Annotation (Experiment / Feedback Wanted)

I’m experimenting with using prompt-based object detection (open-vocabulary / vision-language models) as a way to auto-generate training datasets for downstream models like YOLO.

Instead of fixed classes, the detector takes any text prompt (e.g. “white Toyota Corolla”, “people wearing safety helmets”, “parked cars near sidewalks”) and outputs bounding boxes. Those detections are then exported as YOLO-format annotations to train a specialized model.
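
To make the export step concrete, here is a minimal sketch of the conversion I mean. The `detect(image_path, prompt)` call in the usage comment is a placeholder for whatever open-vocabulary detector you run; it's assumed to return absolute-pixel boxes with confidence scores, which then get normalized into YOLO's `class x_center y_center width height` label format:

```python
from pathlib import Path
from PIL import Image

def detections_to_yolo(image_path, detections, class_id, out_dir, min_conf=0.35):
    """Write one YOLO-format .txt label file for an image.

    `detections` is assumed to be a list of (x_min, y_min, x_max, y_max, score)
    in absolute pixels -- whatever the prompt-based detector returns.
    YOLO expects: class_id x_center y_center width height, all normalized to [0, 1].
    """
    img_w, img_h = Image.open(image_path).size
    lines = []
    for x_min, y_min, x_max, y_max, score in detections:
        if score < min_conf:  # drop low-confidence boxes before they pollute training
            continue
        x_c = (x_min + x_max) / 2 / img_w
        y_c = (y_min + y_max) / 2 / img_h
        w = (x_max - x_min) / img_w
        h = (y_max - y_min) / img_h
        lines.append(f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}")

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / (Path(image_path).stem + ".txt")).write_text("\n".join(lines))

# Usage sketch: label every image in a folder for one prompt / one class id.
# `detect()` is hypothetical -- swap in your actual model call.
# for img in Path("raw_images").glob("*.jpg"):
#     boxes = detect(img, "people wearing safety helmets")
#     detections_to_yolo(img, boxes, class_id=0, out_dir="labels")
```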

Observations so far:

  • Detection quality is surprisingly high for many niche or fine-grained prompts
  • Works well as a bootstrapping or data expansion step
  • Inference is expensive and not suitable for real-time use; this is strictly a dataset creation / offline pipeline idea

I’m trying to evaluate:

  • How usable these auto-generated labels are in practice
  • Where they fail compared to human-labeled data (rough scoring sketch after this list)
  • Whether people would trust this for pretraining or rapid prototyping
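
On the "where do they fail" question, my current plan is to hand-label a small validation slice and score the auto-generated boxes against it with simple IoU matching. A rough sketch of that comparison, assuming both label sets are in the same normalized YOLO format and using greedy one-to-one matching at IoU ≥ 0.5 (not a full mAP evaluation, just a quick per-image precision/recall check):

```python
def iou(a, b):
    """IoU of two boxes given as (x_center, y_center, w, h), YOLO-normalized."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    ix = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    iy = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def score_auto_labels(auto_boxes, human_boxes, thresh=0.5):
    """Greedily match auto boxes to human boxes; return (precision, recall) for one image."""
    matched = set()
    tp = 0
    for ab in auto_boxes:
        best_j, best_iou = None, thresh
        for j, hb in enumerate(human_boxes):
            if j in matched:
                continue
            v = iou(ab, hb)
            if v >= best_iou:
                best_j, best_iou = j, v
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    precision = tp / len(auto_boxes) if auto_boxes else 1.0
    recall = tp / len(human_boxes) if human_boxes else 1.0
    return precision, recall
```

Aggregating these per-image numbers over a few hundred hand-checked images should give a decent picture of whether the auto labels mostly miss objects (low recall) or hallucinate them (low precision).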

Demo / tool I’m using for the experiment (please don’t hammer it; it will crash if bombarded with requests):

Detect Anything

I’m mainly looking for feedback, edge cases, and pointers to similar projects. If you’ve tried similar approaches before, I’d be very interested to hear what worked (or didn’t).

