Bot detection is relatively straightforward these days (honeypots, timestamps, etc.). But I’m struggling with a different data-quality issue: the “Bored Human.”
These are real people who technically pass the bot checks but select “C” for every answer or type “good” in every text box just to finish.
When cleaning a new dataset, what are your heuristics for flagging these? Do you look for near-zero standard deviation across a respondent's answers (straight-lining), or do you rely on minimum character counts for open text?
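To make the two heuristics concrete, here is a minimal Python sketch of what I have in mind. Everything in it is an assumption for illustration: the function name `flag_bored_respondent`, the thresholds `min_sd=0.5` and `min_chars=10`, and the idea that multiple-choice answers have already been coded as numbers (e.g. A=1 ... E=5). The thresholds are arbitrary starting points, not validated cutoffs.

```python
from statistics import pstdev

def flag_bored_respondent(ratings, texts, min_sd=0.5, min_chars=10):
    """Return heuristic flags for one respondent (hypothetical helper).

    ratings: numerically coded closed-ended answers, e.g. A=1 ... E=5
    texts:   open-text answers as strings
    """
    # Straight-lining: almost no spread across the closed-ended items.
    straight_lined = len(ratings) > 1 and pstdev(ratings) < min_sd

    # Low-effort text: every answer is very short, or all answers are identical.
    cleaned = [t.strip().lower() for t in texts]
    all_short = bool(cleaned) and all(len(t) < min_chars for t in cleaned)
    all_same = len(cleaned) > 1 and len(set(cleaned)) == 1

    return {
        "straight_lined": straight_lined,
        "low_effort_text": all_short or all_same,
    }

# Someone who picked "C" (coded 3) everywhere and typed "good" in every box.
bored = flag_bored_respondent([3, 3, 3, 3, 3], ["good", "good", "good"])
# Someone whose answers vary and whose text has some substance.
engaged = flag_bored_respondent(
    [1, 4, 2, 5, 3],
    ["The onboarding flow was confusing", "More export options please"],
)
print(bored)    # {'straight_lined': True, 'low_effort_text': True}
print(engaged)  # {'straight_lined': False, 'low_effort_text': False}
```

I would treat these as flags for manual review rather than automatic exclusion, since a genuinely neutral respondent can also produce low variance on a short scale.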
submitted by /u/EnergyBrilliant540