Hi, I’m doing a master’s thesis on telling bots apart from humans based on their HTTP requests, using machine learning. Right now I have a working prototype based on traffic logs from my university and from honeypots. However, we’re a little limited on human data and fear it won’t be representative of the broader web. Are there any datasets with guaranteed human requests? Preferably ones containing fields such as the User-Agent, status code, protocol version, response size and URI.
Thank you.
submitted by /u/Bottled_Up_DarkPeace