I’m looking for large free-text datasets to train a model that will identify and redact sensitive data. It would be awesome if they were already annotated/labeled. Some entity types I’m interested in:
Location, email, name, credit card number (CC), CVV, card expiration date (Exp), date, product, username, password, passport number, time.
Anything helps.
submitted by /u/tombenom