Dataset Suggestions for These Requirements

Hey guys. I'm starting work on my university project for my Fundamentals of Artificial Intelligence class. I would really appreciate it if you could suggest datasets that meet these requirements:

“Select a dataset that is suitable for a classification task. The student must avoid selecting the Iris dataset or the Palmer Archipelago (Antarctica) penguin dataset. In addition, the meaningfulness of the classification has to be considered, e.g. it is meaningless to classify continents by the number of Covid-19 cases because, first, there are only six continents and new ones will not appear soon, second, the number of Covid-19 cases is not a defining characteristic of continents;

• it is preferable to select a dataset that is already given in the format of a .csv datafile;
• the dataset should be well-documented (there should be information about who created the set, when and what the data source is);
• the dataset should be of reasonable size (at least 200 data objects);
• the dataset should be deeply annotated (there should be information about which features are stored and what they mean);
• the number of features should be between 5-15;
• the dataset should be labelled;
• the student must avoid datasets with many Boolean (true/false, 1/0, etc.) or categorical type feature (attribute) values. It is preferable to use datasets in which most of the attributes are represented by continuous attribute values;
• you should avoid datasets of unlabelled data (e.g. text corpora and raw images)”
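In case it helps anyone screening candidates, here is a rough sketch of how the measurable parts of that checklist (row count, feature count, mostly continuous features, labels present) could be checked on a CSV using only the Python standard library. The function names `load_rows` and `check_dataset` and the "more than two distinct numeric values" rule for calling a feature continuous are my own assumptions, not part of the assignment. The heuristic will also misjudge integer-coded categories, and documentation quality still has to be checked by hand.

```python
import csv


def is_number(value):
    """Return True if the raw CSV cell parses as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by column name."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def check_dataset(rows, label_column):
    """Check parsed rows against the numeric criteria of the assignment.

    Assumption: a feature counts as continuous when every value is numeric
    and it takes more than two distinct values, so 0/1 flags are excluded.
    """
    features = [name for name in rows[0] if name != label_column]
    continuous = [
        name
        for name in features
        if all(is_number(row[name]) for row in rows)
        and len({row[name] for row in rows}) > 2
    ]
    return {
        "enough_rows": len(rows) >= 200,
        "feature_count_ok": 5 <= len(features) <= 15,
        "mostly_continuous": len(continuous) > len(features) / 2,
        "labelled": all(row[label_column] not in ("", None) for row in rows),
    }
```

Usage would be something like `check_dataset(load_rows("candidate.csv"), "target")`, where the file name and the label column name are placeholders for whatever dataset is being considered.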

submitted by /u/kktsrvii