Hey guys. I am starting work on my university project for my Fundamentals of Artificial Intelligence class. I would really appreciate it if you could suggest datasets that meet these requirements:
“Select a dataset that is suitable for a classification task. The student must avoid selecting the Iris dataset or the
Palmer Archipelago (Antarctica) penguin dataset. In addition, the meaningfulness of the classification has to be
considered, e.g. it is meaningless to classify continents by the number of Covid-19 cases because, first, there are
only six continents and new ones will not appear soon, second, the number of Covid-19 cases is not a
defining characteristic of continents;
• it is preferable to select a dataset that is already given in the format of a .csv datafile;
• the dataset should be well-documented (there should be information about who created the set, when and what
the data source is);
• the dataset should be of reasonable size (at least 200 data objects);
• the dataset should be deeply annotated (there should be information about which features are stored and what
they mean);
• the number of features should be between 5 and 15;
• the dataset should be labelled;
• the student must avoid datasets with many Boolean (true/false, 1/0, etc.) or categorical type feature (attribute)
values. It is preferable to use datasets in which most of the attributes are represented by continuous attribute
values;
• you should avoid datasets of unlabelled data (e.g. text corpora and raw images)”
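
In case it helps anyone suggesting datasets, here is a rough sketch of how the measurable requirements above (at least 200 objects, 5 to 15 features, labelled, mostly continuous attributes) could be checked for a candidate .csv file. The function name `check_dataset`, the `label_column` parameter, and the "more than 10 distinct numeric values means continuous" rule are my own assumptions, not part of the assignment. The documentation and annotation requirements still have to be checked by hand.

```python
import csv
import io


def is_number(value):
    """Return True if the value parses as a float."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def check_dataset(csv_text, label_column):
    """Check CSV text against the size, feature-count, and feature-type criteria."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("dataset has no rows")
    if label_column not in rows[0]:
        raise ValueError("label column not found")

    # Every column except the label counts as a feature.
    features = [name for name in rows[0] if name != label_column]

    continuous = []
    for name in features:
        values = [row[name] for row in rows]
        # Heuristic (an assumption): treat a feature as continuous if every
        # value is numeric and it takes more than 10 distinct values.
        if all(is_number(v) for v in values) and len(set(values)) > 10:
            continuous.append(name)

    return {
        "enough_rows": len(rows) >= 200,
        "feature_count_ok": 5 <= len(features) <= 15,
        "mostly_continuous": len(continuous) > len(features) / 2,
        "labelled": all(row[label_column] not in ("", None) for row in rows),
    }
```

For a file on disk, something like `check_dataset(open("candidate.csv").read(), "label")` should work, where `candidate.csv` and `label` are placeholder names for the file and its class column.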
submitted by /u/kktsrvii