Hi all! I’m building an application that automatically quizzes you on textual datasets. So far things are working brilliantly, but I’m running into an issue: I want to remove words that are “uninteresting” for quizzing. My problem is precisely that I don’t know how to describe them, so I don’t know what to look up. I’ll show an example instead.
“The mitochondria is the powerhouse of the cell”
If I had a simple fill-in-the-blanks question, I’d want to avoid blanking “the”, “is”, and “of”, as that would make for a very boring quiz question. I’m not a linguist, but from my rudimentary knowledge I don’t know of any single linguistic term that covers these words; they aren’t all just prepositions, for example.
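To make the goal concrete, here is a rough Python sketch of the behaviour I’m after. The `UNINTERESTING` set is a hand-picked placeholder standing in for the dataset I’m hoping exists, and the function names are made up purely for illustration; this is not my actual code.

```python
import re

# Hypothetical, hand-picked placeholder list; a proper dataset would replace it.
UNINTERESTING = {"the", "is", "of", "a", "an", "and"}

def blank_candidates(sentence):
    """Return the words in `sentence` that are worth blanking out."""
    words = re.findall(r"[A-Za-z']+", sentence)
    return [w for w in words if w.lower() not in UNINTERESTING]

def make_questions(sentence):
    """Build one (question, answer) pair per candidate word."""
    questions = []
    for word in blank_candidates(sentence):
        # Blank only the first whole-word occurrence of the candidate.
        pattern = r"\b" + re.escape(word) + r"\b"
        questions.append((re.sub(pattern, "_____", sentence, count=1), word))
    return questions

for question, answer in make_questions(
    "The mitochondria is the powerhouse of the cell"
):
    print(question, "->", answer)
```

With the sentence above, only “mitochondria”, “powerhouse”, and “cell” get blanked, which is the behaviour I want; the open question is where a principled version of that word list comes from.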
Best case, someone already knows of a dataset of such words that I can use, but I would really appreciate any help, even just pointers on what to look up on this topic.
I hope this is appropriate to ask here; if not, forgive me, and I’ll happily take recommendations for where else to ask!
Many thanks
submitted by /u/langers8