Sentence Semantic Similarity Datasets With Similarity Scores

I'm new to DL projects. I've been searching for a dataset with at least three columns: sentence1, sentence2, and their semantic similarity score. So far I've found the SICK dataset and SNLI, but something else would be more suitable for my task. Do you know of any other datasets like this?

Basically, I'm trying to build a system that searches a video transcript for the sentence most similar to a query. Suppose you have a podcast video: you take its subtitles, run a query, and it gives you the timestamp of the most similar sentence. For that, I'll grab a BERT model and fine-tune it on a semantic similarity dataset. It would be good if the dataset were based on a certain style, topic, or domain, for example sentences about technology, animal documentaries, or human conversation. Basically anything.
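To make the idea concrete, here is a rough sketch of the search step. It is not the real thing: `embed` is a hypothetical stand-in that builds bag-of-words count vectors so the example runs with only the standard library, and the transcript rows and timestamps are made up. In the actual system, `embed` would be replaced by the fine-tuned BERT encoder, while the cosine-similarity ranking would stay the same.

```python
import math
import re
from collections import Counter


def embed(sentence):
    """Hypothetical stand-in for a sentence encoder.

    Returns a bag-of-words count vector; a fine-tuned BERT model
    would return a dense vector here instead.
    """
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, transcript, top_k=1):
    """Rank (start_seconds, text) subtitle rows by similarity to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), start, text) for start, text in transcript]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Made-up subtitle rows: (start time in seconds, sentence).
transcript = [
    (0.0, "welcome to the show everyone"),
    (4.5, "today we talk about training neural networks"),
    (9.2, "my dog loves long walks in the park"),
]

best = search("training neural networks", transcript)
score, start, text = best[0]
print(f"{start:.1f}s  {text}  (score={score:.2f})")
```

A word-overlap vector like this only matches sentences that share exact words with the query, which is the gap a model fine-tuned on sentence pairs with similarity scores is meant to close.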

submitted by /u/Deferfire