Sentence semantic similarity dataset with similarity scores

I'm new to DL projects. I've been searching for a dataset with at least three columns: sentence1, sentence2, and their semantic similarity score. So far I've found the SICK dataset and SNLI, but something else would be more suitable for my task. Do you know of any other datasets like this?

Basically, I'm trying to build a system that searches a video transcript for the sentence most similar to a query. Suppose you have a podcast video: you take its subtitles, run a query, and it gives you the timestamp of the most similar sentence. For that, I'll grab a BERT model and fine-tune it on a semantic similarity dataset. It would be good if the dataset were based on a certain style, topic, or domain, for example sentences about technology, animal documentaries, or human conversation, or basically anything.
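To make the retrieval step concrete, here's a rough sketch of what I have in mind. Everything in it is an assumption for illustration: the `(start_seconds, text)` transcript layout, the function names, and especially `embed`, which is just a bag-of-words stand-in so the snippet runs without any downloads. The fine-tuned BERT encoder would replace it.

```python
import math
import re
from collections import Counter


def embed(sentence):
    """Placeholder encoder (assumption): bag-of-words token counts.

    In the real system this would return the fine-tuned BERT sentence vector.
    """
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, transcript, top_k=1):
    """Return the top_k (score, start_seconds, text) tuples, best first."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), start, text) for start, text in transcript]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Hypothetical subtitle lines as (start time in seconds, text).
transcript = [
    (0.0, "Welcome back to the podcast everyone"),
    (12.5, "Today we talk about training neural networks"),
    (47.0, "Elephants can recognise themselves in a mirror"),
]

best = search("tips for training neural networks", transcript)
score, start, text = best[0]
print(f"{start:.1f}s  {text}")
```

Only `embed` has to change once the model is fine-tuned; the ranking by cosine similarity and the timestamp lookup stay the same.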

submitted by /u/Deferfire
