Looking for a Reliable Reddit Dataset

I am working on a project on representation learning for large-scale dynamic networks, and I am looking for a reliable Reddit dataset. The data should include post_id, user_id, time, and comment, so that I can build a graph with users as nodes and add an edge between two users whenever they comment on the same post.
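To make the graph construction concrete, here is a minimal sketch of the idea. It assumes the data arrives as rows of (post_id, user_id, time, comment); the function name `build_co_comment_edges` and the sample rows are made up for illustration and do not come from any real dataset.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_co_comment_edges(comments):
    """Return a Counter mapping (user_a, user_b) -> number of posts both commented on.

    `comments` is an iterable of (post_id, user_id, time, comment) rows.
    This row layout is an assumption for the sketch, not a real dataset schema.
    """
    # Collect the distinct users who commented on each post.
    users_by_post = defaultdict(set)
    for post_id, user_id, _time, _comment in comments:
        users_by_post[post_id].add(user_id)

    # Every pair of users on the same post gets an undirected edge.
    edges = Counter()
    for users in users_by_post.values():
        # sorted() gives each undirected edge a single canonical key.
        for user_a, user_b in combinations(sorted(users), 2):
            edges[(user_a, user_b)] += 1
    return edges


# Made-up sample rows, for illustration only.
sample = [
    ("p1", "alice", 1, "first"),
    ("p1", "bob", 2, "reply"),
    ("p1", "alice", 3, "again"),
    ("p2", "bob", 4, "hi"),
    ("p2", "carol", 5, "hello"),
    ("p2", "alice", 6, "hey"),
]
edges = build_co_comment_edges(sample)
for (user_a, user_b), weight in sorted(edges.items()):
    print(user_a, user_b, weight)
```

The edge weight here counts shared posts, which is only one possible choice. The sketch also drops the timestamps, so a dynamic network would need to keep them on each edge, and very popular posts produce a number of pairs that grows quadratically with the number of commenters.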

The overall goal of the project is representation learning: I want to learn good representations in order to perform community search on a graph built from social network data. I want to use Reddit data for this, and I have some requirements for it. The dataset should let me treat users as nodes and build edges between different users who commented on the same post. I have tried a few datasets, but none of them meet my needs. Does anyone have a link to a Reddit dataset that does? Here is what I have tried:

https://github.com/dingidng/reddit-dataset (I can only create a few edges from this data, which does not give a meaningful graph)
https://snap.stanford.edu/graphsage/#datasets (the nodes are not users)

I also have a problem using Pushshift to access Reddit data. Whenever I submit a request for access, it is automatically rejected by the bot. Does anyone know how to get access permission and use Pushshift to retrieve the data?
https://pushshift.io/signup

This is my first time posting for help. Thank you for any help you can provide!

submitted by /u/Terrible_Band6290