Persistent Temporal Knowledge Graph Datasets

I’m working on a temporal knowledge graph (TKG) model for link prediction and graph generation. Basically, I have snapshots of a persistent knowledge graph over time, each a set of (subject, relation, object) triples, and I want to train the model to autoregressively predict the next graphs over a sequence of timesteps. For training, the model takes in the graph at timestep t and predicts the graph at timestep t+1.
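In case it helps clarify the setup: here is a minimal sketch of how I think of the training pairs, assuming each snapshot is a set of (subject, relation, object) triples (the names `Triple` and `make_training_pairs` are just illustrative, not from my actual code):

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)

def make_training_pairs(
    snapshots: List[Set[Triple]],
) -> List[Tuple[Set[Triple], Set[Triple]]]:
    """Pair each snapshot G_t with its successor G_{t+1} for
    autoregressive next-graph prediction."""
    return [(snapshots[t], snapshots[t + 1]) for t in range(len(snapshots) - 1)]
```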

Unfortunately, I’m running into a pretty severe issue: the model overfits almost immediately, and Hits@K stays basically random.
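For reference, by Hits@K I mean the standard ranking metric: the fraction of test queries where the true entity is ranked within the top K candidates. A minimal sketch (assuming 1-indexed ranks):

```python
from typing import List

def hits_at_k(ranks: List[int], k: int) -> float:
    """Fraction of queries whose true entity is ranked in the top k
    (ranks are 1-indexed: rank 1 means the true entity scored highest)."""
    return sum(1 for r in ranks if r <= k) / len(ranks)
```

With ~500 candidate entities per snapshot, a random-scoring model would get Hits@10 around 10/500 = 0.02, which is roughly what I'm seeing.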

Current dataset:

I’m currently using wikidata12k, a pretty small dataset, which I think may be causing some of the issues. It gives me about 200 knowledge graphs, one for each year from 1800 to 2020, each with about 500 nodes.

I would actually love a bigger dataset, but it has to be in a persistent knowledge graph format, meaning the graph changes slowly over time, so the graph at timestep t is similar to the graph at timestep t+1. This unfortunately rules out a lot of popular TKG datasets like ICEWS, which are event-based rather than persistent.
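To make the "persistent" criterion concrete, this is roughly how I'd check whether a candidate dataset qualifies: compute the Jaccard overlap of triples between consecutive snapshots (a hypothetical helper, not part of any dataset's tooling). A persistent graph like wikidata12k should show high overlap year to year, while an event-based dataset like ICEWS shows near-zero overlap between timesteps:

```python
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

def jaccard(a: Set[Triple], b: Set[Triple]) -> float:
    """Jaccard similarity between two triple sets (1.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def consecutive_overlap(snapshots: List[Set[Triple]]) -> List[float]:
    """Jaccard overlap between each snapshot and the next."""
    return [jaccard(snapshots[t], snapshots[t + 1]) for t in range(len(snapshots) - 1)]
```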

I’ve also looked at YAGO11k, but it suffers from the same lack of scale as wikidata12k.

I’ve made another post in r/learnmachinelearning with details about the architecture and other issues I’m facing, which you can check out if you want more details.

https://www.reddit.com/r/learnmachinelearning/comments/1sjl7ck/temporal_gnn_gat_pernode_lstm_overfitting/

Thank you so much for the help, and I’m happy to answer any additional questions.

submitted by /u/Divine_Invictus
