I’m working on a temporal knowledge graph (TKG) model for link prediction and graph generation. Basically, I have snapshots of a persistent knowledge graph over time, stored as (subject, relation, object) triplets, and I want to train the model to autoregressively predict the graph at each subsequent timestep. For training, it takes in the graph at timestep t and predicts the graph at timestep t+1.
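To make the setup concrete, here's a minimal sketch of what one training step looks like under my framing. The `encode_graph` and `score_triples` methods are stand-ins for whatever the actual architecture does (see my other post for details), not the real implementation:

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, triples_t, triples_t1):
    """One autoregressive step: condition on the graph at t, predict triples at t+1.

    triples_t, triples_t1: LongTensors of shape (num_triples, 3) holding
    (subject, relation, object) index columns for consecutive snapshots.
    """
    optimizer.zero_grad()

    # Encode the snapshot at timestep t into entity representations.
    entity_emb = model.encode_graph(triples_t)          # (num_entities, dim)

    # Score every candidate object for each (subject, relation) query at t+1.
    logits = model.score_triples(entity_emb,
                                 triples_t1[:, 0],      # subjects
                                 triples_t1[:, 1])      # relations
    # logits: (num_queries_t1, num_entities)

    # Standard link-prediction objective: cross-entropy against the true object.
    loss = F.cross_entropy(logits, triples_t1[:, 2])
    loss.backward()
    optimizer.step()
    return loss.item()
```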
Unfortunately, I’m running into a pretty severe issue: the model overfits almost immediately, and Hits@K stays basically random.
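For reference, this is roughly how I'm computing Hits@K (rank the gold object against every candidate entity, check if it lands in the top K), in case someone spots an evaluation bug:

```python
import torch

def hits_at_k(scores, true_objects, k=10):
    """scores: (num_queries, num_entities) model scores for every candidate object.
    true_objects: (num_queries,) index of the gold object for each query."""
    # Rank of the true object = number of candidates scored strictly higher, plus one.
    true_scores = scores.gather(1, true_objects.unsqueeze(1))   # (num_queries, 1)
    ranks = (scores > true_scores).sum(dim=1) + 1               # (num_queries,)
    return (ranks <= k).float().mean().item()
```

With only ~500 nodes per snapshot, a random scorer sits around K/500 (so roughly 2% for Hits@10), which is about where my numbers are.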
Current dataset:
I’m currently using wikidata12k, which is a pretty small dataset; I think that may be causing some of the issues. It gives me about 200 knowledge graphs, one per year from 1800 to 2020, each with about 500 nodes.
I would actually love a bigger dataset, but it has to be in a persistent knowledge graph format, which means the graph changes slowly over time, and the graph at timestep t is similar to the graph at timestep t+1. This unfortunately rules out a lot of popular TKG datasets like ICEWS.
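One quick way I think about the "persistent" requirement is the triple-set overlap between consecutive snapshots, something like the sketch below (assuming `snapshots` is a list of per-year sets of (subject, relation, object) tuples). Slowly changing KGs like wikidata12k keep this overlap high, while event-style datasets don't, which is why ICEWS gets ruled out:

```python
def consecutive_jaccard(snapshots):
    """Jaccard overlap between each pair of consecutive snapshot triple sets."""
    overlaps = []
    for g_t, g_t1 in zip(snapshots, snapshots[1:]):
        union = len(g_t | g_t1)
        overlaps.append(len(g_t & g_t1) / union if union else 0.0)
    return overlaps
```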
I’ve also looked at YAGO11k, but it suffers from the same lack of scale as wikidata12k.
I’ve made another post in r/learnmachinelearning with details about the architecture and other issues I’m facing, which you can check out if you want more details.
Thank you so much for the help, and I’m happy to answer any additional questions.