Needed Full Reddit Comment Trees For An NLP Dataset, Here’s What I Used

Was building a training corpus and kept hitting the official API’s 500 comment truncation limit. Found a gateway that recursively resolves full thread depth and has historical archive access which the official API just doesn’t have.

Endpoint I relied on most:

GET /submission/{id}/full

Returns the entire thread, no truncation. Only charges on 200 OK so failed requests don’t eat your credits. Sharing in case anyone else is doing similar dataset work — happy to share what I’m using if anyone’s interested.

submitted by /u/Ok-Direction-9618
[link] [comments]

Leave a Reply

Your email address will not be published. Required fields are marked *