Boredom Central - Building a multi-source feminism corpus (France–Québec)

Hi,

I’m prototyping a PhD project on feminist discourse in France and Québec. The goal is to build a multi-source corpus drawing on academic APIs, activist blogs, publishers, media feeds, and Reddit testimonies.
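One way to keep a corpus like this manageable is to map every item onto a single record schema as soon as it is fetched, whatever the source. The sketch below is a minimal illustration under stated assumptions, not a tested pipeline: `CorpusRecord`, `from_openalex` and `from_rss_item` are hypothetical names, and the OpenAlex field names (`id`, `display_name`, `doi`, `publication_date`, `language`) are assumed from its work objects. DOIs are lowercased because DOI names are case-insensitive, so when a DOI is present it can serve as a join key across Crossref, OpenAlex and HAL records.

```python
import re
from dataclasses import dataclass
from email.utils import parsedate_to_datetime
from typing import Optional


# Hypothetical unified schema; field names are illustrative, not a standard.
@dataclass
class CorpusRecord:
    source: str                 # e.g. "openalex", "rss:<feed name>", "reddit"
    title: str
    url: str
    published: Optional[str]    # ISO date "YYYY-MM-DD" when known
    doi: Optional[str] = None   # bare, lowercase DOI without resolver prefix
    language: Optional[str] = None
    text: str = ""


DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


def normalize_doi(raw: Optional[str]) -> Optional[str]:
    """Extract a bare DOI from forms like 'https://doi.org/10.x/...' or 'doi:10.x/...'."""
    if not raw:
        return None
    match = DOI_PATTERN.search(raw)
    if not match:
        return None
    # Strip trailing sentence punctuation, then lowercase (DOIs are case-insensitive).
    return match.group(0).rstrip(".,;").lower()


def from_openalex(work: dict) -> CorpusRecord:
    """Map an OpenAlex work object (field names assumed, see lead-in)."""
    return CorpusRecord(
        source="openalex",
        title=work.get("display_name") or "",
        url=work.get("id") or "",
        published=work.get("publication_date"),
        doi=normalize_doi(work.get("doi")),
        language=work.get("language"),
    )


def from_rss_item(item: dict, feed_name: str) -> CorpusRecord:
    """Map an already-parsed RSS item dict with 'title', 'link' and RFC 822 'pubDate'."""
    published = None
    if item.get("pubDate"):
        try:
            published = parsedate_to_datetime(item["pubDate"]).date().isoformat()
        except (TypeError, ValueError):
            published = None  # unparseable date: keep the record, leave the date empty
    return CorpusRecord(
        source=f"rss:{feed_name}",
        title=(item.get("title") or "").strip(),
        url=item.get("link") or "",
        published=published,
    )
```

Adding a Reddit or WordPress source would then mean writing one more small `from_...` mapper rather than touching the storage or deduplication steps.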

Already tested:

Sources: OpenAlex, Crossref, HAL, OpenEdition, WordPress JSON, RSS feeds, GDELT, Reddit JSON, Gallica/BAnQ.
Scripts: Google Apps Script + Python (Colab).
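For OpenAlex specifically, it may be worth ruling out pagination before concluding that older data is missing: as I understand its documentation, basic page-based paging only reaches the first 10,000 results, while cursor paging (`cursor=*`, then following `meta.next_cursor`) can walk a full result set. A minimal sketch, assuming the `search`, `filter` (`from_publication_date` / `to_publication_date`), `per-page`, `cursor` and `mailto` parameters; the helper names and the 5-year window size are hypothetical choices, not part of the API.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterator, List, Optional, Tuple

OPENALEX_WORKS = "https://api.openalex.org/works"


def year_windows(first_year: int, last_year: int, step: int = 5) -> List[Tuple[str, str]]:
    """Split a year range into (start_date, end_date) windows of `step` years."""
    windows = []
    for year in range(first_year, last_year + 1, step):
        end_year = min(year + step - 1, last_year)
        windows.append((f"{year}-01-01", f"{end_year}-12-31"))
    return windows


def build_openalex_url(query: str, start: str, end: str, cursor: str = "*",
                       mailto: Optional[str] = None, per_page: int = 200) -> str:
    """Build a /works search URL restricted to one publication-date window."""
    params = {
        "search": query,
        "filter": f"from_publication_date:{start},to_publication_date:{end}",
        "per-page": str(per_page),
        "cursor": cursor,  # "*" starts cursor paging
    }
    if mailto:
        params["mailto"] = mailto  # identifies the caller to the API
    return f"{OPENALEX_WORKS}?{urllib.parse.urlencode(params)}"


def iter_openalex_works(query: str, start: str, end: str,
                        mailto: Optional[str] = None, max_pages: int = 50) -> Iterator[dict]:
    """Yield work objects page by page, following meta.next_cursor until it is empty."""
    cursor = "*"
    for _ in range(max_pages):
        url = build_openalex_url(query, start, end, cursor, mailto)
        with urllib.request.urlopen(url, timeout=30) as resp:
            payload = json.load(resp)
        yield from payload.get("results", [])
        cursor = payload.get("meta", {}).get("next_cursor")
        if not cursor:
            break
```

Usage would be a loop over `year_windows(2005, 2024)` that passes each `(start, end)` pair to `iter_openalex_works`. Querying one window at a time also makes it easy to see whether a given period genuinely returns nothing or was simply never reached.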

Main problems:

Historical depth: the APIs I've tried only go back about 5 years, but I need 10–20 years.
Heterogeneous formats: DOI metadata, JSON, RSS, PDFs.
Free automation without running a server (Google Sheets + GitHub Actions?).
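For blogs and feeds whose live endpoints only expose recent items, one option is the Wayback Machine's CDX server, which lists archived captures of a URL prefix over a date range; older captures of an RSS feed or a WordPress index page may still contain posts that are no longer served. The sketch below assumes the CDX parameters `url`, `matchType`, `from`, `to`, `output=json`, `fl`, `filter`, `collapse` and `limit`, plus the `id_` URL flag for fetching raw archived bytes; the function names are hypothetical, and it covers only the Wayback side, not Pushshift.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(site: str, year_from: int, year_to: int, limit: int = 1000) -> str:
    """List captures under a URL prefix between two years (inclusive)."""
    params = {
        "url": site,
        "matchType": "prefix",       # everything under this path
        "from": str(year_from),
        "to": str(year_to),
        "output": "json",
        "fl": "timestamp,original,mimetype,statuscode,digest",
        "filter": "statuscode:200",  # skip redirects and errors
        "collapse": "digest",        # drop adjacent captures with identical content
        "limit": str(limit),
    }
    return f"{CDX_ENDPOINT}?{urllib.parse.urlencode(params)}"


def parse_cdx_rows(rows: List[List[str]]) -> List[Dict[str, str]]:
    """JSON output is a list of rows whose first row holds the field names."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def snapshot_url(timestamp: str, original: str, raw: bool = True) -> str:
    """With raw=True, the id_ flag requests the archived bytes without Wayback rewriting."""
    flag = "id_" if raw else ""
    return f"https://web.archive.org/web/{timestamp}{flag}/{original}"


def list_snapshots(site: str, year_from: int, year_to: int) -> List[Dict[str, str]]:
    """Fetch and parse the capture list (network call)."""
    with urllib.request.urlopen(build_cdx_url(site, year_from, year_to), timeout=60) as resp:
        body = resp.read().decode("utf-8").strip()
    # An empty result may come back as an empty body rather than "[]".
    return parse_cdx_rows(json.loads(body)) if body else []
```

Each returned row can be turned into a fetchable URL with `snapshot_url(row["timestamp"], row["original"])` and then run through the same parser used for the live feed.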

Looking for:

Examples of pipelines that combine APIs, RSS feeds, and archives.
Tips on using Pushshift and the Wayback Machine for historical Reddit and web content.
Open-source workflows for deduplication and archiving.
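A common two-stage approach to deduplication: drop exact duplicates with a hash of normalized text, then drop near-duplicates (syndicated or lightly edited reposts) by comparing word-shingle sets with Jaccard similarity. The sketch below is the naive pairwise version, which should be workable for a few thousand documents; for larger corpora, MinHash with locality-sensitive hashing (for example the `datasketch` library) is the usual substitute. The function names, 5-word shingles and 0.8 threshold are untuned assumptions.

```python
import hashlib
import re
import unicodedata
from typing import Dict, List, Set


def normalize_text(text: str) -> str:
    """NFC-normalize, casefold and collapse whitespace (accents are kept)."""
    text = unicodedata.normalize("NFC", text).casefold()
    return re.sub(r"\s+", " ", text).strip()


def content_hash(text: str) -> str:
    return hashlib.sha256(normalize_text(text).encode("utf-8")).hexdigest()


def shingles(text: str, k: int = 5) -> Set[str]:
    """Set of overlapping k-word sequences; \\w is Unicode-aware, so 'féminisme' is one word."""
    words = re.findall(r"\w+", normalize_text(text))
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(docs: Dict[str, str], threshold: float = 0.8, k: int = 5) -> List[str]:
    """Return ids to keep, in input order: first exact, then near-duplicate filtering."""
    seen_hashes: Set[str] = set()
    kept: List[str] = []
    kept_shingles: List[Set[str]] = []
    for doc_id, text in docs.items():
        h = content_hash(text)
        if h in seen_hashes:
            continue  # exact duplicate after normalization
        sh = shingles(text, k)
        if any(jaccard(sh, other) >= threshold for other in kept_shingles):
            continue  # near-duplicate of something already kept
        seen_hashes.add(h)
        kept.append(doc_id)
        kept_shingles.append(sh)
    return kept
```

Because the first occurrence wins, feeding documents in a deliberate order (for example, the original publisher before aggregators) decides which copy of a reposted text is kept.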

Any input (scripts, repos, past experience) is welcome 🙏.

submitted by /u/Commercial-Soil5974