I'm stuck on the infamous 429 (Too Many Requests) error when mass-scraping Google Trends. I have a list of around 30k keywords I want data on, but I don't want to wait out the rate-limit timeouts.
I'm using pytrends. I've tried rotating proxies, but at this traffic volume the rental costs get way too high. I also tried multiprocessing with a unique Tor circuit per keyword, but then I get authentication errors from Google; those go away if I include some identity headers, which in turn quickly become invalid again due to rate limiting.
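For reference, this is the rough shape of my current setup with the rented proxies, using pytrends' built-in proxy rotation and retry/backoff knobs (the proxy URLs below are placeholders):

```python
import time

import requests
from pytrends.request import TrendReq
from pytrends.exceptions import ResponseError

# Placeholder proxy URLs -- pytrends rotates through this list and, with
# retries/backoff_factor set, retries failed requests (including 429s)
# with exponential backoff.
PROXIES = [
    "https://proxy1.example.com:8080",
    "https://proxy2.example.com:8080",
]

pytrends = TrendReq(
    hl="en-US",
    tz=360,
    timeout=(10, 25),
    proxies=PROXIES,
    retries=3,
    backoff_factor=0.5,
)

keywords = ["bitcoin", "ethereum"]  # stand-ins for the real 30k-keyword list
for kw in keywords:
    try:
        pytrends.build_payload([kw], timeframe="today 12-m")
        df = pytrends.interest_over_time()
        print(kw, len(df))
    except (ResponseError, requests.exceptions.RequestException) as exc:
        print(f"{kw}: gave up after retries ({exc})")
    time.sleep(1)  # crude global throttle on top of the per-request backoff
```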
Does anyone have a workaround or working code for this? What I have in mind is multiple Google accounts with programmatic login, grabbing the session headers from there, and injecting them into pytrends requests, roughly like this (the header values are placeholders, and I'm assuming requests_args is forwarded to the underlying requests calls):
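```python
from pytrends.request import TrendReq

# Placeholder values -- in practice I'd capture these from a logged-in
# browser session on trends.google.com (e.g. via the network tab).
session_headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "Cookie": "NID=placeholder",
}

# requests_args gets passed through to the underlying requests calls,
# so the captured headers should ride along with every Trends request.
pytrends = TrendReq(hl="en-US", tz=360, requests_args={"headers": session_headers})

pytrends.build_payload(["bitcoin"], timeframe="today 3-m")
print(pytrends.interest_over_time().head())
```

I'd be grateful if you could share your experiences. Thanks!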