I’m looking to scrape the full text of all the proposed bills from the 117th Congress so I can run the data through NVivo for content analysis. I tried downloading the texts individually from Congress.gov, but I want all 15,224 documents available for analysis, so the one-by-one approach is unrealistic. I haven’t been able to find this data in a pre-existing dataset, and any assistance would be greatly appreciated!
Of note, I have tried using the Congress.gov API, but I can’t figure out how to get the text of every proposed bill. I then tried to run a Python script in Google Colab, but I kept getting a “gaierror” that I couldn’t resolve. I’ve also tried ProPublica and govtrack.us, but I couldn’t find a bulk data download option, only a bulk query for viewing, so I would still have to download each bill individually.
Reference Python Script:
# I removed my API key for privacy purposes, but I assure you it was in the script when I ran it
import json

import requests


def get_bill_data(congress_number, bill_type="hr", bill_number=1):
    # Build one complete URL. Concatenating two full URLs (base_url + endpoint)
    # yields a hostname that cannot be resolved, a likely cause of the gaierror.
    url = "https://api.congress.gov/v3/bill/{}/{}/{}/text".format(
        congress_number, bill_type, bill_number
    )
    api_key = "[SQUATTINGFOX_API_KEY]"
    # Send the key as the api_key query parameter and ask for JSON
    params = {"api_key": api_key, "format": "json"}
    response = requests.get(url, params=params, timeout=30)
    if response.status_code == 200:
        return response.json()
    print("Error retrieving bill data. Status Code:", response.status_code)
    return None


def save_bill_data(data, output_file):
    # Write the parsed response to disk as JSON
    with open(output_file, "w") as file:
        json.dump(data, file)


congress_number = "117"
output_file = "bills_data.json"
# Fetches the text versions for a single bill (H.R. 1 by default)
bill_data = get_bill_data(congress_number)
if bill_data:
    save_bill_data(bill_data, output_file)
    print("Bill data saved to", output_file)
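The script above only requests one bill. A rough sketch of the loop I think is needed follows: page through the bill list for one bill type, then build the text endpoint URL for each bill. It assumes the list endpoint takes `offset` and `limit` parameters and returns a `bills` array, and that 250 is the largest page size allowed; I have not confirmed those details. The names `text_url`, `iter_bills`, `fetch_json`, and the `CONGRESS_API_KEY` environment variable are made up for illustration.

```python
import json
import os
from typing import Callable, Iterator
from urllib.parse import urlencode
from urllib.request import urlopen

API_ROOT = "https://api.congress.gov/v3"
PAGE_SIZE = 250  # assumption: largest page size the list endpoint accepts
# Hypothetical environment variable; falls back to the public demo key
API_KEY = os.environ.get("CONGRESS_API_KEY", "DEMO_KEY")


def text_url(congress: int, bill_type: str, number: str) -> str:
    """Build the text-versions endpoint URL for a single bill."""
    return f"{API_ROOT}/bill/{congress}/{bill_type.lower()}/{number}/text"


def fetch_json(url: str, params: dict) -> dict:
    """GET a URL with the API key added to the query string and parse the JSON."""
    query = urlencode({**params, "api_key": API_KEY})
    with urlopen(f"{url}?{query}", timeout=30) as resp:
        return json.load(resp)


def iter_bills(fetch: Callable[[str, dict], dict],
               congress: int, bill_type: str) -> Iterator[dict]:
    """Yield every bill record of one type by walking offset-based pages."""
    offset = 0
    while True:
        page = fetch(
            f"{API_ROOT}/bill/{congress}/{bill_type.lower()}",
            {"format": "json", "offset": offset, "limit": PAGE_SIZE},
        )
        bills = page.get("bills", [])
        if not bills:
            break  # an empty page means there is nothing left to read
        yield from bills
        offset += len(bills)
```

Something like `[text_url(117, "hr", b["number"]) for b in iter_bills(fetch_json, 117, "hr")]` would then give one text URL per House bill, and the same loop could be repeated for the other bill types (s, hjres, sjres, hconres, sconres, hres, sres). Passing the fetch function in as an argument keeps the paging logic testable without touching the network.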
submitted by /u/squattingfox