117th U.S. Congress Bill Full Text Data Scrape?

I’m looking to scrape the full text of all the bills proposed in the 117th Congress so I can run the data through NVivo for content analysis. I tried downloading the texts individually from Congress.gov, but I want all 15,224 documents available for analysis, so the one-by-one approach is unrealistic. I haven’t been able to find this data in a pre-existing dataset, so any assistance would be greatly appreciated!

Of note, I have tried using the Congress.gov API, but I can’t figure out how to get the text of all proposed bills. I then tried running a Python script in Google Colab, but I kept getting a “gaierror” error that I couldn’t resolve. I’ve also tried ProPublica and govtrack.us, but I couldn’t find a bulk data download option, only a bulk data query for viewing, so I would still have to download each bill individually.
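One possible way to enumerate every bill instead of requesting a single one is to page through the bill list for the Congress. The sketch below only builds the page URLs; it assumes the Congress.gov API v3 list endpoint `/bill/{congress}` accepts `offset` and `limit` query parameters and that 250 is the largest allowed page size, both of which should be checked against the API documentation. The function name `bill_list_urls` is hypothetical, and the total of 15,224 is the figure quoted in the post.

```python
import math
from urllib.parse import urlencode

API_ROOT = "https://api.congress.gov/v3"
PAGE_SIZE = 250  # assumed maximum page size; verify against the API docs


def bill_list_urls(congress, total, api_key, page_size=PAGE_SIZE):
    """Yield one list URL per page of bills for the given Congress."""
    pages = math.ceil(total / page_size)
    for page in range(pages):
        query = urlencode({
            "api_key": api_key,
            "format": "json",
            "offset": page * page_size,
            "limit": page_size,
        })
        yield f"{API_ROOT}/bill/{congress}?{query}"


# Build the page URLs for the 15,224 documents mentioned in the post.
urls = list(bill_list_urls(117, 15224, "DEMO_KEY"))
print(len(urls))   # 61 pages of up to 250 bills each
print(urls[-1])    # last page starts at offset 15000
```

Each page would then be fetched and its entries used to request the text of the individual bills, ideally with a pause between requests to stay under the API's rate limit.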

Reference Python Script:

# I removed my API key for privacy purposes, but I assure you it was in the script when I ran it

import requests
import json

def get_bill_data(congress_number):
    base_url = "https://api.govinfo.gov"
    endpoint = "https://api.congress.gov/v3/bill/117/hr/1/text?api_key=DEMO_KEY".format(congress_number)
    api_key = "[SQUATTINGFOX_API_KEY]"
    url = base_url + endpoint
    headers = {
        "X-API-KEY": api_key,
        "Content-Type": "application/json"
    }
    response = requests.get(url, headers=headers)
    if response.status_code == 200:
        data = response.json()
        return data
    else:
        print("Error retrieving bill data. Status Code:", response.status_code)
        return None

def save_bill_data(data, output_file):
    with open(output_file, 'w') as file:
        json.dump(data, file)

congress_number = "117"
output_file = "bills_data.json"
bill_data = get_bill_data(congress_number)
if bill_data:
    save_bill_data(bill_data, output_file)
    print("Bill data saved to", output_file)
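A likely cause of the “gaierror” (a failed hostname lookup) is that the script above joins two complete URLs, so the request goes to a host that does not exist. The sketch below shows the hostname that the concatenation produces and one way to build the URL from a single base instead. It uses the standard library's `urllib` rather than `requests` so it runs without extra installs; the helper names (`broken_url`, `bill_text_url`, `get_bill_text_versions`) are hypothetical, and passing `api_key` and `format` as query parameters is an assumption to confirm against the Congress.gov API documentation.

```python
import json
from urllib.parse import urlencode, urlsplit
from urllib.request import urlopen

API_ROOT = "https://api.congress.gov/v3"  # one base URL, no second host prepended


def broken_url():
    """Reproduce the URL the original script builds: two full URLs glued together."""
    return "https://api.govinfo.gov" + "https://api.congress.gov/v3/bill/117/hr/1/text"


def bill_text_url(congress, bill_type, number, api_key):
    """Build the text endpoint for a single bill from one base URL."""
    query = urlencode({"api_key": api_key, "format": "json"})
    return f"{API_ROOT}/bill/{congress}/{bill_type}/{number}/text?{query}"


def get_bill_text_versions(congress, bill_type, number, api_key):
    """Fetch and parse the JSON response for one bill (performs a network call)."""
    with urlopen(bill_text_url(congress, bill_type, number, api_key), timeout=30) as resp:
        return json.load(resp)


# No network access needed here: just compare the hostnames.
print(urlsplit(broken_url()).hostname)                             # api.govinfo.govhttps
print(urlsplit(bill_text_url(117, "hr", 1, "DEMO_KEY")).hostname)  # api.congress.gov
```

The first hostname cannot be resolved, which would explain the lookup error; the second points at the intended API host.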

submitted by /u/squattingfox