Open-source dataset of US professions. Two levels:
130 profession profiles in data/professions/us/profiles/. Each is a JSON file with 7 sections: daily routine, regulations, tools, jargon, career levels + fears, community channels, labor market. Everything is sourced from .gov sites, law.cornell.edu, BLS, and professional associations, with a source URL attached to every fact. Built by running 7 targeted WebSearch queries per profession.
25 of those profiles also have generated pain bundles in data/professions/us/pains/. Each bundle holds 8-15 inferred recurring pains for the profession, each paired with a typed spec for the AI tool that would solve it (calculator with inputs/outputs/formula, checklist with steps and statutory refs, document template with variables, reference lookup keys, LLM advisor decision criteria). Generated by feeding the profile to Opus with a deductive system prompt, with no web search at the generation step.
Sample of what comes out, from data/professions/us/pains/us-lawyers.json:
- Billable Hours & Fee Calculation (calculator)
- Statute of Limitations Lookup (reference)
- IOLTA Trust Account Reconciliation (calculator)
- Engagement Letter Drafting (template)
- Court Filing Deadline Calculator (calculator)
- … 8 more
And from data/professions/us/pains/us-auto-detailers.json:
- Cost-plus detail job pricing calculator (calculator, includes 2026 IRS mileage rate)
- EPA stormwater compliance checklist (checklist, $64,618/day Clean Water Act exposure)
- California Car Wash Act registration + surety bond (checklist, Labor Code §§ 2050-2067)
- Vehicle intake / pre-inspection form generator (template)
- Quarterly self-employment tax estimator (calculator, 15.3% SE tax)
- … 8 more
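To make the "calculator with inputs/outputs/formula" idea concrete, here is a minimal sketch of what a quarterly self-employment tax estimator could compute. It is not the formula stored in the repo. The 15.3% rate is from the list above; the 0.9235 net-earnings factor is the standard Schedule SE adjustment. The function name and the even four-way split are assumptions for illustration, and the sketch ignores the Social Security wage base cap, the Additional Medicare Tax, and income tax.

```python
SE_TAX_RATE = 0.153           # combined Social Security + Medicare rate (from the post)
NET_EARNINGS_FACTOR = 0.9235  # Schedule SE: only 92.35% of net profit is subject to SE tax


def quarterly_se_tax(annual_net_profit: float) -> float:
    """Rough estimate of one quarterly SE tax installment (hypothetical helper).

    Simplifications: no Social Security wage base cap, no Additional
    Medicare Tax, no income tax, and the year is split into four equal parts.
    """
    if annual_net_profit <= 0:
        return 0.0
    annual_se_tax = annual_net_profit * NET_EARNINGS_FACTOR * SE_TAX_RATE
    return round(annual_se_tax / 4, 2)


if __name__ == "__main__":
    # A detailer expecting $60,000 net profit for the year.
    print(quarterly_se_tax(60_000))
```

Treat the result as a planning number only; a real estimator would also need the current-year wage base and the filer's income tax situation.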
Each pain entry has: title, problem (2-3 sentences), affected segment, frequency, time_waste_h, money_risk_usd, source SCOPE section, skill_type, and a typed skill_spec matching the type. Schema docs in data/professions/us/_FORMAT.md.
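A quick way to sanity-check entries against that field list is a small validator. This is a sketch under assumptions, not the repo's schema: it only checks the keys spelled out literally above (the exact key names for "affected segment" and "source SCOPE section" are not given here, so they are left out), the set of skill_type strings is guessed from the sample labels, and skill_spec being a JSON object is also an assumption. _FORMAT.md is the authority.

```python
# Keys named literally in the post. Other fields exist but their exact
# key names are not stated, so they are not checked here.
REQUIRED_KEYS = (
    "title", "problem", "frequency", "time_waste_h",
    "money_risk_usd", "skill_type", "skill_spec",
)

# ASSUMPTION: string values guessed from the sample labels in the post;
# "advisor" in particular is a hypothetical spelling.
ASSUMED_SKILL_TYPES = {"calculator", "checklist", "template", "reference", "advisor"}


def validate_pain(entry: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    errors = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in entry]

    skill_type = entry.get("skill_type")
    if skill_type is not None and skill_type not in ASSUMED_SKILL_TYPES:
        errors.append(f"unknown skill_type: {skill_type!r}")

    # ASSUMPTION: a typed spec is stored as a JSON object (dict in Python).
    if "skill_spec" in entry and not isinstance(entry["skill_spec"], dict):
        errors.append("skill_spec should be an object")

    return errors
```

Returning a list of problems instead of raising on the first one makes it easier to review a whole generated bundle in a single pass.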
Backstory: this extends an MIT-licensed pain-mining repo I’d been running (based on court records, B2B angle). Court records don’t capture profession-level pain because professionals don’t litigate their own workflow tedium, so I switched to web search for regulatory facts plus offline LLM deduction of what’s painful given those facts.
Honest positioning: discovery dataset, not validated pain register. Pains are inferred from regulation + daily routine, not from real users complaining. Plausible starting points for customer-development interviews, not conclusions.
Both pipeline stages are in prompts/profession-scan/ so the dataset is fully regenerable. Country-aware – works for any country with adequate online regulatory data.
Repo: https://github.com/AyanbekDos/unfairgaps-os
Cleanest single file to open: https://github.com/AyanbekDos/unfairgaps-os/blob/main/data/professions/us/pains/us-auto-detailers.json
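Once a bundle is parsed, a first look could be a tally of pains per tool type. The sketch below assumes you already have the pain entries as a list of dicts, each with a skill_type key as described above. How that list is nested inside the JSON file is not stated here, so check _FORMAT.md before wiring up the loading step. The function name is hypothetical.

```python
from collections import Counter


def count_by_skill_type(entries: list[dict]) -> Counter:
    """Tally pain entries by their skill_type; missing values count as 'unknown'."""
    return Counter(entry.get("skill_type", "unknown") for entry in entries)


if __name__ == "__main__":
    # Stand-in entries shaped like the auto-detailer samples in the post.
    sample = [
        {"title": "Cost-plus detail job pricing calculator", "skill_type": "calculator"},
        {"title": "EPA stormwater compliance checklist", "skill_type": "checklist"},
        {"title": "Quarterly self-employment tax estimator", "skill_type": "calculator"},
    ]
    for skill_type, n in count_by_skill_type(sample).most_common():
        print(f"{skill_type}: {n}")
```

With a real file, json.load from the standard library would produce the parsed document to pull the entries from.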
License: MIT. PRs welcome for pain bundles for the remaining 105 profiles, or for non-US countries.
submitted by /u/Ogretape