I’m planning to build an application scorecard for home loans as my bachelor’s thesis at university.
One of my main concerns (and my academic supervisor’s) is the reliability of the dataset, or rather of its source. This is one of the big questions I’m likely to be challenged on in such a thesis, so the data can’t come from somewhere like Kaggle, but a source like Experian or Equifax would be okay. I work at a bank and deal with such models, but unfortunately I can’t use any company data (even if it is anonymized). So far I’ve seen some promising material on the FFIEC’s website, but I would like some additional sources so I can make a more informed decision.
Roughly, I would need the data to contain these fields:
Age
Job
Income
Education
Marital status
Information about previous defaults (something like a Y/N flag for whether the applicant has defaulted on a loan in the last 5 years, for example)
Type of property that would be purchased with the loan
Some other fields that I could potentially exclude in further analysis
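To compare candidate sources against the list above, a quick coverage check along these lines might help. This is only a rough sketch: the field names in `REQUIRED_FIELDS` and the example column names are placeholders I made up, not the labels used by any real dataset, so they would need to be mapped to whatever each source actually calls its columns.

```python
# Hypothetical field names standing in for the fields listed above;
# real datasets will use their own column labels.
REQUIRED_FIELDS = [
    "age",
    "job",
    "income",
    "education",
    "marital_status",
    "prior_default_5y",
    "property_type",
]


def coverage(columns):
    """Return (present, missing) required fields for a candidate dataset."""
    # Normalize case and surrounding whitespace before comparing.
    normalized = {c.strip().lower() for c in columns}
    present = [f for f in REQUIRED_FIELDS if f in normalized]
    missing = [f for f in REQUIRED_FIELDS if f not in normalized]
    return present, missing


# Example with made-up column names, not taken from any real source.
present, missing = coverage(["Age", "Income", "Property_Type", "loan_amount"])
print("present:", present)
print("missing:", missing)
```

A source that leaves several fields in `missing` could still be usable if the gaps can be filled with proxies, but that would need to be justified in the thesis.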
submitted by /u/JesusBreakdancing