I’m dealing with a multitude of CSV files where the formats and structures vary widely, with mixed styles, inconsistent headers, and sometimes even headers smack in the middle of the data. It’s a nightmare for any machine learning endeavor.
Manually cleaning and preprocessing these files would be impossible because there are too many small tables, so I’m wondering if there’s an out-of-the-box AI or deep learning solution that can help. Ideally, I’m looking for something that can, among other preprocessing steps:
- Identify and standardize headers
- Split tables if there’s an unexpected header in the middle
- Fill in missing values
- Turn these chaotic CSVs into clean, ML-friendly tables
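To make the steps above concrete, here is a minimal rule-based sketch of the behavior I have in mind, using only Python's standard `csv` module. The function names, the `"NA"` fill value, and the header heuristic (a row counts as a header if it has no empty and no numeric cells) are all my own assumptions for illustration, not an existing tool. The heuristic will misclassify data rows that contain only text, which is exactly why I'm hoping for something smarter.

```python
import csv
import io
import re


def is_number(text):
    """Return True if the cell parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def normalize_header(name):
    """Lowercase, trim, and collapse non-alphanumeric runs to underscores."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def looks_like_header(row):
    """Heuristic (an assumption): headers have no empty or numeric cells."""
    cells = [c.strip() for c in row]
    if not cells or any(c == "" for c in cells):
        return False
    return not any(is_number(c) for c in cells)


def split_tables(text, fill_value="NA"):
    """Split raw CSV text into tables wherever a header-like row appears."""
    tables = []
    current = None
    for row in csv.reader(io.StringIO(text)):
        if not any(cell.strip() for cell in row):
            continue  # skip blank lines
        if looks_like_header(row):
            # A header row (first or mid-file) starts a new table.
            current = {"header": [normalize_header(c) for c in row], "rows": []}
            tables.append(current)
        elif current is not None:
            # Rows seen before any header are dropped in this sketch.
            width = len(current["header"])
            cells = [c.strip() or fill_value for c in row]
            # Pad short rows with the fill value and truncate long ones.
            cells = (cells + [fill_value] * width)[:width]
            current["rows"].append(cells)
    return tables
```

For example, calling `split_tables` on a file whose text contains a second header row halfway down should return two separate tables, each with normalized column names and empty cells replaced by the fill value.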
Has anyone encountered a tool or model that can handle such tasks? Any recommendations or advice would be a lifesaver!
Thanks in advance for your help!
submitted by /u/Apprehensive_View366