Here’s the general path that I take:
API > Parquet file(s) > uploaded to S3 > COPY INTO (from external stage) > raw table
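Here is a rough sketch of how the last two hops could be generated in Python. It assumes the warehouse is Snowflake, where `COPY INTO` from an external stage is native syntax. The S3 key layout, the stage name `raw_stage`, and the table name `raw.crm_orders` are all hypothetical.

```python
from datetime import date


def s3_key(source: str, endpoint: str, run_date: date, part: int = 0) -> str:
    """Build a date-partitioned S3 key for one Parquet file (hypothetical layout)."""
    return f"{source}/{endpoint}/dt={run_date.isoformat()}/part-{part:04d}.parquet"


def copy_into_sql(table: str, stage: str, prefix: str) -> str:
    """Build a Snowflake-style COPY INTO statement that loads every Parquet file
    under `prefix` in an external stage into a raw table, matching columns by name."""
    return (
        f"COPY INTO {table}\n"
        f"FROM @{stage}/{prefix}\n"
        "FILE_FORMAT = (TYPE = PARQUET)\n"
        "MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;"
    )


# Hypothetical example: one day's "orders" extract from a CRM API.
key = s3_key("crm", "orders", date(2024, 1, 15))
prefix = key.rsplit("/", 1)[0] + "/"  # load the whole dt= folder, not just one file
print(key)
print(copy_into_sql("raw.crm_orders", "raw_stage", prefix))
```

Matching by column name means the raw table only needs columns named like the API's fields; it doesn't depend on the column order inside the Parquet files.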
It’s all orchestrated by Dagster, with asset checks along the way. Raw data is never transformed until after it’s loaded into the database. When possible, I prefer cleaning data with SQL rather than Python.
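Below is a minimal illustration of the load-raw-then-clean-in-SQL idea, plus a simple quality check of the kind an asset check might run. Python's built-in sqlite3 stands in for the warehouse. The tables (`raw_orders`, `stg_orders`), the sample rows, and the `check_unique_non_null_key` helper are hypothetical. In a Dagster project the check would typically be wrapped with `@asset_check` and return an `AssetCheckResult`; that wiring is left out so the sketch runs with the standard library alone.

```python
import sqlite3

# sqlite3 stands in for the warehouse; table names and rows are hypothetical.
conn = sqlite3.connect(":memory:")

# Raw table: values land exactly as the API returned them (all text, untouched).
conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT, status TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [
        ("1", " 20.50", "Shipped "),
        ("2", "5.00", "pending"),
        ("2", "5.00", "pending"),  # exact duplicate from the source
        (None, "3.50", "shipped"),  # row with no key
    ],
)

# Cleaning happens in SQL after the load: trim, cast, normalize case,
# drop rows without a key, and deduplicate.
conn.execute("""
    CREATE TABLE stg_orders AS
    SELECT DISTINCT
        CAST(order_id AS INTEGER)  AS order_id,
        CAST(TRIM(amount) AS REAL) AS amount,
        LOWER(TRIM(status))        AS status
    FROM raw_orders
    WHERE order_id IS NOT NULL
""")


def check_unique_non_null_key(conn: sqlite3.Connection, table: str, key: str) -> bool:
    """Asset-check-style test: the key column must be non-null and unique.
    Table and column names are interpolated directly, so they must be trusted."""
    nulls, total, distinct = conn.execute(
        f"SELECT SUM({key} IS NULL), COUNT(*), COUNT(DISTINCT {key}) FROM {table}"
    ).fetchone()
    return nulls == 0 and total == distinct


print(conn.execute("SELECT * FROM stg_orders ORDER BY order_id").fetchall())
print(check_unique_non_null_key(conn, "stg_orders", "order_id"))
```

Because the raw table is never modified, a bad cleaning rule can be fixed and re-run without calling the API again.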
submitted by /u/fruitstanddev