Hi everyone,
I’m a data science researcher focusing on process engineering and optimization, and I’m looking to further strengthen my knowledge through different use cases. I’m reaching out for recommendations on extensively large datasets that can be processed using cloud platforms.
My goal is to create an end-to-end Data Science/Data Engineering project that involves ingesting these large datasets and applying domain knowledge to derive insights. I’m particularly interested in **time series** modeling, which is crucial for capturing temporal trends.
Some areas I’m considering include:
Oil and gas unit operations datasets Carbon Capture, Utilization, and Storage (CCUS) datasets FMCG manufacturing datasets, such as edible oil or biomass production Water treatment units, especially where time-sensitive data is key
To give you an idea of my background, I’ve worked on modeling and optimization in amine treating, sulfur recovery, and carbon capture datasets. I’ve also successfully developed an anomaly detection model for the Tennessee Eastman process. However, I’m eager to dive deeper into time series modeling for my next project.
Major requirements:
Focus on time series data Can involve classification or regression tasks Comparatively large datasets with many columns (variables) and datapoints
I would greatly appreciate any suggestions or pointers to datasets that align with what I mentioned.
Thanks in Advance!
submitted by /u/ryanroy0698
[link] [comments]