I have 11 .csv files with data on multiple participants in a study. All of the tables have a ‘timestamp’ column, and some also have ‘start-time’ and ‘end-time’ columns. I also have 5 .csv files with data that is *not* timestamped; they contain background/onboarding information collected at the beginning of the study.
I want to use this data to train a machine learning model.
I need to pull all of this information into one .csv file, but I’m not sure exactly how to go about it. My current idea is to match rows across the tables by participant ID and timestamp, add each table’s columns onto the row with the same timestamp, and repeat the non-timestamped information on every row for that participant ID.
i.e., it would look something like this:
[ID] [timestamp] [feature1] [added feature 1] [added feature 2]
Each timestamp associated with a participant’s ID would then have its own row, but some of the features in that row would be empty/null.
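To make the idea concrete, here is a minimal pandas sketch of the layout described above. It assumes every table shares an `ID` column, that timestamps match exactly across tables, and that the toy tables and the `combine_tables` helper are hypothetical stand-ins for the real files (which would be loaded with `pd.read_csv`). It is one possible way to do this, not necessarily the best one.

```python
import pandas as pd


def combine_tables(timestamped, static, id_col="ID", ts_col="timestamp"):
    """Outer-join timestamped tables on (ID, timestamp), then attach
    the non-timestamped columns to every row of the matching ID."""
    combined = timestamped[0]
    for table in timestamped[1:]:
        # how="outer" keeps every (ID, timestamp) pair seen in any table;
        # columns with no value at that timestamp become NaN.
        combined = combined.merge(table, on=[id_col, ts_col], how="outer")
    for table in static:
        # how="left" repeats the background values on each row of that ID.
        combined = combined.merge(table, on=id_col, how="left")
    return combined.sort_values([id_col, ts_col]).reset_index(drop=True)


# Hypothetical toy data standing in for the real .csv files.
table_a = pd.DataFrame({
    "ID": [1, 1, 2],
    "timestamp": pd.to_datetime(
        ["2024-01-01 09:00", "2024-01-01 09:05", "2024-01-01 09:00"]),
    "feature1": [0.5, 0.7, 0.2],
})
table_b = pd.DataFrame({
    "ID": [1, 2],
    "timestamp": pd.to_datetime(["2024-01-01 09:00", "2024-01-01 09:10"]),
    "added_feature_1": [10, 20],
})
background = pd.DataFrame({"ID": [1, 2], "added_feature_2": [30, 41]})

combined = combine_tables([table_a, table_b], [background])
print(combined)
```

The result could then be written out with `combined.to_csv("combined.csv", index=False)`. Note that if two tables share a non-key column name, `merge` will add `_x`/`_y` suffixes, so columns may need renaming first.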
Would it make sense to do this? What are some methods I could use to achieve this?
submitted by /u/an-diabhal