What Tools And Tech Should I Use To Build An Open Source Dataset?

Hey everyone,

I want to build an open source dataset in the clinical trial space. I’m looking for some tech/tools recommendations that make building an open source dataset easy.

I guess the easiest would just be to set up a Google Sheet and Google Form to get new data submissions. I also came across: https://github.com/dolthub/dolt, but this seems to be quite expensive.

Some requirements that need to be fulfilled:
– The core dataset should be public, but we want to restrict access to contact information such as email addresses or phone numbers so that people don't get spammed
– People should be able to submit new data or updates to existing data points, but this data should be verified before it's written to the public dataset
– The final dataset could become quite large (10-20GB), so Google Sheets won't work for this
– Users and contributors are non-technical, so it needs to be easy for them to use
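
To make the first two requirements more concrete, here is a rough Python sketch of the kind of workflow I mean: submissions wait in a queue until a reviewer approves them, and the public export leaves out contact fields. All names here (`Dataset`, `PRIVATE_FIELDS`, the record fields) are made up for illustration and not tied to any particular tool; a real setup would sit on top of a database and a submission form.

```python
from dataclasses import dataclass, field

# Hypothetical contact fields that must never appear in the public export.
PRIVATE_FIELDS = {"email", "phone"}


@dataclass
class Dataset:
    verified: dict = field(default_factory=dict)  # record_id -> approved record
    pending: list = field(default_factory=list)   # submissions awaiting review

    def submit(self, record_id, record):
        """Queue a new record or an update; nothing becomes public yet."""
        self.pending.append((record_id, dict(record)))

    def approve(self, index):
        """A reviewer accepts one pending submission and merges it."""
        record_id, record = self.pending.pop(index)
        # Updates overwrite only the fields they mention.
        self.verified[record_id] = {**self.verified.get(record_id, {}), **record}

    def export_public(self):
        """Return approved records with the contact fields stripped."""
        return {
            rid: {k: v for k, v in rec.items() if k not in PRIVATE_FIELDS}
            for rid, rec in self.verified.items()
        }
```

The point of the sketch is only that the verified store and the public export are separate things, so contact details can be kept without being published.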

Would be curious to learn more about how other people have built their datasets.

Thanks a lot!

submitted by /u/Affenbob123