Got a few from Kaggle, but they aren’t amazing
submitted by /u/Pablo-94
Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?
I’m building a web page to provide company-specific contact information and the steps to take to close or transfer accounts after someone dies. Trying to figure out the best way to identify companies to request info from. Thanks! https://www.buriedinwork.com/company-contacts
submitted by /u/apzuckerman
I’m seeking real estate agent email data for when offers come into a realtor’s email.
submitted by /u/No-Exam5695
Hi all!
For the past few months, after uploading this post in r/PushShift, I had a chance to have quite a lot of discussions with academic researchers about it. I soon noticed that sharing a historical database often goes against universities’ IRB policies (and definitely against Reddit’s new t&c), so that project had to be shut down. But based on those discussions, I worked on a new tool that adheres strictly to Reddit’s terms and conditions while maintaining alignment with the majority of Institutional Review Board (IRB) standards.
The tool is called RedditHarbor and it is designed specifically for researchers with limited coding backgrounds. While PRAW offers flexibility for advanced users, most researchers simply want to gather Reddit data without headaches. RedditHarbor handles all the underlying work needed to streamline this process. After the initial setup, RedditHarbor collects data through intuitive commands rather than dealing with complex clients.
Here’s what RedditHarbor does:

- Connects directly to the Reddit API and downloads submissions, comments, user profiles, etc.
- Stores everything in a Supabase database that you control
- Handles pagination for large datasets with millions of rows
- Customizable and configurable collection from subreddits
- Exports the database to CSV/JSON formats for analysis
Why I think it could be helpful to other researchers:

- No coding needed for data collection after the initial setup. (I tried to maximize simplicity for researchers without coding expertise.)
- While it does not give you access to the entire historical data (like PushShift or Academic Torrents), it complies with most IRBs. By using approved Reddit API credentials tied to a user account, the data collection meets guidelines for most institutional review boards. This ensures legitimacy and transparency.
- Fully open-source Python library built using best practices
- Deduplication checks before saving data
- Custom database tables adjusted for Reddit metadata
Please check it out and let me know your thoughts! I would love to hear any feedback and feature requests 🙂
Actively maintained and adding new features (e.g., collecting submissions by keywords)
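The deduplication step mentioned above can be sketched roughly like this (a minimal illustration under my own assumptions, not RedditHarbor’s actual internals; the function and field names are hypothetical):

```python
def filter_new_rows(rows, existing_ids):
    """Keep only rows whose Reddit ID is not already stored.

    rows: iterable of dicts with an "id" key (e.g. submission fullnames)
    existing_ids: set of IDs already present in the database
    """
    seen = set(existing_ids)  # copy, so we also dedupe within this batch
    new_rows = []
    for row in rows:
        if row["id"] not in seen:
            seen.add(row["id"])
            new_rows.append(row)
    return new_rows

# "t3_b" already exists in the database, and "t3_a" appears twice in the
# batch, so only one copy of "t3_a" should survive.
batch = [{"id": "t3_a"}, {"id": "t3_b"}, {"id": "t3_a"}]
print(filter_new_rows(batch, {"t3_b"}))  # [{'id': 't3_a'}]
```

Checking against already-stored IDs before inserting keeps repeated collection runs from writing duplicate rows.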
submitted by /u/nickshoh
I’m looking for GeoJSON of the world’s NAVAREAs and subregions but I can’t seem to find them anywhere. I can find pictures of them (like the one below) but that’s not really what I need.
I would have thought the IHO would have something like this, but it’s not on their website, they can’t be messaged on Twitter, and they seem unable to commit that there is even a set of internationally recognized areas of responsibility without having their lawyers present to advise them.
NAVAREA boundaries are created for information purposes only; they do not constitute an endorsement or approval, and the IHO does not vouch for their validity or accuracy.
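If no official file turns up, approximate boundaries can always be encoded as GeoJSON by hand from published charts. A minimal sketch, where the rectangle is a placeholder and NOT a real NAVAREA boundary (only the NAVAREA I / UK coordinator label reflects reality):

```python
import json

navareas = {
    "type": "FeatureCollection",
    "features": [{
        "type": "Feature",
        "properties": {"navarea": "I", "coordinator": "United Kingdom"},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[  # [lon, lat] pairs; the ring closes on its start
                [-15.0, 48.0], [30.0, 48.0], [30.0, 71.0],
                [-15.0, 71.0], [-15.0, 48.0],
            ]],
        },
    }],
}

geojson_text = json.dumps(navareas)  # ready to save as navareas.geojson
print(len(json.loads(geojson_text)["features"]))  # 1
```

One feature per NAVAREA, with the area identifier in `properties`, is enough for most mapping libraries to render and label the regions.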
submitted by /u/hrokrin
I’ve been trying to find an API that can give me information on upcoming flights, such as origin, destination, number of stops, and prices, but so far I’ve come across none that are usable. There were two major ones I thought might work, Skyscanner and Google Flights, but Skyscanner only allows commercial use and a Google Flights API somehow doesn’t exist. Not sure where to go from here. I’m thinking of building my own API by scraping, but that is extremely inefficient and sounds like a dumb idea.
submitted by /u/Competitive-Adagio18
Hi! I need a dataset for the UN’s 16th sustainable development goal, “peace, justice, and strong institutions”. I know there are open-source datasets available, but all of them have only one or two variables at most, such as the number of homicides by age/sex. I need a dataset where I can run multiple linear regression and make a 3D scatterplot, a scatterplot matrix, and a heat map.
All of these require multiple numeric variables.
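Once a table with several numeric columns is found, the regression part is straightforward. A minimal sketch on synthetic stand-in data (the variable names are placeholders, not real SDG-16 indicators):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for an SDG-16 style table: three numeric predictors
# (e.g. institutional indices) and one numeric outcome (e.g. homicide rate).
n = 200
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=n)

# Multiple linear regression via least squares: prepend an intercept column.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.round(coef, 1))  # approximately [0, 2, -1, 0.5]
```

The same `X` matrix also feeds directly into a scatterplot matrix or a correlation heat map, which is why several numeric columns in one table matter so much here.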
submitted by /u/rantings-of-troubled
Hey! As the title states, I’m looking for a comprehensive list of US motorcycle owners’ emails and license data. For the license data, we’d like a dataset of every single person in the US who holds a motorcycle license, preferably with these attributes: First Name, Last Name, License Number, Email, Phone Number, Address.
Budget of ~$60,000 but willing to go higher if the data is high quality. I know this type of data is pretty hard to get but I appreciate any hints on where to acquire it!
submitted by /u/nobilis_rex_
Hello, do you have a link where I can find a CSV of professional boats registered with identification info such as MMSI, IMO, length, type, and class? Much appreciated; it’s for a school project (data integration).
submitted by /u/isthatnicknameused
Hi guys, I’m researching customer behavior in Vietnam and would like access to historical anonymized mobile location data to find insights into customers’ favorite locations. Is there any free dataset I could use for this? Or I could buy one if it costs less than $100 (sorry, not much; I’m still in college). Thank you.
submitted by /u/Thanh-Do
I’m currently knee-deep in my thesis research exploring the factors influencing e-commerce growth, and I’m hitting a roadblock that I hope some of you might be able to help with.
I’ve got data for my independent variables—things like mobile phone penetration rate, urbanization, and education levels across populations in China, India, the United States, and Europe. However, when it comes to the crucial dependent variable of e-commerce growth, it’s proving to be quite the challenge.
I’m specifically looking for monthly or quarterly data, and my school insists on a substantial timeframe (2010 to 2020). The trouble is, finding this kind of data for all four regions is like finding a needle in a haystack, especially when comparing provincial data for China against the other regions.
If anyone has suggestions for alternative dependent variables or knows of sources for monthly/quarterly e-commerce growth data (even if it’s just for China), I’d be eternally grateful. My thesis is almost wrapped up, focusing on why China’s e-commerce growth stands out, but this data hiccup is causing a bit of a headache.
Thanks in advance for any insights or leads you can provide!
submitted by /u/trippie30
The fact is, you could easily generate a lot of synthetic data just by asking an already-trained bot to rewrite a text in the style of a given author it has plenty of training text for. Or you could use something like a thesaurus bot (maybe trained with Grammarly) that learns how to swap enough words out without changing the meaning. The meaning must stay strictly the same; without that, the training data is useless. That constraint may limit the scope of the allowed changes, but it is still generally better than no synthetic data, and it is extremely easy to do with math, where rules can define the one-step changes the bot generates. Such a bot is much easier to make than AGI. Whatever bot you are training on the synthetic data then has to check whether the original and the synthetic version match in meaning, so it would have to understand the meaning and/or the math behind the changes well enough to replicate the process on its own.
So this could basically give you a bot that uses Symbolab to train an AGI in math.
And a bot that uses a stricter Grammarly or some form of thesaurus bot to train the AGI in language comprehension.
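The thesaurus-bot idea, at its simplest, is rule-based word swapping plus a strict meaning check. A toy sketch (the synonym table and the reversibility check are my own illustration, not a real system):

```python
# Toy rule-based "thesaurus bot": swap words for fixed synonyms, then verify
# the rewrite is reversible, so meaning is (trivially) preserved.
SYNONYMS = {"big": "large", "fast": "quick", "smart": "clever"}
REVERSE = {v: k for k, v in SYNONYMS.items()}

def augment(sentence):
    """Produce a synthetic paraphrase by one-step synonym substitution."""
    return " ".join(SYNONYMS.get(w, w) for w in sentence.split())

def meaning_preserved(original, rewritten):
    """Strict check: undoing every swap must recover the original exactly."""
    restored = " ".join(REVERSE.get(w, w) for w in rewritten.split())
    return restored == original

src = "the big dog is fast"
out = augment(src)
print(out)                           # the large dog is quick
print(meaning_preserved(src, out))   # True
```

Real paraphrase systems need far looser and smarter equivalence checks than this, which is exactly the hard part the post is pointing at.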
submitted by /u/Deamichaelis
Newbie to research here. Looking for good datasets accessible to medical residents for research on body composition / muscle mass and health outcomes. Thank you!
submitted by /u/Easy-Sheepherder-248
I need to make a stacked bar chart on a recurring basis. (I included a few pictures here.) The bar chart needs to show 15 grocery stores. Each grocery store has multiple applications. I need to show the number of users for each application by grocery store. Each application varies in maximum user size (between 100 and 50,000).
I have a few problems: my data doesn’t contain exactly what I need. It has emails (with the grocery store embedded), and it doesn’t have direct user counts, just “FALSE” values. How do I turn all of this into a graph automatically, and easily change the colors? Any advice is SO appreciated, thank you! I will literally PayPal for help.
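One common approach for data shaped like this: pull the store name out of each email’s domain and count rows per (store, application) pair, since each row is one user. A minimal sketch (the column names and email format here are assumptions about the export, not known facts):

```python
from collections import Counter

# Toy rows shaped like the described export: an email with the store name
# embedded in the domain, plus an application column. One row = one user.
rows = [
    {"email": "ann@freshmart.com", "app": "inventory"},
    {"email": "bob@freshmart.com", "app": "inventory"},
    {"email": "cat@greengrocer.com", "app": "pos"},
]

def store_of(email):
    # "ann@freshmart.com" -> "freshmart"
    return email.split("@")[1].split(".")[0]

counts = Counter((store_of(r["email"]), r["app"]) for r in rows)
print(counts[("freshmart", "inventory")])  # 2
```

These (store, app) counts can then be fed into matplotlib’s `bar()` using the `bottom` parameter (or a pandas pivot table with `plot(kind="bar", stacked=True)`) to build the stacked chart, and the bar colors are just a keyword argument there.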
submitted by /u/dbdhshhsh
I have never seen so many vehicles being repossessed in my lifetime as I am seeing right now.
I live next to a rental building that houses like 50-100 people. Last year the parking lot was full.
Now it’s been almost emptied by tow trucks repossessing. Originally it had, I’d estimate, around 35 cars.
It’s got maybe 11 vehicles left now… The tow trucks come 3-4 times a day (not sure how many times at night).
It’s a long-shot request, but I figured it wouldn’t hurt to ask…
It’s just depressing to see this happening to literally everyone who has been affected by job losses (I’ll leave it at that to not stir up controversy). This nonstop appearance of tow trucks in the neighborhood started in October (at least from what I’ve noticed).
This data probably wouldn’t be public, now that I think about it, due to people looking to see if they are on the list and avoiding repossession.
I would just love to analyze the difference between 2022 and 2023.
Thanks!
submitted by /u/Mido907
This article is about “Understanding Azure Data Lake Storage Gen2”. It will cover: 💡
1- Why Azure Data Lake Storage Gen2
2- How to enable Azure Data Lake Storage Gen2
3- Azure Data Lake Storage Gen2 vs Azure Blob Storage
If you are interested in understanding Azure Data Lake Storage Gen2, you can access the full article here: https://devblogit.com/understand-azure-data-lake-storage-gen2/
Don’t miss out on this opportunity to transform your data practices and stay ahead of the competition. Read the article today and unlock the power of Azure Data Lake Storage Gen2! 💪#Azure #DataManagement #Analytics #DataLake
submitted by /u/Bubbly_Bed_4478
I have been looking for a data marketplace that covers different types of data for our business. We like Techsalerator but want to benchmark them to see if there are other good alternatives out there.
submitted by /u/EnvironmentOk772
Learn how to use Microsoft’s Azure OpenAI Service, the most private and secure way to use GPT-4.
To learn more: https://team-gpt.com/learn/chatgpt-for-work-course
submitted by /u/LongjmpingShower
I have 11 .csv files containing data with information about multiple participants in a study. All of the tables have a ‘timestamp’ column, and some have ‘start-time’ and ‘end-time’ columns too. I then have 5 .csv files with data that is *not* timestamped; it contains some background/onboarding information collected at the beginning of the study.
I want to use this data to train a machine learning model.
I need to pull all of this information into one .csv file. I’m not sure how exactly to go about doing this. I’ve thought about matching timestamps for each table, and adding the relevant columns onto the row with the same timestamp, and just having the non-timestamped information in each row for that participant ID.
i.e., it would look something like this:
[ID] [timestamp] [feature1] [added feature 1] [added feature 2]
Then, all of the timestamps associated with each person’s id would have its own row, but some of the features would be empty/null values.
Would it make sense to do this? What are some methods I could use to achieve this?
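Yes, that layout is a standard one for time-series ML. In pandas it is an outer merge on (ID, timestamp) for the timed tables, followed by a left merge on ID for the background tables. A minimal sketch on toy stand-ins for the files (real files would be read with `pd.read_csv(path)`; the column names here are illustrative):

```python
import io
import pandas as pd

# Toy stand-ins for two timestamped files and one background file.
timed = pd.read_csv(io.StringIO(
    "id,timestamp,feature1\n"
    "p1,2024-01-01 10:00,0.5\n"
    "p1,2024-01-01 10:05,0.7\n"
))
extra = pd.read_csv(io.StringIO(
    "id,timestamp,feature2\n"
    "p1,2024-01-01 10:00,3\n"
))
background = pd.read_csv(io.StringIO(
    "id,age\n"
    "p1,34\n"
))

# Outer merge on (id, timestamp): timestamps present in only one table keep
# NaN for the other table's features, exactly as described above.
merged = timed.merge(extra, on=["id", "timestamp"], how="outer")

# Broadcast the non-timestamped background info onto every row for that id.
merged = merged.merge(background, on="id", how="left")
print(merged.shape)  # (2, 5)
```

From there, `merged.to_csv("combined.csv", index=False)` produces the single training file; the NaN cells are where an imputation or masking strategy comes in before modeling.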
submitted by /u/an-diabhal
Hello, I am building a dataset for research purposes, and the content I am using is audiovisual and copyrighted. It consists of video clips from scenes in movies. I have observed that there are datasets with their accompanying papers available, and they don’t seem to have legal issues despite using copyrighted movie scenes.
I wanted to know if fair use covers this type of usage or what recommendations you could give me for publishing a dataset with these characteristics.
Thank you.
The datasets:
- HOLLYWOOD2: Actions in Context (CVPR 2009)
- HLVU: A New Challenge to Test Deep Understanding of Movies the Way Humans Do
- MPII-MD: A Dataset for Movie Description
- MovieNet: A Holistic Dataset for Movie Understanding (ECCV 2020)
- MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions
- MovieQA: Story Understanding Benchmark (CVPR 2016)
- Video Person-Clustering Dataset: Face, Body, Voice: Video Person-Clustering with Multiple Modalities
- MovieGraphs: Towards Understanding Human-Centric Situations from Videos (CVPR 2018)
- Condensed Movies: Story-Based Retrieval with Contextual Embeddings (ACCV 2020)
https://github.com/xiaobai1217/Awesome-Video-Datasets
Best regards.
submitted by /u/Tlaloc-Es
Where can I source historical crime datasets? Preferably for a European country and up to date.
submitted by /u/charlieclarkeuk
We have developed a backlinks API to analyze your link profile, get a sense of how good it is, and see how you should improve it.
Wiki backlinks are also included: e.g., for a given domain, check whether it appears in 30+ different Wikipedia pages and in different languages.
https://rapidapi.com/getbishopi/api/backlinks-api1/
submitted by /u/xseson23
I would like a panel dataset with, for each year the ETS system has existed:
– all firms that handed in too little ETS rights for their emissions
– the number of EST rights they were short
– (nice to have: sector and country)
This data is available at https://ec.europa.eu/clima/ets/allocationComplianceMgt.do?languageCode=en, but the format is really poor. To create a panel, I would need to select each country individually, select years one by one, export the compliance data, and combine all the resulting CSV files. Using a web scraper this should be doable, but I haven’t done that before.
Companies not handing in enough ETS rights (and hence being fined) are identified by compliance status “B”. The number of rights they are short can be calculated from the tables too.
My question is whether anybody knows of a more accessible version of this data online. Or maybe someone has already scraped the database? Any leads are appreciated.
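Even if the per-country, per-year exports have to be downloaded one by one, stitching them into a panel is mechanical. A minimal sketch on toy stand-ins for two exports (the column names, including `compliance_status` and `shortfall`, are assumptions about the real files, not their actual schema):

```python
import io
import pandas as pd

# Two toy exports standing in for downloaded compliance CSVs.
exports = {
    ("DE", 2020): "installation,compliance_status,shortfall\nI1,B,120\nI2,A,0\n",
    ("NL", 2020): "installation,compliance_status,shortfall\nI3,B,40\n",
}

frames = []
for (country, year), csv_text in exports.items():
    df = pd.read_csv(io.StringIO(csv_text))
    df["country"], df["year"] = country, year  # tag the panel dimensions
    frames.append(df)

panel = pd.concat(frames, ignore_index=True)

# Keep only non-compliant firms (status "B"), as described above.
shortfalls = panel[panel["compliance_status"] == "B"]
print(len(shortfalls))  # 2
```

With real files the dict would map (country, year) to downloaded file paths and `pd.read_csv` would take the path directly; the concat-and-filter step stays the same.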
submitted by /u/AtkinsonStiglitz