Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting, i’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Seeking Help For Thesis Research On E-commerce Growth Data China

I’m currently knee-deep in my thesis research exploring the factors influencing e-commerce growth, and I’m hitting a roadblock that I hope some of you might be able to help with.

I’ve got data for my independent variables—things like mobile phone penetration rate, urbanization, and education levels across populations in China, India, the United States, and Europe. However, when it comes to the crucial dependent variable of e-commerce growth, it’s proving to be quite the challenge.

I’m specifically looking for monthly or quarterly data, and my school insists on a substantial timeframe (2010 to 2020). The trouble is, finding this kind of data for all four regions is like finding a needle in a haystack, especially when comparing provincial data for China against the other regions.

If anyone has suggestions for alternative dependent variables or knows of sources for monthly/quarterly e-commerce growth data (even if it’s just for China), I’d be eternally grateful. My thesis is almost wrapped up, focusing on why China’s e-commerce growth stands out, but this data hiccup is causing a bit of a headache.

Thanks in advance for any insights or leads you can provide!

submitted by /u/trippie30
[link] [comments]

Seeking Help For Thesis Research On E-commerce Growth Data China

I’m currently knee-deep in my thesis research exploring the factors influencing e-commerce growth, and I’m hitting a roadblock that I hope some of you might be able to help with.

I’ve got data for my independent variables—things like mobile phone penetration rate, urbanization, and education levels across populations in China, India, the United States, and Europe. However, when it comes to the crucial dependent variable of e-commerce growth, it’s proving to be quite the challenge.

I’m specifically looking for monthly or quarterly data, and my school insists on a substantial timeframe (2010 to 2020). The trouble is, finding this kind of data for all four regions is like finding a needle in a haystack, especially when comparing provincial data for China against the other regions.

If anyone has suggestions for alternative dependent variables or knows of sources for monthly/quarterly e-commerce growth data (even if it’s just for China), I’d be eternally grateful. My thesis is almost wrapped up, focusing on why China’s e-commerce growth stands out, but this data hiccup is causing a bit of a headache.

Thanks in advance for any insights or leads you can provide!

submitted by /u/trippie30
[link] [comments]

Is It Possible To Obtain Car Repossessed Datasets 2022-2023?

I have never seen so many vehicles being repossessed in my lifetime than I am right now.

I live next to a rental building that houses like 50-100 people. Last year the parking lot was full.

Now its been almost emptied by tow trucks repossessing. Originally it had I’d estimate around 35 cars.

Its got maybe 11 vehicles left now…. The tow trucks come 3-4 times a day(Not sure how many times at night)

Its a long shot request, but figured it would not hurt to ask…

Just depressing to see this happening to…literally everyone who has been effected by joblosses(I’ll just leave it at that to not stirr up controversy) This nonstop appearances of tow trucks in the neighborhood started in October..(Atleast from what I’ve noticed..)

This data probably wouldnt be public now that I think about it, due to people looking to see if they are on the list and avoiding repossessions..

Would just love to analyze the difference between 2022-2023.

Thanks!

submitted by /u/Mido907
[link] [comments]

Understanding Azure Data Lake Storage Gen2

This article is about , “Understanding Azure Data Lake Storage Gen2” This article will cover: 💡
1- Why Azure Data Lake Storage Gen2
2- How to enable Azure Data Lake Storage Gen2
3- Azure Data Lake Gen2 vs Azure Blob Storage Gen2
If you are interested to understand Azure Data Lake Storage Gen2 you can access the full article here: https://devblogit.com/understand-azure-data-lake-storage-gen2/
Don’t miss out on this opportunity to transform your data practices and stay ahead of the competition. Read the article today and unlock the power of Azure Data Lake Storage Gen2! 💪#Azure #DataManagement #Analytics #DataLake

submitted by /u/Bubbly_Bed_4478
[link] [comments]

Synthetic Data For AGI Is Not THAT Hard (math Especially)

The fact is you could easily generate a lot of synthetic data just by asking an already trained bot to rewrite this as a given author that they have a lot of text they trained on. Or just have something like a thesaurus bot (maybe trains with Grammarly) that learns how to swap enough info out without changing the meaning (very strictly cause without this meaning being the same this training is useless although this may limit the scope of the changes allowed but is still generally better than no synthetic data (extremely easy to do with math cause it can just have math rules to define one step changes it generates) ) which is much easier to make than AGI. Thus whatever bot you are using the synthetic data to train on, it has to try to check if these two things the original and the synthetic data match in meaning. Thus it would have to understand the meaning or/and math to follow if the changes that were made match so it could replicate the process on its own.
So this could basically have a bot that can use Symbolab to train AGI in math.
And a bot that uses a more strict Grammarly or some form of thesaurus bot to train the AGI in language comprehension.

submitted by /u/Deamichaelis
[link] [comments]

SOS How To Make Stacked Bar Chart In Excel?

I need to make a stacked bar chart on a recurring basis. Included a few pictures here. The bar chart needs to show 15 grocery stores. Each grocery store has multiple applications. I need to show the number of users for each application by grocery store. Each grocery store application varies in maximum user size (between 100 -50,000).

I have a few problems: My data doesn’t have the exact data I need. The data has emails (with grocery stores embedded). The data also doesn’t have direct numbers, just “FALSE”. How do I turn all of this into a graph automatically and easily change the colors? Any advice is SO appreciated, thank you! I will literally PayPal for help.

submitted by /u/dbdhshhsh
[link] [comments]

Help Pulling Multiple .csv Files With Timestamped Data And Multiple Participants Into One File.

I have 11 .csv files containing data which has information about multiple participants in a study. All of the tables have a ‘timestamp’ column, some have ‘start-time’ and ‘end-time’ columns too. I then have 5 .csv files with data that is *not* timestamped – it contains some background/onboarding information collected at the beginning of the study.

I want to use this data to train a machine learning model.

I need to pull all of this information into one .csv file. I’m not sure how exactly to go about doing this. I’ve thought about matching timestamps for each table, and adding the relevant columns onto the row with the same timestamp, and just having the non-timestamped information in each row for that participant ID.

i.e., it would look something like this:

[ID] [timestamp] [feature1] [added feature 1] [added feature 2]

Then, all of the timestamps associated with each person’s id would have its own row, but some of the features would be empty/null values.

Would it make sense to do this? What are some methods I could use to achieve this?

submitted by /u/an-diabhal
[link] [comments]

Can Fair Use Principles Safeguard The Creation Of A Movie Scene Dataset For Research Purposes?

Hello, I am building a dataset for research purposes, and the content I am using is audiovisual and copyrighted. It consists of video clips from scenes in movies. I have observed that there are datasets with their accompanying papers available, and they don’t seem to have legal issues despite using copyrighted movie scenes.
I wanted to know if fair use covers this type of usage or what recommendations you could give me for publishing a dataset with these characteristics.
Thank you.

That datasets

HOLLYWOOD2: Actions in Context (CVPR 2009) HLVU: A New Challenge to Test Deep Understanding of Movies the Way Humans do MPII-MD: A Dataset for Movie Description MovieNet: A Holistic Dataset for Movie Understanding (ECCV 2020) MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions MovieQA: Story Understanding Benchmark (CVPR 2016) Video Person-Clustering Dataset: Face, Body, Voice: Video Person-Clustering with Multiple Modalities MovieGraphs: Towards Understanding Human-Centric Situations from Videos (CVPR 2018) Condensed Movies: Story Based Retrieval with Contextual Embeddings (ACCV 2020)

https://github.com/xiaobai1217/Awesome-Video-Datasets

Best regards.

submitted by /u/Tlaloc-Es
[link] [comments]

ETS Compliance Data In Useable Format?

I would like to have a panel dataset with for each year the ETS system existed:
– all firms that handed in too little ETS rights for their emissions
– the number of EST rights they were short
– (nice to have: sector and country)

This data is available at https://ec.europa.eu/clima/ets/allocationComplianceMgt.do?languageCode=en, but the format is really poor. To create a panel, I would need to select each country individually, select years one by one, export the compliance data and add all resulting csv files together. Using a webscrapper this should be doable maybe, but I haven’t done that before.

Companies not handing in enough ETS (and hence being fined) are identified by compliance status “B”. The number of rights they are short can be calculated based on the tables too.

My question is if anybody maybe knows if there is a more accessible version of this data available online. Or maybe someone already scrapped the database? Any leads are appreciated.

submitted by /u/AtkinsonStiglitz
[link] [comments]

Stock Market News Dataset – 2008 Or Later

Hello,

I’m working on a machine learning project, and need a large dataset of financial news. Specifically, I’m looking for news on companies that have a medium market cap or lower, and from a period of 2008 until now… or any interval of time over this period.

Is anyone aware of such a dataset? Or any websites where I can query historical financial news – ideally free?

Thank you.

submitted by /u/JustinPooDough
[link] [comments]

Enlighten Me About These Project’s Dataset.

I have a school project which involves creating an Ingredient-Based Recipe Generator Chatbot for Bicol Cuisine Main Dishes. The chatbot should generate recipes based on user commands, but these commands must contain a minimum of three ingredients. I plan to use fine-tuning with OpenAI’s language model. Since this is my first AI project, I’m a bit confused about how to begin creating the dataset. Can someone help me by explaining how I should go about creating the dataset?

submitted by /u/akameaoi
[link] [comments]

Want A Huge Dataset Of All English Songs

i want to train my AI on songs and poems, so i want a huge dataset of all english songs and poems, any suggestions on websites , i can scrape to get a large set of english songs only i heard of azlyrics but it contains other languages romanized versions too that makes it hard to get english songs only

submitted by /u/innocentboy0000
[link] [comments]

Providing Datasets, Leads As Needed. US Healthcare Available.

Hey all! 👋
👩‍⚕️ Healthcare Datasets Expertise:
Been diving into USA healthcare datasets for a year now 🏥✨
🔧 Services:
Web scraping, data management, and cleaning – I’ve got your data needs covered. Let’s tidy up those datasets and make them shine! 🌟
🌐 Tech Stack:
Python, Node.js, Puppeteer, Scrapy, Selenium, BS4 – name it, I’ve conquered it! 🚀
💬 Let’s Connect:
Ready to boost your projects with quality data? DM me, let’s chat and cook up something awesome together! 📬🤝

submitted by /u/purplepyramid7
[link] [comments]

Looking For Datasets: ClickStream, HealthCare, IOT, Agri, Edtech,Sales

I’m looking for raw datasets either session based or user based, (NOT THE AGGREGATED)

Here’s what I’m looking for, I’ll pay for any or all of the following, I’m fine either with one or many of these ….

1) IOT: timeseries dataset from individual IOT device, I’m fine with any data in it.

2) HealthCare: timeseries for individual patient or procedule, if you have anything else please let me know, it should not be aggregated

3)Agri: Individual sensors or any other device data along with location(perferable)

4)ClickStream: timeseries and session based

5) Sales: timeseries, user or session based along with product and sales cost

6) Edtech: let me know whatever you have.

Please DM me if you can help or point me to some source. I’m fine to pay or free or whatever works.

submitted by /u/Winter-Breadfruit943
[link] [comments]