Access 150k+ Datasets From Hugging Face With DuckDB

I am not sure this is kosher but it seems really interesting
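
For anyone who wants to try it, here is a minimal sketch of what such a query might look like. It assumes a recent DuckDB build that understands `hf://` paths, plus the third-party `duckdb` Python package and network access for the last helper. The repository and file names are placeholders, not a real dataset, and the helper names are made up for illustration.

```python
def hf_dataset_url(repo_id: str, path_in_repo: str) -> str:
    """Build a URL of the form hf://datasets/<user>/<dataset>/<file>."""
    return f"hf://datasets/{repo_id.strip('/')}/{path_in_repo.lstrip('/')}"


def preview_query(url: str, limit: int = 5) -> str:
    """Return a SQL statement that previews the first rows of a remote file."""
    return f"SELECT * FROM '{url}' LIMIT {int(limit)};"


def run_preview(repo_id: str, path_in_repo: str, limit: int = 5):
    """Run the preview; needs the duckdb package and network access."""
    import duckdb  # imported lazily so the helpers above work without it

    query = preview_query(hf_dataset_url(repo_id, path_in_repo), limit)
    return duckdb.sql(query).fetchall()


# Placeholder names: replace with a real dataset repository and file.
print(preview_query(hf_dataset_url("some-user/some-dataset", "data/train.parquet")))
```

Only `run_preview` touches the network; the other two helpers just assemble strings, so the query can be inspected before anything is downloaded.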

submitted by /u/cavedave

Untidy Dataset Required For The Project

I need an untidy dataset.

One of the selected datasets must violate at least one of the tidy data principles. In tidy data, each variable must have its own column and each observation must have its own row.
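
To make the distinction concrete, here is a small sketch of an untidy "wide" table, where one variable (the year) is spread across the column headers, being reshaped into tidy "long" form. The table, its values, and the column names are invented for illustration.

```python
# Untidy: the "year" variable is hidden in the column headers.
untidy = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def to_tidy(rows, id_key, var_name, value_name):
    """Turn every non-id column into its own (variable, value) row."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_key:
                continue
            tidy.append({id_key: row[id_key], var_name: column, value_name: value})
    return tidy


# Tidy: one row per (country, year) observation, one column per variable.
tidy = to_tidy(untidy, id_key="country", var_name="year", value_name="cases")
for record in tidy:
    print(record)
```

A dataset shaped like `untidy` would satisfy the requirement, since it breaks the "each variable has its own column" rule.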

submitted by /u/Front-Benefit8232

In Need Of Datasets Of Indian And Carabao Mango Leaves

Hello everyone,

I am a college student currently working on a thesis about machine learning, specifically focused on identifying Indian and Carabao mango leaves with and without anthracnose disease using a CNN model.

At this stage, I need a large dataset, likely 1,000 or more images, of the mentioned mango varieties. I am looking for images of leaves affected by anthracnose disease as well as healthy leaves from both Carabao mango and Indian mango varieties.

I am reaching out in the hope that you can help us find these datasets, as they will serve as the primary data for our thesis.

Thank you very much for considering my request.

submitted by /u/chadmomentgiga

Looking For A Dataset Of Crypto Wallets Currently Reported As Phishing/Scam

Hi guys,

I’m currently working on a project to enhance the detection and prevention of cryptocurrency scams and phishing attempts. A crucial part of this project is identifying and analyzing scam crypto wallets that have been reported by users and security experts.

I am looking for a reliable and up-to-date dataset that contains information about cryptocurrency wallets reported as being involved in phishing or scam activities. Ideally, this dataset should include details such as:

Wallet addresses
Type of scam or phishing attempt

If anyone knows where I can find such a dataset or has resources that could help, I would greatly appreciate your assistance. Open-source datasets or any repositories maintained by security communities or organizations would be extremely helpful.

Thank you in advance for your help!

submitted by /u/Funny-Accident-5612

Datasets Request About Carabao And Indian Mango Leaves

Hello everyone,

I am currently working on a machine learning project, specifically focused on identifying Philippine Indian and Carabao mango leaves with and without anthracnose disease using a CNN model.

At this stage, I need a large dataset, likely 1,000 or more images, of the mentioned mango varieties. I am looking for images of leaves affected by anthracnose disease as well as healthy leaves from both Carabao mango and Indian mango varieties.

Thank you very much for considering my request.

submitted by /u/chadmomentgiga

Microsoft Access Question: Copying Data From Excel

Hi, I am learning my company’s data management system from scratch and am trying to figure out whether I should copy things from Excel into Access in the Query section or the Table section. I am pretty sure it is Table, but I want to be sure. Thanks!

submitted by /u/suzimakesthings

What Is Your Favorite Dataset For Training Yourself?

What is your favorite dataset to learn new methods?

submitted by /u/MinerOfIdeas

Looking For A Dataset On Suicides In The US

Hi everyone,

Maybe someone knows of some open-access datasets on suicides in the U.S. (or on the number of deaths, if there is a variable for the cause of death) per year (from about 2015 to at least 2020) and per state. The more additional variables there are (such as gender, age, employment status, etc.), the better.

Hope that maybe some of you have seen something of this sort🙏

submitted by /u/dollala

UK Private Companies Datasets For 25m+ Filings

We are a UK FinTech company and have launched a new product that automatically extracts data (including handwritten data) from 25 million filings for millions of UK companies. In addition, there are insights and easy-to-consume charts and tables. The automatically extracted data provides the following for 2m+ private companies:

An industry-first price-per-share and last-round-valuation (market capitalisation) chart
Capital structure, shareholding, and the change in shareholding
Equity fundraising trends in the UK
Top fundraisers and investors in the UK

I would like to hear your feedback on our UK company insights data 🙂

submitted by /u/olive_er

Automated Dataset Generation And Augmentation

Hi guys, I’ve been working on a fine-tuned Llama 3 for quite some time now and want to expand the dataset. Are there any good automated solutions to generate these datasets from PDF or HTML, and can the datasets be augmented automatically?

Thanks so much in advance

submitted by /u/OkVegetable2512

[Paid] Anonymized Dataset For Market Analysis

I’m selling a high-quality dataset that includes email address, full name, phone number, age, location (country), gaming platforms owned (e.g., PC, PlayStation, Xbox, Android), etc.

Price: $1.20 per individual ($120 total)

Format: CSV, Excel and PDF

Delivery: Secure download link or Direct file

DM if you are interested.

submitted by /u/Money_Ad3408

United Kingdom Gender Pay Gap Dataset

New Dataset here: UK Gender Pay Gap

I would like to kindly invite all of you to visit, open, and upvote this dataset.
If you find it valuable, please download it and leave a comment.

Your support and appreciation mean a lot.

Link: https://www.kaggle.com/…/uk-gender-pay-gap-data-2018-2023

submitted by /u/Umer_Haddii

Financial Data Of Football (soccer) Clubs

Hey,

Does anyone know how I can obtain financial data of football (soccer) clubs?

I need it for the smaller clubs in Europe as well, not only the top clubs, and for as many years as possible.

Any thoughts?

Thanks!

submitted by /u/Porcoddio45

Datasets For Predicting Consumer Credit Trends

Hi, I wanted to ask if anyone has open data sets with features that can be used to predict consumer credit trends, including demographic information, financial behavior, and transaction history. I’ve been looking for a few hours but can’t find a good data set.

submitted by /u/ReplyConscious1561

Lyric Dataset With Song Structure For Commercial Use

Hey, I’m trying to find a dataset that contains lyrics and the song structure, exactly like https://genius.com

For example:

[Intro]
Psst, I see dead people
(Mustard on the beat, ho)

[Verse 1]
Ayy, Mustard on the beat, ho
Deebo any rap nigga, he a free thro

Genius doesn’t allow scraping or the use of its data for commercial purposes:

Except as expressly authorized by Genius in writing, you agree not to modify, copy, frame, scrape, rent, lease, loan, sell, distribute or create derivative works based on the Service or the Genius Content, in whole or in part, except that the foregoing does not apply to your own User Content (as defined above) that you legally upload to the Service. In connection with your use of the Service you shall not engage in or use any data mining, robots, scraping or similar data gathering or extraction methods. Any use of the Service or the Genius Content other than as specifically authorized herein is strictly prohibited. As between you and Genius, the technology and software underlying the Service or distributed in connection therewith is the exclusive property of Genius, our affiliates and our partners (the “Software”). You agree not to copy, modify, create a derivative work of, reverse engineer, reverse assemble or otherwise attempt to discover any source code, sell, assign, sublicense, or otherwise transfer any right in the Software. Any rights not expressly granted herein are reserved by Genius.

Do you know of any other source of data that contains the lyrics and the song structure (chorus, verse, etc.)? I want to fine-tune Whisper to transcribe lyrics with these tags for a commercial product (music generation model).

I think that suno.com has used genius.com for their music model because they use the same tags for song structure xD.
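
To show the structure being asked for, here is a minimal sketch that splits tagged lyrics into named sections. It assumes the bracketed-tag layout shown in the example above (a `[Section]` line followed by lyric lines); the function name and the sample text are made up for illustration.

```python
import re

# A section tag is a line consisting only of [Something].
SECTION_TAG = re.compile(r"^\[(?P<name>[^\]]+)\]\s*$")


def parse_sections(text):
    """Split tagged lyrics into a list of (section_name, [lines]) pairs."""
    sections = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = SECTION_TAG.match(line)
        if match:
            current = (match.group("name"), [])
            sections.append(current)
        elif current is not None:
            current[1].append(line)
        else:
            # Lines before the first tag go into an unnamed section.
            current = ("", [line])
            sections.append(current)
    return sections


sample = """[Intro]
first intro line
second intro line

[Verse 1]
first verse line
"""

for name, lines in parse_sections(sample):
    print(name, len(lines))
```

A dataset in this shape could be serialized as one record per section, which is roughly what a tag-aware transcription target would need.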

submitted by /u/Which-Breadfruit-926

Online User Activity Data With Anonymized Email ID

Hi,

I’m looking for online user activity data with anonymized email IDs. Can somebody point me to the right contact, please?

submitted by /u/Winter-Breadfruit943

Building A Collection Of The Best Datasets And Resources

Hey scientists!

I’m working on cooldata; I’d like to build a more useful way to access open data online.

What are the best resources you use every day (data.gov, etc.)? And more importantly, why do you use them, and how?

I’m starting this by myself as a 20% personal project; the goal is to be fully open, and maybe also open source as the thing moves on. (If anyone wants to contribute, I’m happy to listen! Just send a DM.)

Have a nice day!

submitted by /u/antonscap

Tableau Help Or Better Yet, Can You Analyze My Data?

It has been a while (10 yrs) and I can’t figure out how to do a join of several tables using date/time in Tableau Public. Backstory: I have an annoying health condition (SIBO) that is starving my body of nutrients, and I am trying to figure things out by tracking methane, hydrogen, food intake, meds, symptoms, etc.

https://public.tableau.com/app/profile/mfinaly/viz/SmallIntestinesBacteriaOvergrowth/TrackingmySIBO

submitted by /u/Immediate_Ad3066

Generate Differentially Private Synthetic Text For Fine-tuning AI Models

submitted by /u/Repeat-or

How To Scrape Subtitles?

There is very little Irish-language text, audio, and English translation available. One of the best sources is this soap opera:

https://www.tg4.ie/en/player/play/?pid=6352950048112&title=Ros%20na%20R%C3%BAn&series=Ros%20na%20R%C3%BAn&pcode=669535&genre=Drama

It is fairly easy to find the URL of the subtitles manually when on that webpage

getting the vtt file

But the vtt URL uses UUIDs that seem pretty random

https://redirector.playback.eu-west-1.prod.deploys.brightcove.com/v1/1555966122001/7b5d6364-47e2-4016-ae63-93301a7f4e38/ff7182e5-8f90-4af9-8d35-41a3bae7fa1e/441366d1-6c40-4106-9c0f-ecfdc21476b0.vtt

https://redirector.playback.eu-west-1.prod.deploys.brightcove.com/v1/1555966122001/83680fe1-8055-4494-96ff-bc2786f937cc/652c30ad-ff11-45d4-9e0c-46db42f5a34c/0ab149e4-25b0-4c73-8c9a-8130d647de91.vtt

There are subtitle archive sites, but this soap opera is not on them. So how would you extract a few hundred sets of VTT files? (I want to build NLP datasets, n-grams, etc., not make money or anything.)

I can imagine answers of

With this site you can hire someone and if you show them the steps they can extract them for you cheap

With this mouse emulator you can do it by XYZ

There is a way around the UUIDs being random by XYZ

But I do not know how any of these would actually work.
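
This does not solve the URL discovery problem, but for the step after it, here is a rough sketch of turning a downloaded VTT file into plain cue text and n-gram counts. It assumes standard WebVTT layout (a `WEBVTT` header, optional cue ids, `-->` timing lines); the sample cues and function names are made up for illustration.

```python
import re
from collections import Counter

# Timing lines look like 00:00:01.000 --> 00:00:03.000 (hours are optional).
TIMING = re.compile(r"^(\d{2,}:)?\d{2}:\d{2}\.\d{3}\s+-->\s+(\d{2,}:)?\d{2}:\d{2}\.\d{3}")
TAG = re.compile(r"<[^>]+>")  # inline markup such as <i>...</i>


def vtt_to_lines(vtt_text):
    """Return the text of each cue, dropping headers, cue ids and timings."""
    cues = []
    for block in re.split(r"\n\s*\n", vtt_text.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        # Blocks without a timing line (WEBVTT header, NOTE) are skipped.
        for i, line in enumerate(lines):
            if TIMING.match(line.strip()):
                text = " ".join(TAG.sub("", part).strip() for part in lines[i + 1:])
                if text.strip():
                    cues.append(text.strip())
                break
    return cues


def ngrams(words, n):
    """Count n-grams over a list of tokens."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


sample = """WEBVTT

1
00:00:01.000 --> 00:00:03.000
Dia duit, a chara.

2
00:00:03.500 --> 00:00:05.000
<i>Dia duit</i> arís.
"""

lines = vtt_to_lines(sample)
tokens = " ".join(lines).lower().replace(",", "").replace(".", "").split()
print(lines)
print(ngrams(tokens, 2).most_common(1))
```

The tokenization here is deliberately crude; a real n-gram pipeline would want proper handling of punctuation and Irish-specific orthography.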

submitted by /u/cavedave

Looking For Bacterial Growth Per Time Dataset

Hello everyone, thank you for reading this post. As the title says, I’m looking for an experimental dataset on bacterial growth over time (having the protocol and the source would be even better, but any real dataset would be awesome). I am simulating a bacterial growth model and want to compare it to real data. Ty for your attention. All the best for everyone <3

submitted by /u/Fickle_Buy7668

Dataset Browsing Behavior / Search History

Hi everyone,

I am looking to analyze browsing data holistically, so I would like to understand what pages users visit. Search history data from browsers would be best. It would be great if it were recent too (2021-2024). Does anyone know of anything like that? I am a PhD student, so I only have a limited budget.

Thank you in advance!

submitted by /u/KeyScale1232

Bodybuilder/fitness Model Image Dataset

Hi, I am wondering if anyone knows of a dataset of images of bodybuilders/fitness models. I have been looking online all day and haven’t found a single dataset dedicated to it. Thank you!

submitted by /u/ZavierTheSavior

I Need Ideas For My Data Science Project

(what’s this link thing?) Hello folks, I need ideas for datasets that I can use for a data analysis project for my college. I thought about the relationship between more developed countries and unemployment, or a dataset I found that contained a study about the most common ways to study a subject and whether they are effective. However, I couldn’t find the source of that data, so if you could help me find these or give me some better ideas, I would be very grateful.

submitted by /u/vitstola

Does Anyone Have Data Or A Source Showing How Much Greater Federal Investment In Highways Was Compared To Public Transit Between 1960 And 1980, On Average?

Does anyone have data or a source showing how much greater federal investment in highways was compared to public transit between 1960 and 1980, on average?

submitted by /u/Cpwkid

Moon Dataset For Summer Research Project

Hi all, I am working on a research project and require pictures of the moon with the dates those pictures were taken on.

Any kind of pictures of the moon with dates would work. Even better if instead of dates it would say what day of the lunar month the picture was taken on.

Thank you in advance!

submitted by /u/Apprehensive-Web5650

Movie Title Screen Dataset With More And Newer Data

I am in search of a dataset of movies along with images of the title screen. There is this dataset: https://www.shillpages.com/movies/index2.shtml
However, it is getting outdated and doesn’t have a lot of data to work with. Does anyone know of a movie dataset that also contains images of the title screen?

submitted by /u/JadyBray27

Cannabis Industry Data Organized By Geographical Region, Individual Sectors, And Hemp/CBD

submitted by /u/OregonTripleBeam

Open E-commerce 1.0: Five Years Of Crowdsourced U.S. Amazon Purchase Histories With User Demographics – Harvard Dataverse

submitted by /u/yaph

Open Sourcing Touristic POI Database – Questions Around Format, Interest

We’re planning to open source our touristic POI database (currently 1.4 million points worldwide). There is some effort involved in generalizing it from our internal format, so I wanted to a) confirm that there is interest in it and b) get some feedback on the format. I’ve also outlined the process of creating/updating the dataset, as it gives some insight into what to expect from the dataset and, if it interests anyone, it is probably the people in this sub.

POI data points

Location (mandatory)
Category (mandatory, more on that later)
Name
Images (designated thumbnail with blur hash, all with permissive licensing information)
Localizations (consisting of a name, teaser and description in one of the supported languages; availability depends)
Rating (mandatory, more on that later)
Source (mandatory, such as Wikidata, OSM, tourism council, etc.)
Type (most POIs are individual sights, but “special” POIs such as places, i.e. cities/towns, exist)
Parent (if it exists, a “special” POI such as a city or town)
Links/References (links to the Wikidata entity and Wikipedia/Wikivoyage articles in different languages, but also links to social media (fb, ig, twitter, etc.), booking sites (agoda, booking, hotels.com, etc.) or relevant 3rd-party sites such as Trip Advisor, Atlas Obscura, etc.)
Misc. properties: web address, telephone, zip code, opening hours, heritage designation (UNESCO, UK Grade I building), etc. More depending on the source

We derive our content from many different sources; some of them we simply map to the above format (especially those derived from regional or country-level tourism councils). The bulk, however, is combined from Wikidata, Wikipedia, Wikivoyage and OpenStreetMap in the following manner.

Process

Process the complete Wikidata dump, filtering for all entities that possess a geocoordinate and an instance-of claim. The instance-of claim is then checked against a list of touristically relevant classes. Note: this claim can be very specific, such as olive sand beach or agricultural theme park, so we expand our list of touristically relevant classes (i.e. beach and amusement park) to include the descendant subclasses. We get a lot of structured information from this source (especially links to other sites) but little in the way of descriptions, images, etc.
Process all linked articles in the different language versions of Wikipedia/Wikivoyage (at the moment we look at the English, German, French, Spanish, Italian, Portuguese and Polish sites).
Extract teasers and shorter excerpts for descriptions (Localizations) as well as images with their respective licenses.
Clean up low-quality and unspecific images.
Assign parents depending on the “located in administrative region” claim to “special” POIs (cities, towns); the assigned POIs then form an area that is used to assign further POIs in that area to the same parent.
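
As a rough illustration of the first step, the filter could be sketched like this. This is not our production code, just a sketch under assumptions: entities follow the Wikidata JSON dump layout, P625 is the coordinate property, P31 is instance-of, and the class ids (`Q_BEACH` etc.) are placeholders rather than real Wikidata ids. The subclass map would in practice be built from P279 (subclass of) claims.

```python
def expand_classes(seed_classes, subclass_of):
    """Return the seed classes plus all of their descendant subclasses.

    subclass_of maps a class id to the ids of its direct parent classes.
    """
    children = {}
    for child, parents in subclass_of.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)
    relevant, stack = set(seed_classes), list(seed_classes)
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in relevant:
                relevant.add(child)
                stack.append(child)
    return relevant


def claim_ids(entity, prop):
    """Item ids referenced by a property, following the JSON dump layout."""
    ids = []
    for claim in entity.get("claims", {}).get(prop, []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value")
        if isinstance(value, dict) and "id" in value:
            ids.append(value["id"])
    return ids


def is_poi_candidate(entity, relevant_classes):
    """Keep entities with a coordinate (P625) and a relevant instance-of (P31)."""
    has_coordinate = bool(entity.get("claims", {}).get("P625"))
    return has_coordinate and any(c in relevant_classes for c in claim_ids(entity, "P31"))


def item(qid):
    """Build a minimal item-valued claim for the example below."""
    return {"mainsnak": {"datavalue": {"value": {"id": qid}}}}


# Placeholder class ids, not real Wikidata ids.
subclass_of = {"Q_OLIVE_SAND_BEACH": ["Q_BEACH"], "Q_BEACH": ["Q_LANDFORM"]}
relevant = expand_classes({"Q_BEACH"}, subclass_of)

entity = {
    "id": "Q_EXAMPLE",
    "claims": {"P625": [{"mainsnak": {}}], "P31": [item("Q_OLIVE_SAND_BEACH")]},
}
print(is_poi_candidate(entity, relevant))
```

Expanding the class list once up front keeps the per-entity check to a set lookup, which matters when streaming the full dump.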

Two things would require some work: category and rating. We map information from sources to an internal category representation. It is binary, fast to filter with bit masks but not very flexible and probably not that easy to use. For the open source version I was thinking of creating a taxonomy somewhat similar to the one Foursquare uses but other suggestions are appreciated.
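
To give an idea of what a bit-mask category representation looks like in general, here is a small sketch. The category names and bit assignments are hypothetical and only illustrate the technique; they are not the internal format.

```python
from enum import IntFlag


class Category(IntFlag):
    """Hypothetical category bits, one bit per category."""
    BEACH = 1
    MUSEUM = 2
    VIEWPOINT = 4
    AMUSEMENT_PARK = 8


# Each POI stores all of its categories in a single integer.
pois = [
    ("poi-1", Category.BEACH | Category.VIEWPOINT),
    ("poi-2", Category.MUSEUM),
    ("poi-3", Category.AMUSEMENT_PARK | Category.BEACH),
]


def filter_any(pois, mask):
    """Keep POIs that share at least one category bit with the mask."""
    return [name for name, cats in pois if cats & mask]


print(filter_any(pois, Category.BEACH | Category.MUSEUM))
```

The filter is a single bitwise AND per POI, which is why it is fast, but adding or reorganizing categories means reassigning bits, which is the inflexibility mentioned above.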

The rating combines a somewhat objective data quality rating (number of images, links to Wikipedia articles, length of descriptions, types of properties present, etc.) with a biased weighting of categories (among other information) that fits our use case. We also use user reviews/ratings, but those wouldn’t be part of the dataset. We could use a slightly more generalized aggregate rating and/or different rating components, but more likely than not you would want to use your own weighting if your use case is sufficiently different, so I guess I am wondering what expectations or requests there are here.

Export Formats

TSV and GeoJSON Feature Collections but open to suggestions.
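
For feedback purposes, here is a sketch of what one record might look like in both formats. The field names and the sample record are made up for illustration and are not the final schema; the only fixed part is the GeoJSON convention that coordinates are ordered longitude, then latitude.

```python
import json


def poi_to_feature(poi):
    """Convert one POI dict to a GeoJSON Feature (coordinates are [lon, lat])."""
    properties = {k: v for k, v in poi.items() if k not in ("lat", "lon")}
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [poi["lon"], poi["lat"]]},
        "properties": properties,
    }


def to_feature_collection(pois):
    """Wrap all POIs in a single GeoJSON FeatureCollection."""
    return {"type": "FeatureCollection", "features": [poi_to_feature(p) for p in pois]}


def to_tsv(pois, columns):
    """Render the same records as tab-separated text with a header row."""
    rows = ["\t".join(columns)]
    for poi in pois:
        rows.append("\t".join(str(poi.get(c, "")) for c in columns))
    return "\n".join(rows)


# Made-up record; field names are illustrative only.
pois = [{"name": "Example Beach", "category": "beach", "source": "Wikidata", "lat": 1.5, "lon": 2.5}]
print(json.dumps(to_feature_collection(pois), indent=2))
print(to_tsv(pois, ["name", "category", "source", "lat", "lon"]))
```

Nested fields such as localizations or links fit naturally into GeoJSON properties, while TSV would need them flattened or serialized into a single column.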

submitted by /u/berlumptsss
