I am not sure this is kosher but it seems really interesting
submitted by /u/cavedave
Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?
I need an untidy dataset.
One of the selected data sets must violate at least one of the tidy data principles, such as “each variable must have its own column” or “each observation must have its own row”.
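For anyone unsure what "untidy" looks like in practice, here is a minimal sketch, assuming a made-up table where the years are spread across column headers. The column names, the values, and the helper `to_tidy` are all hypothetical and only illustrate the "each variable has its own column" rule.

```python
# Hypothetical untidy table: one row per country, one column per year.
# The headers "2019" and "2020" are values of a "year" variable,
# so the variable "year" does not have its own column.
untidy = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def to_tidy(rows, id_col, var_name, value_name):
    """Reshape wide rows to long form: one row per observation."""
    tidy_rows = []
    for row in rows:
        for col, value in row.items():
            if col == id_col:
                continue  # the identifier is repeated on every output row
            tidy_rows.append({id_col: row[id_col], var_name: col, value_name: value})
    return tidy_rows


tidy = to_tidy(untidy, "country", "year", "cases")
for r in tidy:
    print(r)
```

A dataset shaped like `untidy` would satisfy the assignment; the `tidy` version is what it looks like after the fix.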
submitted by /u/Front-Benefit8232
Hello everyone,
I am a college student currently working on a thesis about machine learning, specifically focused on identifying Indian and Carabao mango leaves with and without anthracnose disease using a CNN model.
At this stage, I need a large dataset, likely 1,000 or more images, of the mentioned varieties of mango. I am looking for images of leaves affected by anthracnose disease as well as healthy leaves from both Carabao mango and Indian mango varieties.
I am reaching out in the hope that you can help us find these datasets, as they will serve as the primary data for our thesis.
Thank you very much for considering my request.
submitted by /u/chadmomentgiga
Hi guys,
I’m currently working on a project to enhance the detection and prevention of cryptocurrency scams and phishing attempts. A crucial part of this project is identifying and analyzing scam crypto wallets that have been reported by users and security experts.
I am looking for a reliable and up-to-date dataset that contains information about cryptocurrency wallets reported as being involved in phishing or scam activities. Ideally, this dataset should include details such as:
Wallet addresses
Type of scam or phishing attempt
If anyone knows where I can find such a dataset or has resources that could help, I would greatly appreciate your assistance. Open-source datasets or any repositories maintained by security communities or organizations would be extremely helpful.
Thank you in advance for your help!
submitted by /u/Funny-Accident-5612
Hello everyone,
I am currently working on a machine learning project, specifically focused on identifying Philippine Indian and Carabao mango leaves with and without anthracnose disease using a CNN model.
At this stage, I need a large dataset, likely 1,000 or more images, of the mentioned varieties of mango. I am looking for images of leaves affected by anthracnose disease as well as healthy leaves from both Carabao mango and Indian mango varieties.
Thank you very much for considering my request.
submitted by /u/chadmomentgiga
Hi, I am learning my company’s data management system from scratch, and am trying to figure out whether data copied from Excel into Access should go in the Query section or the Table section. I am pretty sure it is Table but want to be sure. Thanks!
submitted by /u/suzimakesthings
Hi everyone,
Does anyone know of open-access datasets on suicides in the U.S. (or on the number of deaths, if there is a variable for the cause of death) per year (from about 2015 to at least 2020) and per state? The more additional variables there are (such as gender, age, employment status, etc.), the better.
Hope that maybe some of you have seen something of this sort🙏
submitted by /u/dollala
We are a UK FinTech company and have launched a new product that automatically extracts data (including handwritten data) from 25 million filings for millions of UK companies. In addition, there are insights and easy-to-consume charts and tables. The automatically extracted data provides the following for 2m+ private companies:
An industry-first price-per-share and last-round-valuation (market capitalisation) chart
Capital structure, shareholding, and the change in shareholding
Equity fundraising trends in the UK
Top fundraisers and investors in the UK
I would like to hear your feedback on our UK company insights data 🙂
submitted by /u/olive_er
Hi guys, I’ve been working on a fine-tuned llama3 for quite some time now and want to expand the dataset. Are there any good automated solutions for generating such datasets from PDF or HTML, and can the results be augmented automatically?
Thanks so much in advance
submitted by /u/OkVegetable2512
I’m selling a high-quality dataset that includes email address, full name, phone number, age, location (country), gaming platforms owned (e.g., PC, PlayStation, Xbox, Android, etc.), etc.
Price: $1.20 per individual ($120 total)
Format: CSV, Excel and PDF
Delivery: Secure download link or Direct file
DM If you are interested
submitted by /u/Money_Ad3408
I would like to kindly invite all of you to visit, open and upvote this dataset.
If you find it valuable, then download it and leave a comment.
Your support and appreciation mean a lot.
Link: https://www.kaggle.com/…/uk-gender-pay-gap-data-2018-2023
submitted by /u/Umer_Haddii
Hey,
Does anyone know how I can obtain financial data of football (soccer) clubs?
I need it for the smaller clubs in Europe as well, not only the top clubs, and for as many years as possible.
Any thoughts?
Thanks!
submitted by /u/Porcoddio45
Hi, I wanted to ask if anyone has open data sets with features that can be used to predict consumer credit trends, including demographic information, financial behavior, and transaction history. I’ve been looking for a few hours but can’t find a good data set.
submitted by /u/ReplyConscious1561
Hey, I’m trying to find a dataset that contains lyrics and the song structure, exactly like https://genius.com
For example:
[Intro]
Psst, I see dead people
(Mustard on the beat, ho)
[Verse 1]
Ayy, Mustard on the beat, ho
Deebo any rap nigga, he a free thro
Genius doesn’t allow scraping or commercial use of its data:
Except as expressly authorized by Genius in writing, you agree not to modify, copy, frame, scrape, rent, lease, loan, sell, distribute or create derivative works based on the Service or the Genius Content, in whole or in part, except that the foregoing does not apply to your own User Content (as defined above) that you legally upload to the Service. In connection with your use of the Service you shall not engage in or use any data mining, robots, scraping or similar data gathering or extraction methods. Any use of the Service or the Genius Content other than as specifically authorized herein is strictly prohibited. As between you and Genius, the technology and software underlying the Service or distributed in connection therewith is the exclusive property of Genius, our affiliates and our partners (the “Software”). You agree not to copy, modify, create a derivative work of, reverse engineer, reverse assemble or otherwise attempt to discover any source code, sell, assign, sublicense, or otherwise transfer any right in the Software. Any rights not expressly granted herein are reserved by Genius.
Do you know any other source of data that contains the lyrics and the song structure (chorus, verse, etc.)? I want to fine-tune Whisper to transcribe lyrics with these tags for a commercial product (music generation model).
I think that suno.com has used genius.com for their music model because they use the same tags for song structure xD.
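For reference, this is roughly the structure I would need to recover from any source. A minimal sketch, assuming section tags sit alone on a line in square brackets as in the example above; the function name `split_sections` and the placeholder lyrics are made up for illustration.

```python
import re

# A section tag is a line consisting only of "[...]", e.g. "[Verse 1]".
SECTION_TAG = re.compile(r"^\[(?P<tag>[^\]]+)\]\s*$")


def split_sections(lyrics):
    """Group lyric lines under the most recent [Tag] line."""
    sections = []
    current = None
    for raw in lyrics.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines carry no structure here
        match = SECTION_TAG.match(line)
        if match:
            current = (match.group("tag"), [])
            sections.append(current)
        elif current is None:
            # Lines before the first tag are kept under a placeholder label.
            current = ("Untagged", [line])
            sections.append(current)
        else:
            current[1].append(line)
    return sections


sample = """[Intro]
first placeholder line
[Verse 1]
second placeholder line
third placeholder line
"""
sections = split_sections(sample)
for tag, lines in sections:
    print(tag, lines)
```

Each `(tag, lines)` pair could then be turned into a training target with the tag inlined, but how Whisper should be prompted with those tags is a separate question.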
submitted by /u/Which-Breadfruit-926
Hi,
I’m looking for online user activity data with anonymized email IDs… Can somebody point me to the right contact, please?
submitted by /u/Winter-Breadfruit943
Hey scientists!
I’m working on cooldata: I’d like to build a more useful way to access open data online.
What are the best resources you use every day (data.gov, etc…)? And more importantly, why do you use them, and how?
I’m starting this by myself as a 20% personal project; the goal is to be fully open and maybe also open source as the thing moves on. (If anyone wants to contribute, I’m happy to listen! Just send a DM.)
Have a nice day!
submitted by /u/antonscap
It has been a while (10 yrs) and I can’t figure out how to join several tables on date/time in Tableau Public. Backstory: I have an annoying health condition (SIBO) that is starving my body of nutrients, and I am trying to figure things out by tracking methane, hydrogen, food intake, meds, symptoms, etc.
https://public.tableau.com/app/profile/mfinaly/viz/SmallIntestinesBacteriaOvergrowth/TrackingmySIBO
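To make clearer what kind of join I mean: the timestamps in the different tables never match exactly, so each reading should be paired with the most recent entry from another table. This is not Tableau syntax, just a minimal Python sketch of that logic, assuming every table has a `time` column; the field names, sample values and the helper `asof_join` are made up.

```python
from bisect import bisect_right
from datetime import datetime


def asof_join(left, right, key="time"):
    """Attach to each left row the latest right row at or before its time."""
    right_sorted = sorted(right, key=lambda r: r[key])
    right_times = [r[key] for r in right_sorted]
    joined = []
    for row in left:
        # Index of the first right row strictly after this row's time.
        i = bisect_right(right_times, row[key])
        merged = dict(row)
        if i > 0:
            # Prefix the matched columns so they do not overwrite left columns.
            merged.update({f"prev_{k}": v for k, v in right_sorted[i - 1].items()})
        joined.append(merged)
    return joined


# Hypothetical sample data.
readings = [
    {"time": datetime(2024, 1, 1, 10, 0), "methane_ppm": 12},
    {"time": datetime(2024, 1, 1, 14, 0), "methane_ppm": 30},
]
meals = [
    {"time": datetime(2024, 1, 1, 8, 30), "food": "oats"},
    {"time": datetime(2024, 1, 1, 12, 15), "food": "rice"},
]
joined = asof_join(readings, meals)
for row in joined:
    print(row)
```

Pre-joining the tables like this and loading a single file would be one workaround, but I would prefer to do it inside Tableau if that is possible.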
submitted by /u/Immediate_Ad3066
There is very little Irish-language text and audio available with English translations. One of the best sources is this soap opera.
It is fairly easy to find the URL of the subtitles manually when on that webpage.
But the VTT URL uses UUIDs that seem pretty random.
There are subtitle archive sites, but this soap opera is not there. So how would you extract a few hundred sets of VTT files? (I want to build NLP datasets, n-grams, etc., not make money or anything.)
I can imagine answers such as:
With this site you can hire someone, and if you show them the steps they can extract the files for you cheaply.
With this mouse emulator you can do it by XYZ.
There is a way around the UUIDs being random by XYZ.
But I do not know how any of these would actually work.
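The part I can already do is the processing once the files are on disk. A minimal sketch, assuming well-formed WebVTT with one timing line per cue; the sample cues and the helper names `parse_vtt` and `ngrams` are made up, and this does not solve the download problem.

```python
import re
from collections import Counter

# Inline cue markup such as <c> or <v Speaker> is stripped from the text.
TAG = re.compile(r"<[^>]+>")


def parse_vtt(text):
    """Return (start, end, text) tuples for each cue in a WebVTT string."""
    cues = []
    blocks = re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        for i, line in enumerate(lines):
            if "-->" in line:
                start, _, rest = line.partition("-->")
                end = rest.split()[0]  # drop cue settings after the end time
                payload = " ".join(TAG.sub("", l).strip() for l in lines[i + 1:])
                cues.append((start.strip(), end, payload.strip()))
                break  # blocks without a timing line (header, notes) are skipped
    return cues


def ngrams(tokens, n):
    """Yield consecutive n-token tuples."""
    return zip(*(tokens[i:] for i in range(n)))


sample = """WEBVTT

1
00:00:01.000 --> 00:00:03.000
Dia duit, a chara

2
00:00:03.500 --> 00:00:05.000
Dia duit
"""
cues = parse_vtt(sample)
bigrams = Counter()
for _, _, cue_text in cues:
    tokens = [w.lower().strip(",.") for w in cue_text.split()]
    bigrams.update(ngrams(tokens, 2))  # counted per cue, not across cues
print(cues)
print(bigrams.most_common(3))
```

Pairing the Irish and English cues by their timestamps would then give the parallel text.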
submitted by /u/cavedave
Hello everyone, thank you for reading this post. As the title says, I’m looking for an experimental dataset on bacterial growth over time (having the protocol would be even better, but a real dataset with its source would already be awesome). I am trying to simulate a bacterial growth model and compare it to real data. Thanks for your attention. All the best to everyone <3
submitted by /u/Fickle_Buy7668
Hi everyone,
I am looking to analyze browsing data holistically, so I would like to understand what pages users visit. Search history data from browsers would be best. It would be great if it were recent too (2021-2024). Does anyone know of anything like that? I am a PhD student, so I only have a limited budget.
Thank you in advance!
submitted by /u/KeyScale1232
Hi, I am wondering if anyone knows of a dataset of images of bodybuilders/fitness models. I have been looking online all day and haven’t found a single dataset dedicated to it. Thank you!
submitted by /u/ZavierTheSavior
(what’s this link thing?) Hello folks, I need ideas for datasets that I can use for a data analysis project for my college. I thought about the relationship between how developed a country is and its unemployment, or about a dataset I found that contained a study on the most common ways to study a subject and whether they are effective. However, I couldn’t find the source of that data, so if you could help me find these or give me some better ideas, I would be very grateful.
submitted by /u/vitstola
Does anyone have data or a source showing how much greater federal investment in highways was compared to public transit between 1960 and 1980, on average?
submitted by /u/Cpwkid
Hi all, I am working on a research project and require pictures of the moon with the dates those pictures were taken on.
Any kind of pictures of the moon with dates would work. Even better if instead of dates it would say what day of the lunar month the picture was taken on.
Thank you in advance!
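If a source only gives dates, I could estimate the day of the lunar month myself. A rough sketch, assuming a mean synodic month of 29.530588853 days and a commonly used reference new moon on 2000-01-06 18:14 UTC; because it uses the mean month, the result can be off from the true phase by many hours. The function names are made up.

```python
from datetime import datetime, timezone

# Assumptions: mean synodic month length and a commonly cited reference new moon.
SYNODIC_MONTH_DAYS = 29.530588853
REFERENCE_NEW_MOON = datetime(2000, 1, 6, 18, 14, tzinfo=timezone.utc)


def lunar_age_days(when):
    """Approximate days since the last new moon (when must be timezone-aware)."""
    elapsed_days = (when - REFERENCE_NEW_MOON).total_seconds() / 86400.0
    return elapsed_days % SYNODIC_MONTH_DAYS


def lunar_day(when):
    """Approximate 1-based day of the lunar month."""
    return int(lunar_age_days(when)) + 1


photo_taken = datetime(2024, 6, 1, 21, 0, tzinfo=timezone.utc)  # hypothetical date
print(round(lunar_age_days(photo_taken), 2), lunar_day(photo_taken))
```

Pictures that already come labelled with the lunar day would still be better, since the labels would not depend on this approximation.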
submitted by /u/Apprehensive-Web5650
I am in search of a dataset of movies along with images of the title screen. There exists this dataset https://www.shillpages.com/movies/index2.shtml
However this is getting outdated and doesn’t have a lot of data to work with. Does anyone know of a movies dataset that also contains images of the title screen?
submitted by /u/JadyBray27
We’re planning to open source our touristic POI database (currently 1.4 million points worldwide). There is some effort involved in generalizing it from our internal format, so I wanted to a) confirm that there is interest in it and b) get some feedback on the format. I’ve also outlined the process of creating/updating the dataset, as it gives some insight into what to expect from the dataset, and if it interests anyone, it is probably the people in this sub.
Location (mandatory)
Category (mandatory, more on that later)
Name
Images (a designated thumbnail with blur hash, all with permissive licensing information)
Localizations (consisting of a name, teaser and description in one of the supported languages; availability depends)
Rating (mandatory, more on that later)
Source (mandatory, such as Wikidata, OSM, tourism council etc.)
Type (most POIs are individual sights, but “special” POIs such as places, i.e. cities/towns, exist)
Parent (if it exists, a “special” POI such as a city or town)
Links/References (links to the Wikidata entity and Wikipedia/Wikivoyage articles in different languages, but also links to social media (fb, ig, twitter etc.), booking sites (agoda, booking, hotels.com etc.) or relevant 3rd-party sites such as Trip Advisor, Atlas Obscura etc.)
Misc. Properties: Web address, Telephone, Zip Code, Opening Hours, Heritage Designation (UNESCO, UK Grade I building etc.). More depending on the source.
We derive our content from many different sources, some of which we simply map to the above format (especially those derived from regional or country-level tourism councils). The bulk, however, is combined from Wikidata, Wikipedia, Wikivoyage and OpenStreetMap in the following manner.
Process the complete Wikidata dump, keeping only the entities that possess a geocoordinate and an “instance of” claim. The “instance of” claim is then checked against a list of touristically relevant classes. Note: this claim can be very specific, such as olive sand beach or agricultural theme park, so we expand our list of touristically relevant classes (i.e. beach and amusement park) to include the descendant subclasses. We get a lot of structured information from this source (especially links to other sites) but little in the way of descriptions, images etc.
Process all linked articles in the different language versions of Wikipedia/Wikivoyage (at the moment we look at the English, German, French, Spanish, Italian, Portuguese and Polish sites). Extract teasers and shorter excerpts for descriptions (Localizations) as well as images with their respective licenses.
Clean up low-quality and unspecific images.
Assign parents to “special” POIs (cities, towns) depending on the “located in administrative region” claim; the assigned POIs then form an area that is used to assign further POIs in that area to the same parent.
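To give an idea of the first step, here is a minimal sketch, not our actual pipeline. It assumes entities in the Wikidata JSON dump layout, where P625 is the coordinate location, P31 is "instance of" and P279 is "subclass of". The class ids below are deliberately fake placeholders, and the helper names are made up.

```python
def claim_values(entity, prop):
    """Collect the main values of all claims for one property."""
    values = []
    for claim in entity.get("claims", {}).get(prop, []):
        datavalue = claim.get("mainsnak", {}).get("datavalue")
        if datavalue is not None:  # "no value" / "unknown value" snaks have none
            values.append(datavalue["value"])
    return values


def expand_subclasses(roots, subclass_of):
    """Return roots plus all descendants; subclass_of maps class -> direct parents (P279)."""
    children = {}
    for child, parents in subclass_of.items():
        for parent in parents:
            children.setdefault(parent, set()).add(child)
    relevant = set(roots)
    stack = list(roots)
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in relevant:
                relevant.add(child)
                stack.append(child)
    return relevant


def is_touristic(entity, relevant_classes):
    """Keep entities with a coordinate and at least one relevant P31 class."""
    if not claim_values(entity, "P625"):
        return False
    classes = {value["id"] for value in claim_values(entity, "P31")}
    return bool(classes & relevant_classes)


def item(qid):
    return {"mainsnak": {"datavalue": {"value": {"entity-type": "item", "id": qid}}}}


def coord(lat, lon):
    return {"mainsnak": {"datavalue": {"value": {"latitude": lat, "longitude": lon}}}}


# Placeholder ids, not real Wikidata items.
subclass_of = {
    "Q_olive_sand_beach": ["Q_beach"],
    "Q_agri_theme_park": ["Q_theme_park"],
    "Q_theme_park": ["Q_amusement_park"],
}
relevant = expand_subclasses({"Q_beach", "Q_amusement_park"}, subclass_of)
entities = [
    {"id": "Q_e1", "claims": {"P31": [item("Q_olive_sand_beach")], "P625": [coord(1.0, 2.0)]}},
    {"id": "Q_e2", "claims": {"P31": [item("Q_beach")]}},  # no coordinate
    {"id": "Q_e3", "claims": {"P31": [item("Q_bank")], "P625": [coord(3.0, 4.0)]}},
]
kept = [e["id"] for e in entities if is_touristic(e, relevant)]
print(kept)
```

In the sketch the subclass map is a small dictionary; for the full dump it would have to be built from the P279 claims in a separate pass.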
Two things would require some work: category and rating. We map information from sources to an internal category representation. It is binary, fast to filter with bit masks but not very flexible and probably not that easy to use. For the open source version I was thinking of creating a taxonomy somewhat similar to the one Foursquare uses but other suggestions are appreciated.
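To illustrate the bit mask idea, here is a minimal sketch. The category names and values are invented for the example and are not our internal representation.

```python
from enum import IntFlag


class Category(IntFlag):
    """Hypothetical categories; each one occupies a single bit."""
    BEACH = 1
    MUSEUM = 2
    VIEWPOINT = 4
    HERITAGE = 8


# A POI can belong to several categories at once by OR-ing the bits.
pois = [
    {"name": "poi-1", "category": Category.BEACH | Category.VIEWPOINT},
    {"name": "poi-2", "category": Category.MUSEUM | Category.HERITAGE},
    {"name": "poi-3", "category": Category.HERITAGE},
]


def matching(pois, mask):
    """Names of POIs sharing at least one category bit with the mask."""
    return [p["name"] for p in pois if p["category"] & mask]


wanted = Category.MUSEUM | Category.VIEWPOINT
print(matching(pois, wanted))
```

The filter is a single AND per POI, which is why it is fast, but the fixed bit positions are also why adding or regrouping categories is awkward compared with a taxonomy.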
The rating combines a somewhat objective data quality rating (number of images, links to Wikipedia articles, length of descriptions, types of properties present etc.) with a biased weighting of categories (among other information) that fits our use case. We also use user reviews/ratings, but those wouldn’t be part of the dataset. We could use a slightly more generalized aggregate rating and/or different rating components, but more likely than not you would want to use your own weighting if your use case is sufficiently different, so I guess I am wondering what expectations or requests there are here.
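If we shipped the rating components separately, applying your own weighting could look something like this. A minimal sketch only: the signals, caps and weights are invented for illustration and are not our actual formula.

```python
def quality_score(poi):
    """Average of a few capped data-quality signals, each scaled to 0..1."""
    signals = [
        min(len(poi.get("images", [])), 5) / 5,            # cap at 5 images
        min(len(poi.get("wikipedia_links", [])), 7) / 7,   # cap at 7 language versions
        min(len(poi.get("description", "")), 500) / 500,   # cap at 500 characters
    ]
    return sum(signals) / len(signals)


def rating(poi, category_weights, default_weight=1.0):
    """Scale the quality score by a use-case-specific category weight."""
    weight = category_weights.get(poi.get("category"), default_weight)
    return quality_score(poi) * weight


# Hypothetical POI and weights.
poi = {
    "category": "museum",
    "images": ["a.jpg"] * 5,
    "wikipedia_links": ["en", "de", "fr", "es", "it", "pt", "pl"],
    "description": "x" * 500,
}
my_weights = {"museum": 0.5, "beach": 1.0}
print(rating(poi, my_weights))
```

Keeping `quality_score` apart from the category weights is what would let someone with a different use case swap in their own `category_weights`.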
Planned formats: TSV and GeoJSON Feature Collections, but I am open to suggestions.
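As a rough idea of what the two formats could look like for one record, here is a minimal sketch. The field names and sample values are placeholders, not the final schema; the only fixed rule is that GeoJSON stores coordinates as [longitude, latitude].

```python
import csv
import io
import json

# Hypothetical POI with a handful of the fields listed above.
poi = {
    "name": "Example Lighthouse",
    "lat": 52.5,
    "lon": 13.4,
    "category": "viewpoint",
    "source": "Wikidata",
    "rating": 0.8,
}


def to_feature(poi):
    """Build a GeoJSON Feature; everything except the position goes in properties."""
    properties = {k: v for k, v in poi.items() if k not in ("lat", "lon")}
    return {
        "type": "Feature",
        "geometry": {"type": "Point", "coordinates": [poi["lon"], poi["lat"]]},
        "properties": properties,
    }


def to_feature_collection(pois):
    return {"type": "FeatureCollection", "features": [to_feature(p) for p in pois]}


def to_tsv(pois, columns):
    """Write the chosen columns as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, delimiter="\t",
        lineterminator="\n", extrasaction="ignore",
    )
    writer.writeheader()
    writer.writerows(pois)
    return buffer.getvalue()


collection = to_feature_collection([poi])
tsv = to_tsv([poi], ["name", "lat", "lon", "category", "source", "rating"])
print(json.dumps(collection, indent=2))
print(tsv)
```

Nested fields such as Localizations or Links fit naturally into GeoJSON properties but would need flattening or separate files in TSV, which is one of the things I would like feedback on.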
submitted by /u/berlumptsss