I’m looking for a dataset/corpus containing linux shell command inputs and system outputs. I’d really appreciate any kind of help
submitted by /u/JamesAntoni
[link] [comments]
Here you can observe the biggest nerds in the world in their natural habitat, longing for datasets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?
Hello,
I am looking for a dataset of historical gold prices going back to around 1980, at least daily, in both EUR and USD. So far I have only found datasets that give an average for the whole year.
submitted by /u/God_Enki
[link] [comments]
Is there a place that has compiled standardized test score information for all public schools in the U.S.? I know that each state has this information, but I am wondering if there’s a central place where all of this might be available.
submitted by /u/jyddyj20
[link] [comments]
I’m participating in a study on ways in which different people write their thoughts, lecture notes, reminders, and other short-form texts that are usually not meant to be shared.
Does anyone know of datasets that could be helpful here? One of our goals is to do some clustering analysis and determine the main “forms” of notes people use. We also want to find out how often people write multiple notes related to the same topic and obtain other interesting results.
Any suggestions are appreciated!
submitted by /u/smthamazing
[link] [comments]
Hi all, could I please get some help finding the LCOE for different generating technologies (both fossil and RE) in Europe for the past decade?
Thank you!
submitted by /u/one100eyes
[link] [comments]
Does anyone know the underlying data source for the website Allbiz.com?
submitted by /u/HereToLearnArt
[link] [comments]
As a novice in the field of Artificial Intelligence and Machine Learning, I would appreciate some guidance on the various platforms that professionals use to acquire datasets for training/fine tuning their models with images and videos.
submitted by /u/akanshtyagi
[link] [comments]
I’m looking for any dataset that contains social network texts labeled as political or non-political.
Any help?
submitted by /u/bigbrainjune
[link] [comments]
Hi, I just wanted to ask if you know any datasets that would be suitable for the models I wrote about in the title? I would want R2 to be high and also that, for example, during moderation, the moderator and the variable should affect Y.
submitted by /u/9Black_Rabbit8
[link] [comments]
With book bans rising in popularity, The Marshall Project compiled a list of 50,000 titles that are banned in 19 states. They’re currently cleaning some additional lists from other states to add to the data.
(Un)surprisingly, Florida bans the most titles, at over 20,000.
Georgia bans the fewest, at 28.
When a reason is given, it’s hard to wrap your head around how something like Coding for Parents could pose a threat to security (Wisconsin).
Source: https://www.themarshallproject.org/2022/12/21/prison-banned-books-list-find-your-state
View the Data: https://app.gigasheet.com/spreadsheet/Banned-Books-in-U-S–Prisons/7b6b282b_a6d1_48bc_9df2_71b27f9ab107
submitted by /u/Adorable-Kitchen-919
[link] [comments]
I am trying to replicate the model in this paper, and to make sure it works I would like to apply it to the same data as in the original paper.
Two datasets are used. One is the weekly prices of 5 NYMEX crude oil futures contracts from 1/2/1990 to 2/17/1995. The paper says that these were made public by Knight-Ridder Financial, a company that has since ceased to exist.
The other one is a set of crude oil prices by Enron Capital, a company that has also since ceased to exist.
I doubt I could obtain the second dataset, but I was wondering if anyone had any suggestions on where I could find the first dataset by Knight-Ridder Financial. I have tried accessing their website through the Internet Archive but I wasn’t able to find anything there, nor was I able to locate the original publication.
Bloomberg is not an option for me right now either.
Full reference: Schwartz, E. and Smith, J.E., 2000. Short-term variations and long-term dynamics in commodity prices. Management Science, 46(7), pp.893-911.
submitted by /u/horux123
[link] [comments]
My math class has an end-of-year coding project which uses the basic plotting tools in pandas to analyze and review a dataset of my own choosing. I’m pretty okay at coding and I won’t struggle to set everything up once I have it planned out. Problem is, all my classmates have picked the cool stuff like weather patterns, temperature changes correlated with CO2 increase, and other easy targets.
I would like to stand out a bit. Do you have an interesting dataset that I can use in Python without doing any sorting, outside of the obvious x and y values? I am not an expert at dataset analysis, so I can only use pandas and I can only use datasets stored as .csv files.
I’m getting slightly stressed over this project as the deadline is creeping closer and closer. So if you have an old coding project from a class where you learned about comparing graphs and looking for correlations, it would be a huge help if you could share it.
submitted by /u/RevolutionaryAd4161
[link] [comments]
I need a dataset to load into QGIS to plot the coverage area of different cellular network standards in a particular area (Chennai, India).
Any idea where to look?
submitted by /u/Awkward_Smile7
[link] [comments]
Where can I find traffic data that gives things like the amount of traffic on a route? I need data spanning at least 1-2 years, preferably 2021-22 or 2023.
I need the dataset for a traffic prediction model.
submitted by /u/akkaaaaaashhhh
[link] [comments]
Hey guys, stumbled upon this sweet dataset the other day. You can export it to KML for some serious parsing and analysis. It’s the crowd-sourced geolocation of every damn corn on the cob vendor in Mexico City! How cool is that? I challenge y’all to train a neural network on it and see what kind of insights you can get. Let’s get cracking, folks!
submitted by /u/JulieJas
[link] [comments]
I’m interested in datasets containing number or percent of Black voters, registered voters, voting rates, etc.
submitted by /u/captainschnarf
[link] [comments]
Hi everyone.
I was wondering if anyone on this sub has experience working with or downloading solo queue and duo queue League of Legends data. Is it possible to export it from na.op.gg, or maybe Riot has an API I can get it from?
Ideally I would like to wrangle the data so that I can separate my solo queue games from my duo queue games, get some stats, and expose my duo queue partner.
Does anyone have experience with this or think it’s possible?
EDIT: I can use python with things like pandas, numpy etc for some simple data wrangling and analysis.
submitted by /u/ebscodingjourney
[link] [comments]
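For the solo/duo split asked about above, here is a minimal sketch of the wrangling step only. It assumes the match history has already been obtained somehow and flattened into one record per game. The field names `teammates` and `win`, the name `PartnerName`, and the sample records are all hypothetical and do not reflect any real export format or API response.

```python
def split_by_partner(matches, partner):
    """Split matches into (solo, duo) by whether `partner` was a teammate."""
    solo, duo = [], []
    for match in matches:
        (duo if partner in match["teammates"] else solo).append(match)
    return solo, duo


def win_rate(matches):
    """Fraction of games won, or None if there are no games."""
    if not matches:
        return None
    return sum(1 for m in matches if m["win"]) / len(matches)


# Made-up records, purely for illustration.
matches = [
    {"teammates": ["PartnerName", "a", "b", "c"], "win": False},
    {"teammates": ["d", "e", "f", "g"], "win": True},
    {"teammates": ["PartnerName", "h", "i", "j"], "win": True},
]

solo, duo = split_by_partner(matches, "PartnerName")
print("solo win rate:", win_rate(solo))
print("duo win rate:", win_rate(duo))
```

The same split works on a pandas DataFrame with a boolean column, but plain lists keep the sketch free of dependencies.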
I’m stuck on the infamous HTTP 429 error when mass-scraping Google Trends. I have a list of around 30k keywords I want data on, but don’t want to wait for the timeouts.
I’m using pytrends and have tried rotating proxies, but with this much traffic the rental costs get way too high. I also tried multiprocessing with a unique Tor circuit for each keyword, but I get authentication errors from Google. Those go away when I include some identity headers, which then quickly become invalid due to rate limiting.
Does anyone have a workaround/working code for this? Multiple Google accounts with programmatic login and getting the headers from there, followed by injecting them into pytrends requests? I’d be grateful if you could share your experiences. Thanks!
submitted by /u/thefoque
[link] [comments]
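Not a workaround for the limit itself, but the usual baseline for HTTP 429 responses is to retry with exponential backoff so that a long keyword list eventually completes unattended. The sketch below is generic and does not use pytrends. `RateLimited` and `fake_fetch` are hypothetical stand-ins for whatever error and request function your own code uses.

```python
import time


class RateLimited(Exception):
    """Hypothetical error raised by a fetch function on an HTTP 429."""


def fetch_with_backoff(fetch, keyword, max_retries=5, base_delay=1.0,
                       sleep=time.sleep):
    """Call fetch(keyword), doubling the wait after each rate-limit error."""
    for attempt in range(max_retries + 1):
        try:
            return fetch(keyword)
        except RateLimited:
            if attempt == max_retries:
                raise  # give up after the last allowed retry
            sleep(base_delay * 2 ** attempt)


# Demo with a fake fetcher that is rate-limited twice, then succeeds.
calls = {"n": 0}


def fake_fetch(keyword):
    calls["n"] += 1
    if calls["n"] <= 2:
        raise RateLimited()
    return {"keyword": keyword, "ok": True}


waits = []  # record the delays instead of actually sleeping
result = fetch_with_backoff(fake_fetch, "example keyword", sleep=waits.append)
print(result, waits)
```

Passing `sleep` as a parameter keeps the demo instant. In real use the default `time.sleep` applies, and adding random jitter to each delay is a common refinement.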
For those unfamiliar with it, BOINC is the Berkeley Open Infrastructure for Network Computing. It is a free software and volunteer computing infrastructure focused on science with over 15 active projects. There are teraflops of computing power available to you absolutely free. If you are working on problems that can be done in a distributed or parallel manner, you should know about it.
The BOINC server software works with any app you have (such as a protein simulator), and can handle all the workunit creation/delivery/validation. You can run the server as a Docker container and distribute your app as a pre-compiled binary or inside a VirtualBox image to instantly work across platforms. BOINC supports not only 32- and 64-bit Windows/OS X/Linux hosts, but ARM and Android too. It also supports GPU acceleration on both Nvidia and AMD cards. It’s open-source, so you can modify it to suit your use case. For small projects, you can run the BOINC server on a $10/month VPS or a spare laptop in a closet. For larger projects, obviously, the memory and storage needs will scale with complexity.
Once you have your server up (or beforehand, if you need to secure a guarantee of computation before investing development resources), you can approach Science United and Gridcoin for your guaranteed computation (“crunching”). Neither of these mechanisms requires you to be affiliated with a university or other institution; they just require that you are doing interesting scientific research.
Science United is a platform run by the BOINC developers which connects volunteer computing participants to BOINC projects. Once they add you to their list, thousands of volunteers around the globe will immediately start crunching data for your project, giving you many teraflops of power. Science United is particularly good for smaller projects which don’t have large, ongoing workloads or have sporadic work.
Gridcoin is a cryptocurrency (founded 2013, not affiliated with the BOINC developers) which incentivizes people to crunch workunits for you. They currently incentivize most active BOINC projects (with their permission) and hand out approx $500 USD equivalent in incentivization money to your “crunchers” monthly. The actual value of the computation you receive is much higher than this. All of this happens without you ever needing to do anything aside from having a BOINC server. There are some requirements you must meet, such as having a large amount of work to be done (being an ongoing project), but they can direct petaflops of power your way and have a procedure to “pre-approve” your project before it’s done being developed.
BOINC can also be used to harvest under-utilized compute resources on your campus or in your company. It can be installed on machines and set to compute only while a machine is idle, so it doesn’t slow the machine down while in use.
Famous research institutes and major universities across the world use BOINC. World Community Grid, the Large Hadron Collider, Rosetta, University of Texas, and the University of California are a handful of the big names that use BOINC for work distribution.
Relevant links:
submitted by /u/makeasnek
[link] [comments]
Hi. Long time lurker here.
Was wondering if there are open source data sets as stated in the title for South America, Africa and Asia.
submitted by /u/saintisstat
[link] [comments]
Looking for a dataset with MRIs or any other type of imaging, but also some notes (so not just classification) – either description of what’s happening on the image or a bit more elaborate diagnosis.
submitted by /u/Altruistic_Carrot_34
[link] [comments]
If you’re looking for reliable and up-to-date information on civil aviation accidents and incidents, the Aviation Safety Network (ASN) dataset may be just what you need. This global database has information on more than 100,000 accidents and incidents that have happened since 1919. You can download the dataset as a CSV file for further analysis. The CSV file has the following columns:
Date – Date of the accident
Type – Type of aircraft
registration – Registration of the aircraft
operator – Operator of the aircraft
fatalities – Number of fatalities
location – Location of the accident
country – Country of the accident
cat – Category of the accident described by ASN
year – Year of the accident
It is available for download at the GitHub link below:
https://github.com/alsonpr/Aviation-Safety-Network-Dataset
submitted by /u/woolly-mamoth
[link] [comments]
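As a starting point for the ASN file described above, here is a minimal sketch that totals fatalities per year using the column names listed in the post. The sample rows are made up for illustration and are not taken from the dataset. The sketch also assumes `fatalities` usually parses as an integer, and it skips any row where it does not.

```python
import csv
import io
from collections import Counter


def fatalities_by_year(rows):
    """Sum the 'fatalities' column per 'year', skipping unparseable rows."""
    totals = Counter()
    for row in rows:
        try:
            totals[row["year"]] += int(row["fatalities"])
        except (KeyError, ValueError, TypeError):
            continue  # missing or non-numeric value: leave the row out
    return totals


# Made-up rows in the column layout described in the post.
SAMPLE_CSV = """Date,Type,registration,operator,fatalities,location,country,cat,year
01-JAN-2000,Example 100,XX-AAA,Example Air,2,Somewhere,Nowhere,A1,2000
15-MAR-2000,Example 200,XX-BBB,Sample Airways,3,Elsewhere,Nowhere,A1,2000
09-JUL-2001,Example 100,XX-CCC,Example Air,unknown,Somewhere,Nowhere,A2,2001
"""

totals = fatalities_by_year(csv.DictReader(io.StringIO(SAMPLE_CSV)))
print(dict(totals))
```

To run it on the real file, pass `csv.DictReader(open(path, newline=""))` instead, where `path` is wherever you saved the download.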
I’m looking for datasets on venture capital, startups and private equity for a few reasons but primarily just personal interest. As a result I’m not looking to spend a whole lot. Any suggestions?
submitted by /u/AndreeSmothers
[link] [comments]
Does anybody know where I can find a table of the trains running in Germany, and/or data on their delays and cancellations?
If not available, a dataset on other national train lines would be nice too.
submitted by /u/imaris_help
[link] [comments]
There used to be a site, thispersondoesnotexist.com, which generated artificial human face images with a GAN (originally a project done at NVIDIA). That site has been replaced by another one (https://this-person-does-not-exist.com/en), which has watermarks etc.
Does anyone have the dataset of those AI-generated images (1024×1024 px)? I found a few on Kaggle datasets, but they are not the same resolution as the original images that were generated by the site. If so, can you please share the links to the dataset?
submitted by /u/pythoslabs
[link] [comments]
I have 10 large numerical datasets, each with 3 generic categories. Each row contains unique data. The last column of each dataset contains the category label. The categories are not contiguous, so any row may belong to any of the 3 categories.
e.g.
Date      Value    Category
1/1/2010  1.11111  Alpha
2/1/2010  2.11111  Beta
3/1/2010  2.00009  Alpha
4/1/2010  0.00000  Charlie
But the 10 datasets have different volumes of data. E.g. dataset A may have 10K rows, dataset B around 100K, dataset C 1 million, etc.
I couldn’t process all the data as it’s too large.
What would be the best way to sample each dataset? I’d like the sample to contain a fair representation of the 3 categories.
submitted by /u/runnersgo
[link] [comments]
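One possible approach to the sampling question above, offered as a sketch rather than the definitive answer: run reservoir sampling separately for each category, which reads a file once and never holds more than `k` rows per category in memory. The function name, the value of `k`, and the toy CSV (which mirrors the example rows in the post) are illustrative assumptions.

```python
import csv
import io
import random
from collections import defaultdict


def stratified_reservoir(rows, label_key, k, seed=0):
    """Keep a uniform random sample of up to k rows per category, in one pass."""
    rng = random.Random(seed)
    reservoirs = defaultdict(list)  # category -> sampled rows
    seen = defaultdict(int)         # category -> rows seen so far
    for row in rows:
        label = row[label_key]
        seen[label] += 1
        bucket = reservoirs[label]
        if len(bucket) < k:
            bucket.append(row)
        else:
            # Algorithm R: the new row replaces a kept one with
            # probability k / seen[label].
            j = rng.randrange(seen[label])
            if j < k:
                bucket[j] = row
    return dict(reservoirs)


# Toy data mirroring the example rows in the post.
TOY_CSV = """Date,Value,Category
1/1/2010,1.11111,Alpha
2/1/2010,2.11111,Beta
3/1/2010,2.00009,Alpha
4/1/2010,0.00000,Charlie
"""

sample = stratified_reservoir(
    csv.DictReader(io.StringIO(TOY_CSV)), "Category", k=1
)
for label, picked in sample.items():
    print(label, picked)
```

A fixed `k` per category gives a balanced sample regardless of dataset size. If the sample should instead preserve each dataset's category proportions, sampling every row with the same fixed probability is the simpler choice.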
Hi!
I am trying to find routes taken by public transportation in major cities of Europe along with the stops in the routes.
Can anyone help me find some data sources for these? Thanks.
submitted by /u/DiabolicDiablo
[link] [comments]
Does anyone know where I would find federal grant success rate data? I need some insight into how competitive it is. If anyone has a dataset or relevant data let me know please!
submitted by /u/DetachedOptimist
[link] [comments]
Looking for public dataset with every school district in the country with associated superintendent, contact information, size, etc.
submitted by /u/c391112
[link] [comments]