Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Interested In Social Media Data? We Released An LLM-powered SM Digital Twin [Self-Promotion]

Hi everyone! Together with my research group, I’ve been working on a project for the past few months, and I think it might interest you all.

YSocial is a digital twin of a social network platform that improves the simulation of dynamic social interactions by integrating large language model (LLM) agents.

You can design your own scenario with LLM agents and describe them using multiple features, such as their political leaning, age, personality traits, interests, and more. Agents will interact on a topic of discussion (e.g., politics, sports) according to the follower/content recommender system you choose. You can even make them discuss news pulled in real time from RSS feeds!

So far, we have observed that many real-world phenomena emerge without us forcing them, e.g., influencers, viral content, long-tailed distributions of users’ connections, etc.

This is just a sneak peek at YSocial’s features; you can read more on the website!

YSocial is on GitHub, open and free for everyone! Feel free to give us some feedback and contribute to the project. There is also a preprint available on arXiv and a website with some pre-made scenarios you can test.

submitted by /u/Internal-Newspaper91

European Cars Data Set For School Work

I’m looking for EU (and US) car data and specs but can’t find any dumps. Does anyone have something to share, or can someone give me a link? It seems strange that car, rim, and tyre websites have this information but I cannot find it anywhere. Are they all using APIs from service providers?

I also have a question about VIN decoder websites: are there any downloadable files, not just APIs?

I’m mainly interested in basic car data: measurements, 0-100, max speed, fuel consumption, etc.

I have a school project that I would like to complete.

submitted by /u/volber

Need To Find A Twitter Dataset That Has Random Tweets Over Several Years

Hello,

I’m conducting an independent research project surveying the prevalence of hate speech on Twitter over the years, to see whether the ratio of hate to non-hate tweets has increased over time and whether any rise correlates with other long-term trends (such as the popularity of Twitter or the political climate). For that, I need Twitter data spanning several years so I can link longitudinal trends to hate speech data over time. I would also want a randomized sample so the study is less prone to bias or error.

So, are there any publicly accessible Twitter datasets that have data spanning several years without any content filters? And if not, what should I do to get longitudinal data for this study?
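Once a labeled sample is in hand, the year-by-year trend described above could be computed roughly like this. This is only a minimal sketch: the field names `created_at` and `is_hate` are hypothetical, the labels are assumed to come from a separate classifier or annotation step, and it reports hate tweets as a share of all tweets per year (rather than hate over non-hate) so that a year with no non-hate tweets does not divide by zero.

```python
from collections import Counter
from datetime import datetime


def yearly_hate_ratio(tweets):
    """Return {year: hate_count / total_count} for labeled tweets.

    Each tweet is a dict with hypothetical keys:
      "created_at": ISO 8601 timestamp string
      "is_hate": bool label from a classifier or annotator
    """
    totals = Counter()
    hate = Counter()
    for tweet in tweets:
        year = datetime.fromisoformat(tweet["created_at"]).year
        totals[year] += 1
        if tweet["is_hate"]:
            hate[year] += 1
    # Only years that actually appear in the sample are reported.
    return {year: hate[year] / totals[year] for year in sorted(totals)}


# Made-up rows purely to show the input shape; not real data.
sample = [
    {"created_at": "2015-03-01T12:00:00", "is_hate": False},
    {"created_at": "2015-07-15T08:30:00", "is_hate": True},
    {"created_at": "2016-01-20T19:45:00", "is_hate": False},
    {"created_at": "2016-11-02T23:10:00", "is_hate": False},
]
ratios = yearly_hate_ratio(sample)
print(ratios)
```

The resulting per-year series is what would then be compared against other long-term trends.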

submitted by /u/GeoZ17

I Made An Olympic Games API (JSON) With Real-Time Data!

Hey everyone, I built an Olympics API with all the games, medals, countries, and sports that updates in real-time. In addition to the data, it also provides images of the sports (pictograms) and the flags of the countries.

If you’d like to give me some feedback, here are the links:

Documentation
https://docs.apis.codante.io/olympic-games-english

Endpoints
Medals and Countries
Games with Results
Sports (with pictograms)

Repo
https://github.com/codante-io/api-service

Thanks!

submitted by /u/robertotc12345

Python Code Prompts Requesting Building Neural Networks

Hi guys!
I’m writing an academic paper on Filter Functions in LLMs.
For evaluation purposes I need to check the ability to filter out certain code libraries, and I think the best way to do this would be to get a dataset of code requests (“hey can you write a program that does X?”), specifically requests for neural nets with PyTorch/TensorFlow.

To be clear, I do not need to train any model on these, just to run them through the LLM with and without the filter.

Example – “Hey can you build a neural network that classifies semantics of tweets?”
I don’t need anything too complicated
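For the evaluation side, one way to score each response is to check whether the generated code still imports a filtered library. The snippet below is only a sketch of that idea using Python's standard `ast` module, not the method from the paper; the function name `imported_libraries` and the default watch list are assumptions for illustration. Note that `ast.parse` raises `SyntaxError` if the model output is not valid Python.

```python
import ast


def imported_libraries(source, watched=("torch", "tensorflow")):
    """Return the set of watched top-level packages imported by `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # e.g. "import torch.nn as nn" -> "torch.nn"
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # e.g. "from tensorflow import keras" -> "tensorflow"
            # (relative imports, level > 0, are skipped)
            names = [node.module]
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            if top in watched:
                found.add(top)
    return found


# Hypothetical model outputs, one without and one with the filter applied.
unfiltered = "import torch.nn as nn\nfrom tensorflow import keras\n"
filtered = "from sklearn.linear_model import LogisticRegression\n"

print(imported_libraries(unfiltered))
print(imported_libraries(filtered))
```

Parsing the syntax tree rather than searching for the string "torch" avoids false positives from comments or variable names, though it will miss dynamic imports such as `importlib.import_module`.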

I’ve searched standard datasets on Hugging Face/Google but haven’t found any with enough samples.
Any ideas?
Any help would be much appreciated and I’d love to answer any questions about the research itself.

Thanks!

submitted by /u/AltivoTheHorseX

UI For Data Enrichment With LLMs + Search

I built a system to enrich datasets, since I found myself doing this a lot with LLMs connected to search. ChatGPT can’t do it yet as it doesn’t ‘loop’. The functionality is basic, but it works well. You can upload a CSV, provide instructions in natural language, preview results for the top X rows, process the task for the full dataset, and download the results as a CSV.

Example tasks I have done:

Check if information seems to be valid based on top few search results and return a boolean
Write a description of a company using LLM (+ optionally search results)
Re-assign categories based on LLM
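The upload, preview, process, and download flow described above can be sketched in a few lines. This is not the actual tool: `enrich_rows`, `to_csv`, and `fake_enrich` are hypothetical names, and `fake_enrich` is a stand-in for the real LLM-plus-search call so the example runs offline.

```python
import csv
import io


def enrich_rows(rows, instruction, enrich, limit=None):
    """Apply `enrich(row, instruction)` to each row and add a "result" column.

    `enrich` is a placeholder for an LLM-plus-search call.
    `limit` previews only the first `limit` rows; None processes every row.
    """
    selected = rows if limit is None else rows[:limit]
    return [{**row, "result": enrich(row, instruction)} for row in selected]


def to_csv(rows):
    """Serialize a non-empty list of dict rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def fake_enrich(row, instruction):
    # Stand-in for the real model call: flags rows with a non-empty website.
    return bool(row["website"].strip())


# "Upload": parse CSV text into a list of dict rows.
uploaded = "company,website\nAcme,acme.example\nGlobex,\n"
rows = list(csv.DictReader(io.StringIO(uploaded)))

instruction = "Does the company have a website?"
preview = enrich_rows(rows, instruction, fake_enrich, limit=1)  # top X rows
full = enrich_rows(rows, instruction, fake_enrich)              # full dataset
print(to_csv(full))                                             # "download"
```

Keeping the model call behind a plain callable means the same loop serves all three example tasks; only the `enrich` function and the instruction change.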

Is this of interest to anyone? Comment if so and I’ll put it online and send you a private link. Currently it uses my OpenAI API key, so I would need to modify it to support BYOK (bring your own key) or add billing, which I won’t bother with unless there’s interest.

submitted by /u/oacoleshill

[Request] Looking For Datasets That Compile Public-facing Statements And/or Posts Made By Politicians

I’m looking to do sentiment analysis for a project and am hoping to find a large compilation of public statements by politicians, preferably covering American and English politicians or parties. Ideally it would include local politicians from the Bay Area (CA), Manhattan (NY), and London, but a by-party or fully uncategorized set might do fine as well.

submitted by /u/hexahedron17