Hi guys,
My typical approach to portfolio projects is to find a public dataset online (most of which are already cleaned and ready to go). I then come up with specific problems I'd like to investigate, write SQL queries to solve them, and visualize the results on a Tableau dashboard to tell a story.
Every job is different, but I assume most will require you to join multiple tables together before analysis. The issue I've come across during portfolio creation is that most publicly available datasets are already put together.
I've come up with the idea of finding two completely unrelated datasets and joining them on a common column, but I completely struggle with the execution because the datasets are complex and a common column isn't always available. Ex: Amazon package delivery speeds vs. weather, joined on dates.
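To make the delivery vs. weather idea concrete, here is a minimal sketch of what a date join could look like. It uses Python's built-in sqlite3 module so the SQL can run anywhere. The table names, column names, and rows are all made up for illustration (they do not come from any real Amazon or weather dataset), and it assumes the delivery table stores full timestamps while the weather table stores one row per day.

```python
import sqlite3

# In-memory database with two hypothetical tables (all names and rows are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE deliveries (order_id INTEGER, delivered_at TEXT, hours_to_deliver REAL);
CREATE TABLE weather (obs_date TEXT, precip_mm REAL);
INSERT INTO deliveries VALUES
  (1, '2024-01-05 14:30:00', 20.0),
  (2, '2024-01-05 18:00:00', 30.0),
  (3, '2024-01-06 09:15:00', 50.0),
  (4, '2024-01-07 11:00:00', 26.0);
INSERT INTO weather VALUES
  ('2024-01-05', 0.0),
  ('2024-01-06', 12.5);
""")

# Truncate the delivery timestamp to a calendar date so it matches the
# daily weather rows. LEFT JOIN keeps days that have no weather reading.
query = """
SELECT DATE(d.delivered_at)    AS delivery_date,
       w.precip_mm,
       COUNT(*)                AS deliveries,
       AVG(d.hours_to_deliver) AS avg_hours
FROM deliveries AS d
LEFT JOIN weather AS w
  ON w.obs_date = DATE(d.delivered_at)
GROUP BY DATE(d.delivered_at), w.precip_mm
ORDER BY delivery_date;
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

The main step is converting both sides to the same grain (one calendar date) before joining. The day with no weather reading still appears, with a NULL for precipitation, which is the kind of gap a portfolio write-up could call out.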
I know what joins are and can solve easy to maybe medium LeetCode SQL join questions without much difficulty, but I completely struggle with the hard problems as well as the scenario in the previous paragraph. So, a few questions:
How important is it to demonstrate that you know joins in a data analyst portfolio for entry-level roles? I.e., showing the SQL code that joins 2+ tables and doing your analysis on that?
If it is needed, how can I demonstrate this? I struggle with joining two completely unrelated datasets. Is there a better way to do this while still showing that I know joins, or should I just keep doing analysis on the complete datasets that are already available online?
Thanks so much, I greatly appreciate any advice I can get on this!! Located in a big city in the Midwest, USA, btw.
submitted by /u/believeinriven