Is SQL Really A Game Changer For Data Science Careers?
By Alan Hylands — 5 minute read
I’ll nail my colours to the mast right away on this one.
In my opinion, SQL is the most fundamental skill you MUST have to get started in a career in data analysis. We can argue it out until we are blue in the face over whether R is better than Python. Or whether SAS and Java are dead. Or Spark is more useful than Azure. None of that matters unless you have the lowest hanging fruit plucked and safely placed in your skills basket.
Data Scientist Jeff Hale recently carried out an excellent piece of analysis on the most in-demand skills for data scientists. He wrote it up for KDnuggets and it’s well worth a read for both aspiring and current data analysts.
Jeff scraped job postings on LinkedIn, Indeed, Monster, SimplyHired and AngelList to see which terms were showing up most regularly for various data science related job vacancies. He searched only within the United States and did exact match searches on the terms: "data scientist" "[keyword]".
The most interesting part for me was the rankings he put together for various technology skills. It will come as little surprise to anyone who has scanned data science job vacancies in the past 6-12 months that Python, R and SQL came in 1st, 2nd and 3rd respectively.
But didn’t you say SQL was most important?
You might think I’ve shot my own argument in the foot here by using a piece of research that shows SQL as only coming in third in the list. If over 70% of listings wanted Python skills and over 60% specified knowledge of R, shouldn’t we concentrate on those first? I don’t think you should. And here’s why.
The basics of SQL are easier to pick up from a standing start. Even if you only learn some basic uses of the SELECT statement, you will already be ahead of 99% of Excel jockeys who never bothered to move beyond copy/paste and VLOOKUPs.
SQL crosses all areas of the wider data science world. Want to be an analyst? Learn SQL to SELECT the data you need from the data warehouse. Want to be a data scientist and run machine learning? Use SQL to clean up your data before getting into algorithms. Fancy building data pipelines as a data engineer? SQL. SQL. SQL. SQL.
Relational databases are here to stay. There will always be a shiny new object in the database world to distract us. (I’m looking at you, MongoDB.) But the vast majority of companies and online applications still run on the big three of MySQL, SQL Server and PostgreSQL. You don’t have to have worked with them to notice something jumping out of their names at you. Hint: it’s three letters long and rhymes with prequel.
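To give a flavour of the basics described above, here is a minimal sketch of a simple SELECT and a light data clean-up. It runs the SQL through Python's built-in `sqlite3` module so it needs no database server. The `customers` table, its columns and the sample rows are all made up for illustration, and other databases may differ slightly in syntax.

```python
import sqlite3

# In-memory database, so the example leaves nothing behind.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented purely for this example.
conn.execute("CREATE TABLE customers (name TEXT, region TEXT, spend REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Ann", " north ", 120.0),
        ("Bob", "South", None),   # missing spend
        ("Cat", "NORTH", 80.0),
        ("Dan", "south", 45.5),
    ],
)

# A basic SELECT: choose columns, filter rows, sort the result.
big_spenders = conn.execute(
    "SELECT name, spend FROM customers WHERE spend > 50 ORDER BY spend DESC"
).fetchall()

# Light clean-up before analysis: tidy the inconsistent region labels,
# treat missing spend as zero, then total spend per cleaned region.
spend_by_region = conn.execute(
    """
    SELECT LOWER(TRIM(region))     AS clean_region,
           SUM(COALESCE(spend, 0)) AS total_spend
    FROM customers
    GROUP BY LOWER(TRIM(region))
    ORDER BY clean_region
    """
).fetchall()

print(big_spenders)     # [('Ann', 120.0), ('Cat', 80.0)]
print(spend_by_region)  # [('north', 200.0), ('south', 45.5)]
```

The first query is the sort of thing an analyst might run against a data warehouse. The second is the kind of tidying a data scientist might do before the data gets anywhere near an algorithm.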
There’s another element to SQL coming in third in Jeff’s analysis though, and it was brought up by Kristen Kehrer in her analysis of Jeff’s analysis. (Yes I know, it’s like turtles all the way down with analysts analysing each other’s work.)
Kristen asked how important SQL is and was surprised to find it came in only 3rd. She believes this may be because hiring companies see SQL as so fundamental that it is an assumed prerequisite when hiring data scientists, and so don’t always bother to list it.
I would wholeheartedly agree. When hiring data analysts, I immediately discard the CV in front of me if the candidate hasn’t demonstrated SQL knowledge. It’s not even a question of it being desirable. It’s a necessity right out of the gate.
I was lucky when I fell into the data world, as I came from a web and software development background. I’d been building desktop and online applications in healthcare and e-commerce, and a fundamental part of that was being my own development DBA.
As well as designing and developing table structures, I was writing and optimizing queries and stored procedures in both development and production. Small teams need full stack skills and knowing SQL was one of the top ones I relied on: day in, day out.
When I first started as a data analyst in my current workplace, I just picked up where I’d left off writing SQL as a developer. The other analysts on my team looked at me as if I was some kind of witch when I was writing sub-queries and hand-coding SQL. They were still using the drag n’ drop query designer in Microsoft Access. I didn’t know any different though, and it helped me hit the ground running and make a big impact early on.
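For anyone who hasn’t met a sub-query, here is a minimal sketch of the idea, again run through Python’s built-in `sqlite3` module. The `orders` table and its rows are hypothetical, invented for this example rather than taken from any real system.

```python
import sqlite3

# In-memory database with a made-up orders table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Ann", 30.0), ("Ann", 90.0), ("Bob", 20.0), ("Cat", 60.0)],
)

# The inner SELECT (the sub-query) works out the average order amount.
# The outer SELECT then keeps only the orders above that average.
above_average = conn.execute(
    """
    SELECT customer, amount
    FROM orders
    WHERE amount > (SELECT AVG(amount) FROM orders)
    ORDER BY amount DESC
    """
).fetchall()

print(above_average)  # [('Ann', 90.0), ('Cat', 60.0)]
```

Doing this by hand means one query answers the whole question, rather than calculating the average in one step and pasting it into a second.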
The case for the defence is almost ready to rest.
I see questions like this one asked regularly on internet forums and my answer always starts and ends the same. If you want a career in data analysis, you will be doing yourself a massive disservice by not learning SQL. Keep building on that foundation throughout your career, but please learn the basics.