Posted in Data
Choosing which database system to use for your early forays into the world of data analysis can be a foreboding task at the best of times.
While there really isn’t a wrong answer, you can set yourself up for an incredibly steep learning curve that can put the prospective analyst off before they’ve even had a chance to get the training wheels off.
Where To Begin?
My contention in my “What Should An Entry Level Data Analyst Know?” post that relational databases are still the best way to get up to speed quickly and have the most potential value in your career is borne out by the highly regarded DB-Engines.com rankings for 2018.
Six of the top seven Database Management Systems (DBMS) are Relational Databases with only MongoDB sneaking in at #6 to spoil the Relational DB party.
With that in mind, let’s take a quick look at the top three most readily available Relational DBMS that you should be installing and getting to know right now.
Second only to parent company Oracle’s main commercial RDBMS on the DB-Engines Rankings, MySQL is the world’s most popular open source database. Used extensively by tech luminaries such as Facebook, Twitter and Youtube, MySQL provides the database backend for many of the web applications you will come across online.
Reliable, quick and easy to get into, you can download the MySQL Community Edition and install it on your local dev machine or find it pre-installed as standard on many popular shared and VPS hosting providers.
I’ve used MySQL for years as the standard backend for WordPress and administered it with a minimum of fuss through the web-based front end phpMyAdmin and their own MySQL Workbench environment.
Download MySQL here.
Another open source RDBMS that has came on in leaps and bounds over the past few years is PostgreSQL. While not quite as widespread and popular as MySQL, it’s development team do pride themselves on their standards compliance and strong conformity to the ANSI-SQL:2008 standard.
Mightn’t mean a lot to the newbie but that, along with it’s high performance and reliability in high volume environments, have seen it make real inroads into the web application space.
Best of all for those looking to get a start, you can get PostgreSQL for most operating systems, from BSD, Linux, MacOS, Solaris and even Windows.
Again, it’s becoming more widespread on all levels of hosting package as well which saves even having to install it yourself.
Download PostgreSQL here.
3) Microsoft SQL Server
Microsoft have been pushing their SQL Azure cloud database service but it’s still extremely hard to look past their dogged old RDBMS workhorse, SQL Server. I’ve been using it since I started out nearly 20 years ago on SQL Server 7.0 and the whole ecosystem has continued to come on strongly over recent years.
Available on Linux, Docker and MacOS as well as it’s native Windows, you can get a free trial version, a full free Developer edition or the smaller Express edition to get yourself up and running.
Commercial licenses are still a costly alternative to open source options like MySQL and PostgreSQL but there is little argument about the power and quality of the connected tools like Management Studio, SSIS, SSAS and SSRS.
Download SQL Server here.
Which Should I Choose To Best Help My Career?
If you are using any of these three to build up some skills in setting up tables and schemas, building some ETL routines to load data then doing some basic reporting you will find those skills transfer easily across all three.
For websites and working online I’ve predominantly used MySQL over the years.
In industry, earlier in my career especially, I used SQL Server a lot and again a few years ago when I built a sports betting analytics system.
The exact tool or DBMS isn’t really the point at this stage though, it’s the skills and basic understanding of how relational databases work that you need to get more experience with and I’ll cover more of that in future posts.