Github is an online platform that hosts code repositories, where developers, analysts and data scientists store and share their projects.
Think of it like this. When you are working on a software development project, like an app, or on an exploratory data science analysis, you will make constant changes to your code as you go along. Version control (Git, in Github’s case) lets you track the changes at each step and, if necessary, roll back to a previous version if you make a balls of what you are doing further down the line.
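If version control still feels a bit abstract, here’s a toy Python sketch of the snapshot-and-rollback idea. It’s purely illustrative: the `ToyHistory` class is made up for this example, and Git itself works quite differently under the hood.

```python
class ToyHistory:
    """A made-up, in-memory stand-in for version control (not how Git works)."""

    def __init__(self, initial_text: str = "") -> None:
        # Each entry is a (message, snapshot) pair; index 0 is the starting point
        self.commits: list[tuple[str, str]] = [("initial", initial_text)]

    def commit(self, message: str, text: str) -> int:
        """Record a new snapshot and return its version number."""
        self.commits.append((message, text))
        return len(self.commits) - 1

    def current(self) -> str:
        """Return the most recent snapshot."""
        return self.commits[-1][1]

    def rollback(self, version: int) -> str:
        """Restore an earlier snapshot by committing a copy of it as the newest version."""
        message, text = self.commits[version]
        self.commit(f"rollback to v{version} ({message})", text)
        return text


history = ToyHistory("print('hello')")
history.commit("add analysis", "print('hello')\nprint(1 / 0)")  # oops, a bug
history.rollback(0)  # back to the version that worked
print(history.current())  # prints: print('hello')
```

Note that the rollback doesn’t erase the buggy version; it records the restore as a new step, so the full history is still there if you need it later.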
It’s particularly helpful when working in teams, as other team members can download the latest version, make their changes and upload the new version back to the main repository.
From Jupyter.org: “The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and much more”.
It has quickly become the #1 way for data scientists using Python to distribute and share their code and projects with a wide user base.
Github is the #1 community and repository for code-based projects in the world. There is a massive community of developers and data scientists using it to share their work and learn from other like-minded, highly talented people.
Having your code base hosted and publicly shared on Github is also a good way of demonstrating to potential employers or collaborators that you have the analytical chops they need in their lives.
Have you already got a Github account? If not, why not?
Jump sideways for a second to the Github homepage and sign up for the free Personal plan.
When that’s done, come back and start up the tutorial with the rest of us existing Githubbers. We’ll be right here waiting (plays Richard Marx on Spotify in tribute).
1) OK, let’s begin. Go to the Github homepage and log in to your account.
2) Create a new repository by clicking the big “Start A Project” button in the middle of the screen.
3) Enter a name for your new repository (I’ve gone for the cunningly titled “start-scraping”), enter a short description, click Public (as we want to be able to share the notebook), tick the “Initialize this repository with a README” box and click the big green “Create Repository” button at the bottom.
4) Click on the Github icon at the top of the screen to be taken back to the Github homepage. You should now see your newly created repository under the Repositories header. **Step 1 completed!**
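The same repository can also be created without the web form. Below is a minimal sketch using Github’s REST API endpoint for creating a repository under your own account (`POST /user/repos`) and only Python’s standard library. It assumes you have a personal access token (the `YOUR_TOKEN_HERE` placeholder), and the description text is just an example. The function only builds the request, so you can inspect it before sending anything.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def build_create_repo_request(token: str, name: str, description: str) -> urllib.request.Request:
    """Build (but don't send) a request that creates a public repo with a README."""
    payload = {
        "name": name,
        "description": description,
        "private": False,   # Public, so the notebook can be shared
        "auto_init": True,  # same as ticking "Initialize this repository with a README"
    }
    return urllib.request.Request(
        f"{API_ROOT}/user/repos",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",  # personal access token (assumed)
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_create_repo_request("YOUR_TOKEN_HERE", "start-scraping", "Sample scraping notebook")
print(req.get_method(), req.full_url)  # prints: POST https://api.github.com/user/repos
# To actually create the repo, send it:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["html_url"])
```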
5) Now we’re going to upload our sample notebook to our new repository. If you don’t have a sample notebook (why not? What did you think this tutorial was about?), open up your Jupyter Notebook window, click on the New button and select Python 3 under “Notebook:”. In the first cell type:
print("This is a call to all my past resignations")
Run the cell to check you haven’t made any syntax boo-boos, then save the notebook somewhere you can find it.
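If you’d rather generate the sample notebook from a script, it helps to know that a notebook file is just JSON underneath. This sketch writes a one-cell notebook using only the standard library; the `write_minimal_notebook` helper and the `start-scraping.ipynb` filename are made up for illustration, and in real projects Jupyter’s own `nbformat` package is the more usual tool.

```python
import json
from pathlib import Path


def write_minimal_notebook(path: Path, code: str) -> None:
    """Write a one-cell Jupyter notebook (nbformat 4) using only the standard library."""
    notebook = {
        "cells": [
            {
                "cell_type": "code",
                "execution_count": None,  # not run yet
                "metadata": {},
                "outputs": [],
                "source": [code],
            }
        ],
        "metadata": {
            "kernelspec": {"display_name": "Python 3", "language": "python", "name": "python3"}
        },
        "nbformat": 4,
        "nbformat_minor": 4,  # minor version 4 doesn't require per-cell ids
    }
    path.write_text(json.dumps(notebook, indent=1), encoding="utf-8")


nb_path = Path("start-scraping.ipynb")  # hypothetical filename
write_minimal_notebook(nb_path, 'print("This is a call to all my past resignations")')
```

Opening the resulting file in Jupyter should show the single cell, ready to run.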
6) Go back to your Github homepage browser tab and click on the link to your newly created repository under “Repositories”. Click on the “Upload Files” button.
7) Either: a) drag n’ drop your sample notebook file into the “Drop to upload your files” box or b) click the “Choose your files” link and browse to the notebook location.
8) Any uploaded files will show under the “Drag N’ Drop” box. You can add a description of the changes you are making in the text box below the Commit changes header. I just typed “Added start-scraping notebook” and clicked the big green Commit changes button at the bottom.
9) Now that we’re back on the repository’s main page, we should see that there are TWO files in our repository: a README and our newly uploaded sample notebook.
10) And we’re done. You’ve now added a Jupyter Notebook to your Github public repository and made it available to be shared with other analysts.
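For anyone who’d rather script the upload and the final check than click through them, here’s a hedged sketch using Github’s REST “contents” endpoints and only the Python standard library. The username, token and placeholder file bytes are assumptions you’d swap for your own; the functions just build the requests so you can inspect them before sending anything.

```python
import base64
import json
import urllib.request

API_ROOT = "https://api.github.com"


def _headers(token: str) -> dict[str, str]:
    # Personal access token sent as a Bearer token (assumed auth method)
    return {"Accept": "application/vnd.github+json", "Authorization": f"Bearer {token}"}


def build_upload_request(
    token: str, owner: str, repo: str, filename: str, file_bytes: bytes, message: str
) -> urllib.request.Request:
    """Build (but don't send) a request that commits one NEW file to the repo."""
    payload = {
        "message": message,  # same as the text under the "Commit changes" header
        "content": base64.b64encode(file_bytes).decode("ascii"),  # API expects Base64
    }
    return urllib.request.Request(
        f"{API_ROOT}/repos/{owner}/{repo}/contents/{filename}",
        data=json.dumps(payload).encode("utf-8"),
        headers=_headers(token),
        method="PUT",
    )


def build_list_request(token: str, owner: str, repo: str) -> urllib.request.Request:
    """Build a GET request listing the files in the repo's root folder."""
    return urllib.request.Request(
        f"{API_ROOT}/repos/{owner}/{repo}/contents", headers=_headers(token)
    )


# Placeholder bytes; in practice use Path("start-scraping.ipynb").read_bytes()
notebook_bytes = b'{"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 4}'
upload = build_upload_request(
    "YOUR_TOKEN_HERE", "your-username", "start-scraping",
    "start-scraping.ipynb", notebook_bytes, "Added start-scraping notebook",
)
listing = build_list_request("YOUR_TOKEN_HERE", "your-username", "start-scraping")
print(upload.get_method(), upload.full_url)
# Sending them for real:
# urllib.request.urlopen(upload)
# with urllib.request.urlopen(listing) as resp:
#     print([item["name"] for item in json.load(resp)])  # should include README.md and the notebook
```

This endpoint creates a new file; updating a file that already exists also requires passing that file’s current `sha` in the payload.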