Python & other tools for data analysis

If you have been coding for some time, you’ll know that many headaches come from the IDE (Integrated Development Environment), package installation, compatibility issues and the like. I have especially horrid memories of trying to get C to work in Eclipse in my first-year engineering programming course.

Luckily for those of us coding in Python, several distributions and IDEs let us code more efficiently, without the hassle. However, most of us get comfortable very quickly, and suddenly it is five years down the line and we are still struggling with the same old, outdated programming setup. This often even means coding in an old Python release because, you know, it works.

I was guilty of this until I found Anaconda. Without getting into it here, Anaconda is essentially an open-source distribution of Python (and R) for data analysis. Anaconda ships with, or lets you easily install, IDEs, notebooks and libraries such as Jupyter, matplotlib, scikit-learn, pandas, TensorFlow and many others.
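If you are not sure how old your current setup is, a few lines of Python can tell you. This is only a minimal sketch using the standard library (Python 3.8 or newer); the package list is an assumption based on the tools mentioned in this post, so adjust it to your own setup.

```python
import sys
from importlib import metadata

# Package names taken from this post; edit the list to suit your own setup.
PACKAGES = ["numpy", "scipy", "pandas", "matplotlib", "scikit-learn", "jupyter", "spyder"]


def installed_version(name):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


# Report the interpreter version first, then each package.
print("Python", sys.version.split()[0])
for name in PACKAGES:
    print(f"{name:<14}{installed_version(name) or 'not installed'}")
```

Run it from Spyder or from Anaconda Prompt; anything reported as not installed can be added with conda.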

I usually prefer an IDE with a simple text editor, such as Spyder. However, there are some cases where I need more from my programming toolkit. When working with masses of data I have found the Pandas library an invaluable tool. In conjunction with this library, Jupyter is often used for quick visualisation and prototyping.

Spyder

Spyder is very intuitive and perfect for your data science needs. Here is a comparison of some of the most popular Python IDEs. Here you can find an introduction to working in Spyder and the editor shortcuts you will need.

Using Anaconda Prompt, you can install Spyder from the command line:

conda install -c anaconda spyder

Jupyter

Unlike Spyder, Jupyter is a notebook environment that grew out of IPython (Interactive Python). It lets you run individual blocks of code and mix in Markdown and LaTeX, and it is perfect for prototyping smaller projects. Look here for a great introduction to using Jupyter.

Installing Jupyter via Anaconda Prompt:

conda install -c anaconda jupyter

Pandas

I’ll be talking a lot about Pandas. It is a powerful data science tool, but if you use it improperly, you’ll most likely end up working less productively.

You can install Pandas via Anaconda Prompt:

conda install -c anaconda pandas
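To give a flavour of what using Pandas improperly can look like, here is a small sketch. The data and the column names (force_N, area_m2, pressure_Pa) are made up for illustration. Both approaches give the same numbers, but the column-wise version is shorter and usually much faster on large tables.

```python
import pandas as pd

# Hypothetical measurements; the column names are invented for this example.
df = pd.DataFrame({
    "force_N": [10.0, 20.0, 30.0, 40.0],
    "area_m2": [0.5, 0.5, 2.0, 4.0],
})

# Slow pattern: looping over the rows one at a time in Python.
pressure_loop = []
for _, row in df.iterrows():
    pressure_loop.append(row["force_N"] / row["area_m2"])

# Idiomatic pattern: operate on whole columns at once.
df["pressure_Pa"] = df["force_N"] / df["area_m2"]

print(df)
```

As a rule of thumb, if you find yourself writing a for loop over the rows of a DataFrame, it is worth checking whether a column operation does the same job.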

Matplotlib

Matplotlib is probably the most popular data visualisation (plotting) Python package. There are many examples of what and how you can plot with Matplotlib. If aesthetically pleasing interactive plots are required, you can also look at Plotly.

Installing Matplotlib via Anaconda Prompt:

conda install -c anaconda matplotlib
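As a first taste, the sketch below plots a sine curve and saves it to a file. The file name sine.png and the labels are arbitrary choices for this example, and the Agg backend is selected only so that the script also runs on a machine without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

# One period of a sine wave, sampled at 100 points.
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x)

# Create a figure with a single set of axes and draw the curve.
fig, ax = plt.subplots()
ax.plot(x, y, label="sin(x)")
ax.set_xlabel("x (rad)")
ax.set_ylabel("sin(x)")
ax.set_title("A first Matplotlib plot")
ax.legend()

# Write the figure to disk; the file name is arbitrary.
fig.savefig("sine.png", dpi=150)
```

When working interactively in Spyder or Jupyter, you would normally leave out the matplotlib.use line and call plt.show() to display the figure instead of saving it.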

NumPy & SciPy

You will need the NumPy and SciPy packages for any mathematical analysis: NumPy for any array-type calculations and SciPy for statistics, integration, polynomial fits, etc.

Install NumPy and SciPy via Anaconda Prompt:

conda install -c anaconda numpy
conda install -c anaconda scipy
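Once both are installed, a quick sanity check could look like the sketch below. The numbers are made up, and note that the straight-line fit here uses NumPy's polyfit rather than a SciPy routine.

```python
import numpy as np
from scipy import integrate, stats

# Array calculation with NumPy: element-wise, no explicit loop.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Polynomial fit of degree 1; coefficients come back highest power first.
slope, intercept = np.polyfit(x, y, deg=1)

# Numerical integration of t**2 from 0 to 3 with SciPy (analytically 9).
area, abs_error = integrate.quad(lambda t: t**2, 0.0, 3.0)

# Descriptive statistics (count, min/max, mean, variance, ...) with SciPy.
summary = stats.describe(y)

print("slope:", slope, "intercept:", intercept)
print("integral:", area, "mean of y:", summary.mean)
```

The fit should recover the slope and intercept used to build y, up to floating-point rounding.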

Here is a brilliant introductory resource on using Python for science (and engineering).

A summary of these links is available in the menu under CODING. Let us know which distributions, IDEs, toolkits and packages you are using in Python.

