Pandas: Need-to-knows

If you are using Python in the postgraduate research space, then chances are you will need pandas. There are many resources and tutorials out there that will give you an introduction to pandas, or even time series analysis with pandas. There are, however, things you should know how to do in pandas that no one tells you about.

Display all rows/columns of the data frame

In most cases, your data frame (df) will contain thousands of rows, and most consoles will display only a truncated part of it.

You can display all rows:

import pandas as pd
pd.set_option('display.max_rows', None)

Similarly, you can display all columns:

pd.set_option('display.max_columns', None)

Remember, your console may also limit the number of lines buffered. Spyder IDE (via Anaconda distribution) has a default configuration of 10,000 lines. You can change this setting in Tools > Preferences > IPython console > Source code > Buffer as shown below. Restart Spyder to apply changes.

Setting the buffer size in Spyder.

You can also display a specific number of rows or columns, the justification of headers and several other display settings for pandas. See the full list here.

Fill missing index datetimes

In general, working with datetimes is a beast on its own. We’ll later get into time zones, curve fits of datetimes and plotting options. However, the first thing is to ensure that missing time steps are filled.

Here is the code, using minute-averaged solar radiation data for 2019 from SAURAN for Stellenbosch University. We'll unpack it afterwards:

import pandas as pd
import datetime as dt
dateparse = lambda x: dt.datetime.strptime(x, '%d/%m/%Y %H:%M:%S')
df = pd.read_csv('Data/SUN_Minute.csv', skiprows=1, parse_dates=['TmStamp'], index_col='TmStamp', date_parser=dateparse)
print(df.shape)
r = pd.date_range(start='2019-01-01 00:00', end='2019-12-31 23:59', freq='min')
df = df.reindex(r).reset_index().set_index('index')
print(df.shape)

  • Line #1: we simply import the pandas library.
  • Line #2: we import Python's standard datetime module, since the pandas alias pd.datetime, previously used to reach strptime() as in line #3, has been deprecated.
  • Line #3: we specify the format of the date and time in the solar radiation data file that we read in line #4.
  • Line #4: the imported file is called SUN_Minute.csv and is located in a folder called Data. ‘TmStamp’ is set as the index column.
  • Line #5: output: (508503, 15). The df has 508503 rows and 15 columns.
  • Line #6: since in this example we have a single CSV file containing an entire year, we can use date_range() to indicate what we expect the index to contain and at what frequency the discrete measurement points occur.
  • Line #7: reindex(r) aligns the df to the full range of datetimes. reset_index() then moves those datetimes into a column named 'index'; without .set_index('index') the df would be left with a default integer index instead of a datetime index.
  • Line #8: output: (525600, 15). The df now has 525600 rows and still 15 columns.
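
Newer pandas releases (2.0 onwards) also deprecate the date_parser argument used in line #4. One alternative, sketched below under the assumption that the timestamps follow the same day/month/year format, is to read the column as text and convert it with pd.to_datetime(). The CSV content and the RecNum and GHI column names are made up for illustration; only TmStamp comes from the example above.

```python
import io
import pandas as pd

# Made-up stand-in for the data file: one title row to skip, then the data.
csv_text = """Station export (hypothetical title row)
TmStamp,RecNum,GHI
29/06/2019 13:03:00,1464016,571.0381
29/06/2019 13:04:00,1464017,466.0129
29/06/2019 13:05:00,1464018,784.4295
"""

# Read the timestamps as plain strings first.
df = pd.read_csv(io.StringIO(csv_text), skiprows=1)

# Convert them with an explicit format, then use the column as the index.
df['TmStamp'] = pd.to_datetime(df['TmStamp'], format='%d/%m/%Y %H:%M:%S')
df = df.set_index('TmStamp')

print(df.index.dtype)
print(df.shape)
```

Giving the format explicitly avoids any ambiguity between day-first and month-first dates.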

For the SAURAN data, the original output before filling the missing time steps is:

2019-06-29 13:03:00 1464016 571.038100 ... 9.722885 2.161459
2019-06-29 13:04:00 1464017 466.012900 ... 9.345224 2.058094
2019-06-29 13:05:00 1464018 784.429500 ... 10.609600 2.433831
2019-07-05 01:05:00 1471922 0.000000 ... 0.000000 0.012182
2019-07-05 01:06:00 1471923 0.000000 ... 0.000000 0.012182

The resulting df now has a datetime index for every minute within the year range, but all other missing values were set to NaN (Not a Number):

2019-06-29 13:03:00 1464016.0 571.038100 ... 9.722885 2.161459
2019-06-29 13:04:00 1464017.0 466.012900 ... 9.345224 2.058094
2019-06-29 13:05:00 1464018.0 784.429500 ... 10.609600 2.433831
2019-06-29 13:06:00 NaN NaN ... NaN NaN
2019-06-29 13:07:00 NaN NaN ... NaN NaN
2019-06-29 13:08:00 NaN NaN ... NaN NaN

The missing data could also have been replaced with zeros (or other values) by adding fill_value in Line #7: .reindex(r, fill_value=0).
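
To see the difference between the two fill behaviours on something small, here is a sketch with three made-up measurements and a two-minute gap. The GHI column name and the values are placeholders. It also lists which time steps were missing, using the difference() method of the index, which can be useful for a data quality report.

```python
import pandas as pd

# Three made-up measurements with 13:05 and 13:06 missing.
idx = pd.to_datetime(['2019-06-29 13:03', '2019-06-29 13:04', '2019-06-29 13:07'])
df = pd.DataFrame({'GHI': [571.0, 466.0, 784.4]}, index=idx)

# The index we expect: every minute between the first and last measurement.
r = pd.date_range(start='2019-06-29 13:03', end='2019-06-29 13:07', freq='min')

# Which expected time steps are absent from the data?
missing = r.difference(df.index)

# Default behaviour: missing rows are filled with NaN.
with_nan = df.reindex(r)

# Alternative: missing rows are filled with a value of your choice.
with_zero = df.reindex(r, fill_value=0)

print(missing)
print(with_nan)
print(with_zero)
```

Whether NaN or zero is the better choice depends on the analysis: NaN is skipped by functions such as mean(), whereas zeros are counted as real measurements.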

Avoid Pythonic loops

Pandas is an extremely powerful tool; however, misuse will cause frustration and excessive code execution times. Applying the usual Python loops is possibly contributing to your misery.

There are many optimised solutions to looping through a df and here is a brilliant guide and here are some more options. In summary:

  • Loop with iterrows(): less efficient and therefore slower than the other options but feels quite Pythonic, especially if you are newer to Python.
  • Loop with itertuples(): very similar to the previous option but faster.
  • Loop with apply(): usually more efficient than the previous options and a useful solution when grouping is required.
  • Vectorisation with pandas or numpy: both are faster than the previous options, with vectorisation with numpy often the fastest solution.
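
The options above can be sketched side by side on a toy problem: multiplying two columns row by row. The data frame and its column names a and b are made up for illustration. All five approaches give the same result; only the speed differs.

```python
import numpy as np
import pandas as pd

# Made-up data frame with two numeric columns.
df = pd.DataFrame({'a': np.arange(1000, dtype=float),
                   'b': np.arange(1000, dtype=float) + 1.0})

# 1. iterrows(): each row is built as a Series, which is slow.
out_iterrows = [row['a'] * row['b'] for _, row in df.iterrows()]

# 2. itertuples(): each row is a lightweight namedtuple.
out_itertuples = [row.a * row.b for row in df.itertuples(index=False)]

# 3. apply() along the rows.
out_apply = df.apply(lambda row: row['a'] * row['b'], axis=1)

# 4. Vectorisation with pandas: operate on whole columns at once.
out_pandas = df['a'] * df['b']

# 5. Vectorisation with numpy: operate on the underlying arrays.
out_numpy = df['a'].to_numpy() * df['b'].to_numpy()
```

To compare the speeds on your own machine, you could wrap each option in a function and time it with the timeit module from the standard library.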

The speed of these solutions depends on the type of function executed “within” the loop; however, vectorisation seems to be the fastest solution in general. I have often found that my choice depends more on the function, i.e. do I need to group or resample before looping? Most importantly, though, avoid Pythonic looping.
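
When the task is grouping or resampling, pandas often has a built-in method that replaces the loop entirely. As a minimal sketch, assuming a made-up minute series with a datetime index (the GHI column name and values are placeholders), five-minute averages can be computed without looping over the time steps:

```python
import numpy as np
import pandas as pd

# Ten made-up minute values on a datetime index.
idx = pd.date_range(start='2019-06-29 13:00', periods=10, freq='min')
df = pd.DataFrame({'GHI': np.arange(10, dtype=float)}, index=idx)

# Resample: average the minute values in five-minute bins.
five_min = df.resample('5min').mean()

# Group: average all values that fall in the same hour of the day.
by_hour = df.groupby(df.index.hour)['GHI'].mean()

print(five_min)
print(by_hour)
```

Both calls do the looping internally in optimised code, which is usually much faster than iterating over the rows yourself.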

Let us know if you have any other need-to-knows for pandas so that we all may conquer our unoptimised pandas code.
