Some Work With pandas In Python

Hi there. Here is some experimental data wrangling work with Python and the pandas package.




A Real Estate Dataset Example

This first example uses a real estate dataset. I import pandas and load the data with the read_csv() function from pandas.
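As a rough sketch of the loading step: the original file is not shown here, so the CSV text below is a made-up stand-in, and its column names (city, beds, baths, price) and values are assumptions for illustration only.

```python
import io

import pandas as pd

# Made-up stand-in for the real estate file; columns and values are illustrative only.
csv_text = """city,beds,baths,price
Springfield,3,2,250000
Riverton,2,1,180000
Lakeside,4,3,410000
"""

# read_csv accepts a file path, a URL, or a file-like object.
# StringIO keeps this sketch self-contained; with a real file you would pass its path.
realestate = pd.read_csv(io.StringIO(csv_text))

print(realestate.shape)  # (number of rows, number of columns)
```

With a file on disk, the call would look like pd.read_csv("some_file.csv"), where the file name is whatever your dataset is called.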

Now that the data is loaded into Python, I want to take a preview of the data to see what kind of variables I am dealing with.
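One common way to preview a dataframe is with head(), dtypes and columns. The sketch below uses a small made-up dataframe (the column names are assumptions, not the real dataset's).

```python
import pandas as pd

# Made-up rows standing in for the real estate data.
realestate = pd.DataFrame({
    "city": ["Springfield", "Riverton", "Lakeside", "Hillview", "Oakdale"],
    "beds": [3, 2, 4, 3, 5],
    "price": [250000, 180000, 410000, 295000, 520000],
})

preview = realestate.head(3)        # first three rows (head() defaults to five)
print(preview)
print(realestate.dtypes)            # data type of each column
print(realestate.columns.tolist())  # column names as a plain list
```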

Selecting Columns

I can select columns in pandas by passing the column name (or a list of column names) in square brackets. (Outputs not shown to save space.)
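A minimal sketch of column selection, again on a made-up dataframe with assumed column names:

```python
import pandas as pd

# Made-up data; the column names are assumptions for illustration.
realestate = pd.DataFrame({
    "city": ["Springfield", "Riverton", "Lakeside"],
    "beds": [3, 2, 4],
    "price": [250000, 180000, 410000],
})

prices = realestate["price"]                  # single name -> a Series
city_and_price = realestate[["city", "price"]]  # list of names -> a DataFrame

print(type(prices).__name__)
print(type(city_and_price).__name__)
```

Note the double brackets in the second case: the inner pair is a Python list of column names.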

 

Selecting Rows & Columns

In pandas, rows and columns can be selected by integer position with the .iloc indexer. Note that .iloc is used with square brackets, not called like a function.
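Here is a short sketch of position-based selection with .iloc on a made-up dataframe. Positions start at 0 and slices exclude the end position, as with ordinary Python lists.

```python
import pandas as pd

# Made-up data; the column names are assumptions for illustration.
realestate = pd.DataFrame({
    "city": ["Springfield", "Riverton", "Lakeside"],
    "beds": [3, 2, 4],
    "price": [250000, 180000, 410000],
})

first_row = realestate.iloc[0]        # first row, returned as a Series
top_left = realestate.iloc[0:2, 0:2]  # first two rows, first two columns
last_col = realestate.iloc[:, -1]     # every row of the last column

print(top_left)
```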

 


A Plotting Example With pandas and matplotlib

This second example deals with creating a pandas dataframe of favourite colours and their counts. These results are then plotted as a horizontal bar graph in matplotlib.
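A dataframe like this can be built from a dictionary of lists. The colours and counts below are made up for illustration and are not the survey results from the post.

```python
import pandas as pd

# Made-up survey counts; each dictionary key becomes a column.
colours = pd.DataFrame({
    "Colour": ["Red", "Blue", "Green", "Yellow"],
    "Count": [12, 20, 7, 4],
})

print(colours)
```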

I can sort the table by the counts in ascending order (lowest to highest).
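Sorting by a column can be done with sort_values(). This sketch uses made-up counts.

```python
import pandas as pd

# Made-up survey counts for illustration.
colours = pd.DataFrame({
    "Colour": ["Red", "Blue", "Green", "Yellow"],
    "Count": [12, 20, 7, 4],
})

# sort_values returns a new dataframe; the original is left unchanged.
sorted_colours = colours.sort_values("Count", ascending=True)

print(sorted_colours)
```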

With the help of some Stack Overflow references, I have created a sorted horizontal bar graph which shows the results from the survey. It is important to use barh instead of bar to get horizontal bars.
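A minimal sketch of a sorted horizontal bar graph follows. The data, labels, bar colour and output file name are all assumptions for illustration. The Agg backend is chosen only so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up survey counts for illustration.
colours = pd.DataFrame({
    "Colour": ["Red", "Blue", "Green", "Yellow"],
    "Count": [12, 20, 7, 4],
})
sorted_colours = colours.sort_values("Count", ascending=True)

fig, ax = plt.subplots()
# barh draws bars from the bottom up, so ascending order puts the largest bar on top.
ax.barh(sorted_colours["Colour"], sorted_colours["Count"], color="steelblue")
ax.set_xlabel("Count")
ax.set_ylabel("Colour")
ax.set_title("Favourite colours")
fig.tight_layout()
fig.savefig("favourite_colours.png")  # hypothetical output file name
```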

 

 


Data Wrangling With pandas

There are times when you need to reshape your dataframes into a format that is ready for plotting and/or data analysis. The pandas package contains a wide variety of functions which allow you to reshape the data the way you want it to be.

The main reference is the pandas Cheatsheet.

 

The melt Function

In pandas, the melt function converts a dataframe from a wide format into a long format (more rows and fewer columns).
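To make the wide-to-long idea concrete, here is a small sketch with a made-up table of test scores. The names wide, long, subject and score are my own choices for illustration.

```python
import pandas as pd

# Made-up wide table: one row per student, one column per subject.
wide = pd.DataFrame({
    "name": ["Ann", "Bob"],
    "math": [90, 80],
    "science": [85, 70],
})

# id_vars stays as-is; every other column is stacked into subject/score pairs.
long = wide.melt(id_vars="name", var_name="subject", value_name="score")

print(long)
```

The two-row, three-column table becomes four rows: one row for each student and subject combination.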

Subsetting Rows
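
A few common ways to subset rows are sketched below on made-up data. These are typical pandas idioms and not necessarily the exact calls used in the original post.

```python
import pandas as pd

# Made-up scores for illustration.
scores = pd.DataFrame({
    "name": ["Ann", "Bob", "Cal", "Dee"],
    "score": [90, 55, 72, 81],
})

high = scores[scores["score"] > 75]  # boolean mask keeps rows where the condition is True
low = scores.query("score < 60")     # same idea, written as a query string
first_two = scores.head(2)           # first two rows by position

print(high)
```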

 

Subsetting Variables/Columns
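
Columns can be subset by name, by a pattern, or by a label range. The sketch below uses made-up data and assumed column names, and shows typical idioms only.

```python
import pandas as pd

# Made-up data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "name": ["Ann", "Bob"],
    "math_score": [90, 80],
    "science_score": [85, 70],
    "age": [21, 23],
})

by_name = df[["name", "age"]]             # explicit list of column names
by_pattern = df.filter(regex="_score$")   # columns whose name ends in _score
by_range = df.loc[:, "name":"science_score"]  # label slice, end label included

print(by_pattern)
```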

Summarize Data

You can gather some basic statistics about your dataframe.
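
For example, describe() gives the count, mean, standard deviation, minimum, quartiles and maximum of each numeric column. The scores below are made up.

```python
import pandas as pd

# Made-up scores for illustration.
scores = pd.DataFrame({
    "math": [90, 80, 70],
    "science": [85, 75, 95],
})

summary = scores.describe()          # one column of statistics per numeric column
math_mean = scores["math"].mean()    # a single statistic for a single column

print(summary)
```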

Grouping Data & A Plot

In this section, I create a new list of answers which are either Yes or No to a survey question. This list will be converted into a table.
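
One way to turn such a list into a table of counts is with groupby() and size(). The answers here are made up, and the column names Answer and Count are my own choices.

```python
import pandas as pd

# Made-up survey answers for illustration.
answers = ["Yes", "No", "Yes", "Yes", "No", "Yes"]

survey = pd.DataFrame({"Answer": answers})

# Count the rows in each group, then turn the result back into a regular table.
counts = survey.groupby("Answer").size().reset_index(name="Count")

print(counts)
```

The groups come out sorted by their key, so No is listed before Yes.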

The results from the table can be plotted.
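
A sketch of the plotting step with DataFrame.plot.bar is shown below, using made-up answers. The rot and legend arguments keep the tick labels upright and hide the legend. The output file name is hypothetical, and the Agg backend is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd

# Made-up survey answers for illustration.
answers = ["Yes", "No", "Yes", "Yes", "No", "Yes"]
survey = pd.DataFrame({"Answer": answers})
counts = survey.groupby("Answer").size().reset_index(name="Count")

# rot=0 keeps the x tick labels horizontal; legend=False removes the legend box.
ax = counts.plot.bar(x="Answer", y="Count", rot=0, legend=False)
ax.set_ylabel("Count")
ax.set_title("Survey answers")
ax.get_figure().savefig("survey_answers.png")  # hypothetical output file name
```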


References

  • http://pandas.pydata.org
  • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.pivot.html
  • https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.plot.bar.html
  • https://stackoverflow.com/questions/32244019/how-to-rotate-x-axis-tick-labels-in-pandas-barplot
  • https://stackoverflow.com/questions/18973404/setting-different-bar-color-in-matplotlib-python
  • https://stackoverflow.com/questions/5735208/remove-the-legend-on-a-matplotlib-figure
