How to reshape and subset data with pandas functions?

  Pandas, Python, Worknotes

It is easy to change the layout, sort, reindex, rename, and subset table data using pandas commands. The simple code below shows you how to do this easily. The functions include melt(),pivot,sort_values,rename,sort_index,drop,filter,query,iloc,loc,iat,at and drop_duplicates.

Prepare data


#dowonload https://github.com/ziwangdeng/Data/blob/main/Vancouver_weather2010to2019_v00.csv
import pandas as pd
df=pd.read_csv('Vancouver_weather2010to2019_v00.csv')
cols=df.columns
df1=df[cols[:10]]
df2=df[cols[10:]]
ll=len(df)
df3=df.head(5000)
df4=df.tail(ll-5000)
df.columns
Index(['Unnamed: 0', 'Longitude (x)', 'Latitude (y)', 'Station Name',
       'Climate ID', 'Date/Time (LST)', 'Year', 'Month', 'Day', 'Time (LST)',
       'Temp (°C)', 'Temp Flag', 'Dew Point Temp (°C)', 'Dew Point Temp Flag',
       'Rel Hum (%)', 'Rel Hum Flag', 'Precip. Amount (mm)',
       'Precip. Amount Flag', 'Wind Dir (10s deg)', 'Wind Dir Flag',
       'Wind Spd (km/h)', 'Wind Spd Flag', 'Visibility (km)',
       'Visibility Flag', 'Stn Press (kPa)', 'Stn Press Flag', 'Hmdx',
       'Hmdx Flag', 'Wind Chill', 'Wind Chill Flag', 'Weather'],
      dtype='object')

Collect columns into rows


pd.melt(df).sample(n=10)

Spread rows into columns


df.pivot(columns='Month', values='Temp (°C)').sample(n=10)

Append rows of DataFrames


pd.concat([df3,df4])
df3.append(df4,ignore_index=True)

df3.shape
(5000, 31)
df4.shape
(83392, 31)
pd.concat([df3,df4]).shape
(88392, 31)

Append columns of DataFrames


pd.concat([df1,df2], axis=1)

df1.shape
(88392, 10)

df2.shape
(88392, 21)

pd.concat([df1,df2], axis=1).shape
(88392, 31)

Drop columns from DataFrame


df.drop(columns=['Unnamed: 0', 'Longitude (x)', 'Latitude (y)', 'Station Name'])

Subset columns of DataFrames

Select columns whose name matches regular expression regex


df[['Longitude (x)', 'Latitude (y)', 'Station Name']]
#Select columns whose name matches regular expression regex
df.filter(regex='^T')

Select columns using iloc



df.iloc[:, [0, 10, 20, 25]]

Select columns between 2 columns


df.loc[:, 'Time (LST)':'Dew Point Temp Flag']

Order rows by values of a column


df.sort_values('Temp (°C)')
df.sort_values('Temp (°C)', ascending=True)
df.sort_values('Temp (°C)', ascending=False)

Subset rows

Extract rows that meet logical criteria.


df[df.Year > 2019]

Remove duplicate rows


df.drop_duplicates()

Select rows using query


df.query('Year >2019')

df.query('Year > 2019 and Day < 10')

Select rows based on index


df.iloc[10:20]
df.iloc[:10]
df.iloc[10:]
df.iloc[[1,5,100]]

Subset rows and columns


df.loc[df['Year'] > 2019, ['Time (LST)','Dew Point Temp (°C)']]

Access single value by index


df.iat[10, 5] 
df.at[4, 'Year']

Set index with a column


df.set_index('Date/Time (LST)')

Set MultiIndex for Pandas DataFrame


df.set_index(['Date/Time (LST)','Weather'])

Sort the index of a DataFrame


df.sort_index()

Reset index of DataFrame to row numbers, moving index to columns.


df.reset_index()

Rename the columns of a DataFrame


df.rename(columns = {'Temp (°C)':'Tas','Wind Spd (km/h)':'Wsp'})