How to reshape and subset data with pandas functions?

November 17, 2022 Pandas, Python, Worknotes

It is easy to change the layout, sort, reindex, rename, and subset table data using pandas commands. The simple code below shows you how to do this easily. The functions include melt(),pivot,sort_values,rename,sort_index,drop,filter,query,iloc,loc,iat,at and drop_duplicates.

Prepare data

#dowonload https://github.com/ziwangdeng/Data/blob/main/Vancouver_weather2010to2019_v00.csv
import pandas as pd
df=pd.read_csv('Vancouver_weather2010to2019_v00.csv')
cols=df.columns
df1=df[cols[:10]]
df2=df[cols[10:]]
ll=len(df)
df3=df.head(5000)
df4=df.tail(ll-5000)
df.columns
Index(['Unnamed: 0', 'Longitude (x)', 'Latitude (y)', 'Station Name',
       'Climate ID', 'Date/Time (LST)', 'Year', 'Month', 'Day', 'Time (LST)',
       'Temp (°C)', 'Temp Flag', 'Dew Point Temp (°C)', 'Dew Point Temp Flag',
       'Rel Hum (%)', 'Rel Hum Flag', 'Precip. Amount (mm)',
       'Precip. Amount Flag', 'Wind Dir (10s deg)', 'Wind Dir Flag',
       'Wind Spd (km/h)', 'Wind Spd Flag', 'Visibility (km)',
       'Visibility Flag', 'Stn Press (kPa)', 'Stn Press Flag', 'Hmdx',
       'Hmdx Flag', 'Wind Chill', 'Wind Chill Flag', 'Weather'],
      dtype='object')Copy

Collect columns into rows

pd.melt(df).sample(n=10)Copy

Spread rows into columns

df.pivot(columns='Month', values='Temp (°C)').sample(n=10)Copy

Append rows of DataFrames

pd.concat([df3,df4])
df3.append(df4,ignore_index=True)

df3.shape
(5000, 31)
df4.shape
(83392, 31)
pd.concat([df3,df4]).shape
(88392, 31)Copy

Append columns of DataFrames

pd.concat([df1,df2], axis=1)

df1.shape
(88392, 10)

df2.shape
(88392, 21)

pd.concat([df1,df2], axis=1).shape
(88392, 31)Copy

Drop columns from DataFrame

df.drop(columns=['Unnamed: 0', 'Longitude (x)', 'Latitude (y)', 'Station Name'])Copy

Subset columns of DataFrames

Select columns whose name matches regular expression regex

df[['Longitude (x)', 'Latitude (y)', 'Station Name']]
#Select columns whose name matches regular expression regex
df.filter(regex='^T')Copy

Select columns using iloc

df.iloc[:, [0, 10, 20, 25]]Copy

Select columns between 2 columns

df.loc[:, 'Time (LST)':'Dew Point Temp Flag']Copy

Order rows by values of a column

df.sort_values('Temp (°C)')
df.sort_values('Temp (°C)', ascending=True)
df.sort_values('Temp (°C)', ascending=False)Copy

Subset rows

Extract rows that meet logical criteria.

df[df.Year > 2019]Copy

Remove duplicate rows

df.drop_duplicates()Copy

Select rows using query

df.query('Year >2019')Copy

df.query('Year > 2019 and Day < 10')Copy

Select rows based on index

df.iloc[10:20]
df.iloc[:10]
df.iloc[10:]
df.iloc[[1,5,100]]Copy

Subset rows and columns

df.loc[df['Year'] > 2019, ['Time (LST)','Dew Point Temp (°C)']]Copy

Access single value by index

df.iat[10, 5]
df.at[4, 'Year']Copy

Set index with a column

df.set_index('Date/Time (LST)')Copy

Set MultiIndex for Pandas DataFrame

df.set_index(['Date/Time (LST)','Weather'])Copy

Sort the index of a DataFrame

df.sort_index()Copy

Reset index of DataFrame to row numbers, moving index to columns.

df.reset_index()Copy

Rename the columns of a DataFrame

df.rename(columns = {'Temp (°C)':'Tas','Wind Spd (km/h)':'Wsp'})Copy