How to extract date and time features from datetime variables?

Example code showing how to extract date and time features from datetime variables with feature-engine. It answers the following questions:

  • How to read weather data directly from the Canadian government website?
  • How to randomly pick n rows from a pandas DataFrame?
  • How to extract all date and time features supported by feature-engine automatically?
  • How to extract the most commonly used date and time features automatically?
  • How to extract only the date and time features you are interested in?

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

#  DatetimeFeatures lives in the feature_engine.datetime module
from feature_engine.datetime import DatetimeFeatures

#  create range of monthly dates
download_dates = pd.date_range(start='2019-01-01', end='2020-01-01', freq='MS')

#  URL taken from the Chrome DevTools console; the two {} placeholders take the year and the month
base_url = ("https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&"
            "stationID=51442&Year={}&Month={}&Day=7&timeframe=1&submit=Download+Data")

#  create the list of remote URLs from the base URL
list_of_url = [base_url.format(date.year, date.month) for date in download_dates]

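#  download the monthly files and combine them into one DataFrame
df = pd.concat((pd.read_csv(url) for url in list_of_url), ignore_index=True)

#  keep the station name and the timestamp, and give the timestamp a shorter name
keep_columns = ['Station Name', 'Date/Time (LST)']
data = df[keep_columns].rename(columns={'Date/Time (LST)': 'dt_var'})

#  randomly pick 100 rows
data = data.sample(n=100)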
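
The URL construction and the random sampling can be tried without any download. The sketch below is only an illustration: the shortened URL template and the synthetic frame are made up for the example, and random_state is an optional pandas argument that the code above does not set; it makes the random pick repeatable.

```python
import pandas as pd

# Same monthly range as above: 'MS' means month start, both endpoints included.
download_dates = pd.date_range(start="2019-01-01", end="2020-01-01", freq="MS")

# Hypothetical, shortened URL template used only to illustrate str.format.
url_template = "https://example.invalid/data?Year={}&Month={}"
urls = [url_template.format(d.year, d.month) for d in download_dates]
print(len(urls), urls[0], urls[-1])

# Synthetic frame standing in for the downloaded weather data.
demo = pd.DataFrame({
    "Station Name": "DEMO STATION",
    "dt_var": pd.date_range("2019-01-01", periods=500, freq="D").astype(str),
})

# A fixed random_state makes the same 100 rows come back on every run.
sample_a = demo.sample(n=100, random_state=0)
sample_b = demo.sample(n=100, random_state=0)
print(sample_a.equals(sample_b))
```

Because both endpoints of the range are included, the list holds 13 URLs (January 2019 through January 2020), not 12.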

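How to extract all supported features automatically. Setting features_to_extract='all' requests every date and time feature that feature-engine supports.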


dfts = DatetimeFeatures(
    variables=["dt_var"],
    features_to_extract='all',
    drop_original=False,
)
data_t = dfts.fit_transform(data)
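
Extracting every feature adds many columns, so it helps to list exactly what was added. The helper below is a hypothetical convenience function, not part of feature-engine; the "after" frame is a pandas-only stand-in for a transformer's output, and its column names are illustrative assumptions.

```python
import pandas as pd


def added_columns(before: pd.DataFrame, after: pd.DataFrame) -> list:
    """Return the columns present in `after` but not in `before`, in order."""
    return [col for col in after.columns if col not in before.columns]


# Stand-in for a transformer's output: the original frame plus two derived columns.
before = pd.DataFrame(
    {"dt_var": pd.to_datetime(["2019-03-05 14:00", "2019-11-20 08:00"])}
)
after = before.assign(
    dt_var_year=before["dt_var"].dt.year,
    dt_var_month=before["dt_var"].dt.month,
)

print(added_columns(before, after))  # ['dt_var_year', 'dt_var_month']
```

With the transformer above, the equivalent call would be added_columns(data, data_t).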

How to extract the most common date and time features from one of the variables. With features_to_extract=None, feature-engine uses its default set of commonly used features.


dfts = DatetimeFeatures(
    variables=["dt_var"],
    features_to_extract=None,
    drop_original=False,
)
data_t = dfts.fit_transform(data)
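
Since the CSV files are read without date parsing, dt_var arrives as text. If you prefer to parse it yourself before fitting, a minimal pandas sketch follows. The timestamp format "%Y-%m-%d %H:%M" and the sample values are assumptions made for illustration; check them against the downloaded file.

```python
import pandas as pd

# Synthetic text timestamps; the last value is deliberately invalid.
raw = pd.DataFrame(
    {"dt_var": ["2019-01-07 00:00", "2019-01-07 13:00", "not a date"]}
)

# errors="coerce" turns unparseable values into NaT instead of raising.
parsed = pd.to_datetime(raw["dt_var"], format="%Y-%m-%d %H:%M", errors="coerce")
raw["dt_parsed"] = parsed

# Count the values that failed to parse so they can be inspected or dropped.
n_bad = int(parsed.isna().sum())
print(raw.dtypes)
print("unparseable rows:", n_bad)
```

Counting the NaT values before fitting makes malformed timestamps visible early rather than surfacing later as missing features.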

How to extract only the date and time features you are interested in from one of the variables.


dfts = DatetimeFeatures(
    variables=["dt_var"],
    features_to_extract=["year", "month", "day_of_week", "day_of_month", "day_of_year"],
    drop_original=False,
)
data_t = dfts.fit_transform(data)
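
To sanity-check what these five features mean, the same quantities can be computed with the pandas .dt accessor. This is a rough pandas-only equivalent under stated assumptions, not feature-engine's implementation: the sample timestamps are made up, and it is assumed here that day_of_week follows the pandas convention (Monday = 0).

```python
import pandas as pd

# Two made-up timestamps stored as text, as they would be after read_csv.
demo = pd.DataFrame({"dt_var": ["2019-03-05 14:00", "2019-12-31 23:00"]})
ts = pd.to_datetime(demo["dt_var"])

features = pd.DataFrame({
    "year": ts.dt.year,
    "month": ts.dt.month,
    "day_of_week": ts.dt.dayofweek,   # pandas convention: Monday=0 ... Sunday=6
    "day_of_month": ts.dt.day,
    "day_of_year": ts.dt.dayofyear,
})
print(features)
```

Both sample dates fall on a Tuesday, so day_of_week is 1 in each row, while day_of_year is 64 and 365 respectively.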