How to create features from time series data?

Example code for creating features from time series data, such as lag and window features. The examples below focus on lag features and answer the following questions:

  • How to read weather data directly from the Canadian government website?
  • How to add lag features to a dataframe?
  • How to convert a date column into an index of type DatetimeIndex?

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from feature_engine.timeseries.forecasting import LagFeatures

#  create range of monthly dates
download_dates = pd.date_range(start='2019-01-01', end='2020-01-01', freq='MS')

#  URL copied from Chrome DevTools; the two {} placeholders are filled with year and month
base_url = ("https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&"
              "stationID=51442&Year={}&Month={}&Day=7&timeframe=1&submit=Download+Data")

#  create list of remote URLs from base URL
list_of_url = [base_url.format(date.year, date.month) for date in download_dates]

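#  download and combine multiple files into one DataFrame
#  ignore_index=True renumbers the rows so the combined index has no duplicates
df = pd.concat((pd.read_csv(url) for url in list_of_url), ignore_index=True)

#  keep the timestamp and the two temperature columns, under shorter names
keep_columns = ['Date/Time (LST)', 'Temp (°C)', 'Dew Point Temp (°C)']
data = df[keep_columns].rename(columns={'Date/Time (LST)': 'dt_var', 'Temp (°C)': 'T', 'Dew Point Temp (°C)': 'Td'})

#  work with the first 100 rows only to keep the example small
data = data.head(100)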
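
Before downloading, it can help to check what the URL template expands to. The sketch below rebuilds the list offline (no request is sent) and shows that the date range, which includes both endpoints, yields 13 monthly URLs, from January 2019 through January 2020. The name `urls` is illustrative and not part of the original code.

```python
import pandas as pd

# Same monthly range as above: 'MS' is month start, and both endpoints are included
download_dates = pd.date_range(start="2019-01-01", end="2020-01-01", freq="MS")

base_url = (
    "https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&"
    "stationID=51442&Year={}&Month={}&Day=7&timeframe=1&submit=Download+Data"
)

# Fill the two {} placeholders with year and month; nothing is downloaded here
urls = [base_url.format(d.year, d.month) for d in download_dates]

print(len(urls))   # number of monthly files that would be requested
print(urls[0])     # first URL (January 2019)
print(urls[-1])    # last URL (January 2020)
```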

Create new features by lagging all numerical variables by 1 row, so that each new column holds the value from the previous row.


from feature_engine.timeseries.forecasting import LagFeatures
lag_f = LagFeatures(periods=1)
data_t = lag_f.fit_transform(data)
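
To see what a one-row lag produces without downloading the weather data, here is a minimal sketch using plain pandas `shift` on made-up values. It illustrates the idea rather than feature_engine's implementation; the `_lag_1` suffix is assumed to match the column names LagFeatures produces.

```python
import pandas as pd

# Toy data; the values are made up for illustration
toy = pd.DataFrame({"T": [1.0, 2.0, 3.0, 4.0], "Td": [0.5, 1.5, 2.5, 3.5]})

lagged = toy.copy()
for col in ["T", "Td"]:
    # shift(1) moves each value down one row, so row i holds the value of row i-1
    lagged[f"{col}_lag_1"] = toy[col].shift(1)

# The first row has no previous observation, so its lag columns are NaN
print(lagged)
```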

Create multiple lag features with one transformer by passing the lag periods in a list.


lag_f = LagFeatures(periods=[1,2,3])
data_t = lag_f.fit_transform(data)
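
Several lags leave several leading rows without a complete history. The sketch below uses a hypothetical helper, `add_lags`, written with plain pandas (not feature_engine), to show the resulting NaN rows and one possible way to handle them, namely dropping them.

```python
import pandas as pd

def add_lags(df, columns, periods):
    """Return a copy of df with one shifted column per (column, period) pair."""
    out = df.copy()
    for col in columns:
        for p in periods:
            out[f"{col}_lag_{p}"] = df[col].shift(p)
    return out

# Toy data; the values are made up for illustration
toy = pd.DataFrame({"T": [10.0, 11.0, 12.0, 13.0, 14.0]})
lagged = add_lags(toy, ["T"], [1, 2, 3])

# The first max(periods) rows contain NaN in at least one lag column
complete = lagged.dropna()
print(complete)
```

Dropping rows is only one option; whether to drop or impute them depends on the model that consumes the features.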

Lag features can also be created from the information in the dataframe's datetime index, for example a lag of 2 hours (2h) or 120 minutes (120min). The first step is to convert the date column into an index of type DatetimeIndex.


datetime_series = pd.to_datetime(data['dt_var'])
datetime_index = pd.DatetimeIndex(datetime_series.values)
data1=data.set_index(datetime_index)
data1.drop('dt_var',axis=1,inplace=True)
data1
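
An equivalent and somewhat shorter route to a DatetimeIndex is to parse the column in place and then promote it with `set_index`. This is a sketch on made-up timestamps; unlike the code above, the resulting index keeps the name `dt_var`.

```python
import pandas as pd

# Toy data; timestamps and values are made up for illustration
toy = pd.DataFrame({
    "dt_var": ["2019-01-01 00:00", "2019-01-01 01:00", "2019-01-01 02:00"],
    "T": [-5.0, -5.5, -6.0],
})

# Parse the strings to datetimes, then move the column into the index
toy["dt_var"] = pd.to_datetime(toy["dt_var"])
toy_indexed = toy.set_index("dt_var")

print(type(toy_indexed.index).__name__)
```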

lag_f = LagFeatures(variables = ['T','Td'],freq="2h",drop_original=False)
data_t = lag_f.fit_transform(data1)
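
A time-based lag differs from a row-based lag in that it shifts the timestamps rather than the row positions. The sketch below reproduces the idea with plain pandas on a made-up hourly series; it is an approximation of the behaviour, not feature_engine's code, and the `_lag_2h` suffix is an assumed naming.

```python
import pandas as pd

# Hourly toy series; timestamps and values are made up for illustration
idx = pd.date_range("2019-01-01 00:00", periods=5, freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({"T": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=idx)

# Move every timestamp 2 hours later while keeping the values unchanged
shifted = toy.shift(freq=pd.Timedelta(hours=2)).add_suffix("_lag_2h")

# Left join on the index: each row picks up the value observed 2 hours earlier
lagged = toy.join(shifted, how="left")
print(lagged)
```

With irregular timestamps, a row gets a lagged value only if an observation exists exactly 2 hours earlier; otherwise the lag column is NaN for that row.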

Create lags over several time intervals at once by passing a list of offsets to freq.


datetime_series = pd.to_datetime(data['dt_var'])
datetime_index = pd.DatetimeIndex(datetime_series.values)
data1=data.set_index(datetime_index)
data1.drop('dt_var',axis=1,inplace=True)
data1

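#  "0h" is a zero lag (a copy of the current value); drop_original=True removes the original T and Td columns
lag_f = LagFeatures(variables=['T', 'Td'], freq=['0h', '2h', '240min'], drop_original=True)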
data_t = lag_f.fit_transform(data1)
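
The effect of each offset in the list, including the zero lag, can be checked on a small made-up series. This is a plain-pandas sketch under the same assumptions as before (hypothetical column suffixes, toy values), not feature_engine's implementation.

```python
import pandas as pd

# Hourly toy series; timestamps and values are made up for illustration
idx = pd.date_range("2019-01-01 00:00", periods=6, freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({"T": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=idx)

pieces = []
for offset in ["0h", "2h", "240min"]:
    # Shift the timestamps by the offset and label the column with it
    shifted = toy.shift(freq=pd.Timedelta(offset)).add_suffix(f"_lag_{offset}")
    pieces.append(shifted)

# Align all lag columns on the original timestamps; the original T is left out
lagged = pd.concat(pieces, axis=1).reindex(toy.index)
print(lagged)
```

The `T_lag_0h` column is identical to the original `T`, which is why the original columns can be dropped without losing the current values.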