Example code showing how to extract date and time features from datetime variables with feature-engine. It answers the following questions:
- How to read weather data directly from the Government of Canada climate website?
- How to randomly pick n rows from a pandas DataFrame?
- How to extract all date and time features supported by feature-engine automatically?
- How to extract the most commonly used date and time features automatically?
- How to extract only the date and time features you are interested in?
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from feature_engine.datetime import DatetimeFeatures
# create range of month-start dates (both endpoints included: Jan 2019 through Jan 2020)
download_dates = pd.date_range(start='2019-01-01', end='2020-01-01', freq='MS')
# URL from Chrome DevTools Console
base_url = ("https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&"
"stationID=51442&Year={}&Month={}&Day=7&timeframe=1&submit=Download+Data") # add format option to year and month
# create list of remote URLs from the base URL, one per month
list_of_url = [base_url.format(date.year, date.month) for date in download_dates]
# download and combine multiple files into one DataFrame
df = pd.concat((pd.read_csv(url) for url in list_of_url))
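# keep only the station name and the timestamp column
keepcolumns = ['Station Name', 'Date/Time (LST)']
data = df[keepcolumns]
# give the timestamp column a shorter name
data = data.rename(columns={'Date/Time (LST)': 'dt_var'})
# randomly pick 100 rows
data = data.sample(n=100)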
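The sampling step above draws different rows on every run. A minimal sketch of a reproducible variant, using a small synthetic frame in place of the downloaded data (the station name `DEMO STATION`, the dates, and the seed value are made up for illustration). It also parses the text timestamps with `pd.to_datetime`, which is optional here but makes the column type explicit:

```python
import pandas as pd

# Synthetic stand-in for the downloaded weather data (hypothetical values).
demo = pd.DataFrame({
    "Station Name": ["DEMO STATION"] * 48,
    "dt_var": pd.date_range("2019-01-01", periods=48, freq="D").strftime("%Y-%m-%d %H:%M"),
})

# random_state fixes the seed, so the same rows are drawn on every run.
sample_a = demo.sample(n=10, random_state=42)
sample_b = demo.sample(n=10, random_state=42)

# Parse the text timestamps into a proper datetime64 column.
sample_a = sample_a.assign(dt_var=pd.to_datetime(sample_a["dt_var"]))
print(sample_a.dtypes)
```

Without `random_state`, two calls to `sample` will generally return different rows, which makes downstream results harder to compare between runs.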
How to extract all supported features automatically.
dfts = DatetimeFeatures(
variables=["dt_var"],
features_to_extract='all',
drop_original=False,
)
data_t = dfts.fit_transform(data)
How to extract the most common date and time features from one of the variables.
dfts = DatetimeFeatures(
variables=["dt_var"],
features_to_extract=None,
drop_original=False,
)
data_t = dfts.fit_transform(data)
How to extract only the date and time features you are interested in from one of the variables.
dfts = DatetimeFeatures(
variables=["dt_var"],
features_to_extract=["year", "month", "day_of_week","day_of_month","day_of_year"],
drop_original=False,
)
data_t = dfts.fit_transform(data)
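As a sanity check on the selected features, the same calendar fields can be computed directly with the pandas `.dt` accessor. A small sketch on two made-up timestamps; the column names used here are illustrative, and the names feature-engine generates for its output columns may differ:

```python
import pandas as pd

# Two made-up timestamps in place of the sampled weather data.
demo = pd.DataFrame({"dt_var": pd.to_datetime(["2019-03-15 14:00", "2019-12-31 23:00"])})

# Compute the calendar fields with plain pandas for comparison.
check = pd.DataFrame({
    "year": demo["dt_var"].dt.year,
    "month": demo["dt_var"].dt.month,
    "day_of_week": demo["dt_var"].dt.dayofweek,  # pandas convention: Monday=0 ... Sunday=6
    "day_of_month": demo["dt_var"].dt.day,
    "day_of_year": demo["dt_var"].dt.dayofyear,
})
print(check)
```

Comparing a few rows of `data_t` against values computed this way is a quick way to confirm that the timestamp column was parsed as intended.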