Example code for developing a regression model with Keras. It also answers the following questions:
How to remove constant columns?
How to normalize variables with sklearn?
How to remove columns with too many missing values?
Prepare data
Read data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
# create range of monthly dates (month starts, Jan 2019 to Jan 2020)
download_dates = pd.date_range(start='2019-01-01', end='2020-01-01', freq='MS')
# URL from Chrome DevTools Console; the {} placeholders take the year and month
base_url = ("https://climate.weather.gc.ca/climate_data/bulk_data_e.html?format=csv&"
            "stationID=51442&Year={}&Month={}&Day=7&timeframe=1&submit=Download+Data")
# create list of remote URLs from the base URL
list_of_url = [base_url.format(date.year, date.month) for date in download_dates]
# download and combine multiple files into one DataFrame
df = pd.concat((pd.read_csv(url) for url in list_of_url))
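Because each request returns one month of hourly observations (`timeframe=1`), the size of the combined table can be checked without downloading anything. A minimal sketch, assuming every downloaded month file contains one row per hour, including hours with missing measurements:

```python
import pandas as pd

# Same monthly range as the download step: Jan 2019 through Jan 2020 (13 months)
download_dates = pd.date_range(start='2019-01-01', end='2020-01-01', freq='MS')

# Total number of days covered by the downloaded months
days = sum(d.days_in_month for d in download_dates)

# Hourly data: 24 rows per day (assumes no hours are omitted from the files)
expected_rows = days * 24

print(len(download_dates), days, expected_rows)
```

The 13 months cover 396 days (2019 is not a leap year, plus January 2020), which gives the 9504 rows quoted in the next paragraph.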
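For example, we use temperature, dew point temperature, and station pressure to predict relative humidity (9504 hourly rows). In fact, relative humidity follows a precisely formulated nonlinear relationship with temperature and dew point, which makes it a convenient test case: we know a good answer exists for the network to find.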
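The nonlinear relationship mentioned above can be written down directly. A common choice is the Magnus approximation for saturation vapour pressure; the coefficients below (17.625 and 243.04 °C, after Alduchov and Eskridge) are an assumption, since other coefficient sets are also in use, and station pressure does not enter this approximation at all. The function name `magnus_rh` is hypothetical.

```python
import math

# Magnus coefficients (assumed; other published sets differ slightly)
A = 17.625
B = 243.04  # °C


def magnus_rh(temp_c: float, dew_point_c: float) -> float:
    """Approximate relative humidity (%) from air temperature and dew point (°C).

    RH = 100 * e(Td) / e_s(T); the Magnus prefactor cancels in the ratio,
    so only the exponential terms are needed.
    """
    actual = math.exp(A * dew_point_c / (B + dew_point_c))
    saturation = math.exp(A * temp_c / (B + temp_c))
    return 100.0 * actual / saturation


print(round(magnus_rh(20.0, 10.0), 1))
```

At 20 °C with a 10 °C dew point this gives roughly 52.5 %, and it returns 100 % whenever the dew point equals the temperature. A network trained on the station data is, in effect, learning an approximation of this curve.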
Load the machine learning packages and separate the data into predictors and the predictand
# Import necessary modules
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from math import sqrt
# Keras specific
import keras
from keras.models import Sequential
from keras.layers import Dense
predictors=['Temp (°C)','Dew Point Temp (°C)','Stn Press (kPa)']
predictand=['Rel Hum (%)']
X = df[predictors].values
y = df[predictand].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=40)
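Before training a network, it helps to know what a trivial model scores, and the `mean_squared_error` and `sqrt` imports above point to RMSE as the metric. A minimal sketch on synthetic stand-in arrays (random numbers, not the station data) with the same shapes and split settings: it predicts the training mean of the target for every test row and reports the RMSE. The `baseline_rmse` helper is hypothetical.

```python
from math import sqrt

import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


def baseline_rmse(X, y, test_size=0.30, random_state=40):
    """Split as above, then score a model that always predicts the training mean of y."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state)
    # constant prediction: the training mean, repeated for every test row
    y_pred = np.full_like(y_test, y_train.mean(), dtype=float)
    return X_train.shape, X_test.shape, sqrt(mean_squared_error(y_test, y_pred))


# synthetic stand-in with the same shapes as the station data (9504 rows, 3 predictors)
rng = np.random.default_rng(0)
X = rng.normal(size=(9504, 3))
y = rng.uniform(20, 100, size=(9504, 1))

train_shape, test_shape, rmse = baseline_rmse(X, y)
print(train_shape, test_shape, round(rmse, 2))
```

With `test_size=0.30`, scikit-learn rounds the test set up to 2852 rows, leaving 6652 for training. A Keras model is only useful if its test RMSE is clearly below this baseline. On the real data, any missing values in `y` would make the mean (and therefore the RMSE) come out as NaN unless those rows are dropped first.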