Example code for identifying and selecting the high predictive performance features from dataset for machine learning and deep learning models. It can also answer the following questions: Prepare data Code import numpy as np import pandas as pd import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split # create range of monthly dates download_dates = pd.date_range(start=’2019-01-01′, ..
Category : Data
Example code to transform continuous numerical variables into discrete variables with different methods. It cab also answer the following questions. Prepare data and load functions Code import numpy as np import pandas as pd import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split from feature_engine.discretisation import EqualFrequencyDiscretiser from feature_engine.discretisation import EqualWidthDiscretiser from feature_engine.discretisation import ArbitraryDiscretiser from ..
Example code for creating and adding new features to a data frame using the feature-engine. It also answer following questions: Math features Code import numpy as np import pandas as pd import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split from feature_engine.creation import MathFeatures from feature_engine.creation import RelativeFeatures from feature_engine.creation import CyclicalFeatures # create range of ..
Example code for creating features from time series data, such as lag features and window features? It can answer following questions: import numpy as np import pandas as pd import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split from feature_engine.timeseries.forecasting import LagFeatures # create range of monthly dates download_dates = pd.date_range(start=’2019-01-01′, end=’2020-01-01′, freq=’MS’) # URL from ..
Example code about how to extract several date and time features from datetime variables with feature-engine. It can answer following questions: import numpy as np import pandas as pd import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split from feature_engine import transformation as vt # create range of monthly dates download_dates = pd.date_range(start=’2019-01-01′, end=’2020-01-01′, freq=’MS’) # ..
Example code for log,reciprocal,arcsin ,power transformers of feature-engine. You can find answer to the following question as well: import numpy as np import pandas as pd import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split from feature_engine import transformation as vt # Load dataset # create range of monthly dates download_dates = pd.date_range(start=’2019-01-01′, end=’2020-01-01′, freq=’MS’) # ..
Example code for handling outlier with 3 methods of feature-engine. Winsorizer Caps maximum and/or minimum values of a variable at automatically determined values.[ref:https://feature-engine.readthedocs.io/en/latest/user_guide/outliers/Winsorizer.html] Code import numpy as np import pandas as pd import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split from feature_engine.outliers import Winsorizer # Load dataset def load_titanic(): data = pd.read_csv(‘https://www.openml.org/data/get_csv/16826755/phpMYEkMl’) data = data.replace(‘?’, ..
The sample code shows you how to encode categorical data and answer the following questions: One hot encoder Replaces the categorical variable by a group of binary variables which take value 0 or 1, to indicate if a certain category is present in an observation. Example code import numpy as np import pandas as pd ..
Example python code for handling missing data (ref:Python feature engineering cookbook ). Also answer the following questions: import pandas as pd from sklearn.model_selection import train_test_split from sklearn.impute import SimpleImputer from feature_engine.missing_data_imputers import MeanMedianImputer from feature_engine.imputation import ArbitraryNumberImputer from feature_engine.imputation import EndTailImputer from feature_engine.imputation import CategoricalImputer from feature_engine.imputation import RandomSampleImputer from feature_engine.imputation import AddMissingIndicator from feature_engine.imputation ..
The ISIMIP data is widely used in climate change impact analysis. The data is in netcdf format, and the amount of data is very large. There are many files to download. Using wget and the ISIMIP interface, it’s easy to download the file you want. First, use the ISIMIP interface to select the file and ..