Category: Data

Example code for identifying and selecting the features with high predictive performance from a dataset for machine learning and deep learning models. It can also answer the following questions: Prepare data Code import numpy as np import pandas as pd import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split # create range of monthly dates download_dates = pd.date_range(start='2019-01-01', ..

Read more
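
As a rough illustration of the feature selection idea in the excerpt above, the sketch below ranks features by their absolute correlation with the target and keeps the top two. This is a generic filter method chosen for illustration, not necessarily the approach used in the post; the column names, the synthetic data, and the cutoff of two features are all assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic data (assumed for illustration): one strong, one weak, one irrelevant feature
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "signal": rng.normal(size=n),
    "weak": rng.normal(size=n),
    "noise": rng.normal(size=n),
})
y = 3.0 * X["signal"] + 0.5 * X["weak"] + rng.normal(scale=0.1, size=n)

# Score each feature by its absolute Pearson correlation with the target
scores = X.corrwith(y).abs().sort_values(ascending=False)

# Keep the two highest-scoring features (the cutoff is arbitrary here)
selected = scores.head(2).index.tolist()
print(scores)
print(selected)
```

Correlation only captures linear, one-feature-at-a-time relationships, so a model-based score may be a better fit for non-linear problems.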

Example code to transform continuous numerical variables into discrete variables with different methods. It can also answer the following questions. Prepare data and load functions Code import numpy as np import pandas as pd import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split from feature_engine.discretisation import EqualFrequencyDiscretiser from feature_engine.discretisation import EqualWidthDiscretiser from feature_engine.discretisation import ArbitraryDiscretiser from ..

Read more
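
The difference between equal-width, equal-frequency, and arbitrary binning can be sketched with plain pandas. This is not the feature-engine API named in the excerpt; it uses `pd.cut` and `pd.qcut` as stand-ins, and the sample values and bin edges are assumptions chosen to make the contrast visible.

```python
import pandas as pd

# Skewed sample values (assumed for illustration)
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 100], name="x")

# Equal width: two intervals of the same length, so the outlier sits alone
equal_width = pd.cut(values, bins=2, labels=False)

# Equal frequency: two intervals holding the same number of observations
equal_freq = pd.qcut(values, q=2, labels=False)

# Arbitrary: interval edges supplied by the user
arbitrary = pd.cut(values, bins=[0, 5, 50, 100], labels=False)

print(pd.DataFrame({
    "x": values,
    "equal_width": equal_width,
    "equal_freq": equal_freq,
    "arbitrary": arbitrary,
}))
```

With skewed data, equal-width bins tend to leave most observations in one interval, which is often the reason to prefer equal-frequency bins.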

Example code for creating and adding new features to a data frame using feature-engine. It also answers the following questions: Math features Code import numpy as np import pandas as pd import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split from feature_engine.creation import MathFeatures from feature_engine.creation import RelativeFeatures from feature_engine.creation import CyclicalFeatures # create range of ..

Read more
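
The three kinds of derived features mentioned in the excerpt above (math, relative, and cyclical) can be approximated in plain pandas and NumPy. The sketch below does not call feature-engine; the column names `a`, `b`, and `month` and the period of 12 are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy data frame (assumed for illustration)
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0],
    "b": [4.0, 5.0, 6.0],
    "month": [1, 6, 12],
})

# Math feature: combine several columns with an aggregate function
df["a_plus_b"] = df[["a", "b"]].sum(axis=1)

# Relative feature: express one column relative to a reference column
df["a_div_b"] = df["a"] / df["b"]

# Cyclical features: map a periodic variable onto the unit circle
period = 12  # months in a year
df["month_sin"] = np.sin(2 * np.pi * df["month"] / period)
df["month_cos"] = np.cos(2 * np.pi * df["month"] / period)

print(df)
```

The sine and cosine pair keeps December and January close together, which a raw month number does not.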

Example code for creating features from time series data, such as lag features and window features. It can also answer the following questions: import numpy as np import pandas as pd import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split from feature_engine.timeseries.forecasting import LagFeatures # create range of monthly dates download_dates = pd.date_range(start='2019-01-01', end='2020-01-01', freq='MS') # URL from ..

Read more
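
Lag and window features can be sketched with pandas `shift` and `rolling`, reusing the monthly date range from the excerpt above. This is a plain-pandas approximation rather than the feature-engine transformer; the `value` column and its contents are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Monthly dates as in the excerpt; the values themselves are made up
dates = pd.date_range(start="2019-01-01", end="2020-01-01", freq="MS")
df = pd.DataFrame({"value": np.arange(len(dates), dtype=float)}, index=dates)

# Lag feature: the value observed one month earlier
df["value_lag_1"] = df["value"].shift(1)

# Window feature: mean of the previous three months.
# Shifting by one keeps the current value out of its own feature.
df["value_window_3_mean"] = df["value"].rolling(window=3).mean().shift(1)

print(df.head(6))
```

The first rows contain missing values because no earlier observations exist; they are usually dropped or imputed before training.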

Example code showing how to extract several date and time features from datetime variables with feature-engine. It can also answer the following questions: import numpy as np import pandas as pd import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split from feature_engine import transformation as vt # create range of monthly dates download_dates = pd.date_range(start='2019-01-01', end='2020-01-01', freq='MS') # ..

Read more
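
Extracting date parts from a datetime column can be illustrated with the pandas `.dt` accessor. The sketch below is a plain-pandas stand-in for the feature-engine workflow in the excerpt above; the `date` column name and the selection of features are assumptions.

```python
import pandas as pd

# Monthly dates as in the excerpt, stored in a hypothetical "date" column
dates = pd.date_range(start="2019-01-01", end="2020-01-01", freq="MS")
df = pd.DataFrame({"date": dates})

# Derive calendar features from the datetime column
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["quarter"] = df["date"].dt.quarter
df["day_of_week"] = df["date"].dt.dayofweek  # Monday=0 ... Sunday=6
df["is_weekend"] = df["date"].dt.dayofweek >= 5

print(df.head())
```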

Example code for the log, reciprocal, arcsin, and power transformers of feature-engine. You can also find the answer to the following question: import numpy as np import pandas as pd import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split from feature_engine import transformation as vt # Load dataset # create range of monthly dates download_dates = pd.date_range(start='2019-01-01', end='2020-01-01', freq='MS') # ..

Read more
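
The four transformations named in the excerpt above reduce to simple element-wise formulas, shown here with NumPy rather than the feature-engine transformers. The input values and the exponent of 0.5 are assumptions; the arcsin variant is written as arcsin of the square root, which is the usual form for proportions between 0 and 1.

```python
import numpy as np
import pandas as pd

# Strictly positive values for log and reciprocal; proportions for arcsin
x = pd.Series([1.0, 10.0, 100.0])
p = pd.Series([0.0, 0.25, 1.0])

log_x = np.log(x)                   # natural logarithm, needs x > 0
reciprocal_x = 1.0 / x              # needs x != 0
power_x = np.power(x, 0.5)          # exponent 0.5 is the square root
arcsin_p = np.arcsin(np.sqrt(p))    # needs 0 <= p <= 1

print(pd.DataFrame({
    "log": log_x,
    "reciprocal": reciprocal_x,
    "power": power_x,
    "arcsin": arcsin_p,
}))
```

Each transformation has a restricted input domain, so it is worth checking for zeros and negative values before applying it to a real column.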

Example code for handling outliers with three methods of feature-engine. Winsorizer: caps maximum and/or minimum values of a variable at automatically determined values [ref: https://feature-engine.readthedocs.io/en/latest/user_guide/outliers/Winsorizer.html]. Code import numpy as np import pandas as pd import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split from feature_engine.outliers import Winsorizer # Load dataset def load_titanic(): data = pd.read_csv('https://www.openml.org/data/get_csv/16826755/phpMYEkMl') data = data.replace('?', ..

Read more
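
Capping values at automatically determined limits, as described for the Winsorizer above, can be sketched in plain pandas. The example below assumes the inter-quartile range rule with a factor of 1.5 as the way to determine the caps; that rule, the factor, and the sample values are assumptions for illustration, and the code does not call feature-engine.

```python
import pandas as pd

# Sample values with one obvious outlier (assumed for illustration)
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 100.0])

# Determine the caps from the data with the IQR rule
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Replace values outside the limits with the limits themselves
capped = s.clip(lower=lower, upper=upper)

print(lower, upper)
print(capped.tolist())
```

In a train/test setting the limits would be computed on the training set only and then applied unchanged to the test set.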

Example Python code for handling missing data (ref: Python Feature Engineering Cookbook). It also answers the following questions: import pandas as pd from sklearn.model_selection import train_test_split from sklearn.impute import SimpleImputer from feature_engine.imputation import MeanMedianImputer from feature_engine.imputation import ArbitraryNumberImputer from feature_engine.imputation import EndTailImputer from feature_engine.imputation import CategoricalImputer from feature_engine.imputation import RandomSampleImputer from feature_engine.imputation import AddMissingIndicator from feature_engine.imputation ..

Read more
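
Three of the imputation ideas listed in the excerpt above (median imputation, a missing-value indicator, and a categorical fill label) can be sketched with plain pandas. This is not the feature-engine or scikit-learn API; the column names `age` and `cabin`, the data, and the `"Missing"` label are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy training data with gaps (assumed for illustration)
train = pd.DataFrame({
    "age": [20.0, np.nan, 40.0, 60.0],
    "cabin": ["A", None, "B", "A"],
})

# Missing indicator: record where the value was absent before filling it
train["age_na"] = train["age"].isna().astype(int)

# Median imputation: learn the statistic from the training data
age_median = train["age"].median()
train["age"] = train["age"].fillna(age_median)

# Categorical imputation: treat absence as its own category
train["cabin"] = train["cabin"].fillna("Missing")

print(train)
```

Keeping `age_median` as a separate variable makes it possible to apply the same value to a test set later instead of recomputing it there.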