Data - AI Hobbyist

November 11, 2022 AI, Data, Data Analytics, Deep Learning, Machine Learning, Pandas, Python, Worknotes

Example code for identifying and selecting the features with the highest predictive performance in a dataset for machine learning and deep learning models. It can also answer the following questions: Prepare data Code import numpy as np import pandas as pd import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split # create range of monthly dates download_dates = pd.date_range(start='2019-01-01', ..
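The excerpt does not show which selection method the post uses. As a hedged illustration only, here is a minimal filter-method sketch that ranks features by their absolute Pearson correlation with the target using plain pandas. The synthetic columns `x1`, `x2`, `x3` and the helper `select_top_k_by_correlation` are hypothetical, not taken from the post.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic data: x1 drives the target strongly, x2 weakly, x3 is pure noise.
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "x1": rng.normal(size=n),
    "x2": rng.normal(size=n),
    "x3": rng.normal(size=n),
})
y = pd.Series(3 * X["x1"] + 0.5 * X["x2"] + rng.normal(scale=0.5, size=n), name="target")


def select_top_k_by_correlation(X: pd.DataFrame, y: pd.Series, k: int) -> list:
    """Hypothetical helper: rank columns by |Pearson r| with the target and keep the top k."""
    scores = X.corrwith(y).abs().sort_values(ascending=False)
    return scores.index[:k].tolist()


selected = select_top_k_by_correlation(X, y, k=2)
print(selected)
```

A correlation filter only captures linear, one-feature-at-a-time relationships; model-based or mutual-information methods can pick up effects it misses.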

How to transform continuous numerical variables into discrete variables?

November 11, 2022 AI, Data, Deep Learning, Machine Learning, Numpy, Pandas, Python, Scikit-learn, Worknotes

Example code to transform continuous numerical variables into discrete variables with different methods. It can also answer the following questions: Prepare data and load functions Code import numpy as np import pandas as pd import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split from feature_engine.discretisation import EqualFrequencyDiscretiser from feature_engine.discretisation import EqualWidthDiscretiser from feature_engine.discretisation import ArbitraryDiscretiser from ..
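The post uses feature-engine's `EqualWidthDiscretiser`, `EqualFrequencyDiscretiser` and `ArbitraryDiscretiser`. The sketch below shows the same three ideas with pandas `pd.cut` and `pd.qcut` instead, so it is not feature-engine's API; the toy `income` values and the bin edges are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy variable with one large value.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], name="income")

# Equal-width: split the value range into 3 intervals of the same width.
equal_width = pd.cut(values, bins=3, labels=False)

# Equal-frequency: 5 quantile-based bins, each holding roughly the same number of rows.
equal_freq = pd.qcut(values, q=5, labels=False)

# Arbitrary: user-chosen bin edges (hypothetical thresholds).
arbitrary = pd.cut(values, bins=[-np.inf, 3, 8, np.inf], labels=False)

print(equal_width.tolist())
print(equal_freq.tolist())
print(arbitrary.tolist())
```

The large value stretches the equal-width range, so nine of the ten rows land in the first bin, while equal-frequency binning places two rows in each bin.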

How to create and add new features to the dataframe with feature-engine?

November 11, 2022 AI, Data, Deep Learning, Machine Learning, Numpy, Pandas, Python, Scikit-learn, Worknotes

Example code for creating and adding new features to a DataFrame using feature-engine. It also answers the following questions: Math features Code import numpy as np import pandas as pd import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split from feature_engine.creation import MathFeatures from feature_engine.creation import RelativeFeatures from feature_engine.creation import CyclicalFeatures # create range of ..
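The excerpt imports `MathFeatures`, `RelativeFeatures` and `CyclicalFeatures` from feature-engine. As a rough pandas/NumPy sketch of what each kind of feature looks like (not feature-engine's implementation), assuming hypothetical columns `price`, `tax` and `month` and a cyclical period of 12:

```python
import numpy as np
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({
    "price": [10.0, 20.0, 30.0],
    "tax": [1.0, 2.0, 3.0],
    "month": [1, 4, 12],
})

# Math-style features: combine several columns row-wise.
df["price_tax_sum"] = df[["price", "tax"]].sum(axis=1)
df["price_tax_mean"] = df[["price", "tax"]].mean(axis=1)

# Relative feature: express one column relative to a reference column.
df["tax_ratio"] = df["tax"] / df["price"]

# Cyclical features: map month onto a circle (assumed period 12),
# so December ends up next to January instead of far from it.
df["month_sin"] = np.sin(2 * np.pi * df["month"] / 12)
df["month_cos"] = np.cos(2 * np.pi * df["month"] / 12)

print(df)
```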

How to create features from time series data?

November 11, 2022 AI, Data, Deep Learning, Machine Learning, Pandas, Python, Worknotes

Example code for creating features from time series data, such as lag features and window features. It can answer the following questions: import numpy as np import pandas as pd import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split from feature_engine.timeseries.forecasting import LagFeatures # create range of monthly dates download_dates = pd.date_range(start='2019-01-01', end='2020-01-01', freq='MS') # URL from ..
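The excerpt uses feature-engine's `LagFeatures`. The same idea can be sketched with pandas `shift` (lags) and `rolling` (windows) on the monthly date range from the excerpt; the `sales` values are hypothetical placeholders, not the post's data.

```python
import pandas as pd

# Monthly dates as in the excerpt; the sales values are hypothetical.
dates = pd.date_range(start="2019-01-01", end="2020-01-01", freq="MS")
ts = pd.DataFrame({"sales": range(len(dates))}, index=dates, dtype=float)

# Lag features: the value 1 and 2 periods earlier (NaN where no history exists).
ts["sales_lag_1"] = ts["sales"].shift(1)
ts["sales_lag_2"] = ts["sales"].shift(2)

# Window feature: mean of the previous 3 periods.
# The extra shift(1) keeps the current value out of its own feature (avoids leakage).
ts["sales_roll_mean_3"] = ts["sales"].rolling(window=3).mean().shift(1)

print(ts.head(5))
```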

How to extract date and time features from datetime variables?

November 11, 2022 AI, Data, Data Analytics, Deep Learning, Machine Learning, Python, Scikit-learn, Worknotes


Example code showing how to extract several date and time features from datetime variables with feature-engine. It can answer the following questions: import numpy as np import pandas as pd import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split from feature_engine import transformation as vt # create range of monthly dates download_dates = pd.date_range(start='2019-01-01', end='2020-01-01', freq='MS') # ..
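For comparison, the most common date and time features can also be pulled out directly with the pandas `.dt` accessor. This is a minimal sketch rather than the post's feature-engine code; the timestamps are hypothetical.

```python
import pandas as pd

# Hypothetical timestamps.
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2019-01-15 08:30", "2019-06-01 23:05", "2019-12-31 12:00"])
})

# Date parts.
df["year"] = df["timestamp"].dt.year
df["month"] = df["timestamp"].dt.month
df["day_of_week"] = df["timestamp"].dt.dayofweek  # Monday=0 ... Sunday=6
df["is_weekend"] = df["day_of_week"] >= 5

# Time parts.
df["hour"] = df["timestamp"].dt.hour
df["minute"] = df["timestamp"].dt.minute

print(df)
```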

How to transform numerical variables?

November 10, 2022 AI, Data, Deep Learning, Machine Learning, Pandas, Python

Example code for the log, reciprocal, arcsin, and power transformers of feature-engine. It also answers the following questions: import numpy as np import pandas as pd import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split from feature_engine import transformation as vt # Load dataset # create range of monthly dates download_dates = pd.date_range(start='2019-01-01', end='2020-01-01', freq='MS') # ..
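The four transformations named above reduce to simple NumPy formulas. The sketch below applies them directly, assuming a hypothetical positive, right-skewed variable `x` and a hypothetical proportion `p` in [0, 1]; the exponent 0.5 is an illustrative choice, and the code is not feature-engine's implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical data: a positive, right-skewed variable and a proportion in [0, 1].
x = pd.Series([1.0, 10.0, 100.0, 1000.0])
p = pd.Series([0.0, 0.25, 0.5, 1.0])

log_x = np.log(x)                 # log transform: needs strictly positive values
reciprocal_x = 1.0 / x            # reciprocal transform: needs non-zero values
power_x = np.power(x, 0.5)        # power transform, here with exponent 0.5 (square root)
arcsin_p = np.arcsin(np.sqrt(p))  # arcsin square-root transform, used for proportions

print(log_x.tolist(), power_x.tolist(), arcsin_p.tolist())
```

Each transform has a domain restriction, which is why it helps to check the minimum value of a variable before choosing one.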

How to handle outliers with feature-engine?

November 10, 2022 AI, Data, Deep Learning, Machine Learning, Numpy, Pandas, Python

Example code for handling outliers with three methods of feature-engine. Winsorizer: caps the maximum and/or minimum values of a variable at automatically determined values. [ref: https://feature-engine.readthedocs.io/en/latest/user_guide/outliers/Winsorizer.html] Code import numpy as np import pandas as pd import matplotlib.pyplot as plt from sklearn.model_selection import train_test_split from feature_engine.outliers import Winsorizer # Load dataset def load_titanic(): data = pd.read_csv('https://www.openml.org/data/get_csv/16826755/phpMYEkMl') data = data.replace('?', ..
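To show what "capping at automatically determined values" means, here is a plain pandas sketch of interquartile-range (IQR) capping, one of the capping rules Winsorizer offers. The helper `iqr_cap`, the `fold` default of 1.5 and the `fare` values are hypothetical, chosen for illustration rather than taken from the post or the library.

```python
import pandas as pd


def iqr_cap(s: pd.Series, fold: float = 1.5) -> pd.Series:
    """Hypothetical helper: cap values outside [Q1 - fold*IQR, Q3 + fold*IQR] at those limits."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s.clip(lower=q1 - fold * iqr, upper=q3 + fold * iqr)


# Hypothetical fares with one extreme value.
fare = pd.Series([7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 500.0])
capped = iqr_cap(fare)
print(capped.tolist())
```

Here Q1 = 8.75 and Q3 = 12.25, so the upper limit is 12.25 + 1.5 × 3.5 = 17.5 and the value 500 is replaced by 17.5. In a real pipeline the limits should be learned on the training set and then applied to the test set.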

How to encode categorical data?

November 9, 2022 AI, Data, Deep Learning, Machine Learning, Numpy, Pandas, Python, Scikit-learn

The sample code shows how to encode categorical data and answers the following questions: One-hot encoder: replaces the categorical variable with a group of binary variables that take the value 0 or 1 to indicate whether a certain category is present in an observation. Example code import numpy as np import pandas as pd ..
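The one-hot idea described above can be sketched in a few lines with pandas `get_dummies`; the `color` column is hypothetical, and the post itself may use a different encoder.

```python
import pandas as pd

# Hypothetical categorical variable.
df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One binary column per category (k columns for k categories).
one_hot = pd.get_dummies(df["color"], prefix="color", dtype=int)

# drop_first=True keeps k-1 columns; the dropped category is implied when all are 0.
one_hot_k_minus_1 = pd.get_dummies(df["color"], prefix="color", drop_first=True, dtype=int)

print(one_hot)
print(one_hot_k_minus_1)
```

Keeping k columns suits tree-based models, while k−1 columns avoid a perfectly collinear set of dummies in linear models.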

How to handle missing data?

November 9, 2022 AI, Data, Data Analytics, Deep Learning, Machine Learning, Numpy, Pandas, Python, Scikit-learn

Example Python code for handling missing data (ref: Python Feature Engineering Cookbook). It also answers the following questions: import pandas as pd from sklearn.model_selection import train_test_split from sklearn.impute import SimpleImputer from feature_engine.imputation import MeanMedianImputer from feature_engine.imputation import ArbitraryNumberImputer from feature_engine.imputation import EndTailImputer from feature_engine.imputation import CategoricalImputer from feature_engine.imputation import RandomSampleImputer from feature_engine.imputation import AddMissingIndicator from feature_engine.imputation ..
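Three of the techniques behind the imported imputers (a missing indicator, median imputation and frequent-category imputation) can be sketched with plain pandas. The columns `age` and `city` are hypothetical, and for brevity the statistics are computed on the whole frame; in practice they should be learned on the training split only.

```python
import numpy as np
import pandas as pd

# Hypothetical data with missing values.
df = pd.DataFrame({
    "age": [22.0, np.nan, 30.0, 40.0, np.nan],
    "city": ["A", "B", None, "A", "A"],
})

# Missing indicator: flag rows where age was missing, before imputing.
df["age_na"] = df["age"].isna().astype(int)

# Median imputation for a numerical variable.
df["age"] = df["age"].fillna(df["age"].median())

# Frequent-category (mode) imputation for a categorical variable.
df["city"] = df["city"].fillna(df["city"].mode()[0])

print(df)
```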

October 16, 2022 Data

ISIMIP data are widely used in climate change impact analysis. The data are in NetCDF format and very large, and there are many files to download. With wget and the ISIMIP interface, it's easy to download the files you want. First, use the ISIMIP interface to select the files and ..

Category: Data
