Category: Data Analytics

Select a subset of xarray data for a specified year range with Python by using the sel() method. Here's an example:

    import xarray as xr

    # Load your xarray dataset
    ds = xr.open_dataset('my_dataset.nc')

    # Select a subset of data for a specified year range
    subset = ds.sel(time=slice('start_year', 'end_year'))

Replace my_dataset.nc with the name of your ..
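Here is a minimal sketch of the same sel() call on a small synthetic daily dataset. It stands in for my_dataset.nc, and the variable name temperature and the years 2003 to 2005 are hypothetical. Passing year strings such as '2003' to slice() selects whole years, and both ends of a label-based slice are inclusive.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical daily dataset covering 2000-2009 (a stand-in for my_dataset.nc)
time = pd.date_range("2000-01-01", "2009-12-31", freq="D")
ds = xr.Dataset(
    {"temperature": ("time", np.arange(time.size, dtype=float))},
    coords={"time": time},
)

start_year, end_year = 2003, 2005
# Year strings on a datetime coordinate select whole years; both ends are inclusive
subset = ds.sel(time=slice(str(start_year), str(end_year)))
print(subset.time.values[0], subset.time.values[-1])
```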

There are a few Python packages that can be used to read RData files. Here are three popular ones:

    # Install rpy2
    !pip install rpy2

    # An example of how to use rpy2 to load an RData file:
    # ----------------------------------------------------
    import rpy2.robjects as robjects

    # Load the RData file
    robjects.r['load']('file.RData')

    # Access R objects from Python
    r_var = robjects.globalenv['var_name']

..

The following code counts the ordinal position of February 29 in leap years over a period.

    import sys
    import warnings

    # Silence warnings unless warning options were passed on the command line
    if not sys.warnoptions:
        warnings.simplefilter('ignore')

    # Process data
    import numpy as np
    import matplotlib.pyplot as plt
    import xarray as xr

    # Writing data files
    import pandas as ..
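The excerpt stops at the imports. The sketch below uses pandas only and is not the post's xarray code. It builds a daily date range, flags every February 29, and reports both its zero-based position in the series and its day of year. The function name feb29_positions is hypothetical.

```python
import numpy as np
import pandas as pd


def feb29_positions(start, end):
    """Locate every February 29 in a daily date range.

    Returns one row per Feb 29 with the date, its zero-based position in the
    daily series, and its day of year (always 60 for Feb 29).
    """
    dates = pd.date_range(start=start, end=end, freq="D")
    # Boolean mask that is True only on February 29
    mask = np.asarray((dates.month == 2) & (dates.day == 29))
    leap_days = dates[mask]
    return pd.DataFrame({
        "date": leap_days,
        "position": np.flatnonzero(mask),
        "dayofyear": np.asarray(leap_days.dayofyear),
    })


result = feb29_positions("2000-01-01", "2004-12-31")
print(result)
```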

Example code for transforming a selected group of variables with the Sklearn Transformer Wrapper. It can also answer the following questions:

Prepare data sample

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split
    from sklearn.impute import SimpleImputer
    from sklearn.preprocessing import StandardScaler
    from sklearn.preprocessing import OneHotEncoder
    from sklearn.preprocessing import PolynomialFeatures
    from ..

Example code for identifying and selecting the features with high predictive performance from a dataset for machine learning and deep learning models. It can also answer the following questions:

Prepare data

Code

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split

    # Create a range of monthly dates
    download_dates = pd.date_range(start='2019-01-01', ..
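One common way to rank features by predictive performance is to train a model on each feature on its own. You then keep the features whose cross-validated score clears a threshold. The sketch below does this with plain scikit-learn, using LinearRegression and R² scoring. The function name, the 0.1 threshold, and the synthetic signal/noise data are assumptions, and the full post may use a different selector. feature-engine offers a ready-made version of this idea as SelectBySingleFeaturePerformance.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def select_by_single_feature_performance(X, y, threshold=0.1, cv=3):
    """Keep features whose single-feature cross-validated R^2 exceeds `threshold`."""
    scores = {}
    for col in X.columns:
        # Fit a model on one feature at a time and record its mean CV score
        scores[col] = cross_val_score(
            LinearRegression(), X[[col]], y, cv=cv, scoring="r2"
        ).mean()
    selected = [col for col, score in scores.items() if score > threshold]
    return selected, pd.Series(scores)


# Hypothetical data: the target depends on "signal" but not on "noise"
rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({"signal": rng.normal(size=n), "noise": rng.normal(size=n)})
y = 3 * X["signal"] + rng.normal(scale=0.5, size=n)

selected, scores = select_by_single_feature_performance(X, y)
print(selected)
print(scores)
```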

Example code showing how to extract several date and time features from datetime variables with feature-engine. It can answer the following questions:

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.model_selection import train_test_split
    from feature_engine import transformation as vt

    # Create a range of monthly dates
    download_dates = pd.date_range(start='2019-01-01', end='2020-01-01', freq='MS')
    # ..
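The excerpt cuts off before the feature extraction. The sketch below uses pandas only and relies on the .dt accessor rather than feature-engine's transformers. It derives year, month, quarter, day of week, and a month-start flag from the monthly download_dates range in the excerpt. The function and column names are hypothetical.

```python
import pandas as pd

# Monthly start dates, as in the excerpt
download_dates = pd.date_range(start="2019-01-01", end="2020-01-01", freq="MS")
df = pd.DataFrame({"download_date": download_dates})


def add_date_features(df, column):
    """Return a copy of df with calendar features derived from a datetime column."""
    out = df.copy()
    dt = out[column].dt
    out[f"{column}_year"] = dt.year
    out[f"{column}_month"] = dt.month
    out[f"{column}_quarter"] = dt.quarter
    out[f"{column}_day_of_week"] = dt.dayofweek  # Monday=0 ... Sunday=6
    out[f"{column}_is_month_start"] = dt.is_month_start
    return out


features = add_date_features(df, "download_date")
print(features.head())
```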

Example Python code for handling missing data (ref: Python Feature Engineering Cookbook). It also answers the following questions:

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.impute import SimpleImputer
    from feature_engine.imputation import MeanMedianImputer
    from feature_engine.imputation import ArbitraryNumberImputer
    from feature_engine.imputation import EndTailImputer
    from feature_engine.imputation import CategoricalImputer
    from feature_engine.imputation import RandomSampleImputer
    from feature_engine.imputation import AddMissingIndicator
    from feature_engine.imputation ..
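A core pattern when imputing is to learn the imputation values from the training set only and then apply them to both splits. Below is a minimal sketch with scikit-learn's SimpleImputer (median strategy) on a small hypothetical DataFrame. feature-engine's MeanMedianImputer follows the same fit/transform pattern but returns a DataFrame directly.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split

# Hypothetical data with missing values in both columns
data = pd.DataFrame({
    "age": [25, np.nan, 35, 40, np.nan, 50, 60, 30],
    "income": [3000, 4000, np.nan, 5000, 6000, np.nan, 7000, 3500],
})

X_train, X_test = train_test_split(data, test_size=0.25, random_state=0)

# Learn the medians from the training split only, to avoid leaking test information
imputer = SimpleImputer(strategy="median")
imputer.fit(X_train)

# Apply the same training medians to both splits and keep the DataFrame layout
X_train_imp = pd.DataFrame(imputer.transform(X_train), columns=data.columns, index=X_train.index)
X_test_imp = pd.DataFrame(imputer.transform(X_test), columns=data.columns, index=X_test.index)

print(dict(zip(data.columns, imputer.statistics_)))
```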

My example of Python code for mapping variables overlaid on geographic information. Elements in the map include ocean, coastline, borders, lakes, rivers, province borders, contour, contourf, pcolormesh, scatter, labels, and province (or state) names. It also shows how to set the cmap, what range of the data (minimum, maximum) to show, and how many levels to use when displaying the data. How to ..
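The sketch below covers only the cmap, data range, and levels settings, using matplotlib on a hypothetical lon/lat grid. The geographic layers (ocean, coastline, borders and so on) usually come from a mapping library such as Cartopy. They are left out here so the example runs without downloading map data. For contourf, the number of levels and the min/max range are set together through levels=np.linspace(vmin, vmax, n_levels). For pcolormesh, vmin and vmax set the limits of the color scale.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical gridded field on a lon/lat grid
lon = np.linspace(-140, -50, 91)
lat = np.linspace(40, 80, 41)
lon2d, lat2d = np.meshgrid(lon, lat)
field = 15 * np.sin(np.radians(4 * lon2d)) * np.cos(np.radians(lat2d))

# Data range to show and number of levels
vmin, vmax, n_levels = -10.0, 10.0, 11
levels = np.linspace(vmin, vmax, n_levels)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Filled contours plus contour lines; values beyond the range use the end colors
cf = ax1.contourf(lon2d, lat2d, field, levels=levels, cmap="RdBu_r", extend="both")
ax1.contour(lon2d, lat2d, field, levels=levels, colors="k", linewidths=0.5)
fig.colorbar(cf, ax=ax1)

# pcolormesh: vmin/vmax fix the color limits instead of discrete levels
pm = ax2.pcolormesh(lon2d, lat2d, field, cmap="RdBu_r", vmin=vmin, vmax=vmax, shading="auto")
fig.colorbar(pm, ax=ax2)

fig.savefig("field_map.png", dpi=100)
plt.close(fig)
```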

    from pyspark.sql import SparkSession
    from pyspark.sql.types import *  # data types
    from pyspark.sql import functions as F  # functions

    spark = SparkSession.builder.appName('XIU-Daily').getOrCreate()

    input_fn = 's-p-tsx-60-futures_01.csv'
    df = spark.read.csv(input_fn, header=True, inferSchema=True)
    df.show(3)
    # +-------------------+-----+
    # |               date|value|
    # +-------------------+-----+
    # |1999-09-07 00:00:00|416.5|
    # |1999-09-08 00:00:00|417.2|
    # |1999-09-09 00:00:00|421.5|
    # +-------------------+-----+

    df = df.withColumn('Date', F.date_format('date', 'yyyy-MM-dd'))  # change date format
    df = df.withColumn('current_date', F.current_date())  # current date
    df = df.withColumn('year', F.year('date'))
    df = df.withColumn('month', F.month('date'))
    df = df.withColumn('dayofmonth', F.dayofmonth('date'))
    df = df.withColumn('minute', F.minute('date'))
    df = df.withColumn('second', F.second('date'))
    df = df.withColumn('dayofyear', F.dayofyear('date'))
    df = df.withColumn('dayofweek', F.dayofweek('date'))
    df = df.withColumn('weekofyear', F.weekofyear('date'))
    df = df.withColumn('quarter', F.quarter('date'))
    df = df.withColumn('next_day_Mon', F.next_day('date', 'Mon'))
    df = df.withColumn('next_day_Tue', F.next_day('date', 'Tue'))
    df = df.withColumn('next_day_Wed', F.next_day('date', 'Wed'))

..
