Category : Uncategorized

Rename columns x1 to x3, x2 to x4:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName('rename columns').getOrCreate()
    data = spark.createDataFrame([(1, 2), (3, 4)], ['x1', 'x2'])
    data.show()
    data = data.withColumnRenamed('x1', 'x3') \
               .withColumnRenamed('x2', 'x4')
    d..
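When more than a couple of columns need renaming, the chained withColumnRenamed calls above can be folded over a dictionary. This is only a minimal sketch: rename_columns is a hypothetical helper name that does not come from the original post, and it assumes nothing about the DataFrame beyond it having a withColumnRenamed method.

```python
from functools import reduce


def rename_columns(df, mapping):
    """Hypothetical helper: call withColumnRenamed once per (old, new) pair.

    `df` is assumed to be a PySpark DataFrame (or any object exposing
    withColumnRenamed); `mapping` is a dict of old name -> new name.
    """
    return reduce(
        lambda acc, pair: acc.withColumnRenamed(pair[0], pair[1]),
        mapping.items(),
        df,
    )
```

With the DataFrame from the excerpt, data = rename_columns(data, {'x1': 'x3', 'x2': 'x4'}) should give the same result as the two chained calls.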


Based on my code: Canada_COVID19_cases_information.ipynb

I like this way to convert a string to a date:

    from datetime import datetime
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import *  # data types

    # Parse day/month/year strings such as '15/03/2020' into dates
    func = udf(lambda x: datetime.strptime(x, '%d/%m/%Y'), DateType())
    df = df.withColumn('newDate', func(col('Date')))

Calculate the difference in days between two dates: Some good examples

    from pyspark.sql import functions as F

    df = df.withColumn('startDay', F.lit('2020-01-01').cast("Date"))
    df = df.withColumn('Days_from_01_Jan', F.datediff(F.col('newDate'), F.col('startDay')))

convert pandas ..
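To check what the string-to-date conversion and the day difference should produce for a single value, the same logic can be run in plain Python without a Spark session. This is a sketch only: days_from_start is a hypothetical name, and the default start date of 2020-01-01 and the '%d/%m/%Y' format are taken from the snippet above.

```python
from datetime import date, datetime


def days_from_start(date_text, start=date(2020, 1, 1), fmt="%d/%m/%Y"):
    """Hypothetical helper: parse a day/month/year string and return the
    number of days between `start` and the parsed date (end minus start,
    the same direction as datediff(end, start))."""
    parsed = datetime.strptime(date_text, fmt).date()
    return (parsed - start).days


print(days_from_start("15/03/2020"))  # 74 (2020 is a leap year)
```

On the Spark side, F.to_date(F.col('Date'), 'dd/MM/yyyy') is a built-in alternative to the Python UDF and may be worth trying on larger data, since it avoids calling Python for every row.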
