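Based on my code: Canada_COVID19_cases_information.ipynb

I like this way to convert a string to a date:

    from datetime import datetime
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import *  # data types

    # Parse 'dd/mm/YYYY' strings in the 'Date' column into a date column
    func = udf(lambda x: datetime.strptime(x, '%d/%m/%Y'), DateType())
    df = df.withColumn('newDate', func(col('Date')))

Calculate the difference in days between two dates. Some good examples:

    from pyspark.sql import functions as F

    # Fixed reference date, cast from string to date
    df = df.withColumn('startDay', F.lit('2020-01-01').cast('date'))
    # datediff(end, start) gives the number of days from start to end
    df = df.withColumn('Days_from_01_Jan', F.datediff(F.col('newDate'), F.col('startDay')))

convert pandas ..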
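To sanity-check the date parsing and day-difference logic above without a Spark session, here is a minimal plain-Python sketch of the same arithmetic. The helper names `parse_date` and `days_from`, and the sample date strings, are my own assumptions for illustration; they are not part of the notebook or of PySpark.

```python
from datetime import date, datetime

# Hypothetical helper: parse a 'dd/mm/YYYY' string, as the UDF above does.
def parse_date(text: str) -> date:
    return datetime.strptime(text, "%d/%m/%Y").date()

# Hypothetical helper: days from `start` to the parsed date,
# i.e. the same direction as F.datediff(end, start).
def days_from(start: date, text: str) -> int:
    return (parse_date(text) - start).days

START_DAY = date(2020, 1, 1)

# Illustrative inputs only, not taken from the COVID-19 dataset.
for raw in ["01/01/2020", "15/03/2020", "31/12/2019"]:
    print(raw, days_from(START_DAY, raw))
```

Dates before the start day give negative values, which should match what `F.datediff` does when the end date comes first.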