I'm trying to load an Excel file into a DataFrame. A few of its columns hold dates in integer form. I used the code below with a fixed number of days, and it worked:
df = df.select(col('Date_Column'), expr("date_add(to_date('1900-07-10', 'yyyy-MM-dd'), 2)").alias('New_Date_Column'))
Then I tried to take the number of days from the integer column instead, using this code:
df = df.select(col('Date_Column'), expr("date_add(to_timestamp('1900-07-10', 'd/M/yy'), cast('Date_Column' as Integer))").alias('New_Date_Column'))
Here I get only NULL values as a result.
The date used here is in ISO format, so you can cast the string to a date directly.
PySpark will recognize the date automatically and you will get the correct result.
The date format expression would be
to_date('1899-12-30', 'y-M-d')
with the pattern 'y-M-d' rather than 'yyyy-MM-dd'.
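As a rough sketch of the arithmetic involved, here is a plain-Python version (no Spark session needed), assuming the integers are Excel serial day numbers counted from the 1899-12-30 base date mentioned above. Both helper names are made up for illustration. The second helper only builds the SQL string you would hand to expr(); note that the column name goes in backticks, because cast('Date_Column' as Integer) casts the string literal 'Date_Column' rather than the column, which is one likely reason for the NULL values.

```python
from datetime import date, timedelta

# Assumption: the integers are Excel serial day numbers (1900 date system),
# counted from the 1899-12-30 base date used in the answer.
EXCEL_EPOCH = date(1899, 12, 30)


def excel_serial_to_date(serial):
    """Hypothetical helper: add `serial` days to the base date."""
    return EXCEL_EPOCH + timedelta(days=int(serial))


def spark_date_expr(column_name):
    """Hypothetical helper: build the SQL string to pass to PySpark's expr()."""
    # Backticks make this a column reference; single quotes would turn it
    # into a string literal, and casting that literal to int gives NULL.
    base = EXCEL_EPOCH.isoformat()  # ISO string, so no format pattern is needed
    return f"date_add(to_date('{base}'), cast(`{column_name}` as int))"


print(excel_serial_to_date(44197))
print(spark_date_expr("Date_Column"))
```

In the question's code this would be used as expr(spark_date_expr('Date_Column')).alias('New_Date_Column'), assuming Date_Column really holds whole-number day counts.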