I’m trying to load an Excel file into a DataFrame. A few of its columns store dates as integers. I used the code below and it worked:
df = df.select(col('Date_Column'), expr("date_add(to_date('1900-07-10', 'yyyy-MM-dd'), 2)").alias('New_Date_Column'))
Then I tried to use the integer column as the number of days, instead of the constant 2:
df = df.select(col('Date_Column'), expr("date_add(to_timestamp('1900-07-10', 'd/M/yy'), cast('Date_Column' as Integer))").alias('New_Date_Column'))
This returns only NULL values.
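The format pattern has to match the date literal you pass in. In your second query, '1900-07-10' does not match the pattern 'd/M/yy', so to_timestamp cannot parse it and returns NULL. Also, 1899-12-30 is the usual base date for converting Excel serial day numbers (it gives correct results for dates from 1 March 1900 onward), so write the base date as
to_date('1899-12-30', 'y-M-d')
There is a second problem: cast('Date_Column' as Integer) casts the string literal 'Date_Column', not the column, and since that text is not a number the result is NULL as well. Drop the quotes and write cast(Date_Column as int).
Because the base date is in ISO format (yyyy-MM-dd), you can also simply cast the string to a date, for example cast('1899-12-30' as date). Spark recognizes the ISO format automatically and you get the correct result.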