I used rank() on a DataFrame with the following Python code snippet, and the ranking does not seem to work:
from pyspark.sql import Window
from pyspark.sql.functions import col, rank

datadf = spark.read.csv("file:///C://datasets//data.csv", sep=",", schema="ID int, values int")
p41 = datadf.withColumn('rnk', rank().over(Window.partitionBy(datadf.ID).orderBy(datadf.values.desc())))
p42 = p41.filter(col('rnk') <= 15).orderBy(datadf.values, col('rnk'), ascending=[0, 1])
p42.show()
It always gives the result shown below. Here 'values' is in descending order and the rank column always shows 1. The 'rnk' column should show 1, 2, 3, 4, 5, ..., 15, but it does not.
I just want to know what critical mistake I am making in this code.
+---+------+---+
| ID|values|rnk|
+---+------+---+
| 12|    83|  1|
| 11|    81|  1|
| 10|    69|  1|
|  9|    68|  1|
|  5|    67|  1|
|  4|    56|  1|
|  3|    45|  1|
| 14|    36|  1|
| 13|    34|  1|
|  8|    34|  1|
|  2|    34|  1|
| 15|    28|  1|
|  1|    23|  1|
|  7|    23|  1|
|  6|    12|  1|
+---+------+---+
My dataset is given below:
|ID|values|
|1|23|
|2|34|
|3|45|
|4|56|
|5|67|
|6|12|
|7|23|
|8|34|
|9|68|
|10|69|
|11|81|
|12|83|
|13|34|
|14|36|
|15|28|
Could you please help me find out why the ranking is not working?
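
For anyone reading along, here is a rough plain-Python sketch (not Spark itself) of how rank() behaves inside a window partition, using the dataset above. The helper `rank_over` and its `partition_key` argument are hypothetical names made up for this illustration. It suggests a likely cause: every ID is unique, so partitioning by ID puts each row in its own partition, where it is trivially ranked 1.

```python
from collections import defaultdict

# Rows copied from the dataset in the question: (ID, values)
rows = [(1, 23), (2, 34), (3, 45), (4, 56), (5, 67), (6, 12), (7, 23), (8, 34),
        (9, 68), (10, 69), (11, 81), (12, 83), (13, 34), (14, 36), (15, 28)]


def rank_over(rows, partition_key):
    """Hypothetical helper that mimics rank() over a window ordered by values desc.

    partition_key maps a row to its partition; ranks restart at 1 in each partition.
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_key(row)].append(row)

    ranked = []
    for part in partitions.values():
        ordered = sorted(part, key=lambda r: r[1], reverse=True)
        prev_value = None
        rank = 0
        for position, (row_id, value) in enumerate(ordered, start=1):
            if value != prev_value:
                rank = position  # ties share a rank and leave gaps, like SQL RANK()
                prev_value = value
            ranked.append((row_id, value, rank))
    return ranked


# Partitioning by ID: each ID is unique, so every partition holds exactly one row.
by_id = rank_over(rows, partition_key=lambda r: r[0])
print(sorted({rnk for _, _, rnk in by_id}))  # → [1]

# A single partition for the whole dataset: ranks run across all rows.
overall = rank_over(rows, partition_key=lambda r: None)
for row in sorted(overall, key=lambda r: (r[2], r[0])):
    print(row)
```

Assuming this is what is happening, the Spark-side change would presumably be to drop the partitionBy and use something like Window.orderBy(col('values').desc()), or to partition by a column that actually groups several rows. Note that rank() gives tied values the same rank (34 appears three times here), so if a strict 1 to 15 sequence is wanted, row_number() may be the better fit.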
