Rank() is not working

kkraj · September 3, 2020, 11:44am

I used rank() on a DataFrame with the following Python code snippet, and found that the ranking is not working.

from pyspark.sql.functions import col, rank
from pyspark.sql.window import Window

datadf = spark.read.csv("file:///C://datasets//data.csv", sep=",", schema="ID int, values int")
p41 = datadf.withColumn("rnk", rank().over(Window.partitionBy(datadf.ID).orderBy((datadf.values).desc())))
p42 = p41.filter(col("rnk") <= 15).orderBy(datadf.values, col("rnk"), ascending=[0, 1])
p42.show()

It always gives the result shown below. The 'values' column is in descending order, but the 'rnk' column always shows 1. It should show 1, 2, 3, 4, 5 … 15, but it does not.

I just want to know what critical mistake I am making in this code.

+---+------+---+
| ID|values|rnk|
+---+------+---+
| 12|    83|  1|
| 11|    81|  1|
| 10|    69|  1|
|  9|    68|  1|
|  5|    67|  1|
|  4|    56|  1|
|  3|    45|  1|
| 14|    36|  1|
| 13|    34|  1|
|  8|    34|  1|
|  2|    34|  1|
| 15|    28|  1|
|  1|    23|  1|
|  7|    23|  1|
|  6|    12|  1|
+---+------+---+

My dataset is given below:
|ID|values|
|1|23|
|2|34|
|3|45|
|4|56|
|5|67|
|6|12|
|7|23|
|8|34|
|9|68|
|10|69|
|11|81|
|12|83|
|13|34|
|14|36|
|15|28|

Could you please help me find out why the ranking is not working?

sgiri · October 16, 2020, 3:22pm

I think you need to use row_number() instead of rank() if you want 1, 2, 3 … 15 with no gaps when 'values' has ties.