"Cannot resolve column name given input columns" error on a PySpark DataFrame

Hello,

I have created the following DataFrame:

df = spark.read.csv("file:///home/pratik58892973/olist_sellers_dataset.csv", header="True", sep="|")

While I’m able to fetch data:
df.show(5)

and display the schema:
df.printSchema()

root
|-- seller_id,seller_zip_code_prefix,seller_city,seller_state: string (nullable = true)

However, it throws an error when I try to select a single column from the DataFrame:

df.select("seller_city").show()

Error:
AnalysisException: "cannot resolve 'seller_city' given input columns: [seller_id,seller_zip_code_prefix,seller_city,seller_state];;\n'Project ['seller_city]\n+- Relation[seller_id,seller_zip_code_prefix,seller_city,seller_state#10] csv\n"

Can anyone suggest a solution?

My problem was similar: cannot resolve 'csu_5g_base_user_mon.c1249' given input columns. The cause was the '.' character in the column name, which select() treats as a struct-field accessor, so I had to remove or replace the dot (escaping the full name in backticks also works). Hope this helps you resolve your problem.
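For that dot-in-column-name variant, here is a minimal sketch (the column name is taken from the error above; the sample data is made up) showing both the backtick escape and the rename workaround:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical one-row DataFrame whose column name contains a dot
df = spark.createDataFrame([(1,)], ["csu_5g_base_user_mon.c1249"])

# A plain select() fails, because the dot is parsed as struct.field access:
# df.select("csu_5g_base_user_mon.c1249")   # -> AnalysisException

# Option 1: escape the whole name in backticks
df.select("`csu_5g_base_user_mon.c1249`").show()

# Option 2: rename the column so the dot disappears
df.withColumnRenamed("csu_5g_base_user_mon.c1249", "csu_5g_base_user_mon_c1249") \
  .select("csu_5g_base_user_mon_c1249").show()
```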

When you read in the CSV file, make sure to use the right separator; it may be ";" or ",". Your current schema shows all four fields collapsed into a single column, which means the "|" you passed does not match the file. With the correct separator, df.printSchema() should give something like the following (a full read example is sketched below the schema):
root
|-- seller_id: string (nullable = true)
|-- seller_zip_code_prefix: string (nullable = true)
|-- seller_city: string (nullable = true)
|-- seller_state: string (nullable = true)
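As a concrete sketch, assuming the olist file is comma-separated (which is how the public Olist dataset ships), the read and select would look like this:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Path taken from the question; sep must match the file's actual delimiter
df = spark.read.csv(
    "file:///home/pratik58892973/olist_sellers_dataset.csv",
    header=True,
    sep=",",           # "," instead of "|"
    inferSchema=True,  # optional: let Spark guess column types
)

df.printSchema()                  # four separate columns now
df.select("seller_city").show(5)  # resolves without an AnalysisException
```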