Can you tell me how you defined the schema here?
CREATE EXTERNAL TABLE tweets_raw (
  id BIGINT,
  created_at STRING,
  source STRING,
  favorited BOOLEAN,
  retweet_count INT,
  retweeted_status STRUCT<
    text:STRING,
    users:STRUCT<screen_name:STRING,name:STRING>>,
  entities STRUCT<
    urls:ARRAY<STRUCT<expanded_url:STRING>>,
    user_mentions:ARRAY<STRUCT<screen_name:STRING,name:STRING>>,
    hashtags:ARRAY<STRUCT<text:STRING>>>,
  text STRING,
  user STRUCT<
    screen_name:STRING,
    name:STRING,
    friends_count:INT,
    followers_count:INT,
    statuses_count:INT,
    verified:BOOLEAN,
    utc_offset:STRING, -- was INT but nulls are strings
    time_zone:STRING>,
  in_reply_to_screen_name STRING,
  year INT,
  month INT,
  day INT,
  hour INT
)
I have looked at the data file and found many other fields in each record before the ‘id’ column. How did we tell Hive to ignore those fields in the data file?
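
My guess, and it is only a guess since the statement above does not show a ROW FORMAT clause, is that the table reads JSON through a SerDe that matches columns to keys by name, so any key that is not declared as a column is simply skipped. A minimal sketch of that idea, assuming the HCatalog JSON SerDe (its jar would have to be on Hive's classpath) and a hypothetical table name and location:

```sql
-- Hypothetical table: only three of the many JSON keys are declared.
-- Assumes the hive-hcatalog-core jar is available to Hive.
CREATE EXTERNAL TABLE tweets_subset (
  id BIGINT,
  text STRING,
  retweet_count INT
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
LOCATION '/tmp/tweets_subset';  -- hypothetical path

-- Keys in each JSON record that have no matching column are not read.
SELECT id, text FROM tweets_subset LIMIT 5;
```

If that is right, nothing special is needed to ignore the leading fields: leaving them out of the column list would be enough. Please correct me if the tutorial uses a different SerDe.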