For a given dataset in a CSV file, answer the following using Unix commands only:
3) The 2nd column is the unique identifier for a Facebook post. What are the other columns?
4) How many Facebook posts are there in the file?
5) What is the date range for Facebook posts in this file? (Assume that the data is in order)
6) How many unique pages are there?
7) How many unique posts are there? [Hint: one page can have multiple posts]
8) When was the first mention in the file regarding “Italian Dishes” and what was the post?
9) How many times is “Barack Obama” mentioned in the file? How did you find this? (Do not ignore the case)
10) What about “Donald Trump”? Who is more popular on Facebook, Obama or Trump? (Do not ignore the case)
The columns of the dataset are given below. Filename: xyz.csv
page_name | post_id | page_id | post_name | message | description | caption | post_type | status_type | likes_count | comments_count | shares_count | love_count | wow_count | haha_count | sad_count | thankful_count | angry_count | post_link | picture | posted_at |
3) To print all column names of the CSV file, we can use:
awk 'BEGIN{ FS="," } { for(fn=1;fn<=NF;fn++) {print fn" = "$fn;}; exit; }' xyz.csv
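If you only need the number of one particular column, such as the date column, a shorter pipeline may be enough. The sketch below is an illustration only: it writes the header from the column list above into a temporary file (the file and the variable names sample and date_col are made up for the example) and asks grep for the line number of posted_at once every field is on its own line.

```shell
# Hypothetical sample: just the header line, copied from the column list above.
sample=$(mktemp)
printf '%s\n' 'page_name,post_id,page_id,post_name,message,description,caption,post_type,status_type,likes_count,comments_count,shares_count,love_count,wow_count,haha_count,sad_count,thankful_count,angry_count,post_link,picture,posted_at' > "$sample"

# Put each header field on its own line, then let grep report the line
# number of the exact match (-x); that line number is the column number.
date_col=$(head -n 1 "$sample" | tr ',' '\n' | grep -nx 'posted_at' | cut -d: -f1)
echo "posted_at is column $date_col"

rm -f "$sample"
```

On the real file, replace "$sample" with xyz.csv. This assumes the header has no quoted fields and the file uses Unix line endings.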
4) To count the number of Facebook posts, count every row after the header (one row per post, assuming no field contains an embedded newline):
awk 'NR>1' xyz.csv | wc -l
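As a quick sanity check, here is a sketch on a tiny made-up sample (three posts from two pages; every value is hypothetical and only the column layout comes from the list above). It shows that counting the rows after the header gives the number of posts, while counting distinct values of column 1 gives the number of page names instead.

```shell
# Hypothetical sample: header plus three posts from two pages.
sample=$(mktemp)
printf '%s\n' \
  'page_name,post_id,page_id,post_name,message,description,caption,post_type,status_type,likes_count,comments_count,shares_count,love_count,wow_count,haha_count,sad_count,thankful_count,angry_count,post_link,picture,posted_at' \
  'page-a,100_1,100,n1,m1,d1,c1,link,shared_story,10,2,1,0,0,0,0,0,0,l1,p1,2016-01-01 10:00:00' \
  'page-a,100_2,100,n2,m2,d2,c2,link,shared_story,5,1,0,0,0,0,0,0,0,l2,p2,2016-01-02 11:30:00' \
  'page-b,200_1,200,n3,m3,d3,c3,video,added_video,7,3,2,0,0,0,0,0,0,l3,p3,2016-01-03 09:15:00' \
  > "$sample"

# Rows after the header: one per post. tr strips the padding some wc versions add.
post_count=$(awk 'NR>1' "$sample" | wc -l | tr -d ' ')

# Distinct values in column 1: these are page names, not posts.
page_names=$(awk 'NR>1' "$sample" | cut -d, -f1 | sort -u | wc -l | tr -d ' ')

echo "posts=$post_count page_names=$page_names"
rm -f "$sample"
```

On this sample the two numbers differ (3 posts, 2 page names), which is why the column you pick matters.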
5) To get the date range, you need the date column number. In the column list above, the date column is posted_at, which is column 21. Since the data is assumed to be in order:
Start date (first row after the header) can be fetched by:
cut -f21 -d, xyz.csv | head -2 | tail -1
End date (last row) can be fetched by:
cut -f21 -d, xyz.csv | tail -1
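The sketch below tries the same idea on a small made-up sample, taking posted_at as column 21 from the column list above. The rows, the date format and the variable names are assumptions for illustration; the real file may format dates differently.

```shell
# Hypothetical sample: header plus three posts in date order.
sample=$(mktemp)
printf '%s\n' \
  'page_name,post_id,page_id,post_name,message,description,caption,post_type,status_type,likes_count,comments_count,shares_count,love_count,wow_count,haha_count,sad_count,thankful_count,angry_count,post_link,picture,posted_at' \
  'page-a,100_1,100,n1,m1,d1,c1,link,shared_story,10,2,1,0,0,0,0,0,0,l1,p1,2016-01-01 10:00:00' \
  'page-a,100_2,100,n2,m2,d2,c2,link,shared_story,5,1,0,0,0,0,0,0,0,l2,p2,2016-01-02 11:30:00' \
  'page-b,200_1,200,n3,m3,d3,c3,video,added_video,7,3,2,0,0,0,0,0,0,l3,p3,2016-01-03 09:15:00' \
  > "$sample"

# Start date: column 21 of line 2 (line 1 is the header).
start=$(cut -d, -f21 "$sample" | sed -n '2p')

# End date: column 21 of the last line.
end=$(cut -d, -f21 "$sample" | tail -n 1)

# Alternative for the end date: take the last field, whatever its number is.
end_last_field=$(tail -n 1 "$sample" | awk -F, '{print $NF}')

echo "from $start to $end"
rm -f "$sample"
```

One caveat: cut splits on every comma, so a message that itself contains commas shifts the column numbers for that row. Because posted_at is the last column, taking the last field with awk's $NF is less sensitive to that problem, although neither approach copes with fields that contain line breaks.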
6) To count the number of unique pages (page_id is column 3):
awk 'NR>1' xyz.csv | cut -f3 -d, | sort | uniq | wc -l
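The same pattern should carry over to unique posts (question 7) by switching to column 2, post_id. Here is a sketch on a made-up sample in which one post appears twice; all values are hypothetical, and sort -u is used as a shorthand for sort | uniq.

```shell
# Hypothetical sample: header plus four rows, one post (100_2) listed twice.
sample=$(mktemp)
printf '%s\n' \
  'page_name,post_id,page_id,post_name,message,description,caption,post_type,status_type,likes_count,comments_count,shares_count,love_count,wow_count,haha_count,sad_count,thankful_count,angry_count,post_link,picture,posted_at' \
  'page-a,100_1,100,n1,m1,d1,c1,link,shared_story,10,2,1,0,0,0,0,0,0,l1,p1,2016-01-01 10:00:00' \
  'page-a,100_2,100,n2,m2,d2,c2,link,shared_story,5,1,0,0,0,0,0,0,0,l2,p2,2016-01-02 11:30:00' \
  'page-a,100_2,100,n2,m2,d2,c2,link,shared_story,5,1,0,0,0,0,0,0,0,l2,p2,2016-01-02 11:30:00' \
  'page-b,200_1,200,n3,m3,d3,c3,video,added_video,7,3,2,0,0,0,0,0,0,l3,p3,2016-01-03 09:15:00' \
  > "$sample"

# Unique pages: distinct values of column 3 (page_id), header skipped.
unique_pages=$(awk 'NR>1' "$sample" | cut -d, -f3 | sort -u | wc -l | tr -d ' ')

# Unique posts: distinct values of column 2 (post_id), header skipped.
unique_posts=$(awk 'NR>1' "$sample" | cut -d, -f2 | sort -u | wc -l | tr -d ' ')

echo "pages=$unique_pages posts=$unique_posts"
rm -f "$sample"
```

Columns 2 and 3 come before the free-text columns, so commas inside a message do not shift them. A message that spans several lines would still add spurious rows.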