During your second Individual Project (IP), you will utilize your Python environment to derive structure from unstructured data. You will utilize the data set "Airline Sentiment" from Kaggle located at Kaggles website. From Welkin10, the dataset "Airline Sentiment". /welkin10/airline-sentinment
Using this data set, you will create a text analytics Python application that extracts themes from each comment using term frequency–inverse document frequency (TF–IDF) or simple word counts. For the deliverable, provide your Python file and a .csv with your results added as a column to the original data set.
Reference
Akash. (2017). Airline sentiment
from collections import Counter
def count_words(text):
skips = [".", ", ", ":", ";", "'", '"']
for ch in skips:
text = text.replace(ch, "")
word_counts = {}
for word in text.split(" "):
if word in word_counts:
word_counts[word]+= 1
else:
word_counts[word]= 1
return word_counts
def count_words_fast(text):
text = text.lower()
skips = [".", ", ", ":", ";", "'", '"']
for ch in skips:
text = text.replace(ch, "")
word_counts = Counter(text.split(" "))
return word_counts
Get Answers For Free
Most questions answered within 1 hours.