Modify your mapper to count words after removing punctuation marks during mapping.
Mapper code below.
#!/usr/bin/env python
# The line above tells the shell to use Python to interpret this file.
# This mapper reads lines of text and outputs <word, 1> pairs.
import sys
sys.path.append('.')
for line in sys.stdin:
    line = line.strip()  # trim spaces from beginning and end
    keys = line.split()  # split line on whitespace
    for key in keys:
        value = 1
        print("%s\t%d" % (key, value))  # for each word generate 'word TAB 1' line
If you have any doubts, please leave a comment.
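#!/usr/bin/env python
# The line above tells the shell to use Python to interpret this file.
# This mapper reads lines of text and outputs <word, 1> pairs,
# removing punctuation marks from each word first (requires Python 3).
import sys
import string
sys.path.append('.')
# translation table that deletes every punctuation character
table = str.maketrans('', '', string.punctuation)
for line in sys.stdin:
    line = line.strip()  # trim spaces from beginning and end
    keys = line.split()  # split line on whitespace
    for key in keys:
        value = 1
        key = key.translate(table)  # remove punctuation marks
        if key:  # skip tokens that consisted only of punctuation
            print("%s\t%d" % (key, value))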
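To check the punctuation handling without running a full MapReduce job, the per-line logic can be pulled into a small function and tried on a sample string. This is only a sketch: the name map_line and the sample sentence are made up for illustration and are not part of the original mapper.

```python
import string

# Translation table that deletes every character in string.punctuation.
_PUNCT_TABLE = str.maketrans('', '', string.punctuation)


def map_line(line):
    """Return the (word, 1) pairs a mapper like the one above would emit for one line.

    map_line is a hypothetical helper name used only for this illustration.
    """
    pairs = []
    for token in line.strip().split():
        word = token.translate(_PUNCT_TABLE)
        if word:  # skip tokens that were only punctuation, e.g. "--"
            pairs.append((word, 1))
    return pairs


# Print the pairs in the same 'word TAB 1' format the mapper uses.
for word, count in map_line("Hello, world! -- don't stop."):
    print("%s\t%d" % (word, count))
```

Note that string.punctuation includes the apostrophe and the hyphen, so "don't" is emitted as "dont" and a standalone "--" produces no output at all. Case is left untouched, so "Hello" and "hello" would still be counted as different words.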