I am reading in a CSV file (using R). When I first check if there are any NA's there are none. I then clean my data and convert my Income variable from num to factor by using this code to discretize income by equal-width bins:
min_income <- min(bd$income)
max_income <- max(bd$income)
bins = 3
width = (max_income - min_income) / bins
bd$income = cut(bd$income, breaks = seq(min_income, max_income, width))
When I finish cleaning/updating my data and check again for NA's, I find one. It is in row 65 of my income column. If I try to set the actual value with the code below, I get an error.
> bd[65,5] = 5014.21
invalid factor level, NA generated
Is there a way to update this without having to change the type of variable? Why would it change the value to an NA (especially for only one value)? I have not come across this issue previously. I could just remove the row, but since I have the value I figured I should just use it.
Solution 1:
Check whether this particular value is stored as a string in your original CSV file, or was read as a string when you imported it. The decimal point may be misrepresented (as a comma, for example). In that case, you can fix the value in the CSV or in the data frame before converting the column to a factor.
This will most probably solve your issue. If it does not, please let me know in the comments and share a few rows above and below the problematic row from your CSV, and I will try to replicate the issue and solve it.
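The check described above could look roughly like the sketch below. The data frame here is made up for illustration (only the column name income comes from the question), and the decimal comma is an assumed example of a mis-formatted value, not something confirmed about the original file.

```r
# Hypothetical stand-in for the imported CSV; values are invented.
bd <- data.frame(income = c("5014.21", "7200,50", "9100.00"),
                 stringsAsFactors = FALSE)

# A correctly imported column reports "numeric"; a mis-parsed one
# reports "character".
class(bd$income)

# Convert and find the rows that fail to parse (they become NA).
parsed <- suppressWarnings(as.numeric(bd$income))
bad_rows <- which(is.na(parsed))
bd$income[bad_rows]

# Repair a decimal comma, then convert the whole column to numeric.
bd$income <- as.numeric(sub(",", ".", bd$income, fixed = TRUE))
```

If class() already reports "numeric" on the real data, the value was imported correctly and this is not the cause of the NA.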
Solution 2:
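If this is the lowest value in the column, that explains the single NA. By default cut() builds intervals that are open on the left and closed on the right, (a, b], so a value equal to the lowest break falls outside the first bin and becomes NA. Use the include.lowest parameter to close the first interval on the left:
bd$income = cut(bd$income, include.lowest = TRUE, breaks = seq(min_income, max_income, width))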
The NA does not occur after doing this.
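A small reproduction may make the behaviour easier to see. This is only a sketch with invented incomes, where 5014.21 is assumed to be the minimum as the answer suggests. It builds the breaks with length.out instead of the width step used in the question, which gives the same equal-width bins. The last lines show one way to fill an NA in an existing factor: assign one of the factor's level labels rather than a number.

```r
# Invented incomes; 5014.21 is the minimum.
income <- c(5014.21, 6200, 7500, 8800, 9900)
bins <- 3
breaks <- seq(min(income), max(income), length.out = bins + 1)

# Default intervals are (a, b], so the minimum is left out of the first bin.
default_cut <- cut(income, breaks = breaks)
which(is.na(default_cut))   # 1: the position of the minimum

# include.lowest = TRUE makes the first interval [a, b].
fixed_cut <- cut(income, breaks = breaks, include.lowest = TRUE)
anyNA(fixed_cut)            # FALSE

# Filling the NA in an already-cut factor: a factor only accepts its
# own level labels, so assign a label instead of the raw number.
repaired <- default_cut
repaired[1] <- levels(repaired)[1]
```

Re-running cut() with include.lowest = TRUE on the numeric column is the cleaner route, since the label of the first level then shows a closed left bracket and matches the value it contains.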