Question

In Python (using pandas and numpy) I am trying to clean CSV data so that it adheres to a strict coding system instead of free-response text. More specifically, how would I code a simple rule-based system to handle the various spellings and word choices that represent the following statuses:

  • Never married
  • Divorced
  • Married
  • Widowed
  • Separated

Homework Answers

Answer #1
# Here is how I converted a string pandas DataFrame column so that it strictly adheres to my categories

import pandas as pd    # Import the pandas library
import numpy as np     # Import the numpy library (not used below)

#Sample string dataframe
df_sample = pd.DataFrame(pd.Series(['Never married', 'Separated', 'Widowed','Divorced','Married','kjskd','Married','Never married','uguyd']))
print(df_sample)
print(type(df_sample[0]))

# Convert the string column to an ordered categorical with pd.Categorical.
# Values not listed in categories (here 'kjskd' and 'uguyd') become NaN.
ordered_marstat = ['Never married', 'Divorced', 'Married', 'Widowed', 'Separated']
df = pd.DataFrame(pd.Categorical(df_sample[0], ordered=True, categories=ordered_marstat))
print(df)

This comes from my original project, where I stored the converted column in a new DataFrame, but you can also overwrite the existing column in place using the syntax below:

# Assign the Categorical directly; wrapping it in pd.DataFrame is unnecessary here
df_sample[0] = pd.Categorical(df_sample[0], ordered=True, categories=ordered_marstat)
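
Note that pd.Categorical on its own only keeps exact matches, so a misspelling such as 'seperated' would end up as NaN. If you also need to fold spelling variants into the five statuses, one possible approach is to normalize each response with a small ordered list of regex rules before converting. The sketch below is only an illustration: the RULES patterns, the normalize_status helper, and the sample responses are my own assumptions, not something taken from the question's data, so you would need to adapt them to the variants that actually appear in your CSV.

```python
import re
import pandas as pd

CATEGORIES = ['Never married', 'Divorced', 'Married', 'Widowed', 'Separated']

# Hypothetical rules (pattern, canonical label), checked in order.
# 'Never married' comes first because 'unmarried' also contains 'married'.
RULES = [
    (r'never|single|unmarried', 'Never married'),
    (r'divorc', 'Divorced'),
    (r'sep[ae]rat', 'Separated'),
    (r'widow', 'Widowed'),
    (r'married|maried', 'Married'),
]

def normalize_status(raw):
    """Return the canonical status for a free-text response, or None if no rule matches."""
    if pd.isna(raw):
        return None
    # Lowercase, replace anything that is not a letter or space, collapse whitespace
    text = re.sub(r'[^a-z ]', ' ', str(raw).lower())
    text = ' '.join(text.split())
    for pattern, label in RULES:
        if re.search(pattern, text):
            return label
    return None

# Made-up sample responses for illustration
responses = pd.Series(['never maried', 'DIVORCED ', 'Widow', 'seperated',
                       'Married', 'kjskd', None])

# Map each response to a canonical label, then enforce the strict coding
cleaned = pd.Categorical(responses.map(normalize_status),
                         categories=CATEGORIES, ordered=True)

# Non-missing responses that no rule recognized; review these to extend RULES
unmatched = responses[responses.notna() & pd.isna(cleaned)]

print(cleaned)
print(unmatched)
```

Printing unmatched after each run is a cheap way to see which raw responses still fall through, so you can add rules for them instead of silently losing rows to NaN.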
