Suppose you had information from customers who shop at a grocery store, and you wanted to perform cluster analysis to identify groups of customers who have similar shopping patterns. The data that you have includes the age, income, and educational level of the customer, and the yearly amounts each customer purchases of the following food types: fruits, vegetables, milk, cereal, peanut butter, and bread. What are some of the data preparation steps that should be taken before performing cluster analysis? What distance measure should be used? Explain why you chose the distance measure. Discuss how the retailer could use the results of this cluster analysis to improve grocery sales.
In grocery stores, goods of a similar nature are grouped together in order to make shopping more convenient and efficient.
The first preparation step is to check that every variable is clearly defined and contains no impossible values; for example, age cannot be negative. Text labels should also be checked for consistency, because the same name can be treated as two different values when the spelling or capitalization differs: to a machine, "apple" and "Apple" are different.
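As a rough illustration of these checks, here is a minimal sketch in plain Python. The field names `age` and `education` and the sample rows are assumptions made up for the example, not part of the original data.

```python
def clean_records(records):
    """Normalize text labels and mark impossible ages as missing."""
    cleaned = []
    for row in records:
        row = dict(row)  # copy so the caller's data is not modified
        # "College " and "college" should count as the same label.
        row["education"] = row["education"].strip().lower()
        # A negative age cannot be valid, so treat it as missing.
        if row["age"] is not None and row["age"] < 0:
            row["age"] = None
        cleaned.append(row)
    return cleaned


# Hypothetical sample rows, for illustration only.
customers = [
    {"age": 34, "education": "College "},
    {"age": -5, "education": "college"},
]
cleaned = clean_records(customers)
```

Marking the bad age as missing, rather than deleting the row, keeps the customer's purchase amounts available for later steps.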
A lot more can be done in preparing the data.
You should also deal with outliers and impute missing observations.
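One possible way to handle missing values and outliers for a single numeric column is sketched below. Median imputation, the 1.5 x IQR clipping rule, and the final z-score scaling are choices assumed for this example, not the only reasonable ones, and the income figures are made up.

```python
import statistics


def impute_median(values):
    """Replace missing entries (None) with the median of the observed ones."""
    present = [v for v in values if v is not None]
    med = statistics.median(present)
    return [med if v is None else v for v in values]


def clip_outliers(values, k=1.5):
    """Pull values outside [Q1 - k*IQR, Q3 + k*IQR] back to those bounds."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [min(max(v, low), high) for v in values]


def standardize(values):
    """Rescale to mean 0 and standard deviation 1 (z-scores)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd if sd else 0.0 for v in values]


# Hypothetical yearly incomes with one missing value and one extreme value.
income = [42000, 51000, None, 48000, 45000, 47000, 50000, 44000, 950000]
clipped = clip_outliers(impute_median(income))
prepared = standardize(clipped)
```

The same three steps would be repeated for age and for each yearly purchase amount, so that no single variable dominates a distance calculation simply because of its units.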
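You should go with k-means, because this algorithm partitions objects into groups based on their attributes. Standard k-means measures similarity with Euclidean distance, so the numeric variables (age, income, and the yearly purchase amounts) should be put on a comparable scale first; otherwise income, which has the largest values, would dominate the distance.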
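To make the partitioning idea concrete, here is a small sketch of k-means with Euclidean distance in plain Python. The starting centroids are passed in by hand and the four two-dimensional points are invented standardized values, both assumptions for the example; in practice a library implementation with better initialization would normally be used.

```python
import math


def kmeans(points, centroids, iterations=20):
    """Basic k-means: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points."""
    centroids = [list(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical standardized (age, fruit spending) pairs.
points = [(-1.0, -0.9), (-0.8, -1.1), (1.0, 0.9), (0.9, 1.2)]
labels, centroids = kmeans(points, centroids=[points[0], points[2]])
```

Here the first two customers end up in one cluster and the last two in the other, since each pair sits close together in the standardized space.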
After clustering, the retailer will know which products are bought most by which group of customers, so it can promote and sell those products to that cluster.
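One simple way to read the clusters is to compare average spending per product within each group. The sketch below assumes cluster labels are already available; the spending figures and the two product columns are invented for illustration.

```python
from collections import defaultdict


def cluster_profiles(rows, labels):
    """Average each numeric field within every cluster."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return {
        label: {
            key: sum(m[key] for m in members) / len(members)
            for key in members[0]
        }
        for label, members in groups.items()
    }


# Hypothetical yearly spending (in dollars) and cluster labels.
spend = [
    {"fruits": 400, "cereal": 100},
    {"fruits": 360, "cereal": 120},
    {"fruits": 90, "cereal": 310},
    {"fruits": 110, "cereal": 290},
]
cluster_labels = [0, 0, 1, 1]
profiles = cluster_profiles(spend, cluster_labels)
```

In this made-up data, cluster 0 spends most on fruits and cluster 1 on cereal, which is the kind of difference a retailer could use to target promotions.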