Question

Discuss the strengths and weaknesses of using the K-Means clustering algorithm to cluster multi-class data sets. How does it compare with a hierarchical clustering technique?

Required: 300 words with no plagiarism

Homework Answers

Answer #1

ANSWER :

What is clustering :-

A cluster is a collection of data points grouped together because of certain similarities.
Data clustering approaches group similar data into clusters, and the grouped data often reveal meaningful structure.
Data points that are close to each other tend to share some underlying relationship, and this relationship can be used to group the data into clusters.

How K-Means Clustering Algorithm works :-

In k∗-means, a variant that combines K-Means with hierarchical merging, we also use random initialization to choose k∗ starting centers and first assign all points to k∗ clusters.

We then compute the mean feature values of each cluster, as shown in lines 1-3 of the algorithm listing. Next, k∗-means performs hierarchical clustering together with K-Means adjustment iterations in lines 4-22, and merges the nearest clusters associated with the top-n distances in lines 23-25.
Line 6 describes the proposed cluster pruning strategy. We use a collection CS to store the neighbor clusters of a given cluster that remain after remote clusters have been pruned by Lemma 2.
Then, for each point in Ci, we only need to check the candidate clusters in the search space CS.
Once the percentage of moved points falls below the given θ in the first round, the optimized K-Means update principle is started, and the radius is updated during this process (lines 7-14).
For efficiency, we maintain a value r in each cluster to update its radius (lines 9-13). At the end of each iteration, each cluster's mean m and its radius are replaced directly by the updated mean m′ and the updated radius. Lines 23-25 show the top-n nearest-cluster merging, which reduces the number of clusters from k∗ toward k.
Therefore, the parameter n is not fixed but ranges from the given n down to 1. For each refining round of k∗-means, we use a decreasing strategy to determine the value of n, and a top-1 merge may be performed in the final round so that the number of clusters reaches k.
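
For reference, the plain K-Means iteration that the description above builds on (assign each point to its nearest center, then recompute each center as the mean of its points) can be sketched as follows. This is a minimal pure-Python sketch of the standard algorithm, not of the k∗-means variant with pruning and merging; the function name `kmeans` and the parameters `init`, `max_iter` and `seed` are illustrative assumptions.

```python
import math
import random


def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Standard K-Means on a list of equal-length numeric tuples.

    `init` optionally supplies the k starting centers; otherwise they are
    drawn at random from the data (the usual random initialization).
    """
    rng = random.Random(seed)
    centers = list(init) if init is not None else rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for every point.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # no point moved, so we have converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centers, labels


# Toy data (hypothetical): two well-separated groups of three points.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(pts, 2, init=[(0, 0), (10, 10)])
print(labels)  # -> [0, 0, 0, 1, 1, 1]
```

Each pass costs one distance computation per point per center, which is why the running time grows linearly with the number of points when k is kept small.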

K-Means strengths :

  • When the number of variables is large, K-Means is usually computationally faster than hierarchical clustering, provided k is kept small.
  • K-Means produces tighter clusters than hierarchical clustering, especially if the clusters are globular.


K-Means weaknesses :

  • It is difficult to predict the right value of K in advance.
  • It does not work well when the clusters are not globular.
  • Different initial partitions can result in different final clusters.
  • It does not work well when the clusters in the original data have different sizes and different densities.
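
The sensitivity to the initial partition can be checked by hand on a toy example. The sketch below uses four hypothetical points at the corners of a 4-by-1 rectangle with k = 2, and the helper names `centroid`, `sse` and `is_stable` are illustrative assumptions. Both partitions are stable, meaning a K-Means iteration would leave them unchanged, yet their within-cluster sums of squared distances differ greatly, so the result depends on where the algorithm starts.

```python
import math


def centroid(cluster):
    """Coordinate-wise mean of a list of points."""
    return tuple(sum(x) / len(cluster) for x in zip(*cluster))


def sse(partition):
    """Within-cluster sum of squared distances over all clusters."""
    return sum(math.dist(p, centroid(c)) ** 2 for c in partition for p in c)


def is_stable(partition):
    """True if every point is at least as close to its own centroid as to
    any other centroid, so a K-Means assignment step changes nothing."""
    cents = [centroid(c) for c in partition]
    return all(
        math.dist(p, cents[i]) <= min(math.dist(p, m) for m in cents)
        for i, c in enumerate(partition)
        for p in c
    )


# Two ways of splitting the same four corner points into k = 2 clusters.
left_right = [[(0, 0), (0, 1)], [(4, 0), (4, 1)]]  # the natural grouping
top_bottom = [[(0, 0), (4, 0)], [(0, 1), (4, 1)]]  # a poor local optimum

print(is_stable(left_right), sse(left_right))  # -> True 1.0
print(is_stable(top_bottom), sse(top_bottom))  # -> True 16.0
```

A common remedy is to run K-Means from several random starts and keep the run with the lowest sum of squared distances.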

Comparison between K Means and Hierarchical clustering :-

  • Hierarchical clustering does not scale well to large data sets, whereas K-Means does. For a fixed k and a fixed number of iterations, the running time of K-Means grows linearly with the number of points n, i.e. O(n), while standard agglomerative hierarchical clustering is at least quadratic, i.e. O(n²).
  • Because K-Means starts from a random choice of cluster centers, running the algorithm multiple times can produce different results, whereas the results of hierarchical clustering are reproducible.
  • K-Means works well when the clusters are hyperspherical (like a circle in 2D or a sphere in 3D).
  • K-Means requires prior knowledge of K, i.e. the number of clusters you want to divide your data into. In hierarchical clustering, you can instead stop at whatever number of clusters you find appropriate by interpreting the dendrogram.
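
To make the comparison concrete, a naive agglomerative (bottom-up) hierarchical clustering can be sketched in pure Python. This is only an illustrative sketch under simple assumptions: the function name `agglomerative` and its `linkage` parameter are hypothetical, it recomputes all pairwise distances at every merge instead of caching them, and it stops at a requested number of clusters rather than building a full dendrogram.

```python
import math
from itertools import combinations


def agglomerative(points, n_clusters, linkage=min):
    """Repeatedly merge the two closest clusters until n_clusters remain.

    linkage=min gives single linkage, linkage=max gives complete linkage.
    """
    clusters = [[p] for p in points]  # start with one cluster per point

    def cluster_distance(pair):
        a, b = clusters[pair[0]], clusters[pair[1]]
        return linkage(math.dist(p, q) for p in a for q in b)

    while len(clusters) > n_clusters:
        # Every pair of clusters is examined: this pairwise scan is the
        # reason the cost is at least quadratic in the number of points.
        i, j = min(combinations(range(len(clusters)), 2), key=cluster_distance)
        merged = clusters.pop(j)  # j > i, so index i is still valid
        clusters[i].extend(merged)
    return clusters


# Toy data (hypothetical): two well-separated groups of three points.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(agglomerative(pts, 2))
print(agglomerative(pts, 3))  # the same procedure, stopped one merge earlier
```

There is no random initialization, so repeated runs on the same data give the same merges, and the number of clusters is chosen simply by deciding when to stop merging.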

