Discuss the strengths and weaknesses of using K-Means clustering algorithm to cluster multi class data sets....

Discuss the strengths and weaknesses of using the K-Means clustering algorithm to cluster multi-class data sets. How does it compare with a hierarchical clustering technique?

Need about 300 words, with no plagiarism.

Answer #1

ANSWER :

What is clustering :-

A cluster is a collection of data points grouped together because of certain similarities.
Clustering methods group similar data points together, and the resulting groups often reveal meaningful structure in the data.
Points that lie close to each other tend to share some underlying relationship, and this relationship can be used to group them into clusters.

How the K-Means clustering algorithm works :-

In k∗-means, as in standard k-means, we use random initialization to choose k∗ starting centers and first assign every point to one of the k∗ clusters.

We then compute the mean feature values of each cluster (lines 1-3). Next, k∗-means combines hierarchical clustering with k-means adjustment iterations (lines 4-22) and merges the nearest clusters, selected by the top-n smallest distances (lines 23-25).
Line 6 describes the proposed cluster-pruning strategy: a collection CS stores the neighbor clusters of a given cluster Ci that remain after remote clusters have been pruned by Lemma 2.
As a result, each point in Ci only needs to be checked against the candidate clusters in CS.
Once the percentage of moved points in the first round falls below a given threshold θ, the optimized k-means update is started, and cluster radii are updated during this process (lines 7-14).
For efficiency, each cluster maintains a value r that tracks its radius (lines 9-13). At the end of each iteration, each cluster's mean m and radius are replaced directly by their updated values m′ and r. Lines 23-25 merge the top-n nearest clusters, which reduces the number of clusters from k∗ down to k.
Therefore, the parameter n is not fixed but ranges from the given n down to 1. In each refinement round of k∗-means, a decreasing strategy determines the value of n, and a final top-1 merge may be performed in the last round so that the number of clusters reaches exactly k.
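The description above is of the k∗-means variant. For comparison, here is a minimal sketch of standard k-means (Lloyd's algorithm), which alternates an assignment step and an update step until no point changes cluster. It is plain Python written under our own assumptions: points are equal-length tuples, the initial centers are k randomly sampled data points, and the function name kmeans and its parameters are illustrative, not taken from the original answer.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=None):
    """Standard (Lloyd's) k-means on a list of equal-length tuples.

    Returns (centers, labels). Initial centers are k distinct data points
    sampled at random, so the result can depend on `seed`.
    """
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


# Two well-separated groups of three points each.
data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
centers, labels = kmeans(data, k=2, seed=0)
print(labels)
```

The cluster numbering (which group is 0 and which is 1) depends on the random start, but on data this well separated the two groups come out intact. Each iteration touches every point once per center, which is why the cost grows linearly with the number of points.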

K-Means strengths :

If the number of variables is large, K-Means is usually computationally faster than hierarchical clustering, provided k is kept small.

K-Means can produce tighter clusters than hierarchical clustering, especially if the clusters are globular.

K-Means weaknesses :

It is difficult to choose the value of K in advance.
It does not work well with non-globular clusters.
Different initial partitions can result in different final clusters.
It does not work well when the clusters (in the original data) have different sizes and different densities.
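The weakness about initial partitions is easy to demonstrate. The sketch below is an illustration in plain Python; the function name lloyd and the four-point data set are our own choices, not from the original answer. It runs the same k-means iteration on the four corners of a 4 × 1 rectangle from two different sets of initial centers. Starting from (0, 0) and (4, 0) it finds the left/right split; starting from (0, 0) and (0, 1) it gets stuck in the bottom/top split, a local optimum with a much larger within-cluster sum of squares (SSE).

```python
import math


def lloyd(points, centers, max_iter=100):
    """Run k-means (Lloyd's algorithm) from the given initial centers.

    Returns (labels, sse), where sse is the within-cluster sum of squares.
    """
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest center for every point.
        new_labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    sse = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, sse


# Corners of a 4 x 1 rectangle: the natural split is left vs. right.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
for init in ([(0, 0), (4, 0)], [(0, 0), (0, 1)]):
    labels, sse = lloyd(points, init)
    print(init, labels, round(sse, 6))
# → [(0, 0), (4, 0)] [0, 0, 1, 1] 1.0
# → [(0, 0), (0, 1)] [0, 1, 0, 1] 16.0
```

Both runs converge, but to different partitions, and the second one has 16 times the SSE of the first. Running k-means several times from different random starts and keeping the lowest-SSE result is a common way to reduce this sensitivity.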

Comparison between K Means and Hierarchical clustering :-

Hierarchical clustering does not scale well to large data sets, but K-Means does. Each K-Means iteration costs time linear in the number of points n (O(nkd) for k clusters and d features), whereas standard agglomerative hierarchical clustering works from the full n × n distance matrix, so its cost is at least quadratic, O(n²).
In K-Means the initial cluster centers are chosen at random, so running the algorithm several times may give different results, whereas hierarchical clustering is deterministic and its results are reproducible.
K-Means works well when the clusters are hyperspherical (like a circle in 2D or a sphere in 3D).
K-Means requires prior knowledge of K, the number of clusters to divide the data into. In hierarchical clustering, you can instead stop at whatever number of clusters you find appropriate by interpreting the dendrogram.
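To make the comparison concrete, here is a minimal sketch of agglomerative hierarchical clustering with single linkage in plain Python. It is an illustration under our own assumptions, not the answer's method: the names agglomerate and cut are hypothetical, and the naive pair search costs roughly O(n³) time, whereas optimized single-linkage algorithms such as SLINK run in O(n²). It shows two of the points above: the result is deterministic (there is no random start), and the recorded merge levels act as a flattened dendrogram that can be cut at any number of clusters after the fact.

```python
import math
from itertools import combinations


def agglomerate(points):
    """Naive single-linkage agglomerative clustering.

    Starts with every point in its own cluster and repeatedly merges the two
    closest clusters. Returns one partition per level, from n clusters down
    to 1 (a flattened dendrogram). Clusters hold point indices.
    """
    clusters = [[i] for i in range(len(points))]
    levels = [[list(c) for c in clusters]]
    while len(clusters) > 1:
        # Single linkage: the distance between two clusters is the distance
        # between their closest pair of members.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: min(
                math.dist(points[i], points[j])
                for i in clusters[ab[0]]
                for j in clusters[ab[1]]
            ),
        )
        clusters[a] = clusters[a] + clusters[b]  # a < b, so index a stays valid
        del clusters[b]
        levels.append([list(c) for c in clusters])
    return levels


def cut(levels, k):
    """Return the partition with exactly k clusters (1 <= k <= n)."""
    return levels[len(levels) - k]


pts = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
levels = agglomerate(pts)
print(cut(levels, 3))  # → [[0, 1], [2, 3], [4]]
print(cut(levels, 2))  # → [[0, 1, 2, 3], [4]]
```

Unlike k-means, the number of clusters is chosen after the hierarchy is built, simply by picking a level, and calling agglomerate again on the same data always yields the same levels.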

