We use the Euclidean distance to decide which clusters to merge: at each step, the two clusters that are closest under the chosen linkage criterion are merged.
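As a minimal sketch of the distance computation, assuming each sample is a plain list of numeric features, the pairwise Euclidean distances could be computed as follows. The function names `euclidean` and `distance_matrix` are illustrative and not taken from the project code.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(samples):
    """Symmetric matrix of pairwise Euclidean distances between samples."""
    n = len(samples)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # The distance is symmetric, so fill both halves at once.
            dist[i][j] = dist[j][i] = euclidean(samples[i], samples[j])
    return dist


# Hypothetical toy data: three 2-D samples.
matrix = distance_matrix([[0, 0], [3, 4], [6, 8]])
print(matrix[0][1])  # 5.0
```

Computing the matrix once up front avoids recomputing sample distances every time two clusters are compared.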
We evaluate the clustering algorithm by its purity: the fraction of samples that share the majority true label of the cluster they are assigned to. A higher purity means the algorithm groups samples into the correct clusters more accurately.
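A short sketch of how purity can be computed, assuming clusters are represented as lists of sample indices and the true labels are available as a list indexed by sample. The `purity` function and the toy labels are hypothetical, not the project's own code.

```python
from collections import Counter


def purity(clusters, labels):
    """Fraction of samples that carry the majority label of their cluster.

    clusters: list of lists of sample indices.
    labels:   true label of each sample, indexed by sample.
    """
    total = sum(len(cluster) for cluster in clusters)
    majority = sum(
        Counter(labels[i] for i in cluster).most_common(1)[0][1]
        for cluster in clusters
        if cluster  # skip empty clusters
    )
    return majority / total


# Hypothetical example: one sample of type "b" lands in the "a" cluster.
labels = ["a", "a", "b", "b", "b"]
print(purity([[0, 1, 2], [3, 4]], labels))  # 0.8
```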
The goal is to cluster samples by their features so that each cluster corresponds to a specific type of cancer.
Merging stops once the number of clusters reaches seven. This stop condition keeps the clusters meaningful and manageable.
We utilize two variants of agglomerative clustering:
- Single Link: The distance between two clusters is the minimum distance between any pair of samples, one from each cluster.
- Complete Link: The distance between two clusters is the maximum distance between any pair of samples, one from each cluster.
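The two criteria differ only in how the sample-level distances are reduced to one cluster-level distance. A minimal sketch, assuming a precomputed distance matrix `dist` and clusters given as lists of sample indices (the function names are illustrative):

```python
def single_link(dist, a, b):
    """Minimum distance over all pairs (i, j) with i in a and j in b."""
    return min(dist[i][j] for i in a for j in b)


def complete_link(dist, a, b):
    """Maximum distance over all pairs (i, j) with i in a and j in b."""
    return max(dist[i][j] for i in a for j in b)


# Hypothetical 3-sample distance matrix.
dist = [
    [0, 1, 4],
    [1, 0, 2],
    [4, 2, 0],
]
print(single_link(dist, [0, 1], [2]))    # 2
print(complete_link(dist, [0, 1], [2]))  # 4
```

Single link tends to chain nearby samples into elongated clusters, while complete link favors compact ones, so the two variants can produce different merges on the same data.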
The implementation is organized into the following classes:
- Cluster: Represents a single cluster containing grouped samples.
- Data: This class is responsible for organizing the data and computing the initial distance matrix, essential for the clustering process.
- Link: The base class for linkage criteria. It has two subclasses:
  - SingleLink: Implements the single link clustering method.
  - CompleteLink: Implements the complete link clustering method.
- AgglomerativeClustering: This class updates the clusters and oversees the execution of the algorithm, ensuring that the clustering process adheres to the specified methods and stopping criteria.
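The following is a simplified sketch of how these classes could fit together, not the project's actual implementation. The class names follow the list above, but the method names (`reduce`, `distance`, `run`) and the `n_clusters` parameter are assumptions, and plain lists of sample indices stand in for the Cluster and Data classes.

```python
import math
from itertools import combinations


class Link:
    """Base class: subclasses choose how pairwise distances are reduced."""

    def reduce(self, distances):
        raise NotImplementedError

    def distance(self, dist, a, b):
        # Distance between clusters a and b (lists of sample indices).
        return self.reduce(dist[i][j] for i in a for j in b)


class SingleLink(Link):
    def reduce(self, distances):
        return min(distances)


class CompleteLink(Link):
    def reduce(self, distances):
        return max(distances)


class AgglomerativeClustering:
    def __init__(self, link, n_clusters=7):
        self.link = link
        self.n_clusters = n_clusters  # stop once this many clusters remain

    def run(self, samples):
        # Initial Euclidean distance matrix (math.dist needs Python 3.8+).
        dist = [[math.dist(p, q) for q in samples] for p in samples]
        # Start with every sample in its own cluster.
        clusters = [[i] for i in range(len(samples))]
        while len(clusters) > self.n_clusters:
            # Pick the pair of clusters that is closest under the linkage.
            a, b = min(
                combinations(range(len(clusters)), 2),
                key=lambda pair: self.link.distance(
                    dist, clusters[pair[0]], clusters[pair[1]]
                ),
            )
            # a < b, so removing index b leaves index a valid.
            clusters[a] = clusters[a] + clusters[b]
            del clusters[b]
        return clusters


# Hypothetical toy data with two obvious groups.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)]
print(AgglomerativeClustering(SingleLink(), n_clusters=2).run(points))
# [[0, 1, 2], [3, 4]]
```

Passing the linkage object into `AgglomerativeClustering` keeps the merge loop identical for both variants; only the `reduce` step differs. This sketch recomputes cluster distances on every iteration for clarity, which is slower than updating a cluster-level distance matrix after each merge.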