Cluster analysis based on pairwise similarities

BioNumerics can calculate pairwise similarity values and perform a cluster analysis on up to 20,000 database entries for any type of experiment. Different similarity and distance coefficients are available for each data type, for example:

  • Fingerprints: Pearson product-moment correlation, cosine correlation, Dice (or Nei and Li), Jaccard, Jeffrey's X, Ochiai, and number of different bands. Additional options are fuzzy logic and area sensitivity for band-based coefficients. Banding patterns also have optimization and tolerance settings that can be adjusted from trace to trace, and the most suitable tolerance settings can be determined statistically.
  • Characters: Gower, rank correlation, Canberra metric, simple matching, Bray-Curtis, Chebyshev, Euclidean distance, and others. The categorical coefficient is suitable for multi-state data such as VNTR, MLST, and antibiotic resistance patterns.
  • Sequences: Similarities are calculated from pairwise and multiple sequence alignments, using the Needleman-Wunsch algorithm, the Wilbur-Lipman algorithm, or BioNumerics’ own proprietary algorithm.
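
To illustrate what the band-based coefficients listed above measure, here is a minimal Python sketch. It is not BioNumerics code: it assumes each fingerprint has already been reduced to a set of matched band classes (the lane names and band numbers are hypothetical), and it ignores tolerance, optimization, fuzzy logic, and area sensitivity. The handling of empty patterns is also an assumption made for the example.

```python
from math import sqrt


def dice(a, b):
    """Dice (Nei and Li) coefficient: 2 * shared bands / total bands."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 1.0


def jaccard(a, b):
    """Jaccard coefficient: shared bands / bands present in either pattern."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def ochiai(a, b):
    """Ochiai coefficient: shared bands / sqrt(bands in a * bands in b)."""
    if not a or not b:
        # Assumption: two empty patterns are identical, otherwise no similarity.
        return 1.0 if a == b else 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def different_bands(a, b):
    """Number of bands present in exactly one of the two patterns."""
    return len(a ^ b)


# Hypothetical band classes scored as present in two lanes.
lane_1 = {1, 2, 3, 5}
lane_2 = {2, 3, 4, 5}

print(dice(lane_1, lane_2))             # 0.75
print(jaccard(lane_1, lane_2))          # 0.6
print(ochiai(lane_1, lane_2))           # 0.75
print(different_bands(lane_1, lane_2))  # 2
```

The two lanes share three bands out of four each, so Dice and Ochiai agree here, while Jaccard is lower because it divides by the five distinct bands seen across both lanes.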

Several clustering methods are available for calculating dendrograms from pairwise similarity values: unweighted pair grouping (UPGMA), complete linkage (furthest neighbor), single linkage (nearest neighbor), Ward, Centroid, Median, Neighbor Joining, Bio-Neighbor Joining, NeighborNet clustering, Correlation Eliminator, and Partial Correlation Eliminator.
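
As a rough illustration of how a dendrogram is built from pairwise values, the following Python sketch implements a naive UPGMA (average linkage). It is not the BioNumerics implementation: the function name `upgma`, the entry labels, and the distance values are hypothetical, and the input is assumed to be distances (for example, 1 minus a similarity between 0 and 1).

```python
def upgma(labels, dist):
    """Naive UPGMA: repeatedly merge the two clusters with the smallest
    average pairwise distance between their members.

    labels: list of entry names.
    dist:   dict mapping frozenset({a, b}) to the distance between a and b.
    Returns a list of merges as (cluster_1, cluster_2, merge_distance).
    """
    clusters = [(label,) for label in labels]
    merges = []

    def average_distance(c1, c2):
        # Unweighted average over all pairs of original entries.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of current clusters.
        d, i, j = min(
            (average_distance(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical distances (1 - similarity) between four entries.
pairs = {
    ("A", "B"): 0.1, ("A", "C"): 0.5, ("A", "D"): 0.6,
    ("B", "C"): 0.4, ("B", "D"): 0.7, ("C", "D"): 0.2,
}
distances = {frozenset(pair): value for pair, value in pairs.items()}

merges = upgma(["A", "B", "C", "D"], distances)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

With these values, A and B merge first at distance 0.1, then C and D at 0.2, and the two pairs finally join at 0.55, the average of the four cross distances. Single and complete linkage differ only in replacing the average with the minimum or maximum pairwise distance.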

An interactive wizard guides the input of parameters, options, and choices, which makes cluster analysis more accessible to users with little statistical background.
