Soft Correspondence Ensemble Clustering (SCEC)

The framework uses soft correspondence to directly address the correspondence problem of clustering ensembles: cluster labels are arbitrary, so a cluster in one clustering has no predefined match in another. Under soft correspondence, a cluster from one clustering corresponds to every cluster of another clustering, each with its own weight. Within this framework, we define a correspondence matrix as the optimal solution of a given distance function, which yields a new consensus function. Based on this consensus function, we propose a novel algorithm that iteratively computes the consensus clustering and the correspondence matrices using multiplicative updating rules.

In data mining applications that involve clustering, SCEC can be used to improve the robustness, novelty, and stability of clustering results. SCEC can serve as a privacy-protecting tool when different data sources hold different sets of features for the same group of objects and cannot share that information with each other. SCEC can also be used to find hidden patterns in relational data, which arise in many applications, such as Web search systems, market basket analysis, and bioinformatics.
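The alternating scheme described above can be illustrated with a small sketch. It is not the patented procedure: it assumes a squared Frobenius distance between the consensus membership matrix and each clustering mapped through its correspondence matrix, a multiplicative update of the kind used for nonnegative least squares, correspondence rows renormalized to sum to one, and a consensus seeded from the first clustering so that the toy run is deterministic. The names `one_hot` and `soft_correspondence_consensus` are hypothetical.

```python
import numpy as np

def one_hot(labels, k):
    """Return an n x k hard membership matrix for integer labels in [0, k)."""
    m = np.zeros((len(labels), k))
    m[np.arange(len(labels)), labels] = 1.0
    return m

def soft_correspondence_consensus(label_sets, n_iter=20, eps=1e-12):
    # Membership matrix A_h (n x k_h) for every clustering in the ensemble.
    members = [one_hot(np.asarray(l), int(np.max(l)) + 1) for l in label_sets]
    # Assumption: seed the consensus M with the first clustering.
    consensus = members[0].copy()
    k = consensus.shape[1]
    # One strictly positive correspondence matrix S_h (k_h x k) per clustering.
    corr = [np.full((a.shape[1], k), 1.0 / k) for a in members]
    for _ in range(n_iter):
        for h, a in enumerate(members):
            s = corr[h]
            # Multiplicative step that reduces ||M - A_h S_h||^2 while
            # keeping every entry of S_h nonnegative.
            s = s * (a.T @ consensus) / (a.T @ a @ s + eps)
            # Assumption: each source cluster spreads unit weight over the
            # consensus clusters, so rows are renormalized to sum to one.
            s = s / np.maximum(s.sum(axis=1, keepdims=True), eps)
            corr[h] = s
        # Consensus membership is the average of the mapped clusterings.
        consensus = np.mean([a @ s for a, s in zip(members, corr)], axis=0)
    return consensus, corr

# Toy ensemble: the second clustering swaps the label names, and the third
# disagrees with the other two on the object at index 2.
ensemble = [
    [0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 1, 1, 1, 1],
]
consensus, corr = soft_correspondence_consensus(ensemble)
labels = consensus.argmax(axis=1)
print(labels)
print(np.round(corr[2], 2))
```

In this toy run the consensus follows the grouping on which two of the three clusterings agree, and the second clustering's correspondence matrix acts as a label swap. The third clustering's matrix spreads its second cluster over both consensus clusters, which is the kind of soft weighting the framework describes.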

Advantages:

  • It directly addresses the core problem of combining multiple clusterings, the correspondence problem, which has theoretical as well as practical importance.
  • In addition to a final consensus clustering, the algorithm provides correspondence matrices that give an intuitive interpretation of the relations between the consensus clustering and each clustering in the ensemble, which may be desirable in many application scenarios.
  • The algorithm handles clustering ensembles with missing labels in a straightforward way.

Intellectual Property:

U.S. 8,195,734; 8,499,022

 

Binghamton University RB239
