Author: Boris Mirkin
This talk introduces data summarization as a problem of data approximation and uses this view to review several approaches to systems and data analysis and visualization: singular-value decomposition (SVD), principal component analysis (PCA), latent Dirichlet allocation (LDA), and deep learning networks (DL).
I then focus on the two data summarization problems in the title. Extending SVD/PCA to cluster analysis turns out to lead to k-means, the most popular method in data clustering, and to an equivalent criterion that gives rise to Anomalous clustering, which proved superior in our computational experiments.
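To make the approximation view of k-means concrete, here is a minimal Python sketch; it is not the code used in the talk. It assumes the data matrix Y is approximated by the product Z C, where Z is a binary cluster-membership matrix and C holds the cluster centers, so that the least-squares residual of this approximation is the usual k-means criterion. The function name `kmeans_as_approximation` and its parameters are hypothetical, chosen only for this illustration.

```python
import numpy as np

def kmeans_as_approximation(Y, K, n_iter=50, seed=0):
    """Plain k-means, returned as the factors of the approximation Y ~ Z @ C.

    Z is an N x K binary membership matrix, C is a K x V matrix of centers.
    Hypothetical helper written for illustration only.
    """
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    N = Y.shape[0]
    # Start from K distinct rows of Y chosen at random.
    centers = Y[rng.choice(N, size=K, replace=False)].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center.
        d = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new_centers = centers.copy()
        for k in range(K):
            members = Y[labels == k]
            if len(members):  # keep the old center if a cluster is empty
                new_centers[k] = members.mean(axis=0)
        converged = np.allclose(new_centers, centers)
        centers = new_centers
        if converged:
            break
    Z = np.eye(K)[labels]  # one-hot rows: Z[i, k] = 1 if row i is in cluster k
    return Z, centers

def scatter_decomposition(Y, Z, C):
    """Split the data scatter into the part explained by clusters and the residual."""
    Y = np.asarray(Y, dtype=float)
    total = (Y ** 2).sum()
    explained = (Z.sum(axis=0) * (C ** 2).sum(axis=1)).sum()  # sum_k N_k <c_k, c_k>
    residual = ((Y - Z @ C) ** 2).sum()  # the k-means criterion
    return total, explained, residual
```

Calling `scatter_decomposition(Y, *kmeans_as_approximation(Y, 3))` on centered data returns a total that equals the explained part plus the residual, because each center is the mean of its cluster. Minimizing the residual is therefore the same as maximizing the explained part, which is the sense in which the two criteria are equivalent.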
A data summarization approach to multicriteria decisions leads to a novel method for automated ranking. I discuss its application to evaluating a scientist’s research impact, including a method for directly evaluating the quality of research results by mapping them to a hierarchical taxonomy of the field. This is illustrated by an in-house analysis of the research results of 30 leading data analysis researchers. The talk concludes with a discussion of implementing the methods on contemporary big-data analysis platforms such as Hadoop MapReduce.
Boris Mirkin is a professor at the Faculty of Computer Science, National Research University Higher School of Economics, Russia. He holds a PhD in Computer Science and a DSc in Systems Analysis from Russian universities. From 1991 to 2010 he traveled extensively, holding visiting research appointments in France, the USA, and Germany, and a teaching appointment at Birkbeck, University of London, UK.
He develops methods for clustering and interpretation of complex data from the “data recovery” perspective. Currently these approaches are being extended to automation of text analysis problems, including the development and use of hierarchical ontologies.