Mining Very Large Dimensional Data Sets

 

Speaker  : Dr. Vipin Kumar, University of Minnesota

Time     : Friday, November 17th, 3:00 pm

Location : Stuart Building, Room 111 (SB 111)


Data sets with high dimensionality pose major challenges for
conventional data mining algorithms. For example, traditional
clustering algorithms such as K-means fail to produce good
clusters in high-dimensional data sets, even when they are
used along with well-known dimensionality reduction techniques
such as Principal Component Analysis.

This talk presents a novel method for clustering related data items
in large high-dimensional data sets. Relations among data items
are captured using a graph or a hypergraph, and efficient
multilevel graph-based algorithms are used to find clusters of
highly related items.
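
As a rough illustration of the first step only (capturing relations among
items as a graph), the sketch below links items whose cosine similarity
exceeds a threshold and then reads clusters off as connected components.
This is an assumption-laden stand-in, not the speaker's method: the
similarity measure, the threshold, and the function names are all
hypothetical, and connected components are a deliberately naive substitute
for the multilevel graph-based algorithms described in the talk.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_graph(items, threshold):
    """Build an undirected graph with an edge between items i and j
    whenever their cosine similarity is at least `threshold`
    (hypothetical choice of relation, for illustration only)."""
    graph = defaultdict(set)
    for i in range(len(items)):
        graph[i]  # touch the key so isolated items still appear as nodes
        for j in range(i + 1, len(items)):
            if cosine(items[i], items[j]) >= threshold:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def connected_components(graph):
    """Naive clustering: each connected component becomes one cluster."""
    seen, clusters = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        clusters.append(sorted(component))
    return clusters


def cluster_items(items, threshold):
    """Convenience wrapper: build the graph, then extract components."""
    return connected_components(similarity_graph(items, threshold))


# Toy data: items 0 and 1 point in nearly the same direction, as do 2 and 3.
toy_items = [
    [1.0, 0.0, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.8, 0.2],
]
toy_clusters = cluster_items(toy_items, threshold=0.9)
```

On real high-dimensional data, a dedicated graph partitioner would replace
the connected-components step, since a single spurious edge is enough to
merge two otherwise separate clusters here.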

We present results of experiments on several data sets, including
S&P 500 stock data for the period 1994-1996, protein coding
data, and document data sets from a variety of domains. These
experiments demonstrate that our approach is applicable and
effective in a wide range of domains, and that it outperforms
techniques such as K-means even when they are used in conjunction
with dimensionality reduction methods such as Principal Component
Analysis or Latent Semantic Indexing.

Short bio of the speaker:

Dr. Vipin Kumar is currently the Director of the Army High Performance
Computing Research Center and Professor of Computer Science
at the University of Minnesota. His research interests include
high-performance computing and data mining. His research has
resulted in the development of the isoefficiency metric
for evaluating the scalability of parallel algorithms, as well as highly
efficient parallel algorithms and software for sparse matrix factorization
(PSPASES), graph partitioning (METIS, ParMetis), VLSI circuit
partitioning (hMetis), and dense hierarchical solvers. He has authored
over 100 research articles and co-edited or coauthored 5 books, including
the widely used textbook "Introduction to Parallel Computing"
(Benjamin Cummings/Addison Wesley, 1994).
Kumar has served as chair or co-chair for many conferences and workshops
in the area of parallel computing and high-performance data mining, and
is Program Chair for the First SIAM International Conference on Data
Mining to be held in Chicago in April 2001.

Kumar serves on the editorial boards of IEEE Concurrency, Parallel
Computing, and the Journal of Parallel and Distributed Computing, and
served on the editorial board of IEEE Transactions on Knowledge and
Data Engineering during 1993-97. He is a Fellow of IEEE, a
member of SIAM and ACM, and a Fellow of the Minnesota
Supercomputer Institute.