Mining Very Large Dimensional Data Sets
Speaker : Dr. Vipin Kumar, University of Minnesota
Time : Friday, November 17th, 3:00 pm.
Location : Stuart Building Room # 111 (SB 111)
Data sets with high dimensionality pose major challenges for
conventional data mining algorithms. For example, traditional
clustering algorithms such as K-means fail to produce good
clusters in large dimensional data sets even when they are
used along with well known dimensionality reduction techniques
such as Principal Component Analysis.
This talk presents a novel method for clustering related data items
in large high-dimensional data sets. Relations among data items
are captured using a graph or a hyper-graph, and efficient multi-
level graph-based algorithms are used to find clusters of highly
related items.
We present results of experiments on several data sets, including
S&P 500 stock data for the period 1994-1996, protein coding
data, and document data sets from a variety of domains. These
experiments demonstrate that our approach is applicable and
effective in a wide range of domains, and outperforms techniques
such as K-means even when they are used in conjunction with
dimensionality reduction methods such as Principal Component
Analysis or Latent Semantic Indexing.
Short bio of the speaker:
Dr. Vipin Kumar is currently the Director of the Army High Performance
Computing Research Center and Professor of Computer Science
at the University of Minnesota. His research interests include
high performance computing and data mining. His research has
resulted in the development of the isoefficiency metric
for evaluating the scalability of parallel algorithms, as well as highly
efficient parallel algorithms and software for sparse matrix
factorization (PSPASES), graph partitioning (METIS, ParMetis), VLSI circuit
partitioning (hMetis), and dense hierarchical solvers. He has authored
over 100 research articles, and co-edited or coauthored 5 books,
including the widely used textbook "Introduction to Parallel Computing"
(Publ. Benjamin Cummings/Addison Wesley, 1994).
Kumar has served as chair/co-chair for many conferences/workshops
in the area of parallel computing and high performance data mining, and
is Program Chair for the First SIAM International Conference on Data
Mining to be held in Chicago in April 2001.
Kumar serves on the editorial boards of IEEE Concurrency, Parallel
Computing, the Journal of Parallel and Distributed Computing, and
served on the editorial board of IEEE Transactions on Knowledge and
Data Engineering during 1993-1997. He is a Fellow of IEEE, a
member of SIAM and ACM, and a Fellow of the Minnesota
Supercomputer Institute.