Euiwoong Lee earns NSF CAREER Award to design more efficient data clustering algorithms

Lee seeks to improve performance guarantees in clustering, one of the most fundamental tasks in machine learning.
Euiwoong Lee
Prof. Euiwoong Lee

Euiwoong Lee, assistant professor of computer science at the University of Michigan, has received a National Science Foundation (NSF) CAREER Award to design efficient new data clustering algorithms with improved performance guarantees. The project is titled “Towards Optimal Approximabilities for Clustering.”

In data analysis, clustering involves grouping data together based on some measure of similarity. It’s one of the most common and basic tasks in unsupervised machine learning, and is typically a first step to understanding relationships in big data sets across every field. In his CAREER research, Lee intends to design provably optimal approximation algorithms for clustering. Drawing on optimization, complexity theory, geometry, combinatorics, and analysis, he’ll create new algorithmic techniques that build new connections between these fields and advance the state of the art.

“Clustering is one of the most representative tasks in machine learning and has been actively studied for decades, but there are still wide gaps in our understanding even for the most basic formulations,” Lee says.

Lee’s project will look at two main aspects of the problem, algorithms and hardness. In the first direction, he’ll revisit the study of local search, a technique that, when applied to clustering, iteratively seeks out an optimal set of clusters after an initial guess. Lee will explore recent innovations in the area, including his own combination of local search with linear programming methods. 

“While the standard local search has been well studied, new types of local search algorithms were recently proposed that require a deeper understanding of local landscapes,” Lee says.

His second direction will study his recently proposed complex hypothesis, called the Johnson Coverage Hypothesis. If true, Lee will be able to demonstrate significantly better hardness results for clustering in Euclidean spaces.

“It already reveals exciting connections to other areas of mathematics and theoretical computer science that are worth further investigation.”

Lee’s focus on clustering can have a big impact on the many fields that rely on it for day-to-day data exploration and analysis.

“With the growing presence of machine learning in every aspect of our lives, clustering algorithms are run numerous times a day,” Lee says. “A complete theoretical understanding of fundamental clustering problems and their algorithms will lead to more efficient algorithms that produce better outcomes at less cost.”