Master of Science in Computer Science Thesis Defense

December 13, 2023 2:00pm — 3:00pm

Location:
In Person - Newell-Simon 3002

Speaker:
SHUCHEN LI, Master's Student, Computer Science Department, Carnegie Mellon University

We give the first polynomial time algorithm that achieves arbitrarily high accuracy for robust mean estimation, when the adversary corrupts the sample before the Laplace noise is added. For sufficiently small constant α > 0 and accuracy ε > 0, our algorithm takes as input a sample Y ⊆ ℝ^d of size n ≥ poly(d, ε^-i) obtained by an i.i.d. sample X ⊆ ℝ^d of size n from the Laplace distribution with unknown mean μ and covariance I, while adversarially corrupting an alpha fraction of the points before the Laplace noise is added. Our algorithm runs in Õ(nd) time, and outputs an estimation μ^ˆ that ∥ μ^ˆ - μ ∥₂ ≤ ε with high probability.

Our algorithm transforms the sample to a Fourier-sparse signal that encodes the true mean μ, and applies the sparse Fourier transform to decode μ. In Huber’s contamination model, where the adversary corrupts an α fraction of the sample after the noise is added, it is known that there is a Ω(α) lower bound on the error of the estimator. In contrast, our algorithm can estimate the mean arbitrarily closely in this corruption-before-noise setting.

Our algorithm in this new setting has many possible applications. For example, the one that motivated us is max-affine regression. In max-affine regression, the model is the maximum of k linear models, where the noise is added after taking the maximum. If we can extend our algorithm to the list-decodable setting, then we immediately get an algorithm for the max-affine regression.

Thesis Committee:

Pravesh Kothari (Chair)
David Woodruff

Additional Information

Add event to Google
Add event to iCal

About Main page

Admissions Main page

Academics Main page

People Main page

Research Main page

Master of Science in Computer Science Thesis Defense

December 13, 2023 2:00pm — 3:00pm