Plenary Thu-1

## IMS Medallion Lecture (Daniela Witten)

Conference: 9:00 AM — 10:00 AM KST
Local: Jul 21 Wed, 8:00 PM — 9:00 PM EDT

### Selective inference for trees

Daniela Witten (University of Washington)

As datasets grow in size, the focus of data collection has increasingly shifted away from testing pre-specified hypotheses and towards hypothesis generation. Researchers often perform an exploratory data analysis to generate hypotheses, and then test those hypotheses on the same data. Unfortunately, this type of 'double dipping' can lead to a highly inflated Type I error rate. In this talk, I will consider double dipping on trees. First, I will focus on trees generated via hierarchical clustering, and will consider testing the null hypothesis of equality of cluster means. I will propose a test for a difference in means between estimated clusters that accounts for the cluster estimation process, using a selective inference framework. Second, I will consider trees generated using the CART procedure, and will again use selective inference to conduct inference on the means of the terminal nodes. Applications include single-cell RNA-sequencing data and the Box Lunch Study. This is collaborative work with Lucy Gao (U. Waterloo), Anna Neufeld (U. Washington), and Jacob Bien (USC).
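The double-dipping problem the abstract describes is easy to see in miniature. The sketch below (a naive illustration only, not the selective test proposed in the talk) clusters data drawn from a single Gaussian, then applies an ordinary two-sample t-test to the estimated clusters; because the clusters were chosen from the same data to look different, the naive test rejects the true null far more often than its nominal 5% level:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import ttest_ind

rng = np.random.default_rng(0)

def double_dip_pvalue(n=50):
    # Null data: one homogeneous Gaussian sample, so there are no true clusters.
    x = rng.normal(size=(n, 1))
    # Estimate two clusters via Ward hierarchical clustering ...
    labels = fcluster(linkage(x, method="ward"), t=2, criterion="maxclust")
    # ... then naively test for a difference in means between them.
    return ttest_ind(x[labels == 1, 0], x[labels == 2, 0]).pvalue

pvals = np.array([double_dip_pvalue() for _ in range(200)])
# A valid level-0.05 test would reject about 5% of the time under the null;
# the naive test rejects almost always.
print((pvals < 0.05).mean())
```

A selective-inference test corrects for this by conditioning on the clustering having produced the groups being compared.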

###### Session Chair

Ja-Yong Koo (Korea University)

Plenary Thu-2

## IMS Medallion Lecture (Andrea Montanari)

Conference: 10:00 AM — 11:00 AM KST
Local: Jul 21 Wed, 9:00 PM — 10:00 PM EDT

### High-dimensional interpolators: From linear regression to neural tangent models

Andrea Montanari (Stanford University)

Modern machine learning methods, most notably multi-layer neural networks, require fitting highly non-linear models comprising tens of thousands to millions of parameters. However, little attention is paid to the regularization mechanisms that control model complexity, and the resulting models are often so complex as to achieve vanishing training error. Despite this, these models generalize well to unseen data: they have small test error. I will discuss several examples of this phenomenon, leading up to two-layer neural networks in the so-called lazy regime. For these examples, precise asymptotics can be determined mathematically using tools from random matrix theory, and a unifying picture is emerging. A common feature is that a complex unregularized nonlinear model becomes essentially equivalent to a simpler model that is, however, regularized in a non-trivial way.

[Based on joint papers with: Michael Celentano, Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, Feng Ruan, Youngtak Sohn, Jun Yan, Yiqiao Zhong]
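The interpolation phenomenon above can be observed in its simplest form with minimum-norm linear regression, a toy stand-in for the neural tangent models of the talk (the choice of dimensions and noise level below is purely illustrative):

```python
import numpy as np

# When p > n, the minimum-l2-norm least-squares fit interpolates the
# training data (zero training error) yet can still generalize:
rng = np.random.default_rng(0)
n, p = 50, 200                          # more parameters than observations
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:5] = 1.0                          # sparse true signal
y = X @ beta + 0.5 * rng.normal(size=n)

# Among all b with X b = y, the pseudoinverse picks the one of smallest
# l2 norm -- an implicit, non-trivial form of regularization.
b_hat = np.linalg.pinv(X) @ y

train_err = np.max(np.abs(X @ b_hat - y))   # ~ 0: exact interpolation
test_mse = np.mean((rng.normal(size=(2000, p)) @ (b_hat - beta)) ** 2)
print(train_err, test_mse)
```

Despite fitting the noisy training responses exactly, the minimum-norm interpolator attains a finite test error, mirroring the equivalence to a regularized simpler model described in the abstract.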

###### Session Chair

Myunghee Cho Paik (Seoul National University)

Plenary Thu-3

## Blackwell Lecture (Gabor Lugosi)

Conference: 7:00 PM — 8:00 PM KST
Local: Jul 22 Thu, 6:00 AM — 7:00 AM EDT

### Estimating the mean of a random vector

Gabor Lugosi (ICREA & Pompeu Fabra University)

One of the most basic problems in statistics is the estimation of the mean of a random vector based on independent observations. This problem has received renewed attention in the last few years, from both statistical and computational points of view. In this talk we review some recent results on the statistical performance of mean estimators that allow for heavy tails and adversarial contamination in the data. In particular, we are interested in estimators that have near-optimal error in every direction in which the variance of the one-dimensional marginal of the random vector is not too small. The material of this talk is based on a series of joint papers with Shahar Mendelson.
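A classical building block in this line of work on heavy-tailed mean estimation is the median-of-means estimator. A minimal one-dimensional sketch (the multivariate, direction-dependent estimators discussed in the talk are substantially more involved):

```python
import numpy as np

def median_of_means(x, k=10, seed=0):
    """Median-of-means estimate of the mean of a 1-D sample:
    shuffle, split into k blocks, average within each block, and
    take the median of the block means. A few wild outliers can
    corrupt at most a few blocks, so the median stays stable."""
    x = np.random.default_rng(seed).permutation(np.asarray(x))
    return np.median([b.mean() for b in np.array_split(x, k)])

# Heavy-tailed data: Student t with 2.1 degrees of freedom (true mean 0).
rng = np.random.default_rng(1)
sample = rng.standard_t(df=2.1, size=10_000)
print(median_of_means(sample))   # close to 0 despite the heavy tails
```

The block size trades off bias against robustness: more blocks tolerate more contamination but average over fewer points each.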

###### Session Chair

Byeong Uk Park (Seoul National University)

Plenary Thu-4

## Tukey Lecture (Sara van de Geer)

Conference: 8:00 PM — 9:00 PM KST
Local: Jul 22 Thu, 7:00 AM — 8:00 AM EDT

### Max-margin classification and other interpolation methods

Sara van de Geer (Swiss Federal Institute of Technology Zürich)

John Tukey writes that detective work is an essential part of statistical analysis (Tukey [1969]). In this talk we discuss methods that do the opposite of detective work: data interpolation. This was long considered forbidden, but then again, statistical paradigms are not to be sanctified. We consider basis pursuit and one-bit compressed sensing. We re-establish the $\ell_{2}$-rates of convergence for noisy basis pursuit of Wojtaszczyk [2010]. For one-bit compressed sensing, we study the algorithm of Plan and Vershynin [2013] and re-derive $\ell_{2}$-rates as well. The techniques used also allow us to derive novel results for the max-margin classifier (related to the AdaBoost algorithm) as given in Liang and Sur [2020].

This is joint work with Geoffrey Chinot, Felix Kuchelmeister and Matthias Löffler.
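Basis pursuit, the first interpolation method mentioned above, can be cast as a linear program. A minimal noiseless sketch using SciPy's LP solver (the talk's rates concern the noisy setting; the problem sizes here are purely illustrative):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||b||_1 subject to A b = y via the LP reformulation
    b = u - v with u >= 0, v >= 0, minimizing sum(u) + sum(v)."""
    p = A.shape[1]
    res = linprog(c=np.ones(2 * p),
                  A_eq=np.hstack([A, -A]), b_eq=y,
                  bounds=[(0, None)] * (2 * p))
    return res.x[:p] - res.x[p:]

# Recover a 3-sparse vector from 40 Gaussian measurements in dimension 100.
rng = np.random.default_rng(0)
n, p = 40, 100
A = rng.normal(size=(n, p)) / np.sqrt(n)
b_true = np.zeros(p)
b_true[[3, 17, 42]] = [2.0, -1.5, 1.0]
b_hat = basis_pursuit(A, A @ b_true)
print(np.max(np.abs(b_hat - b_true)))   # exact recovery, up to solver tolerance
```

With Gaussian measurements and enough observations relative to the sparsity level, the $\ell_1$ minimizer coincides with the true sparse vector with high probability.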

References
T. Liang and P. Sur. A precise high-dimensional asymptotic theory for boosting and minimum-$\ell_1$-norm interpolated classifiers, 2020. arXiv:2002.01586.
Y. Plan and R. Vershynin. One-bit compressed sensing by linear programming. Communications on Pure and Applied Mathematics, 66(8):1275–1297, 2013.
J. Tukey. Analyzing data: Sanctification or detective work? American Psychologist, 24:83–91, 1969.
P. Wojtaszczyk. Stability and instance optimality for Gaussian measurements in compressed sensing. Foundations of Computational Mathematics, 10(1):1–13, 2010.