Invited 16

## Bootstrap for High-dimensional Data (Organizer: Kengo Kato)

Conference time: 9:30 PM – 10:00 PM KST
Local time: Jul 20 Tue, 8:30 AM – 9:00 AM EDT

### Inference for nonlinear inverse problems

Vladimir Spokoinyi (Weierstrass Institute for Applied Analysis and Stochastics and Humboldt University of Berlin)

Bayesian methods are actively used for parameter identification and uncertainty quantification when solving nonlinear inverse problems with random noise. However, there are only a few theoretical results justifying the Bayesian approach. Recent papers (see, e.g., Nickl (2017), Lu (2017), and references therein) illustrate the main difficulties and challenges in studying the properties of the posterior distribution in the nonparametric setup. This paper offers a new approach for studying the frequentist properties of nonparametric Bayes procedures. The idea is to relax the nonlinear structural equation by introducing an auxiliary functional parameter, replacing the structural equation with a penalty, and imposing a prior on the auxiliary parameter. For such an extended model, we state sharp bounds on posterior concentration, on the accuracy of the penalized MLE, and on the Gaussian approximation of the posterior, together with a number of further results. All the bounds are given in terms of the effective dimension, and we show that the proposed calming device does not significantly affect this value.

### Change point analysis for high-dimensional data

Xiaohui Chen (University of Illinois at Urbana-Champaign)

Cumulative sum (CUSUM) statistics are widely used in change point inference and identification. For the problem of testing for the existence of a change point in an independent sample generated from the mean-shift model, we introduce a Gaussian multiplier bootstrap to calibrate critical values of the CUSUM test statistics in high dimensions. The proposed bootstrap CUSUM test is fully data-dependent and has strong theoretical guarantees under arbitrary dependence structures and mild moment conditions. Specifically, we show that with a boundary removal parameter the bootstrap CUSUM test enjoys uniform validity in size under the null and achieves the minimax separation rate under sparse alternatives when the dimension p can be larger than the sample size n. Once a change point is detected, we estimate its location by maximizing the $\ell^{\infty}$-norm of the generalized CUSUM statistics at two different weighting scales, corresponding to covariance stationary and non-stationary CUSUM statistics. For both estimators, we derive rates of convergence and show that the dimension affects the rates only through logarithmic factors, which implies that the CUSUM estimators can be consistent even when p is much larger than n. In the presence of multiple change points, we propose a principled bootstrap-assisted binary segmentation (BABS) algorithm that dynamically adjusts the change point detection rule and recursively estimates the change point locations. We derive its rate of convergence under suitable signal separation and strength conditions. Time permitting, we may also discuss robust extensions of the change point detection problem for high-dimensional location parameters.
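
A minimal sketch of the single-change-point test, assuming i.i.d. rows and the standard weight $\sqrt{s(n-s)/n}$ (only one of the two weighting scales mentioned in the abstract): the statistic is the largest $\ell^{\infty}$-norm of the CUSUM vector over splits $s_0 \le s \le n - s_0$, where $s_0$ plays the role of the boundary removal parameter, and its critical value comes from a Gaussian multiplier bootstrap. Centering the data at the pooled mean before applying the multipliers is an assumption of this sketch, and all function names are hypothetical; the construction in the talk may differ.

```python
import numpy as np

def cusum_sup_norms(x, s0):
    """Sup-norm over coordinates of the weighted CUSUM vector at each split s.

    x is an (n, p) array; splits run over s = s0, ..., n - s0 (s0 >= 1),
    so s0 acts as a boundary removal parameter.
    """
    n = x.shape[0]
    csum = np.cumsum(x, axis=0)                        # row s-1 = sum of first s rows
    total = csum[-1]
    s = np.arange(s0, n - s0 + 1)
    left = csum[s - 1] / s[:, None]                    # mean of X_1, ..., X_s
    right = (total - csum[s - 1]) / (n - s)[:, None]   # mean of X_{s+1}, ..., X_n
    z = np.sqrt(s * (n - s) / n)[:, None] * (left - right)
    return s, np.abs(z).max(axis=1)

def bootstrap_cusum_test(x, s0, alpha=0.05, n_boot=500, seed=0):
    """Max-type CUSUM test with a Gaussian multiplier bootstrap critical value."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    t_obs = cusum_sup_norms(x, s0)[1].max()
    centered = x - x.mean(axis=0)          # assumption: center at the pooled mean
    boot = np.empty(n_boot)
    for b in range(n_boot):
        e = rng.standard_normal(n)[:, None]  # one N(0, 1) multiplier per observation
        boot[b] = cusum_sup_norms(e * centered, s0)[1].max()
    crit = np.quantile(boot, 1 - alpha)
    return t_obs, crit, t_obs > crit

def estimate_change_point(x, s0):
    """Split that maximizes the l-infinity norm of the weighted CUSUM vector."""
    s, norms = cusum_sup_norms(x, s0)
    return int(s[np.argmax(norms)])

# Hypothetical data: mean shift of 3 in the first 10 of 100 coordinates after t = 100.
rng = np.random.default_rng(1)
x = rng.standard_normal((200, 100))
x[100:, :10] += 3.0
t_obs, crit, reject = bootstrap_cusum_test(x, s0=10, n_boot=200)
s_hat = estimate_change_point(x, s0=10)
```

Vectorizing over all splits keeps each bootstrap replicate to a few array operations; a binary segmentation scheme could call the same two functions recursively on the segments on either side of `s_hat`.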

### Bootstrap test for multi-scale lead-lag relationships in high-frequency data

Yuta Koike (University of Tokyo)

Motivated by recent empirical findings in high-frequency financial econometrics, we consider a pair of Brownian motions having possibly different lead-lag relationships at multiple time scales. Given discrete observations of these processes, we aim to test at which time scales they have non-zero cross correlations. For this purpose, we introduce maximum-type test statistics based on scale-by-scale cross-covariance estimators and develop a Gaussian approximation theory for these statistics. Since their null distributions are analytically intractable, we propose a wild bootstrap procedure to approximate them. The validity of these approximations is established through recent Gaussian approximation results for high-dimensional vectors of degenerate quadratic forms.
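
As a rough illustration of a maximum-type statistic calibrated by a wild bootstrap, the sketch below uses plain lagged cross-covariances of synchronously sampled returns at a single time scale, not the multi-scale (scale-by-scale) estimators of the talk, and Rademacher multipliers. The synchronous sampling, the lag grid, and the function names are all illustrative assumptions.

```python
import numpy as np

def lagged_cross_cov(dx, dy, lags, weights=None):
    """For each lag L, return sum_i w_i * dx[i] * dy[i + L] (w_i = 1 by default).

    Assumes dx and dy are synchronous return series of equal length.
    """
    n = len(dx)
    if weights is None:
        weights = np.ones(n)
    out = np.empty(len(lags))
    for k, lag in enumerate(lags):
        if lag >= 0:
            out[k] = np.sum(weights[: n - lag] * dx[: n - lag] * dy[lag:])
        else:
            out[k] = np.sum(weights[-lag:] * dx[-lag:] * dy[: n + lag])
    return out

def wild_bootstrap_max_test(dx, dy, max_lag, alpha=0.05, n_boot=500, seed=0):
    """Max-type test of zero cross-covariance at every lag in [-max_lag, max_lag]."""
    rng = np.random.default_rng(seed)
    lags = np.arange(-max_lag, max_lag + 1)
    t_obs = np.abs(lagged_cross_cov(dx, dy, lags)).max()
    boot = np.empty(n_boot)
    for b in range(n_boot):
        # Rademacher multipliers attached to the dx index (an illustrative choice)
        w = rng.choice([-1.0, 1.0], size=len(dx))
        boot[b] = np.abs(lagged_cross_cov(dx, dy, lags, w)).max()
    crit = np.quantile(boot, 1 - alpha)
    return t_obs, crit, t_obs > crit

# Hypothetical data: dy reacts to dx with a delay of 3 sampling steps.
rng = np.random.default_rng(42)
n = 2000
dx = rng.standard_normal(n)
dy = 0.5 * np.roll(dx, 3) + rng.standard_normal(n)
t_obs, crit, reject = wild_bootstrap_max_test(dx, dy, max_lag=10, n_boot=200)
lags = np.arange(-10, 11)
lead = lags[np.argmax(np.abs(lagged_cross_cov(dx, dy, lags)))]
```

Under the null of no cross-correlation, each product `dx[i] * dy[i + L]` has mean zero, so randomly flipping signs reproduces the variability of the lagged sums without needing their joint null distribution in closed form.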

### Q&A for Invited Session 16

This talk does not have an abstract.

###### Session Chair

Kengo Kato (Cornell University)

Invited 27

## Random Matrices and Related Fields (Organizer: Manjunath Krishnapur)

Conference time: 9:30 PM – 10:00 PM KST
Local time: Jul 20 Tue, 8:30 AM – 9:00 AM EDT

### The scaling limit of the characteristic polynomial of a random matrix at the spectral edge

Elliot Paquette (McGill University)

The Gaussian beta-ensemble (GbetaE) is a 1-parameter generalization of the Gaussian orthogonal/unitary/symplectic ensembles which retains some integrable structure. Using this ensemble, Ramirez, Rider and Virag, building on a heuristic of Edelman and Sutton, constructed a limiting point process, the Airy-beta point process, which is the weak limit of the point process of eigenvalues of a random matrix in a neighborhood of the spectral edge. Jointly with Gaultier Lambert, we construct a new limiting object, the stochastic Airy function (SAi), and show that it is the limit of the characteristic polynomial of GbetaE in a neighborhood of the spectral edge. It is the bounded solution of the stochastic Airy equation, which is the usual Airy equation perturbed by a multiplicative white noise. We also give some basic properties of SAi.

### Strong asymptotics of planar orthogonal polynomials: Gaussian weight perturbed by finite number of point charges

Seung Yeop Lee (University of South Florida)

### Secular coefficients and the holomorphic multiplicative chaos

Joseph Najnudel (University of Bristol)

We study the coefficients of the characteristic polynomial (also called secular coefficients) of random unitary matrices drawn from the Circular Beta Ensemble (i.e., the joint probability density of the eigenvalues is proportional to the product of the mutual distances between the points, raised to the power beta). We consider the behavior of the secular coefficients when the degree of the coefficient and the dimension of the matrix tend to infinity. The order of magnitude of these coefficients depends on the value of the parameter beta. In particular, for beta = 2, we show that the middle coefficient of the characteristic polynomial of the Circular Unitary Ensemble converges to zero in probability as the dimension goes to infinity, which solves an open problem of Diaconis and Gamburd. We also find a limiting distribution for some renormalized coefficients in the case beta > 4. To prove our results, we introduce a holomorphic version of the Gaussian Multiplicative Chaos, and we also make a connection with random permutations following the Ewens measure.
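
For beta = 2 (the Circular Unitary Ensemble), the secular coefficients are easy to explore numerically. The sketch below is illustrative, not the authors' code: it samples a Haar unitary matrix through a phase-corrected QR decomposition of a complex Gaussian matrix and reads off the coefficients from $\det(I + zU) = \sum_k e_k z^k$, where $e_k$ is the $k$-th elementary symmetric polynomial of the eigenvalues (other sign conventions, such as $\det(I - zU)$, only change signs). General beta would need a different sampler (for instance a CMV matrix model), which is not attempted here; the helper names are hypothetical.

```python
import numpy as np

def haar_unitary(n, rng):
    """Sample an n x n Haar-distributed unitary matrix (CUE, beta = 2)."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases  # multiply column j by phases[j] so the law is exactly Haar

def secular_coefficients(u):
    """Coefficients e_0, ..., e_n with det(I + zU) = sum_k e_k z^k."""
    eig = np.linalg.eigvals(u)
    coeffs = np.poly(eig)  # prod_i (x - lambda_i) = sum_k (-1)^k e_k x^(n - k)
    return coeffs * (-1.0) ** np.arange(len(coeffs))

rng = np.random.default_rng(0)
n = 8
u = haar_unitary(n, rng)
c = secular_coefficients(u)
middle = c[n // 2]  # the "middle" secular coefficient
```

Because the eigenvalues lie on the unit circle, the coefficients satisfy $e_{n-k} = \det(U)\,\overline{e_k}$, which gives a convenient sanity check on the computation.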

### Q&A for Invited Session 27

This talk does not have an abstract.

###### Session Chair

Ji Oon Lee (Korea Advanced Institute of Science and Technology (KAIST))

Invited 28

## Statistical Inference for Graphs and Networks (Organizer: Betsy Ogburn)

Conference time: 9:30 PM – 10:00 PM KST
Local time: Jul 20 Tue, 8:30 AM – 9:00 AM EDT

### A goodness-of-fit test for exponential random graphs

Gesine Reinert (University of Oxford)

Assessing the goodness of fit of a model often relies on independent replicas. When the data are given in the form of a network, there is usually only one network available. If the data are hypothesised to come from an exponential random graph model, the likelihood cannot be calculated explicitly. Using Stein's method, we introduce a kernelized goodness-of-fit test and illustrate its performance.

This talk is based on joint work with Nathan Ross and with Wenkai Xu.

### Networks in the presence of informative community structure

Alexander Volfovsky (Duke University)

The study of network data in the social and health sciences frequently concentrates on associating covariate information with edge formation and assessing the relationship between network information and individual outcomes. In much of this data, latent or observed community structure likely plays an important role. In this talk we describe how to incorporate this community information into a class of latent space models by allowing the effects of covariates on edge formation to differ between communities (e.g., age might play a different role in friendship formation in different communities across a city). This information is lost when explicit community membership is ignored, and we show that ignoring such structure can lead to over- or underestimation of the importance of covariates for edge formation. We further demonstrate that when designing experiments on networks, if outcomes of interest are community driven (e.g., differential response to a treatment based on community behavior), incorporating this structure directly into the randomization procedure improves the ability to estimate causal effects.

### Motif estimation via subgraph sampling: the fourth-moment phenomenon

Bhaswar Bhattacharya (University of Pennsylvania)

Network sampling has emerged as an indispensable tool for understanding features of large-scale complex networks where it is practically impossible to search or query all the nodes. Examples include social networks, biological networks, internet and communication networks, and socio-economic networks, among others. In this talk we will discuss a unified framework for statistical inference for counting motifs, such as edges, triangles, and wedges, in the widely used subgraph sampling model. In particular, we will provide precise conditions for the consistency and the asymptotic normality of the natural Horvitz-Thompson (HT) estimator, which can be used for constructing confidence intervals and testing hypotheses about the motif counts. As a consequence, an interesting fourth-moment phenomenon for the asymptotic normality of the HT estimator, and connections to fundamental results in random graph theory, will emerge.
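
To make the Horvitz-Thompson idea concrete, here is a minimal sketch for triangle counts, assuming the subgraph sampling model in which each vertex is kept independently with probability p and the induced subgraph is observed. A triangle is then observed exactly when its three vertices are all sampled, which happens with probability $p^3$, so dividing the observed count by $p^3$ gives an unbiased estimate. The helper `exact_ht_mean` checks this by brute-force enumeration on a tiny hypothetical graph; all names are illustrative.

```python
import itertools
import random

def count_triangles(adj, nodes):
    """Triangles in the subgraph induced by `nodes` (adj maps vertex -> neighbor set)."""
    keep = set(nodes)
    count = 0
    for u in keep:
        for v in adj[u] & keep:
            if v > u:
                # count each triangle once, as u < v < w
                count += sum(1 for w in adj[u] & adj[v] & keep if w > v)
    return count

def ht_triangle_estimate(adj, p, rng):
    """Horvitz-Thompson estimate of the triangle count under Bernoulli(p) vertex sampling."""
    sampled = [v for v in adj if rng.random() < p]
    return count_triangles(adj, sampled) / p ** 3  # inclusion probability of a triangle

def exact_ht_mean(adj, p):
    """Exact expectation of the HT estimate, enumerating every possible vertex sample."""
    nodes = sorted(adj)
    mean = 0.0
    for pattern in itertools.product([0, 1], repeat=len(nodes)):
        sampled = [v for v, kept in zip(nodes, pattern) if kept]
        prob = p ** len(sampled) * (1 - p) ** (len(nodes) - len(sampled))
        mean += prob * count_triangles(adj, sampled) / p ** 3
    return mean

# Hypothetical toy graph with exactly two triangles: {0, 1, 2} and {1, 2, 3}.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
adj = {v: set() for v in range(5)}
for a, b in edges:
    adj[a].add(b)
    adj[b].add(a)

rng = random.Random(0)
mc_mean = sum(ht_triangle_estimate(adj, 0.5, rng) for _ in range(20000)) / 20000
```

Edges and wedges work the same way with inclusion probabilities $p^2$ and $p^3$; the abstract's asymptotic-normality conditions concern the fluctuations of such estimators, which this sketch does not address.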

### Q&A for Invited Session 28

This talk does not have an abstract.

###### Session Chair

Betsy Ogburn (Johns Hopkins University)

Invited 31

## Information Theory and Concentration Inequalities (Organizer: Chandra Nair)

Conference time: 9:30 PM – 10:00 PM KST
Local time: Jul 20 Tue, 8:30 AM – 9:00 AM EDT

### Algorithmic optimal transport in Euclidean spaces

Salman Beigi (Institute for Research in Fundamental Sciences (IPM))

Transportation cost inequalities in product spaces put an upper bound on the distance that a random point in the space needs to traverse in order to reach a point in a given target subset of the space. The main question in this talk is whether, given the random starting point, the target point can be found algorithmically. This is a hard problem in general, and its answer depends on the underlying product space and its metric. In this talk, after motivating this problem via applications in learning theory, we answer this question for Euclidean spaces. A main tool in the design and analysis of our algorithm is the tensorization property of transportation cost inequalities.

This talk is based on joint work with Omid Etesami and Amin Gohari.

### Entropy bounds for discrete log-concave distributions

Sergey Bobkov (University of Minnesota)

We will discuss two-sided bounds for concentration functions and Rényi entropies in the class of discrete log-concave probability distributions. These bounds are used to derive certain variants of entropy power inequalities.

The talk is based on joint work with Arnaud Marsiglietti and James Melbourne.
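
For reference, the Rényi entropy of order $\alpha$ of a probability mass function $(p_k)$ is $H_\alpha = \frac{1}{1-\alpha}\log\sum_k p_k^\alpha$, with the Shannon entropy as the limit $\alpha \to 1$ and $H_\infty = -\log\max_k p_k$; a pmf on the integers is log-concave when its support is an integer interval and $p_k^2 \ge p_{k-1}p_{k+1}$. The sketch below (hypothetical helper names) evaluates these standard definitions for a geometric distribution, which is log-concave with equality in that inequality; it does not reproduce the bounds from the talk.

```python
import math

def renyi_entropy(pmf, alpha):
    """Renyi entropy (in nats) of order alpha of a finitely supported pmf."""
    probs = [p for p in pmf if p > 0]
    if alpha == 1:  # Shannon entropy, the limit alpha -> 1
        return -sum(p * math.log(p) for p in probs)
    if math.isinf(alpha):  # min-entropy
        return -math.log(max(probs))
    return math.log(sum(p ** alpha for p in probs)) / (1 - alpha)

def is_log_concave(pmf, tol=1e-12):
    """Check p_k^2 >= p_{k-1} p_{k+1} and that the support has no internal gaps."""
    support = [k for k, p in enumerate(pmf) if p > 0]
    if support and support[-1] - support[0] + 1 != len(support):
        return False
    return all(pmf[k] ** 2 + tol >= pmf[k - 1] * pmf[k + 1] for k in range(1, len(pmf) - 1))

# Geometric pmf p_k = (1 - q) q^k, truncated at k = 199 (the tail is negligible for q = 0.5).
q = 0.5
geom = [(1 - q) * q ** k for k in range(200)]
```

For $q = 1/2$ the closed forms are $H_\infty = \log 2$, $H_2 = \log 3$ and $H_1 = 2\log 2$, illustrating that $H_\alpha$ is non-increasing in $\alpha$.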

### Entropy and convex geometry

Tomasz Tkocz (Carnegie Mellon University)

I shall survey several problems emerging from the interplay between convex geometry and information theory, pertaining mainly to reverse entropy power inequalities.

(Based mainly on joint works with Ball, Madiman, Melbourne, Nayar.)

### Q&A for Invited Session 31

This talk does not have an abstract.

###### Session Chair

Chandra Nair (Chinese University of Hong Kong)