Organized 03

## Gaussian Processes (Organizer: Naomi Feldheim)

Conference time: 9:30 PM to 10:00 PM KST (local time: Wed, Jul 21, 8:30 AM to 9:00 AM EDT)

### Gaussian determinantal processes: a new model for directionality in data

Subhro Ghosh (National University of Singapore)

Determinantal point processes (DPPs) have recently become popular tools for modeling the phenomenon of negative dependence, or repulsion, in data. However, our understanding of an analogue of classical parametric statistical theory is rather limited for this class of models. In this work, we investigate a parametric family of Gaussian DPPs with a clearly interpretable effect of parametric modulation on the observed points. We show that parameter modulation impacts the observed points by introducing directionality in their repulsion structure, and that the principal directions correspond to the directions of maximal (i.e., the most long-ranged) dependency. This model readily yields a viable alternative to principal component analysis (PCA) as a dimension reduction tool that favors directions along which the data are most spread out. This methodological contribution is complemented by a statistical analysis of a spiked model similar to that employed for covariance matrices as a framework to study PCA. These theoretical investigations unveil intriguing questions for further examination in random matrix theory, stochastic geometry, and related topics.

Based on joint work with Philippe Rigollet.
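To make "directionality in the repulsion structure" concrete, here is a minimal sketch assuming an anisotropic Gaussian kernel $K(x,y)=\rho\exp\!\big(-\tfrac12 (x-y)^\top \Sigma^{-1}(x-y)\big)$ with a hypothetical covariance-like parameter `Sigma`; this is not necessarily the parametrization used in the talk. For any DPP, the pair correlation is $g(x,y)=1-K(x,y)^2/(K(x,x)K(y,y))$, so smaller $g$ at a given separation means stronger repulsion at that separation.

```python
import numpy as np

def pair_correlation(x, y, Sigma):
    """Pair correlation g(x, y) = 1 - K(x,y)^2 / (K(x,x) K(y,y)) for the
    (assumed) kernel K(x,y) = rho * exp(-0.5 (x-y)^T Sigma^{-1} (x-y)).
    The amplitude rho cancels, so it does not appear here."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    q = d @ np.linalg.solve(Sigma, d)   # quadratic form (x-y)^T Sigma^{-1} (x-y)
    return 1.0 - np.exp(-q)             # (exp(-q/2))^2 = exp(-q)

# Hypothetical parameter: long-ranged dependence along e1, short-ranged along e2.
Sigma = np.diag([4.0, 0.25])
r = 0.5                                          # separation between two points
g_e1 = pair_correlation([0, 0], [r, 0], Sigma)   # pair separated along e1
g_e2 = pair_correlation([0, 0], [0, r], Sigma)   # pair separated along e2

# Principal direction = eigenvector of Sigma with the largest eigenvalue.
evals, evecs = np.linalg.eigh(Sigma)
principal = evecs[:, np.argmax(evals)]
print(f"g along e1: {g_e1:.3f}, g along e2: {g_e2:.3f}, principal: {principal}")
```

At the same separation, pairs are much rarer along the long-ranged direction $e_1$ than along $e_2$: repulsion reaches farthest along the principal direction of `Sigma`.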

### Persistence exponents of Gaussian stationary functions

Ohad Noy Feldheim (Hebrew University of Jerusalem)

Let $f:\mathbb{R} \to \mathbb{R}$ be a Gaussian stationary process, that is, a random function whose distribution is invariant under real shifts and whose finite-dimensional marginals are multivariate normal. Persistence is the event that the process remains positive over the interval $[0,T]$. The asymptotics of the persistence probability as $T$ tends to infinity have been studied since the early 1950s, with motivation stemming from probability theory, physics and electrical engineering. In recent years, it has been discovered that persistence is best characterized in spectral terms. This view was used to describe the decay rate of the persistence probability (up to a constant in the exponent). In this work we take this study one step further, giving mild conditions for the existence of persistence exponents, that is, a constant $C$ such that the probability of persistence on $[0,T]$ is $e^{-CT(1+o(1))}$. We obtain this by establishing an array of continuity properties of the persistence probability and relating the problem to small ball exponents. In particular, we show that the persistence exponent is independent of the singular component of the spectral measure away from the origin.

Joint work with N. Feldheim and S. Mukherjee.
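As an illustration of what a persistence exponent measures, the sketch below estimates $P(f>0 \text{ on } [0,T])$ by Monte Carlo and reports $-\log P/T$. It assumes a hypothetical example process, the stationary Ornstein-Uhlenbeck process with covariance $r(t)=e^{-|t|}$. This process is chosen only because it can be sampled exactly on a grid with an AR(1) recursion. Positivity is checked only at grid points, so the estimate overstates the continuous-time persistence probability. For this process the continuous-time value is $\frac{1}{\pi}\arcsin(e^{-T})$ (via the Lamperti transform to Brownian motion), so $C=1$.

```python
import numpy as np

def persistence_curve(T, dt=0.01, n_paths=20_000, seed=0):
    """Monte Carlo estimate of P(X(t) > 0 at every grid t in [0, T']) for all
    grid horizons T' <= T, where X is the stationary Ornstein-Uhlenbeck
    process with covariance r(t) = exp(-|t|). Grid-only monitoring
    overestimates the continuous-time persistence probability."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(T / dt))
    a = np.exp(-dt)                      # one-step correlation r(dt)
    s = np.sqrt(1.0 - a * a)             # innovation std keeping Var X = 1
    x = rng.standard_normal(n_paths)     # stationary start: X(0) ~ N(0, 1)
    alive = x > 0                        # paths that have stayed positive so far
    probs = [alive.mean()]
    for _ in range(n_steps):
        x = a * x + s * rng.standard_normal(n_paths)   # exact AR(1) step
        alive &= x > 0
        probs.append(alive.mean())
    times = dt * np.arange(n_steps + 1)
    return times, np.array(probs)

times, probs = persistence_curve(T=4.0)
for T in (1.0, 2.0, 3.0, 4.0):
    p = probs[int(round(T / 0.01))]
    exact = np.arcsin(np.exp(-T)) / np.pi   # continuous-time persistence
    print(f"T={T:.0f}  grid P~{p:.4f}  exact P={exact:.4f}  -log(P)/T~{-np.log(p) / T:.3f}")
```

Because the persistence events are nested in $T$, the estimated curve is non-increasing by construction. The ratio $-\log P/T$ only approaches $C$ slowly, since the $o(1)$ term in the exponent absorbs the prefactor.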

### Connectivity of the excursion sets of Gaussian fields with long-range correlations

Stephen Muirhead (University of Melbourne)

In recent years the global connectivity of the excursion sets of smooth Gaussian fields with rapidly decaying correlations has been fairly well understood (at least in the case of positively-correlated fields), and the general picture that emerges is that the connectivity undergoes a phase transition which is analogous to that of Bernoulli percolation. On the other hand, if the fields have long-range correlations then they are believed to lie outside the Bernoulli percolation universality class, with different scaling limits and critical exponents. The behaviour of the connectivity is not well-understood in this regime, and in this talk I will present some recent results and conjectures that shed some light on the behaviour.
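The objects in this abstract can be simulated directly. The following minimal sketch builds a smooth Gaussian field on a grid and tests whether its excursion set $\{f \ge \ell\}$ crosses a box from left to right. It assumes a hypothetical field model, white noise smoothed by a Gaussian filter, which has rapidly decaying (short-range) correlations. This is the well-understood regime, not the long-range regime the talk addresses. Connectivity uses 4-neighbour adjacency on the grid.

```python
import numpy as np
from scipy import ndimage

def smooth_gaussian_field(n=256, ell=4.0, seed=0):
    """Approximate smooth stationary Gaussian field on an n x n periodic grid:
    white noise convolved with a Gaussian kernel of width `ell`
    (short-range correlations), standardised to mean 0 and variance 1."""
    rng = np.random.default_rng(seed)
    f = ndimage.gaussian_filter(rng.standard_normal((n, n)), sigma=ell, mode="wrap")
    return (f - f.mean()) / f.std()

def crosses_left_right(field, level):
    """True if some connected component of the excursion set {field >= level}
    touches both the left and the right edge of the box."""
    labels, _ = ndimage.label(field >= level)   # default: 4-neighbour connectivity
    left = set(labels[:, 0][labels[:, 0] > 0].tolist())
    right = set(labels[:, -1][labels[:, -1] > 0].tolist())
    return bool(left & right)

f = smooth_gaussian_field()
for level in (-0.5, 0.0, 0.5):
    print(f"level {level:+.1f}: left-right crossing = {crosses_left_right(f, level)}")
```

Averaging `crosses_left_right` over many seeds and box sizes gives crossing probabilities whose behaviour in $\ell$ is the phase transition described above.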

### Overcrowding estimates for the nodal volume of stationary Gaussian processes on R^d

Lakshmi Priya (Indian Institute of Science)

We consider centered stationary Gaussian processes (SGPs) on Euclidean spaces $\mathbb{R}^d$ and study an aspect of their nodal set: for $T>0$, we study the nodal volume in $[0,T]^d$. In earlier studies, under varying assumptions on the spectral measures of SGPs, the following results were obtained for the nodal volume in $[0,T]^d$: expectation, variance asymptotics, a CLT, exponential concentration (only for $d=1$), and finiteness of moments.

We study the unlikely event of overcrowding of the nodal set in $[0,T]^d$, that is, the event that the volume of the nodal set in $[0,T]^d$ is much larger than its expected value. Under some mild assumptions on the spectral measure, we obtain estimates for the probability of the overcrowding event. We first obtain overcrowding estimates for the zero count of SGPs on $\mathbb{R}$. In higher dimensions, we use Crofton's formula, which expresses the volume of the nodal set in terms of the number of intersections of the nodal set with all lines in $\mathbb{R}^d$. We discretise this formula to obtain a more workable version of it, and combine it with the ideas behind the one-dimensional estimates to obtain overcrowding estimates in higher dimensions.
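For $d=1$, the nodal volume in $[0,T]$ is the zero count, and its expectation is given by the Kac-Rice formula $\mathbb{E}[N_T] = \frac{T}{\pi}\sqrt{-r''(0)/r(0)}$. The sketch below checks this numerically and estimates the frequency of a mild overcrowding event. It assumes a hypothetical process $f(t)=N^{-1/2}\sum_k (a_k\cos\lambda_k t + b_k\sin\lambda_k t)$ with i.i.d. standard normal $a_k, b_k$ and fixed frequencies $\lambda_k$. This is a stationary Gaussian process with covariance $r(t)=N^{-1}\sum_k\cos(\lambda_k t)$. Zeros are counted as sign changes on a fine grid.

```python
import numpy as np

def zero_counts(T=50.0, dt=0.01, n_freq=200, n_paths=500, seed=0):
    """Zero counts on [0, T] for n_paths samples of the stationary Gaussian
    process f(t) = sum_k (a_k cos(l_k t) + b_k sin(l_k t)) / sqrt(N),
    with N = n_freq fixed (hypothetical) frequencies l_k ~ N(0, 1).
    Returns the counts and the Kac-Rice expectation."""
    rng = np.random.default_rng(seed)
    lam = rng.standard_normal(n_freq)             # fixed spectral atoms
    t = np.arange(0.0, T + dt, dt)
    phase = np.outer(t, lam)                      # shape (len(t), n_freq)
    a = rng.standard_normal((n_freq, n_paths))
    b = rng.standard_normal((n_freq, n_paths))
    f = (np.cos(phase) @ a + np.sin(phase) @ b) / np.sqrt(n_freq)
    # A zero in a grid cell shows up as a sign change between neighbours.
    counts = np.count_nonzero(np.diff(np.sign(f), axis=0), axis=0)
    # Kac-Rice: r(0) = 1 and -r''(0) = mean(l_k^2) for this covariance.
    kac_rice = T / np.pi * np.sqrt(np.mean(lam ** 2))
    return counts, kac_rice

counts, expected = zero_counts()
print(f"mean zero count {counts.mean():.2f} vs Kac-Rice {expected:.2f}")
# Empirical frequency of a mild overcrowding event: count > 1.5 x expectation.
print(f"P(count > 1.5 E) ~ {np.mean(counts > 1.5 * expected):.3f}")
```

Overcrowding probabilities of the kind studied in the talk concern much larger deviations, which are too rare to see in a naive simulation like this one. That is one reason analytic estimates are needed.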

### Q&A for Organized Contributed Session 03

This talk does not have an abstract.

###### Session Chair

Naomi Feldheim (Bar-Ilan University)

Organized 20

## Theories and Applications for Complex Data Analysis (Organizer: Arlene K.H. Kim)

Conference time: 9:30 PM to 10:00 PM KST (local time: Wed, Jul 21, 8:30 AM to 9:00 AM EDT)

### Partly interval-censored rank regression

Sangbum Choi (Korea University)

This paper studies estimation of the semiparametric accelerated failure time model for doubly and partly interval-censored data. A Gehan-type weighted estimating function is constructed by contrasting comparable rank cases under interval censoring. An extension to the general class of log-rank estimating functions is also investigated, along with an efficient variance estimation procedure. Asymptotic properties of the proposed estimator are established under mild conditions using empirical process theory. Simulation studies demonstrate that the method works very well at practical sample sizes. Two data examples illustrate the practical usefulness of the method.
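The talk's estimator handles doubly and partly interval-censored data. As a simpler, swapped-in baseline, the sketch below implements the classical Gehan-type rank estimator for right-censored data only. It minimizes the convex loss $L(\beta)=n^{-2}\sum_i\sum_j \delta_i\,(e_j-e_i)^+$ with residuals $e_i=\log t_i-\beta x_i$, whose gradient is the Gehan estimating function. The simulated data, the scalar covariate and the grid-search optimizer are all illustrative assumptions.

```python
import numpy as np

def gehan_loss(beta, log_t, x, delta):
    """Convex Gehan loss for right-censored AFT data with a scalar covariate:
    L(beta) = n^-2 * sum_i sum_j delta_i * max(0, e_j - e_i),
    with e_i = log_t_i - beta * x_i. Where differentiable, its gradient is
    n^-2 * sum_i sum_j delta_i (x_i - x_j) 1{e_j > e_i} (Gehan function)."""
    e = log_t - beta * x
    diff = e[None, :] - e[:, None]                 # diff[i, j] = e_j - e_i
    return np.sum(delta[:, None] * np.maximum(diff, 0.0)) / len(e) ** 2

# Simulated data (hypothetical): log T = beta * x + eps, independent censoring.
rng = np.random.default_rng(1)
n, beta_true = 300, 1.0
x = rng.standard_normal(n)
log_T = beta_true * x + rng.standard_normal(n)
log_C = 1.0 + rng.standard_normal(n)
log_t = np.minimum(log_T, log_C)                   # observed, possibly censored
delta = (log_T <= log_C).astype(float)             # 1 = event observed

# The loss is convex in beta, so a fine grid search suffices for one covariate.
grid = np.linspace(-1.0, 3.0, 401)
beta_hat = grid[np.argmin([gehan_loss(b, log_t, x, delta) for b in grid])]
print(f"beta_hat = {beta_hat:.2f}, censoring rate = {1 - delta.mean():.2f}")
```

With several covariates, the same convex loss is usually minimized by linear programming rather than by grid search.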

### Two-sample testing of high-dimensional linear regression coefficients via complementary sketching

Tengyao Wang (University College London)

We introduce a new method for two-sample testing of high-dimensional linear regression coefficients without assuming that those coefficients are individually estimable. The procedure works by first projecting the matrices of covariates and response vectors along directions that are complementary in sign in a subset of the coordinates, a process which we call 'complementary sketching'. The resulting projected covariates and responses are aggregated to form two test statistics, which are shown to have essentially optimal asymptotic power under a Gaussian design when the difference between the two regression coefficients is sparse and when it is dense, respectively. Simulations confirm that our methods perform well in a broad class of settings.
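One way to realize "complementary in sign" projections is sketched below. The construction is an assumption, not taken from the abstract. Take $A=[A_1,A_2]$ with orthonormal rows spanning the orthogonal complement of the column space of the stacked design $[X_1;X_2]$, which requires $n_1+n_2>p$. Then $A_1X_1=-A_2X_2=:W$, and $Z=A_1y_1+A_2y_2=W(\beta_1-\beta_2)+\xi$, so the shared part of the coefficients cancels. The two test statistics built from $(W,Z)$ are not reproduced here.

```python
import numpy as np
from scipy.linalg import null_space

def complementary_sketch(X1, y1, X2, y2):
    """Project both samples with A = [A1, A2], whose orthonormal rows span the
    orthogonal complement of col([X1; X2]) (needs n1 + n2 > p).
    Since A1 X1 + A2 X2 = 0:  W = A1 X1 = -A2 X2  and
    Z = A1 y1 + A2 y2 = W (beta1 - beta2) + A @ noise."""
    n1 = X1.shape[0]
    X = np.vstack([X1, X2])
    A = null_space(X.T).T          # shape (n1 + n2 - rank, n1 + n2), A @ X = 0
    A1, A2 = A[:, :n1], A[:, n1:]
    return A1 @ X1, A1 @ y1 + A2 @ y2

rng = np.random.default_rng(0)
n1, n2, p = 200, 200, 300          # p exceeds each sample size, but not n1 + n2
X1, X2 = rng.standard_normal((n1, p)), rng.standard_normal((n2, p))
beta = rng.standard_normal(p)      # shared coefficients (not estimable per sample)
diff = np.zeros(p)
diff[:3] = 2.0                     # hypothetical sparse difference

# Noise-free responses, so the cancellation is visible exactly.
W, Z_null = complementary_sketch(X1, X1 @ beta, X2, X2 @ beta)
_, Z_alt = complementary_sketch(X1, X1 @ (beta + diff), X2, X2 @ beta)
print(W.shape, np.abs(Z_null).max(), np.abs(Z_alt - W @ diff).max())
```

With Gaussian noise of variance $\sigma^2$ added to $y_1, y_2$, the orthonormal rows of $A$ make the projected noise $N(0,\sigma^2 I_m)$, which is what makes a reduced regression of $Z$ on $W$ tractable.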

### Optimal rates for independence testing via U-statistic permutation tests

Tom Berrett (University of Warwick)

Independence testing is one of the most well-studied problems in statistics, and the use of procedures such as the chi-squared test is ubiquitous in the sciences. While tests have traditionally been calibrated through asymptotic theory, permutation tests are experiencing a growth in popularity due to their simplicity and exact Type I error control. In this talk I will present finite-sample results on the power of a new class of permutation tests, which show that their power is optimal in many interesting settings, including those with discrete, continuous, and functional data. A simulation study shows that our test for discrete data can significantly outperform the chi-squared test for natural data-generating distributions. Defining a natural measure of dependence $D(f)$ to be the squared $L^2$-distance between a joint density $f$ and the product of its marginals, we first show that there is generally no valid test of independence that is uniformly consistent against alternatives of the form $\{f: D(f) \geq \rho^2 \}$. Motivated by this observation, we restrict attention to alternatives that satisfy additional Sobolev-type smoothness constraints, and consider as a test statistic a U-statistic estimator of $D(f)$. Using novel techniques for studying the behaviour of U-statistics calculated on permuted data sets, we prove that our tests can be minimax optimal. Finally, based on new normal approximations in the Wasserstein distance for such permuted statistics, we also provide an approximation to the power function of our permutation test in a canonical example, which offers several additional insights.

This is joint work with Ioannis Kontoyiannis and Richard Samworth.
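The permutation calibration behind these tests can be sketched for discrete data as follows. One simplification is assumed: the statistic is a plug-in (V-statistic) estimate of $D=\sum_{a,b}(p_{ab}-p_a p_b)^2$, not the U-statistic estimator from the talk. The p-value $(1+\#\{T^{(b)}\ge T_{\text{obs}}\})/(B+1)$ gives exact Type I error control under independence, whatever statistic is used.

```python
import numpy as np

def dependence_stat(x, y, kx, ky):
    """Plug-in estimate of D = sum_{a,b} (p_ab - p_a p_b)^2 for discrete
    x in {0, ..., kx-1} and y in {0, ..., ky-1}."""
    joint = np.zeros((kx, ky))
    np.add.at(joint, (x, y), 1.0)                  # contingency table counts
    joint /= len(x)
    product = np.outer(joint.sum(axis=1), joint.sum(axis=0))
    return np.sum((joint - product) ** 2)

def permutation_pvalue(x, y, kx, ky, n_perm=199, seed=0):
    """Permutation p-value: permuting y breaks any dependence on x while
    keeping both marginals, which gives exact finite-sample validity."""
    rng = np.random.default_rng(seed)
    t_obs = dependence_stat(x, y, kx, ky)
    t_perm = np.array([dependence_stat(x, rng.permutation(y), kx, ky)
                       for _ in range(n_perm)])
    return (1 + np.sum(t_perm >= t_obs)) / (n_perm + 1)

rng = np.random.default_rng(1)
x = rng.integers(0, 4, size=200)
y_dep = (x + rng.integers(0, 2, size=200)) % 4     # y depends strongly on x
y_ind = rng.integers(0, 4, size=200)               # y independent of x
p_dep = permutation_pvalue(x, y_dep, 4, 4)
p_ind = permutation_pvalue(x, y_ind, 4, 4)
print(f"dependent: p = {p_dep:.3f}, independent: p = {p_ind:.3f}")
```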

### Empirical Bayes PCA in high dimensions

Zhou Fan (Yale University)

When the dimension of data is comparable to or larger than the number of data samples, Principal Components Analysis (PCA) may exhibit problematic high-dimensional noise. In this work, we propose an Empirical Bayes PCA (EB-PCA) method that reduces this noise by estimating a joint prior distribution for the principal components. EB-PCA is based on the classical Kiefer-Wolfowitz nonparametric MLE for empirical Bayes estimation, distributional results derived from random matrix theory for the sample PCs, and iterative refinement using an Approximate Message Passing (AMP) algorithm. In theoretical “spiked” models, EB-PCA achieves Bayes-optimal estimation accuracy in the same settings as an oracle Bayes AMP procedure that knows the true priors. Empirically, EB-PCA significantly improves over PCA when there is strong prior structure, both in simulation and on quantitative benchmarks constructed from the 1000 Genomes Project and the International HapMap Project. An illustration is presented for analysis of gene expression data obtained by single-cell RNA-seq.
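The empirical Bayes ingredient can be illustrated on its own. The sketch below approximates the Kiefer-Wolfowitz NPMLE of a prior $G$ in the Gaussian sequence model $y_i=\theta_i+\sigma z_i$ by running EM on mixture weights over a fixed grid of support points, a common computational approximation. It then denoises with the estimated posterior mean. It covers only that step: the random-matrix calibration of the noise level and the AMP iterations of EB-PCA are omitted. The two-point prior and $\sigma$ are hypothetical choices.

```python
import numpy as np
from scipy.stats import norm

def npmle_grid(y, sigma, grid, n_iter=500):
    """Grid approximation to the Kiefer-Wolfowitz NPMLE of G in
    y_i = theta_i + sigma * z_i, theta_i ~ G: EM over the weights of a
    discrete prior supported on `grid`."""
    lik = norm.pdf(y[:, None], loc=grid[None, :], scale=sigma)   # (n, m)
    w = np.full(len(grid), 1.0 / len(grid))                      # uniform start
    for _ in range(n_iter):
        post = lik * w
        post /= post.sum(axis=1, keepdims=True)   # posterior over grid points
        w = post.mean(axis=0)                     # EM update of prior weights
    return w, lik

def posterior_mean(lik, w, grid):
    """Empirical Bayes denoiser E[theta_i | y_i] under the estimated prior."""
    post = lik * w
    return (post @ grid) / post.sum(axis=1)

rng = np.random.default_rng(0)
n, sigma = 2000, 0.5
theta = rng.choice([-1.0, 1.0], size=n)           # hypothetical two-point prior
y = theta + sigma * rng.standard_normal(n)
grid = np.linspace(-3.0, 3.0, 121)
w, lik = npmle_grid(y, sigma, grid)
theta_hat = posterior_mean(lik, w, grid)
mse_raw = np.mean((y - theta) ** 2)
mse_eb = np.mean((theta_hat - theta) ** 2)
print(f"MSE raw: {mse_raw:.3f}, MSE empirical Bayes: {mse_eb:.3f}")
```

When the prior has strong structure, as with the two atoms here, the estimated-prior denoiser clearly beats the raw observations. That is the regime in which the abstract reports large gains over plain PCA.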

### Q&A for Organized Contributed Session 20

This talk does not have an abstract.

###### Session Chair

Arlene K.H. Kim (Korea University)
