Contributed 05

## Potential Theory in Probability Theory

Conference
10:30 PM — 11:00 PM KST
Local
Jul 19 Mon, 9:30 AM — 10:00 AM EDT

### Heat contents for time-changed killed Brownian motions

Hyunchul Park (State University of New York at New Paltz)

In this talk, we study various heat contents with respect to time-changed killed Brownian motions. The time change is given either by a large class of subordinators or by the inverses of such subordinators. When the time change is given by an inverse stable subordinator and the domain is smooth, we show that the spectral heat content has a complete asymptotic expansion, similar to the case of Brownian motion.

This is joint work with Kei Kobayashi (Fordham University, USA).

### Heat kernel bounds for nonlocal operators with singular kernels

Kyung-Youn Kim (National Chengchi University)

We prove sharp two-sided bounds on the fundamental solution for integro-differential operators of order $\alpha \in (0,2)$ that generate a $d$-dimensional Markov process. The corresponding Dirichlet form is comparable to that of $d$ independent copies of a one-dimensional jump process, i.e., the jumping measure is singular with respect to the $d$-dimensional Lebesgue measure.

This is joint work with Moritz Kassmann and Takashi Kumagai.

### The full characterization of the expected supremum of infinitely divisible processes

Rafal Martynek (University of Warsaw)

In this talk I will present a positive answer to the conjecture posed by M. Talagrand in "Regularity of Infinitely Divisible Processes" (1993) concerning a two-sided bound on the expected suprema of such processes that does not require any additional assumption on the Lévy measure associated with the process. It states that any infinitely divisible process can be decomposed into a part whose size is explained by the chaining method and another part which is a positive process.
The result relies heavily on the Bednorz-Latała theorem characterizing suprema of Bernoulli processes and its recent reformulation due to Talagrand, together with the series representation due to Rosiński.
I will also describe how the method of proof leads to the positive settlement of two other conjectures of Talagrand, namely the Generalized Bernoulli Conjecture concerning selector processes and the analogous result for empirical processes. These three results complete an important chapter of Talagrand's program of understanding the suprema of random processes through chaining.
The part of the talk concerning infinitely divisible processes is based on joint work with W. Bednorz, while the part about selector and empirical processes was developed by M. Talagrand after we communicated the initial result to him.

### The e-property of asymptotically stable Markov-Feller operators

Hanna Wojewódka-Ściążko (University of Silesia in Katowice)

We say that a regular Markov operator $P$, with dual operator $U$, has the e-property in the set $R$ of functions if the family of iterates $(U^nf)_{n\in\mathbb{N}}$ is equicontinuous for all $f\in R$. Most often, $R$ is assumed to be the set of all bounded Lipschitz functions, although it can also be the set of all bounded continuous functions, as in our paper [R. Kukulski and H. Wojewódka-Ściążko, Colloq. Math. 165, 269-283 (2021)]. In [S. Hille et al., Comptes Rendus Math. 355, 1247-1251 (2017)] it is shown that any asymptotically stable Markov-Feller operator with an invariant measure whose support has non-empty interior has the e-property. We generalize this result. To be more precise, we prove that any asymptotically stable Markov-Feller operator has the e-property off a meagre set. Moreover, we propose an equivalent condition for the e-property of asymptotically stable Markov-Feller operators. Namely, we prove that an asymptotically stable Markov-Feller operator has the e-property if and only if it has the e-property at at least one point of the support of its invariant measure. Our results then naturally imply the main theorem of [S. Hille et al., Comptes Rendus Math. 355, 1247-1251 (2017)]. Indeed, if the interior of the support of an invariant measure of a Markov-Feller operator $P$ is non-empty, then there exists at least one point in this support at which $P$ has the e-property. This, in turn, implies that $P$ has the e-property at every point. We also provide an example of an asymptotically stable Markov-Feller operator for which the set of points where the operator fails the e-property is dense. The example shows that the main result of our paper is tight.

### Q&A for Contributed Session 05

This talk does not have an abstract.

###### Session Chair

Panki Kim (Seoul National University)

Contributed 15

Conference
10:30 PM — 11:00 PM KST
Local
Jul 19 Mon, 9:30 AM — 10:00 AM EDT

### A multi-species Ehrenfest process and its diffusion approximation

Serena Spina (University of Salerno)

The celebrated Ehrenfest model is a Markov chain proposed to describe the diffusion of gas molecules in a container. Our aim is to generalize this model by considering a multi-type Ehrenfest process on a star graph. The resulting model is a continuous-time stochastic process describing the dynamics of an evolutionary system that can accommodate N particles and is characterized by d evolution classes, represented by d semiaxes joined at the origin. Along each semiaxis the process evolves as a classical Ehrenfest model with suitable linear transition rates. Moreover, after visiting the origin, the process can move toward any semiaxis with different rates, depending on the elements of a stochastic matrix. We investigate the dynamics of this process using an approach based on probability generating functions. This leads to the transient transition probabilities (in closed form for a particular choice of the parameter) and, in general, to the asymptotic distribution. In addition, we obtain some results on the asymptotic mean, variance and coefficient of variation of the process. We also consider a continuous approximation of the process, which leads to an Ornstein-Uhlenbeck diffusion process evolving on a spider-shaped continuous state space formed by d semiaxes of infinite length joined at the origin; the origin constitutes the equilibrium point of the system. We determine the expression of the asymptotic probability distribution for each ray of the spider. Finally, we compare the discrete process with the diffusion process in order to assess the quality of the continuous approximation.
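
As background for the abstract above, the following is a minimal sketch of the classical single-type, discrete-time Ehrenfest urn only, not of the multi-type star-graph process studied in the talk. It builds the transition matrix for N particles and checks numerically that the Binomial(N, 1/2) distribution is stationary. The function names and the choice N = 10 are illustrative assumptions.

```python
from math import comb


def ehrenfest_transition_matrix(n_particles):
    """Transition matrix of the classical discrete-time Ehrenfest urn.

    State k is the number of particles in the first container. At each step
    one of the n_particles is picked uniformly at random and moved to the
    other container.
    """
    size = n_particles + 1
    matrix = [[0.0] * size for _ in range(size)]
    for k in range(size):
        if k > 0:
            # A particle leaves the first container.
            matrix[k][k - 1] = k / n_particles
        if k < n_particles:
            # A particle enters the first container.
            matrix[k][k + 1] = (n_particles - k) / n_particles
    return matrix


def binomial_half(n_particles):
    """Binomial(n_particles, 1/2) probabilities, the candidate stationary law."""
    return [comb(n_particles, k) / 2 ** n_particles for k in range(n_particles + 1)]


def step(distribution, matrix):
    """One step of the chain: row vector times transition matrix."""
    size = len(distribution)
    return [
        sum(distribution[i] * matrix[i][j] for i in range(size))
        for j in range(size)
    ]


N = 10  # illustrative number of particles
P = ehrenfest_transition_matrix(N)
pi = binomial_half(N)
pi_next = step(pi, P)

# Largest entrywise change after one step; close to zero if pi is stationary.
max_gap = max(abs(a - b) for a, b in zip(pi, pi_next))
print(max_gap)
```

The printed gap should be at the level of floating-point rounding, reflecting the detailed-balance identity behind the binomial equilibrium of the classical model.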

### Limit theorems for the realised semicovariances of multivariate Brownian semistationary processes

Yuan Li (Imperial College London)

In this talk we will introduce the realised semicovariance, which results from the decomposition of the realised covariance matrix into components based on the signs of the returns, and study its in-fill asymptotic properties for multivariate Brownian semistationary (BSS) processes. The realised semicovariance was originally proposed in Bollerslev et al. (2020, Econometrica), who worked in a semimartingale setting. We extend their work to BSS processes, which are not necessarily semimartingales. More precisely, we first prove weak convergence, in the space of càdlàg functions endowed with the Skorohod topology, of the realised semicovariance of a general Gaussian process with stationary increments. The methods are based on quantitative Breuer-Major theorems and on a moment bound for sums of products of functions of Gaussian vectors. Furthermore, we demonstrate the corresponding stable convergence. Finally, a weak law of large numbers and a central limit theorem for the realised semicovariance of multivariate BSS processes are established. These results extend the limit theorems for the realised covariation to non-linear functionals.
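
The sign-based decomposition mentioned in the abstract above can be sketched as follows. This is a minimal pure-Python illustration assuming the usual definitions with p(x) = max(x, 0) and n(x) = min(x, 0) applied componentwise to each return vector; the function names and the sample returns are hypothetical and are not taken from the talk.

```python
def outer(u, v):
    """Outer product of two vectors as a nested list."""
    return [[a * b for b in v] for a in u]


def add(a, b):
    """Entrywise sum of two matrices stored as nested lists."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def zeros(dim):
    return [[0.0] * dim for _ in range(dim)]


def realised_semicovariances(returns):
    """Return (P, N, M, RC) for a list of d-dimensional return vectors.

    P  sums outer products of the positive parts,
    N  sums outer products of the negative parts,
    M  sums the two mixed-sign outer products,
    RC is the realised covariance, the sum of r r' over all returns.
    """
    dim = len(returns[0])
    pos_part, neg_part, mixed, total = zeros(dim), zeros(dim), zeros(dim), zeros(dim)
    for r in returns:
        pos = [max(x, 0.0) for x in r]
        neg = [min(x, 0.0) for x in r]
        pos_part = add(pos_part, outer(pos, pos))
        neg_part = add(neg_part, outer(neg, neg))
        mixed = add(mixed, add(outer(pos, neg), outer(neg, pos)))
        total = add(total, outer(r, r))
    return pos_part, neg_part, mixed, total


# Hypothetical high-frequency returns for two assets.
sample_returns = [
    [0.010, -0.020],
    [-0.005, 0.003],
    [0.007, 0.004],
    [-0.010, -0.006],
]

P_hat, N_hat, M_hat, RC_hat = realised_semicovariances(sample_returns)
recombined = add(add(P_hat, N_hat), M_hat)
print(recombined)
print(RC_hat)
```

Because every return splits as r = p(r) + n(r), the three components add back up to the realised covariance, which is the identity the two printed matrices illustrate.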

### A Yaglom type asymptotic result for subcritical branching Brownian motion with absorption

Jiaqi Liu (University of California, San Diego)

In this talk, we will consider a slightly subcritical branching Brownian motion with absorption, where particles move as Brownian motion with drift $-\sqrt{2+2\epsilon}$, undergo dyadic fission at rate 1, and are killed upon hitting the origin. We are interested in the asymptotic behavior of the process conditioned on survival up to a large time $t$ as the process approaches criticality. Results of this kind are called Yaglom-type results. Specifically, we will talk about the long-run expected number of particles conditioned on survival as the process approaches criticality.

### Q&A for Contributed Session 15

This talk does not have an abstract.

###### Session Chair

Jaehun Lee (Korea Institute for Advanced Study)

Contributed 22

## Bayesian Inference

Conference
10:30 PM — 11:00 PM KST
Local
Jul 19 Mon, 9:30 AM — 10:00 AM EDT

### Bayesian and stochastic modeling of polysomnography data from children using pacifiers for improved estimation of the apnea-hypopnea index

Sujay Datta (University of Akron)

Polysomnography is a systematic overnight procedure to collect physiological parameters during sleep. It is considered the gold standard for diagnosing sleep-related disorders. It takes several days to score and interpret the raw data from this study and confirm a diagnosis of, say, Obstructive Sleep Apnea (OSA), a potentially dangerous disorder. The presence of artifacts (anomalies created by a malfunctioning sensor) makes scoring even more difficult, potentially resulting in misdiagnosis. It is common to see airflow signal artifacts in infants who use a pacifier during sleep. The act of sucking on the pacifier causes artifacts in the oro-nasal sensor (thermistor) used to monitor airflow during respiration. The resulting inaccurate scoring leads to an underestimation of the Apnea Hypopnea Index (AHI), the basis for a formal OSA diagnosis. Researchers are therefore exploring two other information sources (blood oxygen saturation readings from a pulse-oximeter and the occurrence of arousal events) to supplement the artifact-corrupted thermistor data. They first look for statistical association between the thermistor and the pulse-oximeter/arousal data and then statistically predict a modified AHI score using the latter whenever the former is corrupt. To our knowledge, no attempt at statistically modeling these three data sources to bring out their association currently exists. This project aims at developing several competing probabilistic models for these data types and then checking how strongly they bring out the association by applying them to archived data from the Akron Children’s Hospital. These modeling approaches are a significant statistical contribution to this important medical problem. After performing some statistical tests for association, the modeling approaches will include naïve Bayes, Beta-Binomial, correlated homogeneous Poisson processes and double-chain Markov models.
The resulting improvement in AHI estimates is demonstrated using data-sets from a sample of non-pacifier users after artificially discarding part of their thermistor data (as if they were artifact-corrupted).

### Asymmetric prior in wavelet shrinkage

Alex Rodrigo dos Santos Sousa (University of São Paulo)

In Bayesian wavelet shrinkage, the priors proposed so far for the wavelet coefficients are assumed to be symmetric around zero. Although this assumption is reasonable in many applications, it is not general. The present paper proposes the use of an asymmetric shrinkage rule based on a discrete mixture of a point mass function at zero and an asymmetric beta distribution as the prior for the wavelet coefficients in a non-parametric regression model. Statistical properties of the associated asymmetric rule, such as bias, variance, and classical and Bayesian risks, are provided, and the performance of the proposed rule is assessed in simulation studies involving artificial asymmetrically distributed coefficients and the Donoho-Johnstone test functions. An application to a real seismic dataset is also analyzed. In general, the asymmetric shrinkage rule outperformed classical symmetric rules in both the simulations and the real data application.

### Semiparametric Bayesian regression analysis of multi-typed matrix-variate responses

Inkoo Lee (Rice University)

Complex data such as tensors and multiple types of responses can be found in dental medicine. Dental hygienists measure three biomarkers at 28 teeth and 6 tooth-sites for each participant. These data have challenging characteristics: 1) binary and continuous responses with skewness, 2) heavy-tailed matrix-variate responses for each biomarker, 3) a non-random pattern of missing teeth. To circumvent these difficulties, we propose a joint model of multiple types of matrix-variate responses via latent variables. The model accommodates skewness in continuous responses. This statistical framework incorporates exponential factor copula models to capture heavy-tail dependence and asymmetry. Since the number of existing teeth represents the magnitude of periodontal disease (PD), we model the missingness mechanism. Our method also guarantees posterior consistency under suitable priors. We illustrate the substantial advantages of our method over alternatives through simulation studies and the analysis of PD data.

### Bayesian phylogenetic inference of stochastic block models on infinite trees

Wenjian Liu (Queensborough Community College, City University of New York)

This talk involves a classification problem on a deep network, by considering a broadcasting process on an infinite communication tree, where information is transmitted from the root of the tree to all the vertices with a certain probability of error. The information reconstruction problem on an infinite tree is to collect and analyze massive data samples at the nth level of the tree to identify whether there is non-vanishing information about the root, as n goes to infinity. Its connection to the clustering problem in the setting of the stochastic block model, which has wide applications in machine learning and data mining, has been well established. For the stochastic block model, an "information theoretically solvable but computationally hard" region, or "hybrid-hard phase", appears whenever the reconstruction bound of the corresponding reconstruction problem on the tree is not tight. Inspired by the recently proposed $q_1+q_2$ stochastic block model, we try to extend the classical works on the Ising model and the Potts model by studying a general model which incorporates the characteristics of both Ising and Potts through different in-community and out-community transition probabilities, and by rigorously establishing the exact conditions for reconstruction.

### Order-restricted Bayesian inference for the simple step-stress accelerated life tests

David Han (The University of Texas at San Antonio)

In this work, we investigate order-restricted Bayesian estimation for simple step-stress accelerated life tests. Using the three-parameter gamma distribution as a conditional prior, we ensure that the failure rates increase as the stress level increases. In addition, its conjugate-like structure enables us to derive the exact joint posterior distribution of the parameters without the need for expensive MCMC sampling. Based on these distributional results, several Bayesian estimators for the model parameters are suggested along with their individual/joint credible intervals. Through Monte Carlo simulations, the performance of our proposed inferential methods is assessed and compared. Finally, a real engineering case study analyzing the reliability of a solar lighting device is presented to illustrate the methods developed in this work.

### Q&A for Contributed Session 22

This talk does not have an abstract.

###### Session Chair

Seongil Jo (Inha University)

Contributed 33

## Novel Statistical Approaches In Genetic Association Analyses

Conference
10:30 PM — 11:00 PM KST
Local
Jul 19 Mon, 9:30 AM — 10:00 AM EDT

### An extended model for phylogenetic maximum likelihood based on discrete morphological characters

Maximum likelihood is a common method of estimating a phylogenetic tree based on a set of genetic data. However, models of evolution for certain types of genetic data are highly flawed in their specification, and this misspecification can have an adverse impact on phylogenetic inference. Our attention here is focused on extending an existing class of models for estimating phylogenetic trees from discrete morphological characters. The main advance of this work is a model that allows unequal equilibrium frequencies in the estimation of phylogenetic trees from discrete morphological character data using likelihood methods. Possible extensions of the proposed model will also be discussed.

### Combined linkage and association mapping integrating population-based and family-based designs using multinomial regression

Saurabh Ghosh (Indian Statistical Institute)

Genetic association analyses yield higher power compared to linkage analyses in identifying chromosomal regions harboring susceptibility genes modulating complex human disorders and correlated quantitative phenotypes. However, while population-based association designs suffer from the problem of population stratification that often results in inflated type I errors, linkage designs are based on families and are protected against such inflations. The models suggested in Multiphen (O’Reilly et al., 2012) and BAMP (Majumdar et al., 2015) provide an alternative way to study population-based genotype-phenotype association by exploring the dependence of genotype on phenotype instead of the naturally arising dependence of phenotype on genotype. This reversal of the regression model, while having no impact on the inference on association, provides the flexibility of incorporating multiple phenotypes without the requirement of making any a priori assumptions on the correlation structure of the vector of phenotypes. Our aim is to investigate whether family-based data can be included in addition to population-level data in the framework of the BAMP (Binomial regression-based Association of Multivariate Phenotypes) model so as to develop a combined test for genetic linkage and association. The family-based regression model involves the conditional distribution of identity-by-state (i.b.s.) scores on the squared sib-pair phenotype differences. However, since the marginal distribution of i.b.s. counts does not follow a Binomial distribution, we propose a Trinomial Regression model for the linkage component of our combined test. Given that the marginal distributions of the response variables in the population-based and family-based designs are different, the combined test is constructed jointly on the estimated regression parameters corresponding to the two designs.
Under the null, the likelihood ratio test statistic asymptotically follows a mixture of two chi-square distributions with one and two degrees of freedom, respectively. We carry out extensive simulations to evaluate the power of the proposed combined test.

### An alternative to intersection-union test for the composite null hypothesis used to identify shared genetic risk of disease outcomes

Debashree Ray (Johns Hopkins University)

With a growing number of disease- and trait-associated genetic variants detected and replicated across genome-wide association studies (GWAS), scientists are increasingly noting the influence of individual variants on multiple seemingly unrelated traits, a phenomenon known as pleiotropy. Cross-phenotype association tests, applied to two or more traits, usually test the null hypothesis of no association of a variant with any trait. Rejection of this null can be due to association between the variant and a single trait, with no indication of whether the variant influences more than one trait. This problem can be formulated as a composite null hypothesis test for each variant. For two traits, a level-$\alpha$ two-parameter intersection-union test (IUT) can be used. However, for testing millions of variants at the genome-wide significance threshold ($\alpha=5\times10^{-8}$), the IUT is extremely conservative. In this talk, I will discuss a new statistical approach, PLACO (pleiotropic analysis under composite null hypothesis), to discover variants influencing the risk of two traits using GWAS summary statistics (i.e., using the estimated effect size, its standard error and the p-value for each variant). PLACO uses the product of Z-statistics across two traits as the test statistic for pleiotropy, the null distribution of which is derived in the form of a mixture distribution that allows for fractions of variants to be associated with none or only one of the traits. PLACO gives an approximate asymptotic p-value for association with both traits, avoiding estimation of nuisance parameters related to mixture proportions and variance components. Simulation studies demonstrate its well-controlled type I error and massive power gain over the IUT and alternative ad hoc methods typically used for testing pleiotropy. Finally, I will show an application of PLACO to type 2 diabetes and prostate cancer genetics to explain their inverse association reported in many previous epidemiologic studies.
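
To make the two-trait intersection-union test from the abstract above concrete, here is a minimal sketch assuming two-sided standard-normal Z-statistics for each trait. It computes the IUT p-value as the larger of the two per-trait p-values and, separately, the product of the Z-statistics that PLACO takes as its test statistic. The mixture null distribution of that product is not reproduced here, and the function names and input values are illustrative assumptions.

```python
from math import erfc, sqrt


def two_sided_p(z):
    """Two-sided p-value of a standard-normal Z-statistic."""
    return erfc(abs(z) / sqrt(2.0))


def iut_p_value(z1, z2):
    """Intersection-union test for 'associated with both traits'.

    The composite null (at most one trait associated) is rejected at level
    alpha only if both per-trait tests reject, so the p-value is the maximum.
    """
    return max(two_sided_p(z1), two_sided_p(z2))


def product_statistic(z1, z2):
    """Product of Z-statistics across the two traits (statistic only)."""
    return z1 * z2


# Hypothetical variant: strong signal for trait 1, weak signal for trait 2.
z_trait1, z_trait2 = 6.0, 1.0
alpha = 5e-8  # genome-wide significance threshold quoted in the abstract

p_iut = iut_p_value(z_trait1, z_trait2)
rejects = p_iut <= alpha
print(p_iut, rejects, product_statistic(z_trait1, z_trait2))
```

In this example the weaker trait alone determines the IUT p-value, which is why a strong association with a single trait does not lead to rejection of the composite null.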

### Efficient SNP-based heritability estimation using Gaussian predictive process in large-scale cohort studies

Saonli Basu (University of Minnesota)

For decades, linear mixed models (LMMs) have been widely used to estimate heritability in twin and family studies. Recently, with the advent of high-throughput genetic data, there have been attempts to estimate heritability from genome-wide SNP data on a cohort of distantly related individuals. Fitting such an LMM in large-scale cohort studies, however, is tremendously challenging due to high-dimensional linear algebraic operations. In this paper, we simplify the LMM by unifying the concepts of Genetic Coalescence and Gaussian Predictive Process, thereby greatly alleviating the computational burden. Our proposed approach, PredLMM, has much better computational complexity than most of the existing packages and thus provides an efficient alternative for estimating heritability in large-scale cohort studies. We illustrate our approach with extensive simulation studies and use it to estimate the heritability of multiple quantitative traits from the UK Biobank cohort.

This is joint work with Souvik Seal (Colorado School of Public Health) and Abhirup Datta (Johns Hopkins University).
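
For orientation, the linear mixed model referred to in the abstract above is commonly written in the following standard form. This is a sketch under assumed notation, not necessarily the parametrisation used in the talk: $y$ is the phenotype vector for $n$ individuals, $X$ the covariate matrix, $K$ a genetic relationship matrix built from the SNP data, and $\sigma_g^2$, $\sigma_e^2$ the genetic and residual variance components.

$$
y = X\beta + g + \varepsilon, \qquad g \sim N\left(0, \sigma_g^2 K\right), \qquad \varepsilon \sim N\left(0, \sigma_e^2 I_n\right), \qquad h^2 = \frac{\sigma_g^2}{\sigma_g^2 + \sigma_e^2}.
$$

Likelihood-based fitting of this model works with the $n \times n$ covariance matrix $\sigma_g^2 K + \sigma_e^2 I_n$, which is the source of the high-dimensional linear algebra mentioned in the abstract.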

### Data-adaptive groupwise test for genomic studies via the Yanai's generalized coefficient of determination

Masao Ueki (Nagasaki University)

In genomic studies, repeated univariate regression for each variable is used to screen for useful variables. However, signals that are detectable only jointly with other variables may be overlooked. Group-wise analysis for a pre-defined group is often used, but the power will be limited if the prior knowledge is insufficient. A flexible data-adaptive test procedure for the conditional mean is thus proposed, applicable to a variety of model sequences that bridge between low- and high-complexity models, as in penalized regression. The test is based on the model that maximizes a generalization of Yanai's generalized coefficient of determination by exploiting the tendency for the dimensionality to be large under the null hypothesis. The test does not require complicated null distribution computation, thereby enabling large-scale testing applications. Numerical studies demonstrated that the proposed test applied to the lasso and elastic net had high power regardless of the simulation scenario. Applied to a group-wise analysis of real genome-wide association study data from the Alzheimer's Disease Neuroimaging Initiative, the proposal gave a higher association signal than the existing methods.

### Q&A for Contributed Session 33

This talk does not have an abstract.

###### Session Chair

Saurabh Ghosh (Indian Statistical Institute)

Contributed 36

## Statistical Inference

Conference
10:30 PM — 11:00 PM KST
Local
Jul 19 Mon, 9:30 AM — 10:00 AM EDT

### Density deconvolution with non-standard error distributions: rates of convergence and adaptive estimation

Taeho Kim (University of Haifa)

It is a standard assumption in the density deconvolution problem that the characteristic function of the measurement error distribution is non-zero on the real line. While this condition is assumed in the majority of existing works on the topic, there are many problem instances of interest where it is violated. In this paper, we focus on non-standard settings where the characteristic function of the measurement errors has zeros, and study how the multiplicity of the zeros affects the estimation accuracy. For a prototypical problem of this type, we demonstrate that the best achievable estimation accuracy is determined by the multiplicity of the zeros, the rate of decay of the error characteristic function, as well as by the smoothness and the tail behavior of the estimated density. We derive lower bounds on the minimax risk and develop estimators that are optimal in the minimax sense. In addition, we consider the problem of adaptive estimation and propose a data-driven estimator that automatically adapts to the unknown smoothness and tail behavior of the density to be estimated.

### Moments of the doubly truncated selection elliptical distributions: recurrence, existence and applications

Christian Galarza Morales (Escuela Superior Politécnica del Litoral)

We compute doubly truncated moments for the selection elliptical (SE) class of distributions, which includes some multivariate asymmetric versions of well-known elliptical distributions, such as the normal and Student’s t, among others. We address the moments for doubly truncated members of this family, establishing neat formulations for high-order moments as well as for the first two moments. We establish necessary and sufficient conditions for their existence. Further, we propose computationally efficient methods to deal with extreme settings of the parameters, partitions with almost zero volume, or no truncation. Applications and simulation studies are presented in order to illustrate the usefulness of the proposed methods.

### Characterization of probability distributions by a generalized notion of sufficiency and Fisher information

Atin Gayen (Indian Institute of Technology Palakkad)

The notion of sufficiency introduced by Fisher is based on the usual likelihood function. This is useful particularly when the underlying model is exponential. We propose a generalized notion of the principle of sufficiency based on two generalized likelihood functions that arise in robust inference, namely the likelihood functions of Basu et al. and Jones et al. We find the specific form of the family of probability distributions that have a fixed number of sufficient statistics (independent of sample size) with respect to these likelihood functions. These distributions are of power-law form and are a generalization of the exponential family. Student distributions are a special case of this family. We also extend the concept of minimal sufficiency with respect to this generalized notion and find a minimal sufficient statistic for Student distributions. We observe that the generalized estimators of the parameters of Student distributions are functions of the minimal sufficient statistics derived from this generalized notion. We finally show that these estimators are also efficient in the sense that the variance of each of these estimators equals the variance given by the asymptotic normality result.

### Q&A for Contributed Session 36

This talk does not have an abstract.

###### Session Chair

Mijeong Kim (Ewha Womans University)