Contributed 03

## Numerical Study of Stochastic Processes / Stochastic Interacting Systems

Conference
10:30 PM — 11:00 PM KST
Local
Jul 21 Wed, 9:30 AM — 10:00 AM EDT

### Splitting methods for SDEs with locally Lipschitz drift. An illustration on the FitzHugh-Nagumo model

Massimiliano Tamborrino (University of Warwick)

In this talk, we construct and analyse explicit numerical splitting methods for a class of semilinear stochastic differential equations (SDEs) with additive noise, where the drift is allowed to grow polynomially and satisfies a global one-sided Lipschitz condition. The methods are proved to be mean-square convergent of order 1 and to preserve important structural properties of the SDE. In particular, first, they are hypoelliptic in every iteration step. Second, they are geometrically ergodic and have asymptotically bounded second moments. Third, they preserve oscillatory dynamics, such as amplitudes, frequencies and phases of oscillations, even for large time steps. Our results are illustrated on the stochastic FitzHugh-Nagumo model (a well-known neuronal model describing the generation of spikes of single neurons at the intracellular level) and compared with known mean-square convergent tamed/truncated variants of the Euler-Maruyama method. The capability of the proposed splitting methods to preserve the aforementioned properties makes them applicable within different statistical inference procedures. In contrast, known Euler-Maruyama type methods commonly fail to preserve such properties, yielding ill-conditioned likelihood-based estimation tools or computationally infeasible simulation-based inference algorithms.

### Simulation methods for trawl processes

Dan Leonte (Imperial College London)

Trawl processes are continuous-time, stationary and infinitely divisible processes which can describe a wide range of possible serial correlation patterns in data. This talk introduces a new algorithm for the efficient simulation of monotonic trawl processes. The algorithm accommodates any monotonic trawl shape and any infinitely divisible distribution described via the Lévy seed, requiring only access to samples from the distribution of the Lévy seed. Further, the computational complexity does not scale with the number of spatial dimensions of the trawl. We describe how the above method can be generalized to a simulation scheme for monotonic ambit fields via Monte Carlo methods.

### Stochastic optimal control of SDEs and importance sampling

Han Cheng Lie (University of Potsdam)

In applications that involve rare events, a common problem is to estimate the statistics of a functional with respect to a reference measure, where the reference measure is the law of the solution to a specific SDE. The presence of rare events motivates the approach of importance sampling by the change of drift technique. This leads to a stochastic optimal control problem, where the objective consists in the sum of the expectation of the functional of interest and a regularisation term that is proportional to the relative entropy or Kullback-Leibler divergence between the reference measure and the importance sampling measure. We analyse a class of gradient-based numerical methods for solving these stochastic optimal control problems, by computing derivatives of the individual terms in the objective, and by using this derivative information to analyse the convexity properties of the terms in the objective.
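
As a toy illustration of the change-of-drift idea, the sketch below estimates the rare-event probability $P(W_T > a)$ for a standard Brownian motion $W$ by sampling under a constant drift and reweighting with the Girsanov likelihood ratio. The drift $u = a/T$ is a hand-picked assumption, not an optimised control, and the function names are hypothetical; the gradient-based methods for the control problem discussed in the talk are not reproduced here.

```python
import math
import random


def rare_event_probability(a, T=1.0, drift=None, n_samples=100_000, seed=0):
    """Importance-sampling estimate of P(W_T > a) for standard Brownian motion W.

    Samples X_T = u*T + sqrt(T)*Z under the tilted measure (constant drift u)
    and reweights each sample by dP/dQ = exp(-u*X_T + u^2*T/2).
    """
    rng = random.Random(seed)
    # Assumption: default drift pushes the mean of X_T onto the threshold a.
    u = a / T if drift is None else drift
    total = 0.0
    for _ in range(n_samples):
        x = u * T + math.sqrt(T) * rng.gauss(0.0, 1.0)
        if x > a:
            total += math.exp(-u * x + 0.5 * u * u * T)
    return total / n_samples


def exact_probability(a, T=1.0):
    """Closed form P(W_T > a) = 1 - Phi(a / sqrt(T)), used only as a check."""
    return 0.5 * math.erfc(a / math.sqrt(2.0 * T))


if __name__ == "__main__":
    print("importance sampling:", rare_event_probability(4.0))
    print("exact value:        ", exact_probability(4.0))
```

For $a = 4$ and $T = 1$ the exact value is about $3.2 \times 10^{-5}$, so plain Monte Carlo with the same sample size would see the event only a handful of times, whereas roughly half of the tilted samples land in the rare set.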

### Opinion dynamics with Lotka-Volterra type interactions

Michele Aleandri (Libera Università Internazionale degli Studi Sociali)

We investigate a class of models for opinion dynamics in a population with two interacting families of individuals. Each family has an intrinsic mean field “Voter-like” dynamics which is influenced by interaction with the other family. The interaction terms describe a cooperative/conformist or competitive/nonconformist attitude of one family with respect to the other. We prove propagation of chaos, i.e., we show that on any time interval $[0,T]$, as the size of the system goes to infinity, each individual behaves independently of the others with transition rates driven by a macroscopic equation. We focus in particular on models with Lotka-Volterra type interactions, i.e., models with cooperative vs. competitive families. For these models, although the microscopic system is driven to consensus within each family, a periodic behaviour arises at the macroscopic scale. In order to describe fluctuations between the limiting periodic orbits, we identify a slow variable in the microscopic system and, through an averaging principle, we find a diffusion which describes the macroscopic dynamics of this variable on a larger time scale.

### Q&A for Contributed Session 03

This talk does not have an abstract.

###### Session Chair

Kyung-Youn Kim (National Chengchi University)

Contributed 08

## Study of Various Distributions

Conference
10:30 PM — 11:00 PM KST
Local
Jul 21 Wed, 9:30 AM — 10:00 AM EDT

### Orlicz norm and concentration inequalities for beta-heavy tailed distributions

Emmanuel Gobet (Ecole Polytechnique)

Understanding how sample statistical fluctuations impact prediction errors is crucial in learning algorithms. This is typically done by quantifying the probability that a sum of random variables deviates from its expectation by a certain threshold. The cases of sub-Gaussian or sub-exponential random variables, as well as the case of alpha-exponential tails, have been largely covered in the literature (for example, via the Bennett and Bernstein inequalities). In this work we focus on situations where the distributions have long tails (like log-normal or log-gamma distributions). In this setting, we establish a new Talagrand-type inequality for the Orlicz norm of the sum of independent random variables of this type, as well as a maximal inequality. The concentration inequalities then follow.
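
For reference, the Orlicz norm associated with a convex, increasing function $\Psi$ with $\Psi(0) = 0$ is usually defined as below. This is the standard textbook definition, given here for orientation only; the specific $\Psi$ adapted to beta-heavy tails in the talk is not reproduced.

$$
\|X\|_{\Psi} = \inf\left\{ c > 0 : \mathbb{E}\left[\Psi\left(|X|/c\right)\right] \le 1 \right\}.
$$

The choice $\Psi(x) = e^{x^2} - 1$ recovers the sub-Gaussian norm and $\Psi(x) = e^{x} - 1$ the sub-exponential one.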

### The Dickman-Goncharov distribution

Vladimir Panov (National Research University Higher School of Economics)

In the 1930s and 40s, one and the same delay differential equation appeared in papers by two mathematicians, Karl Dickman and Vasily Goncharov, who dealt with completely different problems. Dickman investigated the limit value of the number of natural numbers free of large prime factors, while Goncharov examined the asymptotics of the maximum cycle length in decompositions of random permutations. The equation obtained in these papers defines, under a certain initial condition, the density of a probability distribution now called the Dickman-Goncharov distribution (this term was first proposed by A. Vershik in 1986). Recently, a number of completely new applications of the Dickman-Goncharov distribution have appeared in mathematics (random walks on solvable groups, random graph theory, and so on) and also in biology (models of growth and evolution of unicellular populations), finance (theory of extreme phenomena in finance and insurance), physics (the model of random energy levels), and other fields. Despite the extensive scope of applications of this distribution and of more general but related models, the mathematical aspects of this topic (for example, infinite divisibility and absolute continuity) are little known even to specialists in limit theorems. My talk is mainly based on our survey [Molchanov S., Panov V. The Dickman-Goncharov distribution. Russian Mathematical Surveys. 2020. Vol. 75. No. 6. P. 1089-1132], which is intended to fill this gap. I will also discuss several new results for the generalised Dickman-Goncharov distribution, which in the discrete case are closely related to the solution of the well-known Erdős problem for Bernoulli convolutions.
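
For orientation, the delay differential equation in question is commonly written for the Dickman function $\rho$ as below. The notation ($\rho$, $p$, and Euler's constant $\gamma$) is assumed here, and the normalisation used in the talk may differ.

$$
u\,\rho'(u) + \rho(u-1) = 0 \quad (u > 1), \qquad \rho(u) = 1 \quad (0 \le u \le 1).
$$

Since $\int_0^\infty \rho(u)\,du = e^{\gamma}$, the function $p(x) = e^{-\gamma}\rho(x)$ for $x \ge 0$ is a probability density.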

### Continuous scaled phase-type distributions

Jorge Yslas (University of Bern)

In this talk, we study random variables characterized as the product of phase-type distributions and continuous random variables. Under this construction, one can obtain closed-form formulas for the different functionals of the resulting models. We provide new results regarding the tail behavior of these distributions and show how an EM algorithm can be employed for maximum-likelihood estimation. Finally, we present several numerical examples with real insurance data sets.

### Q&A for Contributed Session 08

This talk does not have an abstract.

###### Session Chair

Gunwoong Park (University of Seoul)

Contributed 12

## Optimal Transport

Conference
10:30 PM — 11:00 PM KST
Local
Jul 21 Wed, 9:30 AM — 10:00 AM EDT

### Stochastic-uniform-approximations of Wasserstein barycenters

Florian Heinemann (Georg-August-University Göttingen)

Recently, optimal transport, and more specifically the Wasserstein distance, have attracted renewed interest as they have been recognized as attractive tools in data analysis. Consequently, this has also led to an increasing interest in Fréchet means, or barycenters, with respect to that distance. These so-called Wasserstein barycenters offer favorable geometric properties which lend themselves well to many applications. However, even more than usual optimal transport, the barycenter problem suffers from a significant computational cost. To alleviate this issue, we propose a hybrid resampling method to approximate finitely supported Wasserstein barycenters on large-scale datasets, which can be combined with any exact solver. Nonasymptotic bounds on the expected error of the objective value as well as the barycenters themselves allow one to calibrate computational cost and statistical accuracy. The rate of these upper bounds is shown to be optimal and independent of the underlying dimension, which appears only in the constants. Using a simple modification of the subgradient descent algorithm of Cuturi and Doucet, we showcase the applicability of our method on a myriad of simulated datasets, as well as a real-data example, which are out of reach for state-of-the-art algorithms for computing Wasserstein barycenters.
This is joint work with Axel Munk and Yoav Zemel.

### Measuring dependence between random vectors via optimal transport

Johan Segers (Université catholique de Louvain)

To quantify the dependence between two random vectors of possibly different dimensions, we propose to rely on the properties of the 2-Wasserstein distance. We first propose two coefficients that are based on the Wasserstein distance between the actual distribution and a reference distribution with independent components. The coefficients are normalized to take values between 0 and 1, where 1 represents the maximal amount of dependence possible given the two multivariate margins. We then make a quasi-Gaussian assumption that yields two additional coefficients rooted in the same ideas as the first two. These different coefficients are more amenable to distributional results and admit attractive formulas in terms of the joint covariance or correlation matrix. Furthermore, maximal dependence is proved to occur at the covariance matrix with minimal von Neumann entropy given the covariance matrices of the two multivariate margins. This result also helps us revisit the RV coefficient by proposing a sharper normalisation. The two coefficients based on the quasi-Gaussian approach can be estimated easily via the empirical covariance matrix. The estimators are asymptotically normal and their asymptotic variances are explicit functions of the covariance matrix, which can thus be estimated consistently too. The results extend to the Gaussian copula case, in which case the estimators are rank-based. The results are illustrated through theoretical examples, Monte Carlo simulations, and a case study involving electroencephalography data.
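
For centred Gaussian distributions, the 2-Wasserstein distance has a well-known closed form in terms of the covariance matrices, which is the kind of identity that makes covariance-based coefficients tractable. The display below states this standard formula for covariance matrices $A$ and $B$ (symbols assumed here); the exact definition of the coefficients proposed in the talk is not reproduced.

$$
W_2^2\big(N(0,A),\,N(0,B)\big) = \operatorname{tr}(A) + \operatorname{tr}(B) - 2\operatorname{tr}\Big(\big(A^{1/2} B A^{1/2}\big)^{1/2}\Big).
$$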

### Transportation duality and reverse functional inequalities for Markov kernels

Nathaniel Eldredge (University of Northern Colorado)

Functional inequalities for a Markov semigroup $P_t$, which may express its "smoothing" properties, can also be studied in terms of the dual action of $P_t$ on the space of probability measures. These can give rise to "contraction" inequalities in terms of various distances between probability measures, such as the Wasserstein or Hellinger distances. I will discuss results for the reverse Poincaré and reverse log Sobolev inequalities, which turn out to have dual formulations to which they are actually equivalent. Applications to Markov processes include rates of convergence to equilibrium, smoothness of transition densities, and quasi-invariance properties.

### Q&A for Contributed Session 12

This talk does not have an abstract.

###### Session Chair

Yeonwoo Rho (Michigan Technological University)

Contributed 27

## Machine Learning / Structural Equation

Conference
10:30 PM — 11:00 PM KST
Local
Jul 21 Wed, 9:30 AM — 10:00 AM EDT

### Replicability of statistical findings under distributional shift

Suyash Gupta (Stanford University)

Common statistical measures of uncertainty like p-values and confidence intervals quantify the uncertainty due to sampling, i.e. the uncertainty due to not observing the full population. In practice, populations change between locations and across time. This makes it difficult to gather knowledge that replicates across data sets. We propose a measure of uncertainty that quantifies the distributional uncertainty of a statistical estimand, that is, the sensitivity of the parameter under general distributional perturbations within a Kullback-Leibler divergence ball. We also propose measures to estimate the stability of estimators with respect to directional or variable-specific shifts. The proposed measures would help judge whether a statistical finding is replicable across data sets in the presence of distributional shifts. Further, we introduce a transfer learning technique that allows estimating statistical parameters under shifted distributions if only summary statistics about the new distribution are available. We evaluate the performance of the proposed measures in experiments and show that they can elucidate the replicability of statistical findings with respect to distributional shifts and give more accurate estimates of parameters under shifted distributions.

### Selection of graphical continuous Lyapunov models with Lasso

Philipp Dettling (Technical University of Munich)

In some applications, multivariate data may be thought of as cross-sectional observations of temporal processes. The recently proposed graphical continuous Lyapunov models take this perspective in the context of a multi-dimensional Ornstein-Uhlenbeck process in equilibrium. Under a stability assumption, the equilibrium covariance matrix is determined by the continuous Lyapunov equation. Given a sample covariance matrix, a very natural approach to model selection is to obtain sparse solutions to the Lyapunov equation by means of $\ell_1$-regularization. We apply the primal-dual witness technique to give probabilistic guarantees for successful support recovery in this approach. The key assumption in this guarantee is an irrepresentability condition. As we demonstrate, the irrepresentability condition may be violated in subtle ways, particularly for models with feedback loops.
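
For orientation, the setup can be written out as follows. The symbols $M$ (drift matrix), $D$ (diffusion matrix), $C = DD^\top$, $\hat\Sigma$ (sample covariance) and $\lambda$ (penalty level) are notation assumed here, and the Lasso objective shown is one natural form of the $\ell_1$-regularised problem, not necessarily the exact one used in the talk.

$$
dX_t = M X_t\,dt + D\,dW_t, \qquad M\Sigma + \Sigma M^\top + C = 0,
$$

$$
\hat M \in \arg\min_{M}\ \tfrac{1}{2}\,\big\| M\hat\Sigma + \hat\Sigma M^\top + C \big\|_F^2 + \lambda\,\|M\|_1 .
$$

Stability here means that all eigenvalues of $M$ have negative real part, in which case the Lyapunov equation has a unique solution $\Sigma$.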

### Identifiability of linear structural equation models with homoscedastic errors using algebraic matroids

Jun Wu (Technical University of Munich)

We consider structural equation models (SEMs), in which every variable is a function of a subset of the other variables and a stochastic error. Each such SEM is naturally associated with a directed graph describing the relationships between variables. For the case of homoscedastic errors, recent work has proposed methods for inferring the graph from observational data under the assumption that the graph is acyclic (i.e., the SEM is recursive). In this work we study the setting of homoscedastic errors but allow the graph to be cyclic (i.e., the SEM to be non-recursive). Using an algebraic approach that compares matroids derived from the parameterizations of the models, we derive sufficient conditions for two simple directed graphs generating different distributions generically. Based on these conditions, we exhibit subclasses of graphs that allow for directed cycles, yet are generically identifiable. Our study is supplemented by computational experiments that provide a full classification of models given by simple graphs with up to 6 nodes.

### Convergence of stochastic gradient descent for Lojasiewicz-landscapes

Sebastian Kassing (Westfälische Wilhelms-Universität Münster)

In this talk we discuss almost sure convergence of Stochastic Gradient Descent (SGD) $(X_n)_{n \in \N}$ and Stochastic Gradient Flow (SGF) $(X_t)_{t \ge 0}$ for a given target function $F$. First, we give a simple proof for almost sure convergence of the target value $(F(X_n))$ (resp. $(F(X_t))$) assuming that $F$ admits a locally Hölder-continuous gradient $f=DF$. This result entails convergence of the iterates $(X_n)$ (resp. $(X_t)$) in the case where $F$ does not possess a continuum of critical points. In a general non-convex setting with $F$ possibly containing a rich set of critical points, convergence of the process itself is sometimes taken for granted, but actually is a non-trivial issue as there are solutions to the gradient flow ODE for $C^\infty$ loss functions that stay in a compact set but do not converge. Using the Lojasiewicz-inequality we derive bounds on the step-sizes and the size of the perturbation in order to guarantee convergence of $(X_n)$ (resp. $(X_t)$) for analytic target functions. Also, we derive the convergence rate under the assumption that the loss function satisfies a particular Lojasiewicz-inequality. Last, we compare the results for SGD and SGF and discuss optimality of the assumptions.
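
As a toy illustration of SGD on an analytic function with a continuum of critical points, the sketch below runs noisy gradient steps on $F(x, y) = (x^2 + y^2 - 1)^2$, whose minimisers form the unit circle. The target function, the step sizes $0.05 / n^{0.75}$, the Gaussian noise level and all function names are assumptions chosen for the example; they are not the conditions derived in the talk.

```python
import math
import random


def grad_F(x, y):
    """Gradient of F(x, y) = (x^2 + y^2 - 1)^2; every point of the unit circle is critical."""
    s = 4.0 * (x * x + y * y - 1.0)
    return s * x, s * y


def sgd(x, y, n_steps=5000, noise_std=0.5, seed=0):
    """Run SGD with decaying steps and additive Gaussian gradient noise."""
    rng = random.Random(seed)
    for n in range(1, n_steps + 1):
        step = 0.05 / n ** 0.75  # assumed schedule: sum diverges, sum of squares converges
        gx, gy = grad_F(x, y)
        x -= step * (gx + noise_std * rng.gauss(0.0, 1.0))
        y -= step * (gy + noise_std * rng.gauss(0.0, 1.0))
    return x, y


if __name__ == "__main__":
    x, y = sgd(1.2, 0.4)
    print("final point:", (x, y))
    print("final radius:", math.hypot(x, y))
```

The iterates settle near a single point of the circle. Which point is reached depends on the noise realisation, which is why convergence of the iterates, and not only of the values $F(X_n)$, is the delicate part.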

### Q&A for Contributed Session 27

This talk does not have an abstract.

###### Session Chair

Yoonsuh Jung (Korea University)