10th World Congress in Probability and Statistics

Poster Session

Poster II-1

Poster Session II-1

11:30 AM — 12:00 PM KST
Jul 20 Tue, 10:30 PM — 11:00 PM EDT

Nonconstant error variance in generalized propensity score model

Doyoung Kim (Sungkyunkwan University)

In observational studies, the most salient challenge is adjusting for confounders to mimic a randomized experiment. In settings with more than two treatment levels, several generalized propensity score (GPS) models have been proposed to balance covariates among treatment groups. These models assume parametric forms for the treatment variable distribution, typically with a constant variance assumption. In the presence of heteroskedasticity, the constant variance assumption may distort existing propensity score methods and the estimated causal effect of interest. In this paper, we propose a novel GPS method that handles non-constant variance in the treatment model by extending Xiao et al. (2020) with the weighted least squares method. We conduct a set of simulation studies and show that the proposed method outperforms existing methods in terms of covariate balance and bias in causal effect estimates.

Causal mediation analysis with multiple mediators of general structures

Youngho Bae (Sungkyunkwan University)

In assessing causal mediation effects, a challenge is that there can be more than one mediator on the pathways from treatment to outcome. More precisely, we do not know exactly how many mediators are on the causal path or how they relate to each other. A few approaches have been proposed to estimate direct and indirect effects in the presence of two causally independent or dependent mediators. However, those methods cannot be generalized to settings with more than two mediators where causally independent and dependent mediators coexist. We propose a novel approach to identify direct and indirect effects under a general situation of multiple mediators: two causally dependent mediators (V,W) and one causally independent mediator (M). With our proposed sequential ignorability assumption, the overall treatment effect can be decomposed into direct and mediator-specific indirect effects. A sensitivity analysis strategy is developed for testing the proposed identifying assumptions. This method can be applied to pollution data: we may use it to estimate the effect of a particular emission control technology installed on power plants on ambient pollution, where power plant emissions are potential mediators.

A fuzzy clustering ensemble based Mapper algorithm

SungJin Kang (Chung-Ang University)

Mapper is a popular topological data analysis method for analyzing the structure of complex high-dimensional datasets. Since the Mapper algorithm can be applied to clustering and feature selection with visualization, it is used in various fields such as biology and chemistry. However, some resolution parameters must be chosen before applying the Mapper algorithm, and the results are sensitive to these selections. In this paper, we focus on the selection of two resolution parameters: the number of intervals and the overlapping percentage. We propose a new parameter selection method for Mapper based on an ensemble technique. We generate multiple Mapper results under various parameters and apply the fuzzy clustering ensemble method to combine the results. Three real datasets are used to evaluate Mapper algorithms, including the proposed one, and the results demonstrate the superiority of the proposed ensemble Mapper method.
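
As an illustration of the two resolution parameters named above, the following minimal sketch builds an overlapping interval cover of a filter range and enumerates a grid of parameter pairs, such as one might feed to an ensemble of Mapper runs. The function names `interval_cover` and `parameter_grid` are hypothetical, and the overlap convention (fraction of each interval shared with its neighbour) is an assumption; Mapper implementations differ on this, and the poster's own definition is not stated in the abstract.

```python
def interval_cover(lo, hi, n_intervals, overlap):
    """Return n_intervals equal-length intervals that cover [lo, hi].

    overlap is the fraction of each interval shared with its neighbour
    (0 <= overlap < 1). This convention is an assumption for illustration.
    """
    if n_intervals < 1 or not 0 <= overlap < 1:
        raise ValueError("need n_intervals >= 1 and 0 <= overlap < 1")
    # Choose the length so that the last interval ends exactly at hi.
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]


def parameter_grid(interval_counts, overlaps):
    """Enumerate (n_intervals, overlap) pairs, one per Mapper run."""
    return [(n, p) for n in interval_counts for p in overlaps]


if __name__ == "__main__":
    for n, p in parameter_grid([3, 5], [0.25, 0.5]):
        print(n, p, interval_cover(0.0, 1.0, n, p))
```

Each parameter pair yields a different cover and hence a different Mapper graph, which is why the results are sensitive to the choice.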

Analysis of the association between suicide attempts and meteorological factors

Seunghyeon Kim (Chonnam National University)

Several studies indicate an association between suicide and meteorological factors; in particular, an increase in ambient temperature increases the risk of suicide. Although suicide attempts are highly likely to lead to suicide in the future, the relationship between suicide attempts and meteorological factors has received little research attention. We evaluated the association between suicide attempts and meteorological factors and examined gender and age differences. Method: We studied 30,012 people who attempted suicide and were hospitalized in the emergency rooms of medical institutions located in Seoul from January 1, 2014, to December 31, 2018. This information was provided by the National Emergency Department Information System. Seven meteorological factors were studied: daily lowest temperature, highest temperature, average temperature, daily temperature difference, average relative humidity, sunshine duration, and average cloud cover in Seoul during the same period. Meteorological factors were categorized, and the daily Age-standardized Suicide Attempt rate (per 100,000) (ASDAR) was defined for each category. Subgroup analysis by gender and age was performed to explore the association between meteorological factors and suicide attempts. From 2014 to 2018, the ASDAR was 61.3. The ASDAR was 69.3 for women and 52.8 for men, and by age, suicide attempts were highest among people in their 20s. Among the seven meteorological factors, suicide attempts increased as the lowest temperature, the highest temperature, the average temperature, and the relative humidity increased. Both genders showed this increase, and the same trend held in all age groups except for women in their 20s. We found that the risk of suicide attempts increases as temperature and relative humidity increase. These results suggest that exposure to high temperatures can be a factor that induces suicide attempts.

Spectral clustering with the Wasserstein distance and its application

SangHun Jeong (Pusan National University)

Advances in modern automatic devices make it possible to collect a massive number of samples from the population of each individual subject. Although this development gives us access to the entire distributional structure of each subject's population, traditional approaches tend to focus on detecting local features to recognize patterns in the data. In this project, we consider the pattern recognition problem of classifying subject-specific distributions into a few categories after estimating them. The suggested approach is a three-stage procedure consisting of probability density estimation, dissimilarity computation, and clustering. Specifically, in the first stage we use the kernel density estimator for each subject-specific distribution. Then, we use the Wasserstein distance, computed through the optimal transport map, to measure the dissimilarity between these distributions. Finally, we use this dissimilarity measure to construct the graph Laplacian and conduct spectral clustering, which handles distributions that lie not in a Euclidean space but in a nonlinear space. We will demonstrate the benefit of spectral clustering with the Wasserstein distance through simulation studies and by applying the suggested method to real data.
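
The dissimilarity and graph-construction stages can be sketched as follows for one-dimensional data. This is a simplified illustration under stated assumptions, not the poster's implementation: it skips the kernel density estimation stage and works directly on equal-size empirical samples, for which the optimal transport map simply matches sorted values. The Gaussian affinity, the bandwidth `sigma`, and all function names are assumptions introduced here.

```python
import math


def wasserstein_1d(xs, ys):
    """1-Wasserstein distance between two equal-size samples on the real line.

    In one dimension the optimal transport map is monotone, so the distance
    is the mean absolute difference between the sorted samples.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def affinity_matrix(samples, sigma):
    """Gaussian affinities exp(-d^2 / (2 sigma^2)) from pairwise distances."""
    n = len(samples)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = wasserstein_1d(samples[i], samples[j])
            w[i][j] = w[j][i] = math.exp(-d * d / (2 * sigma * sigma))
    return w


def graph_laplacian(w):
    """Unnormalized graph Laplacian L = D - W (the diagonal of W is zero)."""
    n = len(w)
    return [[(sum(w[i]) if i == j else 0.0) - w[i][j] for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    subjects = [[0.0, 1.0, 2.0], [0.1, 1.1, 2.1], [5.0, 6.0, 7.0]]
    for row in graph_laplacian(affinity_matrix(subjects, sigma=1.0)):
        print([round(v, 3) for v in row])
```

Spectral clustering would then cluster the subjects using the eigenvectors of this Laplacian associated with its smallest eigenvalues; that step needs an eigensolver and is omitted here.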

Robust covariance estimation for partially observed functional data

Hyunsung Kim (Chung-Ang University)

In recent years, applications have emerged that produce partially observed functional data, where each trajectory is collected over individual-specific subinterval(s) within the whole domain of interest. Robustness to atypical partially observed curves is a practical concern in such applications, especially in the dimension reduction step through functional principal component analysis (FPCA). Existing studies implement FPCA by applying smoothing techniques to estimate the mean and covariance functions under an irregular functional data structure; however, this estimation is easily affected by outlying curves with heavy-tailed noise or spikes. In this study, we investigate a robust method for covariance estimation using a bounded loss function, which enables us to obtain robust functional principal components for partially observed functional data. Using the functional principal component scores, we reconstruct the missing parts of trajectories. Numerical experiments show that our method provides stable and robust estimation when the data contain atypical curves.

Fast Bayesian functional regression for non-Gaussian spatial data

Yeo Jin Jung (Yonsei University)

Functional generalized linear models (FGLM) have been widely used to study the relations between non-Gaussian responses and functional covariates. However, most existing works assume independence among observations and therefore have limited applicability to correlated data. A particularly important example is functional data with spatial correlation, where we observe functions over spatial domains, such as the age population curve or temperature curve at each areal unit. In this paper, we extend FGLM by incorporating spatial random effects. However, such models pose computational and inferential challenges. The high-dimensional spatial random effects cause slow mixing of Markov chain Monte Carlo (MCMC) algorithms. Furthermore, spatial confounding can lead to bias in parameter estimates and inflate their variances. To address these issues, we propose an efficient Bayesian method using a sparse reparameterization of high-dimensional random effects. Furthermore, we study an often-overlooked challenge in functional spatial regression: practical issues in obtaining credible bands of functional parameters and assessing whether they provide nominal coverage. We apply our methods to simulated and real data examples, including malaria incidence data and US COVID-19 data. The proposed method is fast while providing accurate functional estimates.

Poster II-2

Poster Session II-2

10:30 PM — 11:00 PM KST
Jul 21 Wed, 9:30 AM — 10:00 AM EDT

Busemann process and semi-infinite geodesics in Brownian last-passage percolation

Evan Sorensen (University of Wisconsin-Madison)

We prove the existence of semi-infinite geodesics for Brownian last-passage percolation (BLPP). Specifically, on a single event of probability one, there exist semi-infinite geodesics, started from every space-time point and traveling in every asymptotic direction. Properties of these geodesics include uniqueness for a fixed initial point and direction, non-uniqueness for fixed direction but random initial points, and coalescence of all geodesics traveling in a common, fixed direction. The semi-infinite geodesics are constructed from Busemann functions, whose existence was proved for fixed initial points and directions by Alberts, Rassoul-Agha, and Simper. We extend their result to a global process of Busemann functions and derive the joint distribution of Busemann functions for varying directions. From this joint distribution, we prove results about the geometry of the semi-infinite geodesics. More specifically, there exists a Hausdorff dimension 1/2 set of initial points, and to each point an associated direction, such that there are two semi-infinite geodesics in that direction whose only shared point is the initial point. Joint work with Timo Seppäläinen.

Application of kernel mean embeddings to functional data

George Wynne (Imperial College London)

Kernel mean embeddings (KMEs) have enjoyed wide success in statistical machine learning over the past fifteen years. They offer a non-parametric method of reasoning with probability measures by mapping measures into a reproducing kernel Hilbert space. Much of the existing theory and practice has revolved around Euclidean data, whereas functional data has received very little investigation. Likewise, in functional data analysis (FDA) the technique of KMEs has not been explored. This work proposes to bridge this gap in theory and practice. KMEs offer an alternative paradigm to the common practice in FDA of projecting data to finite dimensions. The KME framework can handle infinite-dimensional input spaces, offers an elegant theory, and leverages the spectral structure of functional data. Empirically, KMEs provide competitive performance against existing functional two-sample and goodness-of-fit tests. Finally, we discuss connections to empirical characteristic function based testing and functional depth techniques currently used in FDA.
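
To make the two-sample use of KMEs concrete, here is a minimal sketch of the biased (V-statistic) estimate of the squared maximum mean discrepancy between two samples of curves. It rests on assumptions not taken from the abstract: each curve is stored as values on a common uniform grid over [0, 1], the squared L2 distance is approximated by an average over that grid, and a Gaussian kernel with an arbitrary `bandwidth` is used. All function names are illustrative.

```python
import math


def sq_l2(f, g):
    """Approximate squared L2 distance between two curves on a common
    uniform grid over [0, 1] (simple average of squared differences)."""
    return sum((a - b) ** 2 for a, b in zip(f, g)) / len(f)


def gaussian_kernel(f, g, bandwidth=1.0):
    """Gaussian kernel on curves, built from the approximate L2 distance."""
    return math.exp(-sq_l2(f, g) / (2 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y)."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of the squared MMD, i.e. the squared RKHS distance
    between the empirical kernel mean embeddings of xs and ys."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2 * mean_kernel(xs, ys, bandwidth))


if __name__ == "__main__":
    grid = [t / 20 for t in range(21)]
    xs = [[math.sin(2 * math.pi * t) + 0.1 * k for t in grid] for k in range(3)]
    ys = [[math.sin(2 * math.pi * t) + 1.0 + 0.1 * k for t in grid] for k in range(3)]
    print(mmd_squared(xs, xs), mmd_squared(xs, ys))
```

A two-sample test would compare this statistic against a null distribution, for example one obtained by permuting the sample labels; that calibration step is left out.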

SIR-based examination of the policy effects on the COVID-19 spread in U.S.

David Han (The University of Texas at San Antonio)

Since the global outbreak of the novel COVID-19, many research groups have studied the epidemiology of the virus to produce short-term forecasts and to formulate effective disease containment and mitigation strategies. The major challenge lies in the proper assessment of epidemiological parameters over time and of how they are modulated by the effect of any publicly announced interventions. Here we attempt to examine and quantify the effects of various (legal) policies/orders in place to mandate social distancing and to flatten the curve in each of the U.S. states. Through Bayesian inference on stochastic SIR models of the virus spread, the effectiveness of each policy in reducing the magnitude of the growth rate of new infections is investigated statistically. This will inform the public and policymakers, and help them understand the most effective actions to fight against the current and future pandemics. It will help policymakers respond more rapidly (select, tighten, and/or loosen appropriate measures) to stop/mitigate the pandemic early on.
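
For readers unfamiliar with the SIR framework, the sketch below shows how a lower transmission rate flattens the infection curve. It is deliberately simpler than the approach in the abstract: it uses a deterministic SIR model in population proportions with a forward Euler step, not a stochastic model with Bayesian inference, and the parameter values (`beta`, `gamma`, the step size) are illustrative assumptions, not estimates from data.

```python
def simulate_sir(beta, gamma, s0, i0, r0=0.0, dt=0.1, steps=2000):
    """Forward Euler integration of the deterministic SIR equations
    dS/dt = -beta*S*I, dI/dt = beta*S*I - gamma*I, dR/dt = gamma*I,
    with S, I, R expressed as proportions of the population."""
    s, i, r = s0, i0, r0
    path = [(s, i, r)]
    for _ in range(steps):
        new_infections = beta * s * i * dt
        new_recoveries = gamma * i * dt
        s, i, r = s - new_infections, i + new_infections - new_recoveries, r + new_recoveries
        path.append((s, i, r))
    return path


def peak_infected(path):
    """Largest infected proportion reached along the trajectory."""
    return max(i for _, i, _ in path)


if __name__ == "__main__":
    # Hypothetical scenario: a policy halves the transmission rate beta.
    baseline = simulate_sir(beta=0.3, gamma=0.1, s0=0.99, i0=0.01)
    with_policy = simulate_sir(beta=0.15, gamma=0.1, s0=0.99, i0=0.01)
    print(peak_infected(baseline), peak_infected(with_policy))
```

In this toy setting a policy effect enters as a change in `beta`; estimating such changes from case data, with uncertainty, is what the Bayesian analysis is for.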

Cross-validation confidence intervals for test error

Alexandre Bayle (Harvard University)

This work develops central limit theorems for cross-validation and consistent estimators of its asymptotic variance under weak stability conditions on the learning algorithm. Together, these results provide practical, asymptotically-exact confidence intervals for k-fold test error and valid, powerful hypothesis tests of whether one learning algorithm has smaller k-fold test error than another. These results are also the first of their kind for the popular choice of leave-one-out cross-validation. In our real-data experiments with diverse learning algorithms, the resulting intervals and tests outperform the most popular alternative methods from the literature.
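
A normal-approximation interval of the kind a central limit theorem justifies can be sketched as follows. The input is the list of per-example losses on each held-out fold. The variance estimate used here, the plain sample variance of the pooled losses, is an assumption made for illustration and may differ from the estimators analyzed in the poster; the function name `cv_confidence_interval` is likewise hypothetical.

```python
from statistics import NormalDist, mean, stdev


def cv_confidence_interval(fold_losses, level=0.95):
    """Normal-approximation confidence interval for k-fold test error.

    fold_losses: one list per fold, holding the per-example losses
    measured on that fold when it was held out.
    """
    pooled = [loss for fold in fold_losses for loss in fold]
    n = len(pooled)
    if n < 2:
        raise ValueError("need at least two held-out losses")
    center = mean(pooled)  # the usual k-fold cross-validation error estimate
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(pooled) / n ** 0.5
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Hypothetical 0-1 losses from a 2-fold run.
    folds = [[0, 1, 0, 0, 1], [0, 0, 1, 0, 0]]
    print(cv_confidence_interval(folds))
```

Comparing two learning algorithms works the same way on the per-example differences in loss: an interval that excludes zero suggests one algorithm has smaller k-fold test error.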

Comparison of quantile regression curves under different settings with censored data

Lorenzo Tedesco (Katholieke Universiteit Leuven)

The poster presents a new nonparametric test for the equality of conditional quantile curves when the outcome of interest, typically a duration, is subject to right censoring. The test is based on quantile regression estimation models and does not rely on distributional assumptions. Moreover, the proposed method holds for both dependent and independent samples. Consistency of the test and asymptotic results are also provided, together with a bootstrap procedure intended to avoid density estimation in the case of small sample sizes. The poster also includes a comparison with other methods and examples of application in both the dependent and independent settings.
