10th World Congress in Probability and Statistics

Organized Contributed Session (live Q&A at Track 3, 10:30 PM KST)

Organized Contributed Session 14

Multivariate and Object-Oriented Data Analysis (Organizer: Cheolwoo Park)

Conference time: 10:30 PM to 11:00 PM KST
Local time: Mon, Jul 19, 9:30 AM to 10:00 AM EDT

Bayesian spatial binary regression for label fusion in structural neuroimaging

Andrew Brown (Clemson University)

Alzheimer's disease is a neurodegenerative condition that accelerates cognitive decline relative to normal aging. A better understanding of early disease mechanisms in the brain is of critical scientific importance for developing effective, targeted therapies. The volume of the hippocampus can be used as an aid to diagnosis and disease monitoring. Measuring this volume via neuroimaging is difficult because each hippocampus must be either manually identified or automatically delineated, a task referred to as segmentation. Automatic hippocampal segmentation often involves mapping a previously manually segmented image to a new brain image and propagating its labels to estimate where each hippocampus lies in the new image. A more recent approach propagates labels from multiple manually segmented atlases and combines the results, a process known as label fusion. To date, most label fusion algorithms either use voting procedures or impose prior structure and then find the maximum a posteriori estimate through optimization. We propose a fully Bayesian spatial regression model for label fusion that directly incorporates covariate information while making the entire posterior distribution accessible. Our results suggest that incorporating tissue classification (gray matter, white matter, etc.) into label fusion can greatly improve segmentation when relatively homogeneous, healthy brains are used as atlases for diseased brains. The fully Bayesian approach also quantifies the associated uncertainty, and we show that this information can be leveraged to detect significant differences between healthy and diseased populations that would otherwise be missed.
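For orientation, the voting baseline that the abstract contrasts with can be sketched in a few lines of Python, next to a per-voxel Beta-Bernoulli posterior that shows, in the simplest possible setting, what a posterior adds over a point estimate. This is a hedged illustration, not the authors' model: it treats voxels as independent, uses no spatial structure or tissue-class covariates, and the function names and the Beta(1, 1) prior are assumptions.

```python
import numpy as np

# Illustrative sketch only: names and the Beta(a, b) prior are assumptions,
# and voxels are treated independently (no spatial model, no covariates).


def majority_vote(atlas_labels):
    """Voting-based label fusion.

    atlas_labels: array of shape (n_atlases, *image_shape) with entries in {0, 1},
    one propagated label map per atlas. A voxel gets label 1 (hippocampus) when
    more than half of the atlases say so.
    """
    votes = atlas_labels.mean(axis=0)
    return (votes > 0.5).astype(int)


def beta_posterior_mean_and_var(atlas_labels, a=1.0, b=1.0):
    """Per-voxel Beta-Bernoulli posterior for P(voxel is hippocampus).

    With a Beta(a, b) prior and k of n atlases voting 1, the posterior is
    Beta(a + k, b + n - k); its variance is a simple uncertainty measure.
    """
    n = atlas_labels.shape[0]
    k = atlas_labels.sum(axis=0)
    a_post, b_post = a + k, b + n - k
    total = a_post + b_post
    mean = a_post / total
    var = a_post * b_post / (total**2 * (total + 1))
    return mean, var
```

For example, with three atlases voting on four voxels as [[1, 1, 0, 0], [1, 0, 0, 1], [1, 1, 0, 0]], both voting and the posterior mean select the first two voxels, but the posterior variance is larger at the second voxel, where the atlases disagree.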

Convex clustering analysis for histogram-valued data

Cheolwoo Park (Korea Advanced Institute of Science and Technology (KAIST))

In recent years, there has been increased interest in symbolic data analysis, including exploratory analysis, supervised and unsupervised learning, and time series analysis. Traditional statistical approaches designed for single-valued data are not suitable, because they cannot incorporate the additional information about data structure that symbolic data carry; new techniques have therefore been proposed for symbolic data to bridge this gap. In this article, we develop a regularized convex clustering approach for grouping histogram-valued data. Convex clustering is a relaxation of hierarchical clustering methods in which penalizing the differences between prototype parameters drives the prototypes within each group to exactly the same value. We apply two different distance metrics to measure (dis)similarity between histograms. Various numerical examples show that the proposed method performs better than competing methods.
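The fusion mechanism described above can be sketched with ADMM in Python, minimizing 0.5 * sum_i ||x_i - u_i||^2 + lam * sum_{i<j} w_ij ||u_i - u_j|| over prototypes u_i. This is a minimal sketch under stated assumptions: each histogram is a vector of bin proportions on shared bins, dissimilarity is plain Euclidean distance (not necessarily either of the two metrics used in the talk), and the Gaussian-kernel weights, the ADMM step size nu, and all function names are illustrative choices.

```python
import itertools

import numpy as np

# Illustrative sketch: histograms as bin-proportion vectors with Euclidean
# distance; weights, step size and function names are assumptions.


def gaussian_weights(X, phi):
    """Pairwise weights w_ij = exp(-phi * ||x_i - x_j||^2), zero on the diagonal."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    W = np.exp(-phi * d2)
    np.fill_diagonal(W, 0.0)
    return W


def convex_cluster(X, lam, W, nu=1.0, n_iter=3000):
    """ADMM for min_U 0.5 * sum_i ||x_i - u_i||^2 + lam * sum_{i<j} w_ij ||u_i - u_j||.

    X holds one histogram per row. Returns the fitted prototypes U.
    """
    n, p = X.shape
    edges = [(i, j) for i, j in itertools.combinations(range(n), 2) if W[i, j] > 0]
    D = np.zeros((len(edges), n))  # incidence matrix: (D @ U)[e] = u_i - u_j
    for e, (i, j) in enumerate(edges):
        D[e, i], D[e, j] = 1.0, -1.0
    w = np.array([W[i, j] for i, j in edges])[:, None]

    A = np.eye(n) + nu * D.T @ D  # fixed system matrix for the U-update
    U, V, Z = X.copy(), D @ X, np.zeros((len(edges), p))
    for _ in range(n_iter):
        U = np.linalg.solve(A, X + nu * D.T @ (V + Z))
        R = D @ U - Z
        norms = np.linalg.norm(R, axis=1, keepdims=True)
        # Group soft-thresholding (prox of (lam * w_e / nu) * ||.||) can set an
        # edge difference exactly to zero, which is what fuses two prototypes.
        V = np.maximum(0.0, 1.0 - lam * w / (nu * np.maximum(norms, 1e-12))) * R
        Z = Z + V - D @ U  # scaled dual update
    return U


def cluster_labels(U, tol=1e-3):
    """Give the same label to prototypes that coincide up to tol."""
    labels = np.full(U.shape[0], -1)
    for i in range(U.shape[0]):
        if labels[i] < 0:
            close = np.linalg.norm(U - U[i], axis=1) < tol
            labels[close & (labels < 0)] = labels.max() + 1
    return labels
```

Setting lam = 0 returns every histogram as its own cluster; increasing lam fuses prototypes until, for large enough values, all of them coincide, which traces out the clustering hierarchy.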

A geometric mean for multivariate functional data

Juhyun Park (ENSIIE)

Curves are routinely analyzed with tools from functional data analysis. However, extending these tools to multidimensional curves poses a new challenge, because such curves have inherent geometric features that are difficult to capture with classical approaches that rely on linear approximations. We propose an alternative notion of mean that reflects the shape variation of the curves. Based on a geometric representation of the curves through the Frenet-Serret ordinary differential equations, we introduce a new definition of mean curvature and mean shape through the mean ordinary differential equation. We formulate the estimation problem as a penalized regression and develop an efficient algorithm. We demonstrate our approach on both simulated data and a real data example.
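The Frenet-Serret representation rests on two invariants of a space curve, curvature and torsion. The sketch below estimates both from a sampled curve by finite differences, assuming a smooth, densely sampled 3-D curve with nonvanishing curvature; it shows only these building blocks, not the mean ordinary differential equation or the penalized regression estimator from the talk, and the function name is an assumption.

```python
import numpy as np

# Illustrative sketch: finite-difference estimates of the Frenet-Serret
# invariants; the function name and sampling assumptions are hypothetical.


def curvature_torsion(r, t):
    """Estimate curvature and torsion of a sampled 3-D curve r(t).

    r: array of shape (n, 3); t: increasing parameter values of shape (n,).
    Uses kappa = |r' x r''| / |r'|^3 and tau = ((r' x r'') . r''') / |r' x r''|^2,
    which hold for any regular parametrization (not only arc length).
    Finite differences are least accurate near the endpoints.
    """
    d1 = np.gradient(r, t, axis=0)   # r'
    d2 = np.gradient(d1, t, axis=0)  # r''
    d3 = np.gradient(d2, t, axis=0)  # r'''
    c = np.cross(d1, d2)
    c_norm = np.linalg.norm(c, axis=1)
    kappa = c_norm / np.linalg.norm(d1, axis=1) ** 3
    tau = np.einsum("ij,ij->i", c, d3) / c_norm**2
    return kappa, tau
```

On the helix r(t) = (cos t, sin t, t), both estimates are close to the exact value 1/2 away from the endpoints.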

A confidence region for the elastic shape mean of planar curves

Justin Strait (University of Georgia)

Visualization is an integral component of statistical shape analysis, where the goal is to perform inference on the shapes of objects. To identify shape variation, one typically performs principal component analysis (PCA) to decompose total variation into orthogonal directions of variation. Shapes often exhibit multiple sources of variation, so visualizing them with PCA requires several plots, one per mode of variation, which makes it hard to understand how these components work together. In this talk, I will discuss a constructive confidence region associated with the elastic shape mean, with significant emphasis on producing a succinct visual summary of this region. Elastic shape representations allow optimal matching of shape features, yielding more appropriate estimates of shape variation than other approaches in the shape analysis literature. The proposed region is demonstrated on simulated data, as well as on common shapes from the MPEG-7 dataset (popular in computer vision applications).
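Elastic shape analysis of planar curves commonly works with the square-root velocity function (SRVF), q(t) = β′(t)/√|β′(t)|, under which a particular elastic metric becomes the ordinary L2 metric. The sketch below computes an SRVF by finite differences; it assumes the SRVF framework and omits the optimization over rotations and reparametrizations needed for a true elastic shape distance, as well as the shape mean and confidence region themselves. Function names are hypothetical.

```python
import numpy as np

# Illustrative sketch of the SRVF building block only; no alignment over
# rotations or reparametrizations is performed. Names are hypothetical.


def srvf(beta, t):
    """Square-root velocity function q(t) = beta'(t) / sqrt(|beta'(t)|).

    beta: planar curve sampled as an (n, 2) array at parameter values t.
    A small floor on the speed avoids division by zero where beta' = 0.
    """
    d = np.gradient(beta, t, axis=0)
    speed = np.linalg.norm(d, axis=1, keepdims=True)
    return d / np.sqrt(np.maximum(speed, 1e-12))


def l2_norm_sq(q, t):
    """Squared L2 norm of q over the parameter interval (trapezoidal rule)."""
    f = np.sum(q**2, axis=1)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))


def unaligned_distance(q1, q2, t):
    """L2 distance between SRVFs with NO rotation/reparametrization alignment."""
    return np.sqrt(l2_norm_sq(q1 - q2, t))
```

Because |q(t)|² = |β′(t)|, the squared L2 norm of q equals the length of the curve; for a circle of radius 2 it is close to 4π. The SRVF is also unchanged by translating the curve.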

Q&A for Organized Contributed Session 14

This talk does not have an abstract.

Session Chair

Cheolwoo Park (Korea Advanced Institute of Science and Technology (KAIST))

Organized Contributed Session 26

Recent Advances in Network Learning: Theory and Practice (Organizer: Kyoungjae Lee)

Conference time: 10:30 PM to 11:00 PM KST
Local time: Mon, Jul 19, 9:30 AM to 10:00 AM EDT

Scalable Bayesian high-dimensional local dependence learning

Kyoungjae Lee (Inha University)

In this work, we propose a scalable Bayesian procedure for learning the local dependence structure in a high-dimensional model where the variables possess a natural ordering. The ordering can be indexed by time, by the vicinity of spatial locations, and so on, with the natural assumption that variables far apart tend to be weakly correlated. Applications of such models abound in fields such as finance, genome association analysis and spatial modeling. We adopt a flexible framework under which each variable depends on its neighbors or predecessors, and the neighborhood size can vary across variables. It is of great interest to reveal this local dependence structure by estimating the covariance or precision matrix while consistently estimating the varying neighborhood size of each variable. The existing literature on banded covariance matrix estimation, which assumes a fixed bandwidth, cannot be adapted to this general setup. We employ the modified Cholesky decomposition of the precision matrix and design a flexible prior for this model through appropriate priors on the neighborhood sizes and Cholesky factors. We derive posterior contraction rates for the Cholesky factor that are nearly or exactly minimax optimal, and our procedure yields consistent estimates of the neighborhood sizes of all variables. Another appealing feature of our procedure is its scalability to models with large numbers of variables, owing to efficient posterior inference that does not resort to MCMC algorithms. We compare the procedure numerically with competing methods and apply it to several real datasets.
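The modified Cholesky parameterization with variable-specific neighborhood sizes can be made concrete with a population-level sketch: variable j is regressed on its k[j] immediate predecessors, x_j = Σ a_jl x_l + e_j with Var(e_j) = d_j, and the precision matrix is Ω = (I − A)ᵀ D⁻¹ (I − A). This shows only the parameterization; the Bayesian priors on neighborhood sizes and Cholesky factors and the posterior computation are not shown, and the function name and the convention that k[j] counts immediate predecessors are assumptions.

```python
import numpy as np

# Illustrative, population-level sketch of the modified Cholesky decomposition
# with varying neighborhood sizes; names and conventions are assumptions.


def banded_modified_cholesky(Sigma, k):
    """Return (Omega, A, d) with Omega = (I - A)^T diag(1/d) (I - A).

    Row j of A holds the coefficients of x_j regressed on its k[j]
    immediate predecessors; d[j] is the corresponding residual variance.
    """
    p = Sigma.shape[0]
    A = np.zeros((p, p))
    d = np.empty(p)
    for j in range(p):
        S = list(range(max(0, j - k[j]), j))  # neighborhood of variable j
        if S:
            a = np.linalg.solve(Sigma[np.ix_(S, S)], Sigma[S, j])
            A[j, S] = a
            d[j] = Sigma[j, j] - Sigma[j, S] @ a
        else:
            d[j] = Sigma[j, j]
    I_minus_A = np.eye(p) - A
    Omega = I_minus_A.T @ np.diag(1.0 / d) @ I_minus_A
    return Omega, A, d
```

For an AR(1) covariance with k = [0, 1, 1, 1, 1, 1], the result matches the inverse covariance; letting some variables use larger neighborhoods, such as k = [0, 1, 3, 3, 2, 1], still recovers it, since the extra coefficients are zero.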

Fast and flexible estimation of effective migration surfaces

Wooseok Ha (University of California at Berkeley)

An important feature often present in spatial population genetic data is “isolation-by-distance,” where genetic differentiation tends to increase as individuals become more geographically distant. Recently, Petkova et al. (2016) developed a statistical method called Estimating Effective Migration Surfaces (EEMS) for visualizing spatially heterogeneous isolation-by-distance on a geographic map. While EEMS is a powerful tool for depicting spatial population structure, it can suffer from slow runtimes. Here we develop a related method called Fast Estimation of Effective Migration Surfaces (FEEMS). FEEMS uses a Gaussian Markov random field in a penalized likelihood framework that allows efficient optimization and output of effective migration surfaces. Further, the efficient optimization makes it possible to infer migration parameters per edge of the graph, rather than per node (as in EEMS). In coalescent simulations, FEEMS accurately recovers effective migration surfaces with complex gene-flow histories, including anisotropic ones. Applied to population genetic data from North American gray wolves, FEEMS performs comparably to EEMS but obtains solutions orders of magnitude faster. Overall, FEEMS expands users' ability to quickly visualize and interpret spatial structure in their data.
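To see how per-edge parameters can shape isolation-by-distance on a graph, one can compute effective resistances from a weighted graph Laplacian; resistance distances of this kind underlie the expected dissimilarities in EEMS-style models. The sketch below assumes that edge weights play the role of migration rates and uses hypothetical helper names; it is not the FEEMS penalized-likelihood fit or its Gaussian Markov random field.

```python
import numpy as np

# Illustrative sketch: edge weights stand in for per-edge migration rates
# (an assumption); helper names are hypothetical.


def graph_laplacian(n_nodes, edges, weights):
    """Weighted graph Laplacian L = diag(W 1) - W built from per-edge weights."""
    W = np.zeros((n_nodes, n_nodes))
    for (i, j), w in zip(edges, weights):
        W[i, j] = W[j, i] = w
    return np.diag(W.sum(axis=1)) - W


def resistance_distances(L):
    """Effective resistance R_ij = L+_ii + L+_jj - 2 L+_ij, with L+ the pseudo-inverse."""
    Lp = np.linalg.pinv(L)
    diag = np.diag(Lp)
    return diag[:, None] + diag[None, :] - 2.0 * Lp
```

Halving the weight of one edge in a path acts like a partial barrier: on the path 0-1-2 with weights 1 and 0.5, the resistance between the end nodes is 1 + 2 = 3, whereas on a 4-cycle with unit weights, opposite nodes are joined by two parallel paths and their resistance is 1.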

Statistical inference for cluster trees

Jisu Kim (Inria)

A cluster tree provides a highly interpretable summary of a density function by representing the hierarchy of its high-density clusters. It is estimated by the empirical tree, which is the cluster tree constructed from a density estimator. This talk addresses how to quantify uncertainty by assessing the statistical significance of topological features of an empirical cluster tree. To do this, I propose methods to construct and summarize confidence sets for the unknown true cluster tree. I also show how to prune statistically insignificant features of the empirical tree, yielding interpretable and parsimonious cluster trees. Finally, I illustrate the proposed methods on a variety of example datasets.
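The empirical tree can be computed directly in one dimension: estimate a density, then take the connected components of its upper level sets {x : p̂(x) ≥ λ}; tracking how these components split as λ rises gives the empirical cluster tree. The sketch assumes a Gaussian kernel density estimate on an evenly spaced grid with a user-chosen bandwidth, uses hypothetical function names, and stops short of the confidence sets and pruning described above.

```python
import numpy as np

# Illustrative sketch of the empirical cluster tree's building block in 1-D;
# kernel, grid and bandwidth choices are assumptions.


def gaussian_kde_1d(sample, grid, bandwidth):
    """Gaussian kernel density estimate evaluated at each grid point."""
    z = (grid[:, None] - sample[None, :]) / bandwidth
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(sample) * bandwidth * np.sqrt(2 * np.pi))


def upper_level_components(density, grid, level):
    """Connected components (as intervals) of {x : density(x) >= level} on the grid."""
    components, start = [], None
    for i, above in enumerate(density >= level):
        if above and start is None:
            start = i  # a component begins
        elif not above and start is not None:
            components.append((grid[start], grid[i - 1]))
            start = None
    if start is not None:
        components.append((grid[start], grid[-1]))
    return components


def cluster_tree_profile(density, grid, levels):
    """Number of high-density clusters at each level (a coarse view of the tree)."""
    return [len(upper_level_components(density, grid, lv)) for lv in levels]
```

For a sample drawn from two well-separated normal components, the estimate has one component at level 0, two components at a moderate level where the modes separate, and none above its maximum.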

Autologistic network model on binary data for disease progression study

Yei Eun Shin (National Cancer Institute)

We propose an autologistic network model for binary spatiotemporal data to study the spreading patterns of disease. The proposed model identifies an underlying network, without pre-specifying neighborhoods based on proximity, whose effects can vary depending on the previous states. The model parameters are estimated by maximizing a bias-corrected penalized pseudolikelihood, which can be adapted to the generalized linear model (GLM) framework, and we show that the resulting estimators are asymptotically normal. We provide spatial-joint transition probabilities for predicting disease status in the next time interval. Simulation studies were conducted to evaluate the validity and performance of the proposed method. Examples are provided using data on amyotrophic lateral sclerosis (ALS) patients from the EMPOWER Study.
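Because each pseudolikelihood term is a logistic likelihood, the estimation criterion can be written as an L1-penalized GLM objective. The sketch below assumes a hypothetical parameterization in which the effect of location j's previous status on location i is B0[i, j] when i was disease-free at the previous time and B1[i, j] when it was diseased; it omits the bias correction and the optimizer, and all names are illustrative.

```python
import numpy as np

# Illustrative sketch: the (alpha, B0, B1) parameterization is a hypothetical
# stand-in for the talk's model; bias correction and fitting are not shown.


def linear_predictor(alpha, B0, B1, y_prev):
    """eta_i = alpha_i + sum_j y_prev_j * (B1[i, j] if y_prev_i == 1 else B0[i, j])."""
    B = np.where(y_prev[:, None] == 1, B1, B0)  # row i switches on node i's previous state
    return alpha + B @ y_prev


def penalized_neg_log_pl(alpha, B0, B1, Y, lam):
    """Negative log pseudolikelihood plus an L1 penalty on the network effects.

    Y: binary array of shape (T, n), the status of n locations at T time points.
    Each time step contributes n logistic log-likelihood terms (a GLM).
    """
    total = 0.0
    for t in range(1, Y.shape[0]):
        eta = linear_predictor(alpha, B0, B1, Y[t - 1])
        # log(1 + exp(eta)) - y * eta, computed stably with logaddexp
        total += np.sum(np.logaddexp(0.0, eta) - Y[t] * eta)
    return total + lam * (np.abs(B0).sum() + np.abs(B1).sum())
```

With all parameters at zero, every term equals log 2, so for 3 locations observed at 4 time points the objective is 9 log 2; any L1-penalized logistic regression solver could then be used to minimize it.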

Q&A for Organized Contributed Session 26

This talk does not have an abstract.

Session Chair

Kyoungjae Lee (Inha University)

