10th World Congress in Probability and Statistics

Poster Session

Poster Session III-1

11:30 AM — 12:00 PM KST
Jul 21 Wed, 10:30 PM — 11:00 PM EDT

A penalized matrix normal mixture model for clustering matrix data

Jinwon Heo (Chonnam National University)

With advances in technology, matrix data such as medical and industrial images have emerged in many practical fields. Such data are usually high-dimensional and are hard to cluster because of the intrinsic correlation structure among their rows and columns. Most approaches convert matrix data into multi-dimensional vectors and apply conventional clustering methods to them, and hence suffer from extreme high dimensionality as well as a lack of interpretability of the correlation structure among row and column variables. Gao et al. (2020) proposed a regularized mixture model for clustering matrix-valued data by imposing a sparsity structure on the mean signal of each cluster. We extend their approach by further regularizing the covariance to cope with the curse of dimensionality for large images. We propose a penalized matrix-normal mixture model with lasso-type penalty terms on both the mean and covariance matrices, and then develop an expectation-maximization algorithm to estimate the parameters. We apply the proposed method to simulated and real data sets, and confirm that its clustering accuracy improves on that of some conventional methods.
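For orientation, the matrix-normal density underlying such a mixture can be written out explicitly. The first display below is the standard density for a $p \times q$ observation; the second is only a sketch of what a lasso-penalized mixture objective could look like. The symbols $\lambda_1$, $\lambda_2$, $\pi_k$, and the choice to penalize the inverse covariance matrices are assumptions made for illustration, since the abstract does not state the exact form of the penalty.

```latex
% Matrix-normal density for a p x q observation X with mean M,
% row covariance U (p x p) and column covariance V (q x q).
f(X \mid M, U, V) =
  \frac{\exp\!\left(-\tfrac{1}{2}\,\operatorname{tr}\!\left[V^{-1}(X-M)^{\top}U^{-1}(X-M)\right]\right)}
       {(2\pi)^{pq/2}\,|V|^{p/2}\,|U|^{q/2}}

% Assumed generic form of a lasso-penalized K-component mixture objective
% (lambda_1, lambda_2 and the penalty on inverse covariances are hypothetical).
\ell_{\text{pen}}(\Theta) =
  \sum_{i=1}^{n}\log\sum_{k=1}^{K}\pi_k\, f(X_i \mid M_k, U_k, V_k)
  \;-\; \lambda_1\sum_{k=1}^{K}\lVert M_k\rVert_1
  \;-\; \lambda_2\sum_{k=1}^{K}\left(\lVert U_k^{-1}\rVert_1 + \lVert V_k^{-1}\rVert_1\right)
```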

Univariate and multivariate normality tests using an entropy-based transformation

Shahzad Munir (Xiamen University)

We introduce a new normality test that can be applied to univariate and multivariate IID or time series data. In the univariate case, the test is constructed by first applying a transformation based on the definition of entropy. The test requires only an estimate of the variance of the transformed data; it is therefore less sensitive to errors from estimating the kurtosis coefficient, yet it can still detect deviations in this higher-order moment. In the univariate case, we show that for a broad class of stationary processes the proposed test statistic asymptotically follows a standard normal distribution, and that no kernel smoothing is needed to estimate the asymptotic variance of the test consistently. The extension to the multivariate case is straightforward and offers an alternative for diagnostic testing of vector autoregressive models.

Geum river network data analysis via weighted PCA

Seeun Park (Seoul National University)

Various measurements of water quality are collected at monitoring sites spread throughout the river network. Monitoring this kind of dataset is critical for water quality evaluation and improvement, but the unique structure of the river network prevents PCA from achieving accurate results because of autocorrelation among variables. In the literature, Gallacher et al. (2017) introduced a weighted PCA that reflects the known spatiotemporal structure of the river network to adjust for the autocorrelation. This study aims to apply the weighted PCA method to Geum River network data in South Korea and to improve the method itself. The weighted PCA successfully identified certain patterns in the Geum River data that conventional PCA cannot detect. However, we believe that the weighted PCA method does not take into account inhomogeneity in the covariance structure of the data, which might lead to inaccurate results. In fact, inhomogeneous covariance structures are found in the Geum River data across regions and seasons. Our next step is therefore to improve the weighted PCA so that it can handle this inhomogeneous covariance structure.

Cauchy combination test with thresholding under arbitrary dependency structures

Junsik Kim (Seoul National University)

Combining individual p-values to aggregate sparse and weak effects is of substantial interest in large-scale data analysis. The individual p-values or test statistics are often correlated, although many p-value combination methods are developed under an i.i.d. assumption. The Cauchy combination test is a method for combining p-values under arbitrary dependence structures, but in practice its type I error increases as the correlation increases. In this paper, we propose a global test that extends the Cauchy combination test by thresholding arbitrarily dependent p-values. Under an arbitrary dependence structure, we show that the tail probability of the proposed statistic is asymptotically equivalent to that of the Cauchy distribution. In addition, we show that the power of the proposed test asymptotically achieves the optimal detection boundary under a strong sparsity condition. Extensive simulations show that the power of the proposed test is robust to the correlation coefficients and that the test is more powerful in sparse settings. As a case study, we apply the proposed test to a GWAS of inflammatory bowel disease (IBD).
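As background, the standard (unthresholded) Cauchy combination test that this abstract extends can be sketched in a few lines. This is not the proposed thresholded test, whose details the abstract does not give; the function name `cauchy_combination` and the default of equal weights are assumptions made for illustration.

```python
import math


def cauchy_combination(p_values, weights=None):
    """Combine p-values with the standard Cauchy combination statistic.

    Illustrative sketch only: this is the unthresholded baseline, not the
    thresholded test proposed in the abstract. Weights are assumed to be
    nonnegative and to sum to one (equal weights by default).
    """
    n = len(p_values)
    if n == 0:
        raise ValueError("at least one p-value is required")
    if weights is None:
        weights = [1.0 / n] * n
    # Each p-value is mapped to a standard Cauchy variate under the null.
    t = sum(w * math.tan((0.5 - p) * math.pi)
            for w, p in zip(weights, p_values))
    # Upper tail probability of the standard Cauchy distribution at t.
    return 0.5 - math.atan(t) / math.pi


combined = cauchy_combination([0.001, 0.8, 0.6])
print(combined)
```

A single very small p-value dominates the statistic, which is why the method is suited to sparse signals.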

Control charts for monitoring linear profiles in the detection of network intrusion

Daeun Kim (Dankook University)

This study considers the problem of network intrusion detection. Sklavounos et al. (2017) proposed using EWMA charts for crucial characteristics as a network intrusion detection method. This paper expands on this idea and attempts to detect network intrusions by monitoring functional relationships between multiple features rather than a single feature. We consider profiles in which the principal characteristic is functionally dependent on explanatory variables. Profile monitoring, which is widely used in calibration applications, is then used to verify the stability of the functional relationship over time. In particular, there has been much work on linear profiles. In this case, the stability of the profile is determined by monitoring statistics on the slope and intercept, so we can consider Shewhart control charts or multivariate control charts. Previous studies assume that the explanatory variable takes the same fixed values for each profile. To handle the network intrusion problem, the setting must therefore be extended to the case where the explanatory variable takes different observed values for each profile. In this regard, we will evaluate the robustness of the existing control charts and determine whether the extended control charts effectively detect network intrusion. We carry out a real-data analysis using the NSL-KDD data, which is popular for evaluating the performance of network intrusion detection algorithms.
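The idea of monitoring the slope and intercept of linear profiles can be illustrated with a minimal sketch. It fits a least-squares line to each profile (the explanatory values may differ between profiles) and flags new profiles whose slope or intercept falls outside Shewhart-style limits estimated from reference profiles. The function names, the use of separate charts for slope and intercept, and the multiplier `k=3` are assumptions for illustration, not the charts evaluated in the study.

```python
from statistics import mean, stdev


def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x for one profile."""
    xb, yb = mean(x), mean(y)
    sxx = sum((xi - xb) ** 2 for xi in x)
    sxy = sum((xi - xb) * (yi - yb) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return yb - slope * xb, slope


def shewhart_limits(values, k=3.0):
    """Center line plus/minus k sample standard deviations."""
    center, spread = mean(values), stdev(values)
    return center - k * spread, center + k * spread


def flag_profiles(reference, new, k=3.0):
    """Return True for each new profile whose intercept or slope is out of limits.

    `reference` and `new` are lists of (x, y) pairs; x may differ per profile.
    """
    fits = [fit_line(x, y) for x, y in reference]
    lo_a, hi_a = shewhart_limits([a for a, _ in fits], k)
    lo_b, hi_b = shewhart_limits([b for _, b in fits], k)
    flags = []
    for x, y in new:
        a, b = fit_line(x, y)
        in_control = lo_a <= a <= hi_a and lo_b <= b <= hi_b
        flags.append(not in_control)
    return flags
```

Charting the two estimates separately ignores their correlation; a multivariate chart on the pair, as the abstract mentions, would be the natural refinement.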

Benefits of international agreements as switching diffusions

Sheikh Shahnawaz (California State University)

We formulate a model to consider the dynamic stability of international agreements (IAs) such as those on disarmament, nuclear non-proliferation, the environment, or sovereign debt. An agreement is reached because all participants initially receive some benefit that is above a minimum threshold level, but the distribution of total benefits X (modeled as a stochastic process that solves a switching stochastic differential equation) varies after ratification. The agreement is sustained as long as participants receiving higher benefits transfer their surplus to those with slack, but this comes at a cost alpha, which is a homogeneous continuous-time Markov chain. Under certain assumptions on the uniqueness of the solution of our SDE and on alpha, we derive an optimal strategy that prolongs the life of the IA.

Estimation of Hilbertian varying coefficient models

Hyerim Hong (Seoul National University)

In this paper we discuss the estimation of a fairly general type of varying coefficient model. The model is for a response variable that takes values in a general Hilbert space and allows for various types of additive interaction terms in representing the effects of predictors. It also accommodates both continuous and discrete predictors. We develop a powerful technique for estimating this very general model. Our approach may be used in a variety of situations where one needs to analyze the relation between a set of predictors and a Hilbertian response. We prove the existence of the estimators of the model itself and of its components, and also the convergence of a backfitting algorithm that realizes the estimators. We derive the rates of convergence of the estimators and their asymptotic distributions. We also demonstrate via a simulation study that our approach works efficiently, and illustrate its usefulness through a real data application.

Duality for a class of continuous-time reversible Markov models

Freddy Palma (Fundación Universidad de las Américas Puebla)

Using a conditional probability structure, we build transition probabilities that drive appealing classes of reversible Markov processes. The mechanism used in this construction allows us to find a dual Markov process. This kind of duality is then used to compute the predictor operator of one process via its dual. In particular, we identify the dual of some non-conjugate models, namely the $M/M/\infty$ queue model and a simple birth, death and immigration process. Such duals ensure that the computation of the predictor operators can be done via finite sums.
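To make the $M/M/\infty$ example concrete, the sketch below computes its transition probabilities directly from the well-known transient law: given $i$ customers at time 0, the number at time $t$ is a Binomial$(i, e^{-\mu t})$ count of survivors plus an independent Poisson count of new arrivals still in service. This is the direct transition law, not the dual construction described in the abstract; the function name and the parameter names `lam` and `mu` are assumptions for illustration.

```python
import math


def mminf_transition(i, j, t, lam, mu):
    """P(X_t = j | X_0 = i) for an M/M/infinity queue.

    Illustrative sketch: arrival rate `lam`, per-customer service rate `mu`.
    Survivors of the initial i customers are Binomial(i, exp(-mu * t));
    new arrivals still present are Poisson((lam / mu) * (1 - exp(-mu * t))).
    """
    q = math.exp(-mu * t)             # survival probability of one customer
    rate = (lam / mu) * (1.0 - q)     # Poisson mean for the new arrivals
    total = 0.0
    # Finite sum over the number k of initial customers still in the system.
    for k in range(min(i, j) + 1):
        binom = math.comb(i, k) * q ** k * (1.0 - q) ** (i - k)
        pois = math.exp(-rate) * rate ** (j - k) / math.factorial(j - k)
        total += binom * pois
    return total


# Probability of moving from 3 customers to 2 customers over 0.7 time units.
print(mminf_transition(3, 2, 0.7, lam=2.0, mu=1.0))
```

Each transition probability is a finite sum, but a predictor such as the conditional mean still requires summing over infinitely many target states when computed this way.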
