Categorical Data - Astrophysics Data System

Summarized by Plex Scholar
Last Updated: 10 November 2022

Significance-Based Categorical Data Clustering

Although several algorithms have been designed to solve the categorical data clustering problem, the statistical significance of a set of categorical clusters remains unaddressed. To fill this void, we use the likelihood ratio test to derive a test statistic that can serve as a significance-based objective function in categorical data clustering. As a by-product, we can estimate an empirical p-value to assess the statistical significance of a set of clusters and create a new gap estimator for estimating the number of clusters.
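
To make the idea of a likelihood-ratio statistic and an empirical p-value concrete, here is a minimal Python sketch. It is not the paper's statistic or procedure, which this summary does not spell out. It assumes a standard G statistic (per-cluster category counts against pooled counts, summed over attributes) and a label-shuffling null. The names `g_statistic` and `empirical_p_value` are hypothetical.

```python
import math
import random
from collections import Counter


def g_statistic(rows, labels):
    """Likelihood-ratio (G) statistic: per-cluster category counts versus
    the counts expected from pooled frequencies, summed over attributes.
    Illustrative stand-in only, not the statistic derived in the paper."""
    n = len(rows)
    cluster_sizes = Counter(labels)
    total = 0.0
    for j in range(len(rows[0])):
        pooled = Counter(row[j] for row in rows)
        joint = Counter((lab, row[j]) for row, lab in zip(rows, labels))
        for (lab, cat), count in joint.items():
            expected = cluster_sizes[lab] * pooled[cat] / n
            total += 2.0 * count * math.log(count / expected)
    return total


def empirical_p_value(observed, null_values):
    """Add-one empirical p-value: share of null statistics >= observed."""
    exceed = sum(1 for v in null_values if v >= observed)
    return (1 + exceed) / (1 + len(null_values))


# Toy data: two attributes, six objects, two clusters.
rows = [("a", "x"), ("a", "x"), ("a", "y"), ("b", "y"), ("b", "y"), ("b", "x")]
labels = [0, 0, 0, 1, 1, 1]

rng = random.Random(0)
observed = g_statistic(rows, labels)
null_values = []
for _ in range(199):
    shuffled = labels[:]
    rng.shuffle(shuffled)  # assumption: shuffled labels stand in for "no structure"
    null_values.append(g_statistic(rows, shuffled))
p_value = empirical_p_value(observed, null_values)
```

Shuffling a fixed labeling is a simplification. When the labels were fitted to the same data, a fairer null would re-run the clustering on each null dataset, so the p-value from this sketch should be read as optimistic.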

Source link: https://ui.adsabs.harvard.edu/abs/2022arXiv221103956H/abstract


Improving Group Lasso for high-dimensional categorical data

A model fitted with the Group Lasso may not be sparse, which makes it difficult to interpret. To obtain a sparse solution, we recommend a two-step procedure: first, reduce the data dimensionality using the Group Lasso; then, select the final model from a small family of models created by clustering the levels of individual variables. We tested our algorithm on synthetic and real datasets and found that it performs better than traditional algorithms in terms of prediction accuracy or model dimension.

Source link: https://ui.adsabs.harvard.edu/abs/2022arXiv221014021N/abstract


Dimension reduction of high-dimension categorical data with two or multiple responses considering interactions between responses

We investigate the theoretical guarantees of the proposed method under two-response and multiple-response models, demonstrating the uniqueness of the estimator and determining the probability that the proposed method recovers the oracle least squares estimators. We apply this modeling and the proposed procedure to an adult dataset and the right heart catheterization registry dataset to obtain meaningful results.

Source link: https://ui.adsabs.harvard.edu/abs/2022arXiv221011811Y/abstract


Clustering Categorical Data: Soft Rounding k-modes

Despite the introduction of various clustering techniques, the classical k-modes algorithm remains a common choice for unsupervised categorical data analysis. We present a soft rounding variant of the k-modes algorithm and show that it addresses the drawbacks of the k-modes algorithm under a generative model.
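
For reference, the classical k-modes baseline mentioned above can be sketched in a few lines of Python. This is the standard hard-assignment algorithm (Hamming distance, per-attribute modes), not the soft rounding variant, whose details are not given in this summary. The function names and the optional `init` parameter are illustrative choices.

```python
import random
from collections import Counter


def hamming(a, b):
    """Number of attributes on which two categorical objects differ."""
    return sum(x != y for x, y in zip(a, b))


def column_modes(rows):
    """Most frequent category of each attribute (ties: first seen wins)."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*rows))


def k_modes(rows, k, init=None, n_iter=20, seed=0):
    """Classical k-modes: alternate nearest-mode assignment and mode update."""
    rng = random.Random(seed)
    # Random initial modes may coincide when rows repeat, which can leave
    # a cluster empty; an empty cluster simply keeps its previous mode.
    modes = list(init) if init is not None else rng.sample(rows, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda c: hamming(row, modes[c])) for row in rows
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:
                modes[c] = column_modes(members)
    return labels, modes


rows = [
    ("a", "x", "p"), ("a", "x", "p"), ("a", "x", "q"),
    ("b", "y", "q"), ("b", "y", "q"), ("b", "y", "p"),
]
labels, modes = k_modes(rows, k=2, init=[("a", "x", "p"), ("b", "y", "q")])
```

Passing `init` makes the run deterministic. With random initialization the result depends on the seed, which is one reason k-modes is usually restarted several times in practice.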

Source link: https://ui.adsabs.harvard.edu/abs/2022arXiv221009640T/abstract

* Please keep in mind that all text is summarized by machine, we do not bear any responsibility, and you should always check original source before taking any actions
