Genome - Arxiv

Summarized by Plex Scholar
Last Updated: 03 September 2022

Sorting Genomes by Prefix Double-Cut-and-Joins

In both the signed and unsigned settings, we study the problem of sorting unichromosomal linear genomes by prefix double-cut-and-joins (prefix DCJs). A prefix DCJ cuts the leftmost segment of a genome and any other segment, and recombines the severed endpoints in one of two possible ways: one of these options corresponds to a prefix reversal, which reverses the order of the elements between the two cuts (and, in the signed case, flips their signs).
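
As a concrete illustration of the prefix-reversal option only, the Python sketch below assumes a common encoding in which a unichromosomal linear genome is a list of signed integers, with the sign giving gene orientation. The function name `prefix_reversal` and the `signed` flag are hypothetical, and the second recombination outcome of a prefix DCJ is deliberately not modelled.

```python
def prefix_reversal(genome: list[int], k: int, signed: bool = True) -> list[int]:
    """Apply a prefix reversal of length k to a linear genome.

    Assumption: genes are integers and, in the signed setting, the sign
    encodes orientation. Only the prefix-reversal outcome of a prefix DCJ
    is modelled here.
    """
    if not 1 <= k <= len(genome):
        raise ValueError("k must satisfy 1 <= k <= len(genome)")
    prefix = list(reversed(genome[:k]))  # reverse the order of the first k genes
    if signed:
        prefix = [-gene for gene in prefix]  # reversing a segment flips orientations
    return prefix + genome[k:]


print(prefix_reversal([3, -1, 2, 4], 3))               # [-2, 1, -3, 4]
print(prefix_reversal([3, 1, 2, 4], 3, signed=False))  # [2, 1, 3, 4]
```

Sorting, in this setting, usually means finding a shortest sequence of allowed operations that turns the genome into the identity arrangement, e.g. `[1, 2, 3, 4]`.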

Source link: https://arxiv.org/abs/2208.14315v1


High-dimensional sparse vine copula regression with application to genomic prediction

Recently developed vine copula based regression techniques, however, do not scale up to high and super-high dimensions. We propose two methods for high-dimensional sparse vine copula based regression. Through simulation studies, we demonstrate the effectiveness of our methodology in selecting relevant variables and its prediction accuracy on high-dimensional sparse data sets. We then apply the new methods to high-dimensional real data aimed at genomic prediction of maize traits, and explore several data-processing and feature-extraction steps for these data. In the real data application, our methods show an advantage over linear models.

Source link: https://arxiv.org/abs/2208.12383v1


PRIME: Uncovering Circadian Oscillation Patterns and Associations with AD in Untimed Genome-wide Gene Expression across Multiple Brain Regions

Circadian rhythm disruption is a cardinal symptom of Alzheimer's disease (AD). The full circadian orchestration of gene expression in the human brain, and its inherent links with AD, remain largely unknown. We use PRIME, a method for uncovering circadian oscillation patterns in untimed genome-wide gene expression data, to investigate oscillation patterns in 19 human brain regions of controls and AD patients. Our results reveal consistent, synchronized oscillation patterns in 15 pairs of brain regions in controls, whereas these patterns differ in AD or dementia.

Source link: https://arxiv.org/abs/2208.12811v1


Genome-wide nucleotide-resolution model of single-strand break site reveals species evolutionary hierarchy

Single-strand breaks (SSBs) are the most common form of DNA damage in the genome; they arise spontaneously from genotoxins and as intermediates of DNA transactions. Several SSB detection techniques, including S1 END-seq and SSiNGLe-ILM, have made it possible to map SSBs across the genome at nucleotide resolution. We developed SSBlazer, the first computational framework for genome-wide, nucleotide-resolution SSB site prediction. We showed that SSBlazer can accurately predict SSB sites and effectively eliminate false positives, by constructing an imbalanced dataset that mimics the actual SSB distribution. According to the model interpretation analysis, SSBlazer captures the pattern of individual CpG sites in the genomic context and the TGCC motif in the central region as key features. Applying SSBlazer to the genomes of 216 vertebrates yields putative SSB maps that suggest a negative correlation between SSB frequency and evolutionary hierarchy, implying that genomes tend to become more intact over the course of evolution.
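
SSBlazer learns its features from data. Purely to illustrate what the two reported features refer to, the hand-crafted Python sketch below counts CpG dinucleotides across a sequence window and TGCC occurrences in its central region. The function names, the 10-base default central width, and the idea of counting motifs explicitly are assumptions for illustration, not SSBlazer's method.

```python
def count_overlapping(sequence: str, motif: str) -> int:
    """Count occurrences of motif in sequence, allowing overlaps."""
    return sum(
        1
        for i in range(len(sequence) - len(motif) + 1)
        if sequence[i:i + len(motif)] == motif
    )


def motif_features(window: str, center_width: int = 10) -> dict[str, int]:
    """Hypothetical hand-crafted features for a window centred on a candidate site."""
    seq = window.upper()
    # Take the central center_width bases (or the whole window if it is shorter).
    start = max(0, (len(seq) - center_width) // 2)
    center = seq[start:start + center_width]
    return {
        "cpg_count": count_overlapping(seq, "CG"),             # CpG sites in the full context
        "tgcc_in_center": count_overlapping(center, "TGCC"),   # motif in the central region
    }


print(motif_features("ACGTTTGCCAACGT", center_width=6))
# {'cpg_count': 2, 'tgcc_in_center': 1}
```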

Source link: https://arxiv.org/abs/2208.09813v1


I-GWAS: Privacy-Preserving Interdependent Genome-Wide Association Studies

Unfortunately, carelessly publishing GWAS results can lead to privacy leaks. We find that, even when relying on state-of-the-art techniques for protecting releases, an adversary can reconstruct the genetic variants of up to 28 percent of participants, and that the published statistics for up to 92.3% of the genomic variants would enable membership inference attacks. I-GWAS continuously releases privacy-preserving, noise-free GWAS results as new genomes become available.
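
The summary does not say which membership inference attack is considered. As a generic illustration of why released allele frequencies can leak membership, the Python sketch below implements the classic test statistic of Homer et al. (2008); this is a swapped-in example, not the I-GWAS attack or defence. Inputs are per-SNP allele frequencies: the target's genotype (0, 0.5, or 1), the released study (mixture) frequencies, and frequencies from a reference population. All names are hypothetical.

```python
import math
from statistics import mean, stdev


def homer_t_statistic(individual, mixture, reference):
    """Homer et al. (2008)-style membership test on released allele frequencies.

    Per SNP j, D_j = |Y_j - Ref_j| - |Y_j - Mix_j|. A mean D clearly above 0
    suggests the individual is closer to the study cohort than to the
    reference population, i.e. is likely a participant.
    """
    if not len(individual) == len(mixture) == len(reference) >= 2:
        raise ValueError("need at least two SNPs with matching lengths")
    d = [abs(y - ref) - abs(y - mix) for y, mix, ref in zip(individual, mixture, reference)]
    m, s = mean(d), stdev(d)
    if s == 0:
        return 0.0 if m == 0 else math.copysign(math.inf, m)
    return m / (s / math.sqrt(len(d)))  # one-sample t statistic against mean 0


mix = [0.8, 0.1, 0.5, 0.7]
ref = [0.5, 0.4, 0.3, 0.5]
print(homer_t_statistic([1.0, 0.0, 0.5, 1.0], mix, ref) > 0)  # True: looks like a member
print(homer_t_statistic([0.5, 0.5, 0.5, 0.5], mix, ref) < 0)  # True: closer to the reference
```

In practice such a statistic is computed over many thousands of SNPs, which is what gives it statistical power.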

Source link: https://arxiv.org/abs/2208.08361v1


AirLift: A Fast and Comprehensive Technique for Remapping Alignments between Reference Genomes

The growing amount of publicly available genomic data makes it prohibitively costly to fully re-map every read set to its respective reference genome each time the reference is updated. Several existing methods can convert a read data set from one reference to another. However, if a read maps to a location in the old reference that has no sufficiently similar counterpart in the new version, such methods cannot remap the read. AirLift reduces the overall execution time to remap read sets between two reference genome versions by up to 27.4x, relative to the state-of-the-art method for remapping reads. We validate our remapping results with GATK and find that AirLift is among the most accurate methods in identifying ground-truth SNP/INDEL variants.
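
AirLift's own algorithm is not described here. To illustrate the general idea of converting coordinates between reference versions, and why some reads cannot be converted, the Python sketch below lifts read intervals through ungapped aligned blocks, similar in spirit to chain-file liftover. The block tuple layout, 0-based half-open coordinates, and function names are assumptions.

```python
from bisect import bisect_right
from typing import Optional

# Hypothetical alignment between reference versions: (old_start, new_start, length)
# for each ungapped aligned block, sorted by old_start and non-overlapping.
Block = tuple[int, int, int]


def find_block(pos: int, blocks: list[Block]) -> Optional[int]:
    """Return the index of the block containing old coordinate pos, or None."""
    starts = [b[0] for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i >= 0 and pos < blocks[i][0] + blocks[i][2]:
        return i
    return None


def lift_interval(start: int, end: int, blocks: list[Block]) -> Optional[tuple[int, int]]:
    """Lift the half-open read interval [start, end) to the new reference.

    Returns None when either end falls outside an aligned block or the read
    spans two blocks; such reads would have to be handled another way,
    for example by mapping them again.
    """
    first, last = find_block(start, blocks), find_block(end - 1, blocks)
    if first is None or first != last:
        return None
    old_start, new_start, _ = blocks[first]
    offset = new_start - old_start
    return start + offset, end + offset


blocks = [(0, 0, 100), (150, 120, 50), (300, 400, 100)]
print(lift_interval(160, 190, blocks))  # (130, 160)
print(lift_interval(90, 160, blocks))   # None: read spans an unaligned gap
```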

Source link: https://arxiv.org/abs/1912.08735v3


r/K selection of genomic GC content in prokaryotes

When phenotypic plasticity is taken into account, an integrated analysis of GC-content data from thousands of prokaryotes fits well, together with reproductive characteristics, cell shape, and motility, into an r/K selection model. Prokaryotes tend to be K-strategists, which live in stable environments with a higher nutrient-to-biomass ratio. The association of GC-rich codons and cheaper amino acids in the genetic code is likely to reduce the ratio between GC-rich codons and cheaper amino acids in the genome code, while the correlation between GC content and genome size may partially reflect changes in genome sequence driven by r/K selection. Overall, molecular variation in the genomic GC content of prokaryotes, and the related species diversity, may result from ecological r/K selection.
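
Genomic GC content is the fraction of G and C among the called bases of a sequence. A minimal Python sketch, assuming ambiguous symbols such as N are excluded from the denominator (a convention choice, not taken from the paper):

```python
def gc_content(sequence: str) -> float:
    """Return (G + C) / (A + C + G + T), ignoring ambiguous bases such as N."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    called = gc + seq.count("A") + seq.count("T")
    if called == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return gc / called


print(gc_content("ATGCGC"))  # 0.6666666666666666
print(gc_content("GGCCNA"))  # 0.8
```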

Source link: https://arxiv.org/abs/2208.04771v1


MutFormer: A context-dependent transformer-based model to predict deleterious missense mutations from protein sequences in the human genome

Various machine-learning approaches, including deep neural network models, have been developed to predict the deleteriousness of missense mutations. Recent advances in natural language processing have shown that transformer models, a class of deep neural networks, are particularly good at modeling context-dependent sequence information. MutFormer takes reference and mutated protein sequences from the human genome as its key inputs. To capture both long-range and short-range dependencies between amino acid mutations in a protein sequence, it uses a combination of self-attention layers and convolutional layers. We pre-trained MutFormer on reference protein sequences and on mutated protein sequences resulting from common genetic variants in human populations. We then evaluated MutFormer's performance on several test sets. MutFormer successfully takes into account sequence features not investigated in previous studies, and may help existing computational models or numerically generated functional scores improve our understanding of disease variants.
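
To illustrate how self-attention and convolution capture different ranges of dependency, the pure-Python sketch below applies single-head scaled dot-product self-attention (every position attends to every other) and a zero-padded 1-D convolution (each output sees only a few neighbours) to a toy matrix of per-residue embeddings, then sums the two. This is not MutFormer's architecture: learned query/key/value projections, multiple heads, and normalization are omitted, combining the paths by addition is an assumption, and the function names are hypothetical.

```python
import math

Matrix = list[list[float]]  # rows = sequence positions, columns = embedding dimensions


def softmax(row: list[float]) -> list[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x: Matrix) -> Matrix:
    """Single-head scaled dot-product self-attention with Q = K = V = x (no learned weights)."""
    d = len(x[0])
    scores = [[sum(q * k for q, k in zip(xq, xk)) / math.sqrt(d) for xk in x] for xq in x]
    weights = [softmax(row) for row in scores]
    # Each output row is a weighted average of all rows: long-range mixing.
    return [[sum(w * xv[j] for w, xv in zip(wrow, x)) for j in range(d)] for wrow in weights]


def conv1d_same(x: Matrix, kernel: list[float]) -> Matrix:
    """Zero-padded 1-D cross-correlation along positions, shared across dimensions.

    Each output position only sees len(kernel) neighbours: short-range mixing.
    Assumes an odd kernel length so the output keeps the input length.
    """
    half = len(kernel) // 2
    n, d = len(x), len(x[0])
    out = []
    for i in range(n):
        row = []
        for j in range(d):
            acc = 0.0
            for k, w in enumerate(kernel):
                p = i + k - half
                if 0 <= p < n:
                    acc += w * x[p][j]
            row.append(acc)
        out.append(row)
    return out


def hybrid_block(x: Matrix, kernel: list[float]) -> Matrix:
    """Sum the attention and convolution outputs (an assumed way to combine them)."""
    attn, conv = self_attention(x), conv1d_same(x, kernel)
    return [[a + c for a, c in zip(ar, cr)] for ar, cr in zip(attn, conv)]


toy = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
print(hybrid_block(toy, kernel=[0.25, 0.5, 0.25]))
```

In a real model the kernel and projections would be learned; here they are fixed so the two kinds of mixing can be inspected directly.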

Source link: https://arxiv.org/abs/2110.14746v3

* Please keep in mind that all text is summarized by machine; we do not bear any responsibility, and you should always check the original source before taking any action.