AED Training - Arxiv

Summarized by Plex Scholar
Last Updated: 03 November 2022

SOLAR: A Highly Optimized Data Loading Framework for Distributed Training of CNN-based Scientific Surrogates

Although CNN-based scientific surrogates can produce good results with significantly reduced computation times for smaller training datasets, our benchmarking results show that data-loading overhead is the biggest performance bottleneck when training surrogates with large datasets. Several state-of-the-art data loaders have been designed to increase loading throughput in general CNN training; however, they are suboptimal when applied to surrogate training. To address this, SOLAR first pre-determines a shuffled index list and optimizes the global access order and the buffer eviction scheme to maximize data reuse and the buffer hit rate. It then makes a tradeoff between lightweight computational imbalance and heavyweight loading-workload imbalance to speed up overall training. According to our evaluation on three scientific surrogates and 32 GPUs, SOLAR achieves up to a 24.4X speedup over the PyTorch Data Loader.

Source link: https://arxiv.org/abs/2211.00224v1
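
Because SOLAR fixes the shuffled index list before training starts, the entire future access sequence is known, which makes look-ahead buffer eviction possible. The sketch below is a minimal single-worker illustration of that idea, not SOLAR's implementation: the helper names `make_epoch_orders` and `count_buffer_hits` are hypothetical, and Belady's farthest-next-use policy stands in for SOLAR's own eviction scheme, which the summary does not specify. Global access-order optimization and multi-GPU load balancing are not modeled.

```python
import random
from typing import Dict, List


def make_epoch_orders(num_samples: int, num_epochs: int, seed: int = 0) -> List[List[int]]:
    """Pre-determine a shuffled sample order for every epoch before training starts."""
    rng = random.Random(seed)
    orders = []
    for _ in range(num_epochs):
        order = list(range(num_samples))
        rng.shuffle(order)
        orders.append(order)
    return orders


def count_buffer_hits(access: List[int], capacity: int, policy: str) -> int:
    """Simulate a sample buffer holding `capacity` samples and count hits.

    policy="lru":    evict the least recently used sample.
    policy="belady": evict the sample whose next use is farthest in the future,
                     which is only possible because the full order is known.
    """
    # next_use[i]: position of the next access to access[i] after position i (inf if none).
    next_use = [0.0] * len(access)
    seen: Dict[int, float] = {}
    for i in range(len(access) - 1, -1, -1):
        next_use[i] = seen.get(access[i], float("inf"))
        seen[access[i]] = i

    buffer = set()
    last_used: Dict[int, int] = {}    # sample -> most recent access position (for LRU)
    upcoming: Dict[int, float] = {}   # sample -> next future access position (for Belady)
    hits = 0
    for i, sample in enumerate(access):
        if sample in buffer:
            hits += 1
        else:
            if len(buffer) >= capacity:
                if policy == "belady":
                    victim = max(buffer, key=lambda s: upcoming[s])
                else:
                    victim = min(buffer, key=lambda s: last_used[s])
                buffer.remove(victim)
            buffer.add(sample)
        last_used[sample] = i
        upcoming[sample] = next_use[i]
    return hits


if __name__ == "__main__":
    orders = make_epoch_orders(num_samples=100, num_epochs=3)
    access = [s for epoch in orders for s in epoch]
    print("LRU hits:", count_buffer_hits(access, 20, "lru"))
    print("Belady hits:", count_buffer_hits(access, 20, "belady"))
```

Belady's policy is optimal for a known access sequence, so its hit count is never below LRU's; the gap gives a rough sense of what a pre-determined order can buy.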


Distributed Graph Neural Network Training: A Survey

Graph neural networks (GNNs) are a class of deep learning models that learn on graphs, and they have been successfully applied in many domains. Many efforts have been made on distributed GNN training in recent years, and a variety of training algorithms and platforms have been introduced. In this article, we examine three key challenges in distributed GNN training: massive feature communication, loss of model accuracy, and workload imbalance. To address these challenges, we introduce a new taxonomy of optimization techniques for distributed GNN training and discuss the methods in each category in detail. The taxonomy's categories are GNN data partitioning, GNN batch generation, GNN execution model, and GNN communication protocol. We conclude with a review of existing distributed GNN systems for multi-GPU, GPU-cluster, and CPU-cluster settings, followed by a discussion of scalable GNNs.

Source link: https://arxiv.org/abs/2211.00216v1
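
Data partitioning determines how much feature data has to cross worker boundaries during message passing. The sketch below assumes an undirected edge list and two hypothetical partitioners (`hash_partition` and `block_partition`), and counts, for each partition, the distinct remote nodes whose features it would need to fetch. It is an illustration of the trade-off, not a method taken from the survey.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[int, int]


def hash_partition(num_nodes: int, num_parts: int) -> List[int]:
    """Hypothetical partitioner: node v goes to partition v % num_parts."""
    return [v % num_parts for v in range(num_nodes)]


def block_partition(num_nodes: int, num_parts: int) -> List[int]:
    """Hypothetical partitioner: contiguous node-ID ranges, which preserves locality on ordered graphs."""
    return [v * num_parts // num_nodes for v in range(num_nodes)]


def remote_feature_counts(edges: List[Edge], part: List[int]) -> Dict[int, int]:
    """For each partition, count distinct remote nodes whose features it must fetch.

    Edges are treated as undirected, so both endpoints need each other's features.
    """
    needed: Dict[int, Set[int]] = {p: set() for p in set(part)}
    for u, v in edges:
        if part[u] != part[v]:
            needed[part[u]].add(v)
            needed[part[v]].add(u)
    return {p: len(nodes) for p, nodes in needed.items()}


if __name__ == "__main__":
    path_edges = [(i, i + 1) for i in range(7)]  # path graph 0-1-...-7
    print("hash: ", remote_feature_counts(path_edges, hash_partition(8, 2)))
    print("block:", remote_feature_counts(path_edges, block_partition(8, 2)))
```

On an 8-node path graph split into 2 partitions, the locality-preserving block split needs one remote feature per partition, while the hash split needs four.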


A Close Look into the Calibration of Pre-trained Language Models

Pre-trained language models (PLMs) excel on many downstream tasks, but they often fail to provide accurate estimates of predictive uncertainty. Because PLM calibration is not yet comprehensively understood, we take a closer look at this emerging research issue, aiming to answer two questions: (1) Do PLMs learn to become calibrated during training? (2) How effective are existing calibration methods? For the first question, we perform fine-grained control experiments to study how PLMs' calibration performance changes during training. For the second question, in addition to unlearnable calibration methods, we use two learnable methods that directly collect data to train models to give reasonable confidence estimates. We also propose extended learnable methods, built on the existing ones, that further improve or maintain PLMs' calibration without sacrificing performance on the original task. Experimental results show that learnable methods significantly reduce PLMs' confidence in wrong predictions, and that our extended methods outperform previous approaches.

Source link: https://arxiv.org/abs/2211.00151v1
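
Calibration is commonly measured with expected calibration error (ECE), which bins predictions by confidence and averages the gap between accuracy and mean confidence in each bin, weighted by bin size. The following is a minimal sketch of equal-width-bin ECE; the function name and the 10-bin default are assumptions, and the summary does not state which metric or binning the paper uses.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[int], num_bins: int = 10
) -> float:
    """Equal-width-bin ECE: sum over bins of (bin size / n) * |accuracy - mean confidence|."""
    n = len(confidences)
    ece = 0.0
    for b in range(num_bins):
        lo, hi = b / num_bins, (b + 1) / num_bins
        # Bins are (lo, hi]; a confidence of exactly 0 is placed in the first bin.
        idx = [i for i, c in enumerate(confidences) if lo < c <= hi or (b == 0 and c == 0.0)]
        if not idx:
            continue
        accuracy = sum(correct[i] for i in idx) / len(idx)
        mean_conf = sum(confidences[i] for i in idx) / len(idx)
        ece += len(idx) / n * abs(accuracy - mean_conf)
    return ece


if __name__ == "__main__":
    # Overconfident: 95% confidence but only 1 of 4 predictions correct.
    print(expected_calibration_error([0.95] * 4, [1, 0, 0, 0]))
```

A model that predicts with 0.95 confidence but is right only a quarter of the time gets an ECE of about 0.7; this is the kind of overconfidence on wrong predictions that learnable calibration methods aim to reduce.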


DialogVED: A Pre-trained Latent Variable Encoder-Decoder Model for Dialog Response Generation

Open-domain dialog response generation is an important research field, where the main challenge is to generate relevant and diverse responses. To increase the relevance and diversity of responses, we propose DialogVED, a new dialog pre-training framework that incorporates continuous latent variables into an enhanced encoder-decoder pre-training framework.

Source link: https://arxiv.org/abs/2204.13031v2
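
Continuous latent variables in encoder-decoder models are commonly trained with the reparameterization trick plus a KL term toward a prior. The sketch below shows that standard VAE-style machinery for a diagonal Gaussian with a standard normal prior. It is an assumption for illustration: DialogVED's actual prior, latent size, and the way the decoder consumes the latent vector are not described in the summary, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def sample_latent(mu: Sequence[float], log_var: Sequence[float], rng: random.Random) -> List[float]:
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, 1) per dimension."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu: Sequence[float], log_var: Sequence[float]) -> float:
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I))."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))


if __name__ == "__main__":
    rng = random.Random(0)
    mu, log_var = [0.5, -0.2], [-1.0, 0.3]  # would come from the encoder in a real model
    print("z  =", sample_latent(mu, log_var, rng))
    print("KL =", kl_to_standard_normal(mu, log_var))
```

In training, a KL term like this is typically added to the reconstruction loss so that the latent space stays close to the prior, while sampling through `mu + sigma * eps` keeps the objective differentiable with respect to the encoder outputs.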


token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text

Speech is continuous while text is discrete, so to solve this modality mismatch problem, we first discretize speech into a sequence of discrete speech tokens. Then, to solve the length mismatch problem, where the speech sequence is often longer than the text sequence, we convert the words of text into phoneme sequences and repeat each phoneme in the sequences.

Source link: https://arxiv.org/abs/2210.16755v1
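
The two preprocessing steps can be sketched as follows. Nearest-centroid assignment (as in k-means quantization) is assumed as the speech discretizer, the centroids and the tiny lexicon are hypothetical, and drawing each phoneme's repeat count uniformly at random is also an assumption, since the summary only says that each phoneme is repeated.

```python
import random
from typing import Dict, List, Sequence


def discretize_speech(frames: Sequence[Sequence[float]], centroids: Sequence[Sequence[float]]) -> List[int]:
    """Map each continuous speech frame to the index of its nearest centroid (squared Euclidean)."""
    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [min(range(len(centroids)), key=lambda k: sq_dist(f, centroids[k])) for f in frames]


def words_to_phonemes(words: Sequence[str], lexicon: Dict[str, List[str]]) -> List[str]:
    """Look up each word in a pronunciation lexicon and concatenate the phonemes."""
    return [p for w in words for p in lexicon[w]]


def repeat_phonemes(phonemes: Sequence[str], min_rep: int, max_rep: int, rng: random.Random) -> List[str]:
    """Repeat each phoneme a random number of times in [min_rep, max_rep] to lengthen the text sequence."""
    out: List[str] = []
    for p in phonemes:
        out.extend([p] * rng.randint(min_rep, max_rep))
    return out


if __name__ == "__main__":
    frames = [[0.0, 0.1], [1.0, 0.9], [0.1, 0.0]]   # toy 2-D speech features
    centroids = [[0.0, 0.0], [1.0, 1.0]]           # hypothetical codebook
    print(discretize_speech(frames, centroids))
    lexicon = {"hello": ["HH", "AH", "L", "OW"]}   # hypothetical lexicon entry
    print(repeat_phonemes(words_to_phonemes(["hello"], lexicon), 1, 3, random.Random(0)))
```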


LearningGroup: A Real-Time Sparse Training on FPGA via Learnable Weight Grouping for Multi-Agent Reinforcement Learning

Multi-agent reinforcement learning (MARL) is a common technique for developing interactive artificial intelligence systems in applications such as multi-robot control and self-driving vehicles. Unlike supervised learning or single-agent reinforcement learning, where network pruning is actively exploited, it is unclear how pruning would work in MARL given its cooperative and interactive characteristics. This paper introduces LearningGroup, a real-time sparse training acceleration system that, for the first time, applies network pruning to MARL training using an algorithm/architecture co-design approach. Based on OSEL's encoding scheme, LearningGroup performs efficient weight compression and distributes the computation workload across multiple cores, where each core processes multiple sparse rows of the weight matrix simultaneously with vector processing units. As a result, LearningGroup reduces the cycle time and memory footprint for sparse data processing by 5.72x and 6.81x, respectively.

Source link: https://arxiv.org/abs/2210.16624v1
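
Row-wise sparse encoding and workload allocation across cores can be sketched in a few lines. The code below uses a CSR-style (column indices, values) encoding as a stand-in for OSEL's format, which the summary does not describe, and a hypothetical greedy scheme that assigns the row with the most nonzeros to the least-loaded core. Learnable weight grouping and the vector processing units are not modeled.

```python
from typing import List, Tuple

SparseRow = Tuple[List[int], List[float]]


def encode_rows(weight: List[List[float]], threshold: float = 0.0) -> List[SparseRow]:
    """Keep only entries with |w| > threshold, storing (column indices, values) per row."""
    rows = []
    for row in weight:
        cols = [j for j, w in enumerate(row) if abs(w) > threshold]
        rows.append((cols, [row[j] for j in cols]))
    return rows


def assign_rows_to_cores(rows: List[SparseRow], num_cores: int) -> Tuple[List[List[int]], List[int]]:
    """Greedy balancing: hand rows with the most nonzeros to the currently least-loaded core."""
    loads = [0] * num_cores
    assignment: List[List[int]] = [[] for _ in range(num_cores)]
    order = sorted(range(len(rows)), key=lambda r: len(rows[r][0]), reverse=True)
    for r in order:
        core = min(range(num_cores), key=lambda c: loads[c])
        assignment[core].append(r)
        loads[core] += len(rows[r][0])
    return assignment, loads


def sparse_matvec(rows: List[SparseRow], x: List[float]) -> List[float]:
    """Compute W @ x using only the stored nonzeros of each row."""
    return [sum(v * x[j] for j, v in zip(cols, vals)) for cols, vals in rows]


if __name__ == "__main__":
    W = [[0, 2, 0], [1, 0, 3], [0, 0, 0], [4, 5, 6]]
    rows = encode_rows(W)
    print(sparse_matvec(rows, [1, 1, 1]))
    print(assign_rows_to_cores(rows, 2))
```

Balancing by nonzero count rather than by row count matters because, after pruning, rows can differ widely in how much work they carry.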


A Systematic Survey of Molecular Pre-trained Models

Deep learning has achieved remarkable success in learning representations of molecules through automated, data-driven feature learning. However, training deep neural networks from scratch often requires a large number of labeled molecules, which are expensive to obtain in real-world scenarios. Molecular pre-trained models (MPMs) have been developed to alleviate this issue. In this survey, we first discuss the difficulties of training deep neural networks for molecular representations from scratch, in order to motivate MPM research.

Source link: https://arxiv.org/abs/2210.16484v1


A Functional-Space Mean-Field Theory of Partially-Trained Three-Layer Neural Networks

We generalize the mean-field (MF) theory of two-layer neural networks (NNs) to three-layer NNs by treating the neurons as belonging to functional spaces, which allows the limiting model to be defined more precisely. For L_2 regression, we write the MF training dynamics as a kernel gradient flow with a time-varying kernel that remains positive-definite, and conclude that the training loss decays to zero at a linear rate. In addition, we establish function spaces that contain the solutions obtainable through the MF training dynamics and derive Rademacher complexity bounds for these spaces.

Source link: https://arxiv.org/abs/2210.16286v1
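
The step from a positive-definite time-varying kernel to linear convergence follows a standard argument. The derivation below is a finite-sample sketch that assumes the kernel's smallest eigenvalue stays above some λ > 0 for all t; the paper itself works in a more general functional-space setting.

```latex
% Finite-sample sketch: n training points, model outputs f_t \in \mathbb{R}^n, fixed targets y.
\begin{aligned}
r_t &= f_t - y, \qquad L(t) = \tfrac{1}{2}\lVert r_t \rVert^2, \\
\frac{d f_t}{dt} &= -K_t\, r_t, \qquad K_t \succeq \lambda I \ \text{for all } t \ (\lambda > 0), \\
\frac{dL}{dt} &= r_t^{\top}\frac{d r_t}{dt} = -\,r_t^{\top} K_t\, r_t \le -\lambda \lVert r_t \rVert^2 = -2\lambda\, L(t), \\
L(t) &\le e^{-2\lambda t}\, L(0).
\end{aligned}
```

The last line follows from Grönwall's inequality; "linear rate" here means the loss shrinks by a constant factor per unit time, i.e. it decays exponentially in t.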


ELMER: A Non-Autoregressive Pre-trained Language Model for Efficient and Effective Text Generation

We study text generation with pre-trained language models (PLMs). Non-autoregressive (NAR) models generate tokens in parallel and are therefore faster than autoregressive (AR) models, but they typically produce lower-quality text because they lack token dependency in the output. We propose ELMER, an efficient and effective PLM for NAR text generation that explicitly models token dependency during NAR generation. Experiments on three text generation tasks show that ELMER clearly outperforms NAR models and narrows the performance gap with AR PLMs, while achieving up to a 10x inference speedup.

Source link: https://arxiv.org/abs/2210.13304v2
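
The cost of missing token dependency can be seen on a toy joint distribution over two-token outputs. In the sketch below (a hypothetical example, not ELMER's method), NAR decoding picks each position's most likely token independently from its marginal, while greedy AR decoding conditions each token on the prefix already chosen.

```python
from typing import Dict, Tuple

Seq = Tuple[str, ...]

# Hypothetical joint distribution over two-token outputs (probabilities sum to 1).
JOINT: Dict[Seq, float] = {
    ("new", "york"): 0.25,
    ("new", "jersey"): 0.20,
    ("new", "delhi"): 0.20,
    ("los", "angeles"): 0.35,
}


def marginal(joint: Dict[Seq, float], pos: int) -> Dict[str, float]:
    """Marginal distribution of the token at position `pos`."""
    m: Dict[str, float] = {}
    for seq, p in joint.items():
        m[seq[pos]] = m.get(seq[pos], 0.0) + p
    return m


def argmax(dist: Dict[str, float]) -> str:
    return max(dist.items(), key=lambda kv: kv[1])[0]


def nar_decode(joint: Dict[Seq, float]) -> Seq:
    """Every position is decided independently: no dependency between output tokens."""
    length = len(next(iter(joint)))
    return tuple(argmax(marginal(joint, pos)) for pos in range(length))


def ar_decode(joint: Dict[Seq, float]) -> Seq:
    """Greedy left-to-right decoding: each token is conditioned on the prefix chosen so far."""
    length = len(next(iter(joint)))
    prefix: Seq = ()
    for pos in range(length):
        cond: Dict[str, float] = {}
        for seq, p in joint.items():
            if seq[:pos] == prefix:
                cond[seq[pos]] = cond.get(seq[pos], 0.0) + p
        prefix = prefix + (argmax(cond),)
    return prefix


if __name__ == "__main__":
    print("NAR:", nar_decode(JOINT), "p =", JOINT.get(nar_decode(JOINT), 0.0))
    print("AR: ", ar_decode(JOINT), "p =", JOINT.get(ar_decode(JOINT), 0.0))
```

Here NAR decoding returns ("new", "angeles"), a combination with zero probability under the joint distribution, while AR decoding returns ("new", "york"). AR pays for this consistency with one sequential step per token.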


PatchRot: A Self-Supervised Technique for Training Vision Transformers

Vision transformers require a substantial amount of labeled data to outperform convolutional neural networks. Self-supervised learning methods help address this problem by learning features similar to those of supervised learning without using labels. PatchRot, a self-supervised technique designed for vision transformers, rotates images and image patches and trains the network to predict the rotation angles.

Source link: https://arxiv.org/abs/2210.15722v1
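
The patch-level pretext task can be sketched as follows, assuming images stored as nested lists, a square patch size that divides the image dimensions, and rotations restricted to multiples of 90 degrees with the rotation index used as the classification label. The helper names are hypothetical, and details of the actual PatchRot method (such as how image-level and patch-level predictions are combined) are not shown; applying the same `rot90` to the whole image gives the image-level task.

```python
import random
from typing import List, Tuple

Matrix = List[List[int]]


def rot90(m: Matrix, k: int) -> Matrix:
    """Rotate a 2-D list counterclockwise by k * 90 degrees."""
    for _ in range(k % 4):
        m = [list(row) for row in zip(*m)][::-1]
    return [list(r) for r in m]


def split_patches(img: Matrix, p: int) -> List[Matrix]:
    """Split an image into non-overlapping p x p patches in row-major order."""
    h, w = len(img), len(img[0])
    return [[row[c:c + p] for row in img[r:r + p]] for r in range(0, h, p) for c in range(0, w, p)]


def patchrot_sample(img: Matrix, p: int, rng: random.Random) -> Tuple[List[Matrix], List[int]]:
    """Rotate every patch by a random multiple of 90 degrees; the label is that multiple (0-3)."""
    patches = split_patches(img, p)
    labels = [rng.randrange(4) for _ in patches]
    rotated = [rot90(q, k) for q, k in zip(patches, labels)]
    return rotated, labels


if __name__ == "__main__":
    img = [[r * 4 + c for c in range(4)] for r in range(4)]
    rotated, labels = patchrot_sample(img, 2, random.Random(0))
    print(labels)
    print(rotated)
```

A network trained on such samples has to recognize local structure well enough to tell which way each patch is oriented, which is what makes the rotation label a useful supervisory signal without manual annotation.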

* Please keep in mind that all text is summarized by machine; we do not bear any responsibility, and you should always check the original source before taking any action.