Research Paper | Computer Technology | India | Volume 10 Issue 10, October 2021
Weak Labelers for Iteratively Improving Models Faster
Ashish Bansal
Abstract: Deep neural networks are becoming omnipresent in natural language processing (NLP) applications. However, they require large amounts of labeled training data, which is often only available for English. This is a major challenge for many languages and domains where labeled data is limited. In recent years, a variety of methods have been proposed to tackle this situation. This paper gives an overview of approaches that help train NLP models in resource-lean scenarios, including both ideas for increasing the amount of labeled data and methods following the popular pre-train and fine-tune paradigm. Supervised learning techniques construct predictive models by learning from a large number of training examples, where each training example has a label indicating its ground-truth output. Although current techniques have achieved great success, in many tasks it is difficult to obtain strong supervision information, such as fully ground-truth labels, because of the high cost of the data-labeling process. It is therefore desirable for machine-learning techniques to work with weak supervision. This paper outlines the advantages of weakly supervised learning for collecting more robust data quickly and with fewer resources, focusing on three typical types of weak supervision: incomplete supervision, where only a subset of the training data is given with labels; inexact supervision, where the training data are given with only coarse-grained labels; and inaccurate supervision, where the given labels are not always ground truth. The main focus is on the weak supervision technique in which a smaller dataset is used to train a classifier model, and that model is then used to assign weak labels to new data, predicting those labels accurately to some extent. This method involves a human in the loop who reviews the predicted labels and corrects the wrong predictions, creating additional data points with which to train a new weak-labeler model. Applied iteratively, this technique helps researchers create more ground-truth data that can be used to train better-performing models quickly.
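The iterative loop described in the abstract (train a weak labeler on a small seed set, assign weak labels to new data, have a human review and correct them, then fold the corrections back into the training set) can be sketched as follows. This is a minimal illustration, assuming a scikit-learn text classifier as the weak labeler; the example data, the review_labels step, and all names are hypothetical placeholders rather than the paper's actual setup.

```python
# Minimal sketch of iterative weak labeling with a human in the loop.
# scikit-learn is used as a stand-in classifier; all data and helper
# names here are illustrative assumptions, not the paper's implementation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def review_labels(texts, predicted_labels):
    """Placeholder for the human-in-the-loop step: an annotator inspects
    each weak (predicted) label and corrects the wrong ones."""
    # In practice this would be an annotation interface; here the
    # predictions are returned unchanged as a stand-in.
    return list(predicted_labels)

# Small seed set of ground-truth examples (hypothetical data).
seed_texts = ["great product", "terrible service", "loved it", "awful experience"]
seed_labels = ["pos", "neg", "pos", "neg"]

# New unlabeled data arriving in batches (hypothetical data).
unlabeled_batches = [
    ["works fine", "never again", "highly recommend"],
    ["not worth it", "excellent support"],
]

texts, labels = list(seed_texts), list(seed_labels)
for batch in unlabeled_batches:
    # 1. Train a weak-labeler model on all ground-truth data collected so far.
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(texts, labels)

    # 2. Use it to assign weak labels to the new, unlabeled batch.
    weak_labels = model.predict(batch)

    # 3. A human reviews the weak labels and fixes incorrect predictions.
    corrected = review_labels(batch, weak_labels)

    # 4. The corrected batch becomes additional ground truth for the next iteration.
    texts.extend(batch)
    labels.extend(corrected)

# Final model trained on the accumulated ground-truth data.
final_model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
final_model.fit(texts, labels)
```

Each pass through the loop grows the ground-truth pool, so the weak labeler improves and the human's correction effort per example shrinks over successive iterations.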
Keywords: machine learning, weakly supervised learning, supervised learning, NLP
Edition: Volume 10 Issue 10, October 2021
Pages: 1640 - 1643
DOI: https://www.doi.org/10.21275/SR24826130301