
Active learning

The key idea behind active learning is that a machine learning algorithm can perform better with less training if it is allowed to choose the data from which it learns. An active learner may pose "queries," usually in the form of unlabeled data instances to be labeled by an "oracle" (e.g., a human annotator) that already understands the nature of the problem. This sort of approach is well-motivated in many modern machine learning and data mining applications, where unlabeled data may be abundant or easy to come by, but training labels are difficult, time-consuming, or expensive to obtain. This book is a general introduction to active learning. It outlines several scenarios in which queries might be formulated, and details many query selection algorithms, organized into four broad categories, or "query selection frameworks." We also touch on some of the theoretical foundations of active learning, and conclude with an overview of the strengths and weaknesses of these approaches in practice, including a summary of ongoing work to address these open challenges and opportunities.
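The abstract describes the core loop: the learner picks unlabeled instances and asks an oracle for their labels. Below is a minimal Python sketch of that loop in the pool-based setting, using uncertainty sampling with a deliberately simple model. It assumes one-dimensional, separable data whose label switches from 0 to 1 at a single unknown threshold. The function names `fit_threshold` and `uncertainty_sampling`, the seeding with the pool's two extreme points, and the use of distance to the current boundary as the uncertainty score are all hypothetical, illustrative choices, not code from the book.

```python
from typing import Callable, Dict, List, Tuple


def fit_threshold(labeled: Dict[float, int]) -> Tuple[float, float, float]:
    # Hypothetical 1-D threshold model. Assumes separable data with class 0
    # to the left of the true threshold and class 1 to the right.
    # Returns the largest point labeled 0, the smallest point labeled 1,
    # and their midpoint, which serves as the current decision boundary.
    lo = max(x for x, y in labeled.items() if y == 0)
    hi = min(x for x, y in labeled.items() if y == 1)
    return lo, hi, (lo + hi) / 2.0


def uncertainty_sampling(
    pool: List[float],
    oracle: Callable[[float], int],
    budget: int = 100,
) -> Tuple[float, int]:
    """Pool-based active learning sketch: repeatedly query the unlabeled
    instance the current model is least certain about, i.e. the one
    closest to its decision boundary. Returns (boundary, queries used)."""
    pool = sorted(pool)
    # Assumption of this sketch: seed the labeled set with the two extremes
    # so that both classes are present before the first model is fit.
    labeled = {pool[0]: oracle(pool[0]), pool[-1]: oracle(pool[-1])}
    queries = 2
    while queries < budget:
        lo, hi, boundary = fit_threshold(labeled)
        # Only points strictly between lo and hi are ambiguous; every other
        # point is already determined by the labels collected so far.
        candidates = [x for x in pool if lo < x < hi and x not in labeled]
        if not candidates:
            break  # every pool point is now classified consistently
        query = min(candidates, key=lambda x: abs(x - boundary))
        labeled[query] = oracle(query)  # ask the oracle for one label
        queries += 1
    _, _, boundary = fit_threshold(labeled)
    return boundary, queries


if __name__ == "__main__":
    pool = [i / 1000 for i in range(1001)]
    oracle = lambda x: int(x >= 0.3)  # hypothetical oracle for the demo
    boundary, n = uncertainty_sampling(pool, oracle)
    print(f"boundary={boundary:.4f} after {n} queries")
```

With a pool of 1,001 evenly spaced points in [0, 1] and an oracle that returns `int(x >= 0.3)`, this loop behaves like a binary search: it recovers a boundary that labels every pool point correctly after roughly a dozen queries rather than 1,001. A richer model would replace the midpoint rule with a probabilistic classifier and score candidates by, e.g., least confidence, margin, or entropy.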
eBook, English, ©2012
Springer, Cham, Switzerland, ©2012
1 online resource (xiii, 100 pages) : illustrations
9781608457267, 9783031015601, 1608457265, 3031015606
Acknowledgments
1. Automating inquiry
1.1 A thought experiment
1.2 Active learning
1.3 Scenarios for active learning
2. Uncertainty sampling
2.1 Pushing the boundaries
2.2 An example
2.3 Measures of uncertainty
2.4 Beyond classification
2.5 Discussion
3. Searching through the hypothesis space
3.1 The version space
3.2 Uncertainty sampling as version space search
3.3 Query by disagreement
3.4 Query by committee
3.5 Discussion
4. Minimizing expected error and variance
4.1 Expected error reduction
4.2 Variance reduction
4.3 Batch queries and submodularity
4.4 Discussion
5. Exploiting structure in data
5.1 Density-weighted methods
5.2 Cluster-based active learning
5.3 Active + semi-supervised learning
5.4 Discussion
6. Theory
6.1 A unified view
6.2 A PAC bound for active learning
6.3 Discussion
7. Practical considerations
7.1 Which algorithm is best?
7.2 Real labeling costs
7.3 Alternative query types
7.4 Skewed label distributions
7.5 Unreliable oracles
7.6 Multi-task active learning
7.7 Data reuse and the unknown model class
7.8 Stopping criteria
A. Nomenclature reference
Author's biography
Part of: Synthesis digital library of engineering and computer science