Automatic Subject Heading Assignment for Online Government Publications Using a Semi-Supervised Machine Learning Approach
Description
As online publishing continues to expand dramatically, state libraries urgently need effective tools to organize and archive the large number of government documents published online. Automatic text categorization techniques can classify documents approximately, given a sufficient number of labeled training examples. However, obtaining training labels is expensive because it requires substantial manual labor. We present a semi-supervised machine learning approach, an Expectation-Maximization (EM) algorithm text classifier, which makes use of easily obtained unlabeled documents and thus reduces the demand for labeled training examples. This paper describes the whole procedure of applying this approach to a real-world online information preservation project in which a collection is harvested from the websites of Illinois State Government agencies and a subject heading taxonomy is adapted from the State GILS topic tree. A formal evaluation was performed based on the intended use of the assigned headings. The results demonstrate that the semi-supervised approach improves subject heading assignment compared to the supervised approach and uses labeled documents more efficiently.
Creator
Hu, Xiao; Jackson, Larry S.; Deng, Sai; Zhang, Jing
Publisher
Wiley-Blackwell
Rights
This resource may be protected by copyright. You may make use of this resource, with proper attribution, for educational and other non-commercial uses only. Permission to reproduce the resource beyond the bounds of Fair Use or other exemptions to copyright law must be obtained from the copyright holder.
Language
English
Type
publication; text; conference paper
Date Available
2017-05-01
Date Issued
2006-10-18
Extent
6 pages
Bibliographic Citation
Hu, X., Jackson, L., Deng, S. & Zhang, J. (2006). Automatic subject heading assignment for online government publications using a semi-supervised machine learning approach. In Proceedings of the American Society for Information Science and Technology. Volume 42, Issue 1, 2006.
Citation
Hu, Xiao; Jackson, Larry S.; Deng, Sai; Zhang, Jing, “Automatic Subject Heading Assignment for Online Government Publications Using a Semi-Supervised Machine Learning Approach,” CALASYS - CALA Academic Resources & Repository System, accessed January 21, 2025, https://ir.cala-web.org/items/show/267.