DOI: 10.5555/1367832.1367873

Active learning for e-rulemaking: public comment categorization

Published: 18 May 2008

ABSTRACT

We address the e-rulemaking problem of reducing the manual labor required to analyze public comment sets. In current and previous work, for example, text categorization techniques have been used to speed up the comment analysis phase of e-rulemaking by automatically classifying sentences according to the rule-specific issues [2] or general topics [7, 8] that they address. Manually annotated data, however, is still required to train the supervised inductive learning algorithms that perform the categorization. This paper therefore investigates active learning methods for public comment categorization: we develop two new, general-purpose active learning techniques that selectively sample from the available training data for human labeling when building the sentence-level classifiers employed in public comment categorization. Using an e-rulemaking corpus developed for our purposes [2], we compare our methods to the well-known query by committee (QBC) active learning algorithm [5] and to a baseline that randomly selects instances for labeling in each round of active learning. Our methods outperform both the random-selection active learner and the QBC variation by a statistically significant margin, requiring many fewer training examples to reach the same levels of accuracy on a held-out test set. This provides promising evidence that automated text categorization methods might be used effectively to support public comment analysis.
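
The two comparison strategies named in the abstract, query by committee and random selection, can be illustrated with a minimal sketch of the selection step. It is not the paper's implementation. The abstract does not say how committee disagreement is measured, so vote entropy is assumed here as one common choice. The function names (`vote_entropy`, `select_by_committee`, `select_randomly`) and the keyword-based toy committee are hypothetical.

```python
import math
import random
from collections import Counter

def vote_entropy(votes):
    """Entropy of the committee's label votes; 0 means full agreement."""
    counts = Counter(votes)
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in counts.values())

def select_by_committee(pool, committee, batch_size=1):
    """Return indices of the unlabeled sentences the committee disagrees on most.

    `committee` is a list of callables mapping a sentence to a label
    (an assumption made for this sketch).
    """
    scored = [
        (vote_entropy([clf(sentence) for clf in committee]), index)
        for index, sentence in enumerate(pool)
    ]
    # Highest disagreement first; ties are broken by position in the pool.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:batch_size]]

def select_randomly(pool, batch_size=1, rng=None):
    """Baseline: pick indices uniformly at random, without replacement."""
    rng = rng or random.Random(0)
    return rng.sample(range(len(pool)), min(batch_size, len(pool)))

# Toy committee of keyword rules standing in for trained classifiers.
committee = [
    lambda s: "cost" if "cost" in s else "other",
    lambda s: "cost" if "price" in s or "cost" in s else "other",
    lambda s: "cost" if "expensive" in s or "cost" in s else "other",
]
pool = [
    "The cost is too high.",
    "The price will rise.",
    "I support this rule.",
]

print(select_by_committee(pool, committee))  # → [1]
```

In a full active learning round, the selected sentences would be sent to a human annotator, added to the training set, and the classifiers retrained before the next selection.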

References

  1. K. Brinker. Incorporating diversity in active learning with support vector machines. In Proceedings of ICML-03, 20th International Conference on Machine Learning. Morgan Kaufmann Publishers, San Francisco, US, 2003.
  2. C. Cardie, C. Farina, M. Rawding, A. Aijaz, and S. Purpura. A study in rule-specific issue categorization for e-rulemaking. In Proceedings of the 9th Annual International Conference on Digital Government Research, 2008.
  3. C. Coglianese. Weak democracy, strong information: The role of information technology in the rulemaking process. In V. Mayer-Schoenberger and D. Lazer, editors, Electronic Government to Information Government: Governing in the 21st Century, 2007.
  4. D. Cohn, L. Atlas, and R. Ladner. Improving generalization with active learning. Machine Learning, 15(2):201-221, 1994.
  5. Y. Freund, H. S. Seung, E. Shamir, and N. Tishby. Selective sampling using the query by committee algorithm. Machine Learning, 28:133-168, 1997.
  6. C. Kerwin. The state of rulemaking in the federal government. Technical report, Transcript Panel 1, 2005.
  7. N. Kwon and E. Hovy. Information acquisition using multiple classifications. In Proceedings of the Fourth International Conference on Knowledge Capture (K-CAP 2007), 2007.
  8. N. Kwon, E. Hovy, and S. Shulman. Multidimensional text analysis for eRulemaking. In Proceedings of the 7th Annual International Conference on Digital Government Research, 2006.
  9. D. D. Lewis and J. Catlett. Heterogeneous uncertainty sampling for supervised learning. In Proceedings of the Eleventh International Conference on Machine Learning, pages 148-156, Rutgers University, New Brunswick, NJ, 1994. Morgan Kaufmann.
  10. P. Melville and R. Mooney. Diverse ensembles for active learning. In Proceedings of ICML-04, 21st International Conference on Machine Learning. Morgan Kaufmann Publishers, San Francisco, US, 2004.
  11. I. Muslea, S. Minton, and C. Knoblock. Selective sampling with redundant views. In Proceedings of the Seventeenth National Conference on Artificial Intelligence, pages 621-626, 2000.
  12. K. Papineni. Why inverse document frequency? In Proceedings of the North American Association for Computational Linguistics, NAACL, pages 25-32, 2001.
  13. M. F. Porter. An algorithm for suffix stripping. Program, 14(3):130-137, 1980.
  14. S. Purpura and D. Hillard. Automated classification of congressional legislation. In Proceedings of the 7th Annual International Conference on Digital Government Research, 2006.
  15. G. Salton and M. McGill. Introduction to Modern Information Retrieval. McGraw-Hill, New York, 1983.
  16. B. Schölkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond (Adaptive Computation and Machine Learning). MIT Press, Cambridge, MA, 2002.
  17. H. S. Seung, M. Opper, and H. Sompolinsky. Query by committee. In Computational Learning Theory, pages 287-294, 1992.
  18. S. Shulman. Perverse incentives: The case against mass e-mail campaigns. In Proceedings of the Annual Meeting of the American Political Science Association, 2008.
  19. P. Strauss, T. Rakoff, and C. Farina. Administrative Law. 10th edition, 2003.
  20. V. N. Vapnik. The Nature of Statistical Learning Theory. Springer, 1995.
  21. H. Yang and J. Callan. Near-duplicate detection for eRulemaking. In Proceedings of the Fifth National Conference on Digital Government Research, 2005.
  22. H. Yang and J. Callan. Near-duplicate detection by instance-level constrained clustering. In Proceedings of the Twenty-Ninth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, 2006.
