
Word association norms, mutual information, and lexicography

Published: 01 March 1990

Abstract

The term word association is used in a very particular sense in the psycholinguistic literature. (Generally speaking, subjects respond more quickly than normal to the word nurse if it follows a highly associated word such as doctor.) We will extend the term to provide the basis for a statistical description of a variety of interesting linguistic phenomena, ranging from semantic relations of the doctor/nurse type (content word/content word) to lexico-syntactic co-occurrence constraints between verbs and prepositions (content word/function word). This paper will propose an objective measure, based on the information-theoretic notion of mutual information, for estimating word association norms from computer-readable corpora. (The standard method of obtaining word association norms, testing a few thousand subjects on a few hundred words, is both costly and unreliable.) The proposed measure, the association ratio, estimates word association norms directly from computer-readable corpora, making it possible to estimate norms for tens of thousands of words.
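The abstract does not spell out the formula. As a rough sketch, assuming the standard pointwise definition of mutual information, I(x, y) = log2( P(x, y) / (P(x) P(y)) ), with P(x) and P(y) estimated as word counts f(x) and f(y) divided by corpus size N, and P(x, y) estimated by counting how often x is followed by y within a small window (also divided by N), an association ratio could be computed as below. The function name association_ratios, the span parameter (how many following tokens count as "nearby"), and the min_pair_count threshold for discarding unreliable small-count pairs are illustrative assumptions, not the paper's exact settings. Because only "y after x" is counted, the score is order-sensitive: (doctor, nurse) and (nurse, doctor) are scored separately.

```python
import math
from collections import Counter


def association_ratios(tokens, span=5, min_pair_count=5):
    """Sketch of an association ratio: log2(P(x, y) / (P(x) * P(y))).

    Assumptions (hypothetical, for illustration only):
      - P(x) = f(x) / N and P(x, y) = f(x, y) / N, where N = len(tokens);
      - f(x, y) counts occurrences of y within the next `span` tokens after x
        (ordered pairs, so the measure is not symmetric);
      - pairs seen fewer than `min_pair_count` times are dropped as unreliable.
    """
    n = len(tokens)
    word_freq = Counter(tokens)
    pair_freq = Counter()

    # Count ordered pairs (x, y) where y follows x within `span` positions.
    for i, x in enumerate(tokens):
        for y in tokens[i + 1 : i + 1 + span]:
            pair_freq[(x, y)] += 1

    ratios = {}
    for (x, y), f_xy in pair_freq.items():
        if f_xy < min_pair_count:
            continue  # small counts give unstable estimates
        # log2[(f_xy / N) / ((f_x / N) * (f_y / N))] = log2(N * f_xy / (f_x * f_y))
        ratios[(x, y)] = math.log2(n * f_xy / (word_freq[x] * word_freq[y]))
    return ratios


if __name__ == "__main__":
    toy = ["doctor", "nurse", "a", "b", "doctor", "nurse"]
    print(association_ratios(toy, span=1, min_pair_count=1))
```

On the toy token list with span=1 and min_pair_count=1, the pair (doctor, nurse) scores log2(6 * 2 / (2 * 2)) = log2 3, about 1.58, while (nurse, doctor) is never observed and so gets no score. Meaningful estimates need a large corpus; with tiny counts the ratio swings widely, which is what the threshold is meant to guard against.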



  • Published in

    Computational Linguistics, Volume 16, Issue 1
    March 1990
    72 pages
    ISSN: 0891-2017
    EISSN: 1530-9312

    Publisher

    MIT Press

    Cambridge, MA, United States
