Abstract
The term word association is used in a very particular sense in the psycholinguistic literature. (Generally speaking, subjects respond more quickly than normal to the word "nurse" if it follows a highly associated word such as "doctor.") We will extend the term to provide the basis for a statistical description of a variety of interesting linguistic phenomena, ranging from semantic relations of the doctor/nurse type (content word/content word) to lexico-syntactic co-occurrence constraints between verbs and prepositions (content word/function word). This paper will propose an objective measure, based on the information-theoretic notion of mutual information, for estimating word association norms from computer-readable corpora. (The standard method of obtaining word association norms, testing a few thousand subjects on a few hundred words, is both costly and unreliable.) The proposed measure, the association ratio, estimates word association norms directly from computer-readable corpora, making it possible to estimate norms for tens of thousands of words.
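The abstract does not spell out the formula, so the following is only a minimal sketch of how a mutual-information-style association score might be estimated from a tokenized corpus. It assumes the standard pointwise form log2(P(x, y) / (P(x) P(y))), with probabilities estimated as simple relative frequencies and with y counted as co-occurring with x when it follows x inside a small window. The function name `association_ratios`, the `window` parameter, and its default of 5 are illustrative assumptions, not details taken from the abstract.

```python
import math
from collections import Counter


def association_ratios(tokens, window=5):
    """Score ordered word pairs (x, y) by log2(P(x, y) / (P(x) * P(y))).

    Assumptions of this sketch (not taken from the abstract):
    - y co-occurs with x if y follows x and both fit in a span of
      `window` tokens (so window=2 means strictly adjacent words);
    - all probabilities are plain relative frequencies over len(tokens).
    """
    n = len(tokens)
    word_counts = Counter(tokens)

    # Count ordered co-occurrences: x at position i, y at a later position
    # that is at most window - 1 tokens away.
    pair_counts = Counter()
    for i, x in enumerate(tokens):
        for y in tokens[i + 1 : i + window]:
            pair_counts[(x, y)] += 1

    ratios = {}
    for (x, y), f_xy in pair_counts.items():
        p_xy = f_xy / n
        p_x = word_counts[x] / n
        p_y = word_counts[y] / n
        ratios[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return ratios


if __name__ == "__main__":
    corpus = "the doctor called the nurse and the nurse called the doctor".split()
    scores = association_ratios(corpus, window=3)
    # Print the pairs with the highest scores first.
    for pair, score in sorted(scores.items(), key=lambda kv: -kv[1])[:5]:
        print(pair, round(score, 3))
```

Because pairs are counted in order, the score for (x, y) in this sketch need not equal the score for (y, x). Scores for pairs seen only once or twice are unstable with raw relative frequencies, so a practical version would likely apply a minimum-count threshold.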
Word association norms, mutual information, and lexicography
ACL '89: Proceedings of the 27th Annual Meeting of the Association for Computational Linguistics