Publication: A Computational Method for Predicting Bare Nouns in Peninsular Spanish

Field, Mike. 2011. "A Computational Method for Predicting Bare Nouns in Peninsular Spanish." The International Journal of Interdisciplinary Social Sciences: Annual Review 5 (12): 27-44. doi:10.18848/1833-1882/CGP/v05i12/51963.

This paper presents a specific example of how syntax and semantics can be studied computationally using mathematical techniques. The study describes a method for predicting where bare noun phrases are permitted in Peninsular Spanish. The C4.5 decision tree algorithm was used to classify 48,554 noun phrases from the AnCora-ES corpus of Spanish news articles according to several syntactic, semantic and lexical properties of each noun phrase, including grammatical function, theta role, number, gender, whether the noun is modified, position relative to the head verb, and whether certain special-case verbs and nouns appeared in the sentence. On the 3,852 noun phrases in the evaluation data set, the system achieved 81% precision, 77% recall, an F-measure of 79% and 81% accuracy. Gender was found to be an important factor in the classification, as were the prepositions de, a, entre, sin and con.
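C4.5 grows its tree by choosing, at each node, the feature with the highest gain ratio: information gain normalized by the split information of the feature. The sketch below illustrates only that split criterion for categorical features. It leaves out C4.5's handling of continuous attributes, missing values and pruning, and it is not the author's implementation. The feature names (`function`, `number`), the class labels (`bare`, `det`) and the six toy rows are hypothetical, invented for illustration rather than drawn from AnCora-ES. The last function checks that the reported F-measure agrees with the harmonic mean of the reported precision and recall.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(rows, labels, feature):
    """C4.5-style gain ratio for splitting on one categorical feature."""
    n = len(labels)
    # Group the class labels by the value the feature takes in each row.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[feature], []).append(label)
    # Information gain: entropy before the split minus weighted entropy after.
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    info_gain = entropy(labels) - remainder
    # Split information penalizes features that fragment the data into many values.
    split_info = -sum((len(p) / n) * math.log2(len(p) / n) for p in partitions.values())
    return info_gain / split_info if split_info > 0 else 0.0


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy noun phrases (not from the corpus): is the NP bare or determined?
rows = [
    {"function": "object", "number": "pl"},
    {"function": "object", "number": "sg"},
    {"function": "subject", "number": "pl"},
    {"function": "subject", "number": "sg"},
    {"function": "object", "number": "pl"},
    {"function": "subject", "number": "pl"},
]
labels = ["bare", "det", "det", "det", "bare", "det"]

best = max(["function", "number"], key=lambda f: gain_ratio(rows, labels, f))
print(best)  # function
print(round(f_measure(0.81, 0.77), 2))  # 0.79
```

On this toy data, grammatical function separates bare from determined phrases more cleanly than number does, so it would be chosen as the root split. With the paper's figures, 2 × 0.81 × 0.77 / (0.81 + 0.77) ≈ 0.789, which rounds to the reported 79%.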