Submission 701
Lexical Processing in English, Dutch, Estonian, and Malay: Pros and Cons of Combined N-Grams of Different Sizes in the Discriminative Lexicon Model
SymposiumTalk-04
Presented by: Maziyah Mohamed
The Discriminative Lexicon Model (DLM) works with linear or deep mappings from form embeddings to meaning embeddings. One simple way of designing form embeddings is to construct multiple-hot binary vectors that specify which n-grams (for fixed n, e.g., bigrams or trigrams) are present in a word's orthographic or phonological representation. We report on explorations of combining n-grams for multiple n (e.g., form vectors specifying the presence of bigrams, trigrams, and 4-grams), for four different languages, with special attention to trade-offs between gram sizes, the use of linear versus deep mappings, and language. Deep mappings offer higher prediction accuracy, but linear mappings tend to yield more precise predictions of response latencies. Linear mappings with n-grams of multiple sizes also show a clear advantage over models with only trigrams, and tend to outperform models with deep mappings. The option of working with multiple n-grams will become available in the Julia package JudiLing, which provides a computational implementation of the DLM.
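As an illustration of such multiple-hot form vectors, the sketch below builds a combined bigram/trigram/4-gram vocabulary over a toy lexicon and encodes one word. This is a minimal sketch for exposition only: the boundary marker `#`, the gram sizes, and all function names are assumptions and do not reflect JudiLing's actual API.

```python
from itertools import chain

def ngrams(word, n):
    """All contiguous n-grams of a word, padded with boundary markers
    (assumption: a single '#' on each side)."""
    w = f"#{word}#"
    return [w[i:i + n] for i in range(len(w) - n + 1)]

def multihot_form_vector(word, vocab_ngrams, sizes=(2, 3, 4)):
    """Multiple-hot binary vector: 1 where an n-gram from the combined
    vocabulary occurs in the word, 0 otherwise."""
    present = set(chain.from_iterable(ngrams(word, n) for n in sizes))
    return [1 if g in present else 0 for g in vocab_ngrams]

# Toy lexicon standing in for the corpus of a given language.
lexicon = ["cat", "cats", "mat"]
vocab = sorted(set(chain.from_iterable(
    ngrams(w, n) for w in lexicon for n in (2, 3, 4))))
vec = multihot_form_vector("cat", vocab)
```

In a DLM-style setup, stacking such vectors for all words gives the form matrix that is then mapped (linearly or via a deep network) onto the meaning embeddings.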