Learning to Write Rationally: How Information Is Distributed in Non-Native Speakers' Essays
Research Poster
Social & Behavioral Sciences 2025 Graduate Exhibition
Presentation by Zixin Tang
Exhibition Number 118
Abstract
People tend to distribute information evenly during language production, such as when writing an essay, to improve clarity and communication. However, this may pose challenges for non-native speakers. In this study, we compared essays written by second language (L2) learners from various native language (L1) backgrounds to investigate how they distribute information in their L2 essays. We used information-based metrics, i.e., word surprisal, word entropy, and uniform information density, to estimate how writers distribute information throughout an essay. The surprisal and constancy-of-entropy metrics showed that as writers' L2 proficiency increases, their essays show more native-like patterns, indicating more native-like mechanisms for delivering informative but less surprising content. In contrast, the uniform information density metric showed fewer differences across L2 speakers, regardless of their L1 background and L2 proficiency, suggesting that distributing information evenly is a more universal mechanism of human language production.
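As a rough illustration of the three metrics named above, the sketch below computes per-word surprisal, per-word entropy, and a uniformity score for a sentence. It is not the study's pipeline: the language model here is an assumed toy add-one-smoothed bigram model (the abstract does not specify the model used), the reference corpus is hypothetical, and uniform information density is operationalized as the variance of per-word surprisal, which is only one common choice among several (lower variance means information is spread more evenly).

```python
import math
from collections import Counter, defaultdict
from statistics import mean, pvariance


def train_bigram(sentences, alpha=1.0):
    """Fit an add-alpha smoothed bigram model (toy stand-in for a real LM)."""
    vocab = sorted({w for s in sentences for w in s})
    counts = defaultdict(Counter)
    for s in sentences:
        prev = "<s>"  # sentence-start symbol
        for w in s:
            counts[prev][w] += 1
            prev = w

    def prob(word, prev):
        # Smoothing gives every vocabulary word nonzero probability in every
        # context; probabilities over `vocab` sum to 1 for each context.
        c = counts[prev]
        return (c[word] + alpha) / (sum(c.values()) + alpha * len(vocab))

    return prob, vocab


def information_metrics(sentence, prob, vocab):
    """Per-word surprisal and entropy (in bits), plus a variance-based UID score."""
    surprisals, entropies = [], []
    prev = "<s>"
    for w in sentence:
        dist = [prob(v, prev) for v in vocab]
        # Entropy: expected surprisal of the next word given the preceding word.
        entropies.append(-sum(p * math.log2(p) for p in dist))
        # Surprisal: how unexpected the word the writer actually chose is.
        surprisals.append(-math.log2(prob(w, prev)))
        prev = w
    return {
        "surprisal": surprisals,
        "entropy": entropies,
        "mean_surprisal": mean(surprisals),
        # Assumed UID measure: lower variance = more even information spread.
        "uid_variance": pvariance(surprisals),
    }


# Hypothetical toy reference corpus standing in for native-speaker text.
corpus = [
    ["the", "cat", "sat"],
    ["the", "dog", "sat"],
    ["a", "cat", "ran"],
]
prob, vocab = train_bigram(corpus)

for essay_sentence in (["the", "cat", "sat"], ["a", "dog", "sat"]):
    m = information_metrics(essay_sentence, prob, vocab)
    print(essay_sentence, round(m["mean_surprisal"], 3), round(m["uid_variance"], 3))
```

In an actual comparison, the per-word values would be aggregated over whole essays and compared across L1 groups and proficiency levels; a neural language model would normally replace the bigram counts.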
Importance
This work provides a computational approach to investigating language diversity, variation, and L2 acquisition through human language production, which may help researchers better understand the mechanisms of language learning, late language acquisition, and language diversity through quantitative measures and features. Furthermore, these linguistic measures and the patterns they reveal in human language production could benefit the development of general-purpose large language models, helping them handle, process, and generate text in various styles to fit users' different needs and language backgrounds.