Abstract
Traditional research on preschool language development often fails to capture the complex nonlinear relationships and high-dimensional characteristics of language growth, leading to low prediction accuracy and poor cross-cultural applicability. This paper introduces a novel BERT (Bidirectional Encoder Representations from Transformers)-based model for predicting preschool language development and evaluates its cross-cultural effectiveness. Text data from preschool children’s language datasets spanning multiple cultural backgrounds are collected, cleaned, and preprocessed into suitable training samples, with particular attention to the distinctive grammatical structures and cultural expressions of each language to ensure compatibility with the model. The BERT model encodes the processed text, using its bidirectional self-attention mechanism to capture contextual information and generate the deep feature representations needed to characterize preschool language development; these representations combine grammatical and semantic features and serve as inputs to the subsequent prediction step. The pre-trained BERT model is fine-tuned with the Adam optimizer to improve prediction accuracy, and cross-validation together with hyperparameter tuning further strengthens its performance. Culturally specific annotations and vocabularies are incorporated so that the model predicts language development effectively across different regions. Experimental results show that the BERT model achieves an MAE (Mean Absolute Error) between 0.20 and 0.25, an MSE (Mean Squared Error) between 0.05 and 0.08, and an average R² of 0.84 across English, Chinese, Spanish, and Japanese. These results demonstrate the model’s high accuracy and strong cross-cultural stability in predicting preschool language development.
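To make the described pipeline concrete, the sketch below shows, in outline, how a pre-trained BERT encoder might be fine-tuned with the Adam optimizer to regress a continuous language-development score from transcribed child utterances. It is a minimal illustration under stated assumptions, not the authors' implementation: the multilingual checkpoint name, field names, sample utterances, and score scale are all hypothetical choices introduced for the example.

```python
# Minimal sketch (not the authors' code): fine-tune a pre-trained BERT encoder
# with Adam to regress a continuous language-development score from a transcript.
# Checkpoint name, sample texts, and score values are illustrative assumptions.
import torch
from torch import nn
from transformers import BertTokenizer, BertModel

class BertDevelopmentRegressor(nn.Module):
    def __init__(self, pretrained="bert-base-multilingual-cased"):
        super().__init__()
        self.encoder = BertModel.from_pretrained(pretrained)        # bidirectional self-attention encoder
        self.head = nn.Linear(self.encoder.config.hidden_size, 1)   # single continuous score

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]          # [CLS] token as the deep feature representation
        return self.head(pooled).squeeze(-1)

tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertDevelopmentRegressor()
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5)
loss_fn = nn.MSELoss()

# One illustrative training step on a toy batch of transcribed utterances.
texts = ["doggy go park", "quiero más leche por favor"]   # hypothetical child utterances
scores = torch.tensor([0.42, 0.61])                       # hypothetical development scores in [0, 1]
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

model.train()
pred = model(batch["input_ids"], batch["attention_mask"])
loss = loss_fn(pred, scores)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

In practice, the fine-tuning step would run over the full multilingual training set with cross-validation and hyperparameter tuning, as described in the abstract, rather than on a single toy batch.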
