Abstract
The rapid advancement of large language models (LLMs) has enabled their application across complex professional domains. In the telecommunications industry, operations scheduling integrates network monitoring, ticket management, risk assessment, and workforce coordination, and its tight coupling of language, knowledge, and decision-making calls for intelligent support. However, the lack of standardized benchmarks limits LLM development in this field. To address this gap, we introduce TeleEval-OS, the first dedicated evaluation benchmark for telecommunications operations scheduling. TeleEval-OS covers the four essential stages of dispatch workflows: intelligent ticket creation, ticket resolution, ticket closure, and operational assessment. The benchmark includes 15 high-quality, manually annotated datasets with a total of 10.4K samples, spanning 13 representative real-world sub-tasks such as similar-ticket recommendation, service intent classification, network fault ticket report generation, and risk indicator interpretation. To capture the spectrum of task complexity, we define a four-level evaluation hierarchy: basic natural language processing (NLP), domain-specific question answering (Q&A), structured report generation, and operational report analysis. We systematically evaluate 14 representative LLMs, including GPT-4o, DeepSeek-V3, and Qwen-2.5-72B-Instruct, under zero-shot and few-shot settings. The results show that DeepSeek-V3 achieves the best performance on basic NLP and structured report generation tasks, while GPT-4o demonstrates superior capabilities in operational report analysis. These findings highlight the complementary strengths of LLMs across task types and underscore the practical value of TeleEval-OS as a unified framework for both future research and real-world deployment. Code is available at https://github.com/zjsllab/TeleEval-OS.
