Abstract
In professional tennis, converting broadcast footage into tactical intelligence remains limited by labor-intensive manual logging and subjective observation. Automated methods still struggle with fine-grained stroke perception because of fast ball motion, occlusion, and broadcast variability. We present a video-based tactical intelligence pipeline that uses an ensemble YOLO detector (YOLOv8s + YOLOv10n) with weighted boxes fusion to identify six near-side semantic targets: tennis ball, player, forehand, backhand, ready position, and serve. From these detections, nine coach-interpretable indicators are derived across five tactical dimensions: hitting-type distribution, time-series rhythm, state transitions, serve initiative, and fatigue dynamics. The detector was trained on 14,892 annotated images from 360 competitive points across three court surfaces and achieved a test mAP@0.5 of 0.863, outperforming the individual YOLO models (+7.5%), Faster R-CNN, and DETR. Rally-level analysis of 36 ATP players from the 2025 Rome Masters, Shanghai Masters, and Wimbledon Championships, grouped into elite, mid-level, and lower-ranked tiers (n = 12 per tier), revealed consistent tier differentiation across all indicators (all p < 0.001, Cohen's d > 2.5) and clear surface-related variation in stroke selection and serve initiative. Expert validation showed strong agreement between AI-derived and coach-annotated indicators (ICC > 0.85). The main contribution is the integrated detection-to-indicator framework. The ensemble detector is a practical engineering solution rather than a methodological advance. Current limitations include near-side-only analysis, single-view broadcast input, and a moderately sized dataset.
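The abstract's core fusion step, combining YOLOv8s and YOLOv10n detections via weighted boxes fusion, can be sketched in miniature. The snippet below is an illustrative, self-contained implementation of the general WBF idea (clustering overlapping boxes and averaging their coordinates weighted by confidence), not the authors' code; the paper itself presumably uses a library implementation such as the `ensemble-boxes` package, and box coordinates here are assumed normalized to [0, 1].

```python
import numpy as np

def iou(a, b):
    # Intersection-over-union of two boxes [x1, y1, x2, y2] in normalized coords.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def fuse(cluster):
    # Confidence-weighted average of the boxes in one cluster.
    ws = np.array([s for _, s in cluster])
    bs = np.array([b for b, _ in cluster])
    return (bs * ws[:, None]).sum(axis=0) / ws.sum()

def weighted_boxes_fusion(boxes, scores, iou_thr=0.55):
    """Toy single-class WBF: pool detections from multiple models,
    group boxes whose IoU with a cluster's fused box exceeds iou_thr,
    and return one confidence-weighted box per cluster."""
    clusters = []  # each cluster: list of (box, score) pairs
    for i in np.argsort(scores)[::-1]:  # highest confidence first
        b, s = np.asarray(boxes[i], dtype=float), scores[i]
        for c in clusters:
            if iou(fuse(c), b) > iou_thr:
                c.append((b, s))
                break
        else:
            clusters.append([(b, s)])
    return [(fuse(c), float(np.mean([s for _, s in c]))) for c in clusters]

# Hypothetical detections of one class from two detectors:
# the first two boxes overlap heavily and fuse; the third stays separate.
boxes = [[0.10, 0.10, 0.50, 0.50],   # e.g. from YOLOv8s
         [0.12, 0.10, 0.50, 0.52],   # e.g. from YOLOv10n
         [0.70, 0.70, 0.90, 0.90]]
scores = [0.9, 0.8, 0.6]
fused = weighted_boxes_fusion(boxes, scores)
```

Unlike non-maximum suppression, which discards all but the top box in a cluster, WBF blends the overlapping predictions, which is why ensembling two complementary detectors this way can raise mAP over either model alone.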
