Eddine, K. S., Bagaa, M., Ouahouah, S., Ouameur, M. A. and Ksentini, A. (2025). Tail-Latency Aware Scheduler For Inference Workloads. In 2025 International Wireless Communications and Mobile Computing (IWCMC). DOI 10.1109/IWCMC65282.2025.11059653.
Abstract
In recent years, AI inference has been widely adopted in fields such as finance and healthcare, driving significant demand for high-performing applications. This demand creates a complex relationship between inference application types, such as real-time applications, and their specific service level objectives (SLOs), such as tail latency. A tail-latency SLO requires a defined percentage of requests to complete within a maximum response time, which is crucial for applications where delays can impact user experience or decision-making. This dependency makes scheduling inference workloads a challenging research problem. The core question becomes: how can we deploy AI workloads in a way that minimizes SLO violations? Specifically, we focus on real-time applications that require tail-latency guarantees. To address this, we developed a tail-latency-aware scheduler designed for resource-constrained devices. Our scheduler employs machine learning techniques to optimize task placement, aiming to minimize SLO violations and enhance performance for latency-sensitive applications. We integrated our custom scheduler into Kubernetes and ran it on a specially configured test cluster with diverse computing capabilities, enabling a comprehensive evaluation of the scheduler's effectiveness. The experimental results show that our proposed scheduler outperforms the native Kubernetes scheduler in terms of efficiency.
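The tail-latency definition in the abstract (a defined percentage of requests must meet a maximum response time) can be illustrated with a minimal Python sketch. This is not the authors' scheduler or code. The function names, the nearest-rank percentile method, and the sample thresholds (95% and 99% within 200 ms) are assumptions chosen for illustration only.

```python
import math


def percentile(latencies_ms, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def meets_tail_latency_slo(latencies_ms, pct, max_ms):
    """True if at least pct percent of requests finished within max_ms."""
    return percentile(latencies_ms, pct) <= max_ms


def violation_rate(latencies_ms, max_ms):
    """Fraction of requests whose response time exceeded max_ms."""
    late = sum(1 for value in latencies_ms if value > max_ms)
    return late / len(latencies_ms)


# Hypothetical measurements: 95 fast requests and 5 slow ones.
samples = [10] * 95 + [500] * 5
print(meets_tail_latency_slo(samples, 95, 200))  # 95% within 200 ms
print(meets_tail_latency_slo(samples, 99, 200))  # 99% within 200 ms
print(violation_rate(samples, 200))
```

With these hypothetical samples, the same workload satisfies a 95th-percentile objective but violates a 99th-percentile one, which is why the chosen percentile matters as much as the response-time bound.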
| Document type: | Conference or workshop item |
|---|---|
| Free keywords: | Performance evaluation; Wireless communication; Adaptation models; Processor scheduling; Computational modeling; Predictive models; Real-time systems; User experience; Time factors; Random forests; Cloud-Edge-Computing-Continuum; Scheduling; Kubernetes |
| Date deposited: | 03 Jul 2026 15:19 |
| Last modified: | 03 Jul 2026 15:19 |
| Deposited document version: | Post-print (corrected and accepted version) |
| URI: | https://depot-e.uqtr.ca/id/eprint/12963 |