What is it about?
This study presents the first large-scale Turkish medical language model, trained on 167,732 real patient-doctor question–answer pairs collected from a verified Turkish medical advice platform. The dataset captures authentic language used in everyday clinical communication, allowing the model to understand and respond naturally in Turkish. Using modern techniques such as Low-Rank Adaptation (LoRA) and spherical linear interpolation (Slerp) merging, the model was fine-tuned on top of open large language models (LLMs) to improve medical reasoning and linguistic accuracy. The resulting Turkish Medical LLM can assist doctors, patients, and researchers by providing reliable, context-aware responses in Turkish healthcare settings.
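As an illustration of the merging step mentioned above, here is a minimal sketch of spherical linear interpolation between two weight vectors. It is not the authors' implementation: the function name `slerp`, the `eps` threshold, and the flat-list representation of weights are assumptions made for this example, and the summary does not give the study's actual merge settings.

```python
import math

def slerp(t, v0, v1, eps=1e-8):
    """Interpolate between two flat weight vectors along the arc between them.

    t = 0 returns v0, t = 1 returns v1. Hypothetical helper for illustration.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    # Degenerate case: a zero vector has no direction, so fall back to
    # plain linear interpolation.
    if norm0 < eps or norm1 < eps:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    # Cosine of the angle between the two vectors, clamped for safety.
    dot = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    dot = max(-1.0, min(1.0, dot))
    omega = math.acos(dot)
    sin_omega = math.sin(omega)
    # Nearly parallel (or antiparallel) vectors: the slerp weights are
    # numerically unstable, so use linear interpolation instead.
    if abs(sin_omega) < eps:
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]

# Halfway between two orthogonal unit vectors.
merged = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
```

In a model merge, this kind of interpolation would typically be applied tensor by tensor to two fine-tuned checkpoints that share an architecture, with `t` controlling how much each parent contributes.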
Why is it important?
Medical AI tools are typically developed in English, leaving low-resource languages like Turkish underserved. This work addresses that gap by creating a domain-specific model trained on high-quality, native Turkish medical data. The open dataset of 167K real interactions provides an unprecedented resource for evaluating and training medical language models in Turkish. The study also demonstrates that local, data-driven adaptation can significantly improve healthcare accessibility, data quality, and patient safety in multilingual contexts.
Perspectives
Developing this dataset and model was deeply rewarding because it showed how linguistically localized AI can make healthcare more inclusive. Creating a medical AI that “speaks Turkish like a doctor” required combining NLP expertise with real-world medical dialogue, bridging the gap between AI research and patient needs. I believe this project lays a foundation for future domain-specific language models in Turkish, particularly in healthcare, law, and education.
M. Ali Bayram
Yildiz Teknik Universitesi
Read the Original
This page is a summary of: Healthcare-Focused Turkish Medical LLM: Training on Real Patient-Doctor Question-Answer Data for Enhanced Medical Insight, ACM Transactions on Asian and Low-Resource Language Information Processing, October 2025, ACM (Association for Computing Machinery), DOI: 10.1145/3772000.
Resources
Türkçe Tıbbi Soru-Cevap Veri Seti: 167 Bin Sağlık Sorusu ve Cevabı (Turkish Medical Question-Answer Dataset: 167 Thousand Health Questions and Answers)
This dataset, “Türkçe Tıbbi Soru-Cevap Veri Seti,” is a comprehensive collection of 167,732 health-related questions and answers sourced from DoktorSitesi.com. It is an invaluable resource for researchers, developers, and healthcare professionals involved in natural language processing (NLP) projects, especially those focused on health applications in the Turkish language.
Healthcare-Focused Turkish Medical LLM: Training on Real Patient-Doctor Question-Answer Data for Enhanced Medical Insight
This study introduces a specialized Turkish Medical LLM fine-tuned on 167,732 real patient-doctor question-answer pairs sourced from a trusted medical platform, capturing the authentic language of Turkish medical communication.