What is it about?

This study presents an artificial intelligence system that detects fire, smoke, and people in indoor environments such as rooms, halls, kitchens, warehouses, and industrial areas. The system uses a Vision Transformer model to analyze images and recognize dangerous situations. The goal is to support smart surveillance, early fire warning, and emergency response by detecting fire-related risks and human presence in real time. The model was trained and tested on indoor image data and is reported to score highly on accuracy, precision, recall, F1-score, and mean average precision. This makes the approach a candidate for safety monitoring in buildings and other enclosed spaces.
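For readers unfamiliar with the metrics named above, the following is a minimal sketch of how precision, recall, and F1-score are conventionally computed for one class from detection counts. The function name `detection_metrics` and the counts in the example are hypothetical illustrations; they are not taken from the study and do not reflect its results.

```python
def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Precision, recall, and F1 for a single class from raw counts.

    tp: correct detections, fp: false alarms, fn: missed detections.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts for a "fire" class, NOT figures from the study.
example = detection_metrics(tp=8, fp=2, fn=2)
```

With these made-up counts, all three values come out at about 0.8. In a safety setting, recall tends to matter most, because a missed fire (a false negative) is usually costlier than a false alarm.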

Why is it important?

Early detection of fire and smoke can reduce risks to people and property inside buildings. Traditional camera-based systems may struggle in complex indoor scenes, especially when fire, smoke, and people appear together or when low light or dense smoke reduces visibility. This study uses a Vision Transformer model that analyzes the whole image, which helps it capture the relationship between fire, smoke, and human presence. The reported results show high accuracy, recall, F1-score, and mean average precision, which makes the approach suitable for smart surveillance, early warning systems, and emergency response applications.
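As background on what analyzing the whole image means for a Vision Transformer: such models generally cut the image into a grid of fixed-size patches and let every patch attend to every other patch. The sketch below shows only the patch-grid step. The 224 by 224 image size and 16-pixel patch size are common defaults assumed here for illustration, and `patch_boxes` is a hypothetical helper; the summary does not state the configuration the study used.

```python
def patch_boxes(height: int, width: int, patch: int) -> list:
    """Return (top, left, bottom, right) boxes tiling an image into square patches."""
    if height % patch or width % patch:
        raise ValueError("image size must be a multiple of the patch size")
    return [
        (top, left, top + patch, left + patch)
        for top in range(0, height, patch)
        for left in range(0, width, patch)
    ]


# Assumed sizes for illustration only: 224 x 224 image, 16 x 16 patches.
boxes = patch_boxes(224, 224, 16)
num_patches = len(boxes)  # a 14 x 14 grid, i.e. 196 patches
```

Because attention links all patches at once, a patch showing smoke near the ceiling can be related directly to a patch showing a person near the floor, which is the kind of scene-wide context the paragraph above describes.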

Read the Original

This page is a summary of: A Vision Transformer (ViT) Based Approach for Real-Time Indoor Fire, Smoke, and Human Detection, January 2026, Springer Science + Business Media, DOI: 10.1007/978-3-032-23219-9_39.
