What is it about?
Compared with traditional disease surveillance approaches that rely on data collected from existing healthcare systems (for example GP consultations, hospitalisations, or laboratory tests), web search data provide more timely estimates, offer broader demographic and geographic coverage, and can be considered a low-cost solution. However, previous models based on web search activity did not always capture out-of-sample disease rates accurately. In this paper, we focus on one aspect of modelling and propose a method that improves the selection of search queries by combining their temporal patterns with their meaning. Our experiments indicate that our approach improves model accuracy by more than 12% compared to established baselines.
Why is it important?
We propose a method that combines the temporal patterns of search queries with their semantic interpretation (using word embeddings) to determine which queries are more suitable features for models that estimate influenza rates from web search activity. Our approach improved the accuracy of flu rate estimates by more than 12% across 3 flu seasons in England.
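As a rough illustration of the idea above, the sketch below keeps a query only if its weekly frequency correlates with the flu rate and its embedding is close to a flu-related concept vector. It is a minimal sketch under stated assumptions, not the paper's actual procedure: the function `select_queries`, the thresholds, the two-dimensional "embeddings", and all data values are hypothetical and chosen only to make the example readable.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def select_queries(freqs, rates, embeddings, concept, min_corr=0.5, min_sim=0.5):
    """Keep queries that pass both a temporal and a semantic filter.

    freqs:      dict mapping query -> weekly search frequencies (1-D array)
    rates:      weekly flu rates over the same weeks (1-D array)
    embeddings: dict mapping query -> embedding vector
    concept:    embedding vector standing in for the target concept (flu)
    The thresholds are illustrative assumptions, not values from the paper.
    """
    selected = []
    for query, series in freqs.items():
        # Temporal filter: Pearson correlation with the flu rate.
        corr = float(np.corrcoef(series, rates)[0, 1])
        # Semantic filter: similarity to the concept in embedding space.
        sim = cosine(embeddings[query], concept)
        if corr >= min_corr and sim >= min_sim:
            selected.append(query)
    return selected


# Toy data (entirely made up for illustration).
rates = np.array([1.0, 2.0, 4.0, 3.0, 1.0])
freqs = {
    "flu symptoms": np.array([10.0, 21.0, 39.0, 30.0, 11.0]),    # correlated, on topic
    "christmas gifts": np.array([5.0, 10.0, 20.0, 15.0, 5.0]),   # correlated, off topic
    "flu vaccine history": np.array([7.0, 3.0, 2.0, 6.0, 8.0]),  # on topic, not correlated
}
embeddings = {
    "flu symptoms": np.array([0.9, 0.1]),
    "christmas gifts": np.array([0.1, 0.9]),
    "flu vaccine history": np.array([0.8, 0.3]),
}
concept = np.array([1.0, 0.0])

selected = select_queries(freqs, rates, embeddings, concept)
print(selected)  # prints ['flu symptoms']
```

In this toy setting, a seasonal but unrelated query such as "christmas gifts" passes the correlation filter and is rejected only by the semantic filter, which is the kind of spurious feature that combining the two signals is meant to exclude.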
Perspectives
This paper marks an important milestone for models that estimate disease prevalence from web search activity. It formed the basis for establishing our model as a reliable resource for syndromic surveillance by the UK government (see https://fludetector.cs.ucl.ac.uk/).
Vasileios Lampos
University College London
Read the Original
This page is a summary of: Enhancing Feature Selection Using Word Embeddings, April 2017, ACM (Association for Computing Machinery), DOI: 10.1145/3038912.3052622.