What is it about?

Compared to traditional disease surveillance approaches based on data collected from existing healthcare systems (for example, GP consultations, hospitalisations, or laboratory tests), web search data can provide more timely estimates, offer broader demographic and geographic coverage, and can also be considered a low-cost solution. However, previous models using web search activity data were not always successful in capturing out-of-sample disease rates. In this paper, we focus on one aspect of modelling, the selection of search queries, and propose a method that improves it by combining the queries' temporal patterns with their meaning. Our experiments indicate that our approach improves model accuracy by more than 12% compared to established baselines.


Why is it important?

We propose a method that combines the temporal patterns and the semantic interpretation (captured using word embeddings) of search queries to determine which queries are more suitable features for models that estimate influenza rates from web search activity. Across three flu seasons in England, our approach improved the accuracy of flu rate estimates by more than 12%.
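As a rough illustration only, the sketch below shows one simplified way to combine the two signals: a query is kept only if its frequency time series correlates with reported flu rates *and* its averaged word embedding is close to a flu-related concept vector. The Pearson-correlation filter, the averaging of word vectors, the thresholds `r_min` and `s_min`, the function names, and the toy two-dimensional vectors are all hypothetical assumptions made for this example; the paper's actual formulation may differ.

```python
import math

# Hypothetical, simplified sketch; NOT the paper's implementation.


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no temporal signal
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def embed(text, vectors):
    """Assumption: represent a text as the mean of its known word vectors."""
    words = [w for w in text.split() if w in vectors]
    if not words:
        return None
    dim = len(vectors[words[0]])
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(dim)]


def select_queries(freqs, flu_rates, vectors, concept, r_min=0.5, s_min=0.5):
    """Keep queries that pass BOTH a temporal and a semantic filter."""
    concept_vec = embed(concept, vectors)
    selected = []
    for query, series in freqs.items():
        query_vec = embed(query, vectors)
        if query_vec is None:
            continue  # no embedding available for this query
        temporal = pearson(series, flu_rates)      # co-movement with flu rates
        semantic = cosine(query_vec, concept_vec)  # closeness to the flu concept
        if temporal >= r_min and semantic >= s_min:
            selected.append(query)
    return sorted(selected)


# Toy data, made up for illustration: 2-D "embeddings" and 5 weekly values.
vectors = {
    "flu": [1.0, 0.0], "fever": [0.9, 0.1], "symptoms": [0.8, 0.2],
    "football": [0.0, 1.0], "scores": [0.1, 0.9],
}
flu_rates = [1, 3, 5, 4, 2]
freqs = {
    "flu symptoms": [10, 30, 50, 40, 20],     # correlated and on-topic
    "fever": [12, 28, 49, 41, 22],            # correlated and on-topic
    "football scores": [11, 29, 52, 39, 19],  # correlated but off-topic
    "flu": [50, 40, 10, 20, 45],              # on-topic but anti-correlated
}

if __name__ == "__main__":
    print(select_queries(freqs, flu_rates, vectors, "flu fever"))
```

On this toy data the call prints `['fever', 'flu symptoms']`. The query `football scores` tracks the toy flu curve closely but is discarded by the semantic check, illustrating how meaning can filter out a correlation that is likely coincidental.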

Perspectives

This paper presents an important milestone for models that estimate disease prevalence from web search activity. It was the basis for establishing our model as a reliable resource for syndromic surveillance by the UK government (see https://fludetector.cs.ucl.ac.uk/).

Vasileios Lampos
University College London

Read the Original

This page is a summary of: Enhancing Feature Selection Using Word Embeddings, April 2017, ACM (Association for Computing Machinery),
DOI: 10.1145/3038912.3052622.