Enhancing Feature Selection Using Word Embeddings

Vasileios Lampos; Bin Zou; Ingemar Johansson Cox

doi:10.1145/3038912.3052622

What is it about?

Compared to traditional disease surveillance approaches, which rely on data collected from existing healthcare systems (for example, GP consultations, hospitalisations, or laboratory tests), web search data enable more timely estimates, offer broader demographic and geographic coverage, and are a low-cost solution. However, previous models based on web search activity did not always capture disease rates accurately out of sample, that is, on data not used to train them. In this paper, we focus on one aspect of modelling and propose a method that improves the selection of search queries by combining their temporal patterns with their meaning. Our experiments indicate that this approach improves model accuracy by more than 12% compared to established baselines.


Why is it important?

We propose a method that combines the temporal patterns of search queries with their semantic interpretation (using word embeddings) to determine which queries are more suitable features for models that estimate influenza rates from web search activity. Our approach improved the accuracy of flu rate estimates by more than 12% across three flu seasons in England.
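To make the idea of combining temporal and semantic information concrete, here is a minimal sketch of a hybrid query filter. It is an illustration under stated assumptions, not the paper's exact procedure: each query is represented by the average of its word vectors, its "meaning" score is the cosine similarity to a single flu concept vector, its "temporal" score is the Pearson correlation between its search frequency and the flu rates, and a query is kept only if both scores clear a threshold. The function names, thresholds (`min_corr`, `min_sim`), and the tiny 2-D vectors and time series are all hypothetical toy values chosen for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant series carries no temporal signal
    return cov / (sx * sy)


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def query_embedding(query, word_vectors):
    """Assumed query representation: mean of its in-vocabulary word vectors."""
    vecs = [word_vectors[w] for w in query.split() if w in word_vectors]
    if not vecs:
        return None
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def select_queries(query_series, flu_rates, word_vectors, concept,
                   min_corr=0.5, min_sim=0.5):
    """Hypothetical hybrid filter: keep queries that track the flu rates
    over time AND are semantically close to the flu concept."""
    selected = []
    for query, series in query_series.items():
        r = pearson(series, flu_rates)  # temporal pattern
        emb = query_embedding(query, word_vectors)
        sim = cosine(emb, concept) if emb is not None else 0.0  # meaning
        if r >= min_corr and sim >= min_sim:
            selected.append(query)
    return selected


# Hypothetical toy 2-D "embeddings"; real word embeddings are learned
# from large text corpora and have many more dimensions.
word_vectors = {
    "flu": [1.0, 0.0], "symptoms": [0.8, 0.2], "fever": [0.9, 0.1],
    "football": [0.0, 1.0], "scores": [0.1, 0.9],
}
concept = word_vectors["flu"]   # assumed flu concept vector
flu_rates = [1, 3, 5, 4, 2]     # toy weekly flu rates
query_series = {                # toy weekly query frequencies
    "flu symptoms": [2, 6, 10, 8, 4],     # correlated and on-topic
    "fever": [1, 2, 6, 3, 2],             # correlated and on-topic
    "football scores": [1, 3, 5, 4, 2],   # correlated but off-topic
    "flu jab": [5, 4, 3, 2, 1],           # on-topic but not correlated
}

selected = select_queries(query_series, flu_rates, word_vectors, concept)
print(selected)  # → ['flu symptoms', 'fever']
```

The toy data show why combining both signals can help: "football scores" follows the flu curve perfectly by coincidence, so a correlation-only filter would keep it, while the semantic score rejects it as unrelated to flu.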

Perspectives

This paper is an important milestone for models that estimate disease prevalence from web search activity. It formed the basis for establishing our model as a reliable syndromic surveillance resource for the UK government (see https://fludetector.cs.ucl.ac.uk/).
Vasileios Lampos
University College London

This page is a summary of: Enhancing Feature Selection Using Word Embeddings, April 2017, ACM (Association for Computing Machinery), DOI: 10.1145/3038912.3052622.

Resources

Flu Detector
Flu Detector uses Google search data to estimate influenza-like illness (flu) rates in England.

Contributors

The following have contributed to this page

Vasileios Lampos
University College London

Improving web search selection for influenza prevalence models
