What is it about?

Documents may convey ideas with more than just text: font characteristics, geometric position, and indentation also indicate the function of words in a document. This paper aims to make group editing of related words in a document easier by exploiting language, visual, and geometric similarities among its words. Words are clustered according to these features and, optionally, additional user constraints. Users may then perform quick group edits on these clusters. The method relies on an optimization orchestrated by an unsupervised Siamese network; no training set is assumed.
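
For readers curious about the mechanics, here is a minimal, hypothetical sketch of the general idea rather than our actual implementation: a small Siamese encoder is trained with a contrastive loss on multimodal word descriptors, and the resulting embeddings are clustered. The feature dimensions, the proximity-based pair labels, the PyTorch/scikit-learn stack, and the use of k-means are all illustrative assumptions.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class SiameseEncoder(nn.Module):
    """Maps a concatenated (language | visual | geometric) word descriptor to an embedding."""
    def __init__(self, in_dim=64, emb_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 32), nn.ReLU(),
            nn.Linear(32, emb_dim),
        )

    def forward(self, x):
        return self.net(x)

def contrastive_loss(z1, z2, same, margin=1.0):
    # Pull embeddings of "similar" pairs together, push "dissimilar" pairs apart.
    d = torch.norm(z1 - z2, dim=1)
    return (same * d.pow(2) + (1 - same) * torch.clamp(margin - d, min=0).pow(2)).mean()

# Toy data: 100 "words", each described by a 64-d multimodal feature vector
# (in practice these would come from language, font/visual, and geometric cues).
feats = torch.randn(100, 64)
enc = SiameseEncoder()
opt = torch.optim.Adam(enc.parameters(), lr=1e-3)

for _ in range(200):
    i = torch.randint(0, 100, (32,))
    j = torch.randint(0, 100, (32,))
    # Surrogate similarity signal from raw-feature proximity; user-provided
    # must-link / cannot-link constraints could override these labels.
    same = (torch.norm(feats[i] - feats[j], dim=1) < 11.0).float()
    loss = contrastive_loss(enc(feats[i]), enc(feats[j]), same)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Cluster the learned embeddings; each cluster is a candidate group for editing.
with torch.no_grad():
    labels = KMeans(n_clusters=5, n_init=10).fit_predict(enc(feats).numpy())
print(labels[:10])
```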

Why is it important?

Application-wise, our work paves the way for edit propagation among textual entities in a document. For example, users may highlight, delete, or indent a group of words at once by using our method to determine which words belong to which groups. Our work considers language, font, and geometric characteristics simultaneously, ensuring that clusters represent meaningful word groups in a document. If the automatic clustering is unsatisfactory, a user may add constraints and refine the process. We hope that conclusions from our work will also facilitate further research on unsupervised deep learning with multimodal features of language, vision, and geometry.
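
As a toy illustration of edit propagation, consider the sketch below; the data structures are invented for this example and are not the paper's API. Once every word carries a cluster label, a single edit can be applied to all words sharing that label.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    cluster: int
    highlighted: bool = False

words = [Word("Abstract", 0), Word("Introduction", 0), Word("figure", 1)]

def highlight_cluster(words, cluster_id):
    # The user edits one word; the change is applied to all words in its cluster.
    for w in words:
        if w.cluster == cluster_id:
            w.highlighted = True

highlight_cluster(words, cluster_id=words[0].cluster)
print([(w.text, w.highlighted) for w in words])
# [('Abstract', True), ('Introduction', True), ('figure', False)]
```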

Perspectives

This publication has taught us a great deal about how we, as humans, tend to structure our ideas in documents. By studying what kinds of clusters form, we learned that by altering the font or geometric position of words, authors may implicitly convey ideas that are not explicitly stated in the language itself. We hope readers find this work inspiring and useful, both for practical applications and for further research.

Or Perel
Tel Aviv University

Read the Original

This page is a summary of: Learning Multimodal Affinities for Textual Editing in Images, ACM Transactions on Graphics, July 2021, ACM (Association for Computing Machinery), DOI: 10.1145/3451340.
You can read the full text via the DOI above.
