What is it about?

Documents may convey ideas with more than just text: font characteristics, geometric position, and indentation also indicate the function of words in a document. This paper aims to make group editing of related words in a document easier by exploiting language, visual, and geometric similarities among its words. Words are clustered according to these features and, optionally, additional user constraints. Users may then perform quick group edits on these clusters. The method relies on an optimization orchestrated by an unsupervised Siamese network; no training set is assumed.
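
For readers curious about the mechanics, here is a minimal, hypothetical sketch of the general idea rather than our actual implementation: a small Siamese encoder is trained with a contrastive loss on multimodal word descriptors, and the resulting embeddings are clustered. The feature dimensions, the proximity-based pair labels, the PyTorch/scikit-learn stack, and the use of k-means are all illustrative assumptions.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class SiameseEncoder(nn.Module):
    """Maps a concatenated (language | visual | geometric) word descriptor to an embedding."""
    def __init__(self, in_dim=64, emb_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 32), nn.ReLU(),
            nn.Linear(32, emb_dim),
        )

    def forward(self, x):
        return self.net(x)

def contrastive_loss(z1, z2, same, margin=1.0):
    # Pull embeddings of "similar" pairs together, push "dissimilar" pairs apart.
    d = torch.norm(z1 - z2, dim=1)
    return (same * d.pow(2) + (1 - same) * torch.clamp(margin - d, min=0).pow(2)).mean()

# Toy data: 100 "words", each described by a 64-d multimodal feature vector
# (in practice these would come from language, font/visual, and geometric cues).
feats = torch.randn(100, 64)
enc = SiameseEncoder()
opt = torch.optim.Adam(enc.parameters(), lr=1e-3)

for _ in range(200):
    i = torch.randint(0, 100, (32,))
    j = torch.randint(0, 100, (32,))
    # Surrogate similarity signal from raw-feature proximity; user-provided
    # must-link / cannot-link constraints could override these labels.
    same = (torch.norm(feats[i] - feats[j], dim=1) < 11.0).float()
    loss = contrastive_loss(enc(feats[i]), enc(feats[j]), same)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Cluster the learned embeddings; each cluster is a candidate group for editing.
with torch.no_grad():
    labels = KMeans(n_clusters=5, n_init=10).fit_predict(enc(feats).numpy())
print(labels[:10])
```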

Why is it important?

Application-wise, our work paves the way for edit propagation among textual entities in a document. For example, users may highlight, delete, or indent a group of words at once by using our method to determine which words belong to which groups. Our work considers language, font, and geometric characteristics simultaneously, ensuring that clusters represent meaningful word groups in a document. If the automatic clustering is unsatisfactory, a user may add constraints and refine the process. We hope that conclusions from our work will also facilitate further research on unsupervised deep learning with multimodal features of language, vision, and geometry.
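
As a toy illustration of edit propagation, consider the sketch below; the data structures are invented for this example and are not the paper's API. Once every word carries a cluster label, a single edit can be applied to all words sharing that label.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    cluster: int
    highlighted: bool = False

words = [Word("Abstract", 0), Word("Introduction", 0), Word("figure", 1)]

def highlight_cluster(words, cluster_id):
    # The user edits one word; the change is applied to all words in its cluster.
    for w in words:
        if w.cluster == cluster_id:
            w.highlighted = True

highlight_cluster(words, cluster_id=words[0].cluster)
print([(w.text, w.highlighted) for w in words])
# [('Abstract', True), ('Introduction', True), ('figure', False)]
```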

Perspectives

This publication has taught us a great deal about how we, as humans, tend to structure our ideas in documents. By studying what kinds of clusters form, we learned that by altering the font or geometric position of words, authors may implicitly convey ideas that are not explicitly stated in the language itself. We hope readers find this work inspiring and useful, both for practical applications and for further research.

Or Perel
Tel Aviv University

Read the Original

This page is a summary of: Learning Multimodal Affinities for Textual Editing in Images, ACM Transactions on Graphics, July 2021, ACM (Association for Computing Machinery), DOI: 10.1145/3451340.
You can read the full text via the DOI above.
