What is it about?

The paper introduces an easy way to compare documents. It identifies a number of features that help answer the question of how much of one document was built by copying content from another. The method encodes words as hash codes. The result is a vector of values that an identification algorithm can use to assess the plagiarism ratio. The method can be customized to the field and the specifics of the texts. It also preserves the anonymity of sources if needed.
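As a rough illustration of the idea, the sketch below hashes each word and derives a small feature vector from two texts. It is not the paper's implementation: the function names (hash_words, ngrams, feature_vector), the choice of SHA-256 truncated to 12 characters, and the three features (Jaccard similarity, word containment, bigram overlap) are all assumptions made for this example.

```python
import hashlib
import re


def hash_words(text: str) -> list[str]:
    """Lowercase the text, split it into words, and replace each word
    with a short hash code (assumed scheme: truncated SHA-256)."""
    words = re.findall(r"\w+", text.lower())
    return [hashlib.sha256(w.encode("utf-8")).hexdigest()[:12] for w in words]


def ngrams(codes: list[str], n: int) -> set[tuple[str, ...]]:
    """Return the set of consecutive n-tuples of hash codes."""
    return {tuple(codes[i:i + n]) for i in range(len(codes) - n + 1)}


def feature_vector(suspect: str, source: str) -> list[float]:
    """Illustrative features only; the paper defines its own set.

    Returns [jaccard, containment, bigram_overlap], each in [0, 1].
    """
    a, b = hash_words(suspect), hash_words(source)
    set_a, set_b = set(a), set(b)

    # Share of distinct hashed words common to both texts.
    union = set_a | set_b
    jaccard = len(set_a & set_b) / len(union) if union else 0.0

    # Share of the suspect's words (with repeats) that occur in the source.
    containment = sum(1 for c in a if c in set_b) / len(a) if a else 0.0

    # Share of the suspect's word pairs that also occur in the source.
    bi_a, bi_b = ngrams(a, 2), ngrams(b, 2)
    bigram_overlap = len(bi_a & bi_b) / len(bi_a) if bi_a else 0.0

    return [jaccard, containment, bigram_overlap]
```

Because only hash codes are compared, the two parties could in principle exchange hashed word lists rather than the texts themselves, which is one way to read the anonymity property mentioned above. A resulting vector such as feature_vector(suspect_text, source_text) could then be passed to whatever identification algorithm estimates the plagiarism ratio.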

Why is it important?

There are many methods for assessing plagiarism; however, not all of them take into account the specifics of the field or the document type. That was the reason for introducing this method, which can be tailored to particular needs.


I truly hope that appropriate methods will make our work easier and more reliable. In research today, many people try to take shortcuts by reusing the work of others. It is much better to cite someone's article than to rewrite it! I hope that this method, or any other, raises the quality of papers and thus of research.

Dr Krystian Wojtkiewicz
Politechnika Wroclawska

Most of the metrics presented in the paper can be used to assess the similarity of texts. The paper itself is a source of algorithms as well as ideas that could be used for text anonymisation.

Dr Marek Krótkiewicz
Politechnika Wroclawska

Read the Original

This page is a summary of: Features for Text Comparison, Springer Science + Business Media, DOI: 10.1007/978-3-540-68168-7_52.