Semantic recognition of duplicates using Textual Entailment (SemDupl) is a research project funded by the German Research Foundation (DFG). The aim of the project is to develop methods for recognizing duplicate texts that combine deep and shallow approaches.
One of the most important application scenarios is the detection of plagiarism of texts from the Web. Moreover, duplicate recognition is also vital for text summarization, question answering, and information retrieval.
A special challenge of this project is detecting duplicates whose content is identical but expressed in a completely different way. With shallow methods alone, such duplicates usually cannot be recognized.
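To illustrate why shallow methods struggle with such cases, here is a minimal sketch that is not taken from the project itself. It assumes word-set overlap (Jaccard similarity) as a stand-in for a shallow method; the function names and the example sentences are invented for illustration.

```python
import re


def word_set(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of the word sets of two texts (0.0 to 1.0)."""
    wa, wb = word_set(a), word_set(b)
    if not wa and not wb:
        return 1.0  # two empty texts are treated as identical
    return len(wa & wb) / len(wa | wb)


# Hypothetical example sentences (assumptions, not project data).
original = "The company dismissed two hundred employees last month."
near_copy = "Last month the company dismissed two hundred employees."
paraphrase = "Four weeks ago, the firm laid off 200 workers."

# A reordered copy shares every word with the original.
print(jaccard(original, near_copy))   # 1.0

# The paraphrase says roughly the same thing but shares only "the".
print(jaccard(original, paraphrase))  # 0.0625
```

The reordered copy is trivially caught, while the paraphrase scores close to zero although it conveys roughly the same content. Closing that gap is where deeper, entailment-based analysis would be needed.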