Diploma thesis
Intrinsic Plagiarism Detection
There are two approaches to plagiarism detection. Extrinsic plagiarism detection looks for similarities across different documents, whereas intrinsic plagiarism detection looks for dissimilarities within a single document. The crucial assumption is that different authors have different writing styles, which makes it possible to tell them apart. Given a suspicious document, the goal is to identify passages whose stylometric characteristics differ from the rest of the text. Given the set of these passages, the next goal is to group them by authorship. The student's task will be to identify promising stylometric features, implement an intrinsic plagiarism detector, and test it on the PAN corpus. Outperforming the contestants in the PAN competition is welcome, but not necessary :-) The thesis can be written in Czech or English.
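To make the first step more concrete, here is a minimal sketch of how passages with a deviating style might be flagged. It is only an illustration under stated assumptions, not a prescribed method: the function names (`window_features`, `split_windows`, `suspicious_windows`), the three features, the window size, and the threshold are all hypothetical choices made for this example. The document is cut into fixed-size word windows, each window gets a small feature vector, and windows whose features lie far from the document average (by z-score) are reported.

```python
import re
import statistics


def window_features(words):
    """Return a small, illustrative stylometric feature vector for one window."""
    n = len(words)
    avg_word_len = sum(len(w) for w in words) / n
    type_token_ratio = len({w.lower() for w in words}) / n
    long_word_ratio = sum(1 for w in words if len(w) > 6) / n
    return [avg_word_len, type_token_ratio, long_word_ratio]


def split_windows(text, size=50):
    """Cut the text into consecutive windows of `size` words.

    A trailing window shorter than half the size is dropped, because
    features computed on very few words are unreliable.
    """
    words = re.findall(r"[^\W\d_]+", text)  # runs of Unicode letters
    windows = [words[i:i + size] for i in range(0, len(words), size)]
    return [w for w in windows if len(w) >= size // 2]


def suspicious_windows(text, size=50, threshold=2.0):
    """Return indices of windows whose style deviates from the document average.

    The score of a window is its mean absolute z-score over all features.
    Both `size` and `threshold` are assumptions that would need tuning.
    """
    features = [window_features(w) for w in split_windows(text, size)]
    if len(features) < 2:
        return []
    scores = [0.0] * len(features)
    for column in zip(*features):
        mean = statistics.mean(column)
        stdev = statistics.pstdev(column)
        if stdev < 1e-12:  # feature is constant across the document
            continue
        for i, value in enumerate(column):
            scores[i] += abs(value - mean) / stdev
    n_features = len(features[0])
    return [i for i, s in enumerate(scores) if s / n_features > threshold]
```

The returned indices identify candidate passages only. Grouping them by authorship would be a separate clustering step over the same feature vectors, and a realistic detector would likely need richer features (for example character n-grams or function-word frequencies) than the three used here.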

