Summary of topics offered - Department of Informatics (FBE)

Basic information

Type of work: Diploma thesis
Topic: Comparison of open-source data mining tools for textual data analysis
State of topic: approved (prof. Ing. Cyril Klimeš, CSc. - head of department)
Thesis supervisor: doc. Ing. František Dařena, Ph.D.
Faculty: Faculty of Business and Economics
Supervising department: Department of Informatics - FBE
Max. no. of students: 5
Proposed by:
Summary: A variety of open-source tools can be used to mine knowledge from textual data. These tools implement many commonly used machine learning algorithms, but they differ in how well they support transforming raw data into a suitable format, in their technical capabilities (memory management, speed), in the variety of outputs they provide, in how easily simple steps can be combined into more complex tasks, and in other respects. The aim of the thesis is to design experiments employing inductive supervised and unsupervised learning, carry them out with selected open-source tools (c5, Weka, SVMlight, Cluto, R, Octave, Python, Perl), and evaluate how suitable these tools are for specific types of tasks according to the specified criteria.

