Introduction to basic terms (concepts, instances, attributes and their types). Data preparation – handling missing and noisy data. Knowledge representation formats – decision tables and trees, classification and association rules, clusters. Simple algorithms – 1R, Naive Bayes, decision tree induction (ID3), covering algorithms (Prism), mining association rules, linear models, instance-based learning (nearest-neighbor method). Evaluation methods – training and test data, cross-validation, leave-one-out, bootstrap, counting the cost, evaluating numeric predictions, MDL principle. Complex algorithms – C4.5, support vector machines, model trees, generalization of clusters. Attribute selection, data cleansing, combining multiple models. Applications in bioinformatics.
Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, by Ian Witten and Eibe Frank, Morgan Kaufmann Publishers, 2000.