Parameter-free Agglomerative Hierarchical Clustering to Model Learners' Activity in Online Discussion Forums

Doctoral Programme on the Information and Knowledge Society
22/04/2014

Author: Germán Cobo Rodríguez
Programme: Doctoral Programme on the Information and Knowledge Society
Language: English
Supervisors: Dr Eugènia Santamaría Pérez and Dr José Antonio Morán Moreno
Faculty / Institute: Internet Interdisciplinary Institute (IN3)
Subjects: Computer Science, Higher Education, Universities
Key words: Parameter-free clustering, Educational data mining, Learner behaviour modelling
Area of knowledge: Artificial Learning and Data Mining in Education

Summary

The analysis of learners' activity in online discussion forums leads to a highly context-dependent modelling problem, which can be approached both theoretically and empirically. When this problem is tackled from the data mining field, a clustering-based perspective is usually adopted, giving rise to a clustering scenario in which the real number of clusters is unknown a priori. This approach therefore exposes one of the best-known issues of the clustering paradigm: the estimation of the number of clusters, which is usually chosen by the user according to some subjective criterion and may easily introduce undesired biases into the resulting models.

With the aim of avoiding any user intervention in the cluster analysis stage, the present thesis proposes two new cluster merging criteria, which allow the implementation of a novel parameter-free agglomerative hierarchical algorithm. A complete set of experiments indicates that the new clustering algorithm is able to provide optimal clustering solutions across a great variety of clustering scenarios: it can deal with different kinds of data, and it outperforms the clustering algorithms most widely used in practice.
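
As an illustration only, the general idea of agglomerative merging without a user-supplied number of clusters can be sketched as follows. The sketch does not reproduce the merging criteria proposed in the thesis, which are not described in this summary. It assumes average linkage for merging and a simple "largest gap in merge distance" rule for deciding where to stop, both chosen here as stand-ins. The function names `average_linkage`, `agglomerate` and `cut_at_largest_gap`, and the sample points, are hypothetical.

```python
import math
from itertools import combinations


def average_linkage(a, b, points):
    """Mean Euclidean distance over all pairs drawn from clusters a and b."""
    total = sum(math.dist(points[i], points[j]) for i in a for j in b)
    return total / (len(a) * len(b))


def agglomerate(points):
    """Repeatedly merge the two closest clusters until a single one remains.

    Returns the partition at every level (levels[m] is the partition after
    m merges) and the linkage distance at which each merge took place.
    """
    clusters = [[i] for i in range(len(points))]
    levels = [[list(c) for c in clusters]]
    heights = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average linkage.
        (x, y), d = min(
            (((x, y), average_linkage(clusters[x], clusters[y], points))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)]
        clusters.append(merged)
        levels.append([list(c) for c in clusters])
        heights.append(d)
    return levels, heights


def cut_at_largest_gap(levels, heights):
    """Stand-in stopping rule (an assumption, not the thesis criteria):
    keep the partition reached just before the largest jump in merge distance.
    """
    if len(heights) < 2:
        return levels[0]
    gaps = [heights[k + 1] - heights[k] for k in range(len(heights) - 1)]
    k = max(range(len(gaps)), key=gaps.__getitem__)
    return levels[k + 1]


# Hypothetical toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
levels, heights = agglomerate(points)
clusters = cut_at_largest_gap(levels, heights)
print(sorted(sorted(c) for c in clusters))
```

No number of clusters is passed in at any point: the partition is selected from the merge history alone. A gap-based rule of this kind is only one of many possible heuristics and makes no claim to the performance reported for the algorithm proposed in the thesis.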

Finally, a two-stage analysis strategy based on the subspace clustering paradigm is proposed to properly tackle the issue of modelling learners' participation in asynchronous discussions. In combination with the new clustering algorithm, the proposed strategy proves able to limit the user's subjective intervention to the interpretation stages of the analysis process and to lead to a complete model of the activity performed by learners in online discussion forums.