Data Preparation
Data preparation is the step after data collection: the process of cleaning and transforming the collected raw data. This includes, but is not limited to, the following:
- Defining the variables used
- Defining outliers and how to deal with them, which mainly means defining serious and non-serious test participation
- Handling missing values
- One-hot encoding of categorical features
- Dealing with class imbalances (e.g. number of students per semester or faculty)
The data we receive for analysis are anonymised.
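Some of the steps listed above can be illustrated with a minimal sketch. The records, field names (`participant`, `faculty`, `semester`, `answered`) and helper functions below are hypothetical and are not taken from the actual dataset or pipeline. They only show what dropping incomplete rows, one-hot encoding a categorical feature and checking class sizes might look like.

```python
from collections import Counter

# Hypothetical anonymised records; field names and values are illustrative only.
records = [
    {"participant": "p1", "faculty": "A", "semester": 1, "answered": 120},
    {"participant": "p2", "faculty": "B", "semester": 3, "answered": None},
    {"participant": "p3", "faculty": "A", "semester": 5, "answered": 95},
    {"participant": "p4", "faculty": "C", "semester": 1, "answered": 60},
]


def drop_incomplete(rows, required):
    """Keep only rows in which every required field is present (not None)."""
    return [r for r in rows if all(r.get(k) is not None for k in required)]


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != column}
        for c in categories:
            new_row[f"{column}_{c}"] = 1 if r[column] == c else 0
        encoded.append(new_row)
    return encoded


# Missing values: here rows without an answer count are simply dropped
# (an assumption; imputation would be another option).
complete = drop_incomplete(records, ["answered"])

# One-hot encoding of the categorical feature "faculty".
encoded = one_hot(complete, "faculty")

# Class imbalance: count how many participants fall into each faculty.
faculty_counts = Counter(r["faculty"] for r in complete)

print(encoded)
print(faculty_counts)
```

Dropping rows is only one way to treat missing values. Whether that or some form of imputation is appropriate depends on the analysis and is not specified here.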
Non-serious Test Takers
Participation in the PTM is compulsory for most faculties. However, students can usually answer as many or as few questions as they like, including none at all. The actual feedback is based on the number of correctly and incorrectly answered questions. It therefore improves as more questions are answered, because more information is available about the student's level of knowledge. Non-serious test participations make the evaluation more difficult because they distort the comparison groups. For this reason, they are not included in the dataset.
Various approaches and criteria have been used to identify non-serious participation. The shift from paper-and-pencil to digital examinations makes a re-evaluation of these approaches necessary.
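As an illustration only, excluding non-serious participations could be sketched as a simple filter. The criterion used here, a minimum number of answered questions, and the threshold `MIN_ANSWERED` are assumptions made for the example. The text does not state which criteria are actually applied, and the field names are hypothetical.

```python
# Hypothetical threshold; the actual criterion is not specified in the text.
MIN_ANSWERED = 20

# Hypothetical anonymised participations (illustrative values only).
participations = [
    {"participant": "p1", "answered": 150, "correct": 90},
    {"participant": "p2", "answered": 0, "correct": 0},
    {"participant": "p3", "answered": 12, "correct": 7},
    {"participant": "p4", "answered": 80, "correct": 35},
]


def is_serious(row, min_answered=MIN_ANSWERED):
    """Assumed criterion: a participation counts as serious if at least
    `min_answered` questions were answered."""
    return row["answered"] >= min_answered


# Split the participations; only the serious ones stay in the dataset.
serious = [r for r in participations if is_serious(r)]
excluded = [r for r in participations if not is_serious(r)]

print([r["participant"] for r in serious])
print([r["participant"] for r in excluded])
```

Keeping the criterion in a single function makes it easy to swap in a different definition, for example when the approaches are re-evaluated for digital examinations.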