Survey "Influence of the Pandemic on Undergraduate Medical Studies" (AI-supported Analysis)

Authors: Dr. Maren März, Victoria Sehy, Jana Struzena

Last update:

30.04.2021 editorial corrections



The survey was conducted in January 2021 within the Progress Test Medicine cooperation.

A total of 2 715 students from 11 faculties participated.


Student entries contained a wealth of information. Regardless of the question, the answers touched on many topics and included disadvantages, advantages and emotions. Assigning a response to a single main topic was therefore not entirely straightforward. In some instances, very similar sentences expressed opposite meanings, which made training the algorithm somewhat challenging.


  • Students moved their place of learning to their homes, and most of them studied alone. The libraries in particular were missed.
  • Students see the online lectures as a clear advantage. The discontinuation of face-to-face lectures also brings a gain in time and more flexibility.
  • The omission of practical lessons and the lack of organisation and communication by the faculties are seen as disadvantages. The timing and organisation of assessments in particular are unclear. Another disadvantage is the lack of social contacts.
  • For some students, there is no change in their emotional state in relation to their studies. Others, however, suffer from the lack of social contacts: they feel lonely and demotivated and lose the fun of studying.


The survey was anonymous.

Positive vote of the Ethics Committee: EA4/242/20


A Closed questions:

1. At which faculty are you studying medicine?

2. In which semester are you studying medicine?

3. How satisfied were you with your overall exam results in the summer semester 2020?


B Open questions:

4. How did you adapt your learning behaviour to the Corona circumstances? (Adaptation)

5. What were the most positive changes in your learning environment during the Corona semesters (Summer Semester 20 & Winter Semester 20/21)? (Advantages/positive changes)

6. What were the most negative changes in your learning environment during the Corona semesters (Summer Semester 20 & Winter Semester 20/21)? (Disadvantages/negative changes)

7. Did your emotional state change in relation to your studies during the Corona semesters (Summer Semester 20 & Winter Semester 20/21)? If yes, to what extent? If not, please write "no" (Emotional state).


The questionnaire was prepared by Victoria Sehy and Jana Struzena together with Susanne Werner from the assessment team of the Charité - Universitätsmedizin, based on Schauber et al. (2015) and Mahdy (2020).

A Closed Questions

1. Faculties and 2. Semesters

Faculties were pseudonymised.

Missing values for faculty: 93

Missing values for semester: 50

3. Satisfaction with the examination results (performance)

There was a choice of 5 smileys ranging from very sad smiley = 1 to very happy smiley = 5.

N = 2574

Mean = 3.7; SD = 1.2; Median = 4

Overall, students across all semesters and faculties were quite satisfied with their performance in the summative examinations. One faculty (faculty 2) stands out with particularly satisfied students. Only one student from faculty 6 participated, so the graph for this faculty is empty.

In terms of semesters, the first semester, as well as the 11th and 12th semesters (i.e. the semesters after the 2nd state examination) differ from the distribution pattern of the remaining semesters.



B Open Questions

The vast majority of surveys use questions with predetermined responses (e.g. scale or predetermined categories). Open questions are traditionally considered more difficult to analyse than their closed counterparts, as human coding or tagging is usually used (Roberts et al., 2014).

Increasingly, topic analysis or topic modelling is being done through machine learning. In its simplest form, this is an automated, unsupervised process (Ghahramani, 2004). It is based on the assumption that each document contains a certain number of topics. Each document is treated as a so-called "bag of words" (BOW), i.e. word order is ignored and only word counts matter, and words that are statistically close to each other are grouped. Topics can be derived from these BOW (Campbell et al., 2015).

LDA (Latent Dirichlet Allocation)

We used the Latent Dirichlet Allocation algorithm (LDA). This algorithm is based on the idea that documents are represented as random mixtures over latent topics, where each topic is characterized by a probability distribution over words. LDA assigns both words and documents (defined as sequences of words) to one or more topics. Each document is assigned to the "dominant" topic, i.e. the one with the best fit, while in the case of words the probability values of each topic remain relevant.

With LDA, the number of topics must be determined in advance; the corresponding words are then determined computationally.

Two further essential parameters are the alpha and beta parameters. Both must be greater than 0; the values used in this analysis lie between 0 and 1 (inclusive).

The alpha parameter (in our case doc_topic_prior) represents the document-topic density: the higher the value, the more topics the individual documents consist of. It shapes the topic distribution per document.

The beta parameter (topic_word_prior) represents the topic-word density: the higher the value, the more words each topic consists of, while a lower value leads to a more specific word distribution per topic (Blei et al., 2003).
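The effect of such a prior can be illustrated by sampling topic mixtures from a symmetric Dirichlet distribution. The following is a minimal sketch using only the Python standard library. The function names, the concentration values 0.1 and 1.0, and the number of draws are illustrative assumptions and are not taken from the survey analysis.

```python
import random


def sample_symmetric_dirichlet(concentration, n_topics, rng):
    """Draw one topic mixture from a symmetric Dirichlet prior.

    A Dirichlet sample can be obtained by normalising independent
    Gamma(concentration, 1) draws so that they sum to 1.
    """
    draws = [rng.gammavariate(concentration, 1.0) for _ in range(n_topics)]
    total = sum(draws)
    return [d / total for d in draws]


def mean_largest_share(concentration, n_topics=5, n_samples=2000, seed=0):
    """Average weight of the strongest topic across many sampled mixtures."""
    rng = random.Random(seed)
    shares = (
        max(sample_symmetric_dirichlet(concentration, n_topics, rng))
        for _ in range(n_samples)
    )
    return sum(shares) / n_samples


# Illustrative values only (assumption): a low and a higher prior.
sparse = mean_largest_share(0.1)  # low prior: one topic tends to dominate
mixed = mean_largest_share(1.0)   # higher prior: weight spread over more topics
print(f"mean largest topic share: prior 0.1 -> {sparse:.2f}, prior 1.0 -> {mixed:.2f}")
```

With the lower value, a single topic typically carries most of the weight of a document; with the higher value, the weight is spread over several topics. The same reasoning applies to the distribution of words within a topic.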

The algorithm assigns a number to each topic and determines the frequency of words for the topic.
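The assignment of each entry to its "dominant" topic, and the resulting topic shares, can be sketched in plain Python. This is a sketch under stated assumptions: the probability rows are invented for illustration, and the rule that an entry stays unassigned when no single topic has the highest probability is a guess, since the text does not state how unassigned entries arise.

```python
from collections import Counter


def dominant_topic(probs, tol=1e-9):
    """Return the index of the topic with the highest probability.

    Returns None when several topics tie for the highest value
    (assumed rule for "not assigned to a topic").
    """
    best = max(probs)
    winners = [k for k, p in enumerate(probs) if best - p <= tol]
    return winners[0] if len(winners) == 1 else None


def topic_shares(doc_topic):
    """Percentage of assigned entries per topic and number of unassigned entries."""
    labels = [dominant_topic(row) for row in doc_topic]
    assigned = [label for label in labels if label is not None]
    counts = Counter(assigned)
    shares = {k: round(100 * n / len(assigned)) for k, n in sorted(counts.items())}
    return shares, labels.count(None)


# Hypothetical document-topic probabilities for four entries and three topics.
example = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.6, 0.3, 0.1],
    [1 / 3, 1 / 3, 1 / 3],  # no dominant topic
]
shares, unassigned = topic_shares(example)
print(shares, unassigned)  # {0: 67, 1: 33} 1
```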

Data preparation

Before any analysis, the data must be prepared. Each of the following processing steps involves a loss of information. However, most of the steps are helpful in capturing context (Campbell et al., 2015).

  • All letters were converted to lower case.
  • We used lemmatization (with spaCy and, after an analysis of the texts, also manually, e.g. 'Bib' -> 'library').
  • We identified two so-called bigrams ('gar nicht' in the sense of 'not at all' and 'zu hause' in the sense of 'at home'), which we included in the data processing.
  • We removed so-called stop words, but kept some words that were important in the context of the survey (e.g. 'not' or 'no').
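The preparation steps listed above can be sketched in plain Python. This is only an illustration: the word lists are short, hypothetical stand-ins, the spaCy lemmatization is not reproduced (only a manual mapping is shown), and the helper name `preprocess` is an assumption.

```python
import re

# Hypothetical, heavily shortened word lists (assumptions for illustration).
MANUAL_LEMMAS = {"bib": "bibliothek"}
BIGRAMS = {("gar", "nicht"): "gar_nicht", ("zu", "hause"): "zu_hause"}
STOP_WORDS = {"ich", "in", "der", "die", "das", "und", "nicht", "kein"}
KEEP = {"nicht", "kein", "nein"}  # negations kept despite being stop words


def preprocess(text):
    """Lower-case, map manual lemmas, merge bigrams, drop stop words."""
    tokens = re.findall(r"[a-zäöüß]+", text.lower())
    tokens = [MANUAL_LEMMAS.get(t, t) for t in tokens]

    # Merge the two known bigrams into single tokens before stop-word removal.
    merged = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if pair in BIGRAMS:
            merged.append(BIGRAMS[pair])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1

    return [t for t in merged if t not in STOP_WORDS or t in KEEP]


print(preprocess("Ich lerne gar nicht in der Bib"))
# ['lerne', 'gar_nicht', 'bibliothek']
```

Merging the bigrams before removing stop words matters here, because 'nicht' or 'zu' could otherwise be filtered out before the pair is recognised.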

We used the sklearn CountVectorizer to build feature vectors from the entries.

We used grid search for model optimisation (Buitinck et al., 2013) of three parameters (number of topics, alpha and beta).
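Such a grid search over the three parameters might look as follows with scikit-learn's GridSearchCV. This is a sketch under assumptions: the candidate values, the toy entries and the two-fold split are invented, and the selection criterion is simply the approximate log-likelihood returned by the LDA estimator's own `score` method, which may differ from the criterion used in the analysis.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV

# Hypothetical, already preprocessed entries (assumption for illustration).
docs = [
    "zu_hause allein lernen",
    "bibliothek geschlossen zu_hause lernen",
    "mehr zeit online vorlesung",
    "online vorlesung flexibel zeit",
    "nein keine änderung",
    "keine änderung nein",
    "allein lernen zeit",
    "online lernen zu_hause",
]
X = CountVectorizer().fit_transform(docs)

# Candidate values for number of topics, alpha and beta (example grid only).
param_grid = {
    "n_components": [2, 3],
    "doc_topic_prior": [0.5, 1.0],
    "topic_word_prior": [0.5, 1.0],
}

# Without an explicit scoring argument, GridSearchCV uses the estimator's
# score method, here the approximate log-likelihood of held-out entries.
search = GridSearchCV(LatentDirichletAllocation(random_state=0), param_grid, cv=2)
search.fit(X)
print(search.best_params_)
```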



Topic modelling was done with Python 3.9 (Python, n.d.). For processing and analysis we used the packages NLTK (Natural Language Toolkit, n.d.), sklearn (Pedregosa et al., 2011) and spaCy (spaCy, n.d.). The visualisation was done with seaborn (Waskom & the seaborn development Team, 2020) and matplotlib (Hunter, 2007).

Explanation of the visualisations we use

To visualise the topics, t-SNE (t-distributed Stochastic Neighbor Embedding) is used (van der Maaten & Hinton, 2008): each document receives a pair of values in two-dimensional space. The algorithm is non-linear and adapts to the underlying data by performing different transformations for different regions. Each point represents a document, coloured according to its associated topic.
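The projection can be sketched with scikit-learn's TSNE. The input here is a randomly generated document-topic matrix, and the assumption that the document-topic weights are the input to t-SNE, as well as the perplexity value, are not taken from the text.

```python
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Hypothetical document-topic matrix: 60 entries, 4 topics, rows sum to 1.
doc_topic = rng.dirichlet(alpha=[0.3] * 4, size=60)
dominant = doc_topic.argmax(axis=1)  # used to colour the points

# Project every entry to a pair of values in two-dimensional space.
# perplexity must be smaller than the number of entries; 10 is an example value.
tsne = TSNE(n_components=2, perplexity=10, init="random", random_state=0)
coords = tsne.fit_transform(doc_topic)

print(coords.shape)  # (60, 2)
```

The two columns of `coords` can then be drawn as a scatter plot, for example with matplotlib's `scatter(coords[:, 0], coords[:, 1], c=dominant)`, so that each point is coloured by its dominant topic.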

4. Question on adaptation due to the pandemic

The biggest adjustment was the switch to learning at home, mostly alone. The library as a place of learning disappeared (Topic 1). Students had to structure their learning and everyday life by themselves (Topic 0).

Overall, more time was available or more time was invested (although with varying degrees of success) (Topic 3). The time could be used more efficiently because there was no longer any need to commute to the faculty and the lectures were available online (Topic 4).

Finally, there were also students who did not change their learning behaviour at all or only slightly. These are students who were already used to studying alone at home and students who began their studies during the pandemic. Some students find it difficult to adapt, and this leads to uncertainties, e.g. with regard to exams (Topic 2).

Topics = 5, doc_topic_prior = 1, topic_word_prior = 0.6

444 different words

Not assigned to a topic: 21 documents/entries

Entries: 2265, Missings: 450




Topic 1    25 %    Study at home, mostly alone, instead of in the library

Topic 3    22 %    Overall more time available or invested, with varying degrees of success

Topic 0    20 %    Structuring own learning and everyday life

Topic 4    17 %    Better use of time, as there was no need to travel and lectures were available online

Topic 2    16 %    No/low adjustment

5. Question Regarding Advantages/Positive Changes

The digital lectures are a clear advantage, with three aspects highlighted in particular:

Overall, there is more time available because the face-to-face lectures are cancelled (Topic 0).

This also allows for more flexible scheduling (Topic 1).

In addition, the digital lectures enable learning at one's own pace - they can be viewed faster, slower and repeatedly as needed (Topic 3).

Some of the students, however, did not perceive any positive changes (Topic 2).

Topics = 4, doc_topic_prior = 1, topic_word_prior = 0.5

323 different words

Not assigned to a topic: 73

Entries: 2306, Missings: 409




Topic 0    31 %    More time due to cancelled lectures

Topic 3    25 %    Learning at one's own pace

Topic 1    24 %    More flexible time management

Topic 2    20 %    No positive changes

6. Question Regarding Disadvantages/Negative Changes

The main disadvantages are the lack of social contact, especially with fellow students but also with lecturers (Topic 0), and the loss of practical lessons (Topic 3).

Learning from home, with more distractions and less balance, and the closed libraries were perceived as disadvantages (Topic 1).

Students feel left alone by the faculties. The cancellation of practical lessons and the overall poor organisation (Topic 4), but also the unclear situation and lack of communication, especially with regard to exams (Topic 2), are seen as major disadvantages.

Topics = 5, doc_topic_prior = 1, topic_word_prior = 1

461 words

Not assigned to a topic: 50

Entries: 2333, Missings: 382




Topic 0     25 %    Little contact with fellow students, overall lack of social contacts

Topic 1     21 %    Unfamiliar learning environment, no library

Topic 4    19 %     Left alone, poor organisation, no practical lessons

Topic 2    18 %     Unclear situation, lack of communication from the faculty, especially regarding exams

Topic 3    17 %     No practical lessons, more distractions, little variety

7. Question about the Emotional State

The largest single group of students is not emotionally burdened by the pandemic (Topic 101). A small number are heavily burdened and suffer from loneliness, stress and even depression (Topic 100). Motivation is also a key issue: while most are less motivated, some have become more motivated. There were also fluctuations between the summer and winter semester (in both directions) (Topic 0).

Lack of organisation and communication and lack of practical lessons lead to doubts about studies and profession; alienation also occurs (Topic 3). There is a lack of exchange with fellow students. This leads to uncertainty and demotivation. The assessment of one's own knowledge and learning deteriorates, students feel left alone by the faculty and develop examination anxiety (Topic 2).

Topics = 4, doc_topic_prior = 1, topic_word_prior = 0.5

441 different words

Not assigned to a topic: 19

Entries: 2462, Missings: 253





Topic 1    36 %    A separate topic modelling was performed within this topic

              Topic 101   28 %    (79 % of Topic 1) No change in emotional state in relation to studies

              Topic 100     8 %    (21 % of Topic 1) Loneliness, stress, lack of motivation up to depression

Topic 0    24 %    The central point here is motivation: motivation decreased for most, but increased for some. There were also fluctuations between the summer and winter semesters.

Topic 3    20 %    Organisation and communication are perceived as poor, in some cases there is alienation from studies and work.

Topic 2    20 %    Lack of exchange with fellow students, insecurity, demotivation, left alone by faculty, fear of exams
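The overall shares of the two subtopics follow from multiplying their share within Topic 1 by the share of Topic 1 itself. A short check in Python, using only the percentages reported in the table above (the variable names are illustrative):

```python
topic_1_share = 36                   # share of Topic 1, in percent
within_topic_1 = {101: 79, 100: 21}  # share of each subtopic within Topic 1, in percent

# Overall share of each subtopic, rounded to whole percent.
overall = {
    subtopic: round(topic_1_share * pct / 100)
    for subtopic, pct in within_topic_1.items()
}
print(overall)  # {101: 28, 100: 8}
```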


Detailed Look at Topic 1

The screening of the entries, the t-SNE representation, and the frequency distribution show some particularities within this topic. The latter shows that the main words of this topic are "no" and "yes".

We therefore carried out a topic modelling for all entries that were assigned to this topic, with the number of "subtopics" set to 2.

Subtopic 101 comprises students who are not experiencing emotional distress regarding their studies. They simply wrote "no" (as indicated in the instructions).

Subtopic 100 comprises students who suffer from loneliness, stress and lack of motivation, even depression.

These two emotionally contrasting groups of students were assigned to the same topic because they share the word "no". 
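This two-stage procedure can be sketched with scikit-learn: a first model is fitted on all entries, and a second model with two topics is fitted only on the entries of one topic. The toy entries, the helper name `fit_lda`, the prior values, the choice of the largest topic as the one to split, and the labelling scheme 100 + subtopic index are assumptions made for illustration.

```python
from collections import Counter

from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer


def fit_lda(texts, n_topics, alpha=1.0, beta=0.5):
    """Fit LDA on the given texts and return the dominant topic per text."""
    X = CountVectorizer().fit_transform(texts)
    lda = LatentDirichletAllocation(
        n_components=n_topics,
        doc_topic_prior=alpha,
        topic_word_prior=beta,
        random_state=0,
    )
    return lda.fit_transform(X).argmax(axis=1).tolist()


# Hypothetical entries (assumption): many share the word "no".
docs = [
    "no", "no", "no change", "no",
    "yes lonely no motivation", "yes stress no contact", "yes no fun lonely",
    "less motivated than before", "motivation went down", "more motivated in winter",
]

# Stage 1: model all entries.
labels = fit_lda(docs, n_topics=2)

# Stage 2: re-model only the entries of one topic (here: the largest one).
parent = Counter(labels).most_common(1)[0][0]
subset = [doc for doc, label in zip(docs, labels) if label == parent]
sub_labels = [100 + s for s in fit_lda(subset, n_topics=2)]

for doc, sub in zip(subset, sub_labels):
    print(sub, doc)
```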

Heatmaps: Assignment of each entry (each document) to the topics


Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993–1022.

Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Mueller, A., Grisel, O., Niculae, V., Prettenhofer, P., Gramfort, A., Grobler, J., Layton, R., Vanderplas, J., Joly, A., Holt, B., & Varoquaux, G. (2013). API design for machine learning software: experiences from the scikit-learn project. 1–15.

Campbell, J. C., Hindle, A., & Stroulia, E. (2015). Latent Dirichlet Allocation: Extracting Topics from Software Engineering Data. In The Art and Science of Analyzing Software Data. Elsevier Inc.

Ghahramani, Z. (2004). Unsupervised Learning (O. Bousquet, U. von Luxburg, & G. Rätsch (Eds.); pp. 72–112). Springer Berlin Heidelberg.

Hu, Y., Boyd-Graber, J., & Satinoff, B. (2011). Interactive topic modeling. Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, 248–257.

Hunter, J. D. (2007). Matplotlib: A 2D Graphics Environment. Computing in Science & Engineering, 9(3), 90–95.

Mahdy, M. A. A. (2020). The Impact of COVID-19 Pandemic on the Academic Performance of Veterinary Medical Students. Frontiers in Veterinary Science, 7, 594261. doi: 10.3389/fvets.2020.594261

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., & Duchesnay, É. (2011). Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825–2830.

Python. (n.d.). Retrieved October 25, 2019, from

Roberts, M. E., Stewart, B. M., Tingley, D., Lucas, C., Leder-Luis, J., Gadarian, S. K., Albertson, B., & Rand, D. G. (2014). Structural topic models for open-ended survey responses. American Journal of Political Science, 58(4), 1064–1082.

spaCy. (n.d.).

Schauber, S. K., Hecht, M., Nouns, Z. M., Kuhlmey, A., & Dettmer, S. (2015). The role of environmental and individual characteristics in the development of student achievement: a comparison between a traditional and a problem-based-learning curriculum. Adv Health Sci Educ Theory Pract, 20(4), 1033–1052. doi: 10.1007/s10459-015-9584-2. PMID: 25616720.

van der Maaten, L., & Hinton, G. (2008). Visualizing Data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.

Waskom, M., & the seaborn development Team. (2020). Seaborn.


Research Resource Identifiers

MatPlotLib, RRID:SCR_008624

Python Programming Language, RRID:SCR_008394

scikit-learn, RRID:SCR_002577

seaborn, RRID:SCR_018132