Identifying Reading Fluency in Pupils with and without Dyslexia Using a Machine Learning Model on Texts Assessed with a Readability Application

Measurement of readability is an important tool for assessing reading disorders such as dyslexia. Among the screening procedures for dyslexia is the reading fluency test, where reading fluency is defined as the ability to read with speed, accuracy and proper expression. The reading fluency test often consists of a sequence of unrelated written texts ranging from simple short sentences to more difficult and longer paragraphs. In psychological testing instruments, subjective text assessment is often replaced by objective readability formulas, e.g., the Automated Readability Index. Readability formulas extract multiple features from a given text and output a score indicating the difficulty of the text. The aim of the present study is to build a machine learning model that discriminates between pupils identified with dyslexia and a control group without dyslexia based on fluency in oral reading of texts assessed with a readability application developed within the project For the Quality of Slovenian Textbooks. We focus on differentiation between both groups of pupils by analysing data obtained from transcriptions of audio recordings of oral reading. The empirical study was conducted with 27 pupils aged 8 and 9: 12 with officially diagnosed dyslexia and a control group of 15 without identified dyslexia.


Introduction
Reading and writing are basic skills that are taken for granted in today's society. They are key elements of literacy, enabling individuals to develop the skills of reflection, critique and empathy, leading to a sense of self-efficacy, identity and full participation in society. Among learning difficulties, it is reading difficulties that have a significant impact on an individual's educational success throughout life. Despite an education system that focuses on literacy development, there are still many pupils who leave primary school without adequately developed literacy skills and who are unable to overcome this deficit even in adulthood (Carpentieri, 2012). Learning to read is one of the most important outcomes of early education, and developing reading and writing as two key communicative skills is among the basic goals of teaching Slovenian in the first educational period in primary school (Poznanovič et al., 2018). There are increasing numbers of pupils in schools who have difficulties in learning to read and write due to dyslexia (Snowling et al., 2020). Moreover, difficulties in reading also lead to difficulties in other areas of learning, including writing, spelling, reading fluency and comprehension (Moats & Dakin, 2008; Shaywitz, 2003).
The best-known and most widely researched specific learning difficulty is dyslexia, which is a neurophysiologically conditioned reading disorder originating from a developmental or central nervous peculiarity (Magajna et al., 2015; Raduly Zorgo et al., 2010). It includes a group of diverse but interrelated factors that are part of the individual and affect him/her and his/her functioning throughout life (Magajna et al., 2015; Raduly Zorgo et al., 2010). Dyslexia is characterised by difficulties in accurate and/or fluent word recognition, poor spelling and poor decoding skills, all of which affect reading acquisition, reading comprehension and writing (IDA, 2002). The difficulties are not limited to reading and spelling; there are also difficulties with sustaining attention and automating new knowledge, as well as with gross and fine motor skills (Nicolson & Fawcett, 1990, 2007; Rose, 2009). In addition to neurological differences, dyslexia is also associated with cognitive difficulties that can affect organisational skills, numeracy and other cognitive and emotional abilities (Rose, 2009). People with dyslexia can be extremely talented and original when it comes to solving different types of problems and often have good visual skills (Nijakowska, 2016). Approximately seven percent of children and adolescents in the population have dyslexia (Hulme & Snowling, 2016). It is more common in males and often co-occurs with other developmental disorders, such as specific language disorder, attention-deficit/hyperactivity disorder (ADHD) or developmental coordination disorder (dyspraxia) (Hulme & Snowling, 2016).
Dyslexia affects the ability to decode or transfer phonological skills to spelling. Over the last decade, decoding skills and phonological awareness in pupils with reading difficulties have been identified as serious inhibitors of successful reading (Klingner et al., 2007), as they affect the fluency of reading. Decoding depends primarily on letter knowledge and phonological skills, which include phonological awareness (Hulme & Snowling, 2009). Phonological awareness is the ability to recognise and manipulate phonemes and is a strong predictor of the development of decoding skills or the successful onset of learning to read (Hulme & Snowling, 2009). Inefficiency in performing these skills can lead to reading being a slow and difficult process (Anderson, 1999; Erbeli & Pizorn, 2012; Segalowitz et al., 1991) and may even lead to a decrease in motivation for reading (Erbeli & Pizorn, 2012).
For many years, experts in the field of reading disabilities have agreed that phonological deficits are a primary cause of dyslexia, as they directly affect learning to read (Snowling & Hulme, 2012). Such deficits are therefore an early and strong predictor of dyslexia (Mather & Wendling, 2012). For pupils with dyslexia, difficulties in learning to read accurately and at an adequate speed (reading fluency) are usually at the forefront (Snowling & Hulme, 2012). Even when a pupil achieves adequate reading accuracy, it is significantly more difficult to achieve adequate reading speed with treatment (Fletcher et al., 2007). Young pupils with dyslexia are characterised by (Rief & Stern, 2010) Such pupils have good comprehension of material read to them as opposed to material they attempt to read themselves (Rief & Stern, 2010).

Screening and assessment of dyslexia in Slovenia
In Slovenia, pupils with mild to moderate dyslexia receive adapted methods and forms of teaching and testing under the Primary Education Act (Primary Education Act ZOsn-UPB3, 2006), while pupils with severe dyslexia receive more intensive accommodations and additional professional support under the Act on the Guidance of Children with Special Needs (ZUOPP-1, 2011). The process of identification and diagnostic assessment of dyslexia, which requires a multidisciplinary team of professionals (psychologist, special and rehabilitation teacher, speech therapist), involves several stages, from detection, classification, support planning and progress monitoring to evaluation (Magajna, 2011). The first stage of identifying pupils with dyslexia (detection) is screening, which aims to identify students in need of diagnostic assessment and inform individuals of the likelihood of dyslexia (Pollak, 2009). Screening tests allow dyslexia to be confirmed in young pupils, thus enabling appropriate treatment to be implemented before they experience a sense of failure (Snowling, 2013). Tests used to detect dyslexia include phonological awareness tests, tests of reading aloud and silently (decoding, spelling, reading fluency in terms of speed and accuracy), reading comprehension, rapid naming, memory, attention, etc.
In Slovenia, there are several tests for dyslexia-like reading and writing difficulties that test different elements of reading and writing (phonological awareness, reading speed and accuracy, reading automation, reading comprehension, dictated writing, written expression): The Reading and Writing Disability Test or Šali Test (Šali, 1971) (the test is only partially standardised for the population of children in the second grade); SNAP, the Special Needs Assessment Profile (SNAP is not a test in the psychometric sense, but an instrument for gathering information about the pupil relevant to identifying potential difficulties in a particular skill) (Weedon & Reid, 2018); The One-Minute Test of Reading Aloud (Gradišar & Pečjak, 1991); The Reading Comprehension Test (Elley et al., 1995); The Reading Test (Pečjak et al., 2012b) (the test is a standardised measurement instrument that assesses general reading ability at the end of the first three years of primary school); The Reading Ability Assessment Scheme (OSBZ) (Pečjak et al., 2012a) (the test is a standardised measurement instrument and the data collected with the OSBZ provide information about what reading skills the student has already developed); The Test of Reading Fluency Based on the Curriculum Model for Grades 2, 3 and 4 (Košir, 2011); and The Phonological Awareness Test (Magajna, 1994).
Early identification of dyslexia is a key to providing appropriate support and intervention for pupils with dyslexia. Due to the multidimensional nature of the disorder, a variety of tests and test batteries are used to effectively identify dyslexia. Good screening is important in order to distinguish pupils who are at risk of developing reading and writing disorders from those who are not. To identify reading difficulties, pupils are screened for various components of reading, such as phonological awareness, reading fluency (speed and accuracy of decoding), reading automaticity, reading comprehension, etc.
Information and communication technology (ICT) appears to be an increasingly important tool for dyslexia screening and the necessary interventions to address the specific learning difficulties and needs of individual learners (Drigas & Politi-Georgousi, 2019). ICT is an important factor in improving traditional methods of identifying dyslexia, as well as in exploring new perspectives on identifying individuals with dyslexia (Perera et al., 2016). Rooms (2000) highlights the potential benefits of using ICT for pupils with dyslexia in primary schools, emphasising the fact that it can be accessible and available without making pupils with dyslexia feel different or excluded. Multisensory approaches (auditory, oral, visual, kinaesthetic) and systems are incorporated to mitigate the difficulties of pupils with dyslexia (Rooms, 2000). Diagnostic assessment using ICT allows psychologists and other professionals to easily and quickly assess cognitive abilities and other important skills (Singleton, 2001). Interactive multimedia, virtual environments, neural networks, software, fuzzy logic, game-based techniques and mobile applications improve the effectiveness of traditional dyslexia screening procedures, with each approach offering sophisticated features that facilitate assessment procedures (Menghini et al., 2011).

Research problem and Research question
Dyslexia often manifests itself in young pupils through slow progress in learning to read and write. The difficulties are frequently reflected in poorer academic achievement and, consequently, lower self-esteem. It is therefore important to identify dyslexia as early as possible and treat it appropriately. This helps to prevent the stigmatisation of children and adolescents with dyslexia, to promote their inclusion in society and to reduce difficulties in adulthood.
The use of computer systems to identify pupils with dyslexia is already relatively well established worldwide. A wide range of software is available to teachers, from screening software to more detailed computer-based assessment batteries. Most computer-based dyslexia detection programs rely on assessments of reading and spelling skills as well as cognitive abilities such as phonological awareness and verbal memory, which support literacy development and are generally good predictors of dyslexia (Singleton et al., 2009). Both traditional tests and applications have their advantages and limitations. The advantage of traditional tests is the presence of an expert who administers the test while observing the pupil, checking the pupil's comprehension, adjusting the instructions so that the pupil understands them, and observing the pupil's attention span and possible fatigue. At the same time, the expert can encourage and support the pupil. The main disadvantages of traditional testing are the exposure of the individual and the time-consuming nature of the test. These factors can be eliminated with the help of an application. Moreover, the application can be used by several pupils at the same time, so that many pupils can be assessed in a short time, enabling at-risk pupils to be differentiated from those who are not at risk. The application also has advantages from a motivational point of view, as it often resembles a computer game rather than an assessment.
Our overall goal was to train a machine learning model to differentiate between pupils with identified dyslexia and a control group of pupils without dyslexia. In this context, our research problem was to identify the important parameters of pupils' oral reading fluency and to investigate whether we can use these parameters as features for our model. In order to determine the parameters of oral reading fluency, we first performed manual transcriptions of audio recordings and defined the types of errors that pupils made most frequently in reading. We defined the parameters of reading based on the defined error types for each word in the six texts obtained from the test battery of the Slovenian National External Assessment of Knowledge for third-grade pupils. Based on the values of these parameters for each pupil, we extracted a subset of the most important parameters and used machine learning methods to build models to classify pupils into one of two groups: 'identified dyslexia' or 'control'.

Participants
The participants of the study were 12 pupils with dyslexia officially diagnosed by experts from the Counselling Centre for Children, Adolescents and Parents Ljubljana and 15 pupils without identified dyslexia. The pupils were from six different primary schools in Ljubljana, from the third (n = 13) and fourth (n = 14) grades. The age of the participants ranged from 8 to 9 years. Five of the pupils in the third grade and seven in the fourth grade were officially diagnosed with dyslexia. We only included pupils who had a signed parental consent form confirming participation and storage of the collected data for further analysis. Participation was anonymous. We did not record the pupils' first and last names; we only recorded their age and whether they had already been diagnosed with dyslexia.

Instruments
As a research instrument, we used the desktop application PKP Dyslexia to test skills that are typically less well developed in people with dyslexia. The application contains six tests (sequencing concept test, reading comprehension test, phonological awareness test, working memory test, reading aloud test and silent reading with an eye tracker), each of which comprises a series of tasks. The tests require the use of cognitive and language skills, which are key to successful reading and writing. In designing the tests, we followed the protocols for developing psychological tests according to international guidelines (e.g., various International Test Commission guidelines) and the American Standards for Educational and Psychological Instruments (Standards for Educational and Psychological Testing, 2014), as well as guidelines for developers of computer-based psychological tests. Experts from various fields participated in the development. In this study, we present only the results of the Reading Aloud test used to test reading fluency (speed and accuracy/correctness/error).
The reading aloud test included six texts from the test battery of the Slovenian National External Assessment of Knowledge (CEAK) in the mother tongue (Slovenian) for third-grade pupils. This is the first national assessment of knowledge in mother tongue (Slovenian) proficiency in which pupils take part. For the purposes of the present study, the texts were selected from previous years' test batteries. The six texts were all informative and were about topics of general interest to the children (e.g., wild animals and a fairy tale). The level of difficulty of the texts was assessed using an application developed within the project For the Quality of Slovenian Textbooks (KaUč). According to the Automated Readability Index and the Coleman-Liau Index, which take word and sentence length as a criterion, each of the six reading tasks had acceptable reliability indices in the respective years in which they were administered to a national sample of students. The texts have a very similar difficulty level, with the exception of the text entitled The Mountain Gorilla, which is slightly more difficult but still much easier than average (see Table 1).
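To make the two length-based indices concrete, the following minimal sketch computes them from character, word and sentence counts using the standard published English-language formulas. It is an illustration only: how the KaUč application tokenises Slovenian text or calibrates its scores is not described here, and the function names and the sample sentence in the usage note are hypothetical.

```python
import re


def text_counts(text):
    """Return (characters, words, sentences) for a plain-text passage."""
    # \w is Unicode-aware in Python 3, so letters such as č, š and ž are counted.
    words = re.findall(r"\w+", text)
    # Assumption: sentences end with '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    characters = sum(len(w) for w in words)
    return characters, len(words), len(sentences)


def automated_readability_index(text):
    """ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43."""
    chars, words, sentences = text_counts(text)
    return 4.71 * chars / words + 0.5 * words / sentences - 21.43


def coleman_liau_index(text):
    """CLI = 0.0588 * L - 0.296 * S - 15.8, with L and S counted per 100 words."""
    chars, words, sentences = text_counts(text)
    letters_per_100 = 100 * chars / words
    sentences_per_100 = 100 * sentences / words
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8
```

Calling `automated_readability_index("Gorila živi v gozdu. Je velika.")` on this invented two-sentence sample (not one of the six study texts) gives a low score, as expected for short words and short sentences. Both indices were calibrated on English, so absolute values for Slovenian text are best read relative to a reference corpus rather than as grade levels.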
The texts used in the task vary in length. The shortest text contains 28 words, three of the texts contain about 40 words (36, 40 and 41 words, respectively), one is slightly longer at 57 words, while the longest text has 123 words. The difficulty level of the texts used to test reading fluency (speed and accuracy) is important, as they must be simple enough to be appropriate for pupils in the third and fourth grades. At the same time, the texts should contain enough specific features that might cause reading difficulties for pupils with dyslexia.
The texts used in the test battery of CEAK also contain rare words (Table 1) that are considered more difficult to process for pupils with dyslexia (Rüsseler et al., 2003; Suárez-Coalla & Cuetos, 2015). Pupils with dyslexia read the words they encounter frequently in texts faster and more accurately, so they become part of their reading vocabulary. Building a reading vocabulary is challenging for pupils with dyslexia, as they have difficulty learning and recognising new words in print. In pupils with dyslexia, there is often a discrepancy between their spoken vocabulary, which can be very large, and their reading vocabulary (Bailey, 2020).
Below we present the six texts included in our study and a graphical representation of their readability. The graphs included show (1) how the entered text compares with texts from the ccKres corpus in terms of readability, and (2) a histogram of the readability measures across texts in the ccKres corpus, where the red line shows where the evaluated text is located compared to all of the texts in the corpus (Škvorc et al., n.d.). The position of the red lines in the histograms for texts 1 through 6 shows that all six texts are relatively easy compared to the texts in the corpus.

Graphical representation of readability for
The user interface is designed to attract pupils while ensuring that a single display does not contain unnecessary and distracting stimuli or too many elements at once. Information is displayed sequentially and in small sections.
The colour contrast between the text and the background is specifically designed to suit the visual processing characteristics of pupils with dyslexia. The text is left-aligned to make it easier and faster for pupils to find the beginning of the text on a new line. When the text is displayed, the program begins to time and record the voice. The time is stopped when the pupil reads the text and clicks the 'NEXT TEXT' button. The purpose of this task was to obtain audio recordings of the pupils reading aloud.

Research Design
In our experiments, we used the desktop application PKP Disleksija (PKP Dyslexia). In the Reading Aloud Test, the pupils were asked to read six texts each, which were displayed on a 15-inch laptop screen. The test contains written and auditory instructions that are carefully prepared in such a way that it is assumed that pupils will understand them. However, it is also accepted that parents will help pupils to understand the instructions. The instructions are followed by a brief demonstration that gives the pupil a clear visual idea of how to approach the test. After the initial instructions, the pupil is given a series of exercises to check that he or she has understood how to complete the task. These preliminary exercises are not scored and the pupil has the opportunity to review the instructions again while performing them. This is followed by six reading aloud tasks that are recorded and then scored. The Zoom H4n Pro handheld digital recorder was used to collect the audio data. The read-aloud test data was collected between 9 June and 18 June 2021.

Results
Our experimental work focused on using machine learning methods for the classification of pupils into one of two groups: those with 'identified dyslexia' and a control group 'without identified dyslexia'. Due to the small sample of pupils, we were limited to using machine learning methods that require a great deal of pre-processing; we could not use raw audio recordings for input, but instead had to extract the features from them. We struggled to automate the feature extraction process, but managed to construct the features manually. This limits the applicability of our models to the six texts that were used in this study.

Audio transcription and feature construction
The audio recordings were manually transcribed using Audacity software (Audacity® software is copyright © 1999-2021). Four attributes were defined for each transcribed word: start, word, end, error_type. Each line of the transcription file refers to a single word in the text. The start feature indicates the time when the reader started reading the word aloud, and end indicates the end time of reading the word. Both values are written in the format {MM:SS.mmm}, where MM denotes minutes, SS seconds and mmm milliseconds. The word feature indicates the word that was read: all of the vowels that the reader read aloud when reading each particular word were written down. The error_type denotes the type of error that occurred while reading the word. From the audio transcriptions of all six texts, the seven most common error types were identified and labelled with numbers from 1 to 7, including:
3. Word sequence read n times (e.g., ki živijo v ki živijo v, marked as 3:2 at each word of the sequence),
4. Omitting the word.
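The transcription attributes described above can be turned into per-word timing values along the following lines. This is a minimal sketch under stated assumptions: the tab-separated column layout, the `WordRecord` type and the helper names are hypothetical, and silence is assumed to be measured from the end of the previous word (or from the start of the recording for the first word). Only the four attribute names and the MM:SS.mmm format come from the text.

```python
from dataclasses import dataclass


@dataclass
class WordRecord:
    start: float      # seconds from the beginning of the recording
    word: str
    end: float
    error_type: str   # e.g. "" for no error or "3:2" for a repeated sequence


def parse_time(stamp):
    """Convert an 'MM:SS.mmm' time stamp to seconds."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + float(seconds)


def parse_transcription(lines):
    """Parse lines of the assumed form 'start<TAB>word<TAB>end<TAB>error_type'."""
    records = []
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        error_type = fields[3] if len(fields) > 3 else ""
        records.append(
            WordRecord(parse_time(fields[0]), fields[1], parse_time(fields[2]), error_type)
        )
    return records


def timing_features(records):
    """Build '<word>_silence_before' and '<word>_reading_time' values in seconds."""
    features = {}
    previous_end = 0.0  # assumption: the first silence starts at time zero
    for rec in records:
        features[f"{rec.word}_silence_before"] = round(rec.start - previous_end, 3)
        features[f"{rec.word}_reading_time"] = round(rec.end - rec.start, 3)
        previous_end = rec.end
    return features
```

In this simplified form a word that occurs twice in a text overwrites its earlier entry, and the key is the word as spoken rather than the target word; a fuller pipeline would need to index words by position in the text.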
For each pupil, a separate transcription file was created for each text, giving a total of 27 × 6 = 162 transcription files. In order to use this data in our Orange (Demšar et al., 2013) machine learning setting, all transcribed features for each pupil were combined into a single learning example, resulting in 27 learning examples and 618 features (the features from transcriptions, i.e., error types, silence before and reading time). The dataset is well balanced: 12 examples belong to a positive target class (identified dyslexia) and 15 to a negative class (without identified dyslexia).
The attributes were standardised so that they all have μ = 0 and σ² = 1. Despite the small dataset and the large number of features, the goal was to learn a model that predicts the target outcome (identified dyslexia). The leave-one-out method was used in all of our experiments in order to evaluate the models.
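Standardisation and leave-one-out evaluation can be sketched in a few lines of plain Python. The nearest-centroid classifier below is only a stand-in chosen to keep the example self-contained; it is not one of the models used in the study, and the use of the population variance for σ² is an assumption.

```python
from statistics import mean, pstdev


def standardise(column):
    """Rescale one feature column to mean 0 and (population) variance 1."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]  # constant column: no information
    return [(x - mu) / sigma for x in column]


def nearest_centroid_predict(train_rows, train_labels, row):
    """Stand-in classifier: choose the class whose mean vector is closest."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_labels)):
        members = [r for r, y in zip(train_rows, train_labels) if y == label]
        centroid = [mean(col) for col in zip(*members)]
        dist = sum((a - b) ** 2 for a, b in zip(row, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_out(rows, labels, predict=nearest_centroid_predict):
    """Hold out each example once; return {(actual, predicted): count}."""
    confusion = {}
    for i, (row, actual) in enumerate(zip(rows, labels)):
        train_rows = rows[:i] + rows[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        predicted = predict(train_rows, train_labels, row)
        key = (actual, predicted)
        confusion[key] = confusion.get(key, 0) + 1
    return confusion
```

With 27 examples, leave-one-out trains 27 models, each on 26 pupils, and the resulting dictionary plays the role of a confusion matrix. Any other classifier with the same `predict(train_rows, train_labels, row)` signature can be passed in.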
The goal was to see how well an ensemble method performed on our data. Ensemble methods are machine learning techniques that combine a set of base models, such as decision trees. Each base model contributes to the ensemble model with its own prediction; ultimately, the ensemble model predicts the outcome based on the votes of all of the base models. We tried extreme gradient boosting of random forest (xgboost), which consists of 100 trees and limits the depth of each tree to 3, but allows all of the attributes in each tree, level and split. The confusion matrix in Table 2 shows the results of the leave-one-out test for the xgboost model, which indicates three misclassified pupils from our dataset.
Ensemble models usually provide good predictions but are difficult or impossible for humans to understand. In order to gain insights, we focused on simple methods that can provide models humans can understand: a naive Bayesian classifier, a decision tree and FreeViz (Demšar et al., 2013). Before learning, feature subset selection was performed using ReliefF (Kononenko, 1994), which selected the top ten features from all six texts. These features were used for learning a naive Bayesian classifier, a classification tree and the FreeViz visualisation. The feature names are combinations of the word that was read and the type of feature it describes. Two types of features were chosen:
• silence_before describes how much time was needed before the word was read aloud (example: igrajo_silence_before is a feature that describes the silence needed before the word 'igrajo' was read aloud),
• reading_time describes how much time was needed to read the word aloud (example: običajno_reading_time is a feature that describes the time needed to read the word 'običajno' aloud).
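The idea behind Relief-style feature selection can be illustrated with the basic Relief algorithm for two classes, which rewards a feature when it differs between an example and its nearest neighbour from the other class and penalises it when it differs from the nearest neighbour of the same class. This is a simplified sketch, not the ReliefF variant used in the study (ReliefF averages over several neighbours and handles noise and missing values); the function names are hypothetical.

```python
def relief_scores(rows, labels):
    """Basic Relief weights for numeric features and two classes."""
    n_features = len(rows[0])
    columns = list(zip(*rows))
    # Normalise differences by each feature's range; guard against zero range.
    spans = [(max(c) - min(c)) or 1.0 for c in columns]

    def diff(f, a, b):
        return abs(a[f] - b[f]) / spans[f]

    def distance(a, b):
        return sum(diff(f, a, b) for f in range(n_features))

    weights = [0.0] * n_features
    m = len(rows)
    for i, (row, label) in enumerate(zip(rows, labels)):
        hits = [r for j, (r, y) in enumerate(zip(rows, labels)) if j != i and y == label]
        misses = [r for r, y in zip(rows, labels) if y != label]
        if not hits or not misses:
            continue
        nearest_hit = min(hits, key=lambda r: distance(row, r))
        nearest_miss = min(misses, key=lambda r: distance(row, r))
        for f in range(n_features):
            weights[f] += (diff(f, row, nearest_miss) - diff(f, row, nearest_hit)) / m
    return weights


def top_features(names, rows, labels, k=10):
    """Return the k feature names with the highest Relief weight."""
    scores = relief_scores(rows, labels)
    ranked = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:k]]
```

With many more features than examples, as here, such rankings can be unstable, which is one reason to repeat the selection inside each leave-one-out fold when estimating accuracy.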
The nomogram in Figure 7 serves as a visual representation of the naive Bayesian classifier. The contribution of each feature is measured as a score and the individual scores are summed and converted into the probability of the target class (pupils with identified dyslexia). The features are ranked by importance: the features with the strongest influence on the target class (pupils with identified dyslexia) are običajno_silence_before and nenadoma_silence_before.
The confusion matrix of the naive Bayesian classifier in Table 3 shows that only one child was misclassified in the leave-one-out test.
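The way a nomogram sums per-feature scores and converts the total into a probability can be sketched with a small naive Bayesian classifier. The sketch assumes Gaussian class-conditional densities for the continuous features purely for illustration; the Orange implementation may treat continuous features differently (for example by discretising them), and all function names are hypothetical.

```python
import math
from statistics import mean, pvariance


def fit_gaussian_nb(rows, labels, positive):
    """Estimate per-class mean and variance of each feature plus prior log-odds."""
    pos = [r for r, y in zip(rows, labels) if y == positive]
    neg = [r for r, y in zip(rows, labels) if y != positive]

    def stats(group):
        # A small variance floor avoids division by zero on constant columns.
        return [(mean(col), max(pvariance(col), 1e-6)) for col in zip(*group)]

    return {
        "pos": stats(pos),
        "neg": stats(neg),
        "prior_log_odds": math.log(len(pos) / len(neg)),
    }


def log_gaussian(x, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)


def feature_scores(model, row):
    """Per-feature log-likelihood ratios: the 'points' a nomogram would display."""
    return [
        log_gaussian(x, *p) - log_gaussian(x, *n)
        for x, p, n in zip(row, model["pos"], model["neg"])
    ]


def probability_positive(model, row):
    """Sum the scores, add the prior log-odds and map to a probability."""
    log_odds = model["prior_log_odds"] + sum(feature_scores(model, row))
    if log_odds >= 0:
        return 1.0 / (1.0 + math.exp(-log_odds))
    e = math.exp(log_odds)  # numerically stable branch for negative log-odds
    return e / (1.0 + e)
```

A positive score pushes the prediction towards the target class and a negative score away from it, so features with the widest range of scores are the most influential.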

Figure 7
The nomogram of the Naive Bayesian classifier

The classification tree learned with the above features is shown in Figure 8. Again, the same two features turn out to be the most important: the words 'običajno' and 'nenadoma' seem to be the most difficult in the six texts. The values of the splits should be interpreted in the context of feature standardisation (μ = 0 and σ² = 1):
• običajno_silence_before takes values in the interval [-0.99, 2.35]; the split value of 0.371 is slightly above the mean and indicates that pupils with a longer than average pause before this word are classified as pupils with dyslexia. The rest of the pupils, those who make shorter pauses before reading the word 'običajno', are further checked in the classification tree for the time of silence before the word 'nenadoma'.
• nenadoma_silence_before takes values in the interval [-0.88, 2.5]. The split value of 0.38 again lies about one-third of the way along the interval. Those who took less time before reading the word 'nenadoma' are classified as pupils without dyslexia, while the rest are predicted as pupils with dyslexia.
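The two splits described above amount to a pair of nested threshold rules on standardised values, which can be written out directly. The thresholds and class names come from the text; the function names and the handling of values exactly equal to a threshold are assumptions.

```python
def classify(features):
    """Apply the two-split tree to a dict of standardised feature values."""
    if features["običajno_silence_before"] > 0.371:
        return "identified dyslexia"
    if features["nenadoma_silence_before"] > 0.38:
        return "identified dyslexia"
    return "control"


def relative_position(value, low, high):
    """Where a split value falls within a feature's observed interval (0 to 1)."""
    return (value - low) / (high - low)
```

For example, `relative_position(0.38, -0.88, 2.5)` is roughly 0.37 and `relative_position(0.371, -0.99, 2.35)` is roughly 0.41, so both splits sit in the lower half of their intervals.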

Figure 8
While the numbers in the splits with their absolute values do not explain much (due to standardisation), both splits show that long silences before the two difficult words 'običajno' and 'nenadoma' predict a positive target class (identified dyslexia). Note that the same two attributes have the largest positive influence on the target class in the above NB nomogram.
Finally, in Figure 9 we present a FreeViz projection that visually confirms the observations from the nomogram and the decision tree. FreeViz (Demšar, 2007) is a method that optimises a linear projection of data with a discrete class variable (in our case it has two values: 'identified dyslexia' and 'control group') and displays the projected data in a two-dimensional scatter plot. FreeViz can reveal interesting relationships between classes and features; in our domain, the explanation for the FreeViz projection is as follows.
The blue area, concentrated in the middle, represents the pupils from the control group; they have shorter reading times and shorter pauses before the more difficult words. In contrast, the red area, which represents our target class (pupils identified with dyslexia), extends around the blue area and shows higher scores on all observed variables.

Discussion
In order to become a good reader, pupils need to develop two basic skills: decoding and reading comprehension (Nation, 2006). With practice, decoding soon becomes quick, flexible and efficient in pupils who have no difficulty in this area (Nation, 2006). The reading test in the present study included six texts from the Slovenian National External Assessment of Knowledge (CEAK) battery in the mother tongue (Slovenian), all of which deal with topics of general interest to children (e.g., wild animals and fairy tales).
In order to select texts that are easy enough to be suitable for 8- and 9-year-olds, but at the same time contain enough features (e.g., rare words) that might create reading difficulties for pupils with dyslexia, the difficulty of all six texts was assessed using the KaUč readability application, which is used to evaluate Slovenian textbooks. Since the texts belong to the CEAK battery, the KaUč readability application was found to be an appropriate tool for assessment. We used the Automated Readability Index, which assesses the difficulty of a text based on the length of words and sentences. All six texts were rated as very easy in the ccKres text corpus, which contains Slovenian texts from various sources and has more than 10 million words. Although the texts were simple, they discriminated between the two groups of learners: those with dyslexia and those without.
Machine learning methods have been used to predict pupil reading disabilities. Based on a small but balanced sample, our models clearly distinguished between pupils with reading difficulties, e.g., dyslexia, and a control group of pupils without dyslexia. Although the six selected texts were classified as easy by the KaUč readability application, we were able to determine that they were suitable for detecting a reading difficulty in pupils in the third grade. Other authors have also emphasised the importance of assessing reading fluency as one of the distinguishing characteristics of pupils with dyslexia. In a meta-analysis, Carioti et al. (2021) explained that reading fluency can be meaningfully considered as the most important parameter for diagnosing developmental dyslexia, as deficits in reading speed, lexical recognition and phonological recoding have been identified as universal manifestations of reading deficits, regardless of age and orthographic depth of language. This suggests that the use of time-limited approaches in reading tasks does not provide contradictory or less robust evidence for the presence of developmental dyslexia (Carioti et al., 2021). In particular, in the context of transparent orthographic systems, where there is a high degree of correspondence between graphemes and phonemes, the authors suggest that the main feature of developmental dyslexia is poor reading fluency (Martínez-García et al., 2019).
The models presented in this study are specific to the six selected texts and are not generally applicable. The data set used in our experiments is very small, consisting of only 27 pupils, which is understandable due to the nature of the task. Although the conclusions are promising, a bigger sample size would be needed to determine whether the results are significant and how well they generalise to a larger population. Our methodology shows that different machine learning methods on audio transcripts can clearly distinguish between pupils with reading disabilities, e.g., dyslexia, and a control group without dyslexia, even for short and simple texts. The latter suggests that basic screening tests could be short and effective. Carioti et al. (2021) make a similar point: it is important to be aware that the reading process can be stressful for those with developmental dyslexia. Therefore, it is useful to use time-limited reading tasks and not to overwhelm pupils with long and complex reading tasks whose reliability and clinical validity may be questionable. In this context, several authors have pointed out that it is not optimal to adopt an assessment of reading skills based solely on accuracy; although accuracy is an important parameter, it is not the only one, especially when assessing cross-linguistic differences in reading skills, when orthographic transparency or deficit compensation (at least for this parameter) can easily lead to inaccurate results in adulthood (Carioti et al., 2021; Sprenger-Charolles et al., 2011).
Assessing pupils' reading fluency is important not only to identify problems, but also to monitor progress in this area. Based on research findings, Kairaluoma et al. (2007) suggest that students with reading difficulties benefit from reading fluency intervention. They add that the intervention should be long term and initially based on emphasising syllables as sublexical reading units and then gradually progressing to larger reading units. It is also worth noting that prior phonological and semantic training facilitates the formation of orthographic representations, as evidenced by a reduction in the length effect (Martínez-García et al., 2019). When comparing 8- to 9-year-old pupils with and without dyslexia before the implementation of a training programme based on letter-sound associations, with a particular focus on increasing reading fluency, González et al. (2015) found that the group of dyslexic pupils showed more severe impairments on measures of word reading speed than on measures of accuracy. When evaluating the impact of the training programme, they found that the pupils with dyslexia improved significantly in the main measures of word reading and spelling after the training, progressing at a faster rate than both the group of pupils without dyslexia and the group of pupils with dyslexia in the control group who were waiting for the programme (González et al., 2015).

Conclusion
In our study, we trained different machine learning models to predict pupil reading disabilities. Despite the small sample, all of the models clearly distinguished between pupils with reading disorders and a control group. It was demonstrated that fluency in oral reading can be measured objectively even in short and simple texts. The machine learning methodology used is based on transcription data, which was constructed manually from audio recordings of oral reading. Manual construction of such data is tedious and subjective work, and is therefore impractical for larger datasets of audio recordings. Our future work will focus on automating audio transcriptions and feature construction from automatically obtained transcripts. We will also explore the possibility of working directly with audio signals and include methods for incorporating the knowledge of domain experts into our learning dataset.
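As a rough illustration of what automated feature construction from a transcript might involve, the sketch below aligns a transcribed reading with the target text and derives simple speed and accuracy measures. It is an assumption-laden example rather than the study's feature set: the function name, the three features, the English sentence and the 30-second duration are all hypothetical, and the word alignment relies on Python's standard `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def fluency_features(target_text, transcript, duration_seconds):
    """Derive simple fluency measures from a transcript (hypothetical features)."""
    # Naive tokenisation: lower-case and split on whitespace.
    target = target_text.lower().split()
    read = transcript.lower().split()

    # Align the two word sequences; matching blocks are words read as written.
    matcher = SequenceMatcher(a=target, b=read, autojunk=False)
    correct = sum(block.size for block in matcher.get_matching_blocks())

    minutes = duration_seconds / 60
    return {
        "words_per_minute": len(read) / minutes,
        "correct_words_per_minute": correct / minutes,
        "accuracy": correct / len(target),
    }


# Hypothetical example: one misread word ("sleep") and one repetition ("the the").
features = fluency_features(
    target_text="the lion sleeps under the old tree",
    transcript="the lion sleep under the the old tree",
    duration_seconds=30,
)
print(features)
```

A real pipeline would additionally need to handle punctuation, hesitations and self-corrections, which is where the knowledge of domain experts mentioned above could be incorporated.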

Biographical note
Jure Žabkar, PhD, is an Assistant Professor and researcher at the Artificial Intelligence Laboratory at the Faculty of Computer and Information Science, University of Ljubljana, Slovenia. He conducts research in machine learning and data mining, qualitative reasoning, cognitive robotics and systems for decision support, with applications in robotics and healthcare.
Karmen Javornik is a teaching assistant of Special and Rehabilitation Education at the Faculty of Education, University of Ljubljana, Slovenia. Her research interests include inclusion of people with special needs in the context of education, with a focus on general and specific learning difficulties and the development of strategies and models of support and treatment in these areas, which she links to research on executive functioning.
Milena Košak Babuder, PhD, is an Assistant Professor of Special and Rehabilitation Education at the Faculty of Education, University of Ljubljana, Slovenia. Her research interests include the inclusion of people with special educational needs, the impact of general and specific learning difficulties on the academic performance of pupils and students, and the development of strategies and models of support and treatment in these areas, and in particular the impact of dyslexia on learning English as a foreign language.
Tajda Urankar, BSc, is pursuing a Master of Applied Data Science degree at Frankfurt School of Finance and Management, Frankfurt am Main, Germany. Her main areas of interest are deep learning topics such as natural language processing, quantitative trading and pricing models with the current focus on the growing digital lending market.

Figure 5 Graphical representation of readability for Text 5 (Lev; Eng. Lion)

Figure 6 Graphical representation of readability for Text 6 (Dobra vila v dolini Soče; Eng. A Good Fairy in the Soča Valley)

Table 2
Results of the leave-one-out test for the XGBoost model

Table 3
Results of the leave-one-out test for the Naive Bayesian model