|Year : 2018 | Volume : 5 | Issue : 2 | Page : 59-64|
The Criteria and Analysis of Multiple-Choice Questions in Undergraduate Dental Examinations
Hassan Mohamed Abouelkheir
Department of Oral and Maxillofacial Surgery and Diagnostic Sciences, Riyadh Colleges of Dentistry and Pharmacy, Riyadh, Kingdom of Saudi Arabia
|Date of Web Publication||3-Aug-2018|
Hassan Mohamed Abouelkheir
Riyadh Colleges of Dentistry and Pharmacy, P.O. Box. 84891, Riyadh 11681
Kingdom of Saudi Arabia
Source of Support: None, Conflict of Interest: None
Aims: This study aimed to evaluate past midterm multiple-choice question (MCQ) examinations in oral and maxillofacial radiology courses by identifying item-writing flaws (IWFs) and the cognitive level of each item as defined by Bloom's taxonomy. The quality of the MCQs was then determined by the Facility Index (FI) and Discrimination Index (DI) before and after removal of IWFs. Materials and Methods: This study was conducted in the Dental School of Riyadh Elm University, Riyadh, KSA. It consisted of reviewing the midterm MCQ examinations of the previous semester (2nd semester, 2016-2017) in three courses: advanced radiology for Level 12 dental students (MidT1), radiographic interpretation in oral and maxillofacial radiology for Level 8 dental students (MidT2), and radiation physics and techniques for Level 5 dental students (MidT3). IWFs and the cognitive level of each MCQ were identified, and the quality of the examinations was then determined by FI and DI before and after removal of IWFs. Results: A total of 120 MCQs were evaluated and revised. The overall percentage of IWFs was 55%. In the cognitive-level analysis of MidT1/MidT4, lower-level items increased from 68% to 88% after removal of IWFs, and higher-level items decreased from 33% to 13%. The percentage of items with an acceptable FI increased after removal of IWFs from 50% to 55% (MidT1/MidT4), from 77.5% to 80% (MidT2/MidT5), and from 70% to 77.5% (MidT3/MidT6). The percentage of items with an acceptable DI also increased, from 20% to 30%, from 17.5% to 57.5%, and from 42.5% to 57.5%, respectively. Conclusion: After removal of IWFs, cognitive levels as well as FI and DI were improved.
Keywords: Analysis criteria, assessment, cognitive knowledge, multiple-choice questions, quality
|How to cite this article:|
Abouelkheir HM. The Criteria and Analysis of Multiple-Choice Questions in Undergraduate Dental Examinations. J Dent Res Rev 2018;5:59-64
|How to cite this URL:|
Abouelkheir HM. The Criteria and Analysis of Multiple-Choice Questions in Undergraduate Dental Examinations. J Dent Res Rev [serial online] 2018 [cited 2022 Jul 1];5:59-64. Available from: https://www.jdrr.org/text.asp?2018/5/2/59/238536
Introduction
Multiple-choice questions (MCQs) were introduced into medical examinations in the 1950s and were shown to be more reliable in testing knowledge than traditional essay questions. Since their introduction, MCQs have undergone many modifications, resulting in a variety of formats. Well-constructed MCQs can assess higher-order cognitive skills, such as interpretation and problem-solving, rather than only the recall of facts.
MCQs have their strengths and weaknesses. Scoring is easy and reliable, and their use permits a wide sampling of students' knowledge in an examination of reasonable duration.
Bloom's taxonomy is used in this study to classify the knowledge level that each item tests. It is a classification used for writing learning objectives or intended learning outcomes: a cumulative hierarchy of six categories of cognitive learning against which the level of a learning objective can be assessed. MCQs can assess four of these categories (knowledge, comprehension, application, and analysis).
Many researchers have found that a high percentage of MCQs in formative and summative examinations test lower cognitive knowledge and factual recall of information. Baig et al. concluded in their research that 76% of MCQs were testing the recall of isolated facts, 24% were testing the skill of interpretation of data, and not a single MCQ was assessing problem-solving. They added that problem-solving MCQs are more difficult to construct in basic sciences than in clinical sciences, because designing a high-quality MCQ is a skill that needs training and practice.
Furthermore, item-writing flaws (IWFs), for example, grammatical cues, absolute terms, heterogeneity of choices, and logical cues, allow some students to answer questions correctly based on their test-taking skills rather than their knowledge base. A high proportion of MCQs generated by teaching staff have one or more implausible distractors. Violations of accepted item-writing guidelines through the presence of IWFs can affect students' performance on MCQs, making the item either easier or more difficult.
Tarrant et al. encouraged the training of medical educators in writing high cognitive-level MCQs, as this will remove numerous IWFs. Some faculty members of medical or dental colleges may not be familiar with modern concepts in medical education. Others may be new to teaching and assessment processes. Finally, an important issue is that in Saudi Arabia, as in many cosmopolitan countries, faculty members/examiners come from diverse cultural and linguistic backgrounds, and English is not their first language. This leads to a lack of uniformity in the conduct and quality of assessment.
That assessment drives learning has become an accepted fact in medical education. Students therefore tend to concentrate on the parts of the course that they think will appear in the assessment rather than studying the whole course. Assessment drives learning through four aspects: content, format, timing, and feedback.
Unfortunately, MCQs are written more frequently to test the recall of isolated facts. Some researchers suggest that many in-house medical examination MCQs do not follow the general item-writing guidelines. They analyzed the past MCQs submitted by faculty from 2009 to 2012 and reported that approximately 37% of final examinations were flawed.
Furthermore, it has been reported that high-achieving students were more likely to be penalized by flawed items than borderline students. Therefore, education and training of faculty would substantially improve the item quality of MCQs, and a quality checklist should be reviewed before writing MCQs. It has also been reported that MCQs used for nursing education within hospital assessments showed item-writing flaws. Many questions were written to assess low-level cognitive processes and were not properly linked to the learning objectives.
Finally, the quality of questions can also be measured by item analysis, which also measures the ability of students to respond to each question. It provides information regarding the reliability and validity of a test item. Two important indices provided by item analysis are the Facility Index (FI) and the Discrimination Index (DI).
FI indicates how many students chose the designated correct response compared with those who chose other options (distractors) and is expressed as a fraction. The academic staff should aim for an FI between 0.3 and 0.8 for each question. DI describes the ability of an item to distinguish between high and low scorers. It ranges between −1.00 and +1.00. It is expected that high-performing students select the correct answer for each item more often than low-performing students. If this is true, the item is said to have a positive DI (between 0.00 and +1.00). Fowell et al. reported that item analysis is the first level of MCQ evaluation.
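As a rough illustration of the two indices described above, the following Python sketch computes FI as the fraction of students who answered an item correctly and DI as the difference in that fraction between a high-scoring and a low-scoring group. The 27% upper/lower grouping, the function names, and the sample data are assumptions made for illustration only; the study does not state how its groups were formed.

```python
from typing import List


def facility_index(item_scores: List[int]) -> float:
    """Fraction of students who answered the item correctly (0.0 to 1.0)."""
    return sum(item_scores) / len(item_scores)


def discrimination_index(
    item_scores: List[int],
    total_scores: List[float],
    group_fraction: float = 0.27,  # assumed group size, not taken from the study
) -> float:
    """Proportion correct in the upper group minus proportion correct in the lower group."""
    # Rank students by their total examination score, highest first.
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i], reverse=True)
    n = max(1, round(len(order) * group_fraction))
    upper, lower = order[:n], order[-n:]
    p_upper = sum(item_scores[i] for i in upper) / n
    p_lower = sum(item_scores[i] for i in lower) / n
    return p_upper - p_lower


# Hypothetical data: ten students, one item (1 = correct, 0 = incorrect).
item = [1, 1, 1, 1, 0, 1, 0, 0, 1, 0]
totals = [38, 35, 33, 30, 28, 25, 22, 20, 18, 15]

fi = facility_index(item)
di = discrimination_index(item, totals)
print(f"FI = {fi:.2f}, DI = {di:.2f}")
```

With this grouping, a positive DI means the top scorers answered the item correctly more often than the bottom scorers, which matches the expectation stated above.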
Finally, students' orientation to teaching, learning, and assessment methods has become a prime objective for many medical schools, which engage students in the development of formative assessment items.
Aim of the research
- To assess the cognitive knowledge level of single best answer MCQs by using Bloom's taxonomy, as shown in [Table 1].
- To check for IWFs as shown in [Table 2], which lists the most common IWFs described by Haladyna et al.
- To check the quality of MCQs by item analysis through FI and DI before and after revision of midterm examinations.
|Table 1: Levels of cognitive learning according to Bloom's Taxonomy (Bloom, 1956; Krathwohl, 2002)|
Materials and Methods
This was a quantitative, descriptive study that consisted of reviewing past MCQ midterm examinations. A total of 232 students sat three formative midterm examinations (MidT1, MidT2, and MidT3), each composed of forty MCQs, for a total of 120 MCQs. These examinations were held during the 2nd semester, 2016-2017, in the Faculty of Dentistry, Riyadh Elm University (REU), formerly called Riyadh Colleges of Dentistry and Pharmacy (RCsDP), Riyadh, Kingdom of Saudi Arabia.
The examinations were reviewed again after the IWFs had been removed and the revised items had been tested in the midterm examinations of the current semester (1st semester, 2017-2018), which were coded as MidT4, MidT5, and MidT6. MCQs were collected for analysis. Each MCQ had one correct answer and three distractors and was constructed according to the standard guidelines for MCQ construction.
The past MCQ formative midterm examinations of the previous and current semesters were reviewed to assess the following:
- Cognitive knowledge testing: Bloom's taxonomy defines six levels of cognitive behavior associated with cognitive learning, as shown in [Table 1]
- IWFs: The data were checked for IWFs according to the standard guidelines for MCQ construction. The proposed guidelines for detecting IWFs are shown in [Table 2]
- Item analysis in MCQs: The quality of MCQs can be assessed by performing item analysis in the form of FI and DI.
Descriptive statistics were done for analysis of the past midterm MCQs (MidT1, MidT2, and MidT3). Then, after revision and removal of all IWFs, cognitive levels and item analysis were measured again in midterm exams of the current semester (MidT4, MidT5, and MidT6). Statistical analysis was done by using SPSS version 22 (IBM Corp., Armonk, NY, USA).
Any scientific research conducted in the Faculty of Dentistry, REU, must be submitted to the Ethical and Research Committee (ERC), which is part of the quality assurance unit, for ethical approval before it starts. The study was reviewed and approved by the ERC. The Institutional Review Board approval number is RC/IRB/2018/803.
Results
[Table 3] shows the IWFs in MidT1, MidT2, and MidT3 and their percentages of the total 120 MCQs. The most frequent IWFs were format technical errors (24%), followed by logical cues (12%) and ambiguity (10%). The overall percentage of IWFs was 55%.
|Table 3: Item-writing flaws in MidT1, MidT2, and MidT3 and their percentages in a total of 120 multiple-choice questions|
[Table 4] shows the percentages of cognitive knowledge levels, divided into lower levels (levels 1 and 2) and higher levels (levels 3 and 4). Percentages were calculated before (MidT1, MidT2, and MidT3) and after (MidT4, MidT5, and MidT6) the removal of IWFs. The results were as follows:
|Table 4: Percentage of Bloom's cognitive level of midterm examinations before and after removal of item-writing flaws|
- The percentage of L1+L2 increased from 68% to 88% after removal of IWFs (MidT1/MidT4), while the percentage of L3+L4 decreased from 33% to 13%
- The percentage of L1+L2 decreased from 80% to 70% after removal of IWFs (MidT2/MidT5), while the percentage of L3+L4 increased from 20% to 30%
- The percentage of L1+L2 was the same (88%) before and after removal of IWFs (MidT3/MidT6), and the percentage of L3+L4 was also the same (13%) before and after removal of IWFs.
[Table 5] shows the item analysis (FI and DI) for midterm examinations before and after removal of IWFs. For the scores to be acceptable, an item should have an average difficulty (FI) of 26% to 74% and an average DI of 0.20 to 0.30.
|Table 5: Classification of questions according to Facility and Discrimination indices|
The percentage of items with an acceptable FI was 50% in MidT1 and increased to 55% after revision (MidT4); it was 77.5% in MidT2 and increased to 80% after revision (MidT5); and it was 70% in MidT3 and increased to 77.5% after revision (MidT6).
The percentage of items with an acceptable DI was 20% in MidT1 and increased to 30% after revision (MidT4); it was 17.5% in MidT2 and increased to 57.5% after revision (MidT5); and it was 42.5% in MidT3 and increased to 57.5% after revision (MidT6).
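The percentages reported here are shares of items whose index falls within the acceptable band. A minimal Python sketch of such a tally is given below; the helper name `percent_in_band`, the inclusive band limits, and the index values are all hypothetical and are not the study's data.

```python
from typing import List


def percent_in_band(values: List[float], low: float, high: float) -> float:
    """Percentage of items whose index lies within [low, high], limits included."""
    inside = sum(1 for value in values if low <= value <= high)
    return 100.0 * inside / len(values)


# Made-up per-item indices for an eight-item examination.
fi_values = [0.15, 0.30, 0.45, 0.60, 0.74, 0.80, 0.90, 0.26]
di_values = [0.05, 0.20, 0.25, 0.30, 0.40, -0.10, 0.22, 0.35]

# Bands taken from the text: FI of 26% to 74%, DI of 0.20 to 0.30.
fi_acceptable = percent_in_band(fi_values, 0.26, 0.74)
di_acceptable = percent_in_band(di_values, 0.20, 0.30)
print(f"Acceptable FI: {fi_acceptable:.1f}% of items")
print(f"Acceptable DI: {di_acceptable:.1f}% of items")
```

Running the same tally on the original and the revised examination gives the kind of before/after comparison reported for each midterm pair.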
[Table 6] shows a significant difference between the lower and higher cognitive levels in relation to midterm exams before and after revision and removal of IWFs, using the Chi-square test of significance at the 5% level.
|Table 6: Chi-square test of significance of cognitive levels in relation to midterm examinations before and after revision|
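For readers who want to see how such a test works, the Python sketch below runs a Pearson chi-square test of independence on a 2x2 table of item counts by cognitive level, before and after revision. The counts (27/13 and 35/5 out of 40 items) are an assumption back-calculated from the rounded MidT1/MidT4 percentages, and no continuity correction is applied; this is not a reproduction of the SPSS analysis behind Table 6.

```python
import math
from typing import List, Tuple


def chi_square_2x2(table: List[List[int]]) -> Tuple[float, float]:
    """Pearson chi-square statistic and p-value for a 2x2 table (1 degree of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of exam version and cognitive level.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Assumed counts: columns are lower-level (L1+L2) and higher-level (L3+L4) items.
counts = [
    [27, 13],  # before revision
    [35, 5],   # after revision
]
chi2, p_value = chi_square_2x2(counts)
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")
print("significant at the 5% level" if p_value < 0.05 else "not significant at the 5% level")
```

A statistics package would normally be used for this in practice; the closed-form p-value above only holds for one degree of freedom.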
[Figure 1] shows the percentage of cognitive levels in relation to midterm examinations before and after removal of IWFs. It shows an increase in Level 1 and Level 2 after removal of IWFs and a decrease in Level 3 and Level 4.
|Figure 1: Percentage of cognitive levels in relation to midterm examinations before and after removal of item-writing flaws|
Discussion
In the present study, the overall percentage of IWFs was 55%. This is in agreement with the findings of Tarrant et al. (46.2%) and similar to those of Downing, who found that 46% of the questions in his study were flawed.
Furthermore, the most frequent IWFs in this study were format technical errors (24%) and ambiguity (10%). This is in agreement with Tarrant et al., who found that 7.5% of questions were ambiguous or unclear. In the present study, dental sciences are studied in English, which is a second language for both teachers and students; this constitutes a linguistic bias, as items containing complex or unnecessary information may increase item difficulty. Therefore, simple, clear language in both the stem and options is highly recommended, as it reduces the influence of reading ability on students' performance.
In the present study, testing of cognitive levels according to Bloom's taxonomy showed that in MidT4 the percentage of L1+L2 items increased after removal of IWFs from 68% to 88%, while the percentage of higher-level (L3+L4) items decreased from 33% to 13%. In MidT5, lower-level items (L1+L2) decreased after removal of IWFs from 80% to 70%, and higher-level items (L3+L4) increased from 20% to 30%. Finally, in MidT6, both the lower cognitive levels (L1+L2, 88%) and the higher cognitive levels (13%) were unchanged after removal of IWFs. This can be explained by the fact that the course assessed in MidT1 and MidT4 was theoretical, delivered as seminars on radiographic cases with group presentations; 80% of its assessment was in the form of MCQs, and there were no practical sessions. Saudi students still face an English language barrier as well as a feeling of uncertainty, and they found it difficult to read the recommended articles. MidT2 and MidT5 showed a decrease in lower cognitive levels (L1+L2) and an increase in higher cognitive levels because of more practical, problem-based learning sessions that train problem-solving and analysis.
In MidT3 and MidT6, as the students were studying a preclinical basic course in radiation physics, techniques, and normal radiographic landmarks, the MCQs mainly tested recall of information and were of low cognitive level.
This is in agreement with Tarrant and Ware, who found in nursing examinations that over 90% of MCQs were written at lower cognitive levels and, more significantly, contained IWFs. They also concluded that removal of IWFs from the test allows a few low-achieving examinees to pass the test (90.3% to 94.3%).
This is also in agreement with Baig et al., who found that 76% of MCQs were testing the recall of information and the remaining 24% were testing interpretation skills, with none at the problem-solving level. On the other hand, writing higher cognitive level MCQs removes a lot of IWFs. McDonald suggested setting the average difficulty level according to the average grade of the course. If “C” grade was the average grade for the course, the difficulty level would fall between 0.70 and 0.80.
In the present study, the percentage of items with an average FI (26% to 74%) increased slightly after removal of IWFs, from 50% in MidT1 to 55% in MidT4; it was 77.5% in MidT2 and increased to 80% after revision (MidT5); and it was 70% in MidT3 and increased to 77.5% after revision (MidT6). This is in agreement with Linnette and Visbal-Dionaldo, who found an average difficulty index in 50% of MCQs. It also coincides with the findings of Kaur et al., who found that 76% of MCQs had an average FI.
The percentage of items with an acceptable DI (0.20 to 0.30) was 20% in MidT1 and increased to 30% after revision (MidT4); it was 17.5% in MidT2 and increased to 57.5% after revision (MidT5); and it was 42.5% in MidT3 and increased to 57.5% after revision (MidT6). This was also in agreement with Linnette and Visbal-Dionaldo, who found an average DI in 20.83% of items; in their study, however, the percentage of items with a high DI (>0.35) was high (60.40%), while in the present study the higher values of DI (>0.30) decreased after revision. This is also in agreement with the findings of Kaur et al., who found that 24% of MCQs had an average DI.
In cases where FI was >75%, the DI was decreased. This coincided with Linnette and Visbal-Dionaldo as they found a significant negative correlation between the FI and DI. Therefore, decreased DI items should be checked for ambiguity, clues, and wrong keys. It was recommended that ≥60% of the items should have moderate or better DI and the frequency of IWFs should be <10%.
A negative DI (<0) means that more students from the low-achiever group than from the high-achiever group were able to answer the item correctly. This is caused by ambiguous questions or a wrongly recorded answer key. Education and training of faculty can improve the quality of the MCQs they create. Confusion over grammar or question structure invalidates the test because grammatical errors do not relate to knowledge of the subject, they discriminate against examinees for whom English is not their first language, and they work to the benefit of experienced examinees.
Conclusion
Therefore, removal of IWFs will improve the quality of MCQs but will not necessarily increase their cognitive level. Higher cognitive level (L3+L4) MCQs increased in problem-based courses and decreased in basic courses. Items with an average FI (26% to 74%) can lead to an acceptable level of DI (0.20 to 0.30). Items with a high FI (>75%) will decrease DI to unacceptable levels (<0.20 or a negative value).
Financial support and sponsorship
None.
Conflicts of interest
There are no conflicts of interest.
References
Moss E. Multiple Choice questions: Their value as an Assessment tool. Current Opinion in Anesthesiology 2001;14:661-6.
Downing S. Assessment of knowledge with written test format. In: Norman G, Van Der Vleuten C, Newble D, editors, International Handbook of Research in Medical Education. Dordrecht: Kluwer Academic Publishers; 2002. p. 647-72.
Case SM, Swanson DB. Constructing Written Test Questions for the Basic and Clinical Sciences. 3rd ed. Philadelphia: National Board of Medical Examiners; 2001.
Premadasa IG. A reappraisal of the use of multiple choice questions. Med Teach 1993;15:237-42.
Bloom B. Taxonomy of Educational Objectives: The Classification of Educational Goals, Handbook 1: Cognitive Domain. 1st ed. London: Longman; 1956.
Krathwohl D. A revision of Bloom's taxonomy: An overview. Theory Pract 2002;41:212-8.
Masters JC, Hulsmeyer BS, Pike ME, Leichty K, Miller MT, Verst AL, et al. Assessment of multiple-choice questions in selected test banks accompanying text books used in nursing education. J Nurs Educ 2001;40:25-32.
Downing SM. The effects of violating standard item writing principles on tests and students: The consequences of using flawed test items on achievement examinations in medical education. Adv Health Sci Educ Theory Pract 2005;10:133-43.
Baig M, Ali SK, Ali S, Huda N. Evaluation of multiple choice and short essay question items in basic medical sciences. Pak J Med Sci 2014;30:3-6.
Tarrant M, Knierim A, Hayes SK, Ware J. The frequency of item writing flaws in multiple-choice questions used in high stakes nursing assessments. Nurse Educ Today 2006;26:662-71.
Fayyaz Khan H, Farooq Danish K, Saeed Awan A, Anwar M. Identification of technical item flaws leads to improvement of the quality of single best multiple choice questions. Pak J Med Sci 2013;29:715-8.
Tarrant M, Ware J. Impact of item-writing flaws in multiple-choice questions on student achievement in high-stakes nursing assessments. Med Educ 2008;42:198-206.
AlMahmoud T, Elzubeir MA, Shaban S, Branicki F. An enhancement-focused framework for developing high quality single best answer multiple choice questions. Educ Health (Abingdon) 2015;28:194-200.
Van Der Vleuten CP. The assessment of professional competence: Developments, research and practical implications. Adv Health Sci Educ Theory Pract 1996;1:41-67.
Downing SM. Construct-irrelevant variance and flawed test questions: Do multiple-choice item-writing principles make any difference? Acad Med 2002;77:S103-4.
Naeem N, van der Vleuten C, Alfaris EA. Faculty development on item writing substantially improves item quality. Adv Health Sci Educ Theory Pract 2012;17:369-76.
Nedeau-Cayo R, Laughlin D, Rus L, Hall J. Assessment of item-writing flaws in multiple-choice questions. J Nurses Prof Dev 2013;29:52-7.
Considine J, Botti M, Thomas S. Design, format, validity and reliability of multiple choice questions for use in nursing research and education. Collegian 2005;12:19-24.
Fowell SL, Southgate LJ, Bligh JG. Evaluating assessment: The missing link? Med Educ 1999;33:276-81.
Oldham J, Freeman A, Chamberlin S, Ricketts C. Enhancing Teaching and Learning Through Assessment: Deriving an Appropriate Model. The Netherlands: Springer; 2007.
Haladyna T, Downing S, Rodriguez M. A review of multiple-choice item-writing guidelines for classroom assessment. Appl Meas Educ 2002;15:309-34.
McDonald M. The Nurse Educator's Guide to Assessing Learning Outcomes. 2nd ed. Sudbury, MA: Jones and Bartlett; 2007.
Linnette SA, Visbal-Dionaldo ML. Analysis of multiple choice questions: Item difficulty, discrimination index and distractor efficiency. Int J Nurs Educ 2017;9:109-14.
Kaur M, Singla S, Mahajan R. Item analysis of in use multiple choice questions in pharmacology. Int J Appl Basic Med Res 2016;6:170-3.
Ware J, Vik T. Quality assurance of item writing: During the introduction of multiple choice questions in medicine for high stakes examinations. Med Teach 2009;31:238-43.
McCoubrie P. Improving the fairness of multiple-choice questions: A literature review. Med Teach 2004;26:709-12.