Title | 試題呈現與回饋模式對Angoff標準設定結果一致性提升效益之比較研究 = Evaluating the Utility of Different Item Presentation and Feedback Approaches with the Modified Angoff Method |
---|---|
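Authors | 吳宜芳; 鄒慧英; Wu, Yi-fang; Tzou, Hueying; |
Journal | 教育研究與發展期刊 (Journal of Educational Research and Development) |
Publication date | December 2010 |
Volume/Issue | 6:4, 2010.12 [ROC 99.12] |
Pages | 47-80 |
Classification number | 521.32 |
Language | chi |
Keywords | Angoff method (Angoff法); Reckase charts (Reckase表); Item-grouping (試題預先分類); Standard setting (標準設定); |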
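Chinese abstract (translated) | Among the many standard setting methods, the Angoff method and its variants, extensions, and modified procedures are widely used in educational practice. However, panelists applying the Angoff method face considerable cognitive challenges when conceptualizing the minimally competent examinee and estimating that examinee's probability of answering each item correctly. The influence of item characteristics (e.g., item difficulty) on inter-panelist or intra-panelist consistency may affect the validity of the resulting standard. Accordingly, this study incorporated rank-ordered empirical p-value feedback, Reckase chart feedback, and the presentation of items with or without prior grouping into the modified Angoff standard setting procedure, in order to improve the consistency of the results and to compare the merits of these approaches. This is a standard setting study conducted after test administration, that is, of the post hoc decision type. It examines how different feedback modes and the grouped or ungrouped presentation of items affect standard setting results, so as to compare the merits of the two approaches; this is what makes the study distinctive. In addition, these two modifications are expected to make panelists more aware of item difficulty, which in turn should improve inter-panelist or intra-panelist consistency, raise the consistency of the results, and strengthen the validity of the standard; this is the study's functional contribution. |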
English abstract | Numerous standard setting methods have been developed to help panels estimate the performance of borderline examinees. Among them, the Angoff method is one of the most popular judgmental standard setting procedures, and its extensions, modifications, and variations are often applied in practice. Panelists play an important role in standard setting, especially in judgmental methods such as the Angoff method and its variations. Panelists’ ability to accurately estimate borderline examinees’ performance depends to some extent on item difficulty. Once that accuracy is called into question, the validity of the performance standard is undermined. Therefore, a variety of procedures and several types of feedback have been developed to reduce inconsistency among panelists or within a single panelist. To compare different procedures embedded in the modified Angoff standard setting method for establishing cutoff scores on a large-scale achievement assessment, we designed two standard setting activities that integrated different procedures to help panelists make more accurate estimates. Two data sets from a national mathematics achievement assessment in Taiwan were used in the standard setting activities. Each set contained 104 operational multiple-choice items measuring students’ grade-level math ability. Twelve panelists participated in the 4th grade standard setting activity, and the 6th grade panel consisted of 14 panelists. All were math educators, and some had prior experience with modified Angoff standard setting procedures. The standard setting procedures included two factors, each with two conditions: test items presented with or without item-grouping in advance, and two types of feedback, namely feedback with empirical p-values and feedback with IRT calibration/Reckase charts (Reckase, 1998, 2001). We used a generalizability analysis design to examine how much each of these procedures improved consistency. The item effect, the item difficulty effect (both within and between difficulty levels), and the panelist effect were of interest. First, the percentage of variance attributable to the item effect increased consistently from Round 1 to Round 3, while the percentage attributable to the panelist effect decreased as the rounds progressed. Panelists’ consistency improved; in addition, relatively more panelist variability was eliminated under feedback with Reckase charts. Second, with or without item-grouping, panelists made increasingly similar performance estimates for items of similar difficulty as the rounds progressed. Finally, item-grouping combined with Reckase chart feedback yielded the greatest improvement in intra-judge consistency: under this condition, the root mean square error estimates were the smallest, and the generalizability coefficients and intraclass correlation coefficients (ICCs) were the highest. Panelists are capable of distinguishing hard items from easy ones; however, with the help of item-grouping by difficulty and feedback with Reckase charts, the variability induced by item difficulty, which affects panelists’ consistency, was reduced as much as possible. This finding helps in defending the validity of the standard. |
The abstract information in this system is based primarily on the abstract published with the journal article.