Evaluation and Testing

I’ve decided to prepare an article about evaluation and testing, since good evaluation strategies and techniques are often absent from training in general and from TESOL classes specifically. Although many of the same principles apply in classes for children, the focus here is on adult education.
As we learned in Lesson 15, different types of tests serve different purposes and functions. Here I would like to address five main types of testing:
1) Placement Tests: Their purpose is to measure a learner’s current level of knowledge for selection, rating, or orientation purposes. These decisions are greatly improved when there is a close correspondence between what the test measures and the skills, abilities, and attitudes required for success in the workplace. We may also be looking for prerequisite knowledge and skills that the intended training program does not address and instead expects as entry-level knowledge.
2) Diagnostic tests: These are similar to placement tests but offer a more customized look at a student’s strengths and weaknesses. They may be administered at the start of a course to see what the students already know, so that the teacher can prepare lessons that build on the identified areas of difficulty. In language testing this is very useful because it reduces the frustration of repeating courses or content in which a student is already competent.
3) Practice tests: For external testing, most students benefit from completing practice tests. A large degree of test success is attributable to test-taking strategy and familiarity with the test format. Examples include tests that help prepare students for TOEFL, IELTS, TOEIC, Cambridge, or General English exams. In fact, there are many language schools whose main business is to prepare students to take these exams. For regular training, students may benefit from practice tests that familiarize them with a particular type or style of test.
4) Progress tests (or formative tests): These are used throughout the course to determine how the students are progressing. The purpose is to measure the current level of performance and to provide continuous feedback to both student and instructor concerning learning successes and failures. Progress tests in a language class should include all four skills as well as vocabulary and grammar. Most course books include sample tests, although teachers may still need to create their own.
Feedback to students provides reinforcement and identifies the specific learning errors that are in need of correction. Doing well on a test can truly motivate and encourage students to continue their efforts. On the other hand, a poor test or a test that is too difficult can have the opposite effect. Feedback to instructors provides information for modifying instruction and for prescribing group and individual remedial work.
5) Summative tests: As the name implies, these measure learning at the end of training. Have the objectives of the course been achieved? The results of this test should provide an evaluation of the candidate’s success relative to the criteria set out for the course.
In addition, summative tests can be used to determine the effectiveness of a course or lesson. In this case, student performance is used as an indicator of lesson/course effectiveness. If the student does not do well on the test, then the lesson, or course, is considered at fault rather than the student him/herself.
Qualities of a Good Test:
Good tests share the following characteristics:
A test is valid when its questions match the intent of the objectives. For example, if the objective requires the learner to identify vocabulary used in grocery shopping, then the test should ask them to use the grocery vocabulary that was introduced in class. The task should not ask the participant to list vocabulary used for travelling; that would be invalid.
A test should be reliable: the results should be an accurate measurement of the test-takers’ knowledge and skills. Advance knowledge of the test items would hamper the instrument’s reliability, as would giving one group extra time or different exam conditions. To help make a test reliable, we develop strict administration guides, keep the test instrument secure, and do not divulge the questions to participants ahead of time.
There are many factors which affect the reliability of a test. The main ones are:
o The question itself: If poorly worded, confusing, or ambiguous, it will not yield a reliable measure of a student’s learning.
o Test administration: Factors such as heat, lighting, noise, differences in test directions, amount of time allowed, confusing instructions, or illegible test sheets can all affect a student’s score.
o Scoring: Scoring that reflects the grader’s personal opinion is highly unreliable. Scoring may also be affected by an instructor’s energy level, mental outlook, or level of concentration. Objectively scored tests such as multiple-choice contain no element of grader subjectivity, and the test results are more reliable.
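For readers who like to see the arithmetic behind “reliability,” one widely used measure of a test’s internal consistency is Cronbach’s alpha. The sketch below is purely illustrative: the 0/1 item scores for five students are invented, and the hand-rolled variance function is just to keep the example self-contained.

```python
# Illustrative sketch: Cronbach's alpha measures whether a test's items
# hang together (students who do well on one item tend to do well on
# the others). The scores below are hypothetical 0/1 item results.

def cronbach_alpha(scores):
    """scores: list of per-student lists of item scores (equal length)."""
    k = len(scores[0])   # number of items
    def variance(xs):    # sample variance (n - 1 denominator)
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    item_vars = [variance([s[i] for s in scores]) for i in range(k)]
    total_var = variance([sum(s) for s in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

scores = [
    [1, 1, 1, 1],   # strong student: consistent across items
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],   # weak student: consistent across items
]
print(round(cronbach_alpha(scores), 2))   # 0.75
```

Values closer to 1.0 suggest the items are measuring the same underlying skill; scattered, inconsistent scoring drives the value down.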
Another excellent quality of a good test is objectivity. When we remove all subjectivity, we are truly measuring performance against a standard rather than against personal preferences. Not all tests can be scored objectively, but even subjective evaluations can be improved with well-defined marking criteria. A good example is evaluating student writing or essays: a quality rubric that clearly outlines the criteria for the assignment is much more objective than grading based on personal opinion. Rubrics have the further advantage of doubling as an instructional tool. The rubric should be introduced to the students so that they know what is expected of them; ideally, they should be able to apply the rubric themselves and replicate the score given to them.
The test should be usable. That is, it should be easy to develop, easy to administer, and easy to mark. Getting the results to the participants quickly after the test is recommended. Multiple choice type exams are quite usable but can be very difficult to develop. Also, multiple choice questions may not be valid if they do not match the requirements of the objective.
A comprehensive test checks all the objectives. For a course with 100 or more objectives, administering a fully comprehensive exam would be impractical. Yet we can still have a comprehensive exam if we check only those objectives that represent the larger tasks, since some tasks cannot be performed without first performing their sub-tasks. A student who can write a short passage incorporating unit vocabulary has necessarily demonstrated the sub-task of identifying the vocabulary. It is therefore fair to evaluate only the higher-order task, because the lower-order tasks are demonstrated along with it.
And finally, the test should discriminate. This is meant in a positive way: the test should distinguish those who know the material from those who do not. If a poor performer achieves satisfactory results, the test does not discriminate. Poorly designed test items can sometimes fool the person who knows the subject; similarly, poor items can “give away” the answer to someone who does not know the subject matter. In recent years, the TOEFL exam has incorporated an oral component, precisely so that students who are merely adept at exam-taking strategies cannot pass on strategy alone. In other words, the oral component keeps the TOEFL exam discriminating.
Be sure to tell your students the following:
1) What are they being tested on and why?
2) Which objectives will be tested? What is the focus of this exam?
3) What will they be allowed to have or do? What will not be allowed? Can they use learning aids for the test?
4) How many questions, how much time is permitted, and what type of questions will there be? Studying for a multiple choice exam is different than studying for a short answer test.
5) How will they be graded? Are there any rubrics or scoring guides?
6) How will they get feedback? How does their performance on the test affect their overall result? Are they familiar with the success criteria?
We can inadvertently permit unethical practices if we do any of the following:
• Teach to the test- Teaching to the exam implies giving students a “heads up” on which questions will be on the exam. Let’s face it: the only thing we would be measuring is their ability to remember the questions. This would not help with discrimination.
• Give hints to struggling students- Helping out some students at the expense of others is unethical. The instructor is there for everyone and not just a few.
• Create anxiety- Deliberately creating anxiety in participants over the prospects of the test and its results is also considered unethical.
In sum, creating a good test is not only time-consuming but also very difficult. I would argue that good tests and test items are largely absent from the majority of learning institutions. By keeping in mind the basic characteristics of a good test, however, we can at least appreciate that our students’ test results are shaped, to varying degrees, by the quality (or lack of quality) of the tests we administer.