Some Things Parents Should Know About Testing

Harcourt Assessment, Inc.

A Series of Questions and Answers

Q. Why do the schools test our children?

A. It is no news to parents that children differ. Even within a single family, some children learn to walk or to talk sooner than others. One child may be a good reader; another may excel in sports. When children come into school, the teacher needs to know as much as possible about how they differ in order to be able to match the classroom teaching to the specific needs of the children. The school administration also needs to be able to plan for the long-term education of the pupils.

Q. Why do teachers need to use published tests?

A. Commercially published tests give the teacher much important information about the pupils, information which the teacher cannot obtain himself. Of course, teachers get a great deal of information about their pupils by observing their day-to-day work in class and by testing their progress with teacher-made tests. Commercially published tests are written by people who are experts in writing test questions. These people are also curriculum specialists who know what is being taught in schools all across the country. Most commercially published tests cover a wide range of skills in one test, whereas teacher-made tests usually cover only a single unit of work. But perhaps the most important reason for using commercially published tests is that the school can use the results obtained from them to compare a pupil's school progress with the school progress of other children throughout the country. These comparisons can be made because the tests are norm-referenced and standardized on a national population.

Q. What do you mean by norm-referenced?

A. Knowing that a pupil got 40 questions right on a test doesn't give you enough information by itself. How many questions were there? Were they easy or hard? Is 40 a "good," "average," or "poor" score? Often, what we really want to know is how this score compares with the scores of other pupils of the same age or in the same grade. Is it high, medium, or low in relation to the scores of pupils in some large group? This way of describing performance is called norm-referenced, and the numbers that are used to give meaning to a pupil's performance are called norms, or norm-referenced scores.

Q. What does standardized mean?

A. The test publisher develops the norms or norm-referenced scores by a process called standardization. In order to find out what scores are high, medium, or low, the test must be given to a large number of schoolchildren across the country. The pupils who will be in this national population sample will be carefully chosen. They cannot all live in one area; they cannot all go to big schools; they cannot all be of one race or socioeconomic group. The publisher will use government census data and experience and knowledge to select a group of several thousand pupils so that their scores on the test will represent the scores that would have been obtained if all the millions of children in the country had been tested.

Once the test has been written and the standardization group has been selected, the test publisher must make sure that the test's directions are so clear and so specific that the test can always be presented in the same way to all pupils. This is done so that all children have the same chance to know what they are supposed to do on the test. A test which has been written in this way and given to a carefully selected group of pupils in a controlled manner is said to be a standardized test.

Q. How do you get norms from standardization?

A. The norms are a way of summarizing how the pupils in the standardization group did on the test. In this sense, the pupils make the norms, not the test-maker. After the test has been given, the test publisher has something that looks like this:

120 third graders correctly answered 29 questions
180 third graders correctly answered 28 questions
215 third graders correctly answered 27 questions

and so on for each grade and each possible score. In order to make this information easier to understand, the test publisher summarizes it. One way of doing this is by reporting, for each test, the average score in each grade. These are called grade equivalent norms. Another way is to report what percentage of the pupils in a grade scored at or below a certain score. These are called percentile rank norms. A third type of norm describes how far a pupil's performance is above or below the average performance for that grade. These are called standard scores. (The most common standard score is a stanine.) All of these methods of expressing a score are simply ways of indicating where a particular score fits into the pattern of all the scores earned by the pupils in the norm group.

Q. What do you mean by the pattern of scores?

A. For practically any characteristic you can name, there are differences among individuals. There is an average (or medium, or typical) weight, or height, or shoe size, or reading test score. But there is also wide variation in both directions from that average. The weights, heights, or reading scores of most people tend to bunch up close to an average weight, or height, or reading ability. And there are fewer people at the extremes; i.e., there are fewer adults who are six inches taller or shorter than the average than there are those who are only one inch taller or shorter than the average. The average and the pattern of scores for any characteristic can be determined, and one individual's score can always be described in terms of the whole pattern.

Q. You say norms can be expressed in several ways. What is a percentile rank?

A. A percentile rank tells you what percent of the pupils in the norm group got the same score or a lower score on the test. For example, if a score of 25 correct answers on a certain test for fourth graders has a percentile rank of 52, it means that 52 percent of the pupils in the norm group scored 25 or lower on the test. Since the norm group was representative of all fourth graders in the nation, it is estimated that a pupil scoring 25 on the test is performing at a level equal to or above 52% of all the fourth graders in the nation. For most standardized achievement tests, percentile ranks are developed separately for each grade and for a particular time of the year. A score of 25, for example, may have a percentile rank of 52 for a fourth grader in the fall of fourth grade and a percentile rank of 47 in the spring of fourth grade. A percentile rank is not in any sense a "percent correct." It is not the percent of questions the pupil answered correctly, but rather the percent of pupils in the norm group who scored at or below that score.
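
The arithmetic behind a percentile rank can be sketched in a few lines of Python. The frequency table below is invented for illustration (it is not data from any real standardization), and the function name `percentile_rank` is our own. Actual publishers may use refinements, such as counting only half of the pupils who tie at a given score.

```python
def percentile_rank(score, frequencies):
    """Percent of the norm group scoring at or below `score`.

    `frequencies` maps each raw score to the number of norm-group
    pupils who earned exactly that score.
    """
    total = sum(frequencies.values())
    at_or_below = sum(n for s, n in frequencies.items() if s <= score)
    return round(100 * at_or_below / total)


# Hypothetical fourth-grade norm group of 1,000 pupils (invented numbers).
norm_group = {23: 220, 24: 150, 25: 150, 26: 180, 27: 300}

# 520 of the 1,000 pupils scored 25 or lower.
print(percentile_rank(25, norm_group))  # 52
```

Note that the result depends only on how other pupils scored, never on how many questions the test contained, which is why a percentile rank is not a "percent correct."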

Q. What is a stanine?

A. A stanine is a score on a nine-unit scale from 1 to 9, where a score of 5 describes average performance. The highest stanine is 9; the lowest is 1. Stanines are based on the pattern of scores described earlier. Except for 1 and 9, they divide the baseline into equal amounts of the characteristic being measured. Stanine 8 is as far above average (5) as stanine 2 is below average. As is shown in the figure below, most pupils score in the middle three stanines; 54 percent will score in stanines 4, 5, and 6. On the other hand, very few will score a stanine of 1 or a stanine of 9 (about 4 percent in each).

Teachers may use stanines to describe a pupil's performance to his parents during a parent-teacher conference. They may also be used to group pupils for special instruction. Since there are only 9 stanines, students, parents, and teachers are not likely to give too much weight to small differences among scores. Sometimes stanines are combined into more general classifications with verbal descriptions. Stanine 9 describes higher performance; stanines 7 and 8, above average; stanines 4, 5, and 6, average; stanines 2 and 3, below average; and stanine 1 describes lower performance, in relation to the norm group's performance. Remember, stanines, like all other norms, describe comparative, not absolute, performance.
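
The stanine scale can be sketched as a simple lookup from a percentile rank. The 54 percent for stanines 4 through 6 and the 4 percent at each extreme match the text above. The individual percentages (4, 7, 12, 17, 20, 17, 12, 7, 4) come from the standard stanine definition rather than from this brochure, and the handling of boundary percentile ranks is an assumption, since publishers' conversion tables can differ slightly. The names `stanine` and `describe` are ours.

```python
from bisect import bisect_right
from itertools import accumulate

# Percent of the norm group falling in stanines 1 through 9
# (standard stanine definition; an assumption, not taken from this text).
STANINE_PERCENTS = [4, 7, 12, 17, 20, 17, 12, 7, 4]

# Cumulative cut points between stanines: 4, 11, 23, 40, 60, 77, 89, 96.
CUTS = list(accumulate(STANINE_PERCENTS))[:-1]


def stanine(percentile_rank):
    """Convert a percentile rank (1-99) to a stanine (1-9)."""
    return bisect_right(CUTS, percentile_rank) + 1


def describe(s):
    """Verbal classification of a stanine, as listed in the text."""
    if s == 9:
        return "higher"
    if s >= 7:
        return "above average"
    if s >= 4:
        return "average"
    if s >= 2:
        return "below average"
    return "lower"


print(sum(STANINE_PERCENTS[3:6]))  # 54 percent in stanines 4, 5, and 6
print(stanine(52), describe(stanine(52)))  # 5 average
```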

Q. But we usually hear about grade equivalents. What are they?

A. A grade equivalent indicates the grade level, in years and months, for which a given score was the average or middle score in the standardization sample. For example, a score of 25 with the grade equivalent of 4.6 means that, in the norm group, 25 was the average score of pupils in the sixth month of the fourth grade. If, after the test has been standardized, another pupil in the sixth month of the fourth grade were to take the same fourth-grade test and score 25 correct, his performance would be "at grade level" or average for his grade placement. If he were to get 30 right or a grade equivalent of 5.3, he would have done as well as the typical fifth grader in the third month on that test. This does not mean that the fourth grader can do all fifth grade work. There are many things a fifth grader has learned that are not measured on a fourth-grade test. Similarly, a 3.3 grade equivalent for a fourth grader would mean that he is performing, on the fourth-grade test, the way the average pupil in the third month of third grade would perform on that same test. It does not suggest that he has learned only third grade material.

Although grade equivalents may sound like a simple idea, they can be easily misunderstood. For this reason, schools are increasingly coming to rely on percentile ranks and stanines as more useful ways to interpret scores in relation to a norm group. In fact, some publishers recommend that grade equivalents not be used to report to teachers, parents, pupils, or the general public.
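
To make the definition concrete, here is a minimal Python sketch of how a grade-equivalent table could be built from norm-group results. All scores are invented, the function name `grade_equivalents` is hypothetical, and the sketch uses only three grade placements. It should be read as a simplification: real publishers estimate values for the placements and scores that fall in between.

```python
from statistics import median

# Invented norm-group raw scores on ONE fourth-grade test, keyed by the
# (grade, month) placement of the pupils who took it.
norm_scores = {
    (3, 3): [15, 18, 18, 20, 22],
    (4, 6): [21, 24, 25, 27, 30],
    (5, 3): [26, 29, 30, 32, 35],
}


def grade_equivalents(scores_by_placement):
    """Map each group's middle (median) raw score to its grade placement."""
    return {
        median(scores): placement
        for placement, scores in scores_by_placement.items()
    }


table = grade_equivalents(norm_scores)

# A raw score of 30 was the middle score of pupils in grade 5, month 3.
grade, month = table[30]
print(f"{grade}.{month}")  # 5.3
```

Every entry in the table comes from the same fourth-grade test, which is why a 5.3 says nothing about whether a pupil could handle the material on a fifth-grade test.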

Q. Newspapers sometimes write about "scoring at or above the norm." What does scoring at the norm mean?

A. Whereas the word norms is used to describe the full range of scores the norm group obtained, the term the norm refers only to the mid-point in that range. People sometimes refer to the norm as the acceptable or desirable score. This is inaccurate. On a norm-referenced test, the norm is the average score obtained by the pupils who took the test during its standardization. The norm only indicates what is average; it does not describe how good that performance is in absolute terms. Suppose a reading test were given to a large, representative, national norm group and the average score for the group was 25. The norm for that group, then, is 25. It must be remembered, however, that of all the pupils in the national norm group, half scored above 25 and half scored at or below 25!

When the norm is expressed as a grade equivalent, it is still describing the middle score in the norm group. If the norm group was tested in the sixth month of grade 4, the average score for the group would convert to the grade equivalent of 4.6. But note that even in that norm group, fully half of all pupils actually in the sixth month of fourth grade scored at or below that norm or "grade level." If the same test is then given to another group, it would not be surprising to find many pupils scoring "below the norm." Remember, half of the norm group itself scored at or below the norm; that's the meaning of the word.

Q. But if a child's reading is "below the norm," that means he is a poor reader, doesn't it?

A. Not necessarily. It probably means he is not reading as well as the average American child in his grade, assuming that the test was well standardized. But it doesn't tell you how well the average child reads. If most of the children in the norm group read "well," the norm or average represents good reading. If most children read poorly, the norm would represent "poor" reading. Whether the norm group reads well or poorly is a judgment the test cannot make. Such decisions must be made by schools and parents.

Q. But wouldn't it be worthwhile to try to teach all children to read at or above the norm?

A. Suppose that to score at the norm on a fourth-grade test, a pupil must answer 25 questions out of 40 correctly. Then, suppose we improve the teaching of reading so that all fourth-grade children in the nation score at least 25 and many score much higher than 25. Now all children are reading "at or above the norm," right? Wrong! As the scores have changed, so has their average -- the norm. If you were to standardize the test again, you might find that the middle or average score for the national norm group is now 31 out of 40. So, the norm now is 31, not 25, and half the pupils are still reading at or below the norm and half are reading above the norm. In other words, if everybody is above average, it's not the average anymore! This is the reason that the norm is not an absolute goal for everyone to attain. It is simply a statement of fact about the average of a group. If they all read better, then the norm moves higher. You've done something worthwhile, indeed, but it didn't bring everyone "up to the norm"! (The norm for a test which was standardized in the 1950's is no longer the norm, since more than half the pupils now read better than that. This is one of the reasons new tests must be standardized by the publishers every few years.)
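
The moving norm can be seen in a small Python sketch. The seven scores in each list are invented stand-ins for a national norm group, chosen so that the middle score is 25 before the improvement and 31 after it, as in the example above. The helper name `at_or_below` is ours.

```python
from statistics import median

# Invented raw scores (out of 40) for a tiny stand-in norm group.
original = [19, 22, 25, 25, 28, 31, 34]
old_norm = median(original)

# After better teaching, every pupil scores at least 25 and many score higher.
improved = [25, 27, 30, 31, 33, 36, 39]
new_norm = median(improved)


def at_or_below(scores, cutoff):
    """Count the pupils whose score is at or below `cutoff`."""
    return sum(1 for s in scores if s <= cutoff)


print(old_norm, new_norm)  # 25 31
print(min(improved) >= old_norm)  # True: everyone reached the OLD norm
print(at_or_below(improved, new_norm), "of", len(improved))  # 4 of 7
```

Every pupil in the improved group has reached the old norm, yet about half of them still sit at or below the new one, because the norm is recomputed from the group itself.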

Q. Some parents and teachers claim that most published standardized tests are unfair to minority group and inner-city children. Is that true?

A. There are really two questions involved here. The first has to do with the knowledge area being measured. Is it "fair," for example, to test a pupil's knowledge of addition? If this skill is considered important, and is part of the school's curriculum, then it is "fair" to test a child's mastery of this skill. It is important that parents and the school know how well each child performs on each skill. Of course, very few people would consider that mastery of addition was not a necessary skill. Test publishers try to concentrate on areas that most people consider important. However, if a test measures many areas that a community does not consider important, then the test should not be used in that community.

Assuming that it is important to measure particular areas, a second question must still be answered. Does the test measure the areas "fairly"? Have some test questions been stated in a way that will give certain children an "unfair" advantage? Will some questions "turn off" some children so that they will not do their best? Test publishers have been giving increasing attention to the question of the fairness of their tests. Many writers and editors from different backgrounds are involved in test-making. Questions are reviewed by members of several ethnic groups to correct for unintentional, built-in biases. In addition, the topics in most reading tests are chosen to be unfamiliar to almost all students. This helps to ensure that scores are based on reading skill and not on familiarity with the subject matter of the particular passage.

Q. Are national norms valid for all children?

A. Yes, national norms do have meaning and significance for all school systems. National norms represent one reality -- they represent the pattern of performance of all the nation's schoolchildren. All kinds of schools in all parts of the country are represented in that total pattern. The pattern of scores in any one area, even in a large city, is not likely to match the total pattern exactly. Differences are to be expected and should be explained to parents and the general public.

However, our children are growing up in a rapidly changing, competitive, and highly mobile society. After attending school in one community, they may, in later years, have to compete in the job market with others from all over the country. Thus, it is valuable for parents and school personnel to be able to evaluate local school performance in relation to the nation as a whole.

Q. But aren't there other useful comparisons to be made?

A. Of course! And there are other kinds of norm groups besides the national norm group. The group chosen for comparison should depend on what information the school needs. It is quite possible and often advisable to compare individual pupils with pupils in a district or city, with other pupils in similar communities nearby, with all pupils in the state, and so on. These regional or local norms are developed in a way similar to that for national norms. However, they describe the pattern of performance for some more narrowly defined group.

Q. Why don't you have tests that tell you whether or not a pupil has learned a skill, regardless of what other pupils know?

A. Such tests do exist; they are called objective-referenced or criterion-referenced tests. In fact, the tests teachers use in their own classrooms are more like this kind of test than they are like norm-referenced tests. Suppose a teacher has given the class ten words to learn how to spell. At the end of the week, a teacher-made spelling test is given to see whether or not each pupil has learned to spell those ten words. The teacher is not interested in what percent of pupils nationally can spell those words; the question is, rather, "Can John spell these words or not?" An objective-referenced or criterion-referenced test is, then, a test which is used to determine whether or not an individual pupil has met an objective or a criterion of performance. An objective may be stated something like this: "The pupil can add two two-digit numbers requiring regrouping." Important questions arise, however, when you begin to plan an objective-referenced test. How many correct answers are needed to show that the pupil has achieved the objective? At what grade level should we expect him to meet the objective? Should every pupil be expected to meet every objective? These are not easy questions to answer. Who will make the decisions? Other questions arise when a child does achieve the objective. Is it typical for a fourth grader to achieve this objective? Do most fourth graders know how to perform this task? Answering these questions brings us back to a comparison among individuals -- or to a norm-referenced interpretation of test scores.

Of course, it is not necessary to choose between these two kinds of tests or ways of interpreting test results. Each way of looking at a pupil's performance provides useful information about what the schools are teaching and about what pupils are learning. Some tests are designed to offer both kinds of interpretation.

Q. Where can I get more information about testing?

A. You might first contact the testing coordinator or guidance director in your local school system. If there is a college or university nearby, you might seek information from the professor who teaches courses in tests and measurements. The testing of children is an important responsibility. We feel it is also part of our responsibility, as test publishers, to help you understand why and how testing is done. The staff of Harcourt Educational Measurement will be glad to be of service.