Some Things Parents Should Know About Testing
A Series of Questions and Answers
Harcourt Assessment, Inc.
 
Q. Why do the schools test
			our children?
 
 A. It is no news to parents that children differ. Even within a 
			single family, some children learn to walk or to talk sooner than 
			others. One child may be a good reader; another may excel in sports. 
			When children come into school, the teacher needs to know as much as 
			possible about how they differ in order to be able to match the 
			classroom teaching to the specific needs of the children. The school 
			administration also needs to be able to plan for the long-term 
			education of the pupils.
 
 Q. Why do teachers need to use published tests?
 
 A. Commercially published tests give the teacher much important 
			information about the pupils, information which the teacher cannot 
obtain himself. Of course, teachers get a great deal of information
			about their pupils by observing their day-to-day work in class and 
			by testing their progress with teacher-made tests. Commercially 
			published tests are written by people who are experts in writing 
			test questions. These people are also curriculum specialists who 
			know what is being taught in schools all across the country. Most 
			commercially published tests cover a wide range of skills in one 
			test, whereas teacher-made tests usually cover only a single unit of 
			work. But perhaps the most important reason for using commercially 
			published tests is that the school can use the results obtained from 
			them to compare a pupil's school progress with the school progress 
			of other children throughout the country. These comparisons can be 
			made because the tests are norm-referenced and standardized on a 
			national population.
 
Q. What do you mean by norm-referenced?

A. Knowing that a pupil got 40 questions right on a test doesn't
give you enough information by itself. How many questions were
			there? Were they easy or hard? Is 40 a "good," "average," or "poor" 
			score? Often, what we really want to know is how this score compares 
			with the scores of other pupils of the same age or in the same 
			grade. Is it high, medium, or low in relation to the scores of 
			pupils in some large group? This way of describing performance is 
			called norm-referenced and the numbers that are used to give meaning 
			to a pupil's performance are called norms, or norm-referenced 
			scores.
 Q. What does 
			standardized mean?
 A. The test publisher develops the norms or norm-referenced scores 
			by a process called standardization. In order to find out what 
			scores are high, medium, or low, the test must be given to a large 
			number of schoolchildren across the country. The pupils who will be 
			in this national population sample will be carefully chosen. They 
			cannot all live in one area; they cannot all go to big schools; they 
			cannot all be of one race or socioeconomic group. The publisher will 
use government census data, along with experience and knowledge, to select a
			group of several thousand pupils so that their scores on the test 
			will represent the scores that would have been obtained if all the 
			millions of children in the country had been tested.
 
 Once the test has been written and the standardization group has 
			been selected, the test publisher must make sure that the test's 
			directions are so clear and so specific that the test can always be 
			presented in the same way to all pupils. This is done so that all 
			children have the same chance to know what they are supposed to do 
			on the test. A test which has been written in this way and given to 
			a carefully selected group of pupils in a controlled manner is said 
			to be a standardized test.
 
 Q. How do you get norms from standardization?
 
 A. The norms are a way of summarizing how the pupils in the 
			standardization group did on the test. In this sense, the pupils 
			make the norms, not the test-maker. After the test has been given, 
			the test publisher has something that looks like this:
 
 120 third graders correctly answered 29 questions
 180 third graders correctly answered 28 questions
 215 third graders correctly answered 27 questions
 
 and so on for each grade and each possible score. In order to make 
this information easier to understand, the test publisher summarizes
			it. One way of doing this is by reporting, for each test, the 
			average score in each grade. These are called grade equivalent 
			norms. Another way is to report what percentage of the pupils in a 
			grade scored at or below a certain score. These are called 
			percentile rank norms. A third type of norm describes how far a 
			pupil's performance is above or below the average performance for 
			that grade. These are called standard scores. (The most common 
			standard score is a stanine.) All of these methods of expressing a 
			score are simply ways of indicating where a particular score fits 
			into the pattern of all the scores earned by the pupils in the norm 
			group.
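For readers who want to see the arithmetic, here is a minimal sketch,
in Python, of how percentile-rank norms could be tallied from a
frequency table like the one above. The counts are the toy numbers
shown; publishers' actual norming procedures are more elaborate.

    # score -> number of third graders earning it (toy numbers from above)
    freq = {27: 215, 28: 180, 29: 120}
    total = sum(freq.values())                 # 515 pupils in this toy sample

    # Percentile rank of a score: percent of pupils at or below that score.
    # (Published tables conventionally cap ranks at 99.)
    def percentile_rank(score):
        at_or_below = sum(n for s, n in freq.items() if s <= score)
        return min(99, round(100 * at_or_below / total))

    for s in sorted(freq):
        print(s, percentile_rank(s))           # 27 -> 42, 28 -> 77, 29 -> 99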
 
 Q. What do you mean by the pattern of scores?
 
 A. For practically any characteristic you can name, there are 
			differences among individuals. There is an average (or medium, or 
			typical) weight, or height, or shoe size, or reading test score. But 
			there is also wide variation in both directions from that average. 
			The weights, heights, or reading scores of most people tend to bunch 
			up close to an average weight, or height, or reading ability. And 
			there are fewer people at the extremes; i.e., there are fewer adults 
			who are six inches taller or shorter than the average than there are 
			those who are only one inch taller or shorter than the average. The 
			average and the pattern of scores for any characteristic can be 
			determined, and one individual's score can always be described in 
			terms of the whole pattern.
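The bunching the answer describes can be imitated with a few lines of
Python. The average and spread below are invented, and the bell-shaped
distribution is only a rough model of real measurements.

    import random

    random.seed(0)
    average, spread = 69.0, 2.5                  # hypothetical heights in inches
    heights = [random.gauss(average, spread) for _ in range(100_000)]

    # Far more people fall within one inch of the average than beyond six.
    within_one = sum(abs(h - average) <= 1 for h in heights)
    beyond_six = sum(abs(h - average) > 6 for h in heights)
    print(within_one, beyond_six)                # roughly 31,000 versus 1,600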
 Q. You say norms can be expressed 
			in several ways. What is a percentile rank?
 A. A percentile rank tells you what percent of the pupils in the 
			norm group got the same score or a lower score on the test. For 
			example, if a score of 25 correct answers on a certain test for 
			fourth graders has a percentile rank of 52, it means that 52 percent 
			of the pupils in the norm group scored 25 or lower on the test. 
			Since the norm group was representative of all fourth graders in the 
			nation, it is estimated that a pupil scoring 25 on the test is 
performing at a level equal to or above 52 percent of all the fourth
			graders in the nation. For most standardized achievement tests, 
			percentile ranks are developed separately for each grade and for a 
			particular time of the year. A score of 25, for example, may have a 
			percentile rank of 52 for a fourth grader in the fall of fourth 
			grade and a percentile rank of 47 in the spring of fourth grade. A 
			percentile rank is not in any sense a "percent correct." It is not 
			the percent of questions the pupil answered correctly, but rather 
			the percent of pupils in the norm group who scored at or below that 
			score.
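The difference in that last sentence can be made concrete with a few
lines of Python. The list of norm-group scores below is invented; the
point is only that percent correct depends on the test alone, while
percentile rank depends on the norm group.

    norm_group = [18, 21, 23, 24, 25, 25, 27, 30, 33, 38]   # hypothetical raw scores
    raw_score, num_questions = 25, 40

    percent_correct = 100 * raw_score / num_questions        # 62.5: a fact about the test
    pct_rank = 100 * sum(s <= raw_score for s in norm_group) / len(norm_group)
    print(percent_correct, pct_rank)                         # 62.5 versus 60.0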
 
 Q. What is a stanine?
 
 A. A stanine is a score on a nine-unit scale from 1 to 9, where a 
			score of 5 describes average performance. The highest stanine is 9; 
			the lowest is 1. Stanines are based on the pattern of scores 
			described earlier. Except for 1 and 9, they divide the baseline into 
			equal amounts of the characteristic being measured. Stanine 8 is as 
far above average (5) as stanine 2 is below average. Most pupils 
score in the middle three stanines: 54 percent will score in stanines 
4, 5, and 6. On the other hand, very few (4 percent each) will score 
a stanine of 1 or a stanine of 9.
 Teachers may use stanines to describe a pupil's performance to his 
			parents during a parent-teacher conference. They may also be used to 
			group pupils for special instruction. Since there are only 9 
			stanines, students, parents, and teachers are not likely to give too 
			much weight to small differences among scores. Sometimes stanines 
			are combined into more general classifications with verbal 
descriptions. Stanine 9 describes the highest performance; stanines 7 
and 8, above average; stanines 4, 5, and 6, average; stanines 2 and 
3, below average; and stanine 1 describes the lowest performance, in
			relation to the norm group's performance. Remember, stanines, like 
			all other norms, describe comparative, not absolute, performance.
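For those curious about the mechanics, converting a percentile rank
to a stanine is a simple table lookup. The sketch below, in Python,
uses the conventional 4-7-12-17-20-17-12-7-4 percent split (which is
where the 54 percent and 4 percent figures above come from); exact
boundary handling varies from publisher to publisher.

    import bisect

    CUTOFFS = [4, 11, 23, 40, 60, 77, 89, 96]    # cumulative percents of the norm group

    def stanine(percentile_rank):
        return bisect.bisect(CUTOFFS, percentile_rank) + 1

    print(stanine(2), stanine(52), stanine(98))   # prints: 1 5 9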
 
 Q. But we usually hear about grade equivalents. What are they?
 
 A. A grade equivalent indicates the grade level, in years and 
			months, for which a given score was the average or middle score in 
			the standardization sample. For example, a score of 25 with the 
			grade equivalent of 4.6 means that, in the norm group, 25 was the 
			average score of pupils in the sixth month of the fourth grade. If, 
			after the test has been standardized, another pupil in the sixth 
			month of the fourth grade were to take the same fourth-grade test 
			and score 25 correct, his performance would be "at grade level" or 
			average for his grade placement. If he were to get 30 right or a 
			grade equivalent of 5.3, he would have done as well as the typical 
			fifth grader in the third month on that test. This does not mean 
			that the fourth grader can do all fifth grade work. There are many 
			things a fifth grader has learned that are not measured on a 
			fourth-grade test. Similarly, a 3.3 grade equivalent for a fourth 
grader would mean that he is performing, on the fourth-grade test, 
			the way the average pupil in the third month of third grade would 
			perform on that same test. It does not suggest that he has learned 
			only third grade material.
 Although grade equivalents may sound like a simple idea, they can be 
			easily misunderstood. For this reason, schools are increasingly 
			coming to rely on percentile ranks and stanines as more useful ways 
			to interpret scores in relation to a norm group. In fact, some 
			publishers recommend that grade equivalents not be used to report to 
			teachers, parents, pupils, or the general public.
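Mechanically, though, a grade equivalent is just a table lookup with
interpolation between the grade placements at which the test was
normed. The norm table in this Python sketch is invented to match the
examples above.

    # (grade placement, average raw score) pairs from a hypothetical norm table
    NORM_TABLE = [(3.3, 15), (4.6, 25), (5.3, 30)]

    def grade_equivalent(raw):
        for (g1, s1), (g2, s2) in zip(NORM_TABLE, NORM_TABLE[1:]):
            if s1 <= raw <= s2:                   # interpolate between anchor points
                return round(g1 + (g2 - g1) * (raw - s1) / (s2 - s1), 1)
        return None                               # off the table; publishers extrapolate

    print(grade_equivalent(25), grade_equivalent(30))   # 4.6 and 5.3, as in the example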
 
 Q. Newspapers sometimes write about "scoring at or above the norm." 
			What does scoring at the norm mean?
 
 A. Whereas the word norms is used to describe the full range of 
			scores the norm group obtained, the term the norm refers only to the 
mid-point in that range. People sometimes refer to the norm as the 
			acceptable or desirable score. This is inaccurate. On a 
			norm-referenced test, the norm is the average score obtained by the 
			pupils who took the test during its standardization. The norm only 
			indicates what is average; it does not describe how good that 
			performance is in absolute terms. Suppose a reading test were given 
			to a large, representative, national norm group and the average 
			score for the group was 25. The norm for that group, then, is 25. It 
			must be remembered, however, that of all the pupils in the national 
			norm group, half scored above 25 and half scored at or below 25!
 
 When the norm is expressed as a grade equivalent, it is still 
			describing the middle score in the norm group. If the norm group was 
			tested in the sixth month of grade 4, the average score for the 
			group would convert to the grade equivalent of 4.6. But note that 
			even in that norm group, fully half of all pupils actually in the 
			sixth month of fourth grade scored at or below that norm or "grade 
			level." If the same test is then given to another group, it would 
			not be surprising to find many pupils scoring "below the norm." 
			Remember, half of the norm group itself scored at or below the norm; 
			that's the meaning of the word.
 
 Q. But if a child's reading is "below the norm," that means he is a 
			poor reader, doesn't it?
 
 A. Not necessarily. It probably means he is not reading as well as 
			the average American child in his grade, assuming that the test was 
			well standardized. But it doesn't tell you how well the average 
			child reads. If most of the children in the norm group read "well," 
			the norm or average represents good reading. If most children read 
			poorly, the norm would represent "poor" reading. Whether the norm 
			group reads well or poorly is a judgment the test cannot make. Such 
decisions must be made by schools and parents.
 Q. But wouldn't it be worthwhile to 
			try to teach all children to read at or above the norm?
 A. Suppose that to score at the norm on a fourth-grade test, a pupil 
			must answer 25 questions out of 40 correctly. Then, suppose we 
			improve the teaching of reading so that all fourth-grade children in 
			the nation score at least 25 and many score much higher than 25. Now 
			all children are reading "at or above the norm," right? Wrong! As 
			the scores have changed, so has their average -- the norm. If you 
			were to standardize the test again, you might find that the middle 
			or average score for the national norm group is now 31 out of 40. 
			So, the norm now is 31, not 25, and half the pupils are still 
			reading at or below the norm and half are reading above the norm. In 
			other words, if everybody is above average, it's not the average 
			anymore! This is the reason that the norm is not an absolute goal 
			for everyone to attain. It is simply a statement of fact about the 
			average of a group. If they all read better, then the norm moves 
			higher. You've done something worthwhile, indeed, but it didn't 
			bring everyone "up to the norm"! (The norm for a test which was 
			standardized in the 1950's is no longer the norm, since more than 
			half the pupils now read better than that. This is one of the 
			reasons new tests must be standardized by the publishers every few 
			years.)
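The renorming argument can be checked with a few lines of Python. The
seven scores below are invented to match the numbers in the answer.

    import statistics

    old_group = [20, 22, 25, 25, 28, 31, 34]      # hypothetical norm group; median 25
    print(statistics.median(old_group))           # the old norm: 25

    # Teaching improves and every pupil gains six points...
    new_group = [s + 6 for s in old_group]        # everyone now scores 26 or better
    print(statistics.median(new_group))           # ...but the new norm is 31

    # Half the new group (counting the median pupil) still scores
    # at or below the new norm.
    print(sum(s <= 31 for s in new_group), "of", len(new_group))   # 4 of 7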
 
 Q. Some parents and teachers claim that most published standardized 
			tests are unfair to minority group and inner-city children. Is that 
			true?
 
 A. There are really two questions involved here. The first has to do 
			with the knowledge area being measured. Is it "fair," for example, 
			to test a pupil's knowledge of addition? If this skill is considered 
			important, and is part of the school's curriculum, then it is "fair" 
			to test a child's mastery of this skill. It is important that 
			parents and the school know how well each child performs on each 
skill. Of course, very few people would argue that mastery of 
addition is not a necessary skill. Test publishers try to 
			concentrate on areas that most people consider important. However, 
			if a test measures many areas that a community does not consider 
			important, then the test should not be used in that community.
 
 Assuming that it is important to measure particular areas, a second 
			question must still be answered. Does the test measure the areas 
			"fairly"? Have some test questions been stated in a way that will 
			give certain children an "unfair" advantage? Will some questions 
			"turn off' some children so that they will not do their best? Test 
			publishers have been giving increasing attention to the question of 
			the fairness of their tests. Many writers and editors from different 
			backgrounds are involved in test-making. Questions are reviewed by 
			members of several ethnic groups to correct for unintentional, 
			built-in biases. In addition, the topics in most reading tests are 
			chosen to be unfamiliar to almost all students. This helps to ensure 
			that scores are based on reading skill and not on familiarity with 
			the subject matter of the particular passage.
 
 Q. Are national norms valid for all children?
 
 A. Yes, national norms do have meaning and significance for all 
			school systems. National norms represent one reality -- they 
			represent the pattern of performance of all the nation's 
			schoolchildren. All kinds of schools in all parts of the country are 
			represented in that total pattern. The pattern of scores in any one 
			area, even in a large city, is not likely to match the total pattern 
			exactly. Differences are to be expected and should be explained to 
			parents and the general public.
 
 However, our children are growing up in a rapidly changing, 
			competitive, and highly mobile society. After attending school in 
			one community, they may, in later years, have to compete in the job 
			market with others from all over the country. Thus, it is valuable 
			for parents and school personnel to be able to evaluate local school 
			performance in relation to the nation as a whole.
 
 Q. But aren't there other useful comparisons to be made?
 
 A. Of course! And there are other kinds of norm groups besides the 
			national norm group. The group chosen for comparison should depend 
			on what information the school needs. It is quite possible and often 
			advisable to compare individual pupils with pupils in a district or 
			city, with other pupils in similar communities nearby, with all 
			pupils in the state, and so on. These regional or local norms are 
			developed in a way similar to that for national norms. However, they 
			describe the pattern of performance for some more narrowly defined 
			group.
 
 Q. Why don't you have tests that tell you whether or not a pupil has 
			learned a skill, regardless of what other pupils know?
 
A. Such tests do exist; they are called objective-referenced or 
			criterion-referenced tests. In fact, the tests teachers use in their 
			own classrooms are more like this kind of test than they are like 
			norm-referenced tests. Suppose a teacher has given the class ten 
			words to learn how to spell. At the end of the week, a teacher-made 
			spelling test is given to see whether or not each pupil has learned 
			to spell those ten words. The teacher is not interested in what 
			percent of pupils nationally can spell those words; the question is, 
			rather, "Can John spell these words or not?" An objective-referenced 
			or criterion-referenced test is, then, a test which is used to 
			determine whether or not an individual pupil has met an objective or 
			a criterion of performance. An objective may be stated something 
			like this: "The pupil can add two two-digit numbers requiring 
			regrouping." Important questions arise, however, when you begin to 
			plan an objective-referenced test. How many correct answers are 
			needed to show that the pupil has achieved the objective? At what 
			grade level should we expect him to meet the objective? Should every 
pupil be expected to meet every objective? These are not easy 
			questions to answer. Who will make the decisions? Other questions 
			arise when a child does achieve the objective. Is it typical for a 
			fourth grader to achieve this objective? Do most fourth graders know 
			how to perform this task? Answering these questions brings us back 
to a comparison among individuals -- or to a norm-referenced 
			interpretation of test scores.
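The criterion-referenced check itself, though, needs no norm group at
all. A small Python sketch of the spelling example might look like
this; the pupils, their results, and the 8-of-10 criterion are all
invented.

    CRITERION = 8                                 # words out of 10 needed to show mastery

    results = {"John": 9, "Maria": 7, "Lee": 10}  # words each pupil spelled correctly

    for pupil, correct in results.items():
        met = correct >= CRITERION
        print(pupil, "met the objective" if met else "did not meet the objective")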
 Of course, it is not necessary to choose between these two kinds of 
			tests or ways of interpreting test results. Each way of looking at a 
			pupil's performance provides useful information about what the 
			schools are teaching and about what pupils are learning. Some tests 
			are designed to offer both kinds of interpretation.
 
 Q. Where can I get more information about testing?
 
 A. You might first contact the testing coordinator or guidance 
			director in your local school system. If there is a college or 
			university nearby, you might seek information from the professor who 
			teaches courses in tests and measurements. The testing of children 
			is an important responsibility. We feel it is also part of our 
			responsibility, as test publishers, to help you understand why and 
			how testing is done. The staff of Harcourt Educational Measurement 
			will be glad to be of service.