Parents Should Know About Testing
A Series of Questions and Answers
Q. Why do the schools test?
A. It is no news to parents that children differ. Even within a
single family, some children learn to walk or to talk sooner than
others. One child may be a good reader; another may excel in sports.
When children come into school, the teacher needs to know as much as
possible about how they differ in order to be able to match the
classroom teaching to the specific needs of the children. The school
administration also needs to be able to plan for the long-term
education of the pupils.
Q. Why do teachers need to use published tests?
A. Commercially published tests give the teacher much important
information about the pupils, information which the teacher cannot
obtain himself. Of course, teachers get a great deal of information
about their pupils by observing their day-to-day work in class and
by testing their progress with teacher-made tests. Commercially
published tests are written by people who are experts in writing
test questions. These people are also curriculum specialists who
know what is being taught in schools all across the country. Most
commercially published tests cover a wide range of skills in one
test, whereas teacher-made tests usually cover only a single unit of
work. But perhaps the most important reason for using commercially
published tests is that the school can use the results obtained from
them to compare a pupil's school progress with the school progress
of other children throughout the country. These comparisons can be
made because the tests are norm-referenced and standardized on a
national sample.
Q. What do you mean by norm-referenced?
A. Knowing that a pupil got 40 questions right on a test doesn't
give you enough information by itself. How many questions were
there? Were they easy or hard? Is 40 a "good," "average," or "poor"
score? Often, what we really want to know is how this score compares
with the scores of other pupils of the same age or in the same
grade. Is it high, medium, or low in relation to the scores of
pupils in some large group? This way of describing performance is
called norm-referenced, and the numbers that are used to give meaning
to a pupil's performance are called norms, or norm-referenced scores.
Q. What does standardized mean?
A. The test publisher develops the norms or norm-referenced scores
by a process called standardization. In order to find out what
scores are high, medium, or low, the test must be given to a large
number of schoolchildren across the country. The pupils who will be
in this national population sample will be carefully chosen. They
cannot all live in one area; they cannot all go to big schools; they
cannot all be of one race or socioeconomic group. The publisher will
use government census data and experience and knowledge to select a
group of several thousand pupils so that their scores on the test
will represent the scores that would have been obtained if all the
millions of children in the country had been tested.
Once the test has been written and the standardization group has
been selected, the test publisher must make sure that the test's
directions are so clear and so specific that the test can always be
presented in the same way to all pupils. This is done so that all
children have the same chance to know what they are supposed to do
on the test. A test which has been written in this way and given to
a carefully selected group of pupils in a controlled manner is said
to be a standardized test.
Q. How do you get norms from standardization?
A. The norms are a way of summarizing how the pupils in the
standardization group did on the test. In this sense, the pupils
make the norms, not the test-maker. After the test has been given,
the test publisher has something that looks like this:
120 third graders correctly answered 29 questions
180 third graders correctly answered 28 questions
215 third graders correctly answered 27 questions
and so on for each grade and each possible score. In order to make
this information easier to understand, the test publisher summarizes
it. One way of doing this is by reporting, for each test, the
average score in each grade. These are called grade equivalent
norms. Another way is to report what percentage of the pupils in a
grade scored at or below a certain score. These are called
percentile rank norms. A third type of norm describes how far a
pupil's performance is above or below the average performance for
that grade. These are called standard scores. (The most common
standard score is a stanine.) All of these methods of expressing a
score are simply ways of indicating where a particular score fits
into the pattern of all the scores earned by the pupils in the norm
group.
Q. What do you mean by the pattern of scores?
A. For practically any characteristic you can name, there are
differences among individuals. There is an average (or medium, or
typical) weight, or height, or shoe size, or reading test score. But
there is also wide variation in both directions from that average.
The weights, heights, or reading scores of most people tend to bunch
up close to an average weight, or height, or reading ability. And
there are fewer people at the extremes; for example, there are fewer adults
who are six inches taller or shorter than the average than there are
those who are only one inch taller or shorter than the average. The
average and the pattern of scores for any characteristic can be
determined, and one individual's score can always be described in
terms of the whole pattern.
Q. You say norms can be expressed
in several ways. What is a percentile rank?
A. A percentile rank tells you what percent of the pupils in the
norm group got the same score or a lower score on the test. For
example, if a score of 25 correct answers on a certain test for
fourth graders has a percentile rank of 52, it means that 52 percent
of the pupils in the norm group scored 25 or lower on the test.
Since the norm group was representative of all fourth graders in the
nation, it is estimated that a pupil scoring 25 on the test is
performing at a level equal to or above 52% of all the fourth
graders in the nation. For most standardized achievement tests,
percentile ranks are developed separately for each grade and for a
particular time of the year. A score of 25, for example, may have a
percentile rank of 52 for a fourth grader in the fall of fourth
grade and a percentile rank of 47 in the spring of fourth grade. A
percentile rank is not in any sense a "percent correct." It is not
the percent of questions the pupil answered correctly, but rather
the percent of pupils in the norm group who scored at or below that
score.
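The difference between a percentile rank and a "percent correct" can be shown with a short Python sketch. Everything in it is made up for illustration: the norm group of 20 scores, the 40-question test length, and the function name percentile_rank. A real norm group contains thousands of pupils, and publishers may use slightly different counting rules.

```python
from bisect import bisect_right


def percentile_rank(norm_scores, score):
    """Percent of the norm group that scored at or below `score`."""
    ordered = sorted(norm_scores)
    # bisect_right counts every norm-group score that is <= score.
    at_or_below = bisect_right(ordered, score)
    return 100.0 * at_or_below / len(ordered)


# Hypothetical norm group: number of correct answers for 20 fourth graders.
norm_group = [12, 15, 17, 18, 19, 20, 21, 22, 23, 25,
              25, 26, 27, 28, 29, 30, 31, 33, 35, 38]

QUESTIONS_ON_TEST = 40  # assumed test length

rank = percentile_rank(norm_group, 25)
percent_correct = 100.0 * 25 / QUESTIONS_ON_TEST

print(rank)             # 55.0  (11 of the 20 pupils scored 25 or lower)
print(percent_correct)  # 62.5  (a different number with a different meaning)
```

In this made-up group, the same 25 correct answers give a percentile rank of 55 but a percent correct of 62.5, which is why the two should never be confused.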
Q. What is a stanine?
A. A stanine is a score on a nine-unit scale from 1 to 9, where a
score of 5 describes average performance. The highest stanine is 9;
the lowest is 1. Stanines are based on the pattern of scores
described earlier. Except for 1 and 9, they divide the baseline into
equal amounts of the characteristic being measured. Stanine 8 is as
far above average (5) as stanine 2 is below average. As is shown in
the figure below, most pupils score in the middle three stanines; 54
percent will score in stanines 4, 5, and 6. On the other hand, very
few (about 4 percent in each) will score a stanine of 1 or a stanine of 9.
Teachers may use stanines to describe a pupil's performance to his
parents during a parent-teacher conference. They may also be used to
group pupils for special instruction. Since there are only 9
stanines, students, parents, and teachers are not likely to give too
much weight to small differences among scores. Sometimes stanines
are combined into more general classifications with verbal
descriptions. Stanine 9 describes higher performance; stanines 7 and
8, above average; stanines 4, 5, and 6, average; stanines 2 and 3,
below average; and stanine 1 describes lower performance, in
relation to the norm group's performance. Remember, stanines, like
all other norms, describe comparative, not absolute, performance.
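For readers who like to see the arithmetic, the following Python sketch converts a percentile rank into a stanine and a verbal description. It assumes the percentages conventionally assigned to the nine stanines (4, 7, 12, 17, 20, 17, 12, 7, 4), which agree with the 54 percent and 4 percent figures given above. The function names and the handling of a percentile rank that falls exactly on a boundary are assumptions; a publisher's own conversion tables, which start from raw scores, are the authority for any particular test.

```python
from bisect import bisect_left
from itertools import accumulate

# Percent of the norm group conventionally placed in stanines 1 through 9.
STANINE_PERCENTS = [4, 7, 12, 17, 20, 17, 12, 7, 4]

# Highest percentile rank that still belongs to each stanine:
# [4, 11, 23, 40, 60, 77, 89, 96, 100]
UPPER_BOUNDS = list(accumulate(STANINE_PERCENTS))


def stanine(percentile_rank):
    """Stanine (1-9) for a percentile rank between 0 and 100."""
    if not 0 <= percentile_rank <= 100:
        raise ValueError("percentile rank must be between 0 and 100")
    # Assumption: a rank exactly on a boundary stays in the lower stanine.
    return bisect_left(UPPER_BOUNDS, percentile_rank) + 1


def verbal_label(stanine_score):
    """General classification described in the text above."""
    if stanine_score == 9:
        return "higher"
    if stanine_score >= 7:
        return "above average"
    if stanine_score >= 4:
        return "average"
    if stanine_score >= 2:
        return "below average"
    return "lower"


print(stanine(52), verbal_label(stanine(52)))  # 5 average
print(sum(STANINE_PERCENTS[3:6]))              # 54 percent in stanines 4, 5, 6
```

Because a whole band of percentile ranks shares one stanine, small differences between two pupils' scores usually disappear, which is the point made above.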
Q. But we usually hear about grade equivalents. What are they?
A. A grade equivalent indicates the grade level, in years and
months, for which a given score was the average or middle score in
the standardization sample. For example, a score of 25 with the
grade equivalent of 4.6 means that, in the norm group, 25 was the
average score of pupils in the sixth month of the fourth grade. If,
after the test has been standardized, another pupil in the sixth
month of the fourth grade were to take the same fourth-grade test
and score 25 correct, his performance would be "at grade level" or
average for his grade placement. If he were to get 30 right or a
grade equivalent of 5.3, he would have done as well as the typical
fifth grader in the third month on that test. This does not mean
that the fourth grader can do all fifth grade work. There are many
things a fifth grader has learned that are not measured on a
fourth-grade test. Similarly, a 3.3 grade equivalent for a fourth
grader would mean that he is performing, on the fourth-grade test,
the way the average pupil in the third month of third grade would
perform on that same test. It does not suggest that he has learned
only third grade material.
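The idea can be pictured as a simple lookup table, sketched here in Python. The scores of 25 and 30 and their grade equivalents come from the example above; the score of 19 for a 3.3 grade equivalent, the table name, and the function names are made up for illustration. A publisher's real table covers every possible score and is built from the standardization data.

```python
# Hypothetical norms table for ONE fourth-grade test: the middle score
# earned by norm-group pupils at each grade placement ("grade.month").
MIDDLE_SCORE_BY_PLACEMENT = {
    "3.3": 19,  # assumed value, for illustration only
    "4.6": 25,  # from the example in the text
    "5.3": 30,  # from the example in the text
}


def grade_equivalent(score):
    """Grade placement whose middle score equals `score`, or None."""
    for placement, middle_score in MIDDLE_SCORE_BY_PLACEMENT.items():
        if middle_score == score:
            return placement
    return None  # this small table has no entry for the score


def describe(placement):
    """Spell out a grade equivalent such as "4.6" in words."""
    grade, month = placement.split(".")
    return f"month {month} of grade {grade}"


ge = grade_equivalent(30)
print(ge, "->", describe(ge))  # 5.3 -> month 3 of grade 5
```

Note what the table does and does not contain: every entry describes performance on the same fourth-grade test. Nothing in it measures fifth-grade material, which is why a 5.3 does not mean the pupil can do all fifth-grade work.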
Although grade equivalents may sound like a simple idea, they can be
easily misunderstood. For this reason, schools are increasingly
coming to rely on percentile ranks and stanines as more useful ways
to interpret scores in relation to a norm group. In fact, some
publishers recommend that grade equivalents not be used to report to
teachers, parents, pupils, or the general public.
Q. Newspapers sometimes write about "scoring at or above the norm."
What does scoring at the norm mean?
A. Whereas the word "norms" is used to describe the full range of
scores the norm group obtained, the term "the norm" refers only to the
mid-point in that range. People sometimes refer to the norm as the
acceptable or desirable score. This is inaccurate. On a
norm-referenced test, the norm is the average score obtained by the
pupils who took the test during its standardization. The norm only
indicates what is average; it does not describe how good that
performance is in absolute terms. Suppose a reading test were given
to a large, representative, national norm group and the average
score for the group was 25. The norm for that group, then, is 25. It
must be remembered, however, that of all the pupils in the national
norm group, half scored above 25 and half scored at or below 25!
When the norm is expressed as a grade equivalent, it is still
describing the middle score in the norm group. If the norm group was
tested in the sixth month of grade 4, the average score for the
group would convert to the grade equivalent of 4.6. But note that
even in that norm group, fully half of all pupils actually in the
sixth month of fourth grade scored at or below that norm or "grade
level." If the same test is then given to another group, it would
not be surprising to find many pupils scoring "below the norm."
Remember, half of the norm group itself scored at or below the norm;
that's the meaning of the word.
Q. But if a child's reading is "below the norm," that means he is a
poor reader, doesn't it?
A. Not necessarily. It probably means he is not reading as well as
the average American child in his grade, assuming that the test was
well standardized. But it doesn't tell you how well the average
child reads. If most of the children in the norm group read "well,"
the norm or average represents good reading. If most children read
poorly, the norm would represent "poor" reading. Whether the norm
group reads well or poorly is a judgment the test cannot make. Such
decisions must be made by schools and parents.
Q. But wouldn't it be worthwhile to
try to teach all children to read at or above the norm?
A. Suppose that to score at the norm on a fourth-grade test, a pupil
must answer 25 questions out of 40 correctly. Then, suppose we
improve the teaching of reading so that all fourth-grade children in
the nation score at least 25 and many score much higher than 25. Now
all children are reading "at or above the norm," right? Wrong! As
the scores have changed, so has their average -- the norm. If you
were to standardize the test again, you might find that the middle
or average score for the national norm group is now 31 out of 40.
So, the norm now is 31, not 25, and half the pupils are still
reading at or below the norm and half are reading above the norm. In
other words, if everybody is above average, it's not the average
anymore! This is the reason that the norm is not an absolute goal
for everyone to attain. It is simply a statement of fact about the
average of a group. If they all read better, then the norm moves
higher. You've done something worthwhile, indeed, but it didn't
bring everyone "up to the norm"! (The norm for a test which was
standardized in the 1950's is no longer the norm, since more than
half the pupils now read better than that. This is one of the
reasons new tests must be standardized by the publishers every few
years.)
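The moving norm can be demonstrated with a small Python sketch. Both lists of scores are invented: eleven pupils stand in for a national norm group, and the scores were chosen so that the middle scores match the 25 and 31 used in the example above. The middle score (the median) is used here as "the norm."

```python
from statistics import median

# Hypothetical scores (out of 40) for the same eleven pupils,
# before and after reading instruction improves.
old_scores = [15, 18, 20, 22, 24, 25, 27, 29, 31, 34, 36]
new_scores = [25, 26, 28, 29, 30, 31, 32, 34, 36, 38, 40]

old_norm = median(old_scores)  # middle score at the first standardization
new_norm = median(new_scores)  # middle score if the test is standardized again


def count_above(scores, norm):
    """Number of pupils scoring strictly above the norm."""
    return sum(1 for s in scores if s > norm)


print(old_norm, new_norm)                        # 25 31
# Measured against the OLD norm, everyone now looks fine...
print(all(s >= old_norm for s in new_scores))    # True
# ...but against the NEW norm, about half are still at or below it.
print(count_above(new_scores, new_norm))         # 5 of 11 above
print(len(new_scores) - count_above(new_scores, new_norm))  # 6 at or below
```

Every pupil improved, yet after restandardization about half of the group is again at or below the norm, simply because the norm is defined as the middle of the group.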
Q. Some parents and teachers claim that most published standardized
tests are unfair to minority group and inner-city children. Is that
true?
A. There are really two questions involved here. The first has to do
with the knowledge area being measured. Is it "fair," for example,
to test a pupil's knowledge of addition? If this skill is considered
important, and is part of the school's curriculum, then it is "fair"
to test a child's mastery of this skill. It is important that
parents and the school know how well each child performs on each
skill. Of course, very few people would argue that mastery of
addition is unnecessary. Test publishers try to
concentrate on areas that most people consider important. However,
if a test measures many areas that a community does not consider
important, then the test should not be used in that community.
Assuming that it is important to measure particular areas, a second
question must still be answered. Does the test measure the areas
"fairly"? Have some test questions been stated in a way that will
give certain children an "unfair" advantage? Will some questions
"turn off" some children so that they will not do their best? Test
publishers have been giving increasing attention to the question of
the fairness of their tests. Many writers and editors from different
backgrounds are involved in test-making. Questions are reviewed by
members of several ethnic groups to correct for unintentional,
built-in biases. In addition, the topics in most reading tests are
chosen to be unfamiliar to almost all students. This helps to ensure
that scores are based on reading skill and not on familiarity with
the subject matter of the particular passage.
Q. Are national norms valid for all children?
A. Yes, national norms do have meaning and significance for all
school systems. National norms represent one reality -- they
represent the pattern of performance of all the nation's
schoolchildren. All kinds of schools in all parts of the country are
represented in that total pattern. The pattern of scores in any one
area, even in a large city, is not likely to match the total pattern
exactly. Differences are to be expected and should be explained to
parents and the general public.
However, our children are growing up in a rapidly changing,
competitive, and highly mobile society. After attending school in
one community, they may, in later years, have to compete in the job
market with others from all over the country. Thus, it is valuable
for parents and school personnel to be able to evaluate local school
performance in relation to the nation as a whole.
Q. But aren't there other useful comparisons to be made?
A. Of course! And there are other kinds of norm groups besides the
national norm group. The group chosen for comparison should depend
on what information the school needs. It is quite possible and often
advisable to compare individual pupils with pupils in a district or
city, with other pupils in similar communities nearby, with all
pupils in the state, and so on. These regional or local norms are
developed in a way similar to that for national norms. However, they
describe the pattern of performance for some more narrowly defined
group.
Q. Why don't you have tests that tell you whether or not a pupil has
learned a skill, regardless of what other pupils know?
A. Such tests do exist; they are called objective-referenced or
criterion-referenced tests. In fact, the tests teachers use in their
own classrooms are more like this kind of test than they are like
norm-referenced tests. Suppose a teacher has given the class ten
words to learn how to spell. At the end of the week, a teacher-made
spelling test is given to see whether or not each pupil has learned
to spell those ten words. The teacher is not interested in what
percent of pupils nationally can spell those words; the question is,
rather, "Can John spell these words or not?" An objective-referenced
or criterion-referenced test is, then, a test which is used to
determine whether or not an individual pupil has met an objective or
a criterion of performance. An objective may be stated something
like this: "The pupil can add two two-digit numbers requiring
regrouping." Important questions arise, however, when you begin to
plan an objective-referenced test. How many correct answers are
needed to show that the pupil has achieved the objective? At what
grade level should we expect him to meet the objective? Should every
pupil be expected to meet every objective? These are not easy
questions to answer. Who will make the decisions? Other questions
arise when a child does achieve the objective. Is it typical for a
fourth grader to achieve this objective? Do most fourth graders know
how to perform this task? Answering these questions brings us back
to a comparison among individuals - or to a norm-referenced
interpretation of test scores.
Of course, it is not necessary to choose between these two kinds of
tests or ways of interpreting test results. Each way of looking at a
pupil's performance provides useful information about what the
schools are teaching and about what pupils are learning. Some tests
are designed to offer both kinds of interpretation.
Q. Where can I get more information about testing?
A. You might first contact the testing coordinator or guidance
director in your local school system. If there is a college or
university nearby, you might seek information from the professor who
teaches courses in tests and measurements. The testing of children
is an important responsibility. We feel it is also part of our
responsibility, as test publishers, to help you understand why and
how testing is done. The staff of Harcourt Educational Measurement
will be glad to be of service.