EXECUTIVE OFFICE OF THE PRESIDENT
                OFFICE OF MANAGEMENT AND BUDGET
                     WASHINGTON, D.C. 20503
                                
                                
                       February 17, 1999
                                
                                
                   DRAFT PROVISIONAL GUIDANCE
          ON THE IMPLEMENTATION OF THE 1997 STANDARDS 
    FOR THE COLLECTION OF FEDERAL DATA ON RACE AND ETHNICITY
                                
                                
NOTE FOR READERS

     As a follow-on to OMB's October 1997 announcement of revised government-wide
standards for the collection of data on race and ethnicity, the Tabulation Working Group of the
Interagency Committee for the Review of Standards for Data on Race and Ethnicity has recently
issued a report, "Draft Provisional Guidance on the Implementation of the 1997 Standards for the
Collection of Federal Data on Race and Ethnicity."  This guidance, which has been developed
with the involvement of many Federal agencies, essentially was requested by those agencies and
the many users of data on race and ethnicity.

     The guidance focuses on three areas:  collecting data using the new standards, tabulating
data collected under the new standards, and building bridges to compare data collected under the
new and the old standards.  At this juncture, the guidance is often in the form of alternatives for
discussion rather than recommendations for implementation.  In many areas work is ongoing,
and the guidance will be amended as additional research and analyses are completed.

     At this juncture, we are seeking broader comment on the guidance.  In keeping with the
process that guided review and revision of the standards for data on race and ethnicity, we are
looking forward to an open dialogue on this draft provisional guidance.  Following a two-month
period for discussion by stakeholders within and outside government, we expect to issue
provisional guidance at the end of April.  We expect the guidance issued at that time will evolve
further as data from Census 2000 and other data collections employing the new collection
standards become available.

     We look forward to your review and comments, and welcome your questions.


                              Katherine K. Wallman
                              Chief Statistician   

                              DRAFT 

                       PROVISIONAL GUIDANCE

                             ON THE 

                         IMPLEMENTATION 

                    OF THE 1997 STANDARDS FOR

                FEDERAL DATA ON RACE AND ETHNICITY


                           Prepared By


                     Tabulation Working Group
      Interagency Committee for the Review of Standards for
                    Data on Race and Ethnicity
                    

                        February 17, 1999


                        Table of Contents

I.   Background
          A.  The Need for Tabulation Guidelines and Alternative Approaches
          B.  General Guidelines for Tabulating Data on Race
          C.  Points of Clarification Regarding the 1997 Standards
          D.  Criteria Used in Developing the Tabulation Guidelines

II.  Collecting Data on Race and Ethnicity Using the New Standards
          A.  Developing Procedures for Data Collection (Full Report at Appendix B)
          B.  Best Practices in Survey Design and Data Processing (Under development)
     
III. Tabulating Data on Race and Ethnicity Collected Under the New Standards
          A.  Decennial Census
          B.  Other Surveys and Administrative Records

IV.  Using Data on Race and Ethnicity Collected Under the New Standards
          A.  Redistricting
          B.  Equal Employment Opportunity
          C.  Vital Records and Intercensal Estimates 
          D.  Issues for Further Research (Under Development)

V.   Comparing Data Under the Old and the New Standards (Full Report at Appendix D)
          A.  Introduction
          B.  Methods for Bridging
          C.  Methods of Evaluation
          D.  Examination of the Results with Respect to the Evaluation Criteria
            
Appendix A.  Standards for Maintaining, Collecting, and Presenting Federal Data on Race and     
                       Ethnicity

Appendix B.  Procedural Implementation of the New Standards for Data on Race and Ethnicity -- 
                      Phase I Report

Appendix C.  Census 2000 Dress Rehearsal Prototype Redistricting Data

Appendix D.  Bridge Report:  Tabulation Options for Trend Analysis


DRAFT PROVISIONAL GUIDANCE ON THE IMPLEMENTATION OF 
    THE  1997 STANDARDS FOR FEDERAL DATA ON RACE AND ETHNICITY

                           Prepared by

                    Tabulation Working Group 
      Interagency Committee for the Review of Standards for
                    Data on Race and Ethnicity
                                 
                                 
The guidance presented in this report has been developed to complement the Federal
Government's decision in October 1997 to provide an opportunity for individuals to select one or
more races when responding to agency requests for data on race and ethnicity.  To foster
comparability across data collections carried out by various agencies, it is useful for those
agencies to report responses of more than one race using some standardized tabulations or
formats.  

The report briefly explains why the tabulation guidelines are needed, reviews the general
guidance issued when the new standards were adopted in October 1997, and provides
information on the criteria used in developing the guidelines.  This report also addresses a larger
set of implementation questions that have emerged during the working group's deliberations. 
Thus, the report considers:

     Collecting data on race and ethnicity using the new standards, including aggregate data
     reporting,

     Tabulating Census 2000 data and data on race and ethnicity collected in surveys and from
     administrative records,

     Using data on race and ethnicity in applications such as legislative redistricting and equal
     employment opportunity monitoring, and

     Comparing data under the old and the new standards when conducting analyses.

In addition, the appendices to the draft report contain the full text of the reports on the research
that has been conducted in two areas:  best procedural practices for implementing the new
standards, and approaches for bridging between data collected under the old standards and data
collected under the new standards.   

The guidelines are necessarily provisional pending the availability of data from Census 2000 and
other data systems as the new standards are implemented.  They are likely to be reviewed and
refined as Federal agencies and others gain experience with data collected under the new
standards.  In addition, in some portions of this report, guidelines have not yet been determined. 
Instead, options are presented and guidelines in these areas will be issued at a later date. 
  
OMB expects to issue this provisional guidance by the end of April 1999, following a period of
public discussion of this draft by interested users.  As noted in the Table of Contents and the
report, a few sections are still "under development" and will be available for review at a later
time.


I.   BACKGROUND 

This part of the report discusses why guidance is needed for tabulating data collected using the
1997 standards, reiterates the general guidance issued in October 1997, provides clarification of
several aspects of the new standards, and presents the criteria that were developed for evaluating
bridging methods and presenting data.

A.  The Need for Tabulation Guidelines and Alternative Approaches

On October 30, 1997, the Office of Management and Budget (OMB) published "Standards for
Maintaining, Collecting, and Presenting Federal Data on Race and Ethnicity" (Federal Register,
62 FR 58781 - 58790), which are reprinted in Appendix A.  The new standards reflect a change
in data collection policy, making it possible for Federal agencies to collect information that
reflects the increasing diversity of our Nation's population stemming from growth in interracial
marriages and immigration.  Under the new policy, agencies are now required to offer
respondents the option of selecting one or more of the following five racial categories included in
the updated standards:

--   American Indian or Alaska Native.  A person having origins in any of the original
     peoples of North and South America (including Central America), and who maintains
     tribal affiliation or community attachment.

--   Asian.  A person having origins in any of the original peoples of the Far East, Southeast
     Asia, or the Indian subcontinent including, for example, Cambodia, China, India, Japan,
     Korea, Malaysia, Pakistan, the Philippine Islands, Thailand, and Vietnam.

--   Black or African American.  A person having origins in any of the black racial groups
     of Africa.  Terms such as "Haitian" or "Negro" can be used in addition to "Black or
     African American." 

--   Native Hawaiian or Other Pacific Islander.  A person having origins in any of the
     original peoples of Hawaii, Guam, Samoa, or other Pacific Islands.

--   White.  A person having origins in any of the original peoples of Europe, the Middle
     East, or North Africa.

These five categories are the minimum set for data on race for Federal statistics, program
administrative reporting, and civil rights compliance reporting.  

With respect to ethnicity, the standards provide for the collection of data on whether or not a
person is of "Hispanic or Latino" culture or origin.  (The standards do not permit a multiple
response that would indicate an ethnic heritage that is both Hispanic or Latino and non-Hispanic
or Latino.)  This category is defined as follows:

--   Hispanic or Latino.  A person of Cuban, Mexican, Puerto Rican, South or Central
     American, or other Spanish culture or origin, regardless of race.  The term, "Spanish
     origin," can be used in addition to "Hispanic or Latino."

As a result of the change in policy for collecting data on race, the reporting categories used to
present these data must similarly reflect this change.  In keeping with the spirit of the new
standards, agencies cannot collect multiple responses and then report and publish data using only
the five single race categories.  Agencies are expected to provide as much detail as possible on
the multiple race responses, consistent with agency confidentiality and data quality procedures. 
As provided by the standards, OMB will consider any agency variances to this policy on a
case-by-case basis.  

Based on research to date, it is estimated that less than two percent of the Nation's total
population is likely to identify with more than one race.  This percentage may increase as those
who identify with more than one racial heritage become aware of the opportunity to report more
than one race.  In the early years of the standards' implementation, there will be issues of data
quality and confidentiality related to sample size that may restrict the amount of data that can be
published for some combinations of multiple race responses.  Over time, however, the size of
these data cells may increase.  It should be noted that such data quality and confidentiality
problems for small population groups also existed under the old standards, where sample sizes
prevented presentation of data on certain population groups such as American Indians.  The
possible multiple race combinations under the new standards, some with small data cells, serve
to make such data quality concerns more apparent.  Some balance will need to be struck between
having a tabulation showing the full distribution of all possible combinations of multiple race
responses and presenting only the minimum -- that is, a single aggregate of people who reported
more than one race.
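
As a rough illustration of why a full distribution can contain many small cells, the sketch below
counts every selection a respondent could make from the five minimum race categories.  It is an
illustrative count only and is not part of the guidance; the function name possible_responses is
hypothetical.

```python
from itertools import combinations

# The five minimum race categories named in the 1997 standards.
RACES = [
    "American Indian or Alaska Native",
    "Asian",
    "Black or African American",
    "Native Hawaiian or Other Pacific Islander",
    "White",
]


def possible_responses(categories):
    """Return every non-empty selection of categories as a tuple."""
    return [
        combo
        for size in range(1, len(categories) + 1)
        for combo in combinations(categories, size)
    ]


responses = possible_responses(RACES)
single = [r for r in responses if len(r) == 1]
multiple = [r for r in responses if len(r) > 1]

# Census 2000 adds a sixth category, "Some Other Race", under an OMB exception.
census_responses = possible_responses(RACES + ["Some Other Race"])

print(len(responses), len(single), len(multiple))  # 31 5 26
print(len(census_responses))  # 63
```

With five categories there are 31 possible responses, 26 of which involve more than one race; a
sixth category raises the total to 63.  Many of these cells would be expected to be very small in
a sample survey.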

B.  General Guidelines for Tabulating Data on Race

In response to concerns that had been raised about how Federal agencies would tabulate multiple
race responses, OMB in the October 30, 1997, Federal Register notice issued the following
general guidance:  

     Consistent with criteria for confidentiality and data quality, the tabulation procedures
     used by the agencies should result in the production of as much detailed information on
     race and ethnicity as possible.   

     Guidelines for tabulation ultimately must meet the needs of at least two groups within the
     Federal Government, with the overriding objective of providing the most accurate and
     informative body of data.  

     (1)  The first group is composed of those Federal Government officials charged with
          carrying out constitutional and legislative mandates, such as redistricting
          legislatures, enforcing civil rights laws, and monitoring progress in anti-
          discrimination programs.  (The legislative redistricting file produced by the
          Bureau of the Census, also known as the Public Law 94-171 file, is an example of
          a file meeting such legislative needs.)  

     (2)  The second group consists of the staff of Federal statistical agencies producing
          and analyzing data that are used to monitor economic and social conditions and
          trends.

     Many of the needs of the first group can be met with an initial tabulation that provides,
     consistent with standards for data quality and confidentiality, the full detail of racial
     reporting; that is, the number of people reporting in each single race category and the
     number reporting in each of the possible combinations of races, which would add to the
     total population.  

     Depending on the judgment of users, the combinations of multiple responses could be
     collapsed.  

     (1)  One method would be to provide separate totals for those reporting in the most
          common multiple race combinations and to collapse the data for other less
          frequently reported combinations.  The specifics of the collapsed distributions
          would be dependent on the results of particular data collections.  

     (2)  A second method would be to report the total selecting each particular race,
          whether alone or in combination with other races.  These totals would represent
          upper bounds on the size of the populations who identified with each of the racial
          categories.  In some cases, this latter method could be used for comparing data
          collected under the old standards with data collected under the new standards.  

     It is important that Federal agencies with the same or closely related responsibilities
     adopt the same tabulation method.  

     Regardless of the method chosen for collapsing multiple race responses, Federal agencies
     must make available the total number reporting more than one race, if confidentiality and
     data quality requirements can be met, in order to ensure that any changes in response
     patterns resulting from the new standards can be monitored over time.
  
     Different tabulation procedures might be required to meet various needs of Federal
     agencies for data on race.  Nevertheless, Federal agencies often need to compare racial
     and ethnic data.  Hence, some standardization of tabulation categories for reporting data
     on race is desirable to facilitate such comparisons.  
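
The tabulations described in the general guidance above can be sketched in a few lines of code. 
The example below is a minimal illustration and not an agency procedure:  the responses are
invented, and the function names (full_detail, collapse_rare, alone_or_in_combination,
more_than_one_race) and the "Other combinations" label are hypothetical.

```python
from collections import Counter

# Hypothetical responses, invented for illustration only; each entry is the
# set of races that one respondent selected.
responses = [
    frozenset({"White"}),
    frozenset({"White"}),
    frozenset({"White"}),
    frozenset({"Black or African American"}),
    frozenset({"Black or African American"}),
    frozenset({"Asian"}),
    frozenset({"White", "Asian"}),
    frozenset({"White", "Asian"}),
    frozenset({"White", "Black or African American"}),
    frozenset({"American Indian or Alaska Native", "White", "Asian"}),
]


def full_detail(responses):
    """Count each single race and each combination; counts add to the total."""
    return Counter(responses)


def collapse_rare(detail, keep_top):
    """Method 1: keep the most common combinations and pool the others."""
    multi = Counter({k: v for k, v in detail.items() if len(k) > 1})
    kept = dict(multi.most_common(keep_top))
    table = {k: v for k, v in detail.items() if len(k) == 1}
    table.update(kept)
    other = sum(multi.values()) - sum(kept.values())
    if other:
        table["Other combinations"] = other
    return table


def alone_or_in_combination(detail):
    """Method 2: count everyone who selected a race, alone or with others."""
    totals = Counter()
    for combo, count in detail.items():
        for race in combo:
            totals[race] += count
    return totals


def more_than_one_race(detail):
    """Total reporting more than one race, to be published in every case."""
    return sum(v for k, v in detail.items() if len(k) > 1)


detail = full_detail(responses)
collapsed = collapse_rare(detail, keep_top=1)
totals = alone_or_in_combination(detail)

print(sum(detail.values()))        # 10
print(sum(collapsed.values()))     # 10
print(sum(totals.values()))        # 15
print(more_than_one_race(detail))  # 4
```

The full-detail and collapsed tables both add to the 10 respondents.  The "alone or in
combination" totals add to 15 because each multiple race respondent is counted once for every race
selected, which is why those totals are upper bounds rather than a distribution.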

The October 30, 1997, Federal Register Notice identified four areas where further research was
needed in how to tabulate data under the new standards:  

     (1)  How should the data be used to evaluate conformance with program objectives in
          the area of equal employment opportunity and other anti-discrimination
          programs? 

     (2)  How should the decennial census data for many small population groups with
          multiple racial heritages be used to develop sample designs and survey controls
          for major demographic surveys?  

     (3)  How do we introduce the use of the new standards in the vital statistics program
          which obtains the number of births or deaths from administrative records, but uses
          intercensal population estimates in determining the rates of births and deaths? 

     (4)  And more generally, how can we conduct meaningful comparisons of data
          collected under the previous standards with those that will be collected under the
          new standards?

In order to address these and other issues and to ensure that tabulation methodologies would be
carefully developed and coordinated among the Federal agencies, OMB assembled a group of
statistical and policy analysts drawn from the Federal agencies that generate or use these data. 
Over the past year, this group has considered tabulation issues and developed the draft
provisional guidance that is presented in this report for use by Federal agencies.  The work of this
group has included:  (1) a review of Federal data needs and uses to ensure that the tabulation
guidelines produce data that meet statutory and program requirements; (2) cognitive testing of
the wording of questions; (3) development of a form for reporting aggregate data; (4) evaluation
of different methods of bridging from the new to the old standards; and (5) development of
guidelines for presenting data on multiple race responses that meet accepted data quality and
confidentiality standards.

The tabulation guidance in this report is necessarily provisional pending the availability of
Census 2000 data and other data systems as the new collection standards are implemented. 
These guidelines will be reviewed and modified as the agencies and other data users gain
experience with data collected using the new standards.

C.  Points of Clarification Regarding the 1997 Standards

A few questions about the new standards have emerged over the past year.  This section
elaborates on several points in the standards that have been a source of confusion for some users.  

Under the new standards, "Hispanic or Latino" is clearly designated as an ethnicity and not as a
race.  Whether or not an individual is Hispanic, every effort should be made to ascertain the race
or races with which an individual identifies.  

The two-question format, with the ethnicity question preceding the race question, should be used
when information is collected through self-identification.  Although the standards permit the use
of a combined question when collecting data by observer identification, the use of the two-
question format is strongly encouraged even where observer identification is used. 
Regardless of the question format, observers are expected to attempt to identify the individual's
race(s).

The standards require that at a minimum the total number of persons identifying with more than
one race be reported.  It is stressed that this is a minimum; agencies are strongly encouraged to
report detailed information on specific racial combinations subject to constraints of data
reliability and confidentiality standards.  

The following wording concerning the reporting of data when the combined question is used is
clarified in the paragraph below:

     "In cases where data on multiple responses are collapsed, the total number of respondents
     reporting 'Hispanic or Latino and one or more races' and the total number of respondents
     reporting 'more than one race' (regardless of ethnicity) shall be provided."  (Section 2b of
     the standards)

Race by ethnicity always should be reported when confidentiality permits.  If not, the first level
of collapsing should be ethnicity by the single races and ethnicity for those reporting more than
one race.  Thus, an Hispanic or Latino respondent reporting one race should be reported both as
Hispanic or Latino and as a member of that single race.  If the respondent selects more than one
race, he or she should be reported in the particular racial combination as well as in the Hispanic
or Latino category.  Reporting a composite -- that is, the number of people who responded
"Hispanic or Latino" and more than one race -- is a minimum that only should be used if more
detailed reporting would violate data reliability and confidentiality standards.
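
The reporting levels described above can be illustrated with a short sketch.  It is an assumption-
laden example rather than a prescribed procedure:  the records are invented, and the function names
(ethnicity_label, race_by_ethnicity, first_level_collapse) are hypothetical.

```python
from collections import Counter

# Hypothetical records, invented for illustration only:
# (reported Hispanic or Latino origin, races selected).
records = [
    (True, ("White",)),
    (True, ("White",)),
    (True, ("Black or African American", "White")),
    (False, ("White",)),
    (False, ("Asian",)),
    (False, ("Asian", "White")),
    (False, ("Asian", "White")),
]


def ethnicity_label(is_hispanic):
    """Map the ethnicity response to one of the two ethnicity categories."""
    return "Hispanic or Latino" if is_hispanic else "Not Hispanic or Latino"


def race_by_ethnicity(records):
    """Full detail: ethnicity crossed with each single race and combination."""
    return Counter(
        (ethnicity_label(h), " and ".join(sorted(races))) for h, races in records
    )


def first_level_collapse(records):
    """Ethnicity by single race, with all multiple race reports pooled."""
    return Counter(
        (ethnicity_label(h), races[0] if len(races) == 1 else "More than one race")
        for h, races in records
    )


full = race_by_ethnicity(records)
collapsed = first_level_collapse(records)

for cell, count in sorted(collapsed.items()):
    print(cell, count)
```

In both tables every respondent appears in exactly one cell, so a Hispanic or Latino respondent
is counted both by ethnicity and by race or racial combination.  Only the multiple race detail is
lost at the first level of collapsing.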

The rules discussed in Section 4 of the new standards concerning the presentation of data on race
and ethnicity under special circumstances are not to be invoked unilaterally by an agency.  If
the agency believes the standard categories are inappropriate, the agency must request a specific
variance from OMB.  

The new standards do not include an "other race" category.  For the sole purpose of the Census
2000 data collection, OMB has granted an exception to the Census Bureau to use a category
called "Some Other Race."

D.  Criteria Used in Developing the Tabulation Guidelines

The interagency expert group on tabulations generated criteria that could be used both to evaluate
the technical merits of different bridging procedures (See Part V and Appendix D) and to display
data under the new standards.  The relative importance of each criterion will depend on the
purpose for which the data are intended to be used.  For example, in the case of bridging to the
past, the most important criterion is "measuring change over time," while "congruence with
respect to respondent's choice" will be more critical for presenting data under the new standards.  

The criteria set forth below are designed only to assess the technical adequacy of the various
statistical procedures.  The first two criteria listed below are central to consideration of bridging
methods.  The next six criteria apply both to bridging and long-term tabulation decisions.  The
last criterion is of primary importance for future tabulations of data collected under the new
standards. 

Bridging:

     Measure change over time.  This is the most important criterion for bridging, because
     the major purpose of any historical bridge will be to measure true change over time as
     distinct from methodologically induced change.  The ideal bridging method, under this
     criterion, would be one that matches how the respondent would have responded under the
     old standards had that been possible.  In this ideal situation, differences between the new
     distribution and the old distribution would reflect true change in the distribution itself.

     Minimize disruptions to the single race distribution.  This criterion applies only to
     methods for bridging.  Its purpose is to consider how different the resulting bridge
     distribution is from the single-race distribution for detailed race under the new standards. 
     To the extent that a bridging method can meet the other criteria and still not differ
     substantially from the single-race proportion in the ongoing distribution, it will facilitate
     looking both forward and backward in time.  

Bridging and future tabulations:  

     Range of applicability.  Because the purpose of the guidelines is to foster consistency
     across agencies in tabulating racial and ethnic data, tabulation procedures that can be used
     in a wide range of programs and varied contexts are usually preferable to those that have
     more limited applicability. 

     Meet confidentiality and reliability standards.  It is essential that the tabulations
     maintain the confidentiality standards of the statistical organization while producing
     reliable estimates.  

     Statistically defensible.   Because tabulations may be published by statistical agencies
     and/or provided in public use data, the recommended tabulation procedures should follow
     recognized statistical practices.  

     Ease of use.  Because the tabulation procedures are likely to be used in a wide variety of
     situations by many different people, it is important that they can be implemented with a
     minimum of operational difficulty.  Thus, the tabulation procedures must be capable of
     being easily replicated by others.  

     Skill required.  Similarly, it is important that the tabulation procedures can be
     implemented by individuals with relatively little statistical knowledge.  

     Understandability and communicability.  Again, because the tabulation procedures
     will likely be used, as well as presented, in a wide variety of situations by many different
     people, it is important that they be easily explainable to the public.

Future tabulations:

     Congruence with respondent's choice.  Because of changes in the categories and the
     respondent instructions accompanying the question on race (allowing more than one
     category to be selected), the underlying logic of the tabulation procedures must reflect to
     the greatest extent possible the full detail of race reporting.


II.  COLLECTING DATA ON RACE AND ETHNICITY USING THE NEW
     STANDARDS

This part of the report currently provides a summary of the Phase I Report on Procedural
Implementation of the New Standards for Data on Race and Ethnicity, which is contained in
Appendix B.

A.  Developing Procedures for Data Collection

An interagency committee has been continuing past research efforts to develop procedures to
collect and aggregate data on race and ethnicity.  This research is designed to produce guidelines
that address three areas:  (1) wording and format of questions that ask for self-reported data on
race and Hispanic or Latino origin; (2) wording and format of instructions and forms that collect
aggregate data on race and Hispanic or Latino origin; and (3) instructions and training procedures
for field interviewers and administrative personnel who will be using these questions and forms. 
Guidelines will be continually reviewed and modified as implementation of the new standards
occurs, feedback from agencies is received, and new research findings become available.   

Members of the procedures committee represent the Departments of Health and Human Services,
Commerce, Education, Labor, and Veterans Affairs, and the General Accounting Office.  This
summary briefly describes the Phase I research, offers initial guidelines for agencies developing
new data collection procedures, and includes a schedule for the completion of work by this
committee.  The full report of the committee includes the research design and methods, results of
Phase I, examples of test questions and forms, and a broader discussion of guidelines and
problems identified. 

Developing and Testing Self-Reported Race and Ethnicity Questions 

A goal of this research is to provide guidance on the wording and format of questions for self-
reporting race and Hispanic or Latino origin depending on the mode of administration. 
Questions administered by telephone or in a face-to-face personal interview have been tested in
cognitive laboratory interviews; self-administered questions are not included in this testing
because the Census Bureau previously conducted such research in preparation for Census 2000. 
To date, 32 cognitive interviews have been completed; another 18 are planned for Phase I and at
least 25 more for Phase II.  

Among the 32 subjects interviewed, 13 reported their race as Black, 3 reported Asian, 2 reported
Native Hawaiian, 4 reported more than one race, and 10 reported White, of which 2 also reported
Hispanic or Latino origin.  No American Indians or Alaska Natives have been interviewed yet in
Phase I.  Subjects were first asked routine demographic questions as well as the test Hispanic or
Latino origin and race questions for themselves and members of their household.  Then,
debriefings were conducted to learn more about the subjects' understanding of the questions and
terms used.  
 
Generally, subjects were able to answer the race and Hispanic or Latino origin questions without
difficulty.  In the cognitive interviews, subjects shared an understanding of the intent of a race
or Hispanic origin question, but individual differences were found in the interpretation and
meaning of the terms used, as was confusion regarding the separation of Hispanic or Latino origin
from race. 

As expected, subjects who were interviewed face-to-face seemed to use and rely on the
flashcards to select a response.  Subjects interviewed by telephone had a bit more difficulty
answering the race questions since they had to listen to a relatively long list of response options.
Also, there was some evidence that the instruction to "...select one or more..." was misunderstood
on the telephone to mean that the subject had to select more than one race.  Section 1 in
Appendix B describes in detail the results of testing the questions on race and ethnicity. 

Based on these interviews, the following initial guidelines for the design of questions on race and
ethnicity are offered:

     Communicate clearly an instruction that allows, but does not require, multiple responses
     to the race question.  

     Consider using an instruction to answer both the Hispanic or Latino origin question and
     the race question.   

     For data collection efforts requiring detailed Hispanic or Latino origin or detailed race
     information, consider options to collect further information through write-in entries or
     follow-up questions asked by the interviewer. 
 
     Take mode of administration carefully into account when designing questions and
     instructions.

     Provide definitions for the minimum race categories when possible. 

     Adhere to the specific terminology as stated in the October 30, 1997, standards.  

Developing and Testing Aggregate Reporting Forms

Implementing the revised standards will cause fundamental changes to the ways in which data on
race and Hispanic or Latino origin have previously been aggregated and reported.  Therefore, a
second goal of this research is to provide guidance on the design of reporting forms that will be
used by administrative personnel to aggregate data on race and Hispanic or Latino origin for a
given population (e.g., reporting race and ethnicity for a school population). 

Twenty cognitive interviews are planned for this phase of the research.  Three different forms are
being tested with subjects who are familiar with reporting aggregate data for a given population,
but not necessarily familiar with the revised standards.  Fourteen interviews have been completed
thus far, 7 in cognitive laboratories and 7 on-site.  Of the 14 respondents interviewed, 5 worked
for the Federal Government, 6 worked in private industry, 2 worked in local correctional
facilities, and 1 worked in a school.  

For the laboratory testing, subjects were given 'dummy' records of applications that contained
multiple race responses as well as combined Hispanic or Latino origin and race questions.  For
the on-site interviews, subjects referred to agency data. 

None of the forms tested were completed accurately without interviewer intervention. 
Regardless of the form tested or whether the testing was conducted in a laboratory or on-site, the
most common problem was the requirement to count and report race for individuals who are of
Hispanic or Latino origin.  As an illustration, one subject stated, "It's (the form) basically asking
how Hispanics were separated into groups of races.  I think the part that confuses me is that our
Hispanics do not view themselves as another race.  And so that is kind of what threw me off -- it's
asking for Hispanics who had marked 'White,' but they don't.  They would have checked
Hispanic."  Discussions with subjects revealed that all but one worked for agencies that have
used the single question -- combined race and ethnicity format -- to collect data.  Several
methodological problems also emerged and will be corrected prior to further testing.  They are
discussed in detail in Appendix B, Section 2. 

Even though there were many problems found in developing and testing aggregate forms, some
initial guidelines can be put forth at this time. 

     If possible, allow for the reporting of every combination of multiple race responses.  
 
     Provide definitions that assist in understanding the concepts of single race reports and 
     multiple race reports as well as the distinction between ethnicity and race.   

     Explain how the missing data should be reported.   

     Professionally design the form and include clear instructions. 

Development of Field Instructions and Training Procedures

Work to develop interviewer instructions and interviewer training procedures will begin in the
Spring of 1999.  Plans include developing and testing different training modules and interviewer
instructions, depending on the mode of administration and the type of data collection.  This work
will, in all likelihood, not address new issues or problems.  However, since the new standards do
encompass several distinct changes, it seems timely to address in a more systematic way some
longstanding issues in the fielding of the questions, and ways that interviewers can be trained to
improve data quality.  Specific procedures on how to ask the questions and, in some cases, how
to instruct the respondent to use the flashcard, will be developed along with suggested
interviewer probes, definitions, and statements that can be used to answer respondent questions.

Schedule

Phase I was ongoing through 1998 and will be completed at the beginning of April 1999.  Phase
II will begin in April 1999 and will be completed by the end of July 1999.  A final report
encompassing both phases should be available by the end of September 1999.  

B.  Best Practices in Survey Design and Data Processing

(Under development)

III. TABULATING DATA ON RACE AND ETHNICITY COLLECTED USING THE
     NEW STANDARDS

This part of the report describes options for tabulating data on race and ethnicity collected under
the new standards to meet various Federal needs for these data.  

A.  Decennial Census

The Census 2000 questionnaire will provide individuals the opportunity to self-report their racial
identity by selecting one or more races.  For purposes of Census 2000 only, in an effort to
encourage response to this question, OMB has approved the use of a sixth category -- "Some
Other Race" -- in addition to the minimum five categories.  

This discussion covers preliminary tabulation plans for the six categories of race and the two
categories of ethnicity ("Hispanic or Latino" and "Not Hispanic or Latino") and for possible
combinations of these racial and ethnic categories.  It does not address tabulation plans for
detailed groups of American Indian and Alaska Native, Asian, or Native Hawaiian and Other
Pacific Islander populations for which information will be collected in Census 2000.

For data from the Census 2000 Dress Rehearsal sites, table shells will be available on the Internet
through the Census Bureau's American FactFinder.  The data user will be able to use the inquiry
system in the American FactFinder to obtain table shells filled with data for user-selected
geographic areas and for population universes defined by race and ethnicity down to the census
tract level.  The amount of data on population characteristics available in table shells will be
roughly the same as in printed reports in 1990 for counties and for places of 10,000 or more
population.  

Protection of Confidentiality in Data from Census 2000

To maintain confidentiality as required by law (Title 13, United States Code), the Census Bureau
uses a confidentiality edit to ensure that published data do not disclose information about specific
individuals, households, and housing units.  The result is that a small amount of uncertainty is
introduced into some of the census data to prevent identification of specific individuals,
households, or housing units.

As with data from the 1990 census, a confidentiality edit will be implemented for data from
Census 2000 by selecting a sample of census households from internal census files and
interchanging their data with data from other households that have identical numbers of
household members, but that are in different locations within the same state.  The net result of
this procedure is that the data user's ability to obtain census data is increased, particularly for
small geographic areas and small population groups.  
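
The interchange step described above can be pictured with a small sketch.  It is an illustration
only and is not the Census Bureau's procedure:  the record layout (state, location, size, data),
the function name confidentiality_swap, the sampling rate, and the rule for choosing a partner
household are all assumptions made for this example.

```python
import random


def confidentiality_swap(households, sample_rate, seed=0):
    """Interchange data between sampled households and partner households.

    A partner must have the same number of household members and be in a
    different location within the same state.  Each household is a dict
    with the (assumed) keys "state", "location", "size", and "data".
    Returns a new list; the input records are not modified.
    """
    rng = random.Random(seed)
    result = [dict(h) for h in households]
    swapped = set()
    for i, household in enumerate(result):
        # Skip households already interchanged or not drawn into the sample.
        if i in swapped or rng.random() >= sample_rate:
            continue
        partners = [
            j for j, other in enumerate(result)
            if j != i and j not in swapped
            and other["state"] == household["state"]
            and other["size"] == household["size"]
            and other["location"] != household["location"]
        ]
        if not partners:
            continue
        j = rng.choice(partners)
        result[i]["data"], result[j]["data"] = result[j]["data"], result[i]["data"]
        swapped.update((i, j))
    return result


# Hypothetical records, not census data.
households = [
    {"state": "A", "location": "tract 1", "size": 2, "data": "h0"},
    {"state": "A", "location": "tract 2", "size": 2, "data": "h1"},
    {"state": "A", "location": "tract 1", "size": 3, "data": "h2"},
    {"state": "A", "location": "tract 2", "size": 3, "data": "h3"},
]
edited = confidentiality_swap(households, sample_rate=1.0)
print([h["data"] for h in edited])  # ['h1', 'h0', 'h3', 'h2']
```

Because only the data are interchanged, the number of households of each size in each location
is unchanged in this sketch.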


Approach for Tabulations by Race and Ethnicity for Census 2000

The proposed approach reflects OMB's preliminary guidelines (See Part I, Section B) on
tabulations by race and ethnicity.  The discussion of the approach includes data on both
population totals for racial and ethnic categories and on population characteristics (e.g., age and
sex) for racial and ethnic categories. 
     
Before describing preliminary plans for tabulations by race and ethnicity, it is helpful to describe
both the maximum number of racial and/or ethnic categories for which data could be provided
and some of the other racial and/or ethnic categories for which data could be provided.  

There are 63 potential single and multiple race categories, including 6 categories for those who
marked exactly one race and 57 categories for those who marked two or more races.  These 57
categories of two or more races include the 15 possible combinations of two races (for example,
Asian and White), the 20 possible combinations of three races, the 15 possible combinations of
four races, the 6 possible combinations of five races, and the 1 possible combination of all six
races.

There are two ethnic categories (Hispanic or Latino, and Not Hispanic or Latino).  Thus there are
126 categories (63 x 2) in which the population could be classified by both race and ethnicity. 
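
The category counts given above can be checked by enumeration.  The following is a minimal
sketch rather than part of the tabulation plans; the short labels in RACES and the variable names
are illustrative assumptions.

```python
from itertools import combinations

# Six race categories used for Census 2000 (the five minimum categories
# plus "Some other race").  The short labels are illustrative only.
RACES = ["White", "Black", "AIAN", "Asian", "NHOPI", "Some other race"]
ETHNICITIES = ["Hispanic or Latino", "Not Hispanic or Latino"]

# Number of categories formed by marking exactly k of the six races.
counts_by_size = {k: len(list(combinations(RACES, k))) for k in range(1, 7)}

single_race = counts_by_size[1]
two_or_more = sum(n for k, n in counts_by_size.items() if k >= 2)
race_categories = single_race + two_or_more
race_by_ethnicity = race_categories * len(ETHNICITIES)

print(counts_by_size)     # {1: 6, 2: 15, 3: 20, 4: 15, 5: 6, 6: 1}
print(two_or_more)        # 57
print(race_categories)    # 63
print(race_by_ethnicity)  # 126
```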

The 63 mutually exclusive and exhaustive categories of race may be collapsed down to 7
mutually exclusive and exhaustive categories by combining the 57 categories of two or more
races.  These 7 categories are: White alone, Black or African American alone, American Indian
and Alaska Native alone, Asian alone, Native Hawaiian and Other Pacific Islander alone, Some
other race alone, and Two or more races. 

Alternative groupings for tabulations by race reflect OMB's preliminary guidelines to show  "the
total selecting each particular race, whether alone or in combination."  "In combination" literally
means "in combination with one or more other races."  In this "all-inclusive" approach,
tabulations would be shown for each of six categories, which will overlap and will add to more
than the total population to the extent that individuals report more than one race.  These six
categories are: White alone or in combination, Black or African American alone or in
combination, American Indian and Alaska Native alone or in combination, Asian alone or in
combination, Native Hawaiian and Other Pacific Islander alone or in combination, and Some
Other Race alone or in combination.

As in the case of the 63 racial categories, both tabulations by race of the 7 mutually exclusive
and exhaustive categories and tabulations by race alone or in combination could be classified by
ethnicity (Hispanic or Latino, and Not Hispanic or Latino). 
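
The difference between the 7 mutually exclusive categories and the 6 overlapping "alone or in
combination" categories can be seen in a small sketch.  The function names and the five
responses are hypothetical and serve only to illustrate the two groupings.

```python
from collections import Counter


def collapse_to_seven(marked):
    """Return the mutually exclusive category for one person's set of races."""
    if not marked:
        raise ValueError("no race reported")
    if len(marked) == 1:
        return next(iter(marked)) + " alone"
    return "Two or more races"


def tabulate(responses):
    """Tabulate responses as exclusive categories and as overlapping ones."""
    exclusive = Counter(collapse_to_seven(marked) for marked in responses)
    inclusive = Counter()
    for marked in responses:
        # A person is counted once under every race he or she marked.
        for race in marked:
            inclusive[race + " alone or in combination"] += 1
    return exclusive, inclusive


# Hypothetical responses: each set holds the races one person marked.
responses = [
    {"White"},
    {"White"},
    {"Asian"},
    {"White", "Asian"},
    {"White", "Black or African American", "American Indian and Alaska Native"},
]
exclusive, inclusive = tabulate(responses)
print(sum(exclusive.values()))  # 5, equal to the number of persons
print(sum(inclusive.values()))  # 8, more than the number of persons
```

The exclusive categories add to the total population, while the all-inclusive categories add to
more than the total because two of the five persons reported more than one race.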

Because of concerns about the usefulness and reliability of data on population characteristics for
small populations, about issues with respect to confidentiality, and about providing data products
so voluminous that most data cell values would be zero, the Census Bureau is planning (as it has
in previous censuses) to present more detail by race and ethnicity for population totals than for
population characteristics.  For example, Census 2000 data products might show a population
total for a specific racial or ethnic group (e.g., 50) in a small geographic area, but not show data
on characteristics such as household relationship, education, income, and tenure for this racial or
ethnic group.

Preliminary plans for tabulations by race and ethnicity for population totals and for population
characteristics are discussed in the following two sections.  The amount of detail shown in
tabulations by race and ethnicity in data products from Census 2000 will vary with the purpose
and size of each product.  Planned tabulations for population totals by race and ethnicity from
four data products are discussed:  the Public Law 94-171 file (which is a 100-percent data
product), the 100-percent demographic profile, the 100-percent summary file, and 100-percent
table shells.  Planned tabulations for population characteristics by race and ethnicity are
discussed together for the 100-percent and sample summary files and the 100-percent and sample
table shells.  (The 100-percent data products are based on data collected on all questionnaires.  In
comparison, sample data products are based on data collected only on long-form questionnaires.)

As noted above, this discussion does not address tabulation plans for detailed groups of American
Indian and Alaska Native, Asian, or Native Hawaiian and Other Pacific Islander populations.  It
may be noted, however, that tabulations for these detailed categories will not be included on the
PL 94-171 file, but will be included in the other Census 2000 data products listed in the
preceding paragraph.    
  
Population Totals:  Preliminary Plans for Data by Race and Ethnicity from Census 2000 

Public Law (PL) 94-171 Redistricting File.   PL 94-171 requires that the Census Bureau work
closely with the "officers or public bodies having initial responsibility for the legislative
apportionment or districting of each state" to determine the specific tabulations needed from the
decennial census.   Tabulations planned for this file are based on meetings and communications
with the Redistricting Task Force of the National Conference of State Legislatures and state-
appointed liaisons of the governors and legislatures.  During this process, senior officials from 
OMB, the Voting Rights Section of the Department of Justice, and the Census Bureau consulted
with the Task Force and state legislative officials.

The PL 94-171 file will include population totals down to the block level.  The racial and ethnic
categories that the Census Bureau plans to include in the matrices (one-dimensional statistical
tables) on the PL 94-171 file are combined into one table outline and presented in Table 1.  (The 
PL 94-171 file also includes data on the population 18 years and over for each of these racial or
ethnic categories.)

From tabulations for the racial and ethnic categories shown in Table 1, it is possible also to
obtain tabulations by subtraction for the Hispanic or Latino population by race (total minus Not
Hispanic or Latino) and for the population in a racial category in combination only (e.g., Asian
alone or in combination minus Asian alone). 
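
The two derivations by subtraction can be written out as follows.  This is a sketch under stated
assumptions:  the counts are invented for one hypothetical geographic area, and the dictionary
layout and function names are not part of the PL 94-171 file.

```python
# Hypothetical published counts for one geographic area (not real data).
total = {"Asian alone": 120, "Asian alone or in combination": 150}
not_hispanic = {"Asian alone": 115, "Asian alone or in combination": 143}


def hispanic_by_race(category):
    """Hispanic or Latino count: total minus Not Hispanic or Latino."""
    return total[category] - not_hispanic[category]


def in_combination_only(race, column):
    """Count reporting the race only together with one or more other races."""
    return column[race + " alone or in combination"] - column[race + " alone"]


print(hispanic_by_race("Asian alone"))      # 5
print(in_combination_only("Asian", total))  # 30
```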

The PL 94-171 file will be available on the Internet and on CD-ROM.  A paper listing of data
from the PL 94-171 file, to be provided to officers or public bodies having initial responsibility
for the legislative apportionment or districting of each state, will include about one-half of the
tabulations shown in Table 1.  The paper listing will not include tabulations for Race alone or in
combination, or for Race not alone or in combination.

100-Percent Demographic Profile.  This profile is designed to provide for geographic areas
down to the census tract level an overview of 100-percent census data on a one-page table that
includes data on all population and housing topics for which data are collected on a 100-percent
basis: sex, age, race, Hispanic or Latino origin, household relationship, and housing occupancy
and tenure.  Given the limited amount of space to show data on each topic, population totals by
race and ethnicity will be limited.  Population totals will be shown for each of the major races
alone, for two or more races, and for each major race alone or in combination (as described
earlier), but will not be shown for the 57 specific categories of two or more races.
     
100-Percent Summary File.  This file, which is the most detailed 100-percent data product
planned, will include some population totals on race and ethnicity down to the block level and
additional population totals on race and ethnicity down only to the census tract level.  The racial
and ethnic categories that the Census Bureau plans to include down to the block level in the 
matrices on the 100-percent summary file are combined into one table outline and presented in
Table 2.

The additional  categories that are included down only to the census tract level in the 100-percent
summary file are the 57 individual categories of two or more races crossed by the two ethnic
categories (Hispanic or Latino, and Not Hispanic or Latino).   These racial and ethnic categories
are combined into one table outline and presented in Table 3.

100-Percent Table Shells.  Table shells represent a new data product for Census 2000.  A table
shell is a one-page table outline with a fixed stub and boxhead (for example, showing population
by age and sex).  Table shells are supported by summary files in the same way that data in
various printed reports in 1990 were supported by summary tape files (STFs). 

Population Characteristics:  Preliminary Plans for Data by Race and Ethnicity from
Census 2000 

100-Percent and Sample Summary Files and Table Shells.  Plans for tabulations of
population characteristics by race and ethnicity from the 100-percent and sample summary files
and from the 100-percent and sample table shells are discussed together here because the Census
Bureau plans to show population characteristics for the same list of racial and ethnic groups in all
of these data products. 

In the case of summary files, population characteristics in the matrices on the files would be
iterated (repeated) for each racial or ethnic category.  This corresponds to the "B" matrices in
summary tape files (STFs) 2 and 4 in 1990 census data products in which the "B" matrices were
iterated for each of a list of racial and ethnic categories. 

In the case of table shells, population characteristics would be available for each of the racial and
ethnic categories for which population characteristics are available on the summary files.  The
user of table shells will be able to select from a list of topics (e.g., age and sex) and then select
the geographic area (e.g., state, county, place) and population universe (i.e., the racial or ethnic
category) to obtain the data desired.  The scope of data available using table shells is limited to
data on summary files (in the same way that data in printed reports in 1990 were limited to data
on summary files).  Table shells will present subsets of more detailed data from the summary
files in user-friendly formats (like tables in printed reports), and will show totals, subtotals, and
derived measures that are not included on the summary files. 

The list of 27 racial and ethnic categories for which the Census Bureau plans to show population
characteristics in aggregated data products (as opposed to what is available from microdata files,
as discussed below) in Census 2000 is presented in Table 4.  From tabulations for the list of 
racial and ethnic categories shown in Table 4, it is possible also to obtain tabulations by
subtraction for the Hispanic or Latino population by race (total minus Not Hispanic or Latino),
for the population in a racial category in combination only (e.g., Asian alone or in combination
minus Asian alone), and for the complement to an all-inclusive group (e.g., total minus Asian
alone or in combination).

Microdata files.  Tabulations on population characteristics by race and ethnicity described
above are limited to what is planned for aggregated data products.  In addition, the Census
Bureau will produce 5-percent public-use microdata files (PUMS), as was done in 1990, which
will permit users to obtain tabulations for any racial or ethnic group for which data were
collected in the census.  (This would include, for example, any of the 57 categories of more than
one race.)  In 1990, in addition to the confidentiality edit described earlier, the PUMS files were
stripped of names and addresses, the order of records was rearranged on the file, and a minimum
population threshold of 100,000 was used.

In addition, and subject to the Census Bureau's strict confidentiality standards, the Census
Bureau plans to make available on the Internet through the American FactFinder, the microdata
files that underlie the 100-percent and sample summary files for Census 2000 so that data users
can create tabulations to their own specifications.  These microdata files are the 100-percent
edited detail file (HEDF) and the sample edited detail file (SEDF).  The full microdata files will
be made available to data users only in the form of PUMS files, as described above.
     
If a data user wants data on population characteristics for a racial or ethnic group for which
characteristics are not available in the summary files or table shells and for a geographic area for
which a PUMS file is not available, it will be possible -- again, subject to strict confidentiality
standards set by the Census Bureau -- to obtain these data in the American FactFinder with a
custom tabulation from the HEDF or the SEDF.  For example, the data user will be able to obtain
population characteristics for one of the 57 categories of more than one race (e.g., White and
Asian).  Because of the strict confidentiality standards, the quantity of data that can be obtained
will depend on several factors, including the geographic area, the size of the population universe
(e.g., the number of individuals who are Asian and White), and the extent of the characteristics
detail (number of data cells in a table showing population characteristics).

Table 1.  Preliminary Racial and Ethnic Detail for Population Totals 
          in the PL 94-171 File Planned for Census 2000
(See text regarding protection of confidentiality of data from Census 2000.
"In combination" means "in combination with one or more other races")

                                                                                      Not Hispanic
Race or ethnicity                                                         Total       or Latino

     Total
One race
     White
     Black or African American
     American Indian and Alaska Native
     Asian
     Native Hawaiian and Other Pacific Islander
     Some other race
Two or more races

Hispanic or Latino                                                                    (X)

White alone or in combination
Not White alone or in combination

Black or African American alone or in combination
Not Black or African American alone or in combination 

American Indian and Alaska Native alone or in combination
Not American Indian and Alaska Native alone or in combination

Asian alone or in combination
Not Asian alone or in combination

Native Hawaiian and Other Pacific Islander alone or in combination
Not Native Hawaiian and Other Pacific Islander alone or in combination
 
Some other race alone or in combination
Not Some other race alone or in combination
____________________________________________________________________________
(X) Not applicable.

Table 2.  Preliminary Racial and Ethnic Detail for Population Totals Down to the 
Block Level in the 100-Percent Summary File Planned for Census 2000
(See text regarding protection of confidentiality of data from Census 2000.  
"In combination" means "in combination with one or more other races")
                                                                                                  
                                                                                      Not Hispanic      Hispanic
Race or ethnicity                                                         Total       or Latino         or Latino

     Total
One race
     White
     Black or African American
     American Indian and Alaska Native
     Asian
     Native Hawaiian and Other Pacific Islander
     Some other race
Two or more races

Hispanic or Latino                                                                    (X)

White alone or in combination
     White alone
     White in combination only
Not White alone or in combination

Black or African American alone or in combination
     Black or African American alone
     Black or African American in combination only
Not Black or African American alone or in combination 

American Indian and Alaska Native alone or in combination
     American Indian and Alaska Native alone 
     American Indian and Alaska Native in combination only     
Not American Indian and Alaska Native alone or in combination

Asian alone or in combination
     Asian alone 
     Asian in combination only
Not Asian alone or in combination

Native Hawaiian and Other Pacific Islander alone or in combination
     Native Hawaiian and Other Pacific Islander alone 
     Native Hawaiian and Other Pacific Islander in combination only
Not Native Hawaiian and Other Pacific Islander alone or in combination
 
Some other race alone or in combination
     Some other race alone
     Some other race in combination only
Not Some other race alone or in combination
______________________________________________________________________________                         
(X) Not applicable.Table 3.  Preliminary Racial and Ethnic Detail for Population Totals Down to the 
Census Tract Level Only in the 100-Percent Summary File Planned for Census 2000
(See text regarding protection of confidentiality of data from Census 2000) 


                                                                                                      Not Hispanic      Hispanic
Race or ethnicity                                                                         Total       or Latino         or Latino

     Two or more races
Two races (15 categories)
     White, and Black or African American
     White, and American Indian and Alaska Native
     White, and Asian
     White, and Native Hawaiian and Other Pacific Islander
     White, and Some other race
     Black or African American, and American Indian and Alaska Native
     Black or African American, and Asian
     Black or African American, and Native Hawaiian and Other Pacific Islander
     Black or African American, and Some other race
     American Indian and Alaska Native, and Asian
     American Indian and Alaska Native, and Native Hawaiian and Other Pacific Islander
     American Indian and Alaska Native, and Some other race
     Asian, and Native Hawaiian and Other Pacific Islander
     Asian, and Some other race
     Native Hawaiian and Other Pacific Islander, and Some other race    

Three races (20 categories)
     White, Black or African American, and American Indian and Alaska Native
     (continues with 19 other categories of three races)
Four races (15 categories)
     White, Black or African American, American Indian and Alaska Native, and Asian      
     (continues with 14 other categories of four races)
Five races (6 categories)
     White, Black or African American, American Indian and Alaska Native, Asian, and
        Native Hawaiian and Other Pacific Islander
     (continues with 5 other categories of five races)
Six races (1 category)
     White, Black or African American, American Indian and Alaska Native, Asian, 
        Native Hawaiian and Other Pacific Islander, and Some other raceTable 4.  Preliminary Racial and Ethnic Detail for Population Characteristics in Summary 
          Files and Table Shells Planned for Census 2000
(See text regarding protection of confidentiality of data from Census 2000. 
"In combination" means "in combination with one or more other races")
       
Race or ethnicity

White alone
Black or African American alone
American Indian and Alaska Native alone
Asian alone
Native Hawaiian and Other Pacific Islander alone
Some other race alone
Two or more races

White alone or in combination
Black or African American alone or in combination
American Indian and Alaska Native alone or in combination
Asian alone or in combination
Native Hawaiian and Other Pacific Islander alone or in combination
Some other race alone or in combination

Hispanic or Latino

White alone, not Hispanic or Latino
Black or African American alone, not Hispanic or Latino
American Indian and Alaska Native alone, not Hispanic or Latino
Asian alone, not Hispanic or Latino
Native Hawaiian and Other Pacific Islander alone, not Hispanic or Latino
Some other race alone, not Hispanic or Latino
Two or more races, not Hispanic or Latino

White alone or in combination, not Hispanic or Latino
Black or African American alone or in combination, not Hispanic or Latino
American Indian and Alaska Native alone or in combination, not Hispanic or Latino
Asian alone or in combination, not Hispanic or Latino
Native Hawaiian and Other Pacific Islander alone or in combination, not Hispanic or Latino
Some other race alone or in combination, not Hispanic or LatinoB.  Other Surveys and Administrative Records 

This section applies to the presentation of data collected under the new standards through surveys
and administrative records.  Although these proposed tabulation guidelines are particularly
applicable in the near term, they also provide a framework that can be expanded in the future as it
becomes possible to present more data on multiple race responses.  In general, data should be
presented in as much detail as possible (thereby satisfying the criterion "congruence with
respondent's choice"), subject to satisfying agency criteria for statistical reliability and
confidentiality (satisfying the criterion "meet confidentiality and reliability standards").  Thus, data
on multiple race responses should be presented in as much detail as possible given sample sizes
and sample designs.  In addition, to the extent possible, Federal agencies should report data using
standardized categories to facilitate comparisons across subject-matter areas and data systems,
thus satisfying the criteria "range of applicability," "statistical defensibility," and
"understandability and communicability."

The decision to revise the policy for the collection of data on race reflects the increasing
complexity of our Nation's demographics.  As a result, the ways that data on race are tabulated
and analyzed also will become more complex.  The proposed guidelines in this section reflect
this complexity.  The tabulation strategies illustrated here have simple structures; hence they
satisfy the criteria "ease of use" and "skill required."  Examples of tabulation strategies are provided
and illustrated using data collected as part of the National Health Interview Survey (NHIS),
conducted by the National Center for Health Statistics, Centers for Disease Control and
Prevention.  Since 1976, the NHIS has allowed respondents to report more than one race, but has
also asked respondents to indicate the single race with which they most closely identified.  The
data on race from this survey have been retabulated for illustrative purposes to be as comparable
as possible to the categories in the 1997 standards.  (Unless otherwise noted, the tables in this
section are based on data combined from three years of NHIS data.  The resulting larger sample
size improves the reliability of the estimates and enables more categories to be shown.  
However, even when combining three years of data on race, counts for some categories cannot be
shown due to small sample sizes.)  

As noted above, agencies are to provide as much detail as possible while adhering to their own
standards for data quality and confidentiality.   Under a typical data quality standard, a table cell
cannot be published if its relative standard error (or other measure of dispersion) is larger than
some value specified by the agency.  In such a situation, the data cell is not published separately,
but the cell value is included in subtotals.
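
A data quality rule of this kind can be sketched as follows.  The 30 percent ceiling on the
relative standard error, the function names, and the two cells are assumptions chosen for
illustration; each agency specifies its own measure and value.

```python
def relative_standard_error(estimate, standard_error):
    """Relative standard error: the standard error divided by the estimate."""
    return standard_error / estimate


def publish_cells(cells, max_rse):
    """Split cells into those published separately and a subtotal.

    cells maps a label to an (estimate, standard_error) pair.  A cell that
    fails the rule is not published separately, but its value still counts
    toward the subtotal.
    """
    published = {}
    subtotal = 0
    for label, (estimate, standard_error) in cells.items():
        subtotal += estimate
        if estimate > 0 and relative_standard_error(estimate, standard_error) <= max_rse:
            published[label] = estimate
    return published, subtotal


# Hypothetical estimates and standard errors.
cells = {"Group A": (1000, 50), "Group B": (40, 20)}
published, subtotal = publish_cells(cells, max_rse=0.30)
print(published)  # {'Group A': 1000}
print(subtotal)   # 1040
```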

Under a confidentiality standard, a cell value must be suppressed (withheld from publication) if
knowledge of the cell value might enable someone to gain knowledge about one of the
respondents contributing data to the cell.  If a cell is suppressed to preserve confidentiality, other
cells must also be suppressed so the cell value cannot be derived by subtraction.  This is called
"complementary suppression."  (The reader may wish to refer to Statistical Policy Working Paper
22:  Report on Statistical Disclosure Limitation Methodology for more information concerning
the definition of sensitive cells and the selection of cells for complementary suppression.)
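
The need for complementary suppression can be illustrated with a single row of cells whose total
is published.  This is a deliberately simplified sketch:  the minimum-count rule for identifying
a sensitive cell, the choice of the smallest remaining cell as the complement, and the function
name are assumptions, and the methods described in Working Paper 22 are more elaborate.

```python
def suppress_row(cells, min_count):
    """Return the labels to withhold from a row whose total is published.

    cells maps a label to its count.  Cells below min_count are treated as
    sensitive (an assumed rule for this sketch).
    """
    suppressed = {label for label, count in cells.items() if count < min_count}
    if len(suppressed) == 1:
        # A single withheld cell could be derived as the published total
        # minus the published cells, so one more cell is withheld.
        remaining = {k: v for k, v in cells.items() if k not in suppressed}
        if remaining:
            suppressed.add(min(remaining, key=remaining.get))
    return suppressed


# Hypothetical counts for one row of a table.
row = {"A": 500, "B": 3, "C": 40, "D": 120}
print(sorted(suppress_row(row, min_count=5)))  # ['B', 'C']
```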

Agencies do not use a common set of standards for evaluating confidentiality and quality issues.  
To illustrate the application of agency standards that affect the cells that can be shown in tables,
only a data quality standard is used here.  A table cell has been arbitrarily classified as failing the
data quality standard if the sample size is smaller than 0.2 percent of the population for all but
Table C.  To illustrate a table that might result from a smaller sample survey, in Table C a table
cell is classified as failing the data quality standard if the sample size is smaller than 2.0 percent
of the population.  These admittedly arbitrary criteria are used to illustrate what might be
published from a large sample survey, and to illustrate the distributions that may result from the
implementation of the new standards.  Note that since the only data being displayed in this report
are population counts, it is possible to show more data cells than would be the case if the table
presented attributes (income, education, health outcomes, etc.) of these groups.  Individual
survey systems will make decisions as to what data can be shown based on the characteristics of
each system and the confidentiality and reliability guidelines established for that data system.

Two types of responses cannot be tabulated into the categories identified in the standard.  The
first is when no information on race was provided.  In this report the heading "Race Not
Reported" is used for this type of response.  This response type can be further subdivided
according to the reason that no information was obtained -- refusal, don't know, and not
ascertained.  The second is when a response was received that does not match any of the standard
racial categories.  Such responses are tabulated using the heading "Other Race."  A third heading,
"Not Tabulated Above" is used to include either single or more than one race categories that are
specified in the standard, but are not large enough to be published separately.  For illustrative
purposes, these three headings are used in the tables in this section.  Not all statistical
publications will use this model.  Strategies for tabulating these kinds of responses will follow
agency policy and the analytic objectives of the report.  

A remaining issue to be addressed by Federal agencies is that the rules used in editing and
imputing respondents' data on race and ethnicity will affect the racial distributions derived from
Federal surveys and administrative records.  As noted elsewhere in this report, rules for editing
and imputation of data on race and ethnicity should be an area of further research and
collaboration for Federal agencies, to ensure that the data reported are as comparable as possible.

Since the objective of this section is to illustrate different tabulation strategies, categories with
frequencies too small to be shown will not be treated the same way in all of the tables.  In some
tables, the category is not shown at all and the cell value is included under "Not Tabulated
Above"; in other tables, the category is retained in order to clarify the structure of the table but
data are replaced by a "Q" to illustrate that they have been withheld from publication for data
quality considerations.  When the data are replaced by "Q," a footnote is used to describe the
reason the data are not shown. 

In all tables in this section, the "More Than One Race" heading includes respondents who
selected more than one of the five basic racial categories in the new standard.  Many data
collection systems obtain information on a more detailed set of responses.   When surveys collect
more detailed information on race than the minimum standard, some persons may indicate that
they identify with more than one of the more detailed groups.  For example, within the Asian
group, respondents might indicate that they are of Chinese and Japanese heritage.  These
respondents would not be included in the "More Than One Race" heading but would be included
in the total for Asians.  If sample size permits, an additional Asian sub-category could be used to
indicate the number of individuals who marked more than one of the detailed Asian categories. 
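
The rule in this paragraph can be sketched in Python.  This is a minimal sketch, not part of the standards: the mapping below is a small hypothetical excerpt of detailed groups, and the function names are invented for illustration.  A respondent falls under "More Than One Race" only when the marked groups span more than one of the basic categories.

```python
# Hypothetical excerpt of a detailed-group to basic-category mapping.
DETAILED_TO_BASIC = {
    "Chinese": "Asian",
    "Japanese": "Asian",
    "Filipino": "Asian",
    "Samoan": "NHOPI",
    "White": "White",
    "Black": "Black",
    "AIAN": "AIAN",
}

def tabulation_heading(marked):
    """Return the basic tabulation heading for one respondent's marked groups."""
    basic = {DETAILED_TO_BASIC[group] for group in marked}
    if not basic:
        return "Race Not Reported"
    if len(basic) > 1:
        return "More Than One Race"
    return next(iter(basic))

def marked_multiple_detailed(marked):
    """True when several detailed groups were marked within one basic category."""
    basic = {DETAILED_TO_BASIC[group] for group in marked}
    return len(basic) == 1 and len(set(marked)) > 1

print(tabulation_heading(["Chinese", "Japanese"]))        # counted in the Asian total
print(marked_multiple_detailed(["Chinese", "Japanese"]))  # candidate for an extra Asian sub-category
print(tabulation_heading(["Chinese", "White"]))           # spans two basic categories
```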

Table A illustrates the fundamental goal of the new standard and provides a detailed set of
categories for tabulating data on race.  Table A displays the five single categories, and also
includes more detail on the Asian subgroups; it also displays a number of  multiple-response
categories.  Based on NHIS data, the most frequently marked race combinations are American
Indian and White, Asian and White, and Black and White.  In other situations, the categories
used to present data would be a function of the overall sample size and the regional
characteristics of the population where the sample is selected.  Whatever detailed categories are
presented, they should support recreating the minimum basic set of racial categories. 

Table B shows a category for each of the five single racial groups in the new standards as well as
a "More Than One Race" heading.  It is an example of a table that can be used when sample sizes
do not permit the presentation of greater detail.  In this table, data are not shown separately for
Native Hawaiians and Other Pacific Islanders, one of the single race categories in the collection
standard, since they comprise less than 0.2 percent of the U.S. population.  However, since this is
the only category that cannot be shown, both the number and the percent for the Native Hawaiian
and Other Pacific Islander group are readily obtained by subtraction.  This is an example of a
data cell that is being suppressed for data quality concerns.  If it were suppressed for
confidentiality concerns, another cell would also have to be suppressed to prevent the cell value
from being obtained by subtraction.
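
The subtraction described above can be sketched as follows, using the counts published in Table B.  The function name is hypothetical, and the residual it prints is only arithmetic on the illustrative table, not a published estimate.  The sketch also shows why confidentiality suppression needs a second withheld cell: once two cells are withheld, only their sum can be recovered.

```python
# Counts published in Table B; None marks the cell shown as "Q".
TABLE_B = {
    "AIAN": 2616,
    "Asian": 9718,
    "Black": 45259,
    "NHOPI": None,
    "Other": 9734,
    "White": 250054,
    "More than one race": 5435,
    "Race Not Reported": 5237,
}
TOTAL = 328317

def recover_suppressed(cells, total):
    """Return {category: count} if exactly one cell is withheld, else None."""
    hidden = [name for name, n in cells.items() if n is None]
    shown = sum(n for n in cells.values() if n is not None)
    if len(hidden) == 1:
        return {hidden[0]: total - shown}
    return None  # with two or more withheld cells only their sum is known

print(recover_suppressed(TABLE_B, TOTAL))  # {'NHOPI': 264}

# Withholding a second cell blocks the subtraction.
print(recover_suppressed(dict(TABLE_B, AIAN=None), TOTAL))  # None
```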

As was the case under the 1977 standard, it will often not be possible to tabulate data using all of
the categories used to collect the information.  Even with three years of data from the NHIS,
Tables A and B could not present data for Native Hawaiians and Other Pacific Islanders because
they total less than 0.2 percent of the population.  If data for one or more of the five minimum
racial categories fail the requirements for data quality or confidentiality, standard agency
products should include them in an aggregation such as "Not Tabulated Above," rather than
combining them with categories that are publishable alone.  For example, if the data for Native
Hawaiians and Other Pacific Islanders cannot be published separately, these data should not be
combined with data in the Asian category (except when such combinations are needed for
comparability with data collected under the old standard).  Instead, the data on Native Hawaiians
and Other Pacific Islanders should be included in the total and either omitted from the detailed
tabulations completely, replaced with a symbol and footnoted as in Tables A and B, or included
in a separate heading for all groups not specifically tabulated (i.e., under the "Not Tabulated
Above" heading).  This last approach is illustrated in Table C.  For this table, only one year's
NHIS data are used, and data are reported only for categories that comprise at least 2 percent of
the population.  This is intended to provide an illustration of what might happen when total
sample sizes are smaller and data from fewer categories can be reliably presented.  Because the
American Indian and Alaska Native, Native Hawaiian and Other Pacific Islander, More Than One
Race, and Race Not Reported respondents each comprise less than 2 percent of the population,
these categories were not listed separately in Table C but were included in both the Total and the
Not Tabulated Above rows.
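
The "Not Tabulated Above" approach can be sketched as follows.  This is an illustration under stated assumptions: the function name is hypothetical, and the sketch applies a 2 percent cutoff to the three-year figures published in Table B, so it does not reproduce Table C, which is described as using one year of data.  The residual row is computed from the total, so withheld categories remain included in it.

```python
# Published rows from Table B as (count, percent).  NHOPI is withheld ("Q"),
# so it is absent here but is still part of the total.
TABLE_B_ROWS = {
    "AIAN": (2616, 0.79),
    "Asian": (9718, 3.26),
    "Black": (45259, 12.32),
    "Other": (9734, 2.22),
    "White": (250054, 78.24),
    "More than one race": (5435, 1.62),
    "Race Not Reported": (5237, 1.45),
}
TOTAL_N = 328317

def roll_up(rows, total_n, threshold_pct):
    """Keep rows at or above the cutoff; put everything else in one residual row."""
    shown = {c: (n, p) for c, (n, p) in rows.items() if p >= threshold_pct}
    nta_n = total_n - sum(n for n, _ in shown.values())
    nta_pct = round(100.0 - sum(p for _, p in shown.values()), 2)
    shown["Not Tabulated Above"] = (nta_n, nta_pct)
    return shown

published = roll_up(TABLE_B_ROWS, TOTAL_N, threshold_pct=2.0)
for category, (n, pct) in published.items():
    print(f"{category:<22}{n:>8}{pct:>8.2f}")
```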

In order to display as much data as possible as well as to reflect the complexity of  reporting on
race, some additional categories may be tabulated and reported along with the basic tabulations. 
These categories may not be mutually exclusive but would combine categories to create useful
analytic distinctions.  For example, a heading could be created for persons reporting that they are
Asian whether as a single race or in combination with any other race(s).  Parallel categories could
be created for any of the five single racial categories.  The resulting counts are called "all
inclusive."  They form distributions for each individual racial group; that is, the sum of the
percent of respondents who mark a particular group alone, the percent who mark that group and
at least one other group, and the percent who did not mark that group is 100 percent.  The all
inclusive distributions may provide  information on population groups that might not have
sufficient size in the sample to be included in basic tabulations.  Table D provides a suggested
tabulation strategy.  Three years of NHIS data are used for this Table, and the 0.2 percent cutoff
is used to determine whether data can be shown.  The all inclusive NHOPI category does not
meet the criteria for inclusion (0.2 percent of the population) and is not shown. 

Note that when the tabulation involves counts or percentages, the analyst can subtract the count
or percentage for each single race from the all inclusive count or percentage to obtain the count
of individuals reporting each race in combination with any other race(s).  For example, the Black
or African American all inclusive count minus the Black or African American single race count
will yield a count for those reporting Black or African American in combination with one or
more other races. This would not be possible if the tabulation included summary statistics (mean,
median, or percent) for attributes such as income, education or health outcomes. 
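
As a check on this subtraction, the following sketch applies it to the counts shown in Table D.  The variable names are illustrative only.  The results match the "and other race(s)" rows of that table.

```python
# Counts from Table D: single race and all inclusive.
SINGLE = {"AIAN": 2616, "Asian": 9718, "Black": 45259, "White": 250054}
ALL_INCLUSIVE = {"AIAN": 5724, "Asian": 10710, "Black": 46731, "White": 254688}

# All inclusive minus single race gives the "in combination" count.
in_combination = {race: ALL_INCLUSIVE[race] - SINGLE[race] for race in SINGLE}
print(in_combination)
# {'AIAN': 3108, 'Asian': 992, 'Black': 1472, 'White': 4634}
```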

Tables A - D describe tabulation alternatives for data on race collected using the new standards. 
The new standards also affect the collection and reporting of data on Hispanic or Latino origin. 
The new standards call for asking a question on Hispanic or Latino origin followed by a question
on race but also allow, under limited circumstances, a single, combined question where
Hispanic or Latino origin is included in a list along with the five standard racial categories.  In
the combined question, respondents are also instructed to "mark one or more."  In either case,
Hispanic origin may be reported alone or in combination with one or more races.  As was the
case for the tabulation of data on race, data on Hispanic or Latino ethnicity can also be presented
for specific subgroups (e.g., Mexican, Cuban, and Puerto Rican) as shown in Table E.  The
tabulation headings used will be a function of the overall sample size and the population
composition where the sample is selected.

Even when separate questions are used to collect data on Hispanic or Latino origin and race,
there are applications where a cross tabulation of the data from these two survey questions is
preferred.  Whether data are collected using the single question or the two question format,
education and health data are frequently reported with racial data for Hispanics or Latinos as a
separate group along with racial data for non-Hispanics or non-Latinos.  Data collected under the
new standards using either format will support the analysis of data on both Hispanics or Latinos
and non-Hispanics or non-Latinos by race (Table F).   For example, Table F shows that among
Hispanics or Latinos, the sample size permits the presentation of data for Blacks, Whites, those
of "other" races, and those selecting more than one race.  Tabulations which incorporate the
Hispanic or Latino subgroup information can be developed by expanding Table F.  Since
respondents are free to select one or more categories in the combined format, data collected from
a survey or administrative reporting where a combined format is used can also be tabulated using
Tables E or F.Table A.   Sample Tabulation -- Detailed Presentation of Data on Race 


Race
N
%


Total
328317
100.00


AIAN
2616
.79


Asian
9718
3.26


Asian Indian
1287
.42


Chinese
2245
.75


Filipino
1965
.63


Japanese
920
.34


Korean
966
.33


Vietnamese
1102
.38


Black
45259
12.32


NHOPI
Q
Q


Other
9734
2.22


White
250054
78.24


More than one race
5435
1.62


AIAN/White
2618
.81


Asian/White
741
.24


Black/White
849
.23


Race Not Reported
5237
1.45


Q =  Does not meet statistical criteria for reliability (< 0.2 percent of population).
AIAN=American Indian and Alaska Native
NHOPI = Native Hawaiian and Other Pacific Islander (for example, Hawaiian, Guamanian, or Samoan)

SOURCE: NCHS/CDC National Health Interview Survey 1993-1995, Unpublished TabulationsTable B.   Sample Tabulation --  Minimum Presentation of Data on Race


Race                             N         %

Total                       328317    100.00
AIAN                          2616       .79
Asian                         9718      3.26
Black                        45259     12.32
NHOPI                            Q         Q
Other                         9734      2.22
White                       250054     78.24
More than one race            5435      1.62
Race Not Reported             5237      1.45

Q = Does not meet statistical criteria for reliability (< 0.2 percent of population).
AIAN = American Indian and Alaska Native
NHOPI = Native Hawaiian and Other Pacific Islander (for example, Hawaiian, Guamanian, or Samoan)

SOURCE: NCHS/CDC National Health Interview Survey 1993-1995, Unpublished Tabulations

Table C.   Sample Tabulation -- Minimum Presentation of Data on Race for a Small Sample


Race                             N         %

Total                       102467    100.00
Asian                         2894      3.32
Black                        13468     12.22
Other                         5127      2.64
White                        76441     77.94
NTA                           4537      3.88

Note: Categories not shown separately do not meet statistical criteria for reliability (< 2 percent of population).
AIAN = American Indian and Alaska Native
NHOPI = Native Hawaiian and Other Pacific Islander (for example, Hawaiian, Guamanian, or Samoan)
NTA = Not Tabulated Above (Includes Race Not Reported, AIAN, NHOPI, and all responses that indicated More
Than One Race)

SOURCE: NCHS/CDC National Health Interview Survey 1993-1995, Unpublished Tabulations

Table D.   Sample Tabulation -- Detailed Presentation of Data on Race and the All Inclusive Distributions


Race                             N         %

Total                       328317    100.00
AIAN                          2616       .79
Asian                         9718      3.26
  Asian Indian                1287       .42
  Chinese                     2245       .75
  Filipino                    1965       .63
  Japanese                     920       .34
  Korean                       966       .33
  Vietnamese                  1102       .38
Black                        45259     12.32
NHOPI                            Q         Q
Other                         9734      2.22
White                       250054     78.24
More than one race            5435      1.62
  AIAN/White                  2618       .81
  Asian/White                  741       .24
  Black/White                  849       .23
Race Not Reported             5237      1.45

AIAN all inclusive            5724      1.74
  AIAN and other race(s)      3108       .95
Asian all inclusive          10710      3.57
  Asian and other race(s)      992       .31
Black all inclusive          46731     12.72
  Black and other race(s)     1472       .40
White all inclusive         254688     79.65
  White and other race(s)     4634      1.41

Q = Does not meet statistical criteria for reliability (< 0.2 percent of population).
AIAN = American Indian and Alaska Native
NHOPI = Native Hawaiian and Other Pacific Islander (for example, Hawaiian, Guamanian, or Samoan)

SOURCE: NCHS/CDC National Health Interview Survey 1993-1995, Unpublished Tabulations

Table E.   Sample Tabulation -- Hispanic or Latino Ethnicity With Detail


Ethnicity                        N         %

Total                       328317    100.00
Hispanic/Latino              41585      9.78
  Cuban                       2151       .54
  Mexican                    26042      5.86
  Puerto Rican                4809      1.25
Not Hispanic/Latino         283735     89.36
Ethnicity not reported        2997       .85

Note: Statistical criteria for reliability (< 0.2 percent of population).

SOURCE: NCHS/CDC National Health Interview Survey 1993-1995, Unpublished Tabulations

Table F.   Sample Tabulation -- Detailed Presentation of Data on Race and Hispanic or Latino Ethnicity


Ethnicity/Race                   N         %

Total                       328317    100.00

Hispanic or Latino           41585      9.78
  AIAN                           Q         Q
  Asian                          Q         Q
  Black                        950       .24
  NHOPI                          Q         Q
  Other                       8348      1.80
  White                      28742      6.88
  More than one race           985       .26
  Race Not Reported           1816       .42

Not Hispanic or Latino      283735     89.36
  AIAN                        2160       .69
  Asian                       9291      3.14
    Asian Indian              1263       .42
    Chinese                   2208       .74
    Filipino                  1828       .60
    Japanese                   903       .33
    Korean                     944       .32
    Vietnamese                1082       .47
  Black                      45259     11.99
  NHOPI                          Q         Q
  Other                       1303       .41
  White                     219923     70.96
  More than one race          4377      1.35
    AIAN/White                2270       .72
    Asian/White                613       .20
    Black/White                677       .19
  Race Not Reported           2444       .74

Ethnicity Not Reported        2997       .85
  White                       1389       .41
  Race Not Reported            977       .29

Q = Does not meet statistical criteria for reliability (< 0.2 percent of population).
AIAN = American Indian and Alaska Native
NHOPI = Native Hawaiian and Other Pacific Islander (for example, Hawaiian, Guamanian, or Samoan)

SOURCE: NCHS/CDC National Health Interview Survey 1993-1995, Unpublished Tabulations

IV.  USING DATA ON RACE AND ETHNICITY COLLECTED UNDER THE NEW
     STANDARDS

This part of the report discusses some important uses of data under the new standards, reflecting
in large measure work that is ongoing.

A.  Redistricting  

One of the first official statutory uses of data on race and ethnicity collected under the new
standards will be for legislative redistricting following Census 2000.  The new data format
should not require substantial changes in the way redistricting will be conducted.

How the 1990 Census Racial and Ethnic Data Were Used 

The 1990 census Public Law 94-171 ("redistricting count") tabulations (which were released to
the states for redistricting purposes) reported data down to the block level for the total population
and the voting age population (ages 18 years and older) for four racial groups (American Indian
and Alaska Native, Asian and Pacific Islander, Black, and White) and a residual category ("other"
race).  Data on these racial groups were also cross-tabulated by Hispanic origin.  Categories were
mutually exclusive (each person was counted only once), and the categories added to the total
population reported for a geographic region.

States and political subdivisions that are covered under Section 5 of the Voting Rights Act are
required to demonstrate, to the United States Attorney General or to a Federal district court in the
District of Columbia, that their redistricting plans will not reduce the voting strength of their
minority citizens and that the plans do not have a racially discriminatory purpose.  All states and
political subdivisions, however, are prohibited by Section 2 of the Voting Rights Act from using
redistricting plans that have the effect of diluting their residents' voting strength on account of
race.  The U.S. Department of Justice or private citizens may file lawsuits to enforce these laws.

In order to comply with those Federal laws, states and their political subdivisions used the
redistricting count tabulations to assess the racial and ethnic compositions and distributions of
their residents as they drew their redistricting plans.  The data were used to identify areas in
which racial and ethnic minorities were residentially segregated, in order, for example, to avoid
splintering those areas among several districts.  The data also were used in some areas to
determine whether voting patterns were racially polarized.  After the redistricting process was
complete, courts would rely on the redistricting count data, together with other evidence, to
decide any legal challenge that was filed against the redistricting plan.

How the 2000 Census Data Can Be Used for Redistricting in 2001 

In Census 2000 the major changes to the reporting of data on race and ethnicity are (1) the
instruction to "mark one or more" racial categories and (2) the splitting of the "Asian or Pacific
Islander" category into two separate categories -- "Asian" and "Native Hawaiian or Other Pacific
Islander."  Hispanic or Latino origin will be ascertained in a separate question, as in the 1990 census.

For the purposes of the 2000 Census Dress Rehearsal, the Census Bureau will provide
tabulations of the number of persons who identified with only one of the five individual racial
categories or with the residual category ("single race" counts), plus tabulations of the total
number of persons who identified with each of the five individual racial categories either alone
(e.g., White only) or in combination with any other categories  (e.g., White plus any other racial
category), referred to as  "all inclusive" counts.  Both the "single race" counts and the "all
inclusive" counts will be cross-tabulated by Hispanic or Latino origin.  It should be noted that the
"all inclusive" counts will add to more than 100 percent of the population since a person's
response will be counted in all of the racial categories selected.  (See Appendix C for more
information on Census 2000 Dress Rehearsal prototype redistricting data.) 
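
The difference between the two kinds of counts can be sketched with a few respondent records.  The records below are hypothetical and the code is only an illustration, not the Census Bureau's tabulation specification.  Each person contributes to one "single race" count at most, but to every category marked in the "all inclusive" counts, which is why the latter add to more than the number of persons.

```python
from collections import Counter

# Hypothetical respondent records: the set of racial categories each person marked.
responses = [
    {"White"},
    {"White"},
    {"Black"},
    {"Asian"},
    {"White", "Black"},
    {"AIAN", "White"},
]

single_race = Counter()
all_inclusive = Counter()
for marked in responses:
    if len(marked) == 1:
        single_race[next(iter(marked))] += 1  # identified with only one category
    for race in marked:
        all_inclusive[race] += 1              # alone or in combination

total = len(responses)
print(dict(single_race))
print(dict(all_inclusive))
print(sum(all_inclusive.values()), "category counts for", total, "persons")
```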

It is not expected that provision of the redistricting count data in the new format will lead to
significant changes in redistricting practices or decisions.  The new data categories will not affect
the total population counts used for the apportionment of Congress, or for compliance with one-
person, one-vote requirements.

Once the Dress Rehearsal data are released and analyzed, there will be more information
available about the practical effects of the new standards.  It can be expected that the more that
the single-count and all-inclusive-count populations share the same residential patterns, the less
likely it will be that jurisdictions' redistricting choices will affect those populations differently. 
Research also has indicated that, at least nationwide, there is unlikely to be a significant
difference between the "single count" Black population and the "all-inclusive" Black population. 
In addition, jurisdictions with substantial Hispanic or Latino populations will have a separate
count of all persons identifying themselves as Hispanic or Latino, because ethnicity is collected
in a separate question. 

Alternatives to the single-race/all-inclusive approach to redistricting data are under consideration. 
The U.S. Department of Justice has not yet reached a decision on the question of whether
advantages would result from the use of one of the allocation methods described in Appendix D
for voting rights issues.  While allocation does not conform with the criterion that data uses
should reflect "congruence with respondent's choice," it would facilitate comparisons with the
1990 census data.  (Allocation methods assign an individual's multiple race response to a single
race category.)  

Some have suggested that an allocation approach would have the advantage of giving
redistricting authorities, the states and their political subdivisions, one number to use in making
their redistricting choices.  Others have suggested that instead it would require states to use and
consider three data sets:  single-race counts, all-inclusive counts, and the allocated counts.  If a
decision is made to use an allocation approach, the Department of Justice would discuss with the
Census Bureau the technical feasibility of including matrices using the chosen allocation method
in the PL 94-171 data files or producing a special tabulation with such data after the Census
Bureau has met its legal deadline of April 1, 2001, for producing the data specified in PL 94-171.
The working group would appreciate feedback from users on these issues.

B.  Equal Employment Opportunity

One of the Federal Government's most significant uses of data on race and ethnicity is in its
efforts to ensure that every individual has an equal opportunity for employment.  Title VII of the
Civil Rights Act of 1964, as amended, prohibits discrimination in employment based upon race,
color, sex, religion, and national origin.  Executive Order No. 11246, as amended, similarly
prohibits discrimination in employment by government contractors.  Executive Order 11246 also
requires contractors covered by its provisions to ensure affirmatively that they do not
discriminate against their employees and applicants for employment.  

Responsibility for equal employment opportunity is shared among a number of  Federal agencies
including: the Equal Employment Opportunity Commission (EEOC), the Department of Justice,
the Office of Federal Contract Compliance Programs (OFCCP) in the Department of Labor, the
Office of Personnel Management, and the Department of Education.  Title VII is enforced by the
EEOC against private employers and by the Department of Justice against state and local
government employers.   Executive Order 11246 is enforced by the OFCCP. 

Representatives from these agencies have been meeting to determine how best to implement the
1997 standards for reporting of data on race and ethnicity.  This section describes some of the
data related activities carried out by the agencies, how the data were previously collected and
used, the changes the agencies have agreed upon, and some of the alternatives that are currently
under discussion.

As the new standards are implemented, agencies whose primary mission is civil rights
enforcement will face particularly complex challenges.  The EEO agencies will continue to
consider the burden imposed on those responding to data requests as they make various
tabulation, aggregation, and other decisions.  All participants in these important decisions are
reminded that it is not the intent of the 1997 standards to diminish the availability and quality of
information collected and available for Federal civil rights enforcement and related purposes.

Data Needs and Uses

There are two basic theories of employment discrimination:  disparate treatment and disparate
impact.  Disparate treatment can affect individuals because of their protected
characteristics or, in pattern and practice cases, all persons in the group who have an
employment relationship with that employer.  

Individual disparate treatment cases rely primarily on evidence of how an individual was treated
in comparison to other similarly situated individuals.  In some instances, statistical evidence of 
disparities in treatment between similarly situated individuals can suggest that some individuals
were subject to employment discrimination because of their protected class status.

In  disparate impact cases, statistics on the number of available and qualified minority workers
for a particular job are compared with statistics on the employer's workforce.  Enforcement
agencies compare statistics on the racial breakdown of an employer's workforce to the racial
composition of the available qualified labor pool.  These analyses also consider statistics on the
jobholder's employment-related characteristics, such as educational attainment or occupational
experience, compared with similar data on those persons qualified for, and interested in,  the at-
issue jobs.  This analysis is the first step in determining whether there is reason to believe that the
employer's selection procedures improperly excluded individuals on the basis of their race,
ethnicity, or gender.   After this analysis, the employer may be asked to show that its selection
procedures for the position(s) in question are job-related and consistent with business necessity. 
The workforce data often come from the employer's annual reports filed with Federal agencies
(see "Data on Employer's Workforce" below), and the benchmark data come from a special file
covering EEO-related data drawn from the most recent decennial census (see "The Benchmark
File" below).  In some disparate impact cases, the selection or de-selection rates of different
groups within the employer's workforce are compared without reference to external benchmarks. 

Data on Employer's Work Force.  Data on an employer's workforce are collected annually on
the Employer Information Reports (EEO-1 and EEO-4 surveys) covering private and state or
local government employment, respectively, and on the EEO-5 and IPEDS (formerly EEO-6)
surveys of employment in elementary/secondary and higher education, respectively.  The current
EEO forms collect general information about the employer and its workforce.  Employers
provide counts of employees within nine job categories by gender and five racial/ethnic
categories (White--not of Hispanic origin, Black--not of Hispanic origin, Hispanic, Asian or
Pacific Islander, and American Indian or Alaskan Native) for each facility. 

The Benchmark File.  In 1990, a special EEO file based on the decennial census data was
produced by the Census Bureau, in accordance with specifications provided by the EEO
agencies.  It included five matrices of counts for various geographic entities including the United
States, States, metropolitan areas, counties, and places of 50,000 or more in population.  The five
tables presented various cross-tabulations of the number of people in each labor force category
by gender,  EEO racial/ethnic categories (six categories, the five noted above plus "other, not of
Hispanic origin"), occupation (512 categories), industry (98 categories), educational attainment
(six categories), earnings (9 categories) or age (seven categories). 

Summary of Data Use for EEO Analysis.  The basic inquiry requires identification of the
relevant labor force for each case, followed by a determination as to whether the employer's
work force differs to a statistically significant extent from the benchmark comparison group. 
The relevant labor force depends on the employment action at issue.  For entry-level positions
that require few skills or experience, the benchmark may be some lesser skilled subset of the
civilian labor force in the geographic area in which the employer operates.  Depending on the
qualifications required for a position, the relevant labor force may be further delineated, for
example, by age, education, or occupation.  For promotions, the relevant labor pool typically will
be the employees eligible for the promotion.  The basic inquiry is always the same:  is the
number or percent of, for example, Blacks in the employer's work force significantly
different from the number that would be expected based on the percentage of qualified and
interested Blacks in the labor force?  The comparative information on the labor
force generally comes from the benchmark file from the most recent decennial census.

The wide range of factors (e.g., qualifications, availability, location) affecting employment
decisions by both employers and individual workers influences whether the employer's work
force will replicate the availability of individuals at any level of labor force aggregation.  Absent
discriminatory practices, it is also unlikely that significant disparities should exist between the
proportion of qualified minority or female workers in positions throughout the employer's work
force and the available and qualified labor pool.

Statistical analysis measures the disparity between the actual  participation of minorities or
women in the employer's workforce and  their expected representation to determine whether any
disparity can be attributed to chance.  The analysis is based on an assumption that available and
qualified minorities and women are recruited, apply and are selected on a nondiscriminatory
basis by the employer.   

Following statistical practice, if the likelihood of chance differences is less than 0.05 (the five
percent probability significance level), regulatory agencies and the courts generally accept the
alternative inference that unlawful factors may have influenced the employer's decision making.  In
litigation, this inference can constitute a prima facie showing of discrimination, which then
requires the employer to explain its practices or face liability.  In several cases, the Supreme
Court accepted the use of a statistic approximating the five percent probability level, a difference
of two to three standard deviations, but emphasized that a range of techniques can be used to reflect
the fact patterns of each case.  See Hazelwood School District v. United States, 433 U.S. 299,
311 n. 17 (1977), and Watson v. Ft. Worth Bank & Trust, 487 U.S. 977, 995 n.3 (1988).
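
The correspondence between standard deviations and probability levels can be sketched as follows.  The sketch assumes a two-sided test under the normal approximation, which is a common convention but is not prescribed by the text above.  The function name is illustrative.

```python
import math

def two_sided_p(z):
    """Two-sided tail probability of a standard normal deviate z."""
    return math.erfc(abs(z) / math.sqrt(2))

for z in (1.96, 2.0, 3.0):
    print(f"{z:.2f} standard deviations -> p = {two_sided_p(z):.4f}")
# 1.96 standard deviations -> p = 0.0500
# 2.00 standard deviations -> p = 0.0455
# 3.00 standard deviations -> p = 0.0027
```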

The following example illustrates the statistical comparison of the racial profile of an employer's
workforce and the racial profile of similar job-holders in that employer's labor market area. In
this example, the ABC Corporation, a large producer of computer software in City X, employs
350 programmers.  Eleven, or 3.1 percent, of these programmers are Black.

Using the decennial census benchmark data, it is found that Blacks constitute 3.72 percent of
available programmers working in City X.   Using that benchmark proportion, the expected
number of Black programmers in a company in City X with 350 programmers is found to be 13
(3.72 percent times 350).  The difference between the number of Black programmers in ABC
Corporation and the number expected is minus 2 (11 minus 13).  In "standard deviation" terms,
the disparity is -.57 standard deviations (the difference divided by the binomial standard
deviation of about 3.5, the square root of 350 x .0372 x .9628).  Such a difference, while negative, is not
statistically significant (to be statistically significant, it would need to be less than -1.96).  Thus,
the number of Black computer programmers employed by the ABC Corporation is not suggestive
of an underrepresentation of Black programmers in the employer's workforce.
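
The ABC Corporation example can be reproduced with a short calculation.  The sketch assumes the disparity is measured against the binomial standard deviation, the square root of n x p x (1 - p).  The report does not state the formula, but this assumption reproduces the -.57 figure.  The function and parameter names are hypothetical.

```python
import math

def disparity_in_sd(observed, workforce, benchmark_share):
    """Return (expected count, binomial standard deviation, disparity in SDs)."""
    expected = workforce * benchmark_share
    sd = math.sqrt(workforce * benchmark_share * (1.0 - benchmark_share))
    return expected, sd, (observed - expected) / sd

expected, sd, z = disparity_in_sd(observed=11, workforce=350, benchmark_share=0.0372)
print(f"expected = {expected:.2f}, sd = {sd:.2f}, disparity = {z:.2f} SD")
# expected = 13.02, sd = 3.54, disparity = -0.57 SD

# The 1.96 cutoff from the example corresponds to the five percent level.
print("statistically significant:", abs(z) > 1.96)
# statistically significant: False
```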

Changes Needed to EEO Forms and Instructions to Meet the New Standards

Employer Record-keeping.  The instructions accompanying the current EEO forms state that
the race and ethnicity of an employer's work force may be obtained either by "visual surveys of
the work force, or from post-employment records."  The instructions state explicitly that eliciting
information from the employee via direct inquiry is not encouraged.  With the implementation of
the 1997 standards, this guidance will change.  Self-identification will be the preferred method of 
collecting data on race and ethnicity from employees.  Employers will also be encouraged to use
the two-question format with Hispanic ethnicity first, and to allow those employees who wish to
do so to select  more than one race.  Employers will be asked to maintain this information in their
data files.  It is currently thought that employers will not be required to resurvey current staff,
although some will likely do so.  If employers do not resurvey current staff, the data available to
be collected on the EEO forms will only slowly become comparable to the benchmark data
reported in Census 2000.

The OFCCP regulations do not specify how Federal contractors (employers) should gather the
data necessary to complete the work force analysis or the utilization analysis for Affirmative
Action Programs.  The implementing regulations, however, require the filing of an EEO-1 report
and, by implication, the data reported in the work force utilization analysis must be consistent
with the EEO-1 reporting requirements. 

Planned Changes to the EEO Forms.  To be consistent with the new standards, the following
changes to the EEO forms are planned:

(1)  Add a separate category "Native Hawaiian or Other Pacific Islander" to EEO forms and
     instructions, and replace the category "Asian or Pacific Islander" with "Asian."

(2)  Make the following changes in terminology:
     a.   The term "Eskimo or Aleut" replaced by "Alaska Native,"
     b.   The term  "Black" replaced by "Black or African-American," and
     c.   The term  "Hispanic" replaced by "Hispanic or Latino."

(3)  Capture Hispanic or Latino ethnicity in a separate category or question.

These planned changes do not incorporate a change of instructions to "mark one or more races." 
It has not yet been determined how best to revise the forms that collect aggregations of data
about the employer's workforce to account for individuals who report more than one race. 
Efforts to date to design and test an aggregate reporting form are discussed earlier in this report. 
Alternatives for using the data for EEO purposes (that might lead to changes in the EEO forms)
are described below.

Ensuring Common Approaches in EEO Reporting

The Federal civil rights enforcement agencies agree that they should adopt common data base
definitions for the racial and ethnic categories used to enforce EEO laws and regulations.  
Clearly, whatever system is adopted, the enforcement agencies will need to consider the complex
issues related to implementing the new standards, bridging to EEO enforcement conducted using
data collected under the old standard, and continuing to conduct the important business of
ensuring equal employment opportunity during the transition years.

Because of the complexities in collecting and using the data reported under the new standards for
civil rights enforcement purposes, the EEO agencies are still in the process of considering the
best way to analyze these data.  A number of alternative approaches are currently under review. 
Three alternatives are briefly described in the following sections.  Each alternative would require
the preparation of a suitable decennial census benchmark file.  Readers are invited to comment
on these alternatives and to suggest additional ideas and options.

Tabulation Alternative 1:  Using a Bridging Method.  The EEO agencies have considered the
methods discussed in Appendix D of this report, and have concluded that one of the allocation
methods proposed for bridging would be useful during the transition period.  The EEO agencies
considered viable for EEO purposes the allocation method that assigns an individual who
selected more than one race to the largest of the nonwhite groups he or she marked.  The
largest nonwhite group may be ascertained from the racial composition of the population for the
relevant geography.
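
As a rough illustration of the allocation rule just described, the following Python sketch assigns
each response to a single category.  The population counts, the category labels, and the function
name allocate_largest_nonwhite are hypothetical and are not drawn from this report; breaking ties
alphabetically is likewise an assumption.

```python
# Hypothetical population counts for the relevant geography (illustrative only).
POPULATION = {
    "White": 700_000,
    "Black": 150_000,
    "American Indian": 10_000,
    "Asian": 40_000,
}


def allocate_largest_nonwhite(races, population=POPULATION):
    """Assign a response to one category under the "largest nonwhite group" rule.

    A single-race response keeps its category.  A multiple-race response is
    assigned to the marked nonwhite group with the largest population count.
    """
    races = set(races)
    if not races:
        raise ValueError("at least one race must be reported")
    if len(races) == 1:
        return next(iter(races))
    # Sorting first makes ties resolve alphabetically (an assumption).
    nonwhite = sorted(r for r in races if r != "White")
    return max(nonwhite, key=lambda r: population[r])


print(allocate_largest_nonwhite({"Black", "White"}))
print(allocate_largest_nonwhite({"Asian", "American Indian"}))
```

Because the rule depends on the population of the relevant geography, the same response could be
allocated differently in two areas whose largest nonwhite groups differ.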

This allocation method can be used to assign responses from individuals who reported more than
one race to single race categories.  With this method, no change would be needed in the
statistical methods currently used by the EEO agencies, and for a few years, employers who
begin collecting data under the new standards would use this allocation method to report on their
EEO forms the racial data for new hires who select more than one race.  Employers could also be
asked to record on their EEO forms the total number of individuals in their files who selected
more than one race.  This would provide the EEO agencies with a measure of the changing racial
characteristics in work force data and would indicate when the final alternative should be
implemented.

This method represents an interim solution that would precede full implementation of the new
standards.  Following careful evaluation of Census 2000 data, decisions could be made that phase
in the new standards in an analytically appropriate manner. 

Tabulation Alternative 2:  The Lower and Upper Boundary Approach.  Under the new
standards, employees will be able to identify themselves as members of more than one racial
group.  As a result, some individuals who were identified as members of only one group, for
example, Black, under the previous standards, may now identify as members of more than one
group, for example, Black and White, under the new standards.  Thus, when data are reported it
will be possible to determine two counts for each racial group.  The lower count, or lower
boundary, will be those individuals who identify with one race only, for example those who
marked only the Black category.  The larger count, or upper boundary, adds to the lower
boundary those individuals who identify with the given racial category and one or more other
racial categories.  Thus, the upper boundary Black count includes everyone who marked Black
either alone or in combination with one or more other racial categories.  The remainder of the
population consists of those individuals who did not identify as Black.
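
The two counts can be sketched in a few lines of Python.  The responses listed here are made up
for illustration, and the function name boundary_counts is hypothetical.

```python
def boundary_counts(responses, group):
    """Return (lower, upper) counts for one racial group.

    lower: respondents who marked only `group`.
    upper: respondents who marked `group` alone or in combination.
    """
    lower = sum(1 for r in responses if set(r) == {group})
    upper = sum(1 for r in responses if group in set(r))
    return lower, upper


# Hypothetical responses; each set holds the races one person marked.
responses = [
    {"Black"},
    {"Black", "White"},
    {"Black", "American Indian"},
    {"White"},
    {"Asian"},
]

lower, upper = boundary_counts(responses, "Black")
print(lower, upper)  # 1 3
```

The difference between the two counts (here 2) is the number of people who marked the group in
combination with at least one other race.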
 
As a practical matter, in most geographic locations the upper and lower boundaries will not
currently be substantially different for purposes of employment data because few adults are
expected to report themselves as members of more than one racial group.  This assessment is
based upon data provided in Appendix D of this report, and documentation of the National
Content Survey and the Race and Ethnic Targeted Test conducted by the Census Bureau.  Data
from some geographic regions are expected to reflect larger numbers and percentages of
respondents reporting themselves as belonging to more than one racial group.

An interagency group is working on possible modifications to survey forms, such as the EEO-1,
that collect aggregated data on the characteristics of many individuals for a single organization,
to capture information needed for the upper/lower boundary approach.  The tests conducted to
date are described in detail in Appendix B of this report. 

Tabulation Alternative 3:  Collect Micro Data from Employers.  An alternative approach to
using an aggregate reporting form, similar to the EEO-1, is to ask respondents to provide a micro
data file containing one record (without identifiers) for each employee.  The micro record would
include the employee's race or races, ethnicity, gender, and occupational category.  This
approach might be simpler for employers, and would provide agencies the maximum amount of
flexibility in using the information.  Implementation of this approach appears to be a longer-term
solution.  The EEO agencies would need to work with respondents in designing and
implementing the reporting format and method, and they would need to acquire the relevant
software and hardware to process the information. 

Illustrations of Comparisons Under Alternative Tabulation Approaches

To illustrate the alternatives, consider the example described earlier in this section.  Recall that
the ABC Corporation, a large producer of computer software in City X, employs 350
programmers.  It is assumed that the ABC Corporation started maintaining self-reported data on
race (allowing employees to select one or more races) for their new hires more than a year ago. 
As a result, their internal files contain a mixture of data collected under the old and new
standards.  For their 250 programmers hired before the new standards were implemented,
information on race in internal files is recorded as one of the four racial groups.  These files
indicate that 8, or 3.2 percent of the long-term programming staff members, are Black.  For the
100 recent employees, race is recorded as one or more of the five groups.  According to these
records, one of the new programmers has reported that he is Black, one has reported that she is
Black and White, and one has reported that he is Black and American Indian.  None of the other
97 individuals hired after the new standards were implemented reported Black either alone or in
combination with another race.

In benchmark data based on Census 2000, the following percentages of programmers in City X
have reported that they are Black:  3.3 percent have reported the single race Black, .23 percent
have reported that they are Black and White, and .11 percent have reported that they are Black
and American Indian.  A total of .42 percent have reported that they are Black and some other
race or races.  

Comparisons Under Alternative 1:  Allocation.  Because there are more Blacks in City X than
any racial group other than White, under the allocation method known as "largest nonwhite
group," ABC Corporation would count the 8 long-term Black employees and the 3 new
employees who selected Black alone or in combination with another race, and report that they
have 11 Black programmers (approximately 3.2 percent of their programmers).  Similarly, the
benchmark proportions would count in the Black category everyone who marked Black either
alone or with other race(s).  This would count a total of 3.72 percent of the available
programmers as Black. 

With these transformations, the counts and percentages are identical to the example provided
earlier and the analysis would lead to identical results.  If a different racial group were used in the
analysis, or a different allocation method were used, results would not necessarily be identical to
the earlier example.

Comparisons Under Alternative 2:  Upper/Lower Bound.  For the upper/lower bound
method, ABC Corporation would report that they have 9 programmers (2.6 percent) in the single
race (or lower boundary) Black category, and 2 employees (.6 percent) who have reported Black
in combination with another race.  Thus, the "all inclusive" (or upper boundary) count for Black
programmers is 11 (3.2 percent).

The benchmark file has 3.3 percent of the programmers in the single race (or lower boundary)
Black category, and .42 percent of the programmers who report as Black and at least one other
race, yielding a total of 3.72 percent of programmers in the "all inclusive" (or upper boundary)
category.

Given past patterns of discrimination, one would most likely argue that the "all inclusive"
category would be most appropriate to use.  In this example, the resulting counts and percentages
are identical to the example provided earlier, and to the results of the allocation method.

The analysis could also be conducted using the data for the single race category (the lower
bound), as follows.  Using the benchmark proportion 3.2 percent, the expected number of Black
programmers in a company with 350 programmers in City X is found to be 11 (approximately
3.2 percent of 350).  The difference between the number of single race Black programmers in
ABC Corporation and the number expected is minus 2 (9 minus 11).  In "standard deviation"
terms, this disparity of -2 among 350 programmers is -.61.  This difference is not statistically
significant (to be statistically significant, it would need to be less than -1.96).  Thus, the
number of Black computer programmers employed by the ABC Corporation is not suggestive of an
underrepresentation of Black computer programmers in the employer's work force.  In this case,
the analysis using the lower bound leads to the same conclusion as the analysis using the upper
bound, though the numbers are somewhat different.
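
The text does not spell out how the disparity is converted to "standard deviation" terms.  The
Python sketch below assumes the usual binomial standard deviation, the square root of
n x p x (1 - p), and uses the 3.2 percent proportion and the rounded expected count of 11 given
in the passage.  Under those assumptions it reproduces the reported value; the function name
disparity_in_sd is hypothetical.

```python
import math


def disparity_in_sd(observed, n, p, expected=None):
    """Express (observed - expected) in binomial standard deviation units.

    The binomial model is an assumption; the report does not state its formula.
    """
    if expected is None:
        expected = n * p
    sd = math.sqrt(n * p * (1 - p))
    return (observed - expected) / sd


# 9 single race Black programmers observed among 350; 11 expected (rounded).
z = disparity_in_sd(observed=9, n=350, p=0.032, expected=11)
print(round(z, 2))   # -0.61
print(z < -1.96)     # False: not statistically significant
```

If the unrounded expected count (350 x .032 = 11.2) were used instead, the disparity would be
slightly larger in magnitude but still well short of -1.96.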

Note that if a different allocation method were used with tabulation alternative 1, or if one of the
other racial groups were used in the example, the upper bound ("all inclusive" count) would not
be identical to the count based on the tabulation allocation method.  The reader is referred to
Appendix D for a detailed discussion of the impact of the various allocation methods.

Comparisons Under Alternative 3:  Full Data Reporting.  With this method, ABC Corporation will
compile a micro data listing of employee characteristics to submit for EEO purposes.  The table
below illustrates the contents of such a micro data file.  This example is intended to illustrate the
complete recording of sex, race, and ethnicity.  It makes use of the single job category
"programmer," and therefore cannot be viewed as a real prototype for EEO reporting.  In this
table X denotes "yes," zero denotes "no," and blank indicates that the data are not available.

The first record (employee number 1) is a Black, non-Hispanic male programmer.  His data are
recorded in the new format:  he was hired after the new reporting system was adopted and had an
opportunity to self-select one or more races.  He chose to report himself as Black.  On the other 
hand, employee 4 has been an employee for some time, and his data are in the old format.  He is
also a Black male programmer, but the information provided in this record is what was recorded
in the company files prior to conversion to the new reporting system.

If this type of information became available from all employers, the EEO agencies could use any
of the tests described above, or they would be able to transition to applying the EEO
methodology to any groups that become large enough to monitor for EEO, including those that
involve more than one race.

Illustration of Part of Micro Data File for ABC Corporation
___________________________________________________________________
Employee                       Race
Number    Sex   Hispanic   W  B  I  A  H   Programmer   New Format
___________________________________________________________________
    1      M       0       0  X  0  0  0       X            X
    2      F       X       X  X  0  0  0       X            X
    3      M       0       0  X  X  0  0       X            X
    4      M       0       0  X  0  0          X            0
    5      F       0       0  X  0  0          X            0
    6      F       0       0  X  0  0          X            0
    7      M       0       0  X  0  0          X            0
    8      F       0       0  X  0  0          X            0
    9      M       X       0  X  0  0          X            0
   10      M       0       0  X  0  0          X            0
   11      M       0       0  X  0  0          X            0
   12      F       X       X  0  0  0  0       X            X
   13      .       .       .  .  .  .  .       .            .
___________________________________________________________________
W=White
B=Black
I=American Indian and Alaska Native
A=Asian
H=Native Hawaiian and Other Pacific Islander
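
To suggest how such a file could support the tabulations discussed above, the Python sketch below
encodes the twelve listed records and derives the lower and upper boundary counts for one group.
The class and function names are hypothetical, and the encoding (a set of race codes per
employee) is only one possible representation, not a proposed reporting format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EmployeeRecord:
    number: int
    sex: str
    hispanic: bool
    races: frozenset   # codes from the legend: W, B, I, A, H
    new_format: bool   # True if collected under the new standards


def rec(number, sex, hispanic, races, new_format):
    return EmployeeRecord(number, sex, hispanic, frozenset(races), new_format)


# The twelve records listed in the illustration.
RECORDS = [
    rec(1, "M", False, "B", True),
    rec(2, "F", True, "WB", True),
    rec(3, "M", False, "BI", True),
    rec(4, "M", False, "B", False),
    rec(5, "F", False, "B", False),
    rec(6, "F", False, "B", False),
    rec(7, "M", False, "B", False),
    rec(8, "F", False, "B", False),
    rec(9, "M", True, "B", False),
    rec(10, "M", False, "B", False),
    rec(11, "M", False, "B", False),
    rec(12, "F", True, "W", True),
]


def lower_upper(records, code):
    """Count records marking only `code` (lower) and `code` at all (upper)."""
    lower = sum(1 for r in records if r.races == frozenset(code))
    upper = sum(1 for r in records if code in r.races)
    return lower, upper


print(lower_upper(RECORDS, "B"))  # (9, 11)
```

Among the listed records, the result matches the 9 single race and 11 "all inclusive" Black
programmers used in the comparisons above.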
 

Comparisons using Tabulation Alternative 3 would require benchmark data from the Census
Bureau for a subset of the 63 unique combinations of reporting of race.  Decisions
concerning the size of the groups for which tabulations are needed would need to be made by the
EEO agencies, informed by the data from the decennial census.
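
The figure of 63 is what results from six race categories, since a respondent may mark any
non-empty subset of them (2 to the 6th power, minus 1).  The Python sketch below assumes the six
are the five categories of the 1997 standards plus a "Some Other Race" category on the census
form; that sixth category is an assumption here and is not stated in this section.

```python
from itertools import combinations

# Five categories from the 1997 standards plus an assumed sixth census category.
CATEGORIES = [
    "White",
    "Black or African American",
    "American Indian or Alaska Native",
    "Asian",
    "Native Hawaiian or Other Pacific Islander",
    "Some Other Race",
]


def race_combinations(categories):
    """List every non-empty combination of the given categories."""
    return [
        combo
        for k in range(1, len(categories) + 1)
        for combo in combinations(categories, k)
    ]


combos = race_combinations(CATEGORIES)
print(len(combos))  # 63
```

Of these, 6 are single race responses and 57 involve two or more races, which is why decisions
about minimum group size matter for tabulation.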

C.  Vital Records and Intercensal Estimates
                  
The revisions to the standards for collecting and presenting Federal data on race and ethnicity
pose many challenges to the Census Bureau's Intercensal Population Estimates Program.
Because the population estimates are data driven, changes to the program to provide new racial
categories will depend upon the availability of data from a variety of sources.  Although changes
are possible, they will require discussions with data providers and data users, as well as research and
analysis of data collected under the new standards, before the Census Bureau can identify the
racial categories that can be used in the Intercensal Population Estimates Program.

Following some  background discussion, this section presents a description of the Intercensal
Population Estimates Program, its methodology, and its major uses, and then turns to some of the
major issues that must be addressed. 

Background

In 1977, the Office of Management and Budget (OMB) issued Race and Ethnic Standards for
Federal Statistics and Administrative Reporting.  Because the intercensal population estimates
are limited in their detail by the availability of administrative data, it was not until 1993 that the
Intercensal Population Estimates Program could modify its racial categories to follow fully the
1977 standards by providing data for the population in the four major racial categories -- White;
Black; Asian or Pacific Islander; and American Indian, Eskimo and Aleut.  To comply with the
1977 standards, the Intercensal Population Estimates Program developed estimates by race 
separately for the population by Hispanic origin (Hispanic, non-Hispanic).  

The 1997 standards present many challenges, two of which are the most significant.
One is that respondents to Federal data collections, including Census 2000, surveys, and vital
statistics registrations, will be allowed to select one or more races.  The other is that the Asian or
Pacific Islander aggregate category has been split into two categories -- one called "Asian" and
the other called "Native Hawaiian or Other Pacific Islander."

Because the intercensal population estimates serve several diverse purposes, exploring the
possible outcomes of the estimates process and examining the implications of the new standards 
are important.  The intercensal population estimates are used as controls for many Federal
surveys, as denominators for important Federal statistics, and as indicators for important program
and policy decisions.

Because the issues raised by the 1997 standards are complicated and diverse, it will take
considerable research and experimentation before the Intercensal Population Estimates Program
can produce population estimates that fully follow the new standards.  The next sections
describe the program and discuss the major issues that must be addressed in changing program
outputs. 
                                                                                                                                          
What is the Intercensal Population Estimates Program?   

The Intercensal Population Estimates Program, under Title 13, develops and releases annual
estimates of the total population and its demographic characteristics.  For the Nation, states, and
counties, these characteristics include annual estimates by:

     Age --              single years of age (age 0 to age 99) and 100+;
     Sex --              Male/Female;
     Race --             White; Black; Asian and Pacific Islander; and American Indian, Eskimo,
                         and Aleut;
     Hispanic origin --  Hispanic/non-Hispanic.
     
The Intercensal Population Estimates Program currently provides estimates of the total
population of functioning governmental units (cities, incorporated places, and minor civil
divisions).  The Census Bureau is considering expansion of the program to include smaller and
more diverse units of geography (such as School Districts), as well as the development of
demographic characteristics for functioning governmental units and other smaller geographic
units.

How Are the Population Estimates Used?  

The population estimates are used in the intercensal period for funding allocations, as controls
for Census Bureau and other Federal surveys, as denominators for vital statistics and other
demographic events, and as planning tools for government and private programs. 

Funding Allocations.   Federal programs totaling $180 billion use these annual population
estimates to make important program decisions and to distribute these funds.

Survey Controls.  The population estimates are used as control totals for the Current Population
Survey (CPS), the Survey of Income and Program Participation (SIPP), the new American
Community Survey (ACS), and other Federal surveys, as well as many private surveys.

Most Federal surveys use national level population estimates by age, sex, race, and Hispanic
origin as controls for weighting survey data.  The ACS currently uses county level population
estimates by age, sex, race, and Hispanic origin as controls for weighting survey data.
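
This section does not describe how any particular survey applies its controls.  One common
approach is a ratio adjustment that scales the survey weights in each demographic cell so that
they sum to the population estimate for that cell.  The Python sketch below illustrates that
general idea only; the cells, the figures, and the function name control_factors are
hypothetical.

```python
def control_factors(weighted_sample_totals, control_totals):
    """Return, per cell, the factor that scales survey weights to the control total."""
    return {
        cell: control_totals[cell] / total
        for cell, total in weighted_sample_totals.items()
        if total > 0
    }


# Hypothetical cells keyed by (age group, sex); all figures are illustrative.
weighted_sample_totals = {("18-44", "F"): 900.0, ("18-44", "M"): 1100.0}
control_totals = {("18-44", "F"): 1000.0, ("18-44", "M"): 1000.0}

factors = control_factors(weighted_sample_totals, control_totals)
for cell, factor in factors.items():
    adjusted = weighted_sample_totals[cell] * factor
    print(cell, round(factor, 3), round(adjusted, 1))
```

Any change in the racial categories of the estimates therefore changes the cells available for
this kind of adjustment.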

Denominators for Demographic Events.  The National Center for Health Statistics (NCHS)
currently uses the national, state, and county population estimates by age, sex, race, and Hispanic
origin as denominators to create birth and death rates and to calculate life tables by race and sex. 
In addition to the use by NCHS, the Centers for Disease Control and Prevention (CDC)
frequently relies upon the estimates of population at various geographic levels as denominators
for various health-related and disease incidence rates.  The National Cancer Institute (NCI) uses
the county population estimates by age, sex, race, and Hispanic origin as denominators for the
various cancer incidence rates released to the public.

Planning Tools.  The intercensal population estimates are frequently used as planning tools and
as barometers to measure an area's growth and change since the last decennial census.  In making
important policy decisions, local planners frequently cite the overall population level and the
demographic characteristics products of the Intercensal Population Estimates Program. 
     
Methodology for Developing Intercensal Population Estimates

The Intercensal Population Estimates Program develops its population estimates by age, sex,
race, and Hispanic origin using the demographically recognized cohort-component technique.  In
this technique, each component of population change -- births, deaths, international migration,
and internal migration -- is estimated separately by age, sex, race, and Hispanic origin.  Various
administrative records provide information needed to develop these components of population
change.  The estimates process begins with the most recent decennial census results and
combines the estimated components of population change to develop the intercensal population
estimates.
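
In its simplest form, the accounting behind this technique adds the components of change to the
census base within each demographic cell.  The Python sketch below shows that bookkeeping for a
single cell with made-up figures.  It is a simplification: it ignores the aging of cohorts from
one age to the next, and the function name update_cell is hypothetical.

```python
def update_cell(base, births, deaths, net_international, net_internal):
    """Base population plus the four components of population change."""
    return base + births - deaths + net_international + net_internal


# Hypothetical figures for one age-sex-race-Hispanic origin cell in one area.
estimate = update_cell(
    base=10_000,
    births=150,
    deaths=90,
    net_international=40,
    net_internal=-25,
)
print(estimate)  # 10075
```

Because every component must be available for the same cells, a category that is missing from
any one data source cannot be carried through the estimate.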

The 1990 Census Base Population.  Although the enumeration of the resident population in the
1990 census, without adjustment for net undercoverage, was adopted as a standard for the
estimates, changes were made in the distribution of the population by age and race.  These
modifications were made to bring the definition of age and race into conformity with definitions
used for data from other sources, such as vital statistics.  (See Comparability Issues below for a
complete discussion of the modification of the 1990 Decennial Census.)

Birth and Death Components.  In brief, NCHS provides annual counts and distributions of
births and deaths by age, race, sex, and Hispanic origin by county to the Census Bureau in a
specially developed individual record file of the birth and death events.  These individual records
contain the detailed race and Hispanic classifications available from the birth and death
certificates collected by NCHS.

International Migration Component.  The international net migration components are based
on a variety of administrative sources and analytic estimates.  The Immigration and
Naturalization Service (INS) supplies data on legal immigrants.  The Office of Refugee
Resettlement (ORR) supplies data on persons admitted to the United States as refugees.  Both
sources supply data on country of birth.  The Census Bureau estimates the distribution by race
and Hispanic origin from the country-of-birth tallies, using data from the 1990 Census on the
foreign-born population who entered the United States from 1985 to 1990.
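
The step from country-of-birth tallies to a race distribution can be sketched as follows in
Python.  The countries, counts, and shares are invented for illustration; in practice the shares
would come from census data on the foreign-born population, and the function name
distribute_by_race is hypothetical.

```python
def distribute_by_race(immigrants_by_country, race_shares_by_country):
    """Apply each country's race shares to its immigrant tally and sum by race."""
    totals = {}
    for country, count in immigrants_by_country.items():
        for race, share in race_shares_by_country[country].items():
            totals[race] = totals.get(race, 0.0) + count * share
    return totals


# Hypothetical tallies and census-based shares (each country's shares sum to 1).
immigrants_by_country = {"Country A": 1000, "Country B": 500}
race_shares_by_country = {
    "Country A": {"White": 0.2, "Asian": 0.8},
    "Country B": {"White": 0.5, "Black": 0.5},
}

by_race = distribute_by_race(immigrants_by_country, race_shares_by_country)
print(by_race)
```

Under the new standards, the shares would also need to cover combinations of races, which is one
reason new algorithms would have to be developed from Census 2000 results.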

The other components of international migration, such as emigration and undocumented
migration, are developed using a combination of basic demographic modeling techniques.  By
examining data from other administrative records in combination with an analysis of the
decennial census, the Census Bureau models the level and demographic characteristics of these
other international migration components.

Internal Migration Component.  The data on internal migration are developed using a basic
administrative records method.  This method relies on annual extracts of tax returns provided by
the Internal Revenue Service (IRS).  In this approach, using the Social Security Number (SSN)
on the return, the Census Bureau can match the tax returns for two years and obtain state of
residence for the two periods.  By comparing the state of residence at the two points in time, 
annual measures of migration can be developed for states.
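
The matching step can be pictured with the Python sketch below, which compares state of residence
for the same filer in two years.  The identifiers and states are fabricated placeholders, and
the function name state_to_state_flows is hypothetical; the actual processing of tax return
extracts is not described in this report.

```python
from collections import Counter


def state_to_state_flows(year1, year2):
    """Count moves between states for identifiers present in both years.

    year1, year2: dicts mapping an identifier to a state of residence.
    """
    flows = Counter()
    for ident, origin in year1.items():
        destination = year2.get(ident)
        if destination is not None and destination != origin:
            flows[(origin, destination)] += 1
    return flows


# Placeholder identifiers standing in for matched returns.
year1 = {"id-1": "OH", "id-2": "OH", "id-3": "TX", "id-4": "CA"}
year2 = {"id-1": "FL", "id-2": "OH", "id-3": "CA", "id-5": "NY"}

flows = state_to_state_flows(year1, year2)
print(dict(flows))
```

Returns that appear in only one of the two years (id-4 and id-5 here) cannot be matched and do
not contribute to the measured flows.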

Until recently, the Census Bureau had only developed the national population estimates by age,
race, sex, and Hispanic origin and the estimates of the total population for states and counties. 
During the current decade, the Census Bureau started to develop a set of state and county
population estimates by age, sex, race, and Hispanic origin.  

These state population estimates are developed using the basic cohort component technique
outlined above.  Since the standard tax return provides no demographic characteristics of the tax
filer, the Census Bureau must further modify the basic administrative records method to estimate
internal migration by age, sex, race, and Hispanic origin. To obtain demographic characteristics,
the Bureau has relied on the annual extract of tax returns provided by the IRS, and a 20 percent
sample of information on the Social Security Administration Application File (NUMIDENT). 
This NUMIDENT file includes SSN, month and year of birth, race, sex, and six characters of the
last name for each SSN holder in the sample file.

The extract of the NUMIDENT file has been merged with the tax returns file by SSN to derive
demographic characteristics of IRS filers.  Because the Census Bureau was able to receive only a
20 percent sample of this basic NUMIDENT file, the Bureau appended the demographic characteristics
of the primary filer to only the same 20 percent sample of tax returns.  Besides demographic
characteristics of the primary filers, the model requires demographic characteristics of those
persons claimed as exemptions on the tax return.  The rules for assigning demographic
characteristics to dependents are straightforward and rely on basic familial and demographic
relationships.

Because the NUMIDENT file was restricted to a 20 percent sample until this year, the Census
Bureau could not use the merged tax file and SSA data to develop county population estimates
by age, sex, race, and Hispanic origin.  To develop the current sets of county population
estimates by age, sex, race, and Hispanic origin, a ratio approach is employed.  This approach
combines the full set of age, race, sex, and Hispanic origin detail for the county in 1990 with the
newly developed state population estimates by age, sex, race, and Hispanic origin and the
estimates of the total population of the county.  With the delivery of the 100 percent
NUMIDENT file to the Census Bureau, work on employing the cohort component technique to
develop the county estimates by age, sex, race, and Hispanic origin is anticipated.
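
The report does not give the formula for the ratio approach.  The Python sketch below shows one
plausible reading, offered purely as an assumption: each 1990 county cell is carried forward by
the change in the corresponding state cell, and the results are then scaled to agree with the
estimated county total.  All figures and the function name county_ratio_estimate are
hypothetical.

```python
def county_ratio_estimate(county_1990, state_1990, state_now, county_total_now):
    """One possible ratio adjustment (an assumption, not the documented method)."""
    # Step 1: move each 1990 county cell by the state's change in that cell.
    preliminary = {
        cell: county_1990[cell] * state_now[cell] / state_1990[cell]
        for cell in county_1990
    }
    # Step 2: scale all cells so they sum to the county total estimate.
    scale = county_total_now / sum(preliminary.values())
    return {cell: value * scale for cell, value in preliminary.items()}


# Hypothetical demographic cells "a" and "b".
county_1990 = {"a": 100.0, "b": 300.0}
state_1990 = {"a": 1000.0, "b": 2000.0}
state_now = {"a": 1200.0, "b": 2000.0}

cells = county_ratio_estimate(county_1990, state_1990, state_now, county_total_now=441.0)
print({cell: round(value, 1) for cell, value in cells.items()})
```

Whatever the exact formula, an approach of this kind inherits its racial detail from the 1990
county distribution, so it cannot by itself introduce categories that were not tabulated in
1990.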


Data Availability

The intercensal population estimates are "data driven." As noted above, the decennial census, the
National Center for Health Statistics, the Immigration and Naturalization Service, and the Social
Security Administration are all important sources for developing intercensal population
estimates.  Using the current methodology, estimates cannot be produced without the availability
of these data.

Decennial Census Data.  Census 2000 will mark the first time that decennial population
data are available using the new OMB standards for collecting racial data.  The Census Bureau is
developing the approaches and timetables for tabulating these data from Census 2000.

Birth and Death Data.  The National Vital Statistics System is the basis for the Nation's official
statistics on births and deaths (including infant deaths).  The data are provided through vital
registration systems maintained and operated by the individual states and territories where the
original certificates are filed.  While the legal authority for vital registration rests with the states
and territories, the National Center for Health Statistics (NCHS) is required to produce national
vital statistics by collecting data from the vital records of all the states.  The NCHS cooperates
with the states in developing the standard forms for data collection as well as standard
procedures for data preparation and processing in order to promote a uniform national data base. 
The NCHS shares in the costs incurred by the states through contractual agreements with each
state.  Under this arrangement, NCHS obtains and publishes vital statistics based on all births and
deaths (e.g., 3,891,494 and 2,314,690, respectively, in 1996) occurring in the United States.

Implementation of the 1997 standards on vital records will require changes in data collection and
processing systems at all levels of government and very likely will take at least several years to
accomplish throughout the United States.  In addition to revising computer systems at the state
and Federal levels, the software that is used in hospitals to record and report over 90
percent of all births in the United States needs to be converted.  Most importantly, the procedures
used to collect birth and death data in hospitals and funeral homes will need to be revised and the
appropriate staff will need to be trained.

It can be anticipated that not all registration areas will implement the 1997 standards at the same
time or with complete coverage and compliance at the start.  For example, some states may
implement the revised race question on birth and death certificates in the year 2000 in order to be
compatible with Census 2000, while others may prefer or need to wait until the next revisions of
the U.S. Standard Certificates of Birth and Death are implemented in 2002.  During 1998 and
1999, the NCHS is sponsoring a committee of state vital statistics officials and representatives of
the relevant professions in a series of meetings to evaluate the entire content and format of the
current Standard Certificates.  The committee's goal is to submit certificate revisions to the
Secretary, Department of Health and Human Services, in July 1999 for clearance by the
Department.  Implementation by the registration areas is expected to occur in January 2002. 
Some states have indicated a desire to make changes in the race and ethnicity items at the same
time as other changes are made.
  
International Migration Components.  As discussed above, the international migration
components  are based on a variety of administrative sources and analytic estimates.  The
Immigration and Naturalization Service (INS) supplies data on legal immigrants. The Office of
Refugee Resettlement (ORR) supplies data on persons admitted to the United States as refugees. 
Both sources supply data on country of birth.  

To develop data on the race and Hispanic origin of the entering immigrants, the Census Bureau 
combines the information on country of birth from the INS files with information from the most
recent decennial census.  Because the INS and other data sources on international migration do
not code race or Hispanic origin, no change in these sources is anticipated.  The Census Bureau 
will need to examine the results of Census 2000 and develop new algorithms to accommodate the
revised categories for data on race.

Internal Migration Components.  To develop the internal migration component, the Census
Bureau currently relies upon the annual extract of tax returns provided by the Internal Revenue
Service (IRS), and a 20 percent sample of information on the Social Security Administration
Application File (NUMIDENT).  Under an agreement between the Census Bureau and the Social
Security Administration, the Census Bureau has recently gained access to a full 100 percent
NUMIDENT file.  This opens additional opportunities for developing subnational population
estimates by age, sex, race, and Hispanic origin.  

This component also presents the biggest obstacle to modifying categories for data on race in the
intercensal population estimates process.  Under the Social Security system, data on race are
provided as part of  the Social Security card application process.  For the oldest among the
population currently covered in the NUMIDENT files, the last application date could refer to the
beginning of the Social Security system.  

Until 1980, the Social Security Administration application system provided three racial
categories -- White, Black, and Other.  Beginning in 1980, the SSA modified the  racial
categories on the SSA application form to include five categories -- (1) Asian, Asian-American
or Pacific Islander; (2) Hispanic; (3) Black (non-Hispanic); (4) North American Indian or
Alaskan Native; (5) White (non-Hispanic).  Although SSA modified the racial categories on the
application form, people who already had an SSA card did not have to resubmit their data on race. 
Thus, pre-1980 entries on the SSA file have information for three racial categories (White, Black,
and  Other), while entries after 1980 have information for five racial categories.  The application
for a Social Security card needs to be updated to reflect the 1997 standards.

Another change to the Social Security  application procedure has presented challenges to the use
of data on race.  Beginning in the late 1980's, the Social Security Administration introduced the
"enumeration at birth program."  Under this program, parents could request a Social Security
Number for their newborn children as part of the birth registration process.  Because the birth
certificates do not include racial information for the newborn, it is impossible to code race for the
newborn onto the SSA file.  While information on race is available for the birth mother and
father on the basic birth registration certificate, these data are not made available to the Social
Security Administration and are not on the basic NUMIDENT file received by the Census Bureau.

Comparability Issues 

Even the availability of the required source data does not ensure the capability to produce
reasonable and accurate population estimates.  Production of population estimates by the major
demographic characteristics depends upon the availability of comparable data across the various
data sources.  While comparability issues with respect to race reporting are not new, the
increased complexities of the new racial categories are likely to exacerbate the problems.

The issues about comparability in race reporting are present in the current set of intercensal
population estimates.  Data from the 1990 census on race posed several of these problems.

Although the enumeration of the resident population in the 1990 census, without adjustment for
net undercoverage, was adopted as a standard for the estimates, changes were made to that
distribution of the population by age and race.  These modifications were made to bring the
definition of age and race into conformity with definitions used for data from other sources, such
as vital statistics.

For age, the aim was to correct biases in census age tabulations that resulted from displacement
of age reporting from the reference date of the census.  In 1990 census publications, age is based
on respondents' direct reports of age at last birthday, with some editing for age misstatement. 
This definition proved inadequate for postcensal estimates, however, as many respondents
reported their age (even if correctly) at the time of completion of the census form or interview by
an enumerator, either of which could have occurred several months after the April 1 reference
date.  As a result, age was slightly biased upward.  Modification was based on a respecification
of age, for most individual respondents, according to their year of birth.  Age was derived from
year of birth by allocating date of birth to the first quarter and last three quarters of each year,
subtracting year of birth from 1990 for those born before April 1, and from 1989 for those born
after April 1.  The allocation was based on an historical series of registered births by month.
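
The respecification of age described above can be sketched as follows.  This is a minimal
illustration, not the Census Bureau's procedure:  the function names, the random allocation to
"before" or "after" April 1, and the q1_birth_share parameter (the share of a year's births
occurring before April 1) are assumptions introduced for the example.

```python
import random


def age_on_census_day(year_of_birth, born_before_april_1):
    """Age at the April 1, 1990 reference date, derived from year of birth.

    People born before April 1 have already had their 1990 birthday, so
    year of birth is subtracted from 1990; otherwise it is subtracted from 1989.
    """
    reference_year = 1990 if born_before_april_1 else 1989
    return reference_year - year_of_birth


def allocate_and_age(year_of_birth, q1_birth_share, rng):
    """Allocate a person to the first quarter with probability q1_birth_share.

    q1_birth_share is a hypothetical input standing in for the historical
    series of registered births by month mentioned in the text.
    """
    born_before_april_1 = rng.random() < q1_birth_share
    return age_on_census_day(year_of_birth, born_before_april_1)


rng = random.Random(0)
print(age_on_census_day(1950, True))    # born January-March 1950
print(age_on_census_day(1950, False))   # born April-December 1950
print(allocate_and_age(1950, 0.25, rng))
```

Because the derived age depends only on year of birth and the allocated quarter, it does not
drift upward when a form is completed some months after the reference date.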

For race, the objective of the modification was to conform to the definition of race specified in
the 1977 standards.  In the 1990 census, a substantial number of people (roughly 9.8 million) did
not specify a racial group that could be classified in any of the categories on the census form:
White; Black; American Indian, Eskimo, or Aleut; Asian or Pacific Islander.  A large majority of
these people were of Hispanic origin (based on their response to a separate, Hispanic origin
question on the form), and many wrote in their Hispanic origin, or Hispanic origin type (for
example, Mexican or Puerto Rican) as their race.  People of unspecified race were allocated to
one of the four tabulated racial groups (White; Black; American Indian, Eskimo or Aleut; and
Asian or Pacific Islander) based on their response to the Hispanic origin question.  These four
categories for race conform with the 1977 standards, and are more consistent with the categories
in other administrative sources than are the original census tabulations.  

Census 2000 will pose challenges about reporting of race.  The expanded number of categories
and the possibility of reporting more than one race translate into over 60 possible responses.  The
large number of categories that are likely to have few responses will present challenges to the
Intercensal Population Estimates Program.
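
The count of possible responses can be checked with a short enumeration.  The sketch below
assumes six reporting categories, that is, the five categories of the 1997 standards plus a
"Some other race" category; the category list and the function name are assumptions made for
the illustration and are not taken from this report.

```python
from itertools import combinations

# Assumption: five categories from the 1997 standards plus "Some other race".
CATEGORIES = [
    "White",
    "Black or African American",
    "American Indian or Alaska Native",
    "Asian",
    "Native Hawaiian or Other Pacific Islander",
    "Some other race",
]


def race_combinations(categories):
    """Return every non-empty set of categories a respondent could report."""
    return [
        combo
        for size in range(1, len(categories) + 1)
        for combo in combinations(categories, size)
    ]


all_combos = race_combinations(CATEGORIES)
single_race = [c for c in all_combos if len(c) == 1]
multiple_race = [c for c in all_combos if len(c) > 1]
print(len(all_combos), len(single_race), len(multiple_race))
```

Under this assumption most of the possible responses are combinations of two or more
categories, which is where small cell sizes are most likely to arise.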

When combining across data sets and agencies, the problems of comparability in reporting of
race become more severe.  Clearly, the added complexity of reporting more than one race will
add to this problem, particularly as different reporting situations (such as the census or the birth
and death certificates) engender different tendencies to report more than one race.  Differences
in allocation and editing procedures will almost certainly exacerbate the problem, as exemplified
by the difficulty of using data from different data universes in the calculation of rates. 

Future Direction 

The process of developing a set of intercensal population estimates consistent with the 1997
standards will not be an easy one.  Until data are available, making any commitments about the
probable set of products is impossible.  The Census Bureau realizes, however,  that many data
users need to know its plans in order to make their own program decisions. 

To begin this process, the Census Bureau is forming a technical interagency group of key data
providers and key data users to address many of the major issues.  Members of this group will
provide input on:  (1) the feasibility of using one consistent set of categories on race across all
geographic levels; (2) the feasibility of using population size as the only criterion for determining
which categories by race will have separate population estimates; (3) the minimum cell size
below which population estimates will not be produced; (4) the continued development of
population estimates by mutually exclusive categories on race; and (5) the use of consistent
methodologies for the different categories by race in the population estimates program.  This
technical group will also examine issues related to data allocation and editing --  important
factors related to the data consistency issues. 

Although detailed data on race from Census 2000 will not be available until mid-2001, during the
next few months, the interagency group can address and reach consensus on most of the issues
outlined above. Through these discussions with the data providers and data users, the Intercensal
Population Estimates Program can begin to form some tentative plans.  Although it is too soon to
speculate on any outcomes, it is likely that the Intercensal Population Estimates Program will
need to be flexible. During the coming decade, as more data become available using the 1997
standards, it is likely that the Census Bureau will continue the expansion of the population
estimates program to include additional categories by race. 

D.  Issues for Further Research 

(Under Development)

V.   COMPARING DATA UNDER THE OLD AND THE NEW STANDARDS

This part of the report provides a summary of the Bridge Report:  Tabulation Options for Trend
Analysis, which is contained in Appendix D.
     
A.  Introduction  

Agencies whose data are used to display time trends in economic, social, and health
characteristics by racial and ethnic groups may need to consider bridging methods to assist users
in understanding the data collected under the new standard.  For some period of time, referred to
as the bridge period, agencies may display historical data along with two estimates for the
present time period.  The first is a tabulation of the data collected under the new standard (see
Part III B); the second is a "bridging estimate," or prediction of how the responses would have
been collected and coded under the old standard.  Once the bridge period is over, the bridge estimates
will no longer be needed. 

It should not be assumed that bridging is useful or required in every situation.  Agencies should
carefully consider whether they need bridging estimates.  Bridging estimates may not be needed
if agencies can tolerate a "break" in their data series or if comparison to another data series
provides users with enough information about the change.  If bridging estimates are not used,
however, agencies should footnote the first occurrence of data collected under the new standard.

There are at least two purposes of bridge estimates:  (1) to help users understand the relationship
between the old and new data series (as noted above); and (2) to provide consistent numerators
and denominators for the transition period, before all data are available in the new format.  If
there is a need for bridging, agencies should carefully evaluate alternative methods.  The work
presented in Appendix D, and summarized below, is intended to help inform agencies about the
statistical characteristics of selected bridging methods. 

Agencies are encouraged to plan and conduct methodological research that will lead to more
informed decisions concerning bridging methods and their uses.  Such methodological research
has long been used to quantify changes in data collection procedures.  For example, when
methods for coding industry, occupation, or diseases are updated, it is common practice to code
data using both sets of coding rules to determine the nature and extent of the changes introduced
by the change in procedures. 

The analyses presented in Appendix D make use of survey data in which the same respondent
provided racial information in response to both a question structured under the old standard, and
in response to questions similar to those that might be structured under the new standard.  These
are examples of methodological approaches that can be adopted by agencies, if necessary.  In
particular, since 1976, the National Health Interview Survey (NHIS) has added a follow-up
question for those reporting more than one racial identity, asking them to select the one that they
feel best describes them.  This information is directly used in some of the most promising bridge
techniques.  Some agencies may find that adding such a follow-up question to the questions on
race and ethnicity, even just once after the implementation of the new standards, would provide
valuable survey-specific information for bridging to the past.  As agencies conduct such
experiments, the results may assist other agencies in understanding the changes associated with
transitioning to the new standard.

The results discussed here and in Appendix D represent the work of a group of statistical and
policy analysts drawn from Federal statistical agencies that use and produce data on race and
ethnicity.  They have spent the past year considering these tabulation issues and conducting
research to develop tabulation guidelines for constructing "bridges" between racial data collected
under the new standards and racial data collected under the old standards. The report sets forth
criteria by which different bridging methods should be evaluated and describes the different
methods that have been considered thus far.  The results of the research conducted on several
methods for creating bridges are also presented.

This part of the report discusses different options for tabulating racial data in order to create
bridges from data collected under the 1997 standards, which have five racial categories and
permit the reporting of more than one race, back to the data collected under the previous
standards, which identified four racial categories.  An "Other" category appears in much of the
analysis, because it is included in the decennial census and some other surveys.  

All of these methods (and the research on them reported here) involve the use of individual-level
records.  Analysis is limited to data collected using the separate questions for race and Hispanic
origin.  Under the new standards, when reporting is based on self-identification, the two-question
format is to be used; even in the case of observer identification, this is the preferred format.  It is
expected that some users will bridge to a distribution created using the combined format for the 
question on race and ethnicity.  Thus, bridging is analyzed both to the old racial distribution
arising from the use of two questions and to one based on a combined, single question.  At this time,
the analysis of bridging to the combined distribution has not been completed, but those results
will be included in the report when they become available.  Based on the research, the strengths
and weaknesses of each tabulation method are discussed.  Until all the analysis has been
completed, however, recommendations will not be made. 

B.  Methods for Bridging

The goal of developing bridging methodology for data on race is to identify a statistical model
that will take individuals' responses to the new questions on race and classify those responses as
closely as possible to the responses we hypothesize they would have given using the old single
race categories.  The task will be easier or more difficult depending on how an
individual identifies himself or herself under the new standards.  For bridging purposes,
individuals with only a single racial background are likely to identify as they did before, and no
statistical model is needed for bridging.  However, those with a mixed racial heritage who were
previously required to identify only one part of their background may, under the new standards,
choose to report more than one racial identity.  When a person identifies with more than one
racial group, some model will be necessary to translate those multiple responses into the one,
single response we hypothesize that the individual most likely would have reported under the old
standards.  

Framework.  Several different methods have been identified for creating a single race
distribution from data including multiple race responses.  These methods vary in both the
assumptions that are made and the procedures that are followed.  Before describing the particular
methods examined in this report, it is useful to describe some of their major underlying
characteristics.  

One major distinction among the methods is whether an individual's responses are assigned to a
single racial category (termed whole assignment) or to multiple categories (termed fractional
assignment).  Whole assignment can be based on a set of deterministic rules or based on some
probabilistic distribution.  For example, a deterministic rule might assign all White and American
Indian responses into the American Indian category, while a probabilistic rule might randomly
assign 60 percent of the White and American Indian responses into the American Indian
category, and 40 percent into the White category.  In the above example, it is unlikely that all
individuals identifying as White and American Indian under the new standards would have
previously identified as American Indian, so the deterministic rule will result in
misclassifications for all those people who had previously identified as White.  With a
probabilistic rule, an individual's responses are randomly assigned to either the American Indian
category or the White category (such as with 60 percent and 40 percent probabilities,
respectively, based on previously collected data).  However, even if the overall probabilities
matched exactly the aggregate distribution under the old standards, there is no guarantee that the
40 percent who were categorized as White would have classified themselves that way.  In fact, in
the worst case, all 40 percent who were classified as White would actually have identified as
American Indian under the old standards, and a corresponding percentage of those categorized as
American Indian would have identified as White.  
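
The worst case described above can be made concrete with a small constructed example.  The 100
records below are hypothetical and are arranged deliberately so that the assigned categories
match the old-standard totals exactly while disagreeing with as many individual records as
possible; the variable names are assumptions made for the illustration.

```python
from collections import Counter

# Hypothetical old-standard identities of 100 people who report
# White and American Indian (AIAN) under the new standards.
old_standard = ["AIAN"] * 60 + ["White"] * 40

# A whole assignment that reproduces the 60/40 split in the aggregate
# but places the "White" assignments on people who identified as AIAN.
assigned = ["White"] * 40 + ["AIAN"] * 60

aggregate_matches = Counter(old_standard) == Counter(assigned)
misclassified = sum(old != new for old, new in zip(old_standard, assigned))

print(aggregate_matches)   # the two distributions are identical
print(misclassified)       # number of individual records that disagree
```

In this construction the aggregate distribution is reproduced exactly, yet 80 of the 100
individual records are misclassified:  the 40 people assigned to White had identified as
American Indian, and 40 of those assigned to American Indian had identified as White.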

When fractional assignment is used, multiple race responses are categorized into more than one
category where each category receives a fraction of a count, and the sum of the fractions equals
one.  In the above examples of whole assignment, a person's responses were placed into one and
only one category, in an attempt to mimic the past.  An alternative is to use a deterministic rule to
assign some fraction of the multiple race responses to each of the racial categories identified.  For
example, a multiple response of White and American Indian might count as "one-half" in the
tabulations for American Indians and "one-half" in the tabulations for Whites.  These fractions,
like the probabilities in the earlier example, could be varied for different combinations of
multiple races to attempt to reflect how often people might identify with one group compared
with another.  

Bridge Tabulation Methods.  All of the bridge tabulation methods focus on the assignment of
the responses from individuals who identify with more than one racial group.  Responses from
individuals who identify with only a single racial group under the new standards are assumed to
have been the same under the old standards.  The response "Native Hawaiian or Pacific Islander"
is assigned to the old racial category of "Asian or Pacific Islander."  The specific methods for
assigning multiple race responses into single race categories are Deterministic Whole
Assignment, Deterministic Fractional Assignment, and Probabilistic Whole Assignment.

Two sets of results for each of the following tabulation methods are produced.  The first set
ignores the use of any auxiliary information other than that needed to carry out the particular
tabulation method.  The other set of results for each method uses the one piece of information
that is certain to be common to all data collections done following the new standards, that is,
ethnicity.  Thus, whether or not an individual is Hispanic is taken into account when a tabulation
method is used. 

(1) Deterministic whole assignment.  These methods use fixed, deterministic rules for assigning
multiple responses back to one and only one of the racial categories from the old standards.  Four
alternatives are examined.  The first (Smallest Group) assigns responses that include White and
another group to the other group, but responses with two or more racial groups other than White
are assigned into the group with the smallest number of individuals identifying that group as a
single race.  The second alternative (Largest Group Other Than White) assigns responses that
include White with some other racial group, to the other group, but responses with two or more
racial groups other than White are assigned into the group with the highest single-race count. 
The third alternative (Largest Group) assigns responses with two or more racial groups into the
group with the largest number of individuals as a single race.  In this latter case, any combination
with White is assigned to the White category, and combinations that do not include White are
assigned to the group with the largest single-race count. The fourth alternative (Plurality) assigns
responses based on data from the National Health Interview Survey (NHIS).  The NHIS has
permitted respondents to select more than one race for a number of years, with only the first two
responses captured.  However, respondents reporting more than one race were given a follow-up
question asking them to select the one race with which they most closely identify (called Main
Race here).  For these respondents, the proportion choosing each of the two possibilities as their
main race was calculated.  All responses in a particular multiple-race category using the Plurality
method are assigned to the group with the highest proportion of responses on the follow-up
question about main race. 
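
The four deterministic whole assignment rules can be sketched as follows.  The single-race
counts and the Main Race shares shown are hypothetical placeholders, not NHIS results, and the
function names are assumptions made for the illustration.  The treatment of responses that
include White plus two or more other groups (White is set aside and the rule is applied to the
remaining groups) is one reading of the rules as described.

```python
# Hypothetical single-race counts; real counts would come from the data being bridged.
SINGLE_RACE_COUNTS = {"White": 8000, "Black": 1200, "API": 400, "AIAN": 100}

# Hypothetical shares choosing each group as Main Race, by reported combination.
MAIN_RACE_SHARES = {
    frozenset({"White", "AIAN"}): {"White": 0.8, "AIAN": 0.2},
    frozenset({"Black", "API"}): {"Black": 0.6, "API": 0.4},
}


def _non_white(groups):
    """Groups in a multiple-race response other than White."""
    return [g for g in groups if g != "White"]


def smallest_group(groups, counts=SINGLE_RACE_COUNTS):
    """Assign to the non-White group with the smallest single-race count."""
    return min(_non_white(groups), key=counts.__getitem__)


def largest_group_other_than_white(groups, counts=SINGLE_RACE_COUNTS):
    """Assign to the non-White group with the largest single-race count."""
    return max(_non_white(groups), key=counts.__getitem__)


def largest_group(groups, counts=SINGLE_RACE_COUNTS):
    """Assign any combination with White to White; otherwise to the largest group."""
    if "White" in groups:
        return "White"
    return max(groups, key=counts.__getitem__)


def plurality(groups, shares=MAIN_RACE_SHARES):
    """Assign to the group most often chosen as Main Race for this combination."""
    combo_shares = shares[frozenset(groups)]
    return max(combo_shares, key=combo_shares.__getitem__)


response = ("White", "AIAN")
print(smallest_group(response), largest_group(response), plurality(response))
```

Each rule returns exactly one old-standard category, so the bridged tabulation always sums to
the number of respondents.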

(2) Deterministic fractional assignment.  These methods use fixed, deterministic rules for
fractional weighting of multiple-race responses, that is, assigning a fraction to each one of the
individual racial categories that are identified.  These fractions must sum to 1.  Two alternatives
are examined.  The first (Deterministic Equal Fractions) assigns each of the multiple responses in
equal fractions to each racial group identified.  Thus, responses with two racial groups are
assigned half to each group; those with three groups are assigned one-third to each, etc.  The
second alternative (Deterministic NHIS Fractions) assigns responses by fractions to each racial
group identified, with the fractions drawn from empirical results from the NHIS (as described
above). 
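
A minimal sketch of fractional tabulation follows.  The function name, the sample responses,
and the fractions supplied for the White/AIAN combination are hypothetical; in particular, the
fractions are not NHIS results.  Falling back to equal fractions when no empirical fractions
are supplied for a combination is an assumption made here for the illustration.

```python
from collections import defaultdict
from fractions import Fraction


def fractional_tabulation(responses, fractions_by_combo=None):
    """Tabulate responses, splitting each multiple-race response across its groups.

    fractions_by_combo maps a frozenset of groups to {group: fraction}; any
    combination not listed is split into equal fractions.
    """
    fractions_by_combo = fractions_by_combo or {}
    totals = defaultdict(Fraction)
    for groups in responses:
        combo = frozenset(groups)
        weights = fractions_by_combo.get(combo)
        if weights is None:
            weights = {g: Fraction(1, len(combo)) for g in combo}
        assert sum(weights.values()) == 1  # fractions must sum to 1
        for group, weight in weights.items():
            totals[group] += weight
    return dict(totals)


responses = [
    ("White",),
    ("White", "AIAN"),
    ("White", "AIAN"),
    ("Black", "API", "AIAN"),
]

equal = fractional_tabulation(responses)

# Hypothetical empirical fractions for one combination.
empirical = {frozenset({"White", "AIAN"}): {"White": Fraction(4, 5), "AIAN": Fraction(1, 5)}}
weighted = fractional_tabulation(responses, empirical)

print(equal)
print(weighted)
```

Because every response contributes fractions that sum to one, the tabulated total equals the
number of respondents under either set of fractions.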

(3) Probabilistic whole assignment.  These methods use probabilistic rules for assigning
multiple race responses back to one and only one of the previous racial categories.  Two
alternatives are examined.  These parallel the two alternatives discussed under Deterministic
Fractional Assignment, except that, for a given set of fractions, the response is assigned to only
one racial category.  The fractions specify the probabilities used to select a particular category. 
The first alternative uses equal selection probabilities.  The second uses the NHIS fractions
where possible, and equal fractions when no information is available from NHIS.  Probabilistic
Whole Assignment will yield, on average, nearly the same population counts as Deterministic
Fractional Assignment.  Only the results from Deterministic Fractional Assignment are
presented in this report.  In practice, there would be a difference between Deterministic
Fractional Assignment and Probabilistic Whole Assignment when computing variances for
tabulated estimates, and the two methods will yield relatively small differences in distributions
for respondent characteristics.  In general, Probabilistic Whole Assignment would yield a higher
estimated variance than the Deterministic Fractional approach, with the variances for both
methods underestimating the true variance.  Probabilistic methods which incorporate a "Multiple
Imputation" statistical technique would result in an unbiased estimate of variance, but at the price
of being more difficult to implement (see Rubin 1987).
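
The relationship between the two approaches can be illustrated with a small simulation.  The
sketch below assumes 1,000 hypothetical respondents reporting the same two-race combination
and a 60 percent probability of assignment to the first group; these figures and the function
name are assumptions, and the simulation covers only the variability introduced by the random
assignment itself, not sampling variance.

```python
import random
import statistics


def probabilistic_counts(n_multiple, probability, draws, seed=0):
    """Count of responses assigned to the first group in each repeated random assignment."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < probability for _ in range(n_multiple))
        for _ in range(draws)
    ]


n_multiple, probability = 1000, 0.6

# Deterministic fractional assignment gives the same count every time.
fractional_count = n_multiple * probability

counts = probabilistic_counts(n_multiple, probability, draws=200)
mean_count = statistics.mean(counts)
spread = statistics.pstdev(counts)

print(fractional_count, mean_count, spread)
```

Averaged over repeated assignments, the probabilistic count is close to the fractional count,
but any single assignment differs from it by a random amount, which is the source of the
additional variability associated with Probabilistic Whole Assignment.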

(4) All Inclusive.  A final tabulation method considered is termed the "All Inclusive" method. 
Under this method all responses are used.  Responses are assigned to each of the categories that
an individual selects.  The sum of the categories totals more than 100 percent.  
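
A short sketch shows why the All Inclusive percentages exceed 100.  The four responses and the
function name are hypothetical and serve only to illustrate the arithmetic.

```python
from collections import Counter


def all_inclusive_percentages(responses):
    """Percentage of respondents counted in each category they selected."""
    counts = Counter(group for groups in responses for group in set(groups))
    total = len(responses)
    return {group: 100 * count / total for group, count in counts.items()}


responses = [
    ("White",),
    ("White", "AIAN"),
    ("Black",),
    ("Black", "API"),
]

percentages = all_inclusive_percentages(responses)
print(percentages)
print(sum(percentages.values()))
```

Each multiple-race respondent is counted once in every category selected, so the excess over
100 percent reflects the extent of multiple-race reporting.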

C.  Methods of Evaluation 

Data Sources

National Health Interview Survey.  The NHIS is a continuing nationwide sample survey
designed to measure the health status of residents of the United States (Benson and Marano,
1995; Massey et al., 1989).  The analysis here uses data from an analytic file that contains three
years of NHIS data (1993, 1994, and 1995).  For each of these years there were about 45,000
households interviewed, resulting in slightly more than 100,000 individuals per year.  The total
sample for the bridge analysis is 323,080 (5,237 respondents did not provide data on race).

Since 1976, the NHIS has allowed respondents to choose more than one racial category.  As the
respondent is handed a card with numbered racial categories, the interviewer asks, "What is the
number of the group or groups that represent your race?"  If a respondent selects more than one
category, the interviewer then asks,  "Which of those groups would you say best describes your
race?" 

Although the listed racial groups have changed over time, for 1993 to 1995, the card shown to
respondents included 16 separate racial categories (white, black, American Indian, Aleut,
Eskimo, Chinese, Filipino, Hawaiian, Korean, Vietnamese, Japanese, Asian Indian, Samoan,
Guamanian, and other Asian and Pacific Islander).  Although not on the flashcard, respondents
were allowed to give an "other" race response.  To be consistent, the 16 groups were collapsed to
the four previous racial categories:  White, Black, American Indian or Alaskan Native (AIAN),
and Asian or Pacific Islander (API), plus Other.  

For this analysis, a variable called Detailed Race was created from responses to the first question,
which allowed identification with more than one racial group.  This information is not included
on public use data files of the NHIS.  However, on internal files, the first two race groups
mentioned are recorded for each observation.  Even if a respondent selected more than two
groups, only two were recorded on the intermediate file.  From the two recorded racial responses,
Detailed Race was coded into five single race groups (White, Black, AIAN, API, Other) and 11
multiple race groups (White/Black, White/AIAN, White/API, White/Other, Black/AIAN,
Black/API, Black/Other, AIAN/API, AIAN/Other, and API/Other).  For most analyses, multiple
race combinations that had insufficient numbers were aggregated into the category "Other
Combinations."  Individuals who had two racial groups recorded for Detailed Race but a third
group recorded for the "group that best describes race" were coded into "Other Combinations."

The Main Race variable, used as a reference point representing the racial distribution under the
old standards, is primarily derived from Detailed Race and the responses to the second question,
which asks the respondent for the group that best describes his/her race (Benson and Marano,
1995).  For respondents who selected one Detailed Race group, Main Race is the same as
Detailed Race.  For respondents who selected more than one racial group, Main Race is the one
group reported as best describing their race.  Some respondents who had chosen more than one
race for the Detailed Race question responded as "Multiple race" or "Other" for the Main Race
question.  For this analysis, these responses were combined into the "Other" category.  Categories
for Main Race were White, Black, AIAN, API, and Other.
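
The derivation of Main Race described above can be sketched as a simple recode.  The function
name and argument names are assumptions made for the illustration, and the sketch does not
cover cases the text does not describe, such as a missing answer to the follow-up question.

```python
def main_race(detailed_groups, best_describes=None):
    """Derive Main Race from the Detailed Race groups and the follow-up answer.

    detailed_groups: the one or two collapsed groups recorded for Detailed Race.
    best_describes: the answer to the follow-up question, if it was asked.
    """
    groups = list(detailed_groups)
    if len(groups) == 1:
        # Single-race respondents keep their Detailed Race group.
        return groups[0]
    if best_describes in ("Multiple race", "Other"):
        # These follow-up answers were combined into the "Other" category.
        return "Other"
    return best_describes


print(main_race(["Black"]))
print(main_race(["White", "AIAN"], "AIAN"))
print(main_race(["White", "Black"], "Multiple race"))
```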

May 1995 Supplement on Race and Ethnicity to the Current Population Survey (CPS).  
The May 1995 CPS Supplement was one in a series of studies conducted for the Federal
agencies' review of the standards for data on race and ethnicity.  The Supplement was designed
to address the following issues:  (1) the effect of having a "multiracial" race category among the
list of races; (2) the effect of adding "Hispanic" to the list of racial categories; and (3) the
preferences for alternative names for racial and ethnic categories (e.g., African-American for
Black, and Latino for Hispanic).  The Supplement was organized into four panels representing a
two-by-two experimental design for studying the first and second issues outlined above.  Each
panel was given to one-fourth of the sample, or about 15,000 households (30,000 individuals). 
All respondents in a household received the same set of questions; household members 15 years
and older were asked to respond for themselves, and parents answered for children under 15.  

Only two of the panels in the CPS Supplement permitted respondents to report in a multiracial
category (panels 2 and 4), and only one panel had separate race and Hispanic origin questions
(panel 2) as ultimately recommended in the new standards.  Therefore, panel 2 data were used to
analyze the effects of the different tabulation methods for the two-question format.  The smaller
sample (about 30,000 observations) hampers analysis and generalizations when the focus is on
the small portion of the sample (about 1 percent) who identified as "multiracial."  

There are additional limitations to these data for evaluating the bridging methods.  The option
respondents were given to identify multiple races in the CPS Supplement was a multiracial
category with a follow-up question asking respondents to indicate all the racial groups with
which they identified.  The new standards allow people to identify directly with all the racial
groups they choose and do not include a "multiracial" category.  Furthermore, a large percentage
of individuals who chose the multiracial category in panel 2 of the Supplement did not specify
more than one racial group (see Tucker et al., 1996).  For purposes of this evaluation, individuals
were classified as belonging to the specific racial categories they identified.  Those who
identified as being multiracial but then did not give two or more specific racial groups were
reclassified in the one racial category they gave.  Thus, the distribution of the CPS Supplement
data reported here differs from that which was published in earlier reports, which classified as
multiracial any person who identified with the multiracial category even if they only specified
one racial group.  This new distribution is referred to here as the "Edited Distribution."

This edited distribution was used with the various tabulation methods.  As in NHIS, the resulting
distributions were compared to a reference distribution based on the respondents' original
answers (in the first CPS interview) to the race question that followed the old standards.   

1998 Washington State Population Survey.  The 1998 Washington State Population Survey
(WSPS) was designed to provide information on Washington residents between decennial
censuses.  The survey collected data on employment, income, education, and health, along with
basic demographic information.  The WSPS was done by telephone and included 7,279
households with telephones.  Blacks, Asians, Hispanics, and American Indians were
oversampled.  The designated respondent was the individual with the greatest knowledge about the
household.  The respondent weights reflect this oversampling and, thus, results are
representative of the Washington population as a whole.  The response rate for the entire sample
was between 50 and 60 percent.  

Information about the race of the respondent was collected twice during the course of the
interview.  At the beginning of the survey, the respondent was asked, "Are you of Hispanic
origin?"  Following that question, the respondent was asked, "What is your race?"  The categories
were the ones appearing under the old standards, but the order was as follows:  Black; American
Indian, Aleut, or Eskimo; Asian or Pacific Islander; and White.  An "Other" category also was
allowed, and the interviewer recorded the verbatim response on a "specify" line.  Near the end of
the survey, the respondent was asked race questions conforming to the new standards.  Besides
the same Hispanic origin question, the respondent was asked to specify country of origin.  For
race, the respondent was asked to select one or more categories.  This time the ordering of the
categories was White; Black or African American (or Haitian or Negro); American Indian or
Alaska Native; Native Hawaiian or Other Pacific Islander; Asian.  Again, an "Other" category
was provided.  There also was a follow-up question for Asian respondents to specify country of
origin. 

The results from the race question at the end of the survey were used with the tabulation
methods.  The reference distribution came from the answers to the original race question.

Advantages and Disadvantages of These Data Sources

Only the Washington State data closely resemble the way the question on race will be asked
under the new standards.  Yet, all three can offer insights into the relationship between how
individuals will actually respond to the new question on race and how they responded to the
question under the old standards.  The NHIS and the CPS Supplement are nationally
representative, and the Washington State data serve as an example for evaluating the tabulation
methods at the state level.  Simulations using 1990 census data also were conducted, but the
results differed little from those for the other data sets.  At this point, it is believed that an
analysis of data from the 1998 Dress Rehearsal for Census 2000 would be of greater utility. 
Furthermore, the Dress Rehearsal data will provide examples of the effects of the new standards
at the local level.  Thus, this analysis will be included in the next version of this report.    

Description of New Analyses

The analyses concentrated on the bridge tabulation methods.  These analyses can be divided into
three broad areas:  (1) descriptions of racial distributions under the alternative bridging tabulation
methods; (2) rates of racial "misclassification" for these alternatives; and (3) sensitivity of
outcome measures to the bridging alternatives.

Distribution of Race.  For the first phase of the analysis (using the NHIS, the CPS Supplement,
and the data from Washington State), the distributions of race under the allocation alternatives
described previously were calculated:  All Inclusive, Deterministic Whole Allocation (Smallest
Group, Largest Group Other Than White, Largest Group, and Plurality) and Fractional
Allocation (Equal Fractions and NHIS Fractions).  These new distributions were compared to the
reference distribution in each data set.  At this time, it is unknown what percentage of people in
the United States will identify with more than one racial group when given the opportunity to do
so in Census 2000 and in subsequent surveys.  For purposes of illustrating the effects of a greater
proportion of individuals identifying more than one racial background, analyses were conducted
increasing the proportion of multiple race responses two-, four-, six- and eight-fold using the
NHIS, the CPS Supplement, and the Washington State micro data sources.  The racial
distributions were compared using each of the tabulation methods to see effects with increasing
levels of reporting more than one race.  Of necessity, these tabulations assume that the increases
are the same across the different combinations of more than one race.  The accuracy of this
assumption cannot be tested.  The purpose of these analyses is not to attempt to make accurate
predictions about the extent of multiple race reporting or its composition, but rather to see more
clearly possible differences among tabulation methods that may only become apparent with a
greater percentage of more than one race reporting. 
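
As an illustration only, the following Python sketch applies four of the allocation alternatives
to a small made-up set of responses.  The data, the function names, and the rule that "largest"
and "smallest" are judged by single-race counts in the same data set are assumptions of this
sketch; the definitions given earlier in this report govern.  The Plurality and NHIS Fractions
methods are omitted because they depend on NHIS reference proportions not reproduced here.

```python
from collections import Counter

# Hypothetical toy responses under the new standards: each tuple lists the
# categories one respondent selected (made-up data, not survey results).
responses = (
    [("White",)] * 80
    + [("Black",)] * 12
    + [("API",)] * 4
    + [("AIAN",)] * 1
    + [("White", "AIAN")] * 2
    + [("Black", "API")] * 1
)

# Assumption: group size is measured by the single-race counts in this data set.
single = Counter(r[0] for r in responses if len(r) == 1)


def all_inclusive(resps):
    """Count every selected category; totals can exceed the number of people."""
    return Counter(race for r in resps for race in r)


def equal_fractions(resps):
    """Split each person equally across the categories selected."""
    out = Counter()
    for r in resps:
        for race in r:
            out[race] += 1 / len(r)
    return out


def largest_group(r):
    """Pick the selected category with the largest single-race count."""
    return max(r, key=lambda race: single[race])


def smallest_group(r):
    """Pick the selected category with the smallest single-race count."""
    return min(r, key=lambda race: single[race])


def whole(resps, pick):
    """Assign each person to exactly one selected category chosen by `pick`."""
    return Counter(pick(r) for r in resps)


def percents(counts, n):
    """Express counts as percentages of the n respondents."""
    return {race: round(100 * c / n, 2) for race, c in sorted(counts.items())}


n = len(responses)
for name, counts in [
    ("All Inclusive", all_inclusive(responses)),
    ("Equal Fractions", equal_fractions(responses)),
    ("Largest Group", whole(responses, largest_group)),
    ("Smallest Group", whole(responses, smallest_group)),
]:
    print(name, percents(counts, n))
```

In this toy example only the All Inclusive percentages add to more than 100, and the small
categories move the most from one method to another.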

Misclassification of Race.  Besides evaluating the overall racial distributions produced by the
tabulation methods, the misclassification of individuals also needs to be examined.  For the
NHIS, the CPS Supplement, and the Washington State survey, these misclassification rates were
formed by comparing an individual's answer to the race question under the old standards to the
assigned category of the individual's response(s) to the race question under the new standards
using each of the tabulation methods.  The misclassification rate and its standard error for each
race by tabulation method were produced.     
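
A minimal sketch of such a rate follows, under stated assumptions.  The paired records are
made up, the base for each rate is taken to be the people who reported that race under the old
standards, and the standard error uses the simple random sampling formula.  The surveys
discussed here have complex sample designs, so their published standard errors would be
computed differently.

```python
import math

# Hypothetical paired records: (race reported under the old standards,
# category assigned by a bridging method from the new-standards response).
pairs = [
    ("White", "White"), ("White", "White"), ("White", "White"),
    ("Black", "Black"), ("Black", "Black"), ("Black", "White"),
    ("AIAN", "White"), ("AIAN", "AIAN"),
]


def misclassification(pairs, race):
    """Return (rate, standard error) for one old-standards race category.

    Assumption: the base is everyone whose old-standards answer is `race`,
    and the standard error treats the sample as a simple random sample.
    """
    assigned = [new for old, new in pairs if old == race]
    n = len(assigned)
    rate = sum(new != race for new in assigned) / n
    se = math.sqrt(rate * (1 - rate) / n)
    return rate, se


for race in ("White", "Black", "AIAN"):
    rate, se = misclassification(pairs, race)
    print(f"{race:6s} rate={rate:.3f} se={se:.3f}")
```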

Preliminary Outcomes Assessment.  In the last phase of the analysis, the impact of
multiple-race reporting on outcome measures was assessed.  This process is important because users in
many of the Federal agencies are not typically examining race distributions, but rather trends and
indicators for the Nation (e.g., health outcomes, economic well-being, educational attainment)
across racial groups.  This is where the majority of work will need to be done within individual
agencies as the new standards are implemented.  An initial examination of how common
statistics could be affected by reporting of more than one race was conducted.  Five outcome
measures were examined, three from the NHIS and two from the CPS Supplement.  From the
NHIS, three routine health outcomes were calculated:  percent of respondents in poor or fair
health, percent of children living with a single mother, and percent of respondents with no health
insurance.  From the CPS Supplement, the proportions of respondents who were unemployed and
the labor force participation rates for different racial groups were calculated.  These estimates
based on the bridging alternatives are not meant to be precise measures of these factors, but are
used to demonstrate the possible impact reporting of multiple races and the tabulation methods
may have on these and similar estimates. 
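
To show how a bridging alternative feeds into an outcome estimate, the sketch below computes a
percent-with-no-health-insurance figure by race under equal fractional assignment.  The
records and the function name are hypothetical, and survey weights are ignored; it is meant
only to show the mechanics, not to reproduce any estimate in this report.

```python
from collections import defaultdict

# Hypothetical records: (categories selected, True if the person has no insurance).
people = [
    (("White",), False), (("White",), False), (("White",), True),
    (("Black",), True), (("Black",), False),
    (("White", "AIAN"), True), (("AIAN",), False),
]


def uninsured_rate_equal_fractions(people):
    """Percent uninsured by race, with each person split equally across
    the categories selected (equal fractional assignment)."""
    num = defaultdict(float)  # fractional count of uninsured people
    den = defaultdict(float)  # fractional count of all people
    for races, uninsured in people:
        share = 1 / len(races)
        for race in races:
            den[race] += share
            num[race] += share * uninsured
    return {race: 100 * num[race] / den[race] for race in den}


for race, pct in sorted(uninsured_rate_equal_fractions(people).items()):
    print(f"{race:6s} {pct:.1f}%")
```

A whole assignment method would use the same loop with a share of 1 for the single assigned
category, which is one way to see why small categories are the most sensitive to the choice
of method.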

D.  Examination of the Results with Respect to the Evaluation Criteria 

Bridging to the past will be needed for measuring change in a variety of circumstances.   Besides
measuring population growth, any number of economic, social, and health outcomes must be
monitored.  This work will involve different population groups at different levels of geography. 
As a first step toward providing the information users will need to make informed decisions
about the methods, the strengths and weaknesses of the bridging methods with respect to the
evaluation criteria outlined at the beginning of this report are discussed, based on the results of
the statistical analyses conducted.  The details of these statistical analyses can be found in
Appendix D.

Measure Change Over Time.  As indicated earlier, measuring change over time is the criterion
that is of greatest importance in evaluating the bridging methods.  The first and second phases of
the analysis shed light on the performance of the various methods in this area.  In essence, an
ideal bridging method in this case is one that not only accurately recreates the population
distribution under the old standards such that the only difference remaining is a function of true
change over time, but also assigns an individual's response to the old category that would have
been chosen.  The methodology used in these studies allows users, within limits, to see how well
the bridging methods using racial data collected under the new standards can match data from the
same respondents collected (at about the same time) under the old standards.  To the extent that
there is a match, any change that would occur from this point forward would indicate true
change.  If the match is poor, it is not possible to isolate the true change.
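
One simple way to summarize how well a bridged distribution matches its reference is sketched
below.  The two distributions are made-up numbers, and the two gap measures (percentage-point
difference and difference relative to the reference value) are illustrative choices of this
sketch rather than the measures used in the analyses.

```python
# Hypothetical percent distributions (made-up numbers, not report results).
reference = {"White": 82.0, "Black": 12.5, "API": 4.0, "AIAN": 1.5}
bridged = {"White": 81.6, "Black": 12.6, "API": 4.1, "AIAN": 1.7}


def point_gaps(bridged, reference):
    """Bridged minus reference, in percentage points, for each category."""
    return {race: bridged[race] - reference[race] for race in reference}


def relative_gaps(bridged, reference):
    """The same gaps expressed as a percent of the reference value."""
    return {
        race: 100 * (bridged[race] - reference[race]) / reference[race]
        for race in reference
    }


points = point_gaps(bridged, reference)
relative = relative_gaps(bridged, reference)
for race in reference:
    print(f"{race:6s} {points[race]:+.2f} points  {relative[race]:+.1f}% of reference")
```

With these numbers the White category has the largest gap in percentage points but the
smallest relative gap, while the AIAN category shows the reverse.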

When comparing the different methods to their reference distributions, the racial categories that
were most sensitive to which method is chosen were the numerically small ones, particularly the
AIAN category.  While different data sets were used in each study and the racial questions were
not the same, the studies indicate that the Largest Group Deterministic Whole Assignment
method, the Plurality method, and the two Deterministic Fractional Assignment methods produce
distributions closer to the reference distributions than do the other Deterministic Whole
Assignment methods and the All Inclusive method.  Controlling for ethnicity had no effect on
these results.  One reason the Largest Group Assignment method results are so close is that it has
little effect on the smaller races, because most assignments are made to Black or White, and the
percentages for these two races are so large that the relatively small increase they receive is not
noticeable.  The Plurality method produces a good fit, because it makes assignments at the level
of specific racial combinations.  The performance of the NHIS Fractional Assignment method
can be discounted to a degree in the NHIS study because the analysis is somewhat circular;
however, the results from the CPS Supplement and the Washington State Population Survey
(WSPS) show this method yields a relatively close match.  The Equal Fractional Assignment
method produces a reasonable match in these studies.  The primary reason that the other two
Whole Assignment methods and the All Inclusive method do not perform as well is that they
alter the White percentage to some extent and substantially increase the percentage in the AIAN
category.

In the case of misclassification rates, some contradictory results emerge.  While the AIAN and
"Other" categories have high misclassification rates across all tabulation methods in the CPS
Supplement, the same is not true for the other two surveys.  The Smallest Group Whole
Assignment and the Largest Group Other Than White Whole Assignment methods produce the
most comparable results for the AIAN category in both surveys and for the "Other" category in
the WSPS; however, these methods have higher overall misclassification rates.  Both the CPS
Supplement and the WSPS have large misclassification rates for these two categories when using
many of the tabulation methods.  

When the distributions of the outcome variables are examined, all methods produce comparable
and relatively close matches for all health outcomes.  For the AIAN unemployment rate, the
Largest Group Whole Assignment method and the NHIS Fractional Assignment method appear
to produce the least comparable results, but none of the differences are significant.  There are
significant differences in the AIAN labor force participation rates for several of the tabulation
methods.  It is likely that which method is best at matching a reference distribution for outcome
measures will depend on the outcome being examined.  Unfortunately, the data to assess the best
tabulation method for each outcome may never be readily available. 

All of these conclusions should be viewed with caution.  Many assumptions had to be made in
these studies.  It is unclear how people will respond to the new racial question in the future, and
these responses could differ by mode of data collection and with the subject of the survey. 
Furthermore, most of this work on developing bridging methods relied on sample data, and small
samples at that.         

Congruence with Respondent's Choice.  This criterion concerns how well the full range of the
respondent's choices is represented in the racial distribution.  It is more important for evaluating
ongoing tabulations under the new standards, but the bridging methods can be differentiated with
respect to this criterion, too.  None of the Deterministic Whole Assignment methods take into
account the full range of the respondent's selections, but the Plurality method at least controls for
the particular racial combination chosen by the respondent under the new standards.  The All
Inclusive method accurately reflects all selections by tabulating actual responses and not people. 
The Equal Fraction Assignment method tabulates people, but, like the All Inclusive method,
treats all responses equally.  The NHIS Fractional Assignment method takes all responses into
account, but assignment is based on attempting to estimate in which single-race category the
respondent would prefer to be counted. 

Range of Applicability.  This criterion refers to how well the bridging method can be applied in
different contexts. The All Inclusive method provides the same results in every context, because
assignment does not depend on the particular detailed racial distribution.  This method is not
suitable for users who need a distribution that adds to 100 percent.  Of the Deterministic Whole
Assignment methods, the Largest Group Assignment method is the least sensitive to context and
can be used in a wide variety of applications.  The other Deterministic Whole Assignment
methods are as easy to use as the Largest Group Whole Assignment method, but the results for
the small racial categories will vary to a greater extent with the context, particularly according to
level of geography.  The Equal Fraction Assignment method is as generalizable as the All
Inclusive method, but it is not quite as easy to use.  The NHIS Fractional Assignment method
and the Plurality method may be the most problematic, because they currently only represent a
national preference distribution based on data from 1993 to 1995.  The use of this distribution at
the local level would be likely to produce inaccurate results in a number of cases.  That is not to
say that the other methods do not face the same problem.

Meet Confidentiality and Reliability Standards.  Because these methods all attempt to
reproduce the racial categories under the old standards, the same confidentiality problems that
existed over the last 20 years will continue to exist.  No increase in problems is anticipated.  In
the case of reliability, however, the situation is different.  The All Inclusive method will not
produce less reliable data than data produced under the old standards.  The Equal Fraction
Assignment method may have reliability problems as a result of only adding fractional counts to
some of the smaller categories if these categories have a high probability of being chosen as the
preferred single race.  The same would be true if equal fractions were used to make whole
assignments.  In sample surveys, the Deterministic Whole Assignment methods will have
reliability problems to the extent that there is a large variance on the individual race proportions. 
This is likely to occur when small samples are involved.  The Largest Group Whole Assignment
method should have the fewest problems with respect to reliability, and the Smallest Group
Whole Assignment method will likely have the most.  These methods have another problem,
however, in that an individual's response may be assigned to different categories at different
levels of geography.  The NHIS Fractional Assignment method, as well as methods where
fractions are used for whole assignment (i.e., the Plurality method), is based upon a sample
distribution with its own variance properties.  Reliability for the very small combinations will be
quite bad unless many years of data are combined, and this presents its own problems.
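
The dependence of whole assignment on the level of geography can be seen in a short sketch. 
The counts below are hypothetical, and the rule that the smallest group is the selected
category with the fewest single-race respondents in the area being tabulated is an assumption
of this sketch.

```python
# Hypothetical single-race counts at two levels of geography.
national = {"White": 190_000, "Black": 30_000, "API": 9_000, "AIAN": 2_000}
county = {"White": 500, "Black": 40, "API": 10, "AIAN": 60}


def smallest_group(races, single_counts):
    """Assign a response to the selected category with the fewest
    single-race respondents in the area being tabulated."""
    return min(races, key=lambda race: single_counts[race])


response = ("API", "AIAN")
print("national:", smallest_group(response, national))
print("county:  ", smallest_group(response, county))
```

The same response is assigned to AIAN in the national tabulation and to API in the county
tabulation, because the relative sizes of the two groups are reversed in the county.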

Minimize Disruptions to the Single Race Distributions.  This criterion is only relevant for
evaluation of bridging methods.  Its purpose is to see how different the resulting bridge
distribution is from the single-race distribution for detailed race under the new standards.  To the
extent that a bridging method can meet the other bridging criteria and still not differ substantially
from the single-race proportions in the ongoing distribution, it will have value for looking both
forward and backward in time.  An evaluation of the different methods according to this criterion
involves the comparison of the bridge distributions to the detailed race distribution under the new
standards in each case. 

For the CPS Supplement, the Plurality method is marginally closer than the Largest Group
Whole Assignment method and the Fractional methods.  While the All Inclusive method and the
other Deterministic Whole Assignment methods match for the White category, they differ
substantially from the single-race AIAN category in the detailed distribution and are marginally
worse for the API category.  The NHIS Fractional method is the closest in both the NHIS and
WSPS.

Statistically Defensible.  To be statistically defensible, the bridging method must conform to
acceptable statistical conventions.  The All Inclusive method makes no assumption about how
respondents would assign themselves in the single race situation.  The NHIS Fractional
Assignment method and the Plurality method are based on an observed distribution, and, to that
extent, involve less judgment than the rest of the methods that assign people and not responses. 
While the Equal Fractional Assignment method is based on judgment, it does not make
assumptions about the relative importance of any given race.  The Largest Group Whole
Assignment method does assign greater importance to one of the races, but it also follows a
common statistical practice, albeit a different one from the equal fraction approach.  Both attempt to
minimize the error in assignment.  The Smallest Group Whole Assignment method and the
Largest Group Other Than White Whole Assignment method do not follow statistical practice,
but, instead, rely on the historical record of discrimination; even in these cases, however, the
assigned category is based on an observed distribution.

Ease of Use.  "Ease of use" refers to how complicated it is to produce the bridge results.  The
Equal Fractional Assignment method makes assignments that do not depend on the particular
detailed racial distribution at hand.  It and the NHIS Fractional Assignment method do require
the duplication of individual records or the creation, on every record, of a variable for each racial
category under the old standards in order to be able to assign fractions for any combination of
categories.  If the fractional methods are used to assign a respondent to a single category (whole
probabilistic methods), this cumbersome process can be avoided.  The All Inclusive method, like
the Equal Fractional method, does not depend on the particular distribution, but it does produce
proportions that add to more than 100 percent unless they are raked or repercentaged to a base of
100 percent each time. The Deterministic Whole Assignment methods and the NHIS Fractional
method would require an extra step unless only national figures are used, because the relative
size of the groups must be determined for each detailed distribution.  Otherwise, they are as easy
to use as the whole probabilistic methods.
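
Two of the operations mentioned above can be sketched briefly.  The first repercentages All
Inclusive results to a base of 100 percent by simple rescaling (raking to outside control
totals is not shown).  The second is a whole probabilistic assignment, in which each
respondent is assigned to one category drawn at random in proportion to a set of fractions. 
The percentages, the fractions table, and the function names are hypothetical.

```python
import random

# Hypothetical All Inclusive percentages; they add to 103, not 100.
all_inclusive_pct = {"White": 82.0, "Black": 13.0, "API": 5.0, "AIAN": 3.0}


def repercentage(pcts):
    """Rescale percentages so that they add to 100."""
    total = sum(pcts.values())
    return {race: 100 * p / total for race, p in pcts.items()}


# Hypothetical fractions for one combination, keyed by the sorted categories.
# Combinations not listed fall back to equal fractions.
fractions = {("AIAN", "White"): {"White": 0.8, "AIAN": 0.2}}


def assign_whole(races, rng):
    """Draw a single category for one respondent using the fractions as
    probabilities, so no record has to be duplicated or split."""
    shares = fractions.get(tuple(sorted(races)))
    if shares is None:
        shares = {race: 1 / len(races) for race in races}
    categories = list(shares)
    weights = [shares[c] for c in categories]
    return rng.choices(categories, weights=weights)[0]


rng = random.Random(1999)  # fixed seed so the draws can be repeated
print(repercentage(all_inclusive_pct))
print([assign_whole(("White", "AIAN"), rng) for _ in range(5)])
```

Because the draw is random, repeated runs with different seeds will assign some individuals
differently, although the expected distribution matches the fractional one.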

Skill Required.  This criterion refers to the skills required to carry out the bridge operations. 
The amount of computer expertise needed to perform the operations associated with each of these
methods is minimal.  The Deterministic Whole Assignment methods require almost no
statistical knowledge.  Some familiarity with the statistical adjustment literature would be useful
for understanding the Deterministic Fractional Assignment procedures.  If the All Inclusive
method were used, users might need to understand statistical raking.

Understandability and Communicability.  This criterion concerns how easily the methods can
be explained and understood by the average user.  The Deterministic Whole Assignment methods
are both easy to explain and easy to understand.  The fractional assignment of individuals to a
single category also is not difficult to follow.  Assigning fractions of a person to different
categories may be easy to explain, but the average user may find it difficult to accept the idea. 
The All Inclusive method also is easily explained, but, unless the percentages are raked to 100
percent, users may have a problem understanding how to use the results.


                           References

Benson, V. and Marano, M. (1995), "Current Estimates from the National Health Interview
     Survey, 1994," National Center for Health Statistics, Vital and Health Statistics, 10(193).

Massey, J. T., Moore, T. F., Parsons, V. L., and Tadros, W. (1989), "Design and Estimation for
     the National Health Interview Survey, 1985-1994," National Center for Health Statistics,
     Vital and Health Statistics, 2(110).

Rubin, D. B. (1987), Multiple Imputation for Nonresponse in Surveys, New York: Wiley.

Tucker, C., McKay, R., Kojetin, B., Harrison, R., de la Puente, M., Stinson, L., and Robison, E.
     (1996), "Testing Methods of Collecting Racial and Ethnic Information: Results of the
     Current Population Survey Supplement on Race and Ethnicity," Bureau of Labor
     Statistics Statistical Notes, No. 40.