EXECUTIVE OFFICE OF THE PRESIDENT OFFICE OF MANAGEMENT AND BUDGET WASHINGTON, D.C. 20503 February 17, 1999 DRAFT PROVISIONAL GUIDANCE ON THE IMPLEMENTATION OF THE 1997 STANDARDS FOR THE COLLECTION OF FEDERAL DATA ON RACE AND ETHNICITY NOTE FOR READERS As a follow-on to OMB's October 1997 announcement of revised government-wide standards for the collection of data on race and ethnicity, the Tabulation Working Group of the Interagency Committee for the Review of Standards for Data on Race and Ethnicity has recently issued a report, "Draft Provisional Guidance on the Implementation of the 1997 Standards for the Collection of Federal Data on Race and Ethnicity." This guidance, which has been developed with the involvement of many Federal agencies, essentially was requested by those agencies and the many users of data on race and ethnicity. The guidance focuses on three areas: collecting data using the new standards, tabulating data collected under the new standards, and building bridges to compare data collected under the new and the old standards. At this juncture, the guidance is often in the form of alternatives for discussion rather than recommendations for implementation. In many areas work is ongoing, and the guidance will be amended as additional research and analyses are completed. At this juncture, we are seeking broader comment on the guidance. In keeping with the process that guided review and revision of the standards for data on race and ethnicity, we are looking forward to an open dialogue on this draft provisional guidance. Following a two month period for discussion by stakeholders within and outside government, we expect to issue provisional guidance at the end of April. We expect the guidance issued at that time will evolve further as data from Census 2000 and other data collections employing the new collection standards become available. We look forward to your review and comments, and welcome your questions. Katherine K. 
Wallman Chief Statistician DRAFT PROVISIONAL GUIDANCE ON THE IMPLEMENTATION OF THE 1997 STANDARDS FOR FEDERAL DATA ON RACE AND ETHNICITY Prepared By Tabulation Working Group Interagency Committee for the Review of Standards for Data on Race and Ethnicity February 17, 1999 Table of Contents I. Background A. The Need for Tabulation Guidelines and Alternative Approaches B. General Guidelines for Tabulating Data on Race C. Points of Clarification Regarding the 1997 Standards D. Criteria Used in Developing the Tabulation Guidelines II. Collecting Data on Race and Ethnicity Using the New Standards A. Developing Procedures for Data Collection (Full Report at Appendix B) B. Best Practices in Survey Design and Data Processing (Under development) III. Tabulating Data on Race and Ethnicity Collected Under the New Standards A. Decennial Census B. Other Surveys and Administrative Records IV. Using Data on Race and Ethnicity Collected Under the New Standards A. Redistricting B. Equal Employment Opportunity C. Vital Records and Intercensal Estimates D. Issues for Further Research (Under Development) V. Comparing Data Under the Old and the New Standards (Full Report at Appendix D) A. Introduction B. Methods for Bridging C. Methods of Evaluation D. Examination of the Results with Respect to the Evaluation Criteria Appendix A. Standards for Maintaining, Collecting, and Presenting Federal Data on Race and Ethnicity Appendix B. Procedural Implementation of the New Standards for Data on Race and Ethnicity -- Phase I Report Appendix C. Census 2000 Dress Rehearsal Prototype Redistricting Data Appendix D. 
Bridge Report: Tabulation Options for Trend Analysis DRAFT PROVISIONAL GUIDANCE ON THE IMPLEMENTATION OF THE 1997 STANDARDS FOR FEDERAL DATA ON RACE AND ETHNICITY Prepared by Tabulation Working Group Interagency Committee for the Review of Standards for Data on Race and Ethnicity The guidance presented in this report has been developed to complement the Federal Government's decision in October 1997 to provide an opportunity for individuals to select one or more races when responding to agency requests for data on race and ethnicity. To foster comparability across data collections carried out by various agencies, it is useful for those agencies to report responses of more than one race using some standardized tabulations or formats. The report briefly explains why the tabulation guidelines are needed, reviews the general guidance issued when the new standards were adopted in October 1997, and provides information on the criteria used in developing the guidelines. This report also addresses a larger set of implementation questions that have emerged during the working group's deliberations. Thus, the report considers: Collecting data on race and ethnicity using the new standards, including aggregate data reporting, Tabulating Census 2000 data and data on race and ethnicity collected in surveys and from administrative records, Using data on race and ethnicity in applications such as legislative redistricting and equal employment opportunity monitoring, and Comparing data under the old and the new standards when conducting analyses. In addition, the appendices to the draft report contain the full text of the reports on the research that has been conducted in two areas: best procedural practices for implementing the new standards, and approaches for bridging between data collected under the old standards and data collected under the new standards. 
The guidelines are necessarily provisional pending the availability of data from Census 2000 and other data systems as the new standards are implemented. They are likely to be reviewed and refined as Federal agencies and others gain experience with data collected under the new standards. In addition, in some portions of this report, guidelines have not yet been determined. Instead, options are presented and guidelines in these areas will be issued at a later date. OMB expects to issue this provisional guidance by the end of April 1999, following a period of public discussion of this draft by interested users. As noted in the Table of Contents and the report, a few sections are still "under development" and will be available for review at a later time. I. BACKGROUND This part of the report discusses why guidance is needed for tabulating data collected using the 1997 standards, reiterates the general guidance issued in October 1997, provides clarification of several aspects of the new standards, and presents the criteria that were developed for evaluating bridging methods and presenting data. A. The Need for Tabulation Guidelines and Alternative Approaches On October 30, 1997, the Office of Management and Budget (OMB) published "Standards for Maintaining, Collecting, and Presenting Federal Data on Race and Ethnicity" (Federal Register, 62 FR 58781 - 58790), which are reprinted in Appendix A. The new standards reflect a change in data collection policy, making it possible for Federal agencies to collect information that reflects the increasing diversity of our Nation's population stemming from growth in interracial marriages and immigration. Under the new policy, agencies are now required to offer respondents the option of selecting one or more of the following five racial categories included in the updated standards: -- American Indian or Alaska Native. 
A person having origins in any of the original peoples of North and South America (including Central America), and who maintains tribal affiliation or community attachment. -- Asian. A person having origins in any of the original peoples of the Far East, Southeast Asia, or the Indian subcontinent including, for example, Cambodia, China, India, Japan, Korea, Malaysia, Pakistan, the Philippine Islands, Thailand, and Vietnam. -- Black or African American. A person having origins in any of the black racial groups of Africa. Terms such as "Haitian" or "Negro" can be used in addition to "Black or African American." -- Native Hawaiian or Other Pacific Islander. A person having origins in any of the original peoples of Hawaii, Guam, Samoa, or other Pacific Islands. -- White. A person having origins in any of the original peoples of Europe, the Middle East, or North Africa. These five categories are the minimum set for data on race for Federal statistics, program administrative reporting, and civil rights compliance reporting. With respect to ethnicity, the standards provide for the collection of data on whether or not a person is of "Hispanic or Latino" culture or origin. (The standards do not permit a multiple response that would indicate an ethnic heritage that is both Hispanic or Latino and non-Hispanic or Latino.) This category is defined as follows: -- Hispanic or Latino. A person of Cuban, Mexican, Puerto Rican, South or Central American, or other Spanish culture or origin, regardless of race. The term, "Spanish origin," can be used in addition to "Hispanic or Latino." As a result of the change in policy for collecting data on race, the reporting categories used to present these data must similarly reflect this change. In keeping with the spirit of the new standards, agencies cannot collect multiple responses and then report and publish data using only the five single race categories. 
Agencies are expected to provide as much detail as possible on the multiple race responses, consistent with agency confidentiality and data quality procedures. As provided by the standards, OMB will consider any agency variances to this policy on a case by case basis. Based on research to date, it is estimated that less than two percent of the Nation's total population is likely to identify with more than one race. This percentage may increase as those who identify with more than one racial heritage become aware of the opportunity to report more than one race. In the early years of the standards' implementation, there will be issues of data quality and confidentiality related to sample size that may restrict the amount of data that can be published for some combinations of multiple race responses. Over time, however, the size of these data cells may increase. It should be noted that such data quality and confidentiality problems for small population groups also existed under the old standards, where sample sizes prevented presentation of data on certain population groups such as American Indians. The possible multiple race combinations under the new standards, some with small data cells, serve to make such data quality concerns more apparent. Some balance will need to be struck between having a tabulation showing the full distribution of all possible combinations of multiple race responses and presenting only the minimum -- that is, a single aggregate of people who reported more than one race. B. General Guidelines for Tabulating Data on Race In response to concerns that had been raised about how Federal agencies would tabulate multiple race responses, OMB in the October 30, 1997, Federal Register notice issued the following general guidance: Consistent with criteria for confidentiality and data quality, the tabulation procedures used by the agencies should result in the production of as much detailed information on race and ethnicity as possible. 
Guidelines for tabulation ultimately must meet the needs of at least two groups within the Federal Government, with the overriding objective of providing the most accurate and informative body of data. (1) The first group is composed of those Federal Government officials charged with carrying out constitutional and legislative mandates, such as redistricting legislatures, enforcing civil rights laws, and monitoring progress in anti-discrimination programs. (The legislative redistricting file produced by the Bureau of the Census, also known as the Public Law 94-171 file, is an example of a file meeting such legislative needs.) (2) The second group consists of the staff of Federal statistical agencies producing and analyzing data that are used to monitor economic and social conditions and trends. Many of the needs of the first group can be met with an initial tabulation that provides, consistent with standards for data quality and confidentiality, the full detail of racial reporting; that is, the number of people reporting in each single race category and the number reporting in each of the possible combinations of races, which would add to the total population. Depending on the judgment of users, the combinations of multiple responses could be collapsed. (1) One method would be to provide separate totals for those reporting in the most common multiple race combinations and to collapse the data for other less frequently reported combinations. The specifics of the collapsed distributions would be dependent on the results of particular data collections. (2) A second method would be to report the total selecting each particular race, whether alone or in combination with other races. These totals would represent upper bounds on the size of the populations who identified with each of the racial categories. In some cases, this latter method could be used for comparing data collected under the old standards with data collected under the new standards. 
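The full-detail tabulation and the two collapsing methods described above can be illustrated with a minimal sketch. It is not an agency procedure: the sample responses, the function names, the "keep_top" cutoff, and the label used for pooled combinations are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical sample: each entry is the list of races one respondent selected.
SAMPLE_RESPONSES = [
    ["White"],
    ["White"],
    ["Asian"],
    ["Asian", "White"],
    ["Asian", "White"],
    ["Black or African American", "White"],
    ["Asian", "Black or African American", "White"],
]


def full_detail(responses):
    """Count each distinct set of races reported; the counts add to the total."""
    return Counter(frozenset(r) for r in responses)


def collapse_rare(detail, keep_top):
    """Method 1: keep the most common multiple race combinations, pool the rest.

    keep_top is an assumed cutoff; the guidance leaves the specifics to the
    results of each data collection.
    """
    multi = [(combo, n) for combo, n in detail.items() if len(combo) > 1]
    # Most frequent first; ties broken alphabetically so the result is stable.
    multi.sort(key=lambda item: (-item[1], sorted(item[0])))
    kept = {combo for combo, _ in multi[:keep_top]}

    out = Counter()
    for combo, n in detail.items():
        if len(combo) == 1 or combo in kept:
            out[" and ".join(sorted(combo))] += n
        else:
            out["Other combinations of two or more races"] += n
    return out


def alone_or_in_combination(detail):
    """Method 2: total selecting each race, whether alone or in combination.

    A respondent reporting several races is counted once under each of them,
    so these totals are upper bounds and can sum to more than the population.
    """
    out = Counter()
    for combo, n in detail.items():
        for race in combo:
            out[race] += n
    return out


def total_more_than_one_race(detail):
    """The minimum every tabulation must still make available."""
    return sum(n for combo, n in detail.items() if len(combo) > 1)


detail = full_detail(SAMPLE_RESPONSES)
method_1 = collapse_rare(detail, keep_top=1)
method_2 = alone_or_in_combination(detail)
```

In this sample the first method still adds to the seven respondents, while the second method's totals sum to more than seven, which is why those totals are read as upper bounds rather than as a distribution.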
It is important that Federal agencies with the same or closely related responsibilities adopt the same tabulation method. Regardless of the method chosen for collapsing multiple race responses, Federal agencies must make available the total number reporting more than one race, if confidentiality and data quality requirements can be met, in order to ensure that any changes in response patterns resulting from the new standards can be monitored over time. Different tabulation procedures might be required to meet various needs of Federal agencies for data on race. Nevertheless, Federal agencies often need to compare racial and ethnic data. Hence, some standardization of tabulation categories for reporting data on race is desirable to facilitate such comparisons. The October 30, 1997, Federal Register Notice identified four areas where further research was needed in how to tabulate data under the new standards: (1) How should the data be used to evaluate conformance with program objectives in the area of equal employment opportunity and other anti-discrimination programs? (2) How should the decennial census data for many small population groups with multiple racial heritages be used to develop sample designs and survey controls for major demographic surveys? (3) How do we introduce the use of the new standards in the vital statistics program which obtains the number of births or deaths from administrative records, but uses intercensal population estimates in determining the rates of births and deaths? (4) And more generally, how can we conduct meaningful comparisons of data collected under the previous standards with those that will be collected under the new standards? In order to address these and other issues and to ensure that tabulation methodologies would be carefully developed and coordinated among the Federal agencies, OMB assembled a group of statistical and policy analysts drawn from the Federal agencies that generate or use these data. 
Over the past year, this group has considered tabulation issues and developed the draft provisional guidance that is presented in this report for use by Federal agencies. The work of this group has included: (1) a review of Federal data needs and uses to ensure that the tabulation guidelines produce data that meet statutory and program requirements; (2) cognitive testing of the wording of questions; (3) development of a form for reporting aggregate data; (4) evaluation of different methods of bridging from the new to the old standards; and (5) development of guidelines for presenting data on multiple race responses that meet accepted data quality and confidentiality standards. The tabulation guidance in this report is necessarily provisional pending the availability of Census 2000 data and other data systems as the new collection standards are implemented. These guidelines will be reviewed and modified as the agencies and other data users gain experience with data collected using the new standards. C. Points of Clarification Regarding the 1997 Standards A few questions about the new standards have emerged over the past year. This section elaborates on several points in the standards that have been a source of confusion for some users. Under the new standards, "Hispanic or Latino" is clearly designated as an ethnicity and not as a race. Whether or not an individual is Hispanic, every effort should be made to ascertain the race or races with which an individual identifies. The two-question format, with the ethnicity question preceding the race question, should be used when information is collected through self-identification. Although the standards permit the use of a combined question when collecting data by observer identification, the use of the two-question format is strongly encouraged even where observer identification is used. Regardless of the question format, observers are expected to attempt to identify the individual's race(s). 
The standards require that at a minimum the total number of persons identifying with more than one race be reported. It is stressed that this is a minimum; agencies are strongly encouraged to report detailed information on specific racial combinations subject to constraints of data reliability and confidentiality standards. The following wording concerning the reporting of data when the combined question is used is clarified in the paragraph below: "In cases where data on multiple responses are collapsed, the total number of respondents reporting 'Hispanic or Latino and one or more races' and the total number of respondents reporting 'more than one race' (regardless of ethnicity) shall be provided." (Section 2b of the standards) Race by ethnicity always should be reported when confidentiality permits. If not, the first level of collapsing should be ethnicity by the single races and ethnicity for those reporting more than one race. Thus, an Hispanic or Latino respondent reporting one race should be reported both as Hispanic or Latino and as a member of that single race. If the respondent selects more than one race, he or she should be reported in the particular racial combination as well as in the Hispanic or Latino category. Reporting a composite -- that is, the number of people who responded "Hispanic or Latino" and more than one race -- is a minimum that only should be used if more detailed reporting would violate data reliability and confidentiality standards. The rules discussed in Section 4 of the new standards concerning the presentation of data on race and ethnicity under special circumstances are not to be invoked unilaterally by an agency. If the agency believes the standard categories are inappropriate, the agency must request a specific variance from OMB. The new standards do not include an "other race" category. 
For the sole purpose of the Census 2000 data collection, OMB has granted an exception to the Census Bureau to use a category called "Some Other Race." D. Criteria Used in Developing the Tabulation Guidelines The interagency expert group on tabulations generated criteria that could be used both to evaluate the technical merits of different bridging procedures (See Part V and Appendix D) and to display data under the new standards. The relative importance of each criterion will depend on the purpose for which the data are intended to be used. For example, in the case of bridging to the past, the most important criterion is "measuring change over time," while "congruence with respect to respondent's choice" will be more critical for presenting data under the new standards. The criteria set forth below are designed only to assess the technical adequacy of the various statistical procedures. The first two criteria listed below are central to consideration of bridging methods. The next six criteria apply both to bridging and long-term tabulation decisions. The last criterion is of primary importance for future tabulations of data collected under the new standards. Bridging: Measure change over time. This is the most important criterion for bridging, because the major purpose of any historical bridge will be to measure true change over time as distinct from methodologically induced change. The ideal bridging method, under this criterion, would be one that matches how the respondent would have responded under the old standards had that been possible. In this ideal situation, differences between the new distribution and the old distribution would reflect true change in the distribution itself. Minimize disruptions to the single race distribution. This criterion applies only to methods for bridging. Its purpose is to consider how different the resulting bridge distribution is from the single-race distribution for detailed race under the new standards. 
To the extent that a bridging method can meet the other criteria and still not differ substantially from the single-race proportion in the ongoing distribution, it will facilitate looking both forward and backward in time. Bridging and future tabulations: Range of applicability. Because the purpose of the guidelines is to foster consistency across agencies in tabulating racial and ethnic data, tabulation procedures that can be used in a wide range of programs and varied contexts are usually preferable to those that have more limited applicability. Meet confidentiality and reliability standards. It is essential that the tabulations maintain the confidentiality standards of the statistical organization while producing reliable estimates. Statistically defensible. Because tabulations may be published by statistical agencies and/or provided in public use data, the recommended tabulation procedures should follow recognized statistical practices. Ease of use. Because the tabulation procedures are likely to be used in a wide variety of situations by many different people, it is important that they can be implemented with a minimum of operational difficulty. Thus, the tabulation procedures must be capable of being easily replicated by others. Skill required. Similarly, it is important that the tabulation procedures can be implemented by individuals with relatively little statistical knowledge. Understandability and communicability. Again, because the tabulation procedures will likely be used, as well as presented, in a wide variety of situations by many different people, it is important that they be easily explainable to the public. Future tabulations: Congruence with respondent's choice. Because of changes in the categories and the respondent instructions accompanying the question on race (allowing more than one category to be selected), the underlying logic of the tabulation procedures must reflect to the greatest extent possible the full detail of race reporting. II. 
COLLECTING DATA ON RACE AND ETHNICITY USING THE NEW STANDARDS This part of the report currently provides a summary of the Phase I Report on Procedural Implementation of the New Standards for Data on Race and Ethnicity, which is contained in Appendix B. A. Developing Procedures for Data Collection An interagency committee has been continuing past research efforts to develop procedures to collect and aggregate data on race and ethnicity. This research is designed to produce guidelines that address three areas: (1) wording and format of questions that ask for self-reported data on race and Hispanic or Latino origin; (2) wording and format of instructions and forms that collect aggregate data on race and Hispanic or Latino origin; and (3) instructions and training procedures for field interviewers and administrative personnel who will be using these questions and forms. Guidelines will be continually reviewed and modified as implementation of the new standards occurs, feedback from agencies is received, and new research findings become available. Members of the procedures committee represent the Departments of Health and Human Services, Commerce, Education, Labor, and Veterans Affairs, and the General Accounting Office. This summary briefly describes the Phase I research, offers initial guidelines for agencies developing new data collection procedures, and includes a schedule for the completion of work by this committee. The full report of the committee includes the research design and methods, results of Phase I, examples of test questions and forms, and a broader discussion of guidelines and problems identified. Developing and Testing Self-Reported Race and Ethnicity Questions A goal of this research is to provide guidance on the wording and format of questions for self-reporting race and Hispanic or Latino origin depending on the mode of administration. 
Questions administered by telephone or in a face-to-face personal interview have been tested in cognitive laboratory interviews; self-administered questions are not included in this testing because the Census Bureau previously conducted such research in preparation for Census 2000. To date, 32 cognitive interviews have been completed; another 18 are planned for Phase I and at least 25 more for Phase II. Among the 32 subjects interviewed, 13 reported their race as Black, 3 reported Asian, 2 reported Native Hawaiian, 4 reported more than one race, and 10 reported White, of which 2 also reported Hispanic or Latino origin. No American Indians or Alaska Natives have been interviewed yet in Phase I. Subjects were first asked routine demographic questions as well as the test Hispanic or Latino origin and race questions for themselves and members of their household. Then, debriefings were conducted to learn more about the subjects' understanding of the questions and terms used. Generally, subjects were able to answer without difficulty the race and Hispanic or Latino origin questions. In the cognitive interviews, understanding of the intent of a race or Hispanic origin question was shared, but individual differences in the interpretation and meaning of terms used were found, as was confusion regarding the separation of Hispanic or Latino origin from race. As expected, subjects who were interviewed face-to-face seemed to use and rely on the flashcards to select a response. Subjects interviewed by telephone had a bit more difficulty answering the race questions since they had to listen to a relatively long list of response options. Also, there was some evidence that the instruction to "...select one or more..." was misunderstood on the telephone to mean that the subject had to select more than one race. Section 1 in Appendix B describes in detail the results of testing the questions on race and ethnicity. 
Based on these interviews, the following initial guidelines for the design of questions on race and ethnicity are offered: Communicate clearly an instruction that allows, but does not require, multiple responses to the race question. Consider using an instruction to answer both the Hispanic or Latino origin question and the race question. For data collection efforts requiring detailed Hispanic or Latino origin or detailed race information, consider options to collect further information through write-in entries or follow-up questions asked by the interviewer. Take mode of administration carefully into account when designing questions and instructions. Provide definitions to the minimum race categories when possible. Adhere to the specific terminology as stated in the October 30, 1997, standards. Developing and Testing Aggregate Reporting Forms Implementing the revised standards will cause fundamental changes to the ways in which data on race and Hispanic or Latino origin have previously been aggregated and reported. Therefore, a second goal of this research is to provide guidance on the design of reporting forms that will be used by administrative personnel to aggregate data on race and Hispanic or Latino origin for a given population (e.g., reporting race and ethnicity for a school population). Twenty cognitive interviews are planned for this phase of the research. Three different forms are being tested with subjects who are familiar with reporting aggregate data for a given population, but not necessarily familiar with the revised standards. Fourteen interviews have been completed thus far, 7 in cognitive laboratories and 7 on-site. Of the 14 respondents interviewed, 5 worked for the Federal Government, 6 worked in private industry, 2 worked in local correctional facilities, and 1 worked in a school. 
For the laboratory testing, subjects were given 'dummy' records of applications that contained multiple race responses as well as combined Hispanic or Latino origin and race questions. For the on-site interviews, subjects referred to agency data. None of the forms tested were completed accurately without interviewer intervention. Regardless of the form tested or whether the testing was conducted in a laboratory or on-site, the most common problem was the requirement to count and report race for individuals who are of Hispanic or Latino origin. As an illustration, one subject stated "It's (the form) basically asking how Hispanics were separated into groups of races. I think the part that confuses me is that our Hispanics do not view themselves as another race. And so that is kind of what threw me off it's asking for Hispanics who had marked 'White,' but they don't. They would have checked Hispanic." Discussions with subjects revealed that all but one worked for agencies that have used the single question -- combined race and ethnicity format -- to collect data. Several methodological problems also emerged and will be corrected prior to further testing. They are discussed in detail in Appendix B, Section 2. Even though there were many problems found in developing and testing aggregate forms, some initial guidelines can be put forth at this time. If possible, allow for the reporting of every combination of multiple race responses. Provide definitions that assist in understanding the concepts of single race reports and multiple race reports as well as the distinction between ethnicity and race. Explain how the missing data should be reported. Professionally design the form and include clear instructions. Development of Field Instructions and Training Procedures Work to develop interviewer instructions and interviewer training procedures will begin in the Spring of 1999. 
Plans include developing and testing different training modules and interviewer instructions, depending on the mode of administration and the type of data collection. This work will, in all likelihood, not address new issues or problems. However, since the new standards do encompass several distinct changes, it seems timely to address in a more systematic way some longstanding issues in the fielding of the questions, and ways that interviewers can be trained to improve data quality. Specific procedures on how to ask the questions and, in some cases, how to instruct the respondent to use the flashcard, will be developed along with suggested interviewer probes, definitions, and statements that can be used to answer respondent questions. Schedule Phase I was ongoing through 1998 and will be completed at the beginning of April 1999. Phase II will begin in April 1999 and will be completed by the end of July 1999. A final report encompassing both phases should be available by the end of September 1999. B. Best Practices in Survey Design and Data Processing (Under development) III. TABULATING DATA ON RACE AND ETHNICITY COLLECTED USING THE NEW STANDARDS This part of the report describes options for tabulating data on race and ethnicity collected under the new standards to meet various Federal needs for these data. A. Decennial Census The Census 2000 questionnaire will provide individuals the opportunity to self-report their racial identity by selecting one or more races. For purposes of Census 2000 only, in an effort to encourage response to this question, OMB has approved the use of a sixth category -- "Some Other Race" -- in addition to the minimum five categories. This discussion covers preliminary tabulation plans for the six categories of race and the two categories of ethnicity ("Hispanic or Latino" and "Not Hispanic or Latino") and for possible combinations of these racial and ethnic categories. 
It does not address tabulation plans for detailed groups of American Indian and Alaska Native, Asian, or Native Hawaiian and Other Pacific Islander populations for which information will be collected in Census 2000. For data from the Census 2000 Dress Rehearsal sites, table shells will be available on the Internet through the Census Bureau's American FactFinder. The data user will be able to use the inquiry system in the American FactFinder to obtain table shells filled with data for user-selected geographic areas and for population universes defined by race and ethnicity down to the census tract level. The amount of data on population characteristics available in table shells will be roughly the same as in printed reports in 1990 for counties and for places of 10,000 or more population.

Protection of Confidentiality in Data from Census 2000

To maintain confidentiality as required by law (Title 13, United States Code), the Census Bureau uses a confidentiality edit to ensure that published data do not disclose information about specific individuals, households, and housing units. The result is that a small amount of uncertainty is introduced into some of the census data to prevent identification of specific individuals, households, or housing units. As with data from the 1990 census, a confidentiality edit will be implemented for data from Census 2000 by selecting a sample of census households from internal census files and interchanging their data with data from other households that have identical numbers of household members, but that are in different locations within the same state. The net result of this procedure is that the data user's ability to obtain census data is increased, particularly for small geographic areas and small population groups.

Approach for Tabulations by Race and Ethnicity for Census 2000

The proposed approach reflects OMB's preliminary guidelines (See Part I, Section B) on tabulations by race and ethnicity.
The discussion of the approach includes data on both population totals for racial and ethnic categories and on population characteristics (e.g., age and sex) for racial and ethnic categories. Before describing preliminary plans for tabulations by race and ethnicity, it is helpful to describe both the maximum number of racial and/or ethnic categories for which data could be provided and some of the other racial and/or ethnic categories for which data could be provided. There are 63 potential single and multiple race categories, including 6 categories for those who marked exactly one race and 57 categories for those who marked two or more races. These 57 categories of two or more races include the 15 possible combinations of two races (for example, Asian and White), the 20 possible combinations of three races, the 15 possible combinations of four races, the 6 possible combinations of five races, and the 1 possible combination of all six races. There are two ethnic categories (Hispanic or Latino, and Not Hispanic or Latino). Thus there are 126 categories (63 x 2) in which the population could be classified by both race and ethnicity. The 63 mutually exclusive and exhaustive categories of race may be collapsed down to 7 mutually exclusive and exhaustive categories by combining the 57 categories of two or more races. These 7 categories are: White alone, Black or African American alone, American Indian and Alaska Native alone, Asian alone, Native Hawaiian and Other Pacific Islander alone, Some other race alone, and Two or more races. Alternative groupings for tabulations by race reflect OMB's preliminary guidelines to show "the total selecting each particular race, whether alone or in combination." In combination literally means "in combination with one or more other races." 
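The category counts given above can be checked by direct enumeration. The following is a minimal sketch in Python; the names RACES, ETHNICITIES, race_categories, and collapse_to_seven are illustrative assumptions and do not describe any Census Bureau processing system.

```python
from itertools import combinations

# The six race categories planned for Census 2000
# (the five minimum categories plus "Some other race").
RACES = [
    "White",
    "Black or African American",
    "American Indian and Alaska Native",
    "Asian",
    "Native Hawaiian and Other Pacific Islander",
    "Some other race",
]
ETHNICITIES = ["Hispanic or Latino", "Not Hispanic or Latino"]


def race_categories(races=RACES):
    """Return every non-empty combination of races, keyed by number of races marked."""
    return {k: list(combinations(races, k)) for k in range(1, len(races) + 1)}


def collapse_to_seven(marked):
    """Map one combination of marked races to one of the 7 collapsed categories."""
    marked = tuple(marked)
    if len(marked) == 1:
        return marked[0] + " alone"
    return "Two or more races"


by_size = race_categories()
counts = {k: len(v) for k, v in by_size.items()}    # categories by number of races marked
total = sum(counts.values())                        # all single and multiple race categories
multiple = total - counts[1]                        # categories of two or more races
race_by_ethnicity = total * len(ETHNICITIES)        # race categories crossed by ethnicity
collapsed = {collapse_to_seven(c) for cs in by_size.values() for c in cs}

print(counts)
print(total, multiple, race_by_ethnicity, len(collapsed))
```

The enumeration reproduces the 6, 15, 20, 15, 6, and 1 categories by number of races marked, and hence the totals of 63, 57, 126, and 7 cited in the text.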
In this "all-inclusive" approach, tabulations would be shown for each of six categories, which will overlap and will add to more than the total population to the extent that individuals report more than one race. These six categories are: White alone or in combination, Black or African American alone or in combination, American Indian and Alaska Native alone or in combination, Asian alone or in combination, Native Hawaiian and Other Pacific Islander alone or in combination, and Some Other Race alone or in combination. As in the case of the 63 racial categories, both tabulations by race of the 7 mutually exclusive and exhaustive categories and tabulations by race alone or in combination could be classified by ethnicity (Hispanic or Latino, and Not Hispanic or Latino). Because of concerns about the usefulness and reliability of data on population characteristics for small populations, about issues with respect to confidentiality, and about providing data products so voluminous that most data cell values would be zero, the Census Bureau is planning (as it has in previous censuses) to present more detail by race and ethnicity for population totals than for population characteristics. For example, Census 2000 data products might show a population total for a specific racial or ethnic group (e.g., 50) in a small geographic area, but not show data on characteristics such as household relationship, education, income, and tenure for this racial or ethnic group. Preliminary plans for tabulations by race and ethnicity for population totals and for population characteristics are discussed in the following two sections. The amount of detail shown in tabulations by race and ethnicity in data products from Census 2000 will vary with the purpose and size of each product. 
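The overlap among the "alone or in combination" categories described above can be illustrated with a small sketch. The responses below are invented solely for illustration, and the function name tabulate is an assumption; the sketch is not a description of actual Census 2000 tabulation software.

```python
from collections import Counter


def tabulate(responses):
    """Tally 'alone' and 'alone or in combination' counts from sets of marked races."""
    alone = Counter()
    inclusive = Counter()
    for marked in responses:
        if len(marked) == 1:
            alone[next(iter(marked))] += 1
        for race in marked:
            # A person who marks two races is counted once under each of them.
            inclusive[race] += 1
    return alone, inclusive


# Invented responses for illustration only; each set holds the races one person marked.
responses = [
    {"White"},
    {"White"},
    {"Asian"},
    {"Asian", "White"},
    {"Black or African American", "White"},
]

alone, inclusive = tabulate(responses)

# "In combination only" is obtained by subtraction from the all-inclusive count.
in_combination_only = {race: inclusive[race] - alone[race] for race in inclusive}

print(len(responses), sum(inclusive.values()))
print(in_combination_only)
```

In this invented example the all-inclusive counts sum to 7 although only 5 persons responded, because the two persons who marked two races are each counted twice.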
Planned tabulations for population totals by race and ethnicity from four data products are discussed: the Public Law 94-171 file (which is a 100-percent data product), the 100-percent demographic profile, the 100-percent summary file, and 100-percent table shells. Planned tabulations for population characteristics by race and ethnicity are discussed together for the 100-percent and sample summary files and the 100-percent and sample table shells. (The 100-percent data products are based on data collected on all questionnaires. In comparison, sample data products are based on data collected only on long-form questionnaires.) As noted above, this discussion does not cover tabulation plans for detailed groups of American Indian and Alaska Native, Asian, or Native Hawaiian and Other Pacific Islander populations. It may be noted, however, that tabulations for these detailed categories will not be included on the PL 94-171 file, but will be included in the other Census 2000 data products listed in the preceding paragraph.

Population Totals: Preliminary Plans for Data by Race and Ethnicity from Census 2000

Public Law (PL) 94-171 Redistricting File. PL 94-171 requires that the Census Bureau work closely with the "officers or public bodies having initial responsibility for the legislative apportionment or districting of each state" to determine the specific tabulations needed from the decennial census. Tabulations planned for this file are based on meetings and communications with the Redistricting Task Force of the National Conference of State Legislatures and state-appointed liaisons of the governors and legislatures. During this process, senior officials from OMB, the Voting Rights Section of the Department of Justice, and the Census Bureau consulted with the Task Force and state legislative officials. The PL 94-171 file will include population totals down to the block level.
The racial and ethnic categories that the Census Bureau plans to include in the matrices (one-dimensional statistical tables) on the PL 94-171 file are combined into one table outline and presented in Table 1. (The PL 94-171 file also includes data on the population 18 years and over for each of these racial or ethnic categories.) From tabulations for the racial and ethnic categories shown in Table 1, it is possible also to obtain tabulations by subtraction for the Hispanic or Latino population by race (total minus Not Hispanic or Latino) and for the population in a racial category in combination only (e.g., Asian alone or in combination minus Asian alone). The PL 94-171 file will be available on the Internet and on CD-ROM. A paper listing of data from the PL 94-171 file, to be provided to officers or public bodies having initial responsibility for the legislative apportionment or districting of each state, will include about one-half of the tabulations shown in Table 1. The paper listing will not include tabulations for Race alone or in combination, or for Race not alone or in combination.

100-Percent Demographic Profile. This profile is designed to provide, for geographic areas down to the census tract level, an overview of 100-percent census data on a one-page table that includes data on all population and housing topics for which data are collected on a 100-percent basis: sex, age, race, Hispanic or Latino origin, household relationship, and housing occupancy and tenure. Given the limited amount of space to show data on each topic, population totals by race and ethnicity will be limited. Population totals will be shown for each of the major races alone, for two or more races, and for each major race alone or in combination (as described earlier), but will not be shown for the 57 specific categories of two or more races.

100-Percent Summary File.
This file, which is the most detailed 100-percent data product planned, will include some population totals on race and ethnicity down to the block level and additional population totals on race and ethnicity down only to the census tract level. The racial and ethnic categories that the Census Bureau plans to include down to the block level in the matrices on the 100-percent summary file are combined into one table outline and presented in Table 2. The additional categories that are included down only to the census tract level in the 100-percent summary file are the 57 individual categories of two or more races crossed by the two ethnic categories (Hispanic or Latino, and Not Hispanic or Latino). These racial and ethnic categories are combined into one table outline and presented in Table 3.

100-Percent Table Shells. Table shells represent a new data product for Census 2000. A table shell is a one-page table outline with a fixed stub and boxhead (for example, showing population by age and sex). Table shells are supported by summary files in the same way that data in various printed reports in 1990 were supported by summary tape files (STFs).

Population Characteristics: Preliminary Plans for Data by Race and Ethnicity from Census 2000

100-Percent and Sample Summary Files and Table Shells. Plans for tabulations of population characteristics by race and ethnicity from the 100-percent and sample summary files and from the 100-percent and sample table shells are discussed together here because the Census Bureau plans to show population characteristics for the same list of racial and ethnic groups in all of these data products. In the case of summary files, population characteristics in the matrices on the files would be iterated (repeated) for each racial or ethnic category. This corresponds to the "B" matrices in summary tape files (STFs) 2 and 4 in 1990 census data products in which the "B" matrices were iterated for each of a list of racial and ethnic categories.
In the case of table shells, population characteristics would be available for each of the racial and ethnic categories for which population characteristics are available on the summary files. The user of table shells will be able to select from a list of topics (e.g., age and sex) and then select the geographic area (e.g., state, county, place) and population universe (i.e., the racial or ethnic category) to obtain the data desired. The scope of data available using table shells is limited to data on summary files (in the same way that data in printed reports in 1990 were limited to data on summary files). Table shells will present subsets of more detailed data from the summary files in user-friendly formats (like tables in printed reports), and will show totals, subtotals, and derived measures that are not included on the summary files. The list of 27 racial and ethnic categories for which the Census Bureau plans to show population characteristics in aggregated data products (as opposed to what is available from microdata files, as discussed below) in Census 2000 is presented in Table 4. From tabulations for the list of racial and ethnic categories shown in Table 4, it is possible also to obtain tabulations by subtraction for the Hispanic or Latino population by race (total minus Not Hispanic or Latino), for the population in a racial category in combination only (e.g., Asian alone or in combination minus Asian alone), and for the complement to an all-inclusive group (e.g., total minus Asian alone or in combination). Micro data files. Tabulations on population characteristics by race and ethnicity described above are limited to what is planned for aggregated data products. In addition, the Census Bureau will produce 5-percent public-use microdata files (PUMS), as was done in 1990, which will permit users to obtain tabulations for any racial or ethnic group for which data were collected in the census. 
(This would include, for example, any of the 57 categories of more than one race.) In 1990, in addition to the confidentiality edit described earlier, the PUMS files were stripped of names and addresses, the order of records was rearranged on the file, and a minimum population threshold of 100,000 was used. In addition, and subject to the Census Bureau's strict confidentiality standards, the Census Bureau plans to make available on the Internet, through the American FactFinder, the microdata files that underlie the 100-percent and sample summary files for Census 2000 so that data users can create tabulations to their own specifications. These microdata files are the 100-percent edited detail file (HEDF) and the sample edited detail file (SEDF). The full microdata files will be made available to data users only in the form of PUMS files, as described above. If a data user wants data on population characteristics for a racial or ethnic group for which characteristics are not available in the summary files or table shells and for a geographic area for which a PUMS file is not available, it will be possible -- again, subject to strict confidentiality standards set by the Census Bureau -- to obtain these data in the American FactFinder with a custom tabulation from the HEDF or the SEDF. For example, the data user will be able to obtain population characteristics for one of the 57 categories of more than one race (e.g., White and Asian). Because of the strict confidentiality standards, the quantity of data that can be obtained will depend on several factors, including the geographic area, the size of the population universe (e.g., the number of individuals who are Asian and White), and the extent of the characteristics detail (number of data cells in a table showing population characteristics).

Table 1.
Preliminary Racial and Ethnic Detail for Population Totals in the PL 94-171 File Planned for Census 2000
(See text regarding protection of confidentiality of data from Census 2000. "In combination" means "in combination with one or more other races")

Race or ethnicity                                    Total    Not Hispanic or Latino

Total
  One race
    White
    Black or African American
    American Indian and Alaska Native
    Asian
    Native Hawaiian and Other Pacific Islander
    Some other race
  Two or more races
Hispanic or Latino   (X)
White alone or in combination
Not White alone or in combination
Black or African American alone or in combination
Not Black or African American alone or in combination
American Indian and Alaska Native alone or in combination
Not American Indian and Alaska Native alone or in combination
Asian alone or in combination
Not Asian alone or in combination
Native Hawaiian and Other Pacific Islander alone or in combination
Not Native Hawaiian and Other Pacific Islander alone or in combination
Some other race alone or in combination
Not Some other race alone or in combination
____________________________________________________________________________
(X) Not applicable.

Table 2. Preliminary Racial and Ethnic Detail for Population Totals Down to the Block Level in the 100-Percent Summary File Planned for Census 2000
(See text regarding protection of confidentiality of data from Census 2000. "In combination" means "in combination with one or more other races")

Race or ethnicity                                    Total    Not Hispanic or Latino    Hispanic or Latino

Total
  One race
    White
    Black or African American
    American Indian and Alaska Native
    Asian
    Native Hawaiian and Other Pacific Islander
    Some other race
  Two or more races
Hispanic or Latino   (X)
White alone or in combination
  White alone
  White in combination only
Not White alone or in combination
Black or African American alone or in combination
  Black or African American alone
  Black or African American in combination only
Not Black or African American alone or in combination
American Indian and Alaska Native alone or in combination
  American Indian and Alaska Native alone
  American Indian and Alaska Native in combination only
Not American Indian and Alaska Native alone or in combination
Asian alone or in combination
  Asian alone
  Asian in combination only
Not Asian alone or in combination
Native Hawaiian and Other Pacific Islander alone or in combination
  Native Hawaiian and Other Pacific Islander alone
  Native Hawaiian and Other Pacific Islander in combination only
Not Native Hawaiian and Other Pacific Islander alone or in combination
Some other race alone or in combination
  Some other race alone
  Some other race in combination only
Not Some other race alone or in combination
______________________________________________________________________________
(X) Not applicable.

Table 3. Preliminary Racial and Ethnic Detail for Population Totals Down to the Census Tract Level Only in the 100-Percent Summary File Planned for Census 2000
(See text regarding protection of confidentiality of data from Census 2000)

Race or ethnicity                                    Total    Not Hispanic or Latino    Hispanic or Latino

Two or more races
  Two races (15 categories)
    White, and Black or African American
    White, and American Indian and Alaska Native
    White, and Asian
    White, and Native Hawaiian and Other Pacific Islander
    White, and Some other race
    Black or African American, and American Indian and Alaska Native
    Black or African American, and Asian
    Black or African American, and Native Hawaiian and Other Pacific Islander
    Black or African American, and Some other race
    American Indian and Alaska Native, and Asian
    American Indian and Alaska Native, and Native Hawaiian and Other Pacific Islander
    American Indian and Alaska Native, and Some other race
    Asian, and Native Hawaiian and Other Pacific Islander
    Asian, and Some other race
    Native Hawaiian and Other Pacific Islander, and Some other race
  Three races (20 categories)
    White, Black or African American, and American Indian and Alaska Native
    (continues with 19 other categories of three races)
  Four races (15 categories)
    White, Black or African American, American Indian and Alaska Native, and Asian
    (continues with 14 other categories of four races)
  Five races (6 categories)
    White, Black or African American, American Indian and Alaska Native, Asian, and Native Hawaiian and Other Pacific Islander
    (continues with 5 other categories of five races)
  Six races (1 category)
    White, Black or African American, American Indian and Alaska Native, Asian, Native Hawaiian and Other Pacific Islander, and Some other race

Table 4. Preliminary Racial and Ethnic Detail for Population Characteristics in Summary Files and Table Shells Planned for Census 2000
(See text regarding protection of confidentiality of data from Census 2000. "In combination" means "in combination with one or more other races")

Race or ethnicity

White alone
Black or African American alone
American Indian and Alaska Native alone
Asian alone
Native Hawaiian and Other Pacific Islander alone
Some other race alone
Two or more races
White alone or in combination
Black or African American alone or in combination
American Indian and Alaska Native alone or in combination
Asian alone or in combination
Native Hawaiian and Other Pacific Islander alone or in combination
Some other race alone or in combination
Hispanic or Latino
White alone, not Hispanic or Latino
Black or African American alone, not Hispanic or Latino
American Indian and Alaska Native alone, not Hispanic or Latino
Asian alone, not Hispanic or Latino
Native Hawaiian and Other Pacific Islander alone, not Hispanic or Latino
Some other race alone, not Hispanic or Latino
Two or more races, not Hispanic or Latino
White alone or in combination, not Hispanic or Latino
Black or African American alone or in combination, not Hispanic or Latino
American Indian and Alaska Native alone or in combination, not Hispanic or Latino
Asian alone or in combination, not Hispanic or Latino
Native Hawaiian and Other Pacific Islander alone or in combination, not Hispanic or Latino
Some other race alone or in combination, not Hispanic or Latino

B. Other Surveys and Administrative Records

This section applies to the presentation of data collected under the new standards through surveys and administrative records. Although these proposed tabulation guidelines are particularly applicable in the near term, they also provide a framework that can be expanded in the future as it becomes possible to present more data on multiple race responses.
In general, data should be presented in as much detail as possible (thereby satisfying the criterion "congruence with respondent's choice"), subject to satisfying agency criteria for statistical reliability and confidentiality (thereby satisfying the criterion "meet confidentiality and reliability standards"). Thus, data on multiple race responses should be presented in as much detail as possible given sample sizes and sample designs. In addition, to the extent possible, Federal agencies should report data using standardized categories to facilitate comparisons across subject-matter areas and data systems, thus satisfying the criteria "range of applicability," "statistical defensibility," and "understandability and communicability." The decision to revise the policy for the collection of data on race reflects the increasing complexity of our Nation's demographics. As a result, the ways that data on race are tabulated and analyzed also will become more complex. The proposed guidelines in this section reflect this complexity. The tabulation strategies illustrated here have simple structures; hence they satisfy the criteria "ease of use" and "skill required." Examples of tabulation strategies are provided and illustrated using data collected as part of the National Health Interview Survey (NHIS), conducted by the National Center for Health Statistics, Centers for Disease Control and Prevention. Since 1976, the NHIS has allowed respondents to report more than one race, but has also asked respondents to indicate the single race with which they most closely identified. The data on race from this survey have been retabulated for illustrative purposes to be as comparable as possible to the categories in the 1997 standards. (Unless otherwise noted, the tables in this section are based on data combined from three years of NHIS data. The resulting larger sample size improves the reliability of the estimates and enables more categories to be shown.
However, even when combining three years of data on race, counts for some categories cannot be shown due to small sample sizes.) As noted above, agencies are to provide as much detail as possible while adhering to their own standards for data quality and confidentiality. Under a typical data quality standard, a table cell cannot be published if its relative standard error (or other measure of dispersion) is larger than some value specified by the agency. In such a situation, the data cell is not published separately, but the cell value is included in subtotals. Under a confidentiality standard, a cell value must be suppressed (withheld from publication) if knowledge of the cell value might enable someone to gain knowledge about one of the respondents contributing data to the cell. If a cell is suppressed to preserve confidentiality, other cells must also be suppressed so the cell value cannot be derived by subtraction. This is called "complementary suppression." (The reader may wish to refer to Statistical Policy Working Paper 22: Report on Statistical Disclosure Limitation Methodology for more information concerning the definition of sensitive cells and the selection of cells for complementary suppression.) Agencies do not use a common set of standards for evaluating confidentiality and quality issues. To illustrate the application of agency standards that affect the cells that can be shown in tables, only a data quality standard is used here. For all tables except Table C, a table cell has been arbitrarily classified as failing the data quality standard if the sample size is smaller than 0.2 percent of the population. To illustrate a table that might result from a smaller sample survey, in Table C a table cell is classified as failing the data quality standard if the sample size is smaller than 2.0 percent of the population.
These admittedly arbitrary criteria are used to illustrate what might be published from a large sample survey, and to illustrate the distributions that may result from the implementation of the new standards. Note that since the only data being displayed in this report are population counts, it is possible to show more data cells than would be the case if the table presented attributes (income, education, health outcomes, etc.) of these groups. Individual survey systems will make decisions as to what data can be shown based on the characteristics of each system and the confidentiality and reliability guidelines established for that data system. Two types of responses cannot be tabulated into the categories identified in the standard. The first is when no information on race was provided. In this report the heading "Race Not Reported" is used for this type of response. This response type can be further subdivided according to the reason that no information was obtained -- refusal, don't know, and not ascertained. The second is when a response was received that does not match any of the standard racial categories. Such responses are tabulated using the heading "Other Race." A third heading, "Not Tabulated Above" is used to include either single or more than one race categories that are specified in the standard, but are not large enough to be published separately. For illustrative purposes, these three headings are used in the tables in this section. Not all statistical publications will use this model. Strategies for tabulating these kinds of responses will follow agency policy and the analytic objectives of the report. A remaining issue to be addressed by Federal agencies is that the rules used in editing and imputing respondents' data on race and ethnicity will affect the racial distributions derived from Federal surveys and administrative records. 
As noted elsewhere in this report, rules for editing and imputation of data on race and ethnicity should be an area of further research and collaboration for Federal agencies, to ensure that the data reported are as comparable as possible. Since the objective of this section is to illustrate different tabulation strategies, categories with frequencies too small to be shown will not be treated the same way in all of the tables. In some tables, the category is not shown at all and the cell value is included under "Not Tabulated Above"; in other tables, the category is retained in order to clarify the structure of the table but data are replaced by a "Q" to illustrate that they have been withheld from publication for data quality considerations. When the data are replaced by "Q," a footnote is used to describe the reason the data are not shown. In all tables in this section, the "More Than One Race" heading includes respondents who selected more than one of the five basic racial categories in the new standard. Many data collection systems obtain information on a more detailed set of responses. When surveys collect more detailed information on race than the minimum standard, some persons may indicate that they identify with more than one of the more detailed groups. For example, within the Asian group, respondents might indicate that they are of Chinese and Japanese heritage. These respondents would not be included in the "More Than One Race" heading but would be included in the total for Asians. If sample size permits, an additional Asian sub-category could be used to indicate the number of individuals who marked more than one of the detailed Asian categories. Table A illustrates the fundamental goal of the new standard and provides a detailed set of categories for tabulating data on race. Table A displays the five single categories, and also includes more detail on the Asian subgroups; it also displays a number of multiple-response categories. 
Based on NHIS data, the most frequently marked race combinations are American Indian and White, Asian and White, and Black and White. In other situations, the categories used to present data would be a function of the overall sample size and the regional characteristics of the population where the sample is selected. Whatever detailed categories are presented, they should support recreating the minimum basic set of racial categories. Table B shows a category for each of the five single racial groups in the new standards as well as a "More Than One Race" heading. It is an example of a table that can be used when sample sizes do not permit the presentation of greater detail. In this table, data are not shown separately for Native Hawaiians and Other Pacific Islanders, one of the single race categories in the collection standard, since they comprise less than 0.2 percent of the U.S. population. However, since this is the only category that cannot be shown, both the number and the percent for the Native Hawaiian and Other Pacific Islander group are readily obtained by subtraction. This is an example of a data cell that is being suppressed for data quality concerns. If it were suppressed for confidentiality concerns, another cell would also have to be suppressed to prevent the cell value from being obtained by subtraction. As was the case under the 1977 standard, it will often not be possible to tabulate data using all of the categories used to collect the information. Even with three years of data from the NHIS, Tables A and B could not present data for Native Hawaiians and Other Pacific Islanders because they total less than 0.2 percent of the population. If data for one or more of the five minimum racial categories fail the requirements for data quality or confidentiality, standard agency products should include them in an aggregation such as "Not Tabulated Above," rather than combining them with categories that are publishable alone.
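The aggregation rule just described can be sketched as follows. This is a minimal illustration only: the counts are invented and are not NHIS estimates, the function name and the threshold_pct parameter are assumptions, and each agency would apply its own data quality and confidentiality standards in place of the simple percentage cutoff used here.

```python
def collapse_small_categories(counts, threshold_pct=0.2):
    """Move categories below threshold_pct of the total into 'Not Tabulated Above'."""
    total = sum(counts.values())
    published = {}
    not_tabulated = 0
    for category, n in counts.items():
        if 100.0 * n / total < threshold_pct:
            # Retained in the total, but never merged into another racial category.
            not_tabulated += n
        else:
            published[category] = n
    if not_tabulated:
        published["Not Tabulated Above"] = not_tabulated
    published["Total"] = total
    return published


# Invented counts for illustration only; they are not NHIS estimates.
counts = {
    "White": 8000,
    "Black or African American": 1200,
    "American Indian and Alaska Native": 90,
    "Asian": 350,
    "Native Hawaiian and Other Pacific Islander": 15,
    "More Than One Race": 145,
}

large_survey = collapse_small_categories(counts, threshold_pct=0.2)
small_survey = collapse_small_categories(counts, threshold_pct=2.0)

print(large_survey)
print(small_survey)
```

With the 0.2 percent cutoff only the smallest category in this invented example is moved under "Not Tabulated Above"; with the 2.0 percent cutoff several categories are moved, while the total is unchanged in both cases.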
For example, if the data for Native Hawaiians and Other Pacific Islanders cannot be published separately, these data should not be combined with data in the Asian category (except when such combinations are needed for comparability with data collected under the old standard). Instead, the data on Native Hawaiians and Other Pacific Islanders should be included in the total and either omitted from the detailed tabulations completely, replaced with a symbol and footnoted as in Tables A and B, or included in a separate heading for all groups not specifically tabulated (i.e., under the Not Tabulated Above heading.) This last approach is illustrated in Table C. For this table, only one year's NHIS data are used, and data are reported only for categories that comprise at least 2 percent of the population. This is intended to provide an illustration of what might happen when total sample sizes are smaller and data from fewer categories can be reliably presented. Because the Asian, Native Hawaiian and Other Pacific Islander, More Than One Race, and Race Not Reported respondents each comprise less than 2 percent of the population, these categories were not listed separately in Table C but were included both in the Total and the Not Tabulated Above rows. In order to display as much data as possible as well as to reflect the complexity of reporting on race, some additional categories may be tabulated and reported along with the basic tabulations. These categories may not be mutually exclusive but would combine categories to create useful analytic distinctions. For example, a heading could be created for persons reporting that they are Asian whether as a single race or in combination with any other race(s). Parallel categories could be created for any of the five single racial categories. The resulting counts are called "all inclusive." 
They form distributions for each individual racial group; that is, the sum of the percent of respondents who mark a particular group alone, the percent who mark that group and at least one other group, and the percent who did not mark that group is 100 percent. The all inclusive distributions may provide information on population groups that might not have sufficient size in the sample to be included in basic tabulations. Table D provides a suggested tabulation strategy. Three years of NHIS data are used for this table, and the 0.2 percent cutoff is used to determine whether data can be shown. The all inclusive NHOPI category does not meet the criterion for inclusion (at least 0.2 percent of the population) and is not shown. Note that when the tabulation involves counts or percentages, the analyst can subtract the count or percentage for each single race from the all inclusive count or percentage to obtain the count or percentage of individuals reporting each race in combination with any other race(s). For example, the Black or African American all inclusive count minus the Black or African American single race count will yield a count for those reporting Black or African American in combination with one or more other races. This would not be possible if the tabulation included summary statistics (mean, median, or percent) for attributes such as income, education or health outcomes.

Tables A through D describe tabulation alternatives for data on race collected using the new standards. The new standards also affect the collection and reporting of data on Hispanic or Latino origin. The new standards call for asking a question on Hispanic or Latino origin followed by a question on race, but also allow, under limited circumstances, a single combined question in which Hispanic or Latino origin is included in a list along with the five standard racial categories. In the combined question, respondents are also instructed to "mark one or more."
In either case, Hispanic origin may be reported alone or in combination with one or more races. As was the case for the tabulation of data on race, data on Hispanic or Latino ethnicity can also be presented for specific subgroups (e.g., Mexican, Cuban, and Puerto Rican) as shown in Table E. The tabulation headings used will be a function of the overall sample size and the population composition where the sample is selected. Even when separate questions are used to collect data on Hispanic or Latino origin and race, there are applications where a cross tabulation of the data from these two survey questions is preferred. Whether data are collected using the single question or the two question format, education and health data are frequently reported with racial data for Hispanics or Latinos as a separate group along with racial data for non-Hispanics or non-Latinos. Data collected under the new standards using either format will support the analysis of data on both Hispanics or Latinos and non-Hispanics or non-Latinos by race (Table F). For example, Table F shows that among Hispanics or Latinos, the sample size permits the presentation of data for Blacks, Whites, those of "other" races, and those selecting more than one race. Tabulations which incorporate the Hispanic or Latino subgroup information can be developed by expanding Table F. Since respondents are free to select one or more categories in the combined format, data collected from a survey or administrative reporting where a combined format is used can also be tabulated using Tables E or F. Table A. 
Sample Tabulation -- Detailed Presentation of Data on Race

Race                                 N        %
Total                           328317   100.00
AIAN                              2616      .79
Asian                             9718     3.26
  Asian Indian                    1287      .42
  Chinese                         2245      .75
  Filipino                        1965      .63
  Japanese                         920      .34
  Korean                           966      .33
  Vietnamese                      1102      .38
Black                            45259    12.32
NHOPI                                Q        Q
Other                             9734     2.22
White                           250054    78.24
More than one race                5435     1.62
  AIAN/White                      2618      .81
  Asian/White                      741      .24
  Black/White                      849      .23
Race Not Reported                 5237     1.45

Q = Does not meet statistical criteria for reliability (< 0.2 percent of population).
AIAN = American Indian and Alaska Native
NHOPI = Native Hawaiian and Other Pacific Islander (for example, Hawaiian, Guamanian, or Samoan)
SOURCE: NCHS/CDC National Health Interview Survey 1993-1995, Unpublished Tabulations

Table B. Sample Tabulation -- Minimum Presentation of Data on Race

Race                                 N        %
Total                           328317   100.00
AIAN                              2616      .79
Asian                             9718     3.26
Black                            45259    12.32
NHOPI                                Q        Q
Other                             9734     2.22
White                           250054    78.24
More than one race                5435     1.62
Race Not Reported                 5237     1.45

Q = Does not meet statistical criteria for reliability (< 0.2 percent of population).
AIAN = American Indian and Alaska Native
NHOPI = Native Hawaiian and Other Pacific Islander (for example, Hawaiian, Guamanian, or Samoan)
SOURCE: NCHS/CDC National Health Interview Survey 1993-1995, Unpublished Tabulations

Table C. Sample Tabulation -- Minimum Presentation of Data on Race for a Small Sample

Race                                 N        %
Total                           102467   100.00
Asian                             2894     3.32
Black                            13468    12.22
Other                             5127     2.64
White                            76441    77.94
NTA                               4537     3.88

Note: Statistical criteria for reliability (< 2 percent of population).
AIAN = American Indian and Alaska Native
NHOPI = Native Hawaiian and Other Pacific Islander (for example, Hawaiian, Guamanian, or Samoan)
NTA = Not Tabulated Above (Includes Race Not Reported, AIAN, NHOPI, and all responses that indicated More Than One Race)
SOURCE: NCHS/CDC National Health Interview Survey 1993-1995, Unpublished Tabulations

Table D. Sample Tabulation -- Detailed Presentation of Data on Race and the All Inclusive Distributions

Race                                 N        %
Total                           328317   100.00
AIAN                              2616      .79
Asian                             9718     3.26
  Asian Indian                    1287      .42
  Chinese                         2245      .75
  Filipino                        1965      .63
  Japanese                         920      .34
  Korean                           966      .33
  Vietnamese                      1102      .38
Black                            45259    12.32
NHOPI                                Q        Q
Other                             9734     2.22
White                           250054    78.24
More than one race                5435     1.62
  AIAN/White                      2618      .81
  Asian/White                      741      .24
  Black/White                      849      .23
Race Not Reported                 5237     1.45
AIAN all inclusive                5724     1.74
  AIAN and other race(s)          3108      .95
Asian all inclusive              10710     3.57
  Asian and other race(s)          992      .31
Black all inclusive              46731    12.72
  Black and other race(s)         1472      .40
White all inclusive             254688    79.65
  White and other race(s)         4634     1.41

Q = Does not meet statistical criteria for reliability (< 0.2 percent of population).
AIAN = American Indian and Alaska Native
NHOPI = Native Hawaiian and Other Pacific Islander (for example, Hawaiian, Guamanian, or Samoan)
SOURCE: NCHS/CDC National Health Interview Survey 1993-1995, Unpublished Tabulations

Table E. Sample Tabulation -- Hispanic or Latino Ethnicity With Detail

Ethnicity                            N        %
Total                           328317   100.00
Hispanic/Latino                  41585     9.78
  Cuban                           2151      .54
  Mexican                        26042     5.86
  Puerto Rican                    4809     1.25
Not Hispanic/Latino             283735    89.36
Ethnicity not reported            2997      .85

Note: Statistical criteria for reliability (< 0.2 percent of population).
SOURCE: NCHS/CDC National Health Interview Survey 1993-1995, Unpublished Tabulations

Table F. Sample Tabulation -- Detailed Presentation of Data on Race and Hispanic or Latino Ethnicity

Ethnicity/Race                       N        %
Total                           328317   100.00
Hispanic or Latino               41585     9.78
  AIAN                               Q        Q
  Asian                              Q        Q
  Black                            950      .24
  NHOPI                              Q        Q
  Other                           8348     1.80
  White                          28742     6.88
  More than one race               985      .26
  Race Not Reported               1816      .42
Not Hispanic or Latino          283735    89.36
  AIAN                            2160      .69
  Asian                           9291     3.14
    Asian Indian                  1263      .42
    Chinese                       2208      .74
    Filipino                      1828      .60
    Japanese                       903      .33
    Korean                         944      .32
    Vietnamese                    1082      .47
  Black                          45259    11.99
  NHOPI                              Q        Q
  Other                           1303      .41
  White                         219923    70.96
  More than one race              4377     1.35
    AIAN/White                    2270      .72
    Asian/White                    613      .20
    Black/White                    677      .19
  Race Not Reported               2444      .74
Ethnicity Not Reported            2997      .85
  White                           1389      .41
  Race Not Reported                977      .29

Q = Does not meet statistical criteria for reliability (< 0.2 percent of population).
AIAN = American Indian and Alaska Native
NHOPI = Native Hawaiian and Other Pacific Islander (for example, Hawaiian, Guamanian, or Samoan)
SOURCE: NCHS/CDC National Health Interview Survey 1993-1995, Unpublished Tabulations

IV. USING DATA ON RACE AND ETHNICITY COLLECTED UNDER THE NEW STANDARDS

This part of the report discusses some important uses of data under the new standards, reflecting in large measure work that is ongoing.

A. Redistricting

One of the first official statutory uses of data on race and ethnicity collected under the new standards will be for legislative redistricting following Census 2000. The new data format should not require substantial changes in the way redistricting will be conducted.

How the 1990 Census Racial and Ethnic Data Were Used

The 1990 census Public Law 94-171 ("redistricting count") tabulations (which were released to the states for redistricting purposes) reported data down to the block level for the total population and the voting age population (ages 18 years and older) for four racial groups (American Indian and Alaska Native, Asian and Pacific Islander, Black, and White) and a residual category ("other" race).
Data on these racial groups were also cross-tabulated by Hispanic origin. Categories were mutually exclusive (each person was counted only once), and the categories added to the total population reported for a geographic region.

States and political subdivisions that are covered under Section 5 of the Voting Rights Act are required to demonstrate, to the United States Attorney General or to a Federal district court in the District of Columbia, that their redistricting plans will not reduce the voting strength of their minority citizens and that the plans do not have a racially discriminatory purpose. All states and political subdivisions, however, are prohibited by Section 2 of the Voting Rights Act from using redistricting plans that have the effect of diluting their residents' voting strength on account of race. The U.S. Department of Justice or private citizens may file lawsuits to enforce these laws.

In order to comply with those Federal laws, states and their political subdivisions used the redistricting count tabulations to assess the racial and ethnic compositions and distributions of their residents as they drew their redistricting plans. The data were used to identify areas in which racial and ethnic minorities were residentially segregated, in order, for example, to avoid splintering those areas among several districts. The data also were used in some areas to determine whether voting patterns were racially polarized. After the redistricting process was complete, courts would rely on the redistricting count data, together with other evidence, to decide any legal challenge that was filed against the redistricting plan.
How the 2000 Census Data Can Be Used for Redistricting in 2001

In Census 2000 the major changes to the reporting of data on race and ethnicity are (1) the instruction to "mark one or more" racial categories and (2) the splitting of the "Asian or Pacific Islander" category into two separate categories -- "Asian" and "Native Hawaiian or Other Pacific Islander." Hispanic or Latino origin will be ascertained in a separate question, as in the 1990 census.

For the purposes of the 2000 Census Dress Rehearsal, the Census Bureau will provide tabulations of the number of persons who identified with only one of the five individual racial categories or with the residual category ("single race" counts), plus tabulations of the total number of persons who identified with each of the five individual racial categories either alone (e.g., White only) or in combination with any other categories (e.g., White plus any other racial category), referred to as "all inclusive" counts. Both the "single race" counts and the "all inclusive" counts will be cross-tabulated by Hispanic or Latino origin. It should be noted that the "all inclusive" counts will add to more than 100 percent of the population since a person's response will be counted in all of the racial categories selected. (See Appendix C for more information on Census 2000 Dress Rehearsal prototype redistricting data.)

It is not expected that provision of the redistricting count data in the new format will lead to significant changes in redistricting practices or decisions. The new data categories will not affect the total population counts used for the apportionment of Congress, or for compliance with one-person, one-vote requirements. Once the Dress Rehearsal data are released and analyzed, there will be more information available about the practical effects of the new standards.
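The relationship between the "single race" and "all inclusive" counts described above can be illustrated with a minimal sketch. The responses below are invented for illustration and are not Census data, and the function names are hypothetical. The sketch suggests why single race counts plus the number of persons marking more than one race add to the total, while all inclusive counts add to more than the total.

```python
from collections import Counter

# Hypothetical responses: each entry is the set of racial categories
# one person marked under the "mark one or more" instruction.
responses = [
    {"White"},
    {"White"},
    {"Black"},
    {"Black", "White"},
    {"Asian"},
    {"American Indian and Alaska Native", "White"},
]


def single_race_counts(responses):
    """Count only persons who marked exactly one category."""
    return Counter(next(iter(r)) for r in responses if len(r) == 1)


def all_inclusive_counts(responses):
    """Count each person once in every category he or she marked."""
    return Counter(race for r in responses for race in r)


single = single_race_counts(responses)
inclusive = all_inclusive_counts(responses)
more_than_one = sum(1 for r in responses if len(r) > 1)

# Single race counts plus the "more than one race" count equal the total.
print(sum(single.values()) + more_than_one, len(responses))
# All inclusive counts exceed the total when anyone marks two or more races.
print(sum(inclusive.values()), len(responses))
# The difference for one group is the "in combination" count for that group.
print(inclusive["White"] - single["White"])
```

The last line mirrors the subtraction used with Table D: the all inclusive count minus the single race count gives the number reporting that race in combination with one or more other races.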
It can be expected that the more that the single-count and all-inclusive-count populations share the same residential patterns, the less likely it will be that jurisdictions' redistricting choices will affect those populations differently. Research also has indicated that, at least nationwide, there is unlikely to be a significant difference between the "single count" Black population and the "all-inclusive" Black population. In addition, jurisdictions with substantial Hispanic or Latino populations will have a separate count of all persons identifying themselves as Hispanic or Latino, because ethnicity is collected in a separate question.

Alternatives to the single-race/all-inclusive approach to redistricting data are under consideration. The U.S. Department of Justice has not yet reached a decision on the question of whether advantages would result from the use of one of the allocation methods described in Appendix D for voting rights issues. (Allocation methods assign an individual's multiple race response to a single race category.) While allocation does not conform with the criterion that data uses should reflect "congruence with respondent's choice," it would facilitate comparisons with the 1990 census data. Some have suggested that an allocation approach would have the advantage of giving redistricting authorities, the states and their political subdivisions, one number to use in making their redistricting choices. Others have suggested that instead it would require states to use and consider three data sets: single-race counts, all-inclusive counts, and the allocated counts.
If a decision is made to use an allocation approach, the Department of Justice would discuss with the Census Bureau the technical feasibility of including matrices using the chosen allocation method in the PL 94-171 data files or producing a special tabulation with such data after the Census Bureau has met its legal deadline of April 1, 2001, for producing the data specified in PL 94-171. The working group would appreciate feedback from users on these issues. B. Equal Employment Opportunity One of the Federal Government's most significant uses of data on race and ethnicity is in its efforts to ensure that every individual has an equal opportunity for employment. Title VII of the Civil Rights Act of 1964, as amended, prohibits discrimination in employment based upon race, color, sex, religion, and national origin. Executive Order No. 11246, as amended, similarly prohibits discrimination in employment by government contractors. Executive Order 11246 also requires contractors covered by its provisions to ensure affirmatively that they do not discriminate against their employees and applicants for employment. Responsibility for equal employment opportunity is shared among a number of Federal agencies including: the Equal Employment Opportunity Commission (EEOC), the Department of Justice, the Office of Federal Contract Compliance Programs (OFCCP) in the Department of Labor, the Office of Personnel Management, and the Department of Education. Title VII is enforced by the EEOC against private employers and by the Department of Justice against state and local government employers. Executive Order 11246 is enforced by the OFCCP. Representatives from these agencies have been meeting to determine how best to implement the 1997 standards for reporting of data on race and ethnicity. 
This section describes some of the data related activities carried out by the agencies, how the data were previously collected and used, the changes the agencies have agreed upon, and some of the alternatives that are currently under discussion. As the new standards are implemented, agencies whose primary mission is civil rights enforcement will face particularly complex challenges. The EEO agencies will continue to consider the burden imposed on those responding to data requests as they make various tabulation, aggregation, and other decisions. All participants in these important decisions are reminded that it is not the intent of the 1997 standards to diminish the availability and quality of information collected and available for Federal civil rights enforcement and related purposes. Data Needs and Uses There are two basic theories of employment discrimination: disparate treatment and disparate impact. Disparate treatment can either affect individuals because of their protected characteristics, or in pattern and practice cases, it can affect all persons in the group who have an employment relationship with that employer. Individual disparate treatment cases rely primarily on evidence of how an individual was treated in comparison to other similarly situated individuals. In some instances, statistical evidence of disparities in treatment between similarly situated individuals can suggest that some individuals were subject to employment discrimination because of their protected class status. In disparate impact cases, statistics on the number of available and qualified minority workers for a particular job are compared with statistics on the employer's workforce. Enforcement agencies compare statistics on the racial breakdown of an employer's workforce to the racial composition of the available qualified labor pool. 
These analyses also consider statistics on the jobholder's employment-related characteristics, such as educational attainment or occupational experience, compared with similar data on those persons qualified for, and interested in, the at-issue jobs. This analysis is the first step in determining whether there is reason to believe that the employer's selection procedures improperly excluded individuals on the basis of their race, ethnicity, or gender. After this analysis, the employer may be asked to show that its selection procedures for the position(s) in question are job-related and consistent with business necessity. The workforce data often come from the employer's annual reports filed with Federal agencies (see "Data on Employer's Work Force" below), and the benchmark data come from a special file covering EEO-related data drawn from the most recent decennial census (see "The Benchmark File" below). In some disparate impact cases, the selection or de-selection rates of different groups within the employer's workforce are compared without reference to external benchmarks.

Data on Employer's Work Force. Data on an employer's workforce are collected annually on the Employer Information Reports (EEO-1 and EEO-4 surveys) covering private and state or local government employment, respectively, and on the EEO-5 and IPEDS (formerly EEO-6) surveys of employment in elementary/secondary and higher education, respectively. The current EEO forms collect general information about the employer and its workforce. Employers provide counts of employees within nine job categories by gender and five racial/ethnic categories (White--not of Hispanic origin, Black--not of Hispanic origin, Hispanic, Asian or Pacific Islander, and American Indian or Alaskan Native) for each facility.

The Benchmark File. In 1990, a special EEO file based on the decennial census data was produced by the Census Bureau, in accordance with specifications provided by the EEO agencies.
It included five matrices of counts for various geographic entities including the United States, States, metropolitan areas, counties, and places of 50,000 or more in population. The five tables presented various cross-tabulations of the number of people in each labor force category by gender, EEO racial/ethnic categories (six categories, the five noted above plus "other, not of Hispanic origin"), occupation (512 categories), industry (98 categories), educational attainment (six categories), earnings (nine categories) or age (seven categories).

Summary of Data Use for EEO Analysis. The basic inquiry requires identification of the relevant labor force for each case, followed by a determination as to whether the employer's work force differs to a statistically significant extent from the benchmark comparison group. The relevant labor force depends on the employment action at issue. For entry-level positions that require few skills or experience, the benchmark may be some lesser skilled subset of the civilian labor force in the geographic area in which the employer operates. Depending on the qualifications required for a position, the relevant labor force may be further delineated, for example, by age, education, or occupation. For promotions, the relevant labor pool typically will be the employees eligible for the promotion. The basic inquiry is always the same: is the number/percent of, for example, Blacks, found in the employer's work force significantly different from the number of Blacks that would be expected to be found based on the percentage of qualified and interested Blacks in the labor force? The comparative information on the labor force generally comes from the benchmark file from the most recent decennial census.

The wide range of factors (e.g., qualifications, availability, location) affecting employment decisions by both employers and individual workers influences whether the employer's work force will replicate the availability of individuals at any level of labor force aggregation. Absent discriminatory practices, it is also unlikely that significant disparities should exist between the proportion of qualified minority or female workers in positions throughout the employer's work force and the available and qualified labor pool. Statistical analysis measures the disparity between the actual participation of minorities or women in the employer's workforce and their expected representation to determine whether any disparity can be attributed to chance. The analysis is based on an assumption that available and qualified minorities and women are recruited, apply and are selected on a nondiscriminatory basis by the employer. Following statistical practice, if the likelihood of chance differences is less than 0.05 (the five percent probability significance level), regulatory agencies and the courts generally accept the alternative inference that unlawful factors may have influenced the employer's decision making. In litigation, this inference can constitute a prima facie showing of discrimination, which then requires the employer to explain its practices or face liability. In several cases, the Supreme Court accepted the use of a statistic approximating the five percent probability level, a difference of two to three standard deviations, but emphasized that a range of techniques can be used to reflect the fact patterns of each case. See Hazelwood School District v. United States, 433 U.S. 299, 311 n. 17 (1977), and Watson v. Ft. Worth Bank & Trust, 487 U.S. 977, 995 n.3 (1988).

The following example illustrates the statistical comparison of the racial profile of an employer's workforce and the racial profile of similar job-holders in that employer's labor market area.
In this example, the ABC Corporation, a large producer of computer software in City X, employs 350 programmers. Eleven, or 3.2 percent, of these programmers are Black. Using the decennial census benchmark data, it is found that Blacks constitute 3.72 percent of available programmers working in City X. Using that benchmark proportion, the expected number of Black programmers in a company in City X with 350 programmers is found to be 13 (3.72 percent times 350). The difference between the number of Black programmers in ABC Corporation and the number expected is minus 2 (11 minus 13). In "standard deviation" terms, the disparity (the shortfall of 2, divided by a standard deviation of about 3.5 for a workforce of 350) is -.57 standard deviations. Such a difference, while negative, is not statistically significant (to be statistically significant, it would need to be less than -1.96). Thus, the number of Black computer programmers employed by the ABC Corporation is not suggestive of an under representation of Black programmers in the employer's workforce.

Changes Needed to EEO Forms and Instructions to Meet the New Standards

Employer Record-keeping. The instructions accompanying the current EEO forms state that the race and ethnicity of an employer's work force may be obtained either by "visual surveys of the work force, or from post-employment records." The instructions state explicitly that eliciting information from the employee via direct inquiry is not encouraged. With the implementation of the 1997 standards, this guidance will change. Self-identification will be the preferred method of collecting data on race and ethnicity from employees. Employers will also be encouraged to use the two-question format with Hispanic ethnicity first, and to allow those employees who wish to do so to select more than one race. Employers will be asked to maintain this information in their data files. It is currently thought that employers will not be required to resurvey current staff, although some will likely do so.
If employers do not resurvey current staff, the data available to be collected on the EEO forms will only slowly become comparable to the benchmark data reported in Census 2000. The OFCCP regulations do not specify how Federal contractors (employers) should gather the data necessary to complete the work force analysis or the utilization analysis for Affirmative Action Programs. The implementing regulations, however, require the filing of an EEO-1 report and, by implication, the data reported in the work force utilization analysis must be consistent with the EEO-1 reporting requirements.

Planned Changes to the EEO Forms. To be consistent with the new standards, the following changes to the EEO forms are planned:

(1) Add a separate category "Native Hawaiian or Other Pacific Islander" to EEO forms and instructions, and replace the category "Asian or Pacific Islander" with "Asian."

(2) Make the following changes in terminology:
    a. The term "Eskimo or Aleut" replaced by "Alaska Native,"
    b. The term "Black" replaced by "Black or African-American," and
    c. The term "Hispanic" replaced by "Hispanic or Latino."

(3) Capture Hispanic or Latino ethnicity in a separate category or question.

These planned changes do not incorporate a change of instructions to "mark one or more races." It has not yet been determined how best to revise the forms that collect aggregations of data about the employer's workforce to account for individuals who report more than one race. Efforts to date to design and test an aggregate reporting form are discussed earlier in this report. Alternatives for using the data for EEO purposes (that might lead to changes in the EEO forms) are described below.

Ensuring Common Approaches in EEO Reporting

The Federal civil rights enforcement agencies agree that they should adopt common data base definitions for the racial and ethnic categories used to enforce EEO laws and regulations.
Clearly, whatever system is adopted, the enforcement agencies will need to consider the complex issues related to implementing the new standards, bridging to EEO enforcement conducted using data collected under the old standard, and continuing to conduct the important business of ensuring equal employment opportunity during the transition years. Because of the complexities in collecting and using the data reported under the new standards for civil rights enforcement purposes, the EEO agencies are still in the process of considering the best way to analyze these data. A number of alternative approaches are currently under review. Three alternatives are briefly described in the following sections. Each alternative would require the preparation of a suitable decennial census benchmark file. Readers are invited to comment on these alternatives and to suggest additional ideas and options. Tabulation Alternative 1: Using a Bridging Method. The EEO agencies have considered the methods discussed in Appendix D of this report, and have concluded that one of the allocation methods proposed for bridging would be useful during the transition period. The EEO agencies considered the allocation method that assigns an individual who selected more than one race to the largest of the nonwhite groups he/she marked as a viable alternative for EEO purposes. The largest nonwhite group may be ascertained from the racial composition of the population for the relevant geography. This allocation method can be used to assign responses from individuals who reported more than one race to single race categories. With this method, no change would be needed in the statistical methods currently used by the EEO agencies, and for a few years, employers who begin collecting data under the new standards would use this allocation method to report on their EEO forms the racial data for new hires who select more than one race. 
Employers could also be asked to record on their EEO forms the total number of individuals in their files who selected more than one race. This would provide the EEO agencies with a measure of the changing racial characteristics in work force data and would indicate when the final alternative should be implemented. This method represents an interim solution that would precede full implementation of the new standards. Following careful evaluation of Census 2000 data, decisions could be made that phase in the new standards in an analytically appropriate manner. Tabulation Alternative 2: The Lower and Upper Boundary Approach. Under the new standards, employees will be able to identify themselves as members of more than one racial group. As a result, some individuals who were identified as members of only one group, for example, Black, under the previous standards, may now identify as members of more than one group, for example, Black and White, under the new standards. Thus, when data are reported it will be possible to determine two counts for each racial group. The lower count, or lower boundary, will be those individuals who identify with one race only, for example those who marked only the Black category. The larger count, or upper boundary, adds to the lower boundary those individuals who identify with the given racial category and one or more other racial categories. Thus, the upper boundary Black count includes everyone who marked Black either alone or in combination with one or more other racial categories. The remainder of the population consists of those individuals who did not identify as Black. As a practical matter, in most geographic locations the upper and lower boundaries will not currently be substantially different for purposes of employment data because few adults are expected to report themselves as members of more than one racial group. 
This assessment is based upon data provided in Appendix D of this report, and documentation of the National Content Survey and the Race and Ethnic Targeted Test conducted by the Census Bureau. Data from some geographic regions are expected to reflect larger numbers and percentages of respondents reporting themselves as belonging to more than one racial group. An interagency group is working on possible modifications to survey forms, such as the EEO-1, that collect aggregated data on the characteristics of many individuals for a single organization, to capture information needed for the upper/lower boundary approach. The tests conducted to date are described in detail in Appendix B of this report. Tabulation Alternative 3: Collect Micro Data from Employers. An alternative approach to using an aggregate reporting form, similar to the EEO-1, is to ask respondents to provide a micro data file containing one record (without identifiers) for each employee. The micro record would include the employee's race or races, ethnicity, gender, and occupational category. This approach might be simpler for employers, and would provide agencies the maximum amount of flexibility in using the information. Implementation of this approach appears to be a longer-term solution. The EEO agencies would need to work with respondents in designing and implementing the reporting format and method, and they would need to acquire the relevant software and hardware to process the information. Illustrations of Comparisons Under Alternative Tabulation Approaches To illustrate the alternatives, consider the example described earlier in this section. Recall that the ABC Corporation, a large producer of computer software in City X, employs 350 programmers. It is assumed that the ABC Corporation started maintaining self-reported data on race (allowing employees to select one or more races) for their new hires more than a year ago. 
As a result, their internal files contain a mixture of data collected under the old and new standards. For their 250 programmers hired before the new standards were implemented, information on race in internal files is recorded as one of the four racial groups. These files indicate that 8, or 3.2 percent of the long-term programming staff members, are Black. For the 100 recent employees, race is recorded as one or more of the five groups. According to these records, one of the new programmers has reported that he is Black, one has reported that she is Black and White, and one has reported that he is Black and American Indian. None of the other 97 individuals hired after the new standards were implemented reported Black either alone or in combination with another race. In benchmark data based on Census 2000, the following percentages of programmers in City X have reported that they are Black: 3.3 percent have reported the single race Black, .23 percent have reported that they are Black and White, and .11 percent have reported that they are Black and American Indian. A total of .42 percent have reported that they are Black and some other race or races. Comparisons Under Alternative 1: Allocation. Because there are more Blacks in City X than any racial group other than White, under the allocation method known as "largest non-white group", ABC Corporation would count the 8 long term Black employees and the 3 new employees who selected Black alone or in combination with another race, and report that they have 11 Black programmers (approximately 3.2 percent of their programmers). Similarly the benchmark proportions would count in the Black category everyone who marked Black either alone or with other race(s). This would count a total of 3.72 percent of the available programmers as Black. With these transformations, the counts and percentages are identical to the example provided earlier and the analysis would lead to identical results. 
If a different racial group were used in the analysis, or a different allocation method were used, results would not necessarily be identical to the earlier example. Comparisons Under Alternative 2: Upper/Lower Bound. For the upper/lower bound method, ABC Corporation would report that they have 9 programmers (2.6 percent) in the single race (or lower boundary) Black category, and 2 employees (.6 percent) who have reported Black in combination with another race. Thus, the "all inclusive" (or upper boundary) count for Black programmers is 11 (3.2 percent). The benchmark file has 3.3 percent of the programmers in the single race (or lower boundary) Black category, and .42 percent of the programmers who report as Black and at least one other race, yielding a total of 3.72 percent of programmers in the "all inclusive" (or upper boundary) category. Given past patterns of discrimination, one would most likely argue that the "all inclusive" category would be most appropriate to use. In this example, the resulting counts and percentages are identical to the example provided earlier, and to the results of the allocation method. The analysis could also be conducted using the data for the single race category (the lower bound), as follows. Using the benchmark proportion 3.2 percent, the expected number of Black programmers in a company with 350 programmers in City X is found to be 11 (approximately 3.2 percent of 350). The difference between the number of single race Black programmers in ABC Corporation and the number expected is minus 2 (9 minus 11). In "standard deviation" terms the disparity (-2 divided by a standard deviation of approximately 3.3) is -.61. This difference is not statistically significant (to be statistically significant, it would need to be less than -1.96). Thus, the number of Black computer programmers employed by the ABC Corporation is not suggestive of an underrepresentation of Black computer programmers in the employer's work force.
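The lower bound calculation above can be sketched in a few lines. This is an illustration only: it assumes that the "standard deviation" is the binomial standard deviation, the square root of n x p x (1 - p), which the text does not state but which reproduces the reported figure of -.61. The function name and its parameters are hypothetical, and the expected count is rounded to a whole number as it is in the text.

```python
import math


def disparity_in_sd(observed, workforce_size, benchmark_proportion):
    """Return (expected count, standard deviation, disparity in SD units).

    Assumption: the standard deviation is binomial, sqrt(n * p * (1 - p)).
    The expected count is rounded to a whole number, as in the text.
    """
    expected = round(workforce_size * benchmark_proportion)
    sd = math.sqrt(
        workforce_size * benchmark_proportion * (1 - benchmark_proportion)
    )
    z = (observed - expected) / sd
    return expected, sd, z


# Figures from the ABC Corporation example: 350 programmers, 9 of whom
# reported the single race Black, and a benchmark proportion of 3.2 percent.
expected, sd, z = disparity_in_sd(
    observed=9, workforce_size=350, benchmark_proportion=0.032
)

# The text treats a disparity below -1.96 as statistically significant.
significant = z < -1.96

print(expected, round(sd, 2), round(z, 2), significant)
```

With the figures from the example, the expected count is 11 and the disparity is about -.61, which is above the -1.96 threshold.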
In this case, the analysis using the lower bound leads to the same conclusion as the analysis using the upper bound, though the numbers are somewhat different. Note that if a different allocation method were used with tabulation alternative 1, or if one of the other racial groups were used in the example, the upper bound ("all inclusive" count) would not be identical to the count based on the tabulation allocation method. The reader is referred to Appendix D for a detailed discussion of the impact of the various allocation methods. Comparisons Under Alternative 3: Full Data Reporting. With this method, ABC Corporation will compile a micro data listing of employee characteristics to submit for EEO purposes. The table below illustrates the contents of such a micro data file. This example is intended to illustrate the complete recording of sex, race, and ethnicity. It makes use of the single job category "programmer," and therefore cannot be viewed as a real prototype for EEO reporting. In this table X denotes "yes," zero denotes "no," and blank indicates that the data are not available. The first record (employee number 1) is a Black, non-Hispanic male programmer. His data are recorded in the new format: he was hired after the new reporting system was adopted and had an opportunity to self-select one or more races. He chose to report himself as Black. On the other hand, employee 4 has been an employee for some time, and his data are in the old format. He is also a Black male programmer, but the information provided in this record is what was recorded in the company files prior to conversion to the new reporting system. If this type of information became available from all employers, the EEO agencies could use any of the tests described above, or they would be able to transition to applying the EEO methodology to any groups that become large enough to monitor for EEO, including those that involve more than one race.
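As a rough sketch of the flexibility a micro data file would offer, the lower and upper boundary counts can be derived directly from records like the twelve shown in the illustration for ABC Corporation. The record layout, field names, and function names below are hypothetical and are not a prototype reporting format. The twelve records are only part of the 350-record file, but they include every programmer who reported Black.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class EmployeeRecord:
    number: int
    sex: str
    hispanic: bool
    races: frozenset   # codes: W, B, I, A, H
    new_format: bool   # True if race was self-reported under the new standards


def rec(number, sex, hispanic, races, new_format):
    """Build a record; `races` is a string of one-letter race codes."""
    return EmployeeRecord(number, sex, hispanic, frozenset(races), new_format)


# The twelve records shown in the illustration (hypothetical encoding).
RECORDS = [
    rec(1, "M", False, "B", True),
    rec(2, "F", True, "WB", True),
    rec(3, "M", False, "BI", True),
    rec(4, "M", False, "B", False),
    rec(5, "F", False, "B", False),
    rec(6, "F", False, "B", False),
    rec(7, "M", False, "B", False),
    rec(8, "F", False, "B", False),
    rec(9, "M", True, "B", False),
    rec(10, "M", False, "B", False),
    rec(11, "M", False, "B", False),
    rec(12, "F", True, "W", True),
]


def boundary_counts(records, race):
    """Return (lower, upper): race reported alone, and alone or in combination."""
    lower = sum(1 for r in records if r.races == {race})
    upper = sum(1 for r in records if race in r.races)
    return lower, upper


def combination_counts(records):
    """Tally each distinct combination of races that appears in the file."""
    return Counter("".join(sorted(r.races)) for r in records)


lower, upper = boundary_counts(RECORDS, "B")
combos = combination_counts(RECORDS)
print(lower, upper, dict(combos))
```

For these records the lower boundary count for Black is 9 and the upper boundary count is 11, matching the figures in the upper/lower bound comparison. The same file also yields a count for each combination of races, which an aggregate form cannot provide without a separate cell for every combination.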
Illustration of Part of Micro Data File for ABC Corporation
___________________________________________________________________
Employee                         Race                       New
Number    Sex   Hispanic   W   B   I   A   H   Programmer   Format
___________________________________________________________________
1          M       0       0   X   0   0   0       X          X
2          F       X       X   X   0   0   0       X          X
3          M       0       0   X   X   0   0       X          X
4          M       0       0   X   0   0           X          0
5          F       0       0   X   0   0           X          0
6          F       0       0   X   0   0           X          0
7          M       0       0   X   0   0           X          0
8          F       0       0   X   0   0           X          0
9          M       X       0   X   0   0           X          0
10         M       0       0   X   0   0           X          0
11         M       0       0   X   0   0           X          0
12         F       X       X   0   0   0   0       X          X
13         .       .       .   .   .   .   .       .          .
___________________________________________________________________
W=White B=Black I=American Indian and Alaska Native A=Asian H=Native Hawaiian and Other Pacific Islander

Comparisons using Tabulation Alternative 3 would require benchmark data from the Census Bureau for a subset of the 63 different unique combinations of reporting of race. Decisions concerning the size of the groups for which tabulations are needed would need to be made by the EEO agencies, informed by the data from the decennial census.

C. Vital Records and Intercensal Estimates

The revisions to the standards for collecting and presenting Federal data on race and ethnicity pose many challenges to the Census Bureau's Intercensal Population Estimates Program. Because the population estimates are data driven, changes to the program to provide new racial categories will depend upon the availability of data from a variety of sources. Although changes are possible, it will require discussions with data providers and data users, as well as research and analysis of data collected under the new standards, before the Census Bureau can identify the racial categories that can be used in the Intercensal Population Estimates Program. Following some background discussion, this section presents a description of the Intercensal Population Estimates Program, its methodology, and its major uses, and then turns to some of the major issues that must be addressed.
Background In 1977, the Office of Management and Budget (OMB) issued Race and Ethnic Standards for Federal Statistics and Administrative Reporting. Because the intercensal population estimates are limited in their detail by the availability of administrative data, it was not until 1993 that the Intercensal Population Estimates Program could modify its racial categories to follow fully the 1977 standards by providing data for the population in the four major racial categories -- White; Black; Asian or Pacific Islander; and American Indian, Eskimo and Aleut. To comply with the 1977 standards, the Intercensal Population Estimates Program developed estimates by race separately for the population by Hispanic origin (Hispanic, non-Hispanic). The 1997 standards present many challenges, two of which are especially significant. One is that respondents to Federal data collections, including Census 2000, surveys, and vital statistics registrations, will be allowed to select one or more races. The other is that the Asian or Pacific Islander aggregate category has been split into two categories -- one called "Asian" and the other called "Native Hawaiian or Other Pacific Islander." Because the intercensal population estimates serve several diverse purposes, exploring the possible outcomes of the estimates process and examining the implications of the new standards are important. The intercensal population estimates are used as controls for many Federal surveys, as denominators for important Federal statistics, and as indicators for important program and policy decisions. Because the issues raised by the 1997 standards are complicated and diverse, it will take considerable research and experimentation before the Intercensal Population Estimates Program can produce population estimates outputs that fully follow the new standards. The next sections describe the program and discuss the major issues that must be addressed in changing program outputs.
What is the Intercensal Population Estimates Program? The Intercensal Population Estimates Program, under Title 13, develops and releases annual estimates of the total population and its demographic characteristics. For the Nation, states, and counties, these characteristics include annual estimates by: Age -- single years of age (age 0 to age 99) and 100+; Sex -- Male/Female Race-- White; Black; Asian and Pacific Islander; and American Indian, Eskimo, and Aleut; Hispanic origin -- Hispanic/non-Hispanic The Intercensal Population Estimates Program currently provides estimates of the total population of functioning governmental units (cities, incorporated places, and minor civil divisions). The Census Bureau is considering expansion of the program to include smaller and more diverse units of geography (such as School Districts), as well as the development of demographic characteristics for functioning governmental units and other smaller geographic units. How Are the Population Estimates Used? The population estimates are used in the intercensal period for funding allocations, as controls for Census Bureau and other Federal surveys, as denominators for vital statistics and other demographic events, and as planning tools for government and private programs. Funding Allocations. Federal programs totaling $180 billion use these annual population estimates to make important program decisions and to distribute these funds. Survey Controls. The population estimates are used as control totals for the Current Population Survey (CPS), the Survey of Income and Program Participation (SIPP), the new American Community Survey (ACS), other Federal surveys, as well as many private surveys. Most Federal surveys use national level population estimates by age, sex, race, and Hispanic origin as controls for weighting survey data. The ACS currently uses county level population estimates by age, sex, race, and Hispanic origin as controls for weighting survey data. 
Denominators for Demographic Events. The National Center for Health Statistics (NCHS) currently uses the national, state, and county population estimates by age, sex, race, and Hispanic origin as denominators to create birth and death rates and to calculate life tables by race and sex. In addition to the use by NCHS, the Centers for Disease Control and Prevention (CDC) frequently relies upon the estimates of population at various geographic levels as denominators for various health related and disease incidence rates. The National Cancer Institute (NCI) uses the county population estimates by age, sex, race, and Hispanic origin as denominators for the various cancer incidence rates released to the public. Planning Tools. The intercensal population estimates are frequently used as planning tools and as barometers to measure an area's growth and change since the last decennial census. In making important policy decisions, local planners frequently cite the overall population level and the demographic characteristics products of the Intercensal Population Estimates Program. Methodology for Developing Intercensal Population Estimates The Intercensal Population Estimates Program develops its population estimates by age, sex, race, and Hispanic origin using the demographically recognized cohort-component technique. In this technique, each component of population change -- births, deaths, international migration, and internal migration -- is estimated separately by age, sex, race, and Hispanic origin. Various administrative records provide information needed to develop these components of population change. The estimates process begins with the most recent decennial census results and combines the estimated components of population change to develop the intercensal population estimates. The 1990 Census Base Population. 
Although the enumeration of the resident population in the 1990 census, without adjustment for net undercoverage, was adopted as a standard for the estimates, changes were made in the distribution of the population by age and race. These modifications were made to bring the definition of age and race into conformity with definitions used for data from other sources, such as vital statistics. (See Comparability Issues below for a complete discussion of the modification of the 1990 Decennial Census.) Birth and Death Components. In brief, NCHS provides annual counts and distributions of births and deaths by age, race, sex, and Hispanic origin by county to the Census Bureau in a specially developed individual record file of the birth and death events. These individual records contain the detailed race and Hispanic classifications available from the birth and death certificates collected by NCHS. International Migration Component. The international net migration components are based on a variety of administrative sources and analytic estimates. The Immigration and Naturalization Service (INS) supplies data on legal immigrants. The Office of Refugee Resettlement (ORR) supplies data on persons admitted to the United States as refugees. Both sources supply data on country of birth. The Census Bureau estimates the distribution by race and Hispanic origin from the country-of-birth tallies, using data from the 1990 Census on the foreign-born population who entered the United States from 1985 to 1990. The other components of international migration such as emigration and undocumented migration are developed using a combination of basic demographic modeling techniques. By examining data from other administrative records in combination with an analysis of the decennial census, the Census Bureau models the level and demographic characteristics of these other international migration components. Internal Migration Component. 
The data on internal migration are developed using a basic administrative records method. This method relies on annual extracts of tax returns provided by the Internal Revenue Service (IRS). In this approach, using the Social Security Number (SSN) on the return, the Census Bureau can match the tax returns for two years and obtain state of residence for the two periods. By comparing the state of residence at the two points in time, annual measures of migration can be developed for states. Until recently, the Census Bureau had only developed the national population estimates by age, race, sex, and Hispanic origin and the estimates of the total population for states and counties. During the current decade, the Census Bureau started to develop a set of state and county population estimates by age, sex, race, and Hispanic origin. These state population estimates are developed using the basic cohort component technique outlined above. Since the standard tax return provides no demographic characteristics of the tax filer, the Census Bureau must further modify the basic administrative records method to estimate internal migration by age, sex, race, and Hispanic origin. To obtain demographic characteristics, the Bureau has relied on the annual extract of tax returns provided by the IRS, and a 20 percent sample of information on the Social Security Administration Application File (NUMIDENT). This NUMIDENT file includes SSN, month and year of birth, race, sex, and six characters of the last name for each SSN holder in the sample file. The extract of the NUMIDENT file has been merged with the tax returns file by SSN to derive demographic characteristics of IRS filers. Because the Census Bureau was able to receive only a 20 percent sample of this basic NUMIDENT file, the Bureau appended the demographic characteristics of the primary filer to only the same 20 percent sample of tax returns.
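The two-year matching step described above can be illustrated with a minimal sketch. It is not the Census Bureau's actual processing: the record layout, the opaque filer keys that stand in for the match key, and the function name are all hypothetical, and the sketch ignores exemptions and demographic characteristics.

```python
from collections import Counter


def state_to_state_flows(returns_year1, returns_year2):
    """Count filers whose state of residence changed between two years.

    Each argument maps a filer key to a state of residence. Only filers
    present in both years can be matched; unmatched filers are skipped.
    """
    flows = Counter()
    for filer_key, origin in returns_year1.items():
        destination = returns_year2.get(filer_key)
        if destination is not None and destination != origin:
            flows[(origin, destination)] += 1
    return flows


# Made-up records for illustration only.
year1 = {"id-001": "VA", "id-002": "MD", "id-003": "VA", "id-004": "PA"}
year2 = {"id-001": "MD", "id-002": "MD", "id-003": "MD", "id-005": "VA"}

flows = state_to_state_flows(year1, year2)
print(dict(flows))
```

In this made-up example two filers moved from VA to MD, one filer stayed in MD, and two filers appear in only one year and so contribute nothing to the migration measure.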
Besides demographic characteristics of the primary filers, the model requires demographic characteristics of those persons claimed as exemptions on the tax return. The rules for assigning demographic characteristics to dependents are straightforward and rely on basic familial and demographic relationships. Because the NUMIDENT file was restricted to a 20 percent sample until this year, the Census Bureau could not use the merged tax file and SSA data to develop county population estimates by age, sex, race, and Hispanic origin. To develop the current sets of county population estimates by age, sex, race, and Hispanic origin, a ratio approach is employed. This approach combines the full set of age, race, sex, and Hispanic origin detail for the county in 1990 with the newly developed state population estimates by age, sex, race, and Hispanic origin and the estimates of the total population of the county. With the delivery of the 100 percent NUMIDENT file to the Census Bureau, work on employing the cohort component technique to develop the county estimates by age, sex, race, and Hispanic origin is anticipated. Data Availability The intercensal population estimates are "data driven." As noted above, the decennial census, the National Center for Health Statistics, the Immigration and Naturalization Service, and the Social Security Administration are all important sources for developing intercensal population estimates. Using the current methodology, estimates cannot be produced without the availability of these data. Decennial Census Data. Census 2000 will mark the first time that decennial population data are available using the new OMB standards for collecting racial data. The Census Bureau is developing the approaches and timetables for tabulating these data from Census 2000. Birth and Death Data. The National Vital Statistics System is the basis for the Nation's official statistics on births and deaths (including infant deaths).
The data are provided through vital registration systems maintained and operated by the individual states and territories where the original certificates are filed. While the legal authority for vital registration rests with the states and territories, the National Center for Health Statistics (NCHS) is required to produce national vital statistics by collecting data from the vital records of all the states. The NCHS cooperates with the states in developing the standard forms for data collection as well as standard procedures for data preparation and processing in order to promote a uniform national data base. The NCHS shares in the costs incurred by the states through contractual agreements with each state. Under this arrangement, NCHS obtains and publishes vital statistics based on all births and deaths (e.g., 3,891,494 and 2,314,690, respectively, in 1996) occurring in the United States. Implementation of the 1997 standards on vital records will require changes in data collection and processing systems at all levels of government and very likely will take at least several years to accomplish throughout the United States. In addition to revising computer systems at the state and Federal levels, the electronic software that is used in hospitals to record and report over 90 percent of all births in the United States needs to be converted. Most importantly, the procedures used to collect birth and death data in hospitals and funeral homes will need to be revised and the appropriate staff need to be trained. It can be anticipated that not all registration areas will implement the 1997 standards at the same time or with complete coverage and compliance at the start. For example, some states may implement the revised race question on birth and death certificates in the year 2000 in order to be compatible with Census 2000, while others may prefer or need to wait until the next revisions of the U.S. Standard Certificates of Birth and Death are implemented in 2002. 
During 1998 and 1999, the NCHS is sponsoring a committee of state vital statistics officials and representatives of the relevant professions in a series of meetings to evaluate the entire content and format of the current Standard Certificates. The committee's goal is to submit certificate revisions to the Secretary, Department of Health and Human Services, in July 1999 for clearance by the Department. Implementation by the registration areas is expected to occur in January 2002. Some states have indicated a desire to make changes in the race and ethnicity items at the same time as other changes are made. International Migration Components. As discussed above, the international migration components are based on a variety of administrative sources and analytic estimates. The Immigration and Naturalization Service (INS) supplies data on legal immigrants. The Office of Refugee Resettlement (ORR) supplies data on persons admitted to the United States as refugees. Both sources supply data on country of birth. To develop data on the race and Hispanic origin of the entering immigrants, the Census Bureau combines the information on country of birth from the INS files with information from the most recent decennial census. Because the INS and other data sources on international migration do not code race or Hispanic origin, no change in these sources is anticipated. The Census Bureau will need to examine the results of Census 2000 and develop new algorithms to accommodate the revised categories for data on race. Internal Migration Components. To develop the internal migration component, the Census Bureau currently relies upon the annual extract of tax returns provided by the Internal Revenue Service (IRS), and a 20 percent sample of information on the Social Security Administration Application File (NUMIDENT). Under an agreement between the Census Bureau and the Social Security Administration, the Census Bureau has recently gained access to a full 100 percent NUMIDENT file. 
This opens additional opportunities for developing subnational population estimates by age, sex, race, and Hispanic origin. This component also presents the biggest obstacle to modifying categories for data on race in the intercensal population estimates process. Under the Social Security system, data on race are provided as part of the Social Security card application process. For the oldest among the population currently covered in the NUMIDENT files, the last application date could refer to the beginning of the Social Security system. Until 1980, the Social Security Administration application system provided three racial categories -- White, Black, and Other. Beginning in 1980, the SSA modified the racial categories on the SSA application form to include five categories -- (1) Asian, Asian-American or Pacific Islander; (2) Hispanic; (3) Black (non-Hispanic); (4) North American Indian or Alaskan Native; (5) White (non-Hispanic). Although SSA modified the racial categories on the application form, people who already had an SSA card did not have to resubmit their data on race. Thus, pre-1980 entries on the SSA file have information for three racial categories (White, Black, and Other), while entries after 1980 have information for five racial categories. The application for a Social Security card needs to be updated to reflect the 1997 standards. Another change to the Social Security application procedure has presented challenges to the use of data on race. Beginning in the late 1980's, the Social Security Administration introduced the "enumeration at birth program." Under this program, parents could request a Social Security Number for their newborn children as part of the birth registration process. Because the birth certificates do not include racial information for the newborn, it is impossible to code race for the newborn onto the SSA file.
While information on race is available for the birth mother and father on the basic birth registration certificate, these data are not made available to the Social Security Administration and are not on the basic NUMIDENT file received by the Census Bureau. Comparability Issues Even the availability of the required source data does not ensure the capability to produce reasonable and accurate population estimates. Production of population estimates by the major demographic characteristics depends upon the availability of comparable data across the various data sources. While comparability issues with respect to race reporting are not new, the increased complexities of the new racial categories are likely to exacerbate the problems. The issues about comparability in race reporting are present in the current set of intercensal population estimates. Data from the 1990 census on race posed several of these problems. Although the enumeration of the resident population in the 1990 census, without adjustment for net undercoverage, was adopted as a standard for the estimates, changes were made to that distribution of the population by age and race. These modifications were made to bring the definition of age and race into conformity with definitions used for data from other sources, such as vital statistics. For age, the aim was to correct biases in census age tabulations that resulted from displacement of age reporting from the reference date of the census. In 1990 census publications, age is based on respondents' direct reports of age at last birthday, with some editing for age misstatement. This definition proved inadequate for postcensal estimates, however, as many respondents reported their age (even if correctly) at the time of completion of the census form or interview by an enumerator, either of which could have occurred several months after the April 1 reference date. As a result, age was slightly biased upward.
Modification was based on a respecification of age, for most individual respondents, according to their year of birth. Age was derived from year of birth by allocating date of birth to the first quarter and last three quarters of each year, subtracting year of birth from 1990 for those born before April 1, and from 1989 for those born after April 1. The allocation was based on an historical series of registered births by month. For race, the objective of the modification was to conform to the definition of race specified in the 1977 standards. In the 1990 census, a substantial number of people (roughly 9.8 million) did not specify a racial group that could be classified in any of the categories on the census form: White; Black; American Indian, Eskimo, or Aleut; Asian or Pacific Islander. A large majority of these people were of Hispanic origin (based on their response to a separate, Hispanic origin question on the form), and many wrote in their Hispanic origin, or Hispanic origin type (for example, Mexican or Puerto Rican) as their race. People of unspecified race were allocated to one of the four tabulated racial groups (White; Black; American Indian, Eskimo or Aleut; and Asian or Pacific Islander) based on their response to the Hispanic origin question. These four categories for race conform with the 1977 standards, and are more consistent with the categories in other administrative sources than are the original census tabulations. Census 2000 will pose challenges about reporting of race. The expanded number of categories and the possibility for reporting more than one race translates into over 60 possibilities. The large number of categories that are likely to have few responses will present challenges to the Intercensal Population Estimates Program. When combining across data sets and agencies, the problems of comparability in reporting of race become more severe. 
Clearly, the added complexity of reporting more than one race will add to this problem, particularly as different reporting situations (such as the census or the birth and death certificates) engender differential tendencies to report more than one race. Differences in allocation and editing procedures will almost certainly exacerbate the problem as exemplified by the problem of using data from different data universes in the calculation of rates. Future Direction The process of developing a set of intercensal population estimates consistent with the 1997 standards will not be an easy one. Until data are available, making any commitments about the probable set of products is impossible. The Census Bureau realizes, however, that many data users need to know its plans in order to make their own program decisions. To begin this process, the Census Bureau is forming a technical interagency group of key data providers and key data users to address many of the major issues. Members of this group will provide input on: (1) the feasibility of using one consistent set of categories on race across all geographic levels; (2) the feasibility of using population size as the only criterion for determining which categories by race will have separate population estimates; (3) the minimum cell size below which population estimates will not be produced; (4) the continued development of population estimates by mutually exclusive categories on race; and (5) the use of consistent methodologies for the different categories by race in the population estimates program. This technical group will also examine issues related to data allocation and editing -- important factors related to the data consistency issues. Although detailed data on race from Census 2000 will not be available until mid-2001, during the next few months the interagency group can address and reach consensus on most of the issues outlined above.
Through these discussions with the data providers and data users, the Intercensal Population Estimates Program can begin to form some tentative plans. Although it is too soon to speculate on any outcomes, it is likely that the Intercensal Population Estimates Program will need to be flexible. During the coming decade, as more data become available using the 1997 standards, it is likely that the Census Bureau will continue the expansion of the population estimates program to include additional categories by race. D. Issues for Further Research (Under Development) V. COMPARING DATA UNDER THE OLD AND THE NEW STANDARDS This part of the report provides a summary of the Bridge Report: Tabulation Options for Trend Analysis, which is contained in Appendix D. A. Introduction Agencies whose data are used to display time trends in economic, social, and health characteristics by racial and ethnic groups may need to consider bridging methods to assist users in understanding the data collected under the new standard. For some period of time, referred to as the bridge period, agencies may display historical data along with two estimates for the present time period. The first is a tabulation of the data collected under the new standard (see Part III B), and the second is a "bridging estimate," or prediction of how the responses would have been collected and coded under the old standard. Once the bridge period is over, the bridge estimates will no longer be needed. It should not be assumed that bridging is useful or required in every situation. Agencies should carefully consider whether they need bridging estimates. Bridging estimates may not be needed if agencies can tolerate a "break" in their data series or if comparison to another data series provides users with enough information about the change. If bridging estimates are not used, however, agencies should footnote the first occurrence of data collected under the new standard.
There are at least two purposes of bridge estimates: (1) to help users understand the relationship between the old and new data series (as noted above); and (2) to provide consistent numerators and denominators for the transition period, before all data are available in the new format. If there is a need for bridging, agencies should carefully evaluate alternative methods. The work presented in Appendix D, and summarized below, is intended to help inform agencies about the statistical characteristics of selected bridging methods. Agencies are encouraged to plan and conduct methodological research that will lead to more informed decisions concerning bridging methods and their uses. Such methodological research has long been used to quantify changes in data collection procedures. For example, when methods for coding industry, occupation, or diseases are updated, it is common practice to code data using both sets of coding rules to determine the nature and extent of the changes introduced by the change in procedures. The analyses presented in Appendix D make use of survey data in which the same respondent provided racial information in response to both a question structured under the old standard, and in response to questions similar to those that might be structured under the new standard. These are examples of methodological approaches that can be adopted by agencies, if necessary. In particular, since 1976, the National Health Interview Survey (NHIS) has added a follow-up question for those reporting more than one racial identity, asking them to select the one that they feel best describes them. This information is directly used in some of the most promising bridge techniques. Some agencies may find that adding such a follow-up question to the questions on race and ethnicity, even just once after the implementation of the new standards, would provide valuable survey-specific information for bridging to the past. 
As agencies conduct such experiments, the results may assist other agencies in understanding the changes associated with transitioning to the new standard.

The results discussed here and in Appendix D represent the work of a group of statistical and policy analysts drawn from Federal statistical agencies that use and produce data on race and ethnicity. They have spent the past year considering these tabulation issues and conducting research to develop tabulation guidelines for constructing "bridges" between racial data collected under the new standards and racial data collected under the old standards. The report sets forth criteria by which different bridging methods should be evaluated and describes the different methods that have been considered thus far. The results of the research conducted on several methods for creating bridges are also presented.

This part of the report discusses different options for tabulating racial data in order to create bridges from data collected under the 1997 standards, which have five racial categories and permit the reporting of more than one race, back to the data collected under the previous standards, which identified four racial categories. An "Other" category appears in much of the analysis, because it is included in the decennial census and some other surveys. All of these methods (and the research on them reported here) involve the use of individual-level records.

Analysis is limited to data collected using the separate questions for race and Hispanic origin. Under the new standards, when reporting is based on self-identification, the two-question format is to be used; even in the case of observer identification, this is the preferred format. It is expected that some users will bridge to a distribution created using the combined format for the question on race and ethnicity. Thus, bridging is analyzed both to the old racial distribution arising from the use of two questions and to one based on a combined, single question.
At this time, the analysis of bridging to the combined distribution has not been completed, but those results will be included in the report when they become available. Based on the research, the strengths and weaknesses of each tabulation method are discussed. Until all the analysis has been completed, however, recommendations will not be made.

B. Methods for Bridging

The goal of developing bridging methodology for data on race is to identify a statistical model that will take individuals' responses to the new questions on race and classify those responses as closely as possible to the responses we hypothesize they would have given using the old single race categories. Such a task will be easier or more difficult depending on how an individual identifies himself or herself under the new standards. For bridging purposes, individuals with only a single racial background are likely to identify as they did before, and no statistical model is needed for bridging. However, those with a mixed racial heritage who were previously required to identify only one part of their background may, under the new standards, choose to report more than one racial identity. When a person identifies with more than one racial group, some model will be necessary to translate those multiple responses into the one, single response we hypothesize that the individual most likely would have reported under the old standards.

Framework. Several different methods have been identified for creating a single race distribution from data including multiple race responses. These methods vary in both the assumptions that are made and the procedures that are followed. Before describing the particular methods examined in this report, it is useful to describe some of their major underlying characteristics.
One major distinction among the methods is whether an individual's responses are assigned to a single racial category (termed whole assignment) or to multiple categories (termed fractional assignment). Whole assignment can be based on a set of deterministic rules or based on some probabilistic distribution. For example, a deterministic rule might assign all White and American Indian responses into the American Indian category, while a probabilistic rule might randomly assign 60 percent of the White and American Indian responses into the American Indian category, and 40 percent into the White category. In the above example, it is unlikely that all individuals identifying as White and American Indian under the new standards would have previously identified as American Indian, so the deterministic rule will result in misclassifications for all those people who had previously identified as White. With a probabilistic rule, an individual's responses are randomly assigned to either the American Indian category or the White category (such as with 60 percent and 40 percent probabilities, respectively, based on previously collected data). However, even if the overall probabilities matched exactly the aggregate distribution under the old standards, there is no guarantee that the 40 percent who were categorized as White would have classified themselves that way. In fact, in the worst case, all 40 percent who were classified as White would actually have identified as American Indian under the old standards, and a corresponding percentage of those categorized as American Indian would have identified as White. When fractional assignment is used, multiple race responses are categorized into more than one category where each category receives a fraction of a count, and the sum of the fractions equals one. In the above examples of whole assignment, a person's responses were placed into one and only one category, in an attempt to mimic the past. 
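The distinction between deterministic and probabilistic whole assignment can be illustrated with a minimal sketch. It assumes a single White and American Indian combination and uses the 60 percent and 40 percent figures from the example above; the rule tables, function names, and the 1,005 sample responses are hypothetical and are not drawn from any of the surveys discussed in this report.

```python
import random
from collections import Counter

# Hypothetical rule tables for one combination. The 60/40 split is the
# illustrative figure used in the text, not an empirical estimate.
DETERMINISTIC_RULE = {frozenset({"White", "AIAN"}): "AIAN"}
PROBABILISTIC_RULE = {frozenset({"White", "AIAN"}): {"AIAN": 0.6, "White": 0.4}}


def assign_deterministic(races):
    """Whole assignment by a fixed rule; single-race responses pass through."""
    if len(races) == 1:
        return next(iter(races))
    return DETERMINISTIC_RULE[frozenset(races)]


def assign_probabilistic(races, rng):
    """Whole assignment by a random draw using the rule's probabilities."""
    if len(races) == 1:
        return next(iter(races))
    probs = PROBABILISTIC_RULE[frozenset(races)]
    categories = list(probs)
    weights = [probs[c] for c in categories]
    return rng.choices(categories, weights=weights)[0]


# Made-up responses: 1,000 White and AIAN reports plus 5 single-race reports.
responses = [{"White", "AIAN"}] * 1000 + [{"White"}] * 5
rng = random.Random(0)  # fixed seed so the draw is repeatable

det_counts = Counter(assign_deterministic(r) for r in responses)
prob_counts = Counter(assign_probabilistic(r, rng) for r in responses)
print("deterministic:", dict(det_counts))
print("probabilistic:", dict(prob_counts))
```

Under the deterministic rule every White and American Indian response lands in the American Indian category, while the probabilistic rule reproduces the 60/40 split only in aggregate; neither rule can say which individual would have chosen which category under the old standards.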
An alternative is to use a deterministic rule to assign some fraction of the multiple race responses to each of the racial categories identified. For example, a multiple response of White and American Indian might count as "one-half" in the tabulations for American Indians and "one-half" in the tabulations for Whites. These fractions, like the probabilities in the earlier example, could be varied for different combinations of multiple races to attempt to reflect how often people might identify with one group compared with another. Bridge Tabulation Methods. All of the bridge tabulation methods focus on the assignment of the responses from individuals who identify with more than one racial group. Responses from individuals who identify with only a single racial group under the new standards are assumed to have been the same under the old standards. The response "Native Hawaiian or Pacific Islander" is assigned to the old racial category of "Asian or Pacific Islander." The specific methods for assigning multiple race responses into single race categories are Deterministic Whole Assignment, Deterministic Fractional Assignment, and Probabilistic Whole Assignment. Two sets of results for each of the following tabulation methods are produced. The first set ignores the use of any auxiliary information other than that needed to carry out the particular tabulation method. The other set of results for each method uses the one piece of information that is certain to be common to all data collections done following the new standards, that is, ethnicity. Thus, whether or not an individual is Hispanic is taken into account when a tabulation method is used. (1) Deterministic whole assignment. These methods use fixed, deterministic rules for assigning multiple responses back to one and only one of the racial categories from the old standards. Four alternatives are examined. 
The first (Smallest Group) assigns responses that include White and another group to the other group, but responses with two or more racial groups other than White are assigned into the group with the fewest number of individuals identifying that group as a single race. The second alternative (Largest Group Other Than White) assigns responses that include White with some other racial group, to the other group, but responses with two or more racial groups other than White are assigned into the group with the highest single-race count. The third alternative (Largest Group) assigns responses with two or more racial groups into the group with the largest number of individuals as a single race. In this latter case, any combination with White is assigned to the White category, and combinations that do not include White are assigned to the group with the largest single-race count. The fourth alternative (Plurality) assigns responses based on data from the National Health Interview Survey (NHIS). The NHIS has permitted respondents to select more than one race for a number of years, with only the first two responses captured. However, respondents reporting more than one race were given a follow-up question asking them to select the one race with which they most closely identify (called Main Race here). For these respondents, the proportion choosing each of the two possibilities as their main race was calculated. All responses in a particular multiple-race category using the Plurality method are assigned to the group with the highest proportion of responses on the follow-up question about main race. (2) Deterministic fractional assignment. These methods use fixed, deterministic rules for fractional weighting of multiple-race responses, that is, assigning a fraction to each one of the individual racial categories that are identified. These fractions must sum to 1. Two alternatives are examined. 
The first (Deterministic Equal Fractions) assigns each of the multiple responses in equal fractions to each racial group identified. Thus, responses with two racial groups are assigned half to each group; those with three groups are assigned one-third to each, etc. The second alternative (Deterministic NHIS Fractions) assigns responses by fractions to each racial group identified, with the fractions drawn from empirical results from the NHIS (as described above).

(3) Probabilistic whole assignment. These methods use probabilistic rules for assigning multiple race responses back to one and only one of the previous racial categories. Two alternatives are examined. These parallel the two alternatives discussed under Deterministic Fractional Assignment, except that, for a given set of fractions, the response is assigned to only one racial category. The fractions specify the probabilities used to select a particular category. The first alternative uses equal selection probabilities. The second uses the NHIS fractions where possible, and equal fractions when no information is available from NHIS. Probabilistic Whole Assignment will yield, on average, nearly the same population counts as Deterministic Fractional Assignment. Only the results from Deterministic Fractional Assignment are presented in this report. In practice, there would be a difference between Deterministic Fractional Assignment and Probabilistic Whole Assignment when computing variances for tabulated estimates, and the two methods will yield relatively small differences in distributions for respondent characteristics. In general, Probabilistic Whole Assignment would yield a higher estimated variance than the Deterministic Fractional approach, with the variances for both methods underestimating the true variance.
Probabilistic methods which incorporate a "Multiple Imputation" statistical technique would result in an unbiased estimate of variance, but at the price of being more difficult to implement (see Rubin, 1987).

(4) All Inclusive. A final tabulation method considered is termed the "All Inclusive" method. Under this method all responses are used. Responses are assigned to each of the categories that an individual selects. The sum of the categories totals more than 100 percent.

C. Methods of Evaluation

Data Sources

National Health Interview Survey. The NHIS is a continuing nationwide sample survey designed to measure the health status of residents of the United States (Benson and Marano, 1995; Massey et al., 1989). The analysis here uses data from an analytic file that contains three years of NHIS data (1993, 1994, and 1995). For each of these years there were about 45,000 households interviewed, resulting in slightly more than 100,000 individuals per year. The total sample for the bridge analysis is 323,080 (5,237 respondents did not provide data on race).

Since 1976, the NHIS has allowed respondents to choose more than one racial category. As the respondent is handed a card with numbered racial categories, the interviewer asks, "What is the number of the group or groups that represent your race?" If a respondent selects more than one category, the interviewer then asks, "Which of those groups would you say best describes your race?" Although the listed racial groups have changed over time, for 1993 to 1995, the card shown to respondents included 16 separate racial categories (white, black, American Indian, Aleut, Eskimo, Chinese, Filipino, Hawaiian, Korean, Vietnamese, Japanese, Asian Indian, Samoan, Guamanian, and other Asian and Pacific Islander). Although not on the flashcard, respondents were allowed to give an "other" race response.
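The count-based rules in (1) and the equal-fraction rule in (2) can be sketched as follows. This is a minimal illustration under stated assumptions: the single-race counts, the four sample responses, and all function names are hypothetical, and the Plurality and NHIS Fractions alternatives are omitted because they depend on NHIS follow-up proportions that are not reproduced here.

```python
from collections import defaultdict

# Hypothetical single-race counts, used only to rank groups by size.
SINGLE_RACE_COUNTS = {"White": 8000, "Black": 1200, "API": 400, "AIAN": 100}


def smallest_group(races):
    """White plus one other group goes to the other group; otherwise the
    non-White group with the smallest single-race count is chosen."""
    others = [r for r in races if r != "White"]
    return min(others, key=SINGLE_RACE_COUNTS.get)


def largest_group_other_than_white(races):
    """As above, but the non-White group with the largest count is chosen."""
    others = [r for r in races if r != "White"]
    return max(others, key=SINGLE_RACE_COUNTS.get)


def largest_group(races):
    """Any combination with White goes to White; otherwise the group with
    the largest single-race count is chosen."""
    if "White" in races:
        return "White"
    return max(races, key=SINGLE_RACE_COUNTS.get)


def equal_fractions(races):
    """Deterministic Equal Fractions: each group named gets 1/n of a count."""
    return {r: 1 / len(races) for r in races}


def tabulate_whole(responses, rule):
    """Count each response once, in the single category the rule selects."""
    counts = defaultdict(int)
    for races in responses:
        category = races[0] if len(races) == 1 else rule(races)
        counts[category] += 1
    return dict(counts)


def tabulate_fractional(responses, rule):
    """Spread each response across categories; fractions sum to one."""
    counts = defaultdict(float)
    for races in responses:
        for category, fraction in rule(races).items():
            counts[category] += fraction
    return dict(counts)


# Made-up responses: three multiple-race reports and one single-race report.
responses = [("White", "AIAN"), ("Black", "API"),
             ("White", "Black", "API"), ("White",)]

for rule in (smallest_group, largest_group_other_than_white, largest_group):
    print(rule.__name__, tabulate_whole(responses, rule))
print("equal_fractions", tabulate_fractional(responses, equal_fractions))
```

Because each response contributes exactly one count under every rule shown, all four tabulations add to the number of respondents; they differ only in how that total is spread across the old categories.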
To be consistent, the 16 groups were collapsed to the four previous racial categories: White, Black, American Indian or Alaskan Native (AIAN), and Asian or Pacific Islander (API), plus Other. For this analysis, a variable called Detailed Race was created from responses to the first question, which allowed identification with more than one racial group. This information is not included on public use data files of the NHIS. However, on internal files, the first two race groups mentioned are recorded for each observation. Even if a respondent selected more than two groups, only two were recorded on the intermediate file. From the two recorded racial responses, Detailed Race was coded into five single race groups (White, Black, AIAN, API, Other) and 11 multiple race groups (White/Black, White/AIAN, White/API, White/Other, Black/AIAN, Black/API, Black/Other, AIAN/API, AIAN/Other, and API/Other). For most analyses, multiple race combinations that had insufficient numbers were aggregated into the category "Other Combinations." Individuals who had two racial groups recorded for Detailed Race but a third group recorded for the "group that best describes race" were coded into "Other Combinations." The Main Race variable, used as a reference point representing the racial distribution under the old standards, is primarily derived from Detailed Race and the responses to the second question, which asks the respondent for the group that best describes his/her race (Benson and Marano, 1995). For respondents who selected one Detailed Race group, Main Race is the same as Detailed Race. For respondents who selected more than one racial group, Main Race is the one group reported as best describing their race. Some respondents who had chosen more than one race for the Detailed Race question responded as "Multiple race" or "Other" for the Main Race question. For this analysis, these responses were combined into the "Other" category. 
Categories for Main Race were White, Black, AIAN, API, and Other. May 1995 Supplement on Race and Ethnicity to the Current Population Survey (CPS). The May 1995 CPS Supplement was one in a series of studies conducted for the Federal agencies' review of the standards for data on race and ethnicity. The Supplement was designed to address the following issues: (1) the effect of having a "multiracial" race category among the list of races; (2) the effect of adding "Hispanic" to the list of racial categories; and (3) the preferences for alternative names for racial and ethnic categories (e.g., African-American for Black, and Latino for Hispanic). The Supplement was organized into four panels representing a two-by-two experimental design for studying the first and second issues outlined above. Each panel was given to one-fourth of the sample, or about 15,000 households (30,000 individuals). All respondents in a household received the same set of questions; household members 15 years and older were asked to respond for themselves, and parents answered for children under 15. Only two of the panels in the CPS Supplement permitted respondents to report in a multiracial category (panels 2 and 4), and only one panel had separate race and Hispanic origin questions (panel 2) as ultimately recommended in the new standards. Therefore, panel 2 data were used to analyze the effects of the different tabulation methods for the two-question format. The smaller sample (about 30,000 observations) hampers analysis and generalizations when the focus is on the small portion of the sample (about 1 percent) who identified as "multiracial." There are additional limitations to these data for evaluating the bridging methods. The option respondents were given to identify multiple races in the CPS Supplement was a multiracial category with a follow-up question asking respondents to indicate all the racial groups with which they identified. 
The new standards allow people to identify directly with all the racial groups they choose and do not include a "multiracial" category. Furthermore, a large percentage of individuals who chose the multiracial category in panel 2 of the Supplement did not specify more than one racial group (see Tucker et al., 1996). For purposes of this evaluation, individuals were classified as belonging to the specific racial categories they identified. Those who identified as being multiracial but then did not give two or more specific racial groups were reclassified in the one racial category they gave. Thus, the distribution of the CPS Supplement data reported here differs from that which was published in earlier reports, which classified as multiracial any person who identified with the multiracial category even if they only specified one racial group. This new distribution is referred to here as the "Edited Distribution." This edited distribution was used with the various tabulation methods. As with the NHIS, the resulting distributions were compared to a reference distribution based on the respondents' original answers (in the first CPS interview) to the race question that followed the old standards.

1998 Washington State Population Survey. The 1998 Washington State Population Survey (WSPS) was designed to provide information on Washington residents between decennial censuses. The survey collected data on employment, income, education, and health, along with basic demographic information. The WSPS was done by telephone and included 7,279 households with telephones. Blacks, Asians, Hispanics and American Indians were oversampled. The designated respondent was the individual with the greatest knowledge about the household. The respondent weights reflect this oversampling and, thus, results are representative of the Washington population as a whole. The response rate for the entire sample was between 50 and 60 percent.
Information about the race of the respondent was collected twice during the course of the interview. At the beginning of the survey, the respondent was asked, "Are you of Hispanic origin?" Following that question, the respondent was asked, "What is your race?" The categories were the ones appearing under the old standards, but the order was as follows: Black; American Indian, Aleut, or Eskimo; Asian or Pacific Islander; and White. An "Other" category also was allowed, and the interviewer recorded the verbatim response on a "specify" line. Near the end of the survey, the respondent was asked race questions conforming to the new standards. Besides the same Hispanic origin question, the respondent was asked to specify country of origin. For race, the respondent was asked to select one or more categories. This time the ordering of the categories was White; Black or African American (or Haitian or Negro); American Indian or Alaska Native; Native Hawaiian or Other Pacific Islander; Asian. Again, an "Other" category was provided. There also was a follow-up question for Asian respondents to specify country of origin. The results from the race question at the end of the survey were used with the tabulation methods. The reference distribution came from the answers to the original race question. Advantages and Disadvantages of These Data Sources Only the Washington State data closely resemble the way the question on race will be asked under the new standards. Yet, all three can offer insights into the relationship between how individuals will actually respond to the new question on race and how they responded to the question under the old standards. The NHIS and the CPS Supplement are nationally representative, and the Washington State data serve as an example for evaluating the tabulation methods at the state level. Simulations using 1990 census data also were conducted, but the results differed little from those for the other data sets. 
At this point, it is believed that an analysis of data from the 1998 Dress Rehearsal for Census 2000 would be of greater utility. Furthermore, the Dress Rehearsal data will provide examples of the effects of the new standards at the local level. Thus, this analysis will be included in the next version of this report.

Description of New Analyses

The analyses concentrated on the bridge tabulation methods. These analyses can be divided into three broad areas: (1) descriptions of racial distributions under the alternative bridging tabulation methods; (2) rates of racial "misclassification" for these alternatives; and (3) sensitivity of outcome measures to the bridging alternatives.

Distribution of Race. For the first phase of the analysis (using the NHIS, the CPS Supplement, and the data from Washington State), the distributions of race under the assignment alternatives described previously were calculated: All Inclusive, Deterministic Whole Assignment (Smallest Group, Largest Group Other Than White, Largest Group, and Plurality) and Deterministic Fractional Assignment (Equal Fractions and NHIS Fractions). These new distributions were compared to the reference distribution in each data set.

At this time, it is unknown what percentage of people in the United States will identify with more than one racial group when given the opportunity to do so in Census 2000 and in subsequent surveys. For purposes of illustrating the effects of a greater proportion of individuals identifying more than one racial background, analyses were conducted increasing the proportion of multiple race responses two-, four-, six- and eight-fold using the NHIS, the CPS Supplement, and the Washington State microdata sources. The racial distributions were compared using each of the tabulation methods to see effects with increasing levels of reporting more than one race. Of necessity, these tabulations assume that the increases are the same across the different combinations of more than one race.
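One way such an increase could be simulated is by reweighting individual records, sketched below. The report does not state the mechanism that was used, so the reweighting approach, the function names, and the three sample records are all assumptions made for illustration. Every multiple-race record is scaled by the same factor k, which mirrors the assumption that the increase is the same across combinations, and single-race records are scaled down so that the total weight is unchanged. The All Inclusive tabulation is then applied to show how its total moves further above 100 percent as multiple-race reporting grows.

```python
def multiple_race_share(records):
    """Weighted share of records that report more than one race."""
    total = sum(weight for _, weight in records)
    multiple = sum(weight for races, weight in records if len(races) > 1)
    return multiple / total


def inflate_multiple_race(records, k):
    """Return reweighted records whose multiple-race share is k times the
    original share, holding the total weight fixed (assumed mechanism)."""
    p = multiple_race_share(records)
    if not 0 < k * p < 1:
        raise ValueError("k-fold share must fall strictly between 0 and 1")
    # Single-race records shrink so that the overall total is preserved.
    single_factor = (1 - k * p) / (1 - p)
    return [(races, weight * (k if len(races) > 1 else single_factor))
            for races, weight in records]


def all_inclusive_shares(records):
    """All Inclusive tabulation: each record counts in every category it
    names, so the shares can sum to more than 1."""
    total = sum(weight for _, weight in records)
    shares = {}
    for races, weight in records:
        for race in races:
            shares[race] = shares.get(race, 0.0) + weight / total
    return shares


# Made-up weighted records: (races reported, survey weight).
records = [(("White",), 90.0), (("Black",), 8.0), (("White", "AIAN"), 2.0)]

for k in (1, 2, 4, 6, 8):
    inflated = inflate_multiple_race(records, k)
    shares = all_inclusive_shares(inflated)
    print(k, round(multiple_race_share(inflated), 4),
          round(sum(shares.values()), 4))
```

Scaling every combination by the same k keeps the mix of combinations fixed, so the sketch isolates the effect of the overall level of multiple-race reporting rather than its composition.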
The accuracy of this assumption cannot be tested. The purpose of these analyses is not to attempt to make accurate predictions about the extent of multiple race reporting or its composition, but rather to see more clearly possible differences among tabulation methods that may only become apparent with a greater percentage of more than one race reporting. Misclassification of Race. Besides evaluating the overall racial distributions produced by the tabulation methods, the misclassification of individuals also needs to be examined. For the NHIS, the CPS Supplement, and the Washington State survey, these misclassification rates were formed by comparing an individual's answer to the race question under the old standards to the assigned category of the individual's response(s) to the race question under the new standards using each of the tabulation methods. The misclassification rate and its standard error for each race by tabulation method were produced. Preliminary Outcomes Assessment. In the last phase of the analysis, the impact of multiple- race reporting on outcome measures was assessed. This process is important because users in many of the Federal agencies are not typically examining race distributions, but rather trends and indicators for the Nation (e.g., health outcomes, economic well-being, educational attainment) across racial groups. This is where the majority of work will need to be done within individual agencies as the new standards are implemented. An initial examination of how common statistics could be affected by reporting of more than one race was conducted. Five outcome measures were examined, three from the NHIS and two from the CPS Supplement. From the NHIS, three routine health outcomes were calculated: percent of respondents in poor or fair health, percent of children living with a single mother, and percent of respondents with no health insurance. 
From the CPS Supplement, the proportions of respondents who were unemployed and the labor force participation rates for different racial groups were calculated. These estimates based on the bridging alternatives are not meant to be precise measures of these factors, but are used to demonstrate the possible impact reporting of multiple races and the tabulation methods may have on these and similar estimates.

D. Examination of the Results with Respect to the Evaluation Criteria

Bridging to the past will be needed for measuring change in a variety of circumstances. Besides measuring population growth, any number of economic, social, and health outcomes must be monitored. This work will involve different population groups at different levels of geography. As a first step toward providing the information users will need to make informed decisions about the methods, the strengths and weaknesses of the bridging methods with respect to the evaluation criteria outlined at the beginning of this report are discussed, based on the results of the statistical analyses conducted. The details of these statistical analyses can be found in Appendix D.

Measure Change Over Time. As indicated earlier, measuring change over time is the criterion that is of greatest importance in evaluating the bridging methods. The first and second phases of the analysis shed light on the performance of the various methods in this area. In essence, an ideal bridging method in this case is one that not only accurately recreates the population distribution under the old standards such that the only difference remaining is a function of true change over time, but also assigns an individual's response to the old category that would have been chosen. The methodology used in these studies allows users, within limits, to see how well the bridging methods using racial data collected under the new standards can match data from the same respondents collected (at about the same time) under the old standards.
To the extent that there is a match, any change that would occur from this point forward would indicate true change. If the match is poor, it is not possible to isolate the true change. When comparing the different methods to their reference distributions, the racial categories that were most sensitive to which method is chosen were the numerically small ones, particularly the AIAN category. While different data sets were used in each study and the racial questions were not the same, the studies indicate that the Largest Group Deterministic Whole Assignment method, the Plurality method, and the two Deterministic Fractional Assignment methods produce distributions closer to the reference distributions than do the other Deterministic Whole Assignment methods and the All Inclusive method. Controlling for ethnicity had no effect on these results. One reason the Largest Group Assignment method results are so close is that it has little effect on the smaller races, because most assignments are made to Black or White, and the percentages for these two races are so large that the relatively small increase they receive is not noticeable. The Plurality method produces a good fit, because it makes assignments at the level of specific racial combinations. The performance of the NHIS Fractional Assignment method can be discounted to a degree in the NHIS study because the analysis is somewhat circular; however, the results from the CPS Supplement and the Washington State Population Survey (WSPS) show this method yields a relatively close match. The Equal Fractional Assignment method produces a reasonable match in these studies. The primary reason that the other two Whole Assignment methods and the All Inclusive method do not perform as well is that they alter the White percentage to some extent and substantially increase the percentage in the AIAN category. In the case of misclassification rates, some contradictory results emerge. 
While the AIAN and "Other" categories have high misclassification rates across all tabulation methods in the CPS Supplement, the same is not true for the other two surveys. The Smallest Group Whole Assignment and the Largest Group Other Than White Whole Assignment methods produce the most comparable results for the AIAN category in both surveys and for the "Other" category in the WSPS; however, these methods have higher overall misclassification rates. Both the CPS Supplement and the WSPS have large misclassification rates for these two categories when using many of the tabulation methods.

When the distributions of the outcome variables are examined, all methods produce comparable and relatively close matches for all health outcomes. For the AIAN unemployment rate, the Largest Group Whole Assignment method and the NHIS Fractional Assignment method appear to produce the least comparable results, but none of the differences are significant. There are significant differences in the AIAN labor force participation rates for several of the tabulation methods. It is likely that which method is best at matching a reference distribution for outcome measures will depend on the outcome being examined. Unfortunately, the data to assess the best tabulation method for each outcome may never be readily available.

All of these conclusions should be viewed with caution. Many assumptions had to be made in these studies. It is unclear how people will respond to the new racial question in the future, and these responses could differ by mode of data collection and with the subject of the survey. Furthermore, most of this work on developing bridging methods relied on sample data, and small samples at that.

Congruence with Respondent's Choice. This criterion concerns how well the full range of the respondent's choices is represented in the racial distribution.
It is more important for evaluating ongoing tabulations under the new standards, but the bridging methods can be differentiated with respect to this criterion, too. None of the Deterministic Whole Assignment methods takes into account the full range of the respondent's selections, but the Plurality method at least controls for the particular racial combination chosen by the respondent under the new standards. The All Inclusive method accurately reflects all selections by tabulating actual responses and not people. The Equal Fractional Assignment method tabulates people, but, like the All Inclusive method, treats all responses equally. The NHIS Fractional Assignment method takes all responses into account, but assignment is based on attempting to estimate in which single-race category the respondent would prefer to be counted.

Range of Applicability. This criterion refers to how well the bridging method can be applied in different contexts. The All Inclusive method provides the same results in every context, because assignment does not depend on the particular detailed racial distribution. This method is not suitable for users who need a distribution that adds to 100 percent. Of the Deterministic Whole Assignment methods, the Largest Group Whole Assignment method is the least sensitive to context and can be used in a wide variety of applications. The other Deterministic Whole Assignment methods are as easy to use as the Largest Group Whole Assignment method, but the results for the small racial categories will vary to a greater extent with the context, particularly according to level of geography. The Equal Fractional Assignment method is as generalizable as the All Inclusive method, but it is not quite as easy to use. The NHIS Fractional Assignment method and the Plurality method may be the most problematic, because they currently represent only a national preference distribution based on data from 1993 to 1995.
The use of this distribution at the local level would likely produce inaccurate results in a number of cases. That is not to say that the other methods do not face the same problem.

Meet Confidentiality and Reliability Standards. Because these methods all attempt to reproduce the racial categories under the old standards, the same confidentiality problems that existed over the last 20 years will continue to exist. No increase in problems is anticipated. In the case of reliability, however, the situation is different. The All Inclusive method will not produce less reliable data than data produced under the old standards. The Equal Fractional Assignment method may have reliability problems as a result of adding only fractional counts to some of the smaller categories if these categories have a high probability of being chosen as the preferred single race. The same would be true if equal fractions were used to make whole assignments.

In sample surveys, the Deterministic Whole Assignment methods will have reliability problems to the extent that there is a large variance on the individual race proportions. This is likely to occur when small samples are involved. The Largest Group Whole Assignment method should have the fewest problems with respect to reliability, and the Smallest Group Whole Assignment method will likely have the most. These methods have another problem, however, in that an individual's response may be assigned to different categories at different levels of geography. The NHIS Fractional Assignment method, as well as methods where fractions are used for whole assignment (i.e., the Plurality method), is based upon a sample distribution with its own variance properties. Reliability for the very small combinations will be quite poor unless many years of data are combined, and this presents its own problems.

Minimize Disruptions to the Single Race Distributions. This criterion is only relevant for evaluation of bridging methods.
Its purpose is to see how different the resulting bridge distribution is from the single-race distribution for detailed race under the new standards. To the extent that a bridging method can meet the other bridging criteria and still not differ substantially from the single-race proportions in the ongoing distribution, it will have value for looking both forward and backward in time.

An evaluation of the different methods according to this criterion involves the comparison of the bridge distributions to the detailed race distribution under the new standards in each case. For the CPS Supplement, the Plurality method is marginally closer than the Largest Group Whole Assignment method and the Fractional methods. While the All Inclusive method and the other Deterministic Whole Assignment methods match for the White category, they differ substantially from the single-race AIAN category in the detailed distribution and are marginally worse for the API category. The NHIS Fractional method is the closest in both the NHIS and the WSPS.

Statistically Defensible. To be statistically defensible, the bridging method must conform to acceptable statistical conventions. The All Inclusive method makes no assumption about how respondents would assign themselves in the single-race situation. The NHIS Fractional Assignment method and the Plurality method are based on an observed distribution, and, to that extent, involve less judgment than the rest of the methods that assign people and not responses. While the Equal Fractional Assignment method is based on judgment, it does not make assumptions about the relative importance of any given race. The Largest Group Whole Assignment method does assign greater importance to one of the races, but it also follows a common statistical practice, albeit a different one from the equal fraction approach. Both attempt to minimize the error in assignment.
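The contrast between the Equal Fractional Assignment method and the Largest Group Whole Assignment method can be illustrated with a minimal Python sketch. It is an illustration only, not the procedure used in the studies: the function names are hypothetical, group size is assumed to be measured by the single-race counts in the same data set, and ties are broken alphabetically, which is also an assumption.

```python
from collections import Counter


def single_race_counts(responses):
    """Count respondents who reported exactly one race."""
    return Counter(next(iter(r)) for r in responses if len(r) == 1)


def equal_fractional(responses):
    """Give each reported race an equal share (1/k) of the respondent."""
    totals = Counter()
    for r in responses:
        for race in r:
            totals[race] += 1 / len(r)
    return dict(totals)


def largest_group_whole(responses):
    """Assign each respondent wholly to the largest of the races reported.

    Assumption: "largest" is judged by single-race counts in this data set.
    """
    sizes = single_race_counts(responses)
    totals = Counter()
    for r in responses:
        # sorted() makes tie-breaking alphabetical (an illustrative choice)
        chosen = max(sorted(r), key=lambda race: sizes[race])
        totals[chosen] += 1
    return dict(totals)


# Hypothetical data: ten respondents, one of whom reports two races.
sample = (
    [frozenset({"White"})] * 6
    + [frozenset({"Black"})] * 2
    + [frozenset({"AIAN"})]
    + [frozenset({"White", "AIAN"})]
)

print(equal_fractional(sample))
print(largest_group_whole(sample))
```

In this hypothetical data set, the one multiple-race respondent adds half a count each to White and AIAN under equal fractions, but a whole count to White under largest-group assignment, which is why the latter leaves the small categories nearly unchanged.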
The Smallest Group Whole Assignment method and the Largest Group Other Than White Whole Assignment method do not follow statistical practice, but, instead, rely on the historical record of discrimination; even in these cases, however, the assigned category is based on an observed distribution.

Ease of Use. "Ease of use" refers to how complicated it is to produce the bridge results. The Equal Fractional Assignment method makes assignments that do not depend on the particular detailed racial distribution at hand. Both it and the NHIS Fractional Assignment method, however, require the duplication of individual records or the creation, on every record, of a variable for each racial category under the old standards in order to be able to assign fractions for any combination of categories. If the fractional methods are used to assign a respondent to a single category (whole probabilistic methods), this cumbersome process can be avoided. The All Inclusive method, like the Equal Fractional Assignment method, does not depend on the particular distribution, but it does produce proportions that add to more than 100 percent unless they are raked or repercentaged to a base of 100 percent each time. The Deterministic Whole Assignment methods and the NHIS Fractional method would require an extra step unless only national figures are used, because the relative size of the groups must be determined for each detailed distribution. Otherwise, they are as easy to use as the whole probabilistic methods.

Skill Required. This criterion refers to the skills required to carry out the bridge operations. The computer expertise needed to perform the operations associated with each of these methods is minimal. The Deterministic Whole Assignment methods require almost no statistical knowledge. Some familiarity with the statistical adjustment literature would be useful for understanding the Deterministic Fractional Assignment procedures.
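As an illustration of the point made under "Ease of Use" about the All Inclusive method, the following minimal Python sketch tabulates responses rather than people and then repercentages the result to a base of 100 percent. The function names and the data are hypothetical, and the simple proportional rescaling shown here is only one possibility; it is not full statistical raking, which adjusts iteratively to several sets of control totals.

```python
from collections import Counter


def all_inclusive_percentages(responses):
    """Percent of respondents reporting each race; may sum to more than 100."""
    n = len(responses)
    counts = Counter(race for r in responses for race in r)
    return {race: 100 * c / n for race, c in counts.items()}


def repercentage(percentages):
    """Rescale the percentages proportionally so that they sum to 100."""
    total = sum(percentages.values())
    return {race: 100 * p / total for race, p in percentages.items()}


# Hypothetical data: ten respondents, one of whom reports two races.
sample = (
    [frozenset({"White"})] * 6
    + [frozenset({"Black"})] * 2
    + [frozenset({"AIAN"})]
    + [frozenset({"White", "AIAN"})]
)

raw = all_inclusive_percentages(sample)
print(raw, sum(raw.values()))
print(repercentage(raw))
```

With these hypothetical data the unadjusted All Inclusive percentages sum to 110, because the multiple-race respondent is counted once in each of two categories; repercentaging restores a base of 100 at the cost of no longer showing the share of people who reported each race.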
If the All Inclusive method were used, users might need to understand statistical raking.

Understandability and Communicability. This criterion concerns how easily the methods can be explained and understood by the average user. The Deterministic Whole Assignment methods are both easy to explain and easy to understand. The fractional assignment of individuals to a single category also is not difficult to follow. Assigning fractions of a person to different categories may be easy to explain, but the average user may find it difficult to accept the idea. The All Inclusive method also is easily explained, but, unless the percentages are raked to 100 percent, users may have a problem understanding how to use the results.

References

Benson, V. and Marano, M. (1995), "Current Estimates from the National Health Interview Survey, 1994," National Center for Health Statistics, Vital and Health Statistics, 10(193).

Massey, J. T., Moore, T. F., Parsons, V. L., and Tadros, W. (1989), "Design and Estimation for the National Health Interview Survey, 1985-1994," National Center for Health Statistics, Vital and Health Statistics, 2(110).

Rubin, D. B. (1987), Multiple Imputation for Nonresponse in Surveys, New York: Wiley.

Tucker, C., McKay, R., Kojetin, B., Harrison, R., de la Puente, M., Stinson, L., and Robison, E. (1996), "Testing Methods of Collecting Racial and Ethnic Information: Results of the Current Population Survey Supplement on Race and Ethnicity," Bureau of Labor Statistics Statistical Notes, No. 40.