Frequently Asked Questions

General

Census Data

Technical GIS and Importing Data

Conversion Files

Precinct Data

Registration Data

Voting Data

General

Where is the Statewide Database?

The Statewide Database's offices are located in Barrows Hall at UC Berkeley, Berkeley, CA 94720.

What type of data do you have?

The Statewide Database's datasets consist of the Statements of Vote (SOV) and the Statements of Registration (SOR) for each statewide election since 1992. These data are collected from California's 58 counties. The database also maintains decennial and other census data for use in redistricting. The Census, SOV, and SOR data are available for download from the Statewide Database website via the "Data" link. Data are available for various units of analysis, such as census block, tract, and precinct.

Where do you get your data?

The registration and voting data are collected by the County Registrar of Voters or County Clerks in each of California's 58 counties. Census data are collected by the Census Bureau under the Department of Commerce.

How can I get local election data?

At this time the Statewide Database only receives funding to process statewide races and does not collect data for local races like city council and mayor. Contact the local County Registrar of Voters or Election Clerk to obtain data for local races.

How much do you charge?

The Statewide Database is a free public resource. Our data and services are available to anyone who wishes to use them. All items ready for download on the website are free. Specific requests for data on CD-ROMs are available at cost. The charge for plotting a basic map is currently $75.00 plus shipping and handling (approx. $12.00).

What are FIPS county codes?

FIPS stands for Federal Information Processing Standard. FIPS county codes are unique three-digit codes that identify counties in California. In census files, the county FIPS codes are five digits long: the first two digits are the state FIPS code (06 for California) and the last three digits indicate the county. Below is a table of county numbers; county names are in alphabetical order, accompanied by their FIPS codes. A county's FIPS code is calculated by taking the county number, multiplying it by 2, and subtracting 1. For example, Los Angeles is county number 19, so its FIPS code is 037 (06037 in census files).
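The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes county numbers run from 1 to 58 in alphabetical order, as described above.

```python
STATE_FIPS_CA = "06"  # California's two-digit state FIPS code

def county_fips(county_number: int) -> str:
    """Three-digit county FIPS code from the alphabetical county number (1-58)."""
    if not 1 <= county_number <= 58:
        raise ValueError("California county numbers run from 1 to 58")
    # county number * 2 - 1, zero-padded to three digits
    return f"{county_number * 2 - 1:03d}"

def census_fips(county_number: int) -> str:
    """Five-digit state + county code as it appears in census files."""
    return STATE_FIPS_CA + county_fips(county_number)

print(county_fips(19))  # Los Angeles (county 19) -> 037
print(census_fips(37))  # San Diego (county 37) -> 06073
```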

Do I need a special program to read the data?

All downloadable files are in *.dbf format, which can be read by most spreadsheet and statistics programs, such as Microsoft Excel and SPSS. Data are also available in *.txt format, which can be read by most word-processing programs, such as MS Word, MS Works, and WordPerfect.

Do you make maps?

Yes, but we are not exactly a map-making organization. We encourage people to make their own maps with our data, either with their own GIS (Geographic Information System) or with our computers. We will assist you as much as possible, and we will take orders for customized maps. You can create maps at the Statewide, County, Senate, Assembly, and Congressional District levels, filled with various data such as blocks and street labels, census, and/or registration information. Unfortunately, we cannot provide maps with precinct information. Basic Assembly, Senate, and Congressional District maps are available online in the "Maps" section of this website. All other maps need to be requested by e-mail or phone.

Do you draw district lines?

While we are the State of California's "Redistricting Database" we do NOT draw lines. Our purpose is to readily provide data to ALL who wish to use them. While some of our users may intend to use our data to draw their own plans, we do NOT provide instruction on drawing lines, nor will we draw them for anyone.

Where was the new 2001 California congressional seat created?

The new congressional seat is the 46th district. Located in Orange County, it encompasses the cities of Anaheim (west and north-south Anaheim Stadium-Disneyland corridor), Buena Park, Costa Mesa, Fountain Valley, Garden Grove, Irvine, Orange, Santa Ana, Stanton, Tustin, and Westminster (north of San Diego Freeway). Dana Rohrabacher (R) currently represents the 46th district.

Do you also have Census Data?

Yes, we store and make available for download the Census Data specified for Redistricting and the Reapportionment. It is published in a file called PL94-171 that includes total population, race and ethnicity, and housing units.

Census Data

Can I merge 2000 census tract data to the 2001 Assembly, Congressional, Senate, and Board of Equalization Districts?

Yes, the Statewide Database provides a set of files, the 2000 Census Tract to California's 2001 Districts Conversion files, that can be used for this purpose. These files can be found on the Census Geography Assignment & Conversion File Collection page, https://statewidedatabase.org/district_con_ass.html

How do you determine which census blocks/ tracts are in a given city?

You will need to download our census block to census place assignment file. The file can be found on this page, https://statewidedatabase.org/district_con_ass.htm. It is the 3rd link from the bottom of the page, called "2000 Census Block to 2000 Census Place Assignment File." Please note that census tracts cross city boundaries while census blocks do not. This means that some of your tracts will be only partially within the borders of your city of interest, while each census block will be either entirely in or entirely out.

Are the 1992G to 2000G census block data files by 1990 or 2000 census block units?

The 1992G to 2000G data files are based on the 2000 census blocks.

How/by what method was the 1992 to 2000 voting data merged to the 2000 census block from the precinct?

The propensities of various groups in the electorate to vote in particular ways (using ecological inference) are estimated. These propensities are then used to proportionally allocate the precinct results back to census blocks. Please note that census block and precinct geography come from different governmental entities and very little effort is made to produce common boundaries.

How is the 1992G to 2000G block-level data produced? Given that there are several blocks to a precinct, how were the votes assigned to the block level? Does this create any issues when aggregating blocks to the city level?

The process is documented in a report titled "Disaggregation of Precinct Voting Results to Census Geography," which can be found at https://statewidedatabase.org/info/metadata/disaggregation_of_prec_to_block.pdf. Here is the link to our documentation page: https://statewidedatabase.org/metadata.html. This would not create any special issues when aggregating to the city.

Does SWDB provide voter or registration data by zipcode units? What about by ZCTAs?

No. The Statewide Database does not publish its electoral data (Registration and Statement of Vote) by USPS zip code.

The Statewide Database is California's redistricting database, and redistricting is done with Census TIGER/Line files. Zip codes are United States Postal Service delivery areas, and as such they are maintained and changed at will by the USPS. Furthermore, USPS zip code areas create transecting areas when overlaid on the Census TIGER/Line geography and precincts.

Though it is not our practice to maintain data or geography for zip code units here at the Statewide Database for the above mentioned reasons, there are data reports for zip codes on our website that we have either collected or that were created as part of special research projects. These old data reports can be downloaded from our website's REPORTS archive but we do not have any current or future plans to publish our precinct data sets by USPS zip code.

ZCTAs, or ZIP Code Tabulation Areas, are a different story, though we don't intend to create ZCTA data reports, since any analyst can create them using our precinct to 2000 block conversion files.

ZCTAs are Census statistical tabulation areas, unlike USPS zip codes; and as such, they do align with census geography.

Using the Statewide Database's rg precinct to 2000 block conversion files it is possible to aggregate block-level registration to the ZCTA using a ZCTA to 2000 census block cross-walk file.

A ZCTA to 2000 census block cross-walk file can be found on the Census Bureau's web site, http://www.census.gov/geo/ZCTA/zcta.html
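As a rough sketch of the aggregation described above, the Python below sums block-level registration into ZCTA totals. It assumes the block-level registration has already been estimated from the rg precinct to block conversion file, and that the cross-walk assigns each block to a single ZCTA. The function name, the dictionary layout, and the sample block and ZCTA identifiers are all made up for illustration.

```python
from collections import defaultdict

def aggregate_blocks_to_zcta(block_reg, block_to_zcta):
    """Sum block-level registration counts into ZCTA totals.

    block_reg:     {block_id: registrants}   (hypothetical layout)
    block_to_zcta: {block_id: zcta}          (from a block to ZCTA cross-walk)
    """
    totals = defaultdict(float)
    for block_id, registrants in block_reg.items():
        zcta = block_to_zcta.get(block_id)
        if zcta is None:
            continue  # block has no ZCTA in the cross-walk; skipped here
        totals[zcta] += registrants
    return dict(totals)

# Made-up identifiers, for illustration only
block_reg = {"b1": 120, "b2": 80, "b3": 40}
block_to_zcta = {"b1": "Z1", "b2": "Z1", "b3": "Z2"}
print(aggregate_blocks_to_zcta(block_reg, block_to_zcta))
```

How to treat blocks that are missing from the cross-walk is a judgment call; this sketch simply drops them.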

Can we obtain the exact count of the registration breakdown by block for elections after 2000?

You would need to geocode the voter registration file to the census block using each registered voter's address. This would allow you to retain all of the data associated with a registrant, including party affiliation. The Statewide Database's precinct to block conversion files are based only on a geocode of total registrants in a precinct and not their party affiliation. *Next spring the Statewide Database will be releasing all of our precinct data from the 2002G to the 2010G on the 2011 census blocks. This census block data set of registration and voting data will be constructed using a more precise method than the precinct to block conversion files.

How can I estimate the registration breakdown (i.e. number of Republicans and Democrats) for a particular block using the precinct data files?

Each block contains a certain percentage of a precinct's total registrants. To estimate the registration breakdown for a particular block, use the election data set's 2000 Census Block to Precinct conversion file: multiply the percentage of each party in the overall precinct by the percent of the precinct's total registration that the block represents.
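A minimal sketch of this estimate in Python follows. It works with party counts rather than percentages, which amounts to the same proportional allocation, and it assumes the precinct's party breakdown applies uniformly to every block piece within it. The function name and the party keys are hypothetical; the share would come from the conversion file.

```python
def estimate_block_party_reg(precinct_party_reg, share_of_precinct):
    """Estimate party registration for one block piece of a precinct.

    precinct_party_reg: {party: registrants in the whole precinct}
    share_of_precinct:  proportion (0 to 1) of the precinct's total
                        registrants that fall in this block piece
    """
    return {party: count * share_of_precinct
            for party, count in precinct_party_reg.items()}

# A precinct of 1,000 registrants with a block piece holding 25% of them
precinct = {"DEM": 500, "REP": 300, "OTHER": 200}
print(estimate_block_party_reg(precinct, 0.25))
```

If a block is split between several precincts, the estimates for each of its pieces would be summed to get the block total.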

Technical GIS and Importing Data

What projection and coordinate system do SWDB's GIS spatial files use?

The .cdf and .shp files use the latitude and longitude system which is an X-Y coordinate system and the projection of the files is NAD 83 UTM. The .mif files use the Earth Projection 1, 0 coordinate system which basically means latitude/longitude on a perfect sphere.

How do you import DBF files into R?

Method 1

If your version of R does not have the "foreign" import library (for example, library(foreign) fails to load), then R cannot directly read or import the DBF file. Instead, convert the DBF to a CSV using Excel, and then use an R command to read in that CSV.

Steps:

1. Open Excel. Select File, Open.

2. In the "Look In" drop-down box, select the directory in which you saved the DBF file, and in the "Files of Type" drop-down box, select "All Files (*.*)." Double-click on the file name and the .dbf file will open in Excel.

3. With that DBF open, select File, Save As and in the "Save as Type" drop-down box select "CSV (Comma delimited)." Specify the filename, and hit Save.

4. In R, use the following syntax to open this CSV:

read.csv(filename, header = TRUE, sep = ",", quote="\"", dec=".", fill = TRUE);

Method 2

If library(foreign) loads successfully, you are ready to load the DBF directly into R. Please refer to the syntax on this page: http://www.ats.ucla.edu/stat/r/faq/inputdata_R.htm

Citation: Introduction to SAS. UCLA: Academic Technology Services, Statistical Consulting Group. "How to input data into R" from http://www.ats.ucla.edu/stat/sas/notes2/ (October 31, 2012).

Conversion Files

What is the significance of the "UNASSIGN" values in the spatial and geographic data conversion files?

The "UNASSIGN" value represents a geographic area assigned to a registration precinct in which there were no registered voters.

What are the DATA CONVERSION files i.e. the SRPREC to BLK, RGPREC to BLK, BLK to MPREC, SRPREC to RGPREC and SRPREC to CITY files and what are they used for?

These are the Statewide Database's conversion and equivalency files, also referred to as cross-walk files. Each election has a set of these files. They can be found in the 3rd column on the election's Geographic Data page. The SRPREC to RGPREC and MPREC to SRPREC files are used to cross-walk the registration (REG, ABS, POLLV & VOTE) data and the SOV precinct data between the different precinct types, i.e. rgprec, rrprec, srprec and mprec. For example, if your analysis requires that you merge SOV data to the map precincts (mprec) for spatial/GIS analysis, then, since the SOV data are only available by the srprec and svprec precinct types, you would use the MPREC to SRPREC file to cross-walk the two precinct types. You can merge 2000 census data to the precincts using the SRPREC to BLK, RGPREC to BLK and BLK to MPREC files. Additionally, the SRPREC to BLK and RGPREC to BLK files are bi-directional, so they can also be used to merge the srprec and rgprec data to the 2000 census block.

How is a census geography associated with a precinct in the precinct to blk conversion files i.e. SRPREC to BLK, RGPREC to BLK and BLK to MPREC files and what is the "conversion" based on?

First of all, the conversion files between blocks and precincts are obtained from the digitally-recorded precincts overlaid on the census geography. Then every individual registered voter within an address is "geocoded," or put into his or her census geography by means of address matching.

This process allows the number of registered voters per census block - precinct piece to be determined. The number of registered voters in a given precinct-block piece is reported in the BLKREG field in the block to precinct conversion files.

When BLKREG is divided by the total number of registered voters in the precinct (SRTOTREG/RGTOTREG), one can derive what proportion of a precinct is composed of a given block-precinct piece. Conversely, when BLKREG is divided by BLKTOTREG, the percent of a block belonging to a given precinct can be derived. This latter measure is reported in the PCTBLK field of the block to precinct conversion files.

How can I merge registration and voting precinct data to census block units using the precinct to block conversion files on the Geographic Data page i.e. the SRPREC to BLK and RGPREC to BLK files? What about merging to 2000 census block groups and/ or 2000 census tracts?

The precinct to block conversion files have a field called "PCTRGPREC" in the RGPREC to BLK files and "PCTSRPREC" in the SRPREC to BLK files. These fields contain the proportion of a given precinct's total registered voters (RGTOTREG or SRTOTREG) that are contained within the portion of the precinct that is encompassed by the census block. The number of registered voters in the census block piece is "BLKREG."

Hence the values in the "PCTRGPREC" and "PCTSRPREC" variable fields are derived thus:

BLKREG / RGTOTREG = PCTRGPREC

BLKREG / SRTOTREG = PCTSRPREC

It is often the case that multiple census blocks will transect a single precinct, with some census blocks being split between several precincts.

Overview of the procedure:

1. Merge the precinct to 2000 census block conversion file to the precinct data you want to merge to census block units.

2. Distribute PCTRGPREC or the PCTSRPREC value, depending on which precinct type you are working with, across the precinct data. This renders the precinct data into its 2000 census block-precinct pieces.

3. Sum the records by census block so that you have one record for each block. If your goal is to obtain precinct data by block group or census tract you should sum the file to these units.
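The three steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the Statewide Database's own code: the "BLOCK" key and the sample values are made up, and the PCTSRPREC value is assumed to be stored as a proportion between 0 and 1 (rescale first if your file stores it as a percentage).

```python
from collections import defaultdict

def precinct_data_to_blocks(precinct_rows, conversion_rows, value_fields,
                            prec_field="SRPREC", pct_field="PCTSRPREC"):
    """Allocate precinct values to census blocks via a conversion file."""
    # Step 1: index the precinct data so each conversion piece can be merged to it
    by_precinct = {row[prec_field]: row for row in precinct_rows}
    block_totals = defaultdict(lambda: defaultdict(float))
    for piece in conversion_rows:
        precinct = by_precinct.get(piece[prec_field])
        if precinct is None:
            continue  # conversion piece with no matching data record
        # Step 2: distribute the precinct value to this block-precinct piece
        share = piece[pct_field]
        for field in value_fields:
            # Step 3: sum the pieces so there is one record per block
            block_totals[piece["BLOCK"]][field] += precinct[field] * share
    return {block: dict(values) for block, values in block_totals.items()}

precincts = [{"SRPREC": "A", "VOTES": 100}, {"SRPREC": "B", "VOTES": 40}]
conversion = [
    {"SRPREC": "A", "BLOCK": "b1", "PCTSRPREC": 0.75},
    {"SRPREC": "A", "BLOCK": "b2", "PCTSRPREC": 0.25},
    {"SRPREC": "B", "BLOCK": "b2", "PCTSRPREC": 0.5},
    {"SRPREC": "B", "BLOCK": "b3", "PCTSRPREC": 0.5},
]
print(precinct_data_to_blocks(precincts, conversion, ["VOTES"]))
```

To obtain block group or tract totals instead, the same block records would be summed again by the larger unit.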

Is it also possible to use these same files to do the reverse i.e. merge/ convert census block data to the sr, rg and map precincts?

Yes, the RGPREC to BLK and SRPREC to BLK files, as well as the BLK to MPREC file, can be used to merge census block data to the RG, SR and Map precinct types.

In the case of merging census data to precinct units, you will be distributing the value in the "PCTBLK" field, rather than the "PCTRGPREC" and "PCTSRPREC," across the census block data you want to merge to precinct units.

The precinct to block conversion files and the BLK to MPREC conversion file have a field called "PCTBLK." This field contains the proportion of a given 2000 census block's total registered voters (BLKTOTREG) that are encompassed by a single precinct. The number of registered voters in the census block piece is "BLKREG" and the total registered voters in the census block is the "BLKTOTREG."

Hence the values in the "PCTBLK" variable field are derived thus:

BLKREG / BLKTOTREG = PCTBLK

Overview of the procedure:

1. Merge the 2000 census block to precinct conversion file to the census block data that you want by precinct units.

2. Distribute the PCTBLK value across the census block data. This renders the census block data into its 2000 census block-precinct pieces.

3. Sum the records by precinct so that you have one record for each precinct.
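The reverse direction can be sketched the same way, this time weighting each block's value by PCTBLK. As before, this is only a sketch: the "BLOCK" and "PRECINCT" keys and the sample numbers are hypothetical, and PCTBLK is assumed to be a proportion between 0 and 1.

```python
from collections import defaultdict

def block_data_to_precincts(block_values, conversion_rows):
    """Allocate a census block value (e.g. population) to precincts.

    block_values:    {block_id: value}
    conversion_rows: dicts with "BLOCK", "PRECINCT" and "PCTBLK" keys
    """
    totals = defaultdict(float)
    for piece in conversion_rows:
        value = block_values.get(piece["BLOCK"])
        if value is None:
            continue  # no census record for this block
        # Each piece receives the block's value times the block share in that precinct
        totals[piece["PRECINCT"]] += value * piece["PCTBLK"]
    return dict(totals)

population = {"b1": 200, "b2": 100}
conversion = [
    {"BLOCK": "b1", "PRECINCT": "A", "PCTBLK": 1.0},
    {"BLOCK": "b2", "PRECINCT": "A", "PCTBLK": 0.25},
    {"BLOCK": "b2", "PRECINCT": "B", "PCTBLK": 0.75},
]
print(block_data_to_precincts(population, conversion))
```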

What about merging block group and tract data to the precincts? Can this also be done with the block to precinct conversion files?

Yes, the RGPREC to BLK and SRPREC to BLK files, as well as the BLK to MPREC file, can be used to do this, but it is a more advanced analysis.

To merge 2000 tract records or block group data to one of the precinct types i.e. rg, sr or map it is first necessary to re-tabulate the records in the conversion file to determine the proportion of a block group/ tract's total registrants that fall into a given precinct.

Overview of the procedure:

1. Merge the re-tabulated precinct to 2000 census block group/ tract conversion file to the block group/ tract data that you would like to merge to precinct units.

2. Distribute the percent block group/tract value, depending on which census unit you are working with, across the block group/tract data. This renders the block group/tract data into its 2000 census block group/tract-precinct pieces.

3. Sum the records by precinct so that you have one record for each precinct.
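The re-tabulation step that precedes this procedure could look roughly like the sketch below, which collapses block-precinct pieces into tract-precinct pieces and computes each piece's share of the tract's registrants. Several things here are assumptions: the "BLOCK" and "PRECINCT" keys, the output field names TRACTREG and PCTTRACT, and the idea that the block identifier is a 15-character census code whose first 11 characters identify the tract. Check the actual layout of the block key in your conversion file before relying on this.

```python
from collections import defaultdict

def tract_of(block_id: str) -> str:
    """Assumed layout: state (2) + county (3) + tract (6) + block (4)."""
    return block_id[:11]

def retabulate_to_tract(conversion_rows):
    """Collapse block-precinct pieces into tract-precinct pieces."""
    piece_reg = defaultdict(float)  # registrants per (tract, precinct) piece
    tract_reg = defaultdict(float)  # total registrants per tract
    for row in conversion_rows:
        tract = tract_of(row["BLOCK"])
        piece_reg[(tract, row["PRECINCT"])] += row["BLKREG"]
        tract_reg[tract] += row["BLKREG"]
    return [
        {"TRACT": tract, "PRECINCT": precinct, "TRACTREG": reg,
         "PCTTRACT": reg / tract_reg[tract] if tract_reg[tract] else 0.0}
        for (tract, precinct), reg in sorted(piece_reg.items())
    ]

# Made-up identifiers: two blocks in one tract, split across two precincts
rows = [
    {"BLOCK": "060010001001000", "PRECINCT": "P1", "BLKREG": 30},
    {"BLOCK": "060010001001001", "PRECINCT": "P1", "BLKREG": 10},
    {"BLOCK": "060010001001001", "PRECINCT": "P2", "BLKREG": 10},
]
for piece in retabulate_to_tract(rows):
    print(piece)
```

For block groups, the same approach would apply with a 12-character prefix under the same assumed layout.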

Precinct Data

Why is there a discrepancy in the number of precinct records in the precinct data files versus the number of precinct records in the precinct geographic files?

The discrepancy in the number of records is due to the fact that not all registration records can be associated with geographical locations, yet they still must be reported in the Statement of Registration; nor can all ballots that are cast be associated with the registered voter's precinct.

I want to aggregate the precinct voting (SOV) data to the city level. Is there any practical implication for using one type of precinct instead of the other? In other words, would it be better to use the sv precinct files instead of the sr precincts or vice versa?

You should use the sr precincts since this is the precinct unit for which we have precinct to city conversion/equivalency files (i.e. the sr precinct to city files that describe which precincts are in which cities).

What are the SOV, REG, ABS, POLL and VOTE files?

From the California Election Data pages you can download precinct data for California statewide elections back to the 2000 Primary Election. There are two types of precinct data. Data derived from the Statement of Vote, or SOV and data derived from the Statement of Registration, or SOR.

SOV or Statement of Vote data files are available by the sv and sr precinct types. The SOV files are in the first column of the data pages. These files contain the precinct-level voting results.

SOR or Statement of Registration data files are processed into four file types:

REG = registration data for all registered voters

ABS = registration data for registered voters that voted by mail ballot

POLLV = registration data for registered voters that voted at the polling place

VOTE = registration data for all voters that voted. The VOTE files are the sum of the ABS and POLLV files.

Each of the SOR files is available by the rg, rr, and sr precinct types. The same registration data variables are reported for each election. Please refer to the SOR (statement of registration) codebook for a complete listing of the variables in the SOR files.

Why doesn't the Statewide Database have any registration and SOV data files for map precincts?

The mprec or map precinct is a geographic precinct type that is created by the Statewide Database to reflect the geography of the county's registration precincts as consistently as possible. The RR precincts are the non-geographic version of the MPREC. Furthermore, the RRPREC precincts are aggregations of RG precincts (tabular data) into MPRECs (geographic). Generally speaking, Map Precincts are RR precincts.

Because the resulting RR precincts may include RG precincts that are consolidated into different SV precincts we create a geographic consolidation known as the SR precinct to contain whole RR and SV precincts.

Are precinct data from one election comparable with precinct data from another election?

Unfortunately, no, precinct boundaries as well as the total number of precincts change with every election in most counties. How much they change depends on factors such as the fluctuation in voter registration numbers, the availability of polling locations, and expected voter turnout.

For example, registrars of voters will often "consolidate" (combine 2 or more into 1) precincts for low-turnout elections such as primary elections and will "split" (create 2 or more from 1) precincts for high-turnout elections such as the 2008 General presidential election.

Why are there blanks in some srprec fields?

In some of the older SWDB files, mail ballot precincts were reported for whole ballot groups rather than individual precincts. For affected counties, these precincts will be missing from the geographic conversion files but will still be found in the absentee results for the ballot group.

This is different from the "UNASSIGN" srprec precinct value. The "UNASSIGN" srprec value represents geography assigned to a registration precinct in which there were no registered voters.

Why isn't the 2000 to 2008 precinct data available by 2000 census block like the 1992 to 2000 electoral data is?

The Statewide Database produces a merged data set consisting of the previous decade's political data merged to the decennial PL94 census data at the level of the census block for each redistricting following the release of the PL94 data by the Census Bureau. The 1992 to 2000 census block data sets are part of the merged data set that was produced for the 2001 redistricting cycle. In between redistricting cycles we do not create the merged data sets (due to the effort required) but we do produce precinct/block conversion files/equivalencies which allow sophisticated users to perform their own mergers through a geographical proportionality method.

How/by what method was the 1992 to 2000 voting data merged to the 2000 census block from the precinct?

The propensities of various groups in the electorate to vote in particular ways (using ecological inference) are estimated. These propensities are then used to proportionally allocate the precinct results back to census blocks. Please note that census block and precinct geography come from different governmental entities and very little effort is made to produce common boundaries.

How is the 1992G to 2000G block-level data produced? Given that there are several blocks to a precinct, how were the votes assigned to the block level? Does this create any issues when aggregating blocks to the city level?

The process is documented in a report titled "Disaggregation of Precinct Voting Results to Census Geography," which can be found at https://statewidedatabase.org/info/metadata/disaggregation_of_prec_to_block.pdf. Here is the link to our documentation page: https://statewidedatabase.org/metadata.html. This would not create any special issues when aggregating to the city.

I am interested in aggregating the data to the county level. Does it matter if I use the by rgprec, by rrprec, by srprec, or by block files?

We recommend using the rg precinct files because they do not contain the county and district total records while the sr precinct files do. If you use the rr and sr precinct files, you will need to remove the totals records that are in the files. The names of these records vary by dataset, but they can be identified by the inclusion of "TOT" in the precinct field. We already have most of our election data aggregated to the county. These reports can be found here: https://statewidedatabase.org/info/statetext/state_reports.html.
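A short Python sketch of the totals-record filter described above follows. The field names ("SRPREC", "COUNTY", "TOTREG") and the sample records are hypothetical; the only rule taken from the text is that totals records contain "TOT" in the precinct field.

```python
from collections import defaultdict

def county_totals(rows, value_field, precinct_field="SRPREC", county_field="COUNTY"):
    """Sum a precinct-level value to the county, skipping totals records."""
    totals = defaultdict(float)
    for row in rows:
        if "TOT" in str(row[precinct_field]).upper():
            continue  # county/district totals record; including it would double count
        totals[row[county_field]] += row[value_field]
    return dict(totals)

# Made-up records: two precincts plus a county totals record
rows = [
    {"COUNTY": "037", "SRPREC": "0001A", "TOTREG": 900},
    {"COUNTY": "037", "SRPREC": "0002B", "TOTREG": 1100},
    {"COUNTY": "037", "SRPREC": "CNTYTOT", "TOTREG": 2000},
]
print(county_totals(rows, "TOTREG"))
```

It is worth checking the filtered-out records by eye, since a substring test like this would also drop any genuine precinct whose name happens to contain "TOT".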

What is a registration cycle as it applies to the cycles registered variables i.e. Dem registered 1 cycle (DREG1G), Dem registered 2 cycles (DREG2G) in the Statewide Database registration precinct data (REG, ABS, POLLV & VOTE) files?

There is a registration date on the registered voter file that is classified according to how many elections ago the voter registered. That is, if you registered in September, 2006, and we are processing for the November, 2010 election, you were registered for three general elections (2006, 2008, 2010).

The program then compares the registered date from the registrant's record with the date in the cohorts to determine where it falls and then increases the value of that cohort by one.

Here are the cohorts:

insert into reg_cohort (dt) values (#election_date);
update reg_cohort set cohort_1 = date_sub( dt, interval 2 year);
update reg_cohort set cohort_2 = date_sub( dt, interval 4 year);
update reg_cohort set cohort_3 = date_sub( dt, interval 6 year);
update reg_cohort set cohort_4 = date_sub( dt, interval 8 year);
update reg_cohort set cohort_5 = date_sub( dt, interval 10 year);
update reg_cohort set cohort_6 = date_sub( dt, interval 12 year);
update reg_cohort set cohort_7 = date_sub( dt, interval 14 year);
update reg_cohort set cohort_8 = date_sub( dt, interval 16 year);

So going backwards from the election date (say the g10 election) puts a registration date of December 2008 into cohort_1. Since there is a 15-day close of registration in California, if you registered at the last moment for the 2008 election, you had to be registered in October, so you would end up in cohort_2. (cohort_1 covers those who registered in the last two years, and so in time for no other general election. cohort_9 covers those who registered over 16 years ago or have no registration date, but that is rare now.) No adjustments are made for purges of the registration rolls.
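The cohort logic can be illustrated with a small Python sketch. It is not the Statewide Database's program: the function names are hypothetical, and how a registration date that falls exactly on a cutoff is classified is an assumption (here it goes to the older cohort).

```python
from datetime import date
from typing import Optional

def years_before(d: date, years: int) -> date:
    """Same calendar day `years` earlier (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)

def registration_cohort(reg_date: Optional[date], election_date: date) -> int:
    """Return 1-8 for the two-year window a registration falls in, else 9."""
    if reg_date is None:
        return 9  # no registration date on the record
    for cohort in range(1, 9):
        # cohort_N cutoff is the election date minus 2*N years
        if reg_date > years_before(election_date, 2 * cohort):
            return cohort
    return 9  # registered more than 16 years before the election

g10 = date(2010, 11, 2)  # November 2010 general election
print(registration_cohort(date(2008, 12, 1), g10))   # December 2008 -> 1
print(registration_cohort(date(2008, 10, 15), g10))  # October 2008 -> 2
print(registration_cohort(date(2006, 9, 15), g10))   # September 2006 -> 3
```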

Why, when merging the statewide SOV data by sr precinct to the registered voters data files by sr precinct, are there 4,606 more records in the 2006G SOV file than there are in the 2006G voters file, and why do 4,800 SOV records remain unmatched after the two files are merged?

In California SOV data are collected and reported by voting precinct (also known as sv precinct) while the SOR data are reported by registration precincts (also known as rg precinct). The SR precinct is the only precinct type that the SOR & SOV data have in common. In order to analyze the SOV and SOR precinct registration data in a common unit the SOR precinct data are merged to the SOV precinct results through the SR precincts. This is why there wouldn't be any 2006G voters file for the SOV precincts, only the SR precincts. This is true for any year, of course, as registration statistics cannot be matched up against election statistics except through the SR precincts (in general they are at the block level in the redistricting dataset but that requires an extensive breakdown which is described in the documentation).

In 2006 and previous election years, California county election officials were not required to break out absentee results to voting precincts, and as a result some large counties such as Los Angeles (06037) and San Diego (06073) did not break down their absentee results to voting precincts. This is primarily why there are so many unmatched records. California election officials were not required to break down absentee results to the voting sov precinct until state law changed in 2008.

Prior to 2008, absentee votes were usually reported by ballot groups. Ballot groups are sets of ballots on which all of the races are identical; the races on a voter's ballot change as the districts for that voter change.

We can provide files for each of the mergers showing what went into what, but one will find that the 2008 and later mergers have significantly fewer mismatches. This is because state law changed, requiring a breakdown of absentee results to the voting sov precinct. (There are still a few unassigned absentee sov precincts; some are federal voters, and for others we're not quite sure why they were created, but it is a relatively small number.)

The Redistricting Database's technical documentation discusses SOV and SOR precinct data at length: https://statewidedatabase.org/d10/Creating%20CA%20Official%20Redistricting%20Database.pdf

Here is a link to an example of an SOR & SOV precinct merge: https://statewidedatabase.org/info/merge/ssmrg.html

Here is a link to additional information regarding the various precinct types and how they are related to the merger of the SOV and SOR precinct data: https://statewidedatabase.org/diagrams.html

Registration Data

What are the SOV, REG, ABS, POLL and VOTE files?

From the California Election Data pages you can download precinct data for California statewide elections back to the 2000 Primary Election. There are two types of precinct data. Data derived from the Statement of Vote, or SOV and data derived from the Statement of Registration, or SOR.

SOV or Statement of Vote data files are available by the sv and sr precinct types. The SOV files are in the first column of the data pages. These files contain the precinct-level voting results.

SOR or Statement of Registration data files are processed into four file types:

REG = registration data for all registered voters

ABS = registration data for registered voters that voted by mail ballot

POLLV = registration data for registered voters that voted at the polling place

VOTE = registration data for all voters that voted. The VOTE files are the sum of the ABS and POLLV files.

Each of the SOR files is available by the rg, rr, and sr precinct types. The same registration data variables are reported for each election. Please refer to the SOR (statement of registration) codebook for a complete listing of the variables in the SOR files.

How does SWDB obtain demographic information about voters?

The only demographic data that we have for voters are surname-matched registration data. Basically, we obtain the last names of registered voters by address and then match their last names to an ethnic group and then report that data by precinct. Due to the fact that the data is surname-matched it is not possible to distinguish black registered voters from white registered voters. Here is the link to the documentation on the surname matching process: https://statewidedatabase.org/info/metadata/surname.html

How does SWDB obtain demographic information about voters?

Unfortunately, we don't have voting patterns by ethnicity per se, but what we do have are surname-matched registration data. The surname-matched registration data can give you an idea of patterns of registration by groups with ethnic names.

Please note that black and white registered voters cannot be matched to an ethnic group based on their last name, so these data will not be able to give you numbers for black and white registration.

Does SWDB have documentation on the methodology and on the Spanish surname list it uses for identifying Latino surnamed registered voters?

The Statewide Database uses the Passel-Word list published by the U.S. Census Bureau in 1980. It's a list of 12,497 different Spanish surnames. The central premise for including a surname on that list was the similarity of the geographic distribution of that name to the geographic distribution of the Hispanic origin population within the United States.

The 12,497 surnames appearing on the 1980 Spanish surname list were culled from a database of 85 million taxpayers filing individual federal tax returns for 1977.

Please see this document for more information on how the U.S. Census Bureau developed the surname list that we use: http://www.census.gov/population/documentation/twpno13.pdf In terms of methodology, the Census Bureau also provides some processing guidelines, which we implement, such as how to handle situations where there is more than one part to the name, as is the case with the name De La Torre.

In terms of a cutoff, the Statewide Database does not apply one: any name that matches the list is used.

Do you have any estimates of Hispanic registration and turnout prior to 1992, or know where we might find this info?

No, unfortunately we do not have surname-matched data for elections before 1992. We are not aware of any other existing data sets of surname-matched voter registration data for elections before 1992.

Does SWDB provide voter or registration data by zipcode units? What about by ZCTAs?

No, the Statewide Database does not publish its electoral data (Registration and Statement of Vote) by USPS zip code.

The Statewide Database is California's redistricting database, and redistricting is done with Census TIGER/Line files. Zip codes are United States Postal Service delivery areas, and as such they are maintained and changed at will by the USPS. Furthermore, USPS zip code areas cut across Census TIGER/Line geography and precincts when overlaid on them.

Though it is not our practice to maintain data or geography for zip code units here at the Statewide Database for the above mentioned reasons, there are data reports for zip codes on our website that we have either collected or that were created as part of special research projects. These old data reports can be downloaded from our website's REPORTS archive but we do not have any current or future plans to publish our precinct data sets by USPS zip code.

ZCTAs, or zip code tabulation areas, are a different story, though we don't intend to create ZCTA data reports because any analyst can create them using our precinct to 2000 block conversion files.

ZCTAs are Census statistical tabulation areas, unlike USPS zip codes; and as such, they do align with census geography.

Using the Statewide Database's rg precinct to 2000 block conversion files it is possible to aggregate block-level registration to the ZCTA using a ZCTA to 2000 census block cross-walk file.

A ZCTA to 2000 census block cross-walk file can be found on the Census Bureau's web site: http://www.census.gov/geo/ZCTA/zcta.html
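The block-to-ZCTA step can be sketched as follows. This is a minimal illustration in Python and assumes the registration counts have already been brought to the census block with the conversion files. The function name, the dictionary layout, and the block and ZCTA identifiers are all hypothetical; the real cross-walk and conversion files have their own layouts, which should be checked against their documentation.

```python
from collections import defaultdict


def aggregate_blocks_to_zcta(block_registration, block_to_zcta):
    """Sum block-level registration counts into ZCTA totals.

    block_registration: maps a census block ID to a registrant count.
    block_to_zcta: maps a census block ID to a ZCTA code (the cross-walk).
    Blocks that do not appear in the cross-walk are not dropped silently;
    their registrants are returned separately as an unassigned total.
    """
    totals = defaultdict(int)
    unassigned = 0
    for block_id, registrants in block_registration.items():
        zcta = block_to_zcta.get(block_id)
        if zcta is None:
            unassigned += registrants
        else:
            totals[zcta] += registrants
    return dict(totals), unassigned


# Made-up identifiers and counts, for illustration only.
block_registration = {"B001": 120, "B002": 80, "B003": 45, "B004": 7}
block_to_zcta = {"B001": "Z1", "B002": "Z1", "B003": "Z2"}
zcta_totals, unassigned = aggregate_blocks_to_zcta(block_registration, block_to_zcta)
```

Reporting the unassigned total separately makes it easy to see how much registration fell in blocks that the cross-walk does not cover.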

Are the 1992 - 2000 block-level registration figures estimates (like the election results) or are they actual registrations disaggregated to the census block? What about the block-level partisan registration figures by race/ethnicity?

They are actual registrations disaggregated to the census block. Registrants are geocoded by their registered addresses to the 2000 census block. The registration figures by race/ethnicity are based on matching the registered voters' names to an ethnic surname list. This is why there is no measure of white or black registered voters, as blacks and whites do not have distinctive surnames. Here is more information on the surname-matched registration data: https://statewidedatabase.org/info/metadata/surname

Can we obtain the exact count of the registration breakdown by block for elections after 2000?

You would need to geocode the voter registration file to the census block using the registered voter's address. This would allow you to retain/associate all of the data associated with a registrant including party affiliation.

The Statewide Database's precinct to block conversion files are based only on a geocode of total registrants in a precinct and not their party affiliation.

*Next spring the Statewide Database will be releasing all of our precinct data from the 2002G to the 2010G on the 2011 census blocks. This census block data set of registration and voting data will be constructed using a more precise method than the precinct to block conversion files.

What is a registration cycle as it applies to the cycles registered variables i.e. Dem registered 1 cycle (DREG1G), Dem registered 2 cycles (DREG2G) in the Statewide Database registration precinct data (REG, ABS, POLLV & VOTE) files?

There is a registration date on the registered voter file that is classified according to how many elections ago the voter registered. That is, if you registered in September, 2006, and we are processing for the November, 2010 election, you were registered for three general elections (2006, 2008, 2010).

The program then compares the registered date from the registrant's record with the date in the cohorts to determine where it falls and then increases the value of that cohort by one.

Here are the cohorts:

-- Cohort boundary dates at two-year steps back from the election date.
-- #election_date is a placeholder for the date of the election being processed.
insert into reg_cohort (dt) values (#election_date);
update reg_cohort set cohort_1 = date_sub(dt, interval 2 year);
update reg_cohort set cohort_2 = date_sub(dt, interval 4 year);
update reg_cohort set cohort_3 = date_sub(dt, interval 6 year);
update reg_cohort set cohort_4 = date_sub(dt, interval 8 year);
update reg_cohort set cohort_5 = date_sub(dt, interval 10 year);
update reg_cohort set cohort_6 = date_sub(dt, interval 12 year);
update reg_cohort set cohort_7 = date_sub(dt, interval 14 year);
update reg_cohort set cohort_8 = date_sub(dt, interval 16 year);

So going backwards from the election date (say the g10 election) puts a registration date of December 2008 into cohort_1.

Since registration closes 15 days before an election in California, anyone who registered at the last moment for the 2008 election had to be registered in October 2008 and so ends up in cohort_2. (cohort_1 covers registrations in the last two years, that is, registrants who have been registered for no other general election. cohort_9 covers registrations more than 16 years ago or records with no registration date, but that is rare now.) No adjustments are made for purges of the registration rolls.
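The cohort assignment described above can be sketched in Python. This is an illustration of the logic only, not the Statewide Database's actual program. The function names are hypothetical, and the handling of a registration date that falls exactly on a boundary date (placed in the more recent cohort here) is an assumption.

```python
from datetime import date
from typing import Optional


def years_before(d: date, years: int) -> date:
    """Return the date `years` years before d (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def registration_cohort(reg_date: Optional[date], election_date: date) -> int:
    """Classify a registration date into cohorts 1 through 9.

    Cohort k (1 to 8) means the registration date is on or after the
    boundary 2*k years before the election and before the previous boundary.
    Cohort 9 means registered more than 16 years ago, or no date on file.
    """
    if reg_date is None:
        return 9
    for k in range(1, 9):
        if reg_date >= years_before(election_date, 2 * k):
            return k
    return 9


# Using November 2, 2010 as the g10 election date.
g10 = date(2010, 11, 2)
december_2008 = registration_cohort(date(2008, 12, 15), g10)   # cohort 1
october_2008 = registration_cohort(date(2008, 10, 20), g10)    # cohort 2
september_2006 = registration_cohort(date(2006, 9, 15), g10)   # cohort 3
```

These three calls reproduce the examples in the text: a December 2008 registration lands in cohort 1, an October 2008 registration in cohort 2, and a September 2006 registration in cohort 3 (registered for the 2006, 2008, and 2010 general elections).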

Voting Data

I want to aggregate the precinct voting data(SOV) data to the city level, is there any practical implication for using one type of precinct instead of the other? In other words, would it better to use the sv precinct files instead of the sr precincts or vice versa?

You should use the sr precincts since this is the precinct unit for which we have precinct to city conversion/equivalency files (i.e. The sr precinct to city files which describe which precincts are in which cities.)

December 15th

Why, when merging the statewide SOV data by sr precinct with the voters data files by sr precinct, are there, for instance, 4,606 more records in the 2006G SOV file than in the 2006G voters file, and why do 4,800 SOV records remain unmatched after the two files are merged?

In California, SOV data are collected and reported by voting precinct (also known as the sv precinct), while SOR data are reported by registration precinct (also known as the rg precinct). The SR precinct is the only precinct type that the SOR and SOV data have in common. To analyze the SOV and SOR precinct data in a common unit, the SOR precinct data are merged to the SOV precinct results through the SR precincts. This is why there is no 2006G voters file for the SOV precincts, only for the SR precincts. This is true for any year, as registration statistics cannot be matched against election statistics except through the SR precincts (in the redistricting dataset they are generally at the block level, but that requires an extensive breakdown, which is described in the documentation).

In 2006 and earlier election years, California county election officials were not required to break out absentee results to voting precincts, and as a result some large counties such as Los Angeles (06037) and San Diego (06073) did not do so. This is the primary reason there are so many unmatched records. Election officials were not required to break down absentee results to the voting (SOV) precinct until state law changed in 2008. Prior to 2008, absentee votes were usually reported by ballot group. A ballot group is a set of ballots on which all of the races are identical; the races on a voter's ballot change as the districts for that voter change.

We can provide files for each of the mergers showing what went into what, but the 2008 and later mergers have significantly fewer mismatches because state law changed to require a breakdown of absentee results to the voting (SOV) precinct. There are still a few unassigned absentee SOV precincts (some are federal voters; for others we are not quite sure why they were created), but it is a relatively small number.
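To see which records fail to match in a merge of this kind, the unmatched keys can be listed on each side. The following Python sketch is illustrative only. It assumes both files have been read into dictionaries keyed by a (county FIPS, sr precinct) pair, and the key layout and the field names TOTVOTE and TOTREG are hypothetical.

```python
def merge_by_sr_precinct(sov_rows, voter_rows):
    """Merge SOV and voters records on a shared SR precinct key.

    Both arguments map a (county_fips, sr_precinct) tuple to a dict of fields.
    Returns (matched, sov_only, voters_only):
      matched     - combined record for every key present in both files
      sov_only    - sorted keys found only in the SOV file
      voters_only - sorted keys found only in the voters file
    If both records share a field name, the SOV value is kept.
    """
    matched = {}
    sov_only = []
    for key, sov_record in sov_rows.items():
        if key in voter_rows:
            matched[key] = {**voter_rows[key], **sov_record}
        else:
            sov_only.append(key)
    voters_only = [key for key in voter_rows if key not in sov_rows]
    return matched, sorted(sov_only), sorted(voters_only)


# Made-up records; "ABS_UNASSIGNED" stands in for an absentee group
# that was never broken down to a precinct.
sov_rows = {
    ("06037", "0001A"): {"TOTVOTE": 310},
    ("06037", "ABS_UNASSIGNED"): {"TOTVOTE": 95},
}
voter_rows = {
    ("06037", "0001A"): {"TOTREG": 520},
    ("06037", "0002B"): {"TOTREG": 410},
}
matched, sov_only, voters_only = merge_by_sr_precinct(sov_rows, voter_rows)
```

Listing the SOV-only keys by county is a quick way to confirm whether the unmatched records are concentrated in unassigned absentee groups.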

The Redistricting Database's technical documentation discusses SOV and SOR precinct data at length, https://statewidedatabase.org/d10/Creating%20CA%20Official%20Redistricting%20Database.pdf

Here is a link to an example of SOR & SOV precinct merge, https://statewidedatabase.org/info/merge/ssmrg.html

Here is a link to additional information regarding the various precinct types and how they are related to the merger of the SOV and SOR precinct data, https://statewidedatabase.org/diagrams.html
