Statewide Database | FAQ

General

Where is the Statewide Database?

The Statewide Database's offices are located in Barrows Hall at UC Berkeley, Berkeley, CA 94720.

What type of data do you have?

The Statewide Database's datasets consist of the Statements of Vote (SOV) and the Statements of Registration (SOR) for each statewide election since 1992. These data are collected from California's 58 counties. The database also maintains decennial and other census data for use in redistricting. Census data are available for download in the Census Data section of our website, and SOV and SOR data are available in the Election Data section. Data are available for various units of analysis, such as census block, tract, and precinct.

Where do you get your data?

The registration and voting data are collected by the County Registrar of Voters or County Clerks in each of California's 58 counties. Census data are collected by the Census Bureau under the Department of Commerce.

How can I get local election data?

At this time the Statewide Database only receives funding to process statewide races and does not collect data for local races like city council and mayor. Contact the local County Registrar of Voters or Election Clerk to obtain data for local races.

How much do you charge?

The Statewide Database is a free public resource. Our data and services are available to anyone who wishes to use them. All items ready for download via this website are free. Please contact us if you require other data formats or accommodations.

What are FIPS county codes?

FIPS stands for Federal Information Processing Standards. FIPS county codes are unique three-digit codes that identify counties in California. In census files, county FIPS codes have five digits: the first two designate the state (06 for California) and the last three designate the county. The table below lists the counties in alphabetical order with their county numbers and FIPS codes. A county's FIPS code is its county number multiplied by 2, minus 1, padded with leading zeros to three digits.

County Number County Name County FIPS
1 Alameda 001
2 Alpine 003
3 Amador 005
4 Butte 007
5 Calaveras 009
6 Colusa 011
7 Contra Costa 013
8 Del Norte 015
9 El Dorado 017
10 Fresno 019
11 Glenn 021
12 Humboldt 023
13 Imperial 025
14 Inyo 027
15 Kern 029
16 Kings 031
17 Lake 033
18 Lassen 035
19 Los Angeles 037
20 Madera 039
21 Marin 041
22 Mariposa 043
23 Mendocino 045
24 Merced 047
25 Modoc 049
26 Mono 051
27 Monterey 053
28 Napa 055
29 Nevada 057
30 Orange 059
31 Placer 061
32 Plumas 063
33 Riverside 065
34 Sacramento 067
35 San Benito 069
36 San Bernardino 071
37 San Diego 073
38 San Francisco 075
39 San Joaquin 077
40 San Luis Obispo 079
41 San Mateo 081
42 Santa Barbara 083
43 Santa Clara 085
44 Santa Cruz 087
45 Shasta 089
46 Sierra 091
47 Siskiyou 093
48 Solano 095
49 Sonoma 097
50 Stanislaus 099
51 Sutter 101
52 Tehama 103
53 Trinity 105
54 Tulare 107
55 Tuolumne 109
56 Ventura 111
57 Yolo 113
58 Yuba 115
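
The county-number rule above can be sketched in Python. The function names are hypothetical and chosen only for illustration; the arithmetic (county number times 2, minus 1) and the California state prefix 06 come from the text above.

```python
CA_STATE_FIPS = "06"  # California state FIPS code, as stated above


def county_fips(county_number: int) -> str:
    """Return the three-digit county FIPS code for a California county number (1-58)."""
    if not 1 <= county_number <= 58:
        raise ValueError("California county numbers run from 1 to 58")
    return f"{county_number * 2 - 1:03d}"


def census_county_fips(county_number: int) -> str:
    """Return the five-digit code used in census files: state code + county code."""
    return CA_STATE_FIPS + county_fips(county_number)


print(county_fips(19))         # Los Angeles -> 037
print(census_county_fips(58))  # Yuba -> 06115
```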

Do I need a special program to read the data?

Our geographic data files (shapefiles, GeoPackages, and *.kmz files) can be opened using GIS software. Our tabular data files, which contain the vote and registration data associated with different geographies, are distributed in *.csv and *.dbf formats. *.csv files can be read by most spreadsheet programs, while *.dbf files are part of the shapefile format and can be opened by GIS software. Data are also available in *.txt format, which can be read by most word-processing programs.

Do you make maps?

We encourage people to make their own maps with our data using their own GIS (Geographic Information System) software. We will assist you as much as possible.

Do you draw district lines?

While we are the State of California's "Redistricting Database" we do NOT draw lines. Our purpose is to readily provide data to ALL who wish to use them. While some of our users may intend to use our data to draw their own plans, we do NOT provide instruction on drawing lines, nor will we draw them for anyone.

Where was the new 2001 California congressional seat created?

California’s newest congressional seat, created in 2001, was the 46th congressional district. Located in Orange County, it was drawn to encompass the cities of Anaheim (west and north-south Anaheim Stadium-Disneyland corridor), Buena Park, Costa Mesa, Fountain Valley, Garden Grove, Irvine, Orange, Santa Ana, Stanton, Tustin, and Westminster (north of San Diego Freeway).

Do you also have Census Data?

We provide California’s Census Data specified for Redistricting and Reapportionment. It is published in a file called PL94-171 that includes total population, race and ethnicity, and housing units. Census data can be found on the Statewide Database Census page.

Census Data

What Census data do you provide?

We provide California’s Census Data specified for Redistricting and Reapportionment. It is published in a file called PL94-171 that includes total population, race and ethnicity, and housing units. Census data can be found on the Statewide Database Census page.

Can I merge 2000 census tract data to the 2001 Assembly, Congressional, Senate, and Board of Equalization Districts?

Yes. The Statewide Database provides a set of files, the 2000 Census Tract to California's 2001 Districts Conversion files, that can be used for this purpose. These files can be found on the Census Geography Assignment & Conversion File Collection page.

How do you determine which census blocks/tracts are in a given city?

The U.S. Census Bureau distributes a census block assignment file for every state. You can use this data to determine which census blocks are in any census place in California. Please note that census tracts cross city boundaries while census blocks do not. What this means is that some of your tracts will only be partially within the borders of your city of interest while the census blocks will be either fully in or out.

Are the 1992G to 2000G census block data files by 1990 or 2000 census block units?

The 1992G to 2000G data files are based on the 2000 census blocks.

How/by what method was the 1992 to 2000 voting data merged to the 2000 census block from the precinct?

Ecological inference is used to estimate the propensities of various groups in the electorate to vote in particular ways. These propensities are then used to proportionally allocate the precinct results back to census blocks. Please note that census block and precinct geography come from different governmental entities and very little effort is made to produce common boundaries.

How is the 1992G to 2000G block-level data produced? Given that there are several blocks to a precinct, how were the votes assigned to the block level? Does this create any issues when aggregating blocks to the city level?

The process is documented in a report titled "Disaggregation of Precinct Voting Results to Census Geography." For more information, please visit our Documentation & Metadata page. In brief, this would not create any special issues when aggregating to the city.

Does SWDB provide voter or registration data by zipcode units? What about by ZCTAs?

No, the Statewide Database does not publish its electoral data (Registration and Statement of Vote) by USPS zip code.

The Statewide Database is California's redistricting database, and redistricting is done with Census TIGER/Line files. Zip codes are United States Postal Service delivery areas, and as such they are maintained and changed at will by the USPS. USPS zip code areas do not necessarily align with Census geography, such as census block boundaries.
Though it is not our practice to maintain data or geography for zip code units here at the Statewide Database, there are data reports for zip codes on our website that we have either collected or that were created as part of special research projects. These old data reports can be downloaded from our Zip Code Reports archive. Please note that we do not have any current or future plans to publish our precinct data sets by USPS zip code.

ZCTAs, or Zip Code Tabulation Areas, are Census statistical tabulation areas, unlike USPS zip codes, and as such they do align with census geography. We don't intend to create ZCTA data reports; however, you can create your own using our Precinct to Block Conversion files, which can be found in the “Geographic Data” links under our Election Data page. Using the Statewide Database's rg precinct to block conversion files, it is possible to aggregate block-level registration to the ZCTA using a ZCTA to census block cross-walk file.

A ZCTA to census block cross-walk file can be found on the Census Bureau's web site.

Can we obtain the exact count of the registration breakdown by block for elections after 2000?

You would need to geocode the voter registration file to the census block using each registered voter's address. This would allow you to retain all of the data associated with a registrant, including party affiliation. The Statewide Database's precinct to block conversion files are based only on a geocode of total registrants in a precinct and not their party affiliation.
*Next spring the Statewide Database will be releasing all of our precinct data from the 2002G to the 2010G on the 2011 census blocks. This census block data set of registration and voting data will be constructed using a more precise method than the precinct to block conversion files.

How can I estimate the registration breakdown (i.e. number of Republicans and Democrats) for a particular block using the precinct data files?

Each block contains a certain percentage of a precinct's total registrants. To estimate the registration breakdown for a particular block, use the election data set's 2000 Census Block to Precinct conversion file to multiply the percentage of each party in the overall precinct by the percent that block represents in terms of total registration.
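
The estimate described above can be sketched in Python. This is a minimal illustration under stated assumptions: the function name, argument names, and the sample numbers are all hypothetical, and the block's share is expressed as the number of registrants geocoded to the block piece.

```python
def estimate_block_registration(precinct_party_counts, precinct_total_reg, block_reg):
    """Estimate party registration in one block-precinct piece.

    precinct_party_counts: party -> registrants in the whole precinct
    precinct_total_reg:    total registrants in the precinct
    block_reg:             registrants geocoded to the block piece
    """
    if precinct_total_reg == 0:
        return {party: 0.0 for party in precinct_party_counts}
    estimates = {}
    for party, count in precinct_party_counts.items():
        party_share = count / precinct_total_reg    # party's share of the precinct
        estimates[party] = party_share * block_reg  # applied to the block's registrants
    return estimates


# Made-up numbers: a precinct of 200 registrants, 50 of whom live in the block.
print(estimate_block_registration({"DEM": 100, "REP": 60, "OTHER": 40}, 200, 50))
```

If a block is split between several precincts, the estimate would be computed for each block-precinct piece and the pieces summed. The result is an estimate, not an exact count, because it assumes each party is spread evenly across the precinct.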

Technical GIS and Importing Data

What projection and coordinate system do SWDB's GIS spatial files use?

The .cdf and .shp files use latitude and longitude, which is an X-Y coordinate system, and the projection of the files is NAD 83 UTM. The .mif files use the Earth Projection 1, 0 coordinate system, which basically means latitude/longitude on a perfect sphere.

How do you import DBF files into R?

Method 1
    (If your version of R does not have the "foreign" library, for example because library(foreign) returns an error, then R cannot directly read or import the DBF file.) Convert the DBF to a CSV by opening the DBF in Excel and saving it as a CSV. Then use an R command to read in that CSV.
  • Step 1:
      Open Excel. Select File → Open.
  • Step 2:
      In the "Look In" drop-down box, select the directory in which you saved the DBF file, and in the "Files of Type" drop-down box, select "All Files (*.*)." Double-click on the file name and the .dbf file will open in Excel.
  • Step 3:
      With the DBF open, select File → Save As, and in the "Save as Type" drop-down box, select "CSV (Comma delimited)." Specify the filename and save.
  • Step 4:
      In R, use the following syntax to open your newly created CSV: read.csv(filename, header = TRUE, sep = ",", quote = "\"", dec = ".", fill = TRUE)
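Method 2
    (If your version of R has the "foreign" library): Load the library with library(foreign) and read the DBF file directly with read.dbf(filename).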

    Conversion Files

    What are the DATA CONVERSION files i.e. the SRPREC to BLK, RGPREC to BLK, BLK to MPREC, SRPREC to RGPREC and SRPREC to CITY files and what are they used for?

    These are the Statewide Database's conversion and equivalency files, also referred to as cross-walk files. Each election has a set of these files. They can be found in the 3rd column on the election's Geographic Data page. The SRPREC to RGPREC and MPREC to SRPREC files are used to cross-walk the registration (REG, ABS, POLLV & VOTE) data and the Statement of Vote (SOV) precinct data between the different precinct types, i.e. rgprec, rrprec, srprec and mprec. For example, if your analysis requires that you merge SOV data to the map precincts (mprec) for spatial/GIS analysis, you would use the MPREC to SRPREC file to cross-walk the two precinct types, because the SOV data are only available by the srprec and svprec precinct types. One can merge census data to the precincts using the SRPREC to BLK, RGPREC to BLK and BLK to MPREC files. Additionally, the SRPREC to BLK and RGPREC to BLK files are bi-directional, so they can also be used to merge the srprec and rgprec data to census blocks.

    What is the significance of the "UNASSIGN" values in the spatial and geographic data conversion files?

    The "UNASSIGN" value represents a geographic area assigned to a registration precinct in which there were no registered voters.

    How is a census geography associated with a precinct in the precinct to blk conversion files i.e. SRPREC to BLK, RGPREC to BLK and BLK to MPREC files and what is the "conversion" based on?

    First, the conversion files between blocks and precincts are obtained by overlaying the digitally recorded precincts on the census geography. Then each registered voter is "geocoded," or placed into his or her census geography, by means of address matching.

    This process allows the number of registered voters per census block - precinct piece to be determined. The number of registered voters in a given precinct-block piece is reported in the BLKREG field in the block to precinct conversion files.

    When BLKREG is divided by the total number of registered voters in the precinct (SRTOTREG/RGTOTREG), one can derive the proportion of a precinct that is made up of a given block-precinct piece. Conversely, when BLKREG is divided by BLKTOTREG, one can derive the percent of a block belonging to a given precinct. This second measure is reported in the PCTBLK field of the block to precinct conversion files.

    How can I merge registration and voting precinct data to census block units using the precinct to block conversion files on the Geographic Data page i.e. the SRPREC to BLK and RGPREC to BLK files? What about merging to 2000 census block groups and/ or 2000 census tracts?

    The precinct to block conversion files have a field called "PCTRGPREC" in the RGPREC to BLK files and "PCTSRPREC" in the SRPREC to BLK files. These fields contain the proportion of a given precinct's total registered voters (RGTOTREG or SRTOTREG) that are contained within the portion of the precinct that is encompassed by the census block. The number of registered voters in the census block piece is "BLKREG."

    Hence the values in the "PCTRGPREC" and "PCTSRPREC" variable fields are derived thus: BLKREG / RGTOTREG = PCTRGPREC and BLKREG / SRTOTREG = PCTSRPREC

    Often, multiple census blocks will transect a single precinct, with some census blocks being split between several precincts. Overview of the procedure:

  • Merge the precinct to 2000 census block conversion file to the precinct data you want to merge to census block units.
  • Distribute PCTRGPREC or the PCTSRPREC value, depending on which precinct type you are working with, across the precinct data. This renders the precinct data into its 2000 census block-precinct pieces.
  • Sum the records by census block so that you have one record for each block. If your goal is to obtain precinct data by block group or census tract you should sum the file to these units.
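
The three steps above can be sketched in Python. This is an illustration under stated assumptions, not the Statewide Database's own code: the function name, the "BLOCK" key, and the "DEM" variable are hypothetical, and the PCTSRPREC values are assumed to be stored as proportions between 0 and 1.

```python
from collections import defaultdict


def precinct_to_block(conversion_rows, precinct_data, pct_field="PCTSRPREC",
                      precinct_field="SRPREC", block_field="BLOCK"):
    """Allocate precinct totals to census blocks with a precinct to block conversion file.

    conversion_rows: one dict per block-precinct piece
    precinct_data:   precinct id -> {variable name: precinct total}
    """
    block_totals = defaultdict(lambda: defaultdict(float))
    for row in conversion_rows:
        values = precinct_data.get(row[precinct_field])
        if values is None:
            continue  # no precinct data for this piece
        share = float(row[pct_field])  # assumed to be a 0-1 proportion
        for variable, total in values.items():
            # Merge on precinct, scale by the piece's share, and sum by block
            block_totals[row[block_field]][variable] += total * share
    return {block: dict(values) for block, values in block_totals.items()}


# Made-up example: precinct P1 is split across two blocks; P2 lies inside block B2.
rows = [
    {"SRPREC": "P1", "BLOCK": "B1", "PCTSRPREC": 0.25},
    {"SRPREC": "P1", "BLOCK": "B2", "PCTSRPREC": 0.75},
    {"SRPREC": "P2", "BLOCK": "B2", "PCTSRPREC": 1.0},
]
print(precinct_to_block(rows, {"P1": {"DEM": 80}, "P2": {"DEM": 20}}))
```

To obtain block group or tract totals instead, the same sums would be keyed on the block group or tract identifier rather than the block.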

    Is it also possible to use these same files to do the reverse, i.e. merge/convert census block data to the sr, rg and map precincts?

    Yes, the RGPREC to BLK and SRPREC to BLK files, as well as the BLK to MPREC file, can be used to merge census block data to the RG, SR and Map precinct types.

    In the case of merging census data to precinct units, you will be distributing the value in the "PCTBLK" field, rather than the "PCTRGPREC" and "PCTSRPREC," across the census block data you want to merge to precinct units.

    The precinct to block conversion files and the MPREC to BLK conversion file have a field called "PCTBLK." This field contains the proportion of a given 2000 census block's total registered voters (BLKTOTREG) that are encompassed by a single precinct. The number of registered voters in the census block piece is "BLKREG" and the total number of registered voters in the census block is "BLKTOTREG."

    Hence the values in the "PCTBLK" variable field are derived thus: BLKREG / BLKTOTREG = PCTBLK

    Overview of the procedure:

    1. Merge the 2000 census block to precinct conversion file to the census block data that you want by precinct units.

    2. Distribute the PCTBLK value across the census block data. This renders the census block data into its 2000 census block - precinct pieces.

    3. Sum the records by precinct so that you have one record for each precinct.
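
A minimal Python sketch of this reverse merge follows. It derives PCTBLK from BLKREG and BLKTOTREG as described above rather than reading it from the file; the function name, the "BLOCK" key, and the population figures are hypothetical.

```python
from collections import defaultdict


def block_to_precinct(conversion_rows, block_values, precinct_field="SRPREC",
                      block_field="BLOCK"):
    """Allocate a census block variable (for example, population) to precincts."""
    precinct_totals = defaultdict(float)
    for row in conversion_rows:
        blktotreg = float(row["BLKTOTREG"])
        if blktotreg == 0:
            continue  # no registrants in the block, so no share can be derived
        pctblk = float(row["BLKREG"]) / blktotreg  # PCTBLK = BLKREG / BLKTOTREG
        value = block_values.get(row[block_field], 0)
        # Scale the block value by the piece's share and sum by precinct
        precinct_totals[row[precinct_field]] += value * pctblk
    return dict(precinct_totals)


# Made-up example: block B1 is split between two precincts; B2 lies inside P2.
rows = [
    {"BLOCK": "B1", "SRPREC": "P1", "BLKREG": 10, "BLKTOTREG": 40},
    {"BLOCK": "B1", "SRPREC": "P2", "BLKREG": 30, "BLKTOTREG": 40},
    {"BLOCK": "B2", "SRPREC": "P2", "BLKREG": 20, "BLKTOTREG": 20},
]
print(block_to_precinct(rows, {"B1": 200, "B2": 100}))
```

Blocks with no registered voters contribute nothing in this sketch, because a share cannot be derived for them from registration counts alone.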

    What about merging block group and tract data to the precincts? Can this also be done with the block to precinct conversion files?

    Yes, the RGPREC to BLK and SRPREC to BLK files, as well as the BLK to MPREC file, can be used to do this, but it is a more advanced analysis.

    To merge 2000 tract or block group data to one of the precinct types (rg, sr or map), it is first necessary to re-tabulate the records in the conversion file to determine the proportion of a block group's or tract's total registrants that fall into a given precinct.

    Overview of the procedure:

    1. Merge the re-tabulated precinct to 2000 census block group/ tract conversion file to the block group/ tract data that you would like to merge to precinct units.

    2. Distribute the percent block group/tract value, depending on which census unit you are working with, across the block group/tract data. This renders the block group/tract data into its 2000 census block group/tract-precinct pieces.

    3. Sum the records by precinct so that you have one record for each precinct.
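
The re-tabulation step can be sketched in Python as follows. This is an assumption-laden illustration: the function name and the "BLOCK" key are hypothetical, and the block-to-tract lookup is passed in as a plain dictionary because the layout of the block identifier in the conversion files is not described here.

```python
from collections import defaultdict


def retabulate_to_tract(conversion_rows, tract_of_block, precinct_field="SRPREC",
                        block_field="BLOCK"):
    """Return {(tract, precinct): share of the tract's registrants in that precinct}."""
    piece_reg = defaultdict(float)  # registrants per tract-precinct piece
    tract_reg = defaultdict(float)  # registrants per tract
    for row in conversion_rows:
        tract = tract_of_block[row[block_field]]
        blkreg = float(row["BLKREG"])
        piece_reg[(tract, row[precinct_field])] += blkreg
        tract_reg[tract] += blkreg
    return {key: reg / tract_reg[key[0]]
            for key, reg in piece_reg.items() if tract_reg[key[0]] > 0}


# Made-up example: blocks B1 and B2 both belong to tract T1.
rows = [
    {"BLOCK": "B1", "SRPREC": "P1", "BLKREG": 10},
    {"BLOCK": "B1", "SRPREC": "P2", "BLKREG": 30},
    {"BLOCK": "B2", "SRPREC": "P2", "BLKREG": 20},
]
shares = retabulate_to_tract(rows, {"B1": "T1", "B2": "T1"})
print(shares)
```

The resulting shares play the same role as PCTBLK in the block-level procedure: multiply each tract value by its share and sum by precinct.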

    Precinct Data

    I am comparing precinct level data from the Statewide Database to the precinct data from the Registrar of Voters and I am finding discrepancies with registration and the number of Votes cast. Why?

    Election Data are kept on different files for different purposes. When you pull election returns from the Registrar of Voters file, you are looking at the Statement of Vote (SOV). You may also see Voter Registration on that file, which corresponds (in some but not necessarily all counties) to the registration 29 days before the day of the election. The SWDB uses the 15-day close file to process registration data: this means the file is frozen 15 days prior to Election Day and changes leading up to the election will not be captured. The SWDB also uses SOV data at the precinct level and those generally agree with the data published on the respective county’s website, although with some exceptions including that turnout may not match in cases of multi-page ballots. Rarely, there are processing errors and those are addressed in our errata files. The SWDB uses an additional file for processing voter data: the Voter History file. Via that file, we can track the voter irrespective of where the person resides, which is important for the building of a longitudinal database. That file also tells us the method by which the voter participated, e.g. did the voter use a mail ballot or did they vote at the polling place.
    The use of these different files has implications for data reporting by precinct:
    One, as mentioned above, the registration numbers may differ due to our use of the 15-day close and the Registrars’ use of the 29-day close of registration. Two, the SOV is stable for each election but the Voter History file is continuously updated. Three, the different categories of vote method don't always add up to the total registration and votes cast in the precinct. Finally, provisional votes are added to the precinct totals later in the process (after Election Day). You are seeing differences in the SOV due to all of the above: the fact that the categories don't add up and the fact that voters are no longer in the precinct (moved etc.) or are new to the precinct by the time we receive the file.

    Why is there a discrepancy in the number of precinct records in the precinct data files versus the number of precinct records in the precinct geographic files?

    The discrepancy in the number of records arises because not all registration records can be associated with a geographic location, yet they must still be reported in the Statement of Registration. Likewise, not all ballots that are cast can be associated with the registered voter's precinct.

    I want to aggregate the precinct voting (SOV) data to the city level. Is there any practical implication for using one type of precinct instead of the other? In other words, would it be better to use the sv precinct files instead of the sr precincts or vice versa?

    You should use the sr precincts, since this is the precinct unit for which we have precinct to city conversion/equivalency files. These sr precinct to city files describe which precincts are in which cities.

    What are the SOV, REG, ABS, POLL and VOTE files?

    From our Election Data page, you can download precinct data for California statewide elections back to the 2000 Primary Election. There are two types of precinct data: data derived from the Statement of Vote, or SOV, and data derived from the Statement of Registration, or SOR.

    Statement of Vote (SOV) data files contain the precinct-level voting results. They are available by the sv and sr precinct types and are in the first column of the data pages.

    Statement of Registration (SOR) data files are processed into four file types: REG = registration data for all registered voters; ABS = registration data for registered voters that voted by mail ballot; POLLV = registration data for registered voters that voted at the polling place; VOTE = registration data for all voters that voted. The VOTE files are the sum of the ABS and POLLV files.

    Each of the SOR files is available by the rg, rr, and sr precinct types. The same registration data variables are reported for each election. Please refer to the Statement of Registration codebook on the precinct data page for a complete listing of the variables in the SOR files.

    Why doesn't the Statewide Database have any registration and SOV data files for map precincts?

    The Map Precinct (mprec) is a geographic precinct type that is created by the Statewide Database to reflect the geography of the county's registration precincts as consistently as possible. The RR precincts are the non-geographic version of the MPREC and are aggregations of RG precincts (tabular data) into MPRECs (geographic data). Generally speaking, Map Precincts and RR Precincts follow the same boundaries.

    Because the resulting RR Precincts may include RG Precincts that are consolidated into different SV Precincts, we create a geographic consolidation known as the SR Precinct to contain whole RR and SV Precincts.

    How can I estimate the registration breakdown (i.e. number of Republicans and Democrats) for a particular census block using the precinct data files?

    Each census block contains a certain percentage of a precinct's total registrants. To estimate the registration breakdown for a particular block, use the Census Block to Precinct conversion files, found under the Geographic Data links on our Election Data page, to multiply the percentage of each party in the overall precinct by the percent that block represents in terms of total registration.

    Are precinct data from one election comparable with precinct data from another election?

    Unfortunately, no. Precinct boundaries, as well as the total number of precincts, change with every election in most counties. How much they change depends on factors such as the fluctuation in voter registration numbers, the availability of polling locations, and expected voter turnout.
    For example, Registrars of Voters will often "consolidate" (combine 2 or more into 1) precincts for low turnout elections such as Primary Elections and will "split" (create 2 or more from 1) precincts for high turnout elections, such as the General Presidential Elections.

    Why isn't the 2000 to 2008 precinct data available by 2000 census block like the 1992 to 2000 electoral data is?

    The Statewide Database produces a merged data set consisting of the previous decade's political data merged to the decennial PL94 census data at the level of the census block for each redistricting following the release of the PL94 data by the Census Bureau.
    The 1992 to 2000 census block data sets are part of the merged data set that was produced for the 2001 redistricting cycle. In between redistricting cycles, we do not create merged data sets. We do produce precinct/block conversion files/equivalency files which allow experienced GIS users to perform their own merges through a geographical proportionality method.

    I am interested in aggregating the data to the county level. Does it matter if I use by rgprec, by rrprec, by srprec, or by block files?

    We recommend using the rg precinct files because they do not contain the county and district total records, while the sr precinct files do. If you use the rr and sr precinct files, you will need to remove the totals records that are in the files. The names of these records vary by dataset, but they can be identified by the inclusion of "TOT" in the precinct field. We already have most of our election data aggregated to the county.
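
Removing the totals records before summing can be sketched in Python. The function name, the "TOTREG" variable, and the "CNTYTOT" record name below are hypothetical; the only rule taken from the text above is that totals records contain "TOT" in the precinct field.

```python
def county_total(rows, value_field, precinct_field="SRPREC"):
    """Sum a variable over precinct records, skipping county/district totals records."""
    total = 0
    for row in rows:
        if "TOT" in str(row[precinct_field]).upper():
            continue  # totals record; including it would double count
        total += int(row[value_field])
    return total


# Made-up example with one totals record mixed in with two precinct records.
rows = [
    {"SRPREC": "1001", "TOTREG": 120},
    {"SRPREC": "1002", "TOTREG": 80},
    {"SRPREC": "CNTYTOT", "TOTREG": 200},
]
print(county_total(rows, "TOTREG"))  # 200
```

Note that this simple substring test would also drop any genuine precinct whose identifier happens to contain "TOT", so the skipped records are worth inspecting for each dataset.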

    What is a registration cycle as it applies to the cycles registered variables i.e. Dem registered 1 cycle (DREG1G), Dem registered 2 cycles (DREG2G) in the Statewide Database registration precinct data (REG, ABS, POLLV & VOTE) files?

    There is a registration date on the registered voter file that is classified according to how many elections ago the voter registered. That is, if you registered in September, 2006, and we are processing for the November, 2010 election, you were registered for three general elections (2006, 2008, 2010).

    The program then compares the registered date from the registrant's record with the date in the cohorts to determine where it falls and then increases the value of that cohort by one.

    Here are the cohorts:

    -- #election_date is a placeholder for the date of the election being processed.
    insert into reg_cohort (dt) values (#election_date);
    update reg_cohort set cohort_1 = date_sub(dt, interval 2 year);
    update reg_cohort set cohort_2 = date_sub(dt, interval 4 year);
    update reg_cohort set cohort_3 = date_sub(dt, interval 6 year);
    update reg_cohort set cohort_4 = date_sub(dt, interval 8 year);
    update reg_cohort set cohort_5 = date_sub(dt, interval 10 year);
    update reg_cohort set cohort_6 = date_sub(dt, interval 12 year);
    update reg_cohort set cohort_7 = date_sub(dt, interval 14 year);
    update reg_cohort set cohort_8 = date_sub(dt, interval 16 year);

    So going backwards from the election date (say the g10 election) puts a registration date of December 2008 into cohort_1.

    Since there is a 15-day close of registration in California, if you registered at the last moment for the 2008 election, you had to be registered in October, so you would end up in cohort_2. (cohort_1 covers voters who registered within the last two years and so were registered for no other general election. cohort_9 covers voters who registered over 16 years ago or have no registration date, but that is rare now.) No adjustments are made for purges of the registration rolls.
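
The cohort comparison can be sketched in Python. This is an illustration only: the function names are hypothetical, and the handling of a registration date that falls exactly on a cutoff (treated here as belonging to the older cohort) is an assumption, since the text does not specify it.

```python
from datetime import date


def years_before(d: date, years: int) -> date:
    """Return the same calendar day `years` earlier (Feb 29 falls back to Feb 28)."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def registration_cohort(reg_date, election_date: date) -> int:
    """Return the cohort number (1-9) for a registration date."""
    if reg_date is None:
        return 9  # no registration date on file
    for cohort in range(1, 9):
        # cohort_n cutoff = election date minus 2n years, as in the statements above
        if reg_date > years_before(election_date, 2 * cohort):
            return cohort
    return 9  # registered more than 16 years before the election


g10 = date(2010, 11, 2)  # the November 2010 general election
print(registration_cohort(date(2008, 12, 1), g10))   # 1
print(registration_cohort(date(2008, 10, 20), g10))  # 2
print(registration_cohort(date(2006, 9, 15), g10))   # 3
```

These three results match the worked examples in the text: a December 2008 registration lands in cohort_1, a last-moment October 2008 registration in cohort_2, and a September 2006 registration counts as three general elections.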

    Voting Data

    How does SWDB obtain demographic information about voters?

    The only demographic data that we have for voters are surname-matched registration data. Basically, we obtain the last names of registered voters by address, match their last names to an ethnic group, and then report that data by precinct. Because the data are surname-matched, it is not possible to distinguish Black registered voters from white registered voters. For more information, please refer to our documentation on the surname matching process.

    Do you have data on voting patterns by ethnicity?

    We don't have voting patterns by ethnicity per se, but what we do have are surname-matched registration data. The surname-matched registration data can give you an idea of patterns of registration by ethnic groups based on their last names.
    Please note that Black and white registered voters cannot be matched to an ethnic group based on their last name, so this data will not be able to provide numbers for Black and white registration totals.