
It’s common knowledge that political parties in the United States collect data about potential voters, but exactly how comprehensive is the data they collect?
To explore that question, we set out to find detailed descriptions of 176 data points the Republican National Committee (RNC) has been collecting about voters since at least 2008. The data points were published on June 19 by a cybersecurity firm after one of its analysts discovered a trove of voter data that had been inadvertently left on a public Amazon server. Our search led us to previously unreported data sources, offering a more complete view of what the Republican party tracks about American voters.
The voter data was discovered on June 12 by Chris Vickery, a risk analyst at the security firm UpGuard. While scanning the internet for misconfigured systems, Vickery came across an 11-terabyte cache of election-related data that he later learned had been compiled by three Republican contractors: TargetPoint Consulting, the Data Trust, and Deep Root Analytics. In a public statement, Deep Root took responsibility for leaving the data on the Amazon server where Vickery found it, saying the data had only been used to “inform local TV ad buying.”
Included in the files Vickery found were 102 large spreadsheets: two for each state and the District of Columbia. For each state, one file contained voter data based on the 2008 election, and the other was based on the 2012 election. In its blog post, UpGuard listed the 176 categories that made up the column headers in those spreadsheets. Some were self-explanatory, such as “FirstName” and “OfficialParty,” but others were not, such as “VH12PP” and “RNCCalcParty.” A few were fairly clear, such as “ModeledEthnicGroup,” which indicates data about a voter’s ethnic group as determined through predictive modeling, but what those groups were was less clear. UpGuard declined to share further details about the data, citing the inherent privacy violations that would come with such a disclosure.
However, we were able to match the categories revealed by UpGuard with other sources and obtain detailed descriptions of most of them.
First, we found that the unique field names listed in UpGuard’s blog post match those used in a now-offline API that appears to have been built by the Data Trust for the RNC. The RNC’s API, which was previously hosted at docs.api.gop.com, is no longer online, and cached versions of it only show an Amazon AWS login page. But very specific Google searches, such as site:docs.api.gop.com VH12G, matched 137 of the 176 categories UpGuard listed, and most of those revealed the category’s descriptions. Some fields were slightly different, listed in UpGuard’s post as “RegistrationAddr1” and in the API as “Registration_Addr1,” for example, but the added underscores were the only inconsistencies.
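The name matching described above can be sketched in a few lines of Python. This is only an illustration, not the method actually used: the function names are hypothetical, the sample lists are a handful of field names taken from this article, and folding case (so that “RegStName” pairs with “reg_st_name”) is an added assumption on top of stripping underscores.

```python
def normalize_field_name(name: str) -> str:
    """Drop underscores and fold case so both spellings compare equal."""
    return name.replace("_", "").lower()


def match_fields(upguard_names, api_names):
    """Map each UpGuard column header to the API field with the same normalized name."""
    api_index = {normalize_field_name(n): n for n in api_names}
    return {
        name: api_index[normalize_field_name(name)]
        for name in upguard_names
        if normalize_field_name(name) in api_index
    }


# A few field names from the lists in this article (not the full 176).
upguard = ["RegistrationAddr1", "RegStName", "VH12G", "PG01"]
api = ["Registration_Addr1", "reg_st_name", "VH12G"]

matches = match_fields(upguard, api)
print(matches)
```

In this sample, “PG01” has no counterpart in the API list, so it is left out of the result.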
Additionally, a GitHub account owned by the Data Trust contains a repository called “direct-api-examples” that also references many of the field names, and includes example uses of what appears to be an early version of the API, which it calls the “GOP Data Trust API.”
Of course, the link between this API and the data found by Vickery is uncertain and unconfirmed, but it is apparent that the matching fields describe the same data. Asked for a comment about the API, the Data Trust referred us to the RNC, and the RNC did not respond to our questions.
Further insight into the nature of the data came from a post on Stack Overflow that included JSON data, which also used many of the field names. The provenance of the data is unclear, but almost all of the 59 categories it contains match the categories in the RNC’s API and those shared by UpGuard, including uniquely named fields like “RNCCalcParty” and “MADR_LastCleanse.” Because the data on Stack Overflow contained actual values, it helped us to expand the descriptions of some of the columns.
In combination, these clues allowed us to compile the lists below. They contain descriptions of 137 data points the Republican party knows, or at least wants to know, about every American voter. (According to UpGuard, the database that Vickery found did not contain data in every field for every voter.) All of the descriptions came from the RNC’s API, except in cases where the category names had a match in the API, but where the descriptions of those categories did not show up. In those cases, we put the descriptions we inferred from the field names in italics. Some descriptions include “sample data,” which came from the data posted on Stack Overflow. The 39 fields we were unable to identify were those ranging from “PG01” to “PG39.”
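As a quick consistency check on those counts, the short Python sketch below generates the unidentified field names and confirms that they and the matched fields add up to the 176 column headers. It assumes the names run consecutively with two-digit, zero-padded numbers, which is inferred only from the two endpoints given here.

```python
TOTAL_FIELDS = 176    # column headers listed by UpGuard
MATCHED_FIELDS = 137  # headers matched against the API

# Assumption: the unidentified names are PG01, PG02, ..., PG39 with no gaps.
unidentified = [f"PG{i:02d}" for i in range(1, 40)]

print(len(unidentified), unidentified[0], unidentified[-1])
print(MATCHED_FIELDS + len(unidentified) == TOTAL_FIELDS)
```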
Your likely religion and ethnicity
The RNC and the Democratic National Committee each pay millions of dollars to data analysis firms like Deep Root to combine data provided by states with data gathered from cold-calls, canvassing efforts, campaign contributions, and social media. Then those datapoints are synthesized to determine how you’re likely to vote and what kind of messaging you’ll respond to. The fields below refer to data derived through that kind of analysis. Notably, the codes for “ModeledEthnicGroup” are limited to “H” for Hispanic and “B” for black, but the field in the Stack Overflow data was populated with a “Z.”
- RNCCalcParty: RNC Calculated Partisan score: 1=Hard Rep, 2=Lean Rem [SIC], 3=Swing/Ind, 4=Lean Dem, 5=Hard Dem
- StateCalcParty: Likely a state-level partisanship score similar to RNCCalcParty
- ModeledEthnicity: Modeled Ethnicity – Ethnicity Code. See supplemental documention [SIC] for code values. Sample data: “E1”
- ModeledReligion: Modeled Religion – Ethnicity Religious Affiliation Code: B = Buddhist, C = Catholic, G = Greek Orthodox, H = Hindu, I = Islamic, J = Jewish, K = Sikh, L = Lutheran […information cuts off here]. Sample data: “P”
- ModeledEthnicGroup: Modeled Ethnic Coding (H=Hispanic, B=Black). Sample data: “Z”
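The coded fields above lend themselves to simple lookup tables. The Python sketch below is illustrative only: the table and function names are hypothetical, and “Lean Rem” is read as “Lean Rep,” which is an assumption about a typo in the source description. It also shows how the “Z” from the Stack Overflow sample falls outside the documented codes.

```python
# Codes as described for RNCCalcParty; "Lean Rep" assumes "Lean Rem" is a typo.
RNC_CALC_PARTY = {
    1: "Hard Rep",
    2: "Lean Rep",
    3: "Swing/Ind",
    4: "Lean Dem",
    5: "Hard Dem",
}

# The only two codes documented for ModeledEthnicGroup.
MODELED_ETHNIC_GROUP = {"H": "Hispanic", "B": "Black"}


def decode(code, table):
    """Return the documented label, or flag the code as undocumented."""
    return table.get(code, f"undocumented code {code!r}")


print(decode(3, RNC_CALC_PARTY))
print(decode("Z", MODELED_ETHNIC_GROUP))
```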
Your voting history
The voting data retained by each state varies, but is generally considered public information. These fields list which party citizens voted for in each election going back to 2002.
- LastActiveDate (last_activedate): Last Active Date – Date of Last Voter Activity (if provided on source data)
- VoterStatus: Voter Status – Current Status of registration as observed by jurisdiction. A – Active, I – Inactive, C – Cancelled, D – Deceased.
- VH12G: Vote History 2012 General – 2012 General Election
- VH12P: Vote History 2012 Primary – 2012 Primary Election
- VH12PP: Vote History 2012 Presidential – 2012 Presidential Primary Election
- VH11G: Vote History 2011 General – 2011 General Election
- VH11P: Vote History 2011 Primary – 2011 Primary Election
- VH10G: Vote History 2010 General – 2010 General Election
- VH10P: Vote History 2010 Primary – 2010 Primary Election
- VH09G: Vote History 2009 General – 2009 General Election
- VH09P: Vote History 2009 Primary – 2009 Primary Election
- VH08G: Vote History 2008 General – 2008 General Election
- VH08P: Vote History 2008 Primary – 2008 Primary Election
- VH08PP: Vote History 2008 Presidential – 2008 Presidential Primary Election
- VH07G: Vote History 2007 General – 2007 General Election
- VH07P: Vote History 2007 Primary – 2007 Primary Election
- VH06G: Vote History 2006 General – 2006 General Election
- VH06P: Vote History 2006 Primary – 2006 Primary Election
- VH05G: Vote History 2005 General – 2005 General Election
- VH05P: Vote History 2005 Primary – 2005 Primary Election
- VH04G: Vote History 2004 General – 2004 General Election
- VH04P: Vote History 2004 Primary – 2004 Primary Election
- VH04PP: Vote History 2004 Presidential – 2004 Presidential Primary Election
- VH03G: Vote History 2003 General – 2003 General Election
- VH03P: Vote History 2003 Primary – 2003 Primary Election
- VH02G: Vote History 2002 General – 2002 General Election
- VH02P: Vote History 2002 Primary – 2002 Primary Election
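The vote-history field names follow a regular pattern: “VH,” a two-digit year, and a suffix for the election type. A minimal Python sketch of decoding them follows. The function name is hypothetical, and mapping the two-digit year onto 2000 to 2099 is an assumption that holds for the 2002 to 2012 range listed here.

```python
import re

ELECTION_TYPES = {
    "G": "General",
    "P": "Primary",
    "PP": "Presidential Primary",
}

# "PP" is listed before "P" so the longer suffix is tried first.
_VH_PATTERN = re.compile(r"^VH(\d{2})(PP|P|G)$")


def parse_vote_history_field(name):
    """Return (year, election type) for a VH field name, or None if it is not one."""
    match = _VH_PATTERN.match(name)
    if match is None:
        return None
    year = 2000 + int(match.group(1))  # assumption: all years fall in the 2000s
    return year, ELECTION_TYPES[match.group(2)]


for field in ["VH12PP", "VH08G", "VH02P", "MT10_Party"]:
    print(field, parse_vote_history_field(field))
```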
What messages you’ll respond to
These fields are a bit ambiguous, but are clearly based on a micro-targeting campaign conducted in 2010, which appears to have examined voter sentiment on several factors.
- MT10_Party: MT10 Party – 2010 Regional Microtargeting – Party Model.
- MT10_GenericBallot: MT10 Generic Ballot – 2010 Regional Microtargeting – Generic Ballot Model
- MT10_Turnout: MT10 Turnout – 2010 Regional Microtargeting – Turnout Model
- MT10_ObamaDisapproval: MT10 Obama Disapproval – 2010 Regional Microtargeting – Obama Disapproval Model
- MT10_Jobs: MT10 Jobs – 2010 Regional Microtargeting – Jobs Model
- MT10_Healthcare: MT10 Healthcare – 2010 Regional Microtargeting – Healthcare Model
- MT10_SoCo: MT10 SoCo – 2010 Regional Microtargeting – Social Conservative Model
What kind of voter you are, where you live, and how to contact you
Each state keeps track of its residents’ voting records, party registrations, and contact information, and all of that data is generally considered public information. Some states sell the data to campaigns and other organizations; others give it away for free. The fields listed below include that kind of data, which every voter should assume their state keeps track of. One notable discovery here is that when voters move, there’s a field that describes whether it’s an “individual” or “family” move, possibly to account for cases where children move out of their parents’ home. Another is that phone numbers appear to be obtained or otherwise verified with reverse-lookups using voters’ addresses.
- RNCID: RNCID Primary Key for registration
- RNC_RegID: RNC GUID for registration
- SOURCEID: Likely refers to where some or all of the voter’s data came from
- OfficialParty: Clearly indicates the party the voter is registered with
- SelfReportedDemographic: Voter-Provided Demographic code (H=Hispanic, B=Black)
- FTC_DoNotCall: UpGuard confirmed in its blog post that this field indicates whether the voter is on the federal do-not-call list.
- State: Character Abbreviation State Code. Sample data: “DC”
- PermAbs: Likely indicates whether the voter is signed up as a permanent absentee voter
- AffidavitID: AffidavitID – Affidavit Number. Note: Affidavits are paper ballots voters use when their names do not appear on rolls at their polling stations.
- RegistrationDate: Clearly indicates the date the voter registered to vote. Sample data: “20030521”
- Juriscode: Registration Juriscode Code- A nationally unique numeric representation of each election jurisdiction responsible for voter registration data
- Jurisname: Jurisname – County or Municipality Name. Sample data: “District of Columbia”
- CountyFIPS: County code as defined by the jurisdiction. Coding scheme is based on Federal Information Processing Standard (FIPS) municipality assignments.
- MCD: Minor Civil Division – Indicates municipality in which voter is registered. Coding scheme is based on Federal Information Processing…
- CNTY: County – State Assigned County Code
- Town: Field is self-explanatory
- Ward: Ward – Jurisdiction Assigned Ward Code. Sample data: “07”
- Precinct: Precinct. Sample data: “097”
- PrecinctName: Precinct – long form name
- Ballotbox: Ballot Box – Jurisdiction Assigned Precinct Sub-Division Code / Ballot Box
- CD_Current: US Congressional District Pre 2011 Redistricting
- CD_NextElection: CD Next Election – US Congressional District Post 2011 Redistricting
- SD_Current: State Upper House District Name Pre 2011 Redistricting
- SDProper_Current: SD Proper Name Current – State Upper House District Name Pre 2011 Redistricting
- SD_NextElection: State Upper House District Name Post 2011 Redistricting
- SDProper_NextElection: SD Proper Name Next Election – State Upper House District Name Post 2011 Redistricting
- LD_Current: LD Current – State Lower House District Pre 2011 Redistricting
- LDS_Current: LDS Current – State Lower House District Subdivision Pre 2011 Redistricting
- LDProper_Current: LD Proper Name Current – State Lower House District Pre 2011 Redistricting
- LD_NextElection: LD Next Election – State Lower House District Post 2011 Redistricting.
- LDS_NextElection: LDS Next Election – State Lower House District Subdivision Post 2011 Redistricting
- LDProper_NextElection: LD Proper Name Current – State Lower House District Post 2011 Redistricting
- NamePrefix: Voter’s Name Prefix
- FirstName: Voter’s First Name. If first name value passed does not match name provided during registration, no match will be made.
- MiddleName: Voter’s Middle Name
- LastName: Voter’s Last Name
- NameSuffix: Voter’s Name Suffix
- Sex: Voter’s Gender (M/F/U)
- BirthYear: Voter’s Birth Year
- BirthMonth: Voter’s Birth Month
- BirthDay: Voter’s Birth Day
- StateVoterID: State Assigned Voter ID Number
- JurisdictionVoterID: Jurisdiction Voter ID – Locality Assigned Voter Identification Number (if provided on source data)
- LegacyID: Presumably an obsolete registration ID number
- HTSEQ: Description not accessible. Sample data: “1” and “2”
- HHSEQ: Household Sequence – Household Sequence Number. Sample data: “194398”
- ChangeOfAddress: Change of Address – N=NCOA move, A= 48 Month NCOA move, D= Multisourced Non-USPS move, L=LACS address conversion
- COADate: Change of Address Date – Change of Address Date (data only present if there is an address change)
- COAType: Change of Address Type – Change of Address Type: F = Family Move, I = Individual Move (data only present if there is an address) [… description cuts off]
- RegistrationAddr1 (Registration_Addr1): Field is self-explanatory
- RegistrationAddr2 (Registration_Addr2): Field is self-explanatory
- RegHouseNum (Reg_HouseNum): Field is self-explanatory
- RegHouseSfx (Reg_House_Sfx): Field is self-explanatory
- RegStPrefix (Reg_St_Prefix): Field is self-explanatory
- RegStName (reg_st_name): Field is self-explanatory
- RegStType (Reg_St_Type): Field is self-explanatory
- RegstPost (Reg_st_Post): Field is self-explanatory
- RegUnitType (Reg_Unit_Type): Field is self-explanatory
- RegUnitNumber (Reg_UnitNumber): Field is self-explanatory
- RegCity (Reg_City): Field is self-explanatory
- RegSta (Reg_Sta): Field is self-explanatory
- RegZip5 (Reg_Zip5): Field is self-explanatory
- RegZip4 (Reg_Zip4): Field is self-explanatory
- RegLatitude (Reg_Latitude): Field is self-explanatory
- RegLongitude (Reg_Longitude): Field is self-explanatory
- RegGeocodeLevel (Reg_GeocodeLevel): Field is self-explanatory
- MailingAddr1 (Mailing_Addr1): Field is self-explanatory
- MailingAddr2 (Mailing_Addr2): Field is self-explanatory
- MailHouseNum (Mail_HouseNum): Field is self-explanatory
- MailHouseSfx (Mail_HouseSfx): Field is self-explanatory
- MailStPrefix (Mail_StPrefix): Field is self-explanatory
- MailStName (Mail_StName): Field is self-explanatory
- MailStType (Mail_StType): Field is self-explanatory
- MailStPost (Mail_StPost): Field is self-explanatory
- MailUnitType (Mail_UnitType): Field is self-explanatory
- MailUnitNumber (Mail_UnitNumber): Field is self-explanatory
- MailCity (Mail_City): Field is self-explanatory
- MailSta (Mail_Sta): Field is self-explanatory
- MailZip5 (Mail_Zip5): Field is self-explanatory
- MailZip4 (Mail_Zip4): Field is self-explanatory
- MailSortCodeRoute (Mail_SortCodeRoute): Field is self-explanatory
- MailDeliveryPt (Mail_DeliveryPt): Field is self-explanatory
- MailDeliveryPtChkDigit (Mail_DeliveryPtChkDigit): Field is self-explanatory
- MailLineOfTravel (Mail_LineOfTravel): Mail Line of Travel – Mail Address Enhanced Line of Travel
- MailLineOfTravelOrder (Mail_LineOfTravelOrder): Clearly related to the above field.
- MailDPVStatus: Mail Delivery Point Verification Status – USPS Delivery Point Verification Flag
- RADR_LastCleanse: Likely refers to when the voter’s registration address was last updated. Sample data: “2013-02-04”
- RADR_LastGeoCode: Likely refers to when the voter’s geographic information for their registration was last updated. Sample data: “2013-02-04”
- RADR_LastCOA: Likely refers to when the voter’s change-of-address information for their registration was last updated
- MADR_LastCleanse: Likely refers to when the voter’s mailing address was last updated. Sample data: “2013-02-04”
- MADR_LastCOA: Likely refers to when the voter’s change-of-mailing-address information was last updated
- AreaCode: Telephone Area Code
- TelSourceCode: Telephone Source Code – Telephone Source Code: N=New append, V=Verified number, S=Source file number, R=Reverse Verify – Name & Address…
- TelephoneNUm: Telephone Number – 7-Digit Telephone Number
- TelMatchLevel (tel_matchlevel): Related to telephone number
- TelReliability (tel_reliability): Telephone Reliability – Telephone Reliability Code: 9=TML of ‘1’ or sum of lower TML recode and number of same number matches in household [Highest… [description cuts off]
- PhoneAppendDate (phone_appenddate): Related to telephone number
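The sample data above shows dates in two layouts: “20030521” for RegistrationDate and “2013-02-04” for the address-cleansing fields. A small Python sketch that accepts either layout is shown below. The function name is hypothetical, and treating these two layouts as the only ones in use is an assumption based solely on the samples quoted in this article.

```python
from datetime import datetime


def parse_sample_date(value):
    """Parse a date in either layout seen in the sample data; return None otherwise."""
    for fmt in ("%Y%m%d", "%Y-%m-%d"):
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue  # try the next layout
    return None


print(parse_sample_date("20030521"))    # RegistrationDate sample
print(parse_sample_date("2013-02-04"))  # RADR_LastCleanse sample
print(parse_sample_date("unknown"))
```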
