De-Identification Guidelines

Purpose

The purpose of this guideline is to outline the UO standards for de-identifying data. To protect the privacy and security of data subjects and to maintain confidentiality of other sensitive information, data (human subject or other types) may need to be de-identified before use in several academic, research, business, or operational functions. For example, sensitive data may need to be stripped of identifying information before use for research purposes, institutional effectiveness studies, operational efficiency, public safety or information security, or prior to release to other entities.

Acceptable De-identification Methods

Two de-identification methods are acceptable: expert determination and safe harbor. Both are based on the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule, as detailed in the US Department of Health and Human Services resources listed in the Resources section below.

Figure 1 - Acceptable Methods of De-identification

Expert Determination De-Identification Method

The expert determination method of de-identification is acceptable if an expert determines that the risk of re-identification is “very small” when the anticipated recipients use the data alone or in combination with other reasonably available information. The expert should document the methods and results of the analysis. Experts may be found in the statistical, mathematical, or other scientific domains. When expert determination is used and the data is subject to the provisions of HIPAA, the US Department of Health and Human Services Office for Civil Rights may review the expert's professional experience and credentials with de-identification methodologies.

Safe Harbor De-identification Method

Unique identifiers of the individual or of relatives, employers, or household members of the individual should be removed to achieve the “safe harbor” method of de-identification.

Unique Identifiers

  1. Names
  2. All geographic subdivisions smaller than a State, including
    1. Street address
    2. City
    3. County
    4. Precinct
    5. Zip code[1]
  3. All elements of dates (except year) for dates directly related to the individual, including
    1. Birth date
    2. Admission date
    3. Discharge date
    4. Date of death
    5. Elements of dates for individuals over 89 years old[2]
  4. Telephone numbers
  5. Fax numbers
  6. Social security numbers
  7. Medical record numbers
  8. Health plan beneficiary numbers
  9. Account numbers
  10. Certificate/license numbers
  11. Email addresses
  12. Social media profile names (or handles)
  13. Web Universal Resource Locators (URLs)
  14. Internet Protocol (IP) address numbers
  15. Device identifiers and serial numbers
  16. Vehicle identifiers and serial numbers, including license plate numbers
  17. Biometric identifiers, including finger and voice prints
  18. Full-face photographs and any comparable images
  19. Any other unique identifying number, characteristic, or code

In addition to the removal of unique identifiers, there should be reasonable assurance that the individual or entity intending to use the data does not have actual knowledge that the remaining information could be used, alone or in combination with any reasonably available information, to identify an individual who is a subject of the data. Other details that may result in the identification of an individual include initials, circumstances associated with the care of an individual, highly publicized details, and profession or occupation.

[1] Zip codes and their equivalent geocodes, except for the initial three digits of a zip code if, according to the current publicly available data from the Bureau of the Census: (1) the geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people; and (2) the initial three digits of a zip code for all such geographic units containing 20,000 or fewer people are changed to 000.

[2] All ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older.
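
As an illustration of the safe harbor rules above, the following Python sketch removes a few direct identifiers, truncates zip codes to three digits, reduces dates to the year, and aggregates ages over 89. It is a minimal sketch, not a complete implementation: the field names, the `safe_harbor_record` function, and the contents of `LOW_POPULATION_ZIP3` are assumptions made for this example, and the set of low-population zip prefixes must be rebuilt from current Census Bureau data before any real use.

```python
from datetime import date

# Hypothetical field names; a real dataset will use its own schema and
# must cover every identifier in the list above.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "street_address", "city", "county", "phone", "fax", "ssn",
    "medical_record_number", "email", "ip_address",
}

# Placeholder values for illustration only: three-digit zip prefixes whose
# combined population is 20,000 or fewer, per current Census Bureau data.
LOW_POPULATION_ZIP3 = {"036", "059"}


def generalize_zip(zip_code, low_population_zip3=LOW_POPULATION_ZIP3):
    """Keep the first three digits, or return "000" for low-population areas."""
    prefix = zip_code.strip()[:3]
    return "000" if prefix in low_population_zip3 else prefix


def generalize_age(age):
    """Aggregate all ages over 89 into a single "90+" category."""
    return "90+" if age >= 90 else str(age)


def safe_harbor_record(record, as_of):
    """Return a copy of record with identifiers removed or generalized."""
    cleaned = {k: v for k, v in record.items()
               if k not in DIRECT_IDENTIFIER_FIELDS}

    if "zip" in cleaned:
        cleaned["zip3"] = generalize_zip(cleaned.pop("zip"))

    if "birth_date" in cleaned:
        birth = cleaned.pop("birth_date")
        # Age in whole years as of the reference date.
        had_birthday = (as_of.month, as_of.day) >= (birth.month, birth.day)
        age = as_of.year - birth.year - (0 if had_birthday else 1)
        cleaned["age"] = generalize_age(age)
        # A birth year that reveals an age over 89 is not retained.
        if age < 90:
            cleaned["birth_year"] = birth.year

    # Other dates tied to the individual keep only the year.
    for field in ("admission_date", "discharge_date", "death_date"):
        if field in cleaned:
            cleaned[field.replace("_date", "_year")] = cleaned.pop(field).year

    return cleaned
```

Passing the reference date as `as_of`, rather than reading the system clock inside the function, keeps the output reproducible when the same extract is regenerated later.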

Re-Identification by Code

Often, de-identified data may need to be re-identified (rendered distinguishable) for various purposes, including to enhance research activities. In these situations, a special code or pseudonym may be assigned to individual records, provided the following criteria are met:

  • The code or pseudonym may not be derived from other related information about the individual. Common techniques for generating codes include a one-way cryptographic function, known as hashing, or a random number generator.
  • Only authorized parties should know or have access to the re-identification method.
  • The re-identification method is documented and includes, at a minimum, the following security controls:
    • Physical, technical, and administrative safeguards to protect the index of codes used to re-identify the data
    • Physical and/or logical separation of storage of the de-identified data and the index of codes
    • Documented retention period for the index of codes
    • Documented steward responsible for safeguarding the index of codes
    • Documented list of individuals authorized to access the index of codes
    • Description of re-identification purpose, frequency, and duration (e.g., will the data remain de-identified for the duration of the study, or will it be re-identified periodically, by whom, and for what reasons?)
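
The criteria above can be illustrated with a short Python sketch that uses the random-number approach: each individual receives a random code that is not derived from any of their information, and the index mapping codes back to identifiers is returned as a separate object so it can be stored apart from the de-identified data. The function names `assign_pseudonyms` and `reidentify` and the `student_id` field are hypothetical, chosen only for this example. The sketch does not implement the safeguards, retention period, or access list that the guideline requires for the index.

```python
import secrets


def assign_pseudonyms(records, id_field):
    """Replace id_field with a random code.

    Returns (deidentified_records, index). The index maps each code back to
    the original identifier and should be stored separately from the data.
    """
    index = {}    # pseudonym -> original identifier
    reverse = {}  # original identifier -> pseudonym, so repeat records match
    deidentified = []
    for record in records:
        original = record[id_field]
        if original not in reverse:
            # Random code, not derived from any information about the person.
            code = secrets.token_hex(8)
            while code in index:  # collisions are very unlikely; check anyway
                code = secrets.token_hex(8)
            reverse[original] = code
            index[code] = original
        cleaned = {k: v for k, v in record.items() if k != id_field}
        cleaned["pseudonym"] = reverse[original]
        deidentified.append(cleaned)
    return deidentified, index


def reidentify(pseudonym, index):
    """Look up the original identifier; only authorized parties hold the index."""
    return index[pseudonym]
```

A usage example under the same assumptions:

```python
records = [
    {"student_id": "951000001", "score": 88},
    {"student_id": "951000002", "score": 75},
    {"student_id": "951000001", "score": 91},
]
data, index = assign_pseudonyms(records, "student_id")
# data goes to the researchers; index goes to the documented steward.
```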

Resources

45 CFR Part 164, Subpart E: Privacy of Individually Identifiable Health Information, § 164.514(a) and (b).

"Guidance Regarding Methods for De-identification of Protected Health Information in Accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule," US Department of Health and Human Services, Office for Civil Rights.