Supplemental/General dataset features

Overview of features common to all Archive datasets.

Content and date range of data

The specific data elements and date ranges of each dataset are described in the individual wiki page for that dataset.


File names

Each file is named DatasetName_GenerationDate, where the generation date is written as MMDDYYYY. Examples:

  • Demographics_12022020
  • Activity_03172021
  • NightlyHousing_050120201
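The naming convention can be split back into its two parts programmatically. The following is a minimal sketch, not an official tool: the function name parse_archive_filename is hypothetical, and it assumes the suffix is exactly eight digits in MMDDYYYY order and that any file extension has already been removed.

```python
import re
from datetime import date, datetime

# Assumption: the suffix after the last underscore is eight digits, MMDDYYYY.
_FILENAME_RE = re.compile(r"^(?P<dataset>.+)_(?P<stamp>\d{8})$")


def parse_archive_filename(stem: str) -> tuple[str, date]:
    """Split a file name (without extension) into dataset name and generation date."""
    match = _FILENAME_RE.match(stem)
    if match is None:
        raise ValueError(f"unexpected file name: {stem!r}")
    # strptime also rejects impossible dates such as month 13.
    generated = datetime.strptime(match.group("stamp"), "%m%d%Y").date()
    return match.group("dataset"), generated
```

For example, parse_archive_filename("Activity_03172021") returns the dataset name "Activity" together with the date March 17, 2021.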

Column headings and delimiter

Column values in each file are separated by ; (semicolon), and the first row contains the column headers.
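A file in this layout can be read with a standard CSV reader configured for semicolons. The sketch below uses Python's csv module; the function name read_archive_rows and the sample columns and values are illustrative assumptions, not taken from a real file.

```python
import csv
import io


def read_archive_rows(text_stream):
    """Return the data rows as dicts keyed by the header row."""
    reader = csv.DictReader(text_stream, delimiter=";")
    return list(reader)


# Hypothetical sample content; real files define their own columns.
sample = io.StringIO("ResidentId;BunkId\n101;7\n-42;9\n")
rows = read_archive_rows(sample)
```

All values arrive as strings, so numeric or date columns need to be converted after reading. When opening a real file, the csv documentation recommends passing newline="" to open().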

Refresh frequency

All files are updated each weekend. After an upgrade or bug fix, a given file may be refreshed more often. The refresh date of each file is always appended to the end of the file name. See the File names section for details on file naming.

Date formatting

All dates use ISO calendar date format in the form YYYY-MM-DD. For example, January 31, 2021 is shown as 2021-01-31.

DateTime formatting

Datetimes use ISO calendar date and 24-hour time in the form YYYY-MM-DD HH:MM. For example, 9:51 PM on March 15, 2019 is shown as 2019-03-15 21:51.
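Both formats map directly onto strptime format strings. The helpers below are a minimal sketch; the names parse_archive_date and parse_archive_datetime are hypothetical, and the sketch assumes values carry no seconds or time zone, as in the forms described above.

```python
from datetime import date, datetime

DATE_FORMAT = "%Y-%m-%d"            # YYYY-MM-DD
DATETIME_FORMAT = "%Y-%m-%d %H:%M"  # YYYY-MM-DD HH:MM, 24-hour clock


def parse_archive_date(value: str) -> date:
    """Parse a date column value such as '2021-01-31'."""
    return datetime.strptime(value, DATE_FORMAT).date()


def parse_archive_datetime(value: str) -> datetime:
    """Parse a datetime column value such as '2019-03-15 21:51'."""
    return datetime.strptime(value, DATETIME_FORMAT)
```

Parsing "2019-03-15 21:51" this way gives a datetime with hour 21 and minute 51, matching 9:51 PM in the example above.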

Anonymized IDs

Datasets contain signed integers (anonymized IDs) that allow individuals and locations to be distinguished without identifying those individuals or locations. For example, ResidentId is an anonymized ID unique to each resident and BunkId is an anonymized ID unique to each bunk. Anonymized IDs are consistent across all current datasets. For example, a given resident has the same ResidentId in the current NightlyHousing dataset as they do in the current Comorbidity dataset. However, anonymized IDs are not necessarily consistent across time. For example, a resident's ResidentId in the current NightlyHousing dataset may not be the same as their ResidentId in the previous week's NightlyHousing dataset. This is an intentional feature to protect privacy and allow for revisions to the anonymization function.
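Because IDs may change between refreshes, it can help to guard any join on ResidentId so that it only runs on files from the same refresh. The sketch below is one conservative way to do that, under the assumption that matching date suffixes in the file names indicate the same refresh. The function names, the rows-as-dicts representation, and the Condition column are all hypothetical.

```python
def refresh_suffix(stem: str) -> str:
    """Return the date suffix after the last underscore in a file name."""
    return stem.rsplit("_", 1)[-1]


def join_on_resident_id(stem_a, rows_a, stem_b, rows_b):
    """Inner-join two lists of row dicts on ResidentId.

    Refuses to join files whose refresh suffixes differ, since the
    anonymized IDs are not guaranteed to match across refreshes.
    """
    if refresh_suffix(stem_a) != refresh_suffix(stem_b):
        raise ValueError("files come from different refreshes; ResidentId values may not match")
    index = {}
    for row in rows_b:
        index.setdefault(row["ResidentId"], []).append(row)
    return [{**a, **b} for a in rows_a for b in index.get(a["ResidentId"], [])]


# Hypothetical rows, as produced by a semicolon-delimited reader.
housing = [{"ResidentId": "-42", "BunkId": "9"}, {"ResidentId": "101", "BunkId": "7"}]
comorbidity = [{"ResidentId": "101", "Condition": "x"}]
joined = join_on_resident_id("NightlyHousing_03172021", housing,
                             "Comorbidity_03172021", comorbidity)
```

Comparing suffixes is a heuristic: it may reject a file that was refreshed mid-week even if its IDs still match, which errs on the side of not linking records incorrectly.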