Supplemental/General dataset features

Overview of features common to all Archive datasets.

Date range of data
Datasets comprise all available data from January 1, 2016 up to the date the files were last generated. Below is a table describing the name stem and date ranges of each pair of files for each dataset.

File names
Each file is named with its name stem, followed by an underscore and the generation date in MMDDYYYY format. Examples:
 * Demographics_12022020
 * Activity_03172021
 * NightlyHousing1_050120201
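As an illustration of the naming convention, the following is a minimal sketch of splitting a file name into its name stem and generation date. The function name parse_file_name is hypothetical, and the sketch assumes the name carries no file extension and that the date suffix is exactly MMDDYYYY, as in the first two examples above.

```python
from datetime import date, datetime


def parse_file_name(file_name: str) -> tuple[str, date]:
    """Split 'name stem_generation date' into (stem, generation date)."""
    # Split on the last underscore so a stem containing underscores still works.
    stem, _, stamp = file_name.rpartition("_")
    if not stem:
        raise ValueError(f"no underscore-separated date in {file_name!r}")
    # Assumption: the suffix is MMDDYYYY; anything else raises ValueError.
    generated = datetime.strptime(stamp, "%m%d%Y").date()
    return stem, generated


print(parse_file_name("Demographics_12022020"))
print(parse_file_name("Activity_03172021"))
```

A suffix that is not a valid eight-digit date is rejected with a ValueError instead of being silently accepted.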

Column headings and delimiter
Column values in each file are separated by a semicolon (;), and the first row contains the column headers.
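A minimal sketch of reading such a file with Python's standard csv module is shown here. The helper name read_rows, the Night column, and the sample values are made up for illustration; only the semicolon delimiter and the header row come from the description above.

```python
import csv
import io


def read_rows(handle):
    """Read a semicolon-delimited file whose first row holds the column headers."""
    reader = csv.DictReader(handle, delimiter=";")
    return list(reader)


# Hypothetical in-memory sample standing in for a real dataset file.
sample = io.StringIO("ResidentId;BunkId;Night\n-1042;77;2021-01-01\n")
rows = read_rows(sample)
print(rows[0]["ResidentId"], rows[0]["BunkId"])
```

For a file on disk, the same function can be given a handle opened with open(path, newline=""). All values are returned as strings, so numeric and date columns need explicit conversion.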

Refresh frequency
All files are updated each weekend. After an upgrade or bug fix, a given file may be refreshed more frequently. The refresh date of each file is always appended to the end of the file name. See the File names section for details on file naming.

Date formatting
All dates use ISO calendar date format in the form YYYY-MM-DD. For example, January 1, 2021 is shown as 2021-01-01.

DateTime formatting
Datetimes use ISO calendar date and 24-hour time in the form YYYY-MM-DD HH:MM. For example, 9:51 PM on March 15, 2019 is shown as 2019-03-15 21:51.
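The date and datetime formats described above can be parsed with the standard datetime module. This is a minimal sketch; the helper names parse_date and parse_datetime are hypothetical.

```python
from datetime import date, datetime


def parse_date(text: str) -> date:
    """Parse a YYYY-MM-DD calendar date."""
    return datetime.strptime(text, "%Y-%m-%d").date()


def parse_datetime(text: str) -> datetime:
    """Parse a YYYY-MM-DD HH:MM datetime (24-hour clock, no seconds)."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M")


print(parse_date("2021-01-01"))
print(parse_datetime("2019-03-15 21:51"))
```

Both helpers raise ValueError on text that does not match the expected form, which helps surface malformed values early.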

Anonymized IDs
Datasets contain signed integers (anonymized IDs) that allow individuals and locations to be distinguished without identifying those individuals or locations. For example, ResidentId is an anonymized ID unique to each resident and BunkId is an anonymized ID unique to each bunk. Anonymized IDs are consistent across all current datasets. For example, a given resident has the same ResidentId in the current NightlyHousing datasets as they do in the current Comorbidity dataset. However, anonymized IDs are not necessarily consistent across time. For example, a resident's ResidentId in the current NightlyHousing datasets may not be the same as their ResidentId in the previous week's NightlyHousing datasets. This is an intentional feature to protect privacy and allow for revisions to the anonymization function.
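Because anonymized IDs may change between releases, it can help to guard any join on ResidentId. The sketch below is one cautious approach and not a documented procedure. It assumes that two files with the same generation date suffix belong to the same release, which the description above does not guarantee. The function names, the Condition column, and the sample rows are hypothetical.

```python
def generation_stamp(file_name: str) -> str:
    """Return the date suffix after the last underscore in a file name."""
    return file_name.rpartition("_")[2]


def join_on_resident(left_name, left_rows, right_name, right_rows):
    """Join two row lists on ResidentId, refusing files from different dates."""
    # Assumption: matching generation dates imply consistent anonymized IDs.
    if generation_stamp(left_name) != generation_stamp(right_name):
        raise ValueError("files were generated on different dates; IDs may not match")
    right_by_id = {}
    for row in right_rows:
        right_by_id.setdefault(row["ResidentId"], []).append(row)
    joined = []
    for row in left_rows:
        for match in right_by_id.get(row["ResidentId"], []):
            joined.append({**match, **row})
    return joined


# Hypothetical rows; real files would be read from disk first.
housing = [{"ResidentId": "-5", "BunkId": "9"}]
comorbidity = [{"ResidentId": "-5", "Condition": "x"}]
print(join_on_resident("NightlyHousing1_03172021", housing,
                       "Comorbidity_03172021", comorbidity))
```

Rejecting mismatched dates is deliberately conservative: it may refuse some valid joins, but it avoids silently linking records of different residents across releases.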