Confidentiality Vetting Support: Dominance and Homogeneity using SAS
(The Statistics Canada symbol and Canada wordmark appear on screen with the title: "Confidentiality Vetting Support: Dominance and Homogeneity using SAS")
Welcome to Statistics Canada's Data Access Training Series. This video is part of the confidentiality vetting support series and presents examples of how to use different statistical software packages to perform the analyses required of researchers working with confidential data. Today we will show you an example of how to complete the homogeneity and dominance tests, including the nk and p-percent tests, on continuous dollar-value variables in SAS, using dummy, synthetic Census data.
Dominance occurs when most of the contribution to a statistic comes from one or a few units (based on the unweighted contributions). The nk and p-percent tests are dominance tests. The homogeneity (or MMM) rule aims to prevent the dissemination of statistics when respondents occupy a narrow range of values (perhaps because they were imputed from the same donor).
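To make the two dominance tests concrete, here is a minimal sketch in Python rather than SAS. It is not the macro shown in this video. It assumes the common textbook forms of the rules: the nk rule flags a cell when its n largest contributors account for more than k percent of the cell total, and the p-percent rule flags a cell when the total left after removing the two largest contributions is smaller than p percent of the largest one. The function names, the thresholds and the sample values are all hypothetical. The homogeneity rule is left out because the transcript does not give its exact definition.

```python
def nk_rule_fails(contributions, n, k_percent):
    """Return True if the n largest contributions exceed k_percent of the total."""
    # Use absolute values so that negative dollar amounts still count as contributions.
    values = sorted((abs(v) for v in contributions), reverse=True)
    total = sum(values)
    if total == 0:
        return False  # an empty or all-zero cell has nothing to dominate
    return sum(values[:n]) > total * k_percent / 100


def p_percent_rule_fails(contributions, p_percent):
    """Return True if the remainder after removing the two largest
    contributions is smaller than p_percent of the largest contribution."""
    values = sorted((abs(v) for v in contributions), reverse=True)
    if not values:
        return False
    largest = values[0]
    remainder = sum(values[2:])  # everything except the two largest units
    return remainder < largest * p_percent / 100


# Hypothetical cells and thresholds, for illustration only.
balanced = [40_000, 42_000, 45_000, 47_000, 50_000]
dominated = [1_000_000, 20_000, 15_000, 10_000]

print(nk_rule_fails(balanced, n=2, k_percent=80))
print(nk_rule_fails(dominated, n=2, k_percent=80))
print(p_percent_rule_fails(balanced, p_percent=10))
print(p_percent_rule_fails(dominated, p_percent=10))
```

With these made-up numbers, the balanced cell passes both tests and the dominated cell fails both, because a single unit supplies almost all of its total.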
For RDC researchers, it may be important to know how to perform these tests. The release of descriptive or model results involving a continuous dollar-value variable such as income requires the researcher to attach supporting documents to their confidentiality vetting request. We have chosen the Census for this example, but other Statistics Canada surveys require these tests as well.
Note that this program is designed to make life easier for researchers. Other examples of such programs are available online. This version is easily accessible to RDC researchers. If you are unsure of its location, ask your analyst. It is presented as a SAS program: you enter the variables of interest, including the income variable, in the macro and run it.
Please note that this dummy file does not contain any real cases. It is possible to import other data formats, such as SPSS, and convert them into a SAS dataset. Other code that performs these tests for the Census using Stata and R is available in the RDCs.
First, we need to determine which variables will be entered into the SAS macro. For the purposes of this exercise, we will look at a crosstab of average income (a continuous variable) by province and by sex (two categorical variables). Then, we add our income variable on the second line. Here we are using the totinc variable. Finally, we enter the location where the data file was saved.
The rest of the procedure follows. For the demonstration, I will submit the dominance and homogeneity part of the test. You will then get an output table of results. It contains indicators that show whether each category passes the tests. Here we can see that every indicator is 0, meaning none of the tests failed. This is good news for the researcher! No homogeneity or dominance issues were detected. A value of 1 would indicate that a category failed the test. In that case, the researcher would need to combine categories to increase the number of cases in the category in question.
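The 0 and 1 indicators, and the effect of combining categories, can be illustrated with a small Python sketch. This is not the SAS macro used in the video. It assumes a simple dominance check in which a cell is flagged when its largest contributor supplies more than a given share of the cell total. The records, the function names and the 70 percent threshold are hypothetical.

```python
from collections import defaultdict


def dominance_flag(values, n=1, k_percent=70):
    """Return 1 if the n largest values exceed k_percent of the total, else 0.
    Assumes non-negative dollar values."""
    ordered = sorted(values, reverse=True)
    total = sum(ordered)
    if total <= 0:
        return 0
    return int(sum(ordered[:n]) > total * k_percent / 100)


def flags_by_cell(records, n=1, k_percent=70):
    """Group (province, sex, income) records into cells and flag each cell."""
    cells = defaultdict(list)
    for province, sex, income in records:
        cells[(province, sex)].append(income)
    return {cell: dominance_flag(v, n, k_percent) for cell, v in cells.items()}


# Hypothetical synthetic records: (province, sex, total income).
records = [
    ("ON", "F", 40_000), ("ON", "F", 45_000), ("ON", "F", 50_000),
    ("ON", "M", 120_000), ("ON", "M", 30_000),
]

flags = flags_by_cell(records)
print(flags)

# Combining the two sex categories into one enlarges the cell.
merged_records = [(province, "All", income) for province, _, income in records]
merged_flags = flags_by_cell(merged_records)
print(merged_flags)
```

In this made-up example the ("ON", "M") cell gets a 1 because one unit supplies 80 percent of its total, while the combined ("ON", "All") cell gets a 0 because that unit's share drops to about 42 percent.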
This document can be attached to the results that are the subject of a vetting request, for example as an Excel sheet. The code we ran earlier sent an Excel file to the location we specified. We can find it on the desktop, in this folder, but you can choose any location you want. Now we come back to the program to run the code for the nk and p-percent tests. Once again, we select the code to be executed.
The code generates the data files needed to calculate the tests. As with the previous test, the table in the output window shows zeros. We can conclude that no nk or p-percent problem has been detected, so there is no need to group categories. These results can be attached as a supporting document to the vetting request. Thank you for watching! If you have any questions, please reach out to your local RDC analyst or send an email to our data development team.
(Canada wordmark appears.)