Confidentiality Vetting Support: Proportion and Round Tool using SAS
(The Statistics Canada symbol and Canada wordmark appear on screen with the title: "Confidentiality Vetting Support: Proportion and Round Tool using SAS"
Hello and welcome to the data access training series. This video series presents examples of how to use different statistical software packages to perform the analyses required for researchers working with confidential data. In this video, I'll demonstrate a SAS program that generates the rounded proportions along with supporting documents for vetting.
Here are the key features of the code. 1) It checks for unweighted cell sizes which is a common vetting requirement. 2) It will randomly round both the numerators and the denominators before calculating weighted proportions, which is a vetting requirement for some data sets and 3) It will produce the weighted proportions which researchers can use for their analysis.
The files generated by the code include a "fails" file, which contains all entries that failed the cell size checks, the supporting output which shows the unweighted counts behind the for release file, and the "for release" file which is the one researchers may want to request to be released outside of the RDC. Before we jump into the code, here is an example of a three-way table with our variables of interest. The following program is designed to make life easier for researchers. Other examples of such programs are available online. This version is easily accessible to RDC researchers. If you're unsure of its location, ask your analyst. This version is also presented as a SAS program, where you basically enter the variables of interest in the macro and then run it.
Here is the code in SAS and this top section is where we have to edit our variables. The first variable datafid which stands for data file id is going to be our path to our SAS file. For this demonstration we're going to use GSS, the general social survey public use file. The first variable will be sex, the second will be province. So, this will be prv. The third variable is going to be marstat, which is marital status and this will be the numerator in the coming calculations. We're going to make min cell size according to the vetting requirement. Five is pretty standard, but I'm going to change this to 10 just for demonstration purposes. Next, we have our weight and bootstrap weight variables. These we'll find in our data set. The next variables will affect the rounding and the weighted denominator and proportions in the outputted tables. And for our out paths, we're going to put the path where we want our output files to show up.
So now, we're going to run the code, section by section, just to make sure everything is working. The top chunk of code set our variables and in the next section, right below, we're selecting the variables we need. So, this one's going to take a little bit. In the next section we produced our frequency tables. So, we'll have two tables based on the controlling variable, one where the sex is male, and the other where the sex is female. Going from left to right in our table, we have our province, our marital status, their frequency, the weighted frequency, standard errors, and the confidence limits. For these, the survey bootstrap replicate method was used for variance estimation. Then we have row percentages. So the row for this category and its proportions. We see, we have that for all of the variables. So we're going to go back to the original and continue running the file.
The next section splits the cross file into numerators and denominators, which is variable 3, and we get these results. Then we'll go back and run the next section below, which is going to create a supporting file with the counts and percentages and everything works there. This next section is going to create our fails output file. We can see it's been created in the sidebar and we're going to open that with our file explorer. Within the fails file, we can see that one of our variables had less than our minimum cell size which was 10, and we can get information about it through the connected columns. What we're going to want to do, is recode our data so, we don't have this happen. For example, we know it's Prince Edward Island, so we might want to think about recoding our data to have only maritime provinces. But for this demonstration, I'm just going to go back and change our min cell size to 5.
Another thing to note is that the difference between the numerator and the denominator must also be greater than the minimum cell size in order to pass.
Then we will run it back up to that point, and when we reopen our fails file, we'll see that it's empty and we can proceed. We'll run this next section near the bottom of the file and this one is creating the supporting file. The line right below that will export the file, and you can see it's created in the sidebar and we'll open that up from the file explorer. From the excel file we have our three variables and going from left to right, the denominator, the weighted denominator, numerator, the weighted numerator, the row percentage, the row standard error the upper and lower bounds, the difference which is the denominator minus the numerator whether or not it failed the minimum cell size requirement, all the rounded values and we'll have the actual proportion, the rounded proportion and the rounded proportion with the cutoff.
And the next and final section is going to trim our supporting file to create the for release file So, we'll see it's created in the sidebar, and when we open that up, we'll see that it has all our variables but only the ones we want for release. And with these files, you're now ready for the next step in the vetting process. Good luck with your research and have a wonderful day!
(Canada wordmark appears.)