If your IR office is anything like Rock City’s, you’re always being asked to provide insight into the ways students will make decisions about your university. Perhaps you’re building a predictive model of enrollment… or maybe you’re being asked to look at factors contributing to retention or persistence to graduation. You’ve also, no doubt, learned that what predicts these things nationally (or “in the literature”) does not necessarily provide predictive utility at your own institution.
Something we’ve found to be predictive of whether or not an admitted student enrolls is where they place Rock City on their FAFSA application. That simple. The earlier they list us, the more likely they are to enroll. This trick might not work for you, as it requires that a large percentage of your applicants a) apply for aid and b) list you on their FAFSA application. Obviously, this technique also assumes that students are listing schools in order of preference and not by some other method (alphabetically, numerically by FAFSA code, etc.).
No doubt, these are some pretty big assumptions. This may ultimately prove to be a waste of time for your school/university… but it certainly has worked for us. Great, you say, I’m down to give anything a try once. Now what?
Well, it all requires data. Here’s how ours is set up:
Not the greatest screenshot, I’ll admit… but basically what we have here is a student ID number (for merging into whatever database you end up merging this into), and FAFSA code numbers in whatever position they were entered by the student (these are the CAMPUS1-CAMPUS6 variables). Because some FAFSA codes have letters, these are coded as string variables.
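If you want something to experiment with before touching real records, here is a minimal sketch of that layout as SPSS syntax. It is a toy file only: the ID values and every code other than '003078' are made-up placeholders, not real students or real FAFSA codes, and every student here lists six schools just to keep the example simple.

```spss
* Toy data only: IDs and the X0000n codes are invented placeholders.
DATA LIST LIST /ID (F8.0) CAMPUS1 (A6) CAMPUS2 (A6) CAMPUS3 (A6) CAMPUS4 (A6) CAMPUS5 (A6) CAMPUS6 (A6).
BEGIN DATA
1001 003078 X00001 X00002 X00003 X00004 X00005
1002 X00001 003078 X00002 X00003 X00004 X00005
1003 X00001 X00002 X00003 X00004 X00005 X00006
END DATA.
LIST.
```

The first student lists the institution first, the second lists it second, and the third does not list it at all.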
So the next step is to make this data usable. I’ve done this in two ways… First, I’ve put together syntax that makes a dummy code out of each of the 6 variables (1 = “My institution,” 0 = “Another institution”). I’ve written this code for Wright State (not, incidentally, Rock City’s main campus). You’ll have to change ‘003078’ in the following syntax to your own FAFSA code:
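* Flag each FAFSA position: 1 = my institution, 0 = another institution.
RECODE CAMPUS1 CAMPUS2 CAMPUS3 CAMPUS4 CAMPUS5 CAMPUS6 ('003078'=1) (ELSE=0) INTO WRIGHT1 WRIGHT2 WRIGHT3 WRIGHT4 WRIGHT5 WRIGHT6.
* WRIGHT7 = 1 when the institution appears in none of the six positions.
COMPUTE WRIGHT7 = WRIGHT1 + WRIGHT2 + WRIGHT3 + WRIGHT4 + WRIGHT5 + WRIGHT6.
RECODE WRIGHT7 (0 = 1) (ELSE = 0).
EXECUTE.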
We found this coding to be the most useful and predictive method of using FAFSA codes. There seemed to be a fairly linear relationship between positions 1, 2, and 3, but less so among positions 4, 5 and 6. As such, entering them as dummy variables seemed to work best for us. In case you were wondering, WRIGHT7 is coded 1 for students who applied for aid but did not list your university anywhere on the FAFSA.
NOTE: Typically, you would leave one variable out of your regression analysis to serve as a reference group. If every student in your sample applied for aid, you will still need to do this. If, however, you have a group of students who did not apply for aid, when you merge this data back into your original dataset, you can recode missing values on these 7 variables as 0. When you then enter the 7 variables into your equation, the reference category will be students who did not apply for aid.
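As a rough sketch of that merge-and-recode step, something like the following could work. Everything named here beyond the WRIGHT variables is an assumption for illustration: the active dataset is your main admissions file, 'fafsa_positions.sav' is a hypothetical saved copy of the transformed FAFSA file (one row per student, already sorted by ID), ENROLLED is a hypothetical 0/1 outcome, and logistic regression is simply one reasonable choice for a yes/no outcome.

```spss
* Sketch only: the file name, ID, and ENROLLED are hypothetical names.
* Both files must be sorted by ID before MATCH FILES.
SORT CASES BY ID.
MATCH FILES /FILE=* /TABLE='fafsa_positions.sav' /BY ID.
EXECUTE.

* Students with no FAFSA record come through as system-missing, so set them to 0.
RECODE WRIGHT1 WRIGHT2 WRIGHT3 WRIGHT4 WRIGHT5 WRIGHT6 WRIGHT7 (SYSMIS=0).
EXECUTE.

* Non-applicants (all seven dummies = 0) serve as the reference category.
LOGISTIC REGRESSION VARIABLES ENROLLED
  /METHOD=ENTER WRIGHT1 WRIGHT2 WRIGHT3 WRIGHT4 WRIGHT5 WRIGHT6 WRIGHT7.
```

If every student in your file applied for aid, drop one of the seven variables from the METHOD line instead, as described above.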
If you think there’s a more linear relationship among positions, you could then use the following code to make a continuous variable out of your position:
COMPUTE NUMERICAL_POSITION = WRIGHT1 + 2 * WRIGHT2 + 3 * WRIGHT3 + 4 * WRIGHT4 + 5 * WRIGHT5 + 6 * WRIGHT6.
RECODE NUMERICAL_POSITION (0 = 7).
Again, a 7 indicates a student applied for aid but did not list your institution on the application. If you have students who did not apply for aid, you will need to account for them when you merge this data back into your main dataset.
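One possible way to handle this, sketched under the assumption that the FAFSA file has already been merged into your main dataset by ID: flag the non-applicants with a separate indicator, then look at enrollment rates by position to judge how linear the pattern really is. APPLIED_AID and ENROLLED are hypothetical variable names, and this is just one approach, not necessarily the one we used.

```spss
* Assumes the FAFSA file has already been merged into the main dataset by ID.
* APPLIED_AID and ENROLLED are hypothetical variable names.
RECODE NUMERICAL_POSITION (SYSMIS=0) (ELSE=1) INTO APPLIED_AID.
EXECUTE.

* Enrollment rate at each position, to eyeball how linear the pattern is.
CROSSTABS /TABLES=NUMERICAL_POSITION BY ENROLLED /CELLS=COUNT ROW.
```

By default CROSSTABS leaves out cases with system-missing values, so non-applicants will not appear in this table. Use APPLIED_AID to examine them separately.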
NOTE: All of this assumes that the FAFSA data is a separate dataset that will be merged into some sort of larger database. The syntax I have written does *not* account for students who did not apply for aid. This is an important group that you will need to account for in your final models. As I mentioned, when you merge this transformed data into your larger dataset, those with missing data on these variables are the students who did not apply for aid.
Additional thoughts: Rock City found that we were able to improve the predictive utility of our model with a variable that coded whether or not a student listed a top rival institution on their FAFSA. Sadly, those who did were about 2 times less likely to enroll at Rock City U… but it certainly helped improve our model.
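A rival indicator along those lines could be sketched as follows, run on the FAFSA file before merging. The code '000000' is a placeholder rather than any real institution's FAFSA code, and RIVAL is a hypothetical variable name.

```spss
* '000000' is a placeholder; substitute the rival's actual FAFSA code.
* RIVAL is a hypothetical variable name.
COMPUTE RIVAL = ANY('000000', CAMPUS1, CAMPUS2, CAMPUS3, CAMPUS4, CAMPUS5, CAMPUS6).
EXECUTE.
```

ANY returns 1 when the rival's code appears in any of the six positions and 0 otherwise. As with the other FAFSA variables, RIVAL will be missing after the merge for students who did not apply for aid, so they need the same handling.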