FAFSA Position As Predictor of Enrollment

If your IR office is anything like Rock City’s, you’re always being asked to provide insight into the ways students will make decisions about your university. Perhaps you’re building a predictive model of enrollment… or maybe you’re being asked to look at factors contributing to retention or persistence to graduation. You’ve also, no doubt, learned that what predicts these things nationally (or “in the literature”) does not necessarily provide predictive utility at your own institution.

Something we’ve found to be predictive of whether or not an admitted student enrolls is where they place Rock City on their FAFSA application. That simple. The earlier they list us, the more likely they are to enroll. This trick might not work for you, as it requires that a large percentage of your applicants a) apply for aid and b) list you on their FAFSA application. Obviously, this technique also assumes that students are listing schools in order of preference and not by some other method (alphabetically, numerically by FAFSA code, etc.).
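If you want a quick read on whether the first two assumptions hold before investing any more time, something like the following might help. This is only a rough Python sketch, not part of our SPSS workflow: the function name, the dictionary layout, and the sample students are all made up for illustration. Only the '003078' code comes from the syntax in this post.

```python
# Hypothetical sample: student ID -> the six FAFSA school codes in listed order,
# or None when the student did not file a FAFSA. All values are invented.
MY_CODE = "003078"  # the code used in this post's syntax; swap in your own


def fafsa_coverage(admits, my_code):
    """Return (share of admits who filed, share of filers who listed my_code)."""
    filers = [codes for codes in admits.values() if codes is not None]
    listed = [codes for codes in filers if my_code in codes]
    share_filed = len(filers) / len(admits) if admits else 0.0
    share_listed = len(listed) / len(filers) if filers else 0.0
    return share_filed, share_listed


sample = {
    "A1": ["003078", "001111", "", "", "", ""],
    "A2": ["002222", "003078", "001111", "", "", ""],
    "A3": ["002222", "", "", "", "", ""],
    "A4": None,  # did not apply for aid
}
share_filed, share_listed = fafsa_coverage(sample, MY_CODE)
print(f"{share_filed:.0%} filed; {share_listed:.0%} of filers listed us")
```

If either share comes back small, FAFSA position probably won't carry much weight in your model. Note that this says nothing about the third assumption (ordering by preference), which you can't really verify from the data alone.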

No doubt, these are some pretty big assumptions. This may ultimately prove to be a waste of time for your school/university… but it certainly has worked for us. Great, you say, I’m down to give anything a try once. Now what?

Well, it all requires data. Here’s how ours is set up:

[Screenshot: FAFSA dataset with a student ID column and columns CAMPUS1-CAMPUS6]

Not the greatest screenshot, I’ll admit… but basically what we have here is a student ID number (for merging into whatever database you end up merging this into), and FAFSA code numbers in whatever position they were entered by the student (these are the CAMPUS1-CAMPUS6 variables). Because some FAFSA codes have letters, these are coded as string variables.

So the next step is to make this data usable. I’ve done this in two ways… First, I’ve put together syntax that makes a dummy code out of each of the 6 variables (1 = “My institution,” 0 = “Another institution”). I’ve written this code for Wright State (not, incidentally, Rock City’s main campus). You’ll have to change ‘003078’ in the following syntax to your own FAFSA code:

RECODE CAMPUS1 CAMPUS2 CAMPUS3 CAMPUS4 CAMPUS5 CAMPUS6 ('003078'=1) (ELSE=0) INTO WRIGHT1 WRIGHT2 WRIGHT3 WRIGHT4 WRIGHT5 WRIGHT6.
EXECUTE.

COMPUTE WRIGHT7 = WRIGHT1 + WRIGHT2 + WRIGHT3  + WRIGHT4 + WRIGHT5 + WRIGHT6.
RECODE WRIGHT7 (1 = 0) (0 = 1) INTO WRIGHT7.
EXECUTE.

We found this coding to be the most useful and predictive method of using FAFSA codes. There seemed to be a fairly linear relationship between positions 1, 2, and 3, but less so among positions 4, 5, and 6. As such, entering them as dummy variables seemed to work best for us. In case you were wondering, WRIGHT7 flags students who applied for aid but did not list your university anywhere on the FAFSA.
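If it helps to see the same logic outside SPSS, here is a rough Python sketch of the dummy coding. The function name and the sample student are my own inventions for illustration. One small difference, stated up front: this version treats a student as "listed" if the code appears anywhere, whereas the SPSS recode above only flips sums of exactly 0 or 1.

```python
MY_CODE = "003078"  # the code used in this post's syntax; swap in your own


def position_dummies(campus_codes, my_code=MY_CODE):
    """Return seven 0/1 flags: positions 1-6, then 'filed but never listed us'."""
    flags = [1 if code == my_code else 0 for code in campus_codes[:6]]
    flags += [0] * (6 - len(flags))  # pad lists shorter than six entries
    not_listed = 1 if sum(flags) == 0 else 0  # plays the role of WRIGHT7
    return flags + [not_listed]


# Hypothetical student who listed us second.
print(position_dummies(["002222", "003078", "", "", "", ""]))
# [0, 1, 0, 0, 0, 0, 0]
```

The first six values correspond to WRIGHT1 through WRIGHT6 and the last to WRIGHT7.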

NOTE: Typically, you would leave one variable out of your regression analysis to serve as a reference group. If every student in your sample applied for aid, you will still need to do this. If, however, you have a group of students who did not apply for aid, when you merge this data back into your original dataset, you can recode missing values on these 7 variables as 0. When you then enter the 7 variables into your equation, the reference category will be students who did not apply for aid.
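To make the merge step concrete, here is a minimal Python sketch of the idea, assuming the seven flags are already computed for every student who filed. The function name, the dictionary layout, and the student IDs are hypothetical; in practice you would do this with whatever merge tool your shop uses.

```python
def merge_dummies(all_student_ids, fafsa_dummies):
    """Attach the seven flags to every student; non-filers get all zeros."""
    merged = {}
    for student_id in all_student_ids:
        # Students missing from the FAFSA file did not apply for aid, so
        # every flag is 0 and they become the reference category.
        merged[student_id] = list(fafsa_dummies.get(student_id, [0] * 7))
    return merged


# Hypothetical data: A4 is in the admissions file but never filed a FAFSA.
main_ids = ["A1", "A2", "A4"]
fafsa_dummies = {
    "A1": [1, 0, 0, 0, 0, 0, 0],  # listed us first
    "A2": [0, 0, 0, 0, 0, 0, 1],  # filed, but never listed us
}
merged = merge_dummies(main_ids, fafsa_dummies)
print(merged["A4"])
```

Because the non-filers are all zeros on every flag, all seven variables can go into the equation without dropping one.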

If you think there’s a more linear relationship among positions, you could then use the following code to make a continuous variable out of your position:
COMPUTE NUMERICAL_POSITION = WRIGHT1 + 2 * WRIGHT2 + 3 * WRIGHT3 + 4 * WRIGHT4 + 5 * WRIGHT5 + 6 * WRIGHT6.
EXECUTE.
RECODE NUMERICAL_POSITION (0=7).
EXECUTE.

Again, a 7 indicates a student applied for aid but did not list your institution on the application. If you have students who did not apply for aid, you will need to account for them when you merge this data back into your main dataset.
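Here is the same idea as a rough Python sketch, with the non-filers handled explicitly. The function name and the choice to return None for students who did not apply for aid are my assumptions; how you code that group in a continuous variable is a modeling decision you will have to make yourself. Also note that this version returns the first position where the code appears, while the weighted sum in the SPSS syntax would add positions together if a student somehow listed you twice.

```python
def numerical_position(campus_codes, my_code="003078"):
    """Return 1-6 for the position listed, 7 if filed but not listed, None if no FAFSA."""
    if campus_codes is None:  # student did not apply for aid
        return None
    for position, code in enumerate(campus_codes[:6], start=1):
        if code == my_code:
            return position
    return 7


# Hypothetical students.
print(numerical_position(["002222", "001111", "003078", "", "", ""]))  # 3
print(numerical_position(["002222", "", "", "", "", ""]))              # 7
print(numerical_position(None))                                        # None
```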

NOTE: All of this assumes that the FAFSA data is a separate dataset that will be merged into some sort of larger database.  The syntax I have written does *not* account for students who did not apply for aid. This is an important group that you will need to account for in your final models. As I mentioned, when you merge this transformed data into your larger dataset, those with missing data on these variables are the students who did not apply for aid.

Additional thoughts: Rock City found that we were able to improve the predictive utility of our model with a variable that coded whether or not a student listed a top rival institution on their FAFSA. Sadly, those who did were about half as likely to enroll at Rock City U… but it certainly helped improve our model.
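The rival flag is the same kind of check pointed at a different code. A minimal Python sketch, where '009999' is a placeholder I made up and not any real school's FAFSA code, and where treating non-filers as 0 is my assumption:

```python
RIVAL_CODE = "009999"  # hypothetical placeholder; substitute your rival's code


def listed_rival(campus_codes, rival_code=RIVAL_CODE):
    """Return 1 if the rival appears in any of the six positions, else 0."""
    if not campus_codes:  # no FAFSA on file: treated as 0 here (an assumption)
        return 0
    return 1 if rival_code in campus_codes[:6] else 0


# Hypothetical students.
print(listed_rival(["003078", "009999", "", "", "", ""]))  # 1
print(listed_rival(["003078", "", "", "", "", ""]))        # 0
```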

TOEFL Scores – Conversion Syntax

Internationalization. Diversity. Inclusion.

Let’s face it, if your institution is anything like Rock City U, you’re starting to see an increased internationalization and diversification of your student body. This is good… but if your IR office is anything like ours, you’re going to be asked to study these students in detail. One of our recent tasks was to see if scores on the Test of English as a Foreign Language (TOEFL) were related to retention, GPA, satisfaction, etc., etc. Easy, right? Throw together some correlations, a t-test or two, maybe even a logistic regression if you’re feeling fancy…

Oh, if only it were that simple. If you’ve worked with TOEFL scores in the past, you know that ETS has created a wonderfully convoluted exam. Not only do they provide students with the option of taking the test in one of three distinct modes of administration… they score each mode on a separate scale. And what glorious scales: 0-120 for your internet test, 0-300 for the computer test, and… wait for it… 310 to 677 for the old-skool paper version. Beautiful.

If you search around the ETS website, you can find a comparison table that will help you figure out how to convert these three scales to one distinct measure. Or you could click here.

Thankfully, my administration was only interested in looking at the overall total score. You can find this comparison table on page 6. One thing should become clear pretty quickly: The computer-based test is the only one that never has a range as a converted value. For this reason, and this reason alone, I chose to convert all scores to their computer-based equivalent.

Of course, it wouldn’t be IR without an additional wrinkle. Rock City’s home institution, instead of creating a separate variable for each type of test, lumps them all together in our admissions database under the variable TOEFL. Yup. All in the same variable. Soooooo…. I have to make some assumptions. They are these:

  • If a student’s TOEFL score fell between 0 and 120, they took the internet-based test and the score needed to be upconverted to the computer-based range.
  • If a student’s TOEFL score fell between 121 and 300, they took the computer-based test and the score did not need conversion.
  • If a student’s TOEFL score was greater than 300, they took the paper-based test and the score needed to be downconverted to the computer-based range.

Obvious problems here, right? It’s totally possible that a student did a *really* crappy job on the computer-based test (say, got a 110) and, in this conversion scheme, ends up looking quite good. Unfortunately, that’s the type of error imperfect data introduces into an analysis. Hopefully your institution makes it clear which test the student took so you don’t have to make this kind of assumption.
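The guessing rule is simple enough to sketch in a few lines of Python. This is only an illustration of the assumptions above, not the conversion itself; the function name is made up, and flagging scores that land between the computer and paper scales (301-309) as invalid is my own choice rather than anything from ETS.

```python
def infer_toefl_mode(score):
    """Guess the test mode from the score alone, using the ranges in this post."""
    if score < 0 or score > 677:
        return "invalid"
    if score <= 120:
        return "internet"  # would be upconverted to the computer-based scale
    if score <= 300:
        return "computer"  # already on the target scale
    if score >= 310:
        return "paper"     # would be downconverted to the computer-based scale
    return "invalid"       # 301-309 falls on none of the three scales


# A weak computer-based score of 110 is indistinguishable from a strong
# internet-based score, so the rule calls it "internet".
print(infer_toefl_mode(110))
print(infer_toefl_mode(250))
print(infer_toefl_mode(550))
```

Running a frequency count on the inferred mode is a cheap sanity check: a surprising number of "invalid" scores, or of very low "internet" scores, suggests the guessing rule is misfiring on your data.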

So… that’s the background of this little piece of SPSS code. I’ve recoded all the way down to a 49 on the internet test, and all the way down to a 463 on the paper test. At the very least, this should give you a good start. If, like me, your school puts everything under one variable, you can rename that variable to “testscore” and you should be able to run this as-is. You’ll get your converted scores in a variable called “testscore_R”. Enjoy… and please let me know if you know a more streamlined way to do this (or, heaven forbid, if you find any errors).

TOEFL_CONVERSION.doc

p.s. – WordPress won’t let me upload a .sps file, so you’ll have to cut and paste this one.