persistent_homology

Scraped Admissions Data

16 posts in this topic

Hello, some of you might be interested in this: I have scraped the admissions data for statistics PhD and master's programs from The GradCafe. The CSV file can be downloaded here, and the Python code used to generate it is here.

 

The code could easily be modified to collect data for biostatistics, math, or any other subject.

 

The most annoying part is that each school is listed under many different names. I have collected a list of synonyms, but if you run the code again in the future, new variants will no doubt turn up, and you will have to add them to the list in the code.
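As an illustration of the synonym approach, here is a minimal sketch of name canonicalisation. It is not the code from grad_cafe.py: the synonym table is hypothetical, as are the helper names `normalise`, `canonical_school` and `unmatched`. Names are lower-cased and stripped of punctuation before lookup, and anything that doesn't match is returned so new variants can be spotted and added to the list.

```python
import string
from typing import Dict, Iterable, List, Optional

# Hypothetical synonym table for illustration; the real list in grad_cafe.py
# is much longer and is not reproduced here.
SCHOOL_SYNONYMS: Dict[str, List[str]] = {
    "Stanford University": ["Stanford", "Stanford Univ", "Stanford U"],
    "University of California, Berkeley": ["Berkeley", "UC Berkeley", "UCB"],
}


def normalise(name: str) -> str:
    """Lower-case, drop punctuation and collapse runs of whitespace."""
    no_punct = name.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(no_punct.split())


# One lookup from every normalised alias (and canonical name) to the canonical name.
ALIAS_TO_SCHOOL: Dict[str, str] = {
    normalise(alias): canonical
    for canonical, aliases in SCHOOL_SYNONYMS.items()
    for alias in [canonical, *aliases]
}


def canonical_school(raw: str) -> Optional[str]:
    """Return the canonical school name, or None for an unseen variant."""
    return ALIAS_TO_SCHOOL.get(normalise(raw))


def unmatched(names: Iterable[str]) -> List[str]:
    """Distinct raw names that still need adding to the synonym table."""
    return sorted({n for n in names if canonical_school(n) is None})
```

Running `unmatched` over a fresh scrape gives the list of new spellings to add.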

 

If you run the code, it will take about 3 minutes to finish. This is because I added delays between page requests to The GradCafe so as not to annoy them.
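A minimal sketch of the polite-fetching idea, assuming one page per request and a hypothetical 2-second delay (the delay actually used in grad_cafe.py isn't stated). The fetch function is passed in, so it can be swapped out or tested without touching the network; no GradCafe URL is assumed here.

```python
import time
import urllib.request
from typing import Callable, Iterable, List


def fetch_url(url: str) -> str:
    """Download one page as text (no retries or error handling)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_politely(urls: Iterable[str],
                   fetch: Callable[[str], str] = fetch_url,
                   delay_seconds: float = 2.0) -> List[str]:
    """Fetch pages one at a time, sleeping between consecutive requests."""
    pages = []
    for i, url in enumerate(urls):
        if i > 0:
            time.sleep(delay_seconds)  # pause so the server isn't hammered
        pages.append(fetch(url))
    return pages
```

The total run time is roughly (number of pages - 1) × delay plus download time, which is why a scrape of this size takes minutes rather than seconds.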

 

In the code I have also added a couple of plotting functions for the most basic tasks.
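The plotting code itself isn't reproduced here, but a sketch of the tally behind a plot like apps_by_year.png might look like this. It assumes each record is a dict with hypothetical `year` and `decision` keys (the real CSV column names may differ); matplotlib is only imported if you actually plot.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping


def apps_by_year(records: Iterable[Mapping[str, str]]) -> Dict[int, Counter]:
    """Count decisions per year, e.g. {2013: Counter({'Accepted': 2, ...})}."""
    table: Dict[int, Counter] = {}
    for rec in records:
        year = int(rec["year"])
        table.setdefault(year, Counter())[rec["decision"]] += 1
    return dict(sorted(table.items()))  # years in ascending order


def plot_apps_by_year(table: Dict[int, Counter]) -> None:
    """Stacked bar chart of accepted/rejected counts per year."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    years = list(table)
    accepted = [table[y]["Accepted"] for y in years]  # Counter gives 0 if absent
    rejected = [table[y]["Rejected"] for y in years]
    plt.bar(years, accepted, label="Accepted")
    plt.bar(years, rejected, bottom=accepted, label="Rejected")
    plt.xlabel("Year")
    plt.ylabel("Applications")
    plt.legend()
    plt.show()
```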

apps_by_year.png

 

Things that could be added

  • I have only collected records that are accepted/rejected; wait-listed and interview results could be added too.
  • Data about the schools could be added from the USA Today rankings or the NRC.
  • The GRE data could be standardised: scores are reported on both the new 130-170 and the old 200-800 scales, and people also report percentiles, especially for the subject GRE.
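For the first and third items, a rough sketch of how the extra decision categories and the GRE score scales could be recognised. The decision strings (e.g. "Wait listed") and the helper names are assumptions. The score ranges cover the general-test verbal/quantitative sections only; actually converting old scores to the new scale would need ETS's published concordance tables, which are not reproduced here.

```python
from typing import Optional


def decision_category(raw: str) -> Optional[str]:
    """Map a free-text decision to a category; None if unrecognised."""
    text = raw.strip().lower()
    if text.startswith("accept"):
        return "accepted"
    if text.startswith("reject"):
        return "rejected"
    if text.replace(" ", "").startswith("waitlist"):  # "Wait listed", "Waitlisted"
        return "waitlisted"
    if text.startswith("interview"):
        return "interview"
    return None


def gre_scale(score: float) -> str:
    """Guess which scale a verbal/quant GRE score was reported on."""
    if 130 <= score <= 170:
        return "new (130-170)"
    if 200 <= score <= 800 and score % 10 == 0:  # old scores came in steps of 10
        return "old (200-800)"
    if 0 <= score <= 100:
        return "percentile?"  # ambiguous: could be a percentile rank
    return "unknown"
```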

I hope those of you who are data-curious will find some interesting things out and share them with us.

Edited by fuzzylogician
links are fixed!

I cannot edit my post, but for some reason the links I posted lack the : in http://...

 

Perhaps a moderator can correct it.

 

Data is here and code is here.

 

Hopefully these links will work.

 

Just in case:

data: http://sourceforge.net/projects/triangleinequal/files/Grad%20Cafe%20Data/gc_data.csv/download

code: http://sourceforge.net/projects/triangleinequal/files/Grad%20Cafe%20Data/grad_cafe.py/download

Edited by persistent_homology

This is fantastic -- thanks for fixing up the school names.

 

I'll try and take a look at the data later and see what interesting things I can find.

One thing to keep in mind when looking at these data is that the GC population is a highly biased sample of applicants. For instance, it is much more heavily domestic (i.e., U.S.-based) than the overall group applying to stat & biostat programs. This is one of the reasons (among many) why you're seeing admit rates in the 40-50% range from the results page, while most top programs report rates under 20%. 
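To make the point concrete, the admit rate on the results page is just accepted / (accepted + rejected) among people who posted, per school. A minimal sketch, assuming records with hypothetical `school` and `decision` keys; `min_decisions` is an arbitrary cutoff to avoid quoting rates from a handful of posts. Whatever it returns describes GradCafe posters, not the full applicant pool.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def admit_rates(records: Iterable[Mapping[str, str]],
                min_decisions: int = 10) -> Dict[str, Tuple[float, int]]:
    """Per-school (admit rate, number of decisions) among posted results."""
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # [accepted, decided]
    for rec in records:
        decision = rec["decision"]
        if decision not in ("Accepted", "Rejected"):
            continue  # skip wait lists, interviews, etc.
        tally = counts[rec["school"]]
        tally[1] += 1
        if decision == "Accepted":
            tally[0] += 1
    return {school: (acc / n, n)
            for school, (acc, n) in counts.items()
            if n >= min_decisions}
```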

That is a good point, cyberwolf. What I hope is that, although the numbers are biased, we might still be able to see trends, such as whether it is getting harder to be accepted.

 

Once you have a specific question in mind to answer, perhaps the bias can be compensated for to some extent by using official data from some schools to inform a Bayesian prior.
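One way to read this suggestion is a conjugate Beta-Binomial update: turn a school's official admit rate into a Beta prior worth some number of pseudo-applicants, then update it with the GradCafe counts. A minimal sketch; the function names are hypothetical and the official rate, prior strength and counts in the example are made-up numbers. Note that this only shrinks the estimate toward the official figure; it does not model why GradCafe posters differ.

```python
from typing import Tuple


def beta_prior(official_rate: float, strength: float) -> Tuple[float, float]:
    """Beta(alpha, beta) prior whose mean is the official admit rate.

    `strength` is how many pseudo-applicants the official figure is worth.
    """
    return official_rate * strength, (1.0 - official_rate) * strength


def posterior_admit_rate(prior: Tuple[float, float], accepted: int, rejected: int) -> float:
    """Posterior mean after a conjugate Beta-Binomial update with GradCafe counts."""
    alpha, beta = prior
    return (alpha + accepted) / (alpha + beta + accepted + rejected)
```

For example, an official rate of 0.15 worth 50 pseudo-applicants, updated with 45 acceptances and 55 rejections, gives (7.5 + 45) / (50 + 100) = 0.35, between the official 15% and the raw 45%.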

Edited by persistent_homology

I always thought the bigger bias would be that people who were accepted are more likely to post their results.
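That effect alone can inflate the observed admit rate a lot. If the true rate is p, and accepted and rejected applicants post with probabilities q_a and q_r, the share of posts that are acceptances is p*q_a / (p*q_a + (1 - p)*q_r); equivalently, the observed odds are the true odds multiplied by q_a/q_r. A small sketch with hypothetical function names and posting probabilities:

```python
def observed_admit_rate(true_rate: float, p_post_accepted: float,
                        p_post_rejected: float) -> float:
    """Share of posted results that are acceptances, given selective posting."""
    posted_acc = true_rate * p_post_accepted
    posted_rej = (1.0 - true_rate) * p_post_rejected
    return posted_acc / (posted_acc + posted_rej)


def corrected_admit_rate(observed_rate: float, p_post_accepted: float,
                         p_post_rejected: float) -> float:
    """Invert the formula: true odds = observed odds * p_post_rejected / p_post_accepted."""
    observed_odds = observed_rate / (1.0 - observed_rate)
    true_odds = observed_odds * p_post_rejected / p_post_accepted
    return true_odds / (1.0 + true_odds)
```

With a true rate of 0.2, q_a = 0.6 and q_r = 0.15 (made-up values), the observed share is 0.12 / 0.24 = 0.5, in the 40-50% range mentioned earlier. The correction only works if the ratio q_a/q_r is known, which it usually isn't.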

There could also be bias in who chooses to include their details and who doesn't. It seems it would be hard to make any statements about the "average GPA" or anything like that, since so many values are missing.

Investigating the difference between international and American applicants (where I group 'U' together with 'I' for this purpose), I made the following plots:

status_effect.png

 

 

The first is broken down by year, and the second looks at all applications together.
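For anyone wanting to reproduce this, a minimal sketch of the grouping. It assumes the status codes follow GradCafe's usual convention (A = American, U = international with a US degree, I = international, O = other) and records with hypothetical `year`, `status` and `decision` keys; `by_year=False` gives the pooled version.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Optional


def status_group(code: str) -> Optional[str]:
    """Map A -> American, I/U -> International; anything else -> None."""
    code = code.strip().upper()
    if code == "A":
        return "American"
    if code in ("I", "U"):  # 'U' grouped with 'I', as in the post
        return "International"
    return None


def admit_rate_by_status(records: Iterable[Mapping[str, str]],
                         by_year: bool = True) -> Dict:
    """Admit rate per (year, group), or per group when by_year is False."""
    counts: Dict = defaultdict(lambda: [0, 0])  # key -> [accepted, decided]
    for rec in records:
        group = status_group(rec["status"])
        if group is None or rec["decision"] not in ("Accepted", "Rejected"):
            continue
        key = (int(rec["year"]), group) if by_year else group
        tally: List[int] = counts[key]
        tally[1] += 1
        tally[0] += rec["decision"] == "Accepted"
    return {key: acc / n for key, (acc, n) in sorted(counts.items())}
```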

 

Also, here is a plot showing some of the differences between GPA reporters and non-reporters:

GPA_reporting.png
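A sketch of the comparison behind a plot like this, assuming a hypothetical `gpa` key where an empty, missing or non-numeric value means the GPA wasn't reported. It only describes the two groups; it says nothing about cause.

```python
import math
from typing import Dict, Iterable, Mapping, Optional, Tuple


def gpa_reported(value: Optional[str]) -> bool:
    """True if the field parses as a finite number; '', 'n/a' or None count as missing."""
    try:
        return math.isfinite(float(value))
    except (TypeError, ValueError):
        return False


def compare_gpa_reporters(
        records: Iterable[Mapping[str, str]]) -> Dict[str, Tuple[Optional[float], int]]:
    """(admit rate, number of decisions) for GPA reporters vs non-reporters."""
    counts = {True: [0, 0], False: [0, 0]}  # reported? -> [accepted, decided]
    for rec in records:
        decision = rec.get("decision")
        if decision not in ("Accepted", "Rejected"):
            continue
        tally = counts[gpa_reported(rec.get("gpa"))]
        tally[1] += 1
        tally[0] += decision == "Accepted"
    return {("reported" if key else "missing"): (acc / n if n else None, n)
            for key, (acc, n) in counts.items()}
```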

The conclusion from the above is obvious: reporting your GPA on GradCafe increases your chances of being accepted!  ;)

Really glad someone has done this; thanks, persistent_homology. Something that would be a moderate-to-severe PITA, but could yield interesting results, is to try to match up records from the same user (as I suggested) and examine which universities tend to accept the same sets of applicants.
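A rough sketch of how the matching might work, under a big assumption: records from the same cycle with identical status, GPA and GRE fields are treated as one applicant. The fingerprint fields and function name are hypothetical, and the heuristic will both merge different people who happen to have identical stats and split one person who typed their stats differently on different posts.

```python
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Iterable, Mapping, Set, Tuple

# Hypothetical fingerprint fields; records agreeing on all of them are
# assumed (not guaranteed) to come from the same applicant.
FINGERPRINT = ("year", "status", "gpa", "gre_v", "gre_q")


def co_acceptances(records: Iterable[Mapping[str, str]]) -> Counter:
    """Count how often each pair of schools accepted the same inferred applicant."""
    accepted_by_person: Dict[Tuple, Set[str]] = defaultdict(set)
    for rec in records:
        if rec.get("decision") != "Accepted":
            continue
        key = tuple(rec.get(field) for field in FINGERPRINT)
        if any(value in (None, "") for value in key):
            continue  # too little information to match this record
        accepted_by_person[key].add(rec["school"])
    pairs: Counter = Counter()
    for schools in accepted_by_person.values():
        for a, b in combinations(sorted(schools), 2):
            pairs[(a, b)] += 1
    return pairs
```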

Ooh, interesting idea, wine in coffee cups...

 

For me the results were most interesting for getting an idea of the timeline (when I would hear from different schools). I might play around with it later this week and see how consistent the schools are across the years. I think it would be helpful to see the overall pattern though... some schools seem to do it all at once, some spread it out, etc.
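A minimal sketch of a per-school timeline, assuming hypothetical `school` and `date` keys with ISO-format dates (the real data may store dates differently). For each school and year it returns the first, median and last decision day of the year, plus the spread in days, which separates schools that decide all at once from those that spread it out. It groups by calendar year, so a late-December decision falls into the previous year's bucket.

```python
from collections import defaultdict
from datetime import date
from statistics import median
from typing import Dict, Iterable, List, Mapping, Tuple


def decision_timeline(records: Iterable[Mapping[str, str]]) -> Dict[Tuple[str, int], tuple]:
    """(school, year) -> (first day, median day, last day, spread in days)."""
    days: Dict[Tuple[str, int], List[int]] = defaultdict(list)
    for rec in records:
        d = date.fromisoformat(rec["date"])  # assumes 'YYYY-MM-DD'
        days[(rec["school"], d.year)].append(d.timetuple().tm_yday)
    return {key: (min(v), median(v), max(v), max(v) - min(v))
            for key, v in sorted(days.items())}
```

Comparing the same school's tuples across years gives a quick check of how consistent its timing is.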

Yeah, same here. I use the results page mainly to know when results might come out based on previous years, and to see whether schools have started sending offers for the current year.

Oh, and I'm definitely upvoting the OP for taking the time to make this.

Edited by StatPhD2014

I agree it would be very interesting to tie the different submissions to individuals, but it would be a PITA.

 

I think that thegradcafe could implement a submission system more oriented around collecting data, by linking submissions to a profile and by standardizing things like the name of the school.

 

Then they could offer useful summary statistics themselves.

Quote: The conclusion from the above is obvious: reporting your GPA on GradCafe increases your chances of being accepted!  ;)

Why do I sense it is the other way around...  :rolleyes:
