persistent_homology Posted February 22, 2014 (edited)

Hello, some of you might be interested in this: I have scraped the admissions data for statistics PhD and master's programs from the Grad Cafe. The CSV file can be downloaded here. The Python code used to generate it is here. The code could easily be modified to collect data for biostatistics, math, or any other subject.

The most annoying part is that schools are reported under many different names. I have collected a list of synonyms, but if you run the code again in the future, new cases will no doubt arise, and you will have to add them to the list in the code.

If you run the code, it will take about 3 minutes to finish, because I have added delays between page requests to the Grad Cafe so as not to annoy them. I have also added a couple of plotting functions for the most basic tasks.

Things that could be added: I have only collected records that are accepted/rejected, so wait-listed and interview results could be added. Data about schools from the USA Today rankings or the NRC could be added. GRE data could be standardised (from the 160 to the 900 scale); people also report scores as percentages, especially for the subject GRE.

I hope those of you who are data-curious will find some interesting things out and share them with us.

Edited February 23, 2014 by fuzzylogician: links are fixed!
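A minimal sketch of the synonym handling described in the post above. This is not the code from grad_cafe.py: the table entries and the function names `clean_key`, `normalize_school` and `unknown_names` are hypothetical, chosen only to illustrate mapping many spellings to one canonical school name and flagging new spellings that still need to be added by hand.

```python
import re

# Hypothetical synonym table, for illustration only: keys are cleaned-up
# lowercase variants, values are the canonical school names.
SCHOOL_SYNONYMS = {
    "uc berkeley": "University of California, Berkeley",
    "berkeley": "University of California, Berkeley",
    "cmu": "Carnegie Mellon University",
    "carnegie mellon": "Carnegie Mellon University",
}


def clean_key(raw_name):
    """Lowercase the name and turn runs of punctuation/whitespace into one space."""
    return re.sub(r"[^a-z0-9]+", " ", raw_name.lower()).strip()


def normalize_school(raw_name, synonyms=SCHOOL_SYNONYMS):
    """Return the canonical name, or the stripped input if no synonym is known."""
    return synonyms.get(clean_key(raw_name), raw_name.strip())


def unknown_names(raw_names, synonyms=SCHOOL_SYNONYMS):
    """List names that are neither a known synonym nor already canonical."""
    canonical = {clean_key(name) for name in synonyms.values()}
    return sorted({
        name.strip()
        for name in raw_names
        if clean_key(name) not in synonyms and clean_key(name) not in canonical
    })
```

Running `unknown_names` over the school column after each scrape gives the list of spellings that would need new entries in the table.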
Igotnothin Posted February 22, 2014 Awesome! Nice work and it seems like there are a lot of cool things you could do with this data.
statisticsfall2014 Posted February 22, 2014 This is great stuff.
persistent_homology Posted February 22, 2014 (edited) I cannot edit my post, but for some reason the links I posted lack the ":" in "http://". Perhaps a moderator can correct them. Data is here and code is here. Hopefully these links will work. Just in case:
data: http://sourceforge.net/projects/triangleinequal/files/Grad%20Cafe%20Data/gc_data.csv/download
code: http://sourceforge.net/projects/triangleinequal/files/Grad%20Cafe%20Data/grad_cafe.py/download
stats_applicant Posted February 23, 2014 This is fantastic -- thanks for fixing up the school names. I'll try and take a look at the data later and see what interesting things I can find.
cyberwulf Posted February 23, 2014 One thing to keep in mind when looking at these data is that the GC population is a highly biased sample of applicants. For instance, it is much more heavily domestic (i.e., U.S.-based) than the overall group applying to stat & biostat programs. This is one of the reasons (among many) why you're seeing admit rates in the 40-50% range from the results page, while most top programs report rates under 20%.
persistent_homology Posted February 23, 2014 (edited) That is a good point, cyberwulf. What I hope is that although the numbers are biased, we might still be able to see trends, such as whether it is getting harder to be accepted. Once you have a specific question in mind, perhaps the bias can be compensated for to some extent by using official data from some schools to inform a Bayesian prior.
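One way the Bayesian-prior idea could look, as a rough sketch rather than anything from the thread's code: treat a school's officially reported admit rate as the mean of a Beta prior and update it with the accept/reject counts scraped from the results page. The function name, the `prior_strength` pseudo-count and the numbers in the usage lines are assumptions made up for illustration. Note that this only shrinks the scraped rate toward the official one; it does not model why the sample is biased.

```python
def posterior_admit_rate(accepts, rejects, prior_rate, prior_strength):
    """Posterior mean admit rate under a Beta prior and binomial counts.

    prior_rate     -- official admit rate used as the prior mean (0 < rate < 1)
    prior_strength -- pseudo-count: how many observations the prior is "worth"
    """
    if not 0.0 < prior_rate < 1.0:
        raise ValueError("prior_rate must be strictly between 0 and 1")
    if prior_strength <= 0:
        raise ValueError("prior_strength must be positive")
    # Beta(a, b) prior with mean a / (a + b) equal to prior_rate.
    a = prior_rate * prior_strength
    b = (1.0 - prior_rate) * prior_strength
    # Conjugate update: add observed accepts to a and rejects to b.
    return (a + accepts) / (a + b + accepts + rejects)


# Hypothetical numbers: 45 accepts and 55 rejects scraped for one school,
# an official admit rate of 20%, and a prior worth 100 observations.
rate = posterior_admit_rate(accepts=45, rejects=55, prior_rate=0.20, prior_strength=100)
print(round(rate, 3))
```

The larger `prior_strength` is, the more the estimate stays near the official figure; choosing it is a modelling decision, not something the data determine.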
StatPhD2014 Posted February 23, 2014 I always thought the bigger bias would be that people who were accepted were more likely to post their results.
agent229 Posted February 23, 2014 There could also be bias in who chooses to post their details and who doesn't. It seems it would be hard to make any statements about the "average GPA" or anything like that, since so many values are missing.
persistent_homology Posted February 23, 2014 To investigate the difference between international and American applicants (where I group 'U' together with 'I' for this purpose), I made the following plots: the first is broken down by year and the second looks at all applications together. Also, here is a plot showing some of the difference between GPA reporters and non-reporters:
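The GPA reporter versus non-reporter comparison can be sketched in a few lines. The actual column names in gc_data.csv are not given in the thread, so the field names `decision` and `gpa` and the value `"Accepted"` below are assumptions; adapt them to the real file.

```python
from collections import defaultdict


def acceptance_rate_by_gpa_reporting(records):
    """Acceptance rate for records with and without a reported GPA.

    `records` is an iterable of dicts with the (assumed) keys "decision"
    and "gpa"; a missing, None or empty "gpa" counts as not reported.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [accepted, total]
    for rec in records:
        reported = rec.get("gpa") not in (None, "")
        group = "reported" if reported else "not reported"
        counts[group][1] += 1
        if rec["decision"] == "Accepted":
            counts[group][0] += 1
    return {group: acc / total for group, (acc, total) in counts.items()}
```

Rows read with `csv.DictReader` can be passed in directly, since each row is already a dict of strings.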
cyberwulf Posted February 23, 2014 The conclusion from the above is obvious: reporting your GPA on GradCafe increases your chances of being accepted!
wine in coffee cups Posted February 23, 2014 Really glad someone has done this, thanks persistent_homology. Something that would be a moderate-to-severe PITA but could yield interesting results is to try to match up records from the same user (as I suggested) to examine which universities tend to accept the same sets of applicants.
agent229 Posted February 23, 2014 Ooh interesting idea wine in coffee cups... For me the results were most interesting for getting an idea of the timeline (when I would hear from different schools). I might play around with it later this week and see how consistent the schools are across the years. I think it would be helpful to see the overall pattern though... some schools seem to do it all at once, some spread it out, etc.
StatPhD2014 Posted February 23, 2014 (edited) Yeah, same here. I use the results page mainly for knowing when results might be coming out based on previous years, and for seeing whether schools have started sending offers for the current year. Oh, and I'm definitely upvoting the OP for taking the time to make this.
persistent_homology Posted February 23, 2014 I agree it would be very interesting to tie the different submissions to individuals, but it would be a PITA. I think that thegradcafe could implement a submission system more oriented towards collecting data, by linking submissions to a profile and by standardizing things like the name of the school. Then they could offer useful summary statistics themselves.
Y.T. Safire Posted April 5, 2014 "The conclusion from the above is obvious: reporting your GPA on GradCafe increases your chances of being accepted!" Why do I sense it is the other way around......