Posted

Hello everyone,

I've seen quite a bit of chatter about the timing of results, and the answers tend to be vague. I wanted to offer my services and give you all some extra data. The beauty of grad cafe is that it is one of the most centralized data sources (if not the most) for graduate students. Some examples of the data: which schools are most popular, which degrees, applicant metrics, dates of results, etc. Yet I have not found anyone who uses this information to help those applying.

 

Until now...

 

I used R (a statistical programming language, for those unfamiliar) to scrape the grad cafe results data related to psychology. The scrape returned about 35,000 records. There were a lot of untidy text strings (e.g., PhD and Ph.D.), especially in the program title and institution fields; however, that is a project for another day. The date and decision data seemed clean enough for me to analyze and turn into something nice for you all.
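To illustrate the kind of untidy strings mentioned above, here is a minimal sketch of collapsing degree spellings into one label. It is written in Python rather than the R used for the actual analysis, and the `normalize_degree` function and its list of canonical labels are assumptions made up for this example, not part of the original script.

```python
import re

def normalize_degree(raw: str) -> str:
    """Collapse spelling variants such as 'Ph.D.' and 'phd' into one label."""
    # Drop periods and whitespace, then upper-case so variants compare equal.
    key = re.sub(r"[.\s]", "", raw).upper()
    # Hypothetical canonical labels; extend for the values actually present.
    canonical = {"PHD": "PhD", "PSYD": "PsyD", "MA": "MA", "MS": "MS"}
    # Unknown values are returned unchanged (apart from outer whitespace).
    return canonical.get(key, raw.strip())

degrees = ["PhD", "Ph.D.", "ph.d", " PHD ", "Psy.D.", "Other"]
cleaned = [normalize_degree(d) for d in degrees]
print(cleaned)
```

A lookup table like this only handles punctuation and capitalization differences; free-text fields such as institution names need fuzzier matching.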

 

So what is it....?

Here is a PDF based on the results posted by all Psychology applicants (over 35,000 records) since the origin of grad cafe. The first 3 tables show the 10 most common dates (by count) for getting an interview, getting accepted, and getting rejected. I should have used relative frequency... maybe tomorrow.

Then a longitudinal line graph (which shows these decisions throughout the year).

Finally, one that focuses more on the "critical-period" when most decisions are made.
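For anyone curious how the "most common dates" tables could be built, here is a minimal sketch that reports both the raw count and the relative frequency for each date. It is Python, not the R code behind the PDF, and the records, the `top_dates` function, and the "MM-DD" date format are all made up for illustration.

```python
from collections import Counter

# Made-up (decision, "MM-DD") records standing in for the scraped data.
records = [
    ("Interview", "01-15"), ("Interview", "01-15"), ("Interview", "01-22"),
    ("Accepted", "02-14"), ("Accepted", "02-14"), ("Accepted", "03-01"),
    ("Rejected", "02-14"), ("Rejected", "03-15"),
]

def top_dates(records, decision, n=10):
    """Return (date, count, share) for the n most common dates of one decision."""
    counts = Counter(day for kind, day in records if kind == decision)
    total = sum(counts.values())
    # share = count divided by all postings of this decision type
    return [(day, c, c / total) for day, c in counts.most_common(n)]

interview_top = top_dates(records, "Interview")
for day, count, share in interview_top:
    print(f"{day}  count={count}  share={share:.1%}")
```

Here the share is computed within each decision type, so the numbers stay comparable even though interviews, acceptances, and rejections are posted in very different volumes.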

 

Conclusion

This was a piecemeal job that I should have done in R Markdown. I'll link my code at the end. Similar methods could be used to look at the "most popular" programs, probability of acceptance by degree type or program type, average GRE score for accepted candidates, and the list goes on. If anyone wants a csv of the dataset I used, please feel free to message me. I also welcome critiques and suggestions. I wish you all the best.

 

R code here

Posted (edited)

This is awesome!! One suggestion--can you make the second and third figures' x-axis start earlier (like in November) to be able to see the invites continuously?

Also, would you be willing to share your code for the scraping piece of the project?

Edited by psych0
Posted

I've updated the code and added some "School" stuff (the school variable was also not the cleanest). I question my text-cleaning ability, so that is an area for improvement.

Here is a table of the data.frame

@psych0 : I've attached the scraping portion (if you want to add more context please look at the R code)

I'll also work on flipping the dates. R doesn't like date extraction as much, but I'm sure I can find a workaround.

Best,
 

 

 

Scrape Grad Cafe.txt

Posted
2 hours ago, Boronmage said:

Dude this is amazing! Thank you so much. I'll be studying the R code :)

Thank you! Hopefully it helps

But...

This code is not clean. It's functional and it works, but don't use it to impress anyone; they may heckle you.

Posted

This is fantastic, thank you for sharing!

Posted
3 hours ago, Left Skew said:

I've updated the code and added some "School" stuff (the school variable was also not the cleanest). I question my text-cleaning ability, so that is an area for improvement.

Here is a table of the data.frame

@psych0 : I've attached the scraping portion (if you want to add more context please look at the R code)

I'll also work on flipping the dates. R doesn't like date extraction as much, but I'm sure I can find a workaround.

Best,
 

 

 

Scrape Grad Cafe.txt

Awesome, thanks!

Posted
On 12/28/2017 at 7:42 AM, psych0 said:

This is awesome!! One suggestion--can you make the second and third figures' x-axis start earlier (like in November) to be able to see the invites continuously?

Also, would you be willing to share your code for the scraping piece of the project?

I fixed the "closer look" plot so that it is ordered from Nov to May.
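The reordering idea can be sketched as a sort key that treats November as the start of the cycle. This is a Python illustration, not the R plotting code, and `season_position` is a name invented for the example.

```python
# Months in calendar order, as a plot would sort them by default.
months = ["Jan", "Feb", "Mar", "Apr", "May", "Nov", "Dec"]
month_number = {"Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Nov": 11, "Dec": 12}

def season_position(month: str, start: int = 11) -> int:
    """Position of a month within a cycle that begins at `start` (11 = November)."""
    return (month_number[month] - start) % 12

# Sorting by the shifted position puts Nov and Dec ahead of Jan through May.
ordered = sorted(months, key=season_position)
print(ordered)
```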

 

Also, I added a Program Type table on page 2. Be skeptical of all the program/school info due to the lack of clean data. I'm working on a string-matching function now, in hopes that I can get more accurate numbers for those variables.

I don't want people to worry if they missed a peak day. I estimate that around half of the sample is Clinical Psych; those programs (to my knowledge) tend to have a few November deadlines, which will skew the data a bit.

Updated version

Thanks everyone for working on this with me!

 

 

  • 2 weeks later...
Posted

After a short hiatus (to maintain my sanity) I've updated some things!
 

  1. Cleaned Institution and Program - the past few days have been rough. I've realized what kind of garbage fire the Institution and program data from the results survey is. I emailed grad cafe in hopes they would standardize the text inputs for the aforementioned columns. You'd be surprised at how many different ways someone can type in "UCLA". Hopefully the data is more accurate but there is still a lot to do.
     
  2. Added more graphics. I've split the decision plots by clinical, non-clinical, and everyone combined because who cares about clinical applicants? Kidding. Their apps tend to be due a lot sooner, and you can see this in the plots. I've also added results by day of the week thanks to statisticalsleuth's suggestion. Further, there is a table of the top schools that adds Rejection postings as a % of total postings (I removed postings involving other).
     
  3. R nerds: added a function file behind the R code so the script wouldn't be so overwhelming...though it's still overwhelming. This file is necessary for running the script (see the script here). I've also added an index of schools and programs for the string matching algorithm. ALL OF THESE FILES NEED TO BE IN YOUR Working Directory for the procedure to run.
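To give a feel for matching free-text school names against an index, here is a minimal sketch using Python's standard `difflib` module. It is not the string matching algorithm from the R files; the `SCHOOL_INDEX` contents, the `match_school` function, and the 0.8 cutoff are assumptions chosen for illustration.

```python
import difflib
import re

# Hypothetical index: canonical name -> known aliases (all lower-case).
SCHOOL_INDEX = {
    "University of California, Los Angeles": [
        "ucla", "uc los angeles", "university of california los angeles",
    ],
    "University of Michigan": ["umich", "university of michigan", "michigan"],
}
# Flatten to alias -> canonical name for direct lookup.
ALIAS_TO_SCHOOL = {
    alias: name for name, aliases in SCHOOL_INDEX.items() for alias in aliases
}

def match_school(raw, cutoff=0.8):
    """Map a free-text institution entry to a canonical name, or None."""
    # Lower-case, strip punctuation, and collapse repeated spaces.
    key = re.sub(r"[^a-z0-9 ]", "", raw.lower())
    key = " ".join(key.split())
    if key in ALIAS_TO_SCHOOL:
        return ALIAS_TO_SCHOOL[key]
    # Fall back to the closest alias by similarity ratio, if any clears the cutoff.
    close = difflib.get_close_matches(key, list(ALIAS_TO_SCHOOL), n=1, cutoff=cutoff)
    return ALIAS_TO_SCHOOL[close[0]] if close else None

for entry in ["UCLA", "U.C.L.A.", "Univ of Michigan", "Hogwarts"]:
    print(entry, "->", match_school(entry))
```

Entries that return `None` are worth reviewing by hand and adding to the index, since a lower cutoff catches more typos but also produces more false matches.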

Finally I wanted to thank everyone for being supportive and giving me ideas. Best of luck during interviews! I've attached the PDF. You can also view it here. 

It's almost over....

Grad Cafe Decisions.pdf

Posted

Honestly, if I were a professor I'd admit you RIGHTAWAY! This is both fantastic and creative. Seriously. Best of luck to you!

Posted
1 hour ago, Nut-ella said:

Honestly, if I were a professor I'd admit you RIGHTAWAY! This is both fantastic and creative. Seriously. Best of luck to you!

I wish you were a professor, preferably at one of the schools in my signature. In all honesty, thank you for being so kind. I wish you the best of luck too. People love Nutella.
