Left Skew, Posted December 28, 2017

Hello everyone,

I've seen quite a bit of chatter about the timing of results, and the answers are murky. I wanted to offer my services and give you all some extra data. The beauty of Grad Cafe is that it is one of the most (if not the most) centralized data sources for graduate students. Some examples of the data: which schools are most popular, which degrees, applicant metrics, dates of results, etc. Yet I have not found anyone who uses this information to help those applying. Until now...

I used R (a statistical programming language, for those unfamiliar) to scrape the Grad Cafe results data related to psychology. The scrape returned about 35,000 records. There were a lot of untidy text strings (e.g., PhD and Ph.D.), especially in the program-title and institution fields; however, that is a project for another day. The data on dates and decisions did seem clean enough for me to analyze and turn into something nice for you all.

So what is it....?

Here is a PDF based on the results of all Psychology applicants (over 35,000 records) since the origin of Grad Cafe. The first 3 tables show the 10 most common dates (by count) for getting an interview, getting accepted, and getting rejected. I would have liked to show relative frequencies, and should have... maybe tomorrow. Then comes a longitudinal line graph, which shows these decisions throughout the year. Finally, there is a graph that focuses on the "critical period" when most decisions are made.

Conclusion

This was a piecemeal job that I should have done in R Markdown. I'll link my code at the end. Similar methods could be used to look at the "most popular" programs, the probability of acceptance by degree type or program type, the average GRE score for accepted candidates, and the list goes on. If anyone wants a CSV of the dataset I used, please feel free to message me. I also welcome critiques and suggestions. I wish you all the best.

R code here
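For readers who want to try the "10 most common dates" tables and the relative frequencies mentioned above, here is a minimal sketch of the counting step. It is written in Python, not the R used for the original analysis, and it is not the author's code. The `records` rows and the `top_days` helper are hypothetical, and pooling calendar days across years is an assumption about how the tables were built.

```python
from collections import Counter
from datetime import date

# Hypothetical records: (decision, date the decision was reported).
# These rows are invented for illustration, not taken from the dataset.
records = [
    ("Interview", date(2016, 1, 15)),
    ("Interview", date(2017, 1, 15)),
    ("Interview", date(2017, 1, 20)),
    ("Accepted", date(2017, 2, 14)),
    ("Rejected", date(2017, 2, 14)),
    ("Rejected", date(2016, 2, 14)),
    ("Rejected", date(2017, 3, 1)),
]


def top_days(records, decision, n=10):
    """Return (month, day, count, share) for the n most common calendar days.

    Days are pooled across years (an assumption), and share is the count
    divided by all postings with the same decision.
    """
    days = Counter((d.month, d.day) for dec, d in records if dec == decision)
    total = sum(days.values())
    return [(m, day, c, c / total) for (m, day), c in days.most_common(n)]


for month, day, count, share in top_days(records, "Rejected"):
    print(f"{month:02d}-{day:02d}: {count} posts ({share:.0%} of rejections)")
```

Reporting the share next to the raw count makes it easier to compare decisions that have very different numbers of postings.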
psych0, Posted December 28, 2017 (edited)

This is awesome!! One suggestion: can you make the x-axis of the second and third figures start earlier (like in November) so the invites can be seen continuously? Also, would you be willing to share your code for the scraping piece of the project?
Boronmage, Posted December 28, 2017

Dude, this is amazing! Thank you so much. I'll be studying the R code.
Left Skew (Author), Posted December 28, 2017

I've updated the code and added some "School" stuff (the school variable was also not the cleanest). I question my text-cleaning ability, so that is an area for improvement. Here is a table of the data.frame.

@psych0: I've attached the scraping portion (if you want more context, please look at the R code). I'll also work on flipping the dates; R doesn't like date extraction as much, but I'm sure I can find a workaround.

Best,

Attachment: Scrape Grad Cafe.txt
Left Skew (Author), Posted December 28, 2017

In reply to Boronmage:

Thank you! Hopefully it helps. But... this code is not clean. It's functional and it works, but don't use it to impress anyone; they may heckle you.
PsyZei, Posted December 28, 2017

This is fantastic, thank you for sharing!
statisticalsleuth, Posted December 28, 2017

Fun stuff. It would be interesting to see the results if you adjusted for day of the week by month.
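One possible reading of this suggestion is to tabulate, within each month, what share of decisions fall on each weekday. The sketch below does that in Python (not the R used in the thread). The `dates` list and the `weekday_share_by_month` helper are hypothetical and only illustrate the idea; they are not the adjustment the poster necessarily had in mind.

```python
from collections import Counter
from datetime import date

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

# Hypothetical decision dates, invented for illustration.
dates = [
    date(2017, 1, 13),  # Friday
    date(2017, 1, 20),  # Friday
    date(2017, 1, 17),  # Tuesday
    date(2017, 2, 14),  # Tuesday
    date(2017, 2, 17),  # Friday
]


def weekday_share_by_month(dates):
    """Map (month, weekday name) to that weekday's share of the month's postings."""
    counts = Counter((d.month, WEEKDAYS[d.weekday()]) for d in dates)
    month_totals = Counter(d.month for d in dates)
    return {key: c / month_totals[key[0]] for key, c in counts.items()}


for (month, weekday), share in sorted(weekday_share_by_month(dates).items()):
    print(f"month {month:02d}, {weekday}: {share:.0%}")
```

Dividing by the monthly total keeps busy months such as January and February from dominating the weekday pattern.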
psych0, Posted December 28, 2017

In reply to Left Skew:

Awesome, thanks!
Left Skew (Author), Posted December 29, 2017

In reply to psych0:

I fixed the "closer look" plot; it is now ordered from November to May. I also added a Program Type table on page 2. Be skeptical of all the program/school info due to the lack of clean data. I'm working on a string-matching function now, in hopes of getting more accurate numbers for those variables.

I don't want people to worry if they missed a peak day. I estimate that around half of the sample is Clinical Psych, which (to my knowledge) tends to have a few November deadlines; this will skew the data a bit.

Updated version

Thanks everyone for working on this with me!
Left Skew (Author), Posted January 11, 2018

After a short hiatus (to maintain my sanity), I've updated some things!

Cleaned Institution and Program: the past few days have been rough. I've realized what kind of garbage fire the Institution and Program data from the results survey is. I emailed Grad Cafe in hopes that they would standardize the text inputs for those columns. You'd be surprised at how many different ways someone can type in "UCLA". Hopefully the data is more accurate now, but there is still a lot to do.

Added more graphics: I've split the decision plots by clinical, non-clinical, and everyone combined, because who cares about clinical applicants? Kidding. Their apps tend to be due a lot sooner; you can see this in the plots. I've also added results by day of the week, thanks to statisticalsleuth's suggestion. Further, there is a table of the top schools that adds Rejection postings as a % of total postings (I removed postings involving "other").

R nerds: I added a function file behind the R code so the script wouldn't be so overwhelming... though it's still overwhelming. This file is necessary for running the script (see the script here). I've also added an index of schools and programs for the string-matching algorithm. ALL OF THESE FILES NEED TO BE IN YOUR working directory for the procedure to run.

Finally, I wanted to thank everyone for being supportive and giving me ideas. Best of luck during interviews! I've attached the PDF. You can also view it here. It's almost over....

Attachment: Grad Cafe Decisions.pdf
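To give a feel for what matching free-text entries against an index of schools can look like, here is a rough sketch in Python using the standard library's `difflib`. It is not the string-matching algorithm from the linked R scripts, whose details are not shown in the thread. The `CANONICAL` index, the `normalize` and `match_institution` helpers, and the 0.8 cutoff are all assumptions made for illustration.

```python
import difflib
import re

# Hypothetical index mapping normalized spellings and aliases to one
# canonical institution name. A real index would be far larger.
CANONICAL = {
    "university of california los angeles": "University of California, Los Angeles",
    "ucla": "University of California, Los Angeles",
    "university of michigan": "University of Michigan",
}


def normalize(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def match_institution(raw, cutoff=0.8):
    """Return the canonical name for a free-text entry, or None if no match.

    Exact matches on the normalized text are tried first; otherwise the
    closest index key with similarity >= cutoff (an assumed threshold) wins.
    """
    key = normalize(raw)
    if key in CANONICAL:
        return CANONICAL[key]
    close = difflib.get_close_matches(key, list(CANONICAL), n=1, cutoff=cutoff)
    return CANONICAL[close[0]] if close else None


for entry in ["UCLA", "Univ. of California, Los Angeles",
              "Univeristy of Michigan", "Hogwarts"]:
    print(entry, "->", match_institution(entry))
```

Entries that return None are worth reviewing by hand rather than forcing into the nearest school, since a wrong match would distort the per-school counts.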
Nut-ella, Posted January 12, 2018

In reply to Left Skew:

Honestly, if I were a professor I'd admit you RIGHT AWAY! This is both fantastic and creative. Seriously. Best of luck to you!
Left Skew (Author), Posted January 12, 2018

In reply to Nut-ella:

I wish you were a professor, preferably at one of the schools in my signature. In all honesty, thank you for being so kind. I wish you the best of luck too. People love Nutella.