brewdata Posted January 31, 2015 Hi All, I've seen some really nice scripts that scrape the Grad Cafe, but none had all the features I wanted. I wrote some of my own functions and put them into an R package called brewdata. If you're also interested in using R to parse Results Search data, then you can find brewdata on CRAN (http://cran.r-project.org/web/packages/brewdata/). Please email or PM me with any suggestions or bugs you find. I'd welcome the chance to work with anyone interested in making their own improvements. Thanks! NW
cyberwulf Posted February 1, 2015 This is great! One could create a really awesome Shiny app out of this...
wine in coffee cups Posted February 1, 2015 Awesome! My main suggestion is to have the data frame returned by brewdata() contain the original program name as a column. Setting map=TRUE lets you get the school name, but I think it makes sense to also return the program name. That way users can remove false positives, e.g. exclude programs like "Educational Psychology - Learning Sciences (Research, Measurement, And Statistics)" from statistics-related results. This seems really important for disciplines like math, where searching for "math*" gets you both pure and applied programs, which are impossible to disentangle without the program name. I also suggest changing the default query to "(stat|stats|statis*)". You actually miss out on a decent number of Duke results because their program is formally called "Statistical Science", for example.
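To make the false-positive point concrete, here is a minimal sketch of filtering on a program-name column after the results are fetched. It is written in Python purely for illustration and does not call brewdata; the rows, the `school`/`program` field names, and the include/exclude patterns are all assumptions, not the package's actual output or the Grad Cafe's search syntax.

```python
import re

# Hypothetical rows standing in for parsed results; field names are illustrative only.
rows = [
    {"school": "Duke University", "program": "Statistical Science"},
    {"school": "Example University", "program": "Statistics"},
    {"school": "Example University",
     "program": "Educational Psychology - Learning Sciences "
                "(Research, Measurement, And Statistics)"},
    {"school": "Example University", "program": "Biostatistics"},
]

# Broad include pattern: any program name containing "stat" (case-insensitive).
INCLUDE = re.compile(r"stat", re.IGNORECASE)
# User-chosen exclusion pattern for known false positives (an assumption).
EXCLUDE = re.compile(r"educational psychology", re.IGNORECASE)


def keep(program: str) -> bool:
    """Keep a row if it matches the include pattern and not the exclude pattern."""
    return bool(INCLUDE.search(program)) and not EXCLUDE.search(program)


kept = [row for row in rows if keep(row["program"])]
for row in kept:
    print(row["school"], "|", row["program"])
```

Without the program name in the returned data, the exclusion step has nothing to match against, which is the reason for asking for that column.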
brewdata (Author) Posted February 4, 2015 Thanks cyberwulf & wine in coffee cups! @cyberwulf: Never used Shiny, but I know some swear by it. The examples I saw were great. I'll see how far I can go with the R package. Is that how you use Shiny? @wine in coffee cups: I'll adjust the data frame returned and see what I can do about the default search. I certainly do not want to miss any records, since many people opt not to share their 'metrics'. I'll roll these (and other fixes) into the next CRAN submission. Thanks again for the tips and feedback!
statisticsfall2014 Posted February 9, 2015 One of my friends wrote a post about this. Nice package!!! http://minimallysufficient.github.io/2015/02/08/gradcafe.html
StatsG0d Posted February 9, 2015 That was pretty interesting. I could be naive, but wouldn't a dummy for GRE scores be more appropriate? The data are not really continuous (unlike GPA).
statisticsfall2014 Posted February 9, 2015 Yeah, but you would need a lot of dummies (say for 160, 161, 162, ...) since there are a lot of different scores! Way to go on the acceptances!
StatsG0d Posted February 10, 2015 statisticsfall2014 said: "Yeah but you would need a lot of dummies (say for 160, 161, 162.....) since there's a lot of different scores :0! Way to go on the acceptances!" I suppose if you believed the cutoff was x, you could just make one dummy that equals 1 whenever the variable is >= x? Thanks a lot! I'm actually quite surprised at the outcome thus far.
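The two encodings discussed in the last few posts can be compared in a small sketch. This is only an illustration in Python under stated assumptions: the scores are made up, and the cutoff value stands in for the unknown x from the post above.

```python
# Hypothetical GRE scores, made up for illustration only.
scores = [155, 160, 162, 166, 170, 160]

# Assumed cutoff x; this value is not from the thread.
CUTOFF = 162


def cutoff_dummy(score: int, cutoff: int) -> int:
    """Single indicator: 1 if the score is at or above the cutoff, else 0."""
    return 1 if score >= cutoff else 0


# One column, regardless of how many distinct scores appear.
cutoff_col = [cutoff_dummy(s, CUTOFF) for s in scores]

# One indicator per distinct score: the column count grows with the
# number of distinct values, which is the concern raised earlier.
levels = sorted(set(scores))
one_hot = [[1 if s == lvl else 0 for lvl in levels] for s in scores]

print("cutoff dummy:", cutoff_col)
print("distinct scores:", len(levels), "-> one-hot columns:", len(one_hot[0]))
```

The single cutoff dummy encodes a belief about where the threshold sits, so it is only as good as the choice of x; the per-score dummies make no such assumption but cost one column per distinct score.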
statisticsfall2014 Posted February 10, 2015 Ooh yeah, I gotcha. I think he's along the same line of reasoning when he says: "I imagine that a cutoff model would be more appropriate". I think one of the next interesting steps is comparing this data with the actual data that some schools publish (like Duke, UW, etc.); then we can maybe get a better idea of how representative the Grad Cafe (TGC) data is.
StatsML15 Posted February 13, 2015 Am I the only one that finds it hilarious that such a package even exists?
brewdata (Author) Posted February 14, 2015 statisticsfall2014 said: "One of my friends wrote a post about this.. nice package!!! http://minimallysufficient.github.io/2015/02/08/gradcafe.html" That's great. Really enjoyed reading the post. The footnote about homework procrastination is the best part. StatsML15 said: "Am I the only one that finds it hilarious that such a package even exists?" Glad to see it brighten your day! I had fun putting it together.