brewdata: Extracting Usable Data from the Grad Cafe Results Search

brewdata · January 31, 2015

Hi All,

I've seen some really nice scripts that scrape the Grad Cafe, but none had all the features I wanted.

I wrote some of my own functions and put them into an R package called brewdata. If you're also interested in using R to parse Results Search data, then you can find brewdata on CRAN (http://cran.r-project.org/web/packages/brewdata/).

Please email or PM me with any suggestions or bugs you find. I'd welcome the chance to work with anyone interested in making their own improvements.

Thanks!

NW

cyberwulf · February 1, 2015

This is great!

One could create a really awesome Shiny app out of this...

wine in coffee cups · February 1, 2015

Awesome! My main suggestion is to have the data frame returned by brewdata() contain the original program name as a column. Setting map=TRUE lets you get the school name, but I think it makes sense to also return the program name. That way users can remove false positives, e.g. exclude programs like "Educational Psychology - Learning Sciences (Research, Measurement, And Statistics)" from statistics-related results. This seems really important for disciplines like math, where searching for "math*" gets you both pure and applied programs, which are impossible to disentangle without the program name.
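A minimal sketch of the filtering this would allow, written in Python rather than R so it does not guess at brewdata's interface. The `program` field, the sample rows, and the `drop_false_positives` helper are hypothetical and only illustrate the idea; they are not part of the package.

```python
import re

# Hypothetical rows: "program" stands in for the original program-name
# column suggested above. School names are placeholders.
rows = [
    {"school": "School A", "program": "Statistics"},
    {"school": "School B", "program": "Educational Psychology - Learning Sciences "
                                      "(Research, Measurement, And Statistics)"},
    {"school": "School C", "program": "Statistical Science"},
]

# Pattern marking known false positives; extend it per discipline.
EXCLUDE = re.compile(r"educational psychology", re.IGNORECASE)


def drop_false_positives(rows, exclude=EXCLUDE):
    """Keep only rows whose program name does not match the exclusion pattern."""
    return [r for r in rows if not exclude.search(r["program"])]


kept = drop_false_positives(rows)
print([r["program"] for r in kept])
```

Without the program name in the returned data there is nothing to match against, which is why the column matters.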

I also suggest changing the default query to "(stat|stats|statis*)". You actually miss out on a decent number of Duke results because their program is formally called "Statistical Science", for example.

brewdata · February 4, 2015

Thanks cyberwulf & wine in coffee cups!

@cyberwulf: Never used Shiny, but I know some swear by it. The examples I saw were great. I'll see how far I can go with the R package. Is that how you use Shiny?

@wine in coffee cups: I'll adjust the data frame returned and see what I can do about the default search. Certainly do not want to miss any records since many people opt not to share their 'metrics'.

I'll roll these (and other fixes) into the next CRAN submission. Thanks again for the tips and feedback!

statisticsfall2014 · February 9, 2015

One of my friends wrote a post about this. Nice package! http://minimallysufficient.github.io/2015/02/08/gradcafe.html

StatsG0d · February 9, 2015

That was pretty interesting. I could be naive, but wouldn't a dummy for GRE scores be more appropriate? The data are not really continuous (unlike GPA).

statisticsfall2014 · February 9, 2015

Yeah, but you would need a lot of dummies (say for 160, 161, 162, ...) since there are a lot of different scores! Way to go on the acceptances!

StatsG0d · February 10, 2015

@statisticsfall2014: I suppose if you believed the cutoff was x, you could just make one dummy whenever the variable is >= x?
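A rough sketch of the two encodings being discussed, in Python for illustration. The scores and the cutoff value are made up, and both function names are hypothetical.

```python
def cutoff_dummy(scores, cutoff):
    """Single indicator: 1 if the score is at or above the cutoff, else 0."""
    return [1 if s >= cutoff else 0 for s in scores]


def per_score_dummies(scores):
    """One indicator column per distinct score (the 'lots of dummies' option)."""
    levels = sorted(set(scores))
    return {lvl: [1 if s == lvl else 0 for s in scores] for lvl in levels}


gre_quant = [158, 160, 161, 162, 165, 170]  # made-up scores
CUTOFF = 162                                # hypothetical cutoff x

print(cutoff_dummy(gre_quant, CUTOFF))    # one column
print(len(per_score_dummies(gre_quant)))  # one column per distinct score
```

The cutoff version adds a single column no matter how many distinct scores show up, but it only helps if the assumed x is close to where schools actually draw the line.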

Thanks a lot! I'm actually quite surprised at the outcome thus far.

statisticsfall2014 · February 10, 2015

Ooh yeah, I gotcha. I think he's following the same line of reasoning when he says: "I imagine that a cutoff model would be more appropriate"

I think one of the next interesting steps is comparing this data with the actual data that some schools publish (like Duke, UW, etc.). Then we can maybe get a better idea of how representative TGC data is.

StatsML15 · February 13, 2015

Am I the only one that finds it hilarious that such a package even exists?

brewdata · February 14, 2015

@statisticsfall2014: That's great. Really enjoyed reading the post. The footnote about homework procrastination is the best part.

@StatsML15: Glad to see it brighten your day! I had fun putting it together.
