FreeRadical Posted February 27, 2014 (edited)
This is what happens when you stress out a data geek for too long... Using R, I scraped the data you would get by searching for "epidemiology" in the Results Search. The dates are from the "Decision & Date" column, not the "Date Added" column. The fourteen bins in each histogram roughly correspond to the 14 weeks between January 1 and April 15. This also excludes people who put "Other" for degree type (not PhD or Masters), reported something other than an acceptance or rejection, or applied for something other than the fall term. The sample size is 779. I'm curious to hear your interpretations of the data. Anything else I could look at?
Edited February 27, 2014 by FreeRadical
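For anyone who wants to replicate the weekly binning described above, here is a minimal sketch (in Python rather than the original R; the exact Jan 1 - Apr 15 window and the choice to fold the last partial week into the final bin are my assumptions, not something stated in the post):

```python
from datetime import date

def week_bin(d, year=2014):
    """Map a decision date to one of 14 roughly week-long bins
    spanning Jan 1 - Apr 15 (day indices 0-104 of the year)."""
    day = (d - date(year, 1, 1)).days
    if not 0 <= day <= 104:
        return None  # outside the Jan 1 - Apr 15 window
    # 105 days is 15 calendar weeks; clamp the last partial week into bin 13
    return min(day // 7, 13)

print(week_bin(date(2014, 2, 28)))  # day 58 of the year -> bin 8
```

Dates outside the window return `None` so they can be excluded, matching the post's exclusion of non-fall-term noise.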
TakeruK Posted February 27, 2014 (edited)
I don't know how useful/meaningful this will be, but here are some fun things you could do! Note: I think a lot of the suggestions below come with tons of caveats and issues that arise when looking at large datasets that might not be properly representative, etc., but they are still fun:
1. For PhDs, visually, the peak of the rejection distribution looks later than the peak of the acceptance distribution. This agrees with our experience that schools tend to send acceptances first and rejections later. However, the difference is only one bin, which makes me worry about binning effects. Can you bin by half-weeks instead? Or be stricter with your bin edges so that each bin covers an entire business week?
2. In general, I am wary of histograms because of binning. How about doing something like this instead:
a) For each year, determine a "folded" decision time: the number of days between Jan 1 of that year and the decision date. It seems like you have already done this for the histogram, but I would do it by day so that you don't lose information to binning.
b) The PhD decision distribution looks roughly normal, so assume that it is and find the best-fitting mean and standard deviation of the distribution describing the acceptances and the rejections individually. I don't think it would work for Masters, though.
c) Compare the means and standard deviations for each decision type. Are they statistically significantly different from each other? My guess from visual inspection is no: the difference in peaks should be much smaller than the 68% interval.
3. You can go even further! You can ask whether it's likely that the two distributions (acceptances and rejections) are drawn from the same overall distribution. This can answer the question "does the day of the decision matter?"
You can do this by comparing PhD accepts vs. PhD rejections, and also PhD accepts vs. Masters accepts.
a) One easy (but not super duper great) way to do this is the Kolmogorov-Smirnov test. It's not ideal (it's most sensitive near the middle of the distributions and less so in the tails), but basically you compare two distributions by making a cumulative probability plot (percentage of total population on the y-axis, days on the x-axis) and finding the largest vertical separation between the two curves. That separation is the test statistic, which you can convert into the probability that the two observed populations (e.g. PhD accept dates vs. PhD reject dates) actually come from the same distribution (the null hypothesis). You probably know what I mean? I usually have to look this up every time I try to do it.
b) There are Bayesian methods of model comparison too, but I don't really know how to do those right now!
From visual inspection, it really does not seem like there is any real correlation between date and decision for any degree or any decision when we combine all the programs. I think if you did it on a per-school basis there would be differences, but the sample size would be much smaller (and searching one school at a time is easy enough in the Results Search that we don't need fun plots).
Happy data-analysing!
Edited February 28, 2014 by TakeruK
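Suggestions 2c and 3a above can be sketched in a few lines with SciPy (the thread's analysis was done in R, but the idea is language-agnostic; the decision-day arrays below are made-up numbers for illustration, not the scraped data):

```python
import numpy as np
from scipy.stats import ks_2samp, ttest_ind

# Hypothetical decision dates, expressed as days since Jan 1
# (made up for illustration; real values would come from the scrape).
accept_days = np.array([40, 45, 48, 50, 52, 55, 58, 60, 62, 70])
reject_days = np.array([55, 58, 60, 63, 65, 68, 70, 72, 75, 80])

# (2c) Compare the means with Welch's t-test (no equal-variance assumption).
t_stat, t_p = ttest_ind(accept_days, reject_days, equal_var=False)

# (3a) Two-sample Kolmogorov-Smirnov test: D is the largest vertical gap
# between the two empirical CDFs; the p-value is for the null hypothesis
# that both samples were drawn from the same distribution.
d_stat, ks_p = ks_2samp(accept_days, reject_days)

print(f"Welch t: t = {t_stat:.2f}, p = {t_p:.4f}")
print(f"KS: D = {d_stat:.3f}, p = {ks_p:.4f}")
```

A small p-value in either test would suggest acceptance and rejection dates come from genuinely different distributions; with the real, possibly biased self-reported data, any such result should still be read cautiously.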
FreeRadical Posted February 28, 2014 (Author)
Here it is by day of the week, rather than by week of the year (as above). Looking forward to Friday tomorrow!
Joan_bunny Posted February 28, 2014
Wow... Happy Friday, then! But this makes me even more nervous... My POI at UMich did say the outcome would come by the end of February, which is tomorrow!
FreeRadical Posted February 28, 2014 (Author, edited)
A few things stick out to me:
- MPH epidemiology applicants are much more likely to report acceptances than rejections. This is probably the result of reporting bias, the fact that MPH programs are easier to get into, or both. You don't see the same discrepancy among PhD applicants. The potential for reporting bias is what makes me hesitant to run tests of statistical significance; it's easier to conceptualize the effect of bias when looking at descriptive statistics like these.
- Among PhD applicants, acceptances are most common from the middle to the end of February, while rejections are most common around the beginning of March.
- Among PhD applicants, acceptances drop off dramatically after the first week of March, while rejections taper off more gradually over time.
- Friday is the most likely day to receive a notification (acceptance or rejection) among PhD applicants. This seems to hold for MPH applicants too, but to a lesser degree.
Edited February 28, 2014 by FreeRadical
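The day-of-week tally behind that last observation takes only a few lines of standard-library Python (the dates below are hypothetical stand-ins; the real ones come from the scraped "Decision & Date" column):

```python
from collections import Counter
from datetime import date

# Hypothetical decision dates, for illustration only.
decisions = [
    date(2014, 2, 14), date(2014, 2, 21), date(2014, 2, 21),
    date(2014, 2, 24), date(2014, 2, 28), date(2014, 2, 28),
    date(2014, 2, 28), date(2014, 3, 3),
]

weekday_names = ["Monday", "Tuesday", "Wednesday", "Thursday",
                 "Friday", "Saturday", "Sunday"]

# Count notifications by day of the week.
counts = Counter(weekday_names[d.weekday()] for d in decisions)
print(counts.most_common())
```

Running the same tally separately for acceptances and rejections, or for PhD vs. MPH applicants, would reproduce the comparison in the post.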
PhDbefore40 Posted February 28, 2014
This is so great! I hope today is our day for some PhD responses, although you and I are waiting for different schools, FreeRadical.