
calculating your chances


ʕ •ᴥ•ʔ


Warning: nerdy and completely useless except as a game; if I didn't want to spend a totally unreasonable number of hours engaged in pointless intellectual pursuits, I wouldn't be applying to grad school. My math or reasoning in general might be off; if it is, call me on it. Note also that I'm being a Bayesian about things, so read "there is an x% chance of y" as "you should estimate the probability of y at x%."

So, you've applied to some set of schools. For most of them, Peterson's lists their number of applicants, admissions rates, and actual number of attenders from each class. How would one produce an unbiased estimate of your chances of universal rejection (and, possibly, a few other things) from just this information? (Of course if you have more information than this, you would want to make the model more complex to incorporate that extra information, and I'd love to see models that incorporate GPA/GREs, &c.)

a : the number of admissions you will actually receive

n : the number of schools to which you have applied

p_i : the admissions rate of the ith school to which you have applied (the order isn't important)

Prior assumption: for the schools to which you have applied, you have no particular reason to believe that you are especially more or less competitive than the typical applicant. This doesn't mean that you expect to be exactly in the middle - in that case you know you would be universally rejected, assuming admissions rates are all below 50% - but that you expect a 1% chance of being in the first percentile of competitive applicants, a 2% chance of being in the 2nd percentile or better, a 3% chance of being in the third percentile or better, and so on. If you can accept this prior, your chance of being accepted into school i is, conveniently enough, p_i, and the average expected number of schools you will get into is

μa = Σ_{i=1}^{n} p_i = p_1 + p_2 + p_3 + ... + p_n

or the sum of their admission rates. However, you don't know how well these are correlated with each other. If they're maximally correlated - they all admit students on precisely the same criteria - then your chances of a wipeout are equal to the complement of the most favorable admissions rate among your schools; if they are totally uncorrelated, your chances of a wipeout are equal to the product of their complements; if maximally negatively correlated, then your chances of a wipeout are max(0, 1 - μa). Common sense says that they should be positively but not maximally correlated, but how much? Fortunately you know

b : the number of admittances schools in your field send out divided by their number of graduate students per year, where "field" is selected such that its competitiveness roughly reflects the competitiveness of the set of schools to which you have applied

(Sneaky assumption: the number of those in your field who are admitted to grad school and choose not to go at all is zero, or at least small enough to be ignored.)

Thus we know that a randomly chosen applicant in your field - someone, by the first prior, who is as competitive as you - should expect, given that she is accepted into any schools, to get into b schools on average. If, as would be convenient, her expected total number of admittances including the possibility of wipeout is the same as yours, μa, then your/her chances of a wipeout, p(a=0) are

μa = 0*p(a=0) + b*(1-p(a=0))

μa/b = 1 - p(a=0)

p(a=0) = 1 - μa/b
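For anyone who would rather let a calculator do this, here is a minimal Python sketch of the quantities so far: the three correlation cases for a wipeout, plus the estimate p(a=0) = 1 - μa/b. The function names and the sample rates are made up for illustration; nothing here comes from Peterson's.

```python
from math import prod


def wipeout_bounds(rates):
    """Wipeout probability under the three correlation cases described above.

    rates: admission rates p_i of the schools applied to (each between 0 and 1).
    Returns (maximally correlated, uncorrelated, maximally negatively correlated).
    """
    mu_a = sum(rates)                         # expected number of admissions
    max_corr = 1 - max(rates)                 # complement of the most favorable rate
    independent = prod(1 - p for p in rates)  # product of the complements
    max_neg = max(0.0, 1 - mu_a)              # cannot drop below zero
    return max_corr, independent, max_neg


def wipeout_from_field_ratio(rates, b):
    """p(a=0) = 1 - mu_a / b, assuming you applied to a typical number of schools.

    b: acceptances sent out per enrolled student in the field.
    The result is not clamped, so it goes negative if mu_a exceeds b.
    """
    return 1 - sum(rates) / b


# Hypothetical applicant: three schools, field ratio b = 2 (made-up numbers).
example_rates = [0.10, 0.20, 0.30]
print(wipeout_bounds(example_rates))
print(wipeout_from_field_ratio(example_rates, b=2.0))
```

The estimate from b should land somewhere between the correlated and uncorrelated cases if the inputs are sensible; if it doesn't, that is a hint that one of the assumptions is off for your set of schools.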

If you haven't applied to the typical number of schools

However, this randomly chosen applicant, who is as competitive as you, isn't necessarily applying to as many schools as you - she's applying to n̄ of them, which might be more or less - although by assumption we suppose the schools she applies to are as competitive as your own. So in fact her expected number of admissions, μā = μan̄/n and

μan̄/n = 0*p(ā=0) + b*[1-p(ā=0)] = b*[1-p(ā=0)]

μan̄/bn = 1 - p(ā=0)

p(ā=0) = 1 - μan̄/bn

n̄ = bn * [1 - p(ā=0)] / μa

If we knew n̄, we could know p(ā=0) as well - or vice versa - and thus

p(a=0) = p(ā=0)^(n/n̄)

p(a=0) = (1 - μan̄/bn)^(n/n̄) or p(a=0) = p(ā=0) ^ { μa / b[1 - p(ā=0)] }

Can we produce n̄ or p(ā=0) independently? Unfortunately I don't see a way to do so, limiting yourself to the Peterson's data. Choose a number that seems reasonable for one or the other based on anecdotal evidence, or find some publicly available data (and post it here, ideally.) But either way an estimate of one should get you to p(a=0). This should also give you B=μa|a>0, the expected number of schools you get into in the event that you get into any schools at all:

μa = 0*p(a=0) + B*[1-p(a=0)]

B = μa / [1-p(a=0)]

p(a=0) = 1 - μa/B
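The adjustment for an atypical number of applications can be sketched the same way. This is only a transcription of the formulas above into Python, with n̄ written as n_bar; the function names and the sample inputs are invented for illustration, and n_bar still has to come from your own guess or outside data.

```python
def typical_wipeout(mu_a, b, n, n_bar):
    """p(a_bar = 0) = 1 - mu_a * n_bar / (b * n) for the typical applicant."""
    return 1 - (mu_a * n_bar) / (b * n)


def your_wipeout(mu_a, b, n, n_bar):
    """p(a = 0) = p(a_bar = 0) ** (n / n_bar), rescaled to your n applications."""
    return typical_wipeout(mu_a, b, n, n_bar) ** (n / n_bar)


def expected_given_any(mu_a, p_wipeout):
    """B = mu_a / (1 - p(a = 0)): expected admissions given at least one."""
    return mu_a / (1 - p_wipeout)


# Hypothetical inputs: mu_a = 0.9 over n = 6 applications, field ratio b = 2,
# and a guessed typical application count n_bar = 8.
p0 = your_wipeout(mu_a=0.9, b=2.0, n=6, n_bar=8)
print(p0, expected_given_any(0.9, p0))
```

When n equals n_bar the exponent is 1, so this collapses back to p(a=0) = 1 - μa/b and B = b, which is a quick sanity check on any numbers you plug in.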

Revising in light of results

All of the above assumes that you haven't heard back from any schools yet. If you get an acceptance or rejection, how should that affect your expectations of getting into other schools? Unfortunately the ratio of acceptances to grad students doesn't tell us what the distribution of acceptances among the admitted is.

Suppose you hear back from your first institution, University Q - an acceptance. Will you get into another? According to Bayes' theorem,

p(a>1)|into Q = pQ|a>1 * p(a>1) / pQ

Only pQ is a known constant, so we need to guess pQ|a>1 * p(a>1).

pQ|a>1 is the chance that, given that you got into more than one school, one of those schools was Q. This is equal to

pQ|a>1 = (pQ - pQ|a=1) / p(a>1)

so

p(a>1)|into Q = [ (pQ - pQ|a=1) / p(a>1) ] * p(a>1) / pQ

p(a>1)|into Q = (pQ - pQ|a=1) / pQ

(One intuitive, but clearly wrong, estimate of the chance of admittance to Q given only one admission is

pQ|a=1 = (pQ / μa) leading to

p(a>1)|into Q = [pQ - (pQ / μa)] / pQ

p(a>1)|into Q = 1 - (1 / μa)

This implies that an acceptance from one school is nearly as good a signal as a decision from another, and in fact that getting into an easier school should revise your expectations up more than getting into a harder school - prior to learning anything, you have a higher expectation of getting into at least one school other than your reach than getting into at least one school other than your safety, but in fact getting into your reach and into your safety brings their chances to the same level. In fact if there's any overlap between expected admissions at all then the chance of being admitted to an easier program but not a harder program is not only more likely than the reverse, but in a way that exaggerates their independent probabilities.)

One obvious method is to use recursion: imagine someone, as competitive as yourself, who applied to every program but the one you've just heard back from, i.e. μa2 = μa - pQ, n2 = n-1, n̄ remains constant, and her field is your field, such that

B2 = (μa - pQ) / { 1 - p(ā=0)^[(n-1)/n̄] }

p(a2=0) = 1 - [ (μa - pQ)/B2 ]

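In that case, p(a2=0) - p(a=0) = pQ|a=1 (the chance of being rejected by everyone but Q, less the chance of being rejected by everyone, is the chance that Q is your only acceptance), and - if we want to write out a big ridiculous equation -

p(a>1)|into Q = { pQ - [1 - (μa - pQ)/B2] + p(ā=0)^(n/n̄) } / pQ = { pQ - p(ā=0)^[(n-1)/n̄] + p(ā=0)^(n/n̄) } / pQ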

p(I made some sort of obvious arithmetic mistake or worse) > 0.5, so the above is most likely nonsense. If it's right then calculating how to update your chances in case of a rejection should be trivial.

Edited by ʕ •ᴥ•ʔ

Any curse words I can string together will be wholly inadequate in response.

LOL. I love this point at the end

p(I made some sort of obvious arithmetic mistake or worse) > 0.5, so the above is most likely nonsense.
Edited by newms

I didn't understand anything beyond paragraph 4, but I'll just say it's because English isn't my first language.

But I like stuff like this. I'm a big fan of calculating everything, esp. probabilities... Love it! :)


*English major brain explosion*

As a proper Haraway fan, you should consider any available calculators to be part of your brain. :P

Anyway, once you get past the notation, the concepts behind everything here are pretty simple. Here's what The Internet has to say about your schools:

Duke: 394 applicants, 5% (20) accepted, 12 enrolled.

Harvard: 405 applicants, 4% accepted.

Columbia: 627 applicants, 12% accepted.

UVA: 463 applicants, 15% (67) accepted, 33 enrolled.

Georgetown: (no data, so let's just arbitrarily assume it's 20%)

Wake Forest: 19 applicants, 79% (15) accepted, 7 enrolled.

First, let's take a look at what our expectations would be before you were rejected from Duke.

We're assuming, in our prior, that the admission rates reflect your actual chance of getting in. (As Bayesians, when we say "chance," we mean "justified expectation." Adcoms aren't rolling dice, they're looking at your GREs and SOP and so on and such forth. There may be no random elements to the process whatsoever. But since I don't know anything about you other than the schools you applied to, this seems like a good ground for at least my own expectations. If you believe you've got an edge that gives your application better odds than that, you might want to adjust your expectations above the base admission rates - assuming you have a good basis to believe that you have more of an edge than the randomly chosen student, who probably also believes she has an edge.) A nice fact that flows from this is that the average expected number of admissions - what we'll refer to as μa, μ just being the Greek letter for m, which stands for "mean" - can be found by adding the admission rates together: 5+4+12+15+20+79=135, so you expect to get into 1.35 programs on average. You don't, of course, expect to literally get into 1.35 programs, but whatever else, it is the case that if you add together the chance you get into exactly one program, plus twice the chance you get into exactly two programs, and so on until six, you'll end up with 1.35 - at least as long as those individual probabilities (that is, for each individual school) hold. (I could prove this, if you like, or you could just accept it.) But this doesn't tell you whether you have a small chance of getting into lots of them or a very very good chance of getting into at least one.
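If you'd rather check the "one times the chance of exactly one, plus twice the chance of exactly two, and so on" claim than take it on faith, here is a small Python sketch. It treats the six decisions as independent purely so that the outcomes can be enumerated; that independence is an assumption of the sketch, not of the argument (the identity holds however the schools are correlated). The 20% for Georgetown is the arbitrary guess from the list above.

```python
from itertools import product

# Admission rates from the list above; Georgetown's 20% is an arbitrary guess.
rates = {
    "Duke": 0.05,
    "Harvard": 0.04,
    "Columbia": 0.12,
    "UVA": 0.15,
    "Georgetown": 0.20,
    "Wake Forest": 0.79,
}

mu_a = sum(rates.values())  # expected number of admissions


def admission_count_distribution(ps):
    """Return [P(a=0), P(a=1), ..., P(a=n)], assuming independent decisions."""
    dist = [0.0] * (len(ps) + 1)
    # Walk through every admit/reject pattern (2**n of them).
    for outcome in product([0, 1], repeat=len(ps)):
        prob = 1.0
        for admitted, p in zip(outcome, ps):
            prob *= p if admitted else 1 - p
        dist[sum(outcome)] += prob
    return dist


dist = admission_count_distribution(list(rates.values()))
weighted = sum(k * p_k for k, p_k in enumerate(dist))

print(f"sum of rates: {mu_a:.2f}")
print(f"sum of k * P(a=k): {weighted:.2f}")
```

Both numbers come out the same, which is the whole point; the shape of `dist` itself, on the other hand, depends heavily on the independence assumption and shouldn't be read as a forecast.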

For this, we look at the enrollment rates. We could of course look at further comparable schools, but since I have no idea what the rankings of lit departments are and am far too lazy to look it up, let's go with what we have, which is that each department seems to send out about twice as many acceptances as it actually enrolls. Since the only people that enroll are those who received at least one admission, and we're assuming no one would go through this horrid process unless they really wanted to head off to gradville, this means that those applicants in your field who received at least one admission offer received about two on average. (Maybe they all got two, maybe 10% get 11 and the rest get one, who knows?)

Now, if you're typical in the number of applications sent out, this means that you should expect that, if you got in anywhere, you got into two programs on average. Since the alternative to getting in anywhere is getting in nowhere, we say that

μa = 0*p(a=0) + b*[1 - p(a=0)]

1.35 = 2*[1 - p(a=0)]

p(a=0) = 1 - 1.35/2 = 32.5%

("p(X)" just means "the probability that X")

This is actually a really terrible estimate because it's actually larger than your chance of not being admitted to Wake Forest alone. Let's relax the assumption that n=n̄ (that you're applying to a typical number of programs) and thus that b=B (that if you get in, you can expect to get into the same number of programs as other graduate students, on average.) Ex recto, applicants as competitive as you actually apply (on average) to eight programs, not six: n̄=8. We still, of course, assume that they get into two programs on average when they get into anything - that's what the data says, after all. Since the competitiveness of the schools they're applying to remains the same, we just multiply their expected total number of admissions by the increase in number:

(number of admissions per applicant in your field) = (your expected number of admissions)(the number of times more the typical competitor applied than you)

μā = μan̄/n

μā = 1.35(8/6) = 1.8

So they expect to get into 1.8 schools on average, and to get into 2 schools on average when they get in anywhere, meaning that they get in nowhere 10% of the time. Now if this person had applied to fewer schools - say, six - then they would expect to get into none

p(ā=0)^(n/n̄) = p(a=0)

0.1^(6/8) = .1778

17.8% of the time, which is what you can expect, since this is your situation precisely. (I had to choose wonky numbers in order for everything to come out consistently - some combinations of selections imply that someone really is over- or under-reaching, all else being equal - but why we raise these chances to exponents of the number-of-applications ratio might be easier with a better example. Suppose that the applicant who applies to the sorts of schools that you do only applied to half of them on average, and had a 50% chance of not getting into any of them. If he applied to twice as many, he'd have a 50% chance of getting into somewhere in the first batch, and an independent 50% chance of getting into the second, meaning a 25% chance of getting rejected overall. ("Why are we treating them as independent when we know that they probably correlate?" you may be asking. The answer is that we're dealing with a toy person and keeping his level of competitiveness, whatever that might be, constant. Causally, if you applied to twice as many programs as you did (and it didn't cause you to skimp on your SOP or whatever), your chances of total rejection would go down in this simple geometric way, as none of the programs would affect whether you got into any of the other ones. However, when you learn that you have gotten into programs, that serves as evidence that you're one of the people that programs happen to like.))

Since you get in somewhere 82.2% of the time, and expect to get into 1.35 programs on average, you should expect to get into 1.64 programs when you get in somewhere.
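For anyone following along with a calculator, the arithmetic of this worked example can be replayed in a few lines of Python. The inputs are the ones used in this post (μa = 1.35, b = 2, n = 6, and the made-up n̄ = 8); the variable names are just mine.

```python
mu_a = 1.35   # sum of the six admission rates
b = 2.0       # acceptances sent per enrolled student (rough figure from the data)
n = 6         # programs you applied to
n_bar = 8     # made-up typical number of applications

# First pass: pretend you applied to a typical number of programs.
naive_wipeout = 1 - mu_a / b

# Second pass: the typical applicant sends n_bar applications.
mu_bar = mu_a * n_bar / n            # her expected admissions
p_bar = 1 - mu_bar / b               # her chance of getting in nowhere
p_wipeout = p_bar ** (n / n_bar)     # your chance, scaled down to n programs

# Expected admissions given that you get in somewhere.
B = mu_a / (1 - p_wipeout)

print(f"naive p(a=0): {naive_wipeout:.3f}")   # 0.325
print(f"typical applicant: mu = {mu_bar:.2f}, p(0) = {p_bar:.2f}")
print(f"your p(a=0): {p_wipeout:.4f}")        # 0.1778
print(f"B: {B:.2f}")                          # 1.64
```

Changing `n_bar` is the interesting experiment: it is the one number here that is a pure guess, and the wipeout estimate moves quite a bit with it.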

However, we also know that you got rejected from Duke. How should that revise your expectations? Bayes' Theorem says that

p(H|D) = p(D|H) * p(H) / p(D)

("p(A|B)" is just statistics for "the probability that A, given that B")

H stands for hypothesis, or our prior expectation, D for the new data we've just encountered. (Not everyone thinks visually, but if you do, it may help to associate Bayes' Theorem with a sort of two-by-two graph, with one axis divided between H and not-H and one between D and not-D, each cell expressing a number, and the numbers in the four cells adding up to 100. Interpret p(H) as the sum of the H row, p(D) as the sum of the D column, p(H|D) as the portion of the D column's numbers lying in the H-and-D cell, and p(D|H) as the portion of the H row lying in the H-and-D cell. Play around with squares like these on some loose leaf paper, seeing what information you can extract from other information. You should eventually grok it.)

In this case, since we want to know the chances that you got into nowhere, given that you didn't get into Duke,

H: you didn't get in anywhere

D: you didn't get into Duke

Fortunately all of the relevant terms are pretty easy:

p(D|H) is the chance that you didn't get into Duke, given that you didn't get in anywhere. This is equal to 1, obviously.

p(H) we already calculated, it's 17.8%. Note that it's our prior estimation that matters for calculating this term, before you knew Duke rejected you.

And p(D) is just the prior chance that you were going to be rejected from Duke, which we already knew was 95%.

So the new chance that you're screwed is

p(a=0|blue devils noooo) = 1 * .1778 / .95 = .1872

Meaning that your chances of making it to graduate school plummeted from a soaring 82.2% to a miserable 81.3% - literally not even a percentage point, which is the biggest increment that our base data measures anyway (leaving aside the fact that we just made a couple figures up.) In other words, you're still at very significant risk of getting an English degree.
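The Duke update, along with the two-by-two square described earlier, can be sketched in Python like so. The helper name and the table layout are my own invention; the inputs are the 17.8% prior and the 95% rejection rate from above.

```python
def bayes_posterior(p_d_given_h, p_h, p_d):
    """p(H|D) = p(D|H) * p(H) / p(D)."""
    return p_d_given_h * p_h / p_d


p_h = 0.1778   # prior chance of getting in nowhere
p_d = 0.95     # prior chance of being rejected by Duke
posterior = bayes_posterior(1.0, p_h, p_d)
print(f"p(a=0 | rejected by Duke) = {posterior:.4f}")   # 0.1872

# The same thing as a two-by-two square whose four cells add up to 1.
# H = "got in nowhere", D = "rejected by Duke".
table = {
    ("H", "D"): p_h,               # rejected everywhere, Duke included
    ("H", "not D"): 0.0,           # impossible: in at Duke but in nowhere
    ("not H", "D"): p_d - p_h,     # rejected by Duke, in somewhere else
    ("not H", "not D"): 1 - p_d,   # admitted to Duke
}
d_column = table[("H", "D")] + table[("not H", "D")]
print(f"from the square: {table[('H', 'D')] / d_column:.4f}")
```

Reading p(H|D) off the square gives the same number as the formula, which is all Bayes' Theorem is saying.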

(I hope you don't think that I'm picking on you; if it bothers you that I used you as an example, I'll genericize everything, and hope you accept my apologies - I've just found that many students find it more concrete to substitute their own situations into a model, is all. I'm also never completely sure, on the internet, where to draw the line between overcomplicated and patronizing - if you (the generic you, not just cyborg) find I err too much in the former direction, I can try to explain from a different angle; if too much on the latter, just assume that I'm dumbing things down for all those other people.)

Edited by ʕ •ᴥ•ʔ

Oh this is excellent nerdiness. Truly wonderful. I've amused myself for a full hour calculating and researching. Thank you.

I'll caution though that, as with everything, the veracity of the data is key--and I'm not sure how reliable the data is. Even if the data is correct, there are some built in assumptions.

First, there's an assumption here that the department is both getting the same number of applications and has the same number of spots as last year. That's not necessarily true. A lot of schools have had their budgets cut or simply decided to have smaller departments, and particularly departments with names ending in "Studies" seem to be getting smaller.

Also, there's no breakdown for departments which grant multiple degrees--and English departments tend to grant a lot of different degrees. Take the University of Iowa. They have a few different PhD tracks, a terminal MA, and the most competitive MFA program in the country. All this site says is that they have almost 1,500 applicants and accept 8% of them. If you're applying for a PhD at Iowa, these numbers are meaningless because of all of the MFA applicants.

That said, Thank you.


(I hope you don't think that I'm picking on you; if it bothers you that I used you as an example, I'll genericize everything, and hope you accept my apologies - I've just found that many students find it more concrete to substitute their own situations into a model, is all. I'm also never completely sure, on the internet, where to draw the line between overcomplicated and patronizing - if you (the generic you, not just cyborg) find I err too much in the former direction, I can try to explain from a different angle; if too much on the latter, just assume that I'm dumbing things down for all those other people.)

Haha, don't worry, I don't think you're picking on me at all. Thanks for trying to explain! And your calculators can be part of my brain as long as they keep doing all of the calculating work for me. :)


Oh this is excellent nerdiness. Truly wonderful. I've amused myself for a full hour calculating and researching. Thank you.

I'll caution though that, as with everything, the veracity of the data is key--and I'm not sure how reliable the data is. Even if the data is correct, there are some built in assumptions.

First, there's an assumption here that the department is both getting the same number of applications and has the same number of spots as last year. That's not necessarily true. A lot of schools have had their budgets cut or simply decided to have smaller departments, and particularly departments with names ending in "Studies" seem to be getting smaller.

The weasely answer is that, if there aren't distinct trends, rates should be unbiased, and thus so should average admittances. This doesn't work for every function of admission rates, though, unless the distribution is really wonky. I feel like it should baaaaaasically work up until you get admitted somewhere, and then you'd need an idea of what the distribution is, at least if interannual variance is appreciable.

If you do know that there's a trend, adjust admission rates by whatever you expect the trend to be (I can't think of any better basis than intuition for its value, but if you're familiar with your field your intuition might be pretty good.)

Also, there's no breakdown for departments which grant multiple degrees--and English departments tend to grant a lot of different degrees. Take the University of Iowa. They have a few different PhD tracks, a terminal MA, and the most competitive MFA program in the country. All this site says is that they have almost 1,500 applicants and accept 8% of them. If you're applying for a PhD at Iowa, these numbers are meaningless because of all of the MFA applicants.

Yeah, this is much more meaningful for those applying to schools that don't offer terminal master's programs, or who are shooting for a PhD but would settle for an MA if offered as consolation prize, &c. If your department offers something qualitatively different like an MFA or MPP it's no good (unless you want to estimate their proportion of the program.)

Glad you liked it :)


Hah, I've seriously thought about doing this analysis but I was deterred by the fact that there are too many simplifying assumptions to make the analysis very useful. Glad you weren't deterred (you must not be as anal as I am) so I can use your work. :)

Also, reading this made me realize that this message board needs some sort of TeX equation interpreter.

Edited by was1984
