Top-heaviness and Noise in Admissions

discreature · July 25, 2020

This is a question with probably no uniform answer, and the truth is likely much more complicated than what I pose here. Regardless, I've been wondering about the structure of graduate admissions lately, particularly to T10 Ph.D. programs in Statistics. Specifically, how top-heavy are applicant profiles? To elaborate, I imagine an "essentially perfect" profile has a student from a top undergraduate program with a 3.9+ GPA, 166+Q, 163+V, multiple research experiences, glowing LoRs including at least one from a famous faculty member, and completion of many high-level math courses. How many students like this are applying? I imagine there are about 120 or so spots in total at T10 universities. Are there more "essentially perfect" applicants than there are spots? If so, would hairs be split over the "softer" traits, like how well a certain reader knows an applicant's recommenders, or the impression that a reader gets from one's research experiences? My assumption would be that there are not that many "essentially perfect" applicants, as most universities include some statement about how incoming students without all the desired background can make it up in the first year.

This also makes me wonder about noise in admission. Implicitly assumed above is that (1) there is a true "ranking" of applicants and (2) faculty are able to discern this true ranking, and given a list of applicants they can sort it by this ranking and then admit the top 20 or so. So, I question these two assumptions.

I believe that in Biostatistics and many other fields, assumption (1) would be quite heavily violated. Many people in these fields apply directly to certain labs and are expected to know their research interests before entering. So, each subfield in these fields might have its own ranking, but Student A specializing in X might be incomparable to Student B specializing in Y. In these cases, the current department makeup and the makeup of the applicants could introduce noise to the admission. Perhaps Student A has a very strong profile, but specialization X only has one possible advisor at this university, so it is very easy for Student A to be rejected. I don't strongly believe this to be the case in statistics; at least based on discussions with Ph.D. students and faculty, it seems that a specialization is by no means expected when applying, so this effect might arise less. One possible counterargument is that Student A might have very strong math skills while Student B has strong computing skills, but I'd intuit that departments prefer math skills. Overall, I'd imagine Stats applicants lightly violate (1), but generally follow it.

The one that I am most unclear on is assumption (2). As it sort of rests on assumption (1), let us assume that (1) holds. Consider $n$ applicants to University A, which typically accepts $k$ students where $k \ll n$. Now, let us enumerate the students and order them based on the "true rank" that exists (where a rank of 1 is the best and $n$ is the worst). So, a university that entirely satisfies (2) will admit the top $k$ students, and get an average true rank of $(k+1)/2$. Let's take the average "true rank" of a university's admitted students to be our measure of assumption (2). So, the lower bound (AKA best case) would be an average true rank of $(k+1)/2$ while the upper bound (a university that selects the $k$ worst applicants) would be an average true rank of $(2n-k+1)/2$. For $n=400$ and $k=40$ our best case is an average true rank of 20.5 and our worst case is an average true rank of 380.5. So, where in this interval do most universities lie? I imagine that you would never get a university that is close to the worst case, as there are initial screenings and it is clear that the top applicants would be visibly different from the bottom applicants, but how close do they get to the best case?
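To make the bounds above concrete, here is a minimal Python sketch. The first function just recomputes the best and worst case by summing ranks directly. The second is purely an illustrative assumption on my part: it pretends a committee perceives each applicant's true rank plus independent Gaussian noise (the `noise_sd` parameter and both function names are hypothetical, not anything a real department does) and admits the $k$ applicants who look best.

```python
import random


def average_true_rank_bounds(n, k):
    """Return (best, worst) average true rank when admitting k of n applicants."""
    best = sum(range(1, k + 1)) / k           # top k: ranks 1..k
    worst = sum(range(n - k + 1, n + 1)) / k  # bottom k: ranks n-k+1..n
    return best, worst


def simulated_average_true_rank(n, k, noise_sd, seed=0):
    """Average true rank of admits under a hypothetical noisy-perception model.

    Assumption (not from any real admissions process): the committee sees
    true_rank + Gaussian noise with standard deviation noise_sd, then
    admits the k applicants with the lowest perceived rank.
    """
    rng = random.Random(seed)
    perceived = [(rank + rng.gauss(0, noise_sd), rank) for rank in range(1, n + 1)]
    perceived.sort()  # lowest perceived rank first
    admitted = [true_rank for _, true_rank in perceived[:k]]
    return sum(admitted) / k


if __name__ == "__main__":
    print(average_true_rank_bounds(400, 40))
    for sd in (0, 10, 50, 200):
        print(sd, simulated_average_true_rank(400, 40, sd))
```

With `noise_sd=0` the simulation reproduces the best case exactly, and larger values of `noise_sd` give one rough way to see how far a noisy reader could drift from it. Where a real committee sits on that scale is exactly the open question.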

As stated above I'm not exactly expecting an answer to this question, but would love to discuss this with others if anyone has insight on this or perhaps a strong case for or against assumptions (1) and (2), or point estimates on the number of "essentially perfect" applicants. I'd also like to add the disclaimer that this is definitely a gross oversimplification of the process, and moves a bit close to the "quantify people as just one number" mindset, which can be damaging. These are mostly just random quantitative musings that I found interest in and find myself personally involved in. As always the words of George Box are important — "All models are wrong, but some are useful."

July 25, 2020

Your assumption about biostatistics departments is generally not correct - admissions are generally handled at the department level. Chicago statistics is the only school I know that explicitly mentions your profile will be reviewed by professors with similar research interests.

I don't think the situation you're imagining exists, because it isn't true that there are a bunch of good candidates who are hard to distinguish. Even if there are 100 domestic applicants with high GPAs and perfect GREs, you can then group them by the prestige of their undergrad, their research experiences, their letters -- there are decisions to be made on the boundary, but if I showed you the profiles of Stanford and CMU's new class or Washington biostat vs Michigan biostat, you could probably guess which is which. Not everyone at a top 10 program is a genius, so there is lots of variation even at the top.

Secondly, there is no true ranking. Programs have different goals and cultures, and want to accept different types of applicants. A ranking of applicants would need a defined criterion (e.g. "most successful academic career ahead"), and those are different for different departments. "Will succeed in completing a PhD" is probably the most important criterion, and that would only produce a categorization, not a ranking. But these are just irrational humans reading a set of documents and choosing which other humans they want to offer a spot at their school. I don't see what you gain from conceptualizing a "true ranking" -- you have to be clear about what this means or measures or the idea is meaningless. "Quantifying people as just a number" isn't just a moral issue because it could be dehumanizing or because it's hard to estimate -- these numbers literally do not exist in any meaningful way.

Most importantly, this is not how admissions works. Even if the applicants are hard to distinguish in this ranking system, the school doesn't need to stress too much about ranking the top 20: it can send 30 offers and 30 waitlist spots, and let the students self-assort. Why stress over differentiating between applicants 3 and 4 if they're both going to go to a different school?

Edit: To be clear, I enjoyed thinking about this and it brings up interesting issues/gets the wheels churning about how to think of the problem in a statistical way, but I just don't think it has much basis in the real world.

discreature · July 25, 2020

@bayessays Thank you for this response! It's hard to understand how admissions works as an undergrad, so the most simple way is to assume that there's a clean system of ranking and a simple deterministic way in which these things work. I think I do prefer that it is as you claim — a perfect system in which everyone has a "true ranking" and the universities simply sort based on that would be a bit demoralizing as well as more boring, so I'm glad it is not like that. I imagine the "true ranking" idea is akin to projecting a vector in 100-dimensional Euclidean space to a single real number — almost all meaningful information is lost. It's also reassuring to hear that not everyone at each top 10 program is a genius.

jelquiades · July 25, 2020

6 hours ago, discreature said:

I imagine the "true ranking" idea is akin to projecting a vector in 100-dimensional Euclidean space to a single real number — almost all meaningful information is lost.

Same with ranking schools :^)

cyberwulf · July 28, 2020

It's also important to remember the role that applicant self-selection plays in the process. Most applicants won't apply to every top 10 program, so each admissions committee only ranks a subset of the applicant pool. This actually helps a lot; there would indeed be significant noise if every admissions committee had to rank the top 100 applicants to statistics programs, but things become a lot more stable when programs only have to decide who to admit from among a smaller group. Consider, for example, a school that is ranked between #5 and #10 in the country. It might attract ~20 of the top 100 applicants (some don't apply below top 5 and others just aren't interested in that program for various reasons). Assuming it accepts ~30 students per year, it's likely that most of these top 100 applicants will be admitted, because they are being compared to applicants that aren't among the top 100. Even for top programs like Stanford, the same logic applies, except perhaps replacing "top 100" by "top 50" (since Stanford might perceive a meaningful difference between a top 50 and non-top 50 applicant).
