Statistics Ph.D Necessary Coursework

discreature · October 8, 2020

There've been a few threads recently that have touched on this, but I'm interested in asking directly: How much does Ph.D-level stats coursework differ school-to-school, and what's your opinion on what should be taught to first/second year PhDs? There was a discussion recently in which someone stated they believe that most first-year sequences are outdated and do not teach the necessary topics for modern statistical research. Do you agree? As I understand it, most schools will have a probability sequence (typically measure-theoretic) and some sort of theoretical statistics sequence (I hear Casella & Berger brought up a lot here). Are there other staples? Do you feel that these are all important to becoming a research statistician? What coursework are many programs missing?

In a practical sense, I'm going to have some space to take courses that could potentially overlap with a Ph.D program, and I'm looking to take things that will help me later on and that might not be present in all curricula. One example of something that people have said would be good to take (although it may not be in my curriculum) is convex optimization.

Edited October 8, 2020 by discreature

Stat Assistant Professor · October 8, 2020

The most "typical" required coursework seems to be:

2 semesters of Casella & Berger mathematical statistics
2 semesters of applied statistics (based on the book "Applied Linear Statistical Models" by Kutner et al.)
1 semester of statistical computing
1 or 2 semesters of measure theoretic probability
1 semester of linear models theory
1 or 2 semesters of advanced statistical inference

Some elite PhD programs like Stanford and UPenn Wharton skip the first two sequences above because the students they admit are fairly advanced already.

Anyway: my opinion is that the typical first-year courses are fine for the most part, though they certainly should be updated to incorporate current research topics. If an entering student has not already had much exposure to statistics at the graduate level, then I think it's fine to teach the topics like linear regression, ANOVA, GLM/categorical data analysis, and theory of sufficient statistics, point estimation, hypothesis testing, etc. in detail... though I definitely agree that some of their curricula should be updated. For example, at my PhD program, an entire semester was devoted to different ANOVA/ANCOVA models, including things like split plot design, etc. That seemed a bit excessive to me -- usually, you only need to go over a couple of ANOVA models in detail to get the general gist. So if I were on the PhD curriculum committee, I would probably "modernize" the applied stats sequence (and the statistical computing class) to spend less time on design of experiments and include more modern topics.

Additionally, the advanced statistical inference courses (i.e. the theoretical statistics course(s) you take in the second or third year) at many programs do seem to focus on some topics that are dated. For example, at some schools, you learn to cross every "t" and dot every "i" for "classical" topics like UMP tests, UMVUE, equivariance, likelihood principle, etc., which isn't necessarily helpful for modern statistics research.

I would probably repurpose the advanced statistical inference classes to cover more 'modern' statistical theory like multiple testing/knockoffs, RKHS and nonparametric regression, convex/nonconvex optimization for high-dimensional regression, graphical models, etc.

Edited October 8, 2020 by Stat Assistant Professor

StatsG0d · October 8, 2020

I agree with @Stat Assistant Professor. Students have been pushing to modernize curricula, but it's difficult because professors are always concerned about prestige or rigor. Some topics I think should always be covered are:

Bayesian statistics (becoming more and more used in practice, even being picked up by CS people)
Computation / simulation (preferably in C++ / Python and on Unix servers)
Machine learning / nonparametric statistics (may be a buzz word, but it gets you jobs)
Missing data (very common in practice)

Some topics I think can be tossed out, that are typically required:

Measure theory (useful for many people, but not for all)
Decision theory (hardly ever used in practice)
Anything concerned with unbiased estimation (UMVUE, etc.--most practical estimators are biased so who cares)

I do think UMP and UMPU tests are important, albeit boring, at least for biostatistics. Drug approval ultimately depends on having a significant p-value, so you def. want to have power.

Stat Assistant Professor · October 8, 2020

A lot of departments are in the process of revising their PhD curricula, or at least discussing changes to them. I think most programs will continue to require at least one semester of measure theoretic probability; at some schools, department chairs/graduate coordinators are also adamant about keeping the two-semester requirement of measure theoretic probability. And linear models will probably stay the same.

But I think the other advanced statistical inference classes (post-Casella & Berger math stat) will eventually be updated to de-emphasize extremely detailed study of "classical" topics. The issue seems to be that for a lot of the advanced classes, the same faculty have been teaching the same class for many years. It takes a LOT of time to design a new course, and in some cases, requires learning new subjects entirely (if you're accustomed to just teaching the "traditional" topics). But once the new class is designed, I think it shouldn't be that difficult to keep teaching it or making minor tweaks to it. Getting to that point takes time though.

Geococcyx · October 8, 2020

I have a sneaking suspicion someone (likely Stat Asst. Prof) already answered this, but just in case I've misremembered: is there any school/course that comes to mind as what you want to see in an updated statistical inference course? The closest thing I've seen is Stanford's 300c (here: https://statweb.stanford.edu/~candes/teaching/stats300c/index.html).

(I'm not really experienced enough to have overarching opinions on Stat PhD curricula, but I'll second all the suggestions for more computation, and clarify that in my experience, some emphasis/additional emphasis on algorithm design, numerical linear algebra, matrix decompositions, and maybe factor analysis/matrix-based models would be nice as part of the core curriculum.)

Stat Assistant Professor · October 8, 2020

Yes, that Stats 300C class at Stanford is one possibility. I would say that a PhD-level advanced inference class should focus less on topics like UMVUE, Neyman-Pearson Lemma, admissibility, etc., and more on stuff like theory for shrinkage methods, convex/nonconvex optimization, reproducing kernel Hilbert spaces, resampling methods, etc. That's because the latter topics are of more current interest and are active areas of research.

DanielWarlock · October 9, 2020

I think the best first-year plan is probably Columbia's. They have 4 different tracks: probability, theoretical statistics, applied statistics and data science (joint with CS and managed by Blei himself). Students take different classes (with some overlap) and take different qualifying exams. This way, no one will waste time.

The coursework looks very rigorous and in-depth to me within each track. For instance, if you specialize in probability, the probability sequence is 3 semesters instead of 2.

Other than this, it is also good to have a more hands-off approach such as Harvard's, where courses do not take much time and students can just arrange their own studies (perhaps in consultation with their supervisor).
