
2 hours ago, Stat Assistant Professor said:

There are *many* faculty in Statistics/Biostatistics departments conducting research and publishing in the top journals and top ML conferences in the areas you mentioned (high-dimensional statistics, causal inference, etc.). Most students are capable of teaching themselves these topics (or taking electives to gain some exposure to them) after they begin their research. I think it is somewhat unreasonable to expect programs to teach students everything they need to know for their research through courses (a PhD is largely about teaching yourself and contributing new research that isn't already covered in classes) or to tailor coursework around what's "trendy" at the moment. For one, not all students are interested in the same things: classes on optimization likely have little relevance to students interested in applied probability/stochastic processes, for example. Nevertheless, as I also mentioned above, most Stat and Biostat programs are taking it upon themselves to update the curriculum to include more current topics.

Secondly, the other fields you mentioned might have a lot of coursework too that isn't directly applicable to students' research. For example, in an Applied Math PhD, students might need to take two semesters of graduate-level Analysis with measure theory, Hilbert spaces, functional analysis, etc., as well as a lot of classes like numerical analysis, partial differential equations, etc. These students typically also need to pass several written qualifying exams. An EECS student might need to take classes on computer architecture, theoretical analysis of algorithms, etc. But if these Applied Math or EECS students then go on to conduct research in machine learning or global optimization, then it's not like all of their classes are immediately relevant to their research.

Now, some of the top programs in these other fields (like Stanford CS, Princeton Applied Math, Berkeley EECS) likely do keep the coursework requirements to a minimum (so most students are largely done with classes by the end of their first year, and students also have greater flexibility in what classes to choose -- so they probably do only take a few classes that are immediately relevant to their research). But that's mainly because the types of students admitted to these kinds of programs have already completed extensive graduate-level coursework as undergrads and have already done research that got published in major journals or conferences. These are the exceptions rather than the rule. Most PhD programs in Applied Math, EECS, and IE have at least two years of coursework, and certainly not all of it is relevant to every student's research.

Sorry OP. No more posts from now on.

Honestly, I am more surprised by your saying that courses like measure theory, functional analysis, and numerical analysis have little to do with students' research. How can one learn probability theory or stochastic processes deeply enough to do research without these mathematical foundations? To understand the theory of modern MCMC algorithms, functional analysis is indispensable. I am not sure how far one can go with probability/stochastic processes without these foundations. To publish in top statistics journals like the Annals of Statistics or JRSS-B, these courses provide the essential tools. I will let other people judge whether these foundational courses are worth less than standard classical statistics courses like GLM, experimental design, and survey sampling. If I were a student, I would rather have more guidance and experience in the foundations than in the latter statistics courses, which I can easily pick up as I go along. Isn't that the reason why many statistics PhD programs look for applicants with strong mathematical foundations? And if you check the statistical learning course offered by the CMU statistics department, the first things you see are concepts like Hilbert spaces, Sobolev spaces, and Besov spaces. PDE is indeed somewhat irrelevant to most statistics research, but one of the most important tools in methodological work is optimal transport theory, which is closely related to PDE. Leading statisticians in this field have been writing lecture notes and books on PDE for applications in statistics. It goes without saying how important optimization theory and numerical linear algebra are for understanding statistics and practicing it on computers. How can you even fit a GLM without knowing numerical analysis?
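To make the GLM point concrete, here is a minimal sketch (my own illustration, not any textbook's or package's code; the function name is mine) of fitting a logistic-regression GLM by iteratively reweighted least squares, where every iteration reduces to a weighted least-squares solve -- exactly the numerical linear algebra being referred to:

```python
import numpy as np

def fit_logistic_irls(X, y, n_iter=25, tol=1e-8):
    """Fit a logistic-regression GLM by iteratively reweighted
    least squares (IRLS). Each step solves the weighted normal
    equations (X' W X) beta = X' W z, a numerical linear algebra
    problem."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                     # linear predictor
        mu = 1.0 / (1.0 + np.exp(-eta))    # mean via logistic link
        w = mu * (1.0 - mu)                # IRLS weights
        z = eta + (y - mu) / w             # working response
        WX = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ WX, X.T @ (w * z))
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

In practice one would use a QR or Cholesky factorization rather than forming the normal equations directly, and guard against separation -- which is precisely the kind of issue a numerical analysis course teaches you to recognize.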

Plus, I am not asking programs to teach everything I need for research; that is impossible. But why does it need to waste students' time and effort if there is something more important and fundamental to learn? For example, I think it is really unfortunate that most classical stat/biostat PhD students never see deep foundations in information theory, which is closely related to the core ideas of statistics. Isn't the whole purpose of coursework to get students to the research level as quickly as possible? Sadly, what I have observed is that programs still require certain classes simply because they are easier for professors to teach. Because the professors were trained a certain way, they would like to stay that way. One person even told me his research field is a dead end, but he needs to teach, and he doesn't know anything else. I really hope this post convinces you of the need for change in the current biostatistics/classical statistics curriculum.

I am also differentiating top statistics programs like Stanford, Berkeley, and CMU from other biostatistics and classical statistics programs. Their directions have been quite distinct in the past two years: the former have become more mathematically, algorithmically, and methodologically focused, while the latter have centered more on application. There are also many new or redesigned quantitative programs that resemble the former type, which I believe any serious statistics PhD applicant should consider. I hope this discussion has been somewhat informative and constructive for anyone else reading it.

Edited by Statmaniac
19 minutes ago, Statmaniac said:

Sorry OP. No more posts from now on.

Honestly, I am more surprised by your saying that courses like measure theory, functional analysis, and numerical analysis have little to do with students' research. How can one learn probability theory or stochastic processes deeply enough to do research without these mathematical foundations? To understand the theory of modern MCMC algorithms, functional analysis is indispensable. I am not sure how far one can go with probability/stochastic processes without these foundations. To publish in top statistics journals like the Annals of Statistics or JRSS-B, these courses provide the essential tools. I will let other people judge whether these foundational courses are worth less than standard classical statistics courses like GLM, experimental design, and survey sampling. If I were a student, I would rather have more guidance and experience in the foundations than in the latter statistics courses, which I can easily pick up as I go along. Isn't that the reason why many statistics PhD programs look for applicants with strong mathematical foundations? And if you check the statistical learning course offered by the CMU statistics department, the first things you see are concepts like Hilbert spaces, Sobolev spaces, and Besov spaces. PDE is indeed somewhat irrelevant to most statistics research, but one of the most important tools in methodological work is optimal transport theory, which is closely related to PDE. Leading statisticians in this field have been writing lecture notes and books on PDE for applications in statistics. It goes without saying how important optimization theory and numerical linear algebra are for understanding statistics and practicing it on computers. How can you even fit a GLM without knowing numerical analysis?

Plus, I am not asking programs to teach everything I need for research; that is impossible. But why does it need to waste students' time and effort if there is something more important and fundamental to learn? For example, I think it is really unfortunate that most classical stat/biostat PhD students never see deep foundations in information theory, which is closely related to the core ideas of statistics. Isn't the whole purpose of coursework to get students to the research level as quickly as possible? Sadly, what I have observed is that programs still require certain classes simply because they are easier for professors to teach. Because the professors were trained a certain way, they would like to stay that way. One person even told me his research field is a dead end, but he needs to teach, and he doesn't know anything else. I really hope this post convinces you of the need for change in the current biostatistics/classical statistics curriculum.

I am also differentiating top statistics programs like Stanford, Berkeley, and CMU from other biostatistics and classical statistics programs. Their directions have been quite distinct in the past two years: the former have become more mathematically, algorithmically, and methodologically focused, while the latter have centered more on application. There are also many new or redesigned quantitative programs that resemble the former type, which I believe any serious statistics PhD applicant should consider. I hope this discussion has been somewhat informative and constructive for anyone else reading it.

Regarding the utility of measure theory, etc.: its relevance to a PhD student's research depends on what the research is. If an Applied Math or IE PhD student is doing their dissertation on mathematical/computational biology and neuroscience, or on queueing theory/mixed-integer programming, then a lot of measure theory and functional analysis is not going to be directly useful to their research. Yet most programs in Applied Math require two semesters of graduate-level analysis. The ones that do not are the elite programs whose entering students have *already* taken graduate-level courses like measure theory and functional analysis. Most PhD students (at least domestic ones) have not taken these classes as undergrads, which is why they are taught in their PhD program.

In general, I support PhD students needing to take two semesters of measure-theoretic probability, although a deep knowledge of it is certainly not needed for all statistics research (especially not if the research is more on the applied side of stats). I myself have published in top stat journals like the ones you mentioned, and I don't use that much measure theory (and I work on statistical methodology *and* theory). Other researchers might need to use more of it -- it really depends on the topic. In spite of this, I have no complaints about the coursework I had to take as a PhD student (which included two semesters of measure-theoretic probability), as I mostly taught myself the stuff I needed to know for my dissertation research. For my postdoc and now being faculty, I also have to constantly teach myself new things. 


To OP: I wouldn't worry too much about the coursework. Once you pick a PhD advisor, they will advise you on additional courses to take for your research (if any). For example, the department I got my PhD from was well-known for its work on theory for MCMC. If you wanted to work with one of the MCMC professors on this, they would ask you to take a functional analysis class in the math department. So you can always take courses that are immediately relevant to your research, especially if your PhD advisor encourages you to do so. 

I think Statmaniac makes a few good points, but I disagree with some of what they are saying and think some of it is a stretch (for example, their comment, "But why does it need to waste students' time and effort if there is something more important and fundamental to learn?" seems a bit too subjective). Also, most Stat/Biostat departments are incorporating more "modern" topics into their curricula.

46 minutes ago, trynagetby said:

Just wanted to say that this back and forth thread was extremely informative and helpful for someone applying this cycle.

Agreed, people keep apologizing to OP for getting sidetracked but these discussions end up being some of the most informative on the forum.


I don't have much to add on curriculum since it's so personal, but I do think people should at least consider the required coursework and how it will help them meet their goals. Some programs (Ohio State sticks out in my mind) have three full years of required courses, while my current program has three required classes total. That's a huge difference, and what is good for one person may not be good for another.

2 hours ago, Statmaniac said:

I am also differentiating top statistics programs like Stanford, Berkeley, and CMU from other biostatistics and classical statistics programs. Their directions have been quite distinct in the past two years: the former have become more mathematically, algorithmically, and methodologically focused, while the latter have centered more on application. There are also many new or redesigned quantitative programs that resemble the former type, which I believe any serious statistics PhD applicant should consider. I hope this discussion has been somewhat informative and constructive for anyone else reading it.

Calling Berkeley/CMU particularly mathematical is a bit of a reach. While both can be mathematical if you want, you can also graduate from Berkeley without taking a single probability/measure theory class. CMU also requires all students to do a faculty-supervised data analysis project over two semesters; not many departments have that. Fully agreed on Stanford, though: to them, if there isn't a mathematical proof, it isn't statistics.

4 hours ago, insert_name_here said:

Calling Berkeley/CMU particularly mathematical is a bit of a reach. While both can be mathematical if you want, you can also graduate from Berkeley without taking a single probability/measure theory class. CMU also requires all students to do a faculty-supervised data analysis project over two semesters; not many departments have that. Fully agreed on Stanford, though: to them, if there isn't a mathematical proof, it isn't statistics.

While Statmaniac has made some valid points, methinks they have extrapolated a bit too much from their personal research. For example, they dismiss detailed study of GLMs, but argue that information theory and functional analysis are "more important and fundamental to learn." Many stat students can get by and publish in top journals/conferences without having taken an entire course on information theory or functional analysis; they can pick up the pieces of these areas that they need for their research *if* they ever need them (the various entropies and divergences, for example). And students doing more applied statistics have little use for those subjects. Anyway, it is a matter of opinion what is "most important and fundamental."
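To illustrate the "pick it up when you need it" point: a divergence like KL can be implemented from its definition in a few lines, with no full course required. A minimal sketch (function name and convention choices are my own):

```python
import numpy as np

def kl_divergence(p, q):
    """KL divergence D(p || q) between two discrete distributions,
    sum over i of p_i * log(p_i / q_i), with the usual convention
    that terms with p_i = 0 contribute zero."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # 0 * log(0 / q) = 0 by convention
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```

This is exactly the kind of tool a student working on, say, variational inference or minimax lower bounds can absorb on demand from a reference, rather than through a semester of coursework.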

Edited by Stat Assistant Professor
