Everything posted by Statmaniac

  1. Hi all, I would like advice from more experienced researchers in this forum on some ongoing thoughts I have about ML/stat/algorithmic/computational research. As a PhD student researching machine learning/statistics, I have been overwhelmed by a chain of negative aspects I have seen in the actual research phase. Below are some features that make me very uncomfortable staying in these fields. 1. A significant portion of research papers are not reproducible. I understand that writing a paper that perfectly explains all the nitty-gritty details of the methodology is very difficult, but at the same time, I have realized that far too many papers do not explain the details well enough for readers to actually use the methods they introduce. What is worse is that a significant number of papers contain errors, and I have seen reviewers simply disregard such errors, treating them as "trivial". An upper-year student who graduated last year found at least three major errors in his advisor's previous paper, but his advisor ignored them. He fixed all the theory by himself, submitted the corrected version of the proof, and got it published with his advisor's name on it. Moreover, this was published in one of the top 4 venues that statistics people regard highly. His advice to me was simply to accept the reality and graduate with a PhD without making a fuss. It seems that more experienced people take a somewhat flexible attitude of "learn what you need to and ignore whatever doesn't feel right or make sense," a.k.a. look at the bigger picture, not the details, or "You do not need to do correct research, as research, by nature, is prone to error". I agree that understanding the bigger picture is important, but it seems that the way research is done too often dismisses details. I try to follow/learn this attitude, but it just seems very hard and somewhat arbitrary. 2. Due to publication pressure, I feel that there are so many meaningless papers.
In fact, I also submitted two papers this year, and I am not proud of either of them. I can hardly see them being used by practitioners. Methodologies these days are much more complicated than in the past, and I feel that they have become too complicated to be actually useful in practice. My advisor seems to be satisfied with my work, but he can't seem to understand why I am not happy. I guess I am just another one merely trying to survive in this crazy "competition" instead of doing "real" and "meaningful" research. These thoughts have negatively influenced my research to the degree that I have started to question whether this is indeed the way I want to spend the rest of my career. I was fascinated and excited by creative/beautiful ideas of bridging theory with actual data analysis or solving some real-world problem using my quantitative skills, but in reality, it seems that, by the nature of the discipline, there are a lot of dark rooms that I didn't see before. I am sure some other people in the forum have had similar feelings at some point in their lives. I wonder how they dealt with such feelings and moved on.
  2. Sorry OP. No more posts from now on. Honestly, I am more surprised by you saying that courses like measure theory, functional analysis, and numerical analysis have little to do with students' research. How can one learn probability theory and stochastic processes deeply enough to do research without these mathematical foundations? To understand the theory of modern MCMC algorithms, functional analysis is indispensable. I am not really sure how far one can go with just probability/stochastic processes without these foundations. To publish in top statistics journals like Annals of Statistics/JRSS-B, these courses are essential tools. I will let other people judge whether these foundational courses are less worth studying than standard classical statistics courses like GLM/Experimental Design/Survey Sampling, etc. If I were a student, I would rather have more guidance and experience on foundations than on the latter statistics courses, which I can easily pick up as I go along. Isn't that the reason why many statistics PhD programs look for applicants with strong mathematical foundations? And if you check the statistical learning course offered by the CMU statistics department, the first things you see are concepts like Hilbert spaces, Sobolev spaces and Besov spaces. PDE is indeed somewhat irrelevant to most statistics research, but one of the most important tools in methodological work is optimal transport theory, which is closely related to PDE. Leading statisticians in this field have been writing lecture notes/books on PDE for applications in statistics. It goes without saying how important optimization theory and numerical linear algebra are for understanding statistics and putting it into practice on computers. How can you even fit a GLM without knowing numerical analysis? Plus, I am not asking the program to teach everything I need for research. Indeed, this is impossible.
But why does it need to waste students' time and effort if there is something more important and fundamental to learn? For example, I do think it is really unfortunate that most classical stat/biostat PhD students don't even get to learn the deep foundations of information theory, which is closely related to the core ideas of statistics. Isn't the whole purpose of coursework to let students reach the research level as quickly as possible? Sadly enough, what I have observed is that programs still require students to take certain classes just because they are easier for professors to teach. Just because professors were trained in a certain way, they would like things to remain that way. One guy even told me his research field is a dead end, but he needs to teach, and he doesn't know anything else. I really hope this post impresses on you the need for change in the current biostatistics/classical statistics curriculum. I am also differentiating top statistics programs like Stanford, Berkeley, CMU from other biostatistics programs and classical statistics programs. Their directions have been quite distinct in the past two years. The former programs have been more mathematically, algorithmically and methodologically focused, while the latter have been more centered on application. And there have been many new or redesigned quantitative programs that resemble the former type of statistics program, which I believe any serious statistics PhD applicant may want to consider. I hope this discussion has been somewhat informative and constructive for anyone else who is reading this.
  3. I am sorry @kimmy if these posts were distracting to you. I just wanted to widen your perspective when applying for a graduate program. As a person who believes this forum mostly reflects opinions from classical statistics/biostatistics programs, I just wanted to tell you how one PhD student feels about these programs. Of course, you need to research these programs in more detail and carefully assess which one best fits you. My last piece of advice is that there are actually a substantial number of people who work on something very similar to, or exactly the same as, what classical statisticians claim to do. For example, many econometricians/professors outside the statistics department do causal inference or high-dimensional inference (like Chernozhukov at MIT). Uncertainty quantification in weather forecasting and design of experiments has been done substantially by applied mathematicians/IE people. Bayesian statistics and MCMC were heavily influenced/substantiated by people from physics/CS backgrounds (like Ryan Adams at Princeton). Data institutes and data initiatives have been established to bring all these people together, and many quantitative programs allow students to freely choose advisors no matter what department the advisor belongs to. It is unfortunate or somewhat paradoxical if they are not considered statisticians or included in that group, given that these people consistently publish papers in top statistical journals. Perhaps some statisticians/biostatisticians are not ready to embrace them, which I think is highly detrimental to the field itself. I have seen people with profiles similar to yours get into applied math/EE/IE programs. I agree with the above post that EECS programs, particularly CS, are even more competitive in terms of admission.
In fact, I was like you, and I got accepted to several different quantitative PhD programs, including Biostat/IE/stat/applied math/Data Science, with a research statement indicating that I wanted to do modern statistical learning. You may want to check my past posts to verify this. I just want to tell you to select the program carefully, not only by name but also by its actual curriculum and research. For the curriculum, pick the one that is the most flexible or up-to-date, which could help you read recent papers. The main issue with classical statistics/biostatistics programs is that the gap between coursework and research is unnecessarily huge. I even wished I had spent the whole year self-studying instead of taking courses. You wouldn't be able to follow most high-dimensional statistics/inference papers with zero background in optimization. Causal inference without graphical models is also very hard these days. From my perspective, spending 6-8 months taking classical statistics courses and preparing for the qualifying exams is a considerable loss of time, given that you need to embark on research as soon as possible to settle on your advisor. There are many quantitative programs that are flexible in the choice of advisors and do not restrict you to their own department or program. Besides, as I have said before, EE/Applied Math/IE are huge fields, and many programs require you to contact your potential supervisor first; perhaps people in Biostatistics programs are not familiar with this. Otherwise, faculty doing statistics research outside the statistics department would have no chance to recruit students. Actually, several biostatistics programs I was admitted to did not allow advisors outside the department, so you may want to double-check. I hope this covers some things that were not mentioned above.
  4. I think @DanielWarlock has a point. MIT has a great list of faculty with whom one could do research in statistics. Let me share my perspective here. Many statistics programs are getting a lot of attention because of big data, machine learning, etc. However, one should note that many programs offer outdated curricula. Honestly, who uses UMVUE or complete statistics? I haven't seen either of these in any paper I have read in top statistics journals published within the last 20 years. What's worse is that these programs still teach courses like survey sampling and generalized linear models (GLM), which had little impact on the current emergence of data science. I am not looking down on these two subjects, but one should note that these courses have almost nothing to do with the current data boom. In machine learning, you spend at most one lecture on GLM, but these outdated curricula still insist that students take a full semester of GLM/survey sampling and other outdated topics. Now that I am working in so-called hot or emerging statistics fields, I feel my past education in a statistics program was completely useless. Courses like Information Theory, Optimization, and Graphical Models, which were not part of the core curriculum in the statistics program, have become essential in modern statistics research. These are somehow more often taught in EECS/CS/Math departments. In line with what I said, I think if one wants a better edge in applications in the IT industry or in new methodological work in statistics journals, it would be better to choose EECS/applied math/ORFE programs like MIT or Princeton. Please take a look at the profiles of new Stanford/Berkeley faculty; many of them were not trained in a Statistics PhD program. I think those at the level to get admitted to Stanford/Berkeley stats are at the level to gain admission to MIT EECS/Princeton applied math.
If not, programs like Georgia Tech IE/UPenn Applied Math have successfully produced top students who obtained tenure-track positions in top statistics programs. As far as I know, these programs often require applicants to contact a potential supervisor first, so with your background, I think it is worth considering. That being said, compared to the IT industry, biomedical applications have been somewhat slow to accept these new machine learning methods. I think this is why top biostatistics departments are still teaching outdated methodologies. In terms of recent statistical methodological work, EECS departments like MIT's have contributed far more than many statistics programs that cannot move on from their old fame. Also, even at MIT, there are a lot of people working on computational biology. Therefore, as @cyberwulf said, you would have to decide between traditional stat programs (many biostat programs and some stats) vs. data-sciency programs (stat programs like Stanford, Berkeley, CMU, Yale, Columbia, and CS/OR/applied math programs). Fields like genetics are highly computational, so even if you go to the latter type of program, the chance to work in biomedical fields is quite high. However, given the current training offered by biostat or traditional stat programs, I think going the other way would be quite challenging. One way to distinguish these two types of programs would be to ask whether collaborations with other departments (CS/applied math/OR) are frequent or whether many faculty have joint appointments. Having a separate Data Science institute or initiative is also a sign of a more data-sciency program. Lastly, look into the curriculum they offer.
  5. All of this is on the premise that the current popularity of statistics PhD programs is due to ML, AI, Statistical Learning, etc. So I have to disagree with you that it is relatively easier to find someone with similar interests if one is in a statistics department. First of all, the research focuses of statistics departments vary widely. Schools like Columbia still have a heavy focus on financial math. If one wants to work on modern statistical methodologies, I doubt someone in financial math shares more common interests than a person working on ML in a CS department or someone with an information theory background in an EE department. Secondly, the field of statistics has changed dramatically since the emergence of big data and machine learning. Many statistics departments are trying to affiliate with faculty outside the department. Look at Yale. They recently changed the name of the department to Statistics and Data Science and have hired several applied mathematicians/computer scientists working on the theoretical side of statistical learning, information theory and graph theory. Look at Chicago. Almost half of the faculty in the statistics department are doing things not considered traditional statistics. I haven't seen any department without recent hires coming from a non-statistics domain. I don't necessarily think statistics/biostatistics PhD students have a better chance of finding someone they want to work with if they are interested in modern statistical methodologies. In particular, a lot of people in my cohort are interested in machine learning/statistical learning, not genetics or epidemiology. More than half of the students are taking courses in CS/EE and interacting with people there to learn more from them. And I must say your percentage looks like a bit of an exaggeration. Even if it is true, the incoming PhD class size of EE is like 40-50 students.
Ten percent of that is still comparable to the number of incoming Wharton or Cornell statistics PhD students. And more importantly, once you finish the coursework, students within the same research group or with the same advisor communicate and interact more often. Look at Jordan's or Wainwright's group. A significant portion of the people in their labs are from outside statistics. And there is a relatively large group of statistics people in IE and OR in general, if you look at the profiles of Jianqing Fan or Jeff Wu.
  6. My impression is that the field of statistics is very loosely defined these days. People working on reinforcement learning are mainly from CS departments and publish papers in NIPS, AISTATS, etc. There are tons of people working on Monte Carlo methods and Bayesian statistics in geoscience and applied math. They still publish papers in statistics. In fact, many professors in top-notch statistics departments had different training in their PhDs. For example, look at Andrea Montanari at Stanford, and Michael Jordan and Martin Wainwright at Berkeley. They are among the pioneers of the field of statistics. Schools like Georgia Tech and Princeton have all of their statistics people, including Jeff Wu and Jianqing Fan, respectively, in the IE/OR department. At Stanford, there are a lot of students from EE and ICME working with statistics professors. So my point is that, since many applicants already have a sufficient background in statistics, I think it is better for them to expand their horizons by taking courses in other closely related domains that would be useful for statistics research. Indeed, in programs like OR, IE and even EE, students can choose a statistics/machine learning track these days and take almost the same courses as statistics PhD students. Actually, I think their coursework is much more beneficial to those who want to do methodological work than that of a biostatistics program or a traditional statistics program that emphasizes sampling theory or design of experiments. I have just seen so many applicants who had finished graduate-level statistics courses but could not get into statistics PhD programs. I am just telling applicants not to restrict themselves to statistics programs.
  7. I personally believe competition for statistics is worse than for pure mathematics. This is because 1) there are many well-established mathematics departments, and their cohort size is usually larger than in statistics. For example, UPenn Wharton and Cornell only admit 4-5 Ph.D. students every year. Several top 20 statistics departments have very small incoming PhD classes. 2) Compared to the past, the current trend among international students is that many talented students from pure math switch careers to become statisticians or machine learning researchers. There is no huge difference between statistics and operations research, but my impression is that the latter tends to focus more on optimization methods. As optimization theory plays a central role in modern statistics, statistics PhD students would have to learn it anyway. Oftentimes the two are housed in the same department, as at UNC and Georgia Tech. Both schools have prominent statisticians and probabilists, but somehow I have the impression that statistics PhD applicants don't really consider Georgia Tech. Since there is no huge difference, I am telling applicants to consider a wide range of closely related programs. Besides the coursework, there is no huge difference between statistics and IE/EE if you end up doing research in statistics or ML. Departments like EE or IE have a larger number of incoming PhD students. Some of my friends got flat-out rejections from top 30 statistics PhD programs, while they got into several top 10 IE/EE PhD programs easily. And I believe there are many mathematics/applied mathematics/computational mathematics programs that are larger than statistics programs.
  8. As a person who experienced the whole application process last year and is currently attending one of the top biostat Ph.D. programs, I want to drop a piece of useful advice for anyone thinking of applying to Ph.D. programs in stat/cs/biostat/data science, etc. Please don't restrict yourself to stat Ph.D. programs. Consider closely related programs like Electrical Engineering, Operations Research, Industrial Engineering, Applied Math, and Computational Math, and contact the stat/ML professors of interest to ask whether they are willing/able to advise students outside their department. Because of the recent boom in big data/machine learning, the competition for Ph.D. programs in Statistics has been fiercer than in any other period. Even with a stellar record of mathematics courses and research experience, it is tough to crack the top 30 Ph.D. programs. Therefore, if you want to do research in statistics/machine learning, don't restrict yourself to statistics or biostatistics programs. In my current program, many EE, Applied Math, and Mechanical Engineering Ph.D. students are working with statistics/biostatistics professors. As long as you have a sufficient background in mathematics and coding, you can work in the field of statistics and machine learning. In fact, my honest advice would be to pick the Ph.D. program from which you could benefit more. For example, if you have already taken all the Ph.D.-level statistics courses, then having more exposure to CS courses or numerical methods/optimization/control theory courses would be better. I think Applied Math/Computational Math/Electrical Engineering/Industrial Engineering programs would expand one's horizons, and students would still be able to conduct research in theoretical statistics and machine learning. And for those who insist on applying to stat/biostat Ph.D. programs because you think this would be more beneficial for becoming a professor in stat/biostat, I want to say this is simply false.
Many stat/biostat departments have recently hired people from Electrical Engineering/Industrial Engineering and Applied Math. It is all about the research you are doing, not the department you are affiliated with. Having experienced the first semester in my program, I personally believe these programs actually equip students with more tools to develop methodologies compared to traditional stat or biostat programs.
  9. May I ask your opinion on Computational Mathematics PhD programs? Do you also agree with bayessays? Or if I work closely with a statistics professor and publish most of my papers in statistics journals, would the field in which I get my PhD no longer matter?
  10. Does the coursework also matter in faculty hiring? Or are you just saying that since both UW and JHU require measure-theoretic probability, compared to lower-ranked programs, they tend to produce students who often do theoretical work?
  11. To get a faculty position in Statistics or Biostatistics, is it generally better to have a PhD in Statistics? Or if one has a biostatistics PhD, is it easier to obtain a faculty or post-doc position in a biostatistics department compared to those who have a statistics PhD? Outside the top 5 departments in biostatistics (Harvard, UW, JHU, Michigan, UNC), I feel that it is very difficult to become a faculty member in statistics. So my question is: if one chooses a biostatistics program that is not one of those listed above, will one's career be limited mainly to non-top-tier biostatistics faculty positions? I just have the impression that the UPenn, Yale, and Columbia biostatistics departments are also very good, and they also have many renowned faculty who are cross-listed as stat/biostat faculty. If one is advised by renowned faculty, would that be better if one's ultimate goal is to stay in academia, hopefully at a top-tier research institute? I am personally attracted to the many renowned faculty at the Ivy League schools and to general school reputation over the US News ranking. For example, I know programs like NCSU and ISU are good in statistics, but I don't think they are by any means much better in terms of research than lower-ranked Ivy League biostatistics programs. Do NCSU or ISU actually have a better reputation when it comes to faculty hiring? In addition, how are newly designed computational math PhDs perceived? For example, you can still do statistics research at UPenn, UChicago, or Notre Dame in applied or computational math PhD programs. Are they also well regarded when it comes to faculty hiring in stat or biostatistics?
  12. I just want to ask those who got a waitlist e-mail from NCSU: did the e-mail explicitly say that you are on the waitlist? I received an e-mail today telling me my application is still in the review process. It further asked me if I would be interested in the MS degree option for the PhD degree. I am not sure if this track guarantees admission to the Ph.D. program. Is this something different from what other applicants received?
  13. Nowadays when I see job talks at my statistics department (top 30), I have the impression that a postdoc is kind of the norm for staying in academia. Since when did this become the norm? Plus, I just can't help being shocked by the number of papers in top-tier journals (Annals, JRSS-B, JASA, Biometrika, etc.) that each candidate has published. How could a person who has just finished a Ph.D. and spent 1-2 years as a postdoc produce at least 4-5 top-tier papers? (In fact, this year, I see the minimum is seven, at least in my program.) I am not denigrating anyone, but several professors I am aware of did not even have a paper published when they were hired as tenure-track assistant professors. Isn't this situation a bit insane? I think this does not stop at the faculty position level but extends even to doctoral applications. While I was talking with a person who graduated in 2013, I realized that this student was able to crack many top statistics Ph.D. programs with a weaker profile, while those who recently graduated could not even get any admission with a stronger profile. I just feel that a huge surplus of talented people who could do well in academia is being wasted because of the current situation. I do realize that many programs are trying to expand the size of their faculty, but the current situation is a bit overwhelming to me. Shouldn't there be a university-level expansion of statistics programs instead of designing so-called cash-cow Data Science programs? Is all of this due to limited funding?
  14. I think they only interview domestic applicants. Is it just me, or is this year's competition among international students crazy, given that domestic students' results are already out from multiple programs?