
Is Biostatistics becoming outdated in the industry, outside regulatory writing?



Are there Biostat titled jobs which actually use real and cool statistics and programming? I am an MS who graduated last year and I just cannot find them.

On the other hand, it looks like a good amount of DS positions do mention these techniques/tools: causal inference, time series, multivariate analysis, predictive modeling, Bayesian modeling, ML/DL. R/Python/Julia, PyTorch/TF, observational and unstructured data etc. 

Then on the other side, in Biostat, you see very boring things like SAPs, SAS, FDA/ICH guidelines, QC, experience in regulatory environments, more documentation, validation, trials.  A good 80-90% of this is non-technical. 
 

Having been in a Biostat job for a year, I hate it and want to get out. It’s a heavy amount of formal, boring writing. And oftentimes even slightly more involved analyses are rejected in favor of a dumb t-test or ANOVA. I see this Biostat field as dying and increasingly becoming a “regulatory monkey” role where the real work and advancement in the field depend not on one’s statistical or programming ability but on the ability to communicate with regulatory bodies. I kid you not, I saw a Principal Biostat profile which was like “verifying 100 SAP documents and checking consistency”.
 

I notice even where I am, even the bioinformaticians do more of the technical statistics. Why is it like this? I know this is not just a case of being at a bad place, I notice this consistently in LinkedIn job postings listing “Biostatistician”. Increasingly, it seems like the real statistics work is going to regular statistics, DS, other computational domain specialists (comp chemists, bioinfo, etc), and of course CS/EE. The exception is some of the biostat jobs on the bioinfo side, but those are mostly out since my program didn’t cover genomics and I know very little. 

In a different thread, it was mentioned that there is a stereotype for Biostatistics in the industry to be about regulatory stuff, trials, and SAS/SAP. I’m wondering how people get past this stereotype. How did it start in the first place?

Obviously I can apply for DS jobs too, but it’s super competitive and hard to get noticed. Which brings me to potentially going for a PhD in a computationally heavy field such as DS, ML, or bioinfo. Regular stat is an option too, although I am concerned there because I got B/B+s in my MS math stat courses and have not taken real analysis.

It would be ideal if I could land the DS jobs that use the actual statistical techniques heavily without a PhD, but I am having trouble being noticed. Sometimes the postings (in biotech) conveniently leave out biostats but for whatever reason will list regular stat, bioinfo, EE/CS. I have even applied for these anyway and then had a recruiter get back to me and say “Oh actually you will be a better fit for this Biostat validation position”. No thanks.
 

In my opinion, the industry perception of Biostats is absolutely not good if you want technical work. That being said, I know there are a handful of people who don’t want to be doing stats or programming all day and who like the regulatory, business side. But that is not me.


I don't have any first-hand experience working in big pharma, but I'm wondering if there is a difference between the jobs that are available for Biostat Masters holders vs. Biostat PhD holders? At your current workplace, do the employees with Biostatistics PhDs get to do more "stimulating," methodological/statistics work than regulatory writing? I know some Biostat PhD graduates who work as research scientists at Eli Lilly and Company. Is the type of work they do materially different from the work that gets done by Biostat MS holders?

Also, I have definitely seen that Biostatistics PhDs can go into data science, some at companies like Google, Amazon, etc. So I would think that Biostatistics graduates are not constrained to only working in biostatistician fields.


Every "biostatisticIAN" job is probably like this, yes. That's what the job consists of when you work in pharma for a lot of people. But you can do any of these other jobs with a biostatistics master's degree, so I don't think this line of thinking is very useful. Teach yourself a little SQL and Python and I'm sure you can find at least a data analyst job if not a data scientist job directly. If you do get a biostat PhD, and you know SQL and Python, you will definitely find a data science job even from a low-ranked program.

 

 


6 minutes ago, Stat Assistant Professor said:

I don't have any first-hand experience working in big pharma, but I'm wondering if there is a difference between the jobs that are available for Biostat Masters holders vs. Biostat PhD holders? At your current workplace, do the employees with Biostatistics PhDs get to do more "stimulating," methodological/statistics work than regulatory writing? I know some Biostat PhD graduates who work as research scientists at Eli Lilly and Company. Is the type of work they do materially different from the work that gets done by Biostat MS holders?

Also, I have definitely seen that Biostatistics PhDs can go into data science, some at companies like Google, Amazon, etc. So I would think that Biostatistics graduates are not constrained to only working in biostatistician fields.


Where I am, the PhDs in biostats are doing even more of it in fact. There is a data cleaning component which they *don’t* do but even that is far more fun than regulatory writing. 
 

PhDs in other quantitative computational fields though, even stuff like physics, are doing the actual statistical/ML algorithms work that goes into the omics analysis. I don’t know why this is. 
 

In other companies in pharma/med devices I have not seen it be much different. For the algorithms for drug discovery, imaging, and mining genomic data, they would rather have people with substantial domain knowledge and/or pro computational skills. Biostat is mostly regulatory monkey work in general. The writing has to be *on point* and I have been criticized for not being specific enough, and told it needs to be usable and understandable by any kind of auditor. This to me is not statistics, it’s business/law work.

FAANG does more real statistics in general than biotech. I largely do get the feeling biotech is not the place to be if you want to be doing real statistics. It seems like the value in biotech is largely getting a product past the FDA. Not the methodology. This makes sense when you think about it, if a product submission fails then the company cannot make $$$. Everything hinges mostly on that. 


2 minutes ago, bayessays said:

Every "biostatisticIAN" job is probably like this, yes. That's what the job consists of when you work in pharma for a lot of people. But you can do any of these other jobs with a biostatistics master's degree, so I don't think this line of thinking is very useful. Teach yourself a little SQL and Python and I'm sure you can find at least a data analyst job if not a data scientist job directly. If you do get a biostat PhD, and you know SQL and Python, you will definitely find a data science job even from a low-ranked program.

 

 

This was also what I was saying though: why does BIO“statistician” in industry have the name statistician at all, as opposed to regulatory specialist or product validator or something? It’s misleading. What is causing the huge disconnect between industry work in Biostatistics vs. academia work?
 

How come other fields, even within biotech, such as bioinformatics are creeping into “our” domain and doing more advanced statistics? This is what bothers me. It’s not merely explained by regulation, since bioinformatics is also in biotech, so it’s more apples to apples (vs. say comparing FAANG DS to Biostat in biotech).

I do know some Python although I am no expert. I find data wrangling in Python especially to be a pain, though libraries like scikit-learn and PyTorch are much nicer. For SQL, I have mostly used dbplyr in R.


Then apply to the bioinformatics position. I may be understanding incorrectly, so forgive me if I'm wrong, but it seems to me you are hung up on a word game where you need your job to match your degree name. Everyone I've ever met has told me this is what biostatisticians in industry do, so if you don't like it, you can use your degree to apply to jobs with other titles that you will enjoy more. I know people with biostat MS degrees who are consultants, data scientists, data analysts, business analysts, bioinformaticians, software engineers, and a few are even biostatisticians.

 

 


1 minute ago, bayessays said:

Then apply to the bioinformatics position. I may be understanding incorrectly, so forgive me if I'm wrong, but it seems to me you are hung up on a word game where you need your job to match your degree name. Everyone I've ever met has told me this is what biostatisticians in industry do, so if you don't like it, you can use your degree to apply to jobs with other titles that you will enjoy more. I know people with biostat MS degrees who are consultants, data scientists, data analysts, business analysts, bioinformaticians, software engineers, and a few are even biostatisticians.

 

 

Well yea, it is a word game but that does matter in terms of getting interview #1 for the other things. Why should they take a biostat grad over somebody who specialized in that domain and has developed more computational skills?
 

That being said, I have had a couple of interviews for DS. The part I struggle with is the computer science leetcode questions. I can answer the stat/ML questions fine, but it’s the general (non-ML) algorithmic thinking I never developed. I guess this can be practiced though. But it’s really hard to get interviews in the first place.

On the bioinformatics side, I am lacking domain knowledge. I think this is the bigger barrier there: stuff about different sequencing technologies like RNA-seq, NGS, qPCR, etc. I never learned omics. Domain knowledge is really important too, and this was neglected by my MS Biostat program. That is one reason I wonder if maybe tech could be better, because the CSey stuff can be self-learned but the deep domain knowledge is going to be harder to acquire outside a grad program. The thing is how to make my resume appealing to tech, because it’s really biotech oriented (and my undergrad was in a biotech-related field too, little did I know back then).

 


Honestly, it's a good sign you're getting interviews, and then a lot of it is luck. I'll say that almost everyone I know, including very smart people with master's degrees in machine learning, has had a hard time getting machine learning jobs because of the competition. And you'll greatly expand your options if you really learn Python and SQL well, which you don't need another degree for. Maybe get a couple of projects on GitHub too.


14 minutes ago, bayessays said:

Honestly, it's a good sign you're getting interviews, and then a lot of it is luck. I'll say that almost everyone I know, including very smart people with master's degrees in machine learning, has had a hard time getting machine learning jobs because of the competition. And you'll greatly expand your options if you really learn Python and SQL well, which you don't need another degree for. Maybe get a couple of projects on GitHub too.

Yea that is true. Good idea to post projects on GitHub, I actually just recently learned basic git for that. Been cleaning up some of my grad school analysis code and making it modular etc. to make it postable lol.

That is the thing, on the DS/ML end it can seem like infinite competition. On LinkedIn you can see 200+ applicants in a couple hours even sometimes. 

The differentiator nowadays seems to be heading toward domain knowledge, especially in biotech. For me, I do know some stuff about medical imaging, which is how I got one of my interviews (but then failed the leetcode). Way fewer jobs in med imaging than genomics though.

Edited by untzkatz

10 hours ago, untzkatz said:

FAANG does more real statistics in general than biotech.

I'm not sure this is fully true, as tech companies are really more focused on machine learning / AI than statistics IMO. I guess it depends on what you mean by "real" statistics. To me, real statistics is about quantification of uncertainty, and that is the primary difference between (bio/)statisticians and ML folk.

10 hours ago, untzkatz said:

I largely do get the feeling biotech is not the place to be if you want to be doing real statistics. It seems like the value in biotech is largely getting a product past the FDA. Not the methodology. This makes sense when you think about it, if a product submission fails then the company cannot make $$$. Everything hinges mostly on that. 

The regulatory constraint is actually more of a methodological interest than a limitation--how can we maximize power / the likelihood of approval subject to the analysis constraints that the FDA sets. Topics such as type I error / multiplicity become increasingly important in drug approval.

Moreover, the FDA has recently begun investigating the use of real world evidence (RWE) and Bayesian methods for drug approval. I agree that the regulatory constraint does limit one's creativity compared to, say, a tech company that can do whatever it wants. However, I strongly disagree that tech companies are more statistically rigorous than pharmaceutical companies--I think the contrary is true. I foresee that the FDA will allow more creativity in its analysis, particularly in observational data settings for the long-term safety / efficacy of medical products.

10 hours ago, untzkatz said:

In other companies in pharma/med devices I have not seen it be much different. For the algorithms for drug discovery, imaging, and mining genomic data, they would rather have people with substantial domain knowledge and/or pro computational skills. Biostat is mostly regulatory monkey work in general. The writing has to be *on point* and I have been criticized for not being specific enough, and told it needs to be usable and understandable by any kind of auditor. This to me is not statistics, it’s business/law work.

To me, it seems that you do not have a good idea of what it means to be a statistician (forgive me, I am not trying to criticize you here). Being rigorous mathematically and statistically and actually thinking about the data / problems that could arise from it is what fundamentally sets statisticians apart from our data science counterparts. We are very concerned about assumptions in the data, what could possibly go wrong, how missing data could influence inference on the treatment effect, etc.

I agree with you that if all you'd like to do is conduct data analysis, perhaps biostatistics is not a good fit for you. Conversely, getting a PhD would likely increase the flexibility you have with your work.

Finally, while tech companies have, thus far, been unregulated, it's not clear that the future will be the same as the present. As more people are becoming concerned with data privacy and security, I think it becomes more likely that tech companies will be regulated, which would likely put them in a similar situation as drug / finance companies.


1 hour ago, StatsG0d said:

I'm not sure this is fully true, as tech companies are really more focused on machine learning / AI than statistics IMO. I guess it depends on what you mean by "real" statistics. To me, real statistics is about quantification of uncertainty, and that is the primary difference between (bio/)statisticians and ML folk.

The regulatory constraint is actually more of a methodological interest than a limitation--how can we maximize power / the likelihood of approval subject to the analysis constraints that the FDA sets. Topics such as type I error / multiplicity become increasingly important in drug approval.

Moreover, the FDA has recently begun investigating the use of real world evidence (RWE) and Bayesian methods for drug approval. I agree that the regulatory constraint does limit one's creativity compared to, say, a tech company that can do whatever it wants. However, I strongly disagree that tech companies are more statistically rigorous than pharmaceutical companies--I think the contrary is true. I foresee that the FDA will allow more creativity in its analysis, particularly in observational data settings for the long-term safety / efficacy of medical products.

To me, it seems that you do not have a good idea of what it means to be a statistician (forgive me, I am not trying to criticize you here). Being rigorous mathematically and statistically and actually thinking about the data / problems that could arise from it is what fundamentally sets statisticians apart from our data science counterparts. We are very concerned about assumptions in the data, what could possibly go wrong, how missing data could influence inference on the treatment effect, etc.

I agree with you that if all you'd like to do is conduct data analysis, perhaps biostatistics is not a good fit for you. Conversely, getting a PhD would likely increase the flexibility you have with your work.

Finally, while tech companies have, thus far, been unregulated, it's not clear that the future will be the same as the present. As more people are becoming concerned with data privacy and security, I think it becomes more likely that tech companies will be regulated, which would likely put them in a similar situation as drug / finance companies.


To me ML=statistics, and after taking 2 ML courses in the stats department during my MS I am convinced lol.
 

The rigorous stats you are referring to, like missing data, don’t really come up in MS-level biostat, I think. And people are conservative when it does anyway: they often just drop it and don’t do fancy imputation. Power and sample size calcs come up quite a bit but they are very straightforward simulations.
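
For a sense of what a "straightforward" simulated power calculation looks like, here is a minimal sketch in Python. It assumes a two-arm study with normally distributed outcomes and uses a large-sample z-test in place of the t-test so that it runs on the standard library alone. The function name `simulated_power` and all of the numbers are made up for illustration, not taken from any real SAP.

```python
import random
from statistics import NormalDist


def simulated_power(n_per_arm, effect, sd=1.0, alpha=0.05, n_sims=2000, seed=0):
    """Monte Carlo power of a two-sided, two-sample z-test (normal approximation)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    rejections = 0
    for _ in range(n_sims):
        # Simulate one hypothetical trial: control arm and treated arm
        control = [rng.gauss(0.0, sd) for _ in range(n_per_arm)]
        treated = [rng.gauss(effect, sd) for _ in range(n_per_arm)]
        mean_c = sum(control) / n_per_arm
        mean_t = sum(treated) / n_per_arm
        # Unbiased sample variance for each arm
        var_c = sum((x - mean_c) ** 2 for x in control) / (n_per_arm - 1)
        var_t = sum((x - mean_t) ** 2 for x in treated) / (n_per_arm - 1)
        z = (mean_t - mean_c) / ((var_c + var_t) / n_per_arm) ** 0.5
        if abs(z) > z_crit:
            rejections += 1
    # Power estimate = fraction of simulated trials that reject the null
    return rejections / n_sims


# Hypothetical scenario: 64 subjects per arm, true difference of half an SD.
print(simulated_power(n_per_arm=64, effect=0.5))
```

With these inputs the estimate should land roughly around the textbook 80% power for that design. Real trial simulations add dropout, interim looks, multiplicity and so on, which is where it stops being this simple.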

Also, statistics is not just hypothesis testing and uncertainty quantification to me. I think this is a misconception. Or rather, the Biostat work does not really involve advanced methods for this. Causal inference, for example, is rarely done by industry pharma/med device biostatisticians, but it is done by data scientists in tech. Causal inference on observational data is a good example of rigorous stats that is missing in industry biostatistics. You mention RWE observational data but I have not seen industry Biostat positions where one can focus solely on analyzing that data. I see RWE mentioned more in DS positions again.
 

I never said anything about not checking assumptions and so on, in fact, there have been multiple occasions where an analysis I proposed was better on this basis but they still wanted the simple one because it was in the FDA guideline and they don’t like going against the grain, regardless of the mathematical or statistical justification. 

 

For example, take a medical image or molecular-level data. This is where the interesting work is, and you can go deep into the mathematics of, for example, statistical signal processing (if you don’t want to do ML, and want to stay within classical) and extract features, but biostatisticians don’t do this either. They are much more product facing. Instead we have bioinformaticians and comp chemists, for example, doing that other more discovery-related stuff. And there is far more actual deep statistics in that than there is in a t-test or ANOVA showing that A was better than B. Sometimes I get lucky and I can bootstrap, but even that isn’t exactly exciting. Bootstrap is probably the most “complex” technique I have used.
 

And then there is study design, where the majority of the work is the boring planning phase and writing even more documentation. Maybe there’s some simulated sample size calculation, but beyond that simulation there isn’t much. Or what about classical time series analysis on wearable device data? Nope, that doesn’t seem to happen in biostat either. Even within classical stats, I have not seen the advanced stuff come up besides some designed-experiment GLMM. Within non-ML, things like mixture models, the EM algorithm, MCMC, Fourier transforms, and Gaussian processes come up more in DS than stats. I did try to use Bayesian once but they were opposed, feeling it complicates the documentation.

As for ML, there is a lot more statistics here. Take a variational autoencoder for example. The theory behind this is very statistical and involves probability theory/KL divergence etc.
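
As one concrete piece of that theory, the KL term in the usual VAE objective has a closed form when the encoder outputs a diagonal Gaussian and the prior is standard normal. A minimal sketch, assuming exactly that setup (the function name `kl_to_standard_normal` is just illustrative, and this is plain Python rather than PyTorch):

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    # Per dimension: 0.5 * (mu^2 + sigma^2 - log(sigma^2) - 1)
    return 0.5 * sum(
        m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var)
    )


# An encoder that already outputs the prior pays no KL penalty.
print(kl_to_standard_normal([0.0, 0.0], [0.0, 0.0]))  # prints 0.0
```

Shifting a unit-variance latent's mean to 1 costs 0.5 nats, so the term grows as the approximate posterior moves away from the prior.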

You are right, I just wanna dig in and start analyzing the data and focus on the methods. But that to me is the statistics, not documentation and regulatory aspects. A lot of people don’t like data cleaning, and while I am not a huge fan, I have enjoyed even the data wrangling/cleaning aspects far more than the regulatory documentation. At least data cleaning has the programming aspect and can be like a puzzle to solve.

Edited by untzkatz

9 minutes ago, untzkatz said:

The rigorous stats you are referring to, like missing data, don’t really come up in MS-level biostat, I think. And people are conservative when it does anyway: they often just drop it and don’t do fancy imputation. Power and sample size calcs come up quite a bit but they are very straightforward simulations.

Power and sample size can become quite sophisticated (see, e.g., the literature on probability of success / assurance / Bayesian power). I'm surprised to hear about imputation not being done, though. I imagine that would raise the eyebrows of regulators if there are a lot of missing data. I don't know the size of the place you're at, but there are many people working in pharma who are extending clinical and regulatory science. Sounds like you might be at a smaller place or a place that doesn't encourage such work?

13 minutes ago, untzkatz said:

Also, statistics is not just hypothesis testing and uncertainty quantification to me. I think this is a misconception. Or rather, the Biostat work does not really involve advanced methods for this.

Forgive me if I implied this. My point was that uncertainty quantification is what sets statistics apart from ML. ML, at its heart, is mostly about optimization (I know there are many probabilistic algorithms etc., but the vast majority of algorithms are deterministic and focused on separation, etc.). If you're interested, there's a great presentation by Lisa LaVange (former head of biostats at the FDA, current chair of Biostats at UNC) talking about some initiatives under way at the FDA.

16 minutes ago, untzkatz said:

Causal inference, for example, is rarely done by industry pharma/med device biostatisticians, but it is done by data scientists in tech. Causal inference on observational data is a good example of rigorous stats that is missing in industry biostatistics. You mention RWE observational data but I have not seen industry Biostat positions where one can focus solely on analyzing that data. I see RWE mentioned more in DS positions again.

This is verifiably false. Here are several job postings from pharma companies / CROs dealing with the analysis of RWE


You may or may not find the position you're looking for. To be frank, a lot of problems in industry just don't need the latest cutting-edge methods or complicated simulations. There is a reason why tools like linear regression and the two sample t-test have been around forever - they are quick and easy, and they work. 

Many years ago I was talking to a PhD data scientist at a FAANG company who was doing A/B testing. I'm pretty well-versed in experimental design and assumed they would be using the latest and greatest computer-generated designs. Turns out their bread-and-butter technique was the full two-level factorial design analyzed using standard ANOVA, something a competent undergraduate could probably do. This was probably 7 years ago so things may have changed... but maybe not because they seemed really happy with their results.
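
To make the point concrete, a full two-level factorial is small enough to sketch in a few lines of Python. This is only an illustration under made-up assumptions: the factor names and the response formula are hypothetical, and it estimates main effects directly rather than running the full ANOVA.

```python
from itertools import product


def full_factorial(factors):
    """All 2^k runs, each factor coded -1 (low) or +1 (high)."""
    return [
        dict(zip(factors, levels))
        for levels in product((-1, 1), repeat=len(factors))
    ]


def main_effects(design, responses):
    """Main effect = mean response at the high level minus mean at the low level."""
    effects = {}
    for factor in design[0]:
        high = [y for run, y in zip(design, responses) if run[factor] == 1]
        low = [y for run, y in zip(design, responses) if run[factor] == -1]
        effects[factor] = sum(high) / len(high) - sum(low) / len(low)
    return effects


# Hypothetical A/B-style experiment with three on/off page changes.
design = full_factorial(["button_color", "headline", "layout"])
# Made-up, noise-free response with one interaction, for illustration only.
responses = [
    10 + 2 * r["button_color"] - 1 * r["headline"]
    + 0.5 * r["button_color"] * r["headline"]
    for r in design
]
print(main_effects(design, responses))
```

Because the design is balanced, the interaction averages out of each main effect, so the estimates come out as twice the coded coefficients (4, -2 and 0 here). That orthogonality is a large part of why such a simple design keeps working.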

Your best bet is to learn as much coding as possible (R + Python) in your free time. A PhD in Stats would be good although it's probably going to be a grind. I'm not sure how much your MS in Biostats will get you if you start fresh at a PhD Stats program. You'll also have to consider 5+ years at low pay, no benefits like 401k, missed raises/promotions you would've gotten in industry... but that's a personal decision. Financially the PhD may not be the clear winner at all in your case.

Another path you can consider is perhaps sticking it out a few years and maybe getting an MBA later, if the management track would ever be of interest to you.

 

 


22 minutes ago, StatsG0d said:

Power and sample size can become quite sophisticated (see, e.g., the literature on probability of success / assurance / Bayesian power). I'm surprised to hear about imputation not being done, though. I imagine that would raise the eyebrows of regulators if there are a lot of missing data. I don't know the size of the place you're at, but there are many people working in pharma who are extending clinical and regulatory science. Sounds like you might be at a smaller place or a place that doesn't encourage such work?

Forgive me if I implied this. My point was that uncertainty quantification is what sets statistics apart from ML. ML, at its heart, is mostly about optimization (I know there are many probabilistic algorithms etc., but the vast majority of algorithms are deterministic and focused on separation, etc.). If you're interested, there's a great presentation by Lisa LaVange (former head of biostats at the FDA, current chair of Biostats at UNC) talking about some initiatives under way at the FDA.

This is verifiably false. Here are several job postings from pharma companies / CROs dealing with the analysis of RWE

Thanks for the links. The Cytel one looks like it uses SAS and nothing cutting edge is being done in there lol, even a log transform is insanely cumbersome vs R/Python/Julia. But these are interesting otherwise especially the Harnham one is right up my alley, though it says Senior DS and wants a PhD. Rest seems mostly director level. The PhD seems to be a big barrier and I am 26 so am getting older. I regret not doing it earlier, as it seems with an MS you mostly get all the boring work especially in biotech. Biotech seems to value the PhD status a ton. 

Maybe at the end of the day a job is a job and I will just have to do the advanced stat/ML/AI stuff as a side hobby if I never get a PhD. One of my biostat profs suggested maybe getting an MS in DS or ML from a reputable school and then seeing what happens after, but I suspect it’ll lead to the same problems, as even that field demands a PhD now. I don’t want to be doing ML Engineering either; I want to be doing statistical ML.
 

ISLR/ESL are on ML though, and these are written by statisticians. I don’t think uncertainty quantification is necessary for something to be statistics. If you have a complex observational dataset and you don’t approximate the function correctly (model misspecification), the 2nd order things like SEs/p-values are not going to be accurate anyway. Predictive scores are important even for inferential purposes now according to some classical statisticians, like Max Kuhn, the R tidymodels author, who describes an example of inaccurate inferential results when this isn’t done: https://www.tmwr.org/performance.html. Nowadays people are even combining ML with classical statistics in things like SuperLearner by Mark VDL and Doubly Robust methods. That is the sort of stuff I find really cool. Seems it’s all mostly academia though.

But yea anyways my company doesn’t seem to encourage innovation. It’s all about just moving the business forward. It’s mid-size, trying to scale up further, and it’s going to be even more regulatory work going forward. When I first started a year ago, I had more freedom and did more internal data analysis rather than work for product submission/FDA, but in the last 5 months it has changed a lot and they even say “we are becoming more like biostatisticians in more established companies so more regulated”. I think I wouldn’t like an MBA either since it’s again more business oriented, and I was never interested in management. I am primarily interested in data analysis and the stat/ML methods. But it seems I don’t have the PhD gold star for this work.

 


48 minutes ago, untzkatz said:

But yea anyways my company doesn’t seem to encourage innovation. It’s all about just moving the business forward

Even if you get a PhD and get a data science job at Facebook, most of your job is going to be wrangling data and seeing if app downloads went up or down. If you want to do cool statistics stuff, you will have to do it as a hobby or go back to academia. I understand your dilemma, as this was disappointing to me too, but it's the reality of the job market. 26 isn't old. If you would enjoy a PhD, you have plenty of time to get one. But if you're just doing it for job reasons, you may find that the other side is not what you expect, and it would be a shame to do a PhD in that case if you're not enjoying the process.


1 hour ago, bayessays said:

Even if you get a PhD and get a data science job at Facebook, most of your job is going to be wrangling data and seeing if app downloads went up or down. If you want to do cool statistics stuff, you will have to do it as a hobby or go back to academia. I understand your dilemma, as this was disappointing to me too, but it's the reality of the job market. 26 isn't old. If you would enjoy a PhD, you have plenty of time to get one. But if you're just doing it for job reasons, you may find that the other side is not what you expect, and it would be a shame to do a PhD in that case if you're not enjoying the process.

26 itself isn’t old to be in the middle of a PhD already, but I see it as kind of old to start: assuming it takes 6 years (and given I’ll have to apply this coming Fall), I would be around 33 after graduation. A lot of people are starting to settle down at just about my age. And yea, agreed, it is a big consideration. But it sounds like the research scientist jobs in FAANG need one. Though I probably wouldn’t want to work for FB, but that is more for my own reasons like not being into social media lol.

Wrangling data is tedious at times but it’s still better than writing regulatory reports to the FDA and documentation imo. Tidyverse makes it a lot easier if it’s structured data.

I think I would enjoy the PhD, provided it has a good mix of modern and computational topics. Wouldn’t want something where it’s mostly dry math-stats and asymptotics. NYU’s DS PhD program looks really interesting to me though. And they have cool research too, including a bunch of biomed imaging people. It’s probably still really hard to get in, but maybe it’s easier than top Stat programs.

Edited by untzkatz

You can find a program that's shorter and has fewer class requirements.  I sent out my last applications just before turning 28, and I plan to finish at 32, but obviously it's still an opportunity cost.  There are a lot of cool programs to consider if you're not going to become a statistics professor anyways.  Check out some more applied/computational or data science programs that may have fewer course requirements and thus shorter degrees.  University of Vermont has a complex systems/data science PhD that always looked super cool to me: https://vermontcomplexsystems.org/  


1 hour ago, untzkatz said:

I think I would enjoy the PhD, provided it has a good mix of modern and computational topics. I wouldn’t want something that is mostly dry math-stats and asymptotics. NYU’s DS PhD program looks really interesting to me though. They have cool research too, including a bunch of biomed imaging people. It’s probably still really hard to get into, but maybe it’s easier than top Stat programs.

NYU DS admissions tend to focus more on research track record. If you go the mathematical maturity route instead, you have to be extremely mathematically mature.

Just a note: NYU DS is very computational, NLP and neural network type stuff (or super theoretical ML/computational theory). This is very different from your standard Biostats PhD/Stats PhD track.

 

Edited by trynagetby

11 minutes ago, trynagetby said:

NYU DS admissions tend to focus more on research track record. If you go the mathematical maturity route instead, you have to be extremely mathematically mature.

Just a note: NYU DS is very computational, NLP and neural network type stuff (or super theoretical ML/computational theory). This is very different from your standard Biostats PhD/Stats PhD track.

Oh I see, well I did do medical imaging related biostat research in my MS. It was interdisciplinary and I got 1 applied paper in a well known MRI journal, although it was more in applied classical stats. And that is the sort of stuff I want to do, involving DL/ML and imaging data. I don’t want to do vanilla biostats stuff like survival analysis lol. Even in survival, people nowadays are analyzing full images and using survival loss functions in DL.


2 hours ago, untzkatz said:

26 itself isn’t old to be in the middle of a PhD already, but I see it as kind of old to start. Assuming it takes 6 years (and given I’ll have to apply this coming Fall), I would be around 33 at graduation. A lot of people are starting to settle down just about now. And yeah, agreed, it is a big consideration. But it sounds like the research scientist jobs in FAANG need one. Though I probably wouldn’t want to work for FB, but that is more for my own reasons, like not being into social media lol.

Wrangling data is tedious at times but it’s still better than writing regulatory reports to the FDA and documentation imo. Tidyverse makes it a lot easier if it’s structured data.

I think I would enjoy the PhD, provided it has a good mix of modern and computational topics. I wouldn’t want something that is mostly dry math-stats and asymptotics. NYU’s DS PhD program looks really interesting to me though. They have cool research too, including a bunch of biomed imaging people. It’s probably still really hard to get into, but maybe it’s easier than top Stat programs.

I think you probably have misconceptions about what a PhD involves. At the PhD level, you dig deep into a particular area and conduct original research, which requires a deep understanding of mathematical/statistical theory. If you don't have very strong mathematical skills, you are gonna have a hard time in your PhD coursework such as probability theory/inference (sorry, I don't mean to scare you), let alone making a breakthrough in research. Given your B/B+ in undergraduate math/stat courses, a question you wanna answer is whether you are confident of doing well in real analysis and other proof-based courses, which are much more challenging than the math courses you have taken.


3 minutes ago, Casorati said:

I think you probably have misconceptions about what a PhD involves. At the PhD level, you dig deep into a particular area and conduct original research, which requires a deep understanding of mathematical/statistical theory. If you don't have very strong mathematical skills, you are gonna have a hard time in your PhD coursework such as probability theory/inference (sorry, I don't mean to scare you), let alone making a breakthrough in research. Given your B/B+ in undergraduate math/stat courses, a question you wanna answer is whether you are confident of doing well in real analysis and other proof-based courses, which are much more challenging than the math courses you have taken.

B/B+ was in graduate MS level math stat classes, not undergrad. It’s the classes taken by MS students and by 1st year PhD students who need to review the MS level material before doing PhD level inference courses. We used Casella and Berger.  My undergrad was in a different, biotech related field.

The highest undergrad math course I have taken is upper division linear algebra but I also got a B+ there, never did real analysis. I did struggle in the MS level math stat asymptotic theory type proofs. I got As in the computational courses (comp stats and 2 ML classes) though. How important is the statistical inference asymptotic type proof stuff for going into ML/DL? 

I wonder if maybe a DS program would be better for this reason because it is more applied and would go straight into the more modern statistical areas and not have to bother with regular math stats again. I hated the asymptotic theory stuff as it had very little application (in the end you just throw it into a Wald Test or Bootstrap anyways). 


48 minutes ago, untzkatz said:

The highest undergrad math course I have taken is upper division linear algebra but I also got a B+ there, never did real analysis. I did struggle in the MS level math stat asymptotic theory type proofs. I got As in the computational courses (comp stats and 2 ML classes) though. How important is the statistical inference asymptotic type proof stuff for going into ML/DL? 

If you're doing research, I think it's crucial. In general, you want to prove some asymptotic properties of your method, whether that it's consistent, asymptotically normal, or what have you.

55 minutes ago, untzkatz said:

I wonder if maybe a DS program would be better for this reason because it is more applied and would go straight into the more modern statistical areas and not have to bother with regular math stats again.

The NYU data science PhD program requires a sequence in probability and statistics. One component of the course is convergence, so these topics will definitely come up again.

57 minutes ago, untzkatz said:

I hated the asymptotic theory stuff as it had very little application (in the end you just throw it into a Wald Test or Bootstrap anyways). 

Stuff like this is, in my opinion, somewhat insulting. I realize that you are not intending it to be so, but note that many of us (but not me) on here have devoted our entire careers to asymptotics and they are important. Showing your estimator is consistent and/or asymptotically normal is really important. You mentioned earlier causal inference, well this is basically what causal inference people do--here's a new estimator that's unbiased / consistent and asymptotically normal. 
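To make the asymptotic normality point concrete, here is a minimal simulation sketch. It is only an illustration under stated assumptions: the data are taken to be exponential (skewed on purpose), the estimator is the plain sample mean, and the function name `wald_coverage` is made up for this example. It uses only the Python standard library and checks how often a nominal 95% Wald interval actually covers the true mean as the sample size grows.

```python
import math
import random
import statistics


def wald_coverage(n, reps=2000, true_mean=1.0, seed=0):
    """Fraction of nominal 95% Wald intervals that cover true_mean.

    Each replicate draws n Exponential observations with mean true_mean.
    The interval est +/- 1.96 * se relies on asymptotic normality of the
    sample mean, so it is only approximately valid for small n.
    """
    rng = random.Random(seed)
    z = 1.96  # approximate 97.5% quantile of the standard normal
    hits = 0
    for _ in range(reps):
        x = [rng.expovariate(1.0 / true_mean) for _ in range(n)]
        est = statistics.fmean(x)
        se = statistics.stdev(x) / math.sqrt(n)
        if est - z * se <= true_mean <= est + z * se:
            hits += 1
    return hits / reps


# Coverage for a small, a moderate, and a large sample size.
coverage = {n: wald_coverage(n) for n in (5, 30, 200)}
for n, c in coverage.items():
    print(f"n={n:>3}  empirical coverage={c:.3f}")
```

With skewed data the small-sample interval should undercover noticeably, and coverage should move toward the nominal 95% as n increases. That is the practical content of an asymptotic normality result: it tells you when the Wald interval you "just throw it into" can be trusted.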

It's fine if you feel like biostatistics / statistics isn't for you, but you should not come on a statistics forum and basically put down everything and say it's "outdated" or "boring". 


1 hour ago, untzkatz said:

B/B+ was in graduate MS level math stat classes, not undergrad. It’s the classes taken by MS students and by 1st year PhD students who need to review the MS level material before doing PhD level inference courses. We used Casella and Berger.  My undergrad was in a different, biotech related field.

The highest undergrad math course I have taken is upper division linear algebra but I also got a B+ there, never did real analysis. I did struggle in the MS level math stat asymptotic theory type proofs. I got As in the computational courses (comp stats and 2 ML classes) though. How important is the statistical inference asymptotic type proof stuff for going into ML/DL? 

I wonder if maybe a DS program would be better for this reason because it is more applied and would go straight into the more modern statistical areas and not have to bother with regular math stats again. I hated the asymptotic theory stuff as it had very little application (in the end you just throw it into a Wald Test or Bootstrap anyways). 

Consistent B range grades in graduate level courses are even more concerning, given the grade inflation in master's programs. In this case I would say that you are probably not a good fit for a PhD in quantitative disciplines like biostat/stat/ds/ml, where the math is a lot deeper. Understanding asymptotic theory is crucial to conducting higher level ML research, either applied or theoretical. At the PhD level, even applied research needs a solid mathematical foundation. Asymptotic theory is one of the most fundamental elements in statistics, and many if not all research areas are based on it.

Edited by Casorati

29 minutes ago, Casorati said:

Consistent B range grades in graduate level courses are more concerning, given the grade inflation in master's programs. In this case I would say that you are probably not a good fit for a PhD in quantitative disciplines like math/stat/ds/ml. Understanding asymptotic theory is crucial to conducting higher level ML research, either applied or theoretical. At the PhD level, even applied research needs a solid mathematical foundation. Asymptotic theory is one of the most fundamental elements in statistics, and many if not all research areas are based on it.

My stats/biostats program in grad school didn’t have this grade inflation. It was graded on curves, more like undergrad courses would be. Actually, many Americans in particular got similar scores to mine; the international Chinese students (who were like 90+% of the dept, which I think isn’t uncommon) set the high bar. There were classes where I did decently well and then got screwed at the last minute by the Final Exam curve. Some of these international students had done things like Quadratic Forms way back in HS, and a lot of the MS math stat courses were just review for them.
 

35 minutes ago, StatsG0d said:

If you're doing research, I think it's crucial. In general, you want to prove some asymptotic properties of your method, whether that it's consistent, asymptotically normal, or what have you.

The NYU data science PhD program requires a sequence in probability and statistics. One component of the course is convergence, so these topics will definitely come up again.

Stuff like this is, in my opinion, somewhat insulting. I realize that you are not intending it to be so, but note that many of us (but not me) on here have devoted our entire careers to asymptotics and they are important. Showing your estimator is consistent and/or asymptotically normal is really important. You mentioned earlier causal inference, well this is basically what causal inference people do--here's a new estimator that's unbiased / consistent and asymptotically normal. 

It's fine if you feel like biostatistics / statistics isn't for you, but you should not come on a statistics forum and basically put down everything and say it's "outdated" or "boring". 

Ok maybe I should rephrase. I like the computational aspects of all of these tools. I have always liked implementing things like GLMs, EM, gradient descent, doubly robust etc in code. 

From a practical standpoint, when you, say, fit a GLM and do data analysis, are you *actively* thinking about asymptotics? Usually you do some visual checks of deviance residuals, check assumptions like independence based on the design, and consider things like the bootstrap (or, if there is dependence, some clustered resampling) if various assumptions aren’t met (or sometimes even if they are). I guess this is indirectly related to asymptotics, but it’s not like a proof, more of a check.
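For anyone who has not seen the clustered resampling mentioned above, here is a rough sketch of the idea. Everything in it is an assumption made for illustration: the simulated data, the helper names `make_clusters` and `bootstrap_ci`, and the choice of the mean as the statistic. It uses only the Python standard library and resamples whole clusters (for example, all measurements from one patient) instead of single observations.

```python
import random
import statistics


def make_clusters(n_clusters=30, size=5, seed=1):
    """Simulate clustered data: a shared cluster effect plus small noise."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_clusters):
        effect = rng.gauss(0.0, 1.0)  # shared by every member of the cluster
        data.append([effect + rng.gauss(0.0, 0.2) for _ in range(size)])
    return data


def bootstrap_ci(clusters, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval that resamples whole clusters.

    clusters is a list of lists. With one observation per inner list this
    reduces to the ordinary bootstrap that treats observations as i.i.d.
    """
    rng = random.Random(seed)
    k = len(clusters)
    reps = []
    for _ in range(n_boot):
        sample = rng.choices(clusters, k=k)  # draw clusters with replacement
        reps.append(stat([v for c in sample for v in c]))
    reps.sort()
    lo = reps[int((alpha / 2) * n_boot)]
    hi = reps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


data = make_clusters()
# Naive version: pretend every observation is its own cluster.
naive = bootstrap_ci([[v] for c in data for v in c], statistics.fmean)
clustered = bootstrap_ci(data, statistics.fmean)
print("naive interval:    ", naive)
print("clustered interval:", clustered)
```

Because observations within a cluster are strongly correlated in this simulation, the naive interval should come out too narrow compared with the clustered one. The same resampling scheme could wrap a GLM fit in place of the mean, at the cost of refitting the model on every bootstrap replicate.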

Idk, maybe I just like data analysis and implementing statistical algorithms, but it sounds like that isn’t really what a PhD is about either. If that is the case, maybe it’s better to just look for DS jobs that involve more of it and improve my programming skills so that I can write production code?

It sounds like it’s hard to find something where you are just doing the data analysis component, except maybe in academia. So I am also considering just going back to academia as an MS level biostatistician, where it is more real biostats without the regulatory stuff. It doesn’t pay great, but I am considering working at something like a cancer center doing imaging data analysis. That could help with PhD apps and also let me see if I like that stuff more.

Edited by untzkatz

45 minutes ago, untzkatz said:

Actually, many Americans in particular got similar scores to mine; the international Chinese students (who were like 90+% of the dept, which I think isn’t uncommon) set the high bar. There were classes where I did decently well and then got screwed at the last minute by the Final Exam curve. Some of these international students had done things like Quadratic Forms way back in HS, and a lot of the MS math stat courses were just review for them.

This is not the correct mindset for pursuing a PhD. If you wanna pursue a PhD, you should strive to stand out in the program, and comparing yourself to others who don't do well won't help. PhD programs, especially the elite ones, select the strongest candidates from all over the world, so the stakes are high, and they don't care when you learnt quadratic forms. If you are motivated, you would've self-learnt it, or if you are smart, you could fill in the gap very quickly. You just have to prove you are capable of doing a PhD by proving your mathematical abilities, usually through high grades in proof-based courses such as real analysis/mathematical statistics/measure theory. Grade inflation or not, B/B+ in core math/stat courses looks unimpressive to PhD admissions anyway. If you have occasional bad grades like B/B+, that might be ok, but the majority of your grades should be A's.

45 minutes ago, untzkatz said:

From a practical standpoint, when you, say, fit a GLM and do data analysis, are you *actively* thinking about asymptotics? Usually you do some visual checks of deviance residuals, check assumptions like independence based on the design, and consider things like the bootstrap (or, if there is dependence, some clustered resampling) if various assumptions aren’t met (or sometimes even if they are). I guess this is indirectly related to asymptotics, but it’s not like a proof, more of a check.

Those model checks are established knowledge and quite routine, and they can be done by master's students or even undergraduates. What sets master's and PhD students apart is the ability to conduct original research. For example, at the PhD level you could propose new methods for improving the performance of GLMs and prove that they work through mathematical proofs.

Edited by Casorati