untzkatz's Content - The GradCafe Forums

Big tech data scientist vs. big pharma biostatistician

untzkatz replied to lxzqw136's topic in Mathematics and Statistics

I am in pharma and I am leaving soon because of the things you listed, because they weren't good fits. I did manage to recently find a more Bioinformatics DS drug discovery type focused position, and they are willing to train me on the domain knowledge thankfully, also because probably my undergrad background is also in a biomedical field so I wasn't a pure biostat/stat person. The TLDR is if you really really want to do stats, data analysis, and modeling focused work, then Biostats in pharma will be disappointing. You don't have to go to big tech though, you can look for something like this too in other titled position. I had gotten feedback on a similar topic weeks ago and I meant to update that but after a whole month of interviews, I managed to find this as an MS. The interviews for diff DS positions were a mix of presenting data analyses I had already done (I used grad school stuff for these), take-home data analysis, data wrangling tests on coderpad, and leetcode type qs. I bombed the leetcode ones but I passed the ones that had the other 3. And then made the decision, hopefully the right one, on one which seemed more analysis focused on biochemical data. One of the others was academia (which had lower pay) and then another I got the vibe during subsequent interviews it was more DE focused despite claiming to do causal inference and ML. Ironically, Biostatistics is a good fit for people who want occasional simple statistics and more focus on writing, communication, FDA/regulatory stuff. If you want to use more statistical methods and have it focused mostly on programming, modeling, etc though then DS, ML engineer are better. For me, what drew me to biostatistics major was the data analysis and modeling, so it turned out that being in pharma in a Biostat *title* was an extremely poor fit for this. I've hated writing ever since middle school and the documentation was painful and stressful for me more than learning advanced programming, ML/DL, and data analysis. 
I think often times there is a common misconception in school that STEM is "harder" than humanities, social science, writing etc and there are definitely people for whom its the opposite and this side is fortunately or unfortunately depending on the person a major part of biostatistics titled jobs. You have to want to get better at it and improve over time to be successful in Biostatistics, and it's something I had near 0 interest in my entire life. It is true you won't be competing with younger people who could be sharper technically when you get older though I think most likely you will have to pick 1 and then see how you like it in the 1st year. This is how I have ruled out all Biostatistics jobs in the future for me.

Sending GRE scores before applying

untzkatz replied to csheehan10's topic in Mathematics and Statistics

Related to this, if your GRE is expiring this November, does sending it earlier, even if you turn in the app itself later but by the Dec 1 deadline make the scores valid?

Is Biostatistics becoming outdated in the industry, outside regulatory writing?

untzkatz replied to untzkatz's topic in Mathematics and Statistics

Damn, so the job description really is deceiving. I guess that's not too surprising considering how hyped up DS is. In regards to transitioning out of biotech, one issue is that my undergrad was in BE. So compared to other Biostat people who came from stat/math, I feel a bit more "holed into" this industry. All of these Bio-X fields seem to suffer from this. My whole resume is projects that are biomedical stats related; even the research experience I got in grad school was doing stats for a lab in BE (hence the work with imaging data). I actually like the biomedical stuff, but in industry there can be too much red tape. Some good news (fingers crossed): recently I had a non-clinical Biostat interview for a biotech company (a recruiter had referred me; I hadn't applied, but I figured I'd just see what it's about), and I made it clear to the team that I didn't like the regulatory writing stuff and wanted algorithms. One of the interviewers was actually on the algs team, and I was able to answer the ML questions (they asked me things like what I would do if the data isn't labeled, and whether accuracy is always a good metric). I got an internal referral to interview for DS on the algs team (they themselves said I would be a better fit for this, and they want someone who is familiar with the stats aspects of it), where they do seem to do predictive modeling. Hopefully I get lucky and pass it.
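For anyone wondering about the "is accuracy always a good metric" question, here is a minimal sketch of the usual answer. The labels and the always-negative "model" are made up for illustration, not anything from the actual interview: with 5% positives, predicting negative for everyone scores high accuracy while finding none of the positives.

```python
# Toy, made-up data: 5 positives out of 100, and a "model" that always says 0.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred):
    """Fraction of true positives that the model actually found."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy={acc:.2f}, recall={rec:.2f}")
```

So on imbalanced data you would normally also look at recall/precision or a threshold-free metric rather than accuracy alone.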

Is Biostatistics becoming outdated in the industry, outside regulatory writing?

untzkatz replied to untzkatz's topic in Mathematics and Statistics

Well I know Python so it would probably be improving SQL. Are you basically saying that such job descriptions look like they have lots of cool modeling, but that reality is not the case and it just seems that way on the outside? I keep hearing that for statistical/ML modeling jobs these days you need a PhD, and even still it'll be competitive as you said. Modeling 20% of the time isn't too bad, but I'm afraid data analyst it'll be like <5% of the time and most you will be doing is basic univariate summary stats and visualizations. Sounds like what you are getting at is that the coursework in statistics/biostatistics departments is heavily foundational classical stats, but the research does more modern things and combines it with the inferential aspects? Whereas NYU DS for example seems to go right into the statistical ML/DL and bayesian network type stuff. I would be interested in Comp Neuro too since you brought that up. Did you do that stuff in undergrad-it seems very advanced for undergrad level. I agree the good thing about an Algorithms course is that even if I don't do a PhD, it can still help to get through interviews at tech companies since that stuff is tested in Leetcode and so on. And still improves general programming skills beyond just numerical computation. That is why I have been leaning towards doing it. As far as the frameworks though, I'm pretty sure most PhD students doing DL are using PyTorch and so aren't implementing various data structures or autograd from scratch. I've seen arxiv github code and it still often follows the formulaic subclassing nn.Module to make a layer, then having __init__ and forward() and so on. And making a Dataset class and Dataloader. Would you say something like UCSD EE with DS/ML may also be good? https://www.ece.ucsd.edu/index.php/faculty-research/ece-research-areas/machine-learning-data-science-impacted. Seems like they do stat learning and DL there too.

Is Biostatistics becoming outdated in the industry, outside regulatory writing?

untzkatz replied to untzkatz's topic in Mathematics and Statistics

So you think that it's not worth doing a PhD in a DS/related field in order to eventually go for one of those more stat/ML modeling based jobs (despite how scarce they seem to be)? A lot of these, like for example the Harnham one posted earlier, seem to require a PhD. Would you say apply for that stuff even with an MS and try to demonstrate that you can do it on the resume/Github & interview? I understand data wrangling is the 80% of data related work, but still I'd like to get away from the regulatory writing aspects primarily. I'm not complaining about data wrangling as much as the biostatistics regulatory stuff. In that sense, it sounds like a DS job, even in biotech, with less of the regulatory writing being given to me (since Biostatisticians are given this) could be a better fit. Here is a more example of something I would eventually like to get into: https://verily.com/roles/job/?job_id=2059874. It's signal processing related, and looks like actually this particular one requires an MS but in CS, and otherwise it says preferred is PhD in BME/CS/App Math/related. A lot of the job posting seems to be about both classical (eg time series, linear models, multivariate analysis) and ML modeling so a good mix. It just seems like these jobs and similar ones vastly prefer PhD candidates. I don't know if I am interested in pipelines like the software/ML engineering sense of the term, but I like both methods+applications of ML/DL. Sometimes, people develop methods for specific applications. My undergrad btw actually was in BE which you listed, although it is far too broad and I wish I did something like applied math ugrad. My Biostat program covered those inferential things at MS level in the 1st year, and 2nd year we had stuff on Survival+GLM+GLMM (these were combined with PhD students, I got an A in survival/GLMM but a B+ in GLM) as well as the electives which were the ML/signals/time series classes I mentioned (those were all As). 
Yea, the higher ranked programs I would expect to be more modern. I'm not too interested in doing the Fisher/Neyman-Pearson inference stuff all over again though, but the NYU DS inference & representation course on graphical models does look interesting as it's a more modern spin on it. Didn't realize DS&Algs would be needed though. I'm not sure where things like heaps, linked lists, dynamic programming etc come up in ML at all, but I also took ML in a stat department, not a CS one. Technically I know NNs, for example, make use of dynamic programming when autograd caches intermediate results for the gradients, but that's an internal detail you don't need to worry about when using high level frameworks like Julia's Flux or TF/PyTorch. The only computational complexity stuff we did was related to matrix decomps in computational stat. Between Real Analysis and an Algorithms class, what would be better? I don't think I can take both.
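To make the autograd remark concrete, here is a toy scalar reverse-mode autodiff sketch. The `Var` class and `backward` function are hypothetical names for illustration and are not how PyTorch or Flux are implemented internally. The dynamic programming flavor is that each node stores its local derivatives during the forward pass, and the backward pass visits each node once, reusing the gradient already accumulated downstream.

```python
import math

class Var:
    """Scalar node that records its parents and the local derivatives."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent_node, d(self)/d(parent))
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

def tanh(x):
    t = math.tanh(x.value)
    # Cache the local derivative 1 - tanh^2 computed in the forward pass.
    return Var(t, ((x, 1.0 - t * t),))

def backward(out):
    """Accumulate d(out)/d(node) into .grad for every node, each visited once."""
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)  # parents are appended before their consumers

    visit(out)
    out.grad = 1.0
    for node in reversed(order):  # consumers first, so node.grad is complete
        for parent, local in node.parents:
            parent.grad += local * node.grad

# One "neuron": y = tanh(w * x + b)
w, x, b = Var(0.5), Var(2.0), Var(-1.0)
y = tanh(w * x + b)
backward(y)
print(w.grad, x.grad, b.grad)
```

Here the pre-activation is 0, so the tanh derivative is 1 and the gradients reduce to dy/dw = x, dy/dx = w, dy/db = 1.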

Is Biostatistics becoming outdated in the industry, outside regulatory writing?

untzkatz replied to untzkatz's topic in Mathematics and Statistics

I see, yea I am not interested in overall CS though. I feel like I only like this narrow ML/DL area, and to me it seemed like stats. So seeing that in NYU DS you can more or less just focus on that area looks appealing. They do a lot of MRI research in the biomedical track too, which seems to be more applied statistics based than CS. I do agree more math background would help, but I was in a different biomedical field in my undergrad, so I can’t do much about that now. I could potentially sign up for either Real Analysis or Data Structures & Algs for the summer though, which I am considering. Honestly I never really thought I would like ML/DL back in undergrad because I didn’t really know what it was and thought it was some insane CS thing, but after 2 stat ML classes (on supervised + unsupervised learning) in my MS I was wowed and got convinced that ML is stats. Computational statistics (which had a bit of numerical analysis, but mostly matrix decomps/GD/MCMC/EM) had some as well. I also had a signal processing (special topics) stats course on FFTs that I really liked; the modern FFT algorithm was actually co-invented by Tukey (Cooley-Tukey). Time series as well, but my TS elective course was undergrad level. There was much less asymptotic stuff in these areas and it was more like “show gradient descent on convex functions converges”, which is more optimization. So it seems weird to me that all this stuff isn’t considered statistics; perhaps I went to a more modern department after all. We also had a little bit on Deep Learning from the GLM/GAM perspective. I always liked GLMs and regression, so seeing that ML and DL boiled down to that got me interested in it. Regularization too: how people nowadays are incorporating different kinds of penalties for domain specific problems is interesting. The whole double descent thing in DL seems like statistics to me, at least the way Dr Witten explained it with GAMs and regularization. 
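As an illustration of the “show gradient descent on convex functions converges” type of exercise, here is a small numerical sketch. The quadratic `f` and starting point are arbitrary choices of mine, not from the course. It checks the standard guarantee for an L-smooth convex function with step size 1/L, namely f(x_k) - f* <= L * ||x_0 - x*||^2 / (2k).

```python
def f(x, y):
    # Convex quadratic with minimizer (1, -2) and minimum value 0.
    return (x - 1.0) ** 2 + 4.0 * (y + 2.0) ** 2

def grad_f(x, y):
    return 2.0 * (x - 1.0), 8.0 * (y + 2.0)

L = 8.0            # largest Hessian eigenvalue, i.e. the smoothness constant
step = 1.0 / L
x, y = 0.0, 0.0    # arbitrary starting point
dist0_sq = (x - 1.0) ** 2 + (y + 2.0) ** 2

history = [f(x, y)]
for k in range(1, 51):
    gx, gy = grad_f(x, y)
    x, y = x - step * gx, y - step * gy
    history.append(f(x, y))

# Compare each iterate against the O(1/k) bound (f* = 0 here).
bound_holds = all(history[k] <= L * dist0_sq / (2 * k) for k in range(1, 51))
print(bound_holds, x, y)
```

On a strongly convex function like this one the actual convergence is much faster than the O(1/k) bound, which only uses convexity and smoothness.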
VAEs for example seem to be heavily statistical and would fall under probabilistic modeling of multivariate distributions, letting you generate new data based on the latent space. I don’t think asymptotics, inference, and p values necessarily define the field. I see, yea it seems to be a great idea but maybe I need to think about it more. It is true that as of now work-life balance is not a problem, despite the job itself being a bore. Maybe things could get better with a new job + the end of covid lockdowns. That could be a contributor too. I hate remote work and I hate how companies are pushing to be remote more and more. It makes you feel mostly like a corporate slave imo, and long term this model is not going to work. There is no social aspect/culture and it makes the boring stuff more boring. As my first job ever, I have hated that (maybe I’d have felt differently otherwise). I could also consider joining some DS/ML meetup groups irl just to satisfy the intellectual curiosity aspect, as it sounds like jobs will not have as much of this (especially without a PhD). And also do other hobbies. I will admit, part of why I wanna do a PhD is to delay this chapter of life lol. I did like the school research environment more. It’s not only because of jobs.

Is Biostatistics becoming outdated in the industry, outside regulatory writing?

untzkatz replied to untzkatz's topic in Mathematics and Statistics

Yea I get this perspective too, which brings me to the elephant in the room: why has DS/ML/AI been so hyped up? It’s certainly starting to sound like the “instagram effect” but for jobs. You see the best, most cutting edge stuff (analogous to seeing highlight reels and heavily curated/edited pics) from the outside but the reality isn’t like that, and it gives you a skewed view. It sounds like this stuff is really more in research labs and, if what you are saying is true, it actually does not pay well (since it’s in academia) except for the very few who are competitive enough to get a FAANG-like research scientist position. Or perhaps lucky enough to find some startup. Otherwise statistical/ML/DL algs related stuff sounds like largely a hobby for personal projects. Guess it starts to make sense now why they call it “work” and not “fun”.

Is Biostatistics becoming outdated in the industry, outside regulatory writing?

untzkatz replied to untzkatz's topic in Mathematics and Statistics

Data analyst traditionally is like Tableau and SQL from my understanding. That probably doesn’t have much classical or ML analysis at all. I don’t think it’s necessary to do DA to go to DS coming from Biostat, is it? I’m actually currently talking to people about a biostat position related to imaging data analysis, though it’s in academia. I had actually landed it last year but I chose industry due to the pay. It wasn’t directly an imaging position, but I had gone through the process and the labs I would have worked with were radiomics and stat learning ones. I decided to contact the main person again last week, and they did get back to me, though it’s looking like I would probably have to go through the process again soon; they are finding that out for me. The good thing is, they have this thing where you can even take relevant classes on the side while you do research, so that could help too. Then there are some DS positions I have had phone calls for too but heard nothing back yet, so it’s competitive. But I’m still hoping for those.

Is Biostatistics becoming outdated in the industry, outside regulatory writing?

untzkatz replied to untzkatz's topic in Mathematics and Statistics

Looking at the PhD DS curriculum here https://cds.nyu.edu/phd-curriculum-info/ it looks like there is 1 probability course and 1 more modern inference (like graphical models) course. The probability course seems to have notes here https://cims.nyu.edu/~cfgranda/pages/stuff/probability_stats_for_DS.pdf and it looks pretty much like MS level probability (looks like the MS students over there also take this), which I have already done. It’s not measure theoretic probability. The intro to DS course is programming based, and the ML class looks like it goes more into ESLR stuff, which I don’t mind. It is also MS level; from what I can tell, MS students are taking the same class, and we had something similar too. The hardest class in terms of theoretical background seems to be the Inference & Representation one. But this looks to be about very modernized topics like DAG/Bayesian network models, which is very different from your usual math-stat asymptotic/Fisher/Neyman-Pearson stuff. They seem to discuss applications in ML as well, so it’s far from the typical math-stat inference class and looks like something I would potentially like based on the notes: https://www.notion.so/Inference-and-Representation-623a215febc3461dbc004682484922ad

Is Biostatistics becoming outdated in the industry, outside regulatory writing?

untzkatz replied to untzkatz's topic in Mathematics and Statistics

My stats/biostats program in grad school didn’t have this grade inflation. It was graded more like how undergrad courses would be, on curves. Actually many Americans in particular got similar scores to mine; the international Chinese students (who were like 90+% of the dept, which I think isn’t uncommon) set the high bar. There were classes where I did decently well and then last minute got screwed by the Final Exam curve. Some of these international students had done things like Quadratic Forms way back in HS, and a lot of the MS math stat courses were just review for them. Ok, maybe I should rephrase. I like the computational aspects of all of these tools. I have always liked implementing things like GLMs, EM, gradient descent, doubly robust estimators etc in code. From a practical standpoint, when you, say, fit a GLM and do data analysis, are you *actively* thinking about asymptotics? Usually you do some deviance residual visual checks, check assumptions like independence based on the design, and consider things like the bootstrap (or if there is dependence, some clustered resampling) if various assumptions aren’t met (or sometimes even if they are). I guess indirectly this is related to it, but it’s not like a proof, more of a check. Idk, maybe I just like data analysis and implementing statistical algorithms, but it sounds like that isn’t really what a PhD is about either. If that is the case, maybe it’s better to just look for DS jobs that involve more of it and improve my programming skills so that I can write production code? It sounds like it’s hard to find something where you are just doing the data analysis component, except maybe in academia. But I am also considering just going back to academia as an MS level biostatistician, where it is more real biostats without the regulatory stuff. It doesn’t pay great, but I am considering just working at, like, a cancer center doing imaging data analysis. 
Which could help also with PhD apps but also see if I like that stuff more.
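To show what I mean by clustered resampling, here is a minimal sketch of a cluster bootstrap percentile interval for a mean. The function name, the patient IDs, and the numbers are all made up for illustration. The point is that whole clusters (e.g. patients with repeated measurements) are resampled with replacement, so the within-cluster dependence is kept intact.

```python
import random
from statistics import mean

def cluster_bootstrap_ci(clusters, n_boot=2000, alpha=0.05, seed=0):
    """Percentile CI for the overall mean, resampling whole clusters."""
    rng = random.Random(seed)
    ids = list(clusters)
    boot_means = []
    for _ in range(n_boot):
        # Draw cluster IDs with replacement, then pool their observations.
        sampled = rng.choices(ids, k=len(ids))
        pooled = [v for cid in sampled for v in clusters[cid]]
        boot_means.append(mean(pooled))
    boot_means.sort()
    lo = boot_means[int((alpha / 2) * n_boot)]
    hi = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Hypothetical repeated measurements per patient.
clusters = {
    "p1": [5.1, 4.9, 5.3],
    "p2": [6.8, 7.1],
    "p3": [4.2, 4.0, 4.4, 4.1],
    "p4": [5.9, 6.2, 6.0],
    "p5": [7.4, 7.0],
}
lo, hi = cluster_bootstrap_ci(clusters)
print(f"95% cluster-bootstrap CI for the mean: ({lo:.2f}, {hi:.2f})")
```

With only a handful of clusters the percentile interval is rough; this is meant as the kind of practical check described above, not a recommendation for small samples.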

Is Biostatistics becoming outdated in the industry, outside regulatory writing?

untzkatz replied to untzkatz's topic in Mathematics and Statistics

B/B+ was in graduate MS level math stat classes, not undergrad. Its the classes taken by MS students and the 1st year PhD students who need to review the MS level before doing PhD level inference courses. We used Casella and Berger. My undergrad was in a different biotech related field. The highest undergrad math course I have taken is upper division linear algebra but I also got a B+ there, never did real analysis. I did struggle in the MS level math stat asymptotic theory type proofs. I got As in the computational courses (comp stats and 2 ML classes) though. How important is the statistical inference asymptotic type proof stuff for going into ML/DL? I wonder if maybe a DS program would be better for this reason because it is more applied and would go straight into the more modern statistical areas and not have to bother with regular math stats again. I hated the asymptotic theory stuff as it had very little application (in the end you just throw it into a Wald Test or Bootstrap anyways).

Is Biostatistics becoming outdated in the industry, outside regulatory writing?

untzkatz replied to untzkatz's topic in Mathematics and Statistics

Oh I see, well I did do medical imaging related biostat research in my MS. It was interdisciplinary and I got 1 applied paper in a well known MRI journal, although it was more in applied classical stats. And that is the sort of stuff I want to do, involving DL/ML and imaging data. I don’t want to do vanilla biostats stuff like survival analysis lol, even in survival nowadays people are analyzing full images and using the survival loss functions in DL.

Is Biostatistics becoming outdated in the industry, outside regulatory writing?

untzkatz replied to untzkatz's topic in Mathematics and Statistics

26 itself isn’t old to be in the middle of PhD already but I see it as kind of old to start, like assuming it is 6 years (and given ill have to apply coming Fall) I would be around 33 after graduation. Lot of people are starting to settle just about now. And yea agreed it is a big consideration. But it sounds like the research scientist jobs in FAANG need one. Though I probably wouldn’t want to work for FB but that is more for my own reasons like not being into social media lol. Wrangling data is tedious at times but its still better than writing regulatory reports to the FDA and documentation imo. Tidyverse makes it a lot easier if its structured data. I think I would enjoy the PhD, provided it has a good mix of modern and computational topics. Wouldn’t want something where its like mostly dry math-stats and asymptotics. NYUs DS PhD program looks really interesting to me though. And they have cool research too, including a bunch of biomed imaging people. Its probably still really hard to get in though, but maybe its easier than top Stat programs.

Is Biostatistics becoming outdated in the industry, outside regulatory writing?

untzkatz replied to untzkatz's topic in Mathematics and Statistics

Thanks for the links. The Cytel one looks like it uses SAS and nothing cutting edge is being done in there lol, even a log transform is insanely cumbersome vs R/Python/Julia. But these are interesting otherwise especially the Harnham one is right up my alley, though it says Senior DS and wants a PhD. Rest seems mostly director level. The PhD seems to be a big barrier and I am 26 so am getting older. I regret not doing it earlier, as it seems with an MS you mostly get all the boring work especially in biotech. Biotech seems to value the PhD status a ton. Maybe at the end of the day a job is a job and I will just have to do the advanced stat/ML/AI stuff as a side hobby if I never get a PhD. One of my biostat profs suggested to maybe get an MS in DS or ML from a reputable school and then see after but I suspect itll lead to the same problems, as even that field demands a PhD now as I don’t want to be doing ML Engineering either I want to be doing statistical ML. ISLR/ESLR is on ML though and these are written by statisticians. I don’t think uncertainty quantification is necessary for something to be statistics. If you have a complex observational dataset and you don’t approximate the function correctly (model misspecification) the 2nd order things like SEs/p values are not going to be accurate anyways. Predictive scores are important even for inferential purposes now according to some classical statisticians, like Max Kuhn the R tidymodels author who describes an example of inaccurate inferential results when this isn’t done: https://www.tmwr.org/performance.html. Nowadays people are even combining ML with classical statistics in the things like SuperLearner by Mark VDL and Doubly Robust methods. That is the sort of stuff I find really cool. Seems its all mostly academia though. But yea anyways my company doesn’t seem to encourage innovation. Its all about just moving the business forward. 
It’s a mid-size company trying to scale up further, and it’s going to be even more regulatory work going forward. When I first started a year ago, I had more freedom and did more internal data analysis rather than work for product submission/FDA, but in the last 5 months it has changed a lot and they even say “we are becoming more like biostatisticians in more established companies so more regulated”. I think I wouldn’t like an MBA either since it’s again more business oriented, and I was never interested in management. I am interested primarily in data analysis and the stat/ML methods. But it seems I don’t have the PhD gold star for this work.

Is Biostatistics becoming outdated in the industry, outside regulatory writing?

untzkatz replied to untzkatz's topic in Mathematics and Statistics

To me ML=statistics, and after taking 2 ML courses in the stats department during my MS I am convinced lol. The rigorous stats you are referring to, like missing data, I think doesn’t really come up in MS level biostat. And people are conservative when it does anyways; they often just drop it and don’t do fancy imputation. Power and sample size calcs come up quite a bit but they are very straightforward simulations. Also, statistics is not just hypothesis testing and uncertainty quantification to me. I think this is a misconception. Or rather, the Biostat work does not really involve advanced methods for this. Causal inference for example is rarely done by industry pharma/med device biostatisticians, but it is done by data scientists in tech. Causal inference on observational data is a good example of rigorous stats that is missing in industry biostatistics. You mention RWE observational data but I have not seen industry Biostat positions where one can focus solely on analyzing that data. I see RWE mentioned more in DS positions again. I never said anything about not checking assumptions and so on. In fact, there have been multiple occasions where an analysis I proposed was better on this basis but they still wanted the simple one because it was in the FDA guideline and they don’t like going against the grain, regardless of the mathematical or statistical justification. For example, take a medical image or molecular level data. This is where the interesting work is, and you can go deep into the mathematics of, for example, statistical signal processing (if you don’t want to do ML, and want to stay within classical) and extract features, but biostatisticians don’t do this either. They are much more product facing. Instead we have bioinformaticians and comp chemists, for example, doing that other more discovery related stuff. And there is far more actual deep statistics in that than there is in a t test or ANOVA showing that A was better than B. 
Sometimes I get lucky and I can bootstrap, but even that isn’t exactly exciting. Bootstrap is probably the most “complex” technique I have used. And then there is study design where majority of the work is the boring planning phase and writing even more documentation. Maybe theres some simulated sample size calculation but beyond that simulation there isn’t much. Or what about classical time series analysis on wearable device data? Nope that doesn’t seem to happen in biostat either. Even within classical stats, the advanced stats besides some designed experiment GLMM I have not seen come up. Within non-ML, things like mixture models, EM algorithm, MCMC, Fourier transforms, Gaussian processes come up more in DS than stats. I did try to use Bayesian once but they are opposed, feels it complicates the documentation. As for ML, there is a lot more statistics here— take a variational autoencoder for example. The theory behind this is very statistical and involves probability theory/KL divergence etc. You are right, I just wanna dig in and start analyzing the data and focus on the methods. But that to me is the statistics, not documentation and regulatory aspects. Lot of people don’t like data cleaning and while I am not a huge fan, I have even enjoyed the data wrangling/cleaning aspects far more than the regulatory documentation. Like at least even data cleaning has the programming aspect and can be like a puzzle to solve.
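For reference, this is roughly what I mean by a straightforward simulated power calculation. It is only a sketch: `simulated_power` and its parameters are hypothetical names, and it uses a normal critical value as an approximation to the t reference distribution for a two-arm comparison of means.

```python
import random
from statistics import NormalDist, mean, stdev

def simulated_power(n_per_arm, effect, sd=1.0, alpha=0.05, n_sim=2000, seed=0):
    """Estimate power of a two-sided two-sample test by Monte Carlo."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # normal approx to t cutoff
    rejections = 0
    for _ in range(n_sim):
        control = [rng.gauss(0.0, sd) for _ in range(n_per_arm)]
        treated = [rng.gauss(effect, sd) for _ in range(n_per_arm)]
        se = (stdev(control) ** 2 / n_per_arm
              + stdev(treated) ** 2 / n_per_arm) ** 0.5
        z = (mean(treated) - mean(control)) / se
        rejections += abs(z) > z_crit
    return rejections / n_sim

# Standardized effect of 0.5 with 64 per arm is the classic ~80% power setup.
power_64 = simulated_power(64, 0.5)
power_16 = simulated_power(16, 0.5)
type_1 = simulated_power(64, 0.0)  # no effect: rejection rate should be near alpha
print(power_64, power_16, type_1)
```

In practice you would loop this over candidate sample sizes and pick the smallest one that reaches the target power, swapping in whatever data-generating model and test the SAP actually specifies.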

Is Biostatistics becoming outdated in the industry, outside regulatory writing?

untzkatz replied to untzkatz's topic in Mathematics and Statistics

Yea that is true. Good idea to post projects on github; I actually just recently learned basic git for that. I’ve been cleaning up some of my grad school analysis code and making it modular etc to make it postable lol. That is the thing, on the DS/ML end it can seem like infinite competition. On LinkedIn you can sometimes see 200+ applicants within a couple of hours. The differentiator nowadays seems to be shifting toward domain knowledge, especially in biotech. For me, I do know some stuff about medical imaging, which is how I got one of my interviews (but then I failed the leetcode). There are way fewer jobs in med imaging than in genomics though.

Is Biostatistics becoming outdated in the industry, outside regulatory writing?

untzkatz replied to untzkatz's topic in Mathematics and Statistics

Well yea, it is a word game but that does matter in terms of getting interview #1 for the other things. Why should they take a biostat grad over somebody who specialized in that domain and has developed more computational skills? That being said, I have had a couple interviews for DS. The part which I struggle in is the computer science leetcode questions. I can answer the stat ML questions fine but its the general (non ML) algorithmic thinking I never developed. I guess this can be practiced though. But its really hard to get interviews in the first place. On the bioinformatics side, I am lacking domain knowledge. I think this is the bigger barrier there. Stuff about different sequencing technologies like RNASeq, NGS, qPCR, etc. I never learned omics. Domain knowledge is really important too, and this was neglected by my MS Biostat program. That is one reason I wonder if maybe tech could be better. Because the CSey stuff can be self learned but the deep domain knowledge is going to be harder to acquire outside a grad program. The thing is, how to make my resume appealing to tech, because its really biotech oriented (and my undergrad was in a biotech related field too, little did I know back then).

Is Biostatistics becoming outdated in the industry, outside regulatory writing?

untzkatz replied to untzkatz's topic in Mathematics and Statistics

This was also what I was saying though: why does BIO“statistician” in industry have the name statistician at all, as opposed to regulatory specialist or product validator or something? It’s misleading. What is causing the huge disconnect between industry work in Biostatistics and academic work? How come other fields, even within biotech, such as bioinformatics are creeping into “our” domain and doing more advanced statistics? This is what bothers me. It’s not merely explained by regulation, since bioinformatics is also in biotech, so it’s more apples to apples (vs say comparing FAANG DS to Biostat in biotech). I do know some Python although I am no expert. I find data wrangling in Python especially to be a pain, though libraries like scikit-learn and PyTorch are much nicer. For SQL, I have mostly used dbplyr in R.

Is Biostatistics becoming outdated in the industry, outside regulatory writing?

untzkatz replied to untzkatz's topic in Mathematics and Statistics

Where I am, the PhDs in biostats are doing even more of it in fact. There is a data cleaning component which they *don’t* do but even that is far more fun than regulatory writing. PhDs in other quantitative computational fields though, even stuff like physics, are doing the actual statistical/ML algorithms work that goes into the omics analysis. I don’t know why this is. In other companies in pharma/med devices I have not seen it be much different. The algorithms for drug discovery, imaging, and mining genomic data stuff they would rather have people with substantial domain knowledge and/or pro computational skills. Biostat is mostly regulatory monkey work in general. The writing has to be *on point* and I have been criticized for not being specific enough, and told it needs to be able to be used and understandable by any kind of auditor. This to me is not statistics, its business/law work. FAANG does more real statistics in general than biotech. I largely do get the feeling biotech is not the place to be if you want to be doing real statistics. It seems like the value in biotech is largely getting a product past the FDA. Not the methodology. This makes sense when you think about it, if a product submission fails then the company cannot make $$$. Everything hinges mostly on that.

Is Biostatistics becoming outdated in the industry, outside regulatory writing?

untzkatz posted a topic in Mathematics and Statistics

Are there Biostat-titled jobs which actually use real and cool statistics and programming? I am an MS who graduated last year and I just cannot find them. On the other hand, it looks like a good number of DS positions do mention these techniques/tools: causal inference, time series, multivariate analysis, predictive modeling, Bayesian modeling, ML/DL, R/Python/Julia, PyTorch/TF, observational and unstructured data, etc. Then on the other side, in Biostat, you see very boring things like SAPs, SAS, FDA/ICH guidelines, QC, experience in regulatory environments, more documentation, validation, trials. A good 80-90% of this is non-technical. Having been in a Biostat job for a year, I hate it and want to get out. It’s a heavy amount of formal, boring writing. And oftentimes even slightly more involved analyses are rejected in favor of a dumb t-test or ANOVA. I see the Biostat field as dying and increasingly becoming a “regulatory monkey” role, where the real work and advancement in the field depend not on one’s statistical or programming ability but on the ability to communicate with regulatory bodies. I kid you not, I saw a Principal Biostat profile which was like “verifying 100 SAP documents and checking consistency”. I notice that even where I am, the bioinformaticians do more of the technical statistics. Why is it like this? I know this is not just a case of being at a bad place, since I notice it consistently in LinkedIn job postings listing “Biostatistician”. Increasingly, it seems like the real statistics work is going to regular statistics, DS, other computational domain specialists (comp chemists, bioinfo, etc.), and of course CS/EE. The exception is some of the biostat jobs on the bioinfo side, but those are mostly out for me since my program didn’t cover genomics and I know very little. In a different thread, it was mentioned that there is a stereotype in the industry that Biostatistics is about regulatory stuff, trials, and SAS/SAP.
I’m wondering, how do people get past this stereotype? How did it start in the first place? Obviously I can apply for DS jobs too, but it’s super competitive and hard to get noticed. Which brings me to potentially going for a PhD in a computationally heavy field such as DS, ML, or bioinfo. Regular stat too, although I am concerned there because I got B/B+s in my MS math stat courses and have not taken real analysis. It would be ideal if I could land the DS jobs that use the actual statistical techniques heavily without a PhD, but I am having trouble being noticed. Sometimes the postings conveniently leave out biostats but for whatever reason (in biotech) will list regular stat, bioinfo, EE/CS. I have even applied for these anyway and then had a recruiter get back to me and say “Oh, actually you will be a better fit for this Biostat validation position”. No thanks. The industry perception of Biostats is absolutely not good, in my opinion, if you want technical work. That being said, I know there are a handful of people who don’t want to be doing stats or programming all day and who like the regulatory, business side. But that is not me.

Fall 2021 Data Science/Analytics Applicants

untzkatz replied to KMickey's topic in Mathematics and Statistics

I don’t know much about UChicago MS Analytics, but I would definitely go to NYU DS. It’s very cutting edge and you have people like Yann LeCun there. They are also rigorous, as you said. I am skeptical of MS Analytics programs and their rigor on the math side of things. NYU DS has that aspect, though. Plus it’s NYC, so there is more opportunity in general.

Importance of program ranking for industry

untzkatz replied to 3musketeers's topic in Mathematics and Statistics

The ranking thing makes sense. I went to a UC and it’s not ranked particularly high. What sucks is that most of my classes were in the stat department, and we were not even taught by our own biostat faculty. But of course the industry hiring managers don’t know this. I took extra classes in supervised + unsupervised ML and time series too, albeit all at the undergrad level, and I found I liked those way more. But I guess it does seem that, especially at a lower ranked school, you are probably better off doing something like EE, CS, or DS rather than biostat if you want to do real statistics and don’t have the math prereqs to get into a stat program. No doubt, though, Biostat is a stable career, and for people who just value that aspect or don’t really want to keep up anymore (say they have a family and kids to raise, etc.) it can be a good option. But especially in my 20s I can’t stand doing outdated things, and I especially dislike regulatory work. Even within classical stats, people don’t seem to be fond of more rigorous methods, because a lot of them just want you to be a robot following the FDA guidelines. I want to get out of the Biostat field ASAP, but I am competing against all these programmer bros who may not know stats as well but can do “ML production systems”. That is one reason I want to do a PhD, so I can focus on statistical ML/DL.

Importance of program ranking for industry

untzkatz replied to 3musketeers's topic in Mathematics and Statistics

This is what I was trying to get at lol, industry has this exact clinical trial or validation monkey stereotype. And I am doing that right now and it sucks. This is why I said earlier that biostat in industry involves less actual stats than data science, hell, even EE/CS now. It’s more regulatory in nature: writing focused, more businessy and law-ey. Honestly, I think some social science or public health majors with a decent handle on stats can succeed in and enjoy this area more than someone who has been in STEM their whole life. At a startup, things could be different, yeah, and more innovative. It is my first job since I graduated with my MS a year ago, but in general just search LinkedIn for “Biostatistician” and compare the listings to Bioinformatics and Data Science, even within biotech. Biostat to industry = SAP, regulatory submissions, FDA/ICH guidelines, etc. The stats might be some power/sample size calculation, and then reporting a p-value or confidence interval once the data comes in. This is like 1950s stuff. DS has its fair share of BS too, like Tableau and SQL, but you can try to avoid those more easily without ruling like 90% of jobs out. Otherwise DS postings mention signal processing/time series, ML/DL, multivariate analysis, causal inference, etc. Regulatory submissions are often not mentioned. DS does more real, actual statistics. I completely agree with your last point, though for me, not having taken Real Analysis and having gotten a B+ in upper div lin alg as well as in my MS math stat classes, I feel a stat PhD will be tougher to get into. I am currently at a medium-to-big company and it is quite product focused, getting the device out sort of thing. This involves reams of documentation. Analyses are mostly product validations and trials. It is high pressure during submission timeframes to get the writing correct down to the exact wording. Having talked to some people at even more well-known biotech companies like Genentech, Biostat is not too much different there either.

Importance of program ranking for industry

untzkatz replied to 3musketeers's topic in Mathematics and Statistics

Yeah, DS seems to have that, but it’s kind of ironic to me that in industry DS has more stats and tends to be more technical than bio“stats”. Fields that don’t have stat in their name, like CS, EE, and within biotech, bioinformatics as well, all tend to use more advanced stats techniques in industry than most biostats jobs. I kid you not, I saw a Biostatistician job at an AI biotech company where that position itself was all the regulatory, SAP, SAS stuff, while the DS got the real stats. It’s ridiculous. Maybe a sample size or power calc, or at most a mixed model, in Biostat. It’s quite ridiculous to me how many Biostat jobs want the PhD but then the description ends up having lame SAS stuff (to me SAS is hardly a stats software in 2021), FDA/ICH regulatory writing, and maybe just simple stats. I have seen people with PhDs who are in senior Biostat positions but are just dealing with documentation all day. That is not real statistics. I honestly don’t know what it is about the industry. I wanted to be a biostatistician, but many of those positions, even at the PhD level, straight up don’t have much stats at all. So I have to rebrand myself as a “data scientist”, but this also involves learning more of the CS side, not just statistical ML. That is for sure still better than writing, though. It certainly seems like Industry Biostat != Academic Biostat. It’s an entirely different thing. They should label it something else, because that sort of work imo dilutes the Biostatistician title. A field without -stat in the name shouldn’t be doing more actual stats than a field with it.

Importance of program ranking for industry

untzkatz replied to 3musketeers's topic in Mathematics and Statistics

I know a decent amount of Python and am learning PyTorch myself right now, doing a neuroimaging-related DL project with a research lab as part of a class I am taking as a non-degree student. I’m hoping this also helps with getting into PhD bio/med informatics or DS programs. I will still apply to Biostat programs too, but I don’t have the recommended Real Analysis requirement. I could take it this summer, but I don’t know if I want to lol. I was always more interested in the applied computational side of things. I realized I really like the signal processing stuff and DL, and I probably want to combine that with causal inference. Does a Biostat PhD still let you get into this computational side? I know places like UW with Dr. Witten (one of the ISLR authors) do, but UW is #1 lol and will virtually require Real Analysis. And yeah, I have been wondering how much actual stats is used in the corporate world. Some things I do today are so boring that I am like, why do I even need the MS for this? A proportion confidence interval can be done by a high schooler who has taken AP stats and maybe knows the R function. I start to wonder if all this interesting ML/DL/AI and causal inference is even used much outside academia and FAANG. It seems like tech companies use more cutting-edge methods than biotech companies do. After dealing with the regulatory writing non-stat bullshit in biotech, I am less opposed to going to tech now. I like the idea of causal inference and AI in healthcare, but it’s just not going to happen for a long time. I’m sick of tabular experimental data too. There is less that is statistically interesting in a designed experiment vs. the freedom of observational and unstructured data.
