untzkatz (Author), posted May 3, 2021 (edited)

1 hour ago, StatsG0d said: If you're doing research, I think it's crucial. In general, you want to prove some asymptotic properties of your method, whether it's consistency, asymptotic normality, or what have you. The NYU data science PhD program requires a sequence in probability and statistics. One component of the course is convergence, so these topics will definitely come up again.

Looking at the PhD DS curriculum here https://cds.nyu.edu/phd-curriculum-info/ it looks like there is one probability course and one more modern inference course (e.g. graphical models). The probability course seems to have notes here https://cims.nyu.edu/~cfgranda/pages/stuff/probability_stats_for_DS.pdf and it looks pretty much like MS-level probability (it looks like the MS students there also take it), which I have already done. It’s not measure-theoretic probability.

The intro to DS course is programming based, and the ML class looks like it goes more into ESLR stuff, which I don’t mind. It is also MS level: from what I can tell, MS students take the same class, and we had something similar too.

The hardest class in terms of theoretical background seems to be the Inference & Representation one. But it looks to be about very modern topics like DAG/Bayesian network models, which is very different from your usual math-stat asymptotic/Fisher/Neyman-Pearson stuff. They seem to discuss applications in ML as well, so it’s far from the typical math-stat inference class and, based on the notes, looks like something I would potentially like: https://www.notion.so/Inference-and-Representation-623a215febc3461dbc004682484922ad

Edited May 3, 2021 by untzkatz
bayessays, posted May 3, 2021

34 minutes ago, untzkatz said: Doesn’t pay great but I am considering just working at like a cancer center doing imaging data analysis

If this is a job opening available, I don't understand why you're not doing it. I have mentioned data analyst jobs multiple times, which are essentially intro-level jumping pads to data science positions if you don't currently have the skills for that. From everything you're saying here, it doesn't sound like you would enjoy a PhD very much and I don't think it would help you find a career you like. You are never going to find a position where you just run linear regressions 8 hours a day. Even at the best job, most of your time is going to be wrangling data and translating findings (because that's why they are paying you).

If you have an MS in biostatistics, you do not need to go back to school to do what you're looking for. You need to do some soul-searching and read a lot of job descriptions to think about what you can tolerate doing, and then you need to apply to those jobs or teach yourself some skills so you can expand your options.
untzkatz (Author), posted May 4, 2021

2 minutes ago, bayessays said: If this is a job opening available, I don't understand why you're not doing it. I have mentioned data analyst jobs multiple times, which are essentially intro-level jumping pads to data science positions if you don't currently have the skills for that. From everything you're saying here, it doesn't sound like you would enjoy a PhD very much and I don't think it would help you find a career you like. You are never going to find a position where you just run linear regressions 8 hours a day. Even at the best job, most of your time is going to be wrangling data and translating findings (because that's why they are paying you). If you have an MS in biostatistics, you do not need to go back to school to do what you're looking for. You need to do some soul-searching and read a lot of job descriptions to think about what you can tolerate doing, and then you need to apply to those jobs or teach yourself some skills so you can expand your options.

Data analyst work is traditionally Tableau and SQL, from my understanding. That probably doesn’t involve much classical or ML analysis at all. I don’t think it’s necessary to do DA to get to DS coming from biostat, is it?

I’m actually currently talking to people about a biostat position in academia related to imaging data analysis. I had actually landed it last year, but I chose industry due to the pay. It wasn’t directly an imaging position, but I had gone through the process, and the labs I would have worked with were radiomics and stat learning ones. Last week I decided to contact the main person again, and they did get back to me, though it looks like I would probably have to go through the process again soon; they are finding that out for me. The good thing is that they let you take relevant classes on the side while you do research, so that could help too.
Then there are some DS positions I have had phone calls for too, but I have heard nothing back yet; they are so competitive. Still hoping for those, though.
bayessays, posted May 4, 2021

3 minutes ago, untzkatz said: Data analyst traditionally is like Tableau and SQL from my understanding. That probably doesn’t have much classical nor ML analysis at all.

Yes, this is what most analytics people with jobs do. There are not a bunch of jobs in this world where people recreate their applied statistics homework assignments and run lack-of-fit tests for their GLMs. Even most data scientist jobs, at the end of the day, are this. If you work at a business, your job is "make numbers go up." Most data science is literally figuring out how to write SQL to count things.

I'm telling you as someone who was a data scientist at FAANG, your expectations of the intellectual satisfaction you will get from your job are way too high. There is not a lot of interesting statistics work to be done because most of it is either: 1) so simple because you have a lot of data (big tech) or well-designed experiments (pharma) or 2) so filled with uncertainty because it doesn't have those things that it's not useful and thus not worth it for a business. Maybe you'll be happy in an academic lab as a research assistant or something, but industry jobs with interesting statistics stuff are few and far between, and I would not bank on having one of them.
untzkatz Posted May 4, 2021 Author Share Posted May 4, 2021 (edited) 42 minutes ago, bayessays said: Yes, this is what most analytics people with jobs do. There are not a bunch of jobs in this world where people recreate their applied statistics homework assignments and run lack-of-fit tests for their GLMs. Even most data scientist jobs, at the end of the day, are this. If you work at a business, your job is "make numbers go up." Most data science is literally figuring out how to write sql to count things. I'm telling you as someone who was a data scientist at FAANG, your expectations of the intellectual satisfaction you will get from your job are way too high. There is not a lot of interesting statistics work to be done because most of it is either: 1) so simple because you have a lot of data (big tech) or well-designed experiments (pharma) or 2) so filled with uncertainty because it doesn't have those things that it's not useful and thus not worth it for a business. Maybe you'll be happy in an academic lab as a research assistant or something, but industry jobs with interesting statistics stuff are few and far between, and I would not bank on having one of them. Yea I get this perspective too, which brings me to the elephant in the room—Why has DS/ML/AI been so hyped up? Its certainly starting to sound like the “instagram effect” but for jobs. You see the best, most cutting edge stuff (analogous to seeing highlight reels and heavily curated/edited pics ) from the outside but the reality isn’t like that, and it gives you a skewed view. It sounds like this stuff is really more in research labs and if what you are saying is true, it actually does not pay well (since its in academia) except for the very few who are competitive to get a FAANG-like research scientist position. Or perhaps lucky to find some startup. Otherwise statistical/ML/DL algs related stuff sounds like largely a hobby for personal projects. 
Guess it now starts to make sense why they call it “work” and not “fun”.

Edited May 4, 2021 by untzkatz
trynagetby Posted May 4, 2021 Share Posted May 4, 2021 (edited) 5 hours ago, untzkatz said: Oh I see, well I did do medical imaging related biostat research in my MS. It was interdisciplinary and I got 1 applied paper in a well known MRI journal, although it was more in applied classical stats. And that is the sort of stuff I want to do, involving DL/ML and imaging data. I don’t want to do vanilla biostats stuff like survival analysis lol, even in survival nowadays people are analyzing full images and using the survival loss functions in DL. Parroting @StatsG0d point, I think you're really on the wrong forum. The people in this forum are fundamentally interested in statistical inference and probabilistic modeling. NYU DS (I have researched the department extensively, and even wrote a specific SOP for it and then I realized I wasn't a good fit after I realized how bad the SOP was) and what you seem to be interested in are more in developing computational tools that push the bounds of what is learnable. Rather than being concerned with proving consistency/convergence or statistical estimation problems they're more interested in solving problems like computational tractability, gradient zeroing, algorithmic correctness/efficiency, good representation for efficient information retrieval (See Dynamic Programming Algorithm for Chomsky Normal Form),methods for compressing neural network . Tbh for developing algorithms like EM and MCMC and even impactful NN work which is just optimization, proofs of convergence are extremely important in both fields and ya gotta be good at Analysis. You should ask around whatever CS/Bioinformatics forums are out there. But to get into programs that attack these problems, you'd need demonstrated competency in CS topics like data-structures, systems programming, analysis of algorithms, numerical analysis. With your research background , which is on the weaker side for CS, I think you'd need a good theoretical math background. 
If you're interested in it, I'd encourage you to apply; shoot for the stars, man/gal. But if you want to do DL research, Statistics departments are not for you.

On a philosophical note that I hope you feel free to ignore, as I don't know your entire situation: judging from the thread, it seems like you're seeing a PhD as a silver bullet for the existential pain of working in late capitalism. Unfortunately, no matter what you do (yes, even most professors who aren't Michael Jordan or Tibshirani, and definitely most grad students), 80% of your time will be spent doing menial, pretty frustrating work, but you have to find the other 20% to make it worth it. And even if the actual job all sucks, there's almost always a silver lining in a job if you have a master's (pay, which you mentioned, work-life balance, etc.). If your job is super interesting, it's probably going to have bad work-life balance, and the converse is also true. Having a lot of life suck is just unfortunately part of life, and being happy is an explicit effort you have to make. Not to say you shouldn't try to change, but just having a PhD won't make things better, worse, harder, or easier. It won't make you smarter or dumber; it'll just make things different. Something like the NYU DS PhD seems like an attractive solution because it looks so simple: do X, get Y. But life doesn't work like that, and having a PhD creates a whole host of new problems that you might not be happy dealing with if your primary motivation for a PhD is just that you hate your current job.

For context, I work as a data scientist at a Fortune 100 financial services company, and I hate it so much. Every day when I wake up I curse Bill Gates for spawning Excel/PowerPoint from the 10th circle of hell. I have to use incredible amounts of MBA jargon, but the second I use the words "conditional on" the MBAs lose their minds. I can say with confidence that my job is probably worse than yours.
The job tortured my very soul for a while, until I saw the finale of The Office while slacking off from work: I realized that although my entire job sucks, I have the work-life balance to spend more time with my family, my aging dog, and my girlfriend. I've gotten pretty decent at classical guitar and picked up a bunch of other stupid hobbies (e.g. latte art and fishing). I realized that when I'm a graduate student drowning in qualifying exams and research, I'll definitely miss this job that I currently hate.

Sorry, this probably wasn't helpful, but I just want to warn that a PhD shouldn't be viewed as a solution to a problem. It's a luxury and a privilege that you should deeply want.

Edited May 4, 2021 by trynagetby
StatsG0d, posted May 4, 2021

I did an internship in front-office quantitative finance, and ultimately I realized the position was really that of a glorified software programmer. Sure, they used statistics (although I was once told to “do data science”), but the job was really about developing software to create nice plots. At the end of the day, whatever you do as a data scientist / quant has to be explained to a larger audience who will know very little, if anything, about statistics / ML.

All this to say: you can’t judge a job by the title / job description. And I’ll echo the others and say that only a small minority of my time is actually spent coming up with new methods. Most of it is running simulations or programming or finding the perfect way to texify a data frame.
bayessays, posted May 4, 2021

Couldn't agree more with everything you said, @trynagetby.
untzkatz Posted May 4, 2021 Author Share Posted May 4, 2021 36 minutes ago, trynagetby said: Parroting @StatsG0d point, I think you're really on the wrong forum. The people in this forum are fundamentally interested in statistical inference and probabilistic modeling. NYU DS (I have researched the department extensively, and even wrote a specific SOP for it and then I realized I wasn't a good fit after I realized how bad the SOP was) and what you seem to be interested in are more in developing computational tools that push the bounds of what is learnable. Rather than being concerned with proving consistency/convergence or statistical estimation problems they're more interested in solving problems like computational tractability, gradient zeroing, algorithmic correctness/efficiency, good representation for efficient information retrieval (See Dynamic Programming Algorithm for Chomsky Normal Form),methods for compressing neural network . Tbh for developing algorithms like EM and MCMC and even impactful NN work which is just optimization, proofs of convergence are extremely important in both fields and ya gotta be good at Analysis. You should ask around whatever CS/Bioinformatics forums are out there. But to get into programs that attack these problems, you'd need demonstrated competency in CS topics like data-structures, systems programming, analysis of algorithms, numerical analysis. With your research background , which is on the weaker side for CS, I think you'd need a good theoretical math background. If you're interested in it, I'd encourage you to apply, shoot for the stars man/gal. But if you want to do DL research, Statistics departments are not for you. Sorry, this probably wasn't helpful, but I just want to warn that a PhD shouldn't be viewed as a solution to a problem. It's a luxury and a privileged that you should deeply want. I see, yea I am not interested in overall CS though. 
I feel like I only like this narrow ML/DL area and to me it seemed like stats. So seeing that NYU DS you can more or less just focus on that area looks appealing. They do a lot of MRI research in the biomedical track too, which seems to be more applied statistics based than CS. I do agree more math background would help but I was in a different biomedical field in my undergrad, so can’t do much now. I could potentially sign up for one of either Real Analysis or Data Structures&Algs for the summer though, which I am considering. Honestly I never really thought I would like ML/DL back in undergrad because I didn’t really know what it was and thought it was some insane CS thing but 2 stat ML (on supervised+unsupervised learning) classes in my MS I was wowed and I got convinced that ML is stats. And computational statistics (which had a bit of numerical analysis, but mostly matrix decomps/GD/MCMC/EM) had some as well. I also had a signal processing (special topics) stats course I really liked on FFTs, which was actually invented by Tukey, and I liked that too. Time series as well but my TS elective course was undergrad level. There was much less asymptotic stuff in these areas and it was more like “show gradient descent on convex functions converges” which is more optimization. So it seems weird to me that all this stuff isn’t considered statistics, perhaps I went to a more modern department after all. Deep Learning also we had a little bit on it from the GLM/GAM perspective. And I always liked GLMs and regression, so seeing that ML and DL boiled down to that got me interested in it. And regularization too, like how people nowadays are incorporating different kinds of penalties for domain specific problems is interesting. The whole double descent thing in DL to me seems like statistics, at least the way Dr Witten explained it with GAMs and regularization. 
VAEs for example seem to be heavily statistical would fall under probabilistic modeling of multivariate distributions, letting you generate new data based on the latent space. I don’t think asymptotics, inference, and p values necessarily define the field. 52 minutes ago, trynagetby said: For context, I work as a datascientist at a Fortune 100 financial services company, and I hate it so much. Everyday when I wake up I curse Bill Gates for spawning Excel/Powerpoint from the 10th circle of hell. I have to use incredible amounts of MBA jargon, but the second I use the words "conditional on" the MBAs lose their minds. I can say with confidence that my job is probably worse than yours. The job tortured my very soul for a while, until I saw the finale of the office while slacking off from work. I realized that although my entire job sucks, I have the work life balance to spend more time with my family, my aging dog, my girlfriend. I've gotten pretty decent at classical guitar and picked up a bunch of other stupid hobbies (e.g latte art and fishing). I realized that when I'm a graduate student drowning in qualifying exams and research, I'll definitely miss this job that I currently hate. Sorry, this probably wasn't helpful, but I just want to warn that a PhD shouldn't be viewed as a solution to a problem. It's a luxury and a privileged that you should deeply want. I see, yea it seems to be a great idea but maybe I need to think about it more. It is true, as of now work-life balance is not a problem despite the job itself being a bore. Maybe things could get better with a new job+ end of covid lockdowns. That could be a contributor too. I hate remote work and I hate how companies are also pushing to be remote more and more. It makes you feel mostly like a corporate slave imo and long term this model is not going to work. There is no social aspect/culture and it makes the boring stuff more boring. 
As this is my first job ever, I have hated that (maybe I’d have felt differently otherwise). I could also consider joining some DS/ML meetup groups IRL just to satisfy the intellectual curiosity aspect, as it sounds like jobs will not have as much of this (especially without a PhD), and also pick up other hobbies. I will admit, part of why I wanna do a PhD is to delay this chapter of life lol. I did like the school research environment more. It’s not only because of jobs.
statsguy Posted May 4, 2021 Share Posted May 4, 2021 (edited) Studying data science on the side should be a great way to satisfy you. Pre-covid there was a Python and ML group I'd attend occasionally if the talks looked interesting or there was a guest speaker that I wanted to meet. In fact, why not start up an R or Python package? You can start slowly, and contribute a piece at a time in your spare time. There are tons of cutting-edge tools that only exist as fragmented C++ code, if that... you'd be doing the world a great service, and you'd have something to put on your CV. I agree with those that say at the end of the day, it's just a job. You're in biotech with an MS, that means you're probably making a minimum of $80k/year working 40 hours a week doing easy problems. Paid vacation, 401k match, health insurance as well... This is based on what I was seeing 8+ years ago. Assistant professors at the top-15 program where I graduated from started at $78k/year when I graduated some years back, and they were easily working 60-70 hours/week in their quest for tenure. They were incredibly stressed and not all of their time was spent doing cutting edge stuff. They had to teach, serve on committees, referee papers, advise undergrads and MS students... I only published a few formal academic papers and I found the process to be a grind. Only about 20-30% of the time was spent on actually tinkering around and developing the methods. The rest of the time was spent on reading references, writing a lot, waiting for simulations to run, debugging simulations, responding to referee reports... not that fun IMO unless you're a professor with tons of grad students to do the grunt work for you. I now work at middle management at a manufacturing conglomerate and oversee a lot of statisticians and engineers doing applied statistics (design of experiments, statistical process control, process optimization, etc.). I have tons of time to spend with my kids and wife. 
I have time to train for half and full marathons. Surf Reddit. Read and post on several forums. Financially, I never thought I'd be where I am now some 10-ish years after graduating. We bought our house in the Midwest a few years back and are on track to pay it off next year. I'm perfectly okay with my job occasionally sucking (e.g. 4-hour Zoom meetings) because life is great outside of 8 AM to 4:30 PM.

Edited May 4, 2021 by statsguy
trynagetby Posted May 4, 2021 Share Posted May 4, 2021 (edited) 13 hours ago, untzkatz said: I see, yea I am not interested in overall CS though. I feel like I only like this narrow ML/DL area and to me it seemed like stats. So seeing that NYU DS you can more or less just focus on that area looks appealing. They do a lot of MRI research in the biomedical track too, which seems to be more applied statistics based than CS. I do agree more math background would help but I was in a different biomedical field in my undergrad, so can’t do much now. I could potentially sign up for one of either Real Analysis or Data Structures&Algs for the summer though, which I am considering. Both would be important especially for NYU datascience, especially a formal AoA class (Dynamic programming, Graph Algorithms, basic NP-Completeness proofs) because professors need to know that you can reason rigorously about the complexity and correctness of novel algorithms. 13 hours ago, untzkatz said: Honestly I never really thought I would like ML/DL back in undergrad because I didn’t really know what it was and thought it was some insane CS thing but 2 stat ML (on supervised+unsupervised learning) classes in my MS I was wowed and I got convinced that ML is stats. And computational statistics (which had a bit of numerical analysis, but mostly matrix decomps/GD/MCMC/EM) had some as well. I also had a signal processing (special topics) stats course I really liked on FFTs, which was actually invented by Tukey, and I liked that too. Time series as well but my TS elective course was undergrad level. There was much less asymptotic stuff in these areas and it was more like “show gradient descent on convex functions converges” which is more optimization. So it seems weird to me that all this stuff isn’t considered statistics, perhaps I went to a more modern department after all. Deep Learning also we had a little bit on it from the GLM/GAM perspective. 
And I always liked GLMs and regression, so seeing that ML and DL boiled down to that got me interested in it. And regularization too, like how people nowadays are incorporating different kinds of penalties for domain specific problems is interesting. The whole double descent thing in DL to me seems like statistics, at least the way Dr Witten explained it with GAMs and regularization. VAEs for example seem to be heavily statistical would fall under probabilistic modeling of multivariate distributions, letting you generate new data based on the latent space. I don’t think asymptotics, inference, and p values necessarily define the field.

It seems like you're more interested in developing bioinformatic pipelines that use ML techniques than in developing specific ML methods (at least in the statistical sense). In modern statistics, optimization and modern techniques are important, but ultimately you have to prove that these techniques give good inference. The focus on modeling, inference, and estimation is what differentiates Stats from other fields. While there are people in Stats who do the work with NN/VAEs/GAMs that you're talking about, they're the exception to the rule. People like Liam Paninski and John Cunningham from Columbia who do that type of stuff in statistics departments do so mainly in service to a field like neuroscience, and I'm not really sure why their primary appointment is in the statistics department. I think you should seriously think about a PhD in something like EE/Bioinformatics/Bioengineering/Computational Neuroscience if you're really not interested in inference and estimation.

Idk what department you went to, but Stats PhD departments from Stanford to Berkeley to UChicago to Duke to Washington all require students to take a heavy course load in classical inference (linear models, hypothesis testing, ANOVA) and asymptotic statistics.
Granted, the top departments like Stanford, CMU, Chicago, and Berkeley put a very high-dimensional spin on this, but it's fundamentally the same goal. Correct me if I'm wrong, but I think all those departments are fairly modern. If every department is teaching these courses, I would think that they're fundamental to statistics. Lower-ranked programs are even more focused on "classical statistics". If you attend these programs without an interest in inference, you'll be bored for at least a year.

Also, take my rambling with a grain of salt: I'm an incoming grad student and I don't even hold a master's.

Edited May 4, 2021 by trynagetby
bayessays Posted May 4, 2021 Share Posted May 4, 2021 7 minutes ago, statsguy said: You're in biotech with an MS, that means you're probably making a minimum of $80k/year working 40 hours a week doing easy problems. Paid vacation, 401k match, health insurance as well... This is based on what I was seeing 8+ years ago. Assistant professors at the top-15 program where I graduated from started at $78k/year when I graduated some years back, and they were easily working 60-70 hours/week in their quest for tenure. I think this is a key point -- if you really love statistics/data science so much that you want to do it as a hobby, do it all of that time that you're saving by *not* being in school. Take the extra money you've saved up from your job vs. being a PhD student and take a few months off completely and learn some new stuff full-time. There is literally no difference between this and what you're going to be doing in graduate school besides a mindset. Especially in this past year, the number of free online courses has skyrocketed. I think if presented this way, going back to school loses its charm for a lot of people. It is one thing to go back to school because you need the credential, or you have the resources where the money doesn't matter, but you can do research and learn anything you want about statistics and data science for free on the internet, more than enough to occupy you for a lifetime. I understand where @untzkatz is coming from in that it is a culture shock to be doing boring stuff at work all day when you enjoyed the subject during school. But I don't think going back to relive the glory days is the most productive path for most people. Although some people (me and trynagetby just in this thread) found the trade-off personally worth it. Link to comment Share on other sites More sharing options...
untzkatz Posted May 4, 2021 Author Share Posted May 4, 2021 1 hour ago, bayessays said: I think this is a key point -- if you really love statistics/data science so much that you want to do it as a hobby, do it all of that time that you're saving by *not* being in school. Take the extra money you've saved up from your job vs. being a PhD student and take a few months off completely and learn some new stuff full-time. There is literally no difference between this and what you're going to be doing in graduate school besides a mindset. Especially in this past year, the number of free online courses has skyrocketed. I think if presented this way, going back to school loses its charm for a lot of people. It is one thing to go back to school because you need the credential, or you have the resources where the money doesn't matter, but you can do research and learn anything you want about statistics and data science for free on the internet, more than enough to occupy you for a lifetime. I understand where @untzkatz is coming from in that it is a culture shock to be doing boring stuff at work all day when you enjoyed the subject during school. But I don't think going back to relive the glory days is the most productive path for most people. Although some people (me and trynagetby just in this thread) found the trade-off personally worth it. So you think that it's not worth doing a PhD in a DS/related field in order to eventually go for one of those more stat/ML modeling based jobs (despite how scarce they seem to be)? A lot of these, like for example the Harnham one posted earlier, seem to require a PhD. Would you say apply for that stuff even with an MS and try to demonstrate that you can do it on the resume/Github & interview? I understand data wrangling is the 80% of data related work, but still I'd like to get away from the regulatory writing aspects primarily. I'm not complaining about data wrangling as much as the biostatistics regulatory stuff. 
In that sense, it sounds like a DS job, even in biotech, with less of the regulatory writing being given to me (since Biostatisticians are given this) could be a better fit. Here is a more example of something I would eventually like to get into: https://verily.com/roles/job/?job_id=2059874. It's signal processing related, and looks like actually this particular one requires an MS but in CS, and otherwise it says preferred is PhD in BME/CS/App Math/related. A lot of the job posting seems to be about both classical (eg time series, linear models, multivariate analysis) and ML modeling so a good mix. It just seems like these jobs and similar ones vastly prefer PhD candidates. 1 hour ago, trynagetby said: Both would be important especially for NYU datascience, especially a formal AoA class (Dynamic programming, Graph Algorithms, basic NP-Completeness proofs) because professors need to know that you can reason rigorously about the complexity and correctness of novel algorithms. It seems like you're more interested in developing bioinformatic pipelines that use ML techniques than developing specific ML methods (at least in statistical sense). In modern statistics, optimization and modern techniques are important, but ultimately you have to prove that these techniques give good inference. Focus on modeling, inference and estimation are what differentiates Stats from other fields. While there are people in Stats who do the work with NN/VAEs/GAMs that you're talking about, they're the exception to the rule. People like Liam Paninski and John Cunningham from Columbia who do that type of stuff in statistics departments do so mainly in service to a field like neuroscience and I'm not really sure why their primary appointment is in the statistics department. I think you should seriously think about a PhD in like EE/Bioinformatics/Bioengineering/Computational Neuroscience if you're really not interested in inference and estimation. 
Idk what department you went to, but Stats PhD departments from Stanford to Berkeley to UChicago to Duke to Washington all require students to take a heavy course load in classical inference (linear models, hypothesis testing, ANOVA) and asymptotic statistics. Granted, the top departments like Stanford, CMU, Chicago, and Berkeley put a very high-dimensional spin on this, but it's fundamentally the same goal. Correct me if I'm wrong, but I think all those departments are fairly modern. If every department is teaching these courses, I would think that they're fundamental to statistics. Lower ranked programs are even more focused on "classical statistics". If you attend these programs without interest in inference, you'll at least be bored for a year. Also take my rambling with a grain of salt, I'm an incoming grad student and I don't even hold a master's.

I don't know if I am interested in pipelines in the software/ML engineering sense of the term, but I like both methods and applications of ML/DL. Sometimes, people develop methods for specific applications. My undergrad btw actually was in BE, which you listed, although it is far too broad and I wish I had done something like applied math in undergrad. My Biostat program covered those inferential things at MS level in the 1st year, and in the 2nd year we had Survival+GLM+GLMM (these were combined with PhD students; I got an A in survival/GLMM but a B+ in GLM) as well as the electives, which were the ML/signals/time series classes I mentioned (those were all As). Yeah, I would expect the higher ranked programs to be more modern. I'm not too interested in doing the Fisher/Neyman-Pearson inference stuff all over again though, but the NYU DS Inference & Representation course on graphical models does look interesting as it's a more modern spin on it. Didn't realize DS&Algs would be needed though. I'm not sure where things like heaps, linked lists, dynamic programming etc. come up in ML at all, but I also took ML in a stat department, not a CS one.
Technically I know NNs, for example, make use of dynamic programming when autograd caches the gradients, but that's an internal detail you don't need to worry about when using high-level languages/frameworks like Julia's Flux or the TF/PyTorch frameworks. The only computational complexity stuff we did was related to matrix decomps in computational stat. Between Real Analysis and an Algorithms class, which would be better? I don't think I can take both.
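As a side note on the autograd point above, here is a minimal sketch of reverse-mode differentiation in plain Python. It is only an illustration of the idea that a shared intermediate result is computed once and its accumulated gradient is reused, which is the dynamic-programming flavor being described. The `Var` class and the `backward` function are hypothetical names made up for this example; they are not the API of PyTorch, TensorFlow, or Flux, and real frameworks are considerably more involved.

```python
class Var:
    """A scalar node in a computation graph (hypothetical, for illustration only)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # Each entry is (parent node, local derivative of this node w.r.t. that parent).
        self.parents = parents

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def backward(output):
    """Accumulate d(output)/d(node) into node.grad for every node in the graph."""
    order, seen = [], set()

    def visit(node):
        # Depth-first post-order gives a topological ordering of the graph.
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    output.grad = 1.0
    # Walk from the output back to the inputs. Each node's gradient is complete
    # before it is pushed to its parents, so shared nodes are processed only once.
    for node in reversed(order):
        for parent, local in node.parents:
            parent.grad += local * node.grad


x = Var(3.0)
y = Var(2.0)
shared = x * y             # intermediate result used twice below
z = shared * shared + x    # z = (x*y)^2 + x
backward(z)
print(x.grad, y.grad)
```

By hand, dz/dx = 2(xy)y + 1 and dz/dy = 2(xy)x, which is what the accumulated `grad` fields should hold for x = 3, y = 2.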
bayessays Posted May 4, 2021

20 minutes ago, untzkatz said: requires an MS but in CS

No, it requires an MS in CS *or a related technical field*. You have one of these. You are too hung up on degree titles. It says you need to learn Python. Do that.

21 minutes ago, untzkatz said: In that sense, it sounds like a DS job, even in biotech, with less of the regulatory writing being given to me (since Biostatisticians are given this) could be a better fit.

Yes. You need to leave pharma, that is clear. Get a tech job and you won't be doing that.

21 minutes ago, untzkatz said: Here is a more example of something I would eventually like to get into: https://verily.com/roles/job/?job_id=2059874.

I guarantee you this job is not as cool as you think it is. You can do the same stuff at any tech company. The Verily job is not going to be much more interesting than the job you would get at an insurance company, a start-up, a financial company, etc. A million people with PhDs are going to apply for that job and most of them, including brilliant people, are going to be rejected and not get through the interviews. Banking on a specific type of job like this working on wearable sensors is setting yourself up for disappointment. Teach yourself Python and get any intro-level data analysis job at a tech company where you use Python and SQL every day. In a year or two you can get a promotion to data scientist and then you'll have the work experience and be able to branch out more.
trynagetby Posted May 4, 2021 Share Posted May 4, 2021 On 5/4/2021 at 2:47 PM, untzkatz said: I don't know if I am interested in pipelines like the software/ML engineering sense of the term, but I like both methods+applications of ML/DL. Sometimes, people develop methods for specific applications. My undergrad btw actually was in BE which you listed, although it is far too broad and I wish I did something like applied math ugrad. My Biostat program covered those inferential things at MS level in the 1st year, and 2nd year we had stuff on Survival+GLM+GLMM (these were combined with PhD students, I got an A in survival/GLMM but a B+ in GLM) as well as the electives which were the ML/signals/time series classes I mentioned (those were all As). Yea, the higher ranked programs I would expect to be more modern. I'm not too interested in doing the Fisher/Neymann Pearson inference stuff all over again though, but NYU DS inference & representation course on the graph models does look interesting as its a more modern spin on it. Didn't realize DS&Algs would be needed though, I'm not sure where things like heaps, linked lists, dynamic programming etc come up in ML at all, but I also took ML in a stat department not a CS one. Technically I know NNs for example make use of dynamic programming when using autograd to cache the gradients, but thats in internal detail you don't need to worry about when using high level programming languages/frameworks like Julia's Flux or the TF/PyTorch frameworks. The only computational complexity stuff we did was related to matrix decomps in computational stat. Between Real Analysis and an Algorithms class what would be better? I don't think I can take both. Idk man, if you think you can get what you want out of a Statistics PhD then go for it. But it really doesn't sound like you'd enjoy the curriculum or research focus. 
I'm quoting the first line of the syllabus for the last course in Stanford's statistical inference sequence: "Testing problems in high dimensions: sparse alternatives (needle in a haystack) and nonsparse alternatives, Bonferroni's method, Fisher's test, ANOVA, higher criticism." Even CMU, which is really MLey compared to the other statistics departments, requires students to review topics like simple linear regression, ordinary least squares and weighted least squares, the geometry of least squares, quadratic forms, F tests and ANOVA tables, interval estimation, minimax theory, hypothesis testing, data reduction, and convergence concepts.

You are interested in ML, but maybe not from a statistical perspective. Statisticians do all the things you're talking about, but you absolutely have to prove inferential properties, and understanding the basic foundations of hypothesis testing is necessary. Honestly, you should check out programs like https://bioinformatics.gatech.edu/ through ISyE (read: OR).

I think the problem here is "using high level programming languages/frameworks like Julia's Flux or the TF/PyTorch frameworks." When you're doing academic research you can't be constrained to pre-packaged stuff that everyone has access to. You have to do something novel with data that no one has done before. That will inevitably involve implementing something from scratch. For example, in my computational neuroscience research for a statistics prof at a top school, I once had to find the cluster of vectors lying on a sphere that maximized the sum of projections onto them by a given vector, with certain constraints. How do you go about this the fastest, what data structure do you use, can you approximate? DS professors will want to know you have the tools to think about this.

It's difficult to say what you should take. Analysis of Algorithms will be useful for jobs and might be enough to get your foot in the door at some CS/OR places.
Real Analysis I will be the only way you get into decent statistics/biostatistics programs. Honestly, I think you need to read dissertations from places like UWashington/Harvard/JHU biostat and really make sure you're not interested. You seem to be really hung up on hypothesis testing and asymptotics being boring when those concepts are kinda the core of statistics.
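To give a rough feel for the kind of data-structure question raised in the post above, here is a small sketch. The actual research problem and its constraints are not specified in the thread, so this assumes a deliberately simplified version: given candidate unit vectors on a sphere and a query vector, pick the `k` candidates with the largest projection of the query onto them. The function names, the sample vectors, and the choice of `k` are all assumptions made for illustration. The point is only that a heap-based selection (`heapq.nlargest`) avoids sorting every candidate when `k` is small.

```python
import heapq
import math


def normalize(v):
    """Scale a vector to unit length so that it lies on the unit sphere."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def dot(a, b):
    """Dot product; for a unit vector a, this is the projection of b onto a."""
    return sum(x * y for x, y in zip(a, b))


def top_k_by_projection(candidates, query, k):
    """Return the k candidates with the largest projection of `query` onto them.

    heapq.nlargest keeps a heap of size k, so this costs roughly O(n log k)
    comparisons instead of the O(n log n) of a full sort.
    """
    return heapq.nlargest(k, candidates, key=lambda u: dot(u, query))


# Hypothetical candidate directions and query vector.
candidates = [normalize(v) for v in
              [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 0), (-1, 0, 0)]]
query = (2.0, 1.0, 0.0)

best = top_k_by_projection(candidates, query, k=2)
print(best)
```

With real constraints (for example, requiring the chosen vectors to be close to each other) a plain top-k selection would no longer be enough, which is where the questions about approximation and data structures come in.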
untzkatz Posted May 4, 2021 Author Share Posted May 4, 2021 On 5/4/2021 at 3:13 PM, bayessays said: No, it requires an MS in CS *or a related technical field*. You have one of these. You are too hung up on degree titles. It says you need to learn Python. Do that. Yes. You need to leave pharma, that is clear. Get a tech job and you won't be doing that. I guarantee you this job is not as cool as you think it is. You can do the same stuff at any tech company. The Verily job is not going to be much more interesting than the job you would get at an insurance company, a start-up, a financial company, etc. A million people with PhDs are going to apply for that job and most of them, including brilliant people, are going to be rejected and not get through the interviews. Banking on a specific type of job like this working on wearable sensors is setting yourself up for disappointment. Teach yourself Python and get any intro level data analysis job at a tech company where you use Python and SQL every day. In a year or two you can get a promotion to data scientist and then you'll have the work experience and be able to branch out more. Well I know Python so it would probably be improving SQL. Are you basically saying that such job descriptions look like they have lots of cool modeling, but that reality is not the case and it just seems that way on the outside? I keep hearing that for statistical/ML modeling jobs these days you need a PhD, and even still it'll be competitive as you said. Modeling 20% of the time isn't too bad, but I'm afraid data analyst it'll be like <5% of the time and most you will be doing is basic univariate summary stats and visualizations. On 5/4/2021 at 4:29 PM, trynagetby said: Idk man, if you think you can get what you want out of a Statistics PhD then go for it. But it really doesn't sound like you'd enjoy the curriculum or research focus. 
Im quoting the first line out of the syllabus for the last course in Stanfords Statistical Inference Sequence: Testing problems in high dimensions: sparse alternatives (needle in a haystack) and nonsparse alternatives, Bonferroni's method, Fisher's test, ANOVA, higher criticism. Even CMU which is really MLey out of all the statistics departments requires to review topics like simple linear regression, ordinary least squares and weighted least squares, the geometry of least squares, quadratic forms, F tests and ANOVA tables, interval estimation, minimax theory, hypothesis testing, data reduction, convergence concepts You are interested in ML, but maybe not from a statistical perspective. Statisticians do all the things you're talking about but you absolutely have to prove inferential properties and understanding the basic foundations of hypothesis testing is necessary. Honestly, you should check out programs like https://bioinformatics.gatech.edu/ through the ISYE (read:OR). I think the problem here is "using high level programming languages/frameworks like Julia's Flux or the TF/PyTorch frameworks." When you're doing academic research you can't be constrained to pre-packaged stuff that everyone has access to. You have to do something novel and new to data which no one has before. That will inevitably involve implementing something from scratch. For example in my computational neuroscience research for a statistics prof at a top school, I once had to find the cluster of vectors lying on a sphere that maximized the sum of projection onto them by a given vector with certain contraints. How do you go about this the fastest, what data structure do you use, can you approximate? DS professors will want to know you have the tools to think about this. It's difficult to say what you should take. Analysis of Algorithms will be useful for jobs and might be enough to get your foot in the door some CS/OR places. 
Analysis I will be the only way you get into decent Statistics/Bio-statistics programs. Honestly I think you need to read dissertations from places like UWashington/Harvard/JHU biostat and really make sure you're not interested. You seem to be really hung up on hypothesis testing and asymptotics being boring when the concepts are kinda the core of Statistics. Sounds like what you are getting at is that the coursework in statistics/biostatistics departments is heavily foundational classical stats, but the research does more modern things and combines it with the inferential aspects? Whereas NYU DS for example seems to go right into the statistical ML/DL and bayesian network type stuff. I would be interested in Comp Neuro too since you brought that up. Did you do that stuff in undergrad-it seems very advanced for undergrad level. I agree the good thing about an Algorithms course is that even if I don't do a PhD, it can still help to get through interviews at tech companies since that stuff is tested in Leetcode and so on. And still improves general programming skills beyond just numerical computation. That is why I have been leaning towards doing it. As far as the frameworks though, I'm pretty sure most PhD students doing DL are using PyTorch and so aren't implementing various data structures or autograd from scratch. I've seen arxiv github code and it still often follows the formulaic subclassing nn.Module to make a layer, then having __init__ and forward() and so on. And making a Dataset class and Dataloader. Would you say something like UCSD EE with DS/ML may also be good? https://www.ece.ucsd.edu/index.php/faculty-research/ece-research-areas/machine-learning-data-science-impacted. Seems like they do stat learning and DL there too. Link to comment Share on other sites More sharing options...
bayessays Posted May 4, 2021

53 minutes ago, untzkatz said: Are you basically saying that such job descriptions look like they have lots of cool modeling, but that reality is not the case and it just seems that way on the outside?

You'll probably spend most of your time cleaning data, figuring out how to do some really boring data pipeline stuff, or being confused as to what you are supposed to do. Every job has its flaws. See this for why almost everyone who works at Verily has left: https://www.statnews.com/2016/03/28/google-life-sciences-exodus/
trynagetby Posted May 5, 2021

2 hours ago, untzkatz said: Sounds like what you are getting at is that the coursework in statistics/biostatistics departments is heavily foundational classical stats, but the research does more modern things and combines it with the inferential aspects? Whereas NYU DS for example seems to go right into the statistical ML/DL and bayesian network type stuff. I would be interested in Comp Neuro too since you brought that up. Did you do that stuff in undergrad-it seems very advanced for undergrad level. I agree the good thing about an Algorithms course is that even if I don't do a PhD, it can still help to get through interviews at tech companies since that stuff is tested in Leetcode and so on. And still improves general programming skills beyond just numerical computation. That is why I have been leaning towards doing it. As far as the frameworks though, I'm pretty sure most PhD students doing DL are using PyTorch and so aren't implementing various data structures or autograd from scratch. I've seen arxiv github code and it still often follows the formulaic subclassing nn.Module to make a layer, then having __init__ and forward() and so on. And making a Dataset class and Dataloader.

I'd recommend watching this NIPS lecture from Robert Tibshirani https://www.microsoft.com/en-us/research/video/invited-talk-post-selection-inference-for-forward-stepwise-regression-lasso-and-other-procedures/?from=http%3A%2F%2Fresearch.microsoft.com%2Fapps%2Fvideo%2F%3Fid%3D259617 to get an idea of the type of problem that statisticians are interested in and of the distinction between inference and ML-style prediction. It just seems you're not interested in statistical inference. There's nothing outdated about inference. It's just a goal that the statistics field is interested in, and the tools to achieve that goal are always evolving.
I was a fairly mathematically/computationally mature undergraduate working with a super nice prof who works with undergrads. That type of program does seem well suited to your interests. But UCSD ECE is an extremely hard department to get into (as is NYU DS). You should still apply, but hedge your bets with applications to bioinformatics/OR/BME programs as well.
untzkatz Posted May 5, 2021 Author (edited)

4 hours ago, bayessays said: You'll probably spend most of your time cleaning data, figuring out how to do some really boring data pipeline stuff, or being confused as to what you are supposed to do. Every job has its flaws. See this for why almost everyone who works at Verily has left: https://www.statnews.com/2016/03/28/google-life-sciences-exodus/

Damn, so the job description really is deceiving. I guess that's not too surprising considering how hyped up DS is. In regards to transitioning out of biotech, one issue is that my undergrad was in BE. So compared to other Biostat people who came from stat/math, I feel a bit more "holed into" this industry. All of these Bio-X fields seem to suffer from this. My whole resume is projects that are biomedical stats related; even the research experience I got in grad school was doing stats for a lab in BE (hence the work with imaging data). I actually like the biomedical stuff, but in industry there can be too much red tape.

Some good news (fingers crossed) is that I recently had a non-clinical Biostat interview for a biotech company (a recruiter had referred me; I hadn't applied, but I said well, I'll just see what it's about). I made it clear to the team that I didn't like the regulatory writing stuff but wanted algorithms. One of the interviewers was actually on the algs team and I was able to answer the ML questions (they asked me weird stuff like what would you do if the data isn't labeled, and is accuracy always a good metric). I got an internal referral to interview for DS on the algs team (they themselves said I would be a better fit for this, and they want someone who is familiar with the stats aspects of it), where they do seem to do predictive modeling. Hopefully I get lucky and pass it.

Edited May 5, 2021 by untzkatz
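On the interview question mentioned above ("is accuracy always a good metric"), a small sketch may help show why the usual answer is no. The data here is made up: an assumed 5% positive class and a trivial model that always predicts the majority class. The helper functions are hypothetical and written out by hand so that the example has no dependencies.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of true positives that the model actually found."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred)
                   if t == positive and p == positive)
    actual_pos = sum(1 for t in y_true if t == positive)
    return true_pos / actual_pos if actual_pos else 0.0


# Hypothetical imbalanced data: 5 positives (e.g. disease cases) out of 100.
y_true = [1] * 5 + [0] * 95
# A "model" that ignores its input and always predicts the majority class.
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
rec = recall(y_true, y_pred)
print(f"accuracy={acc:.2f}, recall={rec:.2f}")
```

The always-negative model scores high on accuracy while finding none of the positive cases, which is why metrics such as recall, precision, or AUC are usually reported alongside accuracy when classes are imbalanced.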