SOP Feedback - Computer Science PhD


This is my first draft for the CS PhD at UW. I plan to use this template for all the other colleges I apply to (I know this is not kosher, but I am in a time crunch). Thanks in advance for reviewing it!

 

Here is the prompt:

Submit a personal statement of ~1000 words (max 500KB) that includes: a) how you became interested in doing research, b) a relevant project or research experience that shows your technical knowledge and skill, and c) your plans for the future in computer science. You may wish to include information about what you feel are the strengths of your application, such as special interests and abilities, or give explanations for what you feel are any weaknesses in your academic record.

 

 

 

 

 

 

 

Statement of Purpose

 

My primary research objective is to apply Machine Learning (ML) techniques to computational biology. I am currently one of the three founding engineers of 20N Labs, a computational biotech startup backed by Y-Combinator and Khosla investments and founded by Dr. Christopher Anderson, associate professor of bioengineering at UC Berkeley, and Saurabh Srivastava, PhD in Computer Science at University of Maryland. I believe the most important ML investments in computational biology will be in model inference, in reducing the large hypothesis space, and in instrument-agnostic modelling techniques for large, heterogeneous biological datasets. Building on my experience developing ML models and graph algorithms for untargeted metabolomics at 20N, I want to advance the state of the art in ML techniques for computational biology. I plan to continue research after my Ph.D., either as a faculty member in academia or as a researcher in an industrial lab.

 

1 Research

 

While I was always interested in computer science research, both at Harvey Mudd College and in my early software engineering career, I initially focused on building reliable, large-scale software systems for enterprises. It was only after I decided to follow my curiosity about recent advances in ML techniques for protein folding prediction and drug discovery, and in experimental techniques like CRISPR-Cas9, that I switched to a research engineer role at 20N. At 20N, I worked closely with Dr. Anderson and Dr. Srivastava to develop algorithms for untargeted metabolomics.

 

1.1 Peak Detection for Untargeted Metabolomics

 

20N uses Liquid Chromatography Mass Spectrometry (LCMS) instruments to detect chemicals in biological samples. The instrument produces a three-dimensional dataset of mass-to-charge, retention time, and intensity values for each of these samples.

 

I developed a targeted metabolomics pipeline to analyze the LCMS output. Here, we took a list of molecules we suspected were in the sample and tried to detect their peak patterns in the data. Since it was not scalable for a research assistant to look for thousands of peaks in our scans, I developed a signal processing algorithm that used a peak template as a feature detector to find "peak-like" patterns in our scans. This algorithm had a linear runtime, which allowed us to scale to our massive datasets of chemicals and samples.

 

Next, I developed algorithms for the company's untargeted metabolomics pipeline. One part of this problem is to detect 3D peaks in the input dataset without any prior chemical targets to look for. Using instrument data artifacts called "Data Voids" that are found near peaks, I used k-means clustering techniques to find points of interest in the data. I then implemented a continuous wavelet transform function that could reliably detect chromatographic peaks of differing width. Using these two approaches, we developed a pipeline that could detect peak artifacts without an input chemical dataset.

 

Finally, I developed unsupervised deep learning models for chromatographic peak detection and mass spectrometry retention time prediction. Since our lab did not have large labelled datasets, I used autoencoders to learn two things: the underlying structure of peaks vs. noise in our LCMS data (followed by a human tagging the clusters that looked like peaks), and retention time buckets based on chemical structure (followed by a supervised learning algorithm trained on in-house chemical retention times).

 

1.2 Graph Network Analysis

 

20N had a repository of reaction operators: functions that operate on input substrates and transform them into their correct products. These reaction operators were curated from our understanding of biochemical reaction mechanisms. If our peak detection algorithms found peaks for both a substrate and the derivatives predicted by our reaction operators, we had more confidence that we had detected the substrate molecule.

 

I developed algorithms that used Prize Collecting Steiner Trees (PCST) to extract high-value networks from the large datasets we had analyzed with the peak detection algorithms. These high-value networks gave us a better understanding of the biochemical pathways that were over- or under-expressed in our samples.

 

1.3 Human sample analysis

 

I conducted statistical analyses of the output of our computational methods on diseased and non-diseased human samples provided by clients. I learnt to design effective experiments to test the efficacy of our algorithms on these samples. This involved asking the right questions, researching their answers through data analysis, and presenting the results in a manner that was easily understood by the team. Along the way, I got better at data visualization using tools like Pandas and R.

 

2 Teaching and Mentorship

 

I was a teaching assistant for Operating Systems (CS134), taught by Professor Neil Rhodes, and Data Structures and Algorithm Development (CS70), taught by Professor Melissa O'Neill. I also mentored aspiring women engineers through the Hackbright fellowship program in San Francisco.

 

3 Related Courses

 

I took Machine Learning and Algorithms Development at Harvey Mudd. For my continuing education, I took certificate courses in Bioinformatics from UCSD (Bioinformatics I, II and III) and am also taking a certificate course in Biochemistry from UC Berkeley to better understand the domain.

 

4 Conclusion

 

I am very interested in Professor Su-In Lee’s work with the ENCODE project to analyze ChIP-seq data for detecting gene network perturbations in cancer. The DISCERN algorithm for comparing and scoring expression levels between diseased and wild-type tissue has a lot of similarities to the approach I took in comparing metabolomics data between diseased and wild-type tissue at 20N Labs. Since I thoroughly enjoyed researching and analyzing samples in that project, I hope to bring the same enthusiasm and scientific rigor to Professor Lee’s work.

 

Professor William Noble’s work in proteomics and mass spectrometry analysis also aligns with my research interests and prior field experience. Professor Noble’s work on Percolator 3.0, an ML model for protein identification, and his findings in “Tandem mass spectrum identification via cascade search” are the types of research methods I want to improve on.

 

Having spent time at Mudd and three years in industry, I now want to pursue a PhD because it would allow me to do more focused research. In my research, I want to advance the state of the art in the ML and AI techniques used to solve computational biology problems.


Hello,

Your statement of purpose reads more like a research paper than a statement of purpose. Remember, your statement of purpose should provide some background about you as a person, your likes, some of your research and a LOT about the program you are applying to and how it can help you achieve your stated goals. All I see here is a recap of your past research and interests and very little about the program and why you are specifically applying to this program. -selectiveadmissions.com

 


  • 3 weeks later...

Hi vijay,

I really like your intro and your conclusion! However, I'd make the body more fluid and more essay-like. For example, when you talk about coursework, maybe you can add a couple of sentences on what exactly you learned and how it's relevant to the proposed field of study. Same thing with teaching and mentorship. How big was the class? What were your duties (did you give seminars? did you grade papers?)? What was the students' and prof's feedback on your performance in this class?
