How to prepare for research in machine learning?


Hello everyone,

I am currently an undergraduate majoring in computer science. I have the option to take 3-4 statistics classes and need some advice on which ones would be most helpful for graduate research in machine learning / data mining. Also, if it helps, within ML I'm interested in exploring decision tree learning and neural networks. The available statistics courses are:

Concepts in Computing with Data

Concepts of Probability

Concepts of Statistics

Stochastic Processes

Linear Modelling: Theory and Applications I

Linear Modelling: Theory and Applications II

Sampling Surveys

Introduction to Time Series

Game Theory

If there are other courses outside of statistics (say in the math department) that you find very relevant, please suggest them too. Thank you!

Depends on your knowledge. For instance, if you're comfortable with probability, you probably don't need to take Concepts of Probability. I'd take Stochastic Processes, the two linear modeling classes and Sampling Surveys. If I weren't comfortable with basic prob/stats, then I'd take Concepts of Probability, Concepts of Statistics and the two linear modeling classes.

Of course, if you can, you should try to take all the classes you've listed. As for math classes, definitely make sure you're very, very good with linear algebra. Also, if you have space, take real analysis and measure-theoretic probability. They won't be directly useful for practical ML research, but they help give you a conceptual framework and are useful if you want to do theory research in ML.

Edited by jjsakurai
jjsakurai, thank you for your reply.

Pardon my ignorance but from what I've read about neural networks, they mainly use non-linear models. Hence would 2 whole semesters of linear modelling be relevant? I would understand if it is important to ML in general but I'm not sure. Can you also briefly explain how sampling surveys would be useful? That was actually lowest on my list because I thought it was the least related.

I am a little biased to the financial modelling applications, but graduate studies (masters) is a priority to me. Thus my draft list was stochastic processes, time series and game theory. Do you think I should trade time series and game theory for the 2 linear modelling courses? Thank you for your time :)

Hmm...my post was intended for a potential PhD applicant. I'm not sure about Masters.

Neural nets are a very small part of ML. While big in the early 90s, they're not used that much these days. Linear Modeling is very very widely used - especially when you have a ton of data and other techniques are computationally too intensive.

The reason I suggested Sampling Surveys is that sampling is used everywhere in ML, and knowing the theory behind it can be useful. But if you're interested in financial modeling, etc., then yeah, the time series course is probably a much better idea.

Game theory is not used at all in ML/data mining. Even in finance, it really doesn't have any applications, so I'd strongly suggest that you don't take it.

I am interested in pursuing a PhD, though for financial reasons, as an international student, I would have to enter the workforce first after my master's.

Your comments have been really helpful. I think I will go with stochastic processes, time series and 1 or 2 of the linear modeling courses, depending on workload constraints.

Linear modelling is very useful as a foundational course. If you're interested in neural networks, much of that community has moved into support vector machines, which are very mainstream.

Probably the best thing you could do, though, would be to get some experience working with real-world data and with the machine learning systems you're interested in. There are a number of repositories of free data (e.g. http://archive.ics.uci.edu/ml/). Go there, download a likely dataset, and then implement a simple back-prop neural network or the ID3 decision tree learner. You can compare your results with those generated by Weka (http://www.cs.waikato.ac.nz/ml/weka/). The theory you could learn in classes is all well and good, but unless you just want to do pure theory, the application details are going to matter more.
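
If it helps to see what "implement the ID3 decision tree learner" might look like, here is a minimal Python sketch. It assumes purely categorical attributes and no missing values, and it does no pruning. The function names (`entropy`, `information_gain`, `id3`, `predict`) and the four-row toy dataset are made up for illustration; they are not taken from the UCI repository or from Weka.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Reduction in entropy obtained by splitting on attribute `attr`."""
    total = len(labels)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, attrs):
    """Return a tree: a leaf is a label, a node is (attr, {value: subtree})."""
    if len(set(labels)) == 1:          # all examples agree: make a leaf
        return labels[0]
    if not attrs:                      # nothing left to split on: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attrs if a != best]
    branches = {}
    for value in set(row[best] for row in rows):
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        branches[value] = id3([r for r, _ in pairs],
                              [l for _, l in pairs],
                              remaining)
    return (best, branches)


def predict(tree, row, default=None):
    """Walk the tree for one example; return `default` on an unseen value."""
    while isinstance(tree, tuple):
        attr, branches = tree
        if row[attr] not in branches:
            return default
        tree = branches[row[attr]]
    return tree


# Toy data (illustrative only): the label depends solely on "windy".
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "stay", "play", "stay"]

tree = id3(rows, labels, ["outlook", "windy"])
print(tree)
print(predict(tree, {"outlook": "sunny", "windy": "yes"}))
```

To try it on real data, you would load one of the categorical datasets into the same list-of-dicts shape, hold out some rows for testing, and compare the accuracy against what Weka reports on the same split.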

Well, traditional neural nets are not the best performing, but it is very much a (re)emerging field. The majority of the work is in deep learning and cortical models such as HMAX. I would take a deep learning course as an advanced elective if it's offered.

Edited by tkulk