What kinds of programming should statisticians know?


If you are exclusively doing statistical analyses of relatively small datasets, with no need to interface with larger systems or applications, R alone is probably fine.  For anything more, I'd recommend also learning Python, since it is both extremely versatile and easy to use.  C is a very important language in the grand scheme of things, but it's not for the faint of heart.  Unless you really need to write lightning-fast code, doing things in Python is fine.

Finally, some knowledge of databases could definitely come in handy.  Unless you have reason to do otherwise, you should start with some form of database that uses SQL (MySQL, PostgreSQL, etc.).  It's not too hard to learn, either.
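
To give a feel for what basic SQL looks like, here is a minimal sketch using Python's built-in sqlite3 module (SQLite rather than MySQL or PostgreSQL, only because it needs no server). The table name, column names, and values are made up for illustration.

```python
import sqlite3

# Hypothetical table and values, invented purely for illustration.
conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE measurements (subject TEXT, grp TEXT, value REAL)")
conn.executemany(
    "INSERT INTO measurements VALUES (?, ?, ?)",
    [
        ("s1", "control", 1.0),
        ("s2", "control", 3.0),
        ("s3", "treated", 4.0),
        ("s4", "treated", 6.0),
    ],
)

# Let the database do the grouping, then pull back only the summary rows.
summary = conn.execute(
    "SELECT grp, COUNT(*), AVG(value) "
    "FROM measurements GROUP BY grp ORDER BY grp"
).fetchall()
conn.close()

print(summary)  # [('control', 2, 2.0), ('treated', 2, 5.0)]
```

The same SELECT ... GROUP BY pattern carries over to the server-based databases with little or no change, though connection details differ by driver.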

Depends on your research interests, but I would say (in order of decreasing importance): R, MATLAB, and C++. Emphasis on the last two if you are interested in developing algorithms.

Python is great to learn, and it's not too difficult to pick up.  There are plenty of online tutorials for it.  I recommend both codingbat and codecademy, but be aware that codecademy will sometimes reject code even when it's entered completely correctly.  Also pick up NumPy, SciPy, and Matplotlib, which are free scientific computing add-ons that together make Python an alternative to MATLAB.  There is also Sage, a free mathematics package built on Python that's a solid alternative to Mathematica for symbolic calculations.

For analyzing data, R is the biggie. You'll probably see a fair amount of SAS if you are in biostatistics (more the traditional clinical trials side than genetics). Statisticians don't use Stata or SPSS, but collaborators/consulting clients in the social sciences use those heavily, so just passing familiarity is nice to have in those cases. C and MATLAB are more specialized -- I'm sure some people in my department use these all the time, but the majority basically never.

I don't like to do heavy data cleaning and manipulation in R, and certainly not for anything large. For general-purpose data processing, Python is useful. In any language, knowing regular expressions is very, very useful for cleaning up messy data. I definitely recommend picking up SQL for data extraction and aggregation (which you can use right in R or SAS). Some of the students who were leaving with a master's and looking for industry jobs found that employers often wanted SQL skills.
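
As a small illustration of the regular-expression point, here is a sketch that pulls a number out of inconsistently formatted entries using Python's standard re module. The sample strings and the parse_weight helper are hypothetical, and the pattern is just one reasonable choice, not a general-purpose number parser.

```python
import re

# Hypothetical messy entries, invented for illustration.
raw = ["  12.5 kg", "13,0kg", "n/a", "14 KG ", "weight: 15.25"]

# Optional sign, digits, then an optional decimal part using '.' or ','.
NUMBER = re.compile(r"[-+]?\d+(?:[.,]\d+)?")


def parse_weight(text):
    """Return the first number found in text as a float, or None if absent."""
    match = NUMBER.search(text)
    if match is None:
        return None
    # Normalize a decimal comma to a decimal point before converting.
    return float(match.group().replace(",", "."))


cleaned = [parse_weight(entry) for entry in raw]
print(cleaned)  # [12.5, 13.0, None, 14.0, 15.25]
```

The same pattern syntax works with little change in R (for example via gsub or regmatches), so the skill transfers across the languages mentioned in this thread.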

For papers, reports, and presentations, you need to learn LaTeX. I used knitr in RStudio to integrate the TeX with R output and graphics seamlessly.
