
Coding


Wozezeka


This is a bit out of place but I figured y'all would have good advice!

 

I'm starting my master's in public affairs (MPA) this fall, and I'm interested in learning to code over the summer. An MPA involves analyzing data, and I've heard that R and Python are good for statistics. Which language would you recommend I learn?


I hear MATLAB has good support for statistics, although I haven't used it myself. Whatever language you use, though, make sure you have a good development environment with an easy-to-use graphical interface (these are called IDEs). You can have the best language for statistical analysis, but if you have to learn how to code from the command line, you'll run into a whole other host of issues that will slow you down.


I would recommend R or Python, but I am not a computer scientist. I do a lot of Bayesian statistical analysis in my research, and I love Python. My former officemate swears by R.

 

I think these two languages are the best because:

1) They are free. While my school provides me with tons of MATLAB licenses, I can't always count on this in postdocs or future work. I want to be developing code during grad school that I can use forever.

2) They are modern and continually updated, especially with statistical packages!


It seems you haven't coded before, so I would suggest starting with Python. It is a very powerful language, and it is also easy to learn.

After a while you can learn R and MATLAB too. I don't know much about R, but I know MATLAB is amazingly strong in every mathematical field.


Thanks!! I will look into Python. Any suggestions for gaining a background understanding of how all this works? I learned some HTML in high school and college, but that was a while back.


If this is your first programming language (it sounds like it, since HTML is a markup language rather than a programming language, and you learned it a while ago!) and you have some spare money, I would strongly recommend buying a textbook and learning from it. I think structured learning (whether it's a class or teaching yourself from a book) is important for a first programming language because it introduces you to all the important concepts in programming (logic, loops, variable types, objects, etc.). Again, I'm speaking as a non-computer scientist, so my focus is more on the applications of coding; a CS student might say something different.
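For a sense of what those concepts look like in practice, here is a short Python sketch that touches each one: variables of different types, a loop, an if-statement, and a simple object. The survey data and the `Survey` class are invented purely for illustration.

```python
# Variable types: a list of floats, a float, a string, and an int
scores = [4.0, 2.5, 5.0, 3.0, 4.5]   # made-up survey responses
threshold = 3.5
label = "satisfaction score"
num_respondents = len(scores)


class Survey:
    """Object: bundles the data (name, responses) with behaviour (methods)."""

    def __init__(self, name, responses):
        self.name = name
        self.responses = responses

    def mean(self):
        # Loop: add up every response, then divide by how many there are
        total = 0.0
        for r in self.responses:
            total += r
        return total / len(self.responses)

    def count_above(self, cutoff):
        # Logic: an if-statement inside a loop
        count = 0
        for r in self.responses:
            if r > cutoff:
                count += 1
        return count


survey = Survey(label, scores)
print(f"{survey.name}: mean = {survey.mean():.2f}")
print(f"{survey.count_above(threshold)} of {num_respondents} respondents scored above {threshold}")
```

A first course or textbook will cover each of these ideas in much more depth; the point here is only how they fit together in a few lines.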

 

I would recommend looking up your school's introductory programming course to see if it is offered in Python, and then following its assignments and textbook. Or you can take an online course on Coursera. There are also lots of tutorials online, but try to find one meant for people learning both Python and programming itself, not for people who already know other languages and just need to pick up Python.


It depends on what kind of programming you want to do.

 

Python and R are fine languages, but neither of them is really good for the numerical analysis that a lot of advanced statistics requires. They can do these things, but they are incredibly slow. It really depends on how much data you are working with.

 

I think Python is a much better language than R. Python is a lower-level language than R (meaning it has fewer built-in packages), which will force you to learn responsible programming practices. That being said, both are incredibly slow.

 

For example, in 2010 I wrote code in Python (at the time the only language I knew well) that took 40 minutes per file to process. I had to process ~270 files, so it took a while, but it was still faster than Excel because I could prepare one file while another was processing. In 2013 I rewrote it in MATLAB, and it takes 6 minutes to process. The difference between MATLAB and R or Python is that MATLAB has had decades of optimization behind it. R and Python, as scientific tools, will probably catch up, and they are a lot closer than most people think, but when working with hundreds of thousands of data points or more, neither Python nor R is really equipped to do this efficiently.

 

But even MATLAB has its limitations. If you are working with millions of data points, MATLAB will take forever, and a lower-level language like C or C++ will be necessary.

 

This might be more information than you want, but I think it is important to think about what kind of researcher you want to be, what kinds of things you could see yourself working on, and learn the right tools for that.

 

I personally think MATLAB is the most flexible option out there: it lets you work efficiently with reasonably sized data, but it is just as easy to learn as Python or R. The upside of Python is that its coding practices are more similar to C++ than MATLAB's or R's are, and the upside of R is that it is free and as easy to learn as MATLAB. And that's really why people use R: it's free, reasonably powerful, and much easier to learn than a traditional programming language. Twenty to thirty years ago, MATLAB was, minus the free part, what R is today: the non-scientist's alternative to Fortran.


Indeed, Python is a lot slower. Luckily, I don't run many things that rely on computational speed, except for very long Markov chain Monte Carlo (MCMC) runs, and I just let those run overnight or over a weekend. Also, a downside of Python is that it is fairly easy to accidentally do something very slowly when there's a much faster way. I try to estimate how long the code will take to run and decide whether it's worth optimizing. There have been a few cases where the original way I wrote something would have taken 2 months to run, but through some logic changes, it went down to 2 hours.
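The kind of logic change described above can be illustrated with a small, hypothetical Python sketch (the data and function names are invented for illustration, not taken from the poster's code). Checking membership in a list inside a loop is easy to write but scales badly, while converting the list to a set once gives the same answer far faster. It also shows the "estimate before you wait" habit: time a small sample, then extrapolate to the full run.

```python
import time

# Hypothetical data: 50,000 reference IDs (even numbers) and 2,000 IDs to look up
reference_ids = list(range(0, 100_000, 2))
query_ids = list(range(2_000))


def matches_slow(queries, reference):
    # "q in reference" on a list scans it element by element,
    # so the work grows roughly as len(queries) * len(reference)
    return [q for q in queries if q in reference]


def matches_fast(queries, reference):
    # Build a set once; membership tests on a set are hash lookups
    # that take roughly constant time on average
    reference_set = set(reference)
    return [q for q in queries if q in reference_set]


def timed(func, *args):
    # Return the function's result together with its wall-clock runtime
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Time the slow version on a small sample, then extrapolate linearly
sample = query_ids[:200]
slow_sample_result, sample_seconds = timed(matches_slow, sample, reference_ids)
estimated_seconds = sample_seconds * len(query_ids) / len(sample)
print(f"slow version, estimated full run: {estimated_seconds:.2f} s")

fast_result, fast_seconds = timed(matches_fast, query_ids, reference_ids)
print(f"fast version, measured full run: {fast_seconds:.4f} s")
```

Actual timings depend on your machine; what matters is the gap between the extrapolated slow run and the measured fast run, since both versions return the same matches.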

 

Python has a ton of Bayesian statistical modules available now; "emcee" is a great MCMC package that I and others in my field use all the time! I also like being able to use "Quantity" objects that have units built into them. Astronomy uses a lot of different, non-SI units, so this way I can just store quantities in their natural units, do the math, and then convert to whatever standard unit at the end. I'm sure this functionality is possible in all the languages, but there are packages built by astronomers for astronomers in Python (before Python, there was an extensive library of IDL routines; for some reason, astronomers seemed to choose IDL over MATLAB decades ago).
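As a rough illustration of how unit-aware quantities work, here is a minimal, hypothetical `Length` class in plain Python. It is not the API of any astronomy package (libraries such as astropy provide full-featured `Quantity` objects); it only shows the idea: each value keeps its natural unit, arithmetic converts operands to a common unit, and you convert to whatever unit you want at the end. The conversion factors assume the IAU definitions, 1 AU = 149,597,870,700 m exactly and 1 parsec = 648,000/π AU.

```python
import math
from dataclasses import dataclass

# Conversion factors to metres (assumed IAU definitions: the AU is exact,
# and one parsec is 648,000 / pi astronomical units)
AU_IN_METRES = 149_597_870_700.0
TO_METRES = {
    "m": 1.0,
    "km": 1_000.0,
    "AU": AU_IN_METRES,
    "pc": AU_IN_METRES * 648_000 / math.pi,
}


@dataclass(frozen=True)
class Length:
    """Hypothetical unit-aware length: a number plus the unit it is stored in."""
    value: float
    unit: str

    def to(self, unit: str) -> "Length":
        # Convert via metres, then into the requested unit
        metres = self.value * TO_METRES[self.unit]
        return Length(metres / TO_METRES[unit], unit)

    def __add__(self, other: "Length") -> "Length":
        # Express the other operand in this quantity's unit before adding
        return Length(self.value + other.to(self.unit).value, self.unit)


# Two hypothetical distances, each stored in its natural unit
d1 = Length(1.3, "pc")
d2 = Length(50_000.0, "AU")

# Do the math, then convert to the unit you want at the end
total = d1 + d2
print(total.to("AU"))
```

A real units library also tracks compound units (velocities, densities, and so on) and refuses to add incompatible quantities; this sketch handles lengths only.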

 

The point of that long story is that you also want to learn what your field has used in the past and what the popular trends are right now, because you want to stick with software that will be supported and used by others in your field. Also, if you want others to use your code, you want to stay mainstream! Right now, in my field, Python is the popular choice (IDL is still favoured by older scientists, but Python is gaining ground every year).


I completely agree with the earlier post comparing MATLAB, Python, and R.

 

I think the upside of Python is that it's easier to learn and it is a general-purpose programming language (you can use it for any sort of programming), while MATLAB is really just for mathematical purposes.

 

As for your question on where to start, I don't know the best resources out there, but if you want to learn Python, I think the best way is to buy a book, whereas you can learn MATLAB even from online videos/courses (since it is just for mathematical purposes, it does not have as many different parts). Just my opinion; some may find other ways more helpful :)

 

Also, HTML is not really a coding language, so you should prepare yourself for a completely different approach.


I never liked Python, though I'm no expert on it. R seems to be popular for stats, as are tools like SAS or SPSS. Maple and Mathematica can also involve some programming, though I'm not sure how useful they are for stats as opposed to math in general. I've never used MATLAB.

Edited by velua
