Stata Resources

Here are some Stata resources that I have found useful. Statistics with Stata by Hamilton is good for beginners, although it is overpriced. For the basics I like German Rodriguez’s free Stata tutorial best; good material can also be found at UCLA’s Stata starter kit and UNC’s Stata Tutorial, and two page Stata is good for getting started quickly.

Christopher Baum’s book An Introduction to Modern Econometrics using Stata is excellent and worth the price. The world is indebted to Baum for a number of Stata programs such as NBERCycles, which shades in NBER recession dates on time series graphs (this was a big help in producing graphs for our textbook!), so buy Baum’s book and support a public good.

I have found it hugely useful to peruse the proceedings of Stata meetings, where you can find professional guides to using Stata to do advanced econometrics. For example, here is Austin Nichols on Regression Discontinuity and related methods, Roberto Gutierrez on Recent Developments in Multilevel Modeling, Colin Cameron on Panel Data Methods, and David Drukker on Dynamic Panel Models.

I found A Visual Guide to Stata Graphics very useful and then I lent it to someone who never returned it. I suppose they found it very useful as well. I haven’t bought another copy, since it is fairly easy to edit graphs in the newer versions of Stata. You can probably get by with this online guide.

German Rodriguez, mentioned earlier, has an attractively presented class on generalized linear models with lots of material. The LSE has a PhD class on Stata; here are the class notes: Introduction to Stata and Advanced Stata Topics.

Creating a map in Stata is painful since there are a host of incompatible file formats that have to be converted (I spent several hours yesterday working to convert a dBase IV to dBase III file just so I could convert the latter to dta). Still, when it works, it works well. Friedrich Huebler has some of the details.

The reshape command is often critical but difficult; here is a good guide.
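As a rough illustration of what reshape does, here is a minimal sketch that converts a small made-up dataset from wide to long and back. The variable names (id, inc2019, inc2020) and the values are invented for the example; they do not come from any of the guides linked here.

```stata
* Hypothetical toy data: one row per person, one income column per year (wide form)
clear
input id inc2019 inc2020
1 100 110
2 200 210
end

* Wide to long: the stub "inc" becomes one variable, and the suffix goes into "year"
reshape long inc, i(id) j(year)
list, sepby(id)

* Long back to wide: restores inc2019 and inc2020
reshape wide inc, i(id) j(year)
list
```

The usual stumbling block is that i() must uniquely identify a row in wide form, while i() and j() together must uniquely identify a row in long form.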

Here are many more sources of links: Stata resources, Stata Links, Resources for Learning Stata, and Gabriel Rossman’s blog Code and Culture.


At the steep prices Stata sells at, is it much better than a free option like R? Just curious.

Could somebody link to a "resources for R" post? Or write one?
That being said, I like Stata and find it more user-friendly than R. So thank you very much, Alex; I have sent the post out to all my fellow graduate students.

Stata is to R as Windows is to Linux.

The Statalist archives are very useful as well, because most questions you would want to ask have been asked before and many have been answered.

To those asking about R and Stata: the advantages I've found with R are a cool 3D scatterplot, object-oriented programming that makes it easier to grab and use stuff like your covariance matrix after a regression, and, I'm told, mapping functions that (unlike Stata's) are very useful. The advantages of Stata are everything else. This is just for me, though; statisticians, I find, prefer R and economists prefer Stata.
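For comparison, here is a minimal sketch of grabbing the coefficient covariance matrix after a regression in Stata, using the auto dataset that ships with Stata. The choice of variables is arbitrary and only for illustration.

```stata
* Example dataset shipped with Stata
sysuse auto, clear
regress price mpg weight

* Estimation results are stored in e(); copy the covariance matrix of the coefficients
matrix V = e(V)
matrix list V

* Individual coefficients and standard errors are available by variable name
display _b[mpg]
display _se[mpg]
```

Whether this is easier or harder than working with the fitted model object in R is a matter of taste.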

R works like a Maserati, extremely flexible, cutting-edge.
Stata and the old statistical industry standard SAS work like a Ford,
straightforward, easier than R because they lack flexibility.

Unfortunately, the statistics professors at George Mason University stopped teaching R/S-PLUS a decade ago, although they still use R in their own work, because their students enter jobs that use only SAS or some other basic statistical programming language.
Managers largely choose something like SAS because they cannot use cutting-edge R; because actually spending money on software lets them pay for answers and assign blame; and because they do not conceive of the greater variety of statistical problems/solutions and quicker/more-difficult computing solutions.

Gnuplot needs a mention for plotting. Very versatile. Does maps and 3D plots too.

@Adam Ozimek:

Do you find use for 3D scatter plots? Whenever I have tried, it becomes a hard-to-read mess of data. Of course, it does have a high coolness-index.


I love Mathematica for symbolic stuff. But for number crunching I prefer MATLAB (or its free clone, Octave).

Alex - YES.

Learn R. It'll be hard, but then you can do anything.

BTW, the last time you were about to get a new computer for Stata, R was not available in 64-bit on Windows. Now it is, so large datasets are OK with R for Windows.


To all the people saying that Alex should be talking about R rather than Stata, I think you all know that R is a very powerful and flexible program but it's almost deliberately opaque. As with lots of free software, there's an issue that your time is worth something. In this respect, if R is Linux (not a bad analogy actually) then Stata isn't Windows, but OS X. That is, it's a similar product that's more expensive and slightly less flexible but much easier to use. And I should add that a) Stata isn't that expensive and b) it is extensible through a large library of user-written scripts in much the same way as R is.

The issue is not a GUI versus a command-line -- almost all Stata users run the program through scripting -- but about having a command-line that is intuitive. Nobody has ever run a regression in Stata and wondered what happened to the results because Stata assumes that when you run a regression, you're actually interested in the output, whereas R assumes that you want it to disappear into the ether unless you specify that you want to capture it in an object or wrap it in a summary() function. Note that R Commander behaves more like Stata as it wraps everything in summary().

For example, loading a dataset, running an OLS, and getting the most common output in R looks like this:
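<pre># assumes data.txt has a header row naming the columns y and x
dataset <- read.table("data.txt", header = TRUE)
myregression <- lm(y ~ x, data = dataset)
summary(myregression)</pre>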

In contrast, doing the same thing in Stata looks like this:
<pre>use data.dta, clear
reg y x</pre>

There are also subtler issues, such as that R sample code almost always uses random data whereas Stata sample code almost always uses real data. Since ordinary users are mostly interested in using real data and there's a pretty substantial learning curve to R's object-oriented approach to data, using "rnorm()" instead of "read.table()" and "attach()" in tutorials is just another subtle way that R throws up needless stumbling blocks on the learning curve.

Recommending R to people who don't need its flexibility and would be intimidated by all the parentheses and arrows (and the inscrutable assumption that output should be suppressed) is snobbishness, the geek equivalent of lecturing people who don't eat vegetables at all and are considering the Birdseye aisle at the supermarket that they really should be buying all their food from organic farmers' markets.

"and inscrutable assumption that output should be suppressed"

If you see the output, you'll find it hard to resist putting it into a paper. If something appears in a paper, it will be mis-reported by a journalist.

I have had the same experience with Stata as most of the people above: when you're presented with ready datasets and you need to perform simple tasks (i.e. when you're an undergrad), Stata is heaven.

But the data handling is horrific, the syntax is abysmal, and doing anything at all that isn't built-in (or readily available on the net) is hell. My go-to stats programs are R and MATLAB, usually preferring MATLAB for its more intuitive syntax and more extensive documentation.

Stata has its weaknesses, but if you consider its syntax abysmal, you're just wrong or have a really weird basis of comparison.


I am currently teaching Stata to social science students with no experience of either stats or programming whatsoever, and it is going pretty well. I understand why a researcher would have both Stata and R installed on his/her machine (I use both), but as far as teaching goes, I feel that Stata is really on top of its competitors.

