
gologit2/oglm Troubleshooting
Richard Williams, University of Notre Dame
Here are some of the main issues that I get asked about with gologit2. (Some of these points also apply to oglm.) Feel free to email me if you have other problems, suggestions or recommendations, or to let me know what recommendations worked best for you. Click here if you want the main gologit2 support page.
Universal Recommendation. Make sure you have the most current version of the program (and also the most up-to-date version of the Stata software you are using). If you are lucky, the problem you are encountering may have already been fixed. From within Stata, type
ssc install gologit2, replace
ssc install oglm, replace
update all
If you have Stata 9 you can also use the adoupdate command. Also, the most up-to-date version of the documentation is gologit2.pdf.
For security or other reasons, my computer can't access the Internet. How can I install your programs? This must be a real nuisance for you! You may want to talk to your computing people to see if they can't find a way to make your life easier. Two possible solutions are:
* If you have another computer that has Stata and can access the Internet, install the programs on it. Then, copy c:\ado (or whatever the appropriate directory is on your machine) from one computer to the other. Be sure you understand what you are doing, because you don't want to accidentally overwrite files that are needed on the non-Internet machine.
* Following are zipped versions of my programs and their support files. Unzip the files and store them in c:\ado\personal or some other location where Stata can find them. Again, if you don't understand how to do this, find somebody on your support staff who can help you out.
gologit2 version 2.1.4
oglm version 1.1.3
mfx2 version 1.1.0
* Sometimes people have read/write access to some drives but not others. If so, you may be able to modify these suggestions for Notre Dame users. Basically, the "trick" is to get Stata to look for programs in a folder that you have some control over.
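As a minimal sketch of that "trick": the commands below show where Stata currently looks for programs, add one more folder to the search path, and confirm that the program is found. The folder d:\mystata\ado is a hypothetical example, not a recommended location; substitute a folder you can actually write to.

```stata
* List the directories Stata currently searches for ado-files
sysdir

* Add a folder you control to the search path
* (d:\mystata\ado is a hypothetical folder used for illustration)
adopath + "d:\mystata\ado"

* Confirm that Stata can find the program and see which copy it is using
which gologit2
```

The adopath change lasts only for the current session, so you may want to put that line in a profile.do file or at the top of your do-files.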
I don't have Stata. Is there any other way I can estimate gologit models? I've never used it myself, but I understand that Don Hedeker's mixor program can do many of the same things that gologit2 can. Somebody who is familiar with both programs said that "Hedeker's software does gologit2, but with random effects. I would assume that if you don't specify a random effect you get the same results. His program doesn't do a lot of the cool things that yours does, but if you have specific non-proportionality hypotheses in mind, it will test them and produce the non-proportional results." Hedeker's web page also includes programs or code for DOS, SPSS and SAS.
Can I do a random effects model with gologit2? No. Instead, check out Stefan Boes's regoprob program, which uses code adapted from gologit2 and reoprob. Or, Don Hedeker's mixor program may do what you want.
gologit2 does not work with Long & Freese's spost routines. This is covered in the help file but many people miss the advice. Add the v1 option to gologit2, e.g.
gologit2 y x1 x2 x3, v1
Some (but not all) of Long and Freese's spost routines currently work with the original gologit but not gologit2. The v1 option saves the internally-stored results in the same format that was used by gologit. However, you can still use gologit2's other unique options, such as autofit or pl. Note that post-estimation commands written specifically for gologit2 (including the pr option of predict) may not work correctly if you use the v1 option. In that case just rerun the model without it. Also, the v1 option only works with the default logit link, since that is all the original gologit supported. Long has indicated that future versions of spost will support gologit2.
suest (or some other post-estimation command) produces an error when I run it after gologit2. Try running gologit2 with the nolabel option. This will cause the equations to be labeled eq1, eq2, etc. The printout may not be as aesthetically appealing, but this may reduce the likelihood of having problems with commands that have trouble with your labels (e.g. value labels that start with a number sometimes cause problems). Changing your value labels may also solve the problem. In many cases, though, a program must be officially "blessed" in order to work with a post-estimation command (i.e. the program must be on a hard-coded approved list), so you may just be out of luck if you try to use gologit2 with it.
How do I estimate marginal effects with gologit2 & oglm? Stata's mfx command will work, but make sure you have an up-to-date version of gologit2 (earlier versions had a bug that kept mfx from working correctly). However, it is generally better to use my mfx2 program, which can be downloaded and installed from SSC (ssc install mfx2). mfx2 makes it easier to compute marginal effects after multiple-outcome commands like oglm, gologit2, ologit, oprobit, mlogit, mprobit and slogit. In addition, the results are formatted in a way that makes them compatible with post-estimation table formatting commands like outreg2 and estout. An often superior and faster program is Tamas Bartus's margeff (also available from SSC). Make sure you have the latest version of margeff, since earlier versions did not support gologit2. margeff now works with gologit2 but not oglm.
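A minimal sketch of the mfx2 workflow, assuming a model with the placeholder variables y, x1, x2 and x3 used elsewhere on this page:

```stata
* Install mfx2 from SSC (only needed once)
ssc install mfx2, replace

* Estimate the model, then compute marginal effects for every outcome
gologit2 y x1 x2 x3, autofit
mfx2
```

See help mfx2 for the options that control how the results are stored and displayed.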
The predict command comes up with negative predicted probabilities. Believe it or not, negative predicted probabilities are possible. McCullagh & Nelder discuss this in Generalized Linear Models, 2nd edition, 1989, p. 155:
The usefulness of non-parallel regression models is limited to some extent by the fact that the lines must eventually intersect. Negative fitted values are then unavoidable for some values of x, though perhaps not in the observed range. If such intersections occur in a sufficiently remote region of the x-space, this flaw in the model need not be serious.
So yes, it can happen, and a couple of people have written me about this. But, they've also mentioned things like extremely high standard errors or other problems, so I suspect that in most cases a solution lies somewhere in the next couple of points.
I do recommend computing the predicted probabilities under your models; if they seem implausible, then you may wish to modify your model or use a different statistical technique altogether. (One person wrote me that 2 cases out of 27,068 had negative predicted probabilities; I probably wouldn't worry too much in a case like that, but I would get worried if a non-trivial number of cases had negative predicted values.) Sometimes combining categories of the response variable (especially if the Ns for some categories are small) and/or simplifying the model helps. The imposition of parallel lines constraints (either via autofit or the pl or npl options) may also help because it reduces the likelihood of non-parallel lines intersecting.
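One way to run that check is sketched below. It assumes, purely for illustration, that y has four categories, so predict is given four new variable names; the names p1-p4 and minp are hypothetical.

```stata
* Estimate the model (y is assumed to have 4 categories in this sketch)
gologit2 y x1 x2 x3

* One predicted probability per outcome category
predict p1 p2 p3 p4

* Smallest predicted probability for each case
egen minp = rowmin(p1 p2 p3 p4)

* How many cases have a negative predicted probability?
count if minp < 0
summarize p1 p2 p3 p4
```

If the count is zero or trivially small, the problem is probably not serious; a non-trivial count suggests simplifying the model or imposing parallel lines constraints.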
Click here for an example of the problem and a solution.
The standard errors are extremely high. You may have high multicollinearity in your variables. User-written routines like collin can check for this. But, routines like ologit and gologit2 can also have problems when an X variable has little or no variability within a category of Y, e.g. when Y = 2, X always equals 0. In ologit, you might get a warning message like this:
Note: 40 observations completely determined. Standard errors questionable.
In gologit2, alas, such a warning is still on the "wish list" of things I'd like to add. But, the high standard errors will still be a clue. Possible diagnostic devices:
Run the similar model in ologit or mlogit. That will help to identify whether the problem is unique to gologit2 or represents a broader problem in your model. And, they may give you more meaningful error or warning messages than gologit2 does.
Try something like bysort y: sum x1 x2 x3. Look for x's with little or no variability within a category of y. Or, maybe try bysort y: corr x1 x2 x3 and see if there is extreme multicollinearity within a category of y.
If lack of x variability or extreme multicollinearity within a category of y is the problem - you'll have to decide what to do. You may want (or be forced) to drop the problematic variable. Maybe y has too many categories with small Ns, and some will need to be combined. When logit encounters such a problem it not only drops the variable, it drops the cases that were completely determined.
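The diagnostic steps above can be run as one short sequence. This is only a sketch using the placeholder variables y, x1, x2 and x3; the final tabulate line assumes x1 is categorical.

```stata
* Fit comparable models; they may give clearer warnings than gologit2
ologit y x1 x2 x3
mlogit y x1 x2 x3

* Look for x's with little or no variability within a category of y
bysort y: summarize x1 x2 x3

* Look for extreme collinearity within a category of y
bysort y: correlate x1 x2 x3

* For a categorical x, a cross-tabulation shows empty or near-empty cells
tabulate y x1
```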
If none of this seems to address the problem - then consider the next FAQ:
gologit2 is very slow and/or does not converge and/or produces implausible estimates. A couple of people have written to me with problems like this. Often they have a large number of variables and/or cases. Since I don't have their data, it is hard for me to tell if there actually is a problem with the program or whether they need to be more patient or whether their model is problematic. Here are several things you can try.
Make sure that you are using the right dependent variable and that it is categorical! One user was having problems until she realized that she was analyzing z-scores rather than the variable she had intended. Just running descriptive statistics on your variables may help to identify basic mistakes you are making.
Probably the simplest thing to do is to use the difficult option. A user reported that this got one complicated model to converge and made another run much faster. To learn more about difficult and other related maximization options, from within Stata type
help maximize
These options will sometimes help programs to converge, but not necessarily (they can even make things worse). For example, type
gologit2 y x1 x2 x3, pl(x1) difficult
Another maximization option that may be worth trying is technique. This option will cause Stata to try different algorithms; if one gets "stuck" another might get "unstuck." See help maximize for more details. For example,
gologit2 y x1 x2 x3, pl(x1) technique(nr bhhh dfp bfgs)
Simplify your model!! Drop unnecessary variables completely, or add variables gradually. You may be able to identify problematic variables this way and/or identify the limits of how large a model gologit2 can handle. If none of the "easy" options work, I STRONGLY suspect that this is the best way to go. One user was trying a 22-variable model, with the cluster option, and it took forever to run. He dropped a single variable and the program took 3 seconds to reach a solution! As noted above, variables that have little or no variability within a category of Y may be especially problematic, e.g. X1 is a constant or almost a constant when Y = 1.
Estimate similar models in ologit and mlogit. gologit2 is kind of a cross between those two programs, and by all rights they should be much faster than gologit2 is. If they are slow or have problems, it is probably not too surprising that gologit2 has problems too. You may just need to be patient or make other changes in your analysis.
Analyze a random subsample of your data. If the program works with a 10% sample it may eventually work with a 100% sample (or you might just have to say that a random subsample had to be used because of the size of the model). If you don't know how to sample, then type help sample from within Stata.
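A minimal sketch of the subsample approach; the seed value is arbitrary and only makes the draw reproducible.

```stata
* Keep the full data set safe while working on a subsample
preserve

* Draw a reproducible 10% random sample
set seed 12345
sample 10

* Try the model on the subsample
gologit2 y x1 x2 x3, pl(x1)

* Bring back the full data set
restore
```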
Sometimes rescaling variables will help. One user reported that gologit2 had problems when year was coded 1970 through 1999, but worked fine when he recoded it to 1 through 30.
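For the year example, the recoding might look like this (year_rs is a hypothetical name for the new variable):

```stata
* Recode year from 1970-1999 to 1-30
generate year_rs = year - 1969

* Use the rescaled variable in place of year
gologit2 y year_rs x1 x2
```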
Use the log option, e.g.
gologit2 y x1 x2 x3, pl(x1) log
This option prints out the iteration log, and may help you to see whether gologit2 is just spinning its wheels or slowly but surely working towards a solution. NOTE: The log option makes the autofit output look messy but it does work now.
Let the program run overnight or at least for several hours. gologit2 is not the fastest program in the world, and it may just need time to finish its job. I would use the log option if you do this so you know the program has not locked up.
If you are using autofit and are adventurous, or just desperate to get results, you are welcome to try out a program called gotuff that is currently in beta testing. To get gotuff, from within Stata type
net from http://www.nd.edu/~rwilliam/stata
and then click on gotuff and follow the installation instructions. There is no help file yet but the program includes extensive comments and instructions. The idea behind gotuff is that it limits the amount of time that is spent on problematic intermediate models, in the hope that the final model will still be correct. Even if the program successfully produces a final model, the user has to decide whether the results seem plausible and correct. It seems to have worked well in the few tests I have done but that is no guarantee of its overall usefulness. Again, I think probably the best thing to do is to eliminate unnecessary and problematic variables, but gotuff might work if nothing else does.
Before running gologit2, type
set more off
set trace on
gologit2 ...
You will get incredible amounts of output on your screen but you may be able to identify where the program is having problems.
Consider using a different technique, e.g. mlogit or slogit. It may just be that your data and model are not well suited for a gologit analysis.
