We describe a general strategy to analyze sequence data and introduce SQ-Ados, a bundle of Stata programs implementing the proposed strategy. The programs include several tools for describing and visualizing sequences as well as a Mata library to perform optimal matching using the Needleman–Wunsch algorithm. With these programs Stata becomes the first statistical package to offer a complete set of tools for sequence analysis.
1. Abbott, A., and J. Forrest. 1986. Optimal matching method for historical sequences. Journal of Interdisciplinary History 16: 471–494.
2. Brüderl, J., and S. Scherer. 2004. Methoden zur Analyse von Sequenzdaten. Kölner Zeitschrift für Soziologie und Sozialpsychologie Sonderheft 44: Methoden der Sozialforschung: 330–347.
3. Brzinsky-Fay, C. 2006. Lost in transition: Labour market entry sequences of school leavers in Europe. Discussion Paper SP I 2006-111, Wissenschaftszentrum Berlin. http://skylla.wz-berlin.de/pdf/2006/i06-111.pdf.
4. Cox, N. J. 2004. Stata tip 12: Tuning the plot region aspect ratio. Stata Journal 4: 357–358.
5. Diggle, P. J., K.-Y. Liang, and S. L. Zeger. 1994. Analysis of Longitudinal Data. Oxford: Oxford University Press.
6. Dijkstra, W., and T. Taris. 1995. Measuring the agreement between sequences. Sociological Methods and Research 24: 214–231.
7. Kogan, I. 2003. A study of employment careers of immigrants in Germany. Mannheimer Zentrum für Europäische Sozialforschung: Arbeitspapier Nr. 66.
8. Kohler, U. 2002. Der demokratische Klassenkampf. Zum Zusammenhang von Sozialstruktur und Parteipräferenz. Frankfurt a. M. and New York: Campus.
9. Kohler, U., and C. Brzinsky-Fay. 2005. Sequence index plots. Stata Journal 5: 601–602.
10. Kruskal, J. B., and D. Sankoff. 1983. An anthology of algorithms and concepts for sequence comparisons. In Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison, ed. D. Sankoff and J. Kruskal, 265–310. Reading, MA: Addison–Wesley.
11. Levenshtein, V. 1966. Binary codes capable of correcting deletions, insertions, and reversals. Soviet Physics Doklady 10: 707–710.
12. Macindoe, H., and A. Abbott. 2004. Sequence analysis and optimal matching techniques for social science data. In Handbook of Data Analysis, ed. M. Hardy and A. Bryman, 387–406. London: Sage.
13. Needleman, S., and C. Wunsch. 1970. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48: 443–453.