Reliability of Sequence-Alignment Analysis of Social Processes: Monte Carlo Tests of ClustalG Software

Abstract

Sequences of characters are used in many fields to record the events that make up social processes. Until recently, however, very few methods were available for analysing character-sequence data. Alignment algorithms measure the similarity between a pair of sequences by inserting gaps into one or the other to create the best possible pattern of matches. These methods were developed in computational biology but are now being considered for applications in other fields, such as sociology, geography, and transportation planning. This paper examines the reliability of alignments in the classification of sequential data. The ClustalG multiple-alignment package is applied to a set of synthetic sequences generated by eight separate generation rules. Because the number of subgroups and the patterns in the sequences are known, alternative strategies for conducting the analysis can be compared and evaluated. When the underlying processes that generate the event sequences are not known, the most effective strategy is to use low gap penalties, which permit the maximum number of matches.
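To make the role of the gap penalty concrete, the following is a minimal sketch of pairwise global alignment scoring (the classic Needleman-Wunsch dynamic programme with a linear gap penalty). It is not ClustalG's implementation, which performs progressive multiple alignment. The function name, the scoring values, and the activity codes (H = home, W = work, S = shop) are assumptions chosen purely for illustration.

```python
def alignment_score(a, b, match=1.0, mismatch=-1.0, gap=-0.5):
    """Best global alignment score of sequences a and b.

    Illustrative only: the scoring values are hypothetical and a simple
    linear gap penalty is assumed (each gap position costs `gap`).
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0.0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty sequence requires only gaps.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap inserted in b
                score[i][j - 1] + gap,       # gap inserted in a
            )
    return score[-1][-1]


# Hypothetical activity sequences that differ by one inserted event.
low_penalty = alignment_score("HWSH", "HWH", gap=-0.5)
high_penalty = alignment_score("HWSH", "HWH", gap=-3.0)
```

With the low penalty, the single gap needed to line up the three shared events costs little, so the two sequences score as similar. With the high penalty, the same gap cancels out the credit for the matches and the pair looks unrelated.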
