Abstract
Sequences of characters are used in many fields to record events or activities that characterize social processes. However, until recently, very few methods have been available for the analysis of character-sequence data. Alignment algorithms measure the similarity between a pair of sequences by inserting gaps into one or the other to create the best possible matching pattern. Alignment methods were developed in computational biology, but are being considered for applications in other fields such as sociology, geography, and transportation planning. In this paper the reliability of alignments in the classification of sequential data is examined. The ClustalG multiple alignment package is used to examine a set of synthetic sequences generated from eight separate rules. Because the number of subgroups and the patterns in these sequences are known, strategies for conducting the analysis can be compared and evaluated. The most effective strategy for analysing sequential data when the underlying processes that generate the event sequences are not known is to use low gap penalties that permit the maximum number of matches.
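The effect of the gap penalty on the number of matches can be illustrated with a minimal pairwise sketch. This is not the ClustalG procedure, which performs multiple alignment; it is a textbook global alignment (Needleman-Wunsch) with a linear gap penalty. The function names `align` and `count_matches`, the scoring values, and the two example sequences are assumptions made purely for illustration.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b (Needleman-Wunsch, linear gap penalty).

    Scoring values are illustrative assumptions; use integers so that the
    traceback's equality checks are exact.
    Returns (score, aligned_a, aligned_b), with '-' marking inserted gaps.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for pairing a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two characters
                score[i - 1][j] + gap,            # gap inserted into b
                score[i][j - 1] + gap,            # gap inserted into a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def count_matches(x, y):
    """Count aligned columns in which both sequences carry the same character."""
    return sum(1 for p, q in zip(x, y) if p == q and p != "-")


if __name__ == "__main__":
    # Hypothetical event sequences: the second is the first shifted by one step
    for penalty in (-1, -5):
        s, x, y = align("ABCDE", "BCDEA", gap=penalty)
        print(penalty, s, x, y, count_matches(x, y))
```

With these assumed values, the low penalty (-1) lets the algorithm insert two gaps and recover four matching events, whereas the high penalty (-5) makes gaps too costly, so the sequences are aligned without gaps and no events match.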
