Harpoon or Bait? A Comparison of Various Metrics in Fishing for Sequence Patterns

Abstract

The use of sequence analysis in the social sciences has significantly increased during the last decade or two. Sequence analysis explores and describes trajectories and “fishes for patterns” (Abbott, 2000). Many dissimilarity metrics exist in various domains (bioinformatics, data mining, etc.); therefore a crucial and pervasive issue in papers using sequence analysis is robustness. To what extent do the various techniques lead to consistent and converging results? What kinds of patterns are more easily fished out by each of the metrics? Here we propose a systematic comparison of about ten metrics that have been used in the social science literature, based on the examination of dissimilarity matrices computed from a simulated sequence data set including various patterns that sociologists can try to identify. This should help scholars in picking the method best suited to their data design and inquiry objectives.
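As a rough illustration of this kind of comparison, the sketch below computes two dissimilarity matrices on a small simulated data set and checks how closely they agree. It is not the authors' simulation design or their set of metrics: the generator `simulate`, the three-state alphabet, the sample size, and the use of a Pearson correlation between matrices are all assumptions made for the example. Only two metrics are shown, the Hamming distance and the Levenshtein distance with unit costs.

```python
import random
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length sequences")
    return sum(x != y for x, y in zip(a, b))


def levenshtein(a, b):
    """Edit distance with unit insertion, deletion and substitution costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution or match
        prev = curr
    return prev[-1]


def pairwise(seqs, metric):
    """Upper triangle of the dissimilarity matrix, in a fixed pair order."""
    return [metric(a, b) for a, b in combinations(seqs, 2)]


def pearson(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n, length, rng, states="ABC"):
    """Hypothetical toy generator: random spells of 1 to 4 time units."""
    seqs = []
    for _ in range(n):
        seq = ""
        while len(seq) < length:
            seq += rng.choice(states) * rng.randint(1, 4)
        seqs.append(seq[:length])
    return seqs


rng = random.Random(0)                 # fixed seed so the run is repeatable
seqs = simulate(30, 12, rng)
d_ham = pairwise(seqs, hamming)
d_lev = pairwise(seqs, levenshtein)
r = pearson(d_ham, d_lev)              # agreement between the two matrices
print(f"correlation between Hamming and Levenshtein matrices: {r:.3f}")
```

The two metrics can disagree sharply on specific patterns: `hamming("ABABAB", "BABABA")` is 6 while `levenshtein("ABABAB", "BABABA")` is 2, because one deletion and one insertion realign the shifted sequence. A correlation below 1 on simulated data reflects differences of this kind.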

Keywords

Sequence Analysis, Optimal Matching, Geometric Data Analysis, Comparison, Simulation

References

Aassve, Billari, Piccarreta (2007) Strings of Adulthood: A Sequence Analysis of Young British Women's Work-family Trajectories. European Journal of Population 23(3-4): 369–38.

Abbott (2000) Reply to Levine and Wu. Sociological Methods & Research 29(1): 65–76.

Abbott, Forrest (1986) Optimal Matching Methods for Historical Sequences. Journal of Interdisciplinary History 16(3): 471–94.

Abbott, Hrycak (1990) Measuring Resemblance in Sequence Data: An Optimal Matching Analysis of Musicians’ Careers. American Journal of Sociology 96(1): 144–85.

Aisenbrey, Fasang (2010) New Life for Old Ideas: The “Second Wave” of Sequence Analysis Bringing the “Course” Back into the Life Course. Sociological Methods & Research 38(3): 420–62.

Allison (1984) Event History Analysis: Regression for Longitudinal Event Data. Beverly Hills, CA: Sage, Quantitative Applications in the Social Sciences, vol. 46.

Anyadike-Danes, McVicar (2010) My Brilliant Career: Characterizing the Early Labor Market Trajectories of British Women From Generation X. Sociological Methods & Research 38(3): 482–512.

Barban, Billari (2012) Classifying Life Course Trajectories: A Comparison of Latent Class and Sequence Analysis. Journal of the Royal Statistical Society: Series C (Applied Statistics) 61(5). Available at: http://onlinelibrary.wiley.com/journal/10.1111/%28ISSN%291467-9876.

Béduwé, Dauty, Espinasse (1995) Trajectoires types d'insertion professionnelle. Application au cas des bacheliers professionnels de Midi-Pyrénées. In Deuxièmes journées d'étude Céreq-Lasmas-IdL “L'analyse longitudinale du marché du travail”, 28 et 29 juin 1995, Caen: Céreq, 7–29.

Billari (2001) Sequence Analysis in Demographic Research. Canadian Studies in Population 28(2): 439–58.

Billari (2005) Life Course Analysis: Two (Complementary) Cultures? Some Reflections with Examples from the Analysis of the Transition to Adulthood. Advances in Life Course Research 10: 261–81.

Bison (2009) OM Matters: The Interaction Effects between Indel and Substitution Costs. Methodological Innovations Online 4(2): 53–67.

Blair-Loy (1999) Career Patterns of Executive Women in Finance: An Optimal Matching Analysis. American Journal of Sociology 104(5): 1346–97.

Bras, Liefbroer, Elzinga (2010) Standardization of Pathways to Adulthood? An Analysis of Dutch Cohorts Born Between 1850 and 1900. Demography 47(4): 1013–34.

Breiman (2001) Statistical Modeling: The Two Cultures. Statistical Science 16(3): 199–231.

Bry (1995) Analyses factorielles simples. Paris: Economica, coll. Techniques quantitatives - poches.

Bry (1996) Analyses factorielles multiples. Paris: Economica, coll. Techniques quantitatives - poches.

Bry, Antoine (2004) Exploring Explanatory Models. An Event History Application. Population-E 59(6): 795–830.

Chan (1995) Optimal Matching Analysis: A Methodological Note on Studying Career Mobility. Work and Occupations 22(4): 467–90.

Courgeau, Lelièvre (1986) Nuptialité et agriculture. Population 41(2): 303–26.

Courgeau, Lelièvre (1992) Event History Analysis in Demography. Oxford: Clarendon Press.

Cox (1972) Regression Models and Life Tables (with discussion). Journal of the Royal Statistical Society B 34: 187–220.

Degenne, Lebeaux, Mounier (1996) Typologies d’itinéraires comme instrument d’analyse du marché du travail. In Degenne, Mansuy, Podevin, Werquin (eds) Typologie des marchés du travail, suivi et parcours, 23 et 24 mai 1996, Rennes. Paris: Documents séminaire Céreq, 115: 27–42.

Deville (1974) Méthodes statistiques et numériques de l’analyse harmonique. Annales de l'INSEE 15: 3–101.

Deville, Saporta (1980) Analyse harmonique qualitative. In Diday (ed.), Data Analysis and Informatics. Amsterdam: North Holland, 375–89.

Dijkstra, Taris (1995) Measuring the Agreement between Sequences. Sociological Methods & Research 24(2): 214–31.

Dureau, Barbary, Elisa, Hoyos (1994) La observacion de las diferentes formas de movilidad: Propuestas metodologicas experimentadas en la encuesta de movilidad espacial en el area metropolitana de Bogota. In Atelier du CEDE (Montevideo), Nuevas modalidades y tendencias de la migracion entre paises fronterizos y los processos de integracion, 27-29 octobre 1993. Paris: Orstom.

Elzinga (2003) Sequence Similarity: A Nonaligning Technique. Sociological Methods & Research 32(1): 3–29.

Elzinga (2006) Sequence Analysis: Metric Representations of Categorical Time Series. Unpublished manuscript. Available at: http://home.fsw.vu.nl/ch.elzinga/.

Elzinga (2007) CHESA 2.1 User Manual. Amsterdam: Vrije Universiteit Amsterdam. Available at: http://home.fsw.vu.nl/ch.elzinga/.

Elzinga, Liefbroer (2007) De-standardization of Family-life Trajectories of Young Adults: A Cross-national Comparison Using Sequence Analysis. European Journal of Population 23(3-4): 225–50.

Espinasse (1993) Enquêtes de cheminement, chronogrammes et classification automatique. Note du Lhire, 19(159).

Fasang (2010) Retirement: Institutional Pathways and Individual Trajectories in Britain and Germany. Sociological Research Online 15(2). Available at: http://www.socresonline.org.uk/15/2/1.html.

Forrest, Abbott (1990) The Optimal Matching Method for Studying Anthropological Sequence Data. Journal of Quantitative Anthropology 2(2): 151–70.

Gabadinho, Ritschard, Müller, Studer (2011) Analyzing and Visualizing State Sequences in R with TraMineR. Journal of Statistical Software 40(4): 1–37.

Gauthier, Widmer ÉD, Bucher, Notredame (2009) How Much Does It Cost?: Optimization of Costs in Sequence Analysis of Social Science Data. Sociological Methods & Research 38(1): 197–231.

Glorieux, Mestdag, Minnen (2008) The Coming of the 24-hour Economy?: Changing Work Schedules in Belgium between 1966 and 1999. Time & Society 17(1): 63–83.

GRAB (1999) Biographies d'enquêtes - Bilan de 14 collectes biographiques. Paris: INED, coll. Méthodes et savoirs, vol. 3. Available at: http://grab.site.ined.fr/fr/editions_en_ligne/biographies_enquetes/.

Grelet (2002) Des typologies de parcours. Méthodes et usages. Document Génération 92 Vol. 20, 47 pp.

Halpin (2010) Optimal Matching Analysis and Life-Course Data: The Importance of Duration. Sociological Methods & Research 38(3): 365–88.

Halpin, Chan (1998) Class Careers as Sequences: An Optimal Matching Analysis of Work-life Histories. European Sociological Review 14(2): 111–30.

Hamming (1950) Error-detecting and Error-correcting Codes. Bell System Technical Journal 29(2): 147–60.

Han, Moen (1999) Clocking Out: Temporal Patterning of Retirement. American Journal of Sociology 105(1): 191–236.

Hollister (2009) Is Optimal Matching Suboptimal? Sociological Methods & Research 38(2): 235–64.

Kalbfleisch, Prentice (1980) The Statistical Analysis of Failure Time Data. New York: Wiley, coll. Wiley Series in Probability and Mathematical Statistics.

Lebart, Morineau, Piron (2000) Statistique exploratoire multidimensionnelle. Paris: Dunod.

Le Roux, Rouanet (2004) Geometric Data Analysis: From Correspondence Analysis to Structured Data Analysis. Dordrecht: Kluwer Academic Publishers.

Lesnard (2010) Setting Cost in Optimal Matching to Uncover Contemporaneous Socio-Temporal Patterns. Sociological Methods & Research 38(3): 389–419.

Lesnard, de Saint Pol (2006) Introduction aux méthodes d'appariement optimal (Optimal Matching Analysis). Bulletin de Méthodologie Sociologique 90(1): 5–25.

Lesnard, de Saint Pol (2009) Décrire des données séquentielles en sciences sociales: Panorama des méthodes existantes. Communication aux Xe Journées de Méthodologie Statistique, 23-25 mars 2009, Paris, INSEE.

Lesnard, Kan (2011) Investigating Scheduling of Work: A Two-stage Optimal Matching Analysis of Workdays and Workweeks. Journal of the Royal Statistical Society (Series A) 174(2): 349–68.

Levenshtein (1966) Binary Codes Capable of Correcting Deletions, Insertions, and Reversals. Soviet Physics Doklady 10(8): 707–10.

Levine (2000) But What Have You Done for Us Lately? Commentary on Abbott and Tsay. Sociological Methods & Research 29(1): 34–40.

Liefbroer, Elzinga (2012) Intergenerational Transmission of Behavioural Patterns: How Similar are Parents’ and Children’s Demographic Trajectories? Advances in Life Course Research 17(1): 1–10.

Martens (1994) Analyzing Event History Data by Cluster Analysis and Multiple Correspondence Analysis: An Example Using Data about Work and Occupations of Scientists and Engineers. In Greenacre, Blasius (eds) Correspondence Analysis in the Social Sciences: Recent Developments and Applications. New York: Academic Press, 233–51.

Mayer, Tuma (1990) Event History Analysis in Life Course Research. Madison: University of Wisconsin Press.

McVicar, Anyadike-Danes (2002) Predicting Successful and Unsuccessful Transitions from School to Work by Using Sequence Methods. Journal of the Royal Statistical Society A 165: 317–334.

Rindfuss, Swicegood, Rosenfeld (1987) Disorder in the Life Course: How Common and Does It Matter? American Sociological Review 52(6): 785–801.

Ritschard, Oris (2005) Life Course Data in Demography and Social Sciences: Statistical and Data-mining Approaches. Advances in Life Course Research 10: 283–314.

Robette (2011) Explorer et décrire les parcours de vie - Les typologies de trajectoires. Paris: CEPED, Les Clefs pour…

Robette, Thibault (2008) Comparing Qualitative Harmonic Analysis and Optimal Matching. An Exploratory Study of Occupational Trajectories. Population-E 63(4): 533–56.

Rohwer, Pötter (2005) TDA's User Manual. Available at: http://www.stat.ruhr-uni-bochum.de/pub/tda/doc/tman63/tman-pdf.zip.

Rousset, Giret, Grelet (2012) Typologies de parcours et dynamique longitudinale. Bulletin of Sociological Methodology/Bulletin de Méthodologie Sociologique 114: 5–34.

Sackmann, Wingens (2003) From Transitions to Trajectories: Sequence Types. In Heinz, Marshall (eds) The Life Course: Sequences, Institutions and Interrelations. New York: Aldine de Gruyter, 93–112.

Sankoff, Kruskal (eds) (1983) Time Warps, String Edits, and Macromolecules: The Theory and Practice of Sequence Comparison. Reading: Addison-Wesley.

Scherer (2001) Early Career Patterns: A Comparison of Great Britain and West Germany. European Sociological Review 17(2): 119–44.

Settersten Jr, Mayer (1997) The Measurement of Age, Age Structuring and the Life Course. Annual Review of Sociology 23: 233–61.

Stovel, Bolan (2004) Residential Trajectories. Using Optimal Alignment to Reveal the Structure of Residential Mobility. Sociological Methods & Research 32(4): 559–98.

Stovel, Savage, Bearman (1996) Ascription into Achievement: Models of Career Systems at Lloyds Bank, 1890-1970. American Journal of Sociology 102(2): 358–99.

Studer (2012) Étude des inégalités de genre en début de carrière académique à l’aide de méthodes innovatrices d’analyse de données séquentielles. PhD thesis, université de Genève.

Van der Heijden PGM (1987) Correspondence Analysis of Longitudinal Categorical Data. Leiden: DSWO Press.

Van der Heijden PGM, Teunissen, van Orlé (1997) Multiple Correspondence Analysis as a Tool for Quantification or Classification of Career Data. Journal of Educational and Behavioral Statistics 22(4): 447–77.

Willekens (1999) The Life Course: Models and Analysis. In van Wissen LJG, Dykstra (eds) Population Issues: An Interdisciplinary Focus. New York: Plenum Press, 23–51.

Wu (2000) Some Comments on “Sequence Analysis and Optimal Matching Methods in Sociology: Review and Prospect”. Sociological Methods & Research 29(1): 41–64.