Repurposing kinship coefficients as a sample integrity method for next generation sequencing data in a clinical setting

Abstract

BACKGROUND AND OBJECTIVES: Kinship coefficients measure the relatedness between two individuals and are widely used in genetic applications. In this study, we repurpose the kinship coefficient to support sample tracking directly and to identify potential sample swaps. Such sample integrity metrics are particularly important in two scenarios that arise in large-scale clinical studies. First, multiple biological samples from the same individual are routinely processed as unique samples or as technical replicates; querying the relatedness of the genomic data from two samples can reveal a sample swap before the samples are inappropriately included in data analysis. Second, different biological analytes from the same samples are run on multiple platforms, and it is critical to establish the correct mapping for each sample so that the genomic information derived from the different platforms is linked to the same sample. In both cases, all downstream inferences rely on this mapping being correct. Kinship coefficients directly measure mapping accuracy and thereby help ensure the required sample integrity.
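The abstract does not say which kinship estimator the authors adapt. As one concrete possibility, the sketch below assumes the KING-robust estimator (Manichaikul et al., 2010) on genotypes coded as alternate-allele counts (0, 1, 2), together with the commonly used KING cut-offs for relationship degrees. Two samples from the same individual should give an estimate near 0.5, the value for identical genomes, while unrelated individuals scatter around 0. The function names are illustrative, not the authors' code.

```python
import numpy as np


def king_robust_kinship(g1, g2):
    """KING-robust kinship estimate between two samples.

    g1, g2: genotype vectors coded as alternate-allele counts (0, 1, 2)
    over the same ordered set of biallelic sites; NaN marks a missing call.
    """
    g1 = np.asarray(g1, dtype=float)
    g2 = np.asarray(g2, dtype=float)
    # Use only sites genotyped in both samples.
    ok = ~(np.isnan(g1) | np.isnan(g2))
    a, b = g1[ok], g2[ok]
    n_het_both = np.sum((a == 1) & (b == 1))     # N_{Aa,Aa}: both heterozygous
    n_opp_hom = np.sum(np.abs(a - b) == 2)       # N_{AA,aa}: opposite homozygotes
    n_het_sum = np.sum(a == 1) + np.sum(b == 1)  # N_Aa(i) + N_Aa(j)
    if n_het_sum == 0:
        raise ValueError("no heterozygous calls; kinship is undefined")
    return (n_het_both - 2 * n_opp_hom) / n_het_sum


def relationship(phi):
    """Map a kinship estimate to the relationship bins commonly used with KING."""
    if phi > 0.354:
        return "duplicate/MZ twin"
    if phi > 0.177:
        return "1st degree"
    if phi > 0.0884:
        return "2nd degree"
    if phi > 0.0442:
        return "3rd degree"
    return "unrelated"
```

For sample integrity only the top bin matters: a pair expected to come from the same person should land in the "duplicate/MZ twin" bin, and any other outcome is a candidate swap. A profile compared with itself has every heterozygote shared and no opposite homozygotes, so the estimate is exactly 0.5.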

MATERIALS AND METHODS:

We first describe the general concept of kinship coefficients and then focus on our novel adaptations to feature selection (i.e., the choice of variants and/or SNPs), which use expressed variants to make the method suitable for the clinical setting.
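The abstract states that the adaptation lies in selecting features from expressed variants but gives no parameters. A minimal sketch, assuming RNA-seq reference and alternate read counts at a panel of candidate SNP sites plus a population allele frequency for each site (for example from the 1000 Genomes Project): poorly covered sites are treated as missing, genotypes are called from the alternate-allele fraction, and rare sites are masked out. The depth cut-off, the allele-fraction bands and the MAF filter are hypothetical values chosen for illustration, not the authors' settings.

```python
import numpy as np


def call_expressed_genotypes(ref_counts, alt_counts, min_depth=10,
                             het_low=0.2, het_high=0.8):
    """Call 0/1/2 genotypes from RNA-seq allele read counts.

    Sites with total depth below min_depth (not expressed well enough) are
    returned as NaN so they drop out of any kinship calculation.
    All thresholds are illustrative assumptions.
    """
    ref = np.asarray(ref_counts, dtype=float)
    alt = np.asarray(alt_counts, dtype=float)
    depth = ref + alt
    geno = np.full(depth.shape, np.nan)
    covered = depth >= min_depth
    # Alternate-allele fraction; 0 where there are no reads at all.
    frac = np.divide(alt, depth, out=np.zeros_like(depth), where=depth > 0)
    geno[covered & (frac < het_low)] = 0
    geno[covered & (frac >= het_low) & (frac <= het_high)] = 1
    geno[covered & (frac > het_high)] = 2
    return geno


def select_common_sites(pop_alt_freq, min_maf=0.05):
    """Boolean mask keeping sites whose population minor allele frequency >= min_maf."""
    p = np.asarray(pop_alt_freq, dtype=float)
    return np.minimum(p, 1 - p) >= min_maf
```

The heterozygote band is deliberately wide because allele-specific expression can skew read fractions at truly heterozygous sites in RNA-seq data; restricting to common sites keeps the informative heterozygotes that the kinship estimate depends on.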

RESULTS:

We illustrate the adapted kinship coefficient estimate in two studies: one in lung fibrosis, in which multiple samples were routinely collected from each patient, and one in thyroid cancer, in which a cohort of samples was run on different platforms.
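The two study designs correspond to two checks: within a study, samples labelled as the same patient should look identical and samples from different patients should not; across platforms, each sample should match its expected counterpart best. A minimal sketch of both, assuming genotype vectors aligned to a common site list, the KING-robust estimator, and a duplicate cut-off of about 0.354; the function names and the dictionary-based data layout are hypothetical.

```python
from itertools import combinations

import numpy as np

DUP_THRESHOLD = 0.354  # KING cut-off for duplicate samples / MZ twins (assumed)


def kinship(g1, g2):
    """KING-robust kinship on 0/1/2 genotype vectors; NaN = missing call."""
    a, b = np.asarray(g1, float), np.asarray(g2, float)
    ok = ~(np.isnan(a) | np.isnan(b))
    a, b = a[ok], b[ok]
    het_sum = np.sum(a == 1) + np.sum(b == 1)
    if het_sum == 0:
        return float("nan")
    shared_het = np.sum((a == 1) & (b == 1))
    opposite_hom = np.sum(np.abs(a - b) == 2)
    return float((shared_het - 2 * opposite_hom) / het_sum)


def flag_discordant_pairs(genos, subject_of, threshold=DUP_THRESHOLD):
    """Scenario 1: flag pairs whose kinship contradicts their subject labels."""
    flags = []
    for s1, s2 in combinations(sorted(genos), 2):
        phi = kinship(genos[s1], genos[s2])
        same_label = subject_of[s1] == subject_of[s2]
        if same_label and not phi > threshold:  # NaN is flagged for review too
            flags.append((s1, s2, phi, "same subject, low kinship"))
        elif not same_label and phi > threshold:
            flags.append((s1, s2, phi, "different subjects, duplicate-level kinship"))
    return flags


def check_platform_mapping(genos_a, genos_b, expected):
    """Scenario 2: report samples on platform A whose best match on platform B
    differs from the expected mapping or is not at duplicate level."""
    mismatches = {}
    for sa, ga in genos_a.items():
        scores = {sb: kinship(ga, gb) for sb, gb in genos_b.items()}
        # Rank NaN scores last so they can never be chosen as the best match.
        best = max(scores, key=lambda sb: -np.inf if np.isnan(scores[sb]) else scores[sb])
        if best != expected.get(sa) or not scores[best] > DUP_THRESHOLD:
            mismatches[sa] = (expected.get(sa), best, scores[best])
    return mismatches
```

An empty result from either function means every labelled relationship is consistent with the genotype data; each flagged pair or mismatch identifies the samples to re-examine before they enter downstream analysis.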

CONCLUSION:

We demonstrate the effectiveness of using kinship coefficients to improve sample integrity and discuss potential improvements in the methodology.

Keywords

Kinship; sample integrity; clinical; next generation sequencing

