Abstract
A formally recruited, studied, and followed cohort is a rich source of data for a wide range of research questions; the larger the cohort, the greater the quantity of data systematically collected, and the longer the duration of follow-up, the greater the value of the cohort. Despite the ease of conducting cohort studies and the research value of such studies, few centers with long-term research goals conduct such studies. This article provides easy-to-understand practical guidance on how to start and run a cohort study.
An earlier article 1 in this series explained what cohort studies are and commented on the paucity of such studies in this country. This article presents simple guidance on how to start and run a cohort study and is directly addressed to young researchers with long-term goals.
As a first step, choose a clinical category that is common in the setting in which you work. It does not make sense, for example, to recruit a cohort of patients with autism spectrum disorder (ASD) if new cases of ASD present at a frequency of about one per month; this means that if about half of the new cases meet eligibility criteria for recruitment and do not drop out thereafter, after five years the cohort would comprise only 30 patients. Diagnoses for which large samples are easily recruited include schizophrenia (Sz), bipolar disorder, major depressive disorder, and alcohol dependence. If you work in a general hospital that has departments of obstetrics and gynecology as well as pediatrics, you can start a pregnancy cohort that follows children through childhood into adolescence and beyond.
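The feasibility arithmetic above can be checked for any candidate diagnosis before committing to a cohort. The following is a minimal sketch; the function name and parameters are illustrative, and it assumes a constant presentation rate and a single combined fraction for eligibility and retention, as in the ASD example.

```python
import math


def projected_cohort_size(new_cases_per_month: float,
                          eligible_and_retained_fraction: float,
                          years: float) -> int:
    """Rough cohort size after a recruitment period.

    Assumes (for illustration only) a constant rate of new cases and a
    single fraction that covers both eligibility and retention.
    """
    total_new_cases = new_cases_per_month * 12 * years
    return math.floor(total_new_cases * eligible_and_retained_fraction)


# ASD example from the text: about one new case per month, about half
# eligible and retained, five years of recruitment.
print(projected_cohort_size(1, 0.5, 5))  # 30
```

Running the same calculation with the local presentation rate of a more common diagnosis shows quickly whether a cohort of several hundred patients is realistic.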
Next, set a general purpose for the study. This is required to provide direction to the study, and to justify the study to an ethics committee and to patients and their families from whom informed consent is sought. As an example, the purpose could be “to study factors that influence the long-term course and outcome of the disorder.”
Where possible, the cohort may be set up in the context of a separate unit that offers special clinical services to the patients in the cohort; treatments could be standardized, if desired, to improve the homogeneity of the cohort. These special services would facilitate ethical approvals and encourage patient follow-up and hence retention in the cohort.
Patients should be selected into the cohort only if they meet criteria relevant to the general purpose of the study; so, inclusion and exclusion criteria should be set. First-degree relatives can be included if this supports the study objectives. Patients and relatives should be included only if they live within the catchment area of the center and if follow-up at specified intervals is feasible.
At the time of preparation of the ethics committee proposal, all actually and potentially relevant independent and dependent variables should be listed and operationalized as explained in detail in earlier articles.2,3 Data should be collected using a structured form and structured instruments, and should include sociodemographic, clinical, lifestyle, treatment, and other variables. Data collected should immediately be transferred, properly coded, into a spreadsheet, and backed up at frequent intervals.
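One way to keep coding consistent when data are transferred from the structured form to the spreadsheet is to apply a fixed codebook and reject any value that it does not list. The sketch below is only an illustration; the variable names and numeric codes are hypothetical and would be replaced by those operationalized in the study protocol.

```python
# Hypothetical codebook: categorical responses mapped to numeric codes.
CODEBOOK = {
    "sex": {"female": 0, "male": 1},
    "employment": {"unemployed": 0, "employed": 1, "student": 2},
}


def code_record(raw: dict) -> dict:
    """Return a copy of one patient's record with categorical values coded.

    Variables without a codebook entry (identifiers, numeric values) are
    passed through unchanged; an unlisted category raises an error so that
    the problem is caught at data entry, not at analysis.
    """
    coded = {}
    for variable, value in raw.items():
        mapping = CODEBOOK.get(variable)
        if mapping is None:
            coded[variable] = value
        elif value in mapping:
            coded[variable] = mapping[value]
        else:
            raise ValueError(f"Unlisted value {value!r} for variable {variable!r}")
    return coded


print(code_record({"patient_id": "P001", "sex": "female", "age_years": 27}))
```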
It is wise to be overinclusive with data collection because whatever data are collected could, at some time, be used as independent variables, dependent variables, or biasing/confounding variables. Creativity and foresight are necessary here. As examples of variables, for a Sz cohort, you could use separate instruments to collect data on prior medication exposure, treatment-refractoriness, delusions, hallucinations, and cognitive functioning, and also on often-ignored aspects such as obsessive-compulsive symptoms, soft neurological signs (SNS), and insight. As examples of possible analyses, you can examine whether SNS (independent variable) predicts 5-year clinical outcomes; or you can examine SNS trajectories across five years (dependent variable); or you can examine whether duration of untreated psychosis predicts 5-year clinical outcomes after adjusting for baseline variables, including SNS (biasing/confounding variables).
The greater the quantity of data collected, the greater the value of the cohort because a larger number of research questions can be framed and answered when the cohort becomes sufficiently large and the follow-up sufficiently long. It could be valuable to collect and store blood samples, too, if deep freezer and uninterrupted power supply facilities exist. This is because such samples can be examined at any time for anything from autoantibodies to abnormal genes. Separate informed consent forms may be necessary if blood samples are stored for such purposes.
How many patients should the cohort include? There is no magic number here; it could range from hundreds to thousands, depending on the research questions that need to be answered. Keep in mind that data in cohort studies are usually examined in regressions where there is a dependent variable, an independent variable, and variables that are “adjusted for.” As a rule of thumb, when the dependent variable is continuous, the minimum sample size needs to be 10-20 times the number of independent and adjustment variables; when the dependent variable is dichotomous, the number of patients with the less frequent outcome needs to be 10-20 times the number of independent and adjustment variables.
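These rules of thumb can be turned into a quick planning calculation. The sketch below is illustrative, not a formal power analysis. It assumes the usual events-per-variable reading of the rule for a dichotomous outcome, that is, the count of the less frequent outcome should be 10-20 times the number of independent and adjustment variables. The function names and the expected outcome proportion are assumptions made for the example.

```python
import math


def min_n_continuous(n_variables: int, per_variable: int = 10) -> int:
    """Minimum sample size for a continuous dependent variable."""
    return n_variables * per_variable


def min_n_dichotomous(n_variables: int,
                      less_frequent_proportion: float,
                      per_variable: int = 10) -> int:
    """Minimum sample size for a dichotomous dependent variable.

    The required number of less frequent outcomes is scaled up by the
    expected proportion of patients who will have that outcome.
    """
    if not 0 < less_frequent_proportion <= 0.5:
        raise ValueError("proportion of the less frequent outcome must be in (0, 0.5]")
    min_events = n_variables * per_variable
    return math.ceil(min_events / less_frequent_proportion)


# Eight independent and adjustment variables in total.
print(min_n_continuous(8))                      # 80 at 10 per variable
print(min_n_continuous(8, per_variable=20))     # 160 at 20 per variable
# If a quarter of patients are expected to have the less frequent outcome:
print(min_n_dichotomous(8, 0.25))               # 320
```

The dichotomous case shows why uncommon outcomes call for much larger cohorts: the rarer the outcome, the more patients are needed to accumulate enough events.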
What if after recruiting, say, 200 patients, you wish to include an additional set of variables to accommodate new objectives or to adjust for additional confounders? This can be done after obtaining fresh ethical approval. Patients subsequently recruited could comprise a “second wave.”
Recruitment into the cohort is usually stopped when a critical sample size is reached or after a specified time, such as 1 or 2 years. Practical considerations, such as unit staffing and study funding (if applicable), would determine this. If other centers follow identical protocols, the cohorts can be pooled, provided that inter-rater reliability across centers was established before recruitment began; study center would then be an additional variable for adjustment in the regressions.
Patients should be assessed at baseline (ideally, before treatment initiation) and, again, at specified intervals. For reassessments, follow-up should be at more frequent intervals initially because greater treatment-related changes are expected initially; later follow-up can be at less frequent intervals, such as once in 6 months or once in 12 months. The same assessments should be conducted at the various prespecified time points, although assessments that are more time-consuming and those that do not change in the short-term can be conducted only at specified follow-ups. Ideally, the same trained rater should conduct the same assessments in all patients at all time points to improve reliability. Because this is hardly ever possible, all raters at a center need to be trained and inter-rater reliability confirmed.
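A follow-up schedule of this kind can be written down explicitly so that every patient is assessed at the same prespecified time points. The sketch below is one possible layout; the early visits at 1 and 3 months and the default 6-month interval are hypothetical choices made for the example, not recommendations.

```python
def follow_up_months(total_months: int,
                     early_visits: tuple = (1, 3),
                     later_interval: int = 6) -> list:
    """Assessment time points in months since baseline (0 = baseline).

    Early visits are closely spaced; later visits recur at a fixed,
    longer interval until the end of follow-up.
    """
    schedule = {0}
    schedule.update(m for m in early_visits if m <= total_months)
    next_visit = later_interval
    while next_visit <= total_months:
        schedule.add(next_visit)
        next_visit += later_interval
    return sorted(schedule)


print(follow_up_months(24))  # [0, 1, 3, 6, 12, 18, 24]
```

Time-consuming assessments, or those not expected to change in the short term, could then be assigned to a subset of these time points, such as the 12-monthly visits.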
When can data analysis be performed? This would depend on the research question and on whether the cohort eligible for analysis has reached sufficient sample size. One-year, 2-year, 5-year, 10-year (etc) data are commonly analyzed. Analyses should not be conducted at too frequent intervals because sufficient time should elapse for new results to emerge. Note that ethical approval will need to be obtained at every time point at which research questions are framed and analyses are conducted if these had not been prespecified at the time of original ethical approval.
The larger the cohort and the longer the duration of the follow-up, the greater the value of the cohort.
Footnotes
Declaration of Conflicting Interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors received no financial support for the research, authorship, and/or publication of this article.
