
Abstract

Objective

Short-video platforms have become major sources of rheumatoid arthritis (RA) information for the public. The quality and reliability of such content remain uncertain.

Methods

We conducted a cross-sectional study in October 2025. Using the Chinese keyword for “rheumatoid arthritis,” we screened top-ranked videos on TikTok and Bilibili and included eligible, non-advertising videos that addressed RA. To reduce personalization, searches were performed with newly registered accounts in private-browsing mode. Two trained raters assessed overall quality using the Global Quality Score and reliability using a modified DISCERN instrument. Uploaders were classified as rheumatology professionals, other medical professionals, patients, or science communicators. Inter-rater agreement was quantified using Cohen's kappa. Nonparametric tests compared platforms and sources. Spearman correlations examined associations between engagement metrics (likes, collects, comments, shares), video duration, and quality scores.

Results

We included 209 videos (TikTok n = 110; Bilibili n = 99). Engagement was higher on TikTok (median likes, collects, comments, shares all greater; all p < 0.05), whereas Bilibili videos were longer (median 212 s vs 70.5 s; p < 0.001) and achieved higher Global Quality Score and modified DISCERN scores (both p ≤ 0.001). Uploader distributions differed by platform: TikTok was dominated by medical professionals; Bilibili had more science communicators. Video source was strongly associated with quality and reliability: rheumatology specialists and other medical professionals scored highest; patient-generated content scored lowest. Engagement metrics were strongly inter-correlated but showed no meaningful correlation with Global Quality Score or modified DISCERN. Longer duration correlated positively with quality.

Conclusion

RA-related short videos on TikTok and Bilibili show variable quality and reliability. Bilibili tends to provide higher-quality and more reliable information, and clinician-produced content scores highest, whereas engagement does not indicate trustworthiness. Platform-level quality signals and greater clinician participation may improve the reliability of RA information online.

Keywords

Bilibili, digital health literacy, DISCERN, Global Quality Score, rheumatoid arthritis, short video, TikTok

Introduction

Rheumatoid arthritis (RA) is a chronic, systemic autoimmune disease marked by persistent synovitis and progressive destruction of cartilage and bone, which may result in joint deformity and functional loss.¹ Its global prevalence is approximately 0.5–1%, with regional and ethnic variation and clear sex and age differences.² RA may occur at any age but is most prevalent in people aged 30 to 50 years, and women are two to three times as likely as men to be affected. Both genetic and environmental factors contribute to its development.³ The effects of RA are not limited to the joints: it is a major cause of disability and work loss and is associated with several extra-articular comorbidities, such as increased cardiovascular risk, pulmonary involvement, osteoporosis, and depression.¹ These complications substantially affect quality of life, mental health, and socioeconomic outcomes. Improving public awareness, promoting timely diagnosis and treatment, and supporting guideline-based management are therefore fundamental public health priorities.

The digital health era has transformed how people access medical information. Approximately half of the adult population is reported to consult the internet for health-related information.⁴ Short-video platforms are increasingly used for rapid, interactive health communication, but videos are not purely verbal and often contain written information, which raises health literacy and readability concerns. Guidance from major organizations has emphasized that patient education materials should be written at approximately a sixth-grade reading level to maximize comprehension.⁵ In this evolving information environment, TikTok and Bilibili are emerging as key sources of health information for the public because of their easy-to-use format, rapid dissemination, and high interactivity.⁶,⁷ These platforms lower barriers to medical knowledge and offer convenience and immediacy, with significant potential to increase disease awareness among users. However, their open creation models and limited pre-publication scrutiny are a double-edged sword. These platforms may also facilitate the spread of health misinformation, which can mislead users and may contribute to inappropriate health decisions or delayed care.⁸,⁹ Moreover, algorithmic recommendation systems can generate echo chambers, repeatedly exposing users to biased or inaccurate opinions.¹⁰

A growing body of work has assessed the quality and reliability of short-video health information across specific diseases, including diabetes,¹¹ cervical cancer,¹² and Hashimoto's thyroiditis.¹³ Similar evaluation frameworks have also been applied in rheumatology, including recent assessments of YouTube exercise videos for fibromyalgia syndrome using comparable quality and reliability instruments.¹⁴ Yet evidence focusing on RA remains limited. This gap matters because RA management often relies on long-term patient education and self-management, and patients who have knowledge about the causes, pathophysiology, treatment and prevention of a disease may be better able to participate and comply during disease prevention or treatment procedures.¹⁵

To fill this gap, we conducted a cross-sectional analysis of RA-related short videos on Bilibili and TikTok. Both are influential Chinese-language short-video platforms with distinct content ecosystems and recommendation and interaction designs, which allows a pragmatic comparison of the quality and reliability of RA-related information within the same language context. We measured overall quality with the Global Quality Score (GQS) and reliability with a modified DISCERN instrument, and we tested whether engagement measures correlate with these two scores.

Methods

Search strategy

We conducted a cross-sectional study to evaluate the quality and reliability of RA-related short videos on two Chinese platforms, Bilibili and TikTok. Data collection took place in October 2025 using each platform's default “comprehensive ranking” to approximate what a non-logged-in user would see. To reduce personalization and recommendation bias, searches were performed with newly registered accounts and in private-browsing mode where possible. Before every search, browsing history and cache were cleared, and no likes, follows, comments, or watch history were allowed to accrue during data collection. We searched both platforms through their built-in search feature using the standardized Chinese keyword “类风湿性关节炎” (“rheumatoid arthritis”) and recorded results in the order in which they were ranked at the time of the search. From each platform, we retrieved the top 120 results.

Inclusion criteria were publicly accessible, non-advertising videos that directly addressed RA (definition, causes, symptoms, diagnosis, treatment, or prevention). Exclusion criteria were duplicates, RA-irrelevant content, and videos uploaded within the past week. In total, 209 videos were included (TikTok n = 110; Bilibili n = 99). Publicly available metrics (likes, collects, comments, shares, duration) were extracted into a standardized spreadsheet for analysis; baseline characteristics are summarized in Table 1.

Table 1.

Characteristics of included videos overall and categorized by platforms.

Variable	Overall (N = 209)	TikTok (N = 110)	Bilibili (N = 99)	p-value
Likes, M (Q₁–Q₃)	165.00 (43.00–755.00)	407.00 (137.75–1250.50)	48.00 (10.50–188.50)	<0.001
Collects, M (Q₁–Q₃)	94.00 (26.00–362.00)	156.00 (32.25–547.75)	68.00 (14.00–234.50)	0.002
Comments, M (Q₁–Q₃)	16.00 (3.00–73.00)	27.00 (9.00–112.75)	4.00 (1.00–29.50)	<0.001
Shares, M (Q₁–Q₃)	56.00 (13.00–214.00)	62.50 (14.25–306.25)	38.00 (12.50–161.00)	0.031
Duration, s, M (Q₁–Q₃)	103.00 (61.00–212.00)	70.50 (54.00–102.50)	212.00 (108.50–460.00)	<0.001
Video age, days, M (Q₁–Q₃)	219.00 (122.00–948.00)	143.00 (106.50–192.00)	972.00 (475.00–1284.50)	<0.001
Follower count, M (Q₁–Q₃)	27,000.00 (3324.00–92,000.00)	67,000.00 (21,000.00–290,500.00)	5263.00 (917.50–27,000.00)	<0.001
No. of hashtags, M (Q₁–Q₃)	4.00 (3.00–5.00)	4.00 (3.00–5.00)	5.00 (3.00–9.00)	<0.001
Title length, words, M (Q₁–Q₃)	16.00 (12.00–21.00)	15.00 (11.00–19.50)	17.00 (13.00–21.00)	0.054
Total videos by uploader, M (Q₁–Q₃)	440.00 (172.00–762.00)	414.00 (193.00–762.00)	496.00 (136.50–822.00)	0.741
Modified DISCERN scores, M (Q₁–Q₃)	2.00 (2.00–2.00)	2.00 (1.25–2.00)	2.00 (2.00–3.00)	0.001
GQS scores, M (Q₁–Q₃)	3.00 (3.00–3.00)	3.00 (2.00–3.00)	3.00 (3.00–3.50)	<0.001
Video Sources, n (%)				<0.001
Medical professional on other	51 (24.40)	34 (30.91)	17 (17.17)
Medical professional on RA	85 (40.67)	62 (56.36)	23 (23.23)
Patients	15 (7.18)	9 (8.18)	6 (6.06)
Science communicators	58 (27.75)	5 (4.55)	53 (53.54)
Video contents, n (%)
Definition	52 (24.88)	19 (17.27)	33 (33.33)	0.007
Causes	64 (30.62)	39 (35.45)	25 (25.25)	0.110
Symptoms	119 (56.94)	65 (59.09)	54 (54.55)	0.508
Diagnosis	30 (14.35)	14 (12.73)	16 (16.16)	0.480
Interventions	150 (71.77)	77 (70.00)	73 (73.74)	0.549
Personal experiences	11 (5.26)	7 (6.36)	4 (4.04)	0.453
Style of video shooting, n (%)				<0.001
Animation or Action	34 (16.27)	2 (1.82)	32 (32.32)
Interview	2 (0.96)	2 (1.82)	0 (0.00)
Medical scenarios	22 (10.53)	16 (14.55)	6 (6.06)
PPT or class	17 (8.13)	0 (0.00)	17 (17.17)
Q & A	4 (1.91)	1 (0.91)	3 (3.03)
Solo narration	123 (58.85)	89 (80.91)	34 (34.34)
TV programs	7 (3.35)	0 (0.00)	7 (7.07)
Thumbnail: Human figure, n (%)	151 (72.25)	100 (90.91)	51 (51.52)	<0.001
Thumbnail: Text overlay, n (%)	147 (70.33)	99 (90.00)	48 (48.48)	<0.001
Verified uploader, n (%)	110 (52.63)	100 (90.91)	10 (10.10)	<0.001

M: median; Q: quartile; GQS: Global Quality Score.

Video variables and content coding

Uploaders (“video sources”) were categorized using a predefined coding manual developed by the authors based on publicly available account information and commonly used source categories in prior social-media health information evaluations. Classification was determined from the uploader's profile and verification status (when available), stated professional credentials, institutional affiliation, and the video content itself. “Medical professional on RA” referred to uploaders who self-identified as rheumatologists (or RA-focused clinicians) and/or whose profile indicated RA-related clinical specialization. “Medical professional on other” referred to licensed health professionals whose stated specialty was not rheumatology/RA (e.g. orthopedics, rehabilitation, general internal medicine, pharmacy, nursing). “Patients” referred to uploaders who self-identified as people living with RA (or caregivers) and primarily shared personal experiences without professional medical credentials. “Science communicators” referred to non-clinician individuals or organizations producing health-science educational content (e.g. popular science creators, media accounts) without claiming licensed clinical credentials. Content themes covered definition, causes, symptoms, diagnosis, interventions, and personal experiences. Filming styles included animation or dramatized action, interview, medical scenarios, PPT/classroom talk, Q&A, and solo narration. In addition, to explore potential drivers of video visibility beyond content accuracy, we extracted platform-available metadata and creator-level features for each video, including upload date (used to calculate video age at the time of data collection), uploader follower count, total number of videos posted by the uploader, number of hashtags, and title length. We also coded thumbnail presentation cues (presence of a human figure and/or text overlay) and recorded whether the uploader was a verified account (when applicable). 
Two researchers independently completed all classifications; disagreements were resolved by discussion or third-party arbitration.

Video review and classification

Two trained reviewers (Fangjun Xiao, Yifei Liufu) with RA expertise independently screened the initially retrieved videos using predefined inclusion and exclusion criteria, then labeled source category, content theme, and filming style as defined in section “Video variables and content coding.” Inter-rater agreement for categorical labels was assessed using Cohen's kappa. Disagreements were resolved by consensus; unresolved cases were adjudicated by a senior reviewer (Junxing Yang).

Video quality and reliability assessment

Video quality was assessed using the GQS (5-point Likert scale from 1 = poor to 5 = excellent), covering professionalism, completeness, clarity, and usefulness, and reliability was assessed using the modified DISCERN instrument (five yes/no items; total score 0–5, higher scores indicate greater reliability).¹⁶,¹⁷ Detailed scale descriptions are provided in Supplemental Material 1. Two trained raters, blinded to engagement data, scored all videos independently.
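The structure of the two scores can be illustrated with a minimal Python sketch. It assumes only what is stated above: the modified DISCERN total is the number of "yes" answers across five items, and the GQS is a single rating from 1 to 5. The function names and the example answers are hypothetical and are not taken from the scoring sheets in Supplemental Material 1.

```python
def modified_discern_total(answers):
    """Sum five yes/no items (True = yes) into a 0-5 reliability score."""
    if len(answers) != 5:
        raise ValueError("modified DISCERN expects exactly five yes/no items")
    return sum(1 for answer in answers if answer)


def validate_gqs(score):
    """Check that a Global Quality Score is a rating from 1 (poor) to 5 (excellent)."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError("GQS must be an integer from 1 to 5")
    return score


# Hypothetical video: two of the five reliability items answered "yes".
example_answers = [True, False, True, False, False]
example_total = modified_discern_total(example_answers)
example_gqs = validate_gqs(3)
```

Under this reading, a video with a modified DISCERN score of 2 (the overall median in Table 1) satisfies two of the five reliability criteria.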

Statistical analysis

Analyses were conducted in IBM SPSS Statistics 27.0. Continuous variables (e.g. engagement metrics, quality scores) were non-normally distributed by Shapiro–Wilk testing; they are presented as median (interquartile range) and compared using the Mann–Whitney U-test (two groups) or Kruskal–Wallis H-test (≥3 groups). Categorical variables are shown as counts and percentages and compared using the chi-square test or Fisher's exact test. Spearman's rank correlation examined associations between quality scores and engagement/duration. Inter-rater agreement for GQS and modified DISCERN was evaluated with Cohen's kappa. Statistical significance was set at p < 0.05.
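For readers without SPSS, the two rank-based procedures named above can be sketched in plain Python. This is an illustrative approximation, not the authors' analysis: the Mann–Whitney p-value uses the tie-corrected normal approximation without continuity correction, which may differ slightly from SPSS output, and the toy duration and GQS values are invented for demonstration.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for x, two-sided p) using a tie-corrected normal approximation."""
    n1, n2 = len(x), len(y)
    n = n1 + n2
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_counts = {}
    for value in combined:
        tie_counts[value] = tie_counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in tie_counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u1, 1.0  # all observations identical: no evidence of a difference
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2))


def spearman_rho(x, y):
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return float("nan")  # rho is undefined when one variable is constant
    return cov / (sx * sy)


# Invented toy data (not study data): durations in seconds and GQS ratings.
toy_duration = [60, 75, 120, 300, 480]
toy_gqs = [2, 3, 3, 4, 4]
toy_rho = spearman_rho(toy_duration, toy_gqs)
toy_u, toy_p = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

Both functions work on ranks rather than raw values, which is why they suit the skewed engagement counts and ordinal scores described in this section.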

Results

Video characteristics and platform comparison

We analyzed 209 RA-related short videos (TikTok n = 110; Bilibili n = 99), and the selection process is shown in Figure 1. Overall video characteristics by platform are summarized in Table 1. TikTok videos attracted substantially higher engagement than Bilibili (likes, collects, comments, and shares; all p < 0.05). In contrast, Bilibili videos were markedly longer (median 212.0 s vs 70.5 s; p < 0.001) and were generally older (median video age 972 days vs 143 days; p < 0.001). Despite lower engagement, Bilibili achieved higher information quality and reliability, with higher GQS and modified DISCERN scores (both p ≤ 0.001). Platform-available metadata also differed. TikTok videos were associated with higher uploader follower counts and higher rates of verified accounts, and were more likely to use thumbnails featuring a human figure and text overlay (all p < 0.001). Bilibili videos used more hashtags (p < 0.001). Uploader (“video source”) composition differed significantly between platforms (p < 0.001): TikTok was dominated by medical professionals (RA-focused 56.36%; other medical professionals 30.91%), whereas Bilibili included a larger proportion of science communicators (53.54%). Content themes were broadly similar, with “Interventions” most common on both platforms (TikTok 70.00%; Bilibili 73.74%; p = 0.549). Filming styles differed strikingly (p < 0.001): TikTok mainly used solo narration (80.91%), while Bilibili more often adopted animation/action (32.32%) and PPT/class formats (17.17%). These patterns are displayed in Figures 2 and 3.
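As a worked check on one categorical comparison, the verified-uploader counts from Table 1 (100 of 110 on TikTok, 10 of 99 on Bilibili) can be run through a Pearson chi-square test for a 2 × 2 table. The sketch below is an assumption-laden illustration rather than the authors' SPSS procedure: it applies no continuity correction, and the function name is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Rows: TikTok, Bilibili. Columns: verified, not verified (counts from Table 1).
chi2, p_value = chi_square_2x2(100, 10, 10, 89)
```

The resulting p-value falls far below 0.001, in line with the value reported for this row of Table 1.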

Figure 1.

Video search strategy for rheumatoid arthritis.

Figure 2.

Percentage of RA videos by sources and style of video shooting on TikTok and Bilibili.

Figure 3.

Percentage of RA videos by content on TikTok and Bilibili.

Within-platform stratified analyses further revealed internal differences (Tables 2 and 3). On TikTok, videos uploaded by patients received the highest engagement across likes, collects, comments, and shares, yet their content focused predominantly on personal experiences (77.78%). On Bilibili, science communicators not only constituted the largest uploader group but also demonstrated greater content completeness, with higher coverage of “Definition” (43.40%), “Causes” (37.74%), and “Interventions” (81.13%).

Table 2.

Characteristics of included videos on TikTok, stratified by video source.

	Medical professional on other (N = 34)	Medical professional on RA (N = 62)	Patients (N = 9)	Science communicators (N = 5)	p-value
Likes, M (Q₁–Q₃)	190.00 (91.25–748.00)	651.50 (202.75–1380.25)	1015.00 (802.00–1605.00)	68.00 (61.00–75.00)	0.002
Collects, M (Q₁–Q₃)	53.00 (16.50–319.50)	215.50 (57.50–708.25)	351.00 (240.00–1151.00)	24.00 (13.00–25.00)	0.014
Comments, M (Q₁–Q₃)	10.00 (5.00–57.25)	32.50 (16.00–98.75)	761.00 (316.00–1484.00)	11.00 (3.00–20.00)	<0.001
Shares, M (Q₁–Q₃)	29.00 (6.25–157.50)	102.00 (20.50–253.25)	242.00 (134.00–739.00)	46.00 (6.00–50.00)	0.030
Duration, s, M (Q₁–Q₃)	58.50 (48.50–105.50)	76.00 (55.00–99.75)	88.00 (77.00–108.00)	68.00 (56.00–118.00)	0.268
Video age, days, M (Q₁–Q₃)	156.50 (109.00–222.00)	158.00 (122.75–209.50)	98.00 (95.00–112.00)	103.00 (101.00–108.00)	0.001
Follower count, M (Q₁–Q₃)	192,500.00 (65,000.00–366,000.00)	67,000.00 (29,500.00–160,250.00)	6548.00 (6492.00–9321.00)	3324.00 (3324.00–13,200.00)	<0.001
No. of hashtags, M (Q₁–Q₃)	4.00 (3.00–5.00)	4.00 (3.00–5.00)	4.00 (4.00–5.00)	3.00 (0.00–3.00)	0.048
Title length, words, M (Q₁–Q₃)	15.50 (10.50–21.50)	13.00 (11.00–18.00)	30.00 (23.00–41.00)	12.00 (7.00–14.00)	<0.001
Total videos by uploader, M (Q₁–Q₃)	518.00 (150.75–832.75)	414.00 (230.25–762.00)	167.00 (45.00–167.00)	166.00 (166.00–3782.00)	0.006
Definition, n (%)	6 (17.65%)	12 (19.36%)	0 (0.00%)	1 (20.00%)	0.553
Causes, n (%)	13 (38.24%)	22 (35.48%)	2 (22.22%)	2 (40.00%)	0.838
Symptoms, n (%)	22 (64.71%)	35 (56.45%)	3 (33.33%)	5 (100.00%)	0.088
Diagnosis, n (%)	5 (14.71%)	6 (9.68%)	0 (0.00%)	3 (60.00%)	0.007
Interventions, n (%)	20 (58.82%)	46 (74.19%)	6 (66.67%)	5 (100.00%)	0.192
Personal experiences, n (%)	0 (0.00%)	0 (0.00%)	7 (77.78%)	0 (0.00%)	<0.001
Thumbnail: Human figure, n (%)	32 (94.12%)	57 (91.94%)	7 (77.78%)	4 (80.00%)	0.376
Thumbnail: Text overlay, n (%)	34 (100.00%)	61 (98.39%)	1 (11.11%)	3 (60.00%)	<0.001
Verified uploader, n (%)	34 (100.00%)	62 (100.00%)	0 (0.00%)	4 (80.00%)	<0.001
Style of video shooting, n (%)					0.104
Animation or Action	0 (0.00%)	2 (3.23%)	0 (0.00%)	0 (0.00%)
Interview	0 (0.00%)	1 (1.61%)	0 (0.00%)	1 (20.00%)
Medical scenarios	8 (23.53%)	7 (11.29%)	0 (0.00%)	1 (20.00%)
Q & A	1 (2.94%)	0 (0.00%)	0 (0.00%)	0 (0.00%)
Solo narration	25 (73.53%)	52 (83.87%)	9 (100.00%)	3 (60.00%)
GQS score, M (Q₁–Q₃)	3.00 (2.00–3.00)	3.00 (3.00–3.00)	2.00 (2.00–3.00)	3.00 (3.00–3.00)	0.012
Modified DISCERN score, M (Q₁–Q₃)	2.00 (1.00–2.00)	2.00 (2.00–2.00)	1.00 (1.00–1.00)	2.00 (2.00–2.00)	0.004

M: median; Q: quartile; GQS: Global Quality Score.

Table 3.

Characteristics of included videos on Bilibili, stratified by video source.

	Medical professional on other (N = 17)	Medical professional on RA (N = 23)	Patients (N = 6)	Science communicators (N = 53)	p-value
Likes, M (Q₁–Q₃)	68.00 (30.00–255.00)	29.00 (8.00–141.00)	603.50 (146.00–845.75)	32.00 (10.00–165.00)	0.040
Collects, M (Q₁–Q₃)	79.00 (19.00–306.00)	31.00 (3.00–168.50)	103.00 (55.25–333.00)	73.00 (26.00–308.00)	0.219
Comments, M (Q₁–Q₃)	10.00 (3.00–57.00)	2.00 (0.50–14.50)	103.50 (33.00–284.25)	3.00 (0.00–15.00)	0.005
Shares, M (Q₁–Q₃)	52.00 (16.00–223.00)	31.00 (1.50–163.50)	51.00 (14.75–85.00)	38.00 (14.00–152.00)	0.330
Duration, s, M (Q₁–Q₃)	299.00 (133.00–556.00)	79.00 (64.50–112.50)	188.50 (119.50–220.75)	303.00 (169.00–590.00)	<0.001
Video age, days, M (Q₁–Q₃)	541.00 (184.00–1069.00)	1015.00 (398.50–1117.50)	1063.00 (934.75–1320.25)	1024.00 (509.00–1456.00)	0.258
Follower count, M (Q₁–Q₃)	15,000.00 (1075.00–33,000.00)	1596.00 (727.00–7983.00)	16,459.00 (917.25–32,000.00)	5642.00 (773.00–40,000.00)	0.139
No. of hashtags, M (Q₁–Q₃)	4.00 (2.00–9.00)	5.00 (3.00–8.00)	6.00 (4.50–6.00)	5.00 (3.00–9.00)	0.924
Title length, words, M (Q₁–Q₃)	16.00 (13.00–19.00)	16.00 (12.50–19.00)	28.00 (20.75–37.50)	17.00 (13.00–22.00)	0.136
Total videos by uploader, M (Q₁–Q₃)	636.00 (172.00–1089.00)	626.00 (510.00–929.50)	240.00 (114.00–366.00)	291.00 (65.00–653.00)	0.002
Definition, n (%)	5 (29.41%)	5 (21.74%)	0 (0.00%)	23 (43.40%)	0.074
Causes, n (%)	3 (17.65%)	2 (8.70%)	0 (0.00%)	20 (37.74%)	0.016
Symptoms, n (%)	12 (70.59%)	8 (34.78%)	3 (50.00%)	31 (58.49%)	0.123
Diagnosis, n (%)	2 (11.77%)	5 (21.74%)	0 (0.00%)	9 (16.98%)	0.582
Interventions, n (%)	12 (70.59%)	16 (69.57%)	2 (33.33%)	43 (81.13%)	0.077
Personal experiences, n (%)	0 (0.00%)	0 (0.00%)	4 (66.67%)	0 (0.00%)	<0.001
Thumbnail: Human figure, n (%)	11 (64.71%)	14 (60.87%)	6 (100.00%)	20 (37.74%)	0.009
Thumbnail: Text overlay, n (%)	11 (64.71%)	18 (78.26%)	3 (50.00%)	16 (30.19%)	<0.001
Verified uploader, n (%)	2 (11.77%)	2 (8.70%)	1 (16.67%)	5 (9.43%)	0.938
Style of video shooting, n (%)					<0.001
Animation or Action	2 (11.77%)	2 (8.70%)	0 (0.00%)	28 (52.83%)
Medical scenarios	2 (11.77%)	4 (17.39%)	0 (0.00%)	0 (0.00%)
PPT or class	2 (11.77%)	1 (4.35%)	0 (0.00%)	14 (26.42%)
Q & A	0 (0.00%)	0 (0.00%)	0 (0.00%)	3 (5.66%)
Solo narration	10 (58.82%)	15 (65.22%)	6 (100.00%)	3 (5.66%)
TV programs	1 (5.88%)	1 (4.35%)	0 (0.00%)	5 (9.43%)
GQS score, M (Q₁–Q₃)	3.00 (3.00–3.00)	3.00 (3.00–3.00)	2.00 (2.00–2.75)	3.00 (3.00–4.00)	0.002
Modified DISCERN score, M (Q₁–Q₃)	2.00 (2.00–2.00)	2.00 (2.00–2.00)	1.50 (1.00–2.00)	2.00 (2.00–3.00)	<0.001

M: median; Q: quartile; GQS: Global Quality Score.

Quality and reliability by video source

Two reviewers showed high agreement on video ratings. Inter-rater reliability was excellent for both instruments (GQS: Cohen's κ = 0.816; modified DISCERN: κ = 0.951; Supplemental Material 2).
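To make the agreement statistic concrete, the following sketch computes unweighted Cohen's kappa for two raters in plain Python. Whether the authors used an unweighted or weighted kappa for the ordinal scores is not stated, so the unweighted form here is an assumption, and the two rating lists are invented for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Invented GQS ratings for eight videos (not study data).
rater_1 = [3, 3, 2, 4, 3, 2, 3, 4]
rater_2 = [3, 3, 2, 4, 2, 2, 3, 3]
kappa = cohens_kappa(rater_1, rater_2)
```

In this toy case the raters agree on 6 of 8 videos (0.75) against a chance agreement of 0.375, giving κ = (0.75 − 0.375) / (1 − 0.375) = 0.60, well below the values reported above.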

Overall, Bilibili scored higher than TikTok on both quality and reliability measures (Table 1; Figure 4). The median GQS on Bilibili was 3.00 (3.00–3.50), compared with 3.00 (2.00–3.00) on TikTok (p < 0.001). The median modified DISCERN score was also higher on Bilibili [2.00 (2.00–3.00)] than on TikTok [2.00 (1.25–2.00)] (p = 0.001).

Figure 4.

Comparison of video quality scores between TikTok and Bilibili.

When stratified by video source, score distributions differed across uploader categories on both platforms (Figures 5 and 6). On TikTok, videos uploaded by rheumatology professionals had the highest scores (median GQS: 3.00; modified DISCERN: 2.00). Patient-uploaded videos had the lowest modified DISCERN scores on both platforms (TikTok: 1.00; Bilibili: 1.50). On Bilibili, science communicator videos showed comparable scores to professional groups (median GQS: 3.00; modified DISCERN: 2.00).

Figure 5.

Video quality scores by sources on TikTok and Bilibili.

Figure 6.

Video quality by sources across platforms (pooled TikTok and Bilibili).

Platform-specific summaries are provided in Tables 2 and 3. On TikTok, science communicator videos accounted for 4.55% of the sample and had median scores of GQS 3.00 and modified DISCERN 2.00. On Bilibili, science communicator videos accounted for 53.54% of the sample, and 25% achieved GQS ≥ 4 (Tables 2 and 3).

Correlations between scores and engagement

Spearman correlation matrices for TikTok and Bilibili are shown in Figure 7. On both platforms, the engagement indicators (likes, collects, comments, and shares) were strongly and positively correlated with one another. In contrast, no significant correlations were observed between video quality/reliability scores (GQS and modified DISCERN) and engagement indicators on either platform. Video duration showed positive correlations with GQS and modified DISCERN on both platforms. Associations between video duration and engagement indicators were weak overall and were not consistently significant across platforms. Regarding creator- and post-level metadata, follower count, number of hashtags, title length, total videos by uploader, verification status, and thumbnail features (human figure and text overlay) showed variable correlation patterns with engagement indicators and with GQS/modified DISCERN across the two platforms, with most correlations being weak and not statistically significant. Video age also showed weak and variable correlations with engagement indicators and with GQS/modified DISCERN across platforms.

Figure 7.

Correlation matrix of video engagement metrics and quality scores on TikTok and Bilibili.

Discussion

In the current era of digital health communication, short-video platforms such as TikTok and Bilibili have emerged as significant sources of RA information for the public. Although RA predominantly affects middle-aged adults, these platforms remain relevant because their user base skews younger, resulting in high exposure to digital health content.¹⁸ Younger users may encounter RA information for general health learning or when seeking information for family members, rather than because RA is more prevalent in their age group. Accordingly, our discussion of TikTok and Bilibili reflects population-level information exposure and dissemination dynamics rather than the epidemiological age distribution of RA. Within this broader context, access to short-form digital content may influence how patients and the public seek information and engage in empowerment and self-management.¹⁹

In this cross-sectional study, we compared RA-related content across TikTok and Bilibili and observed a platform contrast: Bilibili videos had higher quality and reliability scores, whereas TikTok videos showed higher engagement. This difference may relate to platform-specific content ecosystems. Bilibili supports longer videos and often facilitates more structured, educational presentations, which may be better suited for explaining complex topics such as RA definitions and pathophysiology. In our sample, science communicators represented a substantial proportion of Bilibili uploaders and achieved quality/reliability scores comparable to professional groups, consistent with their potential role in translating specialist knowledge into accessible communication.²⁰ In comparison, TikTok, as a typical short-form platform, is geared toward capturing maximal attention and immediate user response.²¹ Its content is often optimized for brevity, which may favor simplified framing and personal narratives. Consistent with studies in other disease domains,²²,²³ we found that this format tends to concentrate attention on treatment interventions and personal experiences of illness. However, as shown in our correlation analyses, engagement metrics were not significantly associated with quality or reliability scores on either platform, indicating that highly engaged videos do not necessarily provide more rigorous or reliable information.

Scientific rigor varied with the source of the video. Rheumatology professionals achieved the highest video quality scores, which is consistent with the established influence of professional authority in medical communication.²⁴ Patient-uploaded videos showed the lowest reliability scores on both platforms; although they may reflect lived experiences, their content may be less likely to provide comprehensive, evidence-based information. More importantly, we found no significant positive relationship between user engagement metrics and quality ratings, and even observed a negative tendency on TikTok. This pattern is consistent with prior research on online patient education materials in musculoskeletal conditions, which has highlighted substantial variability in information quality and the need for cautious interpretation of online health materials.²⁵ Given that engagement is time-dependent and Bilibili videos were generally older than TikTok videos, video age should also be considered when interpreting engagement differences. In addition, our exploratory analyses incorporating creator- and post-level metadata showed mostly weak and nonsignificant associations with engagement and with quality/reliability scores. Explaining why certain content attains high visibility would likely require platform-level recommendation data (e.g. impressions and watch-time) that were not accessible in this study.

Our cross-platform findings illustrate that reach and trustworthiness can diverge across platforms, and the following communication perspective offers a possible contextual explanation. The health information ecosystem associated with short videos is a complex adaptive system shaped by platform features, creator incentives, and user actions.²⁶ By fostering a “knowledge” subculture, Bilibili creates an environment conducive to depth, in which users show a greater desire to learn and high-quality content is positively reinforced.²⁷ TikTok's algorithm-driven model can create “information cocoons” in which users are repeatedly exposed to similar (sometimes partial or incorrect) messages that reinforce previously held biases.²⁸ Furthermore, RA is a chronic disease whose management depends on sustained patient education and shared decision making. Online information that is wrong or one-sided (e.g. emphasis on a “special” medicine) may lead to delays in guideline-based treatment and increased anxiety or financial burden.²⁹,³⁰

To situate these findings within the wider online information landscape, it is also important to consider other contemporary channels through which RA information is accessed. Beyond TikTok and Bilibili, RA-related health information is also widely accessed through other channels, including traditional web-based materials, long-form video platforms such as YouTube, and rapidly expanding AI-assisted question-answering tools. Compared with short-video platforms, web-based patient education materials and many disease-specific online resources are typically organized in a more text-centered format, where readability and completeness become major determinants of usability.³¹ YouTube, as a long-form video platform, often enables more extended explanations but similarly shows heterogeneity in information quality across medical topics.³² More recently, AI-generated responses (e.g. ChatGPT-like systems) have been evaluated for readability, reliability, and quality in pain-related queries, underscoring both the potential and the uncertainty of AI-mediated health information.³³ Thus, the results should be regarded as platform-specific to China-based short-video ecosystems, and further research may establish the quality of the RA information across these newer and more established platforms by harmonizing their evaluation systems.

These findings carry clear practical and policy implications for multiple stakeholders. For the public and patients, the study underscores the urgency of developing digital health literacy, that is, the ability to critically appraise online health information.³⁴ Users viewing videos should assess uploader credentials and beware of overreaching claims, lack of citations, or content that appeals mainly to emotion. For health care professionals and institutions, a more proactive role in creating authoritative, accessible short-video content that corrects misinformation is necessary. In clinical practice, clinicians should explicitly ask about patients' sources of information and clarify misconceptions as part of the office visit.³⁵ For platforms, our results suggest that greater social responsibility is needed: improving the review of health content, providing readily visible verification badges for certified professionals and institutions, and considering the inclusion of quality assessment in recommendation algorithms, for example, an “information quality score” to balance popularity against scientific rigor and guide the ecosystem toward beneficial outcomes. Regulators might consider issuing guidelines to promote graded management of health care content and effective means for reporting and correcting misinformation.³⁶

This study has several limitations. The design is cross-sectional and captures associations rather than causal effects, and the sample reflects a single time point in a dynamic recommendation ecosystem. Only two Chinese-language platforms were examined, which limits generalizability to other cultures, languages, and platform designs. The search terms and eligibility rules were predetermined, but the way short videos are tagged and the rapid turnover of content leave room for selection bias. In addition, engagement metrics take time to accumulate; we considered video age in our exploratory analyses, but time since upload may still confound comparisons between platforms. We also relied on platform-visible data (e.g. likes, collects, comments, shares, and selected metadata) and did not have access to platform-internal distribution and recommendation data (e.g. impressions, watch time, or recommendation pathways), which would have allowed a more mechanistic understanding of how and why some videos gain high visibility. Future research should consider longitudinal sampling at multiple time points, the use of RA-specific assessment checklists, and, where feasible, statistical adjustment for video age at retrieval in analyses of engagement.
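One simple way to account for video age in engagement analyses is to convert cumulative counts into a per-day rate before comparing videos or platforms. The sketch below is an assumption about how this could be done, not the procedure used in this study; the function name `likes_per_day` and the one-day floor `min_days` are hypothetical.

```python
from datetime import date


def likes_per_day(likes, upload_date, retrieval_date, min_days=1):
    """Return likes divided by video age in days at retrieval.

    min_days (assumed) avoids division by zero for videos retrieved
    on the day they were uploaded.
    """
    if retrieval_date < upload_date:
        raise ValueError("retrieval_date must not precede upload_date")
    age_days = max((retrieval_date - upload_date).days, min_days)
    return likes / age_days
```

A per-day rate is only a crude adjustment, since engagement rarely accumulates linearly over time; regression models that include video age as a covariate would be a natural alternative.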

To our knowledge, this is one of the first studies to assess the overall quality and reliability of RA-related short-video information on two major Chinese-language platforms. A major strength is its real-world relevance: the analysis reflects what the public can readily access and consume during everyday information searches. Our comparison of platforms with different content ecosystems reveals a structural dissociation between reach and trustworthiness, which can inform patient education, clinician engagement, and platform-specific quality governance. It also provides a starting point for later cross-disease and cross-platform comparisons.

In conclusion, RA short-video content shows a clear divide: one platform reaches more people, while the other is generally more reliable. Video quality depends most on who made it and how it is presented, and likes or views are not good indicators of trustworthiness. A practical way forward is to educate viewers, involve more clinicians and skilled science communicators, and build simple reliability signals into how videos are recommended. These steps can help keep information both easy to access and accurate, supporting better learning, decisions, and self-management for patients.

Conclusion

This cross-sectional study shows that the quality of RA videos on short-video platforms is uneven: Bilibili tends to host longer and higher-quality content, whereas TikTok achieves greater reach but not necessarily greater accuracy. Uploader identity is the most important driver of quality: content from medical professionals scores highest, and patient-generated content is the least reliable. Engagement metrics do not track quality, so popularity is a poor indicator of reliability. In practice, platforms should verify medical creators and support citation and labeling features; clinicians should contribute proactively; and viewers should be given clear digital health literacy cues.

Supplemental Material

sj-docx-1-dhj-10.1177_20552076261435374 - Supplemental material for The quality and reliability of short videos about rheumatoid arthritis on Bilibili and TikTok: Cross-sectional study

Supplemental material, sj-docx-1-dhj-10.1177_20552076261435374 for The quality and reliability of short videos about rheumatoid arthritis on Bilibili and TikTok: Cross-sectional study by Junpeng Qiu, Muyuan Hou, Yifei Liufu, Junxing Yang and Fangjun Xiao in DIGITAL HEALTH

Supplemental Material

sj-docx-2-dhj-10.1177_20552076261435374 - Supplemental material for The quality and reliability of short videos about rheumatoid arthritis on Bilibili and TikTok: Cross-sectional study

Supplemental material, sj-docx-2-dhj-10.1177_20552076261435374 for The quality and reliability of short videos about rheumatoid arthritis on Bilibili and TikTok: Cross-sectional study by Junpeng Qiu, Muyuan Hou, Yifei Liufu, Junxing Yang and Fangjun Xiao in DIGITAL HEALTH

Footnotes

Acknowledgements

This work was supported by Shenzhen Hospital (Futian) of Guangzhou University of Chinese Medicine Research Project (No. GZYSY2024018) and Shenzhen Society of Traditional Chinese Medicine Scientific Research Project (No. 2024099F).

ORCID iDs

Junpeng Qiu

Muyuan Hou

Fangjun Xiao

Ethical approval

The data used in this study were sourced from publicly available video content published on platforms such as Bilibili and TikTok. These videos are publicly accessible, and no personal privacy information was involved during the data collection process. All analyzed content was publicly available, and the study did not involve the collection or processing of users’ private information. In accordance with relevant ethical review guidelines, ethical approval for this study was not required.

Author contributions

Conceptualization: Fangjun Xiao. Data curation: Junpeng Qiu, Muyuan Hou. Formal analysis: Junpeng Qiu. Methodology: Junpeng Qiu, Fangjun Xiao. Resources: Junpeng Qiu, Muyuan Hou. Software: Junpeng Qiu, Muyuan Hou, Fangjun Xiao. Validation: Junpeng Qiu, Muyuan Hou. Visualization: Junpeng Qiu, Yifei Liufu, Junxing Yang. Writing – original draft: Junpeng Qiu. Writing – review & editing: Muyuan Hou, Yifei Liufu, Junxing Yang, Fangjun Xiao. All authors contributed to writing and editing the manuscript and approved it.

Funding

The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by Shenzhen Hospital (Futian) of Guangzhou University of Chinese Medicine Research Project (No. GZYSY2024018) and Shenzhen Society of Traditional Chinese Medicine Scientific Research Project (No. 2024099F).

Declaration of conflicting interests

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Generative AI and AI-assisted technologies

The authors declare that no AI tools were used in the development or editing of this manuscript.

Guarantor

Fangjun Xiao.

Supplemental material

Supplemental material for this article is available online.

References

1. Di Matteo, Bathon, Emery. Rheumatoid arthritis. Lancet 2023; 402: 2019–2033.

2. Finckh, Gilbert, Hodkinson, et al. Global epidemiology of rheumatoid arthritis. Nat Rev Rheumatol 2022; 18: 591–602.

3. GBD 2021 Rheumatoid Arthritis Collaborators. Global, regional, and national burden of rheumatoid arthritis, 1990-2020, and projections to 2050: a systematic analysis of the Global Burden of Disease Study 2021. Lancet Rheumatol 2023; 5: e594–e610.

4. Gunduz, Matis, Ozduran, et al. Evaluating the readability, quality, and reliability of online patient education materials on spinal cord stimulation. Turk Neurosurg 2024; 34: 588–599.

5. Ozduran, Hanci, Erkin. Evaluating the readability, quality and reliability of online patient education materials on chronic low back pain. Natl Med J India 2024; 37: 124–130.

6. Comp, Dyer, Gottlieb. Is TikTok the next social media frontier for medicine? AEM Educ Train 2021; 5. Epub ahead of print July 2021. DOI: 10.1002/aet2.10532.

7. Zhou, Zeng, Yuan, et al. Content accuracy and reliability of pulmonary nodule information on social media platforms: a cross-platform study of YouTube, Bilibili, and TikTok. Front Med (Lausanne) 2025; 12: 1613526.

8. Suarez-Lledo, Alvarez-Galvez. Prevalence of health misinformation on social media: systematic review. J Med Internet Res 2021; 23: e17187. Epub ahead of print 20 January 2021.

9. Kirkpatrick, Lawrie. Tiktok as a source of health information and misinformation for young women in the United States: survey study. JMIR Infodemiology 2024; 4: e54663.

10. Cinelli, De Francisci Morales, Galeazzi, et al. The echo chamber effect on social media. Proc Natl Acad Sci U S A 2021; 118: e2023301118.

11. Sun K-X, Cui, et al. Reliability, quality, and science popularization of diabetes knowledge information in China video sharing platform: a cross-sectional study. Digit Health 2025; 11: 20552076251393298.

12. Chen, Liang, et al. Evaluating the quality and reliability of cervical cancer related videos on YouTube, Bilibili, and Tiktok: a cross-sectional analysis. BMC Public Health 2025; 25: 3682.

13. Wang, Jia, et al. Videos on YouTube, Bilibili, TikTok as sources of medical information on Hashimoto’s thyroiditis. Front Public Health 2025; 13: 1611087.

14. Zure, Korkmaz, Menekşeoğlu. Exercises for fibromyalgia syndrome: what YouTube tells us as a source of information for patient and physician education. Clin Rheumatol 2024; 43: 473–480.

15. Özbek İC, Hancı, Özduran, et al. Digital guidance: quality and readability analysis of artificial intelligence-generated spondyloarthropathy texts. Turk J Osteoporos 2025. Epub ahead of print 20 March 2025. DOI: 10.4274/tod.galenos.2024.76743.

16. Xia, Bao, Yao, et al. Evaluation of the quality and reliability of Chinese content about orthognathic surgery on BiliBili and TikTok: a cross-sectional study. Sci Rep 2025; 15: 28967.

17. Kyarunts, Mansukhani, Loukianova, et al. Assessing the quality of publicly available videos on MDMA-assisted psychotherapy for PTSD. Am J Addict 2022; 31: 502–507.

18. Lim MSC, Molenaar, Brennan, et al. Young adults’ use of different social media platforms for health information: insights from web-based conversations. J Med Internet Res 2022; 24: e23656.

19. Kjærulff, Andersen, Kingod, et al. When people with chronic conditions turn to peers on social media to obtain and share information: systematic review of the implications for relationships with health care professionals. J Med Internet Res 2023; 25: e41156.

20. Kaňková, Binder, Matthes. Helpful or harmful? Navigating the impact of social media influencers’ health advice: insights from health expert content creators. BMC Public Health 2024; 24: 3511.

21. Griffiths, Harris, Whitehead, et al. Does TikTok contribute to eating disorders? A comparison of the TikTok algorithms belonging to individuals with eating disorders versus healthy controls. Body Image 2024; 51: 101807.

22. Dimitroyannis, Fenton, Cho, et al. A social media quality review of popular sinusitis videos on TikTok. Otolaryngol Head Neck Surg 2024; 170: 1456–1466.

23. Wang, Yao, Wang, et al. Bilibili, TikTok, and YouTube as sources of information on gastric cancer: assessment and analysis of the content and quality. BMC Public Health 2024; 24: 57.

24. Jenkins, Ilicic, Barklamb, et al. Assessing the credibility and authenticity of social media content for applications in health communication: scoping review. J Med Internet Res 2020; 22: e17296.

25. Özduran. “Bel Ağrısı” ile İlgili Türkçe İnternet Kaynaklı Hasta Eğitim Materyallerinin Okunabilirliklerinin Değerlendirilmesi. DEU Tıp Derg 2022; 36: 135–150.

26. Chou W-YS, Klein WMP. Addressing health-related misinformation on social media. JAMA 2018; 320: 2417–2418.

27. Moorhead, Hazlett, Harrison, et al. A new dimension of health care: systematic review of the uses, benefits, and limitations of social media for health communication. J Med Internet Res 2013; 15: e85.

28. Gao, Liu, Gao. Echo chamber effects on short video platforms. Sci Rep 2023; 13: 6282.

29. Shiferaw, Tilahun, Endehabtu, et al. E-health literacy and associated factors among chronic patients in a low-income country: a cross-sectional survey. BMC Med Inform Decis Mak 2020; 20: 181.

30. Fraenkel, Bathon, England, et al. 2021 American College of Rheumatology guideline for the treatment of rheumatoid arthritis. Arthritis Care Res 2021; 73: 924–939. Epub ahead of print July 2021.

31. Ozduran, Hanci. Evaluating the readability, quality, and reliability of online information on Sjogren’s syndrome. Indian J Rheumatol 2023; 18: 16–25.

32. Hancı, Öner, Özduran, et al. YouTube as a source of information about Percutan Tracheostomy. Gazi Med J 2023; 34. DOI: 10.12996/gmj.2023.77

33. Ozduran, Akkoc, Büyükçoban, et al. Readability, reliability and quality of responses generated by ChatGPT, Gemini, and Perplexity for the most frequently asked questions about pain. Medicine (Baltimore) 2025; 104: e41780.

34. Ban, Kim, Seomun. Digital health literacy: a concept analysis. Digit Health 2024; 10: 20552076241287894.

35. Bautista, Zhang, Gwizdka. Healthcare professionals’ acts of correcting health misinformation on social media. Int J Med Inform 2021; 148: 104375.

36. Tangcharoensathien, Calleja, Nguyen, et al. Framework for managing the COVID-19 infodemic: methods and results of an online, crowdsourced WHO technical consultation. J Med Internet Res 2020; 22: e19659.