Abstract
Objective
To determine the prevalence and types of misinformation on Twitter related to breast cancer prevention and treatment, and to compare the misinformation in English and Malay tweets.
Methods
A total of 6221 tweets related to breast cancer posted between 2018 and 2022 were collected. An oncologist and two pharmacists coded the tweets to differentiate between true information and misinformation, and to analyse the misinformation content. Binary logistic regression was conducted to identify determinants of misinformation.
Results
Of 780 tweets related to breast cancer prevention and treatment, 456 (58.5%) contained misinformation, with significantly more misinformation in Malay than in English tweets (OR = 6.18, 95% CI: 3.45–11.07,
Conclusion
Misinformation on breast cancer prevention and treatment is prevalent on social media, with significantly more misinformation in Malay compared to English tweets. Our results highlighted that patients need to be educated on digital health literacy, with emphasis on utilising reliable sources of information and being cautious of any promotional materials that may contain misleading information. More studies need to be conducted in other languages to address the disparity in misinformation.
Introduction
Over the last few decades, the internet has become an important source of health information for the public. Physicians may be regarded as the most trusted source of information, but many patients turn to the internet instead as it is convenient and readily accessible.1,2 Internet users increasingly use social media such as Twitter, Facebook, and TikTok to seek and share health information. These platforms have gained participation from all social groups including health professionals and organizations that use these platforms to disseminate health-related knowledge. 3 Although social media platforms provide immense opportunities for people to engage with each other in beneficial ways, they also allow misinformation to flourish as studies suggested that false information may spread more easily than true information. 4 The fundamental role of health misinformation on social media has been highlighted by the COVID-19 pandemic when the World Health Organization (WHO) declared an ‘infodemic’ due to the overload of information following the pandemic in 2020. 5
Concerns regarding the impact of online misinformation on public health have spurred a range of research on this topic. A systematic review found that health misinformation was most common on Twitter and that most studies analysed posts in only one language. 3 Online misinformation occurs on multiple health-related topics, from vaccines to communicable and non-communicable diseases. 3 Breast cancer is of particular concern, as it is a growing health problem and is currently the most commonly diagnosed cancer worldwide among females. 6 Previous researchers have studied misinformation about breast cancer on various platforms including Pinterest, Facebook, Twitter, and Weibo.7–9 The proportion of posts containing misinformation ranged from 30 to 51%, covering topics from risk factors to treatment and prevention of breast cancer. Misinformation related to prevention and treatment poses a substantial concern among healthcare providers as it can potentially cause treatment delay and non-adherence.10,11 Online health-information seeking behaviour was associated with non-adherence to endocrine therapy for breast cancer in a prior study, 12 but it was also associated with better adherence in other disease conditions, 13 suggesting that other factors, such as the prevalence of misinformation, may have contributed to the differences in outcome.
Despite the large number of studies on misinformation in social media, many of these focused on only one language, especially English.3,14–17 The problem of health misinformation is not limited to English-language content, and it can be even more challenging in languages other than English. Malaysia is a multiracial country that uses Malay as the national language. Both English and Malay are widely used in the country, but the level of English proficiency varies. For people with limited English proficiency, social media may become a major source of information as various content from fellow native speakers is readily available. Unlike traditional media, social media content is created by individuals who may have limited knowledge of medicine, with no factual verification or accountability, thus presenting a higher risk of misinformation.
Therefore, this study aims to determine the prevalence and types of misinformation on Twitter related to breast cancer prevention and treatment, and compare the differences between the misinformation in English and Malay tweets. Several different terminologies exist when discussing the spread of false information on the internet. Misinformation is regarded as an inadvertent spread of false information, while disinformation involves knowingly sharing or creating false information to cause harm. 18 Following previous works, we use the term ‘misinformation’ as an umbrella term to describe health-related false information, giving the benefit of the doubt to the users involved, as the intent is not always clear. 10 To our knowledge, this is the first study investigating misinformation on social media in Malay language regarding breast cancer prevention and treatment. This study is important in order to understand the content of misinformation and identify the determinants that can predict misinformation.
Methods
Study design and data collection
This quali-quantitative study examined data from Twitter to compare the extent of misinformation on breast cancer treatment and prevention between English and Malay tweets and explore their characteristics. Permission to use Twitter public Application Programming Interface (API) endpoints was obtained prior to data collection, 19 and a web scraper scripted in Python was used to collect related tweets that are publicly available. The following information was collected: the date of post, text content, URL to the original tweet, and URL to external sites. Ethical approval to conduct the study was obtained from the Research Ethics Committee, Universiti Kebangsaan Malaysia (JEP-2022-240). The need for informed consent was waived by the ethics committee because this study used only publicly available data published voluntarily by Twitter users. The Consolidated Criteria for Reporting Qualitative Research (COREQ) checklist was used as a guide for reporting the qualitative aspect of the study (Appendix 1). 20
We included tweets containing information regarding prevention and treatment for breast cancer that were posted between January 2018 and March 2022 in Malay or English. We excluded tweets in which more than 50% of the words were in languages other than English or Malay, and those that were inaccessible due to broken links. We aimed for at least 3000 tweets for each language, 9 using the keywords ‘breast cancer’ for English tweets and ‘kanser payudara’ or ‘kanser payu dara’ for Malay tweets. A preliminary search showed considerably more English than Malay tweets, with over 30,000 English tweets in one month. Therefore, all Malay tweets from the data collection period were included for screening, and tweets in English were randomly selected from different periods based on computer-generated random sequences using Python. Tweets from October of each year were purposely collected to explore the characteristics of tweets shared during Breast Cancer Awareness Month (BCAM).
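The random selection of English tweets can be illustrated with a minimal sketch. The source states only that computer-generated random sequences were produced in Python; the function name `sample_tweets`, the seed, the pool size, and the sample size below are assumptions for illustration and not the authors' script.

```python
import random


def sample_tweets(tweet_ids, k, seed=2022):
    """Draw k tweet IDs without replacement, reproducibly.

    The seed value is an arbitrary assumption; fixing it simply makes
    the random sequence repeatable.
    """
    rng = random.Random(seed)
    ids = list(tweet_ids)
    return rng.sample(ids, min(k, len(ids)))


# Hypothetical pool standing in for one month of English tweets.
pool = [f"en-{i}" for i in range(30000)]
selected = sample_tweets(pool, 100)
print(len(selected), "tweets selected for screening")
```

Sampling without replacement ensures that no tweet is selected twice within a period, and a fixed seed allows the same selection to be regenerated later.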
Categorisation and content analysis
The tweets were first screened to remove duplicates and identify tweets not written in English or Malay. Duplicates were removed by identifying duplicate tweet URLs using the Excel ‘Remove Duplicates’ function. The remaining tweets were then assessed for medical relevance, and those with broken links were removed. Tweets that contained medical-related information were assessed to identify content related to the treatment and prevention of breast cancer. These categorisations were conducted by a single researcher (IY).
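The study removed duplicates in Excel; an equivalent step in Python could look like the sketch below, which keeps the first occurrence of each tweet URL. The record layout (a dictionary with a `url` key) and the example URLs are hypothetical.

```python
def remove_duplicate_urls(tweets):
    """Keep the first tweet seen for each URL and drop later repeats."""
    seen = set()
    unique = []
    for tweet in tweets:
        if tweet["url"] not in seen:
            seen.add(tweet["url"])
            unique.append(tweet)
    return unique


# Made-up records for illustration only.
collected = [
    {"url": "https://twitter.com/user_a/status/1", "text": "first"},
    {"url": "https://twitter.com/user_b/status/2", "text": "second"},
    {"url": "https://twitter.com/user_a/status/1", "text": "first"},
]
deduplicated = remove_duplicate_urls(collected)
print(len(deduplicated))  # 2
```

Matching on the URL rather than the text means that identical wording posted as separate tweets is retained, which is consistent with the repeated claims from a single account described in the Results.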
Tweets on prevention and treatment of breast cancer were further evaluated and categorised based on accuracy (true information/misinformation). Two female coders, a pharmacist (IY) and an oncologist (NFAM), independently evaluated the tweets on prevention and treatment for the accuracy of the information, types, and content of misinformation. Any discrepancies were resolved by a third person (a female pharmacy lecturer, NMS). For tweets containing links to external sources (e.g. webpages, articles, images, videos, podcasts and social media posts on other platforms), the accuracy of the information in these sources was also verified. Information accuracy was judged based on experience and clinical knowledge whenever possible, or cross-referenced with reliable sources including the clinical practice guidelines on breast cancer and peer-reviewed journal articles. The inter-rater reliability was calculated using Cohen's Kappa, which showed a good agreement level at 0.874 and 0.887 for English and Malay tweets, respectively.
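Cohen's Kappa compares the observed agreement between two coders, $p_o$, with the agreement expected by chance, $p_e$, as $\kappa = (p_o - p_e)/(1 - p_e)$. The sketch below computes it for two lists of labels. The labels are invented toy data, not the study's ratings, and the study's own values were presumably obtained from its statistical software.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per label.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Toy labels: the coders disagree on the first of ten tweets.
coder_a = ["mis", "mis", "true", "true", "mis", "true", "mis", "true", "mis", "true"]
coder_b = ["true", "mis", "true", "true", "mis", "true", "mis", "true", "mis", "true"]
kappa = cohens_kappa(coder_a, coder_b)
print(round(kappa, 3))  # 0.8
```

In this toy case the observed agreement is 0.9 and the chance agreement is 0.5, giving a kappa of 0.8.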
Tweets were also categorised based on the source of information, the presence of attempts to sell services or products, and the timing: whether they were posted during BCAM in October, and whether they were posted before or after the onset of the COVID-19 pandemic (tweets before March 2020 were considered pre-pandemic). Using content analysis, tweets that contained misinformation were further categorised based on the type and content of misinformation. Categorisation was done based on previous breast cancer misinformation studies,7,8 and adapted to suit our data. Table 1 summarises the definition and examples of tweets from each content category.
Tweet categorisation: definition and examples.
Note: Tweets in Malay were translated to English. Some tweet examples in the manuscript were rephrased to protect the identity of users and hide the name of products promoted. Spelling corrections were made for better clarity in tweets with short forms and spelling errors.
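The timing categories described above can be expressed as a short sketch. The cut-off is assumed here to be 1 March 2020, since the text says only that tweets before March 2020 were considered pre-pandemic; the function and key names are hypothetical.

```python
from datetime import date

# Assumption: the pre-pandemic period ends on the last day of February 2020.
PANDEMIC_START = date(2020, 3, 1)


def timing_flags(posted):
    """Return the two timing categories for a tweet's posting date."""
    return {
        "pre_pandemic": posted < PANDEMIC_START,
        "bcam": posted.month == 10,  # Breast Cancer Awareness Month
    }


print(timing_flags(date(2019, 10, 5)))
print(timing_flags(date(2021, 6, 12)))
```

The two flags are independent, so a tweet from October 2019 is both pre-pandemic and a BCAM tweet.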
Statistical analysis
Data were categorised in Microsoft Excel 2019 and quantitatively analysed using IBM SPSS Statistics version 26 (IBM Corporation, Armonk, NY). Continuous data were presented with means and standard deviations, and categorical data with frequencies and percentages. Pearson Chi-square test was used to identify the association between misinformation and tweet language, the timing of posts, seller-associated tweets, and the source of information. Binary logistic regression was used to estimate the odds ratios and 95% confidence intervals of these variables.
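For a single binary predictor, the Pearson Chi-square statistic and the unadjusted odds ratio can be computed directly from a 2×2 table, as in the sketch below. This is not the SPSS analysis used in the study: the counts are hypothetical, the confidence interval uses the Wald (log odds ratio) approximation, and the adjusted odds ratios from a multivariable logistic regression would generally differ from these unadjusted values.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald 95% confidence interval.

    a, b: group 1 tweets with and without misinformation
    c, d: group 2 tweets with and without misinformation
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


def pearson_chi_square(a, b, c, d):
    """Pearson Chi-square statistic for a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical counts, not the study's data.
odds_ratio, lower, upper = odds_ratio_ci(60, 40, 30, 70)
chi_square = pearson_chi_square(60, 40, 30, 70)
print(f"OR = {odds_ratio:.2f} (95% CI: {lower:.2f} to {upper:.2f})")
print(f"Chi-square = {chi_square:.2f}")
```

With these made-up counts the odds ratio is 3.5, meaning the odds of misinformation in the first group are 3.5 times those in the second group.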
Results
Identification and screening of tweets
A total of 3167 Malay tweets were collected from the keywords ‘kanser payudara’ (

Flow diagram summarising identification and screening of tweets.
Categorisation and sources of misinformation
A summary of tweet categories is shown in Table 2. Of the 780 tweets analysed for content accuracy, 456 (58.5%) contained misinformation. More than half of the tweets contained links to external sources of information (
Summary of information sources used in tweets.
Approximately one-quarter of the tweets (
Categories of tweets and presence of misinformation.
Note: Percentages were calculated based on the content of misinformation for each subcategory.
A binary logistic regression was performed to ascertain the effects of language, product seller, information source, and timing of tweet in relation to COVID-19 and BCAM on the likelihood that the tweet contains misinformation. The logistic regression model was statistically significant, χ2(6) = 643.4,
Binary logistic regression model for the determinants of misinformation.
SE: standard error; CI: confidence interval.
Statistically significant.
Content of misinformation
More than half of the tweets on breast cancer prevention and treatment contained misinformation. Among these, 233 (51.1%) contained fabricated information, and 223 (48.9%) were considered inaccurate. Of the 98 English tweets with misinformation, 68 (69.4%) were categorised as fabricated and 30 (30.6%) as inaccurate. Of the 358 Malay tweets with misinformation, 165 (46.1%) were fabricated and 193 (53.9%) were inaccurate. The content of misinformation was further categorised into themes. The three most common themes for misinformation were food & lifestyle (i.e. tweets that advocate specific food or lifestyle for breast cancer), alternative medicine (i.e. tweets that advocate the use of herbs or health products as a treatment or cure for breast cancer), and supplements (i.e. tweets that advocate the use of herbs or health products to improve breast cancer symptoms/outcome). Most of the misinformation related to supplements and alternative medicine was made by product sellers (

Summary of misinformation content by language.
The majority of the misinformation regarding food and lifestyle was categorised as inaccurate (
There were also exaggerated claims on the benefits of yoga, meditation, and veganism for breast cancer prevention or treatment.
We observed many repetitive claims that hard-boiled eggs could cut the risk of getting breast cancer tweeted to various users by the same account between 2020 and 2021:
In contrast, misinformation on alternative therapy was mainly fabricated information (
Almost half of the information on alternative therapy was found in promotions for amygdalin or laetrile, also commonly described as ‘Vitamin B17’, which was claimed to be an effective alternative to chemotherapy:
For supplements, the misinformation content was evenly categorised as fabricated (
Misinformation under the ‘Others’ category includes misinformation about hormone therapy, faith, surgery, gene testing, topical application, cupping, and conspiracy theory:
Discussion
Our study evaluated the distribution and content of misinformation on breast cancer prevention and treatment in English and Malay tweets. To our knowledge, this is the first study that compared misinformation regarding breast cancer in two languages, and we found substantially more misinformation in Malay tweets than in English tweets. This could be attributed to the sources used for the tweets, as Malay tweets tend to use information from less reliable sources such as blogs, non-peer-reviewed websites, and other social media sites such as Facebook and Instagram. Misinformation was often created by individuals with no official institutional affiliation; therefore, it is important to check the credibility of the information source.3,10 Another important aspect to consider is that research on algorithms to detect misinformation is more extensive for English than for the Malay language.14–17 Social media sites such as Twitter have attempted to combat misinformation with specific features such as labelling content that may contain problematic information and giving prompts when users engage with a potentially misleading post.21,22 However, these features will be ineffective if the algorithm to detect misinformation in the Malay language is deficient.
The most common types of misinformation were on food, lifestyle approaches, and supplements that can purportedly prevent breast cancer or improve its outcome. These types of misinformation were also found in previous studies on breast cancer misinformation.7–9 Many patients tend to look beyond conventional treatment for cancer prevention and treatment, with a strong preference for natural-based and non-pharmacological options. 23 It was alarming to note that approximately one-third of the misinformation found in the present study advocated the use of these unproven options as alternatives to evidence-based conventional treatment, and the majority of these were present in the Malay tweets. In past studies, the use of complementary treatment was associated with refusal of conventional treatment, and breast cancer patients who opted for alternative treatment were found to have a five-fold increase in the risk of death.24,25 Delay in seeking health treatment and diagnosis is a major concern in Malaysia, where the prevalence of delay was higher than in other developed and developing countries. 26 This problem was associated with several factors, including the use of alternative therapy and a negative attitude towards the treatment. 26 Within the Malaysian context, belief in traditional healers is highest among the Malay ethnic group, which contributes to the delay in getting treatment, leading to poor outcomes and low survival. 27 The widespread misinformation on breast cancer treatment alternatives on social media may be a contributing factor that requires intervention to tackle this issue. The concurrent use of complementary therapy with conventional cancer treatment could potentially jeopardise the effectiveness of the treatment.28–30 It could also affect kidney and liver functions, thus further complicating the course of treatment. 31
Users trying to promote their products or services spread much of the misinformation related to using supplements and health products. Our results showed that product sellers were four times more likely to spread misinformation compared to general users. Similar findings were also noted in past studies, where misinformation was disseminated by private companies promoting their products.3,32–34 There were instances where products registered as supplements were promoted as a prevention or alternative ‘cure’ for cancer. This is especially common in the Malay tweets, despite it being illegal in Malaysia to advertise any product for cancer treatment to the general public. 35 Promotions on the Internet may have contributed to the increased popularity of these products among cancer patients. The use of complementary and alternative medicine (CAM) among cancer patients has increased from 25% in the 1970s to over 49% after 2000, according to data collected in Europe, Australia and North America. 23 The prevalence of CAM use among Malaysian cancer patients was reportedly higher at 61.2%. 36 A recent meta-analysis found that the prevalence of herbal medicine usage in cancer patients was higher among African and Asian countries, and prevalence was also higher in patients from low- and middle-income countries. 37 A local study found that many Malaysian women perceived herbal medicines as being safer and more effective than modern medicines. 38 This is often how health products were promoted on social media, and they are often accompanied by testimonials claimed to be from other patients facing the same health condition. The effectiveness of these online promotions may be attributed to the preferences of certain groups of Asian patients to obtain information through person-to-person communication, as opposed to Caucasian patients preferring objective, scientific information obtained from research institutions. 39 Healthcare providers in a country with a multicultural population need to be aware of the diversity in patient beliefs and preferences, so that healthcare education can be tailored accordingly.
The WHO declaration on the ‘infodemic’ following COVID-19 has raised awareness and concerns on the effect of misinformation on public health, spurring a range of research on this topic. However, our results show that misinformation on breast cancer prevention and treatment was more common before the COVID-19 pandemic. Similarly, a study on COVID-19 misinformation found that posts about COVID-19 included less misinformation than other health-related posts prior to the pandemic. 40 Although this contradicted our initial assumption that COVID-19 catalysed the propagation of misinformation, the declaration by WHO on the dangers of the ‘infodemic’ may have created awareness that resulted in better detection and prevention of online misinformation. A previous study on breast cancer misinformation found an increase in news stories classified as ‘verified’ during BCAM in October compared to other months. 7 In our data, chi-square analysis revealed a significant association between posting during BCAM and misinformation, but the association was not significant in the logistic regression analysis.
Research and practice implication
Our study showed that misinformation on breast cancer prevention and treatment was common on social media, and this problem is especially prominent in Malay tweets. It can potentially distort patients’ beliefs about their treatment, thus worsening the problem of poor treatment adherence and poor outcomes. Most of the misinformation observed may have stemmed from a lack of knowledge in evidence-based medicine and unfamiliarity with best practices when sharing online information. Individuals with low digital health literacy may be more prone to judge information credibility based on superficial qualities such as image quality, position in search engine results and celebrity endorsements.2,41 The importance of using official and peer-reviewed sources for health-related information needs to be advocated. Health literacy and digital literacy education should be improved, especially for older patients who may be more susceptible to misinformation.42,43 Furthermore, there is a need to increase the accessibility of reliable information sources in Malay, as our observation revealed that there are fewer sources available in Malay than in English. There is also a need for stricter monitoring of online advertisements for healthcare products. Social media is a popular medium for product promotion, and the volume of content may make manual monitoring difficult. This provides opportunities for future research, particularly in the application of machine learning to detect patterns and identify posts that may contain unethical product promotion and flag them for review by human moderators.
Study limitations
Data for Malay tweets were relatively scarce compared to English, as Twitter is not the most commonly used social media platform among Malaysians. 44 However, Twitter was chosen because data from certain other social media platforms are more difficult to access, owing to data privacy concerns that led governments to tighten regulation on data access and storage, and to data breaches such as the Cambridge Analytica scandal.3,45 Despite this, we obtained rich data from the Malay tweets, which also contained links to more commonly used social media sites such as Facebook, Instagram, and YouTube. Twitter is mainly used by the younger generation, and may have limited representation of online information shared by the older population. As breast cancer is more prevalent in the older population, misinformation circulated in this subgroup may have been missed. Despite these limitations, we believe that this study contributes meaningful findings as it highlights the prevalence of misinformation in a relatively less-studied language. In addition, data were collected over several years, providing opportunities to analyse misinformation on continuously evolving topics of discussion.
Conclusion
Misinformation on breast cancer prevention and treatment is prevalent on social media. This study showed that there was twice as much misinformation in Malay compared to English tweets, thus highlighting the need to address misinformation in other languages. Malay tweets utilised less information from official and peer-reviewed sources compared to English tweets, which may have contributed to the extensive degree of misinformation. The misinformation predominantly concerned supplements, food and lifestyle, and alternative therapies. Being posted by a product seller was found to be a significant determinant of misinformation. These findings suggest that patients need to be educated on digital health literacy, with emphasis on utilising the right information sources, critically appraising information and being wary of any promotional materials that may contain misleading information.
Supplemental Material
Supplemental material, sj-docx-1-dhj-10.1177_20552076231205742 for Breast cancer prevention and treatment misinformation on Twitter: An analysis of two languages by Izzati Yussof, Nur Fa’izah Ab Muin, Masnizah Mohd, Ernieda Hatah, Nor Asyikin Mohd Tahir and Noraida Mohamed Shah in DIGITAL HEALTH
Footnotes
Acknowledgements
We would like to thank Mr Nishakaran Pushpa Rajah for his assistance in the data collection process.
Contributorship
IY, NMS, EH and NAMT researched the literature and conceived the study, and were involved in gaining ethics approval. IY, NMS and MM were involved in protocol development and data collection. IY, NMS and NFAM were involved in data analysis. IY wrote the first draft of the manuscript. All authors reviewed and edited the manuscript and approved the final version of the manuscript.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical approval
Permission to conduct this study was obtained from the Universiti Kebangsaan Malaysia (UKM) Research Ethics Committee (REC number: JEP-2022-240).
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Universiti Kebangsaan Malaysia Research Grant (grant number GUP-2020-004).
Guarantor
NMS
Supplemental material
Supplemental material for this article is available online.
References
