PMScanR: An R Package for the Large-Scale Identification, Analysis, and Visualization of Protein Motifs

Abstract

Proteins play a crucial role in biological processes, and their functions are closely related to their structure. Protein functions are often associated with specific motifs, which are short amino acid sequences that follow particular patterns. Most bioinformatics tools focus on identifying known motifs and cannot analyze the impact of single amino acid substitutions on entire domains or motifs. Existing tools also do not support comparative analysis of multiple motifs across multiple sequences. To address these limitations, we developed PMScanR (Protein Motif Scanner in R), an R package that automates the prediction and evaluation of the impact of single amino acid substitutions on protein motif occurrence in large datasets and supports comparative analysis of multiple motifs across multiple sequences. The package integrates several methods for motif identification, characterization, and visualization. It includes functions for running PS-Scan, the PROSITE tool for scanning sequences against PROSITE motifs. PMScanR also converts results to GFF (General Feature Format), which facilitates downstream analyses such as graphical representation and database integration. The package offers multiple visualization tools, including occurrence plots, sequence logos, and pie charts, that help users understand motif distribution and conservation. Through its integration with PROSITE, PMScanR provides access to up-to-date motif data, making it a valuable tool for biological and biomedical research, particularly for protein function annotation and therapeutic target identification. PMScanR is freely available under the GPL license and is distributed through Bioconductor (https://bioconductor.org/packages/PMScanR) and GitHub (https://github.com/prodakt/PMScanR).
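The central idea above, checking how every single amino acid substitution changes where a PROSITE motif occurs, can be illustrated with a short sketch. It is written in Python rather than R and does not use PMScanR or PS-Scan; the functions `prosite_to_regex`, `motif_hits`, `substitution_effects`, and `hits_to_gff3` are hypothetical names introduced only for this illustration. It assumes a simplified subset of PROSITE pattern syntax (single residues, `x`, `[...]`, `{...}`, `(n)` or `(n,m)` repeats, and `<`/`>` terminal anchors), reports overlapping hits with 1-based coordinates, and writes GFF3's nine tab-separated columns using an illustrative source and feature type. The example pattern is the N-glycosylation site N-{P}-[ST]-{P} (PROSITE PS00001), and the protein is a toy sequence, not real data.

```python
import re

# Illustrative sketch only: not the PMScanR API and not PS-Scan.
# Assumption: only a simplified subset of PROSITE pattern syntax is supported.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# One PROSITE element: a residue, x, [allowed] or {forbidden}, with optional (n) or (n,m).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern (e.g. 'N-{P}-[ST]-{P}.') into a Python regex."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # C-terminal anchor
            suffix, element = "$", element[:-1]
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            core = "."  # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"  # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + suffix)
    return "".join(parts)


def motif_hits(sequence, regex):
    """Return 1-based (start, end) spans, allowing overlaps (at most one hit per start)."""
    hits = []
    for m in re.finditer(f"(?=({regex}))", sequence):  # lookahead finds overlapping hits
        start = m.start(1)
        hits.append((start + 1, start + len(m.group(1))))
    return hits


def substitution_effects(sequence, regex):
    """List every single-residue substitution that removes or creates a motif hit."""
    wild_type = set(motif_hits(sequence, regex))
    effects = []
    for i, original in enumerate(sequence):
        for aa in AMINO_ACIDS:
            if aa == original:
                continue
            mutant = sequence[:i] + aa + sequence[i + 1:]
            mutant_hits = set(motif_hits(mutant, regex))
            if mutant_hits != wild_type:
                effects.append({
                    "substitution": f"{original}{i + 1}{aa}",  # e.g. N3A
                    "lost": sorted(wild_type - mutant_hits),
                    "gained": sorted(mutant_hits - wild_type),
                })
    return effects


def hits_to_gff3(seq_id, motif_id, hits):
    """Format hits as GFF3 lines; source and feature type strings are illustrative."""
    return [
        "\t".join([seq_id, "sketch", "polypeptide_motif", str(start), str(end),
                   ".", ".", ".", f"Name={motif_id}"])
        for start, end in hits
    ]


if __name__ == "__main__":
    regex = prosite_to_regex("N-{P}-[ST]-{P}.")  # N-glycosylation site (PS00001)
    protein = "MKNVSAGLNPTQ"  # toy sequence
    print(regex)  # N[^P][ST][^P]
    print(motif_hits(protein, regex))  # [(3, 6)]
    for line in hits_to_gff3("toy_protein", "PS00001", motif_hits(protein, regex)):
        print(line)
    for effect in substitution_effects(protein, regex)[:5]:
        print(effect)
```

In this toy sequence, any substitution at N3 removes the hit at positions 3 to 6, while replacing P10 with any other residue creates a new hit at positions 9 to 12. The brute-force design rescans the whole sequence for each of the 19 alternative residues at every position, which is simple but slow for long proteins; a production tool would plausibly rescan only the windows around each substituted position.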

Keywords

PROSITE; protein motifs; R library; visualization
