Abstract
Peptide-based therapeutics are increasingly moving to the forefront of biomedicine with their promise of high specificity and low toxicity. Although noncanonical residues can always be used, employing only the 20 natural residues restricts the chemical space to a finite size, which allows for comprehensive in silico screening. Toward this goal, the dataset comprising all possible di-, tri-, and tetrapeptide combinations of the canonical residues has been reported previously. However, with increasing computational power, the comprehensive set of pentapeptides is now also feasible for screening, as is the comprehensive set of cyclic peptides comprising four or five residues. Here, we provide both the complete and prefiltered libraries of all di-, tri-, tetra-, and pentapeptide sequences from the 20 canonical amino acids and their homodetic (N-to-C-terminal) cyclic homologues. The FASTA, simplified molecular-input line-entry system (SMILES), and three-dimensional (3D) structure-data file (SDF) libraries can be readily used for screening against protein targets. We also provide a simple method and tool for conducting identity-based filtering. Access to this dataset will accelerate small-peptide screening workflows and encourage their use in drug discovery campaigns. As a case study, the developed library was screened against the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) main protease to identify potential small-peptide inhibitors.
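To give a sense of the size of this finite chemical space, the following minimal Python sketch enumerates linear sequences over the 20 canonical one-letter codes and collapses head-to-tail cyclic sequences that differ only by rotation. It is an illustration under stated assumptions, not the tool provided with the dataset: the function names are hypothetical, and treating rotations of a homodetic cyclic sequence as the same molecule is an assumption about how such a library might be deduplicated.

```python
from itertools import product

# One-letter codes of the 20 canonical amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def linear_peptides(length):
    """Yield every linear sequence of the given length (20**length in total)."""
    for combo in product(AMINO_ACIDS, repeat=length):
        yield "".join(combo)


def canonical_rotation(seq):
    """Return the lexicographically smallest rotation of seq.

    Assumption: in a head-to-tail cyclic peptide, rotations such as
    'AGF', 'GFA', and 'FAG' describe the same ring. Reversed sequences are
    NOT merged, because the direction of the backbone differs.
    """
    return min(seq[i:] + seq[:i] for i in range(len(seq)))


def cyclic_peptides(length):
    """Yield one representative sequence per rotation class."""
    for seq in linear_peptides(length):
        if seq == canonical_rotation(seq):
            yield seq


def count(iterable):
    """Count items without holding them all in memory."""
    return sum(1 for _ in iterable)


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, count(linear_peptides(n)), count(cyclic_peptides(n)))
```

Under these assumptions there are 400, 8,000, 160,000, and 3,200,000 linear di-, tri-, tetra-, and pentapeptide sequences, and 40,110 rotation-unique cyclic sequences of four residues. The generators avoid materializing the full pentapeptide set in memory, which matters once each sequence is expanded into SMILES or 3D structures.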