Vision Transformer for Active Compound Function Classification Based on 2D Molecular Structures
DOI: https://doi.org/10.37385/jaets.v7i2.9418
Keywords: artificial intelligence, vision transformer, molecular structure classification, drug discovery, hyperparameter tuning
Abstract
Accurate classification of active compounds based on molecular structure is crucial for accelerating drug discovery while reducing laboratory costs and time. However, existing structure-based classification methods, particularly convolutional neural networks and graph-based models, often struggle to capture long-range dependencies or require large-scale datasets and extensive feature engineering. This study investigates the use of the Vision Transformer (ViT) model to classify 2D molecular structure images of compounds into cancer and cardiovascular therapy categories. A balanced dataset of 500 images (250 per class) was obtained from the PubChem database, preprocessed for consistency, and split into training (72%), test (20%), and validation (8%) sets. To mitigate overfitting on this small dataset, careful preprocessing, weight-decay regularization, and systematic hyperparameter tuning were applied. The ViT model was trained with the Adam optimizer and a linear learning-rate scheduler, and its hyperparameters were tuned to identify the best configuration. The best configuration (batch size 60, weight decay 0.1, learning rate 3.0×10⁻⁶, 15 epochs) achieved an accuracy of 80.0%, an F1 score of 79.9%, and a loss of 0.597. These findings highlight the potential of ViT for small-scale cheminformatics tasks, offering an alternative to conventional methods while maintaining competitive performance.
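The data split and training schedule described in the abstract can be checked with a little arithmetic. The sketch below is an illustration under stated assumptions, not the authors' code. It assumes the split was stratified by class, and it assumes the "linear learning rate scheduler" decays linearly from the peak rate to zero with no warmup (the shape of, for example, Hugging Face's `get_linear_schedule_with_warmup` with `num_warmup_steps=0`). The helpers `split_sizes` and `linear_lr` are hypothetical names introduced here. Note that the 72/20/8 proportions are also what an 80/20 train/test split followed by a 90/10 train/validation split would produce (0.8 × 0.9 = 0.72, 0.8 × 0.1 = 0.08), which is one plausible reading of how they were obtained.

```python
import math

# Values stated in the abstract.
IMAGES_PER_CLASS = 250          # two classes: cancer, cardiovascular
FRACTIONS = (0.72, 0.20, 0.08)  # train, test, validation
BATCH_SIZE = 60
EPOCHS = 15
PEAK_LR = 3.0e-6


def split_sizes(n, fractions):
    """Integer subset sizes for fractions that sum to 1.

    Any rounding leftover is added to the first (training) subset.
    """
    sizes = [round(n * f) for f in fractions]
    sizes[0] += n - sum(sizes)
    return sizes


def linear_lr(step, total_steps, peak_lr, warmup_steps=0):
    """Linear warmup to peak_lr, then linear decay to zero at total_steps.

    Assumption: no warmup is used (warmup_steps=0); the abstract only
    says the scheduler is linear.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Stratified split (assumption): apply the fractions within each class so
# every subset keeps the 50/50 class balance.
per_class = split_sizes(IMAGES_PER_CLASS, FRACTIONS)
train_n, test_n, val_n = (2 * k for k in per_class)

# One optimizer step per mini-batch of training images.
steps_per_epoch = math.ceil(train_n / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

if __name__ == "__main__":
    print("per class (train, test, val):", per_class)
    print("total (train, test, val):", (train_n, test_n, val_n))
    print("optimizer steps:", steps_per_epoch, "per epoch,", total_steps, "total")
    for step in (0, total_steps // 2, total_steps):
        print(f"step {step:3d}: lr = {linear_lr(step, total_steps, PEAK_LR):.2e}")
```

With the stated values this gives 180/50/20 images per class (360/100/40 overall) and, at batch size 60, 6 optimizer steps per epoch, or 90 steps over 15 epochs; under the no-warmup assumption the learning rate would fall from 3.0×10⁻⁶ to zero over those 90 updates. The weight decay of 0.1 is not modelled here, since the abstract does not say whether it was applied as an L2 penalty inside Adam or in decoupled (AdamW-style) form.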