Vision Transformer for Active Compound Function Classification Based on 2D Molecular Structures

Authors

  • Dian Eka Ratnawati, Universitas Brawijaya
  • Diva Kurnianingtyas, Universitas Brawijaya
  • Agus Wahyu Widodo, Universitas Brawijaya
  • Rekyan Regasari Mardi Putri, Universitas Brawijaya

DOI:

https://doi.org/10.37385/jaets.v7i2.9418

Keywords:

artificial intelligence, vision transformer, molecular structure classification, drug discovery, hyperparameter tuning

Abstract

Accurate classification of active compounds based on molecular structure is crucial for accelerating drug discovery while reducing laboratory costs and time. However, existing structure-based classification methods, particularly convolutional neural networks and graph-based models, often struggle to capture long-range dependencies or require large-scale datasets and extensive feature engineering. This study investigates the use of the Vision Transformer (ViT) model to classify 2D molecular structure images of compounds into cancer and cardiovascular therapy categories. A balanced dataset of 500 images (250 per class) was obtained from the PubChem database, processed for consistency, and divided into 72% training, 20% testing, and 8% validation. To reduce the risk of overfitting on this limited dataset, careful preprocessing, regularization through weight decay, and systematic hyperparameter tuning were applied. The ViT model was trained with the Adam optimizer and a linear learning rate scheduler. Results show that the best configuration, with batch size 60, weight decay 0.1, learning rate 3.0×10⁻⁶, and 15 epochs, achieves an accuracy, F1 score, and loss of 80.0%, 79.9%, and 0.597, respectively. These findings highlight the potential of ViT for small-scale cheminformatics tasks, offering an alternative to conventional methods while maintaining competitive performance.
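
The split proportions and training schedule reported in the abstract can be made concrete with a short sketch. The Python below is an illustration only and is not the authors' code: the helper names `stratified_split` and `linear_lr` are hypothetical, the per-class (stratified) split is an assumption, and the schedule is assumed to decay linearly to zero without warmup. Only the 72/20/8 proportions, the 250 images per class, the batch size of 60, the learning rate of 3.0×10⁻⁶, and the 15 epochs come from the abstract.

```python
import random


def stratified_split(labels, fractions=(0.72, 0.20, 0.08), seed=0):
    """Split sample indices into train/test/validation parts, class by class.

    Splitting per class is an assumption made here so that each part
    stays balanced; the abstract only reports the overall proportions.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, test, val = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_test = round(fractions[1] * len(indices))
        train += indices[:n_train]
        test += indices[n_train:n_train + n_test]
        val += indices[n_train + n_test:]  # remainder goes to validation
    return train, test, val


def linear_lr(step, total_steps, base_lr=3.0e-6):
    """Linear decay from base_lr to zero (no warmup is assumed)."""
    return base_lr * max(0.0, 1.0 - step / total_steps)


# 250 images per class, as stated in the abstract.
labels = ["cancer"] * 250 + ["cardiovascular"] * 250
train, test, val = stratified_split(labels)
print(len(train), len(test), len(val))  # 360 100 40

# Batch size 60 and 15 epochs, as stated in the abstract.
steps_per_epoch = -(-len(train) // 60)  # ceiling division
total_steps = steps_per_epoch * 15
print(steps_per_epoch, total_steps)  # 6 90
```

Under these assumptions the model would see 360 training images per epoch in 6 mini-batches, for 90 optimizer steps in total, with the learning rate falling from 3.0×10⁻⁶ at the first step toward zero at the last.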

Published

2026-06-15

How to Cite

Ratnawati, D. E., Kurnianingtyas, D., Widodo, A. W., & Putri, R. R. M. (2026). Vision Transformer for Active Compound Function Classification Based on 2D Molecular Structures. Journal of Applied Engineering and Technological Science (JAETS), 7(2), 1216-1229. https://doi.org/10.37385/jaets.v7i2.9418