Conference Proceedings
The International Conference on Emerging Technologies in Electronics, Computing and Communication 2022
(ICETECC'22)
Vision Transformer-based Approach for Classification of Buddhist and Non-Buddhist Heritage Sites in Taxila
Muhammad Sohail Abbas1*; Sonain Jamil2; Asif Muhammad3
1Department of Data Science and Artificial Intelligence, National University of Computer and Emerging Sciences, Islamabad, Pakistan
2Department of Computer Science, Norwegian University of Science and Technology (NTNU), Gjøvik 2815, Norway
3Department of Software Engineering, National University of Computer and Emerging Sciences, Islamabad, Pakistan
ABSTRACT
Preserving cultural heritage requires accurate classification of diverse artefacts to enable efficient analysis. In this paper, we present a unique dataset curated to classify cultural heritage sites in Taxila city, Pakistan, which was the capital of the ancient Gandhara civilization. The goal is to distinguish Buddhist heritage sites from non-Buddhist ones. The dataset covers several categories, including images of Takshasila University, Sirkap, Giri Fort, the Taxila Ruins, the Jaulian Buddhist Monastery, the Dharmarajika Stupa and Monastery, and non-Buddhist heritage sites. Images were captured under varying illumination and viewing angles to ensure that models trained on the dataset generalize well. For the non-Buddhist class, we included images of old buildings and a few heritage sites that do not belong to the Buddhist civilization. After collecting the dataset, we used a vision transformer (ViT) to classify the images and compared its performance with conventional convolutional neural networks (CNNs) such as ResNet and EfficientNet. The experimental results show that ViT outperforms the CNNs, achieving 100% accuracy on the test set.
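The ViT pipeline the abstract describes can be illustrated with a toy sketch. Note this is not the authors' model: the dimensions, single attention block, and random weights below are illustrative assumptions only, showing how an image is split into patches, linearly embedded, prepended with a [CLS] token, passed through self-attention, and classified from the [CLS] representation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class TinyViT:
    """Toy single-block Vision Transformer classifier (random weights,
    illustrative only -- not the model used in the paper)."""

    def __init__(self, img=224, patch=16, dim=64, n_classes=2, seed=0):
        rng = np.random.default_rng(seed)
        self.patch = patch
        self.n_patches = (img // patch) ** 2          # 14 x 14 = 196 patches
        in_dim = patch * patch * 3                    # flattened RGB patch
        self.w_embed = rng.normal(0, 0.02, (in_dim, dim))
        self.cls = rng.normal(0, 0.02, (1, dim))      # learnable [CLS] token
        self.pos = rng.normal(0, 0.02, (self.n_patches + 1, dim))
        self.wq = rng.normal(0, 0.02, (dim, dim))
        self.wk = rng.normal(0, 0.02, (dim, dim))
        self.wv = rng.normal(0, 0.02, (dim, dim))
        self.w_head = rng.normal(0, 0.02, (dim, n_classes))

    def forward(self, img):
        p = self.patch
        h, w, _ = img.shape
        # Split the HxWx3 image into non-overlapping flattened patches
        patches = (img.reshape(h // p, p, w // p, p, 3)
                      .transpose(0, 2, 1, 3, 4)
                      .reshape(self.n_patches, -1))
        tokens = patches @ self.w_embed                     # linear patch embedding
        tokens = np.vstack([self.cls, tokens]) + self.pos   # prepend [CLS], add positions
        q, k, v = tokens @ self.wq, tokens @ self.wk, tokens @ self.wv
        attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))      # scaled dot-product attention
        tokens = tokens + attn @ v                          # residual connection
        return softmax(tokens[0] @ self.w_head)             # classify from [CLS] token

model = TinyViT()
img = np.random.default_rng(1).random((224, 224, 3))        # stand-in for a site photo
probs = model.forward(img)                                  # Buddhist vs. non-Buddhist scores
```

In practice one would fine-tune a pretrained ViT (e.g. via a deep-learning framework) on the labelled Taxila images rather than use random weights; the sketch only shows the data flow.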