THAI TEXT-TO-IMAGE PROMPT ENGINEERING BY PRE-TRAINED LARGE LANGUAGE WITH STABLE DIFFUSION MODEL

Volume 6 (2), December 2023, Pages 171-190

Pakpoom Mookdarsanit, Lawankorn Mookdarsanit


Chandrakasem Rajabhat University, Bangkok, Thailand


Abstract

Text-to-image (T2I) generation is an emerging application area of large language models (LLMs): a form of prompt engineering in which a textual description is given as input to generate an image. To move Thai natural language processing (Thai-NLP) toward this paradigm, this paper is the first to present state-of-the-art Thai Text-to-Image prompt engineering (TH-T2I), which turns a Thai textual description into an image that matches its meaning. The pre-trained SCB-MT-EN-TH model is employed for Text-to-Text (T2T) translation, and a stable diffusion model then generates the image from the resulting text prompt. T2T is evaluated by Bilingual Evaluation Understudy (BLEU), while T2I is evaluated by Inception Score (IS) and Fréchet Inception Distance (FID). The images generated by TH-T2I were of high quality according to both IS and FID. TH-T2I contributes a baseline T2I model for Thai and helps preserve the Thai language and culture as digital heritage.
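As an illustration of the BLEU metric named above, the following is a minimal sketch of sentence-level BLEU with uniform weights over 1- to 4-grams and a single reference. It is not the evaluation code used in the paper: the function names `ngrams` and `sentence_bleu`, the whitespace tokenization, and the absence of smoothing are all assumptions made for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against one reference (illustrative, unsmoothed).

    Returns 0.0 as soon as any n-gram order has no match, which is the
    behaviour of plain BLEU without smoothing.
    """
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        ref_counts = ngrams(reference, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        clipped = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the modified n-gram precisions.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


# Hypothetical example: a translated prompt compared with a reference translation.
candidate = "the cat sat on the mat".split()
reference = "the cat sat on a mat".split()
score = sentence_bleu(candidate, reference)
```

Here the modified precisions are 5/6, 3/5, 2/4 and 1/3 for n = 1 to 4 and the brevity penalty is 1, so `score` equals the fourth root of 1/12. Corpus-level BLEU, as usually reported for translation models, aggregates the clipped counts over all sentences before taking the geometric mean rather than averaging per-sentence scores.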

Keywords:

Text-to-Image Translation, Thai Prompt Engineering, Stable Diffusion Model, Image Generation.

DOI: https://doi.org/10.32010/26166127.2023.6.2.171.190
