An Ensemble Method to Enhance Receipt Identification in Consumer Communication Using Generative AI and LSTM

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The rapid growth of digital communication makes it difficult to manage and categorize large volumes of email, particularly transactional receipts. This study introduces an ensemble method that combines two NLP technologies: the Generative Pre-trained Transformer 2 (GPT-2) and Long Short-Term Memory (LSTM) networks. The research began with the acquisition of an extensive, anonymized dataset from Circana, incorporating diverse real-world consumer data, including purchases, promotions, and e-commerce transactions across various retail sectors. The analyzed data comprised subject lines, generated text, and receipt labels, reflecting genuine consumer transactions and offering insights into consumer messaging. The approach involved extracting and preprocessing subject lines and message previews from emails. The GPT-2 model was trained on clean email messages to generate text for initial analysis, enriching the contextual understanding of the dataset. The clean subject lines and generated text were then passed to an LSTM model, whose sequential data processing capabilities were used to identify receipt-containing emails. The performance of this ensemble method was assessed on a diverse email dataset in terms of precision, recall, F1 score, and overall accuracy in identifying receipt-containing emails. The findings indicated a 57.11% improvement in detecting transactional receipts and a 3.24% reduction in misclassifying non-receipt content when the LSTM model was applied to both clean subject lines and generated texts. This study showcases the potential of integrating generative AI with deep learning techniques for email classification tasks. Collaboration with Circana provided access to a large repository of authentic data, crucial for thorough research and analysis of consumer behavior and market trends.
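As an illustration of the four evaluation measures named in the abstract, the following is a minimal sketch of how precision, recall, F1 score, and accuracy can be computed for a binary receipt / non-receipt classifier. It is not the authors' evaluation code: the function name `receipt_metrics`, the `BinaryMetrics` container, the label encoding (1 = receipt, 0 = non-receipt), and the toy labels are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryMetrics:
    precision: float
    recall: float
    f1: float
    accuracy: float


def receipt_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> BinaryMetrics:
    """Compute metrics for the positive class (assumed: 1 = receipt, 0 = non-receipt)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Confusion-matrix counts.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(y_true) if len(y_true) else 0.0
    return BinaryMetrics(precision, recall, f1, accuracy)


# Toy labels, invented purely for illustration (not data from the study).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
metrics = receipt_metrics(y_true, y_pred)
print(metrics)
```

In this framing, recall on the receipt class reflects how many receipt emails are detected, while precision reflects how often non-receipt content is mistaken for a receipt.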

Original language: English
Title of host publication: Proceedings - 2024 International Conference on Information Technology and Computing, ICITCOM 2024
Editors: Hsing-Chung Chen, Mohd Yusoff Bin Mashor, Cahya Damarjati, Yessi Jusman, Nurwahyu Alamsyah
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 18-23
Number of pages: 6
ISBN (Electronic): 9798350379839
Publication status: Published - 2024
Event: 2024 International Conference on Information Technology and Computing, ICITCOM 2024 - Hybrid, Yogyakarta, Indonesia
Duration: 7 Aug 2024 to 8 Aug 2024

Publication series

Name: Proceedings - 2024 International Conference on Information Technology and Computing, ICITCOM 2024

Conference

Conference: 2024 International Conference on Information Technology and Computing, ICITCOM 2024
Country/Territory: Indonesia
City: Hybrid, Yogyakarta
Period: 7/08/24 to 8/08/24

Keywords

  • Consumer Communication
  • Generative AI
  • GPT-2
  • LSTM
