TY - GEN
T1 - An Ensemble Method to Enhance Receipt Identification in Consumer Communication Using Generative AI and LSTM
AU - Hirway, Chanda
AU - Fallon, Enda
AU - Connolly, Paul
AU - Flanagan, Kieran
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
AB - The rapid advancement in digital communication presents challenges in managing and categorizing large volumes of emails, particularly transactional receipts. This study introduces an ensemble method that combines two NLP technologies: the Generative Pre-trained Transformer 2 (GPT-2) and Long Short-Term Memory (LSTM) networks. The research began with acquiring an extensive, anonymized dataset from Circana, incorporating diverse real-world consumer data, including purchases, promotions, and e-commerce transactions across various retail sectors. The dataset comprised subject lines, generated text, and receipt labels, reflecting genuine consumer transactions and offering insights into consumer messaging. The approach involved extracting and preprocessing subject lines and message previews from emails. The GPT-2 model was trained on clean email messages to generate text for initial analysis, enriching the contextual understanding of the dataset. Subsequently, the clean subject lines and generated text were passed to an LSTM model, which uses its sequential data processing capabilities to identify receipt-containing emails. The performance of this ensemble method was assessed on a diverse email dataset, focusing on precision, recall, F1 score, and overall accuracy in identifying receipt-containing emails. The findings indicated a 57.11% enhancement in detecting transactional receipts and a 3.24% reduction in misclassifying non-receipt content when applying the LSTM model to both clean subject lines and generated texts. This study showcases the potential of integrating generative AI with deep learning techniques for email classification tasks. Collaboration with Circana provided access to a significant repository of authentic data, crucial for thorough research and analysis of consumer behavior and market trends.
KW - Consumer Communication
KW - Generative AI
KW - GPT-2
KW - LSTM
UR - http://www.scopus.com/inward/record.url?scp=85214684976&partnerID=8YFLogxK
U2 - 10.1109/ICITCOM62788.2024.10762484
DO - 10.1109/ICITCOM62788.2024.10762484
M3 - Conference contribution
AN - SCOPUS:85214684976
T3 - Proceedings - 2024 International Conference on Information Technology and Computing, ICITCOM 2024
SP - 18
EP - 23
BT - Proceedings - 2024 International Conference on Information Technology and Computing, ICITCOM 2024
A2 - Chen, Hsing-Chung
A2 - Mashor, Mohd Yusoff Bin
A2 - Damarjati, Cahya
A2 - Jusman, Yessi
A2 - Alamsyah, Nurwahyu
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 International Conference on Information Technology and Computing, ICITCOM 2024
Y2 - 7 August 2024 through 8 August 2024
ER -