Unsupervised noise detection in unstructured data for automatic parsing

Shubham Jain, Amy De Buitleir, Enda Fallon

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Citations (Scopus)

Abstract

The telecommunications industry makes extensive use of data extracted from logs, alarms, traces, diagnostics, and other monitoring devices. Analyzing the generated data requires that the data be parsed, restructured, and reformatted. Developing a custom parser for each input format is labor-intensive and requires domain knowledge. In this paper, we describe a novel unsupervised text processing pipeline that automatically detects and labels relevant data and eliminates noise using Levenshtein similarity and agglomerative clustering. We experiment with different similarity and clustering algorithms on a selection of common data formats to verify the accuracy of the proposed technique. The results suggest that the proposed methodology achieves higher accuracy than the other similarity and clustering algorithms tested.
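
As a rough illustration of the idea named in the abstract, the following Python sketch groups lines by normalized Levenshtein similarity with a simple agglomerative clustering and labels lines that end up in very small clusters as noise. It is not the authors' pipeline: the function names, average linkage, the 0.6 similarity threshold, the minimum cluster size of 2, and the sample lines are all assumptions made for this example.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with a two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Levenshtein distance normalized to a similarity in [0, 1]."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def agglomerate(lines, min_similarity=0.6):
    """Average-linkage agglomerative clustering; returns lists of line indices.

    The threshold is an assumed value, not one taken from the paper.
    """
    sim = {(i, j): similarity(lines[i], lines[j])
           for i, j in combinations(range(len(lines)), 2)}
    clusters = [[i] for i in range(len(lines))]

    def linkage(c1, c2):
        pairs = [(min(i, j), max(i, j)) for i in c1 for j in c2]
        return sum(sim[p] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the two clusters with the highest average similarity.
        (x, y), best = max(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < min_similarity:
            break  # nothing left that is similar enough to merge
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x stays valid
    return clusters


def label_noise(lines, min_similarity=0.6, min_cluster_size=2):
    """Label lines in clusters smaller than min_cluster_size as noise."""
    labels = ["noise"] * len(lines)
    for cluster in agglomerate(lines, min_similarity):
        if len(cluster) >= min_cluster_size:
            for i in cluster:
                labels[i] = "data"
    return labels


# Hypothetical input: three similar log records and one banner line.
sample = [
    "2020-11-02 10:00:01 INFO link up port=1",
    "2020-11-02 10:00:02 INFO link up port=2",
    "2020-11-02 10:00:05 INFO link up port=3",
    "==== begin diagnostics dump ====",
]
for line, label in zip(sample, label_noise(sample)):
    print(f"{label:5}  {line}")
```

The pairwise comparison is quadratic in the number of lines, so a sketch like this only suits small samples. Treating small clusters as noise is one possible heuristic among many.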

Original language: English
Title of host publication: 16th International Conference on Network and Service Management, CNSM 2020, 2nd International Workshop on Analytics for Service and Application Management, AnServApp 2020 and 1st International Workshop on the Future Evolution of Internet Protocols, IPFuture 2020
Editors: Nur Zincir-Heywood, Mehmet Ulema, Muge Sayit, Stuart Clayman, Myung-Sup Kim, Cihat Cetinkaya
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9783903176317
ISBN (Print): 9783903176317
DOIs
Publication status: Published - 2 Nov 2020
Event: 16th International Conference on Network and Service Management, CNSM 2020, 2nd International Workshop on Analytics for Service and Application Management, AnServApp 2020 and 1st International Workshop on the Future Evolution of Internet Protocols, IPFuture 2020 - Virtual, Izmir, Turkey
Duration: 2 Nov 2020 - 6 Nov 2020

Publication series

Name: 16th International Conference on Network and Service Management, CNSM 2020, 2nd International Workshop on Analytics for Service and Application Management, AnServApp 2020 and 1st International Workshop on the Future Evolution of Internet Protocols, IPFuture 2020

Conference

Conference: 16th International Conference on Network and Service Management, CNSM 2020, 2nd International Workshop on Analytics for Service and Application Management, AnServApp 2020 and 1st International Workshop on the Future Evolution of Internet Protocols, IPFuture 2020
Country/Territory: Turkey
City: Virtual, Izmir
Period: 2/11/20 - 6/11/20

Keywords

  • Clustering
  • Information Extraction
  • Similarity
  • Unsupervised Data Mining
