Heterogeneous graph based parallel and distributed data management system for application in fraud detection
Dissertation   Open access

Hongming Zhu
Doctor of Philosophy (PHD), University of Bolton
07/01/2022

Abstract

With the rapid development of information technology, most organizations now run their own IT systems to record many kinds of organizational data, and individuals also collect and share data through the Internet. Managing such large volumes of data and making good use of them has become a key challenge in knowledge management. As datasets grow larger and more complex, current knowledge management systems struggle to process them efficiently without loss or misinterpretation of information. This thesis describes a knowledge management system that attempts to overcome these problems through a new system architecture and algorithms that enable distributed data storage and parallel processing. The contributions to knowledge are: (1) a basis for a data management system architecture built on heterogeneous graph, distributed storage, and parallel processing technologies; (2) a basis for a knowledge finding and sharing system built on entropy-based clustering and attribute selection technologies; and (3) a fraud detection prototype system that uses the proposed data management and knowledge finding methods. The architecture consists of two parts: a storage subsystem and a knowledge clustering subsystem. The storage subsystem focuses on two key issues: avoiding data duplication and optimizing for parallel processing. Because the dataset may be very large, the storage subsystem has to avoid duplicating data and must be able to locate data quickly. This is achieved by using multiple dataset schemas to describe and index the dataset; the index associated with each schema enables the data to be located rapidly. To optimize for parallel processing, distributed storage and index technologies are incorporated into the system.
In the knowledge clustering subsystem, a heterogeneous graph is used to describe a body of knowledge, with nodes representing individual components of the knowledge and edges representing the links between those components. The thesis proposes a feature selection model that combines graph attributes and graph structure. Applied to the heterogeneous knowledge graph, the model can deal with attributes that are incomplete across the knowledge and with different types of link, according to user-specified attribute parameters. The thesis presents a prototype knowledge management system for fraud detection that brings the proposed ideas together and evaluates them.
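As an illustration only, the graph representation and the entropy idea mentioned above might be sketched as follows. This is not the thesis's implementation: the names `HeteroGraph` and `attribute_entropy`, the sample accounts and device, and the choice of Shannon entropy over observed attribute values are all assumptions made for the sketch. Nodes that lack an attribute are simply skipped, which is one simple way to tolerate incomplete attributes.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class HeteroGraph:
    """Toy heterogeneous graph (hypothetical): typed nodes with attributes, typed edges."""
    nodes: dict = field(default_factory=dict)  # node_id -> (node_type, attrs)
    edges: list = field(default_factory=list)  # (src, edge_type, dst)

    def add_node(self, node_id, node_type, **attrs):
        self.nodes[node_id] = (node_type, attrs)

    def add_edge(self, src, edge_type, dst):
        self.edges.append((src, edge_type, dst))

    def neighbors(self, node_id, edge_type=None):
        """Nodes linked to node_id in either direction, optionally filtered by edge type."""
        out = []
        for src, etype, dst in self.edges:
            if edge_type is not None and etype != edge_type:
                continue
            if src == node_id:
                out.append(dst)
            elif dst == node_id:
                out.append(src)
        return out


def attribute_entropy(graph, node_type, attr):
    """Shannon entropy in bits of attr over nodes of node_type.

    Nodes that do not carry the attribute are ignored (an assumption of this sketch).
    """
    values = [a[attr] for t, a in graph.nodes.values() if t == node_type and attr in a]
    if not values:
        return 0.0
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


# Hypothetical sample data: three accounts sharing one device.
g = HeteroGraph()
g.add_node("acc1", "account", country="UK", status="active")
g.add_node("acc2", "account", country="FR", status="active")
g.add_node("acc3", "account", status="active")  # country attribute is missing
g.add_node("dev1", "device")
g.add_edge("acc1", "uses", "dev1")
g.add_edge("acc2", "uses", "dev1")

print(attribute_entropy(g, "account", "country"))  # two equally likely values
print(attribute_entropy(g, "account", "status"))   # a single value, so no information
print(g.neighbors("dev1", "uses"))
```

In this sketch an attribute whose values never vary scores zero entropy and would carry no information for clustering, while an attribute with more evenly spread values scores higher. How the thesis actually weights attributes against graph structure is not stated in the abstract.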
Hongming Zhu Thesis-final.pdf
Submitted Open Access
