6 October 2021

A Pattern Segmentation Model for Ancient Manuscripts

SUMMARY

Over the past few years, there has been significant interest in digitizing documents, especially ancient manuscripts, which are a vital source for the study of history and social development. Because historical documents deteriorate with age, making digital copies available online addresses the problems of both preservation and access. Analyzing such data is difficult, however: historical documents have complex characteristics, and users without the proper tools struggle to find the essential patterns in them. We therefore propose a practical deep learning method that segments the vital information in historical documents simultaneously. This mechanism could offer scientists an accurate way to decipher the secrets behind ancient manuscripts. The proposed approach uses a generative adversarial network (GAN) to enhance the quality of the patterns in such documents.

Critical Features in Historical Documents

Ancient manuscript analysis is an active research field [1]. Each historical document contains various vital objects, including captions, tables, drawings, floating words, paragraphs, and the page itself. Figure 1 shows the critical features in a document.

Figure 1: Different objects in an ancient manuscript.

Segmentation Descriptor

The fundamental idea of document image segmentation is to group the extracted pixels. Features are represented by the spatial similarities between the pixels in a given region. Here, segmentation means separating a digital image into multiple areas (sets of pixels) to obtain a representation of the image that is more meaningful and easier to analyze.
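The idea of grouping pixels by similarity can be illustrated with a minimal sketch. It is not the method proposed in this article: it is a plain region-growing pass over a grayscale image stored as a list of lists, and the function name `segment_by_similarity` and the threshold `tol` are assumptions introduced only for this example.

```python
from collections import deque

def segment_by_similarity(image, tol=10):
    """Label 4-connected regions whose adjacent pixels differ by at most `tol`.

    Illustrative only: `image` is a list of rows of grayscale values.
    Returns the label map and the number of regions found.
    """
    rows, cols = len(image), len(image[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != -1:
                continue  # pixel already belongs to a region
            # Start a new region and grow it with a breadth-first flood fill.
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny][nx] == -1
                            and abs(image[ny][nx] - image[y][x]) <= tol):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels, next_label

# Toy "page": a light background (250) with a dark ink block (20).
page = [
    [250, 250, 250, 250],
    [250,  20,  20, 250],
    [250,  20,  20, 250],
    [250, 250, 250, 250],
]
labels, n_regions = segment_by_similarity(page, tol=10)
```

On this toy page the background and the ink block end up in two separate sets of pixels, which is the kind of representation the paragraph above describes.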

Segmentation Challenges of Historical Documents

Segmentation techniques have been developed to address the problem of extracting visual elements. Most object segmentation approaches are based on supervised learning and require a label for each object, which increases processing time and calls for specific expertise to annotate the data. The annotation process also increases the potential for error associated with an input document image.

Another difficulty with supervised learning methods is the lack of enough historical document images to reach high accuracy when segmenting different objects. Although object detection and segmentation approaches are mainly based on supervised learning, we propose segmenting such entities in an unsupervised manner. In addition, our proposed method can generate an artificial dataset to offset the limited supply of historical document images.

Proposed Framework

This paper proposes a two-fold model with two objectives: generating high-quality images and simultaneously segmenting various objects. We also provide a hybrid objective function that allows the user to carry the results of the first fold (learning rate, weights, and biases) over to the second fold to reduce processing time. Once the proposed model has been run, the optimal feature extraction is obtained.

Figure 2: Architecture for the analysis of historical documents.

Proposed Image Augmentation Using the Generative Adversarial Networks (GAN) Method

The first stage of the proposed method relies on two neural networks. The first network, the generator, reconstructs fake images that are improved versions of real historical document images. The second network compares the generated images with the real ones. In other words, it is a classification network that looks at both the real document images and the generator's output and decides whether each image is real or fake. This exchange between the two networks continues until the classifier accepts the generated images as real. The reconstructed images can then be used as new resources for the segmentation task.
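The interplay between the two networks can be sketched through their loss functions. The article does not state which objective was used, so the following assumes the standard binary cross-entropy GAN losses (with the common non-saturating generator loss). The function names are hypothetical, and the inputs are simply the classifier's "real" probabilities for a batch of images.

```python
import math

def bce(prob, target, eps=1e-12):
    """Binary cross-entropy for one probability and a 0/1 target."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(prob) + (1 - target) * math.log(1.0 - prob))

def discriminator_loss(d_real, d_fake):
    """Loss of the classification network (assumed standard GAN objective).

    d_real: probabilities of "real" assigned to real document images.
    d_fake: probabilities of "real" assigned to generated images.
    """
    real_term = sum(bce(p, 1) for p in d_real) / len(d_real)
    fake_term = sum(bce(p, 0) for p in d_fake) / len(d_fake)
    return real_term + fake_term

def generator_loss(d_fake):
    """Non-saturating generator loss: low when fakes are judged real."""
    return sum(bce(p, 1) for p in d_fake) / len(d_fake)

# The generator is rewarded as the classifier becomes more convinced.
loss_when_caught = generator_loss([0.1, 0.2])    # fakes easily spotted
loss_when_fooling = generator_loss([0.8, 0.9])   # fakes accepted as real
```

Training alternates between lowering `discriminator_loss` and lowering `generator_loss`. When the classifier can no longer tell the two sources apart, its outputs approach 0.5 for every image.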

Proposed Unsupervised Segmentation Method

To accurately locate the regions of interest of the different objects in an ancient manuscript, we explored a two-dimensional convolutional neural network. Since the number of objects in each document is unknown, the most frequently occurring pixels are selected as clusters, and their neighbouring pixels are treated as a segment. This process relies on the k-means clustering approach to generate superpixels. The network can then assign superpixels to the different features of a document image.
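To make the superpixel step concrete, here is a minimal k-means sketch. It is an illustration under stated assumptions, not the authors' implementation: each pixel is described by its row, column, and grayscale intensity, and the initial centres are taken at evenly spaced positions in row-major order. The name `kmeans_superpixels` and the parameters `spatial_weight` and `n_iter` are introduced only for this example.

```python
def kmeans_superpixels(image, k, spatial_weight=1.0, n_iter=10):
    """Cluster pixels on (row, col, intensity) with plain k-means.

    Illustrative only: `image` is a list of rows of grayscale values.
    Returns a label map with the same shape as `image`.
    """
    rows, cols = len(image), len(image[0])
    # One feature vector per pixel: weighted position plus intensity.
    feats = [(spatial_weight * r, spatial_weight * c, float(image[r][c]))
             for r in range(rows) for c in range(cols)]
    # Assumed deterministic initialisation: evenly spaced pixels.
    step = len(feats) / k
    centres = [feats[int((i + 0.5) * step)] for i in range(k)]
    assign = [0] * len(feats)
    for _ in range(n_iter):
        # Assignment step: each pixel joins its nearest centre.
        for i, f in enumerate(feats):
            assign[i] = min(
                range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(f, centres[j])),
            )
        # Update step: each centre moves to the mean of its members.
        for j in range(k):
            members = [feats[i] for i in range(len(feats)) if assign[i] == j]
            if members:
                centres[j] = tuple(sum(v) / len(members) for v in zip(*members))
    return [assign[r * cols:(r + 1) * cols] for r in range(rows)]

# Toy image: a dark band (0) above a light band (200).
toy = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [200, 200, 200, 200],
    [200, 200, 200, 200],
]
superpixels = kmeans_superpixels(toy, k=2)
```

Including the pixel position in the feature vector is what keeps each cluster spatially compact, so neighbouring pixels with similar intensity fall into the same superpixel. Like any k-means run, the result depends on the initial centres.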

Results

For the evaluation phase, we assessed our proposed approach on three different datasets. The F1-score was used as the evaluation metric to measure how closely the predicted mask generated by the model matches the real document image.

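F1-score = 2 × (precision × recall) / (precision + recall)

Here, precision is the fraction of predicted pixels that are correct, and recall is the fraction of true pixels that are recovered.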
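As a minimal sketch of how this metric can be computed for segmentation, the function below compares a predicted binary mask with a reference mask pixel by pixel. The function name and the list-of-lists mask format are assumptions made for the example; the article does not describe the evaluation code.

```python
def f1_score(pred_mask, true_mask):
    """Pixel-wise F1-score between two binary masks of equal shape."""
    tp = fp = fn = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1        # pixel correctly marked as object
            elif p and not t:
                fp += 1        # pixel wrongly marked as object
            elif t and not p:
                fn += 1        # object pixel that was missed
    if tp == 0:
        return 0.0             # no overlap: avoid division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

true_mask = [[1, 1, 0],
             [0, 1, 0]]
pred_mask = [[1, 0, 0],
             [0, 1, 1]]
score = f1_score(pred_mask, true_mask)
```

In this toy case two pixels are correct, one is a false positive, and one is missed, so precision and recall are both 2/3 and the score is 2/3.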

Figure 3 shows some of the qualitative results. The proposed approach was applied to three different datasets and was able to segment additional information in ancient manuscripts.

Figure 3: Samples of generated data and segmented features.
a) segmented Page, b) segmented Ornament, c) segmented Character

Conclusion

In this paper, we propose a new document segmentation technique based on deep learning that can segment different features simultaneously. The augmentation task reconstructs high-quality images, while the segmentation task delineates the various objects. The approach can also split an image into separate partitions of objects, which speeds up recognition tasks and yields better analysis performance.

Additional Information

For more information on this research, please read the following research paper:

Tamrin, M.O. and Cheriet, M., 2021, January. Simultaneous detection of regular patterns in ancient manuscripts using GAN-based deep unsupervised segmentation. In International Conference on Pattern Recognition.
