Technique for searching tabular form documents using metadata harvested by table structure analysis

Isaac Okada, Minoru Saito, Yoshiaki Oida, Hiroyuki Yamato, Kazuo Hiekata, Satoru Nakamura, Naoto Fukada

Abstract


Conducting full-text searches on collections of tabular files, in which a single sheet corresponds to a single document and each file consists of multiple sheets, typically involves retrieving many candidate files that include the search terms. Opening each of these tabular files to determine whether it contains the desired sheet is labor-intensive. Searching with high precision thus requires expert intuition born of operational experience. Therefore, it would be advantageous to enable the pinpointing of desired documents with greater accuracy regardless of the operator’s level of experience. In the present study, we propose a method in which operational classifications are assigned as metadata on the basis of the table structure of a sheet. We obtain the table structure of the sheet and assign metadata based on a set of rules established individually for each pattern in the structure. We propose two methods for representing the table structures obtained: a method using a node property matrix, and a method in which positional data regarding cells containing specific operation-description data are indexed. Comparing the results of searches that use assigned metadata to the results of traditional full-text searches reveals that our method has greater search accuracy.
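
As a rough illustration of the second representation mentioned above (indexing the positions of cells that contain specific operation-description data, then applying rules to assign metadata), a minimal sketch might look like the following. It is not the authors' implementation: the sheet model (a list of rows of strings), the keywords, the rule set, and the names `index_keyword_positions`, `assign_metadata`, and `RULES` are all assumptions made for this example, and the node property matrix representation is not covered.

```python
from typing import Callable, Dict, List, Tuple

Sheet = List[List[str]]                          # assumed model: rows of cell strings
Positions = Dict[str, List[Tuple[int, int]]]     # keyword -> list of (row, column)


def index_keyword_positions(sheet: Sheet, keywords: List[str]) -> Positions:
    """Record the (row, column) of every cell whose text contains a keyword."""
    positions: Positions = {kw: [] for kw in keywords}
    for r, row in enumerate(sheet):
        for c, cell in enumerate(row):
            for kw in keywords:
                if kw in cell:
                    positions[kw].append((r, c))
    return positions


# Hypothetical rules: each pairs a metadata label with a test on the positions.
Rule = Tuple[str, Callable[[Positions], bool]]

RULES: List[Rule] = [
    # "Inspection" appearing in the top row suggests an inspection record.
    ("inspection record", lambda p: any(r == 0 for r, _ in p["Inspection"])),
    # "Part No." appearing in the first column suggests a parts list.
    ("parts list", lambda p: any(c == 0 for _, c in p["Part No."])),
]


def assign_metadata(sheet: Sheet, keywords: List[str], rules: List[Rule]) -> List[str]:
    """Return the labels of all rules whose test passes for this sheet."""
    positions = index_keyword_positions(sheet, keywords)
    return [label for label, test in rules if test(positions)]


# Made-up sheet used only to exercise the sketch.
sheet = [
    ["Inspection Report", "", "Date"],
    ["Item", "Result", "Remarks"],
    ["Weld seam", "OK", ""],
]
labels = assign_metadata(sheet, ["Inspection", "Part No."], RULES)
print(labels)
```

In this sketch the assigned labels would be stored alongside each sheet, so a search could filter on the label rather than only on the sheet's full text.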


DOI: 10.5430/air.v3n1p46


Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.

Artificial Intelligence Research

ISSN 1927-6974 (Print)   ISSN 1927-6982 (Online)

Copyright © Sciedu Press 