Title | Structural Feature Extraction for Table-form Documents = 表格文件的結構特徵擷取 |
---|---|
Authors | 曾定章; 莊東昇; 陳旭楠 | Journal | 電腦學刊 |
Volume/Issue | 10:1, 1998.03 [ROC 87.03] |
Pages | 46-58 |
Classification no. | 312.1 |
Keywords | Document analysis; Table-form documents; Structure analysis; Line extraction; Field extraction |
Language | English |
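Chinese abstract (English translation) | We previously proposed a table-form document analysis system comprising four major modules. This paper presents a new version of the system's second module. Its main topics are: (i) form/text separation, (ii) solid-line extraction, (iii) dashed-line extraction, (iv) broken-line connection, (v) field extraction, and (vi) use of the correlation among table lines to further extract heavily damaged table lines and fields. The distinguishing features of this study are: (i) form/text separation based on connected components speeds up table-line extraction; (ii) dashed-line extraction makes the module more practical (most related studies do not consider dashed lines); (iii) the correlation among table lines is used to extract heavily damaged lines and fields, meaning that the closure property of fields is used to confirm damaged lines, and the confirmed lines are then used to extract incomplete fields; and (iv) the module uses only one parameter, which is generated automatically from the table itself, so the module is applicable to all kinds of table-form documents. In the experiments, we use several real table forms to verify the feasibility and stability of the module. |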
English abstract | We have proposed a table-form document analysis system consisting of four modules. This work is a new version of the second module, in which we extract table-form structures: lines and fields. After connected components are generated, the original image is processed by the following steps: (i) form/text separation, (ii) solid-line extraction, (iii) dashed-line extraction, (iv) broken-line connection, (v) field extraction, and (vi) mutual reconstruction of lines and fields. The features of this work include: (i) the form/text separation is based on connected components, so the line-extraction process is fast; (ii) dashed-line extraction is considered, so the module is highly practical (most related studies do not include this function); (iii) the mutual reconstruction of lines and fields means that we use the closure property of a field to identify heavily broken lines and then use the identified broken lines to extract the incomplete fields; and (iv) only one parameter is used, so the module is adaptable to various table forms. In the experiments, several table-form documents are processed to evaluate the performance of the proposed structural-feature extraction for table-form documents. |
The Chinese and English abstracts in this system are taken from the content of each published article.
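
The abstract names connected-component-based form/text separation as the first processing step but does not give the criteria used. The following is only a minimal sketch of the general idea, not the authors' method: it labels 8-connected foreground components in a tiny binary image and treats a component as part of the form when its bounding box is long enough. The function names, the `min_line_len` threshold, and the sample `IMAGE` are all hypothetical and chosen for illustration.

```python
from collections import deque


def connected_components(img):
    """Return the 8-connected foreground (value 1) components as pixel lists."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for y in range(h):
        for x in range(w):
            if img[y][x] != 1 or seen[y][x]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            seen[y][x] = True
            queue = deque([(y, x)])
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            comps.append(pixels)
    return comps


def separate_form_and_text(img, min_line_len):
    """Split components into (form, text) by bounding-box extent.

    min_line_len is a hypothetical threshold, not a value from the paper.
    """
    form, text = [], []
    for pixels in connected_components(img):
        ys = [p[0] for p in pixels]
        xs = [p[1] for p in pixels]
        height = max(ys) - min(ys) + 1
        width = max(xs) - min(xs) + 1
        if max(height, width) >= min_line_len:
            form.append(pixels)   # long component: likely table lines
        else:
            text.append(pixels)   # small component: likely a character
    return form, text


# Hypothetical sample: a rectangular frame with a small blob inside it.
IMAGE = [
    [1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0, 0, 0, 1],
    [1, 0, 1, 1, 0, 0, 0, 1],
    [1, 0, 1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
]

form, text = separate_form_and_text(IMAGE, min_line_len=5)
print(len(form), "form component(s),", len(text), "text component(s)")
```

In this sketch the threshold is supplied by hand; the abstract states that the module's single parameter is generated from the table itself, and how it is derived is not described in this record.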