| Title | 一種多模型融合的中文古籍OCR後處理方法 = A Post-OCR Method of Multi-Model Ensemble for Chinese Ancient Scriptures |
|---|---|
| Author | 釋賢超 |
| Journal | 數位典藏與數位人文 |
| Issue | 11, 2023.04 [ROC 112.04] |
| Pages | pp. 83-104 |
| Classification number | 312.831 |
| Keywords | 古籍; 模型融合; 版面分析; 深度學習; Post-OCR; Ancient scriptures; Model ensemble; Layout analysis; Deep learning |
| Language | Chinese |
| DOI | 10.6853/DADH.202304_(11).0003 |
| Abstract | This paper proposes a multi-model ensemble method for OCR post-processing. It uses a unique layout analysis and alignment algorithm to integrate several types of deep learning models, including a full-page character detection model, a character recognition model, a line recognition model, and a pre-trained language model, and achieves results beyond those of any single model. The full-text error rate reaches 1.64%, which is only 23% of the average error rate of the single models. The method generalizes well across the various conventional layouts of ancient books. |
Abstracts in this system are taken from the articles as published.