Title | 一種多模型融合的中文古籍OCR後處理方法 = A Post-OCR Method of Multi-Model Ensemble for Chinese Ancient Scriptures |
---|---|
Author | 釋賢超 |
Journal | 數位典藏與數位人文 |
Volume/Issue | 11 (2023.04 [ROC 112.04]) |
Pages | 83-104 |
Classification number | 312.831 |
Keywords | 古籍; 模型融合; 版面分析; 深度學習; Post-OCR; Ancient scriptures; Model ensemble; Layout analysis; Deep learning |
Language | Chinese |
DOI | 10.6853/DADH.202304_(11).0003 |
Chinese abstract | 本文提出一種多模型融合的OCR後處理方法,採用獨特的版面分析和對齊算法,整合了整頁檢測模型、字識別模型、列識別模型與語言預訓練模型等深度學習模型,實現了超越單一模型的效果。全文錯誤率達到1.64%,僅為單一模型平均錯誤率的23%。在各類常規古籍版式場景中,該方法具有較好的泛用性。 |
English abstract | This paper proposes a multi-model ensemble method for post-OCR processing. It uses unique layout analysis and alignment algorithms and integrates different types of deep learning models, such as a full-page character detection model, a character recognition model, a line recognition model, and a pre-trained language model, achieving results beyond those of a single model. The full-text error rate reaches 1.64%, which is only 23% of the average single-model error rate. The method generalizes well across various conventional ancient book layouts. |
The abstract information in this system is based primarily on the abstract published with the journal article.