Help needed: please translate the text below, as much of it as you can (2 articles)
Segmentation of page images having artifacts of photocopying and scanning
L. Cinque (a), S. Levialdi (a), L. Lombardi (b) and S. Tanimoto (c)
(a) Dipartimento di Scienze dell'Informazione, University of Rome “La Sapienza”, Via Salaria 113, 00198 Rome, Italy
(b) Dipartimento di Informatica e Sistemistica, University of Pavia, Via Ferrata 1, 27100 Pavia, Italy
(c) Department of Computer Sci. and Engineering, Box 352350, University of Washington, Seattle, WA 98195, USA
Received 18 June 1999; revised 16 February 2001; accepted 26 February 2001. Available online 11 February 2002.
Abstract
The analysis of scanned documents is important in the construction of digital libraries and paperless offices. One significant challenge is coping with artifacts of photocopying and scanning. We present a series of simple techniques for handling these difficulties. Using 125 images of the University of Washington scanned documents database, we demonstrate the effectiveness of these methods in preparing the images for segmentation by a multiresolution algorithm.
Author Keywords: Document analysis; Artifact elimination; Segmentation; Print-through; Marginal artifact; Partial extra page; Digital library
Article Outline
1. Introduction
1.1. General motivation
1.2. Problem description
2. Previous work
3. Processing methods
3.1. Eliminating print-through
Algorithm 1—[Treatment of print-through]
3.2. Marginal artifacts and partial extra pages
Algorithm 2—[Treatment of marginal artifacts and partial extra pages]
4. Multiresolution segmentation method
4.1. Phase 1: construction of feature pyramids
4.2. Phase 2: classification of regions
5. Experimental results and discussion
5.1. Computational considerations
6. Conclusions
References
Vitae
1. Introduction
1.1. General motivation
Online digital libraries can provide improved distribution of information and more flexible access via search algorithms than can traditional print libraries. However, adding existing print materials to electronic libraries is a costly, slow process unless good automated procedures can be developed. After academic journal articles have been photocopied and/or scanned from their bound, printed versions, various artifacts have often been introduced into the images that make further analysis difficult. Either these artifacts need to be removed before further processing, or special considerations must be given to the following processing steps to make them tolerant of the artifacts. We addressed the problem of artifacts by developing means to reduce and/or eliminate them from the scanned document images prior to segmentation.
1.2. Problem description
Print-through is caused when the printing on one side of the paper is visible in the copy or scan of the other side. It can cause a page segmentation algorithm to falsely conclude that a page contains a photograph when it actually contains only text.
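To make the print-through problem concrete, here is a minimal sketch of one generic remedy: pushing faint gray pixels to white. It is not the paper's Algorithm 1, which is not reproduced in this excerpt. The function name `suppress_print_through`, the 0 to 255 grayscale convention, and the threshold of 180 are all assumptions made for illustration, and the sketch only works if the print-through is clearly lighter than the ink on the front side.

```python
def suppress_print_through(gray, threshold=180):
    """Return a copy of a grayscale page with faint pixels set to white.

    gray      -- list of rows, each a list of ints in 0..255
                 (0 = black ink, 255 = white paper); assumed convention
    threshold -- hypothetical cut-off: any pixel at or above this value is
                 treated as paper or print-through and becomes pure white
    """
    return [[255 if pixel >= threshold else pixel for pixel in row]
            for row in gray]


# Tiny example: 30 and 40 stand for front-side ink, 190 to 210 for
# print-through showing faintly from the reverse side.
page = [
    [30, 210, 255],
    [200, 40, 190],
]
cleaned = suppress_print_through(page)
```

A fixed global threshold is the simplest possible choice. On real scans the cut-off would likely have to be estimated per page, for example from the gray-level histogram.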
Marginal artifacts are caused by several phenomena in copying and scanning: (1) curvature of the page away from the glass near the binding of the publication, (2) imaging the edges of the pages behind the one being scanned, due to skew in the pages when the publication is open, (3) imaging the void beyond the boundary of the page, or (4) imaging the cover of the scanner or copier beyond the boundary of the page. Another troublesome effect is the presence of a partial extra page when the copying or scanning process captures part of the page facing the one of interest.
These artifacts typically give rise either to misinterpretation of regions in subsequent segmentation, or to correct identification of regions that are not part of the page of interest and are therefore unwanted.
The problem is to develop means of eliminating the unwanted artifacts in such a way that a subsequent segmentation process works correctly. Furthermore, the method should be validated on a large and realistic database of document images.
This paper is organized as follows: In Section 2 we conduct a survey of related literature. In Section 3 the segmentation algorithms that we developed are described. In Section 4 we outline our two phases of the pyramidal implementation of the proposed algorithms. In Section 5 we report experimental results and provide a detailed discussion. Finally in Section 6 we give our conclusions.
2. Previous work
Several algorithms for page segmentation have been proposed in the literature. These algorithms can be categorized into three classes: bottom–up approaches, top–down approaches and hybrid approaches.
Typical bottom–up algorithms are the Docstrum algorithm of O’Gorman [1], the run-length smearing algorithm of Wahl et al. [2], the Voronoi diagram-based algorithm of Kise et al. [3], the segmentation algorithm of Jain and Yu [4], and the text string separation algorithm of Fletcher and Kasturi [5]. Top–down algorithms include the X–Y cut algorithm of Nagy [6,7], the shape-directed-covers-based algorithm of Baird [8] and Baird et al. [9], and the texture-analysis-based algorithm of Wang and Srihari [10] for classifying newspaper image blocks. Pavlidis and Zhou [11] proposed a hybrid algorithm using a split-and-merge strategy. Surveys can be found in O’Gorman and Kasturi [12], Tang et al. [13], and Jain and Yu [4].
The top–down approaches begin with expectations about what structures may appear in a page, and they proceed to identify elements at successively finer levels of granularity. On the other hand, the bottom–up approaches typically begin with individual pixels or characters, and proceed to combine them into larger units such as words, lines, graphic elements, etc., until the entire page has been analyzed.
The success of all these techniques is limited by the quality of the digital image that is input to them. While the method of [14] permits direct operation on unenhanced pixels of the scanned image, it still suffers from marginal artifacts and partial extra pages. The methods we present for automatically removing artifacts of photocopying and scanning can be used either with our own multiresolution page segmentation algorithm or with any of the other systems.
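[Solution]
Just looking at English makes my head spin.
[Solution]
Be patient.
[Solution]
Install the 词霸 (PowerWord) dictionary and take your time reading through it...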
[Solution]
Then pay in RMB or USD.
------Solution--------------------
d
[Solution]
I know my friends on CSDN are my strongest support!
Please take a look at this for me: http://community.csdn.net/Expert/topic/5531/5531431.xml?temp=.6267969
[Solution]
Here to learn.
[Solution]
~-~
[Solution]
分割页图像经文物影印及扫描油菜cinque , ,一个国会levialdia , 油菜lombardib和S tanimotoc一个dipartimento di scienze dell 'informazione大学罗马 " La Sapienza " ,途经salaria 113号 00198罗马,意大利二dipartimento di informatica e sistemistica , Pavia大学,经编辑1 , 27100帕维亚, 意大利的三部田绪. 和工程箱352350 ,西雅图华盛顿大学,佤族98195 ,美国收到1999年6月18日; 修改2001年2月16日; 接受2001年2月26日在网上2002年2月11日. 摘要分析扫描文件是非常重要的数字图书馆建设和无纸办公室. 一个重大的挑战是应付文物影印和扫描. 我们提出了一系列简单的技术处理这些困难. 用125图像华盛顿大学扫描文件数据库 我们展示的效果,这些方法在编写图像分割的多分辨率的算法. 作者关键词:文献分析; artifact消除; 分割; 打印通过; 边际效应; 局部额外页; 数字图书馆文章概要1 . 引言1.1 . 总动力1.2 . 问题描述2 . 以前的工作3 . 加工方法3.1 . 消除打印通过算法-1-[待遇打印通过] 3.2 . 边际文物和局部加页算法-2-[5治疗边际文物和局部加页] 4 . multiresolution分割方法4.1 . 第一阶段:建造金字塔特征4.2 . 第2阶段:区域划分5 . 实验结果与讨论5.1 . computational考虑6 . 结论提vitae 1 . 引言1.1 . 一般动机网上数字图书馆可以提供更好的信息分布和更灵活的访问途经搜索算法可以比 传统的印刷图书馆. 不过,加上现有的印刷材料,电子图书馆是一个昂贵,过程缓慢,除非好的自动化程序可以发展. 经过学术期刊的文章都复印和/或扫描,其约束,印刷版本, 各种器物常常被引进的图像作进一步分析. 不论这些文物需要拆除,然后再做进一步处理, 或特殊因素必须考虑以下的处理步骤,使他们能包容遗物. 我们解决了文物的开发手段来减少和/或消除他们从扫描图像文件之前 分割. 1.2 . 问题描述打印透过是因当印刷一侧的纸张是看得见的副本或 扫描对方. 它可造成版面分割算法虚假断定一个页包含一张照片时,它实际上包含 唯一文本. 边际文物造成的几个现象复印和扫描: ( 1 )曲率页离玻璃近约束力的出版物, ( 2 )影像边缘的页面背后,一个被扫, 由于歪斜,在当页的出版物是开放的, ( 3 )影像无效范围以外的新的一页, 或( 4 )影像封面的扫描仪或复印机超越边界的页. 另一个棘手的效应是存在着一个局部多页,当复印或扫描过程记录部分 页数面临的一个兴趣. 这些遗物通常会引起任何误解的地区,在随后进行分割或正确识别区域, 不属于该网页的利益,因此是无用的. 问题是发展的手段,消除不必要的文物这种方式,在以后的分割过程 正确地运作. 此外,这种方法应该验证一个大的和现实的数据库的文件图像. 本文组织如下:在第2 ,我们进行了调查,相关文献. 在第3条的分割,我们开发的描述. 在第4节我们介绍我们两个阶段的金字塔执行算法. 在第5我们报导实验结果,并提供了详细的讨论. 终于在第6我们给我们的结论. 2 . 以前工作的几个算法版面分割已提议于文献. 这些算法可以分为三类:自下而上,自上而下的方式和混合方式. tipical自下而上的算法是docstrum算法o 'gorman [1] ,游程平滑算法wahl et al . [2] Voronoi图算法的kise et al . [3] ,分割Jain和钰[4] 与文串分离算法fletcher和kasturi [5] 而自上而下的算法是X y切割算法斯蒂娜[ 6 7 ] 形状指示复盖算法的贝尔德[8]和贝尔德et al . [9]和算法的分类报纸块基于纹理分析王srihari〔10〕. 南风和周〔11〕建议混合使用一个分开合并策略. 一项调查可以发现o 'gorman和kasturi [12] ,唐et al . [13]和Jain和俞[4] . 自上而下的做法,开始与期望什么结构可能出现在首页, 并着手确定分子在相继finer层次的粒度. 在另一方面,自下而上的方法,通常是从个人像素或人物, 进而结合成较大的单位,如文字,线,图形元素等, 直到整页进行了分析. 成功,所有这些技术也是有限的,高质量的数字图像,输入到他们. 虽然法〔14〕许可直接手术unenhanced像素的扫描图像, 还患有轻微的文物和局部加页. 
该方法目前我们自动清除文物影印及扫描可以与自己multiresolution 版面分割算法或任何其他系统.
[Solution]
Look....
[Solution]
Bumping this to help.
[Solution]
Picking up some points while I'm here (JF).