
EXTRACTING PRINCIPAL CONTENT FROM WEB PAGES

Patent content provided by Intellectual Property Publishing House

Patent title: EXTRACTING PRINCIPAL CONTENT FROM WEB PAGES

Inventors: BIGNERT, Jakob; COARNA, Gabriel, Alexandru

Application number: EP12847034.1

Filing date: 2012-11-07

Publication number: EP2776945A1

Publication date: 2014-09-17

Abstract: Extracting principal content from Web pages includes identifying and classifying items on the Web page, building a list of candidates, calculating candidate scores, selecting a top score candidate, performing clean up processing for the top score candidate, and performing final page processing for the top score candidate. Candidate scores may vary according to a number of paragraphs and images grouped according to size. A word length of CJK (Chinese-Japanese-Korean) text may be determined according to punctuation therein. Candidate scores may be modified according to a number of containers and pieces, where a container is a Web page element that is associated with the tags 'body', 'div', 'td', 'li', 'article/section', and pieces are candidates that do not include other candidates. Candidate scores may be modified according to a number of ratios corresponding to text and link density.
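The candidate-building and scoring steps named in the abstract can be illustrated with a minimal Python sketch. It is not the patented method: the class names (`Node`, `TreeBuilder`), the functions `score` and `top_candidate`, the sample page, and above all the scoring formula and its weights are assumptions chosen for illustration. The sketch treats the container tags listed in the abstract as candidates, rewards non-link text and paragraph count, and penalizes link density and nested containers. Image grouping, CJK word length, clean up, and final page processing are omitted.

```python
from html.parser import HTMLParser

# Container tags taken from the abstract ('article/section' split into two tags).
CONTAINER_TAGS = {"body", "div", "td", "li", "article", "section"}
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}  # never get an end tag
SKIP_TEXT_TAGS = {"script", "style"}                      # text is not visible


class Node:
    """One element plus running totals for its whole subtree."""

    def __init__(self, tag, parent=None, attrs=None):
        self.tag = tag
        self.parent = parent
        self.attrs = dict(attrs or [])
        self.text_len = 0     # characters of visible text
        self.link_len = 0     # characters of text inside <a> elements
        self.paragraphs = 0   # nested <p> elements
        self.containers = 0   # nested container elements

    def ancestors(self):
        node = self.parent
        while node is not None:
            yield node
            node = node.parent


class TreeBuilder(HTMLParser):
    """Builds a rough element tree and collects container candidates."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.current = self.root
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        node = Node(tag, self.current, attrs)
        for anc in node.ancestors():
            if tag == "p":
                anc.paragraphs += 1
            if tag in CONTAINER_TAGS:
                anc.containers += 1
        if tag in CONTAINER_TAGS:
            self.candidates.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Close up to the nearest matching open element; ignore stray end tags.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        n = len(data.strip())
        if n == 0 or self.current.tag in SKIP_TEXT_TAGS:
            return
        chain = [self.current, *self.current.ancestors()]
        in_link = any(node.tag == "a" for node in chain)
        for node in chain:
            node.text_len += n
            if in_link:
                node.link_len += n


def score(node):
    """Illustrative score; the formula and weights are assumptions."""
    if node.text_len == 0:
        return 0.0
    link_density = node.link_len / node.text_len
    text_score = (node.text_len - node.link_len) * (1.0 - link_density)
    return text_score * (1 + node.paragraphs) / (1 + node.containers)


def top_candidate(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return max(builder.candidates, key=score, default=None)


PAGE = """
<html><body>
<div id="nav"><a href="/">Home</a> <a href="/news">News</a></div>
<div id="main"><p>First paragraph of the article body text.</p>
<p>Second paragraph with a <a href="/x">link</a> inside it.</p></div>
<div id="footer">Copyright notice</div>
</body></html>
"""
best = top_candidate(PAGE)
print(best.tag, best.attrs.get("id"))  # prints: div main
```

In this sample the navigation block scores zero because all of its text is link text, and `body` is penalized for wrapping three other containers, so the `main` block, which is a piece in the abstract's sense (a candidate that includes no other candidates), comes out on top.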

Applicant: Evernote Corporation

Address: 305 Walnut Street, Redwood City, CA 94063, US

Nationality: US

Agent: Patentanwälte Freischem
