2011, Issue 03, Vol. 18 (Serial No. 63), pp. 25-29
Research on Web Crawler Technology
Foundation:
Email:
DOI: 10.16002/j.cnki.10090312.2011.03.015
Abstract:

Web crawlers download web pages from the Internet for search engines and are an indispensable component of any search engine. This paper introduces the classification and working principles of web crawlers and the problems they face, presents a detailed design of a topic crawler, and finally summarizes the technical challenges that must be solved to design a high-performance web crawler.

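Basic information:

DOI: 10.16002/j.cnki.10090312.2011.03.015

CLC number: TP391.3

Citation:

[1] YU Chenglong, YU Hongbo. Research on Web Crawler Technology[J]. Journal of Dongguan University of Technology, 2011, 18(03): 25-29. DOI: 10.16002/j.cnki.10090312.2011.03.015.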

Release date:

2011-06-15

Publication date:

2011-06-15
