To address the low efficiency of traditional theme (topic-focused) crawlers in finding on-topic pages, this paper examines a crawling algorithm based on the Context Graph. After analyzing how the algorithm works and the steps of its crawling process, we introduce a new approach, a feature selection algorithm. This algorithm modifies the original TF-IDF formula to address the problems identified in the Context Graph-based algorithm.
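For reference, the classic TF-IDF weighting that the proposed feature selection algorithm modifies can be sketched as follows. This is a minimal sketch of the standard formula only. It assumes pre-tokenized pages, term frequency normalized by document length, and the unsmoothed logarithmic IDF. It does not reproduce the improved formula from this paper. The function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Compute classic (baseline) TF-IDF weights for a tokenized corpus.

    Assumed variant, not the paper's improved formula:
      tf(t, d) = count of t in d / number of tokens in d
      idf(t)   = log(N / df(t)), df(t) = number of documents containing t
    """
    n_docs = len(docs)

    # Document frequency: count each term at most once per document.
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    weights = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc)  # never 0 inside the loop below: empty docs have no terms
        weights.append({
            term: (c / length) * math.log(n_docs / df[term])
            for term, c in counts.items()
        })
    return weights


# Usage: three toy "pages" from a focused crawl.
pages = [["crawler", "topic", "page"], ["crawler", "graph"], ["topic", "topic", "graph"]]
for vector in tf_idf(pages):
    print(vector)
```

A term that occurs in every document gets weight 0, because log(N / N) = 0. Terms that are frequent in one page but rare across the corpus receive the highest weights.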