Package us.codecraft.webmagic.processor
Interface PageProcessor
All Known Implementing Classes:
AlexanderMcqueenGoodsProcessor, AmanzonPageProcessor, AngularJSProcessor,
BaiduBaikePageProcessor, CompositePageProcessor, ConfigurablePageProcessor,
DiandianBlogProcessor, DiaoyuwengProcessor, F58PageProcesser,
GithubRepoPageMapper, GithubRepoPageProcessor, GithubRepoPageProcessor,
HuxiuProcessor, InfoQMiniBookProcessor, IteyeBlogProcessor, KaichibaProcessor,
MamacnPageProcessor, MeicanProcessor, NjuBBSProcessor, PhantomJSPageProcessor,
QzoneBlogProcessor, ScriptProcessor, SimplePageProcessor, SinaBlogProcessor,
TianyaPageProcesser, ZhihuPageProcessor, ZhihuPageProcessor, ZipCodePageProcessor
public interface PageProcessor
Interface to be implemented to customize a crawler. In a PageProcessor, you can customize:
- start URLs and other settings in Site
- how the URLs to fetch are detected
- how the data are extracted and stored
Method Summary
Modifier and Type    Method                Description
default Site         getSite()             Returns the site settings.
void                 process(Page page)    Processes the page: extracts the URLs to fetch, then extracts and stores the data.
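The contract above (one default method, one abstract method) can be illustrated with a minimal, self-contained sketch. Note the assumptions: the Site, Page, and PageProcessor types below are simplified stand-ins written for this example so that it compiles without the library; they are not the WebMagic classes, and their fields (retryTimes, targetRequests, fields) are hypothetical. TitleProcessor is likewise a made-up implementation that follows absolute links and stores the page title.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageProcessorSketch {

    /** Stand-in for the library's Site (assumption): holds crawl settings only. */
    static class Site {
        int retryTimes = 0;
    }

    /** Stand-in for the library's Page (assumption): raw HTML in, URLs and fields out. */
    static class Page {
        final String url;
        final String html;
        final List<String> targetRequests = new ArrayList<>();
        final Map<String, Object> fields = new LinkedHashMap<>();

        Page(String url, String html) {
            this.url = url;
            this.html = html;
        }
    }

    /** Same shape as the method summary: getSite() is default, process(Page) is abstract. */
    interface PageProcessor {
        default Site getSite() {
            return new Site();
        }

        void process(Page page);
    }

    /** Hypothetical processor: queues absolute links and stores the page title. */
    static class TitleProcessor implements PageProcessor {
        private static final Pattern LINK = Pattern.compile("href=\"(https?://[^\"]+)\"");
        private static final Pattern TITLE = Pattern.compile("<title>(.*?)</title>");

        private final Site site = new Site();

        TitleProcessor() {
            site.retryTimes = 3; // settings are customized through the Site object
        }

        @Override
        public Site getSite() {
            return site;
        }

        @Override
        public void process(Page page) {
            // Detect the URLs to fetch next.
            Matcher links = LINK.matcher(page.html);
            while (links.find()) {
                page.targetRequests.add(links.group(1));
            }
            // Extract the data and store it on the page.
            Matcher title = TITLE.matcher(page.html);
            if (title.find()) {
                page.fields.put("title", title.group(1));
            }
        }
    }

    public static void main(String[] args) {
        Page page = new Page("https://example.com/",
                "<html><head><title>Home</title></head>"
                + "<body><a href=\"https://example.com/about\">About</a></body></html>");
        PageProcessor processor = new TitleProcessor();
        processor.process(page);
        System.out.println(processor.getSite().retryTimes);
        System.out.println(page.targetRequests);
        System.out.println(page.fields);
    }
}
```

Because getSite() is a default method, an implementation that is happy with the default settings only needs to supply process(Page); in this sketch that means a lambda such as `page -> {}` is already a valid PageProcessor.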