Uses of Package
us.codecraft.webmagic
Classes in us.codecraft.webmagic used by us.codecraft.webmagic

Class           Description

MultiPageModel  Extracts an object that spans more than one page, such as a news story or article.

Page            Object storing extracted results and URLs to fetch.
                Not thread safe.
                Main methods:
                  Page.getUrl(): get the URL of the current page
                  Page.getHtml(): get the content of the current page
                  Page.putField(String, Object): save an extracted result
                  Page.getResultItems(): get the extracted results to be used in Pipeline
                  Page.addTargetRequests(java.util.List), Page.addTargetRequest(String): add URLs to fetch

Request         Object containing the URL to crawl, along with some additional information.

ResultItems     Object containing the extracted results.
                It is held by Page and processed in the pipeline.

Site            Object containing settings for the crawler.

Spider          Entry point of a crawler.
                A spider contains four modules: Downloader, Scheduler, PageProcessor and Pipeline.
                Every module is a field of Spider.

Spider.Status

SpiderListener  Listener for Spider page-processing events.

Task            Interface for identifying different tasks.

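The Page methods listed above can be pictured with a small sketch. The class below is a simplified stand-in written for illustration, not the library's Page: it uses plain strings and a map where the real class may use richer types, and the names PageSketch, process and getTargetRequests are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for illustration only; NOT the real us.codecraft.webmagic.Page.
class PageSketch {
    private final String url;
    private final String html;
    private final Map<String, Object> resultItems = new LinkedHashMap<>();
    private final List<String> targetRequests = new ArrayList<>();

    PageSketch(String url, String html) {
        this.url = url;
        this.html = html;
    }

    String getUrl() { return url; }

    String getHtml() { return html; }

    // Save one extracted result under a key.
    void putField(String key, Object value) { resultItems.put(key, value); }

    // Results collected so far, to be handed to a pipeline.
    Map<String, Object> getResultItems() { return resultItems; }

    // Queue one more URL to fetch.
    void addTargetRequest(String requestUrl) { targetRequests.add(requestUrl); }

    // Queue several URLs to fetch.
    void addTargetRequests(List<String> requestUrls) { targetRequests.addAll(requestUrls); }

    // Hypothetical accessor so the example can show what was queued.
    List<String> getTargetRequests() { return targetRequests; }

    // Hypothetical processing step: pull out the <title> text and queue a follow-up URL.
    static void process(PageSketch page) {
        String html = page.getHtml();
        int start = html.indexOf("<title>");
        int end = html.indexOf("</title>");
        if (start >= 0 && end > start) {
            page.putField("title", html.substring(start + "<title>".length(), end));
        }
        page.putField("url", page.getUrl());
        page.addTargetRequest(page.getUrl() + "?page=2");
    }

    public static void main(String[] args) {
        PageSketch page = new PageSketch("http://example.com/news",
                "<html><title>Hello</title></html>");
        process(page);
        System.out.println(page.getResultItems());
        System.out.println(page.getTargetRequests());
    }
}
```

Because the object accumulates results and URLs in unsynchronized collections, the sketch shares the "not thread safe" property noted above: one instance is meant to be filled by a single thread.
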
Classes in us.codecraft.webmagic used by us.codecraft.webmagic.configurable

Class           Description

Page            Object storing extracted results and URLs to fetch.
                Not thread safe.
                Main methods:
                  Page.getUrl(): get the URL of the current page
                  Page.getHtml(): get the content of the current page
                  Page.putField(String, Object): save an extracted result
                  Page.getResultItems(): get the extracted results to be used in Pipeline
                  Page.addTargetRequests(java.util.List), Page.addTargetRequest(String): add URLs to fetch

Site            Object containing settings for the crawler.

Classes in us.codecraft.webmagic used by us.codecraft.webmagic.downloader

Class           Description

Page            Object storing extracted results and URLs to fetch.
                Not thread safe.
                Main methods:
                  Page.getUrl(): get the URL of the current page
                  Page.getHtml(): get the content of the current page
                  Page.putField(String, Object): save an extracted result
                  Page.getResultItems(): get the extracted results to be used in Pipeline
                  Page.addTargetRequests(java.util.List), Page.addTargetRequest(String): add URLs to fetch

Request         Object containing the URL to crawl, along with some additional information.

Site            Object containing settings for the crawler.

Task            Interface for identifying different tasks.

Classes in us.codecraft.webmagic used by us.codecraft.webmagic.downloader.selenium

Class           Description

Page            Object storing extracted results and URLs to fetch.
                Not thread safe.
                Main methods:
                  Page.getUrl(): get the URL of the current page
                  Page.getHtml(): get the content of the current page
                  Page.putField(String, Object): save an extracted result
                  Page.getResultItems(): get the extracted results to be used in Pipeline
                  Page.addTargetRequests(java.util.List), Page.addTargetRequest(String): add URLs to fetch

Request         Object containing the URL to crawl, along with some additional information.

Task            Interface for identifying different tasks.

Classes in us.codecraft.webmagic used by us.codecraft.webmagic.example

Class           Description

Page            Object storing extracted results and URLs to fetch.
                Not thread safe.
                Main methods:
                  Page.getUrl(): get the URL of the current page
                  Page.getHtml(): get the content of the current page
                  Page.putField(String, Object): save an extracted result
                  Page.getResultItems(): get the extracted results to be used in Pipeline
                  Page.addTargetRequests(java.util.List), Page.addTargetRequest(String): add URLs to fetch

Site            Object containing settings for the crawler.

Classes in us.codecraft.webmagic used by us.codecraft.webmagic.handler

Class           Description

Page            Object storing extracted results and URLs to fetch.
                Not thread safe.
                Main methods:
                  Page.getUrl(): get the URL of the current page
                  Page.getHtml(): get the content of the current page
                  Page.putField(String, Object): save an extracted result
                  Page.getResultItems(): get the extracted results to be used in Pipeline
                  Page.addTargetRequests(java.util.List), Page.addTargetRequest(String): add URLs to fetch

Request         Object containing the URL to crawl, along with some additional information.

ResultItems     Object containing the extracted results.
                It is held by Page and processed in the pipeline.

Site            Object containing settings for the crawler.

Task            Interface for identifying different tasks.

Classes in us.codecraft.webmagic used by us.codecraft.webmagic.model

Class           Description

Page            Object storing extracted results and URLs to fetch.
                Not thread safe.
                Main methods:
                  Page.getUrl(): get the URL of the current page
                  Page.getHtml(): get the content of the current page
                  Page.putField(String, Object): save an extracted result
                  Page.getResultItems(): get the extracted results to be used in Pipeline
                  Page.addTargetRequests(java.util.List), Page.addTargetRequest(String): add URLs to fetch

Site            Object containing settings for the crawler.

Spider          Entry point of a crawler.
                A spider contains four modules: Downloader, Scheduler, PageProcessor and Pipeline.
                Every module is a field of Spider.

Task            Interface for identifying different tasks.

Classes in us.codecraft.webmagic used by us.codecraft.webmagic.model.samples

Class           Description

MultiPageModel  Extracts an object that spans more than one page, such as a news story or article.

Page            Object storing extracted results and URLs to fetch.
                Not thread safe.
                Main methods:
                  Page.getUrl(): get the URL of the current page
                  Page.getHtml(): get the content of the current page
                  Page.putField(String, Object): save an extracted result
                  Page.getResultItems(): get the extracted results to be used in Pipeline
                  Page.addTargetRequests(java.util.List), Page.addTargetRequest(String): add URLs to fetch

Classes in us.codecraft.webmagic used by us.codecraft.webmagic.monitor

Class           Description

Request         Object containing the URL to crawl, along with some additional information.

Spider          Entry point of a crawler.
                A spider contains four modules: Downloader, Scheduler, PageProcessor and Pipeline.
                Every module is a field of Spider.

SpiderListener  Listener for Spider page-processing events.

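The relationship between a spider, its four modules and a listener can be sketched as follows. Everything here is a simplified stand-in written for this example, not the library's Spider: the interface shapes, the string-based URLs, the QueueScheduler helper and the demo method are all assumptions chosen to keep the sketch self-contained.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

// Simplified stand-in for illustration only; NOT the real us.codecraft.webmagic.Spider.
class SpiderSketch {
    // Hypothetical, minimal shapes for the four modules and the listener.
    interface Downloader { String download(String url) throws Exception; }
    interface Scheduler { void push(String url); String poll(); }
    interface PageProcessor { void process(String url, String body, List<String> results, Scheduler scheduler); }
    interface Pipeline { void process(List<String> results); }
    interface Listener { void onSuccess(String url); void onError(String url); }

    // Each module is held as a field of the spider.
    private final Downloader downloader;
    private final Scheduler scheduler;
    private final PageProcessor pageProcessor;
    private final Pipeline pipeline;
    private final List<Listener> listeners = new ArrayList<>();

    SpiderSketch(Downloader downloader, Scheduler scheduler,
                 PageProcessor pageProcessor, Pipeline pipeline) {
        this.downloader = downloader;
        this.scheduler = scheduler;
        this.pageProcessor = pageProcessor;
        this.pipeline = pipeline;
    }

    void addListener(Listener listener) { listeners.add(listener); }

    // Single-threaded loop: schedule, download, process, hand results to the pipeline.
    void run(String startUrl) {
        scheduler.push(startUrl);
        String url;
        while ((url = scheduler.poll()) != null) {
            try {
                String body = downloader.download(url);
                List<String> results = new ArrayList<>();
                pageProcessor.process(url, body, results, scheduler);
                pipeline.process(results);
                for (Listener l : listeners) l.onSuccess(url);
            } catch (Exception e) {
                for (Listener l : listeners) l.onError(url);
            }
        }
    }

    // Plain FIFO scheduler with no duplicate removal.
    static class QueueScheduler implements Scheduler {
        private final Queue<String> queue = new ArrayDeque<>();
        public void push(String url) { queue.add(url); }
        public String poll() { return queue.poll(); }
    }

    // Runs the spider over an in-memory "web" and returns a log of what happened.
    static List<String> demo() {
        Map<String, String> fakeWeb = new HashMap<>();
        fakeWeb.put("page-a", "link:page-b");
        fakeWeb.put("page-b", "link:page-c"); // page-c is missing, so its download fails
        List<String> log = new ArrayList<>();
        SpiderSketch spider = new SpiderSketch(
                url -> {
                    String body = fakeWeb.get(url);
                    if (body == null) throw new IllegalStateException("no such page: " + url);
                    return body;
                },
                new QueueScheduler(),
                (url, body, results, sched) -> {
                    results.add("visited " + url);
                    if (body.startsWith("link:")) sched.push(body.substring("link:".length()));
                },
                results -> log.addAll(results));
        spider.addListener(new Listener() {
            public void onSuccess(String url) { log.add("ok " + url); }
            public void onError(String url) { log.add("error " + url); }
        });
        spider.run("page-a");
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Keeping each module behind its own small interface is what lets a monitor attach a listener, or a caller swap one module, without touching the crawl loop.
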
Classes in us.codecraft.webmagic used by us.codecraft.webmagic.pipeline

Class           Description

ResultItems     Object containing the extracted results.
                It is held by Page and processed in the pipeline.

Task            Interface for identifying different tasks.

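How extracted results and a task identifier might reach a pipeline can be sketched as below. The types are simplified stand-ins, not the library's ResultItems, Task or Pipeline; the Results class, the single-method Task interface and FormattingPipeline are assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-ins for illustration only; NOT the real webmagic types.
class PipelineSketch {
    // Stand-in for ResultItems: the extracted fields of one page.
    static class Results {
        private final Map<String, Object> fields = new LinkedHashMap<>();

        Results put(String key, Object value) {
            fields.put(key, value);
            return this;
        }

        Map<String, Object> getAll() { return fields; }
    }

    // Stand-in for Task: tells the pipeline which crawl the results belong to.
    interface Task { String getUUID(); }

    // Stand-in for a pipeline: consumes the results of one page.
    interface Pipeline { void process(Results results, Task task); }

    // Hypothetical pipeline that renders each field as "<task id> <key>=<value>".
    static class FormattingPipeline implements Pipeline {
        final StringBuilder out = new StringBuilder();

        public void process(Results results, Task task) {
            for (Map.Entry<String, Object> e : results.getAll().entrySet()) {
                out.append(task.getUUID()).append(' ')
                   .append(e.getKey()).append('=').append(e.getValue()).append('\n');
            }
        }
    }

    public static void main(String[] args) {
        FormattingPipeline pipeline = new FormattingPipeline();
        Task task = () -> "example.com";
        pipeline.process(new Results().put("title", "Hello").put("views", 42), task);
        System.out.print(pipeline.out);
    }
}
```

Passing the task alongside the results lets one pipeline instance keep the output of different crawls apart, for example by using the identifier as a directory or table name.
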
Classes in us.codecraft.webmagic used by us.codecraft.webmagic.processor

Class           Description

Page            Object storing extracted results and URLs to fetch.
                Not thread safe.
                Main methods:
                  Page.getUrl(): get the URL of the current page
                  Page.getHtml(): get the content of the current page
                  Page.putField(String, Object): save an extracted result
                  Page.getResultItems(): get the extracted results to be used in Pipeline
                  Page.addTargetRequests(java.util.List), Page.addTargetRequest(String): add URLs to fetch

Site            Object containing settings for the crawler.

Classes in us.codecraft.webmagic used by us.codecraft.webmagic.processor.example

Class           Description

Page            Object storing extracted results and URLs to fetch.
                Not thread safe.
                Main methods:
                  Page.getUrl(): get the URL of the current page
                  Page.getHtml(): get the content of the current page
                  Page.putField(String, Object): save an extracted result
                  Page.getResultItems(): get the extracted results to be used in Pipeline
                  Page.addTargetRequests(java.util.List), Page.addTargetRequest(String): add URLs to fetch

Site            Object containing settings for the crawler.

Classes in us.codecraft.webmagic used by us.codecraft.webmagic.proxy

Class           Description

Page            Object storing extracted results and URLs to fetch.
                Not thread safe.
                Main methods:
                  Page.getUrl(): get the URL of the current page
                  Page.getHtml(): get the content of the current page
                  Page.putField(String, Object): save an extracted result
                  Page.getResultItems(): get the extracted results to be used in Pipeline
                  Page.addTargetRequests(java.util.List), Page.addTargetRequest(String): add URLs to fetch

Task            Interface for identifying different tasks.

Classes in us.codecraft.webmagic used by us.codecraft.webmagic.recover

Class           Description

Request         Object containing the URL to crawl, along with some additional information.

Task            Interface for identifying different tasks.

Classes in us.codecraft.webmagic used by us.codecraft.webmagic.samples

Class           Description

Page            Object storing extracted results and URLs to fetch.
                Not thread safe.
                Main methods:
                  Page.getUrl(): get the URL of the current page
                  Page.getHtml(): get the content of the current page
                  Page.putField(String, Object): save an extracted result
                  Page.getResultItems(): get the extracted results to be used in Pipeline
                  Page.addTargetRequests(java.util.List), Page.addTargetRequest(String): add URLs to fetch

Site            Object containing settings for the crawler.

Classes in us.codecraft.webmagic used by us.codecraft.webmagic.samples.pipeline

Class           Description

ResultItems     Object containing the extracted results.
                It is held by Page and processed in the pipeline.

Task            Interface for identifying different tasks.

Classes in us.codecraft.webmagic used by us.codecraft.webmagic.samples.scheduler

Class           Description

Page            Object storing extracted results and URLs to fetch.
                Not thread safe.
                Main methods:
                  Page.getUrl(): get the URL of the current page
                  Page.getHtml(): get the content of the current page
                  Page.putField(String, Object): save an extracted result
                  Page.getResultItems(): get the extracted results to be used in Pipeline
                  Page.addTargetRequests(java.util.List), Page.addTargetRequest(String): add URLs to fetch

Request         Object containing the URL to crawl, along with some additional information.

Site            Object containing settings for the crawler.

Task            Interface for identifying different tasks.

Classes in us.codecraft.webmagic used by us.codecraft.webmagic.scheduler

Class           Description

Request         Object containing the URL to crawl, along with some additional information.

Task            Interface for identifying different tasks.

Classes in us.codecraft.webmagic used by us.codecraft.webmagic.scheduler.component

Class           Description

Request         Object containing the URL to crawl, along with some additional information.

Task            Interface for identifying different tasks.

Classes in us.codecraft.webmagic used by us.codecraft.webmagic.scripts

Class           Description

Page            Object storing extracted results and URLs to fetch.
                Not thread safe.
                Main methods:
                  Page.getUrl(): get the URL of the current page
                  Page.getHtml(): get the content of the current page
                  Page.putField(String, Object): save an extracted result
                  Page.getResultItems(): get the extracted results to be used in Pipeline
                  Page.addTargetRequests(java.util.List), Page.addTargetRequest(String): add URLs to fetch

Site            Object containing settings for the crawler.

Classes in us.codecraft.webmagic used by us.codecraft.webmagic.utils

Class           Description

Request         Object containing the URL to crawl, along with some additional information.