Uses of Interface
us.codecraft.webmagic.Task
-
Packages that use Task:

us.codecraft.webmagic
    Main class "Spider" and models.
us.codecraft.webmagic.downloader
    Downloader is the part that downloads web pages and stores them in a Page object.
us.codecraft.webmagic.downloader.selenium
us.codecraft.webmagic.handler
us.codecraft.webmagic.model
    Page model and annotations used to customize a crawler.
us.codecraft.webmagic.pipeline
    Pipeline is the persistence and offline-processing part of a crawler.
us.codecraft.webmagic.proxy
us.codecraft.webmagic.recover
us.codecraft.webmagic.samples.pipeline
us.codecraft.webmagic.samples.scheduler
us.codecraft.webmagic.scheduler
    Scheduler is the part that manages URLs.
us.codecraft.webmagic.scheduler.component
    Components of the scheduler.
-
Uses of Task in us.codecraft.webmagic
Classes in us.codecraft.webmagic that implement Task:

class Spider
    Entry point of a crawler. A spider contains four modules: Downloader, Scheduler, PageProcessor and Pipeline. Every module is a field of Spider.

Methods in us.codecraft.webmagic that return Task:

Task Site.toTask()
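Most of the methods on this page take a Task so that a shared component can tell one crawl job from another. The sketch below illustrates that idea only. It uses a simplified stand-in for the Task interface and assumes that Task exposes a getUUID() identifier; PerTaskCounter and TaskDemo are hypothetical names that are not part of WebMagic.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for us.codecraft.webmagic.Task.
// Assumption: only the getUUID() identifier is modelled here.
interface Task {
    String getUUID();
}

// Hypothetical helper: keeps one counter per task, keyed by the task's UUID.
class PerTaskCounter {
    private final Map<String, Integer> counts = new HashMap<>();

    // Adds one to the counter of the given task and returns the new value.
    int increment(Task task) {
        return counts.merge(task.getUUID(), 1, Integer::sum);
    }

    // Returns the current counter of the given task, or 0 if it was never seen.
    int get(Task task) {
        return counts.getOrDefault(task.getUUID(), 0);
    }
}

class TaskDemo {
    public static void main(String[] args) {
        // The stand-in has a single abstract method, so a lambda is enough.
        Task a = () -> "example.com";
        Task b = () -> "example.org";

        PerTaskCounter counter = new PerTaskCounter();
        counter.increment(a);
        counter.increment(a);
        counter.increment(b);

        System.out.println(counter.get(a) + " " + counter.get(b)); // prints "2 1"
    }
}
```

Keying state by the identifier rather than by object identity means two Task objects that describe the same job share one counter.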
-
Uses of Task in us.codecraft.webmagic.downloader
Methods in us.codecraft.webmagic.downloader with parameters of type Task:

Page Downloader.download(Request request, Task task)
    Downloads web pages and stores them in a Page object.
Page HttpClientDownloader.download(Request request, Task task)
Page PhantomJSDownloader.download(Request request, Task task)
protected Page HttpClientDownloader.handleResponse(Request request, java.lang.String charset, org.apache.http.HttpResponse httpResponse, Task task)
protected void AbstractDownloader.onError(Page page, Task task, java.lang.Throwable e)
protected void AbstractDownloader.onError(Request request, Task task, java.lang.Throwable e)
    Deprecated. Use AbstractDownloader.onError(Page, Task, Throwable) instead.
protected void AbstractDownloader.onSuccess(Page page, Task task)
protected void AbstractDownloader.onSuccess(Request request, Task task)
    Deprecated. Use AbstractDownloader.onSuccess(Page, Task) instead.
-
Uses of Task in us.codecraft.webmagic.downloader.selenium
Methods in us.codecraft.webmagic.downloader.selenium with parameters of type Task:

Page SeleniumDownloader.download(Request request, Task task)
-
Uses of Task in us.codecraft.webmagic.handler
Methods in us.codecraft.webmagic.handler with parameters of type Task:

void CompositePipeline.process(ResultItems resultItems, Task task)
RequestMatcher.MatchOther SubPipeline.processResult(ResultItems resultItems, Task task)
    Processes the page, extracts URLs to fetch, extracts the data and stores it.
-
Uses of Task in us.codecraft.webmagic.model
Classes in us.codecraft.webmagic.model that implement Task:

class OOSpider<T>
    The spider for the page model extractor. In WebMagic, a POJO containing the extraction result is called a "page model".

Methods in us.codecraft.webmagic.model with parameters of type Task:

void ConsolePageModelPipeline.process(java.lang.Object o, Task task)
-
Uses of Task in us.codecraft.webmagic.pipeline
Methods in us.codecraft.webmagic.pipeline with parameters of type Task:

void CollectorPageModelPipeline.process(T t, Task task)
void ConsolePipeline.process(ResultItems resultItems, Task task)
void FilePageModelPipeline.process(java.lang.Object o, Task task)
void FilePipeline.process(ResultItems resultItems, Task task)
void JsonFilePageModelPipeline.process(java.lang.Object o, Task task)
void JsonFilePipeline.process(ResultItems resultItems, Task task)
void MultiPagePipeline.process(ResultItems resultItems, Task task)
void PageModelPipeline.process(T t, Task task)
void Pipeline.process(ResultItems resultItems, Task task)
    Process extracted results.
void ResultItemsCollectorPipeline.process(ResultItems resultItems, Task task)
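Every pipeline listed above receives the extracted results together with the Task that produced them. As a rough illustration of that contract, the sketch below collects results in memory, grouped by task. Task, ResultItems and Pipeline are simplified stand-ins that mirror only the signatures shown on this page (getUUID(), put() and getAll() are assumptions about the real types), and InMemoryPipeline is a hypothetical class, not one of the library's pipelines.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-ins for the WebMagic types (assumptions, not the real classes).
interface Task {
    String getUUID();
}

class ResultItems {
    private final Map<String, Object> fields = new LinkedHashMap<>();

    // Stores one extracted field and returns this object for chaining.
    ResultItems put(String key, Object value) {
        fields.put(key, value);
        return this;
    }

    Map<String, Object> getAll() {
        return fields;
    }
}

interface Pipeline {
    void process(ResultItems resultItems, Task task);
}

// Hypothetical pipeline: keeps a copy of every result, grouped by task UUID.
class InMemoryPipeline implements Pipeline {
    private final Map<String, List<Map<String, Object>>> byTask = new LinkedHashMap<>();

    @Override
    public synchronized void process(ResultItems resultItems, Task task) {
        byTask.computeIfAbsent(task.getUUID(), key -> new ArrayList<>())
              .add(new LinkedHashMap<>(resultItems.getAll()));
    }

    // Returns the results collected so far for the given task (empty if none).
    synchronized List<Map<String, Object>> resultsFor(Task task) {
        return new ArrayList<>(byTask.getOrDefault(task.getUUID(), new ArrayList<>()));
    }
}

class PipelineDemo {
    public static void main(String[] args) {
        Task task = () -> "example.com";
        InMemoryPipeline pipeline = new InMemoryPipeline();
        pipeline.process(new ResultItems().put("title", "Hello"), task);
        System.out.println(pipeline.resultsFor(task).size()); // prints "1"
    }
}
```

The method is synchronized because a crawler may call the same pipeline from several worker threads; whether that is needed depends on how the pipeline is actually used.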
-
Uses of Task in us.codecraft.webmagic.proxy
Methods in us.codecraft.webmagic.proxy with parameters of type Task:

default Proxy ProxyProvider.getProxy(Request request, Task task)
    Returns a proxy for the request.
default Proxy ProxyProvider.getProxy(Task task)
    Deprecated. Use ProxyProvider.getProxy(Request, Task) instead.
Proxy SimpleProxyProvider.getProxy(Request request, Task task)
void ProxyProvider.returnProxy(Proxy proxy, Page page, Task task)
    Returns the proxy to the provider when a download completes.
void SimpleProxyProvider.returnProxy(Proxy proxy, Page page, Task task)
-
Uses of Task in us.codecraft.webmagic.recover
Methods in us.codecraft.webmagic.recover with parameters of type Task:

int DuplicateStorageRemover.getTotalRequestsCount(Task task)
boolean DuplicateStorageRemover.isDuplicate(Request request, Task task)
Request MmapQueueScheduler.poll(Task task)
void MmapQueueScheduler.pushWhenNoDuplicate(Request request, Task task)
void DuplicateStorageRemover.resetDuplicateCheck(Task task)
-
Uses of Task in us.codecraft.webmagic.samples.pipeline
Methods in us.codecraft.webmagic.samples.pipeline with parameters of type Task:

void OneFilePipeline.process(ResultItems resultItems, Task task)
-
Uses of Task in us.codecraft.webmagic.samples.scheduler
Methods in us.codecraft.webmagic.samples.scheduler with parameters of type Task:

Request DelayQueueScheduler.poll(Task task)
void DelayQueueScheduler.push(Request request, Task task)
void LevelLimitScheduler.push(Request request, Task task)
-
Uses of Task in us.codecraft.webmagic.scheduler
Methods in us.codecraft.webmagic.scheduler with parameters of type Task:

protected java.lang.String RedisScheduler.getItemKey(Task task)
int FileCacheQueueScheduler.getLeftRequestsCount(Task task)
int MonitorableScheduler.getLeftRequestsCount(Task task)
int PriorityScheduler.getLeftRequestsCount(Task task)
int QueueScheduler.getLeftRequestsCount(Task task)
int RedisScheduler.getLeftRequestsCount(Task task)
protected java.lang.String RedisScheduler.getQueueKey(Task task)
protected java.lang.String RedisScheduler.getSetKey(Task task)
int BloomFilterDuplicateRemover.getTotalRequestsCount(Task task)
int FileCacheQueueScheduler.getTotalRequestsCount(Task task)
int MonitorableScheduler.getTotalRequestsCount(Task task)
int PriorityScheduler.getTotalRequestsCount(Task task)
int QueueScheduler.getTotalRequestsCount(Task task)
int RedisScheduler.getTotalRequestsCount(Task task)
boolean BloomFilterDuplicateRemover.isDuplicate(Request request, Task task)
boolean RedisScheduler.isDuplicate(Request request, Task task)
Request FileCacheQueueScheduler.poll(Task task)
Request PriorityScheduler.poll(Task task)
Request QueueScheduler.poll(Task task)
Request RedisPriorityScheduler.poll(Task task)
Request RedisScheduler.poll(Task task)
Request Scheduler.poll(Task task)
    Gets a URL to crawl.
void DuplicateRemovedScheduler.push(Request request, Task task)
void Scheduler.push(Request request, Task task)
    Adds a URL to fetch.
protected void DuplicateRemovedScheduler.pushWhenNoDuplicate(Request request, Task task)
protected void FileCacheQueueScheduler.pushWhenNoDuplicate(Request request, Task task)
void PriorityScheduler.pushWhenNoDuplicate(Request request, Task task)
void QueueScheduler.pushWhenNoDuplicate(Request request, Task task)
protected void RedisPriorityScheduler.pushWhenNoDuplicate(Request request, Task task)
protected void RedisScheduler.pushWhenNoDuplicate(Request request, Task task)
void BloomFilterDuplicateRemover.resetDuplicateCheck(Task task)
void RedisPriorityScheduler.resetDuplicateCheck(Task task)
void RedisScheduler.resetDuplicateCheck(Task task)
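The method names above suggest a common shape: push(...) accepts a request, pushWhenNoDuplicate(...) stores it only if it has not been seen, and poll(...) hands out the next request to crawl. The sketch below shows one way such a scheduler could look. It is an illustration under assumptions, not the library's implementation: Task, Request and Scheduler are simplified stand-ins, duplicates are detected by URL only, and SimpleQueueScheduler is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Simplified stand-ins for the WebMagic types (assumptions, not the real classes).
interface Task {
    String getUUID();
}

class Request {
    private final String url;

    Request(String url) {
        this.url = url;
    }

    String getUrl() {
        return url;
    }
}

interface Scheduler {
    void push(Request request, Task task);

    Request poll(Task task);
}

// Hypothetical FIFO scheduler that drops requests whose URL was already pushed.
class SimpleQueueScheduler implements Scheduler {
    private final Queue<Request> queue = new ArrayDeque<>();
    private final Set<String> seenUrls = new HashSet<>();

    @Override
    public synchronized void push(Request request, Task task) {
        // Set.add returns false when the URL is already known.
        if (seenUrls.add(request.getUrl())) {
            pushWhenNoDuplicate(request, task);
        }
    }

    // Called only for requests that passed the duplicate check.
    protected void pushWhenNoDuplicate(Request request, Task task) {
        queue.add(request);
    }

    @Override
    public synchronized Request poll(Task task) {
        return queue.poll(); // null when the queue is empty
    }

    // Number of requests still waiting in the queue.
    synchronized int getLeftRequestsCount(Task task) {
        return queue.size();
    }

    // Number of distinct URLs accepted so far.
    synchronized int getTotalRequestsCount(Task task) {
        return seenUrls.size();
    }
}
```

Splitting push into a duplicate check and a separate storage step lets a subclass change where requests are stored (a file, a priority queue, Redis) without reimplementing the check.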
-
Uses of Task in us.codecraft.webmagic.scheduler.component
Methods in us.codecraft.webmagic.scheduler.component with parameters of type Task:

int DuplicateRemover.getTotalRequestsCount(Task task)
    Gets the total request count, for monitoring.
int HashSetDuplicateRemover.getTotalRequestsCount(Task task)
boolean DuplicateRemover.isDuplicate(Request request, Task task)
    Checks whether the request is a duplicate.
boolean HashSetDuplicateRemover.isDuplicate(Request request, Task task)
void DuplicateRemover.resetDuplicateCheck(Task task)
    Resets the duplicate check.
void HashSetDuplicateRemover.resetDuplicateCheck(Task task)
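The three DuplicateRemover methods listed above form a small contract: report whether a request was seen before, allow the record to be reset, and expose a count for monitoring. A minimal sketch of that contract follows, assuming that a request is identified by its URL alone. Task, Request and DuplicateRemover are simplified stand-ins, and UrlSetDuplicateRemover is a hypothetical class, not the library's HashSetDuplicateRemover.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-ins for the WebMagic types (assumptions, not the real classes).
interface Task {
    String getUUID();
}

class Request {
    private final String url;

    Request(String url) {
        this.url = url;
    }

    String getUrl() {
        return url;
    }
}

interface DuplicateRemover {
    boolean isDuplicate(Request request, Task task);

    void resetDuplicateCheck(Task task);

    int getTotalRequestsCount(Task task);
}

// Hypothetical remover backed by a thread-safe set of URLs.
class UrlSetDuplicateRemover implements DuplicateRemover {
    private final Set<String> urls = ConcurrentHashMap.newKeySet();

    @Override
    public boolean isDuplicate(Request request, Task task) {
        // add() returns false when the URL was already present,
        // so checking and recording happen in one atomic step.
        return !urls.add(request.getUrl());
    }

    @Override
    public void resetDuplicateCheck(Task task) {
        urls.clear();
    }

    @Override
    public int getTotalRequestsCount(Task task) {
        return urls.size();
    }
}
```

Note that in this sketch isDuplicate has a side effect: the first call for a URL records it, so a second call with the same URL returns true.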
-