Package us.codecraft.webmagic.scheduler
Class QueueScheduler
- java.lang.Object
  - us.codecraft.webmagic.scheduler.DuplicateRemovedScheduler
    - us.codecraft.webmagic.scheduler.QueueScheduler
- All Implemented Interfaces:
  MonitorableScheduler, Scheduler
public class QueueScheduler extends DuplicateRemovedScheduler implements MonitorableScheduler
Basic Scheduler implementation.
Stores URLs to fetch in a LinkedBlockingQueue and removes duplicate URLs with a HashMap. Note: if you use this QueueScheduler with Site.getCycleRetryTimes() enabled, you may encounter a deadlock when the queue is full.
- Since:
- 0.1.0
- Author:
- code4crafter@gmail.com
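The queue-plus-deduplication idea described above can be sketched with the JDK alone. This is a minimal illustration, not the WebMagic source: the class name `DedupQueueSketch` and its `push`/`poll` methods are hypothetical, URLs are plain strings rather than `Request` objects, and a concurrent set stands in for whatever duplicate-tracking structure the library uses.

```java
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical stand-in for illustration only; not the WebMagic class.
class DedupQueueSketch {
    // Unbounded FIFO queue of URLs waiting to be fetched.
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    // Every URL ever accepted, used to reject duplicates (assumed structure).
    private final Set<String> seen = ConcurrentHashMap.newKeySet();

    /** Queues the URL if it has not been seen before; returns whether it was queued. */
    boolean push(String url) {
        if (!seen.add(url)) {
            return false; // duplicate: dropped without touching the queue
        }
        return queue.offer(url); // an unbounded queue always accepts the element
    }

    /** Returns the next URL in FIFO order, or null when nothing is waiting. */
    String poll() {
        return queue.poll();
    }

    public static void main(String[] args) {
        DedupQueueSketch scheduler = new DedupQueueSketch();
        scheduler.push("https://example.com/a");
        scheduler.push("https://example.com/b");
        scheduler.push("https://example.com/a"); // ignored as a duplicate
        System.out.println(scheduler.poll());
        System.out.println(scheduler.poll());
        System.out.println(scheduler.poll()); // queue is empty at this point
    }
}
```

A URL stays in the seen set after it has been polled, so pushing it again later is still treated as a duplicate in this sketch.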
-
-
Field Summary
-
Fields inherited from class us.codecraft.webmagic.scheduler.DuplicateRemovedScheduler
logger
-
-
Constructor Summary
Constructors
Constructor                     Description
QueueScheduler()
QueueScheduler(int capacity)    Creates a QueueScheduler with the given (fixed) capacity.
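To see why a fixed capacity matters for the deadlock note in the class description, the sketch below uses a bounded `LinkedBlockingQueue` directly. It is an assumption-laden illustration, not WebMagic code: `BoundedQueueSketch` and `tryPush` are hypothetical names, and the timed `offer` is used here only so the example terminates. A blocking `put` on a full queue waits until another thread removes an element, so if the thread that would drain the queue is the one trying to re-queue a request, it may wait forever.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only; not part of WebMagic.
class BoundedQueueSketch {
    /** Tries to enqueue within the timeout; returns false if the queue stayed full. */
    static boolean tryPush(BlockingQueue<String> queue, String url, long timeoutMillis) {
        try {
            return queue.offer(url, timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }

    public static void main(String[] args) {
        // Fixed capacity of 2, analogous to passing a capacity to the constructor.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(2);
        System.out.println(tryPush(queue, "https://example.com/1", 50)); // accepted
        System.out.println(tryPush(queue, "https://example.com/2", 50)); // accepted
        System.out.println(tryPush(queue, "https://example.com/3", 50)); // rejected: queue is full
        queue.poll(); // a consumer frees one slot
        System.out.println(tryPush(queue, "https://example.com/3", 50)); // accepted again
    }
}
```

When cycle retries are enabled, choosing the no-argument constructor or a generous capacity is one way to reduce the chance of hitting a full queue.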
-
Method Summary
Modifier and Type    Method                                             Description
int                  getLeftRequestsCount(Task task)
int                  getTotalRequestsCount(Task task)
Request              poll(Task task)                                    Gets a URL to crawl.
void                 pushWhenNoDuplicate(Request request, Task task)
-
Methods inherited from class us.codecraft.webmagic.scheduler.DuplicateRemovedScheduler
getDuplicateRemover, noNeedToRemoveDuplicate, push, setDuplicateRemover, shouldReserved
-
-
-
-
Method Detail
-
pushWhenNoDuplicate
public void pushWhenNoDuplicate(Request request, Task task)
- Overrides:
  pushWhenNoDuplicate in class DuplicateRemovedScheduler
-
poll
public Request poll(Task task)
Description copied from interface: Scheduler
Gets a URL to crawl.
-
getLeftRequestsCount
public int getLeftRequestsCount(Task task)
- Specified by:
  getLeftRequestsCount in interface MonitorableScheduler
-
getTotalRequestsCount
public int getTotalRequestsCount(Task task)
- Specified by:
  getTotalRequestsCount in interface MonitorableScheduler
-
-