Package us.codecraft.webmagic.scheduler
Class DuplicateRemovedScheduler
java.lang.Object
    us.codecraft.webmagic.scheduler.DuplicateRemovedScheduler

All Implemented Interfaces:
    Scheduler
Direct Known Subclasses:
    FileCacheQueueScheduler, MmapQueueScheduler, PriorityScheduler, QueueScheduler, RedisScheduler

public abstract class DuplicateRemovedScheduler extends java.lang.Object implements Scheduler
Removes duplicate URLs and pushes only URLs that are not duplicates.

Since:
    0.5.0
Author:
    code4crafer@gmail.com

Field Summary

Modifier and Type             Field
protected org.slf4j.Logger    logger

Constructor Summary

Constructor
DuplicateRemovedScheduler()

Method Summary

Modifier and Type            Method                                                     Description
DuplicateRemover             getDuplicateRemover()
protected boolean            noNeedToRemoveDuplicate(Request request)
void                         push(Request request, Task task)                           Add a URL to fetch.
protected void               pushWhenNoDuplicate(Request request, Task task)
DuplicateRemovedScheduler    setDuplicateRemover(DuplicateRemover duplicatedRemover)
protected boolean            shouldReserved(Request request)

Method Detail

getDuplicateRemover
public DuplicateRemover getDuplicateRemover()

setDuplicateRemover
public DuplicateRemovedScheduler setDuplicateRemover(DuplicateRemover duplicatedRemover)

push
public void push(Request request, Task task)
Description copied from interface: Scheduler
Add a URL to fetch.

shouldReserved
protected boolean shouldReserved(Request request)

noNeedToRemoveDuplicate
protected boolean noNeedToRemoveDuplicate(Request request)
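
The method names above suggest a template-method arrangement: push decides whether a request passes the duplicate check and, if it does, hands it to pushWhenNoDuplicate, which a subclass overrides to store the request. The following is a minimal, self-contained sketch of that pattern, not the WebMagic implementation. All types in it (SketchRequest, SketchRemover, SketchDedupScheduler, SketchQueueScheduler, SketchDemo) are hypothetical stand-ins, and the conditions under which the two hooks return true (retried requests, POST requests) are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical stand-in for a request; not the WebMagic Request class.
class SketchRequest {
    final String url;
    final String method;
    final boolean retry;

    SketchRequest(String url, String method, boolean retry) {
        this.url = url;
        this.method = method;
        this.retry = retry;
    }
}

// Hypothetical duplicate remover backed by an in-memory set of URLs.
class SketchRemover {
    private final Set<String> seen = new HashSet<>();

    // Returns true if the URL was already recorded; records it otherwise.
    boolean isDuplicate(SketchRequest request) {
        return !seen.add(request.url);
    }
}

// Template-method base: push() filters, pushWhenNoDuplicate() stores.
abstract class SketchDedupScheduler {
    private SketchRemover remover = new SketchRemover();

    SketchDedupScheduler setRemover(SketchRemover remover) {
        this.remover = remover;
        return this; // returning this allows call chaining
    }

    void push(SketchRequest request) {
        // Assumed rule: either hook lets the request bypass the duplicate check.
        if (shouldReserved(request)
                || noNeedToRemoveDuplicate(request)
                || !remover.isDuplicate(request)) {
            pushWhenNoDuplicate(request);
        }
    }

    // Assumption: requests flagged as retries are always kept.
    protected boolean shouldReserved(SketchRequest request) {
        return request.retry;
    }

    // Assumption: POST requests are never treated as duplicates.
    protected boolean noNeedToRemoveDuplicate(SketchRequest request) {
        return "POST".equalsIgnoreCase(request.method);
    }

    // No-op by default; subclasses override it to store the request.
    protected void pushWhenNoDuplicate(SketchRequest request) {
    }
}

// Subclass that stores accepted requests in a FIFO queue.
class SketchQueueScheduler extends SketchDedupScheduler {
    final Queue<SketchRequest> queue = new ArrayDeque<>();

    @Override
    protected void pushWhenNoDuplicate(SketchRequest request) {
        queue.add(request);
    }
}

class SketchDemo {
    public static void main(String[] args) {
        SketchQueueScheduler scheduler = new SketchQueueScheduler();
        scheduler.push(new SketchRequest("http://example.com/a", "GET", false));
        scheduler.push(new SketchRequest("http://example.com/a", "GET", false)); // dropped
        System.out.println(scheduler.queue.size()); // prints 1
    }
}
```

In this sketch the subclass only decides where accepted requests go, while the base class owns the filtering decision, so swapping the remover through the chained setter changes duplicate detection without touching storage.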