Package us.codecraft.webmagic.scheduler
Class BloomFilterDuplicateRemover
- java.lang.Object
  - us.codecraft.webmagic.scheduler.BloomFilterDuplicateRemover
- All Implemented Interfaces:
DuplicateRemover
public class BloomFilterDuplicateRemover extends java.lang.Object implements DuplicateRemover
A DuplicateRemover backed by a Bloom filter, intended for crawls with a huge number of URLs.
- Since:
- 0.5.1
- Author:
- code4crafer@gmail.com
-
-
Constructor Summary
Constructors
BloomFilterDuplicateRemover(int expectedInsertions)
BloomFilterDuplicateRemover(int expectedInsertions, double fpp)
-
Method Summary
Modifier and Type, Method, Description
int getTotalRequestsCount(Task task)
    Get the total request count, for monitoring.
protected java.lang.String getUrl(Request request)
boolean isDuplicate(Request request, Task task)
    Check whether the request is a duplicate.
protected com.google.common.hash.BloomFilter<java.lang.CharSequence> rebuildBloomFilter()
void resetDuplicateCheck(Task task)
    Reset duplicate check.
-
-
-
Constructor Detail
-
BloomFilterDuplicateRemover
public BloomFilterDuplicateRemover(int expectedInsertions)
-
BloomFilterDuplicateRemover
public BloomFilterDuplicateRemover(int expectedInsertions, double fpp)
- Parameters:
expectedInsertions
- the number of expected insertions to the constructed Bloom filter
fpp
- the desired false positive probability (must be positive and less than 1.0)
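How expectedInsertions and fpp trade off against memory can be estimated with the textbook Bloom filter sizing formulas. The sketch below is a hypothetical helper (BloomSizing and its method names are not part of WebMagic or Guava), and the values 10,000,000 and 0.01 are illustrative assumptions only, not documented defaults.

```java
// Hypothetical helper, not part of WebMagic or Guava.
// Applies the standard Bloom filter sizing formulas to the two constructor parameters.
class BloomSizing {

    /** Bits needed for n insertions at false positive probability p: m = -n * ln(p) / (ln 2)^2. */
    static long optimalNumOfBits(long n, double p) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    /** Number of hash functions for n insertions into m bits: k = (m / n) * ln 2, at least 1. */
    static int optimalNumOfHashFunctions(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long expectedInsertions = 10_000_000L; // illustrative value (assumption)
        double fpp = 0.01;                     // illustrative value (assumption)

        long bits = optimalNumOfBits(expectedInsertions, fpp);
        int hashes = optimalNumOfHashFunctions(expectedInsertions, bits);

        System.out.printf("bits=%d (~%.1f MiB), hashes=%d%n",
                bits, bits / 8.0 / 1024 / 1024, hashes);
    }
}
```

Lowering fpp or raising expectedInsertions increases the bit count. Inserting many more URLs than expectedInsertions pushes the real false positive rate above fpp.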
-
-
Method Detail
-
rebuildBloomFilter
protected com.google.common.hash.BloomFilter<java.lang.CharSequence> rebuildBloomFilter()
-
isDuplicate
public boolean isDuplicate(Request request, Task task)
Description copied from interface: DuplicateRemover
Check whether the request is a duplicate.
- Specified by:
isDuplicate
in interface DuplicateRemover
- Parameters:
request
- the request to check
task
- the task the request belongs to
- Returns:
- true if the request is a duplicate
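To illustrate what a Bloom-filter-based duplicate check looks like, here is a minimal self-contained sketch. It is not WebMagic's implementation (the class above delegates to Guava's BloomFilter): UrlBloomSketch, the FNV-1a hashing, and the check-then-record behaviour are all assumptions chosen to keep the example short.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical sketch, not WebMagic's implementation: a tiny Bloom filter keyed by URL.
class UrlBloomSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    UrlBloomSketch(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    /**
     * Returns true if the URL was probably seen before.
     * Any bits that were not yet set are set, so the URL is recorded as a side effect.
     */
    boolean isDuplicate(String url) {
        byte[] data = url.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, 0x01000193) | 1; // force an odd step for double hashing
        boolean allSet = true;
        for (int i = 0; i < numHashes; i++) {
            int index = Math.floorMod(h1 + i * h2, numBits);
            if (!bits.get(index)) {
                allSet = false;
                bits.set(index);
            }
        }
        return allSet;
    }

    /** 32-bit FNV-1a style hash with a caller-supplied starting value. */
    private static int fnv1a(byte[] data, int seed) {
        int hash = seed;
        for (byte b : data) {
            hash ^= (b & 0xff);
            hash *= 0x01000193;
        }
        return hash;
    }

    public static void main(String[] args) {
        UrlBloomSketch filter = new UrlBloomSketch(1 << 16, 5);
        System.out.println(filter.isDuplicate("https://example.com/a")); // first sighting
        System.out.println(filter.isDuplicate("https://example.com/a")); // repeat
    }
}
```

A Bloom filter never reports a previously recorded URL as new, but it can report a new URL as a duplicate. With such a filter, a small fraction of pages (roughly the configured false positive probability) may be skipped.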
-
getUrl
protected java.lang.String getUrl(Request request)
-
resetDuplicateCheck
public void resetDuplicateCheck(Task task)
Description copied from interface: DuplicateRemover
Reset duplicate check.
- Specified by:
resetDuplicateCheck
in interface DuplicateRemover
- Parameters:
task
- the task whose duplicate check is reset
-
getTotalRequestsCount
public int getTotalRequestsCount(Task task)
Description copied from interface: DuplicateRemover
Get the total request count, for monitoring.
- Specified by:
getTotalRequestsCount
in interface DuplicateRemover
- Parameters:
task
- the task to report on
- Returns:
- the total number of requests
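The interplay between the duplicate check, the request counter, and the reset can be sketched as follows. This is an assumption about the lifecycle, not a copy of the class's code: CountingDedupSketch is a hypothetical name, a concurrent set stands in for the Bloom filter to keep the example short, and whether the real class also zeroes its counter on reset is not stated in this page.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: an exact set replaces the Bloom filter so the example stays small.
class CountingDedupSketch {
    private final Set<String> seen = ConcurrentHashMap.newKeySet();
    private final AtomicInteger counter = new AtomicInteger();

    /** Returns true for a repeated URL; counts each URL the first time it is seen. */
    boolean isDuplicate(String url) {
        boolean added = seen.add(url);
        if (added) {
            counter.incrementAndGet();
        }
        return !added;
    }

    /** Number of distinct URLs accepted since the last reset. */
    int getTotalRequestsCount() {
        return counter.get();
    }

    /** Forgets all recorded URLs and zeroes the counter (assumed behaviour). */
    void resetDuplicateCheck() {
        seen.clear();
        counter.set(0);
    }

    public static void main(String[] args) {
        CountingDedupSketch dedup = new CountingDedupSketch();
        dedup.isDuplicate("https://example.com/a");
        dedup.isDuplicate("https://example.com/a");
        dedup.isDuplicate("https://example.com/b");
        System.out.println(dedup.getTotalRequestsCount()); // 2
        dedup.resetDuplicateCheck();
        System.out.println(dedup.getTotalRequestsCount()); // 0
    }
}
```

With a real Bloom filter in place of the set, the count would be a slight undercount, because URLs rejected as false positives are never counted.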
-
-