Class BloomFilterDuplicateRemover

  • All Implemented Interfaces:
    DuplicateRemover

    public class BloomFilterDuplicateRemover
    extends java.lang.Object
    implements DuplicateRemover
    A DuplicateRemover backed by a Bloom filter, intended for crawls with a very large number of URLs.
    Since:
    0.5.1
    Author:
    code4crafer@gmail.com
    • Constructor Detail

      • BloomFilterDuplicateRemover

        public BloomFilterDuplicateRemover​(int expectedInsertions)
      • BloomFilterDuplicateRemover

        public BloomFilterDuplicateRemover​(int expectedInsertions,
                                           double fpp)
        Parameters:
        expectedInsertions - the number of expected insertions to the constructed Bloom filter
        fpp - the desired false positive probability (must be positive and less than 1.0)
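
The two parameters trade memory for accuracy. As a rough illustration, the standard Bloom filter sizing formulas can be sketched as below. This is a minimal sketch, not this class's code: the class name `BloomSizing` and its method names are hypothetical, and the underlying filter implementation may round or cap the values differently.

```java
// Hypothetical helper illustrating textbook Bloom filter sizing.
// Not part of WebMagic or Guava; names are assumptions for this sketch.
final class BloomSizing {
    private BloomSizing() {}

    /** Bits needed for n expected insertions at false positive probability p: m = -n ln(p) / (ln 2)^2. */
    static long optimalNumOfBits(long n, double p) {
        if (n <= 0 || p <= 0.0 || p >= 1.0) {
            throw new IllegalArgumentException("n must be positive and 0 < p < 1");
        }
        double ln2 = Math.log(2);
        return (long) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    /** Number of hash functions for n insertions into m bits: k = (m / n) ln 2, at least 1. */
    static int optimalNumOfHashFunctions(long n, long m) {
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        long n = 10_000_000L;   // expectedInsertions
        double fpp = 0.01;      // desired false positive probability
        long bits = optimalNumOfBits(n, fpp);
        int hashes = optimalNumOfHashFunctions(n, bits);
        System.out.printf("bits=%d (%.1f MiB), hashes=%d%n",
                bits, bits / 8.0 / 1024 / 1024, hashes);
    }
}
```

Under these formulas a 1% false positive probability costs roughly 9.6 bits per expected URL, so ten million URLs need on the order of 11 to 12 MB. Inserting far more URLs than `expectedInsertions` raises the real false positive rate above `fpp`, which for a crawler means new pages get skipped as if already seen.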
    • Method Detail

      • rebuildBloomFilter

        protected com.google.common.hash.BloomFilter<java.lang.CharSequence> rebuildBloomFilter()
      • isDuplicate

        public boolean isDuplicate​(Request request,
                                   Task task)
        Description copied from interface: DuplicateRemover
        Check whether the request is a duplicate.
        Specified by:
        isDuplicate in interface DuplicateRemover
        Parameters:
        request - request
        task - task
        Returns:
        true if the request is a duplicate
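
To make the check-then-record behaviour of a Bloom-filter-based duplicate check concrete, here is a self-contained sketch. It is an assumption-laden illustration, not this class's implementation: `BloomDedupSketch` is a hypothetical name, it takes a plain URL string instead of a `Request` and `Task`, and it uses a simple `BitSet` with FNV-1a double hashing rather than the Guava filter returned by `rebuildBloomFilter()`.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a Bloom-filter-backed duplicate remover.
final class BloomDedupSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;
    private final AtomicInteger counter = new AtomicInteger();

    BloomDedupSketch(int expectedInsertions, double fpp) {
        if (expectedInsertions <= 0 || fpp <= 0.0 || fpp >= 1.0) {
            throw new IllegalArgumentException("expectedInsertions > 0 and 0 < fpp < 1 required");
        }
        double ln2 = Math.log(2);
        long m = (long) Math.ceil(-expectedInsertions * Math.log(fpp) / (ln2 * ln2));
        this.numBits = (int) Math.min(Integer.MAX_VALUE, Math.max(64L, m));
        this.numHashes = Math.max(1, (int) Math.round((double) numBits / expectedInsertions * ln2));
        this.bits = new BitSet(numBits);
    }

    /** Returns true if the URL was (probably) seen before; otherwise records it and returns false. */
    synchronized boolean isDuplicate(String url) {
        byte[] data = url.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, 0x01000193) | 1; // odd step so the probe positions vary
        boolean seen = true;
        for (int i = 0; i < numHashes; i++) {
            int idx = Math.floorMod(h1 + i * h2, numBits);
            if (!bits.get(idx)) {
                seen = false;   // at least one bit unset: definitely new
                bits.set(idx);
            }
        }
        if (!seen) {
            counter.incrementAndGet(); // count only URLs accepted as new
        }
        return seen;
    }

    /** Number of URLs accepted as new so far. */
    int getTotalRequestsCount() {
        return counter.get();
    }

    // FNV-1a style 32-bit hash with a caller-supplied starting value.
    private static int fnv1a(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }
}
```

A `false` result is always correct, while a `true` result can occasionally be a false positive, so a small fraction of genuinely new URLs may be dropped. That is the price paid for memory use that does not grow with URL length.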
      • getUrl

        protected java.lang.String getUrl​(Request request)
      • getTotalRequestsCount

        public int getTotalRequestsCount​(Task task)
        Description copied from interface: DuplicateRemover
        Get the total number of requests, for monitoring.
        Specified by:
        getTotalRequestsCount in interface DuplicateRemover
        Parameters:
        task - task
        Returns:
        total number of requests