Class Site


  • public class Site
    extends java.lang.Object
    Holds the settings for a crawler.
    Since:
    0.1.0
    Author:
    code4crafter@gmail.com
    See Also:
    PageProcessor
    • Constructor Summary

      Constructors 
      Constructor Description
      Site()  
    • Method Summary

      Modifier and Type Method Description
      Site addCookie​(java.lang.String name, java.lang.String value)
      Add a cookie for the domain returned by getDomain().
      Site addCookie​(java.lang.String domain, java.lang.String name, java.lang.String value)
      Add a cookie for a specific domain.
      Site addHeader​(java.lang.String key, java.lang.String value)
      Add an HTTP header for the downloader.
      boolean equals​(java.lang.Object o)  
      java.util.Set<java.lang.Integer> getAcceptStatCode()
      Get the set of accepted HTTP status codes.
      java.util.Map<java.lang.String,​java.util.Map<java.lang.String,​java.lang.String>> getAllCookies()
      Get the cookies of all domains.
      java.lang.String getCharset()
      Get the charset that was set manually.
      java.util.Map<java.lang.String,​java.lang.String> getCookies()
      Get the cookies.
      int getCycleRetryTimes()
      When cycleRetryTimes is greater than 0, a failed request is added back to the scheduler and downloaded again.
      java.lang.String getDefaultCharset()
      The default charset, used if charset detection fails.
      java.lang.String getDomain()
      Get the domain.
      java.util.Map<java.lang.String,​java.lang.String> getHeaders()  
      int getRetrySleepTime()  
      int getRetryTimes()
      Get the number of immediate retries when a download fails; 0 by default.
      int getSleepTime()
      Get the interval between the processing of two pages.
      Time unit is milliseconds.
      int getTimeOut()  
      java.lang.String getUserAgent()
      Get the user agent.
      int hashCode()  
      boolean isDisableCookieManagement()  
      boolean isUseGzip()  
      static Site me()
      Create a new Site.
      Site setAcceptStatCode​(java.util.Set<java.lang.Integer> acceptStatCode)
      Set the accepted HTTP status codes.
      A response is processed only when its status code is in acceptStatCode.
      {200} by default.
      Setting it is optional.
      Site setCharset​(java.lang.String charset)
      Set the charset of the page manually.
      When the charset is not set or is set to null, it can be detected automatically from the HTTP header.
      Site setCycleRetryTimes​(int cycleRetryTimes)
      Set the number of cycle retries when a download fails; 0 by default.
      Site setDefaultCharset​(java.lang.String defaultCharset)
      Set the default charset of the page.
      Site setDisableCookieManagement​(boolean disableCookieManagement)
      The downloader is expected to store response cookies.
      Site setDomain​(java.lang.String domain)
      Set the domain of the site.
      Site setRetrySleepTime​(int retrySleepTime)
      Set the sleep time between retries when a download fails; 1000 by default.
      Site setRetryTimes​(int retryTimes)
      Set the number of retries when a download fails; 0 by default.
      Site setSleepTime​(int sleepTime)
      Set the interval between the processing of two pages.
      Time unit is milliseconds.
      Site setTimeOut​(int timeOut)
      Set the timeout for the downloader, in milliseconds.
      Site setUseGzip​(boolean useGzip)
      Set whether to use gzip.
      Site setUserAgent​(java.lang.String userAgent)
      Set the user agent.
      java.lang.String toString()  
      Task toTask()  
      • Methods inherited from class java.lang.Object

        clone, finalize, getClass, notify, notifyAll, wait, wait, wait
    • Constructor Detail

      • Site

        public Site()
    • Method Detail

      • me

        public static Site me()
        Create a new Site.
        Returns:
        a new Site
      • addCookie

        public Site addCookie​(java.lang.String name,
                              java.lang.String value)
        Add a cookie for the domain returned by getDomain().
        Parameters:
        name - the cookie name
        value - the cookie value
        Returns:
        this
      • addCookie

        public Site addCookie​(java.lang.String domain,
                              java.lang.String name,
                              java.lang.String value)
        Add a cookie for a specific domain.
        Parameters:
        domain - the domain the cookie applies to
        name - the cookie name
        value - the cookie value
        Returns:
        this
      • setUserAgent

        public Site setUserAgent​(java.lang.String userAgent)
        Set the user agent.
        Parameters:
        userAgent - the user agent string
        Returns:
        this
      • getCookies

        public java.util.Map<java.lang.String,​java.lang.String> getCookies()
        Get the cookies.
        Returns:
        the cookies
      • getAllCookies

        public java.util.Map<java.lang.String,​java.util.Map<java.lang.String,​java.lang.String>> getAllCookies()
        Get the cookies of all domains.
        Returns:
        the cookies of all domains
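
        The cookie methods above can be pictured with a small stand-in class. This is only a sketch under assumptions: MiniSite is a hypothetical class written for illustration, not WebMagic's Site, and the idea that cookies added without a domain live in a separate map is inferred from the return types of getCookies() and getAllCookies(). It also shows why each setter returns this: calls can be chained.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only; this is not WebMagic's Site.
final class MiniSite {
    // Cookies added without an explicit domain (assumed to apply to the site's own domain).
    private final Map<String, String> defaultCookies = new LinkedHashMap<>();
    // Cookies added with an explicit domain: domain -> (name -> value).
    private final Map<String, Map<String, String>> cookiesByDomain = new LinkedHashMap<>();

    // Static factory, mirroring the shape of Site.me().
    static MiniSite me() {
        return new MiniSite();
    }

    // Returning this lets callers chain further configuration calls.
    MiniSite addCookie(String name, String value) {
        defaultCookies.put(name, value);
        return this;
    }

    MiniSite addCookie(String domain, String name, String value) {
        cookiesByDomain.computeIfAbsent(domain, d -> new LinkedHashMap<>()).put(name, value);
        return this;
    }

    // Same shape as getCookies(): name -> value.
    Map<String, String> getCookies() {
        return defaultCookies;
    }

    // Same shape as getAllCookies(): domain -> (name -> value).
    Map<String, Map<String, String>> getAllCookies() {
        return cookiesByDomain;
    }
}
```

        With this sketch, MiniSite.me().addCookie("sid", "abc").addCookie("example.org", "lang", "en") builds both maps in one expression.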
      • getUserAgent

        public java.lang.String getUserAgent()
        Get the user agent.
        Returns:
        the user agent
      • getDomain

        public java.lang.String getDomain()
        Get the domain.
        Returns:
        the domain
      • setDomain

        public Site setDomain​(java.lang.String domain)
        Set the domain of the site.
        Parameters:
        domain - the domain of the site
        Returns:
        this
      • setCharset

        public Site setCharset​(java.lang.String charset)
        Set the charset of the page manually.
        When the charset is not set or is set to null, it can be detected automatically from the HTTP header.
        Parameters:
        charset - the charset name, or null to rely on detection
        Returns:
        this
      • getCharset

        public java.lang.String getCharset()
        Get the charset that was set manually.
        Returns:
        the manually set charset
      • setDefaultCharset

        public Site setDefaultCharset​(java.lang.String defaultCharset)
        Set the default charset of the page. This charset is used when charset detection fails.
        Parameters:
        defaultCharset - the default charset
        Returns:
        this
        Since:
        0.9.0
      • getDefaultCharset

        public java.lang.String getDefaultCharset()
        The default charset, used if charset detection fails.
        Returns:
        the default charset
        Since:
        0.9.0
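
        The order of precedence described for setCharset and setDefaultCharset can be sketched as follows. This is a minimal illustration, not WebMagic's detection code: CharsetResolution and both of its methods are hypothetical names, and detection is assumed to look only at the charset parameter of a Content-Type header value.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; not part of WebMagic.
final class CharsetResolution {
    private CharsetResolution() {}

    // Extracts the charset parameter from a Content-Type value, or returns null if absent.
    static String charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    // Precedence: manually set charset, then detected charset, then default charset.
    static String resolve(String manualCharset, String contentType, String defaultCharset) {
        if (manualCharset != null) {
            return manualCharset;
        }
        String detected = charsetFromContentType(contentType);
        return detected != null ? detected : defaultCharset;
    }
}
```

        In this sketch the default charset is consulted only when no charset was set manually and none could be read from the header.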
      • getTimeOut

        public int getTimeOut()
      • setTimeOut

        public Site setTimeOut​(int timeOut)
        Set the timeout for the downloader, in milliseconds.
        Parameters:
        timeOut - the timeout in milliseconds
        Returns:
        this
      • setAcceptStatCode

        public Site setAcceptStatCode​(java.util.Set<java.lang.Integer> acceptStatCode)
        Set the accepted HTTP status codes.
        A response is processed only when its status code is in acceptStatCode.
        {200} by default.
        Setting it is optional.
        Parameters:
        acceptStatCode - the set of status codes to accept
        Returns:
        this
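
        The rule described above amounts to a set membership test. The sketch below shows it in isolation; StatusFilter and shouldProcess are hypothetical names chosen for illustration and are not part of WebMagic.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration only; not part of WebMagic.
final class StatusFilter {
    // Mirrors the documented default of {200}.
    static final Set<Integer> DEFAULT_ACCEPT = Collections.singleton(200);

    private StatusFilter() {}

    // A response is handed on for processing only if its status code is in the accepted set.
    static boolean shouldProcess(Set<Integer> acceptStatCode, int statusCode) {
        return acceptStatCode.contains(statusCode);
    }

    // Convenience for building a custom set, for example to also process 404 pages.
    static Set<Integer> accept(Integer... codes) {
        return new HashSet<>(Arrays.asList(codes));
    }
}
```

        With the default set, a 404 response is not processed; passing StatusFilter.accept(200, 404) instead would let it through.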
      • getAcceptStatCode

        public java.util.Set<java.lang.Integer> getAcceptStatCode()
        Get the set of accepted HTTP status codes.
        Returns:
        the accepted status codes
      • setSleepTime

        public Site setSleepTime​(int sleepTime)
        Set the interval between the processing of two pages.
        Time unit is milliseconds.
        Parameters:
        sleepTime - the interval in milliseconds
        Returns:
        this
      • getSleepTime

        public int getSleepTime()
        Get the interval between the processing of two pages.
        Time unit is milliseconds.
        Returns:
        the interval between the processing of two pages, in milliseconds
      • getRetryTimes

        public int getRetryTimes()
        Get the number of immediate retries when a download fails; 0 by default.
        Returns:
        the number of retries when a download fails
      • getHeaders

        public java.util.Map<java.lang.String,​java.lang.String> getHeaders()
      • setRetryTimes

        public Site setRetryTimes​(int retryTimes)
        Set the number of retries when a download fails; 0 by default.
        Parameters:
        retryTimes - the number of retries
        Returns:
        this
      • getCycleRetryTimes

        public int getCycleRetryTimes()
        When cycleRetryTimes is greater than 0, a failed request is added back to the scheduler and downloaded again.
        Returns:
        the number of cycle retries when a download fails
      • setCycleRetryTimes

        public Site setCycleRetryTimes​(int cycleRetryTimes)
        Set the number of cycle retries when a download fails; 0 by default.
        Parameters:
        cycleRetryTimes - the number of cycle retries
        Returns:
        this
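
        The difference between a cycle retry and an immediate retry is that the failed request goes back into the queue instead of being downloaded again on the spot. The sketch below models that idea with a plain deque. It rests on assumptions: CycleRetrySketch, Request and crawl are hypothetical names, the deque stands in for the scheduler, and cycleRetryTimes is taken to mean the number of re-queues allowed after the first attempt, which may differ from WebMagic's exact counting.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// Hypothetical model for illustration only; not WebMagic's scheduler or Spider.
final class CycleRetrySketch {
    // A URL plus the number of times it has already been re-queued.
    static final class Request {
        final String url;
        final int cycleTried;

        Request(String url, int cycleTried) {
            this.url = url;
            this.cycleTried = cycleTried;
        }
    }

    private CycleRetrySketch() {}

    // Drains the queue. A failed request is appended to the tail again until it has been
    // re-queued cycleRetryTimes times. Returns the total number of download attempts.
    static int crawl(Deque<Request> queue, Predicate<String> download, int cycleRetryTimes) {
        int attempts = 0;
        while (!queue.isEmpty()) {
            Request request = queue.pollFirst();
            attempts++;
            boolean succeeded = download.test(request.url);
            if (!succeeded && request.cycleTried < cycleRetryTimes) {
                queue.addLast(new Request(request.url, request.cycleTried + 1));
            }
        }
        return attempts;
    }

    // Small helper to build a queue holding a single fresh request.
    static Deque<Request> queueOf(String url) {
        Deque<Request> queue = new ArrayDeque<>();
        queue.add(new Request(url, 0));
        return queue;
    }
}
```

        Because the retry is appended to the tail, other queued requests are attempted before the failed one is tried again.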
      • isUseGzip

        public boolean isUseGzip()
      • getRetrySleepTime

        public int getRetrySleepTime()
      • setRetrySleepTime

        public Site setRetrySleepTime​(int retrySleepTime)
        Set the sleep time between retries when a download fails; 1000 by default.
        Parameters:
        retrySleepTime - the sleep time between retries
        Returns:
        this
      • setUseGzip

        public Site setUseGzip​(boolean useGzip)
        Set whether to use gzip.
        The default is true; set it to false to disable gzip.
        Parameters:
        useGzip - true to use gzip, false to disable it
        Returns:
        this
      • isDisableCookieManagement

        public boolean isDisableCookieManagement()
      • setDisableCookieManagement

        public Site setDisableCookieManagement​(boolean disableCookieManagement)
        The downloader is expected to store response cookies. Set this to true to ignore all cookie fields and keep requests clean.
        Warning: setting cookies will NOT work while disableCookieManagement is true.
        Parameters:
        disableCookieManagement - true to disable cookie management
        Returns:
        this
      • toTask

        public Task toTask()
      • equals

        public boolean equals​(java.lang.Object o)
        Overrides:
        equals in class java.lang.Object
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class java.lang.Object
      • toString

        public java.lang.String toString()
        Overrides:
        toString in class java.lang.Object