Class Site

java.lang.Object
us.codecraft.webmagic.Site

public class Site extends Object
Object containing settings for the crawler.
Since:
0.1.0
Author:
code4crafter@gmail.com
  • Constructor Details

    • Site

      public Site()
  • Method Details

    • me

      public static Site me()
      Create a new Site.
      Returns:
      a new Site
    • addCookie

      public Site addCookie(String name, String value)
      Add a cookie for the domain returned by getDomain().
      Parameters:
      name - cookie name
      value - cookie value
      Returns:
      this
    • addCookie

      public Site addCookie(String domain, String name, String value)
      Add a cookie with specific domain.
      Parameters:
      domain - domain the cookie belongs to
      name - cookie name
      value - cookie value
      Returns:
      this
    • setUserAgent

      public Site setUserAgent(String userAgent)
      Set the user agent.
      Parameters:
      userAgent - value sent as the User-Agent header
      Returns:
      this
    • getCookies

      public Map<String,String> getCookies()
      Get the cookies for the site's own domain, i.e. those added with addCookie(String, String).
      Returns:
      cookies as a name-to-value map
    • getAllCookies

      public Map<String,Map<String,String>> getAllCookies()
      Get the cookies of all domains.
      Returns:
      cookies grouped by domain, each group a name-to-value map
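The cookie methods above describe two collections: cookies for the site's own domain, and cookies stored per explicit domain. The stdlib-only sketch below models that split with plain maps so the shapes returned by getCookies() and getAllCookies() are easy to see. SiteCookiesSketch is a hypothetical class, not WebMagic's Site, and keeping the two collections separate is an assumption drawn from the documentation above.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not WebMagic's Site: models the two cookie collections
// described by addCookie(name, value) and addCookie(domain, name, value).
class SiteCookiesSketch {
    // Cookies added without a domain; assumed to apply to the site's own domain.
    private final Map<String, String> defaultCookies = new LinkedHashMap<>();
    // Cookies added with an explicit domain: domain -> (name -> value).
    private final Map<String, Map<String, String>> cookiesByDomain = new HashMap<>();

    SiteCookiesSketch addCookie(String name, String value) {
        defaultCookies.put(name, value);
        return this; // fluent, like Site
    }

    SiteCookiesSketch addCookie(String domain, String name, String value) {
        cookiesByDomain.computeIfAbsent(domain, d -> new HashMap<>()).put(name, value);
        return this;
    }

    Map<String, String> getCookies() {
        return defaultCookies;
    }

    Map<String, Map<String, String>> getAllCookies() {
        return cookiesByDomain;
    }

    public static void main(String[] args) {
        SiteCookiesSketch site = new SiteCookiesSketch()
                .addCookie("session", "abc")
                .addCookie("example.com", "lang", "en")
                .addCookie("example.com", "theme", "dark");
        System.out.println(site.getCookies());                              // {session=abc}
        System.out.println(site.getAllCookies().get("example.com").size()); // 2
    }
}
```

Because each mutator returns this, calls can be chained in the same fluent style that Site uses.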
    • getUserAgent

      public String getUserAgent()
      Get the user agent.
      Returns:
      user agent
    • getDomain

      public String getDomain()
      Get the domain of the site.
      Returns:
      the domain
    • setDomain

      public Site setDomain(String domain)
      Set the domain of the site.
      Parameters:
      domain - domain
      Returns:
      this
    • setCharset

      public Site setCharset(String charset)
      Set the charset of pages manually.
      When the charset is not set, or is set to null, it can be detected automatically from the HTTP header.
      Parameters:
      charset - charset
      Returns:
      this
    • getCharset

      public String getCharset()
      Get the charset that was set manually.
      Returns:
      charset
    • setDefaultCharset

      public Site setDefaultCharset(String defaultCharset)
      Set the default charset of pages. It is used when charset detection fails.
      Parameters:
      defaultCharset - the default charset
      Returns:
      this
      Since:
      0.9.0
    • getDefaultCharset

      public String getDefaultCharset()
      Get the default charset used when charset detection fails.
      Returns:
      the default charset
      Since:
      0.9.0
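The charset settings combine into a fallback order: a manually set charset first, then detection, then the default charset. A minimal sketch of that order, assuming detection looks only at the charset parameter of the Content-Type header (real detection may consult other sources); CharsetResolutionSketch and its methods are hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical sketch of the charset fallback order: a manually set charset,
// then the charset parameter of the Content-Type header, then the default charset.
class CharsetResolutionSketch {

    static String resolveCharset(String manualCharset, String contentType, String defaultCharset) {
        if (manualCharset != null) {
            return manualCharset; // setCharset(...) wins; no detection needed
        }
        String detected = charsetFromContentType(contentType);
        return detected != null ? detected : defaultCharset; // detection failed: use the default
    }

    // Extracts the charset from a header value such as "text/html; charset=UTF-8".
    static String charsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).replace("\"", "").trim();
                try {
                    return Charset.isSupported(name) ? name : null;
                } catch (IllegalArgumentException e) {
                    return null; // illegal charset name, e.g. an empty one
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(resolveCharset(null, "text/html; charset=ISO-8859-1", "UTF-8"));       // ISO-8859-1
        System.out.println(resolveCharset("US-ASCII", "text/html; charset=ISO-8859-1", "UTF-8")); // US-ASCII
        System.out.println(resolveCharset(null, "text/html", "UTF-8"));                           // UTF-8
    }
}
```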
    • getTimeOut

      public int getTimeOut()
      Get the timeout for the downloader, in milliseconds.
    • setTimeOut

      public Site setTimeOut(int timeOut)
      Set the timeout for the downloader, in milliseconds.
      Parameters:
      timeOut - timeout in milliseconds
      Returns:
      this
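Because the timeout is given in milliseconds, it maps directly onto the millisecond-based timeouts of the JDK's java.net.HttpURLConnection. The sketch below assumes one value covers both the connect and the read timeout; TimeoutSketch is hypothetical, and WebMagic's own downloader may apply timeOut differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical sketch: applying a Site-style timeOut (milliseconds) to a JDK
// HttpURLConnection. This is not how WebMagic's own downloader is implemented.
class TimeoutSketch {

    static HttpURLConnection openWithTimeout(String url, int timeOutMs) {
        try {
            // openConnection() only creates the connection object; nothing is sent yet.
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setConnectTimeout(timeOutMs); // limit for establishing the connection
            conn.setReadTimeout(timeOutMs);    // limit for waiting on response data
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HttpURLConnection conn = openWithTimeout("http://example.com/", 5000);
        System.out.println(conn.getConnectTimeout() + " " + conn.getReadTimeout()); // 5000 5000
    }
}
```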
    • setAcceptStatCode

      public Site setAcceptStatCode(Set<Integer> acceptStatCode)
      Set the accepted HTTP status codes.
      When the status code of an HTTP response is in acceptStatCode, the response will be processed.
      Defaults to {200}.
      Setting it is optional.
      Parameters:
      acceptStatCode - acceptStatCode
      Returns:
      this
    • getAcceptStatCode

      public Set<Integer> getAcceptStatCode()
      Get the accepted HTTP status codes.
      Returns:
      acceptStatCode
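The acceptStatCode rule amounts to a set-membership test on the response status. A minimal sketch, with AcceptStatCodeSketch and isAccepted as hypothetical names:

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical sketch of the acceptStatCode check: a response is handed on for
// processing only when its status code is in the configured set.
class AcceptStatCodeSketch {

    // Mirrors the documented default of {200}.
    static final Set<Integer> DEFAULT_ACCEPT_STAT_CODE = Collections.singleton(200);

    static boolean isAccepted(Set<Integer> acceptStatCode, int statusCode) {
        return acceptStatCode.contains(statusCode);
    }

    public static void main(String[] args) {
        System.out.println(isAccepted(DEFAULT_ACCEPT_STAT_CODE, 200)); // true
        System.out.println(isAccepted(DEFAULT_ACCEPT_STAT_CODE, 404)); // false
        // Also accepting 404 lets "not found" pages reach processing.
        System.out.println(isAccepted(Set.of(200, 404), 404));        // true
    }
}
```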
    • setSleepTime

      public Site setSleepTime(int sleepTime)
      Set the interval between the processing of two pages.
      Time unit is milliseconds.
      Parameters:
      sleepTime - interval in milliseconds
      Returns:
      this
    • getSleepTime

      public int getSleepTime()
      Get the interval between the processing of two pages.
      Time unit is milliseconds.
      Returns:
      the interval between the processing of two pages, in milliseconds
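A single-threaded sketch of the sleepTime interval, assuming the pause falls between consecutive pages only; SleepTimeSketch and processAll are hypothetical and say nothing about how WebMagic schedules pauses across threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical single-threaded sketch: wait sleepTime milliseconds between
// processing two consecutive pages.
class SleepTimeSketch {

    static void processAll(List<String> urls, int sleepTimeMs, Consumer<String> processor) {
        for (int i = 0; i < urls.size(); i++) {
            processor.accept(urls.get(i));
            if (i < urls.size() - 1) { // no pause needed after the last page
                try {
                    Thread.sleep(sleepTimeMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag and stop
                    return;
                }
            }
        }
    }

    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        long start = System.nanoTime();
        processAll(List.of("/a", "/b", "/c"), 100, seen::add);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(seen + " in about " + elapsedMs + " ms"); // two 100 ms pauses
    }
}
```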
    • getRetryTimes

      public int getRetryTimes()
      Get the number of times a failed download is retried immediately; 0 by default.
      Returns:
      the number of immediate retries when a download fails
    • getHeaders

      public Map<String,String> getHeaders()
      Get the HTTP headers added with addHeader(String, String).
    • addHeader

      public Site addHeader(String key, String value)
      Put an HTTP header for the downloader.
      Use addCookie(String, String) for cookies and setUserAgent(String) for the user agent.
      Parameters:
      key - name of the HTTP header; constants for some common names are defined in HttpConstant.Header
      value - value of the header
      Returns:
      this
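The note above keeps the user agent and cookies out of the generic header map. As an illustration only, the sketch below copies a user agent and a header map onto a JDK HttpURLConnection without connecting; HeaderSketch and prepare are hypothetical, and "Referer" is just an example key.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: how a downloader might apply Site-style headers.
class HeaderSketch {

    static HttpURLConnection prepare(String url, String userAgent, Map<String, String> headers) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            // The user agent is configured separately (setUserAgent) and sent as its own header.
            if (userAgent != null) {
                conn.setRequestProperty("User-Agent", userAgent);
            }
            // Headers added via addHeader(key, value) are copied as-is.
            headers.forEach(conn::setRequestProperty);
            return conn; // not connected yet
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Referer", "http://example.com/");
        HttpURLConnection conn = prepare("http://example.com/page", "MyCrawler/1.0", headers);
        System.out.println(conn.getRequestProperty("User-Agent")); // MyCrawler/1.0
        System.out.println(conn.getRequestProperty("Referer"));    // http://example.com/
    }
}
```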
    • setRetryTimes

      public Site setRetryTimes(int retryTimes)
      Set the number of times a failed download is retried immediately; 0 by default.
      Parameters:
      retryTimes - number of immediate retries
      Returns:
      this
    • getCycleRetryTimes

      public int getCycleRetryTimes()
      Get cycleRetryTimes. When cycleRetryTimes is greater than 0, a request whose download fails is added back to the scheduler and downloaded again.
      Returns:
      the number of cycle retries when a download fails
    • setCycleRetryTimes

      public Site setCycleRetryTimes(int cycleRetryTimes)
      Set cycleRetryTimes, the number of times a failed request is added back to the scheduler; 0 by default.
      Parameters:
      cycleRetryTimes - number of cycle retries
      Returns:
      this
    • isUseGzip

      public boolean isUseGzip()
      Whether gzip is used; true by default.
    • getRetrySleepTime

      public int getRetrySleepTime()
      Get the sleep time, in milliseconds, before a failed download is retried.
    • setRetrySleepTime

      public Site setRetrySleepTime(int retrySleepTime)
      Set the sleep time, in milliseconds, before a failed download is retried; 1000 by default.
      Parameters:
      retrySleepTime - sleep time in milliseconds
      Returns:
      this
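The retry settings work at two levels: retryTimes repeats a failed download immediately, while cycleRetryTimes puts the request back into the scheduler, with retrySleepTime as the pause before it comes round again. The single-threaded sketch below models that reading with an ArrayDeque standing in for the scheduler; RetrySketch, Request and crawl are hypothetical, and WebMagic's spider and downloader may divide the work differently.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

// Hypothetical sketch of two retry levels: immediate retries (retryTimes) and
// re-queuing into a scheduler (cycleRetryTimes), pausing retrySleepTime in between.
class RetrySketch {

    // A queued request and how many times it has already been sent back to the scheduler.
    static final class Request {
        final String url;
        final int cycleTried;

        Request(String url, int cycleTried) {
            this.url = url;
            this.cycleTried = cycleTried;
        }
    }

    /** Returns true if the URL was eventually downloaded, false if all retries were used up. */
    static boolean crawl(String url, int retryTimes, int cycleRetryTimes, int retrySleepTimeMs,
                         Predicate<String> download) {
        Deque<Request> scheduler = new ArrayDeque<>();
        scheduler.add(new Request(url, 0));
        while (!scheduler.isEmpty()) {
            Request req = scheduler.poll();
            // One initial attempt plus up to retryTimes immediate retries.
            for (int attempt = 0; attempt <= retryTimes; attempt++) {
                if (download.test(req.url)) {
                    return true;
                }
            }
            if (req.cycleTried < cycleRetryTimes) {
                try {
                    Thread.sleep(retrySleepTimeMs); // pause before the request is retried
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
                scheduler.add(new Request(req.url, req.cycleTried + 1)); // cycle retry
            }
        }
        return false; // retryTimes and cycleRetryTimes exhausted
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Fails four times, then succeeds; retryTimes = 1 gives two attempts per round.
        boolean ok = crawl("http://example.com/", 1, 2, 100, u -> ++calls[0] >= 5);
        System.out.println(ok + " after " + calls[0] + " attempts"); // true after 5 attempts
    }
}
```

With retryTimes = 1 each scheduler round makes two attempts, so cycleRetryTimes = 2 allows at most six attempts in total.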
    • setUseGzip

      public Site setUseGzip(boolean useGzip)
      Set whether to use gzip.
      Defaults to true; set it to false to disable gzip.
      Parameters:
      useGzip - true to use gzip
      Returns:
      this
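Using gzip means the client advertises "Accept-Encoding: gzip" and must decompress bodies sent with "Content-Encoding: gzip". The sketch below shows only the compression round trip with java.util.zip; GzipSketch is hypothetical and stands in for work an HTTP client normally does transparently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: compressing a body as a server might, and decoding it
// according to its Content-Encoding as a client must.
class GzipSketch {

    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray(); // stream is closed, so the gzip trailer is written
    }

    static byte[] decodeBody(byte[] body, String contentEncoding) {
        if (!"gzip".equalsIgnoreCase(contentEncoding)) {
            return body; // not compressed
        }
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(body))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] html = "<html>hello</html>".getBytes(StandardCharsets.UTF_8);
        byte[] onWire = gzip(html);
        String decoded = new String(decodeBody(onWire, "gzip"), StandardCharsets.UTF_8);
        System.out.println(decoded); // <html>hello</html>
    }
}
```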
    • isDisableCookieManagement

      public boolean isDisableCookieManagement()
      Whether cookie management is disabled.
    • setDisableCookieManagement

      public Site setDisableCookieManagement(boolean disableCookieManagement)
      By default, the downloader stores cookies from responses. Disable cookie management to ignore all cookie fields and keep requests free of cookies. Warning: cookies set with addCookie will NOT work either when disableCookieManagement is true.
      Parameters:
      disableCookieManagement - true to disable cookie management
      Returns:
      this
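The JDK's java.net.CookieManager offers a comparable switch through CookiePolicy. The sketch below offers one Set-Cookie response header to a CookieManager and counts what was stored; CookieManagementSketch is hypothetical, and WebMagic's downloader may implement disableCookieManagement in another way.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.util.List;
import java.util.Map;

// Hypothetical sketch using the JDK's CookieManager: with cookie management
// disabled (ACCEPT_NONE), Set-Cookie response headers are not stored.
class CookieManagementSketch {

    static int storedCookiesAfterResponse(boolean disableCookieManagement) {
        CookieManager manager = new CookieManager(null, // null: default in-memory store
                disableCookieManagement ? CookiePolicy.ACCEPT_NONE : CookiePolicy.ACCEPT_ALL);
        URI uri = URI.create("http://example.com/");
        Map<String, List<String>> responseHeaders =
                Map.of("Set-Cookie", List.of("session=abc; Path=/"));
        try {
            manager.put(uri, responseHeaders); // offer the response's Set-Cookie headers
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return manager.getCookieStore().getCookies().size();
    }

    public static void main(String[] args) {
        System.out.println(storedCookiesAfterResponse(false)); // 1
        System.out.println(storedCookiesAfterResponse(true));  // 0
    }
}
```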
    • toTask

      public Task toTask()
    • equals

      public boolean equals(Object o)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object
    • toString

      public String toString()
      Overrides:
      toString in class Object