Package us.codecraft.webmagic
Class Site
java.lang.Object
  us.codecraft.webmagic.Site
public class Site extends java.lang.Object
Object containing the settings for a crawler.
Since:
  0.1.0
Author:
  code4crafter@gmail.com
See Also:
  PageProcessor
Constructor Summary
Constructor  Description
Site()
Method Summary
Modifier and Type  Method and Description
Site  addCookie(java.lang.String name, java.lang.String value)
  Add a cookie with the domain returned by getDomain().
Site  addCookie(java.lang.String domain, java.lang.String name, java.lang.String value)
  Add a cookie for a specific domain.
Site  addHeader(java.lang.String key, java.lang.String value)
  Put an HTTP header for the downloader.
boolean  equals(java.lang.Object o)
java.util.Set<java.lang.Integer>  getAcceptStatCode()
  Get acceptStatCode.
java.util.Map<java.lang.String,java.util.Map<java.lang.String,java.lang.String>>  getAllCookies()
  Get the cookies of all domains.
java.lang.String  getCharset()
  Get the manually set charset.
java.util.Map<java.lang.String,java.lang.String>  getCookies()
  Get cookies.
int  getCycleRetryTimes()
  When cycleRetryTimes is greater than 0, a failed request is added back to the scheduler and downloaded again.
java.lang.String  getDefaultCharset()
  The default charset, used when charset detection fails.
java.lang.String  getDomain()
  Get the domain.
java.util.Map<java.lang.String,java.lang.String>  getHeaders()
int  getRetrySleepTime()
int  getRetryTimes()
  Get the number of immediate retries when a download fails, 0 by default.
int  getSleepTime()
  Get the interval between processing two pages. Time unit is milliseconds.
int  getTimeOut()
java.lang.String  getUserAgent()
  Get the user agent.
int  hashCode()
boolean  isDisableCookieManagement()
boolean  isUseGzip()
static Site  me()
  Create a new Site.
Site  setAcceptStatCode(java.util.Set<java.lang.Integer> acceptStatCode)
  Set acceptStatCode. When the status code of an HTTP response is in acceptStatCode, the response is processed. {200} by default. Setting it is optional.
Site  setCharset(java.lang.String charset)
  Set the page charset manually. When the charset is not set, or is set to null, it can be auto-detected from the HTTP header.
Site  setCycleRetryTimes(int cycleRetryTimes)
  Set cycleRetryTimes (retries through the scheduler) when a download fails, 0 by default.
Site  setDefaultCharset(java.lang.String defaultCharset)
  Set the default page charset.
Site  setDisableCookieManagement(boolean disableCookieManagement)
  By default, the downloader stores response cookies; this switch disables that.
Site  setDomain(java.lang.String domain)
  Set the domain of the site.
Site  setRetrySleepTime(int retrySleepTime)
  Set the retry sleep time when a download fails, 1000 by default.
Site  setRetryTimes(int retryTimes)
  Set the number of retries when a download fails, 0 by default.
Site  setSleepTime(int sleepTime)
  Set the interval between processing two pages. Time unit is milliseconds.
Site  setTimeOut(int timeOut)
  Set the downloader timeout, in milliseconds.
Site  setUseGzip(boolean useGzip)
  Whether to use gzip.
Site  setUserAgent(java.lang.String userAgent)
  Set the user agent.
java.lang.String  toString()
Task  toTask()
Method Detail
me
public static Site me()
Create a new Site.
Returns:
  a new Site
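Every setter on this page returns this, so a Site is typically configured as one chained expression starting from me(). The following minimal sketch illustrates that fluent-setter pattern with a hypothetical FluentSettings class; it is a stand-in written for this example, not WebMagic's Site, and it compiles with the JDK alone.

```java
// Hypothetical stand-in illustrating the fluent-setter style used by Site.
// This is NOT WebMagic's implementation.
class FluentSettings {
    private String domain;
    private int sleepTime;
    private int retryTimes;

    // Static factory, analogous in spirit to Site.me().
    static FluentSettings me() {
        return new FluentSettings();
    }

    // Each setter updates this instance and returns it, so calls can be chained.
    FluentSettings setDomain(String domain) {
        this.domain = domain;
        return this;
    }

    FluentSettings setSleepTime(int sleepTime) {
        this.sleepTime = sleepTime;
        return this;
    }

    FluentSettings setRetryTimes(int retryTimes) {
        this.retryTimes = retryTimes;
        return this;
    }

    String getDomain() { return domain; }
    int getSleepTime() { return sleepTime; }
    int getRetryTimes() { return retryTimes; }
}
```

With the real class, the same configuration would be written as Site.me().setDomain("example.com").setSleepTime(1000).setRetryTimes(3), using only methods listed on this page.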
addCookie
public Site addCookie(java.lang.String name, java.lang.String value)
Add a cookie with the domain returned by getDomain().
Parameters:
  name - the cookie name
  value - the cookie value
Returns:
  this
addCookie
public Site addCookie(java.lang.String domain, java.lang.String name, java.lang.String value)
Add a cookie for a specific domain.
Parameters:
  domain - the domain the cookie belongs to
  name - the cookie name
  value - the cookie value
Returns:
  this
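The return type of getAllCookies(), Map<String, Map<String, String>>, suggests cookies keyed first by domain and then by name. The hypothetical CookieJarSketch below models the two addCookie variants on that shape. This page does not say whether the real two-argument addCookie stores into the same map as the three-argument one, so treat this only as an illustration of the documented behavior.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of per-domain cookie storage; NOT WebMagic's code.
class CookieJarSketch {
    private String domain;
    // Outer key: domain; inner map: cookie name -> cookie value.
    private final Map<String, Map<String, String>> cookies = new HashMap<>();

    CookieJarSketch setDomain(String domain) {
        this.domain = domain;
        return this;
    }

    // Two-argument form: file the cookie under the configured domain.
    CookieJarSketch addCookie(String name, String value) {
        return addCookie(domain, name, value);
    }

    // Three-argument form: file the cookie under an explicit domain,
    // creating that domain's inner map on first use.
    CookieJarSketch addCookie(String domain, String name, String value) {
        cookies.computeIfAbsent(domain, d -> new HashMap<>()).put(name, value);
        return this;
    }

    // Cookies of the configured domain only (empty map if there are none).
    Map<String, String> getCookies() {
        return cookies.getOrDefault(domain, Map.of());
    }

    // Cookies of every domain, with the same shape as Site.getAllCookies().
    Map<String, Map<String, String>> getAllCookies() {
        return cookies;
    }
}
```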
setUserAgent
public Site setUserAgent(java.lang.String userAgent)
Set the user agent.
Parameters:
  userAgent - the user agent
Returns:
  this
getCookies
public java.util.Map<java.lang.String,java.lang.String> getCookies()
Get cookies.
Returns:
  the cookies
getAllCookies
public java.util.Map<java.lang.String,java.util.Map<java.lang.String,java.lang.String>> getAllCookies()
Get the cookies of all domains.
Returns:
  the cookies of all domains
getUserAgent
public java.lang.String getUserAgent()
Get the user agent.
Returns:
  the user agent
getDomain
public java.lang.String getDomain()
Get the domain.
Returns:
  the domain
setDomain
public Site setDomain(java.lang.String domain)
Set the domain of the site.
Parameters:
  domain - the domain
Returns:
  this
setCharset
public Site setCharset(java.lang.String charset)
Set the page charset manually.
When the charset is not set, or is set to null, it can be auto-detected from the HTTP header.
Parameters:
  charset - the charset
Returns:
  this
getCharset
public java.lang.String getCharset()
Get the manually set charset.
Returns:
  the charset
setDefaultCharset
public Site setDefaultCharset(java.lang.String defaultCharset)
Set the default page charset. When charset detection fails, this default charset is used.
Parameters:
  defaultCharset - the default charset
Returns:
  this
Since:
  0.9.0
getDefaultCharset
public java.lang.String getDefaultCharset()
The default charset, used when charset detection fails.
Returns:
  the default charset
Since:
  0.9.0
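Taken together, setCharset and setDefaultCharset describe a fallback order: a manually set charset wins; otherwise the charset is detected from the HTTP header; if detection fails, the default charset is used. The hypothetical CharsetResolver below sketches that order by reading only the charset parameter of a Content-Type header. WebMagic's own detection may consult other sources as well; this is not its code.

```java
import java.util.Locale;

// Hypothetical helper illustrating the documented fallback order:
// manual charset -> charset from the Content-Type header -> default charset.
class CharsetResolver {
    static String resolve(String manualCharset, String contentTypeHeader, String defaultCharset) {
        if (manualCharset != null) {
            return manualCharset;                  // set manually (setCharset)
        }
        String detected = fromContentType(contentTypeHeader);
        if (detected != null) {
            return detected;                       // detected from the HTTP header
        }
        return defaultCharset;                     // detection failed (setDefaultCharset)
    }

    // Extracts the "charset=" parameter, e.g. from "text/html; charset=UTF-8".
    static String fromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding quotes: charset="utf-8"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }
}
```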
getTimeOut
public int getTimeOut()
setTimeOut
public Site setTimeOut(int timeOut)
Set the timeout for the downloader, in milliseconds.
Parameters:
  timeOut - the timeout in milliseconds
Returns:
  this
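Because setTimeOut takes plain milliseconds, it may help to see how such a value maps onto an HTTP client. The sketch below applies one to the JDK's own java.net.http client (Java 11+) as an assumption-laden illustration; WebMagic's downloader is a different client, and this page does not say whether the value limits connecting, reading, or both.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.time.Duration;

// Illustration only: applies a millisecond timeout (like the value passed to
// Site.setTimeOut) to the JDK's java.net.http client, not WebMagic's downloader.
// Both builders reject non-positive durations with IllegalArgumentException.
class TimeoutExample {
    static HttpClient clientWithTimeout(int timeOutMs) {
        return HttpClient.newBuilder()
                .connectTimeout(Duration.ofMillis(timeOutMs)) // limit on establishing a connection
                .build();
    }

    static HttpRequest requestWithTimeout(String url, int timeOutMs) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofMillis(timeOutMs))        // limit on waiting for the response
                .GET()
                .build();
    }
}
```

Building the client and request performs no network I/O; the timeouts only take effect when a request is sent.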
setAcceptStatCode
public Site setAcceptStatCode(java.util.Set<java.lang.Integer> acceptStatCode)
Set acceptStatCode.
When the status code of an HTTP response is in acceptStatCode, the response is processed.
{200} by default.
Setting it is optional.
Parameters:
  acceptStatCode - the accepted HTTP status codes
Returns:
  this
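The acceptStatCode rule amounts to a set-membership test on the response status. The hypothetical StatusFilter below sketches it, falling back to the documented default of {200} when no set is given; it is not WebMagic's code.

```java
import java.util.Set;

// Hypothetical check mirroring the documented rule: a response is processed
// only when its status code is in the accepted set ({200} by default).
class StatusFilter {
    static final Set<Integer> DEFAULT_ACCEPT = Set.of(200);

    static boolean shouldProcess(int statusCode, Set<Integer> acceptStatCode) {
        Set<Integer> accepted = (acceptStatCode == null) ? DEFAULT_ACCEPT : acceptStatCode;
        return accepted.contains(statusCode);
    }
}
```

On the real class, accepting 404 pages as well would mean passing a set such as new HashSet<>(Arrays.asList(200, 404)) to setAcceptStatCode.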
getAcceptStatCode
public java.util.Set<java.lang.Integer> getAcceptStatCode()
Get acceptStatCode.
Returns:
  acceptStatCode
setSleepTime
public Site setSleepTime(int sleepTime)
Set the interval between processing two pages.
Time unit is milliseconds.
Parameters:
  sleepTime - the interval in milliseconds
Returns:
  this
getSleepTime
public int getSleepTime()
Get the interval between processing two pages.
Time unit is milliseconds.
Returns:
  the interval between processing two pages
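The sleep time is a politeness delay: a pause of sleepTime milliseconds between consecutive pages. The hypothetical PoliteLoop below shows that idea in a single-threaded loop. It is not WebMagic's Spider, whose threading and scheduling this page does not describe.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical single-threaded loop showing what a per-page interval means:
// wait sleepTimeMs between processing consecutive pages. Not WebMagic's Spider.
class PoliteLoop {
    static void processAll(List<String> urls, int sleepTimeMs, Consumer<String> processor) {
        for (int i = 0; i < urls.size(); i++) {
            processor.accept(urls.get(i));
            if (i < urls.size() - 1) {          // no pause after the last page
                try {
                    Thread.sleep(sleepTimeMs);  // pause before the next page
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return;
                }
            }
        }
    }
}
```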
getRetryTimes
public int getRetryTimes()
Get the number of immediate retries when a download fails, 0 by default.
Returns:
  the number of retries when a download fails
getHeaders
public java.util.Map<java.lang.String,java.lang.String> getHeaders()
addHeader
public Site addHeader(java.lang.String key, java.lang.String value)
Put an HTTP header for the downloader.
Use addCookie(String, String) for cookies and setUserAgent(String) for the user agent.
Parameters:
  key - the HTTP header name; some header names are defined as constants in HttpConstant.Header
  value - the header value
Returns:
  this
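addHeader, setUserAgent and addCookie all end up as request headers, which is why the page routes cookies and the user agent to their dedicated setters. The hypothetical RequestHeaders helper below sketches how the three inputs could be combined into one header map, serializing cookies in the name=value; name=value form of the HTTP Cookie header (RFC 6265). How WebMagic's downloader actually merges them is not described here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical assembly of outgoing request headers from generic headers
// (addHeader), the user agent (setUserAgent) and cookies (addCookie).
// Not WebMagic's downloader code.
class RequestHeaders {
    static Map<String, String> build(Map<String, String> headers, String userAgent,
                                     Map<String, String> cookies) {
        Map<String, String> result = new LinkedHashMap<>(headers);
        if (userAgent != null) {
            result.put("User-Agent", userAgent);
        }
        if (!cookies.isEmpty()) {
            // Cookie header: name=value pairs separated by "; "
            StringJoiner joiner = new StringJoiner("; ");
            cookies.forEach((name, value) -> joiner.add(name + "=" + value));
            result.put("Cookie", joiner.toString());
        }
        return result;
    }
}
```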
setRetryTimes
public Site setRetryTimes(int retryTimes)
Set the number of retries when a download fails, 0 by default.
Parameters:
  retryTimes - the number of retries
Returns:
  this
getCycleRetryTimes
public int getCycleRetryTimes()
When cycleRetryTimes is greater than 0, a failed request is added back to the scheduler and downloaded again.
Returns:
  the number of cycle retries when a download fails
setCycleRetryTimes
public Site setCycleRetryTimes(int cycleRetryTimes)
Set cycleRetryTimes (retries through the scheduler) when a download fails, 0 by default.
Parameters:
  cycleRetryTimes - the number of cycle retries
Returns:
  this
isUseGzip
public boolean isUseGzip()
getRetrySleepTime
public int getRetrySleepTime()
setRetrySleepTime
public Site setRetrySleepTime(int retrySleepTime)
Set the retry sleep time when a download fails, 1000 by default.
Parameters:
  retrySleepTime - the retry sleep time
Returns:
  this
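The page describes two retry layers: retryTimes counts extra attempts made immediately, while cycleRetryTimes counts how often a failed request is added back to the scheduler. The hypothetical RetryModel below counts download attempts under that reading, using a plain queue in place of a scheduler. retrySleepTime is left out so the sketch runs instantly. This is an interpretation of the descriptions above, not WebMagic's retry code.

```java
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical model of the two retry layers described on this page:
//   retryTimes      - extra attempts made immediately after a failure;
//   cycleRetryTimes - how many times a failed URL is put back into the queue.
// The retry sleep (retrySleepTime) is omitted to keep the sketch fast.
class RetryModel {
    // Returns the total number of download attempts made for all URLs.
    static int crawl(Deque<String> queue, int retryTimes, int cycleRetryTimes,
                     Predicate<String> download) {
        Map<String, Integer> cyclesUsed = new HashMap<>();
        int attempts = 0;
        while (!queue.isEmpty()) {
            String url = queue.poll();
            boolean ok = false;
            // One initial attempt plus up to retryTimes immediate retries.
            for (int i = 0; i <= retryTimes && !ok; i++) {
                attempts++;
                ok = download.test(url);
            }
            if (!ok) {
                int used = cyclesUsed.getOrDefault(url, 0);
                if (used < cycleRetryTimes) {
                    cyclesUsed.put(url, used + 1);
                    queue.add(url); // like adding the request back to the scheduler
                }
            }
        }
        return attempts;
    }
}
```

With retryTimes = 2 and cycleRetryTimes = 1, a URL that always fails is tried three times, re-queued once, and tried three more times.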
setUseGzip
public Site setUseGzip(boolean useGzip)
Whether to use gzip.
Default is true; set it to false to disable gzip.
Parameters:
  useGzip - whether to use gzip
Returns:
  this
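This page does not describe how the downloader negotiates gzip. As background, the sketch below uses the JDK's java.util.zip classes to show the compress/decompress round trip that a gzip-encoded HTTP response body goes through. GzipRoundTrip is a hypothetical name for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustration of gzip content encoding using only java.util.zip.
class GzipRoundTrip {
    // Compress a body, as a gzip-capable server might before sending it.
    static byte[] gzip(String body) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(body.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // Closing the stream above wrote the gzip trailer, so the bytes are complete.
        return out.toByteArray();
    }

    // Decompress a gzip body, as a client must for a gzip-encoded response.
    static String gunzip(byte[] compressed) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```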
isDisableCookieManagement
public boolean isDisableCookieManagement()
setDisableCookieManagement
public Site setDisableCookieManagement(boolean disableCookieManagement)
By default, the downloader stores response cookies. Disable this to ignore all cookie fields and keep requests clean.
Warning: when disableCookieManagement is true, cookies set with addCookie will NOT work either.
Parameters:
  disableCookieManagement - true to disable cookie management
Returns:
  this
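The hypothetical ResponseCookieStore below models only the response side of this switch. When management is enabled, name=value pairs from Set-Cookie headers are remembered. When it is disabled, they are dropped. It is not WebMagic's code and does not model the warning above about cookies added with addCookie.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical client-side cookie store showing the documented switch:
// with cookie management disabled, Set-Cookie values from responses are ignored.
class ResponseCookieStore {
    private final boolean disableCookieManagement;
    private final Map<String, String> stored = new HashMap<>();

    ResponseCookieStore(boolean disableCookieManagement) {
        this.disableCookieManagement = disableCookieManagement;
    }

    // Record a simplified "name=value; attributes..." Set-Cookie header value.
    void onSetCookie(String setCookieHeader) {
        if (disableCookieManagement) {
            return; // stay clean: drop cookies sent by the server
        }
        String pair = setCookieHeader.split(";", 2)[0]; // ignore attributes such as Path
        int eq = pair.indexOf('=');
        if (eq > 0) {
            stored.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
    }

    Map<String, String> cookies() {
        return stored;
    }
}
```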
toTask
public Task toTask()
equals
public boolean equals(java.lang.Object o)
Overrides:
  equals in class java.lang.Object
hashCode
public int hashCode()
Overrides:
  hashCode in class java.lang.Object
toString
public java.lang.String toString()
Overrides:
  toString in class java.lang.Object