Package us.codecraft.webmagic
Class Site
java.lang.Object
us.codecraft.webmagic.Site
Holds the settings for a crawler.
- Since:
- 0.1.0
- Author:
- code4crafter@gmail.com
- See Also:
-
Constructor Summary
Constructors
- Site()
Method Summary
Modifier and Type, Method: Description
- Site addCookie(String name, String value): Add a cookie for the domain returned by getDomain().
- Site addCookie(String domain, String name, String value): Add a cookie for a specific domain.
- Site addHeader(String key, String value): Put an HTTP header for the downloader.
- boolean equals(Object o)
- getAcceptStatCode(): Get acceptStatCode.
- getAllCookies(): Get the cookies of all domains.
- getCharset(): Get the manually set charset.
- getCookies(): Get the cookies.
- int getCycleRetryTimes(): When cycleRetryTimes is more than 0, a failed request is added back to the scheduler and downloaded again.
- getDefaultCharset(): The default charset, used if charset detection fails.
- getDomain(): Get the domain.
- getHeaders()
- int getRetrySleepTime()
- int getRetryTimes(): Get the number of immediate retries when a download fails, 0 by default.
- int getSleepTime(): Get the interval between the processing of two pages. Time unit is milliseconds.
- int getTimeOut()
- getUserAgent(): Get the user agent.
- int hashCode()
- boolean isDisableCookieManagement()
- boolean isUseGzip()
- static Site me(): Create a new Site.
- Site setAcceptStatCode(Set<Integer> acceptStatCode): Set acceptStatCode. A response is processed when its HTTP status code is in acceptStatCode. {200} by default. It does not have to be set.
- Site setCharset(String charset): Set the charset of the page manually. When the charset is not set or is set to null, it can be detected automatically from the HTTP header.
- Site setCycleRetryTimes(int cycleRetryTimes): Set the number of cycle retries when a download fails, 0 by default.
- Site setDefaultCharset(String defaultCharset): Set the default charset of the page.
- Site setDisableCookieManagement(boolean disableCookieManagement): By default the downloader stores response cookies.
- Site setDomain(String domain): Set the domain of the site.
- Site setRetrySleepTime(int retrySleepTime): Set the sleep time between retries when a download fails, 1000 by default.
- Site setRetryTimes(int retryTimes): Set the number of retries when a download fails, 0 by default.
- Site setSleepTime(int sleepTime): Set the interval between the processing of two pages. Time unit is milliseconds.
- Site setTimeOut(int timeOut): Set the timeout for the downloader in ms.
- Site setUseGzip(boolean useGzip): Whether to use gzip.
- Site setUserAgent(String userAgent): Set the user agent.
- toString()
- toTask()
-
Constructor Details
-
Site
public Site()
-
-
Method Details
-
me
Create a new Site.
- Returns:
- new site
-
addCookie
Add a cookie for the domain returned by getDomain().
- Parameters:
name - the cookie name
value - the cookie value
- Returns:
- this
-
addCookie
Add a cookie for a specific domain.
- Parameters:
domain - the domain
name - the cookie name
value - the cookie value
- Returns:
- this
-
setUserAgent
Set the user agent.
- Parameters:
userAgent - the user agent
- Returns:
- this
-
getCookies
Get the cookies.
- Returns:
- the cookies
-
getAllCookies
Get the cookies of all domains.
- Returns:
- the cookies
-
getUserAgent
Get the user agent.
- Returns:
- user agent
-
getDomain
Get the domain.
- Returns:
- the domain
-
setDomain
Set the domain of the site.
- Parameters:
domain - the domain
- Returns:
- this
-
setCharset
Set the charset of the page manually.
When the charset is not set or is set to null, it can be detected automatically from the HTTP header.
- Parameters:
charset - the charset
- Returns:
- this
-
getCharset
Get the manually set charset.
- Returns:
- charset
-
setDefaultCharset
Set the default charset of the page. When charset detection fails, this default charset is used.
- Parameters:
defaultCharset - the default charset
- Returns:
- this
- Since:
- 0.9.0
-
getDefaultCharset
The default charset, used if charset detection fails.
- Returns:
- the default charset
- Since:
- 0.9.0
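
The fallback order described for setCharset and setDefaultCharset can be sketched as follows. This is a minimal illustration only: resolveCharset is a hypothetical helper, not part of WebMagic, and the detected charset is assumed to be passed in as null when detection fails.

```java
class CharsetFallbackSketch {

    /**
     * Hypothetical helper (not a WebMagic method): picks the charset used to decode a page.
     * Order: manually set charset, then the detected charset, then the default charset.
     */
    static String resolveCharset(String manualCharset, String detectedCharset, String defaultCharset) {
        if (manualCharset != null) {
            return manualCharset;      // a charset given to setCharset(...) takes priority
        }
        if (detectedCharset != null) {
            return detectedCharset;    // assumed to come from detection, e.g. the HTTP header
        }
        return defaultCharset;         // the value given to setDefaultCharset(...) is the last resort
    }

    public static void main(String[] args) {
        System.out.println(resolveCharset("GBK", "UTF-8", "ISO-8859-1")); // prints GBK
        System.out.println(resolveCharset(null, "UTF-8", "ISO-8859-1"));  // prints UTF-8
        System.out.println(resolveCharset(null, null, "ISO-8859-1"));     // prints ISO-8859-1
    }
}
```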
-
getTimeOut
public int getTimeOut()
-
setTimeOut
Set the timeout for the downloader in ms.
- Parameters:
timeOut - the timeout
- Returns:
- this
-
setAcceptStatCode
Set acceptStatCode.
A response is processed when its HTTP status code is in acceptStatCode.
{200} by default.
It does not have to be set.
- Parameters:
acceptStatCode - the accepted status codes
- Returns:
- this
-
getAcceptStatCode
Get acceptStatCode.
- Returns:
- acceptStatCode
-
setSleepTime
Set the interval between the processing of two pages.
Time unit is milliseconds.
- Parameters:
sleepTime - the interval in milliseconds
- Returns:
- this
-
getSleepTime
public int getSleepTime()
Get the interval between the processing of two pages.
Time unit is milliseconds.
- Returns:
- the interval between the processing of two pages, in milliseconds
-
getRetryTimes
public int getRetryTimes()
Get the number of immediate retries when a download fails, 0 by default.
- Returns:
- retry times when a download fails
-
getHeaders
-
addHeader
Put an HTTP header for the downloader.
Use addCookie(String, String) for cookies and setUserAgent(String) for the user agent.
- Parameters:
key - key of the HTTP header; some key constants are defined in HttpConstant.Header
value - value of the header
- Returns:
- this
-
setRetryTimes
Set the number of retries when a download fails, 0 by default.
- Parameters:
retryTimes - the number of retries
- Returns:
- this
-
getCycleRetryTimes
public int getCycleRetryTimes()
When cycleRetryTimes is more than 0, a failed request is added back to the scheduler and downloaded again.
- Returns:
- retry times when a download fails
-
setCycleRetryTimes
Set the number of cycle retries when a download fails, 0 by default.
- Parameters:
cycleRetryTimes - the number of cycle retries
- Returns:
- this
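
The cycle retry behavior, where a failed request goes back to the scheduler, can be pictured with a plain queue. This is a simplified sketch under stated assumptions, not WebMagic's implementation: Request and crawl are hypothetical names, the scheduler is modeled as a java.util.Deque, the downloader as a Predicate that returns false on failure, and any sleeping between retries is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

class CycleRetrySketch {

    /** Hypothetical request record: a URL plus the number of cycle retries already used. */
    static final class Request {
        final String url;
        final int cycleTriedTimes;

        Request(String url, int cycleTriedTimes) {
            this.url = url;
            this.cycleTriedTimes = cycleTriedTimes;
        }
    }

    /**
     * Drains the queue. A failed request is re-queued until it has used
     * cycleRetryTimes retries. Returns the total number of download attempts.
     */
    static int crawl(Deque<Request> scheduler, Predicate<String> download, int cycleRetryTimes) {
        int attempts = 0;
        while (!scheduler.isEmpty()) {
            Request request = scheduler.poll();
            attempts++;
            boolean ok = download.test(request.url);
            if (!ok && request.cycleTriedTimes < cycleRetryTimes) {
                // Add the request back to the end of the queue with its counter increased.
                scheduler.add(new Request(request.url, request.cycleTriedTimes + 1));
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        Deque<Request> scheduler = new ArrayDeque<>();
        scheduler.add(new Request("https://example.com/always-fails", 0));
        // A downloader that always fails: 1 first attempt plus 2 cycle retries.
        System.out.println(crawl(scheduler, url -> false, 2)); // prints 3
    }
}
```

With cycleRetryTimes set to 0, the same always-failing request is attempted once and then dropped, which matches the documented default.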
-
isUseGzip
public boolean isUseGzip()
-
getRetrySleepTime
public int getRetrySleepTime()
-
setRetrySleepTime
Set the sleep time between retries when a download fails, 1000 by default.
- Parameters:
retrySleepTime - the sleep time between retries
- Returns:
- this
-
setUseGzip
Whether to use gzip.
The default is true; set it to false to disable gzip.
- Parameters:
useGzip - whether to use gzip
- Returns:
- this
-
isDisableCookieManagement
public boolean isDisableCookieManagement()
-
setDisableCookieManagement
By default the downloader stores response cookies. Disable cookie management to ignore all cookie fields and stay clean. Warning: cookies that you set will NOT work while disableCookieManagement is true.
- Parameters:
disableCookieManagement - whether to disable cookie management
- Returns:
- this
-
toTask
-
equals
-
hashCode
public int hashCode()
-
toString
-