Package us.codecraft.webmagic
Class Page
java.lang.Object
us.codecraft.webmagic.Page
Object storing extracted results and URLs to fetch.
Not thread safe.
Main methods:
- getUrl(): get the URL of the current page
- getHtml(): get the content of the current page
- putField(String, Object): save an extracted result
- getResultItems(): get the extracted results, to be used in a Pipeline
- addTargetRequests(Iterable), addTargetRequest(String): add URLs to fetch
- Since:
- 0.1.0
- Author:
- code4crafter@gmail.com
- See Also:
-
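The main methods listed above are usually called together inside a page processor: read the current URL and content, store extracted fields, then queue follow-up URLs. The sketch below models that flow with a hypothetical, stdlib-only SketchPage class standing in for Page, plus a hypothetical SketchProcessor, so it runs without WebMagic on the classpath. The method names mirror the list above, but the internals are assumptions, not the library's implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, stdlib-only stand-in for Page (not the WebMagic class):
// it holds the fetched URL and content, the extracted fields, and the
// URLs queued for later fetching.
class SketchPage {
    private final String url;
    private final String html;
    private final Map<String, Object> resultItems = new LinkedHashMap<>();
    private final List<String> targetRequests = new ArrayList<>();

    SketchPage(String url, String html) {
        this.url = url;
        this.html = html;
    }

    String getUrl() { return url; }

    String getHtml() { return html; }

    // Save one extracted result under a key.
    void putField(String key, Object field) { resultItems.put(key, field); }

    // The extracted results that a pipeline stage would consume.
    Map<String, Object> getResultItems() { return resultItems; }

    // Queue several URLs to fetch later.
    void addTargetRequests(Iterable<String> requests) {
        for (String request : requests) {
            targetRequests.add(request);
        }
    }

    List<String> getTargetRequests() { return targetRequests; }
}

// Hypothetical processor step: extract the <title> text and queue two links.
class SketchProcessor {
    static void process(SketchPage page) {
        String html = page.getHtml();
        int start = html.indexOf("<title>");
        int end = html.indexOf("</title>");
        if (start >= 0 && end > start) {
            page.putField("title", html.substring(start + "<title>".length(), end));
        }
        page.addTargetRequests(List.of(page.getUrl() + "a", page.getUrl() + "b"));
    }
}
```

In real WebMagic code this logic belongs in PageProcessor.process(Page). The spider then passes getResultItems() to the configured pipelines and schedules the queued target requests.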
Constructor Summary
Constructors
Page()
Method Summary
Modifier and Type          Method and Description
void                       addTargetRequest(String requestString)
                           Add a URL to fetch.
void                       addTargetRequest(Request request)
                           Add a request to fetch.
void                       addTargetRequests(Iterable<String> requests)
                           Add URLs to fetch.
void                       addTargetRequests(Iterable<String> requests, long priority)
                           Add URLs to fetch with the given priority.
static Page                fail()
                           Deprecated.
static Page                fail(Request request)
                           Deprecated, for removal: This API element is subject to removal in a future version. Use ofFailure(Request) instead.
byte[]                     getBytes()
String                     getCharset()
Map<String, List<String>>  getHeaders()
Html                       getHtml()
                           Get the HTML content of the page.
Json                       getJson()
                           Get the JSON content of the page.
String                     getRawText()
Request                    getRequest()
                           Get the request of the current page.
ResultItems                getResultItems()
int                        getStatusCode()
List<Request>              getTargetRequests()
Selectable                 getUrl()
                           Get the URL of the current page.
boolean                    isDownloadSuccess()
static Page                ofFailure(Request request)
static Page                ofSuccess(Request request)
void                       putField(String key, Object field)
                           Store extracted results.
void                       setBytes(byte[] bytes)
void                       setCharset(String charset)
void                       setDownloadSuccess(boolean downloadSuccess)
void                       setHeaders(Map<String, List<String>> headers)
void                       setHtml(Html html)
                           Deprecated. Since 0.4.0 the HTML is parsed only on the first call to getHtml(), so use setRawText(String) instead.
Page                       setRawText(String rawText)
void                       setRequest(Request request)
Page                       setSkip(boolean skip)
void                       setStatusCode(int statusCode)
void                       setUrl(Selectable url)
String                     toString()
-
Constructor Details
-
Page
public Page()
-
-
Method Details
-
ofSuccess
- Parameters:
request - the request.
- Since:
- 1.0.2
-
ofFailure
- Parameters:
request - the request.
- Since:
- 1.0.2
-
fail
Deprecated. Use fail(Request) instead.
- Returns:
- the page.
-
fail
Deprecated, for removal: This API element is subject to removal in a future version. Use ofFailure(Request) instead.
- Parameters:
request - the Request.
- Returns:
- the page.
- Since:
- 0.10.0
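ofSuccess(Request) and ofFailure(Request) are static factories that replace the deprecated fail() and fail(Request). The sketch below is a hypothetical, simplified model of that pattern (FactoryPage is an illustrative name, and the request is reduced to a URL string): each factory binds the request and records whether the download succeeded. Any other state the real factories initialise is not stated here and is not modelled.

```java
// Hypothetical model of the static-factory pattern (not WebMagic's Page).
// The request is simplified to its URL string for illustration.
final class FactoryPage {
    private String requestUrl;
    private boolean downloadSuccess;

    private FactoryPage() {}

    // Page for a request whose download succeeded.
    static FactoryPage ofSuccess(String requestUrl) {
        FactoryPage page = new FactoryPage();
        page.requestUrl = requestUrl;
        page.downloadSuccess = true;
        return page;
    }

    // Page for a request whose download failed.
    static FactoryPage ofFailure(String requestUrl) {
        FactoryPage page = new FactoryPage();
        page.requestUrl = requestUrl;
        page.downloadSuccess = false;
        return page;
    }

    String getRequestUrl() { return requestUrl; }

    boolean isDownloadSuccess() { return downloadSuccess; }
}
```

Under the deprecation notes above, a call such as Page.fail(request) in a custom downloader would become Page.ofFailure(request).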
-
setSkip
-
putField
Store extracted results.
- Parameters:
key - the name under which the result is stored
field - the extracted value
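putField stores each extracted value under a key in the page's result items, which getResultItems() exposes to the pipelines. setSkip (listed just above) marks a page whose results should not reach the pipelines. The sketch models both with hypothetical classes (SketchResultItems, SkipDemo) rather than WebMagic's own ResultItems and Spider, so treat it as an illustration of the contract, not the library code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a result container: extracted fields plus a skip flag.
class SketchResultItems {
    private final Map<String, Object> fields = new LinkedHashMap<>();
    private boolean skip;

    void put(String key, Object value) { fields.put(key, value); }

    Object get(String key) { return fields.get(key); }

    boolean isSkip() { return skip; }

    void setSkip(boolean skip) { this.skip = skip; }
}

class SkipDemo {
    // Stand-in for the pipeline stage: records every result set it receives.
    static final List<SketchResultItems> received = new ArrayList<>();

    static void runPipelines(SketchResultItems items) {
        // Skipped results never reach the pipeline stand-in.
        if (!items.isSkip()) {
            received.add(items);
        }
    }

    // Hypothetical extraction step: skip pages where nothing was found.
    static SketchResultItems extract(String title) {
        SketchResultItems items = new SketchResultItems();
        if (title == null) {
            items.setSkip(true);
        } else {
            items.put("title", title);
        }
        return items;
    }
}
```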
-
getHtml
Get the HTML content of the page.
- Returns:
- html
-
getJson
Get the JSON content of the page.
- Returns:
- json
- Since:
- 0.5.0
-
setHtml
Deprecated. Since 0.4.0 the HTML is parsed only on the first call to getHtml(), so use setRawText(String) instead.
- Parameters:
html - html
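The deprecation note describes lazy parsing: the raw text is stored as-is, and the HTML is parsed only when getHtml() is first called, so there is nothing to gain from setting pre-parsed HTML. The following is a hypothetical LazyPage sketch of that memoisation pattern. Its parse() is a trivial placeholder that only trims whitespace, standing in for real HTML parsing, and parseCount exists only so the caching can be observed.

```java
// Hypothetical sketch of lazy parsing (not WebMagic's Page).
class LazyPage {
    private String rawText;
    private String html;   // cached parse result, null until first getHtml()
    int parseCount = 0;    // demo-only counter to observe caching

    // Store the raw text without parsing it; returns this for chaining.
    LazyPage setRawText(String rawText) {
        this.rawText = rawText;
        return this;
    }

    // Parse on the first call only; later calls reuse the cached result.
    String getHtml() {
        if (html == null && rawText != null) {
            html = parse(rawText);
        }
        return html;
    }

    // Placeholder for real HTML parsing.
    private String parse(String text) {
        parseCount++;
        return text.trim();
    }
}
```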
-
getTargetRequests
-
addTargetRequests
Add URLs to fetch.
- Parameters:
requests - the URLs to fetch
-
addTargetRequests
Add URLs to fetch with the given priority.
- Parameters:
requests - the URLs to fetch
priority - the priority assigned to each new request
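The priority overload queues every URL in the batch with the same priority. The sketch below models this with hypothetical PrioritizedRequest and PriorityTargets classes. In WebMagic the priority is stored on each Request, and it is expected to affect crawl order only when a priority-aware scheduler is configured; that is worth checking against the Scheduler documentation for your version.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical request type: a URL plus a scheduling priority.
class PrioritizedRequest {
    final String url;
    final long priority;

    PrioritizedRequest(String url, long priority) {
        this.url = url;
        this.priority = priority;
    }
}

// Hypothetical holder for queued requests (not WebMagic's Page).
class PriorityTargets {
    private final List<PrioritizedRequest> targetRequests = new ArrayList<>();

    // Every URL in the batch is queued with the same priority.
    void addTargetRequests(Iterable<String> requests, long priority) {
        for (String url : requests) {
            targetRequests.add(new PrioritizedRequest(url, priority));
        }
    }

    List<PrioritizedRequest> getTargetRequests() { return targetRequests; }
}
```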
-
addTargetRequest
Add a URL to fetch.
- Parameters:
requestString - the URL to fetch
-
addTargetRequest
Add a request to fetch.
- Parameters:
request - the request to fetch
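The two addTargetRequest overloads differ in what they queue. The String form takes a bare URL, while the Request form queues a request object the caller has already configured, for example with extra data for the next page. In the hypothetical sketch below, ExtraRequest and SingleTargets are illustrative names, not WebMagic types, and the String overload simply wraps the URL. The real method may also normalise or filter URLs, which is not modelled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical request carrying per-request extras for the next page.
class ExtraRequest {
    final String url;
    final Map<String, Object> extras = new HashMap<>();

    ExtraRequest(String url) { this.url = url; }
}

// Hypothetical holder for queued requests (not WebMagic's Page).
class SingleTargets {
    final List<ExtraRequest> targetRequests = new ArrayList<>();

    // String overload: wrap the bare URL in a fresh request.
    void addTargetRequest(String requestString) {
        addTargetRequest(new ExtraRequest(requestString));
    }

    // Request overload: queue the caller's request as-is, extras included.
    void addTargetRequest(ExtraRequest request) {
        targetRequests.add(request);
    }
}
```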
-
getUrl
Get the URL of the current page.
- Returns:
- the URL of the current page
-
setUrl
-
getRequest
Get the request of the current page.
- Returns:
- the request
-
setRequest
-
getResultItems
-
getStatusCode
public int getStatusCode()
setStatusCode
public void setStatusCode(int statusCode)
getRawText
-
setRawText
-
getHeaders
-
setHeaders
-
isDownloadSuccess
public boolean isDownloadSuccess()
setDownloadSuccess
public void setDownloadSuccess(boolean downloadSuccess)
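getStatusCode() and isDownloadSuccess() let a processor or pipeline check how the fetch went before trusting the content. The helper below is a hypothetical guard (DownloadCheck is not part of WebMagic), and its rule, success flag set and a 2xx status, is an assumed policy for illustration only.

```java
// Hypothetical guard combining the two accessors' values.
// The "success flag and 2xx status" rule is an assumption, not WebMagic behaviour.
final class DownloadCheck {
    private DownloadCheck() {}

    static boolean shouldProcess(boolean downloadSuccess, int statusCode) {
        return downloadSuccess && statusCode >= 200 && statusCode < 300;
    }
}
```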
getBytes
public byte[] getBytes()
setBytes
public void setBytes(byte[] bytes)
getCharset
-
setCharset
-
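getBytes() and getCharset() carry the undecoded response body and the charset name that applies to it. A hypothetical helper (BytesToText) follows, sketching how the two combine into text with the standard Charset API. The UTF-8 fallback when no charset is set is an assumption, not documented behaviour of Page.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decode raw response bytes with a charset name.
// Falling back to UTF-8 when the name is missing is an assumption.
final class BytesToText {
    private BytesToText() {}

    static String decode(byte[] bytes, String charset) {
        Charset cs = (charset == null || charset.isEmpty())
                ? StandardCharsets.UTF_8
                : Charset.forName(charset);
        return new String(bytes, cs);
    }
}
```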
toString
-