Package us.codecraft.webmagic.downloader
Class PhantomJSDownloader
java.lang.Object
us.codecraft.webmagic.downloader.AbstractDownloader
us.codecraft.webmagic.downloader.PhantomJSDownloader
- All Implemented Interfaces:
us.codecraft.webmagic.downloader.Downloader
public class PhantomJSDownloader
extends us.codecraft.webmagic.downloader.AbstractDownloader
This downloader is used to download pages that require JavaScript rendering.
- Version:
- 0.5.3
- Author:
- dolphineor@gmail.com
-
Constructor Summary
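Constructors
Constructor: PhantomJSDownloader(String phantomJsCommand)
Description: Adds a constructor that supports a custom phantomjs command.
Constructor: PhantomJSDownloader(String phantomJsCommand, String crawlJsPath)
Description: Adds a constructor that supports a custom crawl.js path. When another project depends on this jar, the phantomjs command launched through runtime.exec() cannot use the crawl.js packaged inside the jar.
-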
Method Summary
Methods inherited from class us.codecraft.webmagic.downloader.AbstractDownloader
download, download, onError, onError, onError, onSuccess, onSuccess, onSuccess
-
Constructor Details
-
PhantomJSDownloader
public PhantomJSDownloader() -
PhantomJSDownloader
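public PhantomJSDownloader(String phantomJsCommand)
Adds a constructor that supports a custom phantomjs command. Examples:
phantomjs.exe: for Windows environments
phantomjs --ignore-ssl-errors=yes: ignores certain errors when the crawled address uses https
/usr/local/bin/phantomjs: absolute path to the command, which avoids an IOException caused by system environment variables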
- Parameters:
phantomJsCommand - the phantomjs command to run
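The command examples above mix a bare executable, an executable with a flag, and an absolute path. As a rough illustration of how such a string might be combined with a script path and a target URL before being handed to a process launcher, here is a minimal sketch. The class `PhantomCommandSketch` and its `buildCommand` method are hypothetical and are not part of WebMagic; the whitespace split assumes that neither the executable path nor its flags contain spaces.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of WebMagic: shows one way a phantomjs command
// string, a script path, and a URL could be turned into process arguments.
final class PhantomCommandSketch {

    static List<String> buildCommand(String phantomJsCommand, String crawlJsPath, String url) {
        // Split on whitespace so "phantomjs --ignore-ssl-errors=yes" becomes two tokens.
        // Assumption: neither the executable path nor its flags contain spaces.
        List<String> command =
                new ArrayList<>(Arrays.asList(phantomJsCommand.trim().split("\\s+")));
        command.add(crawlJsPath); // the script that PhantomJS runs
        command.add(url);         // first script argument after the script name
        return command;
    }

    public static void main(String[] args) {
        List<String> command = buildCommand(
                "phantomjs --ignore-ssl-errors=yes",
                "/your/path/crawl.js",
                "https://example.com");
        System.out.println(command);
    }
}
```

A list of tokens like this can be passed to `java.lang.ProcessBuilder`, which avoids relying on how a single command string is tokenized. Whether the downloader itself builds its command this way is not stated in this page.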
-
PhantomJSDownloader
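public PhantomJSDownloader(String phantomJsCommand, String crawlJsPath)
Adds a constructor that supports a custom crawl.js path. When another project depends on this jar, the phantomjs command launched through runtime.exec() cannot use the crawl.js packaged inside the jar.
crawl.js start --
    var system = require('system');
    // The URL to fetch is passed as the first script argument.
    var url = system.args[1];
    var page = require('webpage').create();
    // Skip images and give up on any resource that takes longer than 5000 ms.
    page.settings.loadImages = false;
    page.settings.resourceTimeout = 5000;
    page.open(url, function (status) {
        if (status !== 'success') {
            console.log("HTTP request failed!");
        } else {
            // Print the rendered page content to standard output.
            console.log(page.content);
        }
        page.close();
        phantom.exit();
    });
-- crawl.js end
In your own project, you can copy the JavaScript code above and use it.
Example: new PhantomJSDownloader("/your/path/phantomjs", "/your/path/crawl.js");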
- Parameters:
phantomJsCommand - phantomJsCommand
crawlJsPath - crawlJsPath
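Because an external phantomjs process needs crawl.js as a regular file on disk, one possible approach is to write the script to a temporary file and pass that file's path as crawlJsPath. The following is a minimal sketch under that assumption. The class `CrawlJsFileSketch` and its `writeCrawlJs` method are hypothetical names and are not part of WebMagic; the script text is the crawl.js documented for this constructor.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of WebMagic: writes crawl.js to a temporary
// file so that an external phantomjs process can read it from disk.
final class CrawlJsFileSketch {

    // Script text taken from the constructor documentation.
    static final String CRAWL_JS = String.join("\n",
            "var system = require('system');",
            "var url = system.args[1];",
            "var page = require('webpage').create();",
            "page.settings.loadImages = false;",
            "page.settings.resourceTimeout = 5000;",
            "page.open(url, function (status) {",
            "    if (status != 'success') {",
            "        console.log(\"HTTP request failed!\");",
            "    } else {",
            "        console.log(page.content);",
            "    }",
            "    page.close();",
            "    phantom.exit();",
            "});",
            "");

    static Path writeCrawlJs() {
        try {
            Path script = Files.createTempFile("crawl", ".js");
            Files.write(script, CRAWL_JS.getBytes(StandardCharsets.UTF_8));
            script.toFile().deleteOnExit(); // remove the file when the JVM exits
            return script;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path script = writeCrawlJs();
        System.out.println("crawl.js written to " + script.toAbsolutePath());
        // The path could then be supplied as crawlJsPath, for example:
        // new PhantomJSDownloader("/your/path/phantomjs", script.toString());
    }
}
```

A temporary file keeps the example self-contained; a project that prefers a stable location can instead copy the script to a fixed directory and pass that path.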
-
-
Method Details
-
download
public us.codecraft.webmagic.Page download(us.codecraft.webmagic.Request request, us.codecraft.webmagic.Task task) -
setThread
public void setThread(int threadNum) -
getPage
- Throws:
Exception
-