
How to check which web resources PhantomJS has received

Overview

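To see which resources have been received, we rely on the onResourceReceived callback of a PhantomJS web page object. The callback is invoked whenever a resource requested by the page is received, and its only argument is the response metadata object. If a resource is large and the server sends it in several chunks, onResourceReceived is invoked for every chunk PhantomJS receives. The callback reports every kind of web resource the page loads, such as images, fonts, stylesheets and scripts.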
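Since the callback reports every resource type, it can be useful to group responses by their contentType. The following is a minimal sketch, not part of PhantomJS. The names classifyResource and countByCategory are hypothetical helpers, the category labels are our own, and the list of MIME types is deliberately incomplete. It is written in plain ES5 so it can be pasted into a PhantomJS script. It does not touch the PhantomJS API, so it also runs under Node.js.

```javascript
// Hypothetical helper (not a PhantomJS API): map a Content-Type value
// to a coarse category. The MIME type list is incomplete on purpose.
function classifyResource(contentType) {
    // contentType can be null/undefined when the server sends none;
    // drop parameters such as "; charset=utf-8"
    var type = (contentType || '').split(';')[0].trim().toLowerCase();
    if (type.indexOf('image/') === 0) return 'image';
    if (type.indexOf('font/') === 0 ||
        type.indexOf('application/font') === 0 ||
        type.indexOf('application/x-font') === 0 ||
        type === 'application/vnd.ms-fontobject') return 'font';
    if (type === 'text/css') return 'stylesheet';
    if (type === 'application/javascript' ||
        type === 'text/javascript' ||
        type === 'application/x-javascript') return 'script';
    if (type === 'text/html') return 'document';
    return 'other';
}

// Hypothetical helper: tally the categories of collected response objects
function countByCategory(responses) {
    var counts = {};
    for (var i = 0; i < responses.length; i++) {
        var category = classifyResource(responses[i].contentType);
        counts[category] = (counts[category] || 0) + 1;
    }
    return counts;
}
```

For example, inside page.onResourceReceived you could push each response whose stage is "start" into an array. You could then call countByCategory on that array from the page.open callback, before phantom.exit().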

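The following background comes from Chromium's design. PhantomJS itself is a headless browser built on WebKit (QtWebKit), not on Chromium, so read this as context rather than as a description of PhantomJS internals. Chromium uses a multi-process resource-loading approach in which all network communication is handled by the main browser process. This lets the browser process control each renderer's access to the network. It also keeps session state, such as cookies and cached data, consistent across processes. Centralizing the network also matters because, as an HTTP/1.1 user agent, the browser as a whole should not open too many connections to any single host.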

Checking which resources were received

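var webpage = require('webpage');
var page = webpage.create();
var websiteToCheck = "https://github.com";

// Called every time a response (or a chunk of one) arrives for a resource
// requested by the page: HTML, CSS, JS, images, fonts, ...
page.onResourceReceived = function(response) {
    console.log(response.url);
};

// Register the handler before opening the page, then exit once loading finishes
page.open(websiteToCheck, function(status) {
    phantom.exit();
});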

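To see it in action, save the code above in a script (index.js) and run it with phantomjs index.js. It produces output similar to the following (the exact URLs depend on the page at the time you run it):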

https://github.com/
https://github.com/
https://assets-cdn.github.com/assets/site-052f19062b5cc9c804bcfe6c835ee11a90f898e7524d1609f639301a5eb7cd1d.css
https://assets-cdn.github.com/assets/frameworks-a44e0bdd1666101af23963e4027cd7a0a1eea1339e0e7422524f2e7f3900e86b.css
https://assets-cdn.github.com/assets/github-ac9c637b29122a4699fcd4d205b2d09efa4d4962d369158f7d907123061143f1.css
https://assets-cdn.github.com/images/modules/site/inform-globe-transparent.svg
https://assets-cdn.github.com/images/modules/site/inform-globe-transparent.svg
https://assets-cdn.github.com/images/modules/site/home-ill-build.png?sn
https://assets-cdn.github.com/images/modules/site/home-ill-build.png?sn
https://assets-cdn.github.com/assets/site-052f19062b5cc9c804bcfe6c835ee11a90f898e7524d1609f639301a5eb7cd1d.css
https://assets-cdn.github.com/images/modules/site/home-ill-work.png?sn
https://assets-cdn.github.com/images/modules/site/home-ill-work.png?sn
https://assets-cdn.github.com/assets/frameworks-a44e0bdd1666101af23963e4027cd7a0a1eea1339e0e7422524f2e7f3900e86b.css
https://assets-cdn.github.com/images/modules/site/home-ill-projects.png?sn
https://assets-cdn.github.com/images/modules/site/home-ill-platform.png?sn
https://assets-cdn.github.com/images/modules/site/org_example_nasa.png?sn
https://assets-cdn.github.com/images/modules/site/home-ill-projects.png?sn
https://assets-cdn.github.com/images/modules/site/home-ill-platform.png?sn
https://assets-cdn.github.com/assets/compat-8e19569aacd39e737a14c8515582825f3c90d1794c0e5539f9b525b8eb8b5a8e.js
https://assets-cdn.github.com/assets/compat-8e19569aacd39e737a14c8515582825f3c90d1794c0e5539f9b525b8eb8b5a8e.js
https://assets-cdn.github.com/assets/frameworks-a631ecd079e91d27e8c4826bced857c2e359148f6e4139c2485ee4eaf6e8b493.js
https://assets-cdn.github.com/assets/github-e34181e8d9bc6f988dd7ed883775106306f940b87ad55ff9dee30c7014b3d596.js
https://assets-cdn.github.com/assets/github-ac9c637b29122a4699fcd4d205b2d09efa4d4962d369158f7d907123061143f1.css
https://assets-cdn.github.com/images/modules/site/org_example_nasa.png?sn
https://assets-cdn.github.com/assets/frameworks-a631ecd079e91d27e8c4826bced857c2e359148f6e4139c2485ee4eaf6e8b493.js
https://assets-cdn.github.com/images/modules/site/home-hero-sm.jpg?sn
https://assets-cdn.github.com/images/modules/site/home-hero-sm.jpg?sn
https://assets-cdn.github.com/assets/github-e34181e8d9bc6f988dd7ed883775106306f940b87ad55ff9dee30c7014b3d596.js
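The output above comes from two hosts, github.com and assets-cdn.github.com. For a per-host overview, you could collect response.url in the handler and count the URLs per host. The sketch below is minimal, and hostOf and countByHost are hypothetical helpers. The URL is parsed with a regular expression instead of the URL constructor, on the assumption that PhantomJS's older WebKit may not provide that constructor. With the handler as written, a resource reported twice is also counted twice.

```javascript
// Hypothetical helper: extract "host[:port]" from an absolute URL,
// or return null for URLs without an authority part (e.g. data: URIs)
function hostOf(url) {
    var match = /^[a-z][a-z0-9+.-]*:\/\/([^\/?#]+)/i.exec(url);
    return match ? match[1].toLowerCase() : null;
}

// Hypothetical helper: count how many URLs point at each host
function countByHost(urls) {
    var counts = {};
    for (var i = 0; i < urls.length; i++) {
        var host = hostOf(urls[i]);
        if (host === null) continue; // skip URLs that have no host
        counts[host] = (counts[host] || 0) + 1;
    }
    return counts;
}
```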

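If you look closely, you will notice that some resources appear twice in the list. This happens because the stage property of the response object has two possible values. It is "start" when the first byte arrives and "end" when the complete response has been received, so the callback fires at least once for each. To print every resource only once, add a conditional to the onResourceReceived handler that ignores the "end" notification: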

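var webpage = require('webpage');
var page = webpage.create();
var websiteToCheck = "https://github.com";

page.onResourceReceived = function(response) {
    // A resource is reported once when its first byte arrives (stage "start")
    // and again when the full response is in (stage "end").
    // Skip the "end" notification so each resource is printed only once.
    if (response.stage === 'end') {
        return;
    }

    console.log(response.url);
};

page.open(websiteToCheck, function(status) {
    phantom.exit();
});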

The console should now print each resource once:

https://github.com/
https://assets-cdn.github.com/assets/frameworks-a44e0bdd1666101af23963e4027cd7a0a1eea1339e0e7422524f2e7f3900e86b.css
https://assets-cdn.github.com/assets/github-ac9c637b29122a4699fcd4d205b2d09efa4d4962d369158f7d907123061143f1.css
https://assets-cdn.github.com/assets/site-052f19062b5cc9c804bcfe6c835ee11a90f898e7524d1609f639301a5eb7cd1d.css
https://assets-cdn.github.com/images/modules/site/inform-globe-transparent.svg
https://assets-cdn.github.com/images/modules/site/home-ill-build.png?sn
https://assets-cdn.github.com/images/modules/site/home-ill-work.png?sn
https://assets-cdn.github.com/images/modules/site/home-ill-projects.png?sn
https://assets-cdn.github.com/images/modules/site/home-ill-platform.png?sn
https://assets-cdn.github.com/images/modules/site/org_example_nasa.png?sn
https://assets-cdn.github.com/assets/compat-8e19569aacd39e737a14c8515582825f3c90d1794c0e5539f9b525b8eb8b5a8e.js
https://assets-cdn.github.com/assets/frameworks-a631ecd079e91d27e8c4826bced857c2e359148f6e4139c2485ee4eaf6e8b493.js
https://assets-cdn.github.com/assets/github-e34181e8d9bc6f988dd7ed883775106306f940b87ad55ff9dee30c7014b3d596.js
https://assets-cdn.github.com/images/modules/site/home-hero-sm.jpg?sn
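Instead of discarding the "end" notification, you can also use the two stages to measure how long each response took between its first byte ("start") and completion ("end"). The two notifications are matched on id, which identifies the request. This is a minimal sketch, assuming you push every response object into an array inside onResourceReceived. computeTimings is a hypothetical helper name. It only looks at the "start" and "end" stages and ignores any other notifications.

```javascript
// Hypothetical helper: pair the "start" and "end" notifications of each
// request (same id) and return the elapsed milliseconds between them.
// `time` may be a Date or a date string, as in the sample response object.
function computeTimings(responses) {
    var byId = {};
    for (var i = 0; i < responses.length; i++) {
        var r = responses[i];
        var entry = byId[r.id] || (byId[r.id] = { url: r.url });
        if (r.stage === 'start') entry.start = new Date(r.time).getTime();
        if (r.stage === 'end') entry.end = new Date(r.time).getTime();
    }

    var result = [];
    for (var id in byId) {
        if (!byId.hasOwnProperty(id)) continue;
        var e = byId[id];
        // Requests that never reached both stages are left out
        if (e.start !== undefined && e.end !== undefined) {
            result.push({ id: Number(id), url: e.url, ms: e.end - e.start });
        }
    }
    // Return the timings ordered by request id
    result.sort(function (a, b) { return a.id - b.id; });
    return result;
}
```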

Structure of the response object in the onResourceReceived event

The response metadata object passed as the first (and only) argument of the callback contains the following properties:

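  • id: the number of the requested resource
  • url: the URL of the requested resource
  • time: a Date object containing the date of the response
  • headers: the list of HTTP headers, as an array of {name, value} objects
  • bodySize: the size of the received content after decompression (the whole content or a single chunk)
  • contentType: the content type, if specified
  • redirectURL: the URL redirected to, if there is a redirect
  • stage: "start" or "end" (whether intermediate chunks use other values is not specified)
  • status: the HTTP status code, e.g. 200
  • statusText: the HTTP status text, e.g. OK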

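Each response object passed to the event has a structure like the one below (here, the "start" notification for https://github.com/):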

{  
   "body":"", "bodySize":4714, "contentType":"text/html; charset=utf-8", "headers":[  
      {  
         "name":"Server", "value":"GitHub.com"
      }, {  
         "name":"Date", "value":"Thu, 09 Feb 2017 12:35:38 GMT"
      }, {  
         "name":"Content-Type", "value":"text/html; charset=utf-8"
      }, {  
         "name":"Transfer-Encoding", "value":"chunked"
      }, {  
         "name":"Status", "value":"200 OK"
      }, {  
         "name":"Cache-Control", "value":"no-cache"
      }, {  
         "name":"X-UA-Compatible", "value":"IE=Edge, chrome=1"
      }, {  
         "name":"Set-Cookie", "value":"logged_in=no; domain=.github.com; path=/; expires=Mon, 09 Feb 2037 12:35:38 -0000; secure; HttpOnly\n_gh_sess=eyJzZXNzaW9uX2lkIjoiMTQ2Y2VjOTM2YWY2MTIwYzZkZGRmNGI0NzY5MGQ1YTAiLCJfY3NyZl90b2tlbiI6IkEwc1BxQlNYTndyWm9oUFh1aDIxWGlBOE5ZNmlCbnE0cjJ1K0JldUNJaFU9In0%3D--9f27661358a0c06e16dc86f7a085b33263f5633e; path=/; secure; HttpOnly"
      }, {  
         "name":"X-Request-Id", "value":"fef18cf6da42783a3a5ad53b876bb153"
      }, {  
         "name":"X-Runtime", "value":"0.039490"
      }, {  
         "name":"Content-Security-Policy", "value":"default-src 'none'; connect-src 'self' uploads.github.com status.github.com collector.githubapp.com api.github.com www.google-analytics.com github-cloud.s3.amazonaws.com wss://live.github.com; font-src assets-cdn.github.com; frame-src render.githubusercontent.com; img-src 'self' data: assets-cdn.github.com identicons.github.com collector.githubapp.com github-cloud.s3.amazonaws.com *.githubusercontent.com; media-src 'none'; script-src assets-cdn.github.com; style-src 'unsafe-inline' assets-cdn.github.com"
      }, {  
         "name":"Strict-Transport-Security", "value":"max-age=31536000; includeSubdomains; preload"
      }, {  
         "name":"Public-Key-Pins", "value":"max-age=5184000; pin-sha256=\"WoiWRyIOVNa9ihaBciRSC7XHjliYS9VwUGOIud4PB18=\"; pin-sha256=\"RRM1dGqnDFsCJXBTHky16vi1obOlCgFFn/yOhI/y+ho=\"; pin-sha256=\"k2v657xBsOVe1PQRwOsHsw3bsGT2VzIqz5K+59sNQws=\"; pin-sha256=\"K87oWBWM9UZfyddvDfoxL+8lpNyoUB2ptGtn0fv6G2Q=\"; pin-sha256=\"IQBnNBEiFuhj+8x6X8XLgh01V9Ic5/V3IRQLNFFc7v4=\"; pin-sha256=\"iie1VXtL7HzAMF+/PVPR9xzT80kQxdZeJ+zduCB3uj0=\"; pin-sha256=\"LvRiGEjRqfzurezaWuj8Wie2gyHMrW5Q06LspMnox7A=\"; includeSubDomains"
      }, {  
         "name":"X-Content-Type-Options", "value":"nosniff"
      }, {  
         "name":"X-Frame-Options", "value":"deny"
      }, {  
         "name":"X-XSS-Protection", "value":"1; mode=block"
      }, {  
         "name":"Vary", "value":"X-PJAX, Accept-Encoding"
      }, {  
         "name":"X-Served-By", "value":"1868c9f28a71d80b2987f48dbd1824a0"
      }, {  
         "name":"Content-Encoding", "value":"gzip"
      }, {  
         "name":"X-GitHub-Request-Id", "value":"D86F:6207:1645007:23F3A82:589C6219"
      }
   ], "id":1, "redirectURL":null, "stage":"start", "status":200, "statusText":"OK", "time":"2017-02-09T12:35:37.537Z", "url":"https://github.com/"
}
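Because headers is an array of {name, value} objects rather than a map, reading one header means searching the array. HTTP header names are case-insensitive, so the sketch compares them that way. In the sample above, several cookies arrive as a single Set-Cookie entry separated by "\n", so a second helper splits them. Both getHeader and getSetCookies are hypothetical helper names. The newline separator is an assumption taken from this one sample.

```javascript
// Hypothetical helper: find a header value in response.headers,
// comparing names case-insensitively. Returns null if it is absent.
function getHeader(response, name) {
    var wanted = name.toLowerCase();
    var headers = response.headers || [];
    for (var i = 0; i < headers.length; i++) {
        if (headers[i].name.toLowerCase() === wanted) {
            return headers[i].value;
        }
    }
    return null;
}

// Hypothetical helper: split a Set-Cookie entry that holds several cookies
// joined with "\n" (as in the sample above) into one string per cookie.
function getSetCookies(response) {
    var raw = getHeader(response, 'Set-Cookie');
    return raw ? raw.split('\n') : [];
}
```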

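You can use this as a starting point for copying a website and downloading all of its resources locally. Note that in the sample above the body field is empty, so onResourceReceived only tells you which URLs were loaded; the content itself has to be fetched separately.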
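One step in making such a copy is deciding where each received URL should be saved. The following is a minimal sketch. toLocalPath is a hypothetical helper, and its layout (host as the top-level folder, index.html for directory URLs) is our own assumption. It drops query strings and fragments, so URLs that differ only in their query map to the same file.

```javascript
// Hypothetical helper: map a resource URL to a relative file path, e.g.
// "https://host/a/b.png?x" -> "host/a/b.png", "https://host/" -> "host/index.html".
// Query strings and fragments are dropped; ports are kept as part of the host.
function toLocalPath(url) {
    var withoutScheme = url.replace(/^[a-z][a-z0-9+.-]*:\/\//i, '');
    var path = withoutScheme.split(/[?#]/)[0];
    if (path.indexOf('/') === -1) {
        path += '/'; // bare host such as "github.com"
    }
    if (path.charAt(path.length - 1) === '/') {
        path += 'index.html'; // directory URL: save as index.html
    }
    return path;
}
```

Downloading the files themselves is outside what onResourceReceived provides. You would fetch each URL with a separate tool and write it to the path returned here.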

Happy coding!

Reproduction without permission is prohibited: srcmini » How to check which web resources PhantomJS has received
