Channel: Java – ASPIRE

[collect] Collecting Google result URLs with JavaScript

Google hack: simple page URL collection

There are many ways to collect URLs from Google search results. The simplest is to read the DOM nodes directly with JavaScript. For example:

var h3 = document.getElementsByTagName('h3');
for (var i = 0; i < h3.length; i++) {
    var a = h3[i].getElementsByTagName('a');
    // Skip headings that contain no link.
    if (a.length > 0) {
        console.log(a[0].href);
    }
}

In Chrome, press F12 and open the "Console" tab, paste in the code above, and press Enter to see the result.

In Java, jsoup makes it just as easy to get the URLs of the search results:

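import java.io.IOException;
import java.net.URLDecoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public static void main(String[] args) throws IOException {
	Document doc = Jsoup.connect("https://www.google.ws/search?num=100&site=&source=hp&q=filetype%3Ajsp&oq=filetype%3Ajsp&gs_l=hp.3...8115.14780.0.15194.22.21.1.0.0.0.523.5187.3j3j3j5j4j1.19.0....0...1c.1.36.hp..14.8.1440.P_2EQhc7Pz0").userAgent("Googlebot/2.1 (+http://www.googlebot.com/bot.html)").timeout(5000).get();
	// Result links look like /url?q=<encoded target>&sa=...; the backslash must be doubled in a Java string.
	Pattern redirect = Pattern.compile("/url\\?q=(.*?)&sa");
	Elements headings = doc.getElementsByTag("h3");
	for (Element e : headings) {
		// attr() on Elements returns "" when the heading has no link, so nothing matches.
		Matcher m = redirect.matcher(e.getElementsByTag("a").attr("href"));
		if (m.find()) {
			System.out.println(URLDecoder.decode(m.group(1), "UTF-8"));
		}
	}
}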
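The href handling in the jsoup loop can be tried without any network access. The following is a minimal sketch, assuming result links have the form /url?q=<encoded target>&sa=...; the class name GoogleRedirect, the method extractTarget, and the sample hrefs are hypothetical, and the pattern here stops at the first & instead of looking for &sa as the code above does.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GoogleRedirect {
    // Captures the value of the q parameter in a "/url?q=..." redirect link.
    private static final Pattern REDIRECT = Pattern.compile("/url\\?q=([^&]*)");

    /** Returns the decoded target URL, or null if href is not a redirect link. */
    public static String extractTarget(String href) throws UnsupportedEncodingException {
        Matcher m = REDIRECT.matcher(href);
        return m.find() ? URLDecoder.decode(m.group(1), "UTF-8") : null;
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Hypothetical sample hrefs, not taken from a real result page.
        System.out.println(extractTarget("/url?q=http%3A%2F%2Fexample.com%2Fa.jsp%3Fid%3D1&sa=U&ved=0"));
        System.out.println(extractTarget("/search?q=filetype%3Ajsp"));
    }
}
```

Returning null for hrefs that are not redirect links lets the caller skip them, much as the if(m.find()) check does in the loop above.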

The regular-expression approach:

package org.javaweb.test;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestReg {

    public static void main(String[] args) {
        // Sample markup in the shape of two search results.
        String source = "<h3 class=\"r\"><a href=\"http://baidu.com\">百度</a></h3><h3 class=\"r\"><a href=\"http://google.com\">谷歌</a></h3> ";
        StringBuilder resultComment = new StringBuilder();
        StringBuilder resultName = new StringBuilder();
        System.out.println("======= Start matching ========");
        // Group 2 is the href value, group 3 is the link text.
        String patternStrs = "(<h3 class=\"r\"><a.+?)href=\"(.+?)\">(.+?)(</a></h3>)";
        Pattern pattern = Pattern.compile(patternStrs);
        Matcher matcher = pattern.matcher(source);
        while (matcher.find()) {
            resultName.append(matcher.group(2)).append("\n");
            resultComment.append(matcher.group(3)).append("\n");
        }
        System.out.println("======= Text inside the tags =======");
        System.out.println(resultComment.toString());
        System.out.println("======= href attribute values =======");
        System.out.println(resultName.toString());
    }
}
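When the matches need to be processed further and not just printed, the same idea can be wrapped in a method that returns them. This is a minimal sketch under the same assumption as above, namely that each result is an h3 with class "r" wrapping a single link; the class name ResultLinks, the method extract, and the group names href and text are hypothetical. It uses named groups and allows other attributes around href.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ResultLinks {
    // Named groups replace the numeric group(2) / group(3) lookups.
    private static final Pattern LINK = Pattern.compile(
            "<h3 class=\"r\"><a[^>]*?href=\"(?<href>[^\"]+)\"[^>]*>(?<text>.+?)</a></h3>");

    /** Returns {href, text} pairs in the order they appear in html. */
    public static List<String[]> extract(String html) {
        List<String[]> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(new String[] { m.group("href"), m.group("text") });
        }
        return links;
    }

    public static void main(String[] args) {
        // Hypothetical sample markup in the same shape as the string above.
        String html = "<h3 class=\"r\"><a href=\"http://baidu.com\">Baidu</a></h3>"
                + "<h3 class=\"r\"><a href=\"http://google.com\">Google</a></h3>";
        for (String[] link : extract(html)) {
            System.out.println(link[0] + " -> " + link[1]);
        }
    }
}
```

A regular expression like this only fits markup that matches the assumed shape exactly; for real pages, an HTML parser such as jsoup is usually the more robust choice.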

Original link:

http://p2j.cn/?p=807

