
Offline explorer: how to save images, CSS files, and JS files?


12 replies to this topic

#1 mark92

mark92

    CC Newcomer

  • Member
  • PipPip
  • 16 posts

Posted 21 December 2012 - 01:47 AM

Hello, my dear friends.
I want to write an offline explorer program in Java.
I save the HTML code of the URL that the user types (to output.txt).
Now, how do I save the images, CSS files, and JavaScript files? [Without using the htmlparse class]
Thanks. (output.txt is saved in C:\Users\fh\Documents\NetBeansProjects\offline explorer\output.txt)

I save the HTML code with the following code:
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Scanner;

public class OfflineExplorer { // class name is only a placeholder
    public static void main(String[] args) throws IOException {
        // Take the URL from the command line, or ask the user for it.
        String urlString;
        if (args.length == 1) {
            urlString = args[0];
        } else {
            System.out.println("Enter URL:");
            Scanner s = new Scanner(System.in);
            urlString = s.next();
            System.out.println("Using " + urlString);
        }

        URL u = new URL(urlString);
        HttpURLConnection httpConnection = (HttpURLConnection) u.openConnection();
        int code = httpConnection.getResponseCode();
        String message = httpConnection.getResponseMessage();
        System.out.println(code + " " + message);
        if (code != HttpURLConnection.HTTP_OK) {
            return;
        }

        // Copy the page line by line to output.txt and echo it to the console.
        try (InputStream instream = httpConnection.getInputStream();
             Scanner in = new Scanner(instream);
             PrintWriter out = new PrintWriter("output.txt")) {
            while (in.hasNextLine()) {
                String input = in.nextLine();
                out.println(input);
                System.out.println(input);
            }
        }
    }
}

Edited by mark92, 21 December 2012 - 01:54 AM.


#2 wim DC

wim DC

    Roar

  • Expert Member
  • PipPipPipPipPipPipPipPip
  • 2,582 posts

Posted 21 December 2012 - 03:41 AM

1) There's a nice Scanner trick to read the whole input stream at once:
String html = new Scanner(instream).useDelimiter("\\A").next();
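
A minimal sketch of that Scanner trick, runnable on its own. The ByteArrayInputStream is only a stand-in for connection.getInputStream(), and both the UTF-8 charset and the names ReadAll and readAll are assumptions made for illustration; a real page may declare a different encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

class ReadAll {
    // "\\A" matches only at the very start of the input, so the delimiter
    // never occurs again and next() returns the whole stream as one token.
    static String readAll(InputStream in, String charsetName) {
        try (Scanner scanner = new Scanner(in, charsetName).useDelimiter("\\A")) {
            // An empty stream has no token at all, so guard against that.
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(String[] args) {
        // Stand-in for connection.getInputStream(); the HTML is made up.
        InputStream in = new ByteArrayInputStream(
                "<html>\n<body>hi</body>\n</html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(readAll(in, "UTF-8"));
    }
}
```

The hasNext() check matters because next() throws NoSuchElementException when the response body is empty.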

2) You'll have to parse that string, either with indexOf(..) or with a regular expression (the Pattern and Matcher classes in Java).
To get a JavaScript file URL, for example, a simple regex could be
<script[^>]*src="(.*?)"

Same idea for images and CSS files.
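
As a rough sketch of the "same idea" for images and CSS files: the patterns below look for src on img tags and href on link tags. The class name, the patterns, and the sample HTML are assumptions for illustration; they only handle double-quoted attributes, and link tags are not always stylesheets (favicons use them too), so real pages may need more filtering.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResourceLinks {
    // [^>]* keeps the match inside a single tag; group 1 is the attribute value.
    static final Pattern IMG =
            Pattern.compile("<img\\b[^>]*\\ssrc=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);
    static final Pattern LINK =
            Pattern.compile("<link\\b[^>]*\\shref=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Collects group 1 of every match of the given pattern.
    static List<String> findAll(Pattern pattern, String html) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = pattern.matcher(html);
        while (matcher.find()) {
            urls.add(matcher.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        // Made-up page fragment.
        String html = "<link rel=\"stylesheet\" href=\"/css/site.css\">\n"
                + "<img alt=\"logo\" src=\"images/logo.png\">";
        System.out.println(findAll(LINK, html)); // prints [/css/site.css]
        System.out.println(findAll(IMG, html));  // prints [images/logo.png]
    }
}
```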

Once you have those URLs, you must check whether they are relative or absolute. If they are relative, you will have to put the website's URL in front to get the correct location.
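
One way to avoid pasting strings together by hand is to let java.net.URL do the resolving: the two-argument constructor takes the page URL as context and the link as found in the HTML. This is only a sketch; the class name, the method name, and the example.com addresses are made up, and newer JDKs mark this constructor as deprecated although it still works.

```java
import java.net.MalformedURLException;
import java.net.URL;

class ResolveLinks {
    // Resolves a link found in the page against the URL of the page itself.
    // Absolute links come back unchanged; relative ones get the site prepended.
    static String resolve(String pageUrl, String link) {
        try {
            return new URL(new URL(pageUrl), link).toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Bad URL: " + link, e);
        }
    }

    public static void main(String[] args) {
        String page = "http://www.example.com/docs/index.html";
        System.out.println(resolve(page, "/script.js"));
        // prints http://www.example.com/script.js
        System.out.println(resolve(page, "css/site.css"));
        // prints http://www.example.com/docs/css/site.css
        System.out.println(resolve(page, "http://www.google.com/script.js"));
        // prints http://www.google.com/script.js
    }
}
```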

Once you have the URLs, it's just a matter of opening a connection again and downloading the bytes.

Edited by wim DC, 21 December 2012 - 03:42 AM.


#3 mark92


Posted 21 December 2012 - 06:17 AM

Hi,
Can you explain more?

#4 wim DC


Posted 21 December 2012 - 06:25 AM

Assume the html String is the full html you've downloaded:
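import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ScriptUrlDemo { // class name is only a placeholder
    public static void main(String[] args) {
        String html = "some \n random html\n <script type=\"text/javascript\" src=\"http://www.google.com/script.js\"></script> and \nthe rest \n of the \n html";
        System.out.println(html);
        System.out.println("\n\n");
        // [^>]* stays inside one tag; group 1 captures the value of src.
        Pattern pattern = Pattern.compile("<script[^>]*src=\"(.*?)\"");
        final Matcher matcher = pattern.matcher(html);
        if (matcher.find()) {
            System.out.println("script url : " + matcher.group(1));
        }
    }
}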

If the result had been just "/script.js", then the URL is relative, and you should prepend the website's URL to get the full URL of the script file.

#5 mark92


Posted 21 December 2012 - 06:59 AM

I cannot define the String html.
It gives an error.
Why?
Posted Image

#6 wim DC


Posted 21 December 2012 - 07:01 AM

Because you have quotes in your String, and they terminate the String. You must escape the quotes with a backslash, like I did.
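
To make the escaping concrete, here is a tiny sketch; the class name and the tag text are made up for illustration. Each \" in the source code puts one literal quote character into the String instead of ending it.

```java
class EscapedQuotes {
    // Without the backslashes, the first inner quote would end the literal
    // and the rest of the line would no longer compile.
    static final String TAG =
            "<script type=\"text/javascript\" src=\"/script.js\"></script>";

    public static void main(String[] args) {
        // The backslashes are not part of the String; only the quotes are.
        System.out.println(TAG);
    }
}
```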

#7 mark92


Posted 21 December 2012 - 07:30 AM

Will this work be performed with the code (in output.txt)?

#8 wim DC


Posted 21 December 2012 - 07:32 AM

I don't understand what you mean.

#9 mark92


Posted 21 December 2012 - 07:39 AM

Now, how do I save the JS files on my computer?

#10 wim DC


Posted 21 December 2012 - 07:41 AM

Once you have the URLs, download them the same way you downloaded the HTML.
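
A possible sketch of that download step. It copies raw bytes with Files.copy rather than going through Scanner and PrintWriter, because text handling would corrupt images. The class and method names are assumptions, and the demo reads from a temporary local file through a file: URL only so that it runs without a network; an http: URL would be passed in the same way.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class Downloader {
    // Streams the resource behind the URL into the target file, byte for byte,
    // and returns the number of bytes written.
    static long download(URL source, Path target) throws IOException {
        try (InputStream in = source.openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a remote script: a temporary local file.
        Path original = Files.createTempFile("script", ".js");
        Files.write(original, "alert('hi');".getBytes(StandardCharsets.UTF_8));

        Path copy = Files.createTempFile("copy", ".js");
        long bytes = download(original.toUri().toURL(), copy);
        System.out.println(bytes + " bytes saved to " + copy);
    }
}
```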

#11 mark92


Posted 21 December 2012 - 01:52 PM

If the String html has more than one script tag, what should I do?

#12 wim DC


Posted 22 December 2012 - 03:55 AM

Call matcher.find() again; it will find the next occurrence. Use a while loop: while (matcher.find()) { ... }
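
The while loop could look roughly like this. The class name, the method name, and the sample HTML are made up, and the pattern is a variant that uses [^>]* so that each match stays inside one script tag even when several tags share a line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AllScripts {
    // Group 1 is the value of the src attribute of one <script> tag.
    static final Pattern SCRIPT_SRC =
            Pattern.compile("<script[^>]*\\ssrc=\"([^\"]*)\"");

    // Each call to find() continues after the previous match,
    // so the loop visits every script tag that has a src attribute.
    static List<String> scriptUrls(String html) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = SCRIPT_SRC.matcher(html);
        while (matcher.find()) {
            urls.add(matcher.group(1));
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<script src=\"a.js\"></script>"
                + "<script type=\"text/javascript\" src=\"/b.js\"></script>"
                + "<script>alert(1)</script>";
        System.out.println(scriptUrls(html)); // prints [a.js, /b.js]
    }
}
```

Inline scripts without a src attribute are skipped, since there is nothing to download for them.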



