
Accessing Temporary Internet Files


4 replies to this topic

#1
DorumonSg

    Newbie

  • Members
  • 5 posts
Correct me if I am wrong, but ALL websites I view have their htm/html/aspx/jsp pages downloaded into Temporary Internet Files, right? I am trying to access Temporary Internet Files to collect and copy information from these websites. For example, if I view a page on Wikipedia, I want to pull the HTML file out of my Temporary Internet Files and then extract the Wikipedia content from it.

So I am doing an experiment to see if I can copy files out of my Temporary Internet Files.

I am trying to access my Temporary Internet Files and copy out the files that were accessed when the web page finished loading or later (this is to ensure that I only copy the files from the website I am currently viewing), but it is not working.

On top of that, even if I open my Temporary Internet Files manually, I do not see any htm/html/aspx/jsp files; all I see are images and scripts. I am unsure whether I am even heading in the right direction. Please point me the right way.



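// Requires: using System; using System.IO; using System.Windows.Forms;

// Time at which the WebBrowser control finished loading the current page.
private DateTime currentDateTime;

private void webBrowser1_DocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
{
    currentDateTime = DateTime.Now;
}

private void toolStripButton1_Click(object sender, EventArgs e)
{
    string temporaryInternetFilesPath = Environment.GetFolderPath(Environment.SpecialFolder.InternetCache);
    string destinationFolder = @"C:\Users\Justin\Documents\Visual Studio 2010\Projects\WindowsFormsApplication1\WindowsFormsApplication1\bin\Debug\Test";
    Directory.CreateDirectory(destinationFolder); // does nothing if the folder already exists

    DirectoryInfo directoryInfo = new DirectoryInfo(temporaryInternetFilesPath);
    int x = 0;

    // IE stores the cached files in hidden subfolders (for example Content.IE5\XXXXXXXX),
    // so search recursively instead of only the top-level folder.
    foreach (FileInfo fileInfo in directoryInfo.GetFiles("*", SearchOption.AllDirectories))
    {
        // Windows may not update LastAccessTime (the update is often disabled for
        // performance), so this filter can miss files.
        if (fileInfo.LastAccessTime >= currentDateTime)
        {
            // Keep the original extension instead of renaming every file to .txt.
            string target = Path.Combine(destinationFolder, "fileCopy" + x + fileInfo.Extension);
            fileInfo.CopyTo(target, true); // overwrite copies left by an earlier run
            x++;
        }
    }
}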


#2
WingedPanther

    A spammer's worst nightmare

  • Moderators
  • 16,831 posts
  • Location:Upstate, South Carolina
  • Programming Language:C, C++, PL/SQL, Delphi/Object Pascal, Pascal, Transact-SQL, Others
  • Learning:Java, C#, PHP, JavaScript, Lisp, Fortran, Haskell, Others
What browser are you using? Different browsers do this differently.
Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog

#3
Alexander

    It's Science!

  • Moderators
  • 4,118 posts
  • Location:Vancouver, Eh! Cleverness: 200
If you want to retrieve a web page, you can use the facilities the .NET Framework provides to download the page itself and its supporting scripts (by searching for links and downloading them). The CSS, however, will not be very useful if you are trying to extract data.

Temporary Internet Files, as mentioned, may only be used by IE, and its file structure or even its name may change between IE versions, so you would be guessing.
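The download approach suggested above can be sketched as follows, assuming a .NET Framework 4-era project like the one in the first post. `WebClient.DownloadString` returns the raw HTML of a URL as a string, so nothing has to be fished out of the browser cache. `PageFetcher` and `DownloadHtml` are hypothetical names used only for illustration.

```csharp
using System;
using System.Net;
using System.Text;

// Hypothetical helper: fetch a page's HTML directly instead of reading the IE cache.
public static class PageFetcher
{
    public static string DownloadHtml(string url)
    {
        using (var client = new WebClient())
        {
            // Used to decode the response when the server does not declare a charset.
            client.Encoding = Encoding.UTF8;

            // Returns the response body (the page's HTML) as a string.
            return client.DownloadString(url);
        }
    }
}
```

Usage would be along the lines of `string html = PageFetcher.DownloadHtml(pageUrl);` with the address of the page you are viewing. Some sites may reject requests that carry no User-Agent header; if that happens, setting `client.Headers[HttpRequestHeader.UserAgent]` before the call is one option. On .NET 6 and later, `WebClient` is marked obsolete in favor of `HttpClient`, but it still works.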
Be sure to read the updated FAQ! || Health is achieved through the same 10,000 steps.
If a suggested code/method fails, informing us is less important than telling us why or what errors occurred.

#4
CommittedC0der

    Speaks fluent binary

  • Members
  • 1,565 posts
As Alexander said, it'd be better to just download the source of that page rather than pulling it out of the temp files. This link may help with that. :)
How can I download HTML source in C# - Stack Overflow
~ Committed.
A man can be defined by what he does when no one is looking.
Science is only an educated theory, which we cannot disprove.

#5
sam_coder

    Programming Expert

  • Members
  • 372 posts
Committed, check out the HtmlAgilityPack.

It's amazing for scraping information like this: it lets you treat the HTML structure as a DOM, like XmlDocument, and run XPath queries against it.

It readily handles the fact that most HTML pages are not XML-compliant, so it doesn't require the markup to be well-formed. It also provides out-of-the-box facilities to remove junk like inline JavaScript, tags, CSS, or whatever.

Anywho, check it out.

Html Agility Pack
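A minimal sketch of the workflow described above, assuming the HtmlAgilityPack NuGet package is installed: load possibly malformed HTML, strip `script` and `style` nodes, then run an XPath query. `ArticleScraper`, `ExtractParagraphs`, and the `//p` query are illustrative choices, not from the post; a real page (Wikipedia included) would need an XPath matched to its own markup.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumes the HtmlAgilityPack NuGet package is referenced

// Hypothetical helper: pull the text of every <p> element out of an HTML string.
public static class ArticleScraper
{
    public static List<string> ExtractParagraphs(string html)
    {
        // Fully qualified because System.Windows.Forms also defines an HtmlDocument type.
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html); // tolerant of markup that XmlDocument would reject

        // Collect the junk nodes first (ToList) so the tree is not modified while enumerating it.
        var junk = doc.DocumentNode.Descendants()
            .Where(n => n.Name == "script" || n.Name == "style")
            .ToList();
        foreach (HtmlNode node in junk)
        {
            node.Remove();
        }

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection paragraphs = doc.DocumentNode.SelectNodes("//p");
        if (paragraphs == null)
        {
            return new List<string>();
        }

        // Decode entities such as &amp; and drop paragraphs that are only whitespace.
        return paragraphs
            .Select(p => HtmlEntity.DeEntitize(p.InnerText).Trim())
            .Where(text => text.Length > 0)
            .ToList();
    }
}
```

Combined with a downloaded page, something like `ArticleScraper.ExtractParagraphs(html)` would return the paragraph text with inline scripts already removed. Swapping `//p` for a more specific XPath is where site-specific knowledge comes in.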



