Thread: How To Get Content From A Website

  1. #1
    Join Date
    Aug 2006
    Posts
    11,209
    Blog Entries
    6
    Rep Power
    101

    How To Get Content From A Website

    Scenario:
    Let’s say you want to pull some values from a website into your application, for example a paragraph of text, and then manipulate the data or just display it to the user.

    Solution:
    In this tutorial I will show you a fast and simple way to get information out of a web page. You will need some knowledge of RegEx (Regular Expressions). How complicated the regular expression has to be depends on the source code of the website you want to read: how well formatted it is, and which part of the page you want. I will also assume that you have basic knowledge of C#; if not, start with something more basic and you will eventually get here.

    For the sake of this tutorial I will get the post count of a user that you specify on CodeCall. The logic is very simple, it works this way:
    • It opens the profile page of the user
    • Using regular expressions it will find the post count
    • It will display the information in the C# program
    • From there you can do whatever you want with the data

    First of all we need to look at the source code of the page that displays the data we want. Open the page, view its source and find the data. Then select it together with some of the HTML code around it. The data itself will change over time, so it is the surrounding HTML that has to be unique, not the string or paragraph per se. Make sure that no other part of the page contains the same HTML; otherwise the regular expression will pick up other data instead of the data you want.

    In this case I found the source code that displays the post count. It’s something like this:



    Code:
    <li><span class="shade">Total Posts:</span> PostCountHere</li>
    As you can see, it is unique in the whole document because nothing else on the page uses that code. Don’t forget that the post count itself (PostCountHere) will keep changing.
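
    One way to double-check that the surrounding HTML really is unique is to count how often it appears in the page source. This is only a minimal sketch: the class and method names (MarkerCheck, CountOccurrences) and the sample HTML are made up for illustration.

```csharp
using System;

public static class MarkerCheck
{
    // Counts non-overlapping occurrences of marker in html (case-sensitive).
    public static int CountOccurrences(string html, string marker)
    {
        if (string.IsNullOrEmpty(marker)) return 0;

        int count = 0;
        int index = 0;
        while ((index = html.IndexOf(marker, index, StringComparison.Ordinal)) != -1)
        {
            count++;
            index += marker.Length; // continue searching after this hit
        }
        return count;
    }

    public static void Main()
    {
        // Hypothetical page fragment; a real page would be much longer.
        string html = "<ul><li><span class=\"shade\">Join Date:</span> Aug 2006</li>"
                    + "<li><span class=\"shade\">Total Posts:</span> 123</li></ul>";

        // The fixed HTML in front of the value should occur exactly once.
        Console.WriteLine(CountOccurrences(html, "Total Posts:</span>")); // prints 1

        // "</li>" on its own is not unique, so it would be a poor marker.
        Console.WriteLine(CountOccurrences(html, "</li>")); // prints 2
    }
}
```

    If the count for your marker is anything other than 1, include more of the surrounding HTML until it is.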

    Now that you have that code, make a regular expression to pick up that string ONLY.
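
    For the snippet above, such an expression could look like the following. Treat it as a sketch: the PostCountPattern class name and the sample HTML string are assumptions for illustration, and the pattern only works as long as the page keeps exactly this markup.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PostCountPattern
{
    // Group 1 captures whatever sits between the fixed opening and closing HTML.
    // The lazy quantifier (.*?) stops at the first </li> instead of the last one.
    private static readonly Regex Pattern = new Regex(
        @"<span class=""shade"">Total Posts:</span>\s*(.*?)\s*</li>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the captured value, or null when the marker is not on the page.
    public static string Extract(string html)
    {
        Match m = Pattern.Match(html);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        // Hypothetical sample input in the same shape as the snippet above.
        string html = "<li><span class=\"shade\">Total Posts:</span> 11,209</li><li>Other</li>";
        Console.WriteLine(Extract(html)); // prints 11,209
    }
}
```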

    Now you are ready to open Visual Studio. I am using 2005 because my 2008 installation is corrupted, but the code should work with both. I’m going to use a Windows Application, so we design a simple GUI. For the sake of the tutorial I used:
    • 1 button
    • 2 text boxes: textBox1 for the username and output for the result (one of them is MultiLine)

    Should look something like this:

    [Screenshot: gcfw-1.jpg]

    After that, double click on the button; the Click handler that opens is where we will write all the code. If you are going to make a more complicated application I’d suggest you move this code into a class, but that’s another story.

    The code is the following:

    Code:
    // Wrap everything in a try block so that a network or parsing failure
    // is reported to the user instead of crashing the program
    try
    {
        // Build the request for the profile page of the user entered in textBox1
        System.Net.WebRequest req = System.Net.WebRequest.Create("http://forum.codecall.net/members/" + textBox1.Text.ToLower() + ".html");

        string txt;

        // GetResponse() throws a WebException on failure rather than returning null,
        // so connection problems are handled by the first catch block below.
        // The using blocks close the response and the reader when we are done.
        using (System.Net.WebResponse resp = req.GetResponse())
        using (System.IO.StreamReader sr = new System.IO.StreamReader(resp.GetResponseStream()))
        {
            // Read the whole response (the HTML of the page) into a string
            txt = sr.ReadToEnd();
        }

        // Regular expression that captures the text between
        // "Total Posts:</span> " and "</li>" as group 2
        Regex myReg = new Regex("(Total Posts:<\\/span> )(.*?)(<\\/li>)", RegexOptions.IgnoreCase | RegexOptions.Singleline);
        Match matchFound = myReg.Match(txt);

        // If we find a match
        if (matchFound.Success)
        {
            // Output the result
            string result = matchFound.Groups[2].Value;
            output.Text = "We Accessed The URL: " + req.RequestUri.AbsoluteUri + ". The User: " + textBox1.Text + " has " + result + " posts.";
        }
        // Else show an error
        else
        {
            MessageBox.Show("RegEx Failed. Make Sure You Entered A Correct Username");
        }
    }
    // The request itself failed (no connection, page not found, ...)
    catch (System.Net.WebException ex)
    {
        MessageBox.Show("Failed. Make Sure You Are Connected To The Internet. " + ex.Message);
    }
    // If anything else goes wrong, show an error message instead of crashing
    catch (Exception ex)
    {
        MessageBox.Show("An error occurred. " + ex.Message);
    }
    To be able to use Regular Expressions you have to include the following line at the very top of the file:

    Code:
    using System.Text.RegularExpressions;
    The code is fully commented so you can easily edit it to your own needs, and it is pretty simple. It creates a WebRequest for the URL and, if the server responds, reads the response (the HTML code of the page you requested). Then a regular expression finds the string that we want, in this case the post count. Here we just output it into a text box, but once it’s stored in a variable in your program you can do whatever you want with it.
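
    As suggested earlier, the same logic can be moved out of the button handler into a small class, which makes it reusable and lets you test the parsing without a form or a network connection. The following is a rough sketch, not a finished design: ProfileScraper, BuildProfileUrl, ParsePostCount and FetchPostCount are names invented here, and the URL format is simply the one used in the tutorial code.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

public static class ProfileScraper
{
    // Same pattern as in the tutorial; group 2 holds the post count.
    private static readonly Regex PostCountRegex = new Regex(
        "(Total Posts:</span> )(.*?)(</li>)",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Builds the profile URL in the format used by the tutorial.
    public static string BuildProfileUrl(string username)
    {
        return "http://forum.codecall.net/members/" + username.ToLower() + ".html";
    }

    // Pure parsing step: returns the post count text, or null if it is not found.
    public static string ParsePostCount(string html)
    {
        Match match = PostCountRegex.Match(html);
        return match.Success ? match.Groups[2].Value : null;
    }

    // Downloads the profile page and parses it.
    // Network failures surface as a WebException for the caller to handle.
    // Note: newer .NET versions mark WebRequest as obsolete in favour of HttpClient.
    public static string FetchPostCount(string username)
    {
        WebRequest req = WebRequest.Create(BuildProfileUrl(username));
        using (WebResponse resp = req.GetResponse())
        using (StreamReader sr = new StreamReader(resp.GetResponseStream()))
        {
            return ParsePostCount(sr.ReadToEnd());
        }
    }
}
```

    The button handler would then only need to call ProfileScraper.FetchPostCount(textBox1.Text) inside its try block and display the result, showing the "RegEx Failed" message when the call returns null.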

    If you don’t know Regular Expressions, you might want to Google them, because they can get quite complicated. If you have no clue about them you could instead use a bunch of if statements, a lot of Substring calls and a ton of patience, but that is NOT the way to go: it’s very complicated and not so reliable, so I won’t explain it in this tutorial.

    Additional Help:
    If you get a paragraph and you don’t want the <br> tags to appear, take the string that the Regular Expression returned and replace <br> or <br /> (or whatever the page uses) with Environment.NewLine.

    For example:

    Code:
    result = result.Replace("<br>", Environment.NewLine);
    But in this case we will only be getting one string, so we won’t need it.
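
    If a page mixes several spellings of the tag, a single regular expression can handle <br>, <br/> and <br /> in one pass. A small sketch, assuming nothing about the page beyond that; the HtmlCleanup class, the BreaksToNewLines method and the sample string are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlCleanup
{
    // Matches <br>, <br/>, <br /> and upper-case variants such as <BR>.
    private static readonly Regex BreakTag = new Regex(@"<br\s*/?>", RegexOptions.IgnoreCase);

    // Replaces every line-break tag with the platform's newline sequence.
    public static string BreaksToNewLines(string html)
    {
        return BreakTag.Replace(html, Environment.NewLine);
    }

    public static void Main()
    {
        // Hypothetical input with two different spellings of the tag.
        string result = "First line<br>Second line<BR />Third line";
        Console.WriteLine(BreaksToNewLines(result));
    }
}
```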

    Conclusion:
    I hope that this tutorial was useful to you. Now go and practice!

  3. #2
    Join Date
    Mar 2008
    Location
    The North Pole
    Posts
    13,174
    Blog Entries
    13
    Rep Power
    114

    Re: How To Get Content From A Website

    Nice. For a general tutorial on downloading data from a website, see my tutorial: C#:Tutorial - Download Data

    Quote Originally Posted by Jordan View Post
    Good members, like yourself, stick around and post for ages to come!
    Mr. Xav | Blog | Forums

  4. #3
    Join Date
    Nov 2008
    Location
    Kosovo.
    Posts
    2,391
    Rep Power
    30

    Re: How To Get Content From A Website

    Good tutorial, +rep.

  5. #4
    TALucas is offline Learning Programmer
    Join Date
    Dec 2008
    Location
    Illinois
    Posts
    92
    Rep Power
    12

    Re: How To Get Content From A Website

    Nice tutorial....I've done similar stuff in Java.
    Your thoughts are the architects of your destiny.

  6. #5
    Join Date
    Sep 2008
    Location
    Kosovo
    Posts
    4,032
    Rep Power
    44

    Re: How To Get Content From A Website

    nice tutorial TcM .. +rep

  7. #6
    Jordan Guest

    Re: How To Get Content From A Website

    Very cool! +rep

  8. #7
    tinyy is offline Newbie
    Join Date
    Mar 2009
    Posts
    1
    Rep Power
    0

    Re: How To Get Content From A Website

    Thanks a lot for this, it's helping me with some of my noob programs.

  9. #8
    Gemini is offline Newbie
    Join Date
    Jul 2009
    Location
    Netherlands
    Posts
    8
    Rep Power
    0

    Re: How To Get Content From A Website

    Good tut.
    I can has kood

  10. #9
    Join Date
    Aug 2006
    Posts
    11,209
    Blog Entries
    6
    Rep Power
    101

    Re: How To Get Content From A Website

    Thanks for your feedback. You are all welcome.

    And thanks for the +rep

  11. #10
    Join Date
    Mar 2009
    Posts
    1,375
    Rep Power
    24

    Re: How To Get Content From A Website

    I always wondered how HTTP pages can be downloaded, and processed in C#. Thanks TcM. +rep
