Scenario:
Let’s say that you want to get some data from a website into your application, for example a paragraph or some other piece of text, and you want to manipulate it or just display it to the user.
Solution:
In this tutorial I will show you how to get almost any kind of information from a website in an easy, fast and effective way. You will need some RegEx (Regular Expressions) knowledge. How complicated the regular expression has to be depends on the source code of the website, how well formatted it is, and which part of the page you want to get. I will also assume that you have basic knowledge of C#; if not, start with something more basic and you will eventually get here.
For this tutorial I will get the post count of a CodeCall user that you specify. The logic is very simple:
- It opens the profile page of the user
- Using regular expressions it will find the post count
- It will display the information in the C# program
- From there you can do whatever you want with the data
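Once the post count is in a string, turning it into a number is usually the first thing you will want to do with it. A minimal sketch, assuming the page may format large counts with a thousands separator such as `1,234` (an assumption; check what the profile page actually shows). `PostCountParser` and `TryParsePostCount` are hypothetical names used only for illustration.

```csharp
using System.Globalization;

// Hypothetical helper: converts the scraped post count text into an int.
public static class PostCountParser
{
    // Returns true and sets 'count' when 'text' is a whole number, optionally
    // with "," thousands separators and leading/trailing whitespace.
    // Returns false (instead of throwing) for anything else, including null.
    public static bool TryParsePostCount(string text, out int count)
    {
        return int.TryParse(
            text,
            NumberStyles.AllowThousands | NumberStyles.AllowLeadingWhite | NumberStyles.AllowTrailingWhite,
            CultureInfo.InvariantCulture,
            out count);
    }
}
```

Using `TryParse` rather than `int.Parse` means that if the page layout changes and the regex captures something unexpected, you get `false` back instead of an exception.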
First, you need to know the source code of the page that displays the data you want. Open that page, view its source code and find the data you are after. Then select it together with some of the HTML around it, and make sure that this surrounding HTML is unique, i.e. that the same string does not appear anywhere else on the page. Remember that the data itself will change, so it is the HTML you select around it that must be unique, not the text or paragraph itself; otherwise the regular expression will pick up other data instead of the data you want.
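Before writing the regular expression, it can help to check in code that the markup you picked around the data really occurs only once in the page source. A minimal sketch; `MarkupCheck`, `CountOccurrences` and `IsUnique` are hypothetical names, and you would pass in the HTML you copied from "view source".

```csharp
using System;

public static class MarkupCheck
{
    // Counts non-overlapping occurrences of 'anchor' in 'html' using an
    // ordinal (exact, culture-independent) comparison.
    public static int CountOccurrences(string html, string anchor)
    {
        if (string.IsNullOrEmpty(anchor))
            throw new ArgumentException("anchor must not be empty", nameof(anchor));

        int count = 0;
        int index = html.IndexOf(anchor, StringComparison.Ordinal);
        while (index >= 0)
        {
            count++;
            // Continue searching after the end of the match just found.
            index = html.IndexOf(anchor, index + anchor.Length, StringComparison.Ordinal);
        }
        return count;
    }

    // The anchor is only safe to build a regular expression around if it appears exactly once.
    public static bool IsUnique(string html, string anchor)
    {
        return CountOccurrences(html, anchor) == 1;
    }
}
```

On a page where several fields share the same label markup (for example the same `<span class="shade">`), that markup alone fails this check; adding the label text, such as `Total Posts:`, is what makes the anchor unique.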
In this case, the source code that displays the post count looks something like this:

```html
<li><span class="shade">Total Posts:</span> PostCountHere</li>
```

As you can see, it is unique in the whole document, because nothing else on the page uses that code. Don’t forget that the post count itself will change.
Now that you have that code, write a regular expression that picks up that string ONLY.
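One way to express that in C# is to capture whatever sits between `Total Posts:</span>` and the next `</li>`. A minimal sketch; the pattern assumes the markup shown above, and `PostCountRegex.Extract` is a hypothetical name. It returns `null` when the markup is not found.

```csharp
using System.Text.RegularExpressions;

public static class PostCountRegex
{
    // Literal "Total Posts:</span>", optional whitespace, then a lazy capture
    // of everything up to the first "</li>" that follows (trailing whitespace excluded).
    private static readonly Regex Pattern = new Regex(
        @"Total Posts:</span>\s*(.*?)\s*</li>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the captured post count text, or null if the pattern is not found.
    public static string Extract(string html)
    {
        Match match = Pattern.Match(html);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

Because of `RegexOptions.Singleline`, `.` also matches newlines, so this still works if the count ends up on its own line in the HTML.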
Now you are ready to open Visual Studio. I am using Visual Studio 2005 because my 2008 installation is corrupted, but the code should work in both. I’m going to create a Windows Forms Application so we can design a nice-looking GUI. For this tutorial I used:
- 1 button
- 2 text boxes: textBox1, where you type the username, and a MultiLine one named output, which shows the result
The form should look something like this: [screenshot of the form]
After that, double-click the button; Visual Studio creates its Click event handler, and that is where we will write all the code. If you are going to build a more complicated application, I’d suggest moving this code into a class, but that’s another story.
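If you do move the logic into a class, one option is to keep downloading separate from parsing, so the parsing can be tested without a network connection. A minimal sketch; `CodeCallPostCounter`, its members, and passing the download step in as a `Func<string, string>` are my own assumptions, not part of the tutorial’s code. The profile URL format is the one the tutorial uses.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical class wrapping the tutorial's logic.
public class CodeCallPostCounter
{
    private static readonly Regex PostCountPattern = new Regex(
        @"Total Posts:</span>\s*(.*?)\s*</li>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Takes a URL and returns the page's HTML; injected so it can be swapped out in tests.
    private readonly Func<string, string> download;

    public CodeCallPostCounter(Func<string, string> download)
    {
        this.download = download ?? throw new ArgumentNullException(nameof(download));
    }

    // Same URL scheme as the button handler. ToLowerInvariant avoids
    // culture-specific casing rules (e.g. Turkish 'I') changing the URL.
    public static string BuildProfileUrl(string username)
    {
        return "http://forum.codecall.net/members/" + username.ToLowerInvariant() + ".html";
    }

    // Downloads the profile page and returns the post count text, or null if not found.
    public string GetPostCount(string username)
    {
        string html = download(BuildProfileUrl(username));
        Match match = PostCountPattern.Match(html);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

In the form you could create it with a real downloader, e.g. `new CodeCallPostCounter(url => { using (var client = new System.Net.WebClient()) return client.DownloadString(url); })`, and call `GetPostCount(textBox1.Text)` from the button’s Click handler.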
The code is the following:

```csharp
//Always use a try in case something happens, so we can easily handle it and the program won't crash
try
{
    //Make a new web request for the user's profile page
    System.Net.WebRequest req = System.Net.WebRequest.Create("http://forum.codecall.net/members/" + textBox1.Text.ToLower() + ".html");

    //GetResponse() never returns null: if the request fails it throws a WebException,
    //which is handled by the catch block below
    string txt;
    using (System.Net.WebResponse resp = req.GetResponse())
    using (System.IO.StreamReader sr = new System.IO.StreamReader(resp.GetResponseStream()))
    {
        //Read the whole response (the page's HTML) into a string
        txt = sr.ReadToEnd();
    }

    //Capture whatever sits between "Total Posts:</span>" and the next "</li>"
    Regex myReg = new Regex(@"Total Posts:</span>\s*(.*?)\s*</li>", RegexOptions.IgnoreCase | RegexOptions.Singleline);
    Match matchFound = myReg.Match(txt);

    //If we find a match
    if (matchFound.Success)
    {
        //Output the result (group 1 holds the post count)
        string result = matchFound.Groups[1].Value;
        output.Text = "We Accessed The URL: " + req.RequestUri.AbsoluteUri + ". The User: " + textBox1.Text + " has " + result + " posts.";
    }
    //Else show an error
    else
    {
        MessageBox.Show("RegEx Failed. Make Sure You Entered A Correct Username");
    }
}
//If something goes wrong (e.g. no connection or page not found), catch it and show an error message instead of crashing
catch (Exception ex)
{
    MessageBox.Show("An error occurred. Make Sure You Are Connected To The Internet. " + ex.Message);
}
```

To be able to use regular expressions, you have to add the following line at the very top of the file:

```csharp
using System.Text.RegularExpressions;
```

The code is fully commented, so you can easily adapt it to your own needs. It is pretty simple: it creates a WebRequest for the profile URL, gets the server’s response (the HTML code of the page you requested), reads it into a string, and then uses a regular expression to find the string we want (in this case the post count). Here we just output it into a text box, but once it is stored in a variable in your program, you can do whatever you want with it.
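On current .NET versions `WebRequest` is marked obsolete, and `HttpClient` is the recommended replacement. A minimal sketch of just the download step with `HttpClient`; `PageDownloader` and `CannedHandler` are hypothetical names. The client is passed in so one instance can be reused, and `CannedHandler` is only there to try the code without a network connection.

```csharp
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class PageDownloader
{
    // Downloads the page at 'url' and returns its body as a string.
    // EnsureSuccessStatusCode throws HttpRequestException for 4xx/5xx responses,
    // e.g. a 404 for a username that does not exist.
    public static async Task<string> DownloadHtmlAsync(HttpClient client, string url)
    {
        using (HttpResponseMessage response = await client.GetAsync(url))
        {
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}

// Offline helper: answers every request with the same status code and body.
public class CannedHandler : HttpMessageHandler
{
    private readonly HttpStatusCode status;
    private readonly string body;

    public CannedHandler(HttpStatusCode status, string body)
    {
        this.status = status;
        this.body = body;
    }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = new HttpResponseMessage(status)
        {
            Content = new StringContent(body),
            RequestMessage = request
        };
        return Task.FromResult(response);
    }
}
```

In the form you could make the Click handler `async void` and write `string txt = await PageDownloader.DownloadHtmlAsync(client, url);`, which keeps the window responsive while the page downloads; ideally keep one `HttpClient` for the lifetime of the form rather than creating one per click.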
If you don’t know regular expressions, you might want to Google them, because they can get quite complicated. If you have no clue, you could instead use a bunch of if statements, a ton of patience and a lot of Substring calls, but that is NOT the way to go: it is complicated and not very reliable, so I won’t explain it in this tutorial.
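One regular-expression detail in the code above is worth understanding: the `?` in `(.*?)` makes the match lazy, so it stops at the first `</li>` after the post count. Without it, `.*` is greedy and, combined with `RegexOptions.Singleline`, runs to the last `</li>` it can find. A small sketch for comparing the two; `LazyVsGreedy.Capture` is a hypothetical helper.

```csharp
using System.Text.RegularExpressions;

public static class LazyVsGreedy
{
    // Returns the first capture group of 'pattern' in 'html', or null if there is no match.
    // Singleline is used so '.' also matches newlines, as in the tutorial's code.
    public static string Capture(string html, string pattern)
    {
        Match m = Regex.Match(html, pattern, RegexOptions.Singleline);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

With `html = "<li><span class=\"shade\">Total Posts:</span> 42</li><li>Other</li>"`, the lazy pattern `Total Posts:</span> (.*?)</li>` captures `42`, while the greedy `Total Posts:</span> (.*)</li>` captures `42</li><li>Other`.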
Additional Help:
If you are getting a paragraph and you don’t want the <br> tags to appear, you can take the string that the regular expression returned and replace <br> or <br /> (or whatever the page uses) with Environment.NewLine.
For example:

```csharp
result = result.Replace("<br>", Environment.NewLine);
```

In this case we will only be getting one string, so we won’t need it.
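Pages are not consistent about how they write line breaks: `<br>`, `<br/>`, `<br />` and `<BR>` can all appear, and a plain `Replace("<br>", ...)` only catches the first. A minimal sketch that handles these with one case-insensitive regular expression; `HtmlText.BreaksToNewLines` is a hypothetical name.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlText
{
    // Matches <br>, <br/>, <br /> and upper-case variants such as <BR>.
    // It deliberately does not handle <br> tags that carry attributes.
    private static readonly Regex LineBreak = new Regex(@"<br\s*/?>", RegexOptions.IgnoreCase);

    // Replaces every matched HTML line break in 'html' with Environment.NewLine.
    public static string BreaksToNewLines(string html)
    {
        return LineBreak.Replace(html, Environment.NewLine);
    }
}
```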
Conclusion:
I hope this tutorial was useful to you. Now go and practice!
Nice. For a general tutorial on downloading data from a website, see my tutorial: C#:Tutorial - Download Data
Good tutorial, +rep.
nice tutorial TcM .. +rep!
Very cool! +rep
thanks a lot for this, it’s helping me with some of my noob programs!
Good tut.
I can has kood
Thanks for your feedback. You are all welcome
And thanks for the +rep
I always wondered how HTTP pages can be downloaded and processed in C#. Thanks TcM. +rep!