Let’s say you want to pull some values from a website into your application, for example a paragraph of text, so that you can manipulate the data or simply display it to the user.
Solution:
In this tutorial I will show you an easy, fast and effective way to get almost any type of information from a website. You will need some knowledge of RegEx (Regular Expressions). How complicated the regular expression gets depends on the source code of the website, how well formatted it is, and which part of the page you want. I will assume that you have basic knowledge of C#; if not, start with something more basic and you will eventually get here.
For the sake of this tutorial I will get the post count of a user that you specify on CodeCall. The logic is very simple, it works this way:
- It opens the profile page of the user
- Using regular expressions it will find the post count
- It will display the information in the C# program
- From there you can do whatever you want with the data
First of all we need to look at the source code of the page that displays the data we want. Open the page, view its source code and find the data. Then select that data together with some of the HTML around it. The surrounding HTML is what has to be unique on the page, not the string or paragraph itself (don’t forget that the data you want will change over time). If the same HTML appears anywhere else on the page, the regular expression will pick up other data instead of the data that you want.
In this case I found the source code that displays the post count. It’s something like this:
<li><span class="shade">Total Posts:</span> PostCountHere</li>
As you can see, it is unique in the whole document because nothing else on the page uses that code. Don’t forget that the post count itself will keep changing.
Now that you have that code, make a regular expression to pick up that string ONLY.
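Here is a minimal sketch of what such a regular expression could look like for the snippet above. The class name PostCountPattern and the method Extract are made up for illustration, and the sample value 1,234 used in the note below is invented. Unlike the pattern in the full code further down, this one uses a single capture group, so the value is in Groups[1].

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, named only for this illustration.
public static class PostCountPattern
{
    // Anchors on the unique HTML around the value and captures the value itself.
    // (.*?) is lazy, so it stops at the first </li> that follows the label.
    private static readonly Regex Pattern = new Regex(
        @"Total Posts:</span>\s*(.*?)\s*</li>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the captured text, or null when the label is not on the page.
    public static string Extract(string html)
    {
        Match m = Pattern.Match(html);
        return m.Success ? m.Groups[1].Value : null;
    }
}
```

Calling PostCountPattern.Extract on the line <li><span class="shade">Total Posts:</span> 1,234</li> should return 1,234, and a page without the label should return null.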
Now you are ready to open Visual Studio. I am using 2005 because my 2008 installation is corrupted, but the code should work with both. I’m going to use a Windows Application, so we design a nice looking GUI. For the sake of the tutorial I used:
1 button
2 text boxes (one of them is MultiLine)
It should look something like this:
[ATTACH]1314[/ATTACH]
After that, double-click the button; all of the code goes into its click handler. If you are going to make a more complicated application I’d suggest you move this code into a class, but that’s another story.
The code is the following:
//Always use a try so that if something goes wrong we can handle it and the program won't crash
try
{
    //Make a new web request
    System.Net.WebRequest req = System.Net.WebRequest.Create("http://forum.codecall.net/members/" + textBox1.Text.ToLower() + ".html");
    string txt;
    //The using blocks close the response and the reader once the page has been read
    using (System.Net.WebResponse resp = req.GetResponse())
    {
        //If the web request returns a null value, show an error and stop here
        //(most network failures throw an exception instead and are caught below)
        if (resp == null)
        {
            MessageBox.Show("Failed. Make Sure You Are Connected To The Internet.");
            return;
        }
        //Make a new StreamReader and insert the response stream into it
        using (System.IO.StreamReader sr = new System.IO.StreamReader(resp.GetResponseStream()))
        {
            //Insert that stream into a string
            txt = sr.ReadToEnd();
        }
    }
    //A regular expression to get the data we want; group 2 captures the post count
    Regex myReg = new Regex("(Total Posts:<\\/span> )(.*?)(<\\/li>)", RegexOptions.IgnoreCase | RegexOptions.Singleline);
    Match matchFound = myReg.Match(txt);
    //If we find a match
    if (matchFound.Success)
    {
        //Output the result
        string result = matchFound.Groups[2].Value;
        output.Text = "We Accessed The URL: " + req.RequestUri.AbsoluteUri + ". The User: " + textBox1.Text + " has " + result + " posts.";
    }
    //Else show an error
    else
    {
        MessageBox.Show("RegEx Failed. Make Sure You Entered A Correct Username");
    }
}
//If something goes wrong, catch it and show an error message instead of crashing
catch (Exception ex)
{
    MessageBox.Show("An error occurred. " + ex.Message);
}
To be able to use regular expressions you have to add the following line at the very top of the file:
using System.Text.RegularExpressions;
The code is fully commented so you can easily edit it to your own needs. It is pretty simple: it makes an instance of a WebRequest and requests the URL. If the server responds, it reads the response, which is the HTML code of the webpage you requested. Then the regular expression finds the string that we want (in this case the post count). Here we just output it into a textbox, but once it’s stored in a variable in your program you can do whatever you want with it.
If you don’t know regular expressions, you might want to Google them, because they can get quite complicated. If you have no clue you could use a bunch of if statements, a lot of Substring calls and a ton of patience, but that is NOT the way to go. It’s very complicated and not so reliable, so I won’t explain it in this tutorial.
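As mentioned earlier, for a bigger application it can help to move this logic out of the button handler. The following is only a rough sketch of how that could look. The class name ProfileScraper and its method names are invented for illustration. It also assumes that the post count may contain thousands separators such as 1,234, which the original page may or may not use. WebRequest is kept to match the code above; newer .NET versions mark it as obsolete.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical class; the name and members are illustrative only.
public static class ProfileScraper
{
    // Downloads a page and returns its HTML.
    // Network problems surface as a WebException for the caller to catch.
    public static string DownloadHtml(string url)
    {
        WebRequest req = WebRequest.Create(url);
        using (WebResponse resp = req.GetResponse())
        using (StreamReader sr = new StreamReader(resp.GetResponseStream()))
        {
            return sr.ReadToEnd();
        }
    }

    // Finds the post count in the HTML and converts it to a number.
    // Returns false when the label is missing or the value is not numeric.
    public static bool TryGetPostCount(string html, out int postCount)
    {
        postCount = 0;
        Match m = Regex.Match(
            html,
            @"Total Posts:</span>\s*(.*?)\s*</li>",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
        if (!m.Success)
        {
            return false;
        }
        // AllowThousands accepts "1,234" as well as "1234" (assumed format).
        return int.TryParse(
            m.Groups[1].Value,
            NumberStyles.AllowThousands,
            CultureInfo.InvariantCulture,
            out postCount);
    }
}
```

With something like this, the button handler would only call DownloadHtml and TryGetPostCount and decide what to show. Keeping the parsing separate from the download also means you can try the regular expression on a saved HTML string without going online.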
Additional Help:
If you are getting a paragraph and you don’t want the <br> tags to appear, take the string that the regular expression returned and replace <br> or <br /> (or whatever the page uses) with Environment.NewLine
For example:
result = result.Replace("<br>", Environment.NewLine);
But in this case we will only be getting one string, so we won’t need it.
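If the page mixes <br>, <br/> and <br />, one call per variant gets tedious. A small sketch that handles these in one go with a regular expression is below. The class name HtmlLineBreaks and the method ToNewLines are made up for this example, and the pattern deliberately ignores tags that carry attributes, such as <br clear="all">.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, named only for this illustration.
public static class HtmlLineBreaks
{
    // Matches <br>, <br/>, <br /> and <BR>, allowing optional spaces
    // before the slash and before the closing bracket.
    private static readonly Regex BrTag =
        new Regex(@"<br\s*/?\s*>", RegexOptions.IgnoreCase);

    // Replaces every matched tag with the platform's line break.
    public static string ToNewLines(string html)
    {
        return BrTag.Replace(html, Environment.NewLine);
    }
}
```

You would use it the same way as the Replace call above: result = HtmlLineBreaks.ToNewLines(result);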
Conclusion:
I hope that this tutorial was useful to you. Now go and practice :)

