[HELP]Project: IMDb Finder


This topic has been archived. This means that you cannot reply to this topic.
17 replies to this topic

#1 Tonchi

Tonchi

    Helping the world with programming

  • Expert Member
  • 1249 posts

Posted 15 December 2012 - 03:10 AM

I am building an IMDb finder. The app should search imdb.com for movies from within my application. Here is the relevant code that runs the search and puts the results into listBox1:

ObservableCollection<String> movies = new ObservableCollection<string>();
if (textBox1.Text != null)
{
     //Setting a new value to navbar-query element in HTML document
     wb.Document.GetElementById("navbar-query").SetAttribute("value", textBox1.Text);
     //Choosing the existing value for quicksearch element in HTML document
     wb.Document.GetElementById("quicksearch").SetAttribute("value", "tt");
     //Click the button to accept the search query
     HtmlElement acceptButton = wb.Document.GetElementById("navbar-submit-button");
     if (acceptButton != null)
     {
          acceptButton.InvokeMember("click");
     }
     //Creating a new event handler for wb instance
     wb.DocumentCompleted += (send, ev) =>
     {
          foreach(char gl in wb.DocumentText)
          if (wb.DocumentText.Contains(textBox1.Text))
          {
               movies.Add(Convert.ToString(gl));
          }
      };
      listBox1.Items.Add(movies);
}

It seems that the logic of my foreach loop is wrong. All I get in the list box is "Collection" and nothing more. Can someone correct my logic? I have tried to work with:

Regex.Match("([" + textBox1.Text + "])");

But it seems that I can't pass that to the Contains method, so I am stuck when it comes to using Regex. Maybe I should use LINQ to query DocumentText?
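
For reference, a minimal sketch of literal regex matching, assuming the goal is only to find the search text inside a block of page text. Regex.Match and Regex.Matches take the input string and the pattern as separate arguments, and wrapping the text in [ ] turns it into a character class that matches any single one of its characters, so the sketch passes the text through Regex.Escape instead. The class name, method name and sample text are hypothetical. Separately, listBox1.Items.Add(movies) adds the collection object itself as one item, which is likely why only "Collection" shows up; results need to be added one at a time.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexSketch
{
    // Counts literal, case-insensitive occurrences of term inside text.
    public static int CountOccurrences(string text, string term)
    {
        if (string.IsNullOrEmpty(text) || string.IsNullOrWhiteSpace(term))
            return 0;

        // Regex.Escape makes the user's input match literally,
        // even if it contains characters such as ( ) [ ] or .
        string pattern = Regex.Escape(term.Trim());
        return Regex.Matches(text, pattern, RegexOptions.IgnoreCase).Count;
    }

    static void Main()
    {
        // Hypothetical sample standing in for the page text.
        string page = "Lie to Me (2009)\r\nlie to me: Pilot\r\nSomething else";
        Console.WriteLine(CountOccurrences(page, " Lie to Me ")); // prints 2
    }
}
```

A count greater than zero plays the role of Contains here; the individual Match objects in the returned collection also carry the position of each hit.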

Microsoft Student Partner, Microsoft Certified Professional


#2 BlackRabbit

BlackRabbit

    CodeCall Legend

  • Expert Member
  • 3871 posts

Posted 15 December 2012 - 04:19 AM

I don't quite follow your logic...
You are looping over every char in the web page content, but each iteration checks the same thing: whether the whole document contains the text. And then you add a single char as a movie?

I think the regex is wrong: you should trim the .Text value and drop the [] and (). If the pattern has to work across line breaks, check the RegexOptions flags as well (RegexOptions.Multiline changes how ^ and $ treat line breaks, and RegexOptions.Singleline lets . match a newline).

On the other hand, I understand IMDb has its own API.

#3 Tonchi

Posted 15 December 2012 - 04:37 AM

I don't want to use their API. I want to build the software with my own code; it is more challenging. If DocumentText is the wrong property for retrieving the text of the HTML document, which property should I use instead? The main problem is looping over every line of the HTML document to check whether any part of the text contains a specific string. I know the logic in my code is wrong, but only because I don't know which property gives me strings instead of chars.

#4 BlackRabbit

Posted 15 December 2012 - 04:44 AM

I will take a guess here (so that I don't have to open old projects to check): I think what you are looking for is the InnerText property of the body element (or the div, or whatever container holds the text).

Then, if you want to look for words in there and Contains is not enough, you can use a regex call such as Regex.Matches( param , param , RegexOptions.Multiline ); it will return a collection with all the matches of your word in the text.

If you want the text line by line, call stringvar.Split('\n'), which gives you a string array (usable in foreach) with one line per array element ;)
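
The line-by-line idea can be sketched as follows, assuming the page text is already available as a plain string (for example from InnerText). The class name, method name and sample text are hypothetical; splitting on both "\r\n" and "\n" is an assumption about the line endings the text may contain.

```csharp
using System;
using System.Collections.Generic;

static class LineFilter
{
    // Returns every line of text that contains term, ignoring case.
    public static List<string> LinesContaining(string text, string term)
    {
        var hits = new List<string>();
        if (string.IsNullOrEmpty(text) || string.IsNullOrWhiteSpace(term))
            return hits;

        // Split on Windows and Unix line endings and skip empty lines.
        string[] lines = text.Split(new[] { "\r\n", "\n" },
                                    StringSplitOptions.RemoveEmptyEntries);
        foreach (string line in lines)
        {
            if (line.IndexOf(term.Trim(), StringComparison.OrdinalIgnoreCase) >= 0)
                hits.Add(line.Trim());
        }
        return hits;
    }

    static void Main()
    {
        // Hypothetical sample standing in for Body.InnerText.
        string sample = "Lie to Me (2009)\r\nAnother title\r\nlie to me: Pilot";
        foreach (string hit in LinesContaining(sample, "lie to me"))
            Console.WriteLine(hit); // in the app: add each hit to the list box
    }
}
```

Each hit is a whole line, so it can be added to the list box as a separate item.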

#5 Tonchi

Posted 15 December 2012 - 05:31 AM

I will have to ask you nicely to open your old projects to check it.

#6 BlackRabbit

Posted 15 December 2012 - 08:36 PM

OK, now I see the trick: you need to work with the HtmlWindow object. Take a look at this illustrative example.

foreach (HtmlWindow win in currentPage.Document.Window.Frames)
{
    // Pick the frame you are interested in by its name
    if (win.Name.Trim() == "your windows, frame, or whatever")
    {
        MaddosText = win.Document.Body.InnerText;
    }
}

So, you find the window in the page (the main page is a window too), then you look for its body, and then, as mentioned before, the InnerText.

It will give you the clean text ;)

#7 Tonchi

Posted 16 December 2012 - 06:08 AM

Here is my project:
http://uploading.com...IMDb-Finder-rar

You will see what I want to do and what is wrong.

#8 kernelcoder

kernelcoder

    CC Devotee

  • Expert Member
  • 990 posts

Posted 16 December 2012 - 07:08 AM

The following code may help to some extent:
/* Find rows of the first table */
HtmlElementCollection tables = _ieBrowser.Document.GetElementsByTagName("table");
HtmlElementCollection rows = tables[0].GetElementsByTagName("tr");

/* For each row, take the non-empty cell values as search results */
foreach (HtmlElement row in rows)
{
    HtmlElementCollection cells = row.GetElementsByTagName("td");
    foreach (HtmlElement cell in cells)
    {
        String text = cell.InnerText;
        if (!String.IsNullOrEmpty(text) && !String.IsNullOrWhiteSpace(text))
        {
            listBox1.Items.Add(text);
        }
    }
}
Note that the above code is specific to the www.imdb.com site.

Edited by kernelcoder, 16 December 2012 - 07:17 AM.


#9 Tonchi

Posted 16 December 2012 - 09:36 AM

If I use it like this:

wb.DocumentCompleted += (send, ev) =>
{
    /* Find rows of the first table */
    HtmlElementCollection tables = wb.Document.GetElementsByTagName("table");
    HtmlElementCollection rows = tables[0].GetElementsByTagName("tr");
    /* for each row, take the non-empty cell value as search result*/
    foreach (HtmlElement row in rows)
    {
        HtmlElementCollection cells = row.GetElementsByTagName("td");
        foreach (HtmlElement cell in cells)
        {
            String text = cell.InnerText;
            if (!String.IsNullOrEmpty(text) && !String.IsNullOrWhiteSpace(text))
            {
                listBox1.Items.Add(text);
            }
        }
    }
};

I get a runtime exception when the click event is handled.
The exception is:

ArgumentOutOfRangeException was unhandled by user code:
Value of '0' is not valid for 'index'. 'index' should be between 0 and -1.


Edited by Tonchi, 16 December 2012 - 09:41 AM.

#10 kernelcoder

Posted 16 December 2012 - 09:41 AM

You should run that code in the DocumentCompleted event after the search has been submitted and the results page has loaded. So here are the steps:
  • Navigate to the www.imdb.com URL
  • In DocumentCompleted, fill in the search key and click the Search button
  • In the next DocumentCompleted, run the code I posted above.
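
The steps above can be sketched as a small state machine. This is only an illustration under stated assumptions: the browser is left out entirely, OnDocumentCompleted stands in for the body of a DocumentCompleted handler, and the enum, class and log strings are hypothetical names. Keep in mind that DocumentCompleted can fire more than once for a page that contains frames, so a real handler may need an extra check before it advances.

```csharp
using System;
using System.Collections.Generic;

enum SearchPhase { WaitingForHomePage, WaitingForResults, Done }

sealed class SearchFlow
{
    public SearchPhase Phase { get; private set; } = SearchPhase.WaitingForHomePage;
    public List<string> Log { get; } = new List<string>();

    // Call once per completed page load.
    public void OnDocumentCompleted()
    {
        switch (Phase)
        {
            case SearchPhase.WaitingForHomePage:
                // Home page is ready: fill in the search key and submit.
                Log.Add("fill search box and submit");
                Phase = SearchPhase.WaitingForResults;
                break;
            case SearchPhase.WaitingForResults:
                // Results page is ready: now the table can be read.
                Log.Add("read result table");
                Phase = SearchPhase.Done;
                break;
            default:
                // Ignore any later page loads.
                break;
        }
    }
}

static class Program
{
    static void Main()
    {
        var flow = new SearchFlow();
        flow.OnDocumentCompleted(); // home page loaded
        flow.OnDocumentCompleted(); // results page loaded
        flow.OnDocumentCompleted(); // ignored
        Console.WriteLine(string.Join(" -> ", flow.Log));
    }
}
```

Reading the table only in the second phase avoids indexing into a table collection that is still empty on the home page.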


#11 Tonchi

Posted 16 December 2012 - 09:53 AM

I have tested your code. It now looks like this:

public partial class MainWindow : Window
    {
	    //Creating a new instance of WinForms WebBrowser class
	    System.Windows.Forms.WebBrowser wb = new System.Windows.Forms.WebBrowser();
	    public MainWindow()
	    {
		    InitializeComponent();
		    //navigating wb instance to http://www.imdb.com
		    wb.Navigate("www.imdb.com");
		    wb.ScriptErrorsSuppressed = true;
		    wb.DocumentCompleted += (send, ev) =>
		    {
			    if (wb.DocumentText.Contains("Results for:"))
			    {
				    /* Find rows of the first table */
				    HtmlElementCollection tables = wb.Document.GetElementsByTagName("table");
				    HtmlElementCollection rows = tables[0].GetElementsByTagName("tr");
				    /* for each row, take the non-empty cell value as search result*/
				    foreach (HtmlElement row in rows)
				    {
					    HtmlElementCollection cells = row.GetElementsByTagName("td");
					    foreach (HtmlElement cell in cells)
					    {
						    String text = cell.InnerText;
						    if (!String.IsNullOrEmpty(text) && !String.IsNullOrWhiteSpace(text))
						    {
							    listBox1.Items.Insert(0, text);
						    }
						    else
						    {
							    listBox1.Items.Add("There is no result");
						    }
					    }
				    }
			    }
		    };
	    }
	    private void Button_Click_1(object sender, RoutedEventArgs e)
	    {
		    ObservableCollection<String> movies = new ObservableCollection<String>();
		    if (textBox1.Text != null)
		    {
			    if (wb.DocumentText.Contains("NewsDesk"))
			    {
				    //Setting a new value to navbar-query element in HTML document
				    wb.Document.GetElementById("navbar-query").SetAttribute("value", textBox1.Text);
				    //Choosing the existing value for quicksearch element in HTML document
				    wb.Document.GetElementById("quicksearch").SetAttribute("value", "tt");
				    //Click the button to accept the search query
				    HtmlElement acceptButton = wb.Document.GetElementById("navbar-submit-button");
				    if (acceptButton != null)
				    {
					    acceptButton.InvokeMember("click");
				    }
			    }
		    }
	    }
			    //Creating a new event handler for wb instance
			    //MaddosText = win.document.Body.InnerText.ToString();
			   
				
	    }

I don't get any errors or exceptions, but I don't get any results either. It just does something in the background and stops at some point in the program. After that I can click my button again.

#12 kernelcoder

Posted 16 December 2012 - 10:03 AM

Here is the code I tested, followed by a screenshot from my test run:
public partial class IMDBBrowser : Form
{
    // 0 = waiting for the home page, 1 = waiting for the results page, 2 = done
    int _state = 0;

    private void button1_Click(object sender, EventArgs e)
    {
        button1.Enabled = false;
        _state = 0;
        _ieBrowser.Navigate("http://www.imdb.com/");
    }

    public IMDBBrowser()
    {
        InitializeComponent();

        _ieBrowser.DocumentCompleted += (send, ev) =>
        {
            if (_state == 0)
            {
                // Home page loaded: set the search text in the navbar-query element
                _ieBrowser.Document.GetElementById("navbar-query").SetAttribute("value", textBox1.Text);
                //Choosing the existing value for quicksearch element in HTML document
                _ieBrowser.Document.GetElementById("quicksearch").SetAttribute("value", "tt");
                //Click the button to accept the search query
                HtmlElement acceptButton = _ieBrowser.Document.GetElementById("navbar-submit-button");
                if (acceptButton != null)
                {
                    acceptButton.InvokeMember("click");
                }
                _state = 1;
            }
            else if (_state == 1)
            {
                // Results page loaded: read the cells of the first table, if there is one
                HtmlElementCollection tables = _ieBrowser.Document.GetElementsByTagName("table");
                if (tables.Count > 0)
                {
                    HtmlElementCollection rows = tables[0].GetElementsByTagName("tr");
                    foreach (HtmlElement row in rows)
                    {
                        HtmlElementCollection cells = row.GetElementsByTagName("td");
                        foreach (HtmlElement cell in cells)
                        {
                            String text = cell.InnerText;
                            if (!String.IsNullOrWhiteSpace(text))
                            {
                                listBox1.Items.Add(text);
                            }
                        }
                    }
                }

                _state = 2;
            }
        };
    }
}
LieToMe.png



