
XHTML class in C#?


19 replies to this topic

#1
Tonchi

    Programming Expert

  • Members
  • 471 posts
  • Location:Varaždin
  • Programming Language:C, C++, C#
Is there any XHTML class in C# that is not for ASP.NET? I need it for WinForms.
I need that class so I can read attribute values from some sites that use XHTML.

#2
Momerath

    Programming Professional

  • Members
  • 243 posts
The XML classes (System.Xml) should be able to deal with XHTML, as long as the page is well-formed XML.
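A minimal sketch of that approach using only System.Xml (XmlDocument plus an XPath query). It assumes the page is well-formed XHTML in the standard XHTML namespace; markup that isn't well-formed makes LoadXml throw, and so do entities such as &nbsp; that are only defined in the XHTML DTD. The class and method names are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper: reads one attribute from every matching element
// in a well-formed XHTML string.
public static class XhtmlAttributeReader
{
    // Elements in an XHTML document live in this namespace.
    private const string XhtmlNs = "http://www.w3.org/1999/xhtml";

    public static List<string> ReadAttribute(string xhtml, string elementName, string attributeName)
    {
        var doc = new XmlDocument();
        doc.XmlResolver = null; // don't try to download a DTD named in a DOCTYPE
        doc.LoadXml(xhtml);     // throws XmlException on malformed markup

        // XPath 1.0 needs a prefix bound to the XHTML namespace.
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("x", XhtmlNs);

        var values = new List<string>();
        // Only elements that actually carry the attribute are selected.
        XmlNodeList nodes = doc.SelectNodes("//x:" + elementName + "[@" + attributeName + "]", ns);
        foreach (XmlNode node in nodes)
        {
            values.Add(node.Attributes[attributeName].Value);
        }
        return values;
    }
}
```

The "x" prefix is needed because XPath 1.0 treats an unprefixed name as having no namespace, so a plain //a would not match elements in the XHTML default namespace.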

#3
sam_coder

    Programming Expert

  • Members
  • 372 posts
The problem is that not all XHTML pages are compliant.

Try downloading the HtmlAgilityPack; it's amazing. It handles XHTML and even plain HTML, in a DOM that is very similar to the XML DOM. It even lets you run XPath queries against it.

I could provide an example when I get back to my workstation. Anyway, it's available on CodePlex:

Html Agility Pack

#4
Tonchi

Can you give a short overview of how to use it (to sam_coder)? I want to read some values from a page's HTML in my program and show them in a DataGridView control. I've never worked with HTML in C#, so any help would be great.
The values are on hrportfolio.com (page title "FONDOVI | Hrvatski otvoreni investicijski fondovi detaljno - prinosi fondova, grafički prikaz, usporedba fondova, kupnja udjela u fondu, pristup fondu", i.e. "FUNDS | Croatian open-ended investment funds in detail: fund returns, charts, fund comparison, buying fund units, fund access"), for funds like KD Victoria. I only need the values from two columns, "Vrijednost" (Value) and "Promj.%" (Change %).
Can someone give me an example?

#5
sam_coder

I'll see what I can do, Tonchi.

What you're trying to do is typically called 'scraping'.

It comes with some risks. For instance, scraping content without permission might violate the site's terms of use, so it's always better to ask permission first.

It's also possible that the page could change, rendering your scraping code incompatible. XPath greatly mitigates this risk, mind you, generally keeping code changes to a minimum.

Anyway, that said, I'll see if I can whip up something simple.
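To make the XPath point concrete, here is a small sketch comparing a positional path with one anchored on an id attribute. It uses System.Xml on tiny well-formed snippets so it runs without extra packages; the same XPath strings should also work with the Html Agility Pack's SelectNodes. The snippets and class name are invented, and the id tabelaTec1 is borrowed from the query quoted later in this thread.

```csharp
using System.Xml;

// Hypothetical demo: how two XPath styles react to a change in page layout.
public static class XPathRobustness
{
    // Positional: depends on the exact nesting of every ancestor.
    public const string PositionalPath = "/html/body/div/table/tr";

    // Anchored: finds the table by its id, wherever it sits in the page.
    public const string AnchoredPath = "//table[@id='tabelaTec1']/tr";

    // Counts the rows a given XPath selects in well-formed markup.
    public static int CountRows(string markup, string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(markup);
        return doc.SelectNodes(xpath).Count;
    }
}
```

If the site adds a banner div and an extra wrapper div around the table, the positional path silently returns no rows while the anchored one still finds them. XPath only mitigates the risk as long as the anchor itself (here the id) stays stable.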

#6
sam_coder

Hey Tonchi,
yeah, you will likely have to re-add the reference to the HtmlAgilityPack; I'm not sure whether I packaged it.

I can't read your language, so I just named a couple of the columns Data, Data2, etc.

But you get the gist.

Attached file: ScreenScraping.zip (179.01K)

Anyway, I haven't tried to make it look pretty or anything. I'm kind of busy. =)

What gets complicated about these documents, and this one in particular, is all the embedded tables. Tables and tables and tables.

So what I do is look for a distinctive spot in the document and sync on that: download the page and ignore everything up to that point. That lets me run much cleaner XPath queries against it.
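A rough sketch of that "sync point" idea: find a marker string in the downloaded page and keep only what starts there, then hand the rest to the parser (doc.LoadHtml(content) in the code quoted further down). The class name and the marker in the test are hypothetical; you would pick something distinctive from the real page source.

```csharp
using System;

// Hypothetical helper for syncing on a known spot in a downloaded page.
public static class SyncPoint
{
    // Returns the content from the first occurrence of 'marker' onwards,
    // or null when the marker is missing (for example, the page changed).
    public static string SliceFrom(string content, string marker)
    {
        int start = content.IndexOf(marker, StringComparison.Ordinal);
        return start < 0 ? null : content.Substring(start);
    }
}
```

Returning null when the marker is missing gives an obvious signal that the page layout has changed, instead of a confusing XPath failure later on.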

The Html Agility Pack is really forgiving: you can throw almost any malformed markup at it, and I've never seen it fail to make sense of it. =)

Enjoy!

#7
sam_coder

Oh, and the XML file in the solution is just the source of the page you wanted me to hit. You can ignore it; I just had it there to keep my head straight.

#8
Tonchi

Thanks a lot, that solved my first problem.
But I have two more questions. First, how did you fill in the column names inside the table (Name, Data, Data2, Data3)?
Second, how can I save the data from that table (as far as I can see, it's not a DataGridView control)? After saving, I want to load the data from the last save, and when I click the "upload" button, the new data should go into a new row so the previous data stays in the table.

#9
sam_coder

OK, well, I used a ListView control, and ListViewItem has a constructor that takes a string array.

In the Update Grid code, you can see that I'm just appending each column of the table to the array.

A DataGridView would work very similarly. Just make a DataTable with the appropriate columns (make sure all columns are of type string), and then add each row as an object array.

So you end up with something like this, inside the foreach loop over the table rows:

dt.Rows.Add(new object[] {
    node.SelectSingleNode("td[@class='colFond']/a").InnerText, //scrape the appropriate fields
    node.SelectSingleNode("td[@class='colDatum']").InnerText,
    node.SelectSingleNode("td[@class='colVrijednost']").InnerText,
    node.SelectSingleNode("td[@class='colValuta']").InnerText
});



The DataGridView can be bound to that DataTable using the grid's DataSource property.

That's it; that's all you need to display the data.
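A minimal sketch of the DataTable approach. It also touches on Tonchi's save/reload question by using DataTable.WriteXml and ReadXml, which is one possible way to persist the rows, not something from the attached sample. The column names follow the sample's placeholders (Name, Data, Data2, Data3); the class name, table name and file path are hypothetical.

```csharp
using System.Data;

// Hypothetical helper around a DataTable of scraped rows.
public static class FundTable
{
    public static DataTable Create()
    {
        // WriteXml throws if the table has no name, so give it one.
        var table = new DataTable("Funds");
        foreach (string name in new[] { "Name", "Data", "Data2", "Data3" })
        {
            table.Columns.Add(name, typeof(string)); // every column is a string
        }
        return table;
    }

    // Appends one scraped row; cells must be in column order.
    public static void AddRow(DataTable table, string[] cells)
    {
        table.Rows.Add((object[])cells);
    }

    // Writes the rows plus the schema, so Load can rebuild the columns.
    public static void Save(DataTable table, string path)
    {
        table.WriteXml(path, XmlWriteMode.WriteSchema);
    }

    // Reads back a table previously written by Save.
    public static DataTable Load(string path)
    {
        var table = new DataTable();
        table.ReadXml(path);
        return table;
    }
}
```

In the form, you might call Load at startup when the file exists (File.Exists), assign the result to your grid's DataSource, call AddRow when the button is clicked, and then Save, so earlier rows stay in the table.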

#10
Tonchi

Is there any property that lets me copy that information so I can paste it into Word?

#11
Tonchi

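Is this the way to copy text from a ListBox?

private void listBox1_MouseClick(object sender, MouseEventArgs e)
{
    // Copy the selected item on right-click. Skip it when nothing is selected:
    // SelectedIndex is -1 then, and Items[-1] would throw.
    if (e.Button == MouseButtons.Right && listBox1.SelectedIndex >= 0)
    {
        Clipboard.SetText(listBox1.Items[listBox1.SelectedIndex].ToString());
    }
}

And can I paste it into Word?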
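For copying more than one item, a sketch of one common approach: build tab-separated text (tabs between cells, Windows line breaks between rows) and put it on the clipboard with Clipboard.SetText. The helper below is hypothetical and only builds the string, so it can be tested outside WinForms.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: turns rows of cells into clipboard-friendly text.
public static class ClipboardText
{
    // Cells are separated by '\t', rows by "\r\n".
    public static string ToTabSeparated(IEnumerable<string[]> rows)
    {
        var sb = new StringBuilder();
        foreach (string[] row in rows)
        {
            sb.Append(string.Join("\t", row));
            sb.Append("\r\n");
        }
        return sb.ToString();
    }
}
```

To feed it from the ListView in the sample, you could collect, for each ListViewItem in scrapeResults.Items, the Text of every entry in item.SubItems (SubItems[0] is the item's own text). Word pastes the result as plain text; its Convert Text to Table command, with tabs as the separator, can then turn it into a table.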

#12
Tonchi

And what's with scrapeResults? Where did you define it? I copied all the code from your project into mine, and this is the error from VS:

Error 2 The name 'scrapeResults' does not exist in the current context c:\documents and settings\antonio\my documents\visual studio 2010\Projects\WindowsFormsApplication2\WindowsFormsApplication2\Form1.cs 66 13 WindowsFormsApplication2


It comes from these lines:

...
doc.LoadHtml(content); //Load the content into the structure
scrapeResults.Items.Clear(); //clear the table
foreach (HtmlNode node in doc.DocumentNode.SelectNodes(
    "/div/div/table[@id='tabelaTec1']/tbody/tr"))
...

and

...
{
    //add a new listview item for each item in the table
    scrapeResults.Items.Add(new ListViewItem(new string[] {
        node.SelectSingleNode("td[@class='colFond']/a").InnerText, //scrape the appropriate fields
        node.SelectSingleNode("td[@class='colDatum']").InnerText,
...


And what about this error? I can't fix it:

Error 1 The type or namespace name 'Form1' could not be found (are you missing a using directive or an assembly reference?) c:\documents and settings\antonio\my documents\visual studio 2010\Projects\WindowsFormsApplication2\WindowsFormsApplication2\Program.cs 18 33 WindowsFormsApplication2


It's from here:

static void Main()
{
    Application.EnableVisualStyles();
    Application.SetCompatibleTextRenderingDefault(false);
    Application.Run(new Form1());
}
}