Xhtml class in C#???

asp.net xhtml

19 replies to this topic

#1 Tonchi

Tonchi

    Helping the world with programming

  • Expert Member
  • PipPipPipPipPipPipPip
  • 1249 posts
  • Location:Zagreb
  • Programming Language:C#, Others
  • Learning:C, C++, Python, JavaScript, Transact-SQL, Assembly

Posted 17 July 2011 - 03:38 PM

Is there an XHTML class in C# that isn't tied to ASP.NET? I need it for WinForms.
I need it so I can read attribute values from some sites that use XHTML.
  • 0

#2 Momerath

Momerath

    CC Addict

  • Advanced Member
  • PipPipPipPipPip
  • 282 posts
  • Programming Language:C, Java, C++, C#, PHP, (Visual) Basic, Python, JavaScript, Perl, Visual Basic .NET, Pascal, Ada, Assembly, Fortran, Scheme
  • Learning:Others

Posted 17 July 2011 - 08:44 PM

The XML classes should be able to deal with XHTML.
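
As a rough sketch of that idea, the snippet below reads one attribute value from a well-formed XHTML string with `System.Xml`. The class name, the sample markup and the `x` prefix are made up for illustration. It assumes the page really is valid XML and declares the standard XHTML namespace; a page that is not well-formed will make `LoadXml` throw.

```csharp
using System;
using System.Xml;

static class XhtmlAttributeReader
{
    // Hypothetical sample markup, only here to make the sketch self-contained.
    public const string Sample =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>" +
        "<a href=\"http://example.com/fund\" class=\"fund\">Example Fund</a>" +
        "</body></html>";

    // Returns the value of attributeName on the first element matched by xpath,
    // or null when no element matches or the attribute is missing.
    public static string ReadAttribute(string xhtml, string xpath, string attributeName)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml); // throws XmlException if the markup is not well-formed

        // XHTML elements live in a namespace, so XPath needs a prefix mapped to it.
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("x", "http://www.w3.org/1999/xhtml");

        XmlElement element = doc.SelectSingleNode(xpath, ns) as XmlElement;
        if (element == null || !element.HasAttribute(attributeName))
        {
            return null;
        }
        return element.GetAttribute(attributeName);
    }
}
```

Calling `XhtmlAttributeReader.ReadAttribute(XhtmlAttributeReader.Sample, "//x:a[@class='fund']", "href")` should give back the link's `href` value.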
  • 0

#3 sam_coder

sam_coder

    CC Addict

  • Senior Member
  • PipPipPipPipPip
  • 380 posts

Posted 22 July 2011 - 06:25 AM

The problem is that not all XHTML pages are compliant.

Try downloading the HtmlAgilityPack, it's amazing. It handles XHTML and even plain HTML, in a DOM that is very similar to the XML DOM. It even lets you run XPath queries against it.

I can provide an example when I get back to my workstation. Anyway, it's available on CodePlex.

Html Agility Pack
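
For a first idea of what that looks like, here is a minimal sketch (not the example promised above) that pulls every `href` out of some sloppy markup. It assumes the HtmlAgilityPack assembly is referenced; `LinkScraper` and the sample markup are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class LinkScraper
{
    // Hypothetical sample: the <li> elements are never closed.
    public const string Sample =
        "<ul><li><a href=\"/fund/1\">Fund 1</a><li><a href=\"/fund/2\">Fund 2</a></ul>";

    // Returns the href of every <a> element that has one, in document order.
    public static List<string> ReadHrefs(string html)
    {
        // Fully qualified, because System.Windows.Forms also has an HtmlDocument class.
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);

        var hrefs = new List<string>();
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
        if (links == null)
        {
            return hrefs; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode link in links)
        {
            hrefs.Add(link.GetAttributeValue("href", string.Empty));
        }
        return hrefs;
    }
}
```

`LinkScraper.ReadHrefs(LinkScraper.Sample)` should return the two link targets even though the list items are unclosed.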
  • 0

#4 Tonchi

Tonchi

    Helping the world with programming

  • Expert Member
  • PipPipPipPipPipPipPip
  • 1249 posts
  • Location:Zagreb
  • Programming Language:C#, Others
  • Learning:C, C++, Python, JavaScript, Transact-SQL, Assembly

Posted 22 July 2011 - 09:06 AM

Can you give me a quick overview of how to use it (to sam_coder)? I want to read some values from HTML code in my program and show them in a DataGridView control. I've never worked with HTML in C#, so any help would be great.
The values are on hrportfolio.com | FONDOVI | Hrvatski otvoreni investicijski fondovi detaljno - prinosi fondova, grafički prikaz, usporedba fondova, kupnja udjela u fondu, pristup fondu |, for example KD Victoria. I only need the values from the two columns "Vrijednost" (value) and "Promj.%" (change %).
Can someone give me an example?
  • 0

#5 sam_coder

sam_coder

    CC Addict

  • Senior Member
  • PipPipPipPipPip
  • 380 posts

Posted 22 July 2011 - 09:09 AM

I'll see what I can do, Tonchi.

What you're trying to do is typically called 'scraping'.

It has some dangers associated with it. For instance, scraping content without permission might be against the site's end user policies, so it's always better to ask permission.

It's also possible that the page could change, rendering your scraping code incompatible. XPath greatly mitigates this risk, mind you, generally keeping code changes to a minimum.

Anyway, that said, I'll see if I can whip up something simple.
  • 0

#6 sam_coder

sam_coder

    CC Addict

  • Senior Member
  • PipPipPipPipPip
  • 380 posts

Posted 23 July 2011 - 06:31 AM

Hey Tonchi,
Yeah, you will likely have to re-reference the HtmlAgilityPack, I'm not sure whether I packaged it or not.

I can't read your language, so I just called a couple of the columns Data, Data2, etc.

But you get the gist.

Attached File  ScreenScraping.zip   179.01KB   274 downloads

So, anyway, I haven't tried to make it look pretty or anything. I'm kinda busy. =)

What gets complicated about these documents, and this one in particular, is all the embedded tables. Tables and tables and tables.

So what I do is look for a spot in the document and sync on that. Just download the page and ignore everything up to that point. That lets me throw much cleaner XPath queries against it.

The Html Agility Pack is really forgiving, so you can throw any malformed markup at it. I've never seen it fail to make sense of it. =)

enjoy!
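
The "sync on a spot" idea can be sketched roughly like this. It is not the code from the attachment, just an illustration that assumes the HtmlAgilityPack assembly is referenced. The `tabelaTec1` id and the `colFond` class come from the page discussed in this thread; the exact marker string, the `TableSync` name and the sample page are assumptions.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

static class TableSync
{
    // Hypothetical page: a layout table first, then the table we actually care about.
    public const string SamplePage =
        "<html><body><table><tr><td>layout</td></tr></table>" +
        "<table id=\"tabelaTec1\">" +
        "<tr><td class=\"colFond\">Fund A</td></tr>" +
        "<tr><td class=\"colFond\">Fund B</td></tr>" +
        "</table></body></html>";

    // Skips everything before the marker, then queries only the remaining fragment.
    public static List<string> ReadFundNames(string page)
    {
        var names = new List<string>();

        // Assumed marker: the opening tag of the data table as it appears in the source.
        const string marker = "<table id=\"tabelaTec1\"";
        int start = page.IndexOf(marker, StringComparison.OrdinalIgnoreCase);
        if (start < 0)
        {
            return names; // marker not found, the page layout probably changed
        }

        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(page.Substring(start)); // stray closing tags at the end are tolerated

        HtmlNodeCollection cells = doc.DocumentNode.SelectNodes(
            "//table[@id='tabelaTec1']//td[@class='colFond']");
        if (cells == null)
        {
            return names;
        }

        foreach (HtmlNode cell in cells)
        {
            names.Add(cell.InnerText.Trim());
        }
        return names;
    }
}
```

Because the outer layout tables are cut away before parsing, the XPath can stay short instead of spelling out the full nesting from the document root.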
  • 0

#7 sam_coder

sam_coder

    CC Addict

  • Senior Member
  • PipPipPipPipPip
  • 380 posts

Posted 23 July 2011 - 06:32 AM

Oh, and the XML file in the solution is just the source of the page you wanted me to hit. You can ignore it, I only had it there to keep my head straight.
  • 0

#8 Tonchi

Tonchi

    Helping the world with programming

  • Expert Member
  • PipPipPipPipPipPipPip
  • 1249 posts
  • Location:Zagreb
  • Programming Language:C#, Others
  • Learning:C, C++, Python, JavaScript, Transact-SQL, Assembly

Posted 23 July 2011 - 06:52 AM

Thanks a lot, that was my first problem.
But I have two more questions. First, how did you fill in the column names in the table (Name, Data, Data2, Data3)?
Second, how can I save the data from that table (as far as I can see it's not a DataGridView control)? After saving, I want to load the data from the last save, and when I click the "upload" button I want the new data to go into a new row, so the previous data stays in the table.
  • 0

#9 sam_coder

sam_coder

    CC Addict

  • Senior Member
  • PipPipPipPipPip
  • 380 posts

Posted 23 July 2011 - 07:12 AM

OK, well, I used a ListView control, and ListViewItem has a constructor that takes a string array.

In the Update Grid, you can see that I'm just putting each column of the table into the array.

A DataGridView would work very similarly. Just make a DataTable with the appropriate columns, and then use an object array (make sure all columns are of type string).

So then you end up with

dt.Rows.Add(new object[] {
 node.SelectSingleNode("td[@class='colFond']/a").InnerText, //scrape the appropriate fields
 node.SelectSingleNode("td[@class='colDatum']").InnerText,
 node.SelectSingleNode("td[@class='colVrijednost']").InnerText,
 node.SelectSingleNode("td[@class='colValuta']").InnerText
});


The DataGridView can be bound to that DataTable using the DataSource property on the grid.

That's it, that's all it takes to show the data.
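
To make the DataTable part concrete, here is a small sketch that uses only `System.Data`. The column names mirror the CSS classes above; `FundTable`, the table name "Funds" and the file path parameter are assumptions. `Save` and `Load` show one possible way to keep rows between runs (writing the table to XML together with its schema), which is not necessarily how the attached project does it.

```csharp
using System;
using System.Data;

static class FundTable
{
    // Builds an empty table with one string column per scraped field.
    public static DataTable Create()
    {
        var dt = new DataTable("Funds"); // WriteXml requires a table name
        dt.Columns.Add("Fond", typeof(string));
        dt.Columns.Add("Datum", typeof(string));
        dt.Columns.Add("Vrijednost", typeof(string));
        dt.Columns.Add("Valuta", typeof(string));
        return dt;
    }

    // Appends one row; existing rows are left untouched.
    public static void Append(DataTable dt, string fond, string datum,
                              string vrijednost, string valuta)
    {
        dt.Rows.Add(new object[] { fond, datum, vrijednost, valuta });
    }

    // Writes rows and schema to an XML file so the table can be rebuilt later.
    public static void Save(DataTable dt, string path)
    {
        dt.WriteXml(path, XmlWriteMode.WriteSchema);
    }

    // Reads a table previously written by Save.
    public static DataTable Load(string path)
    {
        var dt = new DataTable();
        dt.ReadXml(path); // the inline schema recreates the columns
        return dt;
    }
}
```

In a form, something like `dataGridView1.DataSource = FundTable.Load(path);` would then show the saved rows (`dataGridView1` being whatever the grid is called in your project), and each later scrape can call `Append` on the same table so earlier rows stay in place.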
  • 0

#10 Tonchi

Tonchi

    Helping the world with programming

  • Expert Member
  • PipPipPipPipPipPipPip
  • 1249 posts
  • Location:Zagreb
  • Programming Language:C#, Others
  • Learning:C, C++, Python, JavaScript, Transact-SQL, Assembly

Posted 23 July 2011 - 09:48 AM

Is there any property that lets me copy that information so I can paste it into Word?
  • 0

#11 Tonchi

Tonchi

    Helping the world with programming

  • Expert Member
  • PipPipPipPipPipPipPip
  • 1249 posts
  • Location:Zagreb
  • Programming Language:C#, Others
  • Learning:C, C++, Python, JavaScript, Transact-SQL, Assembly

Posted 23 July 2011 - 10:37 AM

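Is this the way to copy text from a ListBox?
private void listBox1_MouseClick(object sender, MouseEventArgs e)
{
    // On right-click, copy the selected item's text to the clipboard.
    // SelectedIndex is -1 when nothing is selected, so check it first.
    if (e.Button == MouseButtons.Right && listBox1.SelectedIndex >= 0)
    {
        Clipboard.SetText(listBox1.Items[listBox1.SelectedIndex].ToString());
    }
}

And can I paste it into Word?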
  • 0

#12 Tonchi

Tonchi

    Helping the world with programming

  • Expert Member
  • PipPipPipPipPipPipPip
  • 1249 posts
  • Location:Zagreb
  • Programming Language:C#, Others
  • Learning:C, C++, Python, JavaScript, Transact-SQL, Assembly

Posted 23 July 2011 - 01:07 PM

And what's with scrapeResults? Where did you define it? I copied all of the code from your project into mine, and this is the error from VS:

Error 2 The name 'scrapeResults' does not exist in the current context c:\documents and settings\antonio\my documents\visual studio 2010\Projects\WindowsFormsApplication2\WindowsFormsApplication2\Form1.cs 66 13 WindowsFormsApplication2


It comes from these lines:

...
doc.LoadHtml(content); //Load the content into the structure

            scrapeResults.Items.Clear(); //clear the table

            foreach (HtmlNode node in doc.DocumentNode.SelectNodes(
                "/div/div/table[@id='tabelaTec1']/tbody/tr"))
...
and

...
{
                //add a new listview item for each item in the table
                scrapeResults.Items.Add(new ListViewItem(new string[] {
                     node.SelectSingleNode("td[@class='colFond']/a").InnerText, //scrape the appropriate fields
                     node.SelectSingleNode("td[@class='colDatum']").InnerText,
...

And what about this error? I can't fix it:

Error 1 The type or namespace name 'Form1' could not be found (are you missing a using directive or an assembly reference?) c:\documents and settings\antonio\my documents\visual studio 2010\Projects\WindowsFormsApplication2\WindowsFormsApplication2\Program.cs 18 33 WindowsFormsApplication2


It's from here:

static void Main()
        {
            Application.EnableVisualStyles();
            Application.SetCompatibleTextRenderingDefault(false);
            Application.Run(new Form1());
        }
    }

  • 0




