I have an XML document, representing a single client, being loaded into a webpage. It looks like this:

 <!--?xml version="1.0" encoding="UTF-8" ?-->
    <html>
    <head></head>
    <body>
        <document>
            <Name>Pablo</Name>
            <Surname>Salamanca</Surname>
            <Age>68</Age>
            <Gender>M</Gender>
        </document>
    </body> 
 </html>

The resulting page shows the data and nothing more, like this:

Pablo Salamanca 68 M

Things to keep in mind before answering:

  • There are roughly 15 different templates for the XML structure, so there is no standardization. One person may have the information in a different order, and another may have entirely different information. The overall structure of the XML does not change, only the tag names and their order.

  • Depending on the template, each client has between 68 and 278 attributes/elements in their XML.

I need a List<string> holding only the tag names of the person's elements, so that I end up with a list like this:

List[0] = "Name"

List[1] = "Surname"

List[2] = "Age"

List[3] = "Gender"

...

This is the code which I have ready:

 var url = _urlMaker.GetUrl();
 WebClient client = new WebClient();
 client.Encoding = System.Text.Encoding.GetEncoding("utf-8");
 string xml = client.DownloadString(url);

 int n = 2;
 List<string> xmlSplit = xml
     .Split(Environment.NewLine.ToCharArray())
     .Skip(n)
     .ToList();

 xml = string.Join(Environment.NewLine, xmlSplit);

Here I need a universal way of parsing the tags (or the tags' values) into a list of strings. Ideally only the tags surrounding the person's data get parsed, so that words like "document", "html", etc. are left out, but I can probably work with a list that still contains them. Among several other approaches, I tried splitting the document into nodes, but that went wrong.

Side note:

The reason for splitting the string into a list of strings and removing the first two lines is an issue raised by another engineer. The document I am parsing is apparently HTML, and Visual Studio refuses to parse it as a whole. So I forced it to look like XML, which works, and parsing the values now succeeds in a different part of my code.

I could just use XmlDocument.GetElementsByTagName like this:

...code...
...loading data into something like a XML document....
        XmlNodeList elemList = doc.GetElementsByTagName("title");

But I cannot, since I have no idea what to replace "title" with.

2 Answers


  1. I would use LINQ to XML and put the results into a DataTable instead of a List. The DataTable holds both the name of each tag and its value, rather than just the value.

    using System;
    using System.Linq;
    using System.Collections.Generic;
    using System.Data;
    using System.Xml;
    using System.Xml.Linq;
    
    namespace ConsoleApp10
    {
    
        class Program
        {
            const string FILENAME = @"c:\temp\test.xml";
            static void Main(string[] args)
            {
                DataTable dt = new DataTable();
                XDocument doc = XDocument.Load(FILENAME);
                XElement document = doc.Descendants("document").FirstOrDefault();
                DataRow newRow = dt.Rows.Add();
                foreach(XElement element in document.Elements())
                {
                    string name = element.Name.LocalName;
                    string value = (string)element;
                    if(!dt.Columns.Contains(name))
                    {
                        dt.Columns.Add(name);
                    }
                    newRow[name] = value;
                }
            }
        }
    }
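
    If a List<string> of tag names is what you need, as the question asks, the same traversal can feed a list directly. The following is only a sketch under assumptions: the class and method names (TagNameReader, GetTagNames, TagNamesDemo) are made up for illustration, and it assumes the downloaded string is already well-formed XML containing a <document> element.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Xml.Linq;

    static class TagNameReader
    {
        // Hypothetical helper: returns the names of the direct children of the
        // first <document> element, in document order.
        // Returns an empty list when no <document> element is found.
        public static List<string> GetTagNames(string xml)
        {
            XDocument doc = XDocument.Parse(xml);
            XElement document = doc.Descendants("document").FirstOrDefault();
            if (document == null)
            {
                return new List<string>();
            }
            return document.Elements()
                           .Select(e => e.Name.LocalName)
                           .ToList();
        }
    }

    class TagNamesDemo
    {
        static void Main()
        {
            // Sample input modelled on the XML from the question.
            string xml =
                "<html><head></head><body><document>" +
                "<Name>Pablo</Name><Surname>Salamanca</Surname>" +
                "<Age>68</Age><Gender>M</Gender>" +
                "</document></body></html>";

            List<string> names = TagNameReader.GetTagNames(xml);
            Console.WriteLine(string.Join(", ", names)); // Name, Surname, Age, Gender
        }
    }
    ```

    Because only the children of <document> are enumerated, wrapper names such as "html" or "body" never end up in the list, whichever template the client uses.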
    
  2. You can try the following method to get a person's dynamic data as a string.

    It uses SQL Server and its XML and XQuery functionality.

    The XQuery below retrieves the values of all XML elements under the <document> element, regardless of which elements are present.

    You can create a one-line stored procedure in SQL Server that accepts a single parameter of the XML data type, and call it from C#.

    SQL

    DECLARE @xml XML =
    N'<!--?xml version="1.0" encoding="UTF-8" ?-->
    <html>
        <head></head>
        <body>
            <document>
                <Name>Pablo</Name>
                <Surname>Salamanca</Surname>
                <Age>68</Age>
                <Gender>M</Gender>
            </document>
        </body>
    </html>';
    
    SELECT @xml.query('data(//document/*)').value('.', 'NVARCHAR(255)') AS result;
    
    SELECT c.value('(local-name(.))[1]', 'SYSNAME') AS column_list
    FROM @xml.nodes('//document/*') AS t(c);
    

    Output #1

    Pablo Salamanca 68 M
    

    Output #2

    column_list
    Name
    Surname
    Age
    Gender
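
    If a round trip to SQL Server is not wanted, the same //document/* path can also be evaluated in C# itself. This is a sketch under assumptions, not part of the SQL approach above: it uses the XPathSelectElements extension method from System.Xml.XPath, and the class and method names (DocumentXPath, JoinValues, GetNames, XPathDemo) are hypothetical.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Xml.Linq;
    using System.Xml.XPath; // provides the XPathSelectElements extension method

    static class DocumentXPath
    {
        // Selects every direct child of any <document> element,
        // using the same path as the SQL example: //document/*
        static IEnumerable<XElement> Children(XDocument doc) =>
            doc.XPathSelectElements("//document/*");

        // Counterpart of Output #1: all values joined by a single space.
        public static string JoinValues(XDocument doc) =>
            string.Join(" ", Children(doc).Select(e => e.Value));

        // Counterpart of Output #2: the element names in document order.
        public static List<string> GetNames(XDocument doc) =>
            Children(doc).Select(e => e.Name.LocalName).ToList();
    }

    class XPathDemo
    {
        static void Main()
        {
            XDocument doc = XDocument.Parse(
                "<html><head></head><body><document>" +
                "<Name>Pablo</Name><Surname>Salamanca</Surname>" +
                "<Age>68</Age><Gender>M</Gender>" +
                "</document></body></html>");

            Console.WriteLine(DocumentXPath.JoinValues(doc));                  // Pablo Salamanca 68 M
            Console.WriteLine(string.Join(",", DocumentXPath.GetNames(doc))); // Name,Surname,Age,Gender
        }
    }
    ```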