
I am a new learner of R programming. I have a sample XML file, shown below:

<Attribute ID="GroupSEO" MultiValued="false" ProductMode="Property" FullTextIndexed="false" ExternallyMaintained="false" Derived="false" Mandatory="false">
  <Name>Group SEO Name</Name>
  <Validation BaseType="text" MinValue="" MaxValue="" MaxLength="1024" InputMask=""/>
  <DimensionLink DimensionID="Language"/>
  <MetaData>
    <Value AttributeID="Attribute-Group-Order">1</Value>
    <Value AttributeID="Enterprise-Label">NAV-GR-SEONAME</Value>
    <Value ID="#NAMED" AttributeID="Attribute-Group-Name">#NAMED</Value>
    <Value AttributeID="Enterprise-Description">Navigation Group SEO Name</Value>
    <Value AttributeID="Attribute-Order">3</Value>
  </MetaData>
  <AttributeGroupLink AttributeGroupID="HTCategorizationsNavigation"/>
  <AttributeGroupLink AttributeGroupID="HTDigitalServicesModifyClassifications"/>
  <UserTypeLink UserTypeID="ENT-Group"/>
  <UserTypeLink UserTypeID="NAVGRP"/>
  <UserTypeLink UserTypeID="ENT-SubCategory"/>
  <UserTypeLink UserTypeID="ENT-Category"/>

I want to convert this into a data frame using R. My expected output is:

##   FullTextIndexed MultiValued ProductMode ExternallyMaintained Derived Mandatory Attribute-Group-Order    Enterprise-Description                UserTypeID
## 1           false       false    Property                false   false     false                     1 Navigation group seo name ENT-Group,ENT-Category,..

I have searched the internet but couldn't find a solution to my problem.
I found this code online:

library("XML")
library("methods")
setwd("E:/Project")
xmldata<-xmlToDataFrame("Sample.xml")
print(xmldata)

But when I execute the code, I get the following error:

Error in `[<-.data.frame`(`*tmp*`, i, names(nodes[[i]]), value = c(Name = "You YoutubeLink7 (URL)",  : 
duplicate subscripts for columns
In addition: Warning message:
In names(x) == varNames :
longer object length is not a multiple of shorter object length
> print(xmldata)
Error in print(xmldata) : object 'xmldata' not found

Could anyone explain what the error means and also suggest a solution? Sorry for the formatting issues.
Thanks in advance.
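
Background on the error: xmlToDataFrame() expects a flat, record-oriented document, where each child of the root is one record (a row) and each child of a record is one field (a column). In the sample above, <MetaData> holds five <Value> children and <Attribute> has several <UserTypeLink> siblings, so the same column name turns up more than once; that is most likely what produces "duplicate subscripts for columns", and the accompanying warning comes from comparing name vectors of different lengths (names(x) == varNames). For contrast, here is a minimal sketch of the shape xmlToDataFrame() handles without complaint. The two records are made-up data, not taken from the question's file.

```r
library(XML)

# Made-up, flat input: one <row> per record, and every record has exactly
# one child element per field (no repeated child names).
flat <- xmlParse(paste0(
  "<rows>",
  "<row><Name>Group SEO Name</Name><MaxLength>1024</MaxLength></row>",
  "<row><Name>Other Name</Name><MaxLength>255</MaxLength></row>",
  "</rows>"
), asText = TRUE)

# Each child of the root becomes a row; each grandchild becomes a column.
df <- xmlToDataFrame(flat, stringsAsFactors = FALSE)
print(df)
```

A document shaped like the one in the question needs explicit XPath queries instead, as both answers below do.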


Answers


  1. With correct XML data (i.e. with the closing </Attribute> tag at the end of the file):

    <?xml version="1.0" encoding="UTF-8"?>
    <Attribute ID="GroupSEO" MultiValued="false" ProductMode="Property" FullTextIndexed="false" ExternallyMaintained="false" Derived="false" Mandatory="false">
      <Name>Group SEO Name</Name>
      <Validation BaseType="text" MinValue="" MaxValue="" MaxLength="1024" InputMask=""/>
      <DimensionLink DimensionID="Language"/>
      <MetaData>
        <Value AttributeID="Attribute-Group-Order">1</Value>
        <Value AttributeID="Enterprise-Label">NAV-GR-SEONAME</Value>
        <Value ID="#NAMED" AttributeID="Attribute-Group-Name">#NAMED</Value>
        <Value AttributeID="Enterprise-Description">Navigation Group SEO Name</Value>
        <Value AttributeID="Attribute-Order">3</Value>
      </MetaData>
      <AttributeGroupLink AttributeGroupID="HTCategorizationsNavigation"/>
      <AttributeGroupLink AttributeGroupID="HTDigitalServicesModifyClassifications"/>
      <UserTypeLink UserTypeID="ENT-Group"/>
      <UserTypeLink UserTypeID="NAVGRP"/>
      <UserTypeLink UserTypeID="ENT-SubCategory"/>
      <UserTypeLink UserTypeID="ENT-Category"/>
    </Attribute>
    

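    Then we use XPath to extract everything we need. Change the path to your XML file in the htmlParse step. Note that htmlParse() lowercases element and attribute names, which is why the XPath expressions use //attribute, @fulltextindexed, //usertypelink and so on.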

    library(XML)
    # parse the file (replace the path with your own)
    data <- htmlParse("C:/Users/.../yourxmlfile.xml")
    # attributes of the <Attribute> element
    fulltextindexed <- xpathSApply(data, "normalize-space(//attribute/@fulltextindexed)")
    multivalued <- xpathSApply(data, "normalize-space(//attribute/@multivalued)")
    productmode <- xpathSApply(data, "normalize-space(//attribute/@productmode)")
    externallymaintained <- xpathSApply(data, "normalize-space(//attribute/@externallymaintained)")
    derived <- xpathSApply(data, "normalize-space(//attribute/@derived)")
    mandatory <- xpathSApply(data, "normalize-space(//attribute/@mandatory)")
    # text of the <Value> elements, picked by their AttributeID
    attribute.group.order <- xpathSApply(data, "//value[@attributeid='Attribute-Group-Order']", xmlValue)
    enterprise.description <- xpathSApply(data, "//value[@attributeid='Enterprise-Description']", xmlValue)
    # all UserTypeID values collapsed into a single string
    user.type.id <- paste(xpathSApply(data, "//usertypelink/@usertypeid"), collapse = "|")
    # combine everything into a one-row data frame
    df <- data.frame(fulltextindexed, multivalued, productmode, externallymaintained, derived,
                     mandatory, attribute.group.order, enterprise.description, user.type.id)
    print(df)
    

  2. Using tidyverse and xml2

    DATA

    library(xml2)
    library(tidyverse)  # purrr and dplyr functions are used below

    data <- read_xml('<Attribute ID="GroupSEO" MultiValued="false" ProductMode="Property" FullTextIndexed="false" ExternallyMaintained="false" Derived="false" Mandatory="false">
      <Name>Group SEO Name</Name>
      <Validation BaseType="text" MinValue="" MaxValue="" MaxLength="1024" InputMask=""/>
      <DimensionLink DimensionID="Language"/>
      <MetaData>
        <Value AttributeID="Attribute-Group-Order">1</Value>
        <Value AttributeID="Enterprise-Label">NAV-GR-SEONAME</Value>
        <Value ID="#NAMED" AttributeID="Attribute-Group-Name">#NAMED</Value>
        <Value AttributeID="Enterprise-Description">Navigation Group SEO Name</Value>
        <Value AttributeID="Attribute-Order">3</Value>
      </MetaData>
      <AttributeGroupLink AttributeGroupID="HTCategorizationsNavigation"/>
      <AttributeGroupLink AttributeGroupID="HTDigitalServicesModifyClassifications"/>
      <UserTypeLink UserTypeID="ENT-Group"/>
      <UserTypeLink UserTypeID="NAVGRP"/>
      <UserTypeLink UserTypeID="ENT-SubCategory"/>
      <UserTypeLink UserTypeID="ENT-Category"/>
    </Attribute>')
    

    CODE

    # attributes of the <Attribute> element, as a one-row tibble
    Attributes <- xml_find_all(data, "//Attribute")
    Attributes <- Attributes %>%
            map(xml_attrs) %>%
            map_df(~as.list(.))

    # <Value> nodes inside <MetaData>, selected by their AttributeID
    nodes <- xml_find_all(data, "//Value")

    AGO <- nodes[xml_attr(nodes, "AttributeID") == "Attribute-Group-Order"]
    Attributes["Attribute-Group-Order"] <- xml_text(AGO)

    ED <- nodes[xml_attr(nodes, "AttributeID") == "Enterprise-Description"]
    Attributes["Enterprise-Description"] <- xml_text(ED)

    # <UserTypeLink> tags: collapse all UserTypeID values into one comma-separated string
    UserTypeLink <- xml_find_all(data, "//UserTypeLink")
    UserTypeLink <- UserTypeLink %>%
            map(xml_attrs) %>%
            map_df(~as.list(.)) %>%
            summarise(UserTypeID = toString(UserTypeID))

    # final output: both one-row tables side by side
    bind_cols(Attributes, UserTypeLink)
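
Both answers assume the file contains a single <Attribute>. The error in the question mentions a Name of "You YoutubeLink7 (URL)", which suggests the real file holds many <Attribute> elements; in that case document-wide lookups such as //Value and //UserTypeLink would mix values from different attributes. Below is a minimal sketch with xml2 and base R that builds one row per <Attribute>. It assumes the elements sit under a common root; the <Attributes> wrapper and the second record are made up for illustration, and the helper names meta_value() and attribute_row() are hypothetical.

```r
library(xml2)

# Hypothetical input: several <Attribute> elements under a made-up
# <Attributes> root; the second record is invented for illustration.
doc <- read_xml('<Attributes>
  <Attribute ID="GroupSEO" FullTextIndexed="false" Mandatory="false">
    <MetaData>
      <Value AttributeID="Attribute-Group-Order">1</Value>
      <Value AttributeID="Enterprise-Description">Navigation Group SEO Name</Value>
    </MetaData>
    <UserTypeLink UserTypeID="ENT-Group"/>
    <UserTypeLink UserTypeID="NAVGRP"/>
  </Attribute>
  <Attribute ID="Other" FullTextIndexed="true" Mandatory="true">
    <UserTypeLink UserTypeID="ENT-Category"/>
  </Attribute>
</Attributes>')

# Text of the <Value> with a given AttributeID inside one <Attribute>.
# xml_find_first() returns a missing node when nothing matches, and
# xml_text() turns that into NA.
meta_value <- function(node, id) {
  xml_text(xml_find_first(node, sprintf("./MetaData/Value[@AttributeID='%s']", id)))
}

# One data-frame row per <Attribute>; the relative "./" paths keep every
# lookup scoped to the current element instead of the whole document.
attribute_row <- function(node) {
  users <- xml_attr(xml_find_all(node, "./UserTypeLink"), "UserTypeID")
  data.frame(
    ID                       = xml_attr(node, "ID"),
    FullTextIndexed          = xml_attr(node, "FullTextIndexed"),
    Mandatory                = xml_attr(node, "Mandatory"),
    `Attribute-Group-Order`  = meta_value(node, "Attribute-Group-Order"),
    `Enterprise-Description` = meta_value(node, "Enterprise-Description"),
    UserTypeID               = paste(users, collapse = ","),
    check.names = FALSE, stringsAsFactors = FALSE
  )
}

df <- do.call(rbind, lapply(xml_find_all(doc, "//Attribute"), attribute_row))
print(df)
```

Only a few attributes are extracted here; the remaining ones (MultiValued, ProductMode, ExternallyMaintained, Derived) can be added to attribute_row() the same way. Missing metadata shows up as NA in its own row rather than shifting values between rows.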
    
    