
I’m writing a Python program that reads an XML file returned by an API and should collect each user’s initials into a list for later use. The file contains about 60 users and looks like this:

<ArrayOfuser xmlns="WebsiteWhereDataComesFrom.com" xmlns:i="http://www.w3.org/2001/XMLSchema-instance">
    <user>
        <active>true</active>
        <datelastlogin>8/21/2019 9:16:30 PM</datelastlogin>
        <dept>3</dept>
        <email>useremail</email>
        <firstname>userfirstname</firstname>
        <lastname>userlastname</lastname>
        <lastupdated>2/6/2019 11:10:29 PM</lastupdated>
        <lastupdatedby>lastupdateduserinitials</lastupdatedby>
        <loginemail>userloginemail</loginemail>
        <phone1>userphone</phone1>
        <phone2/>
        <rep>userinitials1</rep>
    </user>
    <user>
        <active>true</active>
        <datelastlogin>12/1/2022 3:31:25 PM</datelastlogin>
        <dept>5</dept>
        <email>useremail</email>
        <firstname>userfirstname</firstname>
        <lastname>userlastname</lastname>
        <lastupdated>4/8/2020 3:02:08 PM</lastupdated>
        <lastupdatedby>lastupdateduserinitials</lastupdatedby>
        <loginemail>userloginemail</loginemail>
        <phone1>userphone</phone1>
        <phone2/>
        <rep>userinitials2</rep>
    </user>
...
...
...
</ArrayOfuser>

I’m trying to use an XML parser to return the text in the <rep> tag for each user to a list. I would also love to have it sorted by date of last login, but that’s not something I need and I’ll just alphabetize the list if sorting by date overcomplicates this process.

The code below shows my attempt at just printing the data without saving it to a list, but the output is unexpected as shown below as well.
Code I tried:

#load file
activeusers = etree.parse("activeusers.xml")

#declare namespaces
ns = {'xx': 'http://schemas.datacontract.org/2004/07/IQWebAPI.Users'}

#locate rep tag and print (saving to list once printing shows expected output)
targets = activeusers.xpath('//xx:user[xx:rep]',namespaces=ns)
for target in targets:
    print(target.attrib)

Output:

{}
{}

I’m expecting the output to look like the below codeblock. Once it looks something like that I should be able to change the print statement to instead save to a list.

{userinitials1}
{userinitials2}

I think my issue comes from what’s inside my print statement with printing the attribute. I tried this with variations of target.getparent() with keys(), items(), and get() as well and they all seem to show the same empty output when printed.

EDIT: I found a post from someone with a similar problem that had been solved and the solution was to use this code but I changed filenames to suit my need:

root = (etree.parse("activeusers.xml"))
values = [s.find('rep').text for s in root.findall('.//user') if s.find('rep') is not None]
print(values)

Again, the expected output was a populated list but when printed the list is empty. I think now my issue may have to do with the fact that my document contains namespaces. For my use, I may just delete them since I don’t think these will end up being required so please correct me if namespaces are more important than I realize.

SECOND EDIT: I also realized the API can send me this data in a JSON format and not just XML so that file would look like the below codeblock. Any solution that can append the text in the "rep" child of each user to a list in JSON format or XML is perfect and would be greatly appreciated since once I have this list, I will not need to use the XML or JSON file for any other use.

[
    {
        "active": true,
        "datelastlogin": "8/21/2019 9:16:30 PM",
        "dept": 3,
        "email": "useremail",
        "firstname": "userfirstname",
        "lastname": "userlastname",
        "lastupdated": "2/6/2019 11:10:29 PM",
        "lastupdatedby": "lastupdateduserinitials",
        "loginemail": "userloginemail",
        "phone1": "userphone",
        "phone2": "",
        "rep": "userinitials1"
    },
    {
        "active": true,
        "datelastlogin": "12/1/2022 3:31:25 PM",
        "dept": 5,
        "email": "useremail",
        "firstname": "userfirstname",
        "lastname": "userlastname",
        "lastupdated": "4/8/2020 3:02:08 PM",
        "lastupdatedby": "lastupdateduserinitials",
        "loginemail": "userloginemail",
        "phone1": "userphone",
        "phone2": "",
        "rep": "userinitials2"
    }
]

3 Answers


  1. Because this XML declares a default namespace, you need to pass a namespace map to the search calls, like this:

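    import xml.etree.ElementTree as ET

    # xml_in_qes is a string holding the XML from the question
    root = ET.fromstring(xml_in_qes)
    # Map a prefix of our choosing to the document's default namespace URI
    my_ns = {'root': 'WebsiteWhereDataComesFrom.com'}
    myUser = []
    for eachUser in root.findall('root:user', my_ns):
        rep = eachUser.find("root:rep", my_ns)
        # Skip users that have no <rep> element
        if rep is not None:
            print(rep.text)
            myUser.append(rep.text)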
    

    Note: xml_in_qes is a string containing the XML attached in this question.

    root.findall('root:user', my_ns) searches for user elements in the namespace that my_ns maps to the prefix root, i.e. WebsiteWhereDataComesFrom.com.
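On the empty `{}` output in the question: `target.attrib` holds an element's XML attributes, and neither `<user>` nor `<rep>` has any; the initials are the element's text. The namespace URI used in the search also has to match the document's `xmlns` value exactly. If you would rather not spell the URI out, here is a minimal sketch, assuming Python 3.8 or later (where ElementTree accepts the `{*}` wildcard for "any namespace"). The `sample_xml` string is a trimmed, hypothetical stand-in for the real file:

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the API response (assumption: same shape as the question's file)
sample_xml = """<ArrayOfuser xmlns="WebsiteWhereDataComesFrom.com">
    <user><phone2/><rep>userinitials1</rep></user>
    <user><phone2/><rep>userinitials2</rep></user>
    <user><phone2/></user>
</ArrayOfuser>"""

root = ET.fromstring(sample_xml)

# {*} matches any namespace (Python 3.8+), so the URI does not need to be known.
# Users without a <rep> element, or with an empty one, are skipped.
reps = [rep.text for rep in root.iterfind("{*}user/{*}rep") if rep.text]

print(reps)
```

For the real file, `ET.parse("activeusers.xml").getroot()` would take the place of `ET.fromstring(sample_xml)`.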

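  2. XML data implementation (the xmlns declarations have been removed from the root element here, so no namespace map is needed):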

    import xml.etree.ElementTree as ET
    xmlstring = '''
    <ArrayOfuser>
        <user>
            <active>true</active>
            <datelastlogin>8/21/2019 9:16:30 PM</datelastlogin>
            <dept>3</dept>
            <email>useremail</email>
            <firstname>userfirstname</firstname>
            <lastname>userlastname</lastname>
            <lastupdated>2/6/2019 11:10:29 PM</lastupdated>
            <lastupdatedby>lastupdateduserinitials</lastupdatedby>
            <loginemail>userloginemail</loginemail>
            <phone1>userphone</phone1>
            <phone2/>
            <rep>userinitials1</rep>
        </user>
        <user>
            <active>true</active>
            <datelastlogin>8/21/2019 9:16:30 PM</datelastlogin>
            <dept>3</dept>
            <email>useremail</email>
            <firstname>userfirstname</firstname>
            <lastname>userlastname</lastname>
            <lastupdated>2/6/2019 11:10:29 PM</lastupdated>
            <lastupdatedby>lastupdateduserinitials</lastupdatedby>
            <loginemail>userloginemail</loginemail>
            <phone1>userphone</phone1>
            <phone2/>
            <rep>userinitials2</rep>
        </user>
        <user>
            <active>true</active>
            <datelastlogin>8/21/2019 9:16:30 PM</datelastlogin>
            <dept>3</dept>
            <email>useremail</email>
            <firstname>userfirstname</firstname>
            <lastname>userlastname</lastname>
            <lastupdated>2/6/2019 11:10:29 PM</lastupdated>
            <lastupdatedby>lastupdateduserinitials</lastupdatedby>
            <loginemail>userloginemail</loginemail>
            <phone1>userphone</phone1>
            <phone2/>
            <rep>userinitials3</rep>
        </user>
    </ArrayOfuser>
    '''
    
    user_array = ET.fromstring(xmlstring)
    
    replist = []
    for user in user_array.findall('user'):
        replist.append(user.find('rep').text)
    
    print(replist)
    

    Output:

    ['userinitials1', 'userinitials2', 'userinitials3']
    

    JSON data implementation:

    userlist = [
        {
            "active": "true",
            "datelastlogin": "8/21/2019 9:16:30 PM",
            "dept": 3,
            "email": "useremail",
            "firstname": "userfirstname",
            "lastname": "userlastname",
            "lastupdated": "2/6/2019 11:10:29 PM",
            "lastupdatedby": "lastupdateduserinitials",
            "loginemail": "userloginemail",
            "phone1": "userphone",
            "phone2": "",
            "rep": "userinitials1"
        },
        {
            "active": "true",
            "datelastlogin": "12/1/2022 3:31:25 PM",
            "dept": 5,
            "email": "useremail",
            "firstname": "userfirstname",
            "lastname": "userlastname",
            "lastupdated": "4/8/2020 3:02:08 PM",
            "lastupdatedby": "lastupdateduserinitials",
            "loginemail": "userloginemail",
            "phone1": "userphone",
            "phone2": "",
            "rep": "userinitials2"
        },
        {
            "active": "true",
            "datelastlogin": "12/1/2022 3:31:25 PM",
            "dept": 5,
            "email": "useremail",
            "firstname": "userfirstname",
            "lastname": "userlastname",
            "lastupdated": "4/8/2020 3:02:08 PM",
            "lastupdatedby": "lastupdateduserinitials",
            "loginemail": "userloginemail",
            "phone1": "userphone",
            "phone2": "",
            "rep": "userinitials3"
        }
    ]
    
    replist = []
    for user in userlist:
        replist.append(user["rep"])
    
    print(replist)
    

    Output:

    ['userinitials1', 'userinitials2', 'userinitials3']
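The JSON example above uses a Python literal. When the data arrives as text from the API, it has to be parsed first. A minimal sketch, assuming the response body is a JSON array of user objects as shown in the question; `sample_json` is a shortened, hypothetical stand-in, and the result is alphabetized as the question suggests as a fallback:

```python
import json

# Stand-in for the API's JSON body (assumption: a list of user objects as in the question)
sample_json = """[
    {"datelastlogin": "8/21/2019 9:16:30 PM", "rep": "userinitials2"},
    {"datelastlogin": "12/1/2022 3:31:25 PM", "rep": "userinitials1"},
    {"datelastlogin": "1/5/2021 10:00:00 AM", "rep": ""}
]"""

users = json.loads(sample_json)

# Keep only non-empty "rep" values, then sort them alphabetically
reps = sorted(user["rep"] for user in users if user.get("rep"))

print(reps)
```

If the response is saved to a file instead, `json.load(open_file)` on the open file object would take the place of `json.loads(sample_json)`.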
    
  3. If you would like a table of users sorted by last login date, you can put the parsed values into pandas:

    import xml.etree.ElementTree as ET
    import pandas as pd
    
    tree = ET.parse("activeusers.xml")
    root = tree.getroot()
    
    # Map a prefix of our choosing to the document's default namespace URI
    namespaces = {"ns": "WebsiteWhereDataComesFrom.com"}
    
    columns = ["rep", "datelastlogin"]
    data = []
    for user in root.findall("ns:user", namespaces):
        rep = user.find("ns:rep", namespaces)
        lastlog = user.find("ns:datelastlogin", namespaces)
        # Keep a user only when both fields exist, so the columns stay aligned
        if rep is not None and lastlog is not None:
            data.append((rep.text, lastlog.text))
    
    
    df = pd.DataFrame(data, columns=columns)
    df["datelastlogin"] = df["datelastlogin"].astype('datetime64[ns]')
    df = df.sort_values(by='datelastlogin', ascending = False)
    print(df.to_string())
    

    Output:

                 rep       datelastlogin
    1  userinitials2 2022-12-01 15:31:25
    0  userinitials1 2019-08-21 21:16:30
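If all you need in the end is a plain list ordered by last login, pandas is not strictly required. A minimal standard-library sketch, assuming every date follows the `month/day/year hour:minute:second AM/PM` pattern seen in the question; the helper name `reps_by_last_login` and the shortened `sample_xml` are hypothetical:

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Shortened stand-in for activeusers.xml (assumption: same namespace and tags)
sample_xml = """<ArrayOfuser xmlns="WebsiteWhereDataComesFrom.com">
    <user><datelastlogin>8/21/2019 9:16:30 PM</datelastlogin><rep>userinitials1</rep></user>
    <user><datelastlogin>12/1/2022 3:31:25 PM</datelastlogin><rep>userinitials2</rep></user>
</ArrayOfuser>"""

NS = {"ns": "WebsiteWhereDataComesFrom.com"}
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"  # assumed from the sample dates


def reps_by_last_login(root):
    """Return the <rep> texts, most recent login first."""
    pairs = []
    for user in root.findall("ns:user", NS):
        rep = user.findtext("ns:rep", namespaces=NS)
        last_login = user.findtext("ns:datelastlogin", namespaces=NS)
        # Skip users missing either field
        if rep and last_login:
            pairs.append((datetime.strptime(last_login, DATE_FORMAT), rep))
    pairs.sort(key=lambda pair: pair[0], reverse=True)
    return [rep for _, rep in pairs]


sorted_reps = reps_by_last_login(ET.fromstring(sample_xml))
print(sorted_reps)
```

Parsing the dates into `datetime` objects matters here: sorting the raw strings would order them alphabetically, not chronologically.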
    