skip to Main Content

I want to parse and process my dataframe data.

I tried using join, assign … etc.
I succeed parsing ‘allowed’ column with below code:

allowed_expanded = df1.allowed.apply(lambda x:pd.Series(x))
allowed_expanded.columns = ['{}.{}'.format('allowed',i) for i in allowed_expanded]

and the result:

# allowed_expanded

                                             allowed.0                  allowed.1   allowed.2
0           {'IPProtocol': 'tcp', 'ports': ['53']}                        NaN         NaN
1   {'IPProtocol': 'tcp', 'ports': ['22', '3389']}                    NaN         NaN
2                               {'IPProtocol': 'icmp'}     {'IPProtocol': 'sctp'}         NaN
3                            {'IPProtocol': 'all'}                        NaN         NaN

but this is not what I want.

what should do I do ?

now my data looks:

# print(df)
          network                                            allowed
0           vpc-1           [{'IPProtocol': 'tcp', 'ports': ['53']}]
1           vpc-1   [{'IPProtocol': 'tcp', 'ports': ['22', '3389']}]
2           vpc-1   [{'IPProtocol': 'icmp'}, {'IPProtocol': 'sctp'}]
3           vpc-1                            [{'IPProtocol': 'all'}]
...

and…
what I want:

# print(df)
          network           allowed.IPProtocol    allowed.ports
0           vpc-1                          tcp               53
1           vpc-1                          tcp         22, 3389
2           vpc-1                   icmp, sctp                -
3           vpc-1                          all                -
...

3

Answers


  1. def func(row):
        IPProtocol = []
        ports = []
        for item in row:
            IPProtocol.append(item.get('IPProtocol', None))
            ports.append(item.get('ports', None))
        return pd.Series([IPProtocol, ports])
    
    df[['allowed.IPProtocol', 'allowed.ports']] = df['allowed'].apply(lambda x: func(x))
    

    I hope it helps!

    Login or Signup to reply.
  2. Can you try this:

    import numpy as np
    
    df['allowed.IPProtocol']=df['allowed'].apply(lambda x: ', '.join([i['IPProtocol'] for i in x] if pd.notnull(x) else np.nan))
    df['allowed.ports']=df['allowed'].apply(lambda x: ', '.join([', '.join(i['ports']) if 'ports' in list(i.keys()) else 'nan' for i in x] if pd.notnull(x) else np.nan))
    

    Output:

    |    | network   | allowed                                          | allowed.IPProtocol   | allowed.ports   |
    |---:|:----------|:-------------------------------------------------|:---------------------|:----------------|
    |  0 | vpc-1     | [{'IPProtocol': 'tcp', 'ports': ['53']}]         | tcp                  | 53              |
    |  1 | vpc-1     | [{'IPProtocol': 'tcp', 'ports': ['22', '3389']}] | tcp                  | 22, 3389        |
    |  2 | vpc-1     | [{'IPProtocol': 'icmp'}, {'IPProtocol': 'sctp'}] | icmp, sctp           | nan, nan        |
    |  3 | vpc-1     | [{'IPProtocol': 'all'}]                          | all                  | nan             |
    
    Login or Signup to reply.
  3. Example

    data = {'network': {0: 'vpc-1',   1: 'vpc-1',   2: 'vpc-1',   3: 'vpc-1'},  'allowed': {0: "[{'IPProtocol': 'tcp', 'ports': ['53']}]",   1: "[{'IPProtocol': 'tcp', 'ports': ['22', '3389']}]",   2: "[{'IPProtocol': 'icmp'}, {'IPProtocol': 'sctp'}]",   3: "[{'IPProtocol': 'all'}]"   }}
    df = pd.DataFrame(data)
    

    df

        network allowed
    0   vpc-1   [{'IPProtocol': 'tcp', 'ports': ['53']}]
    1   vpc-1   [{'IPProtocol': 'tcp', 'ports': ['22', '3389']}]
    2   vpc-1   [{'IPProtocol': 'icmp'}, {'IPProtocol': 'sctp'}]
    3   vpc-1   [{'IPProtocol': 'all'}]
    

    Step1

    eval and explode and so on..

    df1 = df['allowed'].apply(eval).explode().apply(pd.Series).explode('ports')
    

    df1

        IPProtocol  ports
    0   tcp         53
    1   tcp         22
    1   tcp         3389
    2   icmp        NaN
    2   sctp        NaN
    3   all         NaN
    

    Step2

    groupby df1 by index

    df2 = df1.fillna('-').groupby(level=0).agg(lambda x: ','.join(x.unique()))
    

    df2

        IPProtocol  ports
    0   tcp         53
    1   tcp         22,3389
    2   icmp,sctp   -
    3   all         -
    

    join df and df2

    out = df[['network']].join(df2.add_prefix('allowed.'))
    

    out

        network allowed.IPProtocol  allowed.ports
    0   vpc-1   tcp                 53
    1   vpc-1   tcp                 22,3389
    2   vpc-1   icmp,sctp           -
    3   vpc-1   all                 -
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search