Json – How can I identify the two substrings from a small list that make up each unique string in a larger list, and store them in a new dictionary?

kenai
December 26, 2022
284 views
0 votes
2 Answers

I have a large list of unique strings (~1000), for instance: [bbbhbbbh, jjjhhssa, eeeffus,…]

And a smaller list of sub-string pairs (~50) that make up each of these unique strings: [bbbh, jjjh, hssa, eeef, fus,…]

I want to create a function that takes the large unique string list (~1000) as an argument and returns a dictionary with the unique string and the corresponding values of its two unique sub-strings.

For example:

result = {'bbbhbbbh': 'bbbh/bbbh', 
            'jjjhhssa': 'jjjh/hssa', 
            'eeeffus': 'eeef/fus',...}

I’ve tried a for loop, but the output does not include repeated substrings (for example, 'bbbh' is listed only once for 'bbbhbbbh'). Is there a more concise way, perhaps with a list comprehension, that also returns the two substrings that make up each unique string? I only want to use the json package at this point and solve this without importing any new packages. Thank you for any help with this.

My current loop and output:

result = []    

for string in pair_list:
    matches = []
    for substring in sub_list:
        if substring in string:
            matches.append(substring)
    if matches:
        result.append(matches)

print(result)

[['bbbh'], ['jjjh', 'hssa'], ['eeef', 'fus'],...

Answers

Based on your expected output, I think you want a dictionary in which each long string is the key and its matched substrings, joined by "/", are the value. I modified your code slightly: I added a dict to store the result and append each matched substring to the value for its string. Since repeated substrings also need to be captured, we can use the string count method.

Code:

pair_list = ["bbbhbbbh", "jjjhhssa", "eeeffus"]
sub_list = ["bbbh", "jjjh", "hssa", "eeef", "fus"]
pair_mapping_result = dict() 

for pair_string in pair_list:
    for sub_string in sub_list:
        if sub_string in pair_string:
            matched_sub_pairs = "/".join([sub_string] * pair_string.count(sub_string))
            pair_mapping_result[pair_string] = (r"{}/{}".format(pair_mapping_result[pair_string], 
                                                               matched_sub_pairs) 
                                                if pair_mapping_result.get(pair_string) else matched_sub_pairs)

print(pair_mapping_result)

Output

{'bbbhbbbh': 'bbbh/bbbh', 'jjjhhssa': 'jjjh/hssa', 'eeeffus': 'eeef/fus'}

We can do it using a dict comprehension

Code

{pair_string: "/".join(["/".join([sub_string] * pair_string.count(sub_string)) 
                        for sub_string in sub_list 
                        if sub_string in pair_string]) 
                        for pair_string in pair_list}
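
Both versions above list the matched substrings in sub_list order, not in the order they appear in the string, so a string such as "hssajjjh" would come out as "jjjh/hssa". The question also asks for a function. A minimal sketch of one way to do that, assuming every string is exactly the concatenation of two entries from sub_list, is to try each split point and check both halves. The name split_into_pairs is hypothetical, and strings with no valid split are simply left out:

```python
def split_into_pairs(pair_list, sub_list):
    """Map each string to 'left/right' where both halves are in sub_list."""
    known = set(sub_list)  # a set makes each membership check cheap
    result = {}
    for pair_string in pair_list:
        # Try every split point; keep the first one where both halves are known.
        for i in range(1, len(pair_string)):
            left, right = pair_string[:i], pair_string[i:]
            if left in known and right in known:
                result[pair_string] = f"{left}/{right}"
                break
    return result


pair_list = ["bbbhbbbh", "jjjhhssa", "eeeffus"]
sub_list = ["bbbh", "jjjh", "hssa", "eeef", "fus"]
print(split_into_pairs(pair_list, sub_list))
# {'bbbhbbbh': 'bbbh/bbbh', 'jjjhhssa': 'jjjh/hssa', 'eeeffus': 'eeef/fus'}
```

This needs no imports and preserves the left-to-right order of the two halves. If a string could be split in more than one valid way, this sketch keeps only the first split it finds.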

- SatGuruTechnologies
- December 26, 2022 at 12:22 pm
- 0 votes
The original code did not look for repeated occurrences of a substring: once it found a substring in the main string, it moved on to the next one. The version below finds every occurrence using the regex finditer method.
```
import re

pair_list = ['bbbhbbbh', 'jjjhhssa', 'eeeffus', 'aaaabbbh', 'ccccdddd', 'eeefff']
sub_list = ['bbbh', 'jjjh', 'hssa', 'eeef', 'fus']

result = []
for string in pair_list:
    matches = []
    for substring in sub_list:
        # re.escape treats the substring literally; finditer yields one
        # match per non-overlapping occurrence, so repeats are counted.
        for _ in re.finditer(re.escape(substring), string):
            matches.append(substring)
    if matches:
        result.append(matches)

print(result)
```
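The loop above produces a list of lists rather than the dictionary the question asks for. As a rough sketch, assuming the same sample data, the match positions from re.finditer can be used to order the substrings as they appear in each string and join them with "/". The function name matches_by_position is hypothetical:

```python
import re


def matches_by_position(pair_list, sub_list):
    """Map each string to its matched substrings, ordered by position."""
    result = {}
    for string in pair_list:
        found = []
        for substring in sub_list:
            # Record where each non-overlapping occurrence starts.
            for match in re.finditer(re.escape(substring), string):
                found.append((match.start(), substring))
        if found:
            # Sorting the (start, substring) tuples orders them left to right.
            result[string] = "/".join(sub for _, sub in sorted(found))
    return result


pair_list = ['bbbhbbbh', 'jjjhhssa', 'eeeffus', 'aaaabbbh', 'ccccdddd', 'eeefff']
sub_list = ['bbbh', 'jjjh', 'hssa', 'eeef', 'fus']
print(matches_by_position(pair_list, sub_list))
```

Strings with only one match (such as 'aaaabbbh') get a single substring as their value, and strings with no match (such as 'ccccdddd') are omitted.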
I hope this might help you.