I have a large list of unique strings (~1000), for instance: [bbbhbbbh, jjjhhssa, eeeffus,…]
And a smaller list of sub-strings (~50), two of which concatenate to make up each of these unique strings: [bbbh, jjjh, hssa, eeef, fus,…]
I want to create a function that takes the large list of unique strings (~1000) as an argument and returns a dictionary mapping each unique string to the two sub-strings it is made of, joined by a slash.
For example:
result = {'bbbhbbbh': 'bbbh/bbbh',
          'jjjhhssa': 'jjjh/hssa',
          'eeeffus': 'eeef/fus',...}
I've tried a for loop, but it doesn't pick up a sub-string that appears twice in the same string (for example, 'bbbhbbbh' only gives ['bbbh']). Is there a more concise way, for example with a list comprehension, that also returns the two sub-strings that make up each unique string? At this point I only want to use the json package and solve this without importing any new packages. Thank you for any help with this.
My current loop and output:
result = []
for string in pair_list:
    matches = []
    for substring in sub_list:
        if substring in string:
            matches.append(substring)
    if matches:
        result.append(matches)
print(result)
[['bbbh'], ['jjjh', 'hssa'], ['eeef', 'fus'],...
Answers
As per your output format, I think you are expecting a dictionary, where the long string is the key and all matched sub-strings are the value. Modifying your code, I added a dict object to store the result and appended each matched sub-string to its value. Since we also need to catch repeated sub-strings, we can use the count method on the string.
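A minimal sketch of that approach, assuming "count method" means `str.count`: the dict is keyed by the long string, and each sub-string is added once per occurrence. The function name `match_with_count` and the sample lists are hypothetical; `json` is only used to pretty-print, which keeps within the question's json-only constraint.

```python
import json


def match_with_count(pair_list, sub_list):
    """Hypothetical helper: map each long string to its matched sub-strings joined by '/'."""
    result = {}
    for string in pair_list:
        matches = []
        for substring in sub_list:
            # str.count returns the number of non-overlapping occurrences,
            # so 'bbbh' is added twice for 'bbbhbbbh'
            matches.extend([substring] * string.count(substring))
        if matches:
            result[string] = "/".join(matches)
    return result


# Sample data (assumed, taken from the question's examples)
pair_list = ["bbbhbbbh", "jjjhhssa", "eeeffus"]
sub_list = ["bbbh", "jjjh", "hssa", "eeef", "fus"]

print(json.dumps(match_with_count(pair_list, sub_list), indent=4))
```

Note that the parts come out in the order of `sub_list`, not in the order they appear in the string, so an unsorted `sub_list` could yield 'hssa/jjjh'.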
We can do it using a dict comprehension.
Previously, the code did not search for duplicates of a sub-string: once it found the desired sub-string in the main string, it moved on to the next one. Now it finds every occurrence, including duplicates, through the regex finditer method.
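A sketch of the dict-comprehension version, assuming the regex method meant here is Python's `re.finditer` (the `re` module ships with the standard library, so nothing needs installing). The helper name `join_matches` and the sample lists are hypothetical. Sorting the (position, sub-string) pairs keeps the parts in the order they appear in the string, independent of the order of `sub_list`.

```python
import re


def join_matches(string, sub_list):
    """Hypothetical helper: every occurrence of every sub-string, ordered by position, joined by '/'."""
    hits = [
        (m.start(), substring)
        for substring in sub_list
        # re.escape makes the sub-string match literally; finditer yields every
        # non-overlapping match, so a repeated sub-string is not skipped
        for m in re.finditer(re.escape(substring), string)
    ]
    return "/".join(substring for _, substring in sorted(hits))


# Sample data (assumed, taken from the question's examples)
pair_list = ["bbbhbbbh", "jjjhhssa", "eeeffus"]
sub_list = ["bbbh", "jjjh", "hssa", "eeef", "fus"]

# Keep only strings that contain at least one known sub-string
result = {
    string: join_matches(string, sub_list)
    for string in pair_list
    if any(substring in string for substring in sub_list)
}
print(result)
```

Like the count version, this reports every occurrence, so it assumes no entry of `sub_list` also happens to occur inside another part by accident.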
I hope this might help you.