I have two arrays of hashes which are related by a common set of keys:
Array 1 is:
[
{0=>"pmet-add-install-module-timings.patch"},
{1=>"pmet-change-sample-data-load-order.patch"},
{2=>"pmet-configurable-recurring.patch"},
{3=>"pmet-consumers-run-staggered-by-sleep.patch"},
{4=>"pmet-dynamic-block-segment-display.patch"},
{5=>"pmet-fix-admin-label-word-breaking.patch"},
{6=>"pmet-fix-invalid-module-dependencies.patch"},
{7=>"pmet-fix-invalid-sample-data-module-dependencies.patch"},
{8=>"pmet-fix-module-loader-algorithm.patch"},
{9=>"pmet-fix-sample-data-code-generator.patch"},
{10=>"pmet-remove-id-requirement-from-layout-update-file.patch"},
{11=>"pmet-specify-store-id-for-order.patch"},
{12=>"pmet-staging-preview-js-fix.patch"},
{13=>"pmet-stop-catching-sample-data-errrors-during-install.patch"},
{14=>"pmet-visitor-segment.patch"}
]
Array 2 is:
[
{0=>"magento2-base"},
{1=>"magento/module-sample-data"},
{2=>"magento/module-configurable-sample-data"},
{3=>"magento/module-message-queue"},
{4=>"magento/module-banner"},
{5=>"magento/theme-adminhtml-backend"},
{6=>"magento/module-staging"},
{7=>"magento/module-gift-registry-sample-data"},
{8=>"magento2-base"},
{9=>"magento/module-downloadable-sample-data"},
{10=>"magento/module-catalog"},
{11=>"magento/module-sales-sample-data"},
{12=>"magento/module-staging"},
{13=>"magento2-base"},
{14=>"magento/module-customer"}
]
The hashes in these arrays have the same set of indexes, and the second array has duplicate values in keys 0, 8, and 13 as well as in 6 and 12.
My goal is to stitch the values from these two data sets together into a set of nested hashes. Wherever there is a duplicated value in Array 2, I need to collect its associated values from Array 1 and include them in a nested hash.
For example, take the `magento2-base` values from Array 2 and the key-associated values from Array 1. The hash structure in Ruby would look like:
hash = {
  "magento2-base" => [
    {0 => "m2-hotfixes/pmet-add-install-module-timings.patch"},
    {8 => "m2-hotfixes/pmet-fix-module-loader-algorithm.patch"},
    {13 => "m2-hotfixes/pmet-stop-catching-sample-data-errrors-during-install.patch"}
  ]
}
The same would hold true for any other duplicated values from Array 2, so, for example, `magento/module-staging` would be:
hash = {
  "magento/module-staging" => [
    {6 => "pmet-fix-invalid-module-dependencies.patch"},
    {12 => "pmet-staging-preview-js-fix.patch"}
  ]
}
A larger excerpt of the resultant hash which combines these needs together would look like this:
hash = {
  "magento2-base" =>
    [
      {0 => "m2-hotfixes/pmet-add-install-module-timings.patch"},
      {8 => "m2-hotfixes/pmet-fix-module-loader-algorithm.patch"},
      {13 => "m2-hotfixes/pmet-stop-catching-sample-data-errrors-during-install.patch"}
    ],
  "magento/module-sample-data" =>
    {1 => "pmet-change-sample-data-load-order.patch"},
  "magento/module-configurable-sample-data" =>
    {2 => "pmet-configurable-recurring.patch"},
  "magento/module-message-queue" =>
    {3 => "pmet-consumers-run-staggered-by-sleep.patch"},
  "magento/module-staging" =>
    [
      {6 => "pmet-fix-invalid-module-dependencies.patch"},
      {12 => "pmet-staging-preview-js-fix.patch"}
    ],
  ...
}
I used a nested loop over both arrays to match up the keys and attempted to pull out the duplicates from Array 2. My thinking was that I’d need to maintain both an array of the duplicate values from Array 2 and an array of their associated values from Array 1, and then use some array merging magic to put it all back together.
Here’s what I have:
found_modules_array = []
duplicate_modules_array = []
duplicate_module_hash = {}
file_collection_array = []
modules_array.each do |module_hash|
  module_hash.each do |module_hash_key, module_hash_value|
    files_array.each do |file_hash|
      file_hash.each do |file_hash_key, file_hash_value|
        if module_hash_key == file_hash_key
          if found_modules_array.include?(module_hash_value)
            duplicate_module_hash = {
              module_hash_key => module_hash_value
            }
            duplicate_modules_array << duplicate_module_hash
          end
          found_modules_array << module_hash_value
        end
      end
    end
  end
end
In this code, `files_array` is Array 1 and `modules_array` is Array 2. `found_modules_array` is a bucket to hold any duplicates before pushing them into a `duplicate_module_hash`, which would then be pushed into `duplicate_modules_array`.
This solution:
- Doesn’t work
- Doesn’t take advantage of the power of Ruby
- Isn’t performant
EDIT
The path to the above data structure is explained in full detail in the following post: Using array values as hash keys to create nested hashes in Ruby
I’ll summarize it below:
I have a directory of files. The majority of them are `.patch` files, although some of them are not. For each patch file, I need to scan the first line, which is always a string, and extract a portion of that line. From each file’s name, that portion of its first line, and a unique identifier for the file, I need to create a hash which I will then convert to JSON and write to a file.
Here are examples:
Directory of Files:
|__ .gitkeep
|__ pmet-add-install-module-timings.patch
|__ pmet-change-sample-data-load-order.patch
First Line Examples:
File Name: `pmet-add-install-module-timings.patch`
First Line: `diff --git a/setup/src/Magento/Setup/Model/Installer.php b/setup/src/Magento/Setup/Model/Installer.php`
File Name: `pmet-change-sample-data-load-order.patch`
First Line: `diff --git a/vendor/magento/module-sample-data/etc/module.xml b/vendor/magento/module-sample-data/etc/module.xml`
File Name: `pmet-stop-catching-sample-data-errrors-during-install.patch`
First Line: `diff --git a/vendor/magento/framework/Setup/SampleData/Executor.php b/vendor/magento/framework/Setup/SampleData/Executor.php`
File Name: `pmet-fix-admin-label-word-breaking.patch`
First Line: `diff --git a/vendor/magento/theme-adminhtml-backend/web/css/styles-old.less b/vendor/magento/theme-adminhtml-backend/web/css/styles-old.less`
Example Json File:
{
"patches": {
"magento/magento2-base": {
"Patch 1": "m2-hotfixes/pmet-add-install-module-timings.patch"
},
"magento/module-sample-data": {
"Patch 2": "m2-hotfixes/pmet-change-sample-data-load-order.patch"
},
"magento/theme-adminhtml-backend": {
"Patch 3": "m2-hotfixes/pmet-fix-admin-label-word-breaking.patch"
},
"magento/framework": {
"Patch 4": "m2-hotfixes/pmet-stop-catching-sample-data-errrors-during-install.patch"
}
}
}
The problem I encountered is that while JSON syntax allows for duplicate keys, Ruby hashes don’t, so items were missing from the JSON file because they had already been dropped from the hash. To solve this, I assumed I needed to create the array structure I specified, keeping the IDs as the consistent identifier between the scraped files and the data belonging to them, so that I could put the data together in a different arrangement. Now I realize this isn’t the case, so I have switched the approach to use the following:
file_array = []
module_array = []
files.each_with_index do |file, key|
  # The fourth '/'-separated segment of the first line names the module or theme
  value = File.open(file, &:readline).split('/')[3]
  if value.match(/module-/) || value.match(/theme-/)
    result = "magento/#{value}"
  else
    result = "magento2-base"
  end
  file_array << file
  module_array << result
end
This yields the flat hashes that have been suggested below.
2 Answers
So first of all, the structure (an array of single-pair hashes) is a little odd. It’s easier to work with as a flat hash, e.g.
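For instance, the first few entries of the question's two arrays could be written as flat hashes like this. This is only a sketch: `h1` is the name this answer uses later for the Array 1 data, while `h2` is an assumed name for the Array 2 data.

```ruby
# Flat versions of the first four entries of Array 1 (patch files).
h1 = {
  0 => "pmet-add-install-module-timings.patch",
  1 => "pmet-change-sample-data-load-order.patch",
  2 => "pmet-configurable-recurring.patch",
  3 => "pmet-consumers-run-staggered-by-sleep.patch"
}

# Flat versions of the first four entries of Array 2 (modules).
h2 = {
  0 => "magento2-base",
  1 => "magento/module-sample-data",
  2 => "magento/module-configurable-sample-data",
  3 => "magento/module-message-queue"
}

# Looking up a patch or a module by its index is then a plain hash access.
puts h1[1]
puts h2[1]
```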
Fortunately it’s quite easy to transform between the two:
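One way to do the conversion in both directions, as a minimal sketch (the variable names `arr`, `flat`, and `back` are illustrative, and the data is an abbreviated copy of Array 2):

```ruby
# Array-of-single-pair-hashes form, as in the question (abbreviated).
arr = [
  { 0 => "magento2-base" },
  { 1 => "magento/module-sample-data" },
  { 2 => "magento/module-configurable-sample-data" }
]

# Array of hashes -> flat hash: merge all the one-pair hashes together.
flat = arr.reduce({}, :merge)

# Flat hash -> array of hashes: wrap each key/value pair in its own hash.
back = flat.map { |k, v| { k => v } }

p flat
p back == arr
```

Merging is safe here only because every hash uses a distinct key; with repeated keys, later pairs would overwrite earlier ones.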
From this point, Enumerable and Hash methods (in this case, the ever-useful map, group_by, and transform_values) will take you the rest of the way:
Which gives you a map of val to indexes:
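A sketch of that grouping step on a subset of the question's Array 2 data, assuming it has already been flattened into a hash called `h2` (an assumed name) and that Ruby 2.4 or later is available for `transform_values`:

```ruby
# Subset of Array 2 as a flat hash: index => module name.
h2 = {
  0 => "magento2-base",
  6 => "magento/module-staging",
  8 => "magento2-base",
  12 => "magento/module-staging",
  13 => "magento2-base",
  14 => "magento/module-customer"
}

# group_by collects [index, module] pairs under each module name;
# transform_values then keeps only the index from each pair.
indexes_by_module = h2
  .group_by { |_index, mod| mod }
  .transform_values { |pairs| pairs.map(&:first) }

# "magento2-base" now maps to the indexes 0, 8 and 13.
p indexes_by_module
```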
and then we can replace those lists of indexes with the corresponding values in h1:
which produces your desired data structure:
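Putting both steps together on a subset of the question's data might look like the following. It is a sketch rather than a definitive implementation: `h1` holds the flattened Array 1 entries, `h2` is an assumed name for the flattened Array 2 entries, and `transform_values` needs Ruby 2.4 or later.

```ruby
# Flattened subset of Array 1: index => patch file.
h1 = {
  0 => "pmet-add-install-module-timings.patch",
  6 => "pmet-fix-invalid-module-dependencies.patch",
  8 => "pmet-fix-module-loader-algorithm.patch",
  12 => "pmet-staging-preview-js-fix.patch",
  13 => "pmet-stop-catching-sample-data-errrors-during-install.patch",
  14 => "pmet-visitor-segment.patch"
}

# Flattened subset of Array 2: index => module name.
h2 = {
  0 => "magento2-base",
  6 => "magento/module-staging",
  8 => "magento2-base",
  12 => "magento/module-staging",
  13 => "magento2-base",
  14 => "magento/module-customer"
}

# Group the indexes by module, then swap each index for { index => patch }.
result = h2
  .group_by { |_index, mod| mod }
  .transform_values { |pairs| pairs.map { |index, _mod| { index => h1[index] } } }

result.each { |mod, patches| puts "#{mod}: #{patches.length} patch(es)" }
```

Every value in `result` is an array of one-pair hashes, even for modules that have only a single patch.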
I did notice that in your expected output that you specified, the values are hashes or arrays. I would recommend against this practice. It’s much better to have a uniform data type for all a hash’s keys and values. But, if you really did want to do this for whatever reason, it’s not too difficult:
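As a sketch of that mixed shape, assuming the grouped result is already in a hash called `grouped` (a hypothetical name, with abbreviated data), single-element arrays can be unwrapped like this:

```ruby
# Grouped result where every value is an array of one-pair hashes.
grouped = {
  "magento2-base" => [
    { 0 => "pmet-add-install-module-timings.patch" },
    { 8 => "pmet-fix-module-loader-algorithm.patch" }
  ],
  "magento/module-customer" => [
    { 14 => "pmet-visitor-segment.patch" }
  ]
}

# Unwrap arrays that hold exactly one hash; leave longer arrays untouched.
mixed = grouped.transform_values { |vals| vals.size == 1 ? vals.first : vals }

p mixed["magento/module-customer"].class
p mixed["magento2-base"].class
```

The cost of this shape is that any code reading `mixed` has to check whether each value is a Hash or an Array before using it.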
By the way, I know this kind of functional programming / enumerable chaining code can be a bit hard to decipher, so I would recommend running it line-by-line for your understanding.
Assuming you’re using the unified data structure I mentioned above, I would recommend calling `.transform_values { |vals| vals.reduce(&:merge) }` on your final result so that the values are single hashes instead of multiple hashes.

Let `arr1` and `arr2` be your two arrays. Because they are the same size and, for each index `i`, `arr1[i][i]` and `arr2[i][i]` are the values of the key `i` of the hashes `arr1[i]` and `arr2[i]`, the desired result can be obtained quite easily:
The fragment
is effectively expanded to
If the hash `h` has no key `[g[i]]`, so * becomes
after which
(which works with `"dog"` as well). The above expression can instead be written:
This uses the form of Hash::new that employs a block (here `{|h,k| h[k]=[]}`) that is called when the hash is accessed by a value that is not one of its keys.

An alternative method is:
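A minimal sketch of how these variants might look, ending with the `Hash#update` form. The exact expressions are assumptions rather than a definitive version, the block variable names `g`, `i`, and `h` are illustrative, and the three-element arrays are made-up stand-ins for the question's data.

```ruby
# Made-up stand-ins for the question's arrays (same shape, fewer entries).
arr1 = [{ 0 => "a.patch" }, { 1 => "b.patch" }, { 2 => "c.patch" }]
arr2 = [{ 0 => "magento2-base" }, { 1 => "magento/module-staging" }, { 2 => "magento2-base" }]

# Variant 1: `h[k] ||= []` stores an empty array the first time key k is seen
# (roughly `h[k] = h[k] || []`); arr1[i], the hash { i => patch }, is appended.
by_or_equals = arr2.each_with_index.with_object({}) do |(g, i), h|
  (h[g[i]] ||= []) << arr1[i]
end

# Variant 2: a default block performs the same initialisation on first access.
by_default_block = arr2.each_with_index.with_object(Hash.new { |h, k| h[k] = [] }) do |(g, i), h|
  h[g[i]] << arr1[i]
end

# Variant 3: Hash#update (aka merge!) with a block. The block runs only for
# keys present in both hashes and concatenates the two arrays.
by_update = arr2.each_with_index.with_object({}) do |(g, i), h|
  h.update(g[i] => [arr1[i]]) { |_key, old, new| old + new }
end

p by_update
p by_or_equals == by_default_block && by_default_block == by_update
```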
This uses the form of Hash#update (aka `merge!`) that employs a block to determine the values of keys that are in both hashes being merged.

A third way is to use Enumerable#group_by:
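One possible shape for a `group_by` version, again as a sketch under assumptions: the arrays are made-up stand-ins with the same structure as the question's data, and `transform_values` requires Ruby 2.4 or later.

```ruby
# Made-up stand-ins for the question's arrays (same shape, fewer entries).
arr1 = [{ 0 => "a.patch" }, { 1 => "b.patch" }, { 2 => "c.patch" }]
arr2 = [{ 0 => "magento2-base" }, { 1 => "magento/module-staging" }, { 2 => "magento2-base" }]

# Group each [hash, index] pair from arr1 by the module name found at the
# same index in arr2, then drop the index so only the arr1 hashes remain.
grouped = arr1
  .each_with_index
  .group_by { |_f, i| arr2[i][i] }
  .transform_values { |pairs| pairs.map(&:first) }

p grouped
```

Compared with the accumulator-based variants, this separates the grouping from the clean-up of the grouped pairs, at the cost of one extra pass over the values.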