How to stitch together two arrays based on a common set of keys in Ruby - Magento

SteveK
March 22, 2020

I have two arrays of hashes which are related by a common set of keys:

Array 1 is:

[
  {0=>"pmet-add-install-module-timings.patch"},
  {1=>"pmet-change-sample-data-load-order.patch"},
  {2=>"pmet-configurable-recurring.patch"},
  {3=>"pmet-consumers-run-staggered-by-sleep.patch"},
  {4=>"pmet-dynamic-block-segment-display.patch"},
  {5=>"pmet-fix-admin-label-word-breaking.patch"},
  {6=>"pmet-fix-invalid-module-dependencies.patch"},
  {7=>"pmet-fix-invalid-sample-data-module-dependencies.patch"},
  {8=>"pmet-fix-module-loader-algorithm.patch"},
  {9=>"pmet-fix-sample-data-code-generator.patch"},
  {10=>"pmet-remove-id-requirement-from-layout-update-file.patch"},
  {11=>"pmet-specify-store-id-for-order.patch"},
  {12=>"pmet-staging-preview-js-fix.patch"},
  {13=>"pmet-stop-catching-sample-data-errrors-during-install.patch"},
  {14=>"pmet-visitor-segment.patch"}
]

Array 2 is:

[
  {0=>"magento2-base"},
  {1=>"magento/module-sample-data"},
  {2=>"magento/module-configurable-sample-data"},
  {3=>"magento/module-message-queue"},
  {4=>"magento/module-banner"},
  {5=>"magento/theme-adminhtml-backend"},
  {6=>"magento/module-staging"},
  {7=>"magento/module-gift-registry-sample-data"},
  {8=>"magento2-base"},
  {9=>"magento/module-downloadable-sample-data"},
  {10=>"magento/module-catalog"},
  {11=>"magento/module-sales-sample-data"},
  {12=>"magento/module-staging"},
  {13=>"magento2-base"},
  {14=>"magento/module-customer"}
]

The hashes in these arrays have the same set of indexes, and the second array has duplicate values in keys 0, 8, and 13 as well as in 6 and 12.

My goal is to stitch the values from these two data sets together into a set of nested hashes. Wherever there is a duplicated value in Array 2, I need to collect its associated values from Array 1 and include them in a nested hash.

For example, take the magento2-base values from Array 2 and the key-associated values from Array 1. The hash structure in Ruby would look like:

hash = {
  "magento2-base" => [
    {0 => "m2-hotfixes/pmet-add-install-module-timings.patch"},
    {8 => "m2-hotfixes/pmet-fix-module-loader-algorithm.patch"},
    {13 => "m2-hotfixes/pmet-stop-catching-sample-data-errrors-during-install.patch"}
  ]
}

The same would hold true for any other duplicated values from Array 2, so, for example, magento/module-staging would be:

hash = {
  "magento/module-staging" => [
    {6 => "pmet-fix-invalid-module-dependencies.patch"},
    {12 => "pmet-staging-preview-js-fix.patch"}
  ]
}

A larger excerpt of the resultant hash which combines these needs together would look like this:

hash = {
  "magento2-base" => [
    {0 => "m2-hotfixes/pmet-add-install-module-timings.patch"},
    {8 => "m2-hotfixes/pmet-fix-module-loader-algorithm.patch"},
    {13 => "m2-hotfixes/pmet-stop-catching-sample-data-errrors-during-install.patch"}
  ],
  "magento/module-sample-data" =>
    {1 => "pmet-change-sample-data-load-order.patch"},
  "magento/module-configurable-sample-data" =>
    {2 => "pmet-configurable-recurring.patch"},
  "magento/module-message-queue" =>
    {3 => "pmet-consumers-run-staggered-by-sleep.patch"},
  "magento/module-staging" => [
    {6 => "pmet-fix-invalid-module-dependencies.patch"},
    {12 => "pmet-staging-preview-js-fix.patch"}
  ],
  ...
}

I used a nested loop over both arrays to match up the keys and tried to pull out the duplicates from Array 2. My thinking was that I'd need to maintain both an array of the duplicate values from Array 2 and an array of their associated values from Array 1, and then use some array merging magic to put it all back together.

Here’s what I have:

found_modules_array = []
duplicate_modules_array = []
duplicate_module_hash = {}
file_collection_array = []

modules_array.each do |module_hash|
  module_hash.each do |module_hash_key, module_hash_value|
    files_array.each do |file_hash|
      file_hash.each do |file_hash_key, file_hash_value|
        if module_hash_key == file_hash_key
          if found_modules_array.include?(module_hash_value)
            duplicate_module_hash = {
              module_hash_key => module_hash_value
            }
            duplicate_modules_array << duplicate_module_hash
          end
          found_modules_array << module_hash_value
        end
      end
    end
  end
end

In this code, files_array is Array 1 and modules_array is Array 2. found_modules_array is a bucket that holds every module value seen so far; when a value is seen again, it is wrapped in duplicate_module_hash and pushed into duplicate_modules_array.

This solution:

Doesn’t work
Doesn’t take advantage of the power of Ruby
Isn’t performant

EDIT

The path to the above data structure is explained in full detail in the following post: Using array values as hash keys to create nested hashes in Ruby

I’ll summarize it below:

I have a directory of files. The majority of them are .patch files, although some are not. For each patch file, I need to scan the first line, which is always a string, and extract a portion of that line. Combining each file's name, that portion of its first line, and a unique identifier for each file, I need to create a hash which I will then convert to JSON and write to a file.

Here are examples:

Directory of Files:

|__ .gitkeep
|__ pmet-add-install-module-timings.patch
|__ pmet-change-sample-data-load-order.patch

First Line Examples:

File Name: `pmet-add-install-module-timings.patch`
First Line: `diff --git a/setup/src/Magento/Setup/Model/Installer.php b/setup/src/Magento/Setup/Model/Installer.php`

File Name: `pmet-change-sample-data-load-order.patch`
First Line: `diff --git a/vendor/magento/module-sample-data/etc/module.xml b/vendor/magento/module-sample-data/etc/module.xml`

File Name: `pmet-stop-catching-sample-data-errrors-during-install.patch`
First Line: `diff --git a/vendor/magento/framework/Setup/SampleData/Executor.php b/vendor/magento/framework/Setup/SampleData/Executor.php`

File Name: `pmet-fix-admin-label-word-breaking.patch`
First Line: `diff --git a/vendor/magento/theme-adminhtml-backend/web/css/styles-old.less b/vendor/magento/theme-adminhtml-backend/web/css/styles-old.less`

Example JSON File:

{
    "patches": {
        "magento/magento2-base": {
            "Patch 1": "m2-hotfixes/pmet-add-install-module-timings.patch"
        },
        "magento/module-sample-data": {
            "Patch 2": "m2-hotfixes/pmet-change-sample-data-load-order.patch"
        },
        "magento/theme-adminhtml-backend": {
            "Patch 3": "m2-hotfixes/pmet-fix-admin-label-word-breaking.patch"
        },
        "magento/framework": {
            "Patch 4": "m2-hotfixes/pmet-stop-catching-sample-data-errrors-during-install.patch"
        }
    }
}

The problem I encountered is that while JSON allows for duplicate keys, Ruby hashes don't, so items were missing from the JSON file because they had already been dropped from the hash. To solve this, I assumed I needed to create the array structure I specified, keeping the IDs as the consistent identifier between the files scraped and the data belonging to them, so that I could put the data together in a different arrangement. Now I realize this isn't the case, so I have switched the approach to use the following:

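# files is the list of patch file paths collected elsewhere
file_array = []
module_array = []

files.each do |file|
    # The fourth '/'-separated segment of the first line identifies the package
    value = File.open(file, &:readline).split('/')[3]
    if value.match(/module-/) || value.match(/theme-/)
        result = "magento/#{value}"
    else
        result = "magento2-base"
    end
    file_array << file
    module_array << result
end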

This yields two flat, index-aligned arrays (file names and their module names), in line with the flat structures that have been suggested below.
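To make the split('/')[3] step concrete, here is a minimal sketch that applies the same rule to the first-line examples quoted above, using strings instead of real files. The method name package_for is hypothetical and was introduced only for this illustration.

```ruby
# Hypothetical helper (not from the original post): maps the first line
# of a patch file to a package name using the same rule as the loop above.
def package_for(first_line)
  value = first_line.split('/')[3]
  if value.match?(/module-/) || value.match?(/theme-/)
    "magento/#{value}"
  else
    "magento2-base"
  end
end

# First lines copied from the examples in the question.
first_lines = {
  "pmet-add-install-module-timings.patch" =>
    "diff --git a/setup/src/Magento/Setup/Model/Installer.php b/setup/src/Magento/Setup/Model/Installer.php",
  "pmet-change-sample-data-load-order.patch" =>
    "diff --git a/vendor/magento/module-sample-data/etc/module.xml b/vendor/magento/module-sample-data/etc/module.xml",
  "pmet-stop-catching-sample-data-errrors-during-install.patch" =>
    "diff --git a/vendor/magento/framework/Setup/SampleData/Executor.php b/vendor/magento/framework/Setup/SampleData/Executor.php",
  "pmet-fix-admin-label-word-breaking.patch" =>
    "diff --git a/vendor/magento/theme-adminhtml-backend/web/css/styles-old.less b/vendor/magento/theme-adminhtml-backend/web/css/styles-old.less"
}

packages = first_lines.transform_values { |line| package_for(line) }
packages.each { |file, package| puts "#{file} -> #{package}" }
```

Note that under this rule the vendor/magento/framework path falls through to "magento2-base", which differs from the "magento/framework" entry in the example JSON file.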

Answers

So first of all, the structure

arr1 = [
    {0=>"pmet-add-install-module-timings.patch"},
    {1=>"pmet-change-sample-data-load-order.patch"},
    {2=>"pmet-configurable-recurring.patch"},
    {3=>"pmet-consumers-run-staggered-by-sleep.patch"},
    # etc
]

is a little odd. It’s easier to work with as a flat hash, e.g.

h1 = {
    0 => "pmet-add-install-module-timings.patch",
    1 => "pmet-change-sample-data-load-order.patch",
    2 => "pmet-configurable-recurring.patch",
    3 => "pmet-consumers-run-staggered-by-sleep.patch",
    # etc
}

Fortunately it’s quite easy to transform between the two:

h1 = arr1.reduce(&:merge)
h2 = arr2.reduce(&:merge)
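As a small illustration of what reduce(&:merge) does here, the sketch below runs it on the first three entries of Array 1. The h1_seeded variant with an explicit empty-hash seed is an addition of mine, not part of the answer; it only matters if the input array might be empty.

```ruby
arr1 = [
  {0 => "pmet-add-install-module-timings.patch"},
  {1 => "pmet-change-sample-data-load-order.patch"},
  {2 => "pmet-configurable-recurring.patch"}
]

# reduce(&:merge) folds the array pairwise:
# ({0=>...}.merge({1=>...})).merge({2=>...})
h1 = arr1.reduce(&:merge)
p h1

# With no seed, reducing an empty array returns nil; seeding with {}
# returns an empty hash instead.
h1_seeded = arr1.reduce({}, :merge)
empty_unseeded = [].reduce(&:merge)
empty_seeded = [].reduce({}, :merge)
```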

From this point, Enumerable methods (in this case, the ever-useful map, group_by, and transform_values) will take you the rest of the way:

indexed_by_val = h2.
  group_by { |k,v| v }.
  transform_values { |vals| vals.map(&:first) }

Which gives you a map of val to indexes:

{
  "magento2-base"=>[0, 8, 13],
  "magento/module-sample-data"=>[1],
  "magento/module-configurable-sample-data"=>[2],
  # etc
}

and then we can replace those lists of indexes with the corresponding values in h1:

result = indexed_by_val.transform_values do |indexes|
  indexes.map do |idx|
    { idx => h1[idx] }
  end
end

which produces your desired data structure:

{
  "magento2-base"=>[
    {0=>"pmet-add-install-module-timings.patch"},
    {8=>"pmet-fix-module-loader-algorithm.patch"},
    {13=>"pmet-stop-catching-sample-data-errrors-during-install.patch"}
  ],
  "magento/module-sample-data"=>[
    {1=>"pmet-change-sample-data-load-order.patch"}
  ],
  "magento/module-configurable-sample-data"=>[
    {2=>"pmet-configurable-recurring.patch"}
  ],
  # etc
}
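For anyone who wants to run the steps above end to end, here is a self-contained sketch using the full data from the question. The files and modules word arrays are just a compact way of rebuilding arr1 and arr2; they are my own scaffolding, not part of the original data structure.

```ruby
files = %w[
  pmet-add-install-module-timings.patch
  pmet-change-sample-data-load-order.patch
  pmet-configurable-recurring.patch
  pmet-consumers-run-staggered-by-sleep.patch
  pmet-dynamic-block-segment-display.patch
  pmet-fix-admin-label-word-breaking.patch
  pmet-fix-invalid-module-dependencies.patch
  pmet-fix-invalid-sample-data-module-dependencies.patch
  pmet-fix-module-loader-algorithm.patch
  pmet-fix-sample-data-code-generator.patch
  pmet-remove-id-requirement-from-layout-update-file.patch
  pmet-specify-store-id-for-order.patch
  pmet-staging-preview-js-fix.patch
  pmet-stop-catching-sample-data-errrors-during-install.patch
  pmet-visitor-segment.patch
]

modules = %w[
  magento2-base
  magento/module-sample-data
  magento/module-configurable-sample-data
  magento/module-message-queue
  magento/module-banner
  magento/theme-adminhtml-backend
  magento/module-staging
  magento/module-gift-registry-sample-data
  magento2-base
  magento/module-downloadable-sample-data
  magento/module-catalog
  magento/module-sales-sample-data
  magento/module-staging
  magento2-base
  magento/module-customer
]

# Rebuild the array-of-single-pair-hashes shape from the question.
arr1 = files.each_with_index.map { |name, i| { i => name } }
arr2 = modules.each_with_index.map { |name, i| { i => name } }

# Flatten each array into one hash of index => value.
h1 = arr1.reduce(&:merge)
h2 = arr2.reduce(&:merge)

# group_by on a Hash yields [key, value] pairs, grouped here by value.
grouped = h2.group_by { |_k, v| v }

# Keep only the index from each pair.
indexed_by_val = grouped.transform_values { |pairs| pairs.map(&:first) }

# Swap each index for a single-pair hash pointing at the file name.
result = indexed_by_val.transform_values do |indexes|
  indexes.map { |idx| { idx => h1[idx] } }
end

p indexed_by_val["magento2-base"]
p result["magento/module-staging"]
```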

I noticed that in your expected output the values are sometimes a single hash and sometimes an array of hashes. I would recommend against this: it is much easier to work with a hash whose values all have the same type. If you really do want that shape for whatever reason, though, it's not too difficult:

# I am not advising this approach
result2 = result.transform_values do |arr|
  arr.length > 1 ? arr : arr[0]
end

By the way, I know this kind of functional programming / enumerable chaining code can be a bit hard to decipher, so I would recommend running it line by line to understand each step.

Assuming you’re using the unified data structure I mentioned above, I would recommend calling .transform_values { |vals| vals.reduce(&:merge) } on your final result so that the values are single hashes instead of multiple hashes:

{
  "magento2-base"=>{
    0=>"pmet-add-install-module-timings.patch",
    8=>"pmet-fix-module-loader-algorithm.patch",
    13=>"pmet-stop-catching-sample-data-errrors-during-install.patch"
  },
  "magento/module-sample-data"=>{
    1=>"pmet-change-sample-data-load-order.patch"
  },
  "magento/module-configurable-sample-data"=>{
    2=>"pmet-configurable-recurring.patch"
  },
  # etc
}
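Here is a short sketch of that final merge on a trimmed-down result, followed by a JSON dump since that is the stated end goal. The JSON step is my own addition; it uses Ruby's standard json library, which writes the integer keys out as strings.

```ruby
require 'json'

# A trimmed-down version of the result built earlier.
result = {
  "magento2-base" => [
    {0 => "pmet-add-install-module-timings.patch"},
    {8 => "pmet-fix-module-loader-algorithm.patch"}
  ],
  "magento/module-sample-data" => [
    {1 => "pmet-change-sample-data-load-order.patch"}
  ]
}

# Collapse each array of single-pair hashes into one hash per module.
merged = result.transform_values { |vals| vals.reduce(&:merge) }

# Integer keys become JSON strings ("0", "8", ...) when serialized.
json = JSON.pretty_generate("patches" => merged)
puts json
```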

Let arr1 and arr2 be your two arrays. Because they are the same size and, for each index i, arr1[i][i] and arr2[i][i] are the values of the key i in the hashes arr1[i] and arr2[i], the desired result can be obtained quite easily:

arr2.each_with_index.with_object({}) do |(g,i),h|
  (h[g[i]] ||= []) << arr1[i][i]
end
  #=> {
  #    "magento2-base"=>[
  #    "pmet-add-install-module-timings.patch",
  #    "pmet-fix-module-loader-algorithm.patch",
  #    "pmet-stop-catching-sample-data-errrors-during-install.patch"
  #    ],
  #    "magento/module-sample-data"=>[
  #      "pmet-change-sample-data-load-order.patch"
  #    ],
  #    ...
  #    "magento/module-staging"=>[
  #      "pmet-fix-invalid-module-dependencies.patch",
  #      "pmet-staging-preview-js-fix.patch"
  #    ],
  #    "magento/module-customer"=>[
  #      "pmet-visitor-segment.patch"
  #    ]
  #   }

The fragment

h[g[i]] ||= []

is effectively expanded to

h[g[i]] = h[g[i]] || []  # *

If the hash h has no key g[i],

h[g[i]] #=> nil

so * becomes

h[g[i]] = nil || [] #=> []

after which

h[g[i]] << "cat"
  #=> ["cat"]

(which works with "dog" as well). The above expression can instead be written:

arr2.each_with_index.with_object(Hash.new {|h,k| h[k]=[]}) do |(g,i),h|
  h[g[i]] << arr1[i][i]
end

This uses the form of Hash::new that employs a block (here {|h,k| h[k]=[]}), which is called whenever the hash is accessed with a key it does not contain.
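The two forms can be compared side by side in a minimal sketch. The third case, Hash.new([]), is not something the answer proposes; I include it only as a contrast, because it looks similar but shares a single default array and never stores the key.

```ruby
# Plain hash: a missing key returns nil, so ||= supplies the empty array.
plain = {}
(plain["magento2-base"] ||= []) << "pmet-add-install-module-timings.patch"
(plain["magento2-base"] ||= []) << "pmet-fix-module-loader-algorithm.patch"

# Default block: runs on the first access to a missing key and stores
# a fresh array under that key.
auto = Hash.new { |h, k| h[k] = [] }
auto["magento2-base"] << "pmet-add-install-module-timings.patch"
auto["magento2-base"] << "pmet-fix-module-loader-algorithm.patch"

# Contrast (not recommended): one shared default array, no key is stored.
shared = Hash.new([])
shared["magento2-base"] << "pmet-add-install-module-timings.patch"
shared["magento/module-staging"] << "pmet-staging-preview-js-fix.patch"

p plain
p auto
p shared        # the hash itself is still empty
p shared["any"] # both file names, from the shared default
```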

An alternative method is:

arr2.each_with_index.with_object({}) do |(g,i),h|
  h.update(g[i]=>[arr1[i][i]]) { |_,o,n| o+n }
end

This uses the form of Hash#update (aka merge!) that employs a block to determine the values of keys that are in both hashes being merged.
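A minimal sketch of how that block behaves, using one key that already exists and one that does not:

```ruby
h = { "magento2-base" => ["pmet-add-install-module-timings.patch"] }

# Key already present: the block receives (key, old_value, new_value)
# and its return value becomes the stored value.
h.update("magento2-base" => ["pmet-fix-module-loader-algorithm.patch"]) do |_key, old, new|
  old + new
end

# Key absent: the block is not called and the pair is simply added.
h.update("magento/module-staging" => ["pmet-staging-preview-js-fix.patch"]) do |_key, old, new|
  old + new
end

p h
```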

A third way is to use Enumerable#group_by:

arr2.each_with_index.group_by { |h,i| arr2[i][i] }.
     transform_values { |a| a.map { |_,i| arr1[i][i] } }
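As a quick sanity check, the sketch below runs the three approaches on a three-element excerpt of the data and compares the results. The excerpt is re-indexed 0..2 so that arr[i][i] still holds; in the full data these entries sit at indexes 0, 6, and 8.

```ruby
# Excerpt of the question's data, re-indexed from 0 for this example.
arr1 = [
  {0 => "pmet-add-install-module-timings.patch"},
  {1 => "pmet-fix-invalid-module-dependencies.patch"},
  {2 => "pmet-fix-module-loader-algorithm.patch"}
]
arr2 = [
  {0 => "magento2-base"},
  {1 => "magento/module-staging"},
  {2 => "magento2-base"}
]

# 1. ||= to create the array on first sight of a key.
by_or_equals = arr2.each_with_index.with_object({}) do |(g, i), h|
  (h[g[i]] ||= []) << arr1[i][i]
end

# 2. Hash#update with a block that concatenates on key collision.
by_update = arr2.each_with_index.with_object({}) do |(g, i), h|
  h.update(g[i] => [arr1[i][i]]) { |_, o, n| o + n }
end

# 3. group_by on the module name, then map indexes to file names.
by_group = arr2.each_with_index
               .group_by { |g, i| g[i] }
               .transform_values { |a| a.map { |_, i| arr1[i][i] } }

p by_or_equals
puts by_or_equals == by_update && by_update == by_group
```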
