
I parse an HTML string and extract tables from it.

The tables have two columns: the first holds a single value (the key), and the second holds one or more comma-separated values.

I want to store the values in a hash of arrays.

use strict;
use warnings;

use Data::Dumper qw(Dumper);

my $html='<p class="auto-cursor-target"><br /></p><table class="wrapped"><colgroup><col style="width: 50.0px;" /><col style="width: 29.0px;" /></colgroup><tbody><tr><th><p>Wikispace</p></th><th><p>right</p></th></tr><tr><td>mimi</td><td>right1</td></tr><tr><td colspan="1">mama</td><td colspan="1">right3,right2</td></tr></tbody></table><p class="auto-    cursor-target"><br /></p>';

use HTML::TableExtract;
my $te = HTML::TableExtract->new( headers => [qw(Wikispace right)] );
$te->parse($html);

my %known;
foreach my $ts ($te->tables) {
   foreach my $row ($ts->rows) {
     print @$row[0], ":::", @$row[1], ":  ";
     foreach my $val (split(/,/,@$row[1])) {
             print $val, ";";
             if (! $known{@$row[0]}) {
               my @arr = ($val);
               @known{@$row[0]}=@arr;
             } else {
                     # my @arr = @known{@$row[0]};
                     #              push (@arr, $val);
                     #         print Dumper @arr;
                     push (@$known{@$row[0]}, $val);
             };
     }
     print "\n";
   }
 }

print Dumper %known;

What am I doing wrong? What’s wrong with the last push, and how would you do it differently?

Also, is there no way to assign an array directly to a hash (dictionary) value, instead of first having to create an array and then store a reference to it?

2 Answers


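  1. You get a compilation error on the line:

    push (@$known{@$row[0]}, $val);

    because @$known{...} dereferences a scalar named $known, but the only variable you declared is the hash %known. To dereference the array stored in a hash element, wrap the element in braces: @{ $known{ $row->[0] } }.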
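
    To make the difference concrete, here is a minimal sketch with a made-up hash (the key and values are only for illustration, not taken from your table):

    ```perl
    use strict;
    use warnings;

    # Illustrative data: one key that already holds an array reference.
    my %known = ( mama => ['right3'] );

    # @$known{mama} would parse as @{$known}{mama}, i.e. a slice of a hash
    # reference held in a scalar $known, which does not exist here.
    # Braces around the hash element say which thing is being dereferenced:
    push @{ $known{mama} }, 'right2';

    print join(',', @{ $known{mama} }), "\n";    # prints "right3,right2"
    ```

    The sigil binds to the variable name before the subscript is applied, which is why the braces are needed.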

    Here is a simpler version of your code that runs without errors:

    use strict;
    use warnings;
    
    use Data::Dumper qw(Dumper);
    
    my $html='<p class="auto-cursor-target"><br /></p><table class="wrapped"><colgroup><col style="width: 50.0px;" /><col style="width: 29.0px;" /></colgroup><tbody><tr><th><p>Wikispace</p></th><th><p>right</p></th></tr><tr><td>mimi</td><td>right1</td></tr><tr><td colspan="1">mama</td><td colspan="1">right3,right2</td></tr></tbody></table><p class="auto-    cursor-target"><br /></p>';
    
    use HTML::TableExtract;
    my $te = HTML::TableExtract->new( headers => [qw(Wikispace right)] );
    $te->parse($html);
    
    my %known;
    foreach my $ts ($te->tables) {
        foreach my $row ($ts->rows) {
            my @vals = split(/,/, $row->[1]);
            $known{ $row->[0] } = [@vals];
        }
    }
    print Dumper(\%known);
    

    Output:

    $VAR1 = {
              'mama' => [
                          'right3',
                          'right2'
                        ],
              'mimi' => [
                          'right1'
                        ]
            };
    
  2. The overall approach is fine, but there are many basic errors throughout. I’d suggest first working through some solid introductory material, rather than struggling with the basic notions and syntax of the language.

    Basic errors: $row is an array reference (often called an "arrayref" for short), so to extract an element you need $row->[0]. Those elements themselves are not arrayrefs, so you can’t dereference them (@{ $row->[0] } is wrong). Also, the headers you specify are wrong: your document doesn’t have such headers.

    I don’t fully understand the whole purpose, but here is your program cleaned up so that it works:

    use strict;
    use warnings;
    use feature 'say';
    
    use Data::Dumper qw(Dumper);
    
    my $html='<p class="auto-cursor-target"><br /></p><table class="wrapped"><colgroup><col style="width: 50.0px;" /><col style="width: 29.0px;" /></colgroup><tbody><tr><th><p>Wiki    space</p></th><th><p>right</p></th></tr><tr><td>mimi</td><td>right1</td></tr><tr><td colspan="1">mama</td><td colspan="1">right3,right2</td></tr></tbody></table><p class="auto-    cursor-target"><br /></p>';
    
    use HTML::TableExtract;
    
    my $te = HTML::TableExtract->new( headers => ['Wiki    space', 'right'] );
    $te->parse($html);
    
    my %known;
    foreach my $ts ($te->tables) {
        #say "ts: $ts";
        foreach my $row ($ts->rows) {
            #say "row: @{$row}";
            foreach my $val ( split /,/, $row->[1] ) {
                print $val, ";";
                if (not $known{$row->[0]}) {
                    $known{$row->[0]} = [ $val ];
                }
                else {
                    push @{$known{$row->[0]}}, $val;
                };
            }
            say '';
        }
    }
    
    print Dumper \%known;
    

    This prints

    right1;
    right3;right2;
    $VAR1 = {
              'mimi' => [
                          'right1'
                        ],
              'mama' => [
                          'right3',
                          'right2'
                        ]
            };
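
    On the second part of the question (assigning an array directly): an anonymous array built with [ ... ] can be stored in the hash in one step, and push will create the array on first use (autovivification), so the if/else is not strictly needed. A minimal sketch, assuming rows shaped like those returned by $ts->rows; the @rows data and the extra "papa" key are made up for illustration:

    ```perl
    use strict;
    use warnings;

    # Hypothetical rows standing in for $ts->rows:
    # each row is an arrayref of [key, comma-separated values].
    my @rows = (
        [ 'mimi', 'right1' ],
        [ 'mama', 'right3,right2' ],
        [ 'mimi', 'right4' ],    # repeated key, to show that values accumulate
    );

    my %known;
    for my $row (@rows) {
        my ($key, $values) = @$row;
        # push autovivifies $known{$key} into an array reference on first use,
        # so no existence check or temporary array is needed.
        push @{ $known{$key} }, split /,/, $values;
    }

    # An anonymous array can also be assigned directly with [ ... ]:
    $known{papa} = [ 'right5', 'right6' ];

    for my $key (sort keys %known) {
        print "$key: ", join(';', @{ $known{$key} }), "\n";
    }
    ```

    With the sample rows above this prints "mama: right3;right2", "mimi: right1;right4" and "papa: right5;right6", one per line.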
    