
I’m getting the error invalid byte sequence in UTF-8 when trying to import a CSV file in my Rails application. Everything was working fine until I added a gsub method to compare one of the CSV columns to a field in my database.

When I import a CSV file, I want to check whether the address for each row is included in an array of different addresses for a specific client. I have a client model with an alt_addresses property which contains a few different possible formats for the client’s address.

I then have a citation model (if you’re familiar with local SEO you’ll know this term). The citation model doesn’t have an address field, but it has a nap_correct? field (NAP stands for “Name”, “Address”, “Phone Number”). If the name, address, and phone number for a CSV row match what I have in the database for that client, the nap_correct? field for that citation gets set to “correct”.

Here’s what the import method looks like in my citation model:

def self.import(file, client_id)
  @client = Client.find(client_id)
  CSV.foreach(file.path, headers: true) do |row|
    @row = row.to_hash
    @citation = Citation.new
    if @row["Address"]
      if @client.alt_addresses.include?(@row["Address"].to_s.downcase.gsub(/\W+/, '')) && self.phone == @row["Phone Number"].gsub(/[^0-9]/, '')
        @citation.nap_correct = true
      end
    end
    @citation.name = @row["Domain"]
    @citation.listing_url = @row["Citation Link"]
    @citation.save
  end
end

And then here’s what the alt_addresses property looks like in my client model:

def alt_addresses
  address = self.address.downcase.gsub(/\W+/, '')
  address_with_zip = (self.address + self.zip_code).downcase.gsub(/\W+/, '')
  return [address, address_with_zip]
end

I’m using gsub to reformat the address column in the CSV as well as the field in my client database table so I can compare the two values. This is where the problem comes in. As soon as I added the gsub method I started getting the invalid byte-sequence error.

I’m using Ruby 2.1.3. I’ve noticed a lot of the similar errors I find searching Stack Overflow are related to an older version of Ruby.

2 Answers


  1. Specify the encoding with the encoding option. The value below tells CSV to read the file as ISO-8859-1 and transcode it to UTF-8:

    CSV.foreach(file.path, headers: true, encoding: 'iso-8859-1:utf-8') do |row|
     # your code here
    end
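
To see the option at work, here is a minimal, self-contained sketch. It assumes the uploaded file really is ISO-8859-1 (the question does not say which encoding the CSV uses), and the sample row and temp file are made up for illustration. Once the fields are transcoded to UTF-8, the gsub normalization from the question runs on valid strings.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical sample data: "Café" with é stored as the single
# ISO-8859-1 byte 0xE9, which is not valid UTF-8 on its own.
latin1_csv = "Address,Phone Number\n123 Caf\xE9 St.,(555) 010-2000\n".b

file = Tempfile.new(['citations', '.csv'])
file.binmode
file.write(latin1_csv)
file.close

raw_addresses = []
normalized_addresses = []

# Read the bytes as ISO-8859-1 and transcode every field to UTF-8.
CSV.foreach(file.path, headers: true, encoding: 'iso-8859-1:utf-8') do |row|
  raw_addresses << row['Address']
  # Same normalization as in the question: lowercase, strip non-word characters.
  normalized_addresses << row['Address'].to_s.downcase.gsub(/\W+/, '')
end

file.unlink
```

If the file is in some other encoding (Windows-1252 is common for spreadsheets exported on Windows), the part before the colon would need to change accordingly.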
    
  2. One workaround I’ve found is to open the file in OpenOffice or LibreOffice, choose “Save As”, tick “Edit Filter Settings”, make sure the character set is UTF-8, and save. Bottom line: use an external tool to convert the file to UTF-8 before loading it into Ruby. Sorting this out within Ruby alone can be a real labyrinth.

    A Unix tool called iconv can also do this kind of conversion: https://superuser.com/questions/588048/is-there-any-tools-which-can-convert-any-strings-to-utf-8-encoded-values-in-linu
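
For comparison, the same "convert first, then import" step can be sketched in Ruby with String#force_encoding and String#encode. This is an alternative to the external tools mentioned above, not something this answer describes. The helper name to_utf8 is hypothetical, and the ISO-8859-1 default is an assumption about the source file.

```ruby
# Hypothetical helper: return a UTF-8 copy of the given raw bytes.
# Assumption: anything that is not already valid UTF-8 is ISO-8859-1;
# pass a different source_encoding if the real one is known.
def to_utf8(bytes, source_encoding = 'ISO-8859-1')
  text = bytes.dup.force_encoding('UTF-8')
  # Already valid UTF-8: keep the content untouched.
  return text if text.valid_encoding?

  # Otherwise reinterpret the bytes in the source encoding and transcode.
  bytes.dup.force_encoding(source_encoding).encode('UTF-8')
end

latin1_bytes = "caf\xE9".b     # é as one ISO-8859-1 byte
utf8_bytes   = "caf\u00E9".b   # é as two UTF-8 bytes

converted_latin1 = to_utf8(latin1_bytes)
converted_utf8   = to_utf8(utf8_bytes)
```

A whole file could be converted the same way before handing it to the importer, for example by passing the result of File.binread to the helper and writing the returned string back out with File.write.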
