I’m getting the error "invalid byte sequence in UTF-8" when trying to import a CSV file in my Rails application. Everything was working fine until I added a gsub call to compare one of the CSV columns to a field in my database.
When I import a CSV file, I want to check whether the address for each row is included in an array of different addresses for a specific client. I have a client model with an alt_addresses property which contains a few different possible formats for the client’s address.
I then have a citation model (if you’re familiar with local SEO you’ll know this term). The citation model doesn’t have an address field, but it has a nap_correct? field (NAP stands for “Name”, “Address”, “Phone Number”). If the name, address, and phone number for a CSV row match what I have in the database for that client, the nap_correct? field for that citation gets set to true.
Here’s what the import method looks like in my citation model:
    def self.import(file, client_id)
      @client = Client.find(client_id)
      CSV.foreach(file.path, headers: true) do |row|
        @row = row.to_hash
        @citation = Citation.new
        if @row["Address"]
          # Normalize the CSV address (lowercase, strip non-word characters) before comparing
          if @client.alt_addresses.include?(@row["Address"].to_s.downcase.gsub(/\W+/, '')) && self.phone == @row["Phone Number"].gsub(/[^0-9]/, '')
            @citation.nap_correct = true
          end
        end
        @citation.name = @row["Domain"]
        @citation.listing_url = @row["Citation Link"]
        @citation.save
      end
    end
And then here’s what the alt_addresses property looks like in my client model:
    def alt_addresses
      # Both variants are lowercased and stripped of non-word characters
      address = self.address.downcase.gsub(/\W+/, '')
      address_with_zip = (self.address + self.zip_code).downcase.gsub(/\W+/, '')
      return [address, address_with_zip]
    end
I’m using gsub to reformat the address column in the CSV as well as the field in my client database table so I can compare the two values. This is where the problem comes in. As soon as I added the gsub call I started getting the invalid byte sequence error.
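A minimal sketch of why gsub is the point where the error surfaces, assuming the CSV contains bytes that are not valid UTF-8. The sample address below is made up. Ruby tags the string as UTF-8 without validating it, and the exception is only raised once a regular expression is applied to it:

```ruby
# Hypothetical address value: 0xE9 is "é" in ISO-8859-1,
# but on its own it is not a valid UTF-8 byte sequence.
raw = "12 Caf\xE9 St".dup.force_encoding(Encoding::UTF_8)

# The string is tagged as UTF-8 but holds an invalid byte.
valid = raw.valid_encoding?

# Applying a regex to the broken string raises ArgumentError.
error_message = nil
begin
  raw.gsub(/\W+/, "")
rescue ArgumentError => e
  error_message = e.message
end

# String#scrub (Ruby 2.1+) drops or replaces the invalid bytes,
# after which the same normalization runs without raising.
cleaned = raw.scrub("").downcase.gsub(/\W+/, "")
```

Note that scrub discards the offending bytes, so it loses data. If the file's real encoding is known, transcoding it to UTF-8 is usually the better fix.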
I’m using Ruby 2.1.3. I’ve noticed a lot of the similar errors I find searching Stack Overflow are related to an older version of Ruby.
2 Answers
Specify the file’s encoding with the encoding option of CSV.foreach.

One way I’ve figured out to get around this is to “Save As” in OpenOffice or LibreOffice, click “Edit Filter Settings”, make sure the character set is UTF-8, and save. Bottom line: use some external tool to convert the file to UTF-8 before loading it into Ruby. This issue can be a true f-ing labyrinth within Ruby alone.
A Unix tool called iconv can apparently do this sort of conversion: https://superuser.com/questions/588048/is-there-any-tools-which-can-convert-any-strings-to-utf-8-encoded-values-in-linu
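For the encoding option mentioned in the first answer, here is a rough sketch of what passing it to CSV.foreach could look like. It assumes the file was saved as ISO-8859-1, which is only a guess (the question does not say what the actual encoding is), and the sample data and file name are made up for illustration:

```ruby
require "csv"
require "tempfile"

# Hypothetical sample: a CSV saved as ISO-8859-1, where "é" is the single byte 0xE9.
latin1_csv = "Domain,Address\nexample.com,12 Caf\xE9 St\n".b

rows = []
Tempfile.create(["citations", ".csv"]) do |file|
  file.binmode
  file.write(latin1_csv)
  file.flush

  # "ISO-8859-1:UTF-8" means: read the file as ISO-8859-1 (assumed here)
  # and hand the fields back as UTF-8 strings.
  CSV.foreach(file.path, headers: true, encoding: "ISO-8859-1:UTF-8") do |row|
    rows << row.to_hash
  end
end

address = rows.first["Address"]

# The field is now valid UTF-8, so the regex-based normalization no longer raises.
normalized = address.downcase.gsub(/\W+/, "")
```

If the source encoding is something else (Windows-1252 is another common one for spreadsheet exports), replace the part before the colon accordingly.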