I’m trying to make an app using a specific API, but the response is quite tricky. Most of the keys and values are wrapped in straight quotes ("), except for one: it starts with a curly quote (“) and ends with a straight quote ("). Decoding it with JSONDecoder and Codable does not work.

Is there a workaround for this that doesn’t require modifying the response on the server side?

I tried replacing all occurrences of curly quotes with straight quotes, only to find that some of the values contain curly quotes inside them, so that won’t work.

2 Answers


  1. You may need to fiddle with regular expressions, things like:

    ^“ for curly quotes at the start of fields, and “$ for curly quotes at the end of fields. This way you match only the curly quotes that start or end the keys/values of your JSON. It shouldn’t be very complex to pinpoint and replace them with the regex and text-substitution support in today’s languages.

  2. First, you absolutely should fix the server. This is invalid JSON.

    If you can’t fix the server, a substitution based on your knowledge of the format is probably the right thing. Look for something like /“[^"]*":/ as a regex: a smart quote, followed by a run of non-" characters, followed by ", followed immediately by :. Maybe in your particular case that will happen to work.
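
    As a rough sketch of that substitution, assuming the response fits in memory as a String: the function name fixLeadingSmartQuotes and the sample payload below are made up for illustration, and the pattern additionally tolerates whitespace before the colon, which the regex above does not. It only rewrites a curly quote that opens something shaped like a key, so curly quotes inside values are left alone unless a value happens to contain that same shape.

    ```swift
    import Foundation

    /// Replaces a left curly quote (U+201C) with a straight quote, but only when it
    /// opens something that looks like an object key, i.e. “key":
    /// Hypothetical helper, not part of any library.
    func fixLeadingSmartQuotes(in json: String) -> String {
        // \u{201C}    a left curly quote
        // ([^"]*")    everything up to and including the next straight quote (captured)
        // (?=\s*:)    which must be followed by optional whitespace and a colon
        let pattern = "\u{201C}([^\"]*\")(?=\\s*:)"
        guard let regex = try? NSRegularExpression(pattern: pattern, options: []) else {
            return json
        }
        let range = NSRange(json.startIndex..., in: json)
        // Re-emit the captured text with a straight quote in front of it.
        return regex.stringByReplacingMatches(in: json, options: [], range: range,
                                              withTemplate: "\"$1")
    }

    // Made-up sample: the key `name` opens with a curly quote, and the value
    // legitimately contains curly quotes that must be preserved.
    let broken = "{\"id\": 1, \u{201C}name\": \"A \u{201C}quoted\u{201D} word\"}"
    let fixed = fixLeadingSmartQuotes(in: broken)

    struct Item: Decodable {
        let id: Int
        let name: String
    }
    let item = try? JSONDecoder().decode(Item.self, from: Data(fixed.utf8))
    ```

    This is only as reliable as your knowledge of the payload: a string value that itself contains a curly quote followed later by ": would be rewritten too.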

    But these are boring answers. Let’s do it the hard way and validate some JSON by hand, and fix this one special case (smart quote at the beginning of a key).

    One tool you could use is a JSON tokenizer. For something fairly off-the-shelf (but not particularly supported), see RNJSON. After importing with:

        .package(url: "https://github.com/rnapier/RNJSON", branch: "main"),
    

    The following would rewrite your JSON:

    let tokenizer = JSONTokenizer()
    var json = originalJSON[...]
    var outJSON = Data()
    let smartQuote = "“".utf8
    
    while !json.isEmpty {
        // Grab a token and append it
        let result = try tokenizer.firstToken(from: json)
        outJSON.append(result.data)
        json.removeFirst(result.length)
    
        // If the next token may be a key, drop whitespace, and if there's a smart-quote, replace it
        if result is JSONTokenObjectOpen || result is JSONTokenListSeparator {
            // Drop whitespace
            try json.trimPrefix(while: { [0x09, 0x0a, 0x0d, 0x20].contains($0) })
            if json.starts(with: smartQuote) {
                let range = json.startIndex..<json.index(json.startIndex, offsetBy: smartQuote.count)
                json.replaceSubrange(range, with: "\"".utf8)
            }
        }
    }
    

    RNJSON is a toy of mine (as is the closely related RNAJSON), and I don’t recommend relying on the full package. I recommend just copying what you need and learning from it.

    So another way to approach this is to parse it directly. It’s a lot more code, but it’s a good introduction to really simple parsing. This code does not try to carefully validate the JSON. It does the minimum it can to make sure it’s at an object-key before looking for smart-quotes. Other than that, it assumes a later parser will check the details.

    The key block of code for this problem is:

                // Detect a smart quote (UTF-8 bytes 0xe2, 0x80, 0x9c) and overwrite it with
                // two spaces and a straight quote ("  \""), so the byte count stays the same
                case 0xe2 where awaiting.contains(.objectKey) && peekAhead(2, is: [0x80, 0x9c]):
                    let startRange = self.index(before: index)
                    let endRange = self.index(after: index)
                    self[startRange...endRange] = SubSequence("  \"".utf8)
    

    Here’s the full function for your specific problem.

    enum JSONError: Swift.Error, Hashable {
        case unexpectedCharacter(ascii: UInt8)
        case unexpectedEndOfFile
    }
    
    private enum Awaiting: Hashable {
        case start, end
        case objectKey, keyValueSeparator, objectValue, objectSeparator, objectClose
        case arrayValue, arraySeparator, arrayClose
    }
    
    private enum Container { case object, array }
    
    extension MutableDataProtocol {
        mutating func fixLeadingSmartQuotesOnJSONKeys() throws {
            var index = startIndex
            var containers: [Container] = []
            var awaiting: Set<Awaiting> = [.start]
            let digitCharacters = "0123456789-+.eE".utf8
    
            while let byte = nextByte() {
                switch byte {
                case UInt8(ascii: "\t"), UInt8(ascii: "\n"), UInt8(ascii: "\r"), UInt8(ascii: " "):
                    break
    
                case UInt8(ascii: "{") where !awaiting.isDisjoint(with: [.start, .objectValue, .arrayValue]):
                    containers.append(.object)
                    awaiting = [.objectKey, .objectClose]
    
                case UInt8(ascii: "\"") where awaiting.contains(.objectKey):
                    awaiting = [.keyValueSeparator]
                    try consumeOpenString()
    
                case UInt8(ascii: ":") where awaiting.contains(.keyValueSeparator):
                    awaiting = [.objectValue]
    
                case UInt8(ascii: ",") where awaiting.contains(.objectSeparator):
                    awaiting = [.objectKey, .objectClose]
    
                case UInt8(ascii: "}") where containers.last == .object && awaiting.contains(.objectClose):
                    popContainer()
    
                case UInt8(ascii: "[") where !awaiting.isDisjoint(with: [.start, .objectValue, .arrayValue]):
                    containers.append(.array)
                    awaiting = [.arrayValue, .arrayClose]
    
                case UInt8(ascii: ",") where awaiting.contains(.arraySeparator):
                    awaiting = [.arrayValue, .arrayClose]
    
                case UInt8(ascii: "]") where containers.last == .array && awaiting.contains(.arrayClose):
                    popContainer()
    
                // Detect a smart quote (UTF-8 bytes 0xe2, 0x80, 0x9c) and overwrite it with
                // two spaces and a straight quote ("  \""), so the byte count stays the same
                case 0xe2 where awaiting.contains(.objectKey) && peekAhead(2, is: [0x80, 0x9c]):
                    let startRange = self.index(before: index)
                    let endRange = self.index(after: index)
                    self[startRange...endRange] = SubSequence("  \"".utf8)
    
                case _ where awaiting.contains(.objectValue):
                    awaiting = [.objectSeparator, .objectClose]
                    try consumeScalarValue(first: byte)
    
                case _ where awaiting.contains(.arrayValue):
                    awaiting = [.arraySeparator, .arrayClose]
                    try consumeScalarValue(first: byte)
    
                case _ where awaiting.contains(.start):
                    awaiting = [.end]
                    try consumeScalarValue(first: byte)
    
                default:
                    throw JSONError.unexpectedCharacter(ascii: byte)
                }
            }
    
            func peekAhead(_ count: Int, is value: [UInt8]) -> Bool {
                self[index...].prefix(count).elementsEqual(value)
            }
    
            func consumeOpenString() throws {
                while let byte = nextByte() {
                    switch byte {
                    case UInt8(ascii: "\""): return
    
                    // This intentionally doesn't worry about \u as a special case.
                    // It'll just be handled as more of the string. The only reason for this
                    // test is to handle escaped close-quotes.
                    case UInt8(ascii: "\\"): try consume(count: 1)
    
                    default: break
                    }
                }
                throw JSONError.unexpectedEndOfFile
            }
    
            func popContainer() {
                containers.removeLast()
                switch containers.last {
                case .none: awaiting = [.end]
                case .some(.object): awaiting = [.objectSeparator, .objectClose]
                case .some(.array): awaiting = [.arraySeparator, .arrayClose]
                }
            }
    
            func consumeScalarValue(first: UInt8) throws {
                switch first {
                case UInt8(ascii: "\""): try consumeOpenString()
                case UInt8(ascii: "-"), UInt8(ascii: "0")...UInt8(ascii: "9"): try consumeDigits()
                case UInt8(ascii: "t"): try consumeOpenLiteral("true")
                case UInt8(ascii: "f"): try consumeOpenLiteral("false")
                case UInt8(ascii: "n"): try consumeOpenLiteral("null")
                default: throw JSONError.unexpectedCharacter(ascii: first)
                }
            }
    
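            func consumeDigits() throws {
                while let byte = nextByte() {
                    guard digitCharacters.contains(byte) else {
                        // Back up one byte so the main loop sees the non-digit.
                        // At end of input there is nothing to back up over.
                        formIndex(before: &index)
                        return
                    }
                }
            }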
    
            func consume(count: Int) throws {
                for _ in 0..<count where nextByte() == nil {
                    throw JSONError.unexpectedEndOfFile
                }
            }
    
            func consumeOpenLiteral(_ literal: String) throws {
                try consume(count: literal.count - 1)
            }
    
            func nextByte() -> UInt8? {
                guard index != endIndex else { return nil }
                let currentIndex = index
                formIndex(after: &index)
                return self[currentIndex]
            }
        }
    }
    