
Edit: In the end, it turns out that PokeAPI uses "soft hyphens." Some people will see a hyphen in the example Pokedex entry I posted and some people won’t. If you see a hyphen below, know that I don’t, and I was originally working from the assumption that there wasn’t one.


From here: https://pokeapi.co/api/v2/pokemon-species/8

The flavor_text of flavor_text_entries[4] is the following string:
"It cleverly con­\ntrols its furry\nears and tail to\u000cmaintain its\nbalance while\nswimming."

I’m trying to get the entries into a one-line string; my line breaks may not be where the API’s line breaks are. I’m using PHP, so the correctly formatted version of str_replace([<array of \r\n varieties>], " ") works for most things. I can handle the \u000c’s too, now that I see them. The problem with this specific entry is that one \n is at the end of a line and needs to be replaced with " ", while the other \n is in the middle of a word and needs to be replaced with "", and I’m not smart enough to figure out how to handle that.
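A minimal PHP sketch of that naive replacement follows. It assumes the entry has already been JSON-decoded, so \n and \u000c are real newline and form-feed characters; the variable names are only illustrative.

```php
<?php
// Assumes the flavor text is already JSON-decoded: "\u{00AD}" is the
// (invisible) soft hyphen and "\u{000C}" is the form feed.
$flavorText = "It cleverly con\u{00AD}\ntrols its furry\nears and tail to\u{000C}maintain its\nbalance while\nswimming.";

// Naive approach: every line-break variety becomes one space.
// "\r\n" is listed before "\r" and "\n" so a Windows line ending
// becomes a single space rather than two.
$naive = str_replace(["\r\n", "\r", "\n", "\u{000C}"], " ", $flavorText);

// The soft hyphen survives and is now followed by a space: "con\u{00AD} trols".
echo $naive, PHP_EOL;
```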

Doing the replacement I showed above turns that string into:

"It cleverly con­ trols its furry ears and tail to maintain its balance while swimming."

With a wayward space in the word "controls." I imagine other entries have the same problem; this is just the one I stumbled on. Is there anything I can do about this? There isn’t even a hyphen at the end of the broken-up word before the line break, y’know? If there were, I’d just replace -\n with "" and \n with " " and be on my way.
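For comparison, this is roughly what that two-step replacement would look like if the wrap really used an ordinary ASCII hyphen. The input string is hypothetical, made up for illustration; str_replace with array arguments applies each search/replace pair in order, so "-\n" is removed before the remaining newlines become spaces.

```php
<?php
// Hypothetical input: a word wrapped with a regular ASCII hyphen (U+002D),
// not the soft hyphen the real PokeAPI entry uses.
$wrapped = "It cleverly con-\ntrols its furry\nears";

// Pairs are applied left to right: first drop "-\n", then turn "\n" into " ".
$joined = str_replace(["-\n", "\n"], ["", " "], $wrapped);

echo $joined, PHP_EOL;
```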

Edit: Of course, I asked this question about the one entry I found, but if there’s a "universal" answer for the other entries that undoubtedly do this, I can edit the title/tags to make it easier for someone to find via Google in the future. I’m just not sure of the best way to word it.


Answers


  1. Chosen as BEST ANSWER

    TL;DR -- PokeAPI uses soft hyphens when it breaks a word across multiple lines. On Windows 11, Firefox wasn't displaying them, and a soft hyphen is a different character from a regular hyphen, so a simple search-and-replace of "-\n" with "" wouldn't cut it either.

    I found a solution, but because the soft hyphen doesn't show up in my browser, you may or may not be able to see it in the first str_replace here:

    $flavorText = str_replace("­\n", "", $flavorText); // remove [soft hyphen + newline] entirely; the invisible character before \n is U+00AD
    $flavorText = preg_replace('/[\x00-\x1F\x80-\xC0]/u', ' ', $flavorText); // replace each code point in U+0000-U+001F and U+0080-U+00C0 with " "
    

    Even if you can't see it, it IS there, and copy-pasting the code into Notepad++ causes the soft-hyphen to appear where it should be.
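    For a version where the character is visible in any editor, the same two steps can be written with escape sequences. This is a sketch rather than the exact code above: the function name flattenFlavorText is hypothetical, the sample string assumes the entry is already JSON-decoded, and "\u{00AD}" (PHP 7+ escape syntax) stands in for the literal invisible character.

    ```php
    <?php
    // Hypothetical helper mirroring the two steps above.
    // Assumes $text is already JSON-decoded UTF-8, so "\n" and "\u{000C}"
    // are real newline and form-feed characters.
    function flattenFlavorText(string $text): string
    {
        // Step 1: soft hyphen (U+00AD) + newline marks a word wrapped across
        // lines, so drop both characters to rejoin the word.
        $text = str_replace("\u{00AD}\n", "", $text);

        // Step 2: same character class as above. With the /u modifier it
        // matches code points U+0000-U+001F (newline, form feed, ...) and
        // U+0080-U+00C0, replacing each one with a single space.
        return preg_replace('/[\x00-\x1F\x80-\xC0]/u', ' ', $text);
    }

    $flavorText = "It cleverly con\u{00AD}\ntrols its furry\nears and tail to\u{000C}maintain its\nbalance while\nswimming.";
    echo flattenFlavorText($flavorText), PHP_EOL;
    // It cleverly controls its furry ears and tail to maintain its balance while swimming.
    ```

    One thing to keep in mind: U+00C0 is "À", so the top of that range also turns a capital A-grave into a space. If that matters for your data, narrowing the class to [\x00-\x1F\x7F-\x9F] (the C0 and C1 control characters) is one possible alternative.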


    TSCAmerica had suggested using a regex replace that, in plain English, says "if it's a hyphen before a newline, replace it with nothing." Specifically, they suggested preg_replace('/(?<![-\w])\n|\n(?!\w)/', '', $flavorText);, which I think matches how I defined it. That was my first thought as well: if there's a hyphen before a line break, the word is probably being wrapped, so you need to replace the break with "" instead of a space. The issue is that the entry doesn't actually hyphenate when it wraps a word that way...

    ...or so I thought? There was no hyphen when I echoed out the string I pulled from the API. But then I output the entries into a text file for something else to use, and when I checked the text file there was a hyphen! It turns out that it's a "soft hyphen." When I copy-paste the hyphen from the output into a search bar, it searches for '%C2%AD', which the University of Edinburgh helpfully explained. I honestly couldn't understand why the word wraps weren't hyphenated, because surely they would have been hyphenated in the game, but now I understand that they are! Just, uh, "softly" hyphenated. I'm guessing it isn't showing up in Firefox for me because browsers generally only draw a soft hyphen where a line actually wraps at that point, rather than because of an encoding problem.
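    A quick way to confirm which character you are dealing with is to inspect its bytes. In this sketch "\u{00AD}" is PHP's escape for U+00AD SOFT HYPHEN; its UTF-8 bytes are C2 AD, which is exactly what '%C2%AD' encodes, and it is a different character from the ASCII hyphen "-".

    ```php
    <?php
    // U+00AD SOFT HYPHEN, written with PHP 7+ Unicode escape syntax.
    $softHyphen = "\u{00AD}";

    echo bin2hex($softHyphen), PHP_EOL;      // c2ad: its two UTF-8 bytes
    echo rawurlencode($softHyphen), PHP_EOL; // %C2%AD: the search-bar encoding
    var_dump($softHyphen === "-");           // bool(false): not an ASCII hyphen

    // So searching for a regular hyphen + newline finds nothing to replace.
    $sample = "con{$softHyphen}\ntrols";
    var_dump(str_replace("-\n", "", $sample) === $sample); // bool(true)
    ```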

    So when it "wasn't there" in the original data, I wasn't sure what to do with it; then, when it WAS in the output but I couldn't get rid of it by searching for a regular hyphen, I was really confused. Then I went back to the original data to post the question and completely forgot that I'd ended up with a hyphen in the first place...

    Thanks for helping, everyone! Checking out all the different suggestions ended up getting me there in the end.


  2. I used a PHP regular expression to differentiate between these two cases. Hope the following code helps:

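    $flavorText = "It cleverly con\u{00AD}\ntrols its furry\nears and tail to\u{000C}maintain its\nbalance while\nswimming.";
    
    // Remove a newline that is not preceded by a hyphen or word character, or not followed by a word character.
    // In this string that is only the newline after the soft hyphen (U+00AD); the soft hyphen itself is kept.
    $flavorText = preg_replace('/(?<![-\w])\n|\n(?!\w)/', '', $flavorText);
    
    // Every remaining newline sits between two words, so turn it into a space
    $flavorText = str_replace("\n", " ", $flavorText);
    
    // Replace the form feed (U+000C) with a space as well
    $flavorText = str_replace("\u{000C}", " ", $flavorText);
    
    echo $flavorText;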
    