I'm stuck getting a position inside a string.
I read the content of a file with
with io.open(testfile, 'r', encoding='utf-8') as f:
and it contains:
\u2705 Offizielle Kan\u00e4le \ud83c\udde9\ud83c\uddea \ud83c\udde6\ud83c\uddf9 \ud83c\udde8\ud83c\udded\n@GET_THIS_STING
What do I have to do so that “\u2705” is counted as 1 character?
Then position 36 would be the start of @GET_THIS_STING.
–== EDIT ==–
I can now show the problem better:
import json
from io import open
line = '{"message":{"message_id":3052,"text":"\u2705 Offizielle Kan\u00e4le \ud83c\udde9\ud83c\uddea \ud83c\udde6\ud83c\uddf9 \ud83c\udde8\ud83c\udded\\n@GET_THIS_STING\\n123456789","entities":[{"offset":36,"length":26,"type":"mention"}]}}'
myjson = json.loads(line)
text = myjson.get("message", {}).get("text", None)
print(str(text).encode('utf-8', 'replace').decode())
print("string length: " + str(len(text)))
print(text[36:36+15])
print("-------------")
with open("/home/pi/telegram/phpLogs/test.txt", 'r', encoding='utf-8', errors="surrogateescape") as f:
    for line in f:
        myjson = json.loads(line)
        text = myjson.get("message", {}).get("text", None)
        print(text)
        print("string length: " + str(len(text)))
        print(text[36:36+15])
RESULT:
✅ Offizielle Kanäle ???? ???? ????
@GET_THIS_STING
123456789
string length: 61
@GET_THIS_STING
-------------
✅ Offizielle Kanäle 🇩🇪 🇦🇹 🇨🇭
@GET_THIS_STING123456789
string length: 54
HIS_STING123456
So when I have the string inside my code (UTF-8) as a variable (String), everything works fine.
But when I create a file with this content and read it:
"{"message":{"message_id":3052,"text":"\u2705 Offizielle Kan\u00e4le \ud83c\udde9\ud83c\uddea \ud83c\udde6\ud83c\uddf9 \ud83c\udde8\ud83c\udded\n@GET_THIS_STING\n123456789","entities":[{"offset":36,"length":26,"type":"mention"}]}}"
I always receive a “wrong” result 🙁
So reading a file is my problem, because the strings are not the same afterwards – even the length is different!
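The length difference can be reproduced in isolation. This is a minimal sketch, assuming the in-code literal ends up holding separate surrogate characters while the file holds JSON `\u` escapes, which `json.loads` combines into single code points. The variable names are illustrative only.

```python
import json

# Case 1: a non-raw Python literal keeps the two surrogates as two characters.
pair_as_surrogates = "\ud83c\udde9"

# Case 2: the same pair written as JSON escapes (as it would sit in a file);
# json.loads joins a high/low surrogate pair into one code point.
pair_from_json = json.loads(r'"\ud83c\udde9"')

print(len(pair_as_surrogates))  # 2
print(len(pair_from_json))      # 1
```

Each of the six regional-indicator letters in the three flags shrinks from two characters to one this way, which fits the slice from the file being shifted by six characters.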
2 Answers
If your file
text.txt
literally contains,
Try:
And, this outputs:
With a file
text.txt
that looks something like this,
We can do,
It outputs,
If this string represents
✅ Offizielle Kanäle 🇩🇪 🇦🇹 🇨🇭
as suggested by @scribe’s answer, then I think you run into the problem mentioned here: Converting to Emoji
Therefore I suggest replacing
with
or, if the file is “JSON lines” rather than single JSON:
and then
text
will be a proper Unicode string, so text[36:]
should get you what you asked for.
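One thing to watch: if the `offset` in `entities` counts UTF-16 code units rather than code points, slicing a proper Unicode string at 36 will land too far to the right. That is an assumption here, but it matches the question's output, where the in-code string with separate surrogates slices correctly at 36 and the file version is off by six. Below is a minimal sketch of slicing by UTF-16 units instead. `utf16_slice` is a hypothetical helper, not part of any library, and the sample text is a shortened stand-in for the real message.

```python
def utf16_slice(text, offset, length):
    """Return the part of text addressed by a UTF-16 code-unit offset and length."""
    # Two bytes per UTF-16 code unit; 'surrogatepass' also tolerates lone surrogates.
    units = text.encode("utf-16-le", "surrogatepass")
    chunk = units[2 * offset : 2 * (offset + length)]
    return chunk.decode("utf-16-le", "surrogatepass")

# Shortened sample: one flag (two code points, four UTF-16 units), a space, the mention.
text = "\U0001F1E9\U0001F1EA @GET_THIS_STING"

print(utf16_slice(text, 5, 15))  # @GET_THIS_STING
print(text[5:5 + 15])            # ET_THIS_STING (code-point indexing overshoots)
```

The helper works the same whether the string holds merged code points or separate surrogates, so it does not matter which way the text was loaded.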