When running my code, I noticed that after around 30 requests it becomes very slow to fetch the rest, so the script takes a long time to complete.
My goal with this code is to fetch the users of a Telegram group and dump their information to a JSON file, with their name, username and bio.
This is the code I am running:
import configparser
import json
import asyncio
from telethon.tl.functions.users import GetFullUserRequest
from telethon import TelegramClient
from telethon.errors import SessionPasswordNeededError
from telethon.tl.functions.channels import GetParticipantsRequest
from telethon.tl.types import ChannelParticipantsSearch
from telethon.tl.types import (
    PeerChannel
)

# Reading Configs
config = configparser.ConfigParser()
config.read("config.ini")

# Setting configuration values
api_id = config['Telegram']['api_id']
api_hash = config['Telegram']['api_hash']
api_hash = str(api_hash)
phone = config['Telegram']['phone']
username = config['Telegram']['username']

# Create the client and connect
client = TelegramClient(username, api_id, api_hash)

async def main(phone):
    await client.start()
    print("Client Created")
    # Ensure you're authorized
    if await client.is_user_authorized() == False:
        await client.send_code_request(phone)
        try:
            await client.sign_in(phone, input('Enter the code: '))
        except SessionPasswordNeededError:
            await client.sign_in(password=input('Password: '))
    me = await client.get_me()
    user_input_channel = input("enter entity(telegram URL or entity id):")
    if user_input_channel.isdigit():
        entity = PeerChannel(int(user_input_channel))
    else:
        entity = user_input_channel
    my_channel = await client.get_entity(entity)
    offset = 0
    limit = 100
    all_participants = []
    count = 0
    while True:
        participants = await client(GetParticipantsRequest(my_channel, ChannelParticipantsSearch(''), offset, limit, hash=0))
        if not participants.users or offset >= 100:
            break
        all_participants.extend(participants.users)
        count += 1
        print(len(participants.users))
        offset += len(participants.users)
    print("finished")
    all_user_details = []
    for participant in all_participants:
        user_full = await client(GetFullUserRequest(participant.id))
        all_user_details.append({
            "id": user_full.full_user.id,
            "bio": user_full.full_user.about
        })
    with open('user_data.json', 'w') as outfile:
        json.dump(all_user_details, outfile)

with client:
    client.loop.run_until_complete(main(phone))
With some debugging I found that the problem is in the for loop. This is the loop that takes a long time to complete:
for participant in all_participants:
    user_full = await client(GetFullUserRequest(participant.id))
    all_user_details.append({
        "id": user_full.full_user.id,
        "bio": user_full.full_user.about
    })
Why is this happening, and how do I optimize this code to run more efficiently?
I am running Python 3.11.6.
Do you guys need more information?
Update
What I have tried:
As jsbueno suggested, I adjusted my code to send the requests concurrently with asyncio instead of awaiting one request per user in sequence. It worked well for a group of 80 users, but for larger groups I got the following errors:
Server response had invalid buffer: Invalid response buffer (HTTP code 429)
Server indicated flood error at transport level: Invalid response buffer (HTTP code 429)
This indicates too many concurrent requests to the API, so I tried a semaphore and ended up with code like this:
# ... (other imports and code)
api_semaphore = asyncio.Semaphore(10)  # Updated line

async def main(phone):
    await client.start()
    print("Client Created")
    # Ensure you're authorized
    if await client.is_user_authorized() == False:
        await client.send_code_request(phone)
        try:
            await client.sign_in(phone, input('Enter the code: '))
        except SessionPasswordNeededError:
            await client.sign_in(password=input('Password: '))
    me = await client.get_me()
    user_input_channel = input("enter entity(telegram URL or entity id):")
    if user_input_channel.isdigit():
        entity = PeerChannel(int(user_input_channel))
    else:
        entity = user_input_channel
    my_channel = await client.get_entity(entity)
    # Fetch all participants
    offset = 0
    limit = 100
    all_participants = []
    count = 0
    while True:
        participants = await client(GetParticipantsRequest(my_channel, ChannelParticipantsSearch(''), offset, limit, hash=0))
        if not participants.users or offset >= 1000:
            break
        all_participants.extend(participants.users)
        count += 1
        print(len(participants.users))
        offset += len(participants.users)
    print("finished")
    # Process the participants as needed
    all_user_details = []
    tasks = []
    for participant in all_participants:
        async with api_semaphore:
            # This will prepare all requests and let them be ready to go
            tasks.append(asyncio.create_task(client(GetFullUserRequest(participant.id))))  # Updated line
    # ... (rest of the code)
With that code, I still have the same problems mentioned earlier, even though the number of concurrent requests is supposedly limited now. Why is this happening, and what takes so much processing time after the loops have finished?
2 Answers
Although you are using asynchronous libraries, you are making all your requests for participants in series, meaning that for 30 users you make 30 requests, each one sent only after the previous one has finished.
One can easily do this concurrently with asyncio code – for example, you can rewrite this part of your code:
more or less like this:
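A minimal sketch of that idea follows; it is not necessarily the exact code the answer had in mind. To keep it runnable without a Telegram session, `fetch_full_user` is a hypothetical stand-in that simulates the network round trip with `asyncio.sleep`; in the real script its body would be the `await client(GetFullUserRequest(participant.id))` call from the question. The name `fetch_all` is also just illustrative.

```python
import asyncio


async def fetch_full_user(user_id):
    # Hypothetical stand-in for: await client(GetFullUserRequest(user_id))
    await asyncio.sleep(0.05)  # simulated network latency
    return {"id": user_id, "bio": f"bio of {user_id}"}


async def fetch_all(user_ids):
    # Schedule every request first, so they all run concurrently ...
    tasks = [asyncio.create_task(fetch_full_user(uid)) for uid in user_ids]
    # ... then wait for all of them. gather() returns the results in the
    # same order in which the tasks were passed in.
    return await asyncio.gather(*tasks)


all_user_details = asyncio.run(fetch_all(range(30)))
print(len(all_user_details))
```

With the serial loop, the total time is roughly the sum of all round trips; here it is closer to the duration of the slowest single request, as long as the server accepts that many requests at once.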
Now, depending on the number of users and on your network, sending all requests "at once" (not really at once, but as fast as possible) may cause some failures. If that is the case, you will have to write some more code and use an asyncio.Semaphore or another strategy to limit the number of concurrent requests to the API.
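One way the semaphore approach could look is sketched below. As an assumption for illustration, `fetch_full_user` again simulates the real `client(GetFullUserRequest(...))` call, and `MAX_CONCURRENT`, `fetch_limited` and the `stats` dictionary are hypothetical names that exist only to demonstrate the cap. The important detail is that the semaphore is acquired inside the coroutine that performs the request, so it stays held until the response arrives.

```python
import asyncio

MAX_CONCURRENT = 10  # assumed limit; tune it to what the API tolerates


async def fetch_full_user(user_id):
    # Hypothetical stand-in for: await client(GetFullUserRequest(user_id))
    await asyncio.sleep(0.01)  # simulated network latency
    return {"id": user_id}


async def fetch_limited(user_id, semaphore, stats):
    # The semaphore is held for the whole request, so at most `limit`
    # requests are in flight at any moment.
    async with semaphore:
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        try:
            return await fetch_full_user(user_id)
        finally:
            stats["in_flight"] -= 1


async def fetch_all(user_ids, limit=MAX_CONCURRENT):
    semaphore = asyncio.Semaphore(limit)
    stats = {"in_flight": 0, "peak": 0}  # only here to show the cap works
    results = await asyncio.gather(
        *(fetch_limited(uid, semaphore, stats) for uid in user_ids)
    )
    return results, stats["peak"]


results, peak = asyncio.run(fetch_all(range(100)))
print(len(results), peak)
```

By contrast, in the updated code from the question, `async with api_semaphore:` wraps only the call to `asyncio.create_task(...)`. That call returns immediately, so the semaphore is released before the request has completed and all tasks still run at the same time. A lower concurrency limit does not by itself guarantee that the server stops answering with HTTP 429; the right value has to be found by experiment.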