
I would like to download the chat history (all messages) posted in a public group on Telegram. How can I do this with Python?

I’ve found this method in the API, https://core.telegram.org/method/messages.getHistory, which looks like what I’m trying to do. But how do I actually call it? There seem to be no Python examples for the MTProto protocol they use.

I also looked at the Bot API, but it doesn’t seem to have a method to download messages.

5 Answers


  1. You can now use Telegram Desktop (TDesktop) to export chats.

    See the blog post about the August 2018 update.


    Original Answer:

    Telegram's MTProto is hard for newcomers to use, so I recommend telegram-cli.

    You can also use the third-party tg-export script, but that is not easy for newcomers either.

  2. You can use Telethon. The Telegram API is fairly complicated, and with Telethon you can start using it quickly without any prior knowledge of the API.

    pip install telethon
    

    Then register your app, as described in the Telethon documentation, at https://my.telegram.org/ to get your api_id and api_hash.

    Then to obtain message history of a group (assuming you have the group id):

    chat_id = YOUR_CHAT_ID
    api_id=YOUR_API_ID
    api_hash = 'YOUR_API_HASH'
    
    from telethon import TelegramClient
    from telethon.tl.types.input_peer_chat import InputPeerChat
    
    client = TelegramClient('session_id', api_id=api_id, api_hash=api_hash)
    client.connect()
    chat = InputPeerChat(chat_id)
    
    total_count, messages, senders = client.get_message_history(
                            chat, limit=10)
    
    for msg in reversed(messages):
        # Format the message content
        if getattr(msg, 'media', None):
            content = '<{}> {}'.format(  # The media may or may not have a caption
                msg.media.__class__.__name__,
                getattr(msg.media, 'caption', ''))
        elif hasattr(msg, 'message'):
            content = msg.message
        elif hasattr(msg, 'action'):
            content = str(msg.action)
        else:
            # Unknown message, simply print its class name
            content = msg.__class__.__name__
    
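        # Sender names are not looked up in this simplified version.
        # The format string needs exactly one placeholder per argument.
        text = '[{}:{}] (ID={}) {}: {}'.format(
                msg.date.hour, msg.date.minute, msg.id, "no name",
                content)
        print(text)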
    

    This example is adapted and simplified from the Telethon examples.

  3. Since the August 2018 update, the Telegram Desktop application can save chat history very conveniently.
    You can store it as JSON or as formatted HTML.

    To use this feature, make sure you have the latest version of Telegram Desktop installed on your computer, then click Settings > Export Telegram data.

    https://telegram.org/blog/export-and-more
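
If you export as JSON, the result can be post-processed in Python. Below is a minimal sketch. It assumes the export is a single file (called result.json here) with a top-level "messages" list, that each entry has "type", "date", "from" and "text" keys, and that "text" is either a plain string or a list mixing strings with objects that hold their content under "text". The layout may differ between Telegram Desktop versions, so treat these key names as assumptions to check against your own export. flatten_text and read_export are hypothetical helper names.

```python
import json


def flatten_text(text):
    """Return message text as one string.

    Assumes `text` is either a str or a list of str / dict parts,
    where dict parts (links, mentions, ...) hold their content under "text".
    """
    if isinstance(text, str):
        return text
    parts = []
    for part in text:
        if isinstance(part, str):
            parts.append(part)
        else:
            parts.append(part.get("text", ""))
    return "".join(parts)


def read_export(path):
    """Yield (date, sender, text) tuples from an exported JSON file."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    for msg in data.get("messages", []):
        # Entries whose type is not "message" (assumed to be service
        # entries such as joins or pins) are skipped.
        if msg.get("type") != "message":
            continue
        yield msg.get("date"), msg.get("from"), flatten_text(msg.get("text", ""))
```

It could then be used as `for date, sender, text in read_export("result.json"): print(date, sender, text)`.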

  4. The currently accepted answer is for very old versions of Telethon. With Telethon 1.0, the code can and should be simplified to the following:

    # chat can be:
    # * int id (-12345)
    # * str username (@chat)
    # * str phone number (+12 3456)
    # * Peer (types.PeerChat(12345))
    # * InputPeer (types.InputPeerChat(12345))
    # * Chat object (types.Chat)
    # * ...and many more types
    chat = ...
    api_id = ...
    api_hash = ...
    
    from telethon.sync import TelegramClient
    
    client = TelegramClient('session_id', api_id, api_hash)
    
    with client:
        # 10 is the limit on how many messages to fetch. Remove or change for more.
        for msg in client.iter_messages(chat, 10):
            print(msg.sender.first_name, ':', msg.text)
    

    Applying formatting is still possible, but hasattr is no longer needed. For example, if msg.media is enough to check whether the message has media.

    Note that if you’re using Jupyter, you need to use async directly:

    from telethon import TelegramClient
    
    client = TelegramClient('session_id', api_id, api_hash)
    
    # Note `async with` and `async for`
    async with client:
        async for msg in client.iter_messages(chat, 10):
            print(msg.sender.first_name, ':', msg.text)
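
The formatting from the older answer could be rewritten along these lines. This is only a sketch: format_message is a hypothetical helper, and it assumes each message exposes id, date, media, text and action attributes, as Telethon 1.x message objects do.

```python
def format_message(msg):
    """Build a one-line, printable summary of a message."""
    if msg.media:
        # Media may or may not come with a caption in msg.text
        content = '<{}> {}'.format(type(msg.media).__name__, msg.text or '').strip()
    elif msg.text:
        content = msg.text
    elif msg.action:
        # Service messages (joins, pins, ...) carry an action instead of text
        content = str(msg.action)
    else:
        # Unknown message, fall back to its class name
        content = type(msg).__name__
    return '[{:%H:%M}] (ID={}) {}'.format(msg.date, msg.id, content)
```

Inside either loop above, `print(format_message(msg))` would replace the plain print call.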
    
  5. You can use the Telethon library. For this you need to register your app and connect your client code to it.
    Then, to obtain the message history of an entity (such as a channel, group or chat):

    from telethon.sync import TelegramClient
    from telethon.errors import SessionPasswordNeededError
    
    
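    # username, api_id, api_hash, phone, chat_id, proxy_ip and proxy_port are placeholders you must define.
    # The proxy is only needed if Telegram is blocked in your country; otherwise remove that argument.
    client = TelegramClient(username, api_id, api_hash, proxy=("socks5", proxy_ip, proxy_port))
    client.connect()  # connect() rather than start(), because the login is handled manually below
    
    # for login
    if not client.is_user_authorized():
        client.send_code_request(phone)
        try:
            client.sign_in(phone, input('Enter the code: '))
        except SessionPasswordNeededError:
            # The account has two-step verification enabled
            client.sign_in(password=input('Password: '))
    
    messages = []
    # With telethon.sync a plain `for` works here; `async for` is only valid inside a coroutine
    for message in client.iter_messages(chat_id, wait_time=0):
        messages.append(message)
        # write your code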
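
To keep the downloaded history rather than only collecting it in memory, the messages can be written to a file as they arrive. Below is a minimal sketch, assuming a client created with telethon.sync as above. save_history and the one-JSON-object-per-line layout are illustrative choices, not part of Telethon. The code relies only on iter_messages and on the id, date, sender_id and text attributes of each message.

```python
import json


def save_history(client, chat, path, limit=None):
    """Write one JSON object per message to `path` and return the count.

    `client` is assumed to be a connected telethon.sync TelegramClient.
    limit=None asks iter_messages for the whole history.
    """
    count = 0
    with open(path, 'w', encoding='utf-8') as f:
        for msg in client.iter_messages(chat, limit=limit):
            row = {
                'id': msg.id,
                'date': msg.date.isoformat() if msg.date else None,
                'sender_id': msg.sender_id,
                'text': msg.text,
            }
            # ensure_ascii=False keeps non-Latin text readable in the file
            f.write(json.dumps(row, ensure_ascii=False) + '\n')
            count += 1
    return count
```

Once logged in, a call such as `save_history(client, chat_id, 'history.jsonl')` would write the history to disk. By default iter_messages yields messages from newest to oldest.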
    