
I have thousands of Telegram messages stored in my Elasticsearch index. I need to extract the email addresses that users have mentioned on Telegram. The email addresses are in [_source][text], embedded in the text of the posts, so I need to use a regex:

([\s]{0,10}[\w.]{1,63}@[\w.]{1,63}[\s]{0,10})

to do the following:

  • a) extract the email address from each message;
  • b) create a new Maltego entity

I am trying this code (I am totally new to Python/to coding!), but it does not work:

    #!/usr/bin/env python

    from elasticsearch import Elasticsearch
    from MaltegoTransform import *
    import json
    import os
    import re


    m = MaltegoTransform()

    indexname = sys.argv[1]

    es = Elasticsearch('localhost:9200')

    res = es.search(index=indexname, size=1000, body={"query": {"match": 
    {"entities.type": "email"}}})

    for doc in res['hits']['hits']:

     def get_emails(data=""):

      addresses = re.findall(r'[\s]{0,10}[\w.]{1,63}@[\w.]{1,63}[\s]{0,10}', data)
      print addresses #does not print anything#

     m.addEntity('maltego.EmailAddress', ''.join(WHAT?))

    m.returnOutput()

This is a sample of my json output:

    {
    took: 5,
    timed_out: false,
    _shards: {
    total: 1,
    successful: 1,
    skipped: 0,
    failed: 0
    },
    hits: {
    total: 43,
    max_score: 7.588423,
    hits: [
    {
    _index: "MY_INDEX",
    _type: "items",
    _id: "CHANNEL ID",
    _score: 7.588423,
    _source: {
    id: 2411,
    audio: { },
    author_signature: null,
    caption: null,
    channel_chat_created: null,
    chat: {},
    command: null,
    service: null,
    sticker: { },
    supergroup_chat_created: null,
    text: HERE'S THE TEXT CONTAINING EMAIL ADDRESS.

The text I need to search for emails is therefore nested in [_source][text]. I need to extract only the email address within it (by regex), print it, and pass it to a “function” that creates a graph entity in Maltego. The function looks like this:

m.addEntity('maltego.EmailAddress', ''.join(THE EMAIL ENTITY EXTRACTED WITH REGEX))
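A minimal sketch of the lookup described above, runnable without Elasticsearch or Maltego. The `sample_response` dictionary, its message texts, and the helper name `extract_emails` are invented for illustration; only the `hits` → `hits` → `_source` → `text` path and the pattern come from the question.

```python
import re

# Hypothetical stand-in for the dict returned by es.search(); only the
# hits -> hits -> _source -> text path mirrors the sample output above.
sample_response = {
    "hits": {
        "hits": [
            {"_source": {"id": 2411, "text": "write to alice@example.com for details"}},
            {"_source": {"id": 2412, "text": None}},  # a post with no text
            {"_source": {"id": 2413, "text": "bob@example.org or carol@example.net"}},
        ]
    }
}

# The pattern from the question; it also matches surrounding whitespace.
EMAIL_RE = re.compile(r"[\s]{0,10}[\w.]{1,63}@[\w.]{1,63}[\s]{0,10}")


def extract_emails(response):
    """Return every address found in the text of each hit, in order."""
    found = []
    for hit in response["hits"]["hits"]:
        text = hit["_source"].get("text") or ""  # treat missing/null text as empty
        # strip() removes the whitespace that the pattern includes in each match
        found.extend(match.strip() for match in EMAIL_RE.findall(text))
    return found


print(extract_emails(sample_response))
```

Each item in the returned list is a single address, so it can be handed to `m.addEntity()` on its own instead of being joined first.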

2 Answers


  1. Chosen as BEST ANSWER

    In the end I was able to get the code working, like this:

        es = Elasticsearch()

        res = es.search(index=indexname, size=1000, body={"query": {"match":
            {"entities.type": "email"}}})
        response = json.dumps(res)
        data = json.loads(response)

        # Collect the text of every hit
        fulltext = []

        for row in data['hits']['hits']:
            fulltext.append(row['_source']['text'].encode('utf8'))

        for text in fulltext:

            # findall() returns a list with every match in this post
            email = re.findall(r"([\s]{0,10}[\w.]{1,63}@[\w.]{1,63}[\s]{0,10})", text)

            m.addEntity('maltego.EmailAddress', ''.join(email))

        m.returnOutput()
    

    The problem with this code is that if a post contains multiple email addresses, I get them run together in a single result (something like first@domain.comsecond@domain.com).

    How can I split the two addresses so that each one is added to my Maltego graph separately, given that I am using .join(email)?


  2. How to add the email addresses depends on what your library requires. The correct approach could be to call addEntity() once for each email address, or it might be to pass all addresses in a single call.

    To add each email address with its own addEntity() call, use:

        es = Elasticsearch()
        res = es.search(index=indexname, size=1000, body={"query": {"match": {"entities.type": "email"}}})
        response = json.dumps(res)
        data = json.loads(response)

        fulltext = []

        for row in data['hits']['hits']:
            fulltext.append(row['_source']['text'].encode('utf8'))

        for text in fulltext:
            # The capture group keeps only the address, without surrounding whitespace
            emails = re.findall(r"[\s]{0,10}([\w.]{1,63}@[\w.]{1,63})[\s]{0,10}", text)

            # set() drops addresses repeated within the same post
            for email in set(emails):
                m.addEntity('maltego.EmailAddress', email)

        m.returnOutput()
    

    Using ''.join(email), as you have seen, creates a single string with no delimiter between the email addresses. To add all email addresses in one call, separated by commas:

        emails = re.findall(r"[\s]{0,10}([\w.]{1,63}@[\w.]{1,63})[\s]{0,10}", text)
        m.addEntity('maltego.EmailAddress', ','.join(emails))
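
The difference between the joining options can be checked without Maltego or Elasticsearch. In the sketch below, `FakeTransform` is a hypothetical stand-in that only records calls to `addEntity()`; it is not part of the Maltego library, and the sample text is made up. Duplicates are removed with a loop rather than `set()` so that the original order is kept, which is a design choice of this sketch.

```python
import re

EMAIL_RE = re.compile(r"[\s]{0,10}([\w.]{1,63}@[\w.]{1,63})[\s]{0,10}")


class FakeTransform(object):
    """Hypothetical stand-in for MaltegoTransform; records addEntity() calls."""

    def __init__(self):
        self.entities = []

    def addEntity(self, entity_type, value):
        self.entities.append((entity_type, value))


text = "contact alice@example.com or bob@example.org, again alice@example.com"
emails = EMAIL_RE.findall(text)

# No delimiter: the addresses run together into one unusable string
joined_plain = ''.join(emails)
# Comma delimiter: one string, but the addresses remain distinguishable
joined_comma = ','.join(emails)

# One entity per distinct address, keeping first-seen order
unique = []
for email in emails:
    if email not in unique:
        unique.append(email)

m = FakeTransform()
for email in unique:
    m.addEntity('maltego.EmailAddress', email)

print(joined_plain)
print(joined_comma)
print(m.entities)
```

Whether a real Maltego entity accepts a comma-separated value is not something this sketch can confirm; it only shows what each string looks like before it is passed on.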
    