skip to Main Content

I have a row named filename in postgresql table named xml_joblist which contain many files, and a list named files_name in python which contain several files that are sorted after some process. I want to compare filename & files_name and check whether there are any files matching.

|filename |
|---------|
|file_111 |
|file_555 |
|file_888 |
|file_333 | 
|file_445 |
|   .     |
|   .     |
| goes-on |

the above given is the filename row in postgresql table

files_name = [file_789, file_456, file_555, file_111]

the above given is the files_name list i python

How can i write a sql statement to do this process in python?

Expected result:

matchin_files = [file_555, file_111]

5

Answers


  1. If you’re using psycopg2 directly, you can use its tuple to SELECT IN capabilities. You then retrieve the file name directly with a list comprehension.

    See the following demo a table that contains file_100 through to file_109.

    # setup
    import psycopg2
    
    con = psycopg2.connect(
        host="localhost",
        port="5432",
        database="postgres",
        user="postgres",
        password="postgres",
    )
    
    cur = con.cursor()
    cur.execute("CREATE TABLE xml_joblist (filename VARCHAR(10))")
    cur.executemany(
        "INSERT INTO xml_joblist (filename) VALUES (%s)",
        [(f"file_10{n}",) for n in range(10)],
    )
    
    # query
    files_name = ["file_103", "file_107", "file_108"]  # keep as list or assign tuple directly
    
    cur.execute(
        "SELECT filename FROM xml_joblist WHERE filename IN %s",
        (tuple(files_name),)
    )  # convert files_name to tuple if not already a tuple
    results = cur.fetchall()  # [('file_103',), ('file_107',), ('file_108',)]
    matching_files = [t[0] for t in results]  # ['file_103', 'file_107', 'file_108'
    
    # closing
    con.close()
    

    If you use anything else (like SQLAlchemy), please leave a comment.

    Login or Signup to reply.
  2. Set up table:

    create table xml_joblist(filename varchar);
    insert into xml_joblist values ('file_111'), ('file_555'), ('file_888'), ('file_333'), ('file_445');
    

    Python code:

    import psycopg2
    con = psycopg2.connect(dbname="test", host='localhost', user='postgres', port=5432)
    cur = con.cursor()
    
    files_name = ["file_789", "file_456", "file_555", "file_111"]
    
    cur.execute("select filename from xml_joblist where filename = ANY(%s)", [files_name])
    
    matching_files = [row[0] for row in cur.fetchall()]
    
    matching_files
    ['file_111', 'file_555']
    
    

    This uses psycopg2 list adaptation:

    Python lists are converted into PostgreSQL ARRAYs:

    cur.mogrify("SELECT %s;", ([10, 20, 30], ))
    ‘SELECT ARRAY[10,20,30];’

    Note

    You can use a Python list as the argument of the IN operator using the PostgreSQL ANY operator.

    ids = [10, 20, 30]
    cur.execute("SELECT * FROM data WHERE id = ANY(%s);", (ids,))

    Furthermore ANY can also work with empty lists, whereas IN () is a SQL syntax error.

    Note

    to select those file names that are in the file_names list that are also in the xml_joblist table using the ANY operator:

    9.24.3. ANY/SOME (array)

    expression operator ANY (array expression)

    expression operator SOME (array expression)

    The right-hand side is a parenthesized expression, which must yield an array value. The left-hand expression is evaluated and compared to each element of the array using the given operator, which must yield a Boolean result. The result of ANY is “true” if any true result is obtained. The result is “false” if no true result is found (including the case where the array has zero elements).

    If the array expression yields a null array, the result of ANY will be null. If the left-hand expression yields null, the result of ANY is ordinarily null (though a non-strict comparison operator could possibly yield a different result). Also, if the right-hand array contains any null elements and no true comparison result is obtained, the result of ANY will be null, not false (again, assuming a strict comparison operator). This is in accordance with SQL’s normal rules for Boolean combinations of null values.

    SOME is a synonym for ANY.

    Login or Signup to reply.
  3. Connect to the postgres db

    from sqlalchemy import create_engine, select, Table, Column, String, MetaData,  Float, Integer
    from sqlalchemy.engine import URL
    
    # Creating URL object
    
    url_object  = URL.create(
        "postgresql",
        username="YourUserName",
        password="YourPassword",  # plain (unescaped) text
        host="YourHostName",
        database="YourDBName",
    )
    
    # Define the Connection Object
    db = create_engine(url_object)
    

    Get the xml_joblist table

    # Create the Metadata Object
    meta = MetaData(bind=db)
    MetaData.reflect(meta)
    
    # Get the `xml_joblist` table from the Metadata object
    xml_joblist = meta.tables['xml_joblist']
    

    Select filename from xml_joblist table and compare it to files_name

    # List with File names
    files_name = ['file_789', 'file_456', 'file_555', 'file_111']
    
    # Read
    with db.connect() as conn:
        #SELECT xml_joblist.filename FROM xml_joblist WHERE xml_joblist.filename IN (files_name)
        select_statement = select(xml_joblist.c.filename).where(xml_joblist.c.filename.in_(files_name))
        result_set = conn.execute(select_statement)
        matchin_files = result_set.scalars().all()
    

    This gives matchin_files as a list like this-

    ['file_111', 'file_555']
    
    Login or Signup to reply.
  4. If you are using a library such as psycopg2, then the cur.execute method() will return you a list of tuples, with each tuple containing a single item.

    So essentially your question simplifies to
    How to compare a list of tuples against a list.

    One of the ways to do so is to use something like below

    list_of_tuples = [(i,) for i in range(1000000)]
    my_list = list(range(1000000))
    
    if list_of_tuples == [(elem,) for elem in my_list]:
        print("The lists are equal")
    

    Other way is to use set.

    list_of_tuples = [(i,) for i in range(1000000)]
    my_list = list(range(1000000))
    
    if set(list_of_tuples) == set(my_list):
        print("The lists are equal")
    
    Login or Signup to reply.
  5. you can use SQLAlchemy IN comparison operator(check here for more inforamtion!)

    stmt = select(xml_joblist.filename).where(xml_joblist.filename.in_(files_name))
    result = conn.execute(stmt)
    
    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search