I want to read several large files that reside on a CentOS server using Python. I wrote some simple code for this and it works, but the entire file comes back as a paramiko object (paramiko.sftp_file.SFTPFile), and only then can I process it line by line. Performance is poor, so I want to process the file and write it to CSV piece by piece, because handling the entire file at once hurts performance. Is there a way to solve this problem?

import paramiko

# host, port, username and password are assumed to be defined elsewhere
ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(host, port, username, password)

sftp_client = ssh.open_sftp()
remote_file = sftp_client.open(r'/root/bigfile.csv')

try:
    for line in remote_file:
        pass  # Process the line here
finally:
    remote_file.close()

2 Answers


  1. Reading in chunks will help you here:

    import pandas as pd
    chunksize = 1000000
    for chunk in pd.read_csv(filename, chunksize=chunksize):
        process(chunk)
    

    Update:

    Yes, I’m aware that my answer was written for a local file. It is only an example of reading a file in chunks.

    To answer the actual question, check out these:

    1. paramiko.sftp_client.SFTPClient.putfo
    2. Functions for working with remote files using pandas and paramiko (SFTP/SSH). – pass the chunk size as I mentioned above.
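
A minimal sketch of how the two ideas above might be combined, assuming `pd.read_csv` is handed an open file-like object instead of a local filename. The function names `process_csv_in_chunks` and `process_remote_csv`, the output path, and the chunk size are all hypothetical, and the processing step is a placeholder. Passing a paramiko `SFTPFile` to pandas is an assumption here: it relies on the object being file-like, so treat it as a starting point rather than a tested recipe.

```python
import pandas as pd


def process_csv_in_chunks(file_obj, out_path, chunksize=100_000):
    """Read CSV data from a file-like object chunk by chunk and append each
    processed chunk to out_path. Returns the number of rows written."""
    rows_written = 0
    for i, chunk in enumerate(pd.read_csv(file_obj, chunksize=chunksize)):
        processed = chunk  # hypothetical placeholder: put the real processing here
        # Write the header only once, then append the following chunks.
        processed.to_csv(out_path, mode="w" if i == 0 else "a",
                         header=(i == 0), index=False)
        rows_written += len(processed)
    return rows_written


def process_remote_csv(host, port, username, password, remote_path, out_path):
    """Hypothetical wrapper: open the remote file over SFTP and stream it."""
    import paramiko  # imported here so the helper above works without paramiko

    ssh = paramiko.SSHClient()
    ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    ssh.connect(host, port, username, password)
    try:
        sftp = ssh.open_sftp()
        with sftp.open(remote_path, "rb") as remote_file:
            # prefetch() asks paramiko to request data ahead of the reads,
            # which often helps with sequential reads of large files.
            remote_file.prefetch()
            return process_csv_in_chunks(remote_file, out_path)
    finally:
        ssh.close()
```

Only one chunk of rows is held in memory at a time, so a smaller `chunksize` trades speed for a lower memory footprint.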
  2. This could solve your problem:

    def lazy_loading_ftp_file(sftp_host_conn, filename):
        """
        Lazily download a file over SFTP, for when a simple sftp.get call raises an exception.
        :param sftp_host_conn: callable returning an SFTP connection usable as a context manager
        :param filename: name of the file to download
        :return: status dict; the file is downloaded to the current directory
        """
        import shutil
        try:
            with sftp_host_conn() as host:
                # The remote file object is itself file-like (paramiko's SFTPFile
                # has no .raw attribute), so it is streamed directly.
                with host.open(filename, 'r') as sftp_file_instance, open(filename, 'wb') as out_file:
                    shutil.copyfileobj(sftp_file_instance, out_file)
                return {"status": "success", "msg": "successfully downloaded file: {}".format(filename)}
        except Exception as ex:
            return {"status": "failed", "msg": "Exception in lazy reading too: {}".format(ex)}
    

    This will avoid reading the whole thing into memory at once.
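
If the goal is to process the remote file and write CSV output piece by piece without downloading it first, the same streaming idea can be applied to the lines themselves. The following is only a sketch using the standard library: `stream_lines_to_csv`, the batch size, and the output path are hypothetical names, and the processing step is a placeholder.

```python
import csv
import itertools


def stream_lines_to_csv(lines, out_path, batch_size=10_000):
    """Consume an iterable of CSV lines (bytes or str) in batches and write
    the rows to out_path. Returns the number of rows written."""
    # Decode lazily, since a remote file may yield bytes or str
    # depending on how it was opened.
    decoded = (line.decode("utf-8") if isinstance(line, bytes) else line
               for line in lines)
    reader = csv.reader(decoded)
    rows_written = 0
    with open(out_path, "w", newline="") as out_file:
        writer = csv.writer(out_file)
        while True:
            # Pull at most batch_size rows; nothing beyond that is held in memory.
            batch = list(itertools.islice(reader, batch_size))
            if not batch:
                break
            # Hypothetical placeholder: transform the batch here before writing.
            writer.writerows(batch)
            rows_written += len(batch)
    return rows_written
```

With paramiko, the open remote file from the question could be passed in directly, for example `stream_lines_to_csv(remote_file, "out.csv")`, since iterating over it yields one line at a time. Calling `remote_file.prefetch()` beforehand may speed up sequential reads.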
