Extracting ID from file using python script? - Ubuntu

Caroline
January 18, 2023
133 views
0 votes
2 Answers

I’m trying to extract just the sequence ID from the the file in Linux server.
To give you few examples TRINITY_DN0_c0_g1_i1.p1 and TRINITY_DN0_c0_g1_i3.p1 and sequence IDs. The sequence ID lengths are not same but they all start with TRINITY and ends with .p1.

I tried using awk '{print$1}' filename.cdhit > seq_id.fasta but instead I got this

I just want the ID but it would also extract non-interest information as well (the lengthy alphabetic protein seq).
I have attempted to create a python script in hopes to just extract the IDs:

import re

file_path = '/var2/user/de_novo/data/transdecoder_dir/Trinity.fasta.transdecoder.pep.cdhi
t'
new_file_path = '/var2/user/de_novo/data/transdecoder_dir/seqID.fasta'

with open(file_path, 'rt') as file:
    for myline in file:
        if ".p1" in file:
            with open(new_file_path, 'w') as new_file:
                new_file.write()
        else:
            print('No match found.')

Tried creating python script, running linux command
But comes out as not match found. Not sure where I went wrong.
Would appreciate with any help, thank you.

Answers

- cybe
- January 18, 2023 at 8:25 am
- 0 votes
0
This should do the trick:
```
awk '{print$1}' filename.cdhit | grep TRINITY | cut -c2-
```
As for python, check out https://biopython.org/. It’s a collection of bioinformatics-related goodies.

OR

Here a ‘.cdhit’-file reader:
https://pypi.org/project/cdhit-reader/
Login or Signup to reply.

- RAREKpopManifesto
- January 18, 2023 at 12:35 pm
- 0 votes
0
you can combine that awk | grep | cut into :
```
mawk 'sub(".",_, $(NF = /TRINITY/))'
```
1. using regex to set NF = 1/0, which is same as print $1 | grep
2. when it’s false, NF is set to 0, so whole line is blanked out, thus sub() will fail and return a 0 (# instances replaced), thus skipping ineligible lines
3. the sub() does same thing as cut
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Extracting ID from file using python script? – Ubuntu

Answers