Postgresql - What is the best way to split a column of data in a txt file based on the first delimiter?

BentleyHensel
August 26, 2022
151 views
2 votes
4 Answers

I have a large number of txt files with a single column of data. There are no headers in the files.

The data are an email address followed by a : and then a string of varchar, which sometimes includes :s.

My goal is to convert the following

[email protected]:v@rch:r$tR:ng
[email protected]::multipleTypes
[email protected]:&ofTxtGoAfT3rThe:

To a tsv with headers.

column1           column2
[email protected]   v@rch:r$tR:ng
[email protected]   :multipleTypes
[email protected]   &ofTxtGoAfT3rThe:

These files will then be uploaded into a postgres database.

Any insight/advice is greatly appreciated.

Answers

- choroba
- August 26, 2022 at 10:52 pm
- 0 votes
0
sed can do that. Without /g, its substitution replaces the first occurrence on each line:
```
sed -e 's/:/t/' *.txt > file.tsv
```
Login or Signup to reply.

In postgres you can split the text at the appropriate position and export the hole with COPY to an TSV

this would look like

CREATE TABLE mytab(mytext text)

INSERT INTO mytab VALUES('[email protected]:v@rch:r$tR:ng'),('[email protected]::multipleTypes'),('[email protected]:&ofTxtGoAfT3rThe:')

SELECT 'column1', 'column2'
UNION ALL
SELECT  substr(mytext,1, strpos(mytext, ':') -1) col1,
substr(mytext,strpos(mytext, ':')+1) col2 FROM mytab

?column?        | ?column?         
:-------------- | :----------------
column1         | column2          
[email protected] | v@rch:r$tR:ng    
[email protected] | :multipleTypes   
[email protected] | &ofTxtGoAfT3rThe:

COPY (SELECT 'column1', 'column2'
UNION ALL
SELECT  substr(mytext,1, strpos(mytext, ':') -1) col1,
substr(mytext,strpos(mytext, ':')+1) col2 FROM mytab) TO '/tmp/ind.tsv'  CSV  HEADER DELIMITER E't';

SELECT  substr('[email protected]::multipleTypes',1, strpos('[email protected]::multipleTypes', ':') -1) col1,
substr('[email protected]::multipleTypes',strpos('[email protected]::multipleTypes', ':')+1) col2

col1            | col2          
:-------------- | :-------------
[email protected] | :multipleTypes

SELECT  substr('[email protected]:&ofTxtGoAfT3rThe:',1, strpos('[email protected]:&ofTxtGoAfT3rThe:', ':') -1) col1,
substr('[email protected]:&ofTxtGoAfT3rThe:',strpos('[email protected]:&ofTxtGoAfT3rThe:', ':')+1) col2

col1            | col2             
:-------------- | :----------------
[email protected] | &ofTxtGoAfT3rThe:

db<>fiddle here

- ServerS
- August 26, 2022 at 10:59 pm
- 0 votes
0
@choroba is correct with Sed. This can also be done in python.
```
x = '[email protected]::multipleTypes'

x = x.split(':', 1)

print(x)
```
This splits the text at the first :
Login or Signup to reply.

- j_b
- August 26, 2022 at 11:47 pm
- 0 votes
0
Using bash parameter expansion for splitting the lines and printf to format the output:
```
printf "%-24s%sn" "column1" "column2" ; while read -r line ; do printf "%-24s%sn" "${line%%:*}" "${line#*:}" ; done <your_data.txt

column1                 column2
[email protected]         v@rch:r$tR:ng
[email protected]         :multipleTypes
[email protected]         &ofTxtGoAfT3rThe:
```
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

Postgresql – What is the best way to split a column of data in a txt file based on the first delimiter?

Answers