I’m new to python coding and wanted to create a new column based on the condition statement on existing columns.
I use python 2.7 version and run the code on CentOs.
import pandas as pd
file1 = pd.read_csv("/root/Documents/temp_file_{}.csv".format(timestr))
file1['FileName'] = ''
file1['FileName'] = file1['FileType'].apply(lambda x: df['Path'].str.extract('[^/]+$', expand=False) if x=='f' else '')
file1.to_csv('/root/Documents/temp1_file_{}.csv'.format(timestr),index = False)
Below is my CSV file:
FileType,Path
d,/
f,/documents/csv/.zip
d,/documents/images
d,/hive/28374849
f,/hadoop/jdjdjd/dnejfn.img
Required CSV file:
FileType,Path,FileName
d,/,
f,/documents/csv/.zip,.zip
d,/documents/images,
d,/hive/28374849,
f,/hadoop/jdjdjd/dnejfn.img,dnejfn.img
I want to create a new column FileName where the data in that column should be extracted from Path column only when column Filetype==’f’ else the data in that column should be a null value or no data
2
Answers
Try using this approach to get the last part from the paths. Split the string from the rightmost side one time using “/” as a separator and fetch the last element (which is basically the filename) using the negative indexing:
using numpy’s where which is faster than the apply function because of the vectorization:
Using
numpy.where
withpandas.Series.str.rsplit
:Output: