The 'pwd' command take much time cost after loading a big file(eg:40M) - CentOS

fucheris
July 1, 2019
148 views
0 votes
2 Answers

The following test shell code test at: centOS 7 with bash shell; The code contains three phrase; phrase 1, call pwd command; phrase 2, read a big file(cat the file); phrase 3, do the same thing as phrase 1;

The phrase 3 time cost is much bigger than phrase 1(eg: 21s vs 7s)

But at the MacOS platform, the time cost of phrase 1 and phrase 3 is equal.
#!/bin/bash

#phrase 1
timeStart1=$(date +%s)
for ((ip=1;ip<=10000;ip++));
do
nc_result=$(pwd)
done
timeEnd1=$(date +%s)
timeDelta=$((timeEnd1-timeStart1))

echo $timeDelta

#phrase 2
fileName='./content.txt'   #one big file,eg. a 39M file
content=`cat $fileName`

#phrase 3
timeStart2=$(date +%s)
for ((ip=1;ip<=10000;ip++));
do
nc_result=$(pwd)
done
timeEnd2=$(date +%s)
timeDelta2=$((timeEnd2-timeStart2))
echo $timeDelta2

Tags: bash

Answers

- SlawomirDziuba
- July 1, 2019 at 6:52 pm
- 0 votes
0
$() invokes a subshell. The 3rd command starts when the second command is running.

Login or Signup to reply.

- CharlesDuffy
- July 1, 2019 at 6:59 pm
- 0 votes
0
Slawomir’s answer has a key part of the problem, but without full explanation. The answer to What does it mean ‘fork()’ will copy address space of original process? on Unix & Linux StackExchange has some good background.

A command substitution — $(...) — is implemented by fork()ing off a separate copy of your shell, which a command — in this case pwd — is executed in.

Now, on most UNIXlike systems, fork() is extremely efficient, and doesn’t actually copy all your memory until an operation is performed that changes those memory blocks: Each copy keeps the same virtual memory ranges as the original (so its pointers remain valid), but with the MMU configured to throw an error when there’s a write to it, so the OS can silently catch that error and allocate separate physical memory for each branch.

There’s still a cost to setting up the pages configured to be copied to new physical memory when they change, though! Some platforms — like Cygwin — have worse / more expensive fork implementations; some (apparently MacOS?) have faster ones; that difference is what you’re measuring here.

Two takeaways:
- It’s not pwd that’s slow, it’s $( ). It’d be just as slow with $(true) or any other shell builtin, and considerably slower with any non-builtin command.
- Don’t use $(pwd) at all — there’s no reason to pay that cost to split off a child process to measure its working directory, when you could just ask the parent shell for its working directory directly by using nc_result=$PWD.
Login or Signup to reply.

Please signup or login to give your own answer.

Click here to cancel reply.

The 'pwd' command take much time cost after loading a big file(eg:40M) – CentOS

Answers