The Mighty Named Pipe

by vsbuffalo on 3/11/2015, 7:38 AM with 96 comments

by aidos on 3/11/2015, 9:46 AM

Nice article. Really easy to follow introduction.

I only discovered process substitution a few months ago but it's already become a frequently used tool in my kit.

One thing that I find a little annoying about unix commands sometimes is how hard it can be to google for them. '<()', nope, "command as file argument to other command unix," nope. The first couple of times I tried to use it, I knew it existed but struggled to find any documentation. "Damnit, I know it's something like that, how does it work again?..."

Unless you know to look for "Process Substitution" it can be hard to find information on these things. And that's once you even know these things exist....

Anyone know a good resource I should be using when I find myself in a situation like that?

by unhammer on 3/11/2015, 10:22 AM

Once you discover <() it's hard not to (ab)use it everywhere :-)

    # avoid temporary files when some program needs two inputs:
    join -e0 -o0,1.1,2.1 -a1 -a2 -j2 -t$'\t' \
      <(sort -k2,2 -t$'\t' freq/forms.${lang}) \
      <(sort -k2,2 -t$'\t' freq/lms.${lang})
    
    # gawk doesn't care if it's given a regular file or the output fd of some process:
    gawk -v dict=<(munge_dict) -f compound_translate.awk <in.txt
    
    # prepend a header:
    cat <(echo -e "${word}\t% ${lang}\tsum" | tr '[:lower:]' '[:upper:]') \
        <(coverage ${lang})
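
The snippets above depend on data files that aren't shown; a self-contained toy in the same spirit (bash/zsh only, since `<()` is not POSIX sh; the input letters here are made up for illustration):

```shell
# Each <(...) expands to a /dev/fd/N path backed by a pipe, so comm
# receives two "files" without anything touching the disk.
# comm -12 prints only the lines common to both (pre-sorted) inputs.
comm -12 <(printf 'a\nb\nc\n') <(printf 'b\nc\nd\n')
# prints:
# b
# c
```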

by larsf on 3/11/2015, 12:31 PM

Pipes are probably the original instantiation of dataflow processing (dating back to the 1960s). I gave a tech talk on some of the frameworks: https://www.youtube.com/watch?v=3oaelUXh7sE

And my company creates a cool dataflow platform - https://composableanalytics.com

by Malarkey73 on 3/11/2015, 9:18 AM

Vince Buffalo is author of the best book on bioinformatics: Bioinformatics Data Skills (O'Reilly). It's worth a read for learning unix/bash style data science of any flavour.

Or even if you think you know unix/bash and data there are new and unexpected snippets every few pages that surprise you.

by dbbolton on 3/11/2015, 5:34 PM

In zsh, =(cmd) will create a temporary file, <(cmd) will create a named pipe, and $(cmd) substitutes the command's output (ordinary command substitution, as in other shells). There are also fancy options that use MULTIOS. For example:

    paste <(cut -f1 file1) <(cut -f3 file2) | tee >(process1) >(process2) >/dev/null
can be re-written as:

    paste <(cut -f1 file1) <(cut -f3 file2) > >(process1) > >(process2)
http://zsh.sourceforge.net/Doc/Release/Expansion.html#Proces...

http://zsh.sourceforge.net/Doc/Release/Redirection.html#Redi...

by amelius on 3/11/2015, 10:06 AM

If you like pipes, then you will love lazy evaluation. It is unfortunate, though, that Unix doesn't support it fully: a writer is paused only once the pipe buffer fills, not as soon as nobody is reading, so producers still run eagerly ahead of demand.
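
The flow control pipes do provide is easy to see. In this sketch, `yes` is an unbounded producer, yet the pipeline terminates: writes block once the buffer fills, and once the reader exits the producer's next write raises SIGPIPE and kills it.

```shell
# yes emits "y" forever; head reads three lines and exits, closing the
# read end of the pipe. yes's next write then fails with SIGPIPE.
yes | head -n 3
# prints:
# y
# y
# y
```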

by baschism on 3/11/2015, 12:34 PM

AFAIK process substitution is a bash-ism (not part of POSIX spec for /bin/sh). I recently had to go with the slightly less wieldy named pipes in a dash environment and put the pipe setup, command execution and teardown in a script.
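
For reference, a minimal sketch of that named-pipe pattern in plain POSIX sh (the filenames `a.txt` and `b.txt` are placeholders):

```shell
# Emulate diff a.txt <(sort b.txt) without process substitution.
fifo=./sorted_b.fifo        # any pathname not already in use
mkfifo "$fifo"              # setup: create the named pipe
sort b.txt > "$fifo" &      # writer runs in the background
diff a.txt "$fifo"          # reader; opening the FIFO blocks until
                            # the writer has opened its end too
rm -f "$fifo"               # teardown
```

The background `&` matters: opening a FIFO for reading blocks until a writer opens it, so running the writer in the foreground first would deadlock.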

by mhax on 3/11/2015, 9:34 AM

I've used *nix for ~15 years and never used a named pipe or process substitution before. Great to know about!

by anateus on 3/11/2015, 6:43 PM

In fish shell the canonical example is this:

    diff (sort a.txt|psub) (sort b.txt|psub)
The psub command performs the process substitution.

by AndrewSB on 3/11/2015, 10:52 AM

Does anyone have a working link to Gary Bernhardt's The Unix Chainsaw, as mentioned in the article?

by frankerz on 3/11/2015, 9:45 AM

How does the >() process substitution differ from simply piping the output with | ?

For example (from Wikipedia)

    tee >(wc -l >&2) < bigfile | gzip > bigfile.gz

vs

    tee < bigfile | wc -l | gzip > bigfile.gz

by chuckcode on 3/11/2015, 4:20 PM

Anybody know of a way to increase the buffer size of pipes? I've experienced cases where piping a really fast program to a slow one caused them both to go slower as the OS pauses first program writing when pipe buffer is full. This seemed to ruin the caching for the first program and caused them both to be slower even though normally pipes are faster as you're not touching disk.
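
For what it's worth: on modern Linux the default pipe capacity is 64 KiB, and a program can enlarge an individual pipe with the Linux-specific fcntl F_SETPIPE_SZ, up to a system-wide cap. There's no standard shell knob for it, but the cap is visible (Linux only):

```shell
# Upper bound, in bytes, that F_SETPIPE_SZ may request (Linux only).
cat /proc/sys/fs/pipe-max-size
```

From the shell, the usual workaround is a user-space buffering stage between the two commands, e.g. `fast | pv -qB 100m | slow` or mbuffer, if either tool is installed.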

by jamesrom on 3/11/2015, 1:24 PM

Is this guy a bioinformatician? I think he's a bioinformatician.

Can't be sure if he is a bioinformatician because he never really mentions that he is a bioinformatician.

by leni536 on 3/11/2015, 2:37 PM

moreutils [1] has some really cool programs for pipe handling.

pee: tee standard input to pipes
sponge: soak up standard input and write to a file
ts: timestamp standard input
vipe: insert a text editor into a pipe

[1] https://joeyh.name/code/moreutils/
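
sponge in particular guards against a classic trap, which you can reproduce without moreutils installed:

```shell
# Redirecting a command's output onto its own input file fails:
# the shell truncates demo.txt *before* sort ever opens it, so
# sort reads an empty file and writes nothing.
printf 'b\na\n' > demo.txt
sort demo.txt > demo.txt
wc -c < demo.txt            # shows 0: the data is gone
```

With moreutils installed, `sort demo.txt | sponge demo.txt` is safe, because sponge soaks up all its input before opening the output file.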

by hitlin37 on 3/11/2015, 10:48 AM

I heard somewhere that Go's I/O interfaces follow the Unix pipe model.

by Dewie on 3/11/2015, 10:39 AM

Pipes are very cool and useful, but it's hard for me to understand this common worship of something like that. Yes, it's useful and elegant, but is it really the best thing since Jesus Christ?