Tuesday, September 11, 2012

Extract filename from path with regex

Extracting the filename from a full path is not as easy as it first looks when the directory part is OPTIONAL.
I would like to extract cc.cc from
/aa/bb/cc.cc
as well as from
cc.cc

Solution:

$ echo "/aa/bb/cc.cc" | perl -pe 's|(?:.*/)?([^/]*)|$1|'
cc.cc
$ echo "/aa/cc.cc" | perl -pe 's|(?:.*/)?([^/]*)|$1|'
cc.cc
$ echo "cc.cc" | perl -pe 's|(?:.*/)?([^/]*)|$1|'
cc.cc

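Some explanation:
  • (?:...) starts a non-capturing group; the ? after it makes the whole directory part optional
  • .*/ is greedy, so it matches everything up to and including the last slash
  • [^/]* matches any run of characters except slash, i.e. the filename, which is captured by the group and used as the replacement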
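For comparison, here is a minimal sketch of the same optional-directory idea using bash's built-in `[[ ... =~ ... ]]` operator instead of perl. Bash uses POSIX extended regular expressions, which have no `(?:...)`. The directory part is therefore an ordinary capturing group, and the filename ends up in `BASH_REMATCH[2]`. The function name `extract_filename` is only illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last path component using a regex whose
# directory part is optional, mirroring perl's (?:.*/)?([^/]*).
extract_filename() {
    # group 1: optional directory (up to the last slash), group 2: filename
    local re='^(.*/)?([^/]*)$'
    # Keep the regex in a variable and leave it unquoted so bash treats it as a regex.
    if [[ $1 =~ $re ]]; then
        printf '%s\n' "${BASH_REMATCH[2]}"
    fi
}

extract_filename "/aa/bb/cc.cc"   # prints cc.cc
extract_filename "/aa/cc.cc"      # prints cc.cc
extract_filename "cc.cc"          # prints cc.cc
```

Because `[^/]*$` cannot contain a slash, group 1 is forced to take everything up to the last slash. When there is no slash at all, group 1 simply does not participate.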

4 comments:

  1. You probably don't need this anymore, but here is a similar way to do it:

    $ echo -e "/aa/bb/cc.cc\ncc.cc"|grep -oP "^(.*/)?\K[^/]+"
    cc.cc
    cc.cc


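    \K is a Perl regex feature: everything matched before \K is used only for positioning and is left out of the reported match. This sidesteps a limitation of lookbehind assertions, which generally must be fixed-length, so a variable-length prefix such as (.*/)? cannot be put inside one.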
    $ echo appletree|grep -oP "apple\K.*"
    tree

    Replies
    1. Or with another, similar approach:
      $ echo -e "/aa/bb/cc.cc\ncc.cc"|grep -oP "/?\K[^/]+$"
      cc.cc
      cc.cc

    2. Or with AWK:
      $ echo -e "/aa/bb/cc.cc\ncc.cc"|awk -F/ '{print $NF}'
      cc.cc
      cc.cc

    3. I promise this will be the last one.

      $ echo -e "/aa/bb/cc.cc\n/aa/cc.cc\ncc.cc" | xargs -I{} echo var="{}" \&\& echo \${var##*/} | bash
      cc.cc
      cc.cc
      cc.cc

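The last reply relies on bash's `${var##*/}` expansion. This expansion removes the longest prefix matching the pattern `*/`, i.e. everything up to and including the last slash, and leaves a path without any slash unchanged. Below is a minimal sketch of the same expansion in a plain `while read` loop. It avoids generating a script and piping it into a second bash, so the paths are never re-parsed as shell code. The function name `print_filenames` is only illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one path per line and print the part after the last slash.
# ${path##*/} strips the longest prefix matching */; a path without a slash is left as-is.
print_filenames() {
    local path
    # IFS= and -r keep leading spaces and backslashes in the path intact.
    while IFS= read -r path; do
        printf '%s\n' "${path##*/}"
    done
}

printf '%s\n' "/aa/bb/cc.cc" "/aa/cc.cc" "cc.cc" | print_filenames
```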