Tuesday, April 8, 2014

egrep tips: only matching

I have a file, SLES11SP3.html, in which each line looks like this:
<img src="sles11SP3_files/unknown.gif" alt="[   ]"> <a href="https://suse/sle-11-x86_64/rpm/x86_64/3ddiag-0.742-32.25.x86_64.rpm">3ddiag-0.742-32.25.x86_64.rpm</a>                                           23-Feb-2009 12:53   18K
<img src="sles11SP3_files/unknown.gif" alt="[   ]"> <a href="https://suse/sle-11-x86_64/rpm/x86_64/ConsoleKit-0.2.10-64.65.1.x86_64.rpm">ConsoleKit-0.2.10-64.65.1.x86_64.rpm</a>                                    27-May-2011 03:49   79K
<img src="sles11SP3_files/unknown.gif" alt="[   ]"> <a href="https://suse/sle-11-x86_64/rpm/x86_64/ConsoleKit-32bit-0.2.10-64.65.1.x86_64.rpm">ConsoleKit-32bit-0.2.10-64.65.1.x86_64.rpm</a>                              27-May-2011 03:49   15K
<img src="sles11SP3_files/unknown.gif" alt="[   ]"> <a href="https://suse/sle-11-x86_64/rpm/x86_64/ConsoleKit-doc-0.2.10-64.13.6.x86_64.rpm">ConsoleKit-doc-0.2.10-64.13.6.x86_64.rpm</a>                                11-May-2010 17:08   18K
<img src="sles11SP3_files/unknown.gif" alt="[   ]"> <a href="https://suse/sle-11-x86_64/rpm/x86_64/ConsoleKit-x11-0.2.10-64.65.1.x86_64.rpm">ConsoleKit-x11-0.2.10-64.65.1.x86_64.rpm</a>

I want to extract only the rpm names, to get the following output:
3ddiag-0.742-32.25.x86_64.rpm
ConsoleKit-0.2.10-64.65.1.x86_64.rpm
ConsoleKit-32bit-0.2.10-64.65.1.x86_64.rpm
ConsoleKit-doc-0.2.10-64.13.6.x86_64.rpm
ConsoleKit-x11-0.2.10-64.65.1.x86_64.rpm

It's easy with egrep (equivalent to grep -E):
# egrep -o ">\w.*\.rpm" SLES11SP3.html

Let's explain the option used in this command line. From the egrep man page:
-o, --only-matching
              Print only the matched (non-empty) parts of a matching line, with each such part on a separate output line.
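To see -o at work, here is a minimal sketch assuming GNU grep (where grep -E is equivalent to egrep and \w is supported). It writes three of the entries above, one per line, into a hypothetical file sample.html and runs the same pattern. Each match lands on its own output line, and each one starts with the ">" that the pattern includes:

```shell
#!/bin/sh
# Hypothetical sample file: three entries from the listing, one <a> link per line
cat > sample.html <<'EOF'
<img src="sles11SP3_files/unknown.gif" alt="[   ]"> <a href="https://suse/sle-11-x86_64/rpm/x86_64/3ddiag-0.742-32.25.x86_64.rpm">3ddiag-0.742-32.25.x86_64.rpm</a>   23-Feb-2009 12:53   18K
<img src="sles11SP3_files/unknown.gif" alt="[   ]"> <a href="https://suse/sle-11-x86_64/rpm/x86_64/ConsoleKit-0.2.10-64.65.1.x86_64.rpm">ConsoleKit-0.2.10-64.65.1.x86_64.rpm</a>   27-May-2011 03:49   79K
<img src="sles11SP3_files/unknown.gif" alt="[   ]"> <a href="https://suse/sle-11-x86_64/rpm/x86_64/ConsoleKit-doc-0.2.10-64.13.6.x86_64.rpm">ConsoleKit-doc-0.2.10-64.13.6.x86_64.rpm</a>   11-May-2010 17:08   18K
EOF

# -o prints only the matched text, one match per output line:
# >3ddiag-0.742-32.25.x86_64.rpm
# >ConsoleKit-0.2.10-64.65.1.x86_64.rpm
# >ConsoleKit-doc-0.2.10-64.13.6.x86_64.rpm
grep -Eo '>\w.*\.rpm' sample.html
```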

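">\w.*\.rpm" matches a ">" followed by a word character (\w: a letter, digit or underscore), then any characters (.*) up to a literal ".rpm" (the dot is escaped). The ">" closing the <img> tag is followed by a space, so the match starts at the ">" closing the <a href="..."> tag. In this line:
<img src="sles11SP3_files/unknown.gif" alt="[   ]"> <a href="https://suse/sle-11-x86_64/rpm/x86_64/ConsoleKit-x11-0.2.10-64.65.1.x86_64.rpm">ConsoleKit-x11-0.2.10-64.65.1.x86_64.rpm</a>
the matched part is:
>ConsoleKit-x11-0.2.10-64.65.1.x86_64.rpm
The leading ">" is part of the match, so egrep prints it as well; append | cut -c2- to the command to drop it and get exactly the names listed above.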
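The pattern relies on each line holding a single link: .* is greedy, so on a line with two entries the match would run from the first name to the last ".rpm". Here is a hedged sketch, assuming GNU grep, of a tighter variant: [^<]* cannot cross the next tag, so each link text becomes its own match, and cut -c2- removes the leading ">". The sample line and its x/ paths are made up for illustration:

```shell
#!/bin/sh
# Hypothetical line holding two entries, as in a flattened listing
line='<a href="x/3ddiag-0.742-32.25.x86_64.rpm">3ddiag-0.742-32.25.x86_64.rpm</a> 18K <a href="x/ConsoleKit-0.2.10-64.65.1.x86_64.rpm">ConsoleKit-0.2.10-64.65.1.x86_64.rpm</a> 79K'

# Greedy .*: a single match spanning from ">3ddiag..." to the last ".rpm"
printf '%s\n' "$line" | grep -Eo '>\w.*\.rpm'

# [^<]* stops at the next tag, so each name is a separate match;
# cut -c2- strips the leading '>' from every output line:
# 3ddiag-0.742-32.25.x86_64.rpm
# ConsoleKit-0.2.10-64.65.1.x86_64.rpm
printf '%s\n' "$line" | grep -Eo '>\w[^<]*\.rpm' | cut -c2-
```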