2007-10 List html files by titles with Awk

Task

Create an HTML list of HTML/XHTML files, with each file linked by its title.

Input data

HTML and XHTML files

Output data

An HTML file
with a list of the files by their titles … or something similar

Solution in Awk

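# get_title.awk: print an HTML list linking each input file by its <title>.

BEGIN { 
  printit=0;   # 1 while inside a <title> element
  stopit=0;    # 1 once the closing </title> tag has been seen
  print "<html>\n<head>\n<title>List of webpages</title>\n</head>\n<body>\n\n<ul>" ;
  }

# Reset at the start of each file, in case the previous one never closed its <title>.
FNR==1 { printit=0; stopit=0; }

# Opening tag: replace everything up to it with the start of a linked list item.
/<title>/ { printit=1; sub(".*<title>","<li><a href=\"" FILENAME "\">"); }
/<TITLE>/ { printit=1; sub(".*<TITLE>","<li><a href=\"" FILENAME "\">"); }

# Closing tag: replace it and everything after it with the end of the list item.
/<\/TITLE>/ { stopit=1; sub("</TITLE>.*","</a></li>"); }
/<\/title>/ { stopit=1; sub("</title>.*","</a></li>"); }

{ 
  if(printit) print $0;   # title text; may span several lines
  if(stopit){
    printit=0;
    stopit=0;
    nextfile;   # skip the rest of this file (supported by gawk; not in every older awk)
  }
}

END { print "</ul>\n\n</body></html>"; }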

Usage:

awk -f get_title.awk *.html > output.html
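
The script above matches only the all-lowercase and all-uppercase spellings of the title tag, so a page that writes it as <Title> would be skipped. Below is a minimal sketch of a case-insensitive variant. It is an alternative technique, not the original script, and it assumes that each title sits on a single line. The sample files a.html and b.html, the temporary directory, and the output name list.html are made up for the demonstration.

```shell
#!/bin/sh
# Hypothetical demo data: two sample pages in a temporary directory.
dir=$(mktemp -d)
printf '<html><head><title>First page</title></head></html>\n' > "$dir/a.html"
printf '<HTML><HEAD><Title>Second page</Title></HEAD></HTML>\n' > "$dir/b.html"

cd "$dir" || exit 1

# Case-insensitive variant: locate the tags in a lowercased copy of the line,
# then cut the title text out of the original line by position.
awk '
BEGIN { print "<ul>" }
FNR == 1 { found = 0 }            # new file: no title seen yet
!found {
  low = tolower($0)
  open_at = index(low, "<title>")
  close_at = index(low, "</title>")
  if (open_at > 0 && close_at > open_at) {
    # 7 is the length of the opening tag "<title>"
    title = substr($0, open_at + 7, close_at - open_at - 7)
    print "<li><a href=\"" FILENAME "\">" title "</a></li>"
    found = 1                     # ignore the rest of this file
  }
}
END { print "</ul>" }
' a.html b.html > list.html

cat list.html
```

The sketch uses a per-file flag instead of nextfile, so it should also run on awk implementations that lack that statement, at the cost of reading every file to the end.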

Unless otherwise stated, the content of this page is licensed under the Creative Commons Attribution-ShareAlike 3.0 License