Extract title from HTML and rename file to title




I have multiple files named output.html. I can already extract each file's title with the following command:


cat output.html | sed -n 's/.*<title>\(.*\)<\/title>.*/\1/ip;T;q'



Example:


7N8UGL0:~/Downloads$ cat output.html | sed -n 's/.*<title>\(.*\)<\/title>.*/\1/ip;T;q'
SEIKO 5 Finder - SNK559 Automatic Watch



Now I want to rename the output.html to the extracted title:


SEIKO 5 Finder - SNK559 Automatic Watch.html



I already managed to put this into a script:


#!/bin/bash
title=$(sed -n 's/.*<title>\(.*\)<\/title>.*/\1/ip;T;q' output.html)
echo "$title"



Furthermore, I have many of these output.html files, each in a directory named after an epoch timestamp:


ls -l
drwxrwxrwx 1 userna userna 512 Aug 7 19:33 1500122724.81
drwxrwxrwx 1 userna userna 512 Aug 7 19:33 1500122724.82
drwxrwxrwx 1 userna userna 512 Aug 7 19:33 1500122724.83
drwxrwxrwx 1 userna userna 512 Aug 7 19:32 1500122724.84
drwxrwxrwx 1 userna userna 512 Aug 7 18:36 1500122724.85
drwxrwxrwx 1 userna userna 512 Aug 7 18:35 1500122724.86



I would like to be able to extract the html title for all output.html in all the directories and rename the output.html accordingly.



Many thanks in advance,



jmt




2 Answers



Use the find command to locate every regular file (-type f) named output.html and run a rename script on each one (-exec rename.bash {} \;).



find descends recursively into each directory.



So the complete command would look like:


find <YOUR TOP DIRECTORY> -type f -name output.html -exec rename.bash {} \; -print



The -print at the end lists every processed file on stdout.
Your rename script receives, as its argument, the full path of each output.html that find locates. In the script, run your sed command, then mv the file from that path to path/THE-TITLE-VALUE-YOU-JUST-EXTRACTED-WITH-SED.html.
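A minimal sketch of what such a rename.bash could look like, assuming GNU sed and a title that sits on a single line. The function names extract_title and rename_to_title are made up for illustration, and keeping the renamed file in its original directory is a choice made here, not something the answer prescribes.

```shell
#!/bin/bash
# rename.bash: hypothetical helper, meant to be called by find as
#   rename.bash /path/to/output.html

# Print the text of the first <title> element found on a single line (GNU sed).
extract_title() {
    sed -n 's/.*<title>\(.*\)<\/title>.*/\1/ip;T;q' "$1"
}

# Rename the given file to "<title>.html" inside its own directory.
rename_to_title() {
    local file=$1 title
    title=$(extract_title "$file")
    # Leave the file alone when no title was found.
    [ -n "$title" ] || return 1
    mv -- "$file" "$(dirname -- "$file")/$title.html"
}

# Only act when a path argument was supplied, as find -exec does.
if [ "$#" -gt 0 ]; then
    rename_to_title "$1"
fi
```

A title that contains a slash would still make mv fail here, because the slash is read as a directory separator.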





FYI, I would suggest being careful with this renaming. Spaces in filenames, although perfectly legal, can cause issues later. Also make sure your titles do not include characters that are special to the shell, such as *,!(). and many more. Alphanumeric characters are fine, along with - and _.
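One way to follow that advice is to normalise the title before using it as a filename. The sketch below assumes it is acceptable to replace every run of characters other than letters, digits, - and _ with a single underscore. The name sanitize_name is hypothetical.

```shell
#!/bin/bash
# sanitize_name: hypothetical helper. Keeps letters, digits, '-' and '_'
# and turns every run of other characters into a single underscore.
sanitize_name() {
    printf '%s' "$1" | LC_ALL=C tr -cs 'A-Za-z0-9_-' '_'
}

sanitize_name 'SEIKO 5 Finder - SNK559 Automatic Watch'
echo
# prints: SEIKO_5_Finder_-_SNK559_Automatic_Watch
```

Because tr squeezes repeated underscores, two different titles can map to the same name, so a real script may want to check for an existing target before moving.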







Thanks for the reply, Nic3500. For this to work I think I'd have to write the rename.bash script, which I have not done yet. I was able to resolve this using the method I posted as an answer. Thank you again for your support.
– jmt
Aug 9 at 15:34



I was able to solve this by writing the following script:


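#!/bin/bash
# Read find's results NUL-delimited so paths containing spaces stay intact.
find . -type f -name output.html -print0 |
while IFS= read -r -d '' file
do
    newfilename=$(sed -n 's/.*<title>\(.*\)<\/title>.*/\1/ip;T;q' "$file")
    mv -- "$file" "$newfilename.html"
done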



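It works as follows: find locates every output.html below the current directory, sed extracts the title from each file, and mv renames the file to that title with an .html extension (placing it in the directory the script is run from).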



Now I want to find a way to identify special characters such as / and : in the title, because I get an error when the HTML title contains any of them.
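A possible starting point for that check, sketched under the assumption that only letters, digits, spaces, ., - and _ count as safe. The name has_unsafe_chars and the sample titles are hypothetical.

```shell
#!/bin/bash
# has_unsafe_chars: hypothetical check. Succeeds (exit status 0) when the
# title contains any character outside letters, digits, space, '.', '-', '_'.
has_unsafe_chars() {
    local re='[^A-Za-z0-9 ._-]'
    [[ $1 =~ $re ]]
}

# Report which sample titles would need attention before renaming.
for title in 'SEIKO 5 Finder - SNK559 Automatic Watch' 'Input/Output: a primer'; do
    if has_unsafe_chars "$title"; then
        echo "unsafe: $title"
    else
        echo "ok: $title"
    fi
done
```

Inside the renaming loop, the same test could be used to skip a file or to log it for manual handling instead of calling mv.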





