sed command to read the name of .js file from script call in html
I want to read the names of all .js files referenced from an HTML file. The following command works, but it fails if the script tag contains other attributes.
jsFiles=$(echo "$BODY" | sed -rn 's/<script\ssrc=\W(.*\.js).*/\1/p')
For the above command to work, the script tag has to look like this:
<script src='js/default.js'></script>
The output:
js/default.js
How can I modify that it works for other options that script call may include?
For example:
<script type="text/javascript" src="'$lastJsLocation'" language="javascript"> </script>
2 Answers
If your HTML is really that regular and the target sections are on 1 line at a time:
$ sed -n 's/.*<script.*src=["'\'']*\([^"'\'']*\).*/\1/p' file
js/default.js
$lastJsLocation
Thanks, it worked
– unh
Aug 12 at 16:21
Hi, I want to also check this match ends with .js .. Where should I add it @Ed Morton
– unh
Aug 15 at 13:31
Add \.js before the last closing brace of the capture group, i.e. change *\) to *\.js\). You should update your question though, as it's not clear that that's a requirement, and in particular your question ("How can I modify that it works for other options that script call may include?") makes it sound like you DO want to print $lastJsLocation, while this new requirement in your comment means that you won't.
– Ed Morton
Aug 15 at 13:59
I advise using an XML parser to extract the value you want.
Assuming the HTML you want to parse is in these files:
$ cat file1
<script src='js/default.js'></script>
$ cat file2
<script type="text/javascript" src="'$lastJsLocation'" language="javascript"></script>
If you have xmllint available, you can use this command:
$ xmllint --xpath 'string(//script/@src)' file1
js/default.js
$ xmllint --xpath 'string(//script/@src)' file2
'$lastJsLocation'
If you have xmlstarlet, you can use this command:
$ xmlstarlet sel -T -t -m /script/@src -v . -n file1
js/default.js
$ xmlstarlet sel -T -t -m /script/@src -v . -n file2
'$lastJsLocation'
The xmlstarlet options seem complicated, but they aren't if you look at xmlstarlet sel --help. Partial output below:
-T - output is text (default is XML)
-t - template
-m - match XPATH expression
-v - print value of XPATH expression
-n - print new line
I'm very interested in learning to use xmlstarlet and with that in mind - the OP said that she only wants to print the value if it ends in .js, how would you do that in xmlstarlet?
– Ed Morton
Aug 15 at 14:18
Also - though the script you posted works when there's only 1 "script" line in each input file it fails with
file:2.1: Extra content at the end of the document <script type="text/javascript" src="'$lastJsLocation'" language="javascript"></ ^
when both "script" lines are in one file. How do you use it for the normal case where there's multiple similar tags+values in an HTML file?
– Ed Morton
Aug 15 at 14:27
@EdMorton Comment2: In the normal case (an HTML document, for example), the script tags would be enclosed within a root tag. If you add a <dummy>...</dummy> tag and use the xpath /dummy/script/@src, that would work.
– oliv
Aug 16 at 7:04
@EdMorton Comment1: As far as I know (and I'm no expert at all), this is not possible with xmlstarlet. If this tool supported XPath 2.0, I would use this xpath: /dummy/script[ends-with(@src,".js")]. Unfortunately that doesn't work...
– oliv
Aug 16 at 7:04
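For what it's worth, XPath 1.0 has no ends-with(), but the same test is commonly written with substring() and string-length(). The sketch below is an assumption on my part rather than something from the answer: it wraps the two sample script tags in the <dummy> root suggested above, reuses the answer's -T -t -m ... -v . -n options, and only runs if xmlstarlet is installed under that name.

```shell
# Hypothetical sample file: both script tags from the answer inside a <dummy> root.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<dummy>
<script src='js/default.js'></script>
<script type="text/javascript" src="'$lastJsLocation'" language="javascript"></script>
</dummy>
EOF

# XPath 1.0 stand-in for ends-with(@src, ".js"): compare the last three characters.
xpath='/dummy/script[substring(@src, string-length(@src) - 2) = ".js"]/@src'

if command -v xmlstarlet >/dev/null 2>&1; then
    jsFiles=$(xmlstarlet sel -T -t -m "$xpath" -v . -n "$tmp")
    printf '%s\n' "$jsFiles"
else
    echo "xmlstarlet is not installed" >&2
fi
rm -f "$tmp"
```

With the sample file this should print only js/default.js, since the second src value ends in a quote character rather than .js. On some systems the binary is installed as xml instead of xmlstarlet, in which case the guard would need adjusting.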
@EdMorton I don't think xmllint can do this because it doesn't support XPath 2.0. I haven't found a "golden" command-line XML parser so far. I often use either tool and pipe to sed or awk when parsing more complex XML files...
– oliv
Aug 16 at 16:28
suggestion: don't use sed, instead use html/xml parser like xmlstarlet or a programming language like perl/python along with html/xml module...
– Sundeep
Aug 12 at 15:14