Convert case of an HTML page to lower in R







I need to convert all the content of an HTML page to lowercase, but I get an error.


library(stringr)
library(httr)
library(XML)

url <- "https://stackoverflow.com/"
request <- GET(url)
doc <- htmlParse(request, encoding = "UTF-8")
doc <- str_to_lower(doc)



Error in as.vector(x, "character") : cannot coerce type
'externalptr' to vector of type 'character'



I need to keep the XML structure of the document because I will have to query it with XPath.



Thanks for your help!





Thanks for your reply, but I can't... I first need to convert all elements to lowercase before extracting them. I chose this method because I have to extract elements that match a list I have created, and converting all elements to lowercase (and removing all accents) beforehand lets me reduce the number of entries in my list.
– Remi
Aug 3 at 15:08






I would try doc <- content(request, "text").
– Stéphane Laurent
Aug 4 at 21:23






1 Answer



You can convert the response to a character string, change its case, and then parse the result as HTML again.


library(stringr)
library(httr)
library(XML)

url <- "https://stackoverflow.com/"
request <- GET(url)

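# Convert the response body to a character string, then convert its case
newdoc <- str_to_lower(as.character(request))

# Re-parse the lowercased text to get back an HTML document
# (asText = TRUE makes it explicit that newdoc is content, not a file name)
doc <- htmlParse(newdoc, asText = TRUE, encoding = "UTF-8")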



This should produce a parsed document that you can query with XPath.
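To check the approach without a network call, here is a minimal sketch that runs the same steps on an inline HTML string. The `html` string is a made-up stand-in for the downloaded page, and base R `tolower()` is used in place of `str_to_lower()` only to keep the dependencies down.

```r
library(XML)

# Hypothetical page content standing in for the text returned by GET()
html <- paste0(
  '<html><head>',
  '<meta name="Description" content="Hello I Am A Meta Description">',
  '</head><body><p>Some TEXT</p></body></html>'
)

# Lowercase the raw text first, then parse it so the XML structure is available
lower_html <- tolower(html)
doc <- htmlParse(lower_html, asText = TRUE, encoding = "UTF-8")

# XPath queries now see lowercase text and lowercase attribute values
paragraph   <- xpathSApply(doc, "//p", xmlValue)
description <- xpathSApply(doc, "//meta[@name='description']",
                           xmlGetAttr, "content")

print(paragraph)
print(description)
```

Reading the attribute back with `xmlGetAttr()` is a direct way to see what the parser actually stored, independent of how the document is printed.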





Hello @Dave2e, thanks for your reply, but this doesn't work. There is an error when creating the newdoc variable: Error in as.vector(x, "character") : cannot coerce type 'externalptr' to vector of type 'character'
– Remi
Aug 4 at 20:39






@Remi, sorry, I have corrected the error. You need to change the case of the GET(url) result, not the htmlParse output. See above; this should work now.
– Dave2e
Aug 6 at 1:56





Thanks Dave2e, it works now, but there is one strange thing in the doc variable. The content attributes in meta and other tags have been transformed. For example, a meta description like this <meta http-equiv="description" name="description" content="hello I am a meta description"> now looks like this <meta http-equiv="description" name="description" content="hello" I am a meta description> (the quotation marks close after the first word inside the content attribute). Do you have any idea how to fix it?
– Remi
Aug 6 at 9:29
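If rewriting the raw HTML turns out to be fragile, one alternative worth trying is to leave the parsed document untouched and do the lowercasing at query time. The sketch below uses the XPath 1.0 `translate()` function for a case-insensitive attribute match and lowercases only the extracted value. The `html` string and the `upper`, `lower`, and `query` names are illustrative assumptions, and `translate()` here only covers the ASCII letters listed, so accented characters would need to be added to both strings.

```r
library(XML)

# Hypothetical page content; the attribute values keep their original case
html <- paste0(
  '<html><head>',
  '<meta name="Description" content="Hello I Am A Meta Description">',
  '</head><body><p>Some TEXT</p></body></html>'
)
doc <- htmlParse(html, asText = TRUE, encoding = "UTF-8")

# XPath 1.0 has no lower-case(); translate() maps each character in
# `upper` to the character at the same position in `lower`
upper <- "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
lower <- "abcdefghijklmnopqrstuvwxyz"
query <- sprintf("//meta[translate(@name, '%s', '%s') = 'description']",
                 upper, lower)

# Match case-insensitively, then lowercase only the extracted value
matched <- tolower(xpathSApply(doc, query, xmlGetAttr, "content"))
print(matched)
```

This keeps the document exactly as the parser built it, so the XPath structure is preserved and the lowercase list only has to be compared against the extracted strings.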






