Extract substrings defined by positioning relative to other relatively positioned characters
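I have many URLs in a character vector and I'm trying to extract substrings from them using base R. There are two types of substrings I want to extract:
1. The substring after the last slash (/) in the string and before the last underscore (_)
2. The substring after the last underscore (_) and before the substring .tar.gz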
I've hacked together a solution to this, but it involves many unnecessary steps. Is there a way to accomplish this using a single regex per substring?
Below is my working example:
# An example URL
a <- "https://cran.r-project.org/src/contrib/Archive/ggplot2/ggplot2_0.4.5.tar.gz"
# Keep everything after the last slash
b <- sub('.*/', '', a)
# Keep everything before .tar.gz (the dot is escaped so it matches literally)
c <- sub('\\.tar.*', '', b)
# Extract desired strings based on underscore
foo <- sub('.*_', '', c)  # version: "0.4.5"
bar <- sub('_.*', '', c)  # package name: "ggplot2"
It is important for this example to use base R.
sub(".tar.*", "", strsplit(basename(a), "_")[[1]])
This is great! I didn't know about basename(). Add it as an answer.
– user3614648
Aug 6 at 7:59
Did it work? Maybe there is some way I can improve my answer?
– PoGibas
Aug 27 at 12:55
3 Answers
Solution that uses basename and strsplit at _:
sub(".tar.*", "", strsplit(basename(a), "_")[[1]])
[1] "ggplot2" "0.4.5"
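Since the question mentions a whole character vector of URLs, here is a minimal sketch of how the same basename/strsplit idea might be applied to several URLs at once. The second URL and the names urls, stems, pieces, package and version are made up for illustration, and the sketch assumes every file name has the form name_version.tar.gz.

```r
# First URL is from the question; the second is a made-up example
urls <- c(
  "https://cran.r-project.org/src/contrib/Archive/ggplot2/ggplot2_0.4.5.tar.gz",
  "https://example.org/pkgs/foo_1.2.3.tar.gz"
)

# basename() and sub() are vectorised: drop the directory part,
# then drop the literal ".tar.gz" suffix
stems <- sub("\\.tar\\.gz$", "", basename(urls))

# Split each stem at the underscore; the result is a list with one
# character vector per URL
pieces <- strsplit(stems, "_", fixed = TRUE)

# Take the first piece as the package name and the second as the version
package <- vapply(pieces, `[`, character(1), 1)
version <- vapply(pieces, `[`, character(1), 2)
```

If a file name contained no underscore, the second piece would come back as NA rather than raising an error, which may or may not be what you want.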
Using lookarounds:
regmatches(a, regexpr('(?<=/)[^/]+(?=_)', a, perl = TRUE))
[1] "ggplot2"
regmatches(a, regexpr('(?<=_)[^_]+(?=\\.tar\\.gz)', a, perl = TRUE))
[1] "0.4.5"
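As a possible alternative to lookarounds, the "single regex per substring" requirement could also be met with sub() and a backreference. This is only a sketch: it assumes the file name always looks like name_version.tar.gz, and the names package and version are illustrative.

```r
a <- "https://cran.r-project.org/src/contrib/Archive/ggplot2/ggplot2_0.4.5.tar.gz"

# The greedy ".*/" consumes everything up to the last slash; the
# parenthesised group captures the part we want, and "\\1" replaces
# the whole string with that group
package <- sub(".*/([^/_]+)_[^/_]+\\.tar\\.gz$", "\\1", a)
version <- sub(".*/[^/_]+_([^/_]+)\\.tar\\.gz$", "\\1", a)
```

Note that sub() returns its input unchanged when the pattern does not match, so a malformed URL would pass through silently instead of producing an empty result.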
Try this pattern: /(?<package>[^/]+)_(?<version>[^_/]+)\.tar\.gz$
In a match, the first capturing group, named package, will give you the substring after the last slash (/) in the string and before the last underscore (_), and the second, named version, will give you the substring after the last underscore (_) and before the substring .tar.gz.
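The answer above gives only the pattern, so here is a rough sketch of how a two-group pattern of this kind might be used from base R with regexec() and regmatches(). It uses unnamed groups so that R's default regex engine can handle it (named groups such as (?<package>...) would need perl = TRUE). The second URL and the names urls, pattern, m and parts are made up for illustration.

```r
# First URL is from the question; the second is a made-up example
urls <- c(
  "https://cran.r-project.org/src/contrib/Archive/ggplot2/ggplot2_0.4.5.tar.gz",
  "https://example.org/pkgs/foo_1.2.3.tar.gz"
)

# Unnamed capture groups: (1) package name, (2) version
pattern <- "/([^/]+)_([^_/]+)\\.tar\\.gz$"

# regexec() records the position of the full match and of each group;
# regmatches() turns those positions into strings, one vector per URL
m <- regmatches(urls, regexec(pattern, urls))

# Element 1 of each vector is the full match, so the groups are
# elements 2 and 3
parts <- data.frame(
  package = vapply(m, `[`, character(1), 2),
  version = vapply(m, `[`, character(1), 3),
  stringsAsFactors = FALSE
)
```

Both substrings come out of a single pass over each URL, and a URL that does not match yields NA in both columns.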
With the given example this works: sub(".tar.*", "", strsplit(basename(a), "_")[[1]]), but it might not work with more complicated file names.
– PoGibas
Aug 6 at 7:50