Extract substrings defined by positioning relative to other relatively positioned characters







I have many URLs in a character vector and I'm trying to extract substrings from them using base R. There are two types of substrings I want to extract:
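
1. The substring after the last slash (/) in the string and before the last underscore (_).
2. The substring after the last underscore (_) and before the substring .tar.gz.
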
I've hacked together a solution, but it involves many unnecessary steps. Is there a way to do this with a single regex per substring?



Below is my working example:


# An example URL
a <- "https://cran.r-project.org/src/contrib/Archive/ggplot2/ggplot2_0.4.5.tar.gz"

# Keep everything after the last slash
b <- sub('.*/', '', a)
# Keep everything before .tar.gz (the dot is escaped so it matches a literal ".")
c <- sub('\\.tar.*', '', b)

# Extract the desired strings based on the underscore
foo <- sub('.*_', '', c)   # everything after the last underscore: "0.4.5"
bar <- sub('_.*', '', c)   # everything before the first underscore: "ggplot2"
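The intermediate steps above can also be collapsed into a single sub() call per substring: capture the wanted part in a group and replace the whole string with the backreference \1. This is a minimal sketch, assuming every URL ends in name_version.tar.gz; the second URL is a hypothetical example added for illustration. Because sub() is vectorized, it handles the whole character vector at once.

```r
urls <- c(
  "https://cran.r-project.org/src/contrib/Archive/ggplot2/ggplot2_0.4.5.tar.gz",
  "https://cran.r-project.org/src/contrib/Archive/dplyr/dplyr_0.1.tar.gz"  # hypothetical example
)

# Name: text after the last "/" and before the last "_".
# ".*/" consumes everything up to the last slash; group 1 cannot contain "/".
pkg <- sub(".*/([^/]+)_[^_/]+\\.tar\\.gz$", "\\1", urls)

# Version: text after the last "_" and before ".tar.gz".
# Group 1 cannot contain "_", so it must start after the last underscore.
version <- sub(".*_([^_/]+)\\.tar\\.gz$", "\\1", urls)

pkg      # "ggplot2" "dplyr"
version  # "0.4.5"   "0.1"
```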



It is important for this example to use base R.





With the given example this works: sub(".tar.*", "", strsplit(basename(a), "_")[[1]]), but it might not work with more complicated file names.
– PoGibas
Aug 6 at 7:50







This is great! I didn't know about basename(). Add it as an answer.
– user3614648
Aug 6 at 7:59





Did it work? Maybe there is some way I can improve my answer?
– PoGibas
Aug 27 at 12:55




3 Answers



A solution that uses basename() and splits the file name at _ with strsplit():




sub(".tar.*", "", strsplit(basename(a), "_")[[1]])
[1] "ggplot2" "0.4.5"
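The [[1]] above keeps only the pieces of the first URL. Since the question has many URLs in a character vector, one way to extend this is to strip the suffix first and then take the first and second piece of every split with vapply(). A sketch, assuming each file name contains exactly one underscore; the second URL is a hypothetical example.

```r
urls <- c(
  "https://cran.r-project.org/src/contrib/Archive/ggplot2/ggplot2_0.4.5.tar.gz",
  "https://cran.r-project.org/src/contrib/Archive/dplyr/dplyr_0.1.tar.gz"  # hypothetical example
)

# Drop the directory part, then the ".tar.gz" suffix (dots escaped)
files <- sub("\\.tar\\.gz$", "", basename(urls))

# Split each file name at "_"; strsplit() returns one list element per URL
parts <- strsplit(files, "_", fixed = TRUE)

# Assumes exactly one "_" per file name: piece 1 is the name, piece 2 the version
pkg     <- vapply(parts, `[`, character(1), 1)
version <- vapply(parts, `[`, character(1), 2)
```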



Using lookarounds:


regmatches(a, regexpr('(?<=/)[^/]+(?=_)', a, perl = TRUE))
[1] "ggplot2"
regmatches(a, regexpr('(?<=_)[^_]+(?=\\.tar\\.gz)', a, perl = TRUE))
[1] "0.4.5"
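The lookaround patterns also work on a whole character vector, but regmatches() silently drops elements that do not match, so the result can be shorter than the input. One way to keep the output aligned with the URLs is to fill non-matches with NA. A sketch: extract_or_na is a hypothetical helper, and the second and third URLs are illustrative inputs (the third deliberately does not match).

```r
urls <- c(
  "https://cran.r-project.org/src/contrib/Archive/ggplot2/ggplot2_0.4.5.tar.gz",
  "https://cran.r-project.org/src/contrib/Archive/dplyr/dplyr_0.1.tar.gz",  # hypothetical
  "https://cran.r-project.org/src/contrib/PACKAGES"                         # no match
)

# Hypothetical helper: one result per input, NA where the pattern does not match
extract_or_na <- function(x, pattern) {
  m <- regexpr(pattern, x, perl = TRUE)
  out <- rep(NA_character_, length(x))
  out[m != -1] <- regmatches(x, m)  # regmatches() returns only the matches
  out
}

pkg     <- extract_or_na(urls, "(?<=/)[^/]+(?=_)")
version <- extract_or_na(urls, "(?<=_)[^_]+(?=\\.tar\\.gz)")
```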



Try this pattern:

/(?<package>[^/]+)_(?<version>[^_/]+)\.tar\.gz$



In a match, the first capturing group, named package, gives you the substring after the last slash (/) and before the last underscore (_). The second, named version, gives you the substring after the last underscore (_) and before .tar.gz.
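In base R this pattern can be used with regexpr(perl = TRUE): when the pattern has capture groups, the result carries capture.start, capture.length and capture.names attributes, and substring() can then pull out each group. A sketch, assuming every URL matches (an unmatched element would get a start of -1 and need separate handling); get_group is a hypothetical helper name and the second URL is illustrative.

```r
urls <- c(
  "https://cran.r-project.org/src/contrib/Archive/ggplot2/ggplot2_0.4.5.tar.gz",
  "https://cran.r-project.org/src/contrib/Archive/dplyr/dplyr_0.1.tar.gz"  # hypothetical example
)

# The pattern from the answer, with the dots in ".tar.gz" escaped for R
pattern <- "/(?<package>[^/]+)_(?<version>[^_/]+)\\.tar\\.gz$"

# Assumes every element of urls matches the pattern
m <- regexpr(pattern, urls, perl = TRUE)

# One row per URL, one column per capture group
starts <- attr(m, "capture.start")
lens   <- attr(m, "capture.length")
groups <- attr(m, "capture.names")  # group names in column order

# Hypothetical helper: extract one named group from every URL
get_group <- function(name) {
  j <- match(name, groups)
  substring(urls, starts[, j], starts[, j] + lens[, j] - 1)
}

pkg     <- get_group("package")
version <- get_group("version")
```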










