Adding back trailing zeros in an identification column

I have the following data which looks like the following;

COLUPC UPC 19638 715643501208 00-01-82000-72608 11783 15230000022 00-01-01820-00198 17140 18769210012 07-01-93271-20600 19624 711925600019 00-01-86243-11167 18706 28382012393 27-01-15503-01673 8716 13410025574 00-01-78250-00236

The columns COLUPC and UPC come from two different data frames and a random sample of each was taken. The COLUPC and UPC are the same unique identifiers. It is just that COLUPC has lost its format.

COLUPC

UPC

COLUPC

UPC

COLUPC

Taking the first row value for COLUPC - 715643501208. It should look like 07-01-56435-01208 following the same structure as in column UPC.

COLUPC

715643501208

07-01-56435-01208

UPC

It gets a little more tricky with observation 15230000022 It should look like 01-05-23000-00022 where there was an extra 0 added in the 4th component 00022 along with component 1 and 2.

15230000022

01-05-23000-00022

0

00022

The documentation that comes along with the data file states that COLUPC has no leading zeros shown.

COLUPC

So;

Component 1 is 2 digits

Component 2 is 2 digits (1 digit with a 0 at the begining)

Component 3 is 5 digits

Component 4 is 5 digits

Taking this number 00-01-01820-00198 from the UPC column. In the COLUPC column this would collapse to 11820198 where all the leading zeros have been removed from component 1, 2, 3 and 4.

00-01-01820-00198

UPC

COLUPC

11820198

If I am a little unclear let me know.

Data:

structure(list(COLUPC = c(715643501208, 15230000022, 18769210012, 711925600019, 28382012393, 13410025574, 716708427455, 12210000157, 13410025602, 17143945712, 17336011341, 13410062505, 11820053218, 11820002989, 13410059505, 11820043550, 18417331130, 711820053025, 18982650001, 18248813000, 17199077603, 18834510005, 13410048602, 18417310545, 11820000987, 13410000217, 17089701331, 13410017256, 618516510302, 23410030602, 17336080301, 613498630003, 17825000257, 16206705155, 17031000013, 13410000555, 16206704972, 37031030703, 27336082301, 17031023882, 18769227102, 18382056793, 31820000697, 13410048574, 17199048004, 11820041550, 11820000884, 18572511509, 14182700012, 13410010505, 18700050898, 18248812910, 18015761147, 18549873837, 16821309117, 618516510402, 17231163009, 611969900006, 11820011349, 11820000769, 13410046805, 13410000162, 13410000579, 18374110012, 21820000801, 17231163011, 23410068505, 17199030030, 11820000771, 17031000009, 28549873736, 11820000784, 17199048016, 13410057256, 11820054028, 11820001991, 13410057602, 17336080341, 21820029031, 715643501210, 17231163012, 17199047004, 18769201103, 13410000623, 15230000540, 618516510704, 17336011348, 15230001301, 618516510707, 17143934806, 18572511510, 28248800006, 13410057536, 611969900002, 11820041047, 13410057340, 18769241103, 17199000025, 21820000803, 14182700712), UPC = c("00-01-82000-72608", "00-01-01820-00198", "07-01-93271-20600", "00-01-86243-11167", "27-01-15503-01673", "00-01-78250-00236", "00-01-18200-96664", "00-01-99232-00055", "27-01-15504-02696", "00-01-34100-15306", "00-01-95301-13152", "00-03-22405-55555", "00-02-76398-50000", "00-01-82000-72750", "00-01-88573-50001", "00-01-72890-00122", "00-02-18200-00406", "00-03-18200-01168", "00-01-83820-73500", "27-01-00007-13019", "00-01-82548-20702", "07-01-19283-60388", "07-01-14834-00121", "07-01-02596-10001", "27-01-15503-03400", "06-01-30279-33706", "27-01-04200-18188", "07-01-80369-61992", "00-01-82153-19309", "00-01-34100-59205", "27-01-15504-00652", "00-02-18200-86030", "07-01-08820-49939", "27-01-04203-24019", "27-01-15503-00627", "27-01-15504-01210", "27-01-15504-01305", "00-02-18200-00006", "00-01-85498-66662", "00-01-50939-12201", "00-01-95374-70000", "00-01-81986-00187", "00-01-72620-20101", "07-02-19256-00043", "00-01-68213-05117", "07-01-86243-01106", "00-03-21242-00004", "27-01-15503-00969", "00-01-62067-38055", "07-01-36040-00657", "00-01-78250-00388", "00-01-62067-04971", "00-01-18200-15047", "27-01-00001-60539", "00-01-70378-54652", "00-01-52300-00005", "27-01-15504-01620", "00-01-70310-35738", "07-03-36920-11141", "27-01-15503-00049", "00-01-80480-90593", "00-01-18200-27168", "00-01-34100-14574", "07-01-92098-07234", "27-01-00007-05978", "06-01-05748-02000", "00-01-34100-92528", "06-01-43775-10306", "27-01-15503-01146", "27-01-00007-06142", "07-03-02770-00802", "00-04-62067-00900", "27-01-15502-01229", "00-01-73832-06340", "00-01-18200-00180", "07-01-94028-20030", "27-01-15504-02559", "07-01-90586-00122", "00-01-92212-11102", "07-01-80369-91993", "00-01-18200-18500", "00-01-18200-00997", "00-01-18200-41372", "00-01-34100-50505", "00-01-92212-12111", "27-01-15503-00078", "27-01-15503-07766", "00-01-80069-00221", "27-01-15504-03101", "00-02-77940-00002", "00-01-02770-01302", "07-01-27658-12302", "00-01-34100-62152", "00-01-72620-20128", "00-01-71990-77952", "00-01-85725-21004", "07-01-08820-58624", "07-02-88946-10141", "00-01-81986-00144", "00-01-18200-00242")), class = "data.frame", row.names = c(19638L, 11783L, 17140L, 19624L, 18706L, 8716L, 19653L, 5259L, 8738L, 13844L, 15570L, 11543L, 3919L, 1345L, 11412L, 3122L, 16899L, 19583L, 17617L, 16753L, 15011L, 17582L, 9364L, 16889L, 1310L, 5602L, 13696L, 8276L, 19528L, 18189L, 15840L, 19217L, 15982L, 13285L, 13568L, 5737L, 13263L, 19052L, 18282L, 13613L, 17300L, 16882L, 18785L, 9214L, 14925L, 2868L, 1009L, 17053L, 11674L, 7364L, 17077L, 16752L, 15990L, 17038L, 13449L, 19530L, 15166L, 19167L, 2539L, 542L, 9054L, 5597L, 5757L, 16754L, 17682L, 15167L, 18203L, 14441L, 657L, 13551L, 18748L, 747L, 14953L, 9903L, 4182L, 1330L, 11094L, 15936L, 17768L, 19642L, 15173L, 14921L, 17090L, 5759L, 12765L, 19579L, 15588L, 12885L, 19581L, 13814L, 17054L, 18697L, 11085L, 19136L, 2764L, 10523L, 17465L, 13960L, 17724L, 11687L))

1 Answer
1

I think you can do this with two gsub statements, based on nchar. I'm making the assumption here that for an 11 digit value in COLUPC, you are always missing a zero on the fourth group. And I called your data frame df1 in my sapply call. If you named yours something different you'll need to swap in your name.

gsub

nchar

COLUPC

df1

sapply

We'll define a helper function to pass to sapply:

sapply

fixUPC <- function(x) if (nchar(x) == 11) # we catch each Component in a separate group, then reconstruct in # our substitution adding zeros and dashes as necessary gsub("^(\d)(\d)(\d5)(\d4)","0\1-0\2-\3-0\4",x) else if (nchar(x) == 12) gsub("^(\d)(\d)(\d5)(\d5)","0\1-0\2-\3-\4",x) df1$fixedUPC <- unlist(sapply(df1$COLUPC, fixUPC)) head(df1$fixedUPC) [1] "07-01-56435-01208" "01-05-23000-00022" "01-08-76921-00012" [4] "07-01-19256-00019" "02-08-38201-02393" "01-03-41002-05574"

You could also extend this to situations where you lost more than 3 characters, i.e. nchar == 10, however, as you lose more numbers, your ability to accurately fill in zeros will depend on how consistent the UPCs were to begin with. For your last example of 11820198, it becomes ambiguous whether the original UPC was 00-01-01820-00198 or 01-01-82019-00008. If you can say with certainty that the zeros always go one place or another for a certain nchar value, you're good to go, but that may not be true in your full data set.

nchar == 10

11820198

00-01-01820-00198

01-01-82019-00008

nchar

Thanks you this does work however it stores the UPC as a list: structure ` $ COLUPC : num 1.09e+10 1.09e+10 1.09e+10 1.09e+10 1.09e+10 ... $ UPC :List of 20015 ..$ : chr "01-00-94390-00008"`
– user113156
10 mins ago

I will add a small reproducible example
– user113156
10 mins ago

Just added unlist() should be okay now
– Mako212
54 secs ago

unlist()

By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

搜尋此網誌

Sfyjdyy

Adding back trailing zeros in an identification column

Adding back trailing zeros in an identification column

1 Answer
1

Popular posts from this blog

make 2 or more post in bootsrap

Store custom data using WC_Cart add_to_cart() method in Woocommerce 3

React Native Navigation and navigating to another Screen problem

Adding back trailing zeros in an identification column

Adding back trailing zeros in an identification column

1 Answer 1

Popular posts from this blog

make 2 or more post in bootsrap

Store custom data using WC_Cart add_to_cart() method in Woocommerce 3

React Native Navigation and navigating to another Screen problem

1 Answer
1