How to standardize a column to be able to parse into multiple columns?

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP



How to standardize a column to be able to parse into multiple columns?



There are a lot of posts on how to split a single column value into multiple columns, however I am having trouble standardizing data in the column to be able to use those techniques. What is the best approach for standardizing the below example?



For example:


**emails**
first_email,second_email
third_email; fourth_email
fifth_email ;sixth_email
seventh_email, eight_email
ninth_email.tenth_email



My expected output from standardization is:


**emails**
first_email, second_email
third_email, fourth_email
fifth_email, sixth_email
seventh_email, eight_email
ninth_email, tenth_email





If possible, I would highly suggest changing your database design to store only a single email in each row. Another issue is that periods are allowed characters in email addresses. So splitting on them wouldn't necessarily produce your expected results.
– sgeddes
Aug 10 at 17:03





A comma or semicolon are valid separators for multiple email, but a period is not (since it is commonly used as first.last@domain.com). Is . really a separator between two email addresses?, or is . part of a valid email address?
– James L.
Aug 10 at 17:11


first.last@domain.com


.


.





I would suggest using regular expressions to parse the list. I doubt it's possible without regexp.
– Ildar Akhmetov
Aug 10 at 17:12





Are there other email where the . is part of the email address, or are all . used to separate two email addresses?
– James L.
Aug 10 at 17:16


.


.





How many records are we talking about? Can you manually sort through a few stragglers if necessary? The , and ; are no problem, the . on the other hand will cause you a lot of grief.
– Dave Cullum
Aug 10 at 17:21


,


;


.




1 Answer
1



This shows how you can normalize the formatting, but because the . character is sometimes used in an email address and other times used to separate email addresses, you should probably not include the replace for the ., and just work through those manually.


.


.



You can used a nested replace() to eliminate the spaces first, then replace the valid email address separators with ,[space].


replace()


,[space]


declare @str varchar(max) = 'first_email,second_email
third_email; fourth_email
fifth_email ;sixth_email
seventh_email, eight_email
ninth_email.tenth_email'

select replace(replace(replace(replace(
@str, ' ', ''), -- a space is not valid in an email
',', ', '),
';', ', '),
'.', ', '
)



Which returns:


first_email, second_email
third_email, fourth_email
fifth_email, sixth_email
seventh_email, eight_email
ninth_email, tenth_email





Replacing '.', ', ' will not work with fn.ln@provider.com.
– Olivier Jacot-Descombes
Aug 10 at 17:32


'.', ', '


fn.ln@provider.com





Yes, that is why I said she should not include the replace for .. She should really clean those records up first and exclude the '.', ', ' from the the final script.
– James L.
Aug 10 at 17:49


.


'.', ', '






By clicking "Post Your Answer", you acknowledge that you have read our updated terms of service, privacy policy and cookie policy, and that your continued use of the website is subject to these policies.

Popular posts from this blog

Firebase Auth - with Email and Password - Check user already registered

Dynamically update html content plain JS

How to determine optimal route across keyboard