Posted by: Hydrogen Sulphide January 17, 2016
Does anyone know how to do this in the R programming language?
I have a data frame called mydf. I want to split the contents of the ASM and GPM columns according to the format given in the FORMAT column. The FORMAT keys are separated by ":", and there are 5 unique keys across all rows (GT, FT, GQ, PS, GL), so the result should have one column per unique key for each of ASM and GPM, named ASM.GT, ASM.FT, and so on. Each value then needs to go into the column that matches its key in that row's FORMAT string.

mydf <- structure(list(`#CHROM` = c(1L, 1L, 1L), POS = c(10490L, 10493L,
10494L), FORMAT = c("GT:FT:GQ", "GT:PS:GL", "GT:PS:FT"), ASM = c("1/1:TRUE:4,2,333",
"./.:.:.", "0/1:.:VQLOW"), GPM = c("./.:.:.", "1/1:4:2,233",
"0/1:22:VQHIGH")), .Names = c("#CHROM", "POS", "FORMAT", "ASM",
"GPM"), class = "data.frame", row.names = c(NA, -3L))
Expected result:

result <- structure(list(`#CHROM` = c(1L, 1L, 1L), POS = c(10490L, 10493L,
10494L), FORMAT = c("GT:FT:GQ", "GT:PS:GL", "GT:PS:FT"), ASM.GT = c("1/1",
"./.", "0/1"), ASM.FT = c("TRUE", NA, "VQLOW"), ASM.GQ = c("4,2,333",
NA, NA), ASM.PS = c(NA, NA, NA), ASM.GL = c(NA, NA, NA), GPM.GT = c("./.",
"1/1", "0/1"), GPM.FT = c(NA, NA, "VQHIGH"), GPM.GQ = c(NA, NA,
NA), GPM.PS = c(NA, 4L, 22L), GPM.GL = c(NA, 2233L, NA)), .Names = c("#CHROM",
"POS", "FORMAT", "ASM.GT", "ASM.FT", "ASM.GQ", "ASM.PS", "ASM.GL",
"GPM.GT", "GPM.FT", "GPM.GQ", "GPM.PS", "GPM.GL"), class = "data.frame", row.names = c(NA,
-3L))
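
One possible approach, as a minimal sketch in base R rather than a definitive answer: split FORMAT and each sample column on ":", then use match() to drop every value into the column for its key. The function name split_by_format and its arguments are made up for illustration. The sketch assumes every ASM/GPM entry has exactly as many fields as its FORMAT string, treats a lone "." as missing, and leaves all new columns as character (so "4" and "2,233" stay strings instead of becoming the numbers shown in the expected result).

```r
# Input data from the question, rebuilt with data.frame() so the block runs on its own
mydf <- data.frame(
  `#CHROM` = c(1L, 1L, 1L),
  POS      = c(10490L, 10493L, 10494L),
  FORMAT   = c("GT:FT:GQ", "GT:PS:GL", "GT:PS:FT"),
  ASM      = c("1/1:TRUE:4,2,333", "./.:.:.", "0/1:.:VQLOW"),
  GPM      = c("./.:.:.", "1/1:4:2,233", "0/1:22:VQHIGH"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Hypothetical helper: expand each column in `cols` into one column per FORMAT key
split_by_format <- function(df, cols, format_col = "FORMAT") {
  fmt  <- strsplit(df[[format_col]], ":", fixed = TRUE)
  keys <- unique(unlist(fmt))            # keys in order of first appearance
  out  <- df[setdiff(names(df), cols)]   # keep the untouched columns

  for (col in cols) {
    vals <- strsplit(df[[col]], ":", fixed = TRUE)
    # Assumption: each entry has the same number of fields as its FORMAT string
    stopifnot(lengths(vals) == lengths(fmt))

    # Character matrix: one row per record, one column per key, NA by default
    m <- matrix(NA_character_, nrow = nrow(df), ncol = length(keys),
                dimnames = list(NULL, paste(col, keys, sep = ".")))
    for (i in seq_len(nrow(df))) {
      m[i, match(fmt[[i]], keys)] <- vals[[i]]
    }
    m[which(m == ".")] <- NA             # a lone "." means missing; "./." is kept

    out <- cbind(out, as.data.frame(m, stringsAsFactors = FALSE))
  }
  out
}

res <- split_by_format(mydf, c("ASM", "GPM"))
print(res)
```

If numeric columns are needed afterwards, something like as.integer(res$GPM.PS) can be applied per column. How a value such as "2,233" should be converted depends on what the field means, so the sketch leaves that decision open.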