Convert row with columns of data into column with multiple rows
Moderator: DosItHelp
Re: Convert row with columns of data into column with multiple rows
How long are the lines (in characters) at maximum?
If it is < 1023 one could use "set/P" to read each line and a simple for loop to process the lines (avoiding "for/F").
penpen
If it is < 1023 one could use "set/P" to read each line and a simple for loop to process the lines (avoiding "for/F").
penpen
Re: Convert row with columns of data into column with multiple rows
HI PenPen -
Right now, the longest line is 115 characters.
Thanks!
Right now, the longest line is 115 characters.
Thanks!
Re: Convert row with columns of data into column with multiple rows
penpen wrote:How long are the lines (in characters) at maximum?
If it is < 1023 one could use "set/P" to read each line and a simple for loop to process the lines (avoiding "for/F").
penpen
Hmm. So are you thinking a couple of FOR /L commands?
Re: Convert row with columns of data into column with multiple rows
Using the input file given at first post of this thread, this code:
... produce this output:
Antonio
Code: Select all
@echo off
setlocal EnableDelayedExpansion
set "n=0"
(for /F "delims=" %%a in (input.txt) do (
set "line=%%a"
if !n! equ 0 (
call :GetCols
) else (
for /F "tokens=1* delims=," %%b in ("!line!") do set "Name=%%~b" & set "line=%%c"
for /L %%i in (2,1,!n!) do (
for /F "tokens=1* delims=," %%b in ("!line!") do (
echo "!Name!","!Col[%%i]!",%%b
set "line=%%c"
)
)
)
)) > output.txt
goto :EOF
:GetCols
for /F "tokens=1* delims=," %%b in ("!line!") do (
set /A "n+=1"
set "Col[!n!]=%%~b"
set "line=%%c"
)
if defined line goto GetCols
exit /B
... produce this output:
Code: Select all
"PFP-B1033400","Alias","TAK-331 - Bioanalytical Support (GLP)"
"PFP-B1033400","TEXT_JP","Bioanalytical Support (GLP) (TAK-331)"
"PFP-B1033400","Direct Flag","DIRECT"
"PFP-B1033400","IP Owner","JPTP001"
"PFP-B1033400","Modality","Large Molecule"
"PFP-A1043200","Alias","C16028 Japan P2 study MLN9708 RRMM (Ixazomib)"
"PFP-A1043200","TEXT_JP","C16028 Japan P2 study MLN9708 RRMM"
"PFP-A1043200","Direct Flag","DIRECT"
"PFP-A1043200","IP Owner","USTP003"
"PFP-A1043200","Modality","Small Molecule"
"PFP-A1011300","Alias","Norovirus vaccine - DoD (Government grant)"
"PFP-A1011300","TEXT_JP","Norovirus vaccine - DoD (Government grant)"
"PFP-A1011300","Direct Flag","DIRECT"
"PFP-A1011300","IP Owner","USTP013"
"PFP-A1011300","Modality","Vaccine"
"PFP-A1012800","Alias","NOR-327"
"PFP-A1012800","TEXT_JP","NOR-327 "
"PFP-A1012800","Direct Flag","DIRECT"
"PFP-A1012800","IP Owner","USTP013"
"PFP-A1012800","Modality","Vaccine"
Antonio
Re: Convert row with columns of data into column with multiple rows
That is great Antonio. Completely dynamic.
Re: Convert row with columns of data into column with multiple rows
Here is a robust JREPL.BAT solution that supports any valid CSV file as input. It properly handles both commas and escaped quotes within quoted values. It dynamically adapts to any number of columns.
Perhaps later I will explain how it works
EDIT - Explanation added:
Starting with the last line:
Now for an explanation of the variables:
BEG - Initializes global variables that are needed by the replacement code:
BEGLN - Resets the column counter c to 0 at the start of each line
FIND - The regular expression search string that matches a single column, including any leading comma and enclosing quotes. It can be broken up into two components
REPL - The replacement JScript code - where the rubber meets the road
It consists of three distinct statements (expressions) delimited by a semicolons. The value of the last expression is what is used as the replacement value.
Ta Da ... Simple, right
Dave Benham
Perhaps later I will explain how it works
Code: Select all
@echo off
setlocal
set "input=input.csv"
set "output=output.csv"
set "beg=var id,c,col=new Array()"
set "begln=c=0"
set "find=(?:^|,)(\q(\q\q|[^\q])*\q|[^,]*)"
set "repl=if(++c==1)id=$0;if(ln==1)col[c]=$0;ln>2&&c>1?id+col[c]+$0:false"
call jrepl find repl /x /v /jmatch /jbeg beg /jbegln begln /f "%input%" /o "%output%"
EDIT - Explanation added:
Starting with the last line:
- The /X option enables use of \q as an escape sequence for a double quote within the search string
- The /V option indicates that the search string, replace string, /JBEG value, and /JBEGLN values all represent names of variables that contain the actual values
- The /JMATCH option does two things - 1) Causes the result of each match/replace to be printed on a new line, and 2) Interprets the replace string as JScript that operates like a call back function. For each match, the JScript is executed, and the resulting string is used as the replacement value.
- The /JBEG defines JScript code that is run once during initialization
- The /JBEGLN defines JScript code that is run for each line of input prior to performing the search on that line
- The /F and /O options specify the input and output files
Now for an explanation of the variables:
BEG - Initializes global variables that are needed by the replacement code:
- id = holds the first column value of each line
- c = a number representing the current column number
- col = an array of column headers from the first line
BEGLN - Resets the column counter c to 0 at the start of each line
FIND - The regular expression search string that matches a single column, including any leading comma and enclosing quotes. It can be broken up into two components
- Alternation that matches either the beginning of a line, or a comma
- The column value. This is itself an alternation expression that matches either
- A quoted value, consisting of a quote, followed by any number of either paired (escaped) quotes or any non-quote character, followed by a closing quote
- Or an unquoted value, consisting of any number of characters that are not a comma
REPL - The replacement JScript code - where the rubber meets the road
It consists of three distinct statements (expressions) delimited by a semicolons. The value of the last expression is what is used as the replacement value.
- Increment the column counter c, and if the result is 1, then store the matched value in the id variable
- If the line number ln (standard variable available to all user JScript) is 1, then store the column value in the col array
- If the line number is > 2 and the column number is > 1, then return the formatted line, consisting of the id, followed by the column header value, followed by the matched value. Else return false, meaning discard the match
Ta Da ... Simple, right
Dave Benham
Last edited by dbenham on 07 Jul 2016 06:24, edited 7 times in total.
Re: Convert row with columns of data into column with multiple rows
dbenham wrote:Perhaps later I will explain how it works
Evil Indeed.
Re: Convert row with columns of data into column with multiple rows
Wow! Antonio - that works like a charm!! Thank you very much. My only issue is column #, but I'm hoping that is solved using Dave's method!
Dave -
I have my code setup as follows, but no luck in execution:
This looks very promising - I'm just going through your thread now Any ideas on why the above wont execute? In your link, I took all of that code and put that in a batch file named JREPL on my end.
Thanks, Dave.
Dave -
I have my code setup as follows, but no luck in execution:
Code: Select all
@echo off
setlocal
set "input=input.csv"
set "output=output.csv"
set "beg=var id,c,col=new Array()"
set "begln=c=0"
set "find=(?:^|,)(\q(\q\q|[^\q])*\q|[^,]*)"
set "repl=if(++c==1)id=$0;if(ln==1)col[c]=$0;ln>2&&c>1?id+col[c]+$0:false"
call "C:\JREPL.bat" find repl /x /v /jmatch /jbeg beg /jbegln begln /f "%input% /o "%output%"
This looks very promising - I'm just going through your thread now Any ideas on why the above wont execute? In your link, I took all of that code and put that in a batch file named JREPL on my end.
Thanks, Dave.
Re: Convert row with columns of data into column with multiple rows
SIMMS7400 wrote:Wow! Antonio - that works like a charm!! Thank you very much. My only issue is column #...
Perhaps if you explain what is that issue, I could fix it... My code have no limit in the number of columns.
Antonio
Re: Convert row with columns of data into column with multiple rows
Aacini wrote:SIMMS7400 wrote:Wow! Antonio - that works like a charm!! Thank you very much. My only issue is column #...
Perhaps if you explain what is that issue, I could fix it... My code have no limit in the number of columns.
Antonio
Hi Antonio -
My apologies! I need the code to support 58 columns; my example only had 6. Can your code be adapted to support 58? That would be great!
Thank you, Antonio!
Re: Convert row with columns of data into column with multiple rows
SIMMS7400 wrote:Hi Antonio -
My apologies! I need the code to support 58 columns; my example only had 6. Can your code be adapted to support 58? That would be great!
Thank you, Antonio!
My code don't need to be "adapted" to any number of columns, because my code have no limit in the number of columns...
Did you tested my code with your 58 columns?
Antonio
Re: Convert row with columns of data into column with multiple rows
SIMMS7400 wrote:Dave - ... This looks very promising - I'm just going through your thread now Any ideas on why the above wont execute? In your link, I took all of that code and put that in a batch file named JREPL on my end.
The code was fully tested and working with hard coded file names, but when I replaced them with variables, I forgot a quote with the /F parameter.
I fixed my post. Here is your problem line fixed:
Code: Select all
call "C:\JREPL.bat" find repl /x /v /jmatch /jbeg beg /jbegln begln /f "%input%" /o "%output%"
Make sure you change the values of INPUT and OUTPUT at the top of the script to match your situation. Or you can simply put the file names directly in the JREPL call. The only reason I used variables was for "documentation"
Dave Benham
Re: Convert row with columns of data into column with multiple rows
Aacini wrote:SIMMS7400 wrote:Hi Antonio -
My apologies! I need the code to support 58 columns; my example only had 6. Can your code be adapted to support 58? That would be great!
Thank you, Antonio!
My code don't need to be "adapted" to any number of columns, because my code have no limit in the number of columns...
Did you tested my code with your 58 columns?
Antonio
Antonio -
You are right! Wow! This code works like an absolute charm!!!!!!
Thank you so much! And thank you to all that helped me here! Dave, still interested to understand how your solution works as well.
Thank you again!!
Re: Convert row with columns of data into column with multiple rows
Hi Dave -
Sorry - you posted as I wrote my last response.
I just tested it out and it works perfect! WOW!! Absolutely incredible! However, I do notice a difference in total number of lines between your solution & Antonio's.
I'll do some extensive data validation tomorrow to understand those difference and report back. Perhaps it's nothing to worry about.
Thanks again, folks!
Sorry - you posted as I wrote my last response.
I just tested it out and it works perfect! WOW!! Absolutely incredible! However, I do notice a difference in total number of lines between your solution & Antonio's.
I'll do some extensive data validation tomorrow to understand those difference and report back. Perhaps it's nothing to worry about.
Thanks again, folks!
Re: Convert row with columns of data into column with multiple rows
SIMMS7400 wrote: I do notice a difference in total number of lines between your solution & Antonio's.
I'll do some extensive data validation tomorrow to understand those difference and report back. Perhaps it's nothing to worry about.
Well some simple math should tell you how many lines you should have.
Number of records on input file x 57 = total output lines.