Convert row with columns of data into column with multiple rows

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

Message
Author
penpen
Expert
Posts: 2009
Joined: 23 Jun 2013 06:15
Location: Germany

Re: Convert row with columns of data into column with multiple rows

#16 Post by penpen » 06 Jul 2016 13:56

How long are the lines (in characters) at maximum?

If it is < 1023 one could use "set/P" to read each line and a simple for loop to process the lines (avoiding "for/F").


penpen

SIMMS7400
Posts: 546
Joined: 07 Jan 2016 07:47

Re: Convert row with columns of data into column with multiple rows

#17 Post by SIMMS7400 » 06 Jul 2016 14:01

HI PenPen -

Right now, the longest line is 115 characters.

Thanks!

Squashman
Expert
Posts: 4486
Joined: 23 Dec 2011 13:59

Re: Convert row with columns of data into column with multiple rows

#18 Post by Squashman » 06 Jul 2016 14:59

penpen wrote:How long are the lines (in characters) at maximum?

If it is < 1023 one could use "set/P" to read each line and a simple for loop to process the lines (avoiding "for/F").


penpen

Hmm. So are you thinking a couple of FOR /L commands?

Aacini
Expert
Posts: 1914
Joined: 06 Dec 2011 22:15
Location: México City, México
Contact:

Re: Convert row with columns of data into column with multiple rows

#19 Post by Aacini » 06 Jul 2016 15:29

Using the input file given at first post of this thread, this code:

Code: Select all

@echo off
setlocal EnableDelayedExpansion

set "n=0"
(for /F "delims=" %%a in (input.txt) do (
   set "line=%%a"
   if !n! equ 0 (
      call :GetCols
   ) else (
      for /F "tokens=1* delims=," %%b in ("!line!") do set "Name=%%~b" & set "line=%%c"
      for /L %%i in (2,1,!n!) do (
         for /F "tokens=1* delims=," %%b in ("!line!") do (
            echo "!Name!","!Col[%%i]!",%%b
            set "line=%%c"
         )
      )
   )
)) > output.txt
goto :EOF

:GetCols
   for /F "tokens=1* delims=," %%b in ("!line!") do (
      set /A "n+=1"
      set "Col[!n!]=%%~b"
      set "line=%%c"
   )
if defined line goto GetCols
exit /B

... produce this output:

Code: Select all

"PFP-B1033400","Alias","TAK-331 - Bioanalytical Support (GLP)"
"PFP-B1033400","TEXT_JP","Bioanalytical Support (GLP) (TAK-331)"
"PFP-B1033400","Direct Flag","DIRECT"
"PFP-B1033400","IP Owner","JPTP001"
"PFP-B1033400","Modality","Large Molecule"
"PFP-A1043200","Alias","C16028 Japan P2 study MLN9708 RRMM (Ixazomib)"
"PFP-A1043200","TEXT_JP","C16028 Japan P2 study MLN9708 RRMM"
"PFP-A1043200","Direct Flag","DIRECT"
"PFP-A1043200","IP Owner","USTP003"
"PFP-A1043200","Modality","Small Molecule"
"PFP-A1011300","Alias","Norovirus vaccine - DoD (Government grant)"
"PFP-A1011300","TEXT_JP","Norovirus vaccine - DoD (Government grant)"
"PFP-A1011300","Direct Flag","DIRECT"
"PFP-A1011300","IP Owner","USTP013"
"PFP-A1011300","Modality","Vaccine"
"PFP-A1012800","Alias","NOR-327"
"PFP-A1012800","TEXT_JP","NOR-327 "
"PFP-A1012800","Direct Flag","DIRECT"
"PFP-A1012800","IP Owner","USTP013"
"PFP-A1012800","Modality","Vaccine"

Antonio

Squashman
Expert
Posts: 4486
Joined: 23 Dec 2011 13:59

Re: Convert row with columns of data into column with multiple rows

#20 Post by Squashman » 06 Jul 2016 15:40

That is great Antonio. Completely dynamic.

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Convert row with columns of data into column with multiple rows

#21 Post by dbenham » 06 Jul 2016 16:48

Here is a robust JREPL.BAT solution that supports any valid CSV file as input. It properly handles both commas and escaped quotes within quoted values. It dynamically adapts to any number of columns.

Perhaps later I will explain how it works :twisted:

Code: Select all

@echo off
setlocal

set "input=input.csv"
set "output=output.csv"

set "beg=var id,c,col=new Array()"
set "begln=c=0"
set "find=(?:^|,)(\q(\q\q|[^\q])*\q|[^,]*)"
set "repl=if(++c==1)id=$0;if(ln==1)col[c]=$0;ln>2&&c>1?id+col[c]+$0:false"

call jrepl find repl /x /v /jmatch /jbeg beg /jbegln begln /f "%input%" /o "%output%"

EDIT - Explanation added:

Starting with the last line:
  • The /X option enables use of \q as an escape sequence for a double quote within the search string
  • The /V option indicates that the search string, replace string, /JBEG value, and /JBEGLN values all represent names of variables that contain the actual values
  • The /JMATCH option does two things - 1) Causes the result of each match/replace to be printed on a new line, and 2) Interprets the replace string as JScript that operates like a call back function. For each match, the JScript is executed, and the resulting string is used as the replacement value.
  • The /JBEG defines JScript code that is run once during initialization
  • The /JBEGLN defines JScript code that is run for each line of input prior to performing the search on that line
  • The /F and /O options specify the input and output files

Now for an explanation of the variables:

BEG - Initializes global variables that are needed by the replacement code:
  • id = holds the first column value of each line
  • c = a number representing the current column number
  • col = an array of column headers from the first line

BEGLN - Resets the column counter c to 0 at the start of each line

FIND - The regular expression search string that matches a single column, including any leading comma and enclosing quotes. It can be broken up into two components
  • Alternation that matches either the beginning of a line, or a comma
  • The column value. This is itself an alternation expression that matches either
    • A quoted value, consisting of a quote, followed by any number of either paired (escaped) quotes or any non-quote character, followed by a closing quote
    • Or an unquoted value, consisting of any number of characters that are not a comma

REPL - The replacement JScript code - where the rubber meets the road :D
It consists of three distinct statements (expressions) delimited by a semicolons. The value of the last expression is what is used as the replacement value.
  • Increment the column counter c, and if the result is 1, then store the matched value in the id variable
  • If the line number ln (standard variable available to all user JScript) is 1, then store the column value in the col array
  • If the line number is > 2 and the column number is > 1, then return the formatted line, consisting of the id, followed by the column header value, followed by the matched value. Else return false, meaning discard the match

Ta Da ... Simple, right :?: :wink:


Dave Benham
Last edited by dbenham on 07 Jul 2016 06:24, edited 7 times in total.

Squashman
Expert
Posts: 4486
Joined: 23 Dec 2011 13:59

Re: Convert row with columns of data into column with multiple rows

#22 Post by Squashman » 06 Jul 2016 16:58

dbenham wrote:Perhaps later I will explain how it works :twisted:

Evil Indeed.

SIMMS7400
Posts: 546
Joined: 07 Jan 2016 07:47

Re: Convert row with columns of data into column with multiple rows

#23 Post by SIMMS7400 » 06 Jul 2016 17:23

Wow! Antonio - that works like a charm!! Thank you very much. My only issue is column #, but I'm hoping that is solved using Dave's method!

Dave -

I have my code setup as follows, but no luck in execution:

Code: Select all

@echo off
setlocal

set "input=input.csv"
set "output=output.csv"

set "beg=var id,c,col=new Array()"
set "begln=c=0"
set "find=(?:^|,)(\q(\q\q|[^\q])*\q|[^,]*)"
set "repl=if(++c==1)id=$0;if(ln==1)col[c]=$0;ln>2&&c>1?id+col[c]+$0:false"

call "C:\JREPL.bat" find repl /x /v /jmatch /jbeg beg /jbegln begln /f "%input% /o "%output%"


This looks very promising - I'm just going through your thread now :) Any ideas on why the above wont execute? In your link, I took all of that code and put that in a batch file named JREPL on my end.

Thanks, Dave.

Aacini
Expert
Posts: 1914
Joined: 06 Dec 2011 22:15
Location: México City, México
Contact:

Re: Convert row with columns of data into column with multiple rows

#24 Post by Aacini » 06 Jul 2016 17:30

SIMMS7400 wrote:Wow! Antonio - that works like a charm!! Thank you very much. My only issue is column #...

Perhaps if you explain what is that issue, I could fix it... My code have no limit in the number of columns.

Antonio

SIMMS7400
Posts: 546
Joined: 07 Jan 2016 07:47

Re: Convert row with columns of data into column with multiple rows

#25 Post by SIMMS7400 » 06 Jul 2016 17:34

Aacini wrote:
SIMMS7400 wrote:Wow! Antonio - that works like a charm!! Thank you very much. My only issue is column #...

Perhaps if you explain what is that issue, I could fix it... My code have no limit in the number of columns.

Antonio


Hi Antonio -

My apologies! I need the code to support 58 columns; my example only had 6. Can your code be adapted to support 58? That would be great!

Thank you, Antonio!

Aacini
Expert
Posts: 1914
Joined: 06 Dec 2011 22:15
Location: México City, México
Contact:

Re: Convert row with columns of data into column with multiple rows

#26 Post by Aacini » 06 Jul 2016 17:39

SIMMS7400 wrote:Hi Antonio -

My apologies! I need the code to support 58 columns; my example only had 6. Can your code be adapted to support 58? That would be great!

Thank you, Antonio!

My code don't need to be "adapted" to any number of columns, because my code have no limit in the number of columns...

Did you tested my code with your 58 columns?

Antonio

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Convert row with columns of data into column with multiple rows

#27 Post by dbenham » 06 Jul 2016 17:55

SIMMS7400 wrote:Dave - ... This looks very promising - I'm just going through your thread now :) Any ideas on why the above wont execute? In your link, I took all of that code and put that in a batch file named JREPL on my end.

The code was fully tested and working with hard coded file names, but when I replaced them with variables, I forgot a quote with the /F parameter. :oops:

I fixed my post. Here is your problem line fixed:

Code: Select all

call "C:\JREPL.bat" find repl /x /v /jmatch /jbeg beg /jbegln begln /f "%input%" /o "%output%"

Make sure you change the values of INPUT and OUTPUT at the top of the script to match your situation. Or you can simply put the file names directly in the JREPL call. The only reason I used variables was for "documentation"


Dave Benham

SIMMS7400
Posts: 546
Joined: 07 Jan 2016 07:47

Re: Convert row with columns of data into column with multiple rows

#28 Post by SIMMS7400 » 06 Jul 2016 18:15

Aacini wrote:
SIMMS7400 wrote:Hi Antonio -

My apologies! I need the code to support 58 columns; my example only had 6. Can your code be adapted to support 58? That would be great!

Thank you, Antonio!

My code don't need to be "adapted" to any number of columns, because my code have no limit in the number of columns...

Did you tested my code with your 58 columns?

Antonio


Antonio -

You are right! Wow! This code works like an absolute charm!!!!!!

Thank you so much! And thank you to all that helped me here! Dave, still interested to understand how your solution works as well.

Thank you again!!

SIMMS7400
Posts: 546
Joined: 07 Jan 2016 07:47

Re: Convert row with columns of data into column with multiple rows

#29 Post by SIMMS7400 » 06 Jul 2016 18:20

Hi Dave -

Sorry - you posted as I wrote my last response.

I just tested it out and it works perfect! WOW!! Absolutely incredible! However, I do notice a difference in total number of lines between your solution & Antonio's.

I'll do some extensive data validation tomorrow to understand those difference and report back. Perhaps it's nothing to worry about.

Thanks again, folks!

Squashman
Expert
Posts: 4486
Joined: 23 Dec 2011 13:59

Re: Convert row with columns of data into column with multiple rows

#30 Post by Squashman » 06 Jul 2016 18:55

SIMMS7400 wrote: I do notice a difference in total number of lines between your solution & Antonio's.

I'll do some extensive data validation tomorrow to understand those difference and report back. Perhaps it's nothing to worry about.

Well some simple math should tell you how many lines you should have.

Number of records on input file x 57 = total output lines.

Post Reply