Page 1 of 1

Formatting a CSV file and sorting the values

Posted: 21 Sep 2015 08:23
by shaswat
Hi Team,

I am trying to sort the values of a CSV file but as some values in the CSV files are covered with double quotes and comma. So the resulted output is not what I am expecting. Please find the script, input file, output file and my expected output file. Please suggest me the change that is required in my script to get the desired output.

My Code:

Code: Select all

@echo off
for /f "tokens=1,2,3,4,5,6 delims=," %%a in (test.csv) do (
   echo "%%a","%%b","%%c","new_addition","%%d","%%e","%%f">>test_out.csv
)

Input File:

Code: Select all

sam,austraila,richard,AUS,sydney,"Associate, consultant"
shaswat,"san, matio",don,IN,"US, City",SE

Output file that is generated from the above code:

Code: Select all

""Associate","sam","sydney","LINE_TEST","austraila","AUS","richard"
""US","shaswat","IN","LINE_TEST",""san","don"," matio""

Desired output file:

Code: Select all

"Associate, consultant","sam","sydney","LINE_TEST","austraila","AUS","richard"
"SE","shaswat","US, City","LINE_TEST","san, matio","IN","don"

Thanks a lot for your help and time.

Regards,
Shaswat

Re: Formatting a CSV file and sorting the values

Posted: 21 Sep 2015 09:44
by Squashman
Your code and examples are not correct. There is no way you would get that output using that input from the code you are using. I think you are also confusing the word SORT with change the order.

This is what I get for output using your code and input.

Code: Select all

"sam","austraila","richard","new_addition","AUS","sydney",""Associate"
"shaswat",""san"," matio"","new_addition","don","IN",""US"

Re: Formatting a CSV file and sorting the values

Posted: 21 Sep 2015 10:24
by Squashman
Because you have commas embedded in your actual data, you will need to use an additional script to parse the data correctly.
Dave has written an excellent script to do just that. Copy his code for parsecsv.bat from here.
viewtopic.php?f=3&t=5702

Then use this for your batch file. Again, I am assuming you want to change the order of the fields, not sort the fields.

Code: Select all

@echo off

for /f "tokens=1-6 delims=|" %%a in ('parseCSV.bat "/o:|" ^<test.csv') do (
   echo "%%~f","%%~a","%%~e","new_addition","%%~b","%%~d","%%~c">>test_out.csv
)

Re: Formatting a CSV file and sorting the values

Posted: 21 Sep 2015 10:44
by Aacini
Try this:

Code: Select all

@echo off
setlocal EnableDelayedExpansion

set "new_addition=LINE_TEST"

(for /F "delims=" %%a in (test.csv) do (
   set i=0
   for %%t in (%%a) do (
      set /A i+=1
      set "t!i!=%%~t"
   )
   echo "!t6!","!t1!","!t5!","%new_addition%","!t2!","!t4!","!t3!"
)) > test_out.csv

This program fail if wildcard characters (asterisk or question-mark) are included in the fields. Also, it remove exclamation marks.

Antonio

Re: Formatting a CSV file and sorting the values

Posted: 21 Sep 2015 12:10
by Squashman
Still baffles me today that the FOR command respects the quotes around delimited data but FOR /F does not!

Re: Formatting a CSV file and sorting the values

Posted: 21 Sep 2015 16:04
by penpen
The "for/F" command still bases on the default parsing of the referenced set in the same way "for" does:

Code: Select all

@echo off
setlocal enableExtensions enableDelayedExpansion
>abc.txt echo(a b c
>123.txt echo(1 2 3
for /F "tokens=1-3" %%a in (abc.txt 123.txt "footer1 footer2 footer3") do echo 1[%%a] 2[%%b] 3[%%c]
del abc.txt 123.txt
endlocal
goto :eof

:: Output
1[a] 2[b] 3[c]
1[1] 2[2] 3[3]
1[footer1] 2[footer2] 3[footer3]
The drawback is: It is extended, so after a doublequote all following is handled as one string up to the next (doublequote, closing bracket) combination; nevertheless all newlines in this "string" will be removed.
You may use "usebackq" so all is handled as a file token by "for/F".


penpen

Re: Formatting a CSV file and sorting the values

Posted: 22 Sep 2015 03:05
by shaswat
Aacini wrote:Try this:

This program fail if wildcard characters (asterisk or question-mark) are included in the fields. Also, it remove exclamation marks.

Antonio

Thanks a lot Antonio. This code did the work for me. I was completely blank what to do with this conversion, and your code completely solved my issue. Thanks for you time and help..!! :)

Re: Formatting a CSV file and sorting the values

Posted: 05 Oct 2015 08:33
by shaswat
Aacini wrote:This program fail if wildcard characters (asterisk or question-mark) are included in the fields. Also, it remove exclamation marks.

Antonio

Hi Antonio,
Your code helped me a lot for resolving my issue, but currently I am facing some issue while the input file format is slightly changes. You can find the script, input file, output file and my desired output file. Currently with the input I am not getting the correct required output file. Please suggest the changes required in the script so that I can get the desired output. Thanks a lot for your help.
Script:

Code: Select all

@echo off
setlocal EnableDelayedExpansion
set "new_addition=LINE_TEST"
(for /F "delims=" %%a in (test.csv) do (
   set i=0
   for %%t in (%%a) do (
      set /A i+=1
      set "t!i!=%%~t"
   )
   echo "!t6!","!t1!","!t5!","%new_addition%","garbage","!t2!","!t4!","!t3!"
)) > test_out_sample.csv

Input file:

Code: Select all

sam,aus traila,richard,AUS,sydney,"Associate, consultant"
shas wat,"san, matio",don,IN,"US, City",SE

Output file:

Code: Select all

"sydney","sam","AUS","LINE_TEST","garbage","aus","richard","traila"
"US, City","shas","IN","LINE_TEST","garbage","wat","don","san, matio"

Desired Output file:

Code: Select all

"Associate, consultant","sam","sydney","LINE_TEST","garbage","aus traila","AUS","richard"
"SE","shas wat","US, City","LINE_TEST","garbage","san, matio","IN","don"


My output is getting wrong if there is a space in between a token like (if australia is changed to aus traila my output is wrong, it is taking this as two different tokens, aus and traila) but I need this as a single token as "aus traila"

Please help in resolving my issue.

Re: Formatting a CSV file and sorting the values

Posted: 05 Oct 2015 08:41
by Squashman
shaswat wrote:My output is getting wrong if there is a space in between a token like (if australia is changed to aus traila my output is wrong, it is taking this as two different tokens, aus and traila) but I need this as a single token as "aus traila"

Please help in resolving my issue.

Did you ever take the time to try my suggestion above.
viewtopic.php?p=42985#p42985

Since your new revised input data has spaces in it and is not surrounded by quotes, the existing code you are using is not going to work.

Re: Formatting a CSV file and sorting the values

Posted: 05 Oct 2015 11:23
by shaswat
Squashman wrote:Since your new revised input data has spaces in it and is not surrounded by quotes, the existing code you are using is not going to work.

Hi Squashman,

I am currently modifying my script with the help of the link you have provided, but somehow I am still not getting the desired output. I'll do some modification with the script and will post it tomorrow if I find or didn't find the solution.

Thanks everyone for your help and support.

Re: Formatting a CSV file and sorting the values

Posted: 06 Oct 2015 06:22
by shaswat

Hi,
I have tried by using the help of the script in the link, please find my script, input and output file.
Script:

Code: Select all

@echo off
setlocal enableDelayedExpansion

:: Define LF to contain a linefeed (0x0A) character
set ^"LF=^

^" The empty line above is critical - DO NOT REMOVE

(for /f "tokens=1,2,3,4,5,6 delims=," %%A in ('parseCSV /e /d ^<input.csv') do (
  %== Load columns with encoded values. The trailing ! is important ==%
  set "A=%%~A"!
  set "B=%%~B"!
  set "C=%%~C"!
  set "D=%%~D"!
  set "E=%%~E"!
  set "F=%%~F"!
  %== Decode values ==%
  for %%L in ("!LF!") do for %%V in (A B C D E F) do if defined %%V (
    set "%%V=!%%V:\N=%%~L!"
    set "%%V=!%%V:\D=,!"
    set "%%V=!%%V:\S=\!"
  )
  %== Print results ==%
  for %%V in (A B C D E F) do echo(!%%V!
  echo(
))>output.csv

My Input File:

Code: Select all

sam,aus traila,richard,AUS,sydney,"Associate, consultant"
shas wat,"san, matio",van de parse,IN,"US, City",SE

Currently I'm getting this as my output:

Code: Select all

sam
aus traila
richard
AUS
sydney
Associate, consultant

shas wat
san, matio
van de parse
IN
US, City
SE

Desired Output

Code: Select all

"Associate, consultant","sam","aus traila","richard","sydney","AUS"
"SE","shas wat","san, matio","van de parse","US, City","IN"

I am just lagging somewhere and not able to find it out exactly where do I need to make the changes to have my output file in this expected order. Please suggest the changes required in the script to get the resulted output file. Thanks...!!

Re: Formatting a CSV file and sorting the values

Posted: 06 Oct 2015 06:47
by Squashman
Wow. You were way over thinking that. Some times you need to step back and keep it simple. All you had to do was add your variables to my code.

Code: Select all

@echo off
set "new_addition=LINE_TEST"
for /f "tokens=1-6 delims=|" %%a in ('parseCSV.bat "/o:|" ^<test.csv') do (
   echo "%%~f","%%~a","%%~e","%new_addition%","garbage","%%~b","%%~d","%%~c">>test_out.csv
)

Re: Formatting a CSV file and sorting the values

Posted: 06 Oct 2015 17:09
by Aacini
This method change spaces by another character that is not a delimiter (in this case an underscore) before pass the values to the FOR, and then change back underscores by spaces. If the underscore may appear in the file, just select another unused character.

Code: Select all

@echo off
setlocal EnableDelayedExpansion

set "new_addition=LINE_TEST"

(for /F "delims=" %%a in (test.csv) do (
   set "line=%%a"
   set i=0
   for %%t in (!line: ^=_!) do (
      set "token=%%~t"
      set /A i+=1
      set "t!i!=!token:_= !"
   )
   echo "!t6!","!t1!","!t5!","%new_addition%","garbage","!t2!","!t4!","!t3!"
)) > test_out.csv

Antonio