Replace with a column (advanced batch)

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

Message
Author
zimxavier
Posts: 53
Joined: 17 Jan 2016 10:09
Location: France

Replace with a column (advanced batch)

#1 Post by zimxavier » 08 Jun 2016 08:47

I have two folders with many csv files with the same name, but not the same content.

FOLDER1 : file1.csv, file2.csv,...
FOLDER2 : file1.csv, file2.csv,...

file1.csv
[i]#CODE;ENGLISH;FRENCH;GERMAN;SPANISH;
CODE1;IN ENGLISH;IN FRENCH;IN GERMAN;IN SPANISH;
CODE2;IN ENGLISH2;IN FRENCH2;IN GERMAN2;IN SPANISH2;
...
file2.csv
[i]#CODE;ENGLISH;FRENCH;GERMAN;SPANISH;
CODE1;IN ENGLISH;IN FRENCH;IN GERMAN;IN SPANISH;
CODE2;IN ENGLISH2;IN FRENCH2;IN GERMAN2;IN SPANISH2;
...


I need to replace all characters between second semi-colon and third semi-colon from line 1 (file1.csv, FOLDER1) with all characters between second semi-colon and third semi-colon from line 1 (file1.csv, FOLDER2)
Then idem with line 2, etc.
Then between file2.csv FOLDER1) and file2.csv (FOLDER2), etc.
=> Instead of replacing folder1 with folder2 I need to replace only the third column ("FRENCH").

Usually we can do it with Excel with columns (file by file) but in my case I can't do it (Excel change some values and it takes too much time...)

Thank you for any help :-)

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: Replace with a column (advanced batch)

#2 Post by foxidrive » 08 Jun 2016 10:34

It helps me to understand if you give clear examples of before and after files.

Please use the actual text in your files, but disguise personal information, because character sets can change the code that will work.

zimxavier
Posts: 53
Joined: 17 Jan 2016 10:09
Location: France

Re: Replace with a column (advanced batch)

#3 Post by zimxavier » 08 Jun 2016 12:19

okay i will try to be clear :)
and thank you!

An example with one file in each folder (the two files have the same name - text1.csv)

BEFORE
Folder1
text1.csv
BARBER_TOOLTIP;Go to the Barber;Va cher le coiffeur;Zum Barbier gehen;;Ir al Peluquero;;;;;;;;;x
RENAME_TITLE_TOOLTIP;Rename Title;Renommer titre;Titel umbenennen;;Renombrar Título;;;;;;;;;x

Folder2
text1.csv
BARBER_TOOLTIP;Go to the Barber;Aller chez le barbier;dytntynyt;;ytndytnytn;;;;;;;;;x
RENAME_TITLE_TOOLTIP;Rename Title;Renommer le titre;dynytny;;Rytnytnytnytn;;;;;;;;;x


AFTER
Folder1
text1.csv
BARBER_TOOLTIP;Go to the Barber;Aller chez le barbier;Zum Barbier gehen;;Ir al Peluquero;;;;;;;;;x
RENAME_TITLE_TOOLTIP;Rename Title;Renommer le titre;Titel umbenennen;;Renombrar Título;;;;;;;;;x

No change in Folder2.

Semi-colons are separators, so it is between the second and the third semi-colon.
Files with the same name have lines in the same order.
I need a batch working for many files. Same principle.

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: Replace with a column (advanced batch)

#4 Post by foxidrive » 08 Jun 2016 15:54

Thank you, that is clear.

The other thing that is needed is the folder structure where the files are found, and where the files can be expected to be found.

A script can be written for "folder1" and "folder2"
but if the files can be in "main\folder1" and "main\apple\folder2" then different code is needed.

Without good details then someone may have to write and change the code two or three times.

aGerman
Expert
Posts: 4678
Joined: 22 Jan 2010 18:01
Location: Germany

Re: Replace with a column (advanced batch)

#5 Post by aGerman » 08 Jun 2016 16:31

Strongly advised: Before you run the batch code backup your CSV files!

Rules:
The text has to be single-byte encoded (no UTF-8 or the like).
Lines must not begin with a semicolon.
Multiple semicolons must not appear directly after one of the first 3 tokens.
Not only the order of the lines must be the same in the corresponding CSV files, also the number of lines and the line number of a certain entry must be the same.
All of the CSV files are saved directly in folder1 or folder2. Files in subfolders are ignored.

Code: Select all

@echo off &setlocal

set "dir1=folder1"
set "dir2=folder2"

for %%i in ("%dir1%\*.csv") do if exist "%dir2%\%%~nxi" (
  <"%dir2%\%%~nxi" >"%dir1%\csv.~tmp" (
    for /f delims^=^ eol^= %%j in ('findstr /n "^" "%%~i"') do (
      set "line1=%%j"
      set "line2="&set /p "line2="
      if not defined line2 (
        echo(
      ) else (
        setlocal EnableDelayedExpansion
        for /f "tokens=1-3* delims=; eol=" %%k in ("!line1:*:=!") do (
          for /f "tokens=3 delims=; eol=" %%M in ("!line2!") do (
            endlocal
            set "token1=%%k"&set "token2=%%l"&set "token3=%%M"&set "token4=%%n"
          )
        )
        setlocal EnableDelayedExpansion
        echo !token1!;!token2!;!token3!;!token4!
        endlocal
      )
    )
  )
  move /y "%dir1%\csv.~tmp" "%%~i"
)

Regards
aGerman

Aacini
Expert
Posts: 1914
Joined: 06 Dec 2011 22:15
Location: México City, México
Contact:

Re: Replace with a column (advanced batch)

#6 Post by Aacini » 08 Jun 2016 16:45

Another one!

Code: Select all

@echo off
setlocal EnableDelayedExpansion

cd Folder1
for %%f in (*.csv) do (
   < "..\Folder2\%%f" (for /F "usebackq tokens=1-3* delims=;" %%a in ("%%f") do (
      set /P "line2="
      for /F "tokens=3 delims=;" %%c in ("!line2!") do echo %%a;%%b;%%c;%%d
   )) > temp.tmp
   move /Y temp.tmp "%%f" > NUL
)

Antonio

zimxavier
Posts: 53
Joined: 17 Jan 2016 10:09
Location: France

Re: Replace with a column (advanced batch)

#7 Post by zimxavier » 09 Jun 2016 08:35

Thank you for your help!

foxidrive wrote:The other thing that is needed is the folder structure where the files are found, and where the files can be expected to be found.

A script can be written for "folder1" and "folder2"
but if the files can be in "main\folder1" and "main\apple\folder2" then different code is needed.

Without good details then someone may have to write and change the code two or three times.

Ok, I will try to give more details, but it is not easy for me to know which of them could cause problems.
I can move the files in "Folder1" and "Folder2" myself (anyway, I don't want to overwrite my files, I need to check before)

Details: There is no subfolder. There are 59 files and around thousand lines at all. Some lines are very long. They can contain at least:
[ ] \ ! ? . : , $ § % * « » " and accented characters from 3 European languages (French, German, Spanish).
If there is no character between the second and the third semi-colon in a line from a file in Folder2, I need to have no character in the corresponding line in Folder1. ( = a strict replacement)

=====> Here, an old version for testing. <=====

aGerman wrote:Rules:
The text has to be single-byte encoded (no UTF-8 or the like). => Files are usually encoded in ANSI, but I can check each time if some of them are in UTF8. That's okay.
Lines must not begin with a semicolon. => Ok
Multiple semicolons must not appear directly after one of the first 3 tokens. => Syntax is always CODE;ENGLISH;FRENCH;GERMAN;;SPANISH; but there is a variable number of semi-colons after Spanish according to the line.
Not only the order of the lines must be the same in the corresponding CSV files, also the number of lines and the line number of a certain entry must be the same. =>Not sure what do you mean about "the line number of a certain entry must be the same".
All of the CSV files are saved directly in folder1 or folder2. Files in subfolders are ignored. =>No problem, there is no subfolder.

Both batchs work fine with my example, but not with my actual files. And probably for the same reasons. I noticed an issue with exclamation marks.

aGerman
Expert
Posts: 4678
Joined: 22 Jan 2010 18:01
Location: Germany

Re: Replace with a column (advanced batch)

#8 Post by aGerman » 09 Jun 2016 11:56

I fear you can't edit these files with Batch. (Maybe you could use a JScript hybrid.) Exclusion criterion is clearly that your files contain lines that are longer than 1021 characters. Those can't be read with SET /P. Sorry :cry:
Further more many files/lines do not meet the CODE;ENGLISH;FRENCH;GERMAN;;SPANISH; format. Some of them don't even have 3 contiguous tokens.

zimxavier
Posts: 53
Joined: 17 Jan 2016 10:09
Location: France

Re: Replace with a column (advanced batch)

#9 Post by zimxavier » 09 Jun 2016 13:59

Oh okay, thank you aGerman. Yes, some lines are commentaries (lines with #) and don't have semi-colon...
I daily use JREPL for more basic task and it is magical. It save me many hours. You think it can do it ?

aGerman
Expert
Posts: 4678
Joined: 22 Jan 2010 18:01
Location: Germany

Re: Replace with a column (advanced batch)

#10 Post by aGerman » 10 Jun 2016 11:29

I would rather use another language. Save this JScript code with file extension .js and give it a shot

Code: Select all

var sDir1 = 'Folder1',
    sDir2 = 'Folder2';


var objFSO = new ActiveXObject('Scripting.FileSystemObject'),
    objDir1 = objFSO.GetFolder(sDir1),
    colFiles = new Enumerator(objDir1.Files);

for (; !colFiles.atEnd(); colFiles.moveNext()){
  var sFile1 = colFiles.item().Path;

  if (objFSO.FileExists(objFSO.BuildPath(sDir2, colFiles.item().Name))){
    var objFile1 = colFiles.item().OpenAsTextStream(1),
        objFile2 = objFSO.OpenTextFile(objFSO.BuildPath(sDir2, colFiles.item().Name), 1),
        objFileTmp = objFSO.OpenTextFile(sFile1 + '.~tmp', 2, true);

    while (!objFile1.AtEndOfStream && !objFile2.AtEndOfStream){
      var arr1 = tokenize(objFile1.ReadLine()),
          arr2 = tokenize(objFile2.ReadLine());
      if (arr1.length > 2 && arr2.length > 2){
        arr1[2] = arr2[2];
      }
      objFileTmp.WriteLine(arr1.join(';'));
    }

    objFile1.Close();
    objFile2.Close();
    objFileTmp.Close();

    objFSO.DeleteFile(sFile1);
    objFSO.MoveFile(sFile1 + '.~tmp', sFile1);
  }
}


function tokenize(s){
  var a = s.split(';'),
      r = a.splice(0, 3);
  r.push(a.join(';'));
  return r;
}

Regards
aGerman

zimxavier
Posts: 53
Joined: 17 Jan 2016 10:09
Location: France

Re: Replace with a column (advanced batch)

#11 Post by zimxavier » 12 Jun 2016 02:03

That's awesome :D :D :D :D

Each file need to have an empty line at the end for avoiding issue (but it is a usual rule...). Other than that, it works perfectly.


Thank you aGerman

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Replace with a column (advanced batch)

#12 Post by dbenham » 19 Jun 2016 07:38

Here is a one-liner (with line continuation for readability) using JREPL.BAT

Code: Select all

@for %%F in ("folder1\*.csv") do if exist "folder2\%%~nxF" ^
  <"folder2\%%~nxF" call jrepl "^(.*?;.*?;).*?(?=;|$)" "$1+str" ^
  /jbegln "str=stdin.readLine().replace(/^.*?;.*?;(.*?)(;.*|$)/,'$1')" ^
  /j /f "%%F" /o -

A couple things to be aware of:
  • If a line in folder1\file.csv has a 3rd column, but the line in folder2\file.csv does not, then the folder1 3rd column will be replaced by the entire content of the folder2 line.
  • If folder2\file.csv has fewer lines, then the command will fail with an error, folder1\file.csv will be unchanged, and a file named folder1\file.csv.new will have the partial result up until the error.


Dave Benham

zimxavier
Posts: 53
Joined: 17 Jan 2016 10:09
Location: France

Re: Replace with a column (advanced batch)

#13 Post by zimxavier » 21 Jun 2016 06:47

dbenham wrote:Here is a one-liner (with line continuation for readability) using JREPL.BAT

Code: Select all

@for %%F in ("folder1\*.csv") do if exist "folder2\%%~nxF" ^
  <"folder2\%%~nxF" call jrepl "^(.*?;.*?;).*?(?=;|$)" "$1+str" ^
  /jbegln "str=stdin.readLine().replace(/^.*?;.*?;(.*?)(;.*|$)/,'$1')" ^
  /j /f "%%F" /o -

A couple things to be aware of:
  • If a line in folder1\file.csv has a 3rd column, but the line in folder2\file.csv does not, then the folder1 3rd column will be replaced by the entire content of the folder2 line.
  • If folder2\file.csv has fewer lines, then the command will fail with an error, folder1\file.csv will be unchanged, and a file named folder1\file.csv.new will have the partial result up until the error.

Dave Benham


Thank you Dave
That looks awesome, but nothing happens when I execute your batch (the file from aGerman works as expected). Maybe I missed something.
JREPL.bat (v4) is in the same folder, I didn't make any change in your file... No idea.

zimxavier
Posts: 53
Joined: 17 Jan 2016 10:09
Location: France

Re: Replace with a column (advanced batch)

#14 Post by zimxavier » 01 Jul 2016 12:24

aGerman wrote:I would rather use another language. Save this JScript code with file extension .js and give it a shot

Code: Select all

var sDir1 = 'Folder1',
    sDir2 = 'Folder2';


var objFSO = new ActiveXObject('Scripting.FileSystemObject'),
    objDir1 = objFSO.GetFolder(sDir1),
    colFiles = new Enumerator(objDir1.Files);

for (; !colFiles.atEnd(); colFiles.moveNext()){
  var sFile1 = colFiles.item().Path;

  if (objFSO.FileExists(objFSO.BuildPath(sDir2, colFiles.item().Name))){
    var objFile1 = colFiles.item().OpenAsTextStream(1),
        objFile2 = objFSO.OpenTextFile(objFSO.BuildPath(sDir2, colFiles.item().Name), 1),
        objFileTmp = objFSO.OpenTextFile(sFile1 + '.~tmp', 2, true);

    while (!objFile1.AtEndOfStream && !objFile2.AtEndOfStream){
      var arr1 = tokenize(objFile1.ReadLine()),
          arr2 = tokenize(objFile2.ReadLine());
      if (arr1.length > 2 && arr2.length > 2){
        arr1[2] = arr2[2];
      }
      objFileTmp.WriteLine(arr1.join(';'));
    }

    objFile1.Close();
    objFile2.Close();
    objFileTmp.Close();

    objFSO.DeleteFile(sFile1);
    objFSO.MoveFile(sFile1 + '.~tmp', sFile1);
  }
}


function tokenize(s){
  var a = s.split(';'),
      r = a.splice(0, 3);
  r.push(a.join(';'));
  return r;
}

Regards
aGerman


@aGerman, could you tell me how I can use your file, but for another column ? For example, if I want to change the fourth column instead of the third one ?

aGerman
Expert
Posts: 4678
Joined: 22 Jan 2010 18:01
Location: Germany

Re: Replace with a column (advanced batch)

#15 Post by aGerman » 01 Jul 2016 12:48

See the marked changes ...

Code: Select all

    while (!objFile1.AtEndOfStream && !objFile2.AtEndOfStream){
      var arr1 = tokenize(objFile1.ReadLine()),
          arr2 = tokenize(objFile2.ReadLine());
      if (arr1.length > 3 && arr2.length > 3){    /* 2 -> 3 */
        arr1[3] = arr2[3];                        /* 2 -> 3 */
      }
      objFileTmp.WriteLine(arr1.join(';'));
    }

Code: Select all

function tokenize(s){
  var a = s.split(';'),
      r = a.splice(0, 4);                         /* 3 -> 4 */
  r.push(a.join(';'));
  return r;
}

... yet untested though.

Regards
aGerman

Post Reply