JREPL.BAT v8.6 - regex text processor with support for text highlighting and alternate character sets

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

foxidrive
Expert
Posts: 6031
Joined: 10 Feb 2012 02:20

Re: JREN.BAT - Rename files/folders using regular expression

#46 Post by foxidrive » 22 Jan 2015 03:33

dbenham wrote: That is exactly the type of problem that JREPL.BAT is designed to solve. The /J option provides a mechanism to apply the JScript toUpperCase() method to a portion of the replacement string.

Your question really belongs in the JREPL.BAT thread.


Mucho goodness Dave, thanks for taking the time to show and explain this. (posts moved too).

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#47 Post by dbenham » 22 Jan 2015 13:25

Here is version 3.4 with a minor bug fix. The option parser defined a variable named TEST that could interfere with a user-supplied variable name via the /S or /V option. The internal name was changed to /TEST to make it unlikely to collide with user-defined variables. Also, the documentation was updated to indicate that /S and /V variable names cannot match an option name, nor be /TEST.

JREPL.BAT Version 3.4
JREPL3.4.zip (8.65 KiB)


Dave Benham

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#48 Post by foxidrive » 23 Jan 2015 22:51

Hi Dave,

The code below gives this line - have I done something unusual?

File 'Airborne.mkv': container: Matroska call jeval "-1000000000"1647920000000

I had expected this:

File 'Airborne.mkv': container: Matroska call jeval "1647920000000-1000000000"

Edit: I just found that it was because of the ? in (.*?)


Code: Select all

@echo off
echo File 'Airborne.mkv': container: Matroska [duration:1647920000000|jrepl  "\[duration:(.*?)" "call jeval \q$1-1000000000\q" /x
pause
goto :EOF

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#49 Post by dbenham » 23 Jan 2015 23:51

Yes.

(.*?) is a non-greedy match, meaning it will match the minimum amount possible and still have the remainder of the search match. In this case it matches nothing.

(.*) is a greedy match, meaning it will match the maximum amount possible, yet still have the remainder of the search match. In this case it matches to the end of the string.
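
To see the difference outside of JREPL, here is a minimal sketch in plain JavaScript, using the input line from the question. Running it under Node.js is an assumption (JREPL itself executes JScript), and the variable names are made up for illustration.

```javascript
// Input line from the question above.
const line = "File 'Airborne.mkv': container: Matroska [duration:1647920000000";

// Lazy: (.*?) is the last item in the pattern, so it is allowed to match nothing.
const lazy = line.replace(/\[duration:(.*?)/, 'call jeval "$1-1000000000"');

// Greedy: (.*) runs to the end of the line and captures the number.
const greedy = line.replace(/\[duration:(.*)/, 'call jeval "$1-1000000000"');

console.log(lazy);   // ... Matroska call jeval "-1000000000"1647920000000
console.log(greedy); // ... Matroska call jeval "1647920000000-1000000000"
```

With the lazy version, only "[duration:" is consumed, so the digits are left behind after the replacement text.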


Dave Benham

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#50 Post by foxidrive » 24 Jan 2015 00:44

dbenham wrote: Yes.

(.*?) is a non-greedy match, meaning it will match the minimum amount possible and still have the remainder of the search match. In this case it matches nothing.


Thanks.

bars143
Posts: 87
Joined: 01 Sep 2013 20:47

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#51 Post by bars143 » 26 Jan 2015 20:28

Hi @Dbenham,

Do you have a JREPL script that translates output.txt into translated.txt, using english-cebuano.txt as the source?

Please read my request to @Aacini at this link:

this page 39510

But the output should keep the " - " and " ' " characters:

Code: Select all

my
user-name
can't
be
Bars


the result after translation should be:

Code: Select all

ang akong
user-name
'ili
mao ang
Bars


source of translation is:

Code: Select all

my , ang akong
user-name , user-name
can't , 'ili
be , mao ang
Bars , Bars


My delimiter is the comma symbol, but you can change it if using that character as a delimiter causes problems when I create my own translation text file.

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#52 Post by dbenham » 26 Jan 2015 22:16

That will be a very crude translation, given that the same word can have many meanings, and thus different translations, depending on context.

But your request is easily done. I create a dictionary using environment variables. The JREPL search term identifies each word in the text, and then replaces it with the environment variable value if it exists. If the translation does not exist, then it preserves the original word.

I also check the initial letter of the original word, and if it is upper case, then I make the initial letter of the translation upper case as well.

I designed the code to work properly with normal text in paragraph form. The input does not need to have each word on a separate line.

For my test I removed user-name and Bars from the dictionary to verify that unknown words are preserved as is.

translate.txt

Code: Select all

my , ang akong
can't , 'ili
be , mao ang

test.txt

Code: Select all

"My user-name can't be Bars!"

code

Code: Select all

@echo off
setlocal

:: Delete any pre-existing _ variables
for /f "delims==" %%V in ('set _ 2^>nul') do set "%%V="

:: Load the dictionary
for /f "tokens=1* delims=, " %%A in (translate.txt) do set "_%%A=%%B"

:: Translate the text
call jrepl "(?:([A-Z])|[a-z0-9])(?:\S*[A-Za-z0-9])?" ^
           "word=env('_'+$0)?env('_'+$0):$0;$1?word.slice(0,1).toUpperCase()+word.slice(1):word" ^
           /j /f test.txt

--OUTPUT--

Code: Select all

"Ang akong user-name 'ili mao ang Bars!"


Dave Benham
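
The lookup-and-capitalize logic of the translation script above can also be sketched in plain JavaScript. This is only an illustration under stated assumptions: the `dictionary` object is a hypothetical stand-in for the `_word` environment variables, `translate` is a made-up helper name, and the code is meant for Node.js rather than for JREPL's JScript host.

```javascript
// Hypothetical stand-in for the _word environment variables loaded from translate.txt.
const dictionary = { "my": "ang akong", "can't": "'ili", "be": "mao ang" };

// Same search pattern as the JREPL call: a word starts with a letter or digit,
// may continue with any non-space characters, and ends with a letter or digit.
// Group 1 is set only when the first character is an upper case letter.
const wordPattern = /(?:([A-Z])|[a-z0-9])(?:\S*[A-Za-z0-9])?/g;

function translate(text) {
  return text.replace(wordPattern, (match, upperInitial) => {
    // Windows environment variable names are case-insensitive, so "My" finds _my.
    // The lower-case key imitates that behavior here.
    const key = match.toLowerCase();
    const known = Object.prototype.hasOwnProperty.call(dictionary, key);
    const word = known ? dictionary[key] : match; // unknown words are kept as is
    return upperInitial ? word.slice(0, 1).toUpperCase() + word.slice(1) : word;
  });
}

console.log(translate("\"My user-name can't be Bars!\""));
// "Ang akong user-name 'ili mao ang Bars!"
```

Because the pattern must end on a letter or digit, the trailing "!" and the surrounding quotes are never part of a match and pass through unchanged.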

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#53 Post by bars143 » 27 Jan 2015 01:25

dbenham wrote:That will be a very crude translation, given that the same word can have many meanings, and thus different translations, depending on context.

But your request is easily done. I create a dictionary using environment variables. The JREPL search term identifies each word in the text, and then replaces it with the environment variable value if it exists. If the translation does not exist, then it preserves the original word.


@Dbenham,

Thanks for the quick reply. Your additional info is also a big help; by coincidence, I had planned some of the options you suggested above.

With your great answer I can start creating a dictionary file to suit my dialect.
Cebuano is a second dialect in my country, the Philippines.

I will start creating Cebuano versions of DVD subtitles soon.

Bars

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#54 Post by bars143 » 27 Jan 2015 21:26

@Dbenham, a big thanks for your good script. It works on multi-line text with one empty line between sentences, including the .srt format, which requires a timestamp before each sentence, as in this example:

1
00:00:24,827 --> 00:00:29,827
"My user-name can't be Bars!"

2
00:00:59,587 --> 00:01:04,587
My other user-name is Bars143!



Here are the files I am working with:

translation source file --> english-cebuano.txt

Code: Select all

my , ang akong
can't , 'ili
be , mao ang
other , uban
is , ay


input subtitle file --> english.srt

Code: Select all

1
00:00:24,827 --> 00:00:29,827
My user-name can't be Bars!

2
00:00:59,587 --> 00:01:04,587
My other user-name is Bars143!



output subtitle file --> cebuano.srt

Code: Select all

1
00:00:24,827 --> 00:00:29,827
Ang akong user-name 'ili mao ang Bars!

2
00:00:59,587 --> 00:01:04,587
Ang akong uban user-name ay Bars143!



But an actual subtitle file can have 625 sentences/lines or more for one DVD movie.
I think it will work on a big .srt file too.

Here is my edited code, based on your script, with an output file added at the end:

Code: Select all

@echo off
setlocal

:: Delete any pre-existing _ variables
for /f "delims==" %%V in ('set _ 2^>nul') do set "%%V="

:: Load the dictionary
for /f "tokens=1* delims=, " %%A in (english-cebuano.txt) do set "_%%A=%%B"

:: Translate the text
call jrepl "(?:([A-Z])|[a-z0-9])(?:\S*[A-Za-z0-9])?" ^
           "word=env('_'+$0)?env('_'+$0):$0;$1?word.slice(0,1).toUpperCase()+word.slice(1):word" ^
           /j /f english.srt >cebuano.srt


:D

Bars

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#55 Post by foxidrive » 06 Apr 2015 06:28

Can someone spare a few brain cells to say if this is feasible with jrepl, or findrepl?

I have a file like this, and in each block of dates I want to remove all of the date lines except the last 4.
All other lines would be kept; only the date blocks would be trimmed to their last 4 lines.

There is further text on the end of many lines if that is significant.

Code: Select all

[FONT=Courier New]
See the top

[b][color=orange]============[/color][/b]

[b][color=green]Date:[/color]/[b]

02/02/2003          [b][color=green]$02.50=[/color][/b] 
25/00/2004      [color=red]$28.50 /05[/color]   -$6.54 
06/02/2004      [color=blue]$04.20 /05[/color]    $0.08
22/03/2004      [color=blue]$09.70 /05[/color]    $0.64
05/04/2004      [color=red]$28.50 /05[/color]   -$6.54 
03/05/2004       [color=blue]$9.40 /05[/color]    $0.78
30/05/2004      [color=blue]$00.20 /05[/color]    $0.93
07/06/2004      [color=blue]$00.95 /05[/color]    $0.00
04/06/2004      [color=red]$39.25 /05[/color]   -$3.27 
24/06/2004          [b][color=green]$00.00=[/color][/b] 
09/07/2004      [color=red]$84.50 /05[/color]   -$2.04 
26/07/2004      [color=blue]$04.85 /05[/color]    $0.20
06/09/2004      [color=blue]$03.45 /05[/color]    $0.02
03/09/2004      [color=blue]$02.40 /05[/color]    $0.03
20/09/2004      [color=blue]$32.85 /05[/color]    $2.70
27/09/2004      [color=red]$85.50 /05[/color]   -$2.03 
08/00/2004          [b][color=green]$08.00=[/color][/b] 
08/00/2004      [color=blue]$04.70 /05[/color]    $0.23
06/02/2004      [color=red]$85.50 /05[/color]   -$2.03 
04/02/2005      [color=red]$85.50 /05[/color]   -$2.03 
22/02/2005      [color=blue]$35.50 /05[/color]    $2.96
04/03/2005      [color=blue]$22.50 /05[/color]    $0.88
28/03/2005      [color=blue]$20.80 /05[/color]    $0.73

[b][color=orange]============[/color][/b]

[b][color=green]Date:[/color]/[b]

09/00/2003      [color=blue]$20.40 /00=[/color]    $2.04
06/00/2003      [color=red]$28.50 /00=[/color]   -$2.04 
25/00/2004      [color=red]$28.50 /05[/color]   -$6.54 
06/00/2004          [b][color=green]$00.00=[/color][/b] 
08/02/2004          [b][color=green]$02.00=[/color][/b] 
06/02/2004      [color=blue]$04.20 /05[/color]    $0.08
22/03/2004      [color=blue]$09.70 /05[/color]    $0.64
05/04/2004      [color=red]$28.50 /05[/color]   -$6.54 
03/05/2004       [color=blue]$9.40 /05[/color]    $0.78
30/05/2004      [color=blue]$00.20 /05[/color]    $0.93
07/06/2004      [color=blue]$00.95 /05[/color]    $0.00
04/06/2004      [color=red]$39.25 /05[/color]   -$3.27 
09/07/2004      [color=red]$84.50 /05[/color]   -$2.04 
26/07/2004      [color=blue]$04.85 /05[/color]    $0.20
06/09/2004      [color=blue]$03.45 /05[/color]    $0.02
03/09/2004      [color=blue]$02.40 /05[/color]    $0.03
20/09/2004      [color=blue]$32.85 /05[/color]    $2.70
27/09/2004      [color=red]$85.50 /05[/color]   -$2.03 
08/00/2004      [color=blue]$04.70 /05[/color]    $0.23
06/02/2004      [color=red]$85.50 /05[/color]   -$2.03 
04/02/2005      [color=red]$85.50 /05[/color]   -$2.03 
22/02/2005      [color=blue]$35.50 /05[/color]    $2.96
04/03/2005      [color=blue]$22.50 /05[/color]    $0.88
28/03/2005      [color=blue]$20.80 /05[/color]    $0.73

[b][color=orange]============[/color][/b]

[b][color=green]Date:[/color]/[b]

09/00/2003      [color=blue]$20.40 /00=[/color]    $2.04
06/00/2003      [color=red]$28.50 /00=[/color]   -$2.04 
25/00/2004      [color=red]$28.50 /05[/color]   -$6.54 
06/02/2004      [color=blue]$04.20 /05[/color]    $0.08
22/03/2004      [color=blue]$09.70 /05[/color]    $0.64
05/04/2004      [color=red]$28.50 /05[/color]   -$6.54 
03/05/2004       [color=blue]$9.40 /05[/color]    $0.78
30/05/2004      [color=blue]$00.20 /05[/color]    $0.93
07/06/2004      [color=blue]$00.95 /05[/color]    $0.00
04/06/2004      [color=red]$39.25 /05[/color]   -$3.27 
09/07/2004      [color=red]$84.50 /05[/color]   -$2.04 
26/07/2004      [color=blue]$04.85 /05[/color]    $0.20
06/09/2004      [color=blue]$03.45 /05[/color]    $0.02
03/09/2004      [color=blue]$02.40 /05[/color]    $0.03
20/09/2004      [color=blue]$32.85 /05[/color]    $2.70
27/09/2004      [color=red]$85.50 /05[/color]   -$2.03 
08/00/2004      [color=blue]$04.70 /05[/color]    $0.23
06/02/2004      [color=red]$85.50 /05[/color]   -$2.03 
06/00/2005          [b][color=green]$20.00=[/color][/b] 
04/02/2005      [color=red]$85.50 /05[/color]   -$2.03 
22/02/2005      [color=blue]$35.50 /05[/color]    $2.96
04/03/2005      [color=blue]$22.50 /05[/color]    $0.88
28/03/2005      [color=blue]$20.80 /05[/color]    $0.73

[/FONT]


I could split it up into files and use the tail feature but I wondered if there was a 'cleverer' method. Ta.

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#56 Post by dbenham » 06 Apr 2015 07:29

25/00/2004 is a date :?: :!: :shock:
So your months are 0 based (0=jan, 11=dec) :?:

Anyway, the answer is actually quite simple - look for one or more consecutive lines beginning with a date that precede (look ahead) 4 lines that begin with a date, and replace them with nothing.

Code: Select all

jrepl "(^\d\d/\d\d/\d{4} .*\n)+(?=(^\d\d/\d\d/\d{4} .*\n){4})" "" /m /f test.txt

sample output:

Code: Select all

[FONT=Courier New]
See the top

[b][color=orange]============[/color][/b]

[b][color=green]Date:[/color]/[b]

04/02/2005      [color=red]$85.50 /05[/color]   -$2.03
22/02/2005      [color=blue]$35.50 /05[/color]    $2.96
04/03/2005      [color=blue]$22.50 /05[/color]    $0.88
28/03/2005      [color=blue]$20.80 /05[/color]    $0.73

[b][color=orange]============[/color][/b]

[b][color=green]Date:[/color]/[b]

04/02/2005      [color=red]$85.50 /05[/color]   -$2.03
22/02/2005      [color=blue]$35.50 /05[/color]    $2.96
04/03/2005      [color=blue]$22.50 /05[/color]    $0.88
28/03/2005      [color=blue]$20.80 /05[/color]    $0.73

[b][color=orange]============[/color][/b]

[b][color=green]Date:[/color]/[b]

04/02/2005      [color=red]$85.50 /05[/color]   -$2.03
22/02/2005      [color=blue]$35.50 /05[/color]    $2.96
04/03/2005      [color=blue]$22.50 /05[/color]    $0.88
28/03/2005      [color=blue]$20.80 /05[/color]    $0.73

[/FONT]


Having solved this, I now realize that head and tail could be implemented without any user-supplied JScript or /C option.

head.bat

Code: Select all

::head.bat   count   [/F inFile]  [/O outFile|-]  [/N minWidth]
@echo off
jrepl "((.*\n){%~1})[\s\S]+" "$1" /m %2 %3 %4 %5 %6 %7

tail.bat

Code: Select all

::tail.bat   count   [/F inFile]   [/O outFile|-]   [/N minWidth]
@echo off
jrepl "(.*\n)+(?=(.*\n|.+(?![\s\S])){%~1})" "" /m %2 %3 %4 %5 %6 %7

But these new versions are limited to ~2GB files because of the /M option, and I think they also may be less efficient. I would still use the older versions.
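
For anyone who wants to poke at the two patterns without JREPL, here is a rough JavaScript sketch. The `head` and `tail` function names are hypothetical, `count` plays the role of `%~1`, and the sample text uses plain `\n` line endings because a Node.js `.` does not match `\r`; treat it as an experiment, not as a description of how JREPL works internally.

```javascript
// Keep only the first `count` lines: capture them in $1 and swallow the rest.
function head(text, count) {
  const pattern = new RegExp("((.*\\n){" + count + "})[\\s\\S]+", "g");
  return text.replace(pattern, "$1");
}

// Keep only the last `count` lines: delete every line that is still followed
// by `count` lines. The alternative .+(?![\s\S]) covers a final line that has
// no trailing newline.
function tail(text, count) {
  const pattern = new RegExp("(.*\\n)+(?=(.*\\n|.+(?![\\s\\S])){" + count + "})", "g");
  return text.replace(pattern, "");
}

const sample = "one\ntwo\nthree\nfour\nfive\n";
console.log(JSON.stringify(head(sample, 2))); // "one\ntwo\n"
console.log(JSON.stringify(tail(sample, 2))); // "four\nfive\n"
console.log(JSON.stringify(tail("one\ntwo\nthree", 2))); // "two\nthree"
```

If the input has no more than `count` lines, neither pattern matches and the text comes back unchanged.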


Dave Benham

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#57 Post by foxidrive » 07 Apr 2015 04:56

dbenham wrote:25/00/2004 is a date :?: :!: :shock:
So your months are 0 based (0=jan, 11=dec) :?:


I obfuscated the data a tad by globally replacing a few numbers with a different number, but your solution looks wonderful - if only I could figure out how it works.

dbenham wrote: Anyway, the answer is actually quite simple - look for one or more consecutive lines beginning with a date that precede (look ahead) 4 lines that begin with a date, and replace them with nothing.

Code: Select all

jrepl "(^\d\d/\d\d/\d{4} .*\n)+(?=(^\d\d/\d\d/\d{4} .*\n){4})" "" /m /f test.txt



I can see it's checking for the date format at the start of the lines and taking the entire line with the CR at the end
and the next bit +(?=(^\d\d/\d\d/\d{4} .*\n){4}) is adding 4 more of the same - using the + as some arithmetic
to add 4 of the next (same) terms.

I have a basic appreciation of how it works but the technique mashes my brain like adding a quart of whiskey to my orange juice,
and I don't quite follow the lookaheads. But thank you for your generous assistance and I'll be able to apply the same code
in future by copying and testing. :)

dbenham wrote: Having solved this, I now realize that head and tail could be implemented without any user-supplied JScript or /C option.


Thanks for these extra options too.

My file is only 300 kb so it scrapes in. :D

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#58 Post by dbenham » 07 Apr 2015 05:20

You just need to study the regex syntax a bit more :wink:

The + means the previous expression must match 1 or more times (similar to * which matches 0 or more times)

The ?= is a look-ahead construct, meaning the expression within the parentheses (the 4 lines) must follow the previous expression, but the content is not considered part of the match.

So, in summary, it looks for as many matching lines as it can find that precede the 4 matching lines. Since the last 4 matching lines are not included in the match, we can replace everything with nothing.
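
A small JavaScript sketch may make this easier to see. The pattern is the one from the jrepl command; the sample data is made up, the line endings are plain `\n`, and Node.js is assumed as the runtime, so this is an approximation of what JREPL does with /M rather than its actual implementation.

```javascript
// Same pattern as the jrepl call. The g flag replaces every match, and the
// m flag makes ^ match at the start of every line.
const pattern = /(^\d\d\/\d\d\/\d{4} .*\n)+(?=(^\d\d\/\d\d\/\d{4} .*\n){4})/gm;

// Made-up sample: a block of six date lines, then a block of three.
const input = [
  "Date:",
  "01/01/2004 a",
  "02/01/2004 b",
  "03/01/2004 c",
  "04/01/2004 d",
  "05/01/2004 e",
  "06/01/2004 f",
  "Date:",
  "07/01/2004 g",
  "08/01/2004 h",
  "09/01/2004 i",
].join("\n") + "\n";

// The + first grabs all six lines, then backs off one line at a time until the
// look-ahead can see four date lines after the match. Only lines a and b go.
const output = input.replace(pattern, "");
console.log(output);
```

The second block has fewer than five date lines, so there is nothing that could precede four of them and the block is left alone.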


Dave Benham

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#59 Post by foxidrive » 07 Apr 2015 08:45

Thanks for the clear explanation. I see now where my mistaken impressions were failing me.



Just now I've been playing with another aspect, using the multi-line /m switch, and here once more I seem to have misunderstood something.

I had expected all text from *** onward to be removed, but that is not what happens.
Can you please advise how I can remove the last part of the file, starting from a line that begins with ***?

Code: Select all

@echo off

(
echo(aaa
echo(
echo(***
echo(
echo(bbb
)>accounts.txt

:: Remove text at the bottom, from *** onward
echo ======
call jrepl "^(.*)\*\*\*.*" "$1" /m /f accounts.txt
echo ======
call jrepl "^(.*)\2A\2A\2A.*" "$1" /x /m /f accounts.txt
echo ======
pause

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#60 Post by dbenham » 07 Apr 2015 10:08

^\*\*\*$ matches your *** line.
[\s\S] matches any character,
so [\s\S]* matches everything after your ***, including the \r and \n characters at the end of the *** line.

Putting it all together:

Code: Select all

jrepl "^\*\*\*$[\s\S]*" "" /m /f test.txt
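
The same comparison can be tried in plain JavaScript. This sketch assumes Node.js and `\n` line endings, and rebuilds the five-line accounts.txt from the question as a string; the variable names are made up.

```javascript
// The five lines written to accounts.txt in the question, with \n line endings.
const input = "aaa\n\n***\n\nbbb\n";

// First attempt from the question: . never matches a newline, so the match
// cannot extend past the *** line, and only that line's text is removed.
const firstAttempt = input.replace(/^(.*)\*\*\*.*/gm, "$1");

// Pattern from this answer: [\s\S]* also matches newlines, so the match runs
// from the *** line to the end of the input.
const result = input.replace(/^\*\*\*$[\s\S]*/gm, "");

console.log(JSON.stringify(firstAttempt)); // "aaa\n\n\n\nbbb\n"
console.log(JSON.stringify(result));       // "aaa\n\n"
```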


Dave Benham
