JREPL.BAT v8.6 - regex text processor with support for text highlighting and alternate character sets

brinda
Posts: 78
Joined: 25 Apr 2012 23:51

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#16 Post by brinda » 17 Nov 2014 21:32

thanks dave.

mcnd
Posts: 27
Joined: 08 Jan 2014 07:29

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#17 Post by mcnd » 18 Nov 2014 10:21

brinda wrote:dave,

Would it be possible to use JREPL to remove duplicate lines (case sensitive), leaving the remaining lines in their original order and stripping blank lines as well?



More or less ...

Code edited to simplify the process

Code: Select all

jrepl "" "" /N 10 /f "inputFile" ^
    | sort /+11 ^
    | jrepl ".*?:(.*)$"  "x=p;p=$1;($1==x?false:$src);" /jmatch /jbeg "var p='',x" ^
    | sort ^
    | jrepl "^.*?:" "" > "outputFile"


Number the lines, sort on the data (past the line numbers), remove duplicates, sort on line numbers, remove line numbers
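
For readers who want to see the idea outside of a pipeline, here is a minimal sketch of the same five steps in plain JavaScript. It is not JREPL code: the function name dedupeKeepOrder is hypothetical, and the comparison is a simple case sensitive string compare, which may not order lines exactly the way the Windows sort command does.

```javascript
// Sketch only: mirrors "number, sort on data, remove duplicates,
// sort on line numbers, remove line numbers". dedupeKeepOrder is a
// hypothetical name, not part of JREPL.
function dedupeKeepOrder(lines) {
  // 1. Number the lines.
  var numbered = lines.map(function (text, i) {
    return { n: i, text: text };
  });
  // 2. Sort on the data, using the line number to break ties so the
  //    first occurrence of each line comes first.
  numbered.sort(function (a, b) {
    if (a.text < b.text) return -1;
    if (a.text > b.text) return 1;
    return a.n - b.n;
  });
  // 3. Keep a line only if it differs from the previous one. prev starts
  //    as '' so blank lines are dropped as well.
  var prev = '';
  var kept = numbered.filter(function (item) {
    var keep = item.text !== prev;
    prev = item.text;
    return keep;
  });
  // 4. Sort on the line numbers to restore the original order.
  kept.sort(function (a, b) { return a.n - b.n; });
  // 5. Remove the line numbers.
  return kept.map(function (item) { return item.text; });
}
```

For example, dedupeKeepOrder(['b', 'a', '', 'b', 'A', 'a']) returns ['b', 'a', 'A'].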
Last edited by mcnd on 19 Nov 2014 06:32, edited 1 time in total.

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#18 Post by dbenham » 18 Nov 2014 21:51

Version 2 is here with two major new features!

Below is a brief summary with examples. See the help for more details.

/C (count lines)

The /C option counts the number of lines in the input and stores the value in a global cnt variable. This can be useful with the /J and /JMATCH options. If input is piped or redirected data, then a temporary file is written and deleted when done.

This TAIL.BAT example demonstrates one possible use of /C.

Code: Select all

::tail.bat  count  [/F inFile] [/O outFile|-] [/N minWidth]
@echo off
jrepl "^.*" "ln>cnt-%1?$0:false" /jmatch /c %2 %3 %4 %5 %6 %7
sample usage:

Code: Select all

C:\test>dir | tail 2
              13 File(s)         21,805 bytes
              26 Dir(s)  906,455,531,520 bytes free


/T (translate similar to unix tr or sed y commands) As requested by foxidrive - good idea!

This one is potentially huge. Expressions can break at each character, or at a user specified delimiter. Pretty much all the options can be used along with /T as needed: /L /X /J /JMATCH, etc.
I will be somewhat surprised if there is not a bug somewhere given the number of permutations possible. Please report if you find a problem.

And now for a few examples:

ROT13.BAT

Code: Select all

::ROT13.bat [/F inFile] [/O outFile|-] [/N minWidth]
@echo off
jrepl "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" "nopqrstuvwxyzabcdefghijklmNOPQRSTUVWXYZABCDEFGHIJKLM" /t "" %*
sample usage:

Code: Select all

C:\test>echo Hello world!|rot13
Uryyb jbeyq!

C:\test>echo Hello world!|rot13|rot13
Hello world!
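
As a rough illustration of what a character by character translation does, here is a small plain JavaScript sketch of the same ROT13 mapping. It is only an emulation of the idea under my own assumptions, not how JREPL implements /T; the names from, to and rot13 are chosen for this example.

```javascript
// Sketch only: each character found in "from" is replaced by the
// character at the same position in "to". All other characters pass
// through unchanged.
var from = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ';
var to   = 'nopqrstuvwxyzabcdefghijklmNOPQRSTUVWXYZABCDEFGHIJKLM';

function rot13(text) {
  return text.replace(/[a-zA-Z]/g, function (ch) {
    return to.charAt(from.indexOf(ch));
  });
}
```

rot13('Hello world!') gives 'Uryyb jbeyq!', and applying rot13 twice returns the original text, matching the sample usage above.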


InitCaps.bat

Code: Select all

::InitCaps.bat [/F inFile] [/O outFile|-] [/N minWidth]
@echo off
jrepl "\b[a-z] \B[A-Z]*" "$0.toUpperCase() $0.toLowerCase()" /j /t " " %*
sample usage:

Code: Select all

C:\test>echo goodbye CrUeL WORLD!|initcaps
Goodbye Cruel World!


Silly demonstration of captured subexpressions with /T (keeping track of reference numbers can get tricky):

Code: Select all

C:\test>echo Z-A a+z|jrepl "(.)\+(.) (.)-(.)" "$&:$2$3 $&:$6$5" /T " "
Z-A:AZ a+z:az
Note that each expression is translated into a captured submatch, so the reference numbers of the explicit captures must be adjusted accordingly.
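
To make the renumbering easier to follow, here is a plain JavaScript sketch of the same example. It assumes the two search expressions are each wrapped in their own capture group and joined with |, which is what the note above describes; the variable and function names are made up for the illustration.

```javascript
// Sketch only: the two search expressions become groups 1 and 4, so the
// explicit captures shift to 2,3 (first expression) and 5,6 (second).
var combined = /((.)\+(.))|((.)-(.))/g;

function translate(text) {
  return text.replace(combined, function (m, g1, g2, g3, g4, g5, g6) {
    if (g1 !== undefined) {
      return m + ':' + g2 + g3;   // plays the role of "$&:$2$3"
    }
    return m + ':' + g6 + g5;     // plays the role of "$&:$6$5"
  });
}
```

translate('Z-A a+z') returns 'Z-A:AZ a+z:az', the same result as the JREPL command.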

---------------------------------------

So here is the code for version 2.2
JREPL2.2.zip
(7.29 KiB) Downloaded 2251 times


Dave Benham
Last edited by dbenham on 23 Nov 2014 18:13, edited 3 times in total.

brinda
Posts: 78
Joined: 25 Apr 2012 23:51

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#19 Post by brinda » 19 Nov 2014 06:58

dave,

the tail is good. Could you please show how to do head as well?


would like to know if below is doable with Jrepl.

Text that is copied and pasted from a PDF or an OCR scan ends up with a lot of tabs, extra spaces, etc. To remove these we used the MS Word macro at the link below
http://www.mobileread.com/forums/showthread.php?t=8793

It does not always work on certain spaces, so if jrepl could do this then it would be great. Plus we would not need MS Word to format the text anymore.

Sample test text is below. I do not know how to enter diacritic characters.

Code: Select all

this  is 1 paragraph .this  is 1 paragraph. this  is 1 paragraph .

this  is 2 paragraph     .   
this <tab>       is 2 paragraph , this  is 2, paragraph .
this   is 2 paragraph .

this is 3 paragraph ,this is 3 paragraph, this is 3 paragraph


snippet of original. Tab is denoted as <tab>

Code: Select all

very truthful,    and was very fond of serving brahmanas. He was indifferent to his family life, knowing that eve¬rything is temporary except d¬ie holy nam¬e of the Lord. In this way, he spent his days <tab> happily.    

Bha¬ndu Moh¬anty we¬nt to some villages to beg alms, but the peo¬ple had no food even for th¬emselves—how could they give alms to Bandhu Mohanty? He returned to his ho¬use without any food , all the while medit¬ating on the Lord . His wife told him that the children were very hungry.

They could not tolerate their hunger pa¬ngs any lo¬nger. She asked , " Don 't' you have some relative who can help us during this difficult time? Let us leave this pl¬ace and go to some other place where your rela¬tives are staying." Bandhu Mohanty replied, "I have no relatives to he¬lp me, but I do have a friend. But He liv¬es far from here. He is the best am¬ong all the peo¬ple. There is no one equal to him. He liv¬es in Sri Kshetra Puri dham



So far the one below works, but I need to run it a few times to get the desired effect

Code: Select all

jrepl "\t" "  " /x /l /f draw.txt /o -
jrepl "  " " " /l /f draw.txt /o -
jrepl "  " " " /l /f draw.txt /o -
jrepl "  " " " /l /f draw.txt /o -
jrepl " ." "." /l /f draw.txt /o -
jrepl " ," "," /l /f draw.txt /o -


here are the rules

a) The symbols , ? ! . should directly follow a word (no space before them). There should be a space after them, before the next word.

b) An opening quote (") should have a space before it, and a closing quote (") a space after it.

c) Capitalize the first character after ? ! or .

d) A paragraph begins after two ENTERs and ends with two ENTERs as well.

e) A tab is needed at the beginning of each paragraph, and the first letter should be capitalized.

f) If there is an ENTER within a paragraph, strip it.

g) Any other tabs or extra spaces should be stripped, leaving only a single space before and after each word.

thanks,
brinda

Squashman
Expert
Posts: 4486
Joined: 23 Dec 2011 13:59

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#20 Post by Squashman » 19 Nov 2014 07:41

Brinda,
You could use the MORE command to get rid of TABS.

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#21 Post by dbenham » 19 Nov 2014 19:40

Head is a trivial variation on tail, only simpler because the /C is not needed.

Code: Select all

::head.bat  count  [/F inFile] [/O outFile|-] [/N minWidth]
@echo off
jrepl "^.*" "ln<=%1?$0:false" /jmatch %2 %3 %4 %5 %6 %7


Regarding your complex problem - I think this is what you are looking for.

I refined the spacing and capitalization rules for when quotes interact with the beginning and/or end of a sentence or paragraph. (quote before/after punctuation or tab) I also preserve decimal points within numbers (no spacing added).

Rather than create monster command lines, I build the command incrementally using environment variables. It also helps with documentation.

It took me a total of 5 JREPL calls. It seems to me it could be done in 4 calls (perhaps less), but this is the best I could do.
Edit: bug fix and simplification

Code: Select all

@echo off
setlocal
set "input=old.txt"
set "output=new.txt"

>"%output%" (
  echo(
  echo(
  type "%input%"
)

:: ----------------------------------------------
set "paraS=(?:[ \f\r\t\v]*\n){2}\s*" 1
set "paraR=quote=false; '\n\n\t'"

set "quoteS=\q"                      2
set "quoteR=(quote=!quote)?' '+$0+'{':$0+'} '"

set "whiteSpaceS=\s+"                3
set "whiteSpaceR=' '"

set "find=%paraS%@%quoteS%@%whiteSpaceS%"
set "repl=%paraR%@%quoteR%@%whiteSpaceR%"

call jrepl find repl /v /m /x /j /t @ /jbeg "var quote=false;" /f "%output%" /o -

:: ----------------------------------------------
set "decimalS=\d[.,]\d"              1
set "decimalR=$&"

set "endS=\s*([.!?,;]+)\s*"          2,3
set "endR=$3 "

set "find=%decimalS%@%endS%"
set "repl=%decimalR%@%endR%"

call jrepl find repl /v /t @ /f "%output%" /o -

:: ----------------------------------------------
set "begQuoteS=(\t| )? *(\q){ *"      1,2,3
set "begQuoteR=($2?$2:'')+$3"

set "endQuoteS= *(\q)} *([.!?])?"    4,5,6
set "endQuoteR=$5+($6?$6:' ')"

set "spaceS= +"                      7
set "spaceR=' '"

set "find=%begQuoteS%@%endQuoteS%@%spaceS%"
set "repl=%begQuoteR%@%endQuoteR%@%spaceR%"

call jrepl find repl /v /x /j /t @ /f "%output%" /o -

:: ----------------------------------------------
set "upperS=([\t.!?](?:\q ?| \q?)?)(\S)" 1,2,3
set "upperR=$2+$3.toUpperCase()"

set "trimS= +$"                          4
set "trimR=''"

set "find=%upperS%@%trimS%"
set "repl=%upperR%@%trimR%"

call jrepl find repl /v /m /x /j /t @ /f "%output%" /o -

:: ----------------------------------------------
set "emptyBottomS=(?:\r\n\t?)+(?!.)"     1
set "emptyBottomR=''"

set "emptyTopS=^[^\t]+"                  2
set "emptyTopR=ln++==0?'':$0"

set "find=%emptyBottomS%@%emptyTopS%"
set "repl=%emptyBottomR%@%emptyTopR%"

call jrepl find repl /v /m /j /t @ /f "%output%" /o -

type "%output%"


Dave Benham
Last edited by dbenham on 20 Nov 2014 00:08, edited 1 time in total.

Squashman
Expert
Posts: 4486
Joined: 23 Dec 2011 13:59

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#22 Post by Squashman » 19 Nov 2014 23:16

dbenham wrote:/C (count lines)

The /C option counts the number of lines in the input and stores the value in a global cnt variable. This can be useful with the /J and /JMATCH options. If input is piped or redirected data, then a temporary file is written and deleted when done.

So in theory could you string together two JREPL commands to count the number of occurrences of a character in a line or string?

brinda
Posts: 78
Joined: 25 Apr 2012 23:51

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#23 Post by brinda » 20 Nov 2014 06:51

squashman,

did not know about MORE command on tabs.

/Tn Expand tabs to n spaces (default 8)

Thanks for letting me know. I could use this on small scripts.

Dave,

thank you for the tail variation (head). I can now use jrepl as the centrepoint for multiple commands.

I cannot thank you enough for the Word replacement script. It works beautifully on Windows 2000. In fact, it works so well that even the errors in old documents are found and fixed.

Thank you so much for taking the time on doing this.

brinda

Aacini
Expert
Posts: 1913
Joined: 06 Dec 2011 22:15
Location: México City, México
Contact:

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#24 Post by Aacini » 20 Nov 2014 15:40

Squashman wrote:
dbenham wrote:/C (count lines)

The /C option counts the number of lines in the input and stores the value in a global cnt variable. This can be useful with the /J and /JMATCH options. If input is piped or redirected data, then a temporary file is written and deleted when done.

So in theory could you string together two JREPL commands to count the number of occurrences of a character in a line or string?


Code: Select all

C:\> echo count the number of occurrences of a character in a line or string? |
FindRepl "o" /$:0
 "o"
 "o"
 "o"
 "o"
 "o"

C:\> echo %errorlevel%
5

C:\> (echo count the number of occurrences of a character in a line or string? &
 echo Or even in the whole file!) | FindRepl "o" /$:0 > nul

C:\> echo %errorlevel%
6

C:\> (echo count the number of occurrences of a character in a line or string? &
 echo Or even in the whole file!) | FindRepl /I "o" /$:0 > nul

C:\> echo %errorlevel%
7


Antonio

Squashman
Expert
Posts: 4486
Joined: 23 Dec 2011 13:59

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#25 Post by Squashman » 20 Nov 2014 15:55

Thanks Antonio!

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#26 Post by dbenham » 20 Nov 2014 16:03

I've fixed a significant bug regarding /T option: It failed to work properly if a match was an empty string.

Original buggy line 555 from version 2.0:

Code: Select all

      _g.replFunc+=',$off,$src){for (var i=1;i<arguments.length-2;i++) if (arguments[i]) ';

The fixed line became:

Code: Select all

      _g.replFunc+=',$off,$src){for (var i=1;i<arguments.length-2;i++)if (arguments[i]!==undefined) ';

I've updated the code in the prior posts to 2.1.
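
A small plain JavaScript sketch of why the original test could go wrong, assuming each search expression is wrapped in its own capture group: an empty string is a perfectly valid match, but it is falsy, so a truthiness test treats it the same as a group that did not take part in the match. The helper names below are invented for the illustration and are not taken from JREPL.

```javascript
// Sketch only: find which capture group actually matched.
function firstTruthy(groups) {
  for (var i = 0; i < groups.length; i++) {
    if (groups[i]) return i;                 // misses '' (empty match)
  }
  return -1;
}

function firstDefined(groups) {
  for (var i = 0; i < groups.length; i++) {
    if (groups[i] !== undefined) return i;   // accepts '' as a match
  }
  return -1;
}

// First alternative matches an empty line, second matches a literal x.
var match = /(^$)|(x)/.exec('');
var groups = match.slice(1);                 // ['', undefined]
```

With this input, firstTruthy(groups) returns -1 (no group found), while firstDefined(groups) returns 0.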


Dave Benham

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#27 Post by dbenham » 20 Nov 2014 17:24

Squashman wrote:
dbenham wrote:/C (count lines)

The /C option counts the number of lines in the input and stores the value in a global cnt variable. This can be useful with the /J and /JMATCH options. If input is piped or redirected data, then a temporary file is written and deleted when done.

So in theory could you string together two JREPL commands to count the number of occurrences of a character in a line or string?

I don't know what you had in mind with the /C option. But a minimal amount of JScript code can easily provide the answer:

Count the number of vowels in file old.txt

Code: Select all

jrepl "[aeiou]" ^
      "vowel++;false" ^
      /jbeg "vowel=0;" ^
      /jend "output.WriteLine(vowel);" ^
      /i /jmatch /f "old.txt"

/I - makes the search case insensitive
/JMATCH - discards content that does not match
/JBEG - declare and initialize a variable to hold the count (global by default since VAR keyword not used)
replace code - increment the count, return false to suppress match output
/JEND - print out the result.
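
The same counting pattern can be sketched in plain JavaScript. This is only an analogy for the steps listed above (initialize, count on each match, report at the end), with a hypothetical function name; it is not code that JREPL runs.

```javascript
// Sketch only: countVowels is a hypothetical helper.
function countVowels(text) {
  var vowel = 0;                       // like /JBEG "vowel=0;"
  text.replace(/[aeiou]/gi, function (m) {
    vowel++;                           // like the replace code "vowel++"
    return m;
  });
  return vowel;                        // like /JEND printing the result
}
```

For example, countVowels('Hello World') returns 3.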


Prefix each line with the number of vowels in the line:

Code: Select all

jrepl "^$ ^ [aeiou] $" ^
      "'0:'+$src vowel=0;false vowel++;false vowel+':'+$src" ^
      /i /jmatch /t " " /f test.bat

Explanation:
/I and /JMATCH are used as before
/T " " treats search and replace as an array of expressions. The searches are executed from left to right. Once a match is found, all subsequent searches are ignored until the search position is incremented

Code: Select all

index| search                  | replace
-----+-------------------------+--------------------------------------------------
  1  | match empty line        | print constant result
  2  | match beginning of line | reset count to 0, return false to suppress output
  3  | match vowel             | increment count, return false to suppress output
  4  | match end of line       | print the count, followed by the source line

If I "cheat" and re-purpose the $OFF variable, I can easily fix the width of the vowel count. The $OFF variable normally contains the offset of the match within the source line. It is also used by the /OFF option to display the offset with a fixed (minimum actually) width. I simply assign the vowel count at the end of the line to $OFF and add the /OFF option. I must first accumulate the count in a separate variable because $OFF is reset on each match.

Code: Select all

jrepl "^$ ^ [aeiou] $" ^
      "'0:'+$src vowel=0;false vowel++;false $off=vowel;$src" ^
      /i /jmatch /t " " /f test.bat /off 3



One other point especially for you Squashman - JREPL can be used with massive files as long as you do not use the /M option and each line is less than 2GB. The total file size can be as big as you can stand to wait for.

VBS and JScript strings are limited to 2GB, which is where the restrictions come from. The /M option must load the entire file into a single string.


Dave Benham

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#28 Post by dbenham » 21 Nov 2014 21:43

Previous version 2 posts were updated to v2.2 to fix a bug when both /T and /L options were used.

Original buggy line 488:

Code: Select all

        cnt, test;

fixed line:

Code: Select all

        cnt=0, test;


Dave Benham

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#29 Post by dbenham » 23 Nov 2014 18:32

New features for version 3:

1) /JBEGLN and /JENDLN options allow you to specify JScript code to execute at the beginning and end of a line. The code can modify the line if you want.


2) New global variables:

skip - If set to true, then skip the match phase of each line until set back to false. Very handy for restricting actions to a limited number of lines.

quit - If set to true, then do not read any more lines of input. This is an efficient way to terminate early once you get the result you are looking for.


3) New global method:

lpad(val, pad) - used to left pad strings (typically numbers).


4) Exception handling has been modified to identify when user code is the source of a run time error. It reports which code is the source of the problem.


Examples:

Much improved head.bat - it will terminate immediately once the desired number of lines have been printed. Great for huge files.

Code: Select all

::head.bat  count  [/F inFile] [/O outFile|-] [/N minWidth]
@echo off
jrepl "^" "" /jbeg "quit=(%1<=0)" /jbegln "quit=(ln>=%1)" %2 %3 %4 %5 %6 %7
exit /b
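
Here is a rough plain JavaScript sketch of the control flow in this head.bat. It assumes, based on the description of quit above, that setting quit while a line is being processed still lets that line be written and only stops further reads. The function name head is just for the example.

```javascript
// Sketch only: emulates the quit flag with an in-memory array of lines.
function head(lines, count) {
  var out = [];
  var quit = (count <= 0);             // like /JBEG "quit=(%1<=0)"
  for (var ln = 1; !quit && ln <= lines.length; ln++) {
    quit = (ln >= count);              // like /JBEGLN "quit=(ln>=%1)"
    out.push(lines[ln - 1]);           // the current line is still written
  }
  return out;                          // no further lines are read after quit
}
```

head(['a', 'b', 'c'], 2) returns ['a', 'b'], and the loop never touches the third line.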


Improved tail.bat - it doesn't waste time matching skipped lines. Again an improvement for huge files, but not as significant as the new head.bat:

Code: Select all

::tail.bat  count  [/F inFile] [/O outFile|-] [/N minWidth]
@echo off
jrepl "^.*" "$0" /jbegln "skip=(ln<=cnt-(%1));" /jmatch /c %2 %3 %4 %5 %6 %7
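
The tail logic can be sketched the same way in plain JavaScript. The sketch assumes the total line count is known up front (the role of /C and cnt) and that skipped lines produce no output when /JMATCH is in effect. The function name tail is chosen for the example only.

```javascript
// Sketch only: emulates cnt and the skip flag with an in-memory array.
function tail(lines, count) {
  var cnt = lines.length;              // what /C provides as cnt
  var out = [];
  for (var ln = 1; ln <= cnt; ln++) {
    var skip = (ln <= cnt - count);    // like /JBEGLN "skip=(ln<=cnt-(%1));"
    if (!skip) {
      out.push(lines[ln - 1]);         // only unskipped lines are matched and written
    }
  }
  return out;
}
```

tail(['a', 'b', 'c'], 2) returns ['b', 'c'].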


Improved count of vowels per line - (the vowel count is prepended to each line). The logic is much simpler with JBEGLN/JENDLN than it was with the /T option.

Code: Select all

jrepl "[aeiou]" "vowel++;$0" /j /jbegln "vowel=0" /jendln "$txt=lpad(vowel,'000')+':'+$txt;" /f input.txt
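
A plain JavaScript sketch of what this command does to a single line. The lpad function below is my own stand-in written to match the description "left pad strings (typically numbers)"; the real JREPL implementation is not shown in this thread and may differ in details. Note that the command above has no /I, so the sketch counts lowercase vowels only.

```javascript
// Sketch only: hypothetical stand-in for JREPL's lpad(val, pad).
function lpad(val, pad) {
  var s = String(val);
  // Pad on the left up to the width of pad; longer values are left alone.
  return s.length < pad.length ? (pad + s).slice(-pad.length) : s;
}

function prefixVowelCount(line) {
  var vowel = 0;                              // like /JBEGLN "vowel=0"
  line.replace(/[aeiou]/g, function (m) {
    vowel++;                                  // like the replace code "vowel++;$0"
    return m;
  });
  return lpad(vowel, '000') + ':' + line;     // like the /JENDLN code
}
```

prefixVowelCount('Hello World') returns '003:Hello World'.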


JREPL.BAT v3.0
JREPL3.0.zip
(8.13 KiB) Downloaded 2293 times


Dave Benham

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: JREPL.BAT - regex text processor - successor to REPL.BAT

#30 Post by dbenham » 25 Nov 2014 20:13

Version 3.1 Changes

1) Exception handler now reports when there is a problem with the Search regular expression. Some users were not recognizing the error message as being related to the regex.

2) Added the /JLIB option that lets you load (include) JScript code from one or more files. Multiple files are delimited by forward slashes (/). The code is executed prior to any /JBEG code. It is useful for accessing a library of functions that can be used by any of the other /Jxxxx options.

3) Fixed a bug with the /X option - the extended ASCII escape sequences were not being decoded properly.

Version 3.2 Changes

4) Bug fix for /T without /JMATCH - Fixed dynamic repl function: was missing a set of {}

5) Added GOTO at top to improve startup performance


Version 3.3 Changes

6) Bug fix for when /JMATCH is combined with /M or /S


JREPL v3.3
JREPL3.3.zip
(8.5 KiB) Downloaded 2344 times


Dave Benham
Last edited by dbenham on 24 Dec 2014 12:20, edited 4 times in total.
