JREPL.BAT v8.6 - regex text processor with support for text highlighting and alternate character sets
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
brinda wrote:
dave,
Would it be possible to use JREPL to remove duplicate lines (case sensitive), keeping the remaining lines in their original order and stripping blank lines as well?
More or less ...
Code edited to simplify the process
Code:
jrepl "" "" /N 10 /f "inputFile" ^
| sort /+11 ^
| jrepl ".*?:(.*)$" "x=p;p=$1;($1==x?false:$src);" /jmatch /jbeg "var p='',x" ^
| sort ^
| jrepl "^.*?:" "" > "outputFile"
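Number the lines, sort on the data (past the line numbers), remove duplicates, sort on the line numbers, then remove the line numbers. Blank lines are dropped too: after sorting they come first, and they equal the initial empty value of p.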
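For readers who want to follow the pipeline step by step outside of CMD, here is a minimal Node.js sketch in JavaScript, which is close to the JScript that JREPL runs. The `dedupe` function is hypothetical. JavaScript's default string comparison is case-sensitive, unlike the Windows SORT command. The sketch also breaks ties by line number so that the first occurrence of each line is the one kept. Treat it as an illustration of the steps, not an exact emulation of the batch pipeline.

```javascript
// Hypothetical sketch of the dedupe pipeline: number, sort on data,
// drop consecutive duplicates (and blanks), re-sort by number, strip numbers.
function dedupe(lines) {
  // Step 1: number the lines (like /N 10, but keeping the number separately).
  const numbered = lines.map((text, i) => ({ n: i + 1, text }));

  // Step 2: sort on the data only (like SORT /+11); ties broken by line number.
  numbered.sort((a, b) => (a.text < b.text ? -1 : a.text > b.text ? 1 : a.n - b.n));

  // Step 3: keep a line only if it differs from the previous one.
  // prev starts as "" (like /jbeg "var p=''"), so blank lines, which sort first, are dropped.
  let prev = "";
  const unique = numbered.filter(({ text }) => {
    const keep = text !== prev;
    prev = text;
    return keep;
  });

  // Step 4: restore the original order; Step 5: strip the line numbers.
  unique.sort((a, b) => a.n - b.n);
  return unique.map(({ text }) => text);
}

console.log(JSON.stringify(dedupe(["b", "a", "", "b", "A", "a"]))); // ["b","a","A"]
```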
Last edited by mcnd on 19 Nov 2014 06:32, edited 1 time in total.
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
Version 2 is here with two major new features
Below is a brief summary with examples. See the help for more details.
/C (count lines)
The /C option counts the number of lines in the input and stores the value in a global cnt variable. This can be useful with the /J and /JMATCH options. If input is piped or redirected data, then a temporary file is written and deleted when done.
This TAIL.BAT example demonstrates one possible use of /C.
Code:
::tail.bat count [/F inFile] [/O outFile|-] [/N minWidth]
@echo off
jrepl "^.*" "ln>cnt-%1?$0:false" /jmatch /c %2 %3 %4 %5 %6 %7
Sample usage:
C:\test>dir | tail 2
13 File(s) 21,805 bytes
26 Dir(s) 906,455,531,520 bytes free
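Here is a rough Node.js sketch of the same idea. Once /C has stored the total line count in `cnt`, the test `ln>cnt-%1` keeps only the last N lines. The `tail` function is hypothetical and works on an in-memory array rather than a file.

```javascript
// Hypothetical sketch of tail.bat's logic: with the total line count (cnt)
// known up front, keep line ln (1-based) only when ln > cnt - count.
function tail(lines, count) {
  const cnt = lines.length; // what /C stores in the global cnt variable
  return lines.filter((line, i) => {
    const ln = i + 1; // JREPL's ln is 1-based
    return ln > cnt - count; // false drops the line, like returning false under /JMATCH
  });
}

console.log(JSON.stringify(tail(["one", "two", "three", "four"], 2))); // ["three","four"]
```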
/T (translate, similar to the Unix tr command or the sed y command). As requested by foxidrive - good idea!
This one is potentially huge. The search and replace arguments can be broken into expressions at each character, or at a user-specified delimiter. Pretty much all the options can be used along with /T as needed: /L /X /J /JMATCH, etc.
I will be somewhat surprised if there is not a bug somewhere given the number of permutations possible. Please report if you find a problem.
And now for a few examples:
ROT13.BAT
Code:
::ROT13.bat [/F inFile] [/O outFile|-] [/N minWidth]
@echo off
jrepl "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ" "nopqrstuvwxyzabcdefghijklmNOPQRSTUVWXYZABCDEFGHIJKLM" /t "" %*
Sample usage:
C:\test>echo Hello world!|rot13
Uryyb jbeyq!
C:\test>echo Hello world!|rot13|rot13
Hello world!
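The translation works like this: each character of the search string maps to the character at the same position in the replace string, and every other character passes through unchanged. Here is a minimal JavaScript sketch of that idea. The `translate` helper is hypothetical and not part of JREPL.

```javascript
// Hypothetical helper: map each character found in `from` to the
// character at the same index in `to`; other characters pass through.
function translate(text, from, to) {
  const map = new Map();
  for (let i = 0; i < from.length; i++) map.set(from[i], to[i]);
  // Single pass: each input character is looked up exactly once.
  return Array.from(text, ch => (map.has(ch) ? map.get(ch) : ch)).join("");
}

const lower = "abcdefghijklmnopqrstuvwxyz";
const upper = lower.toUpperCase();
const plain = lower + upper;
const rot13 = lower.slice(13) + lower.slice(0, 13) + upper.slice(13) + upper.slice(0, 13);

console.log(translate("Hello world!", plain, rot13)); // Uryyb jbeyq!
```

The text is scanned once and each character is mapped at most once, so a translated character is never translated again. That is why `a`→`n` and `n`→`a` can coexist in the same table.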
InitCaps.bat
Code:
::InitCaps.bat [/F inFile] [/O outFile|-] [/N minWidth]
@echo off
jrepl "\b[a-z] \B[A-Z]*" "$0.toUpperCase() $0.toLowerCase()" /j /t " " %*
Sample usage:
C:\test>echo goodbye CrUeL WORLD!|initcaps
Goodbye Cruel World!
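One way to picture how /T " " runs this command: the two searches are tried as alternatives at each position, and the replacement paired with whichever one matched is used. The sketch below writes that out by hand in JavaScript, assuming each expression sits in its own capture group. The `initCaps` function is hypothetical.

```javascript
// Hypothetical sketch of InitCaps.bat: two expressions joined as (...)|(...).
// Group 1: a lowercase letter at the start of a word -> upper case.
// Group 2: upper-case letters inside a word -> lower case (may match "").
function initCaps(text) {
  const re = /(\b[a-z])|(\B[A-Z]*)/g;
  return text.replace(re, (m, first) =>
    first !== undefined ? first.toUpperCase() : m.toLowerCase()
  );
}

console.log(initCaps("goodbye CrUeL WORLD!")); // Goodbye Cruel World!
```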
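Silly demonstration of captured subexpressions with /T (keeping track of reference numbers can get tricky). Each expression is translated into a captured submatch, so the reference numbers of the explicit captures must be adjusted accordingly:
Code: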
C:\test>echo Z-A a+z|jrepl "(.)\+(.) (.)-(.)" "$&:$2$3 $&:$6$5" /T " "
Z-A:AZ a+z:az
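Writing the combined pattern out by hand shows where the reference numbers come from. This assumes /T wraps each space-delimited expression in its own capture group. The Node.js sketch below reproduces the command; it is an illustration, not JREPL's internal code.

```javascript
// Hand-built equivalent of: jrepl "(.)\+(.) (.)-(.)" "$&:$2$3 $&:$6$5" /T " "
// Expression 1 becomes group 1 (its explicit captures are $2, $3);
// expression 2 becomes group 4 (its explicit captures are $5, $6).
const re = /((.)\+(.))|((.)-(.))/g;

const out = "Z-A a+z".replace(re, (m, e1, a, b, e2, c, d) =>
  // Use the replacement that belongs to whichever expression matched.
  e1 !== undefined ? `${m}:${a}${b}` : `${m}:${d}${c}`
);

console.log(out); // Z-A:AZ a+z:az
```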
---------------------------------------
So here is the code for version 2.2
Dave Benham
Last edited by dbenham on 23 Nov 2014 18:13, edited 3 times in total.
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
dave,
The tail is good. Can you please show head as well?
I would also like to know if the following is doable with JREPL.
Text copied and pasted from a PDF, or produced by OCR scanning, contains a lot of tabs, extra spaces, etc. To clean it up we used the MS Word macro at the link below:
http://www.mobileread.com/forums/showthread.php?t=8793
It does not always work on certain spaces, so if JREPL could do this, that would be great. Plus we would no longer need MS Word to format the text.
Here is some sample test text (I do not know how to enter the diacritic characters):
Code:
this is 1 paragraph .this is 1 paragraph. this is 1 paragraph .
this is 2 paragraph .
this <tab> is 2 paragraph , this is 2, paragraph .
this is 2 paragraph .
this is 3 paragraph ,this is 3 paragraph, this is 3 paragraph
Here is a snippet of the original (a tab is denoted as <tab>):
Code:
very truthful, and was very fond of serving brahmanas. He was indifferent to his family life, knowing that eve¬rything is temporary except d¬ie holy nam¬e of the Lord. In this way, he spent his days <tab> happily.
Bha¬ndu Moh¬anty we¬nt to some villages to beg alms, but the peo¬ple had no food even for th¬emselves—how could they give alms to Bandhu Mohanty? He returned to his ho¬use without any food , all the while medit¬ating on the Lord . His wife told him that the children were very hungry.
They could not tolerate their hunger pa¬ngs any lo¬nger. She asked , " Don 't' you have some relative who can help us during this difficult time? Let us leave this pl¬ace and go to some other place where your rela¬tives are staying." Bandhu Mohanty replied, "I have no relatives to he¬lp me, but I do have a friend. But He liv¬es far from here. He is the best am¬ong all the peo¬ple. There is no one equal to him. He liv¬es in Sri Kshetra Puri dham
So far the commands below work, but I need to run them a few times to get the desired effect:
Code:
jrepl "\t" " " /x /l /f draw.txt /o -
jrepl "  " " " /l /f draw.txt /o -
jrepl "  " " " /l /f draw.txt /o -
jrepl "  " " " /l /f draw.txt /o -
jrepl " ." "." /l /f draw.txt /o -
jrepl " ," "," /l /f draw.txt /o -
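Assuming the repeated commands replace two spaces with one, they have to be run several times because a literal two-space search only shrinks each run of spaces by about half per pass. A regular expression with a quantifier handles a run of any length in one pass. Here is a minimal JavaScript sketch of the same cleanup steps. The `cleanup` name is hypothetical. JREPL's default regex mode (without /L) should accept similar patterns, but that is not tested here.

```javascript
// Hypothetical one-pass version of the repeated cleanup commands.
function cleanup(text) {
  return text
    .replace(/\t/g, " ")        // tabs become spaces
    .replace(/ {2,}/g, " ")     // any run of 2+ spaces becomes one space
    .replace(/ ([.,])/g, "$1"); // no space before a period or comma
}

console.log(cleanup("this\tis  2 paragraph ,\t\tthis is 2, paragraph ."));
// this is 2 paragraph, this is 2, paragraph.
```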
Here are the rules:
a) When any of the symbols , ? ! . is found, it should directly follow a word (no space before it), and there should be a space after it before the next word.
b) A " (open quote) should have a space before it, and a " (close quote) should have a space after it.
c) Capitalize the first character after ? ! .
d) A paragraph begins after two ENTERs and ends with two ENTERs as well.
e) A tab is needed at the beginning of each paragraph, and the first letter of the paragraph should be capitalized.
f) If there are ENTERs within a paragraph, strip them.
g) Any other tabs or extra spaces should be stripped, leaving only a single space between words.
thanks,
brinda
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
Brinda,
You could use the MORE command to get rid of TABS.
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
Head is a trivial variation on tail, only simpler because /C is not needed.
Code:
::head.bat count [/F inFile] [/O outFile|-] [/N minWidth]
@echo off
jrepl "^.*" "ln<=%1?$0:false" /jmatch %2 %3 %4 %5 %6 %7
Regarding your complex problem - I think this is what you are looking for.
I refined the spacing and capitalization rules for when quotes interact with the beginning and/or end of a sentence or paragraph (a quote before or after punctuation or a tab). I also preserve decimal points within numbers (no space is added).
Rather than create monster command lines, I build the command incrementally using environment variables. That also helps with documentation. The numbers after each SET statement are not part of the value (SET ignores text after the closing quote); they note the capture group numbers each expression occupies under /T.
It took me a total of 5 JREPL calls. It seems to me it could be done in 4 calls (perhaps fewer), but this is the best I could do.
bug fix and simplification
Code:
@echo off
setlocal
set "input=old.txt"
set "output=new.txt"
>"%output%" (
echo(
echo(
type "%input%"
)
:: ----------------------------------------------
set "paraS=(?:[ \f\r\t\v]*\n){2}\s*" 1
set "paraR=quote=false; '\n\n\t'"
set "quoteS=\q" 2
set "quoteR=(quote=!quote)?' '+$0+'{':$0+'} '"
set "whiteSpaceS=\s+" 3
set "whiteSpaceR=' '"
set "find=%paraS%@%quoteS%@%whiteSpaceS%"
set "repl=%paraR%@%quoteR%@%whiteSpaceR%"
call jrepl find repl /v /m /x /j /t @ /jbeg "var quote=false;" /f "%output%" /o -
:: ----------------------------------------------
set "decimalS=\d[.,]\d" 1
set "decimalR=$&"
set "endS=\s*([.!?,;]+)\s*" 2,3
set "endR=$3 "
set "find=%decimalS%@%endS%"
set "repl=%decimalR%@%endR%"
call jrepl find repl /v /t @ /f "%output%" /o -
:: ----------------------------------------------
set "begQuoteS=(\t| )? *(\q){ *" 1,2,3
set "begQuoteR=($2?$2:'')+$3"
set "endQuoteS= *(\q)} *([.!?])?" 4,5,6
set "endQuoteR=$5+($6?$6:' ')"
set "spaceS= +" 7
set "spaceR=' '"
set "find=%begQuoteS%@%endQuoteS%@%spaceS%"
set "repl=%begQuoteR%@%endQuoteR%@%spaceR%"
call jrepl find repl /v /x /j /t @ /f "%output%" /o -
:: ----------------------------------------------
set "upperS=([\t.!?](?:\q ?| \q?)?)(\S)" 1,2,3
set "upperR=$2+$3.toUpperCase()"
set "trimS= +$" 4
set "trimR=''"
set "find=%upperS%@%trimS%"
set "repl=%upperR%@%trimR%"
call jrepl find repl /v /m /x /j /t @ /f "%output%" /o -
:: ----------------------------------------------
set "emptyBottomS=(?:\r\n\t?)+(?!.)" 1
set "emptyBottomR=''"
set "emptyTopS=^[^\t]+" 2
set "emptyTopR=ln++==0?'':$0"
set "find=%emptyBottomS%@%emptyTopS%"
set "repl=%emptyBottomR%@%emptyTopR%"
call jrepl find repl /v /m /j /t @ /f "%output%" /o -
type "%output%"
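To make the first JREPL call easier to follow, here is a rough Node.js re-creation of that pass only. It assumes /T joins the @-delimited expressions into one alternation, with each expression in its own capture group. It also writes JREPL's `\q` escape as a plain `"`. The `stage1` function is hypothetical. The later passes then tidy up the spacing around the `{` and `}` markers.

```javascript
// Rough re-creation of the first pass: find=paraS@quoteS@whiteSpaceS (/T @ /J /M).
// Groups: 1 = paragraph break, 2 = double quote, 3 = any other whitespace run.
function stage1(text) {
  const re = /((?:[ \f\r\t\v]*\n){2}\s*)|(")|(\s+)/g;
  let quote = false; // like /jbeg "var quote=false;"
  return text.replace(re, (m, para, q) => {
    if (para !== undefined) {
      quote = false;   // a new paragraph resets the open/close state
      return "\n\n\t"; // paragraph separator plus leading tab
    }
    if (q !== undefined) {
      quote = !quote;  // alternate between opening and closing quotes
      return quote ? " " + m + "{" : m + "} "; // tag them with { and }
    }
    return " ";        // single line breaks, tabs, extra spaces -> one space
  });
}

console.log(JSON.stringify(stage1('one  "hi"\r\n\r\ntwo')));
```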
Dave Benham
Last edited by dbenham on 20 Nov 2014 00:08, edited 1 time in total.
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
dbenham wrote:
/C (count lines)
The /C option counts the number of lines in the input and stores the value in a global cnt variable. This can be useful with the /J and /JMATCH options. If input is piped or redirected data, then a temporary file is written and deleted when done.
So in theory could you string together two JREPL commands to count the number of occurrences of a character in a line or string?
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
squashman,
I did not know about the MORE command's tab option:
/Tn Expand tabs to n spaces (default 8)
Thanks for letting me know. I can use this in small scripts.
Dave,
Thank you for the tail variation (head). Now I can make JREPL the centrepiece for many different commands.
I can't thank you enough for the Word replacement script. It works beautifully on Windows 2000. In fact it works so well that even the errors in old documents are found and fixed.
Thank you so much for taking the time to do this.
brinda
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
Squashman wrote:
dbenham wrote:
/C (count lines)
The /C option counts the number of lines in the input and stores the value in a global cnt variable. This can be useful with the /J and /JMATCH options. If input is piped or redirected data, then a temporary file is written and deleted when done.
So in theory could you string together two JREPL commands to count the number of occurrences of a character in a line or string?
Code:
C:\> echo count the number of occurrences of a character in a line or string? |
FindRepl "o" /$:0
"o"
"o"
"o"
"o"
"o"
C:\> echo %errorlevel%
5
C:\> (echo count the number of occurrences of a character in a line or string? &
echo Or even in the whole file!) | FindRepl "o" /$:0 > nul
C:\> echo %errorlevel%
6
C:\> (echo count the number of occurrences of a character in a line or string? &
echo Or even in the whole file!) | FindRepl /I "o" /$:0 > nul
C:\> echo %errorlevel%
7
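The same counts can be double-checked with a small JavaScript sketch. A global regex finds every occurrence, and the `i` flag plays the role of FindRepl's /I. The `countOf` helper is hypothetical.

```javascript
// Hypothetical helper: count non-overlapping matches of a global regex.
function countOf(text, re) {
  return (text.match(re) || []).length; // match() returns null when nothing matches
}

const line1 = "count the number of occurrences of a character in a line or string? ";
const both = line1 + "\n" + "Or even in the whole file!";

console.log(countOf(line1, /o/g)); // 5
console.log(countOf(both, /o/g));  // 6
console.log(countOf(both, /o/gi)); // 7
```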
Antonio
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
Thanks Antonio!
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
I've fixed a significant bug in the /T option: it failed to work properly if a match was an empty string.
Original buggy line 555 from version 2.0:
Code:
_g.replFunc+=',$off,$src){for (var i=1;i<arguments.length-2;i++) if (arguments[i]) ';
Fixed line:
Code:
_g.replFunc+=',$off,$src){for (var i=1;i<arguments.length-2;i++)if (arguments[i]!==undefined) ';
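The difference between the two lines is easy to reproduce in JavaScript. When a group matches an empty string, its argument is "", which is falsy, so `if (arguments[i])` skips it. A group that did not take part in the match is `undefined`. The sketch below runs in Node.js and assumes JScript behaves the same way in this case, as the fix suggests. The `whichGroupBuggy`/`whichGroupFixed` helpers are hypothetical.

```javascript
// Find which capture group took part in a match, two ways.
// args = [match, g1, g2, ..., offset, source], same loop bounds as the JREPL code.
function whichGroupBuggy(...args) {
  for (let i = 1; i < args.length - 2; i++) if (args[i]) return i;
  return 0; // no group found
}
function whichGroupFixed(...args) {
  for (let i = 1; i < args.length - 2; i++) if (args[i] !== undefined) return i;
  return 0;
}

// Group 1 matches an empty string at the start of the input.
const re = /(^)|(x)/;
console.log("abc".replace(re, (...a) => String(whichGroupBuggy(...a)))); // 0abc
console.log("abc".replace(re, (...a) => String(whichGroupFixed(...a)))); // 1abc
```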
I've updated the code in the prior posts to 2.1.
Dave Benham
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
Squashman wrote:
dbenham wrote:
/C (count lines)
The /C option counts the number of lines in the input and stores the value in a global cnt variable. This can be useful with the /J and /JMATCH options. If input is piped or redirected data, then a temporary file is written and deleted when done.
So in theory could you string together two JREPL commands to count the number of occurrences of a character in a line or string?
I don't know what you had in mind with the /C option, but a minimal amount of JScript code can easily provide the answer.
Count the number of vowels in file test.txt:
Code:
jrepl "[aeiou]" ^
"vowel++;false" ^
/jbeg "vowel=0;" ^
/jend "output.WriteLine(vowel);" ^
/i /jmatch /f "test.txt"
/I - makes the search case insensitive
/JMATCH - discards content that does not match
/JBEG - declare and initialize a variable to hold the count (global by default since VAR keyword not used)
replace code - increment the count, return false to suppress match output
/JEND - print out the result.
Prefix each line with the number of vowels in the line:
Code:
jrepl "^$ ^ [aeiou] $" ^
"'0:'+$src vowel=0;false vowel++;false vowel+':'+$src" ^
/i /jmatch /t " " /f test.bat
Explanation:
/I and /JMATCH are used as before
/T " " treats search and replace as arrays of expressions. At each position the searches are tried from left to right; once one matches, all subsequent searches are ignored until the search position advances.
Code:
index| search | replace
-----+-------------------------+--------------------------------------------------
1 | match empty line | print constant result
2 | match beginning of line | reset count to 0, return false to suppress output
3 | match vowel | increment count, return false to suppress output
4 | match end of line | print the count, followed by the source line
If I "cheat" and re-purpose the $OFF variable, I can easily fix the width of the vowel count. The $OFF variable normally contains the offset of the match within the source line. It is also used by the /OFF option to display the offset with a fixed (minimum actually) width. I simply assign the vowel count at the end of the line to $OFF and add the /OFF option. I must first accumulate the count in a separate variable because $OFF is reset on each match.
Code:
jrepl "^$ ^ [aeiou] $" ^
"'0:'+$src vowel=0;false vowel++;false $off=vowel;$src" ^
/i /jmatch /t " " /f test.bat /off 3
One other point, especially for you Squashman: JREPL can be used with massive files as long as you do not use the /M option and each line is less than 2GB. The total file size can be as big as you can stand to wait for.
VBS and JScript strings are limited to 2GB, which is where the restrictions come from. The /M option must load the entire file into a single string.
Dave Benham
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
Previous version 2 posts were updated to v2.2 to fix a bug when both /T and /L options were used.
Original buggy line 488:
Code:
cnt, test;
Fixed line:
Code:
cnt=0, test;
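Here is why the missing `=0` matters, assuming the variable is later used as a counter: a variable declared without a value is `undefined`, and incrementing `undefined` gives `NaN`. A minimal JavaScript illustration (the wrapper functions are hypothetical):

```javascript
function buggyCounter() {
  var cnt, test;     // as in the buggy "cnt, test;": cnt starts out undefined
  cnt++;             // undefined + 1 -> NaN
  return cnt;
}
function fixedCounter() {
  var cnt = 0, test; // as in the fixed "cnt=0, test;"
  cnt++;
  return cnt;
}

console.log(buggyCounter()); // NaN
console.log(fixedCounter()); // 1
```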
Dave Benham
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
New features for version 3
1) /JBEGLN and /JENDLN options allow you to specify JScript code to execute at the beginning and end of a line. The code can modify the line if you want.
2) New global variables:
skip - If set to true, then skip the match phase of each line until set back to false. Very handy for restricting actions to a limited number of lines.
quit - If set to true, then do not read any more lines of input. This is an efficient way to terminate early once you get the result you are looking for.
3) New global method:
lpad(val, pad) - used to left pad strings (typically numbers).
4) Exception handling has been modified to identify when user code is the source of a run time error. It reports which code is the source of the problem.
Examples:
Much improved head.bat - it terminates immediately once the desired number of lines has been printed. Great for huge files.
Code:
::head.bat count [/F inFile] [/O outFile|-] [/N minWidth]
@echo off
jrepl "^" "" /jbeg "quit=(%1<=0)" /jbegln "quit=(ln>=%1)" %2 %3 %4 %5 %6 %7
exit /b
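The gain comes from stopping the input loop as soon as the last wanted line has been handled. Below is a minimal Node.js sketch of that control flow. It assumes, as head.bat implies, that the line being processed when `quit` is set is still output. `head`, `source` and the `reads` counter are hypothetical, and a generator stands in for reading a huge file.

```javascript
// Hypothetical sketch of the new head.bat control flow.
function head(lines, count) {
  const out = [];
  let quit = count <= 0;                 // like /jbeg "quit=(%1<=0)"
  let ln = 0;
  const it = lines[Symbol.iterator]();
  while (!quit) {
    const next = it.next();              // read one line of input
    if (next.done) break;
    ln++;
    quit = ln >= count;                  // like /jbegln "quit=(ln>=%1)"
    out.push(next.value);                // the current line is still output
  }
  return out;
}

// A generator that records how many lines were actually read.
let reads = 0;
function* source() {
  for (let i = 1; i <= 1000000; i++) { reads++; yield "line " + i; }
}

const result = head(source(), 3);
console.log(JSON.stringify(result), reads); // ["line 1","line 2","line 3"] 3
```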
Improved tail.bat - it doesn't waste time matching skipped lines. Again an improvement for huge files, but not as significant as the new head.bat:
Code:
::tail.bat count [/F inFile] [/O outFile|-] [/N minWidth]
@echo off
jrepl "^.*" "$0" /jbegln "skip=(ln<=cnt-(%1));" /jmatch /c %2 %3 %4 %5 %6 %7
Improved count of vowels per line (the vowel count is prepended to each line). The logic is much simpler with /JBEGLN and /JENDLN than it was with the /T option.
Code:
jrepl "[aeiou]" "vowel++;$0" /j /jbegln "vowel=0" /jendln "$txt=lpad(vowel,'000')+':'+$txt;" /f input.txt
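The per-line flow of that command can be sketched in JavaScript. The `lpad` below is a hypothetical stand-in that assumes `pad` supplies the left-padding characters up to its own length, so `lpad(7,'000')` gives `007`. JREPL's real implementation may differ. `countVowelsPerLine` is also hypothetical.

```javascript
// Hypothetical stand-in for JREPL's lpad(val, pad).
function lpad(val, pad) {
  const s = String(val);
  return s.length >= pad.length ? s : pad.slice(0, pad.length - s.length) + s;
}

// Per-line flow of: /jbegln "vowel=0", search [aeiou] with "vowel++;$0",
// and /jendln "$txt=lpad(vowel,'000')+':'+$txt;" (case-sensitive, no /I).
function countVowelsPerLine(lines) {
  return lines.map(txt => {
    let vowel = 0;                                              // /jbegln: reset per line
    txt = txt.replace(/[aeiou]/g, m => { vowel++; return m; }); // count, keep text as-is
    return lpad(vowel, "000") + ":" + txt;                      // /jendln: prefix the count
  });
}

console.log(JSON.stringify(countVowelsPerLine(["Hello world", "xyz"])));
// ["003:Hello world","000:xyz"]
```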
JREPL.BAT v3.0
Dave Benham
Re: JREPL.BAT - regex text processor - successor to REPL.BAT
Version 3.1 Changes
1) Exception handler now reports when there is a problem with the Search regular expression. Some users were not recognizing the error message as being related to the regex.
2) Added the /JLIB option that lets you load (include) JScript code from one or more files. Multiple files are delimited by forward slashes (/). The code is executed prior to any /JBEG code. It is useful for accessing a library of functions that can be used by any of the other /Jxxxx options.
3) Fixed a bug with the /X option - the extended ASCII escape sequences were not being decoded properly.
Version 3.2 Changes
4) Bug fix for /T without /JMATCH: the dynamically built replace function was missing a set of {}
5) Added a GOTO at the top to improve startup performance
Version 3.3 Changes
6) Bug fix for when /JMATCH is combined with /M or /S
JREPL v3.3
Dave Benham
Last edited by dbenham on 24 Dec 2014 12:20, edited 4 times in total.