REPLVAR.BAT - regex search and replace for variables
Posted: 05 Apr 2014 00:12
I was thinking about the problem of replacing = characters in variable content, as well as other thorny string search and replace issues in batch. My REPL.BAT utility was primarily built to work with files (via pipes or redirection), but it also supports input via an environment variable. The difficulty is how to reliably capture the stdout output in a variable. FOR /F can be used, but carriage returns and line feeds can be troublesome. And then there is the problem of corruption of ! (and ^) when expanding FOR variables while delayed expansion is enabled.
I initially came up with a powerful solution combining the safe return technique with a series of piped REPL.BAT operations. It worked well, but it was very slow. So I decided to build a dedicated REPLVAR.BAT hybrid JScript/batch utility that manages to do all the needed replacements with a single JScript call. It borrows heavily from the REPL.BAT script.
It has most of the same options as REPL.BAT, except input is always from a variable, so no S option, and the search always uses multi-line mode, so no M option.
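Multi-line mode means ^ and $ match at every line boundary inside the variable's value, not just at the very start and end of it. As a rough illustration, here is a small snippet in modern JavaScript (run under Node.js, not WSH's JScript, with a made-up sample value) that compares a search using only the g flag with the g plus m flags that REPLVAR.BAT always applies:

```javascript
// Hypothetical variable content spanning three lines.
const value = "key=1\nother key=2\nkey=3";

// g only: ^ anchors solely at the start of the whole string.
const globalOnly = value.replace(/^key/g, "KEY");

// g + m (what REPLVAR.BAT always uses): ^ also anchors after each line break.
const globalMultiline = value.replace(/^key/gm, "KEY");

console.log(JSON.stringify(globalOnly));      // "KEY=1\nother key=2\nkey=3"
console.log(JSON.stringify(globalMultiline)); // "KEY=1\nother key=2\nKEY=3"
```

This is also why the B and E options work per line. In the script, they simply prepend ^ or append $ to the search.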
It has an impressive list of features:
- Both the input and output values are passed by reference via variable names.
- Searches can be interpreted as regular expressions, or as string literals.
- Searches can be case sensitive or insensitive.
- Replacement strings can reference matched content from the search.
- Many escape sequences are supported in both the search and replace strings (via the X option), so every possible byte code except NULL (0x00) can be expressed.
- The utility can be safely called with delayed expansion enabled or disabled, and all input and output characters will be preserved.
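Preserving every character depends on how the JScript portion encodes its result before the batch portion captures it with FOR /F. Percent, double quote, carriage return and line feed are replaced with the FOR variables %J, %~K, %L and %~M, which the final SET expands back into the original characters. In addition, ^ and ! are escaped when the caller has delayed expansion enabled. The sketch below restates those replacement rules as a standalone function in modern JavaScript for Node.js. The name encodeForSafeReturn and its boolean parameter are my own and are not part of REPLVAR.BAT. The explanation of the trailing ! is my reading of the script.

```javascript
// Hypothetical restatement of the encoding in REPLVAR.BAT's JScript portion.
// %J, %~K, %L and %~M are FOR variables that the batch portion expands back
// into %, ", carriage return and line feed when it performs the final SET.
function encodeForSafeReturn(str, delayedExpansionEnabled) {
  str = str.replace(/%/g, "%J");    // first, so the %-codes added below stay intact
  str = str.replace(/"/g, "%~K");   // double quote -> %~K
  str = str.replace(/\r/g, "%L");   // carriage return -> %L
  str = str.replace(/\n/g, "%~M");  // line feed -> %~M
  if (delayedExpansionEnabled) {
    // The final SET in REPLVAR.BAT ends with a lone !, so with delayed
    // expansion enabled one level of ^ escaping is consumed there.
    str = str.replace(/\^/g, "^^");
    str = str.replace(/!/g, "^!");
  }
  return str;
}

console.log(encodeForSafeReturn('50% "off"!', false)); // 50%J %~Koff%~K!
console.log(encodeForSafeReturn('50% "off"!', true));  // 50%J %~Koff%~K^!
```

Each % or CR grows to 2 bytes, and each " or LF grows to 3 bytes. With delayed expansion enabled, each ^ or ! also grows to 2 bytes. This growth is where the output length limits described in the embedded documentation come from.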
Note - REPLVAR.BAT effectively treats all strings as extended ASCII. The source variable value should map properly to the active code page. If the source value contains Unicode characters that do not map to the active code page, then the value will be silently transformed into a different value that does map to the active code page. Also, the final output must be compatible with the active code page, otherwise an error is raised.
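The code page assumption also shows up in how the X option handles \xnn. JScript strings are Unicode, so for bytes 0x80 through 0x9F the script rewrites \xnn into the \unnnn escape that Windows-1252 assigns to that byte. For example, \x80 becomes \u20AC, the euro sign. The script also temporarily replaces each \\ with a \B placeholder, so an escaped backslash is never mistaken for the start of an escape. The following sketch in modern JavaScript for Node.js imitates that processing for a Replace string. The names expandHexEscapes and CP1252_HIGH are hypothetical, and only four of the script's table entries are reproduced.

```javascript
// Hypothetical subset of the script's Windows-1252 table for bytes 0x80-0x9F.
const CP1252_HIGH = {
  0x80: 0x20ac, // euro sign
  0x85: 0x2026, // horizontal ellipsis
  0x93: 0x201c, // left double quotation mark
  0x94: 0x201d, // right double quotation mark
};

function expandHexEscapes(text) {
  // Hide escaped backslashes behind a \B placeholder, as the script does.
  text = text.replace(/\\\\/g, "\\B");
  text = text.replace(/\\x([0-9a-fA-F]{2})|\\u([0-9a-fA-F]{4})/g, (match, hex2, hex4) => {
    if (hex4 !== undefined) return String.fromCharCode(parseInt(hex4, 16));
    const byte = parseInt(hex2, 16);
    // Use the Windows-1252 mapping when there is one, else the same code point.
    const code = CP1252_HIGH[byte] !== undefined ? CP1252_HIGH[byte] : byte;
    return String.fromCharCode(code);
  });
  // Turn the placeholder back into a single literal backslash.
  return text.replace(/\\B/g, "\\");
}

console.log(expandHexEscapes("\\x80 \\x41")); // € A
console.log(expandHexEscapes("\\\\x41"));     // \x41 (escaped backslash, so no escape)
```

One side effect of the placeholder approach, both in this sketch and in the script's Replace handling, is that a literal \B typed into the string also comes out as a single backslash.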
Usage is simple:
Code:
@echo off
setlocal enableDelayedExpansion
set "input=1 + 1 = 3!"
call replVar input output "=" "<>" L
echo(!output!
--OUTPUT--
1 + 1 <> 3!
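In JavaScript terms, the call above boils down to a single String.prototype.replace. The following sketch shows how the options shape that call, based on my reading of the script's JScript portion:

- L escapes regex metacharacters in Search and doubles every $ in Replace.
- B and E add ^ and $ anchors.
- I adds the i flag.
- g and m are always set.

The function replVarSketch is hypothetical. It works on JavaScript strings rather than batch variables, leaves out the X option, and uses modern syntax that WSH's JScript will not accept as-is.

```javascript
// Hypothetical JavaScript restatement of how REPLVAR.BAT's options shape the
// final replace call (X option omitted). Returns null when A is given and
// nothing changed; REPLVAR.BAT instead undefines OutVar and returns 2.
function replVarSketch(input, search, replace, options = "") {
  const opts = options.toLowerCase();
  let flags = "gm"; // always global and multi-line
  if (opts.includes("i")) flags += "i";
  if (opts.includes("l")) {
    // Literal search: escape the same metacharacters the script escapes.
    search = search.replace(/([.^$*+?()[{\\|])/g, "\\$1");
    // Literal replace: each $ becomes $$, which replace() emits as one $.
    replace = replace.replace(/\$/g, "$$$$");
  }
  if (opts.includes("b")) search = "^" + search; // match at start of a line
  if (opts.includes("e")) search = search + "$"; // match at end of a line
  const output = input.replace(new RegExp(search, flags), replace);
  return opts.includes("a") && output === input ? null : output;
}

console.log(replVarSketch("1 + 1 = 3!", "=", "<>", "L")); // 1 + 1 <> 3!
```

Without the L option, a search such as + is rejected as an invalid regular expression. That is why the L option is the safer choice when searching for arbitrary literal text.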
A single call takes about 110 milliseconds on my machine. Certainly not fast, but not too bad for batch, considering the power.
Full documentation is embedded within the script.
Let me know if you find any bugs. I've done moderate testing, but I wouldn't be shocked if there are some bugs lurking somewhere.
REPLVAR.BAT
EDIT 2014-04-06, version 1.1: Detect and raise an error if the result is incompatible with the active code page. Also explicitly set ERRORLEVEL to 0 or 1 as appropriate upon return.
EDIT 2014-04-07, version 1.2: Fixed a bug with output when the input included extended ASCII values. Also dropped the V option, so the search and replace strings must now be passed as strings, never by reference using variable names.
EDIT 2014-04-08, version 1.3: Modified the documentation to better explain the limits on the source content.
EDIT 2014-04-24, version 1.4: Fixed the A option that was broken with V1.2.
Code:
@if (@X)==(@Y) @end /* Harmless hybrid line that begins a JScript comment
::************ Documentation ***********
::REPLVAR.BAT version 1.4
:::
:::REPLVAR InVar OutVar Search Replace [Options]
:::REPLVAR /?[REGEX|REPLACE]
:::REPLVAR /V
:::
::: Performs a global regular expression search and replace on the contents of
::: variable InVar and writes the result to variable OutVar.
:::
::: REPLVAR.BAT works properly with delayed expansion enabled or disabled.
:::
::: REPLVAR.BAT treats the source variable value as extended ASCII. The value
::: should map properly to the active code page. Unicode source values that
::: do not map to the active code page will be silently transformed to a new
::: value that does map to the active code page. The result of the search and
::: replace must be compatible with the active code page, otherwise an error
::: is raised.
:::
::: For most strings, the maximum supported output length approaches the 8191
::: character limit. But it could be significantly less if the output
::: string contains many % " \r or \n characters, as they must be temporarily
::: expanded into 2 or 3 bytes. Also, ^ and ! characters are temporarily
::: expanded into 2 bytes if delayed expansion is enabled.
:::
::: REPLVAR.BAT returns with ERRORLEVEL 0 upon success, and ERRORLEVEL 1
::: upon error. If the A option is used and the input was not altered then
::: OutVar is undefined and ERRORLEVEL set to 2.
:::
::: Each parameter may be optionally enclosed by double quotes. The double
::: quotes are not considered part of the argument. The quotes are required
::: if the parameter contains a batch token delimiter like space, tab, comma,
::: semicolon. The quotes should also be used if the argument contains a
::: batch special character like &, |, etc. so that the special character
::: does not need to be escaped with ^.
:::
::: If called with a single argument of /?, then prints help documentation
::: to stdout. If a single argument of /?REGEX, then opens up Microsoft's
::: JScript regular expression documentation within your browser. If a single
::: argument of /?REPLACE, then opens up Microsoft's JScript REPLACE
::: documentation within your browser.
:::
::: If called with a single argument of /V, case insensitive, then prints
::: the version of REPLVAR.BAT.
:::
::: InVar - The name of a variable containing the source string.
:::
::: OutVar - The name of a variable where the result should be stored.
:::
::: Search - By default, this is a case sensitive JScript (ECMA) regular
::: expression expressed as a string.
:::
::: The search is conducted using the regular expression g (global)
::: and m (multiline) flags.
:::
::: JScript regex syntax documentation is available at
::: http://msdn.microsoft.com/en-us/library/ae5bf541(v=vs.80).aspx
:::
::: Replace - By default, this is the string to be used as a replacement for
::: each found search expression. Full support is provided for
::: substitution patterns available to the JScript replace method.
:::
::: For example, $& represents the portion of the source that matched
::: the entire search pattern, $1 represents the first captured
::: submatch, $2 the second captured submatch, etc. A $ literal
::: can be escaped as $$.
:::
::: An empty replacement string must be represented as "".
:::
::: Replace substitution pattern syntax is fully documented at
::: http://msdn.microsoft.com/en-US/library/efy6s3e6(v=vs.80).aspx
:::
::: Options - An optional string of characters used to alter the behavior
::: of REPLVAR. The option characters are case insensitive, and may
::: appear in any order.
:::
::: I - Makes the search case-insensitive.
:::
::: L - The Search is treated as a string literal instead of a
::: regular expression. Also, all $ found in Replace are
::: treated as $ literals.
:::
::: B - The Search must match the beginning of a line.
::: Mostly used with literal searches.
:::
::: E - The Search must match the end of a line.
::: Mostly used with literal searches.
:::
::: A - Only return a value if the input was altered. If not altered,
::: then ERRORLEVEL is set to 2.
:::
::: X - Enables extended substitution pattern syntax with support
::: for the following escape sequences within the Replace string:
:::
::: \\ - Backslash
::: \b - Backspace
::: \f - Formfeed
::: \n - Newline
::: \q - Quote
::: \r - Carriage Return
::: \t - Horizontal Tab
::: \v - Vertical Tab
::: \xnn - Extended ASCII byte code expressed as 2 hex digits
::: \unnnn - Unicode character expressed as 4 hex digits
:::
::: Also enables the \q escape sequence for the Search string.
::: The other escape sequences are already standard for a regular
::: expression Search string.
:::
::: Also modifies the behavior of \xnn in the Search string to work
::: properly with extended ASCII byte codes.
:::
::: Extended escape sequences are supported even when the L option
::: is used. Both Search and Replace support all of the extended
::: escape sequences if both the X and L options are combined.
:::
::: REPLVAR.BAT was written by Dave Benham, with assistance from DosTips users
::: Aacini and Liviu regarding complications due to JScript's use of Unicode vs.
::: cmd.exe's use of extended ASCII. REPLVAR.BAT also uses a modified form of the
::: safe return technique developed by DosTips user jeb. Updates to REPLVAR.BAT
::: will be posted to the original posting site:
::: http://www.dostips.com/forum/viewtopic.php?f=3&t=5492
:::
::************ Batch portion ***********
@echo off
if .%4 equ . (
if "%~1" equ "/?" (
for /f "delims=: tokens=1*" %%A in ('findstr /n "^:::" "%~f0"') do echo(%%B
exit /b 0
) else if /i "%~1" equ "/?REGEX" (
start "" "http://msdn.microsoft.com/en-us/library/ae5bf541(v=vs.80).aspx"
exit /b 0
) else if /i "%~1" equ "/?REPLACE" (
start "" "http://msdn.microsoft.com/en-US/library/efy6s3e6(v=vs.80).aspx"
exit /b 0
) else if /i "%~1" equ "/V" (
for /f "delims=: tokens=1*" %%A in ('findstr /nblc:"::REPLVAR.BAT version" "%~f0"') do echo(%%B
exit /b 0
) else (
call :err "Insufficient arguments"
exit /b 1
)
)
echo(%~5|findstr /i "[^ILEBXA]" >nul && (
call :err "Invalid option(s)"
exit /b 1
)
setlocal
set "$replVar.notDelayed=!!"
setlocal enableDelayedExpansion
for /f "delims==" %%V in ('set ~ 2^>nul') do set "%%V="
set "~=!%~1!"
setlocal disableDelayedExpansion
set "rtn="
for /f delims^=^ eol^= %%A in (
'set ~ 2^>nul^|cscript //E:JScript //nologo "%~f0" "%$replVar.notDelayed%" %3 %4 %5'
) do set "rtn=%%A"
if defined rtn (
set "err=%rtn:~0,1%"
set "rtn=%rtn:~1%"
) else set "err=2"
if %err% equ 1 (echo ERROR: Result not compatible with active code page) >&2
if %err% equ 2 (echo Input not altered) >&2
setlocal enableDelayedExpansion
set ^"LF=^
^"
for /f %%A in ('copy /z "%~dpf0" nul') do set "CR=%%A"
set "replace=%% """ !CR!!CR!"
for /f "tokens=1,2,3" %%J in ("!replace!") do for %%M in ("!LF!") do (
endlocal
endlocal
endlocal
endlocal
set "%~2=%rtn%" !
exit /b %err%
)
:err
>&2 echo ERROR: %~1. Use replVar /? to get help.
exit /b
************* JScript portion **********/
var env=WScript.CreateObject("WScript.Shell").Environment("Process");
var args=WScript.Arguments;
var search=args.Item(1);
var replace=args.Item(2);
var options="gm";
if (args.length>3) options+=args.Item(3).toLowerCase();
var alterations=(options.indexOf("a")>=0);
if (alterations) options=options.replace(/a/g,"");
if (options.indexOf("x")>=0) {
options=options.replace(/x/g,"");
replace=replace.replace(/\\\\/g,"\\B");
replace=replace.replace(/\\q/g,"\"");
replace=replace.replace(/\\x80/g,"\\u20AC");
replace=replace.replace(/\\x82/g,"\\u201A");
replace=replace.replace(/\\x83/g,"\\u0192");
replace=replace.replace(/\\x84/g,"\\u201E");
replace=replace.replace(/\\x85/g,"\\u2026");
replace=replace.replace(/\\x86/g,"\\u2020");
replace=replace.replace(/\\x87/g,"\\u2021");
replace=replace.replace(/\\x88/g,"\\u02C6");
replace=replace.replace(/\\x89/g,"\\u2030");
replace=replace.replace(/\\x8[aA]/g,"\\u0160");
replace=replace.replace(/\\x8[bB]/g,"\\u2039");
replace=replace.replace(/\\x8[cC]/g,"\\u0152");
replace=replace.replace(/\\x8[eE]/g,"\\u017D");
replace=replace.replace(/\\x91/g,"\\u2018");
replace=replace.replace(/\\x92/g,"\\u2019");
replace=replace.replace(/\\x93/g,"\\u201C");
replace=replace.replace(/\\x94/g,"\\u201D");
replace=replace.replace(/\\x95/g,"\\u2022");
replace=replace.replace(/\\x96/g,"\\u2013");
replace=replace.replace(/\\x97/g,"\\u2014");
replace=replace.replace(/\\x98/g,"\\u02DC");
replace=replace.replace(/\\x99/g,"\\u2122");
replace=replace.replace(/\\x9[aA]/g,"\\u0161");
replace=replace.replace(/\\x9[bB]/g,"\\u203A");
replace=replace.replace(/\\x9[cC]/g,"\\u0153");
replace=replace.replace(/\\x9[dD]/g,"\\u009D");
replace=replace.replace(/\\x9[eE]/g,"\\u017E");
replace=replace.replace(/\\x9[fF]/g,"\\u0178");
replace=replace.replace(/\\b/g,"\b");
replace=replace.replace(/\\f/g,"\f");
replace=replace.replace(/\\n/g,"\n");
replace=replace.replace(/\\r/g,"\r");
replace=replace.replace(/\\t/g,"\t");
replace=replace.replace(/\\v/g,"\v");
replace=replace.replace(/\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}/g,
function($0,$1,$2){
return String.fromCharCode(parseInt("0x"+$0.substring(2)));
}
);
replace=replace.replace(/\\B/g,"\\");
search=search.replace(/\\\\/g,"\\B");
search=search.replace(/\\q/g,"\"");
search=search.replace(/\\x80/g,"\\u20AC");
search=search.replace(/\\x82/g,"\\u201A");
search=search.replace(/\\x83/g,"\\u0192");
search=search.replace(/\\x84/g,"\\u201E");
search=search.replace(/\\x85/g,"\\u2026");
search=search.replace(/\\x86/g,"\\u2020");
search=search.replace(/\\x87/g,"\\u2021");
search=search.replace(/\\x88/g,"\\u02C6");
search=search.replace(/\\x89/g,"\\u2030");
search=search.replace(/\\x8[aA]/g,"\\u0160");
search=search.replace(/\\x8[bB]/g,"\\u2039");
search=search.replace(/\\x8[cC]/g,"\\u0152");
search=search.replace(/\\x8[eE]/g,"\\u017D");
search=search.replace(/\\x91/g,"\\u2018");
search=search.replace(/\\x92/g,"\\u2019");
search=search.replace(/\\x93/g,"\\u201C");
search=search.replace(/\\x94/g,"\\u201D");
search=search.replace(/\\x95/g,"\\u2022");
search=search.replace(/\\x96/g,"\\u2013");
search=search.replace(/\\x97/g,"\\u2014");
search=search.replace(/\\x98/g,"\\u02DC");
search=search.replace(/\\x99/g,"\\u2122");
search=search.replace(/\\x9[aA]/g,"\\u0161");
search=search.replace(/\\x9[bB]/g,"\\u203A");
search=search.replace(/\\x9[cC]/g,"\\u0153");
search=search.replace(/\\x9[dD]/g,"\\u009D");
search=search.replace(/\\x9[eE]/g,"\\u017E");
search=search.replace(/\\x9[fF]/g,"\\u0178");
if (options.indexOf("l")>=0) {
search=search.replace(/\\b/g,"\b");
search=search.replace(/\\f/g,"\f");
search=search.replace(/\\n/g,"\n");
search=search.replace(/\\r/g,"\r");
search=search.replace(/\\t/g,"\t");
search=search.replace(/\\v/g,"\v");
search=search.replace(/\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}/g,
function($0,$1,$2){
return String.fromCharCode(parseInt("0x"+$0.substring(2)));
}
);
search=search.replace(/\\B/g,"\\");
} else search=search.replace(/\\B/g,"\\\\");
}
if (options.indexOf("l")>=0) {
options=options.replace(/l/g,"");
search=search.replace(/([.^$*+?()[{\\|])/g,"\\$1");
replace=replace.replace(/\$/g,"$$$$");
}
if (options.indexOf("b")>=0) {
options=options.replace(/b/g,"");
search="^"+search
}
if (options.indexOf("e")>=0) {
options=options.replace(/e/g,"");
search=search+"$"
}
var search=new RegExp(search,options);
var str1, str2, delay;
delay=args.Item(0);
if (!WScript.StdIn.AtEndOfStream) str1=WScript.StdIn.ReadAll(); else str1="";
str1=str1.substr(2,str1.length-4);
str2=str1.replace(search,replace);
if (!alterations || str1!=str2) {
str2=str2.replace(/%/g,"%J");
str2=str2.replace(/\"/g,"%~K");
str2=str2.replace(/\r/g,"%L");
str2=str2.replace(/\n/g,"%~M");
if (delay=="") {
str2=str2.replace(/\^/g,"^^");
str2=str2.replace(/!/g,"^!");
}
try {
WScript.Stdout.Write("0"+str2);
} catch (e) {
WScript.Stdout.Write("1");
}
}
Dave Benham