I've been playing with GnuWin32 sed on a test machine, and I think I've come up with a semi-reasonable way to integrate it with cmd scripts.
The trouble with mixing sed and cmd is that sed scripts are full of cryptic punctuation characters and cmd has a totally toxic parser. Persuading cmd not to screw up strings that don't resemble filenames is hard.
The subroutine I'll present here lets you put sed scripts inside the same file as the cmd scripts that use them, and guarantees that they won't be parsed by cmd at all, ever. The script below includes a couple of lines at the top that call the subroutine to clean up report.xml as you requested.
Code:
@echo off
set this="%~dpnx0"
call :sed fix-html-quote "%~1" report.xml
goto :eof
:: Run a labelled sed script fragment over any number of input files to produce
:: an output file. If output file is -, use standard output.
:: %1: sed script label
:: %2: output file
:: %3 on: sed option flags and input files
:sed
setlocal disabledelayedexpansion
set out=
if not "%~2" == "-" set out=^>"%~2"
for /f "usebackq tokens=2*" %%A in ('%*') do set args=%%B
set sed="%ProgramFiles%\GnuWin32\bin\sed"
%sed% -e "1,/^#%~1$/d;/^#end$/q" %this% | %sed% -f- %args% %out%
endlocal
goto :eof
:: sed script library. Each stanza beginning with #label and ending with
:: #end can be called by name using call :sed label as described above. This
:: stuff is never parsed by cmd, so the syntax is pure sed script with no
:: weird-ass escaping required.
#fix-html-quote
s/&quot;/"/g
#end
Each call to :sed invokes the actual sed.exe executable twice: once to extract a sed script, and once to feed that script into the second invocation of sed.exe on its standard input. Because it's sed, not cmd, that's reading the sed script from the file, cmd's parser doesn't get a chance to screw it up on the way in; and because the script is piped to the second sed on its standard input rather than being supplied as a command line argument, the cmd parser doesn't get to screw it up on the way out either.
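If you want to try the two-stage pipe on its own before trusting it inside a cmd script, here is a rough sketch of the same idea under a POSIX shell (Linux, or the sh that ships with Git for Windows, say). It assumes GNU sed for the -f - option. The names run_stanza, lib.txt and input.xml are made up for the demo and are not part of the script above.

```shell
#!/bin/sh
# Sketch only: assumes GNU sed. run_stanza, lib.txt and input.xml are
# hypothetical names used for this demo.

# run_stanza LABEL LIBRARY INPUT...
# Stage 1 prints the lines after "#LABEL" up to and including "#end".
# Stage 2 reads those lines as its script from standard input (-f -).
run_stanza() {
    label=$1
    library=$2
    shift 2
    sed -e "1,/^#$label\$/d;/^#end\$/q" "$library" | sed -f - "$@"
}

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# A stand-in for the script file: some non-sed text, then one stanza.
cat > "$workdir/lib.txt" <<'EOF'
echo this line stands in for the cmd part of the file
#fix-html-quote
s/&quot;/"/g
#end
EOF

printf '%s\n' '<a title=&quot;hi&quot;>' > "$workdir/input.xml"

run_stanza fix-html-quote "$workdir/lib.txt" "$workdir/input.xml"
```

Run as is, this should print the input line with each &quot; turned into a plain double quote. The sed script itself never appears on a command line, which is the whole point of the trick.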
Line by line:
set this="%~dpnx0"
We're going to need the pathname to the script file we're running from so that we can hand it to sed to get sed scripts from, and we're going to do that in a subroutine where the script file's own %n arguments won't be available. Sock the script file's full pathname away in a variable for later use.
call :sed fix-html-quote "%~1" report.xml
goto :eof
Call the :sed subroutine and make it run a sed script labelled fix-html-quote over report.xml to produce an output file named according to the script file's first argument ("%~1"). The output file gets named first on the call :sed command line, because :sed will accept an arbitrary number of input files.
If you don't want :sed's output redirected to a file, perhaps because you're piping it into something else, use - or "-" as :sed's second argument.
:sed
setlocal disabledelayedexpansion
Make sure the variables we use here don't cause side effects elsewhere, and avoid unintended !expansions!.
set out=
if not "%~2" == "-" set out=^>"%~2"
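for /f "usebackq tokens=2*" %%A in ('%*') do set args=%%B
Turn the second argument into a redirection if it isn't "-", and gather the third and subsequent arguments. The for /f line does the gathering: with tokens=2*, %%A receives the second word of %* and %%B receives everything after it, so the label is skipped and the rest lands in args. Note that for /f splits on spaces without regard to quotes, so an output filename containing spaces will throw the count off.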
set sed="%ProgramFiles%\GnuWin32\bin\sed"
%sed% -e "1,/^#%~1$/d;/^#end$/q" %this% ...
Run a sed script against %this% file. The script contains two sed commands: the first, 1,/^#%~1$/d, skips (deletes from output) all lines up to and including the first one consisting solely of a # followed by the string given as call :sed's first argument; the second, /^#end$/q, quits sed as soon as a line consisting solely of #end is encountered.
#fix-html-quote
This will be the last line deleted as sed reads %this% file.
s/&quot;/"/g
#end
These two lines will be the only thing left in sed's output.
| %sed% -f- %args% %out%
They get piped to a second instance of sed, which the -f- option has told to read a script from its standard input; that script gets applied to all the files listed in %args% (report.xml with this test stub) and the output redirected according to %out% (which may be empty).
Any line starting with # is treated as a comment by sed, so the #end line does nothing; the only active script line is s/&quot;/"/g, which says to substitute a " for every occurrence of &quot; on every input line.
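To watch the extraction step by itself, here is a small sketch that runs only the first sed against a file holding two stanzas. It is written for a POSIX shell rather than cmd, and extract_stanza plus the stanza names first and second are invented for the demo.

```shell
#!/bin/sh
# Sketch only: extract_stanza and the stanza names are hypothetical.

# extract_stanza LABEL FILE
# Delete everything through the "#LABEL" line, then quit at the next "#end".
# Deleted lines never reach the q command, so an "#end" belonging to an
# earlier stanza does not stop sed early.
extract_stanza() {
    sed -e "1,/^#$1\$/d;/^#end\$/q" "$2"
}

library=$(mktemp)
trap 'rm -f "$library"' EXIT

cat > "$library" <<'EOF'
@echo off
#first
s/a/A/
#end
#second
s/b/B/
#end
EOF

extract_stanza second "$library"
```

Asking for second should print just s/b/B/ followed by #end, even though the first stanza's #end comes earlier in the file. One thing to be aware of: a label that matches nothing makes the delete range run to the end of the file, so the extracted script is empty, and as far as I know sed given an empty script simply copies its input through unchanged rather than complaining.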
This is hell's own long-winded way to use sed, but it's quite general-purpose; you can just keep tacking on stanzas like
#fix-html-entities
s/&apos;/'/g
s/&copy;/©/g
s/&amp;/\&/g
#end
secure in the knowledge that cmd won't screw them up for you.
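cmd may be out of the picture, but sed still has rules of its own. The one most likely to bite when decoding entities is that an unescaped & in the replacement half of an s command stands for the matched text, so it has to be written \& to get a literal ampersand. A quick sketch under a POSIX shell shows the difference; the two function names are made up for the demo.

```shell
#!/bin/sh
# Sketch only: both helpers are hypothetical and read text on standard input.

# Unescaped & re-inserts whatever was matched, so nothing changes.
decode_amp_unescaped() { sed 's/&amp;/&/g'; }

# \& inserts a literal ampersand, which is what entity decoding needs.
decode_amp_escaped() { sed 's/&amp;/\&/g'; }

printf '%s\n' 'Fish &amp; chips' | decode_amp_unescaped
printf '%s\n' 'Fish &amp; chips' | decode_amp_escaped
```

The first call should echo the line back untouched, and the second should print Fish & chips. It's also worth keeping the ampersand rule last in a stanza, so that text like &amp;quot; decodes one level to &quot; and not all the way to a quote.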