Shohreh wrote: ↑06 Mar 2020 06:48
As a work-around, I'll just install Perl and run a one-liner.
Another option is to use my JREPL.BAT regular expression file processing utility. It is pure script (hybrid JScript/batch) that runs natively on any Windows version from XP onward, without the need for any 3rd party exe or dll file.
jrepl "<desc>[\s\S]*?</desc>" "" /m /f "input.xml|utf-8" /o -
The above relies on the /M option, which requires that the entire file be loaded into memory. This limits the size of the file that can be processed (I think the maximum is somewhere near 1 GB, but I'm not sure).
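To illustrate what the pattern does once the whole file is in memory, here is a minimal Python sketch of the same regular expression. It is not JREPL itself, and the function name strip_desc and the sample string are made up for the example. The point is that [\s\S] matches any character including line breaks, and the lazy *? stops at the first closing tag, so each block is removed separately.

```python
import re

# Same pattern as in the JREPL command above.
# [\s\S] matches any character, newlines included; *? is lazy,
# so each match ends at the nearest </desc>.
DESC = re.compile(r"<desc>[\s\S]*?</desc>")

def strip_desc(text: str) -> str:
    """Remove every <desc>...</desc> block, even ones that span lines."""
    return DESC.sub("", text)

# Hypothetical sample input: two blocks, the first spanning two lines.
sample = "<a><desc>one\ntwo</desc><b/><desc>three</desc></a>"
print(strip_desc(sample))  # prints <a><b/></a>
```

With a greedy * instead of *?, the single match would run from the first <desc> to the last </desc> and take the <b/> element with it.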
The output will include a UTF-8 BOM. If you don't want it, then add |NB to the file specification:
jrepl "<desc>[\s\S]*?</desc>" "" /m /f "input.xml|utf-8|NB" /o -
If the command is included in a batch script, then you must use CALL JREPL, because JREPL is itself a batch script.
Since the find/replace operation does not need to interpret any multi-byte Unicode characters, the utf-8 specification can probably be dropped as follows. This would probably improve performance, and might increase the maximum file size limit.
jrepl "<desc>[\s\S]*?</desc>" "" /m /f "input.xml" /o -
As long as your machine's default character set is a single-byte character set, each byte of a multi-byte Unicode character is treated as its own character, which is either preserved if it lies outside a <desc></desc> block, or dropped if it lies within one. It will not work if your default character set uses a variable number of bytes per character.
This last version would neither remove any pre-existing BOM, nor would it add one.
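The byte-at-a-time reasoning can be checked with a small Python sketch that runs the same pattern over raw bytes. This is only an illustration of the idea, not what JREPL does internally, and strip_desc_bytes and the sample data are hypothetical. It works because every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so none of them can be mistaken for the ASCII bytes that make up the tags.

```python
import re

# Bytes pattern: every byte is treated as one "character",
# much like a single-byte default code page would.
DESC_BYTES = re.compile(rb"<desc>[\s\S]*?</desc>")

def strip_desc_bytes(data: bytes) -> bytes:
    """Remove <desc>...</desc> blocks without decoding the data."""
    return DESC_BYTES.sub(b"", data)

# Hypothetical sample: BOM, then multi-byte characters inside and outside a block.
original = "\ufeff<a>é<desc>ü</desc>日本</a>".encode("utf-8")
result = strip_desc_bytes(original)

print(result.decode("utf-8") == "\ufeff<a>é日本</a>")  # prints True
print(result.startswith(b"\xef\xbb\xbf"))              # prints True: BOM untouched
```

The multi-byte characters outside the block come through byte for byte, the one inside the block is dropped whole, and the existing BOM is neither removed nor duplicated.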
Dave Benham