
BatchSubstitute Maximum Input/Output

Posted: 19 Jan 2011 12:07
by JDV
Is there some limit to the amount of text that BatchSub (the improved, forum version) can search through?

I'm asking because I'm using BatchSub in a second project that searches and replaces text in an XML file and saves the result as a new XML file (the first project was nearly identical and worked perfectly; the XML files involved just were not as large).

The XML is created by a stored procedure. When I retrieve 15 records, BatchSub is able to search/replace correctly.
When I retrieve more than 15 records, the amount of text seems to throw it off, and much of the original text is missing from the resulting file.

The odd thing is that BatchSub (the XML file is an RSS feed) leaves almost all of the RSS header and the footer intact. It deletes only the body (all the "<items>") as well as one line of the header that suspiciously happens to be on the same line as the deleted body.

The original XML file seems to be created properly no matter the number of records, so I've narrowed it to BatchSub.
I've also tried changing the words being searched for and the records being retrieved and those don't seem to be a problem.

Any help is appreciated, Thanks.

Re: BatchSubstitute Maximum Input/Output

Posted: 19 Jan 2011 12:24
by aGerman
There is definitely a limit on the length of a line (I forgot the number of characters, sorry).
If you open the file in a text editor, do you find the file content cascaded or all in one line?

Regards
aGerman

Re: BatchSubstitute Maximum Input/Output

Posted: 19 Jan 2011 13:28
by JDV
Thanks. The length of a line is probably it.
All the deleted lines are fairly long, while the sections that are left are more cascaded.

I would have thought the lines would be truncated instead of deleted. Is this limit part of the Bat file or just something with CMD? Can it be adjusted?

Re: BatchSubstitute Maximum Input/Output

Posted: 19 Jan 2011 14:20
by ChickenSoup
http://support.microsoft.com/kb/830473

Re: BatchSubstitute Maximum Input/Output

Posted: 19 Jan 2011 16:06
by aGerman
Good link, ChickenSoup.

JDV
As you can see, there is no chance to adjust.
IMHO you should learn something about XML DOM (can be used with VBScript). This provides a better way to manipulate xml files.

Concerning the long lines: Are these single nodes or are there child nodes inside which could be cascaded?

Regards
aGerman

Re: BatchSubstitute Maximum Input/Output

Posted: 19 Jan 2011 16:22
by JDV
ChickenSoup wrote: http://support.microsoft.com/kb/830473

"Modify programs that require long command lines so that they use a file that contains the parameter information, and then include the name of the file in the command line.

For example, instead of using the ExecutableFile.exe Parameter1 Parameter2 ...ParameterN command line in a batch file, modify the program to use a command line that is similar to the following command line, where ParameterFile is a file that contains the required parameters (parameter1 parameter2 ...ParameterN):
ExecutableFile.exe c:\temp\ParameterFile.txt"

This workaround the KB mentions seems to be exactly what I already do when running a .BAT containing BatchSub, essentially this:

CALL BatchSubstitute.bat "OldWord" "NewWord" oldfile.xml>newfile.xml

Concerning the long lines: it is an RSS feed like <channel><item(1)></item><item(2)></item>...etc.</channel>. "Item" and its sub-tags are child nodes, correct?
When I open the .XML in Notepad, the section that gets deleted is wrapped across several lines (probably just Notepad's word wrap). Visual Studio displays the XML properly indented (I assume it's just being "smart"). The section itself, though, starts out as one long line, which comes from a single row in a database table.

Re: BatchSubstitute Maximum Input/Output

Posted: 19 Jan 2011 17:59
by aGerman
JDV wrote:This workaround the KB mentions seems to be exactly what I already do when running a .BAT containing BatchSub, essentially this:

CALL BatchSubstitute.bat "OldWord" "NewWord" oldfile.xml>newfile.xml

No, because BatchSubstitute.bat has to process each line of oldfile.xml. If a line is too large for a variable (that is expanded to the content of the line again) then it will not work.
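
To see which lines of a file would run into this, the line lengths can be checked before handing the file to the batch script. The following is only a minimal Python sketch, not part of BatchSubstitute.bat. It assumes the relevant limit is the 8191 characters that KB 830473 gives for the command prompt; the function name `find_long_lines` and the sample data are made up for illustration.

```python
import io

# Assumption: 8191 characters, the command prompt maximum stated in KB 830473.
CMD_LINE_LIMIT = 8191


def find_long_lines(lines, limit=CMD_LINE_LIMIT):
    """Return (line_number, length) for every line longer than limit."""
    too_long = []
    for number, line in enumerate(lines, start=1):
        # Line endings do not count towards the length of the line content.
        length = len(line.rstrip("\r\n"))
        if length > limit:
            too_long.append((number, length))
    return too_long


if __name__ == "__main__":
    # In-memory stand-in for an RSS file whose body sits on one long line.
    sample = io.StringIO("<rss>\n" + "<item>x</item>" * 1000 + "\n</rss>\n")
    for number, length in find_long_lines(sample):
        print(f"line {number}: {length} characters")
```

For a real file, pass an open file object instead of the sample, for example `find_long_lines(open("oldfile.xml", encoding="utf-8"))`.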

OK, it's a bit off topic for a batch forum, but let's try to transform your long lines into an indented block (BTW I'm not an expert in XML DOM ...)

I wrote the following xml file for testing:
test.xml

Code:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<test><string1>qwe</string1><string2>asd</string2><string3>yxc</string3></test>

As you can see all nodes are in a single line.

Now we need a stylesheet that tells the parser how to transform the file:
transform.xslt

Code:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xalan="http://xml.apache.org/xslt" version="1.0">
   <xsl:output method="xml" encoding="UTF-8" standalone="yes" indent="yes" xalan:indent-amount="4"/>
   <xsl:strip-space elements="*"/>
   <xsl:template match="@*|node()">
      <xsl:copy>
         <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
   </xsl:template>
</xsl:stylesheet>

And a VBScript to do the job:
transform.vbs

Code:

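' Re-indent test.xml in place by applying transform.xslt.
Const xmlfile = "test.xml"
Const xsltfile = "transform.xslt"

' Load the source document synchronously so it is fully parsed before use.
Set oXmlDoc = CreateObject("Microsoft.XMLDOM")
oXmlDoc.async = False
If Not oXmlDoc.load(xmlfile) Then
  WScript.Echo "Cannot load " & xmlfile & ": " & oXmlDoc.parseError.reason
  WScript.Quit 1
End If

' Load the stylesheet the same way.
Set oXslDoc = CreateObject("Microsoft.XMLDOM")
oXslDoc.async = False
If Not oXslDoc.load(xsltfile) Then
  WScript.Echo "Cannot load " & xsltfile & ": " & oXslDoc.parseError.reason
  WScript.Quit 1
End If

' Empty document that receives the transformed output.
Set oXmlOutDoc = CreateObject("Microsoft.XMLDOM")
oXmlOutDoc.async = False

oXmlDoc.transformNodeToObject oXslDoc, oXmlOutDoc

' Overwrite the original file with the indented version.
oXmlOutDoc.save xmlfile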


If I execute the transform.vbs the new content of the xml file is as follows:

Code:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<test>
   <string1>qwe</string1>
   <string2>asd</string2>
   <string3>yxc</string3>
</test>


NOTE:
Have a look at the encoding declared in your xml file. I used UTF-8. If yours is different, change the encoding attribute of xsl:output in the xslt file to match.

Hope this will help
aGerman

Re: BatchSubstitute Maximum Input/Output

Posted: 19 Jan 2011 21:47
by ghostmachine4
JDV wrote:Any help is appreciated, Thanks.

people should stop using batch to do things like file processing (and others), especially if it's XML or HTML. Use a programming language with XML/HTML facilities (or at least with regular expression support + string manipulation functions)
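
As one possible illustration of that suggestion, here is a minimal Python sketch that does the search/replace with the standard library's `xml.etree.ElementTree`, so line length does not matter. It is an assumption about what the task needs, not a port of BatchSubstitute.bat: it replaces only inside text content, never in tag names or attributes, and the function name `replace_in_text` and the sample feed are made up.

```python
import xml.etree.ElementTree as ET


def replace_in_text(xml_text, old, new):
    """Replace old with new in text content, leaving tags and attributes alone."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        # .text is the text before the first child, .tail the text after the element.
        if element.text:
            element.text = element.text.replace(old, new)
        if element.tail:
            element.tail = element.tail.replace(old, new)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    feed = "<rss><channel><item><title>OldWord here</title></item></channel></rss>"
    print(replace_in_text(feed, "OldWord", "NewWord"))
```

Because the whole document is parsed as a tree, an RSS body that sits on one very long line is handled the same way as an indented one.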

Re: BatchSubstitute Maximum Input/Output

Posted: 20 Jan 2011 09:50
by JDV
ghostmachine4 wrote: people should stop using batch to do things like file processing (and others), especially if it's XML or HTML. Use a programming language with XML/HTML facilities (or at least with regular expression support + string manipulation functions)

:oops: I know. I'm just rather....limited in that area.

aGerman wrote:No, because BatchSubstitute.bat has to process each line of oldfile.xml. If a line is too large for a variable (that is expanded to the content of the line again) then it will not work.

OK, thanks for the explanation.

aGerman wrote:Hope this will help
aGerman


This was indeed a great help. Thank you! :D
*SOLVED*