[Destructive, Use Caution] Copy tags and its child elements from specific XML schema

Alanick
Posts: 12
Joined: 29 Oct 2022 14:06

#1 Post by Alanick » 29 Oct 2022 14:48

Hello everyone, new here.

Based on the following link: https://stackoverflow.com/a/21789983
I can successfully copy every line between two XML tags, together with their child elements, to a new file. It works as expected, although it is very slow.
The batch file I am currently using, copied from the link above, is:

Code: Select all

@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION

SET INSIDE_ELT=0
FOR /F "delims=" %%l IN (input.xml) DO (
    SET "LINE=%%~l"
    SET "LINE=!LINE:<=__-_!"
    SET "LINE=!LINE:>=_-__!"
    CALL :STRIP !LINE!
    SET "LINE=!_STRIPPED:__-_=<!"
    SET "LINE=!LINE:_-__=>!"
    IF "!LINE!"=="</library_images>" SET INSIDE_ELT=0
    IF "!INSIDE_ELT!"=="1" @ECHO %%l >> output.xml
    IF "!LINE!"=="<library_images>" SET INSIDE_ELT=1
)
REM Stop here so execution does not fall through into the subroutine below
GOTO :EOF

:STRIP
SET "_STRIPPED=%*"
EXIT /B
Now, the XML files I'm trying to parse use the OpenCOLLADA (DAE) schema, version 1.4.1.
Based on all of that, the example batch file above, and the samples included here: Sample.7z
would anyone know how I can batch multiple XML files into one, copying the required tags and their child elements, BUT without duplicated child elements?
Every XML file has some child elements that are repeated in other XML files. I have a bunch of them that I would like to concatenate into one, going through them in a specific order to make a single file WITHOUT duplicated tags and child elements.
The order must be:

Code: Select all

<?xml ..... >
<COLLADA ...>
  <asset>
  ...child elements, etc. ...
  </asset>
  <library_images>
  ...child elements, etc. ...
  </library_images>
  <library_materials>
  ...child elements, etc. ...
  </library_materials>
  <library_effects>
  ...child elements, etc. ...
  </library_effects>
  <library_geometries>
  ...child elements, etc. ...
  </library_geometries>
  <library_visual_scenes>
  ...child elements, etc. ...
  </library_visual_scenes>
  <scene>
  ...child elements, etc. ...
  </scene>
</COLLADA>
Normally I would create a batch file for every main tag that needs to be saved to a new file, and afterwards add the COLLADA tag, etc.
Is there a way to process multiple files and remove duplicated child elements in one go? Also, using (*.xml) doesn't seem to work in the batch file above, and I'm not sure why.
Last edited by Alanick on 18 Nov 2022 14:42, edited 3 times in total.

aGerman
Expert
Posts: 4678
Joined: 22 Jan 2010 18:01
Location: Germany

#2 Post by aGerman » 30 Oct 2022 03:58

Why oh why are people still trying to accomplish those tasks in batch :( Batch supports line-wise processing of text, while XML would be perfectly valid if the whole content were on just one line. Batch is not able to grasp the logical object structure of XML. So what you're doing in Batch is kind of reinventing the wheel, mimicking a piece of a DOM processor, which is doomed to fail sooner or later anyway.
So, rather choose a language that supports the Document Object Model. At least I'm not bored enough to write hundreds of lines of batch code for something that can be done in a much easier way. Sorry.

Steffen

#3 Post by Alanick » 30 Oct 2022 08:08

There is no way for me to use another language for this. These are original OpenCOLLADA XML format assets, very old ones I might add; that is how they are and I am stuck with them. I am trying to do the task above to simplify my workflow, so that later on I can have a proper way of dealing with them.
Until then, I am trying to see what I can find. I do not need the BS politics of the XML conundrum, which has been discussed in many ways, shapes and forms from what I gathered over the past few days while searching for a solution.
The .bat above works just fine; I just need to find a way to skip duplicated child elements between tags.

I am no expert, just a very basic DOS batch user. I joined here in the hope that someone with the right skill set might have an actual solution, not to get political BS about XML. I am well aware of it already, and it is kind of disturbing. I just need to see what's available given my situation with the above samples; that's all there is to it.

Can someone kindly help, at all?

#4 Post by aGerman » 30 Oct 2022 08:54

I was not talking about using something other than XML. I was talking about using another scripting language to process the XML text. PowerShell, VBScript, JScript for example support the XML DOM to process XML data in a proper way, while Batch does not.

Steffen

#5 Post by Alanick » 30 Oct 2022 11:24

aGerman wrote:
30 Oct 2022 08:54
PowerShell, VBScript, JScript for example support the XML DOM to process XML data in a proper way, while Batch does not.
Alright, I have never used any of those and have no knowledge of them. Do you happen to have anything that would work for my given example?

Aacini
Expert
Posts: 1913
Joined: 06 Dec 2011 22:15
Location: México City, México

#6 Post by Aacini » 01 Nov 2022 17:31

The task you requested is complicated and complex; however, you have not posted a single specification that could help us develop such a solution. How many files could there be? Do all the files have the elements in the same order? Are there special characters in the files? What is the length of the longest line? In order to write a program that solves this request we have to assume a lot of things; problems will arise if the assumptions do not correspond to the actual data...

This problem is interesting, so I wrote a possible solution for it. There are many different ways to solve this problem. I chose a method based on several concurrent (parallel) processes that each process one file, with the different sections of the output result synchronized via WAITFOR signals. IMHO this is the most efficient method to solve this problem.
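
For readers unfamiliar with WAITFOR, here is a minimal sketch of the signalling idea on its own, separate from the full program. The signal name WorkerDone and the "worker" argument are made up for this illustration, and the ping is only a stand-in for real work, so that the main process is already waiting when the signal is sent.

```batch
@echo off
rem Sketch only: "WorkerDone" and the "worker" argument are hypothetical names
if "%~1" equ "worker" goto worker

rem Main process: start this same script in the background as a worker
start "" /B "%~F0" worker

rem Block until the worker sends its signal, giving up after 30 seconds
waitfor /T 30 WorkerDone > NUL 2>&1
if errorlevel 1 (
   echo Main: timed out waiting for the worker
) else (
   echo Main: worker has finished
)
goto :EOF

:worker
rem Stand-in for real work, roughly two seconds
ping -n 3 127.0.0.1 > NUL
echo Worker: work done, sending signal
waitfor /SI WorkerDone > NUL
exit
```

As far as I know, a signal sent while nobody is waiting for it is simply lost, which is why the order of "wait" and "send" matters in this kind of design.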

The structure of the *.xml files must be this one:
- <?xml ..... >
- <COLLADA ...>
- <asset>
- Variable number of lines
- </asset>
From the next line on, this tag structure repeats:
- <tagName>
- <name id="child element 1 id" ...>
- variable number of lines
- </name>
- <name id="child element 2 id" ...>
- variable number of lines
- </name>
- . . .
- </tagName>
And the file ends in:
- </COLLADA>
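
A minimal sketch of how a line with this structure can be split into element name and id with two nested FOR /F loops. The sample line and its id value tex01 are made up; the token options are the same ones used in the program in this post.

```batch
@echo off
setlocal EnableDelayedExpansion

rem Hypothetical sample line, indented like the lines in the posted files
set "line=    <image id="tex01" name="tex01">"

rem Outer loop splits on the angle brackets:
rem token 1 is the indentation, token 2 is the tag contents.
rem Inner loop splits the tag contents on equal sign and space:
rem token 1 is the element name, token 3 is the quoted id value.
for /F "tokens=2 delims=<>" %%b in ("!line!") do (
   for /F "tokens=1,3 delims=<= " %%c in ("%%b") do (
      echo element=%%c  id=%%~d
   )
)
```

Here %%c holds the element name and %%~d the id with its quotes removed. Note that this relies on the line being indented: on a line that starts directly with the opening bracket there is no second token, so the loop body is skipped.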

The *first* input file defines the shape of the output result, that is, it specifies the order of the output elements. If another input file has its elements in a different order, or does not have the same elements as the first file, the method will fail.

No line longer than 8192 characters will be read (nor copied). There is no easy way to circumvent this Batch limitation.

This is the first version of my solution:

Code: Select all

@echo off
setlocal EnableDelayedExpansion

REM https://www.dostips.com/forum/viewtopic.php?f=3&t=10579
REM Antonio Perez Ayala

rem If this .bat file is asynchronously invoked as a coroutine: start it
if "%1" equ "" goto begin
if %1 equ 1 (goto StartFirst) else goto StartRest

rem Start the asynchronous process of each *.xml file:
rem First Part:
rem - Copy up to "</asset>" line of first file
rem - and omit up to "</asset>" line in rest of files
rem Second Part:
rem - Process each parent tag in first file; then
rem - process same parent tag in rest of files

:begin
set n=0
for %%f in (*.xml) do (
   set /A n+=1
   set "file[!n!]=%%~Nf"
)

ECHO Start Process @ %time:~0,-3%
del output.txt 2> NUL
for /L %%i in (1,1,%n%) do start "" /B "%~F0" %%i
WaitFor File1End > NUL
ECHO End Process @ %time:~0,-3%
goto :EOF


=================================================


:StartFirst	Start the asynchronous coroutine to process the first file

ECHO - Process of file #%1 (!file[%1]!.xml) START

set "inTag="
for /F "usebackq delims=" %%a in ("!file[%1]!.xml") do (

   if not defined inTag (
      rem First part: Copy to output file up to "</asset>" input line
      >> output.txt echo %%a
      if "%%a" equ "  </asset>" (
         set "inTag=1"
         set "tagName="
      )
   ) else (
      rem Second part: Copy each tag and its children
      for /F "tokens=2 delims=<>" %%b in ("%%a") do (
         if not defined tagName (
            rem Start of tagNameN
            set "tagName=%%b"
            set "childName="
            del childIds.txt 2> NUL
            set /A "add=0"
            SET /P "=- - File #%1 (!file[%1]!.xml) TAG %%b: " < NUL
            >> output.txt echo %%a
         ) else if "%%b" equ "/!tagName!" (
            rem End of tagNameN in *this* First File:
            ECHO !add! items added
            rem process same tagNameN in rest of files
            for /L %%i in (2,1,%n%) do (
               WaitFor /SI File%%iON > NUL
               WaitFor File%%iOFF > NUL
            )
            >> output.txt echo %%a
            set "tagName="
         ) else if not defined childName (
            for /F "tokens=1,3 delims=<= " %%c in ("%%b") do set "childName=%%c" & set "childId=%%~d"
            >> childIds.txt echo !childId!
            >> output.txt echo %%a
            set /A add+=1
         ) else if "%%b" equ "/!childName!" (
            set "childName="
            >> output.txt echo %%a
         ) else (
            >> output.txt echo %%a
         )
      )

   )

)
>> output.txt echo ^</COLLADA^>
del childIds.txt

ECHO - Process of file #%1 (!file[%1]!.xml) END
WaitFor /SI File1End > NUL
exit


=================================================


:StartRest	Start the asynchronous coroutine to process each one of the rest of files

ECHO - Process of file #%1 (!file[%1]!.xml) START

set "inTag="
for /F "usebackq delims=" %%a in ("!file[%1]!.xml") do (

   if not defined inTag (
      rem First part: Omit up to "</asset>" input line
      if "%%a" equ "  </asset>" (
         set "inTag=1"
         set "tagName="
      )
   ) else (
      rem Second part: Copy each tag and its children
      for /F "tokens=2 delims=<>" %%b in ("%%a") do (
         if not defined tagName (
            rem Start of tagNameN, wait for "master's" signal to proceed
            WaitFor File%1ON > NUL
            rem Load current childIds
            setlocal EnableDelayedExpansion
            for /F %%i in (childIds.txt) do set "child[%%i]=1"
            set "tagName=%%b"
            set "childName="
            set /A "add=0, omit=0"
            SET /P "=- - File #%1 (!file[%1]!.xml) TAG %%b: " < NUL
         ) else if "%%b" equ "/!tagName!" (
            rem End of tagNameN in this additional File:
            ECHO !add! items added, !omit! omitted
            rem release childIds and inform to "master"
            endlocal
            set "tagName="
            WaitFor /SI File%1OFF > NUL
         ) else if not defined childName (
            for /F "tokens=1,3 delims=<= " %%c in ("%%b") do set "childName=%%c" & set "childId=%%~d"
            if not defined child[!childId!] (
               set "child[!childId!]=1"
               >> childIds.txt echo !childId!
               >> output.txt echo %%a
               set /A "add+=1, inChild=1"
            ) else (
               set /A "omit+=1"
               set "inChild="
            )
         ) else if "%%b" equ "/!childName!" (
            set "childName="
            if defined inChild >> output.txt echo %%a
            set "inChild="
         ) else if defined inChild (
            >> output.txt echo %%a
         )
      )

   )

)

ECHO - Process of file #%1 (!file[%1]!.xml) END
exit
This is the output report when I run this program with the posted data:

Code: Select all

Start Process @ 17:04:17
- Process of file #1 (001.xml) START
- Process of file #2 (002.xml) START
- Process of file #3 (003.xml) START
- - File #1 (001.xml) TAG library_images: 83 items added
- - File #2 (002.xml) TAG library_images: 79 items added, 58 omitted
- - File #3 (003.xml) TAG library_images: 26 items added, 39 omitted
- - File #1 (001.xml) TAG library_materials: 83 items added
- - File #2 (002.xml) TAG library_materials: 79 items added, 58 omitted
- - File #3 (003.xml) TAG library_materials: 26 items added, 39 omitted
- - File #1 (001.xml) TAG library_effects: 83 items added
- - File #2 (002.xml) TAG library_effects: 79 items added, 58 omitted
- - File #3 (003.xml) TAG library_effects: 26 items added, 39 omitted
- - File #1 (001.xml) TAG library_geometries: 27 items added
- - File #2 (002.xml) TAG library_geometries: 46 items added, 8 omitted
- - File #3 (003.xml) TAG library_geometries: 18 items added, 11 omitted
- - File #1 (001.xml) TAG library_visual_scenes: 1 items added
- - File #2 (002.xml) TAG library_visual_scenes: 1 items added, 0 omitted
- - File #3 (003.xml) TAG library_visual_scenes: 1 items added, 0 omitted
- - File #1 (001.xml) TAG scene: 1 items added
- - File #2 (002.xml) TAG scene: 1 items added, 0 omitted
- Process of file #2 (002.xml) END
- - File #3 (003.xml) TAG scene: 1 items added, 0 omitted
- Process of file #3 (003.xml) END
- Process of file #1 (001.xml) END
End Process @ 17:17:09
It seems that the output result is correct. However, it is also obvious that the result does not include any tags that are missing from the first file... I must modify the method in order to fix this point, but I first need to know whether the order of the elements (tags) is the same in all files.
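
The "items added / omitted" counts in the report come from remembering which child ids were already written. A minimal sketch of that bookkeeping on its own, using one variable per id as the program does with child[...]; the ids in the list are invented for the example.

```batch
@echo off
setlocal EnableDelayedExpansion

set /A "add=0, omit=0"

rem Hypothetical child ids in the order they would be met in the input
for %%i in (tex01 tex02 tex01 tex03 tex02) do (
   if not defined child[%%i] (
      rem First time this id is seen: remember it and count it as added
      set "child[%%i]=1"
      set /A add+=1
   ) else (
      rem Id already seen: count it as omitted
      set /A omit+=1
   )
)

echo !add! items added, !omit! omitted
```

With the five ids above this prints "3 items added, 2 omitted". Each id costs one environment variable, so this is only a sketch of the idea and says nothing about how it behaves with very large numbers of ids.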

Antonio

#7 Post by Alanick » 01 Nov 2022 20:23

Aacini wrote:
01 Nov 2022 17:31
The task you requested is complicated and complex;
Yes, I discovered that as soon as I started researching how to achieve this task. Boy... if only I had known.
Aacini wrote:
01 Nov 2022 17:31
however, you have not posted a single specification that could help us develop such a solution.
I was not aware that such specifications would be needed. Now that I know, I can certainly provide them as best I can.
Aacini wrote:
01 Nov 2022 17:31
How many files could there be?
Hundreds; it depends on the folder structure, or on whether they come from the same source/project.
The file size of the XMLs varies from a few KB up to 50 MB; a project can have a total of 1.3 GB to 2.5 GB in XMLs.
Aacini wrote:
01 Nov 2022 17:31
Do all the files have the elements in the same order?
Yes, exactly as seen in the samples provided above.
Aacini wrote:
01 Nov 2022 17:31
Are there special characters in the files?
No, they all use the same characters as the samples posted above.
Aacini wrote:
01 Nov 2022 17:31
What is the length of the longest line?
7,360,817 characters
Aacini wrote:
01 Nov 2022 17:31
In order to write a program that solves this request we have to assume a lot of things; problems will arise if the assumptions do not correspond to the actual data...
Alright, now I know. Thank you for the explanation.
Aacini wrote:
01 Nov 2022 17:31
No line longer than 8192 characters will be read (nor copied). There is no easy way to circumvent this Batch limitation.
Now that I have learned this, something I was never aware of, it means this will fail entirely. Based on what the other expert mentioned above, it seems I may have to look into another solution using a different language. The problem is that, these being XMLs in the specific OpenCOLLADA schema, anything else will fail to recognize them as such. I do not know what I could possibly use to make this work, as this seems, as you said, complex, and way over my head I might add.

#8 Post by Aacini » 02 Nov 2022 12:14

I suggest you run my program with the same 3 data files you posted here and review the generated output.txt file. If you can identify the problems in the output result, perhaps I could write a patch for the program to fix just those points. This way I would not write a general method to solve this complex type of problem with a Batch file, but just a specific Batch file that solves your problem...

Please be as clear as possible when you describe the problems with the current program. You may post the specific line or section of the input data that causes problems, the output that is created now from such data, and the correct output you want. If the same type of problem appears several times, just describe it once. However, if a different problem also appears in another part of the data, describe it in the same way.

Can there be empty lines in the files?

Please confirm this point: do all files contain all the tags?

Antonio

#9 Post by Alanick » 02 Nov 2022 20:26

After testing the generated output.txt, it failed, so I looked at it closely in a text editor. Based on the batch above, at this very moment ONLY the <float_array id=" ... lines that go beyond 8192 characters are missing. It doesn't create empty lines; just as you said, they are NOT copied, and without them the XML is treated as corrupt.

Other than that, based on the batch above and that test alone, I did not see anything else wrong with the output. The structure remains as desired, and I found no duplicates among the many child element tags I checked. Now that I know the limit on characters per line has been reached, I guess it is a dead end: without the lines beyond that limit I cannot test anything further. Every single line MUST be present within its respective tags and child elements, regardless of length. If that can somehow be supported, I will be able to test further.

#10 Post by Aacini » 02 Nov 2022 23:56

OK. This is my first version that tries to fix the long lines problem. I hope this version solves the problem, but I can't be sure. Please run this version and check the generated output.txt file, especially the long lines...

Code: Select all

@echo off
setlocal EnableDelayedExpansion

REM https://www.dostips.com/forum/viewtopic.php?f=3&t=10579
REM Antonio Perez Ayala
REM Version 2: Manage long lines

rem If this .bat file is asynchronously invoked as a coroutine: start it
if "%1" equ "" goto begin
if %1 equ 1 (goto StartFirst) else goto StartRest

rem Start the asynchronous process of each *.xml file:
rem First Part:
rem - Copy up to "</asset>" line of first file
rem - and omit up to "</asset>" line in rest of files
rem Second Part:
rem - Process each parent tag in first file; then
rem - process same parent tag in rest of files

:begin

rem Create a file with a space and no CR-LF at end
for %%X in (^"^
% Do NOT remove this line %
^") do set /P "=X%%~X " < NUL > space.tmp
findstr /V "X" space.tmp > space.txt
del space.tmp

set n=0
for %%f in (*.xml) do (
   set /A n+=1
   set "file[!n!]=%%~Nf"
)

ECHO Start Process @ %time:~0,-3%
del output.txt 2> NUL
for /L %%i in (1,1,%n%) do start "" /B "%~F0" %%i
WaitFor File1End > NUL
ECHO End Process @ %time:~0,-3%
del space.txt
goto :EOF


=================================================


:StartFirst	Start the asynchronous coroutine to process the first file

ECHO - Process of file #%1 (!file[%1]!.xml) START

set "inTag="
rem Assemble a While-loop to read all file lines
< "!file[%1]!.xml" ( for /L %%? in () do (

   rem Read next line
   set "line="
   set /P "line="
   if "!line!" equ "" (  rem End Of File
      >> output.txt echo ^</COLLADA^>
      del childIds.txt tagData.txt
      ECHO - Process of file #%1 (!file[%1]!.xml^) END
      WaitFor /SI File1End > NUL
      exit
   )

   if not defined inTag (
      rem First part: Copy to output file up to "</asset>" input line
      >> output.txt echo !line!
      if "!line!" equ "  </asset>" (
         set "inTag=1"
         set "tagName="
      )
   ) else (
      rem Second part: Copy each tag and its children
      for /F "tokens=2 delims=<>" %%b in ("!line!") do (
         if not defined tagName (
            rem Start of tagNameN
            set "tagName=%%b"
            set "childName="
            del childIds.txt 2> NUL
            set /A "add=0"
            SET /P "=- - File #%1 (!file[%1]!.xml) TAG %%b: " < NUL
            > tagData.txt echo !line!
         ) else if "%%b" equ "/!tagName!" (
            rem End of tagNameN in *this* First File:
            ECHO !add! items added
            rem process same tagNameN in rest of files
            for /L %%i in (2,1,%n%) do (
               WaitFor /SI File%%iON > NUL
               WaitFor File%%iOFF > NUL
            )
            >> tagData.txt echo !line!
            >> output.txt type tagData.txt
            set "tagName="
         ) else if not defined childName (
            for /F "tokens=1,3 delims=<= " %%c in ("%%b") do set "childName=%%c" & set "childId=%%~d"
            >> childIds.txt echo !childId!
            >> tagData.txt echo !line!
            set /A add+=1
         ) else if "%%b" equ "/!childName!" (
            set "childName="
            >> tagData.txt echo !line!
         ) else (
            if "!line:~1022!" neq "" >> tagData.txt call :longLine
            >> tagData.txt echo !line!
         )
      )

   )

) )

rem Previous ")" close a While-loop: for /L %%? in () do (
rem so execution never reach this point


=================================================


:StartRest	Start the asynchronous coroutine to process each one of the rest of files

ECHO - Process of file #%1 (!file[%1]!.xml) START

set "inTag="
rem Assemble a While-loop to read all file lines
< "!file[%1]!.xml" ( for /L %%? in () do (

   rem Read next line
   set "line="
   set /P "line="
   if "!line!" equ "" (  rem End Of File
      ECHO - Process of file #%1 (!file[%1]!.xml^) END
      exit
   )

   if not defined inTag (
      rem First part: Omit up to "</asset>" input line
      if "!line!" equ "  </asset>" (
         set "inTag=1"
         set "tagName="
      )
   ) else (
      rem Second part: Copy each tag and its children
      for /F "tokens=2 delims=<>" %%b in ("!line!") do (
         if not defined tagName (
            rem Start of tagNameN, wait for "master's" signal to proceed
            WaitFor File%1ON > NUL
            rem Load current childIds
            setlocal EnableDelayedExpansion
            for /F %%i in (childIds.txt) do set "child[%%i]=1"
            set "tagName=%%b"
            set "childName="
            set /A "add=0, omit=0"
            SET /P "=- - File #%1 (!file[%1]!.xml) TAG %%b: " < NUL
         ) else if "%%b" equ "/!tagName!" (
            rem End of tagNameN in this additional File:
            ECHO !add! items added, !omit! omitted
            rem release childIds and inform to "master"
            endlocal
            set "tagName="
            WaitFor /SI File%1OFF > NUL
         ) else if not defined childName (
            for /F "tokens=1,3 delims=<= " %%c in ("%%b") do set "childName=%%c" & set "childId=%%~d"
            if not defined child[!childId!] (
               set "child[!childId!]=1"
               >> childIds.txt echo !childId!
               >> tagData.txt echo !line!
               set /A "add+=1, inChild=1"
            ) else (
               set /A "omit+=1"
               set "inChild="
            )
         ) else if "%%b" equ "/!childName!" (
            set "childName="
            if defined inChild >> tagData.txt echo !line!
            set "inChild="
         ) else if defined inChild (
            if "!line:~1022!" neq "" >> tagData.txt call :longLine
            >> tagData.txt echo !line!
         )
      )

   )

) )

rem Previous ")" close a While-loop: for /L %%? in () do (
rem so execution never reach this point

=======================================

:longLine
if "%line:~0,1%" equ " " (
   type space.txt
   set "line=%line:~1%"
   goto longLine
)
set /P "line=%line%"
if "%line:~1022%" neq "" goto longLine
exit /B
Antonio

#11 Post by Alanick » 03 Nov 2022 14:08

Thank you for the updated version.

Based on the 3 XMLs I provided as samples above, I can clearly confirm that the output.txt is no longer corrupt; everything looks as it should at first glance. I will continue to test further, and as a next step I will try to batch an entire project that holds hundreds of XMLs, to see whether it handles a very large-scale project well. Testing that will require a lot of time. I will do my best to test as thoroughly as possible and get back to you in about a week or two, if that's OK with you?

Once again, thank you so much for now.

penpen
Expert
Posts: 2009
Joined: 23 Jun 2013 06:15
Location: Germany

#12 Post by penpen » 04 Nov 2022 02:59

Though I like Aacini's version, it is not completely safe (for example, if you change the codepage of the source XML).
Here is an example of how to read and process XML using JScript (in the hybrid "xslt.bat") in a safe way:
viewtopic.php?p=32941#p32941.

#13 Post by Alanick » 04 Nov 2022 11:38

penpen wrote:
04 Nov 2022 02:59
Here is an example of how to read and process XML using JScript (in the hybrid "xslt.bat") in a safe way:
viewtopic.php?p=32941#p32941.
Thank you. Unfortunately, I have no clue how to use that for the situation I described initially, let alone modify it for my purpose. Personally, I never even knew batch/cmd could do so many complex things; I am impressed by what you guys do here.
Aacini wrote:
02 Nov 2022 23:56
OK. This is my first version that tries to fix the long lines problem. I hope this version solves the problem, but I can't be sure. Please run this version and check the generated output.txt file, especially the long lines...
I tried to run the bat on the XMLs of the entire project. This is the saved log; there are no errors. I checked the XML it stops at, and it has the same order as the rest (all the XMLs have the same order), so I am not sure what could be causing the halt.
_merge_XML_2.log
I have included samples that fail. They are not corrupt, but the batch seems to end on them. I had to re-run the batch each time after removing the files that end the process, as it seems the batch cannot resume from the *.txt it has already saved. At the moment of writing, I am re-running it over and over to see how many eventually fail. It fails without any type of error and I have no clue why, as the XMLs are not corrupt at all, all have the same TAG order, etc.
sample_fail.zip
Last edited by Alanick on 09 Nov 2022 14:12, edited 2 times in total.

#14 Post by Aacini » 07 Nov 2022 11:04

I am pretty sure that the problem is caused by the very high number of files that are kept open (and the very high number of different WAITFOR signals, etc.).

I modified the method in order to process each file completely, one by one. This way, the number of files should not affect the task, although the total elapsed time may be longer than before...

Code: Select all

@echo off
setlocal EnableDelayedExpansion

REM https://www.dostips.com/forum/viewtopic.php?f=3&t=10579
REM Antonio Perez Ayala
REM Version 2: Manage long lines
REM Version 3: Modified method to process whole files one-by-one

if "%1" neq "" goto ProcessFile

rem In this version 3 each file is processed completely
rem so each tag of every file is stored in its own file
rem At end, all tag files are combined in output.txt result file

rem Create a file with a space and no CR-LF at end
for %%X in (^"^
% Do NOT remove this line %
^") do set /P "=X%%~X " < NUL > space.tmp
findstr /V "X" space.tmp > space.txt
del *.tmp output.txt tags.txt

ECHO %time:~0,-3% - Start Process
set i=0
set "firstFile=true"
for %%f in (*.xml) do (
   set /A i+=1
   set "file=%%f"
   cmd /C "%~F0" !i!
   set "firstFile="
)

ECHO %time:~0,-3% - - All input files processed, creating output.txt file
(
   for /F "delims=" %%a in (tags.txt) do (
      for /F "tokens=2 delims=</>" %%f in ("%%a") do (
         type "tagData-%%f.tmp"
         echo %%a
      )
   )
   echo ^</COLLADA^>
) >> output.txt

ECHO %time:~0,-3% - End Process
del *.tmp tags.txt space.txt
goto :EOF


=================================================


:ProcessFile		Process all tags of the file given as parameter

ECHO %time:~0,-3% - - Process file #%1: %file%

set "inTag="
rem Assemble a While-loop to read all file lines
< "%file%" ( for /L %%? in () do (

   rem Read next line; if EOF: exit
   set "line="
   set /P "line="
   if "!line!" equ "" exit

   if not defined inTag (
      rem First part: Copy/Ignore up to "</asset>" input line
      if defined firstFile >> output.txt echo !line!
      if "!line!" equ "  </asset>" (
         set "inTag=1"
         set "tagName="
      )
   ) else for /F "tokens=2 delims=<>" %%b in ("!line!") do (
      rem Second part: Copy each tag and its children to it's own file

      if not defined tagName (
         rem Start of tagNameN

         set "tagName=%%b"
         set "childName="
         set /A "add=0, omit=0"
         SET /P "=|        - - - TAG !tagName!: " < NUL

         if defined firstFile (
            rem Initialize files for tagData and childIds of this tagName
            > "tagData-!tagName!.tmp" echo !line!
            del "childIds-!tagName!.tmp" 2> NUL
         ) else (
            rem Load current childIds of this tagName
            setlocal EnableDelayedExpansion
            for /F "usebackq" %%i in ("childIds-!tagName!.tmp") do set "child[%%i]=1"
         )

      ) else if "%%b" equ "/!tagName!" (
         rem End of tagNameN

         ECHO !add! items added, !omit! omitted

         if defined firstFile (
            rem Store the closing line for this tagData file
            >> tags.txt echo !line!
         ) else (
            rem Release childIds of this tagName
            endlocal
         )

         set "tagName="

      ) else if not defined childName (
         rem Start of new child of tagNameN

         for /F "tokens=1,3 delims=<= " %%c in ("%%b") do set "childName=%%c" & set "childId=%%~d"
         if not defined child[!childId!] (
            >> "tagData-!tagName!.tmp" echo !line!
            >> "childIds-!tagName!.tmp" echo !childId!
            set /A "add+=1, inChild=1"
         ) else (
            set /A "omit+=1"
            set "inChild="
         )

      ) else if "%%b" equ "/!childName!" (
         rem End of this child

         set "childName="
         if defined inChild >> "tagData-!tagName!.tmp" echo !line!
         set "inChild="

      ) else if defined inChild (

         if "!line:~1022!" neq "" >> "tagData-!tagName!.tmp" call :longLine
         >> "tagData-!tagName!.tmp" echo !line!
      )

   )

) )

rem Previous ")" close a While-loop: for /L %%? in () do (
rem so execution never reach this point

=======================================

:longLine
if "%line:~0,1%" equ " " (
   type space.txt
   set "line=%line:~1%"
   goto longLine
)
set /P "line=%line%"
if "%line:~1022%" neq "" goto longLine
exit /B
Antonio

#15 Post by Alanick » 07 Nov 2022 18:59

Aacini wrote:
07 Nov 2022 11:04
I modified the method in order to process each file completely, one by one. This way, the number of files should not affect the task, although the total elapsed time may be longer than before...
Thank you for the update.

Hm, for some odd reason, some of the <float_array> elements end up like this:
[Attached image: error.png]
A big chunk from the end of the float_array gets copied again after </float_array>, followed by a second </float_array> closing tag. The next element, <technique_common>, is copied right after it, including its indentation, thus corrupting the XML's syntax.

I do not know whether this problem also occurred in the prior version because, as I mentioned, that process never finished, so I had no way to check.
It seems the batch is indeed struggling with very long lines, yikes.
