file splitting

Posted: 09 Mar 2011 10:18
by cactusman252
I need to split a master text file into multiple text files based on user name. I have a script that does the job but it splits the file based on the last character instead of the user name. If the user name is repeated it needs to append to the same file.

The master file's content looks like this:
==================================================
Directory of Y:\DDAN\Desktop Backups\2010-02-27

26/08/2009 09:53 AM 4,707,780 101_0159.AVI
26/08/2009 09:54 AM 6,794,386 102_0213.AVI
2 File(s) 11,502,166 bytes

Directory of Y:\SJOHN\My Pictures\typo

01/11/2009 09:11 AM 21,874,732 Pictures Misc 2009 098.avi
1 File(s) 21,874,732 bytes

===================================================

I just need to modify the script so it splits based on the user name. If anything is unclear, feel free to ask. Thanks.


@echo off & setLocal EnableDELAYedExpansion
if exist file?.txt del file?.txt

for /f "tokens=* delims= " %%a in (masterfile.txt) do (
set str=%%a
echo !str! | find "Directory of" > nul
if not errorlevel 1 (
set dest=!str!
if defined dest set dest=!dest:~-1!
)
if defined dest echo !str!>> file!dest!.txt
)

Re: file splitting

Posted: 09 Mar 2011 21:19
by InterociterOperator
You guys make my head hurt.

The closest I could come to what you want is to figure out the shortest user name (4 characters) and use that in a revised set command.

Instead of

Code: Select all

if defined dest set dest=!dest:~-1!

I used

Code: Select all

if defined dest set dest=!dest:~16,4!

...which skips the first 16 characters from the left ("Directory of Y:\") and then grabs the next four characters.

I think your code was grabbing the last character in the line due to the "-1".

So the code looks like...

Code: Select all

@echo off & setLocal EnableDELAYedExpansion
if exist file?.txt del file?.txt

for /f "tokens=* delims= " %%a in (masterfile.txt) do (
set str=%%a
echo !str! | find "Directory of" > nul
if not errorlevel 1 (
set dest=!str!
rem  if defined dest set dest=!dest:~-1!
if defined dest set dest=!dest:~16,4!
)
if defined dest echo !str!>> file!dest!.txt
)


What do you think? Close enough? Or do we have to capture the exact user name?
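
For comparison, here is a minimal Python sketch of the difference between the fixed-width slice above and taking everything between the first two backslashes. It is only an illustration, not part of the batch script: user_from_header is a made-up helper name, and it assumes every header line has the form "Directory of <drive>:\<user>\...".

```python
PREFIX = "Directory of "

def user_from_header(line):
    # Return the first folder after the drive (taken here as the user name),
    # or None if the line is not a "Directory of" header.
    line = line.strip()
    if not line.startswith(PREFIX):
        return None
    parts = line[len(PREFIX):].split("\\")
    # parts[0] is the drive ("Y:"), parts[1] is the first folder.
    return parts[1] if len(parts) > 1 and parts[1] else None

header = r"Directory of Y:\SJOHN\My Pictures\typo"
print(header[16:20])             # fixed-width slice, like ~16,4: SJOH
print(user_from_header(header))  # split on the backslash: SJOHN
```

The slice cuts SJOHN down to SJOH, while splitting on the backslash keeps the whole name regardless of its length.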

Re: file splitting

Posted: 09 Mar 2011 21:28
by InterociterOperator
dang.... up at the top...

Code: Select all

if exist file?.txt del file?.txt


would be ...

Code: Select all

if exist file*.txt del file*.txt


...also, since ? stands for a single character and the names now have four characters after "file".

Re: file splitting

Posted: 10 Mar 2011 15:10
by aGerman
Difficult thing.
Try this:

Code: Select all

@echo off &setlocal enabledelayedexpansion
for /f "delims=:" %%a in ('findstr /n "^" "masterfile.txt"') do set "last=%%a"

set /a n+=0
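rem /c: makes findstr match the literal string "Directory of " instead of the separate words "Directory" or "of"
for /f "tokens=1* delims=:" %%a in ('findstr /n /c:"Directory of " "masterfile.txt"') do (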
  for /f "tokens=2 delims=\" %%c in ("%%~b") do (
    set /a end_!n!=%%a-1
    set /a n+=1
    set "start_!n!=%%a"
    set "name_!n!=%%c"
    set "end_!n!=%last%"
  )
)

for /l %%a in (1,1,%n%) do (
  set "range=false"
  >"!name_%%a!.txt" type nul
  for /f "tokens=1* delims=:" %%b in ('findstr /n "^" "masterfile.txt"') do (
    if %%b==!start_%%a! set "range=true"
    if !range!==true (
      >>"!name_%%a!.txt" echo(%%c
    )
    if %%b==!end_%%a! set "range=false"
  )
)


Regards
aGerman

Re: file splitting

Posted: 11 Mar 2011 07:11
by jjj2k
Hi there guys!

I have a similar problem for splitting this CSV file:

DH,aosd,aoisjd,,,,,,, [first dh - for demo]
DA,asjd,oaisjd,,,,,,,
DB,aosd,oasdioaj,,,,,,,
DC,aosda,0q4234,,,,,,,
DH,12039,019823,,,,,, [second dh - for demo]
DA,aosijd,apisjd,,,,,
DB,19823now,owiejhw,,,,,

I want to split the master CSV file at the DH rows, so that each DH row starts a new block: the first file will be DHaosd and the second will be DH12039. The problem is that the number of lines between the DH rows varies, so how would I do it? Is DOS capable of doing it? If not, could you show me or suggest how it could be done otherwise?

Re: file splitting

Posted: 11 Mar 2011 09:22
by !k
jjj2k

Code: Select all

@echo off &setlocal enabledelayedexpansion

for /f "delims=" %%a in (master.csv) do (
  for /f "tokens=1,2 delims=," %%b in ("%%a") do (
    if "%%b"=="DH" (set "file=%%b%%c.csv" &>!file! cd.)
    >>!file! echo.%%a
  )
)
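
For readers who would rather see the same idea outside batch, here is a rough Python sketch of it: every row whose first field is DH opens a new block named after the first two fields, and all following rows go into that block. split_on_dh is a hypothetical name, and the sketch assumes plain comma-separated rows without quoted commas.

```python
def split_on_dh(lines):
    # Return {file name: rows}; each "DH" row opens a new block.
    blocks = {}
    current = None
    for line in lines:
        row = line.rstrip("\r\n")
        fields = row.split(",")
        if fields[0] == "DH" and len(fields) > 1:
            current = "DH" + fields[1] + ".csv"
            # A repeated DH key starts over, as the batch version does.
            blocks[current] = []
        if current is not None:
            # Rows before the first DH row are skipped.
            blocks[current].append(row)
    return blocks

def write_blocks(blocks):
    # Write each block to the file named by its key.
    for name, rows in blocks.items():
        with open(name, "w") as out:
            out.write("\n".join(rows) + "\n")
```

Something like write_blocks(split_on_dh(open("master.csv"))) would then produce DHaosd.csv and DH12039.csv for the sample above.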

Re: file splitting

Posted: 11 Mar 2011 16:57
by InterociterOperator
aGerman,

You got the username to filename perfectly, but does it capture multiple directories with the same user name in one file?

"If the user name is repeated it needs to append to the same file."

What do we have to tweak to get it to do that?

Re: file splitting

Posted: 11 Mar 2011 17:14
by aGerman
Hmm, more difficult.

Code: Select all

@echo off &setlocal enabledelayedexpansion
for /f "delims=:" %%a in ('findstr /n "^" "masterfile.txt"') do set "last=%%a"

set /a n+=0
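rem /c: makes findstr match the literal string "Directory of " instead of the separate words "Directory" or "of"
for /f "tokens=1* delims=:" %%a in ('findstr /n /c:"Directory of " "masterfile.txt"') do (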
  for /f "tokens=2 delims=\" %%c in ("%%~b") do (
    set /a end_!n!=%%a-1
    set /a n+=1
    set "start_!n!=%%a"
    set "name_!n!=%%c"
    set "end_!n!=%last%"
  )
)

for /l %%a in (1,1,%n%) do (
  if exist "!name_%%a!.txt" del "!name_%%a!.txt"
)

for /l %%a in (1,1,%n%) do (
  set "range=false"
  for /f "tokens=1* delims=:" %%b in ('findstr /n "^" "masterfile.txt"') do (
    if %%b==!start_%%a! set "range=true"
    if !range!==true (
      >>"!name_%%a!.txt" echo(%%c
    )
    if %%b==!end_%%a! set "range=false"
  )
)


The additional FOR /L loop is to protect you from appending the same things if you run the batch file twice. It deletes existing files before it writes the new files. Hope this will work for you.
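
The same append-per-user behaviour can be sketched in a single pass in Python. This is an illustration under assumptions rather than a translation of the batch file: split_by_user and write_groups are made-up names, and the user name is taken to be the first folder after the drive in each "Directory of" line.

```python
from collections import defaultdict

PREFIX = "Directory of "

def split_by_user(lines):
    # Group lines under the user name of the most recent "Directory of" header.
    groups = defaultdict(list)
    user = None
    for line in lines:
        text = line.rstrip("\r\n")
        stripped = text.lstrip()
        if stripped.startswith(PREFIX):
            parts = stripped[len(PREFIX):].split("\\")
            # A header without a folder after the drive yields no user;
            # its lines are dropped until the next usable header.
            user = parts[1] if len(parts) > 1 and parts[1] else None
        if user is not None:
            groups[user].append(text)
    return dict(groups)

def write_groups(groups, suffix=".txt"):
    # Opening with "w" replaces old output, so a second run does not append twice.
    for user, block in groups.items():
        with open(user + suffix, "w") as out:
            out.write("\n".join(block) + "\n")
```

Because a repeated user name lands in the same list, both DDAN sections would end up in DDAN.txt, for example via write_groups(split_by_user(open("masterfile.txt"))).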

Regards
aGerman

Re: file splitting

Posted: 11 Mar 2011 17:18
by InterociterOperator
Oh geez. The Submit button is so smart. As soon as I hit it the answer pops up.

Data file with two occurrences of DDAN

Code: Select all

Directory of Y:\DDAN\Desktop Backups\2010-02-27

26/08/2009 09:53 AM 4,707,780 101_0159.AVI
26/08/2009 09:54 AM 6,794,386 102_0213.AVI
2 File(s) 11,502,166 bytes

Directory of Y:\SJOHN\My Pictures\typo

01/11/2009 09:11 AM 21,874,732 Pictures Misc 2009 098.avi
1 File(s) 21,874,732 bytes

Directory of Y:\DDAN\Desktop \2010-03-27

27/09/2010 09:53 AM 4,707,780 111_1159.AVI
27/09/2010 09:54 AM 6,794,386 112_1213.AVI
27/09/2010 09:55 AM 6,794,387 112_1214.AVI
3 File(s) 11,502,166 bytes


In the second paragraph of the code (the FOR /L loop), use >> instead of >:

Code: Select all

for /l %%a in (1,1,%n%) do (
  set "range=false"
rem  >"!name_%%a!.txt" type nul
>>"!name_%%a!.txt" type nul

Re: file splitting

Posted: 11 Mar 2011 17:23
by aGerman
Yeah, you could remove the entire line. But have a look at my comment above ;)

Regards
aGerman

Re: file splitting

Posted: 13 Mar 2011 19:13
by jjj2k
Hi Guys,

Thanks for your help but I have run into a problem with the script.

--- script ---
@echo off &setlocal enabledelayedexpansion

for /f "delims=" %%a in (combined.csv) do (
for /f "tokens=1,2 delims=," %%b in ("%%a") do (
if "%%b"=="DH" (set "file=%%b%%c.csv" &>!file! cd.)
>>!file! echo.%%a
)
)
---

My CSV file was initially split into five files of 640 MB each, named Bill.CSV, Bill.001, Bill.002 and so on.

I merged them together using: copy Bill.CSV+Bill.001+Bill.002 Combined.CSV
I also tried with copy /a Bill.CSV+Bill.001+Bill.002 Combined.CSV

If I run my batch script on the individual files it extracts the blocks of DH records, but when I run it on Combined.csv the script doesn't produce anything. Am I doing something wrong? The problem is that the DH blocks run across file boundaries, so if I don't merge the files I risk losing billing data for some accounts.

Re: file splitting

Posted: 15 Mar 2011 15:56
by aGerman
Hmm, not sure, but you could try to use the TYPE command.

Code: Select all

for /f "delims=" %%a in ('type "combined.csv"') do (
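
One thing that may be worth ruling out (an assumption, nothing confirmed in this thread) is the merge step itself: if the parts are raw byte slices of one file, they need to be joined byte for byte. A minimal Python sketch of such a binary join follows; concat_parts is a made-up name, and the file names in the note below are just the ones mentioned above.

```python
import shutil

def concat_parts(part_paths, dest_path):
    # Append each part to dest_path byte for byte, in the order given.
    with open(dest_path, "wb") as dest:
        for path in part_paths:
            with open(path, "rb") as part:
                # copyfileobj copies in chunks, so a large part is never
                # held in memory as a whole.
                shutil.copyfileobj(part, dest)
```

A call such as concat_parts(["Bill.CSV", "Bill.001", "Bill.002"], "Combined.CSV") would join the parts without interpreting their content.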


Regards
aGerman

Re: file splitting

Posted: 16 Mar 2011 22:17
by ghostmachine4
Yet again, another thread trying to parse files with batch... @jjj2k, use a good programming language (Perl, Python, Ruby, VBScript, etc.) or a tool that is suited to file parsing. Batch is definitely not one of them. If you can, download gawk for Windows and use this one-liner:

Code: Select all

C:\test>more file
DH,aosd,aoisjd,,,,,,, [first dh - for demo]
DA,asjd,oaisjd,,,,,,,
DB,aosd,oasdioaj,,,,,,,
DC,aosda,0q4234,,,,,,,
DH,12039,019823,,,,,, [second dh - for demo]
DA,aosijd,apisjd,,,,,
DB,19823now,owiejhw,,,,,

C:\test>gawk -vRS="DH" "NF{print \"DH\"$0 > \"file\"++c }" file

C:\test>more file1
DH,aosd,aoisjd,,,,,,, [first dh - for demo]
DA,asjd,oaisjd,,,,,,,
DB,aosd,oasdioaj,,,,,,,
DC,aosda,0q4234,,,,,,,


C:\test>more file2
DH,12039,019823,,,,,, [second dh - for demo]
DA,aosijd,apisjd,,,,,
DB,19823now,owiejhw,,,,,



Re: file splitting

Posted: 16 Mar 2011 23:18
by jjj2k
Hi,

I installed GAWK but I am unable to use it on my 64-bit Windows 7. I ran your command but got no output.


Re: file splitting

Posted: 16 Mar 2011 23:51
by ghostmachine4
jjj2k wrote:Hi,
I installed GAWK but unable to use it on my Win 7 64 bit. I run your command but no output.

Did you expect the output to be printed to the screen? Or have you already checked the file1 and file2 that the command produces?