Extract rows from text file and slight reformatting
Moderator: DosItHelp
-
- Posts: 9
- Joined: 15 Sep 2008 17:20
Extract rows from text file and slight reformatting
Hi:
I have a pretty straightforward objective, but I am having problems getting the code to work. Any help or suggestions from the DOS gurus here would be most appreciated!
Short version
---------------
- want to extract a consecutive set of rows from a text file (omitting rows from the top and the bottom of the file)
- add a string to the beginning of each of these rows and then write to a new file
More details
--------------
The original file (tmp.csv) has N+T+2 rows (N and T are integers),
row1
row2
.
.
rowN
City, Visits, Page_Visits,...
City1, Visits1, Page_Visits1,...
.
.
.
CityT, VisitsT, Page_VisitsT,...
# --------------------------------------------------------------------------------
Note that row N+1 is a text string (variable names) while the T rows below it are numbers (the data).
I am interested only in the data rows: the T rows starting just after row N+1 and ending with the second to last row.
I would like to save these data rows to a file (NewData.csv) but with one small change: before each row I would like to add the contents of a variable (%%S). So for example the first row of NewData.csv would be,
%%S, City1, Visits1, Page_Visits1,...
There are two other issues, since I am actually doing this for a series of tmp.csv files:
(i) sometimes tmp.csv has no data (e.g. the bottom rows are,
rowN
City, Visits, Page_Visits,...
# --------------------------------------------------------------------------------
). I am not sure how to keep the batch file from crashing here.
(ii) I would like the first row of the new file NewData.csv to be the variable names (i.e. row N+1). I do not know the variable names in advance (i.e. I do not know the entire string in row N+1), but of course they can be read from the first tmp.csv that is used.
Hi NYTReader123,
try to use the FOR statement to extract single lines from the file.
Code: Select all
for /f "tokens=*" %%a in (tmp.csv) do @echo %%a
But first you simply have to count the number of lines (also with FOR);
then it should be simple.
Then you start a second FOR run and append your string, like
Code: Select all
echo PRESTR %%a
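A minimal sketch of the two passes described above, not a tested solution for the real files. It assumes tmp.csv is in the current directory and uses PRESTR as a placeholder for the string to prepend. Note that FOR /F skips empty lines (and, by default, lines starting with a semicolon), so the count can be lower than the true number of rows.

```batch
@echo off
setlocal

REM First pass: count the lines that FOR /F delivers from tmp.csv
set /a count=0
for /f "usebackq delims=" %%a in ("tmp.csv") do set /a count+=1
echo FOR /F saw %count% lines in tmp.csv

REM Second pass: prepend a string to every line (PRESTR is a placeholder)
for /f "usebackq delims=" %%a in ("tmp.csv") do echo PRESTR,%%a

endlocal
```

Using "delims=" keeps each line whole, whereas "tokens=*" would also strip leading spaces; either is fine for the rows shown in this thread.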
hope it helps
jeb
Thanks Jeb for the suggestion (and sorry for my slow response)!
My main stumbling block is how to limit the lines which are echo'd from the tmp.csv file. My strategy was to find the row right before the data starts and also to count the number of rows in the file (since I want to stop echo'ing at the second to last row).
The following does not quite work:
REM FR will hold the row number just before the data begins
For /F %%A in ('Find /V /C "City,Visits" tmp.csv') Do set FR=%%A
REM numrows will hold the number of rows in tmp.csv
set /a numrows=0
for /f %%n in ('type "tmp.csv"|find "" /v /c') do set /a numrows=%%n
Any suggestions on what is going wrong?
And then if I do get this to work, how do I use the two variables here to limit the rows which are echoed? I know how to skip the first FR rows,
For /F "tokens=* skip=%FR%" %%A in (tmp.csv) do echo %%A
but am not sure how to skip the last row.
Any help on these two questions would be most appreciated!
follow-up
Hi everyone:
I hope this is not a violation of the etiquette here, but I am still struggling with this problem. If anyone can pass along some suggestions on my code and questions from my 9 Jan posting I would be greatly indebted!
Hi NTY,
So I tried to find a hint.
I built my own tmp.csv:
Code: Select all
remark 1
remark 2
remark ...
remark N
City,Visits
Hamburg,1
Berlin,8
Bochum,1000
This is the end, but should not used
Code: Select all
For /F %%A in ('Find /V /C "City,Visits" tmp.csv') Do set FR=%%A
First, I tried "Find /V /C", and on my system (Vista) it results in
---------- tmp.csv: 9
This could be the first problem, because the set FR=%%A will fail.
But perhaps find.exe works differently on your system.
Also, /C counts the lines that contain the text and /V selects only the lines that do not contain the text, so together they count the non-matching lines; that's not exactly what you want.
This code works with my tmp.csv, but perhaps your problem is quite different from my solution.
Code: Select all
@ECHO off
setlocal ENABLEDELAYEDEXPANSION
REM FirstLine will hold the row number of the header line (the data begins on the next row)
For /F "delims=[]" %%A in ('type "tmp.csv" ^| find /N "City,Visits"') Do set /a FirstLine=%%A
REM numrows will hold the total number of rows in tmp.csv
set /a numrows=0
for /f %%n in ('type "tmp.csv" ^| find "" /V /C') do set /a numrows=%%n
set /a showRows=numrows-FirstLine-1
echo **** Info FirstLine=%FirstLine% num=%numrows% ShowRows=%showRows%
set /a row=0
for /F "tokens=* skip=%FirstLine%" %%r in (tmp.csv) do (
    set /a row=row + 1
    if !row! LEQ !showRows! echo %%r
)
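To see where the numbers come from, here is a sketch of what the two FIND calls print, assuming tmp.csv contains exactly the nine sample lines listed above:

```batch
@echo off
REM /V "" matches every line and /C prints only the count.
REM With piped input the result is a bare number; for the nine-line sample: 9
type "tmp.csv" | find /V /C ""

REM /N prefixes each matching line with its line number in brackets.
REM For the sample this prints: [5]City,Visits
type "tmp.csv" | find /N "City,Visits"

REM So FirstLine=5, numrows=9 and showRows=9-5-1=3:
REM skip=5 jumps over the remarks and the header, and the row counter
REM lets only the next 3 lines (Hamburg, Berlin, Bochum) through.
```

With piped input FIND does not print the "---------- tmp.csv" banner, which is why the FOR /F loops can use its output directly.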
hope it helps
Jan Erik
Jan:
Thanks a ton for your comment!
I had actually figured out something along the lines of your first points over the weekend, but I never would have thought of the loop at the end of your code. Really helpful!!
I have been a bit buried at work, but will try your code in the next couple of days. I may have one more short question, but either way will report back.
Again, many thanks for your assistance.
code that works (and one more question)
Jan and board readers:
Following Jan's great suggestions, I got the code working. Here is what I used:
Code: Select all
@ECHO off
SETLOCAL ENABLEEXTENSIONS
SETLOCAL ENABLEDELAYEDEXPANSION
REM firstrow will hold the row number just before the data begins
REM headers will hold list of variable names
For /F "tokens=1,2,* delims=[]" %%A in ('type tmp.csv^|Find /N "City,Visits"') Do set /a firstrow=%%A
For /F "tokens=*" %%B in ('type tmp.csv^|Find "City,Visits"') Do set headers=%%B
REM numrows will hold the number of rows in tmp.csv
set /a numrows=0
FOR /f %%n in ('type tmp.csv^|find "" /v /c') do set /a numrows=%%n
set /a showRows=numrows-firstrow-1
REM check code
echo Info: row before data=!firstrow!, num rows=!numrows!, num rows with data=!showRows!, headers=!headers!
REM save relevant data (with state name listed in front)
REM add var headers in first row
IF NOT EXIST "MyFile.csv" (
echo State,!headers!>>"MyFile.csv"
)
set /a row=0
for /F "tokens=* skip=%firstrow%" %%r in (tmp.csv) do (
    set /a row=row + 1
    if !row! LEQ !showRows! echo %%S,%%r>>"MyFile.csv"
)
Note that this code works even in the case where showrows is zero (e.g. there is no data to add).
I have one last question about an item which occurs in my code before the chunk above: I need to loop through some numbers and it is important that all leading zeros be included (in my case, that there are always two digits). Is there a way to do it in one step? My attempt listed below fails: basically I do not know how to set a new variable (dd) equal to the counter (d) --> in my code "dd" is never set to any value.
Code: Select all
for /l %%d in (1,1,30) do (
REM ensures two digits
if %%d lss 10 (
set dd=0%%d
) else (
set dd=%%d
)
Note that I tried setting an initial value for the new variable (set dd=0), but then this value was never changed. I also tried variations discussed on the board,
Code: Select all
set /a j=%%d
set dd=0!j!&set dd=!dd:~-2!
but this does not work either.
Does anyone have a suggestion? Again thanks for any help which can be offered.
Hi NTYeader,
It's nice that your code finally works.
Your minor problem is not a problem.
Code: Select all
setlocal ENABLEDELAYEDEXPANSION
for /l %%d in (1,1,30) do (
REM ensures two digits
if %%d lss 10 (
set dd=0%%d
) else (
set dd=%%d
)
echo dd=!dd!
)
Works fine.
Or you can solve it this way
Code: Select all
setlocal ENABLEDELAYEDEXPANSION
for /l %%d in (101,1,130) do (
set temp=%%d
set dd=!temp:~1!
echo !dd!
)
Hope it helps
jeb
Or that way:
Code: Select all
setlocal ENABLEDELAYEDEXPANSION
for /l %%d in (1,1,30) do (
set dd=0%%d
set dd=!dd:~-2!
echo !dd!
)
thanks again (and one more question: ugh)
Jeb:
Thanks again. I obviously should not be writing batch files at 4 in the morning. Your tweak of my code worked perfectly (again). I owe you a beer!!!
Unfortunately I have one other issue which I had not noticed before. From the end of the code which you had suggested before, I cannot get the final FOR loop to work:
Code: Select all
set /a row=0
for /F "tokens=* skip=%firstrow%" %%r in (tmp.csv) do (
    set /a row=row + 1
    if !row! LEQ !showRows! echo %%S,%%r>>"MyFile.csv"
)
I get an error,
Code: Select all
" was unexpected at this time
which I am pretty sure means it is not using the value for "firstrow". This is puzzling since in the earlier check,
Code: Select all
echo Info: row before data=!firstrow!
it outputs correct numbers.
I am pretty sure this has to do with the SETLOCAL ENABLEDELAYEDEXPANSION but I have not been able to figure out what I am doing wrong:
- I tried "!firstrow!" in the FOR loop and I get a message that this is unexpected at this time
- I tried creating a new variable (set var temp=!firstrow!) and same error as before
If someone can point out what is probably a very stupid mistake on my end I would be greatly in their debt. I feel badly coming back to the board so many times with questions, but my hope is that I will have something to contribute in the future.
PS Thanks also DosItHelp!
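One possible explanation, offered as a guess since the surrounding script is not shown: if this loop sits inside a parenthesised block (for example the outer loop that supplies %%S), %firstrow% is expanded when the whole block is parsed, before firstrow has been set, so the option string becomes "tokens=* skip=" and FOR rejects it. Delayed expansion (!firstrow!) is not accepted inside the FOR /F option string either. A common workaround is to move the work into a subroutine, where each line is parsed at the moment it runs. In the sketch below the outer loop, the state names AL and AK, and the label :extract are hypothetical.

```batch
@echo off
setlocal EnableDelayedExpansion

REM Hypothetical outer loop; AL and AK stand in for the real values of %%S
for %%S in (AL AK) do (
    REM ... whatever produces tmp.csv for %%S would go here ...
    call :extract %%S
)
goto :eof

:extract
REM %1 is the prefix handed over by the caller
set "firstrow="
for /F "delims=[]" %%A in ('type tmp.csv ^| find /N "City,Visits"') do set /a firstrow=%%A
REM No header line found: nothing to extract
if not defined firstrow goto :eof

for /F %%n in ('type tmp.csv ^| find /V /C ""') do set /a numrows=%%n
set /a showRows=numrows-firstrow-1
set /a row=0

REM This line is parsed only now, so %firstrow% already holds its value
for /F "tokens=* skip=%firstrow%" %%r in (tmp.csv) do (
    set /a row+=1
    if !row! LEQ !showRows! >>"MyFile.csv" echo %1,%%r
)
goto :eof
```

Placing the redirection before echo avoids a prefix ending in a digit being read as a stream number. If showRows is 0 the loop body still runs for the last line, but the LEQ test fails and nothing is written.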