Discussion forum for all Windows batch related topics.
Moderator: DosItHelp
-
SIMMS7400
- Posts: 546
- Joined: 07 Jan 2016 07:47
#1
Post
by SIMMS7400 » 03 Apr 2021 06:44
HI Dave/Team -
I have a text file that is 16 million rows and I need to extract the unique values from the 8th position of the file.
Code: Select all
"Property Gross Royalties" "Commercial" "Actual" "Final" "FY14" "Jan" "Periodic" "O-20003921" "FDR" "PA" 120
"Sides" "Commercial" "Actual" "Final" "FY14" "Jan" "Periodic" "O-20003921" "FDR" "PA" 1
"Volume" "Commercial" "Actual" "Final" "FY14" "Jan" "Periodic" "O-20003921" "FDR" "PA" 17000
"Property AGC" "Commercial" "Actual" "Final" "FY14" "Jan" "Periodic" "O-20003921" "FDR" "PA" 2000
Is JREPL the right tool for this?
-
Aacini
- Expert
- Posts: 1914
- Joined: 06 Dec 2011 22:15
- Location: México City, México
-
Contact:
#2
Post
by Aacini » 04 Apr 2021 08:21
I am afraid I don't understand what exactly you want. Do you want the values of the 8th position that appear just one time? (that is, in just one line)
This program counts the number of times that appear each value of the 8th column:
Code: Select all
@if (@CodeSection == @Batch) @then
@echo off
cscript //nologo //E:jscript "%~F0" < test.txt
goto :EOF
@end
var count = new Array();
while ( ! WScript.Stdin.AtEndOfStream ) {
var line = WScript.Stdin.ReadLine();
// WScript.Stdout.WriteLine(line);
var column = line.split("\t");
// WScript.Stdout.WriteLine(column[7]);
if (count[column[7]] == undefined) count[column[7]] = 0;
count[column[7]] += 1;
}
for ( var item in count) {
WScript.Stdout.WriteLine(item+": "+count[item]+" times");
}
For example:
... so just add an "if (count[item] == 1)" to show just the values that appear one time...
Antonio
-
SIMMS7400
- Posts: 546
- Joined: 07 Jan 2016 07:47
#3
Post
by SIMMS7400 » 04 Apr 2021 10:19
Hi Aacini !
Thank you ! Basically I need to pull all the unique values in column 8. Is that possible?
-
Squashman
- Expert
- Posts: 4486
- Joined: 23 Dec 2011 13:59
#4
Post
by Squashman » 04 Apr 2021 10:20
Technically speaking a well formatted delimited file could be read by a FOR /F command. Only drawback to using a FOR /F is that it will load the entire file into memory and will fail if the file is larger than the 32Bit limit. Roughly 2.1GB's.
-
Squashman
- Expert
- Posts: 4486
- Joined: 23 Dec 2011 13:59
#5
Post
by Squashman » 04 Apr 2021 10:27
Dave's JSORT might be another option for you as well. That would give you all the unique values in the 8th delimited field but would also give you the whole record. But again, you could use a FOR /F to parse the output of JSORT from inside the IN clause.
-
Aacini
- Expert
- Posts: 1914
- Joined: 06 Dec 2011 22:15
- Location: México City, México
-
Contact:
#6
Post
by Aacini » 04 Apr 2021 20:21
Use this code. I think it is the fastest way to get this result:
Code: Select all
@if (@CodeSection == @Batch) @then
@echo off
cscript //nologo //E:jscript "%~F0" < test.txt
goto :EOF
@end
var count = new Array();
while ( ! WScript.Stdin.AtEndOfStream ) {
var line = WScript.Stdin.ReadLine();
var column = line.split("\t");
count[column[7]] = 0;
}
for ( var item in count) {
WScript.Stdout.WriteLine(item);
}
Antonio
-
SIMMS7400
- Posts: 546
- Joined: 07 Jan 2016 07:47
#7
Post
by SIMMS7400 » 08 Apr 2021 02:16
Antonio -
That worked wonderfully!! 16+ millions rows very quickly. Thank you again for this, much appreciated!!