Pulling out sets of fixed multiple lines onto single set

Posted: 29 Jul 2017 21:38
by plasma33
Hello everyone,

I would like to pull sets of a fixed number of lines out of a file with 500,000-plus lines and join them into a single set. A simple demonstration of what I would like to achieve is shown below:
Input:

Code:

RGRGRKRGRHRGRGRGIGRMKHIGRMRRIGKMKMIGRHRLIGRIRNIGRL
|||||||||||||||||||||||||.|:||:||.|||.:.|||:::|||.
RGRGRKRGRHRGRGRGIGRMKHIGRGRKIGRMKHIGRLKHIGRMKHIGRH

RGIGRKRGIGRGRGIGRQRHIGKLKHIGRGRIIGRGRGIGRGRGIGRGRG
:.||:.:.|||.|.|||.|.||:.:.|||.|.|||||||||.:.|||.:.
KLIGKMKMIGRHRLIGRGRGIGRQRGIGRKRNIGRGRGIGRMKHIGRNKM

IGRRRRIGKKKKGDGARGRGRKRGRHRGRHRGIGRMKHIGRGRGIGKMKM
|||.:.||::::|||||||||||||||||||||||||||||.:.||||||
IGRMKHIGRRRQGDGARGRGRKRGRHRGRHRGIGRMKHIGRRKMIGKMKM

IGRHRLIGRIRMIGRLRGIGRKRGIGRGRGIGRGRRIGKMKLIGRGRRIG
|||||||||.|.|||.|||||||.|||||||||.:.||:.:.|||.:.||
IGRHRLIGRGRKIGRQRGIGRKRNIGRGRGIGRMKHIGRHRRIGRMKHIG

KKKLIGRGRRIGKMRHIGRMRQIGRNRNGDGARGRGRKRGRHRGRIRGIG
:.|.|||.:.||:.:.||:|:.|||:|.||||||||||||||||||||||
RIKHIGRMKHIGRRKMIGKMKMIGRHRLGDGARGRGRKRGRHRGRIRGIG


Output:

Code:

RGRGRKRGRHRGRGRGIGRMKHIGRMRRIGKMKMIGRHRLIGRIRNIGRLRGIGRKRGIGRGRGIGRQRHIGKLKHIGRGRIIGRGRGIGRGRGIGRGRGIGRRRRIGKKKKGDGARGRGRKRGRHRGRHRGIGRMKHIGRGRGIGKMKMIGRHRLIGRIRMIGRLRGIGRKRGIGRGRGIGRGRRIGKMKLIGRGRRIGKKKLIGRGRRIGKMRHIGRMRQIGRNRNGDGARGRGRKRGRHRGRIRGIG
|||||||||||||||||||||||||.|:||:||.|||.:.|||:::|||.:.||:.:.|||.|.|||.|.||:.:.|||.|.|||||||||.:.|||.:.|||.:.||::::|||||||||||||||||||||||||||||.:.|||||||||||||||.|.|||.|||||||.|||||||||.:.||:.:.|||.:.||:.|.|||.:.||:.:.||:|:.|||:|.||||||||||||||||||||||
RGRGRKRGRHRGRGRGIGRMKHIGRGRKIGRMKHIGRLKHIGRMKHIGRHKLIGKMKMIGRHRLIGRGRGIGRQRGIGRKRNIGRGRGIGRMKHIGRNKMIGRMKHIGRRRQGDGARGRGRKRGRHRGRHRGIGRMKHIGRRKMIGKMKMIGRHRLIGRGRKIGRQRGIGRKRNIGRGRGIGRMKHIGRHRRIGRMKHIGRIKHIGRMKHIGRRKMIGKMKMIGRHRLGDGARGRGRKRGRHRGRIRGIG


Thanks.

Plasma33

Re: Pulling out sets of fixed multiple lines onto single set

Posted: 29 Jul 2017 22:45
by ShadowThief
This is the third question (including the one StackOverflow question I saw) I've seen from you about data in this format. I'm really curious about what it could possibly be used for.

Re: Pulling out sets of fixed multiple lines onto single set

Posted: 29 Jul 2017 23:27
by plasma33
ShadowThief wrote: This is the third question (including the one StackOverflow question I saw) I've seen from you about data in this format. I'm really curious about what it could possibly be used for.


Hi there,

These are biological sequences and I am trying to extract the conserved regions (common substrings) from the aligned sequences for research purposes.

Thanks.

Plasma33

Re: Pulling out sets of fixed multiple lines onto single set

Posted: 30 Jul 2017 11:52
by Aacini

Code:

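@echo off
setlocal EnableDelayedExpansion

echo Processing file, please wait...
rem Capture a carriage return in CR so the progress counter redraws on one line
for /F %%a in ('copy /Z "%~F0" NUL') do set "CR=%%a"
rem Remove leftovers from a previous run
for /L %%i in (1,1,3) do del output%%i.txt 2> nul

rem Send the non-empty lines to output1.txt, output2.txt, output3.txt in rotation.
rem for /F skips empty lines; set /P reading from NUL writes text without a newline.
set /A "out=0, lineNum=0"
< nul (for /F "delims=" %%a in (input.txt) do (
   set "line=%%a"
   set /A "out=out%%3+1, lineNum+=1"
   set /P "=!line!" >> output!out!.txt
   set /P "=Line: !lineNum!!CR!"
))
rem Join the three long rows into output.txt, one per line, and delete the temp files
(for /L %%i in (1,1,3) do type output%%i.txt & del output%%i.txt & echo/) > output.txt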
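
For anyone who wants to cross-check the batch result outside cmd, here is a minimal sketch of the same round-robin idea in Python. It is not part of the batch solution above: the function name merge_sets, the rows_per_set parameter, and the shortened sample data are assumptions made for illustration only.

```python
from itertools import cycle


def merge_sets(lines, rows_per_set=3):
    """Concatenate line k of every set into one long row k.

    Empty lines are skipped, which is also what `for /F` does in the
    batch version, so the blank separators between sets are ignored.
    """
    rows = [[] for _ in range(rows_per_set)]
    targets = cycle(rows)  # row 1, row 2, row 3, row 1, ...
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue  # separator between sets
        next(targets).append(line)
    return ["".join(parts) for parts in rows]


# Two shortened, made-up sets in the same layout as the input above
sample = [
    "RGRG\n",
    "||.:\n",
    "RGKL\n",
    "\n",
    "IGRM\n",
    ":.||\n",
    "KLIG\n",
]
merged = merge_sets(sample)
print("\n".join(merged))
```

To run it on real data, pass an open file instead of the sample list, for example the input.txt used by the batch file. The sketch keeps every fragment in memory until the end, which should be fine for a file of a few tens of megabytes but is an assumption about the data size.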

Re: Pulling out sets of fixed multiple lines onto single set

Posted: 30 Jul 2017 20:16
by plasma33
@Aacini, thanks for your code. It works just as I wanted. I love how it shows the number of lines it has processed, and how it sends each line of a set to its own text file. On top of that, your code is much faster than mine: it processed a 17 MB file in under 5 minutes. Hats off and thanks again.

Plasma33

Re: Pulling out sets of fixed multiple lines onto single set

Posted: 30 Jul 2017 20:56
by Aacini
Oops! I just realized that the program should run slightly faster if modified this way, writing %%a directly instead of first copying it into a variable:

Code:

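@echo off
setlocal EnableDelayedExpansion

echo Processing file, please wait...
rem Capture a carriage return in CR so the progress counter redraws on one line
for /F %%a in ('copy /Z "%~F0" NUL') do set "CR=%%a"
rem Remove leftovers from a previous run
for /L %%i in (1,1,3) do del output%%i.txt 2> nul

rem Same rotation as before, but each line is written straight from the loop variable
set /A "out=0, lineNum=0"
< nul (for /F "delims=" %%a in (input.txt) do (
   set /A "out=out%%3+1, lineNum+=1"
   set /P "=%%a" >> output!out!.txt
   set /P "=Line: !lineNum!!CR!"
))
rem Join the three long rows into output.txt, one per line, and delete the temp files
(for /L %%i in (1,1,3) do type output%%i.txt & del output%%i.txt & echo/) > output.txt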

Antonio

Re: Pulling out sets of fixed multiple lines onto single set

Posted: 01 Aug 2017 20:46
by plasma33
Hello Aacini,

Yes, it does. Thanks for the modified code. You are a life saver!!

Plasma33