
Extract information from a website to txt

Posted: 03 Apr 2017 08:46
by Tami
Hello,

I have a big request, and I'm not good at batch scripting at all...
So, here's my problem:
The cmd window should start and ask for a link (Set /P variable=Paste Link:)
I want to extract the title, the episodes, etc., basically everything from the link. (The site would be https://www.anisearch.de/anime/9357,tokyo-ghoul )
How do I do that? I know it's possible with Python, but I don't understand Python at all either...

Thanks!

Re: Extract information from a website to txt

Posted: 03 Apr 2017 11:07
by ShadowThief
Is there an API you can call instead? I had to do something like this once with TheTVDB, and calling the API was easier than grabbing the entire page with curl and parsing it.

Re: Extract information from a website to txt

Posted: 03 Apr 2017 11:13
by aGerman
The website doesn't seem to have an API. To avoid installing any kind of 3rd party utilities, I'd suggest automating Internet Explorer using VBScript or JScript.
http://stackoverflow.com/questions/16629228/extract-text-between-html-tags

Steffen

Re: Extract information from a website to txt

Posted: 03 Apr 2017 13:16
by Tami
Could you tell me how to do that or do you have a tutorial anywhere? :/

Re: Extract information from a website to txt

Posted: 03 Apr 2017 14:53
by aGerman
That's quite a lot you have to learn. If you want to use it in Batch code, you have to learn Batch and how to write hybrid scripts. For the hybrid scripts you may use JScript, and thus you have to learn JScript. If you want to automate Internet Explorer, you have to learn how to create an InternetExplorer object and how to use its properties and methods. Last but not least, if you want to access elements of an HTML source text, you have to learn how to work with the HTML document object model using JScript.
That said, you can imagine that there isn't a single tutorial that teaches you everything at once.

You want to see an example? Here you are

Code:

@if (@a)==(@b) @end /* Batch part:

@echo off &setlocal
for /f "delims=" %%i in ('cscript //nologo //e:jscript "%~fs0"') do set "name=%%i"
echo %name%
pause
exit /b


JScript Part : */

var ie = null;
try {
  ie = new ActiveXObject('InternetExplorer.Application');
  ie.Navigate('https://www.anisearch.de/anime/9357,tokyo-ghoul');
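  // Wait until navigation has finished and the document is fully loaded (READYSTATE_COMPLETE = 4).
  while (ie.Busy || ie.ReadyState != 4) { WScript.Sleep(100); }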
  var name = ie.document.getElementById('content').getElementsByTagName('header')[0].getElementsByTagName('div')[0].getElementsByTagName('h1')[0].getElementsByTagName('a')[1].getElementsByTagName('span')[0].innerText;
  ie.Quit();
  ie = null;
  WScript.Echo(name);
}
catch(e) {
  if (ie != null) { ie.Quit(); }
  WScript.Echo('Error!');
}


Steffen

Re: Extract information from a website to txt

Posted: 03 Apr 2017 19:25
by igor_andreev
Trivial job for external tools.
curl(wget) host | grep regex | sed regex
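
The grep/sed step of such a pipeline boils down to a regular-expression match on the downloaded text. Here is a minimal sketch of that step in JavaScript, kept to the ES3 subset so it should also run as JScript under cscript. It assumes the page HTML is already in a string. `extractTitle` is a hypothetical helper name, and the sample HTML is invented for illustration, not copied from the site.

```javascript
// Hypothetical helper: return the text of the first <title> element, or null.
function extractTitle(html) {
  // [\s\S] matches any character including newlines; *? keeps the match short.
  var match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  if (!match) { return null; }
  // Collapse runs of whitespace to one space, then trim both ends.
  return match[1].replace(/\s+/g, ' ').replace(/^ | $/g, '');
}

// Invented sample input, for illustration only.
var sample = '<html><head><title>\n  Tokyo Ghoul (Anime)\n</title></head><body></body></html>';
var title = extractTitle(sample); // 'Tokyo Ghoul (Anime)'
```

A regex like this is fine for a single well-known element, but it is brittle: if the site changes its markup, the pattern has to be adjusted.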

Re: Extract information from a website to txt

Posted: 03 Apr 2017 21:47
by PaperTronics
The simple solution is: When the user gives that link, you add "view-source:" to the beginning of the link and then download that file using download.exe which can be found here: http://www.f2ko.de/en/cmd.php
After downloading the HTML code of the website you can extract the episodes, titles etc. whatever you want from that code.
If you don't have HTML knowledge, tell me and I'll extract the titles etc. for you.

Re: Extract information from a website to txt

Posted: 04 Apr 2017 05:49
by Tami
I asked them for an API and they will help me^^ If I get the API, would you help me make the tool? :)

Re: Extract information from a website to txt

Posted: 04 Apr 2017 08:33
by npocmaka_
You can try with winhttpjs.bat:



Code:

call winhttpjs.bat "https://www.anisearch.de/anime/9357,tokyo-ghoul" -saveto tokyo-ghoul.txt


It will not execute the page's JavaScript, though. Also, the downloaded file probably won't be well-formed XML, so you won't be able to process it with an XML tool.
aGerman's option is probably the best you have.

You can also take a look at PhantomJS if the pages are not compatible with IE.
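
If the saved file is not well-formed XML, a tolerant text-based extraction is one fallback. Below is a rough sketch in JavaScript (ES3 subset, so it should also work as JScript). `textOfFirst` is a hypothetical helper, the sample markup is invented, and only four common entities are decoded. It does not handle an element nested inside another element of the same tag name.

```javascript
// Hypothetical helper: plain text of the first <tag>...</tag> in html, or null.
// `tag` is assumed to contain only letters and digits (e.g. 'h1').
function textOfFirst(html, tag) {
  var re = new RegExp('<' + tag + '\\b[^>]*>([\\s\\S]*?)</' + tag + '>', 'i');
  var match = re.exec(html);
  if (!match) { return null; }
  return match[1]
    .replace(/<[^>]*>/g, '')   // drop nested tags, including unclosed ones like <br>
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, '&')    // decode &amp; last to avoid double decoding
    .replace(/\s+/g, ' ')      // collapse whitespace
    .replace(/^ | $/g, '');    // trim
}

// Invented sample: the unclosed <br> makes this invalid XML.
var fragment = '<h1 id="t"><span>Tokyo Ghoul</span><br> &amp; more</h1>';
var heading = textOfFirst(fragment, 'h1'); // 'Tokyo Ghoul & more'
```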

Re: Extract information from a website to txt

Posted: 04 Apr 2017 09:06
by ShadowThief
Tami wrote: I asked them for an API and they will help me^^ If I get the API, would you help me make the tool? :)

Honestly, the API will make it so easy that you'll likely be able to figure it out yourself, but sure.