Extract information from a website to txt


#1 Post by **Tami** » 03 Apr 2017 08:46

Hello,

I have a big request, and I'm not good at batch scripting at all...
So, here's my problem:
The cmd should start and ask for a link (Set /P variable=Paste Link:).
I want to extract the title, the episodes, etc., basically everything from the page behind the link. (The site is https://www.anisearch.de/anime/9357,tokyo-ghoul )
How do I do that? I know it's possible with Python, but I don't understand Python at all either...

Thanks!

#2 Post by **ShadowThief** » 03 Apr 2017 11:07

Is there an API you can call instead? I had to do something like this once with TheTVDB, and calling the API was easier than grabbing the entire page with curl and parsing it.

#3 Post by **aGerman** » 03 Apr 2017 11:13

The website doesn't seem to have an API. To avoid installing any third-party utilities, I'd suggest automating Internet Explorer using VBScript or JScript.
http://stackoverflow.com/questions/16629228/extract-text-between-html-tags

Steffen

#4 Post by **Tami** » 03 Apr 2017 13:16

Could you tell me how to do that, or do you have a tutorial somewhere? :/

#5 Post by **aGerman** » 03 Apr 2017 14:53

That's quite a lot you have to learn. If you want to use it in Batch code, you have to learn Batch and how to write hybrid scripts. For the hybrid scripts you may use JScript, so you have to learn JScript. If you want to automate Internet Explorer, you have to learn how to create an InternetExplorer object and how to use its properties and methods. Last but not least, if you want to access elements of the HTML source, you have to learn how to work with the HTML Document Object Model (DOM) using JScript.
That said, you can imagine that there isn't a single tutorial that teaches you everything at once.

Want to see an example? Here you are:

```bat
@if (@a)==(@b) @end /* Batch part:

@echo off &setlocal
for /f "delims=" %%i in ('cscript //nologo //e:jscript "%~fs0"') do set "name=%%i"
echo %name%
pause
exit /b


JScript Part : */

var ie = null;
try {
  ie = new ActiveXObject('InternetExplorer.Application');
  ie.Navigate('https://www.anisearch.de/anime/9357,tokyo-ghoul');
  // Wait until the page is completely loaded (ReadyState 4 = READYSTATE_COMPLETE);
  // checking Busy alone can return too early, right after Navigate().
  while (ie.Busy || ie.ReadyState != 4) { WScript.Sleep(100); }
  // Walk the DOM: #content > header > div > h1 > second <a> > first <span>
  var name = ie.document.getElementById('content')
    .getElementsByTagName('header')[0]
    .getElementsByTagName('div')[0]
    .getElementsByTagName('h1')[0]
    .getElementsByTagName('a')[1]
    .getElementsByTagName('span')[0].innerText;
  ie.Quit();
  ie = null;
  WScript.Echo(name);
}
catch(e) {
  // Close the hidden IE instance so it doesn't keep running in the background
  if (ie != null) { ie.Quit(); }
  WScript.Echo('Error!');
}
```

Steffen

#6 Post by **igor_andreev** » 03 Apr 2017 19:25

A trivial job for external tools:
curl (or wget) URL | grep regex | sed regex
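The same grep-then-sed idea can be reproduced without external tools, using the regex engine built into JScript instead of grep and sed. Below is a minimal sketch, assuming plain JavaScript kept to ES3-era syntax so it should run both in the JScript part of a hybrid script (via cscript) and in Node.js. `grepSed` is a hypothetical helper name: it keeps only the lines matching a filter (the grep step) and returns capture group 1 of a second regex (the sed step). The sample HTML is made up, not taken from the real page.

```javascript
// Print via WScript.Echo under cscript (JScript), console.log under Node.js.
function out(s) {
  if (typeof WScript !== 'undefined') { WScript.Echo(s); } else { console.log(s); }
}

// Hypothetical helper: roughly "grep FILTER | sed 's/...\(...\).../\1/'" in one function.
function grepSed(text, lineFilter, captureRe) {
  var lines = text.split(/\r?\n/);
  var results = [];
  for (var i = 0; i < lines.length; i++) {
    if (!lineFilter.test(lines[i])) { continue; } // grep: skip non-matching lines
    var m = captureRe.exec(lines[i]);             // sed: pull out capture group 1
    if (m) { results.push(m[1]); }
  }
  return results;
}

// Made-up sample standing in for the page that curl/wget would download.
var html = '<html>\n<h1 class="title">Example Title</h1>\n<p>other text</p>\n</html>';
out(grepSed(html, /<h1/, /<h1[^>]*>([^<]*)<\/h1>/).join('\r\n'));
```

Working line by line, like grep, only finds values whose opening and closing tags sit on the same line; real page markup may need a different filter and capture pattern.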

#7 Post by **PaperTronics** » 03 Apr 2017 21:47

The simple solution: when the user gives you the link, download the page's HTML source with download.exe, which can be found here: http://www.f2ko.de/en/cmd.php
After downloading the HTML code of the website, you can extract the episodes, titles, etc., whatever you want, from that code.
If you don't know HTML, tell me and I'll extract the titles etc. for you.
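Once the HTML has been saved, pulling several fields out into lines of a .txt file could look roughly like this. It is a sketch in plain ES3-style JavaScript (so it should also run in the JScript part of a hybrid script under cscript). The field names and regexes in `FIELDS`, and the helper name `extractFields`, are hypothetical placeholders, not the real markup of the aniSearch page; they have to be adapted after looking at the downloaded source.

```javascript
// Print via WScript.Echo under cscript (JScript), console.log under Node.js.
function out(s) {
  if (typeof WScript !== 'undefined') { WScript.Echo(s); } else { console.log(s); }
}

// HYPOTHETICAL field patterns: check the real page source and adapt them.
// Capture group 1 of each regex holds the value to report.
var FIELDS = [
  ['Title',    /<h1[^>]*>([\s\S]*?)<\/h1>/i],
  ['Episodes', /Episodes:\s*(\d+)/i]
];

// Build "Name: value" lines, CRLF-separated as usual for Windows .txt files.
function extractFields(html, fields) {
  var lines = [];
  for (var i = 0; i < fields.length; i++) {
    var m = fields[i][1].exec(html);   // no /g flag, so exec() always starts at 0
    lines.push(fields[i][0] + ': ' + (m ? m[1] : '(not found)'));
  }
  return lines.join('\r\n');
}

// Made-up sample markup standing in for the downloaded file.
var sample = '<h1 class="title">Example Title</h1>\n<li>Episodes: 12</li>';
out(extractFields(sample, FIELDS));
```

To get a text file, the Batch part of a hybrid script could redirect the output, e.g. `cscript //nologo //e:jscript "%~fs0" > info.txt` (`info.txt` is an arbitrary name).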

#8 Post by **Tami** » 04 Apr 2017 05:49

I asked them for an API and they will help me ^^ If I get the API, would you help me make the tool?

#9 Post by **npocmaka_** » 04 Apr 2017 08:33

You can try with winhttpjs.bat:

```bat
call winhttpjs.bat "https://www.anisearch.de/anime/9357,tokyo-ghoul" -saveto tokyo-ghoul.txt
```

It will not execute the page's JavaScript, though. Also, the downloaded HTML probably won't be well-formed XML, so you won't be able to process it with an XML tool.
aGerman's option is probably the best you have.
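Because the downloaded HTML usually isn't well-formed XML, one workaround is to cut out the interesting fragment and turn it into plain text by hand. A rough sketch follows, again in ES3-style JavaScript that should also run under cscript's JScript. `htmlToText` is a hypothetical name; it decodes only a handful of common entities, and regex-based tag stripping is a heuristic, not a real HTML parser (it also cannot recover content the page builds with JavaScript).

```javascript
// Print via WScript.Echo under cscript (JScript), console.log under Node.js.
function out(s) {
  if (typeof WScript !== 'undefined') { WScript.Echo(s); } else { console.log(s); }
}

// Rough HTML-fragment-to-text conversion (heuristic, not a real HTML parser).
function htmlToText(fragment) {
  var s = fragment
    .replace(/<script[\s\S]*?<\/script>/gi, '')  // drop inline scripts
    .replace(/<style[\s\S]*?<\/style>/gi, '')    // drop inline CSS
    .replace(/<br\s*\/?>/gi, '\n')                // keep explicit line breaks
    .replace(/<[^>]+>/g, '');                     // strip all remaining tags
  // Decode entities; &amp; goes last so "&amp;lt;" becomes "&lt;", not "<".
  s = s.replace(/&#(\d+);/g, function (all, code) { return String.fromCharCode(parseInt(code, 10)); })
       .replace(/&lt;/g, '<').replace(/&gt;/g, '>')
       .replace(/&quot;/g, '"').replace(/&nbsp;/g, ' ')
       .replace(/&amp;/g, '&');
  // Collapse runs of spaces/tabs, trim each line and drop empty lines.
  var lines = s.split(/\r?\n/), kept = [];
  for (var i = 0; i < lines.length; i++) {
    var t = lines[i].replace(/[ \t]+/g, ' ').replace(/^ | $/g, '');
    if (t) { kept.push(t); }
  }
  return kept.join('\r\n');
}

// Made-up sample fragment.
var sample = '<p>Tom &amp; Jerry</p>\n<p>  a&lt;b  </p><script>x()</script>';
out(htmlToText(sample));
```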

You could also take a look at PhantomJS if the pages are not compatible with IE.

#10 Post by **ShadowThief** » 04 Apr 2017 09:06

Tami wrote: I asked them for an API and they will help me^^ If i get the API, would you help me making the tool?

Honestly, the API will make it so easy that you'll likely be able to figure it out yourself, but sure.

DosTips.com
