Mirror of https://github.com/Garmelon/PFERD.git, synced 2026-04-12 07:25:04 +02:00
Add regex option to config and CLI parser
Parent: 88afe64a92
Commit: 13b8c3d9c6
3 changed files with 16 additions and 2 deletions
@@ -138,11 +138,16 @@ crawler simulate a slower, network-based crawler.
### The `kit-ipd` crawler
|
This crawler crawls a KIT-IPD page by URL. The root page can be crawled from
outside the KIT network, so you will be informed about any new or deleted
files, but downloading files requires you to be within the network. Adding a
delay between requests is likely a good idea.
|
- `target`: URL to a KIT-IPD page
- `link_regex`: A regex that is matched against the `href` part of links. If it
  matches, the given link is downloaded as a file. This is used to extract
  files from KIT-IPD pages. (Default: `^.*/[^/]*\.(?:pdf|zip|c|java)$`)
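
The default `link_regex` can be tried out with a minimal Python sketch. It assumes the crawler tests each link's raw `href` string with `re.search`; because the default pattern is anchored with `^` and `$`, that amounts to a whole-string match. The helper `is_file_link` and the sample hrefs are hypothetical, for illustration only.

```python
import re

# Default value of `link_regex`, copied from the option description.
DEFAULT_LINK_REGEX = r"^.*/[^/]*\.(?:pdf|zip|c|java)$"


def is_file_link(href: str, pattern: str = DEFAULT_LINK_REGEX) -> bool:
    """Return True if `href` would be treated as a downloadable file.

    Hypothetical helper. Assumption: the raw href is tested with re.search;
    the default pattern is anchored with ^ and $, so the whole href must match.
    """
    return re.search(pattern, href) is not None


# Hypothetical hrefs, chosen to show matches and non-matches.
for href in [
    "/lehre/ws2122/sheet01.pdf",
    "code/solution.zip",
    "slides.pdf",
    "/src/main.cpp",
    "/lehre/ws2122/index.html",
]:
    print(f"{href!r}: {is_file_link(href)}")
```

If the crawler does match the raw `href`, two consequences follow from the pattern itself: a bare relative link such as `slides.pdf` is not downloaded, because `^.*/` requires at least one `/` before the file name, and `main.cpp` is not downloaded, because the extension must end the string and be exactly one of `pdf`, `zip`, `c` or `java`.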
|
### The `kit-ilias-web` crawler
|
This crawler crawls the KIT ILIAS instance.
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue