25 commits

Author SHA1 Message Date
bce3dc384d Deduplicate path names in crawler
Also rename files so they follow the restrictions for Windows file names if
we're on Windows.
2021-05-25 12:11:15 +02:00
c687d4a51a Implement cookie sharing 2021-05-24 13:10:44 +02:00
I-Al-Istannen 3ab3581f84 Add timeout for HTTP connection 2021-05-23 23:41:05 +02:00
be4b1040f8 Document status and report options 2021-05-23 22:51:42 +02:00
I-Al-Istannen 6e9f8fd391 Add a keyring authenticator 2021-05-23 19:44:12 +02:00
I-Al-Istannen ecdedfa1cf Add no-videos flag to ILIAS crawler 2021-05-23 12:37:01 +02:00
0d10752b5a Configure explain log level via cli and config file 2021-05-19 17:50:10 +02:00
I-Al-Istannen db1219d4a9 Create a link file in ILIAS crawler
This allows us to crawl links and represent them in the file system.
Users can choose between an ILIAS-imitation (that optionally
auto-redirects) and a plain text variant.
2021-05-17 21:44:54 +02:00
I-Al-Istannen 467ea3a37e Document ILIAS-Crawler arguments in CONFIG.md 2021-05-16 13:26:58 +02:00
e1104f888d Add tfa authenticator 2021-05-15 18:27:16 +02:00
8c32da7f19 Let authenticators provide username and password separately 2021-05-15 18:27:03 +02:00
b70b62cef5 Make crawler sections start with "crawl:"
Also, use only the part of the section name after the "crawl:" as the crawler's
output directory. Now, the implementation matches the documentation again.
2021-05-15 17:24:37 +02:00
868f486922 Rename local crawler path to target 2021-05-15 17:12:25 +02:00
a6fdf05ee9 Allow variable whitespace in arrow rules 2021-05-15 15:25:05 +02:00
f897d7c2e1 Add name variants for all arrows 2021-05-15 15:25:05 +02:00
302b8c0c34 Fix errors loading local crawler config
Apparently getint and getfloat may return None even though this is not
mentioned in their type annotations.
2021-05-15 15:25:05 +02:00
acd674f0a0 Change limiter logic
Now download tasks are a subset of all tasks.
2021-05-15 15:25:05 +02:00
296a169dd3 Make limiter logic more complex
The limiter can now distinguish between crawl and download actions and has a
fancy slot system and delay logic.
2021-05-15 15:25:05 +02:00
1591cb9197 Add options to slow down local crawler
These options are meant to make the local crawler behave more like a
network-based crawler for purposes of testing and debugging other parts of the
code base.
2021-05-15 15:25:01 +02:00
961f40f9a1 Document simple authenticator 2021-05-13 19:55:04 +02:00
f9b2fd60e2 Document local crawler and auth 2021-05-09 01:33:47 +02:00
fde811ae5a Document on_conflict option 2021-05-05 12:24:35 +02:00
a8dcf941b9 Document possible redownload settings 2021-04-30 15:32:56 +02:00
e7a51decb0 Elaborate on transforms and implement changes 2021-04-29 20:24:18 +02:00
9ec19be113 Document config file format 2021-04-29 20:24:18 +02:00
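
Two of the config-related commits above (b70b62cef5 and 302b8c0c34) describe behavior that is easy to illustrate with the standard library's configparser. The following is only a sketch under assumptions, not the project's actual code: the helper names `crawler_output_dirs` and `optional_int`, the option name `max_tasks`, and the example sections are all hypothetical. It shows deriving an output directory from the part of a section name after "crawl:", and guarding against `getint` returning None when an option is missing.

```python
import configparser
from typing import Dict, Optional

CRAWL_PREFIX = "crawl:"  # section prefix named in commit b70b62cef5


def crawler_output_dirs(parser: configparser.ConfigParser) -> Dict[str, str]:
    """Map each "crawl:" section to the directory name taken from its suffix.

    Hypothetical helper: sections without the prefix are ignored.
    """
    return {
        name: name[len(CRAWL_PREFIX):]
        for name in parser.sections()
        if name.startswith(CRAWL_PREFIX)
    }


def optional_int(section: configparser.SectionProxy, key: str, default: int) -> int:
    """Read an integer option, falling back to a default when it is absent.

    SectionProxy.getint returns None for a missing option, because the
    proxy's fallback defaults to None.
    """
    value: Optional[int] = section.getint(key)
    return default if value is None else value


# Hypothetical config: "max_tasks" is an invented option name.
EXAMPLE = """
[crawl:lectures]
max_tasks = 4

[crawl:notes]
"""

parser = configparser.ConfigParser()
parser.read_string(EXAMPLE)

dirs = crawler_output_dirs(parser)
tasks = {name: optional_int(parser[name], "max_tasks", 1) for name in dirs}
```

Checking for None explicitly, rather than relying on truthiness, keeps a legitimate value of 0 from being replaced by the default.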