| Commit | Message | Date |
|---|---|---|
| 67f405a21e | Make data representation more flexible | 2022-10-22 15:52:07 +02:00 |
| 49b27715f0 | Print duplicate page map entries | 2022-10-22 15:40:45 +02:00 |
| 786b180b09 | Add imhex patterns | 2022-10-22 01:21:59 +02:00 |
| 853e09517f | Add unfinished path command | 2022-10-22 01:21:59 +02:00 |
| 345462915b | Change AdjacencyMap associated data | 2022-10-22 01:21:59 +02:00 |
| 5656f65b6c | Refactor ingestion | 2022-10-22 01:21:59 +02:00 |
| 3296f6d15a | Fix page link_idx computation | 2022-10-22 00:05:15 +02:00 |
| a9435e4f64 | Lowercase only first char when normalizing | 2022-10-22 00:01:04 +02:00 |
| 3a75089e5a | Make adjacency list extensible | 2022-10-21 20:39:53 +02:00 |
| 78aa27c019 | Add more checks | 2022-10-21 19:53:15 +02:00 |
| 23463522f0 | Don't print escape characters directly | 2022-10-04 21:47:43 +02:00 |
| f71092058b | Refactor export and add page length | 2022-10-03 22:14:58 +02:00 |
| d910047b48 | Perform consistency check when reexporting | 2022-10-03 18:11:51 +02:00 |
| e74eee89e6 | Add reexport command | 2022-10-03 18:07:30 +02:00 |
| 266f001d46 | Move commands to own module | 2022-10-03 18:04:24 +02:00 |
| 969fd01914 | Export links to custom binary format | 2022-10-03 18:01:15 +02:00 |
| 0e0789cc4d | Ingest new json format | 2022-10-03 17:36:08 +02:00 |
| 78a5aa5169 | Ignore all namespaces except 0 | 2022-10-03 16:26:08 +02:00 |
| ecdeb4086a | Make json format more consistent | 2022-10-03 15:00:23 +02:00 |
| 51096c99e1 | Make stored data more compact | 2022-10-01 01:49:01 +02:00 |
| f6bcb39c52 | Import data and check consistency | 2022-09-30 19:53:41 +02:00 |
| 1ea09a9be9 | Export data to CBOR | 2022-09-30 19:50:02 +02:00 |
| 499642cda9 | Convert first stage data into proper adjacency list | 2022-09-30 19:30:47 +02:00 |
| 11c4ff699f | Try out faster hash algorithm | 2022-09-30 19:02:57 +02:00 |
| 5e8589f73e | Load input into adjacency-list-like structure | 2022-09-30 18:53:56 +02:00 |
| b1f2af9577 | Use simd-json | 2022-09-30 18:07:50 +02:00 |
| c195fbb8d4 | Load sift data from stdin | 2022-09-30 18:07:31 +02:00 |
| 2e2045a74d | Set up brood subproject | 2022-09-30 16:01:09 +02:00 |
| d29e7257ba | Include namespace in info | 2022-09-30 02:19:29 +02:00 |
| 7cf5b013da | Handle revisions without text | 2022-09-30 01:34:06 +02:00 |
| 1db581725b | Handle redirects | 2022-09-30 01:18:51 +02:00 |
| 23c7df3c43 | Elaborate on sift | 2022-09-30 01:18:41 +02:00 |
| 73064ea2b0 | Extract links from articles | 2022-09-30 00:39:44 +02:00 |
| fe1db32c0e | Iterate through pages in dump | 2022-09-30 00:09:49 +02:00 |
| 76a4fbb6ad | Set up sift subproject | 2022-09-29 23:07:30 +02:00 |
| 5d5be23c79 | Create project | 2022-09-29 23:07:00 +02:00 |