Elaborate on sift

Joscha 2022-09-30 01:18:41 +02:00
parent 73064ea2b0
commit 23c7df3c43


@@ -7,7 +7,15 @@ Analyze wikipedia dumps obtained from
This project has two main parts, `sift` and `brood`.
`sift` is written in Python and sifts through the wikipedia dump, parsing and
analyzing individual articles and printing interesting data.
### Sift
`sift` is written in Python and sifts through a wikipedia article dump
(`*-pages-articles.xml.bz2`), parsing and analyzing individual articles and
printing interesting data.
It takes a (decompressed) XML article dump on stdin. For each article in the
dump, it prints a single-line JSON object to stdout.
### Brood
`brood` is written in Rust and analyzes the data obtained by `sift`.
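
The stdin-to-stdout contract described for `sift` above (decompressed XML dump in, one single-line JSON object per article out) could be sketched roughly as follows. This is not the actual `sift` code: the function names `iter_articles` and `write_json_lines` and the output fields `title` and `length` are assumptions made up for illustration, and the real tool presumably extracts different data. The only thing taken from the dump format is that each article lives in a `<page>` element with a `<title>` and a `<text>` somewhere below it.

```python
import json
import sys
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace prefix, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_articles(stream):
    """Yield one dict per <page> element while parsing the dump incrementally."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title = None
        text = ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        # Hypothetical output fields; the real sift output is not specified here.
        yield {"title": title, "length": len(text)}
        # Drop the page's children so finished articles do not pile up in memory.
        elem.clear()


def write_json_lines(stream_in, stream_out):
    """Write one single-line JSON object per article to stream_out."""
    for article in iter_articles(stream_in):
        stream_out.write(json.dumps(article) + "\n")
```

Ending a script with `write_json_lines(sys.stdin.buffer, sys.stdout)` would let it sit at the end of a pipe fed by a decompressor such as `bzcat`. Because `json.dumps` escapes newlines inside strings, each object stays on a single line, which keeps the output easy to consume line by line.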