Elaborate on sift
parent 73064ea2b0
commit 23c7df3c43
1 changed file with 10 additions and 2 deletions
12
README.md
@@ -7,7 +7,15 @@ Analyze wikipedia dumps obtained from
 
 This project has two main parts, `sift` and `brood`.
 
-`sift` is written in Python and sifts through the wikipedia dump, parsing and
-analyzing individual articles and printing interesting data.
+### Sift
+
+`sift` is written in Python and sifts through a wikipedia article dump
+(`*-pages-articles.xml.bz2`), parsing and analyzing individual articles and
+printing interesting data.
+
+It takes a (decompressed) XML article dump on stdin. For each article in the
+dump, it prints a single-line JSON object to stdout.
+
+### Brood
 
 `brood` is written in Rust and analyzes the data obtained by `sift`.
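
The stdin-to-stdout contract described in the new README text (XML dump in, one JSON object per article out) could look roughly like the sketch below. This is only an illustration under assumptions, not `sift`'s actual code: the function names `local`, `iter_articles` and `main`, and the output fields `title` and `length`, are hypothetical, since the commit does not show which data `sift` extracts.

```python
import json
import sys
import xml.etree.ElementTree as ET


def local(tag):
    """Strip the XML namespace, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_articles(stream):
    """Yield one dict per <page> element, parsing the dump incrementally."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        # Hypothetical fields; the real output schema is not shown in the README.
        yield {"title": title, "length": len(text)}
        # Drop the processed subtree so memory use stays bounded per article.
        elem.clear()


def main(stdin=None, stdout=None):
    """Read a decompressed XML dump and write one JSON object per line."""
    if stdin is None:
        stdin = sys.stdin.buffer
    if stdout is None:
        stdout = sys.stdout
    for article in iter_articles(stdin):
        stdout.write(json.dumps(article) + "\n")
```

Calling `main()` with no arguments wires the sketch to stdin and stdout. Emitting one object per line keeps the output easy to consume line by line, which is presumably how a downstream tool such as `brood` would read it.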