Elaborate on sift
Parent: 73064ea2b0
Commit: 23c7df3c43
1 changed file with 10 additions and 2 deletions
README.md
@@ -7,7 +7,15 @@ Analyze wikipedia dumps obtained from
 
 This project has two main parts, `sift` and `brood`.
 
-`sift` is written in Python and sifts through the wikipedia dump, parsing and
-analyzing individual articles and printing interesting data.
+### Sift
+
+`sift` is written in Python and sifts through a wikipedia article dump
+(`*-pages-articles.xml.bz2`), parsing and analyzing individual articles and
+printing interesting data.
+
+It takes a (decompressed) XML article dump on stdin. For each article in the
+dump, it prints a single-line JSON object to stdout.
+
+### Brood
 
 `brood` is written in Rust and analyzes the data obtained by `sift`.
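The stdin-to-JSON-lines behaviour that this change documents for `sift` can be sketched as follows. This is not the project's actual `sift` code: the README does not say how it parses the dump or which fields it emits, so the streaming parser (`xml.etree.ElementTree.iterparse`), the function names `local` and `iter_articles`, and the output fields `title`, `ns`, `id` and `text_length` are all hypothetical choices made for illustration.

```python
import json
import sys
import xml.etree.ElementTree as ET
from typing import IO, Iterator


def local(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_articles(stream: IO[bytes]) -> Iterator[dict]:
    """Yield one dict per <page> element of a MediaWiki XML export (hypothetical fields)."""
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event != "end" or local(elem.tag) != "page":
            continue
        record = {"title": None, "ns": None, "id": None, "text_length": 0}
        for child in elem:  # direct children only, so <revision><id> is not mistaken for the page id
            name = local(child.tag)
            if name == "title":
                record["title"] = child.text
            elif name == "ns":
                record["ns"] = int(child.text)
            elif name == "id":
                record["id"] = int(child.text)
            elif name == "revision":
                for sub in child:
                    if local(sub.tag) == "text":
                        record["text_length"] = len(sub.text or "")
        yield record
        root.clear()  # drop already-processed pages so memory stays bounded on large dumps


def main() -> None:
    for record in iter_articles(sys.stdin.buffer):
        # One single-line JSON object per article on stdout.
        print(json.dumps(record, ensure_ascii=False))


if __name__ == "__main__":
    main()
```

Since the dump is read from stdin in decompressed form, it could be fed with something like `bzcat <wiki>-pages-articles.xml.bz2 | python sift_sketch.py`, where `sift_sketch.py` is a hypothetical file name for the code above. Streaming with `iterparse` and clearing the root after each page is what keeps memory use roughly constant, which matters because full article dumps are far too large to load as a single tree.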