diff --git a/README.md b/README.md
index b9f6281..aead6f9 100644
--- a/README.md
+++ b/README.md
@@ -7,7 +7,15 @@ Analyze wikipedia dumps obtained from
 
 This project has two main parts, `sift` and `brood`.
 
-`sift` is written in Python and sifts through the wikipedia dump, parsing and
-analyzing individual articles and printing interesting data.
+### Sift
+
+`sift` is written in Python and sifts through a wikipedia article dump
+(`*-pages-articles.xml.bz2`), parsing and analyzing individual articles and
+printing interesting data.
+
+It takes a (decompressed) XML article dump on stdin. For each article in the
+dump, it prints a single-line JSON object to stdout.
+
+### Brood
 
 `brood` is written in Rust and analyzes the data obtained by `sift`.