From 23c7df3c4356aa99e3fd8140ff21b82789d9fb73 Mon Sep 17 00:00:00 2001
From: Joscha
Date: Fri, 30 Sep 2022 01:18:41 +0200
Subject: [PATCH] Elaborate on sift

---
 README.md | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index b9f6281..aead6f9 100644
--- a/README.md
+++ b/README.md
@@ -7,7 +7,15 @@ Analyze wikipedia dumps obtained from
 
 This project has two main parts, `sift` and `brood`.
 
-`sift` is written in Python and sifts through the wikipedia dump, parsing and
-analyzing individual articles and printing interesting data.
+### Sift
+
+`sift` is written in Python and sifts through a wikipedia article dump
+(`*-pages-articles.xml.bz2`), parsing and analyzing individual articles and
+printing interesting data.
+
+It takes a (decompressed) XML article dump on stdin. For each article in the
+dump, it prints a single-line JSON object to stdout.
+
+### Brood
 
 `brood` is written in Rust and analyzes the data obtained by `sift`.