Wikilyze

Analyze Wikipedia dumps obtained from https://dumps.wikimedia.org/backup-index.html.

Structure

This project has two main parts: sift and brood.

sift is written in Python. It sifts through the Wikipedia dump, parsing and analyzing individual articles and printing interesting data.
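As a rough illustration of the iteration step, here is a minimal sketch of streaming pages out of a dump. It is not sift's actual code: the function names `iter_pages` and `print_titles` are hypothetical, and it assumes the dump is a bzip2-compressed MediaWiki XML export (such as a `pages-articles` file) whose `<page>` elements contain a `<title>` and a revision `<text>`.

```python
import bz2
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace from a tag, e.g. '{ns}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(stream):
    """Yield (title, text) for each <page> element read from a binary stream.

    Hypothetical helper: assumes the MediaWiki XML export layout.
    """
    # iterparse reads the file incrementally instead of loading it whole.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        yield title, text
        # Release the page's children once the caller has consumed it.
        elem.clear()


def print_titles(path):
    """Print each page title and the length of its wikitext (assumes a .bz2 dump)."""
    with bz2.open(path, "rb") as stream:
        for title, text in iter_pages(stream):
            print(title, len(text))
```

Streaming with `iterparse` matters here because full dumps are far too large to parse into memory in one go; matching on the local tag name avoids hard-coding the export schema version in the namespace.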

brood is written in Rust and analyzes the data obtained by sift.