Well, today my web site service provider informed me that my database size is more than the quota (200MB). I could not believe at first since wiki is mostly composed of texts and I never wrote that much yet. When I checked the table size (use the sum of Data_length column of SHOW TABLE STATUS), wow it was 5 GB and all discussion pages at my Wiki got heavily infected with spam messages like selling drugs, Viagra, etc. you know.
MediaWiki, which is the back-end software for Wiki (as used in Wikipedia), supports MySQL or PostgreSQL database. My web server uses MySQL 4.1 which is the most stable version. I know a little bit on its database layout but could not simply figure out the way to remove such spam messages. Google only returned useless pages out of my interests. So here I record what I did to remove spams in the Wiki Discussion (or called Talk) pages.
Since there are various spamming methods to Wiki sites, let me confine the environment to:
Your working environment should be some MySQL console or such Web-based database admin tool (phpMyAdmin for my case). Let me remind you first that MediaWiki table definitions are different by version. So any table definitions (DDL) in this page should be modified to comply with that of your MediaWiki table definitions.
Based on the MediaWiki database layout, the first table you will work on would be text table. And its table size might be the biggest one if your Wiki has been attacked by spammers.
If your text table is too large, then just removing records may fail because web service providers set the time limit on PHP code execution or MySQL code execution. So I could not delete discussion texts but had to go through like below. In below SQL statements, *w_* is the prefix of table names. So modify table names like yours. If you are interested in the role of each table, refer MediaWiki database tables.
# create the backup text table that will store not-infected pages CREATE TABLE w_text_backup ( `old_id` int(10) unsigned NOT NULL, `old_text` mediumblob NOT NULL, `old_flags` tinyblob NOT NULL, PRIMARY KEY (`old_id`) ) ENGINE=InnoDB; # From existing text table, copy all page texts except discussion page. # Namespace 1 means the discussion page. INSERT INTO w_text_backup SELECT w_text.* FROM w_page INNER JOIN w_revision ON page_id = rev_page INNER JOIN w_text ON rev_text_id = old_id WHERE page_namespace <> 1; # Now replace text table with the new one DROP TABLE w_text; RENAME TABLE w_text_backup TO w_text;
BE AWARE!! Above statements will remove ALL discussion pages. If you have any discussion pages to save, then you should manually export that pages before doing this. For that, visit the Wiki special pages to export pages in XML. After you finish all jobs below, then import that XML page into your wiki back.
Since we deleted many texts, related records at other tables should be removed either.
DELETE w_revision FROM w_page INNER JOIN w_revision ON page_id = rev_page WHERE page_namespace=1; DELETE w_recentchanges FROM w_recentchanges WHERE rc_namespace = 1; DELETE w_externallinks FROM w_page INNER JOIN w_externallinks ON el_from = page_id WHERE page_namespace = 1; DELETE w_page FROM w_page WHERE page_namespace=1;
Now my wiki size is reduced back to tens of megabytes from 5GB.