wiki-connect/parsewiki

A library that helps parse wikitext template data

2.0 2025-06-29 21:46 UTC

This package is auto-updated.

Last update: 2025-06-29 21:47:54 UTC


README

A powerful PHP library for parsing MediaWiki-style content from raw wiki text.

📚 Overview

This library allows you to extract:

  • Templates (single, multiple, nested)
  • Internal wiki links
  • External links
  • Citations (references)
  • Categories (with or without display text)

Perfect for handling wiki-formatted text in PHP projects.

🗂️ Project Structure

  • ParserTemplates: Parses multiple templates.
  • ParserTemplate: Parses a single template.
  • ParserInternalLinks: Parses internal wiki links.
  • ParserExternalLinks: Parses external links.
  • ParserCitations: Parses citations and references.
  • ParserCategories: Parses categories from wiki text.
  • DataModel classes:
    • Template
    • InternalLink
    • ExternalLink
    • Citation
  • tests/: Contains PHPUnit test files:
    • ParserTemplatesTest
    • ParserTemplateTest
    • ParserInternalLinksTest
    • ParserExternalLinksTest
    • ParserCitationsTest
    • ParserCategoriesTest

🚀 Features

  • ✅ Parse single and multiple templates.
  • ✅ Support nested templates.
  • ✅ Handle named and unnamed template parameters.
  • ✅ Extract internal links with or without display text.
  • ✅ Extract external links with or without labels.
  • ✅ Parse citations, including attributes and special characters.
  • ✅ Parse categories, including custom namespaces, surrounding whitespace, and special characters.
  • ✅ Full PHPUnit test coverage.

⚙️ Requirements

  • PHP 8.0 or higher
  • PHPUnit 9 or higher (needed only to run the test suite)

💻 Installation

composer require wiki-connect/parsewiki

Load Composer's autoloader (vendor/autoload.php) in your entry script so that classes in the WikiConnect\ParseWiki namespace resolve through PSR-4 autoloading.
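
A minimal bootstrap sketch, assuming the package was installed by Composer into the default vendor/ directory next to the script. The is_file() guard and the $available variable are illustrative only and are not part of the library.

```php
<?php
declare(strict_types=1);

// Assumption: Composer's default layout, with vendor/ next to this script.
$autoload = __DIR__ . '/vendor/autoload.php';

if (is_file($autoload)) {
    // Registers Composer's autoloader so package classes load on demand.
    require $autoload;
}

// class_exists() triggers autoloading, so this checks that the namespace resolves.
$available = class_exists('WikiConnect\\ParseWiki\\ParserTemplates');

echo $available
    ? 'parsewiki is loadable'
    : 'parsewiki not found; run composer require first', PHP_EOL;
```

If the check reports that the class is missing, the usual cause is that the script does not include vendor/autoload.php or runs from a different working tree than the one where the package was installed.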

🧪 Running Tests

vendor/bin/phpunit tests

Test Coverage:

  • Templates: Single, multiple, nested, named/unnamed parameters.
  • Internal Links: Simple, with display text, special characters.
  • External Links: With/without labels, multiple links, whitespace handling.
  • Citations: With/without attributes, special characters.
  • Categories: Simple, with display text, custom namespaces, whitespace, special characters.

✨ Example Usage

Parsing Templates

use WikiConnect\ParseWiki\ParserTemplates;

$text = '{{Infobox person|name=John Doe|birth_date=1990-01-01}}';

$parser = new ParserTemplates($text);
$templates = $parser->getTemplates();

foreach ($templates as $template) {
    // The newline keeps the name from running into the print_r() output.
    echo $template->getName() . PHP_EOL;
    print_r($template->getParameters());
}
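
To make the difference between named and unnamed parameters concrete, here is a self-contained sketch that splits a flat (non-nested) template by hand. It is not the library's implementation: sketchParseFlatTemplate is a hypothetical helper, and it relies only on the MediaWiki convention that unnamed parameters are numbered from 1 in order of appearance.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper for illustration; not part of wiki-connect/parsewiki.
 * Splits a flat (non-nested) template into its name and parameters.
 */
function sketchParseFlatTemplate(string $wikitext): array
{
    // Strip the surrounding "{{" and "}}".
    $body = substr(trim($wikitext), 2, -2);
    $parts = explode('|', $body);
    $name = trim(array_shift($parts));

    $params = [];
    $position = 1; // Unnamed parameters are numbered from 1.
    foreach ($parts as $part) {
        if (str_contains($part, '=')) {
            // Named parameter: split on the first "=" only.
            [$key, $value] = explode('=', $part, 2);
            $params[trim($key)] = trim($value);
        } else {
            // Unnamed parameter: keyed by its position.
            $params[$position++] = trim($part);
        }
    }

    return ['name' => $name, 'parameters' => $params];
}

$result = sketchParseFlatTemplate(
    '{{Infobox person|name=John Doe|1990|birth_place=Cairo}}'
);
print_r($result);
```

Splitting on every pipe breaks as soon as a parameter value contains another template or a piped link, which is why nested templates need a real parser rather than this kind of one-pass split.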

Parsing Internal Links

use WikiConnect\ParseWiki\ParserInternalLinks;

$text = 'See [[Main Page|the main page]] and [[Help]].';

$parser = new ParserInternalLinks($text);
$links = $parser->getTargets();

foreach ($links as $link) {
    echo 'Target: ' . $link->getTarget() . PHP_EOL;
    echo 'Text: ' . ($link->getText() ?? $link->getTarget()) . PHP_EOL;
}
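
For readers unfamiliar with the syntax being parsed, the sketch below shows roughly what extracting [[target|text]] links involves, using a single regular expression. It is an assumption-laden illustration, not the code behind ParserInternalLinks, and sketchInternalLinks is a hypothetical name.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper for illustration; not the library's implementation.
 * Returns one ['target' => string, 'text' => ?string] entry per [[...]] link.
 */
function sketchInternalLinks(string $wikitext): array
{
    // Group 1: target (no brackets or pipe). Group 2: optional text after "|".
    $pattern = '/\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]/';
    preg_match_all($pattern, $wikitext, $matches, PREG_SET_ORDER);

    $links = [];
    foreach ($matches as $match) {
        $links[] = [
            'target' => trim($match[1]),
            // Group 2 is absent from the match when the link has no pipe.
            'text' => isset($match[2]) ? trim($match[2]) : null,
        ];
    }

    return $links;
}

$links = sketchInternalLinks('See [[Main Page|the main page]] and [[Help]].');
foreach ($links as $link) {
    echo $link['target'], ' => ', $link['text'] ?? $link['target'], PHP_EOL;
}
```

This pattern also matches category links such as [[Category:Science]], so a fuller parser would have to decide whether to filter those out.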

Parsing External Links

use WikiConnect\ParseWiki\ParserExternalLinks;

$text = 'Visit [https://example.com Example Site] and [https://nolabel.com].';

$parser = new ParserExternalLinks($text);
$links = $parser->getLinks();

foreach ($links as $link) {
    echo 'URL: ' . $link->getLink() . PHP_EOL;
    echo 'Label: ' . ($link->getText() ?: 'No label') . PHP_EOL;
}
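
The bracketed external-link syntax can be illustrated the same way. This is a minimal sketch assuming only http and https URLs and a label separated from the URL by whitespace. sketchExternalLinks is a hypothetical helper and makes no claim about how ParserExternalLinks works internally.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper for illustration; not the library's implementation.
 * Returns one ['url' => string, 'label' => string] entry per [url label] link.
 */
function sketchExternalLinks(string $wikitext): array
{
    // Group 1: URL up to the first space or "]". Group 2: optional label.
    $pattern = '~\[(https?://[^\s\]]+)(?:\s+([^\]]+))?\]~';
    preg_match_all($pattern, $wikitext, $matches, PREG_SET_ORDER);

    $links = [];
    foreach ($matches as $match) {
        $links[] = [
            'url' => $match[1],
            // An unlabeled link yields an empty string here.
            'label' => isset($match[2]) ? trim($match[2]) : '',
        ];
    }

    return $links;
}

$links = sketchExternalLinks(
    'Visit [https://example.com Example Site] and [https://nolabel.com].'
);
foreach ($links as $link) {
    echo $link['url'], ' => ', $link['label'] ?: 'No label', PHP_EOL;
}
```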

Parsing Citations

use WikiConnect\ParseWiki\ParserCitations;

$text = 'Some text with a citation.<ref name="source">This is a citation</ref>';

$parser = new ParserCitations($text);
$citations = $parser->getCitations();

foreach ($citations as $citation) {
    echo 'Content: ' . $citation->getContent() . PHP_EOL;
    echo 'Attributes: ' . $citation->getAttributes() . PHP_EOL;
}
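
As a rough picture of what a citation parser has to separate, the sketch below pulls the attribute string and the body out of paired ref tags. It is a simplified stand-in under stated assumptions (no nested ref tags, self-closing tags skipped), and sketchCitations is a hypothetical name rather than anything provided by the library.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper for illustration; not the library's implementation.
 * Returns one ['attributes' => string, 'content' => string] entry per <ref>.
 */
function sketchCitations(string $wikitext): array
{
    // \b stops "<references" from matching; (?<!/)> skips self-closing
    // tags such as <ref name="x" />; the s flag lets content span lines.
    $pattern = '~<ref\b([^>]*)(?<!/)>(.*?)</ref>~si';
    preg_match_all($pattern, $wikitext, $matches, PREG_SET_ORDER);

    $citations = [];
    foreach ($matches as $match) {
        $citations[] = [
            'attributes' => trim($match[1]),
            'content' => trim($match[2]),
        ];
    }

    return $citations;
}

$citations = sketchCitations(
    'Fact one.<ref name="source">This is a citation</ref> Fact two.<ref>Plain note</ref>'
);
foreach ($citations as $citation) {
    echo $citation['content'], ' [', $citation['attributes'], ']', PHP_EOL;
}
```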

Parsing Categories

use WikiConnect\ParseWiki\ParserCategories;

$text = 'Some text [[Category:Science]] and [[Category:Math|Displayed]].';

$parser = new ParserCategories($text);
$categories = $parser->getCategories();

foreach ($categories as $category) {
    echo 'Category: ' . $category . PHP_EOL;
}
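
Custom namespaces and surrounding whitespace are easier to picture with a small example. The sketch below is one possible approach, not the library's: sketchCategories and its $namespace parameter are hypothetical, and the text after the pipe is simply discarded.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper for illustration; not the library's implementation.
 * Returns the category names found under the given namespace prefix.
 */
function sketchCategories(string $wikitext, string $namespace = 'Category'): array
{
    // Escape the namespace so it is matched literally inside the pattern.
    $ns = preg_quote($namespace, '~');

    // Group 1: category name, with surrounding whitespace left outside it.
    // Anything after an optional "|" is matched but not captured.
    $pattern = '~\[\[\s*' . $ns . '\s*:\s*([^\]|]+?)\s*(?:\|[^\]]*)?\]\]~i';
    preg_match_all($pattern, $wikitext, $matches);

    return $matches[1];
}

$categories = sketchCategories(
    'Some text [[Category:Science]] and [[Category:Math|Displayed]].'
);
$custom = sketchCategories('Text [[Kategorie: Physik ]] more text.', 'Kategorie');

print_r($categories);
print_r($custom);
```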

🙌 Author

Developed with ❤️ by Gerges.