ubertech-za/html-to-asciidoc

HTML to AsciiDoc converter for PHP, inspired by and architecturally based on The PHP League's html-to-markdown

0.1.0 2025-09-16 20:26 UTC

This package is auto-updated.

Last update: 2025-09-16 20:46:55 UTC


README

โš ๏ธ BETA SOFTWARE NOTICE This package is currently in beta and is being prepared for testing in upcoming projects. Please expect possible breaking changes in future releases. We do not recommend using this package in production environments without thorough testing.

This package converts HTML to AsciiDoc markup, inspired by and architecturally based on The PHP League's html-to-markdown package. Like its inspiration, this library uses a DOM-based approach with pluggable converters to ensure reliable and extensible HTML parsing and conversion. We extend our gratitude to The PHP League for their excellent architectural foundation. If the League would like to incorporate this AsciiDoc functionality into their ecosystem, they are most welcome to do so.


Features

  • ๐Ÿ—๏ธ DOM-based parsing - Uses PHP's DOMDocument for reliable HTML parsing
  • ๐Ÿ”ง Extensible architecture - Add custom converters for specific elements
  • ๐Ÿ“ Complete AsciiDoc support - Headers, emphasis, links, images, lists, tables, code blocks
  • โš™๏ธ Configurable - Customize conversion behavior with options
  • ๐Ÿงช Well tested - Comprehensive test suite with 100+ assertions
  • ๐Ÿš€ Laravel integration - Optional service provider for Laravel projects
  • ๐Ÿ“ฆ Framework agnostic - Works with any PHP project

Installation

Install the package via Composer:

composer require ubertech-za/html-to-asciidoc

Framework Independence: This package works standalone with any PHP project. Laravel integration is completely optional and only activated when Laravel is detected in your project.

Quick Start

use UbertechZa\HtmlToAsciidoc\HtmlConverter;

$converter = new HtmlConverter();

$html = '
<h1>Welcome to AsciiDoc</h1>
<p>This is a <strong>bold</strong> statement with a <a href="https://asciidoc.org">link</a>.</p>
<ul>
    <li>First item</li>
    <li>Second item</li>
</ul>
';

$asciidoc = $converter->convert($html);
echo $asciidoc;

Output:

= Welcome to AsciiDoc

This is a *bold* statement with a https://asciidoc.org[link].

* First item
* Second item

Usage

Basic Usage

The simplest way to convert HTML to AsciiDoc:

use UbertechZa\HtmlToAsciidoc\HtmlConverter;

$converter = new HtmlConverter();
$asciidoc = $converter->convert('<h1>Hello World</h1>');
// Result: = Hello World

Using the Laravel Wrapper

If you're using the package in a Laravel project, you can use the provided converter wrapper:

use UbertechZa\HtmlToAsciidoc\HtmlToAsciiDocConverter;

$converter = new HtmlToAsciiDocConverter();
$asciidoc = $converter->convert('<p>Hello <em>world</em>!</p>');
// Result: Hello _world_!

Using the Laravel Facade

For even simpler usage in Laravel, you can use the provided facade:

use UbertechZa\HtmlToAsciidoc\Facades\HtmlToAsciidoc;

$asciidoc = HtmlToAsciidoc::convert('<h1>Easy Conversion</h1>');
// Result: = Easy Conversion

// Chain methods for configuration
$asciidoc = HtmlToAsciidoc::setOptions(['hard_break' => true])
                          ->convert('<p>Line 1<br>Line 2</p>');

Configuration Options

Customize the conversion behavior with configuration options:

use UbertechZa\HtmlToAsciidoc\HtmlConverter;

$converter = new HtmlConverter([
    'header_style' => 'atx',        // Use = for headers (default)
    'hard_break' => false,          // Use + for line breaks (default)
    'list_item_style' => '*',       // Use * for unordered lists (default) 
    'remove_nodes' => 'script style', // Remove these tags completely
    'strip_tags' => false,          // Don't strip unknown tags (default)
    'suppress_errors' => true,      // Suppress HTML parsing errors (default)
]);

$asciidoc = $converter->convert($html);

Method Chaining

You can chain methods for fluent configuration:

$asciidoc = (new HtmlConverter())
    ->setOptions(['hard_break' => true])
    ->convert('<p>Line 1<br>Line 2</p>');
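The chaining above works because `setOptions()` hands back the converter itself. The following is a minimal, self-contained sketch of that fluent pattern, not the package's code: the class `FluentOptionsSketch`, its `getOption()` method, and the use of `array_replace()` for merging are assumptions made for illustration.

```php
<?php
// Hypothetical sketch of a fluent options holder; not part of the package.
final class FluentOptionsSketch
{
    /** @var array<string, mixed> Defaults taken from the option list in this README. */
    private array $options = [
        'hard_break'      => false,
        'list_item_style' => '*',
    ];

    public function setOptions(array $options): self
    {
        // Later values override defaults; untouched keys keep their defaults.
        $this->options = array_replace($this->options, $options);

        return $this; // Returning $this is what makes chaining possible.
    }

    /** @return mixed */
    public function getOption(string $key, $default = null)
    {
        return $this->options[$key] ?? $default;
    }
}

$sketch = (new FluentOptionsSketch())->setOptions(['hard_break' => true]);
var_dump($sketch->getOption('hard_break'));      // overridden value
var_dump($sketch->getOption('list_item_style')); // default preserved
```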

Supported HTML Elements

Headers

<h1>Level 1</h1>
<h2>Level 2</h2>
<h3>Level 3</h3>

Converts to:

= Level 1

== Level 2

=== Level 3

Text Formatting

<strong>Bold text</strong>
<b>Also bold</b>
<em>Italic text</em>
<i>Also italic</i>
<code>Inline code</code>

Converts to:

*Bold text*
*Also bold*
_Italic text_
_Also italic_
`Inline code`

Links and Images

<a href="https://example.com">Link text</a>
<img src="/images/logo.png" alt="Company Logo">

Converts to:

https://example.com[Link text]
image::/images/logo.png[Company Logo]

Lists

<ul>
    <li>Unordered item 1</li>
    <li>Unordered item 2</li>
</ul>

<ol>
    <li>Ordered item 1</li>
    <li>Ordered item 2</li>
</ol>

Converts to:

* Unordered item 1
* Unordered item 2

. Ordered item 1
. Ordered item 2

Code Blocks

<pre>
function hello() {
    return "Hello World";
}
</pre>

Converts to:

----
function hello() {
    return "Hello World";
}
----

Blockquotes

<blockquote>
    <p>This is a quoted paragraph.</p>
</blockquote>

Converts to:

> This is a quoted paragraph.

Tables

<table>
    <tr>
        <th>Header 1</th>
        <th>Header 2</th>
    </tr>
    <tr>
        <td>Cell 1</td>
        <td>Cell 2</td>
    </tr>
</table>

Converts to:

|===
|Header 1 |Header 2 
|Cell 1 |Cell 2 
|===
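To make the mapping from rows and cells to `|===` syntax concrete, here is a rough sketch that uses only PHP's DOM extension. It is not the package's table converter: the function name `tableToAsciidocSketch` is hypothetical, and unlike the output shown above it does not emit a trailing space after each row.

```php
<?php
// Hypothetical sketch, not the package's table converter.
function tableToAsciidocSketch(string $html): string
{
    $previous = libxml_use_internal_errors(true); // Keep parser warnings quiet.
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $lines = ['|==='];
    foreach ($doc->getElementsByTagName('tr') as $row) {
        $cells = [];
        foreach ($row->childNodes as $cell) {
            // Skip whitespace text nodes; keep only header and data cells.
            if ($cell instanceof DOMElement && in_array($cell->nodeName, ['th', 'td'], true)) {
                $cells[] = '|' . trim($cell->textContent);
            }
        }
        $lines[] = implode(' ', $cells);
    }
    $lines[] = '|===';

    return implode("\n", $lines);
}

echo tableToAsciidocSketch(
    '<table><tr><th>Header 1</th><th>Header 2</th></tr>'
    . '<tr><td>Cell 1</td><td>Cell 2</td></tr></table>'
), PHP_EOL;
```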

Other Elements

  • <hr> → ''' (horizontal rule)
  • <br> → + (line break) or \n with hard_break option
  • <p> → Paragraphs with proper spacing
  • <div> → Content blocks with spacing

Configuration Reference

| Option | Type | Default | Description |
|---|---|---|---|
| header_style | string | 'atx' | Header style (always uses = syntax for AsciiDoc) |
| hard_break | boolean | false | Use \n instead of + for line breaks |
| list_item_style | string | '*' | Character for unordered list items |
| remove_nodes | string | '' | Space-separated list of HTML tags to remove |
| strip_tags | boolean | false | Strip unknown HTML tags |
| suppress_errors | boolean | true | Suppress HTML parsing errors |
| preserve_comments | boolean | false | Preserve HTML comments |
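As an illustration of what a space-separated `remove_nodes` value implies, the sketch below strips the listed tags from a DOM tree before any conversion would run. It relies only on PHP's DOM extension; the function name `removeNodesSketch` is hypothetical and the package may implement this differently.

```php
<?php
// Hypothetical sketch of applying a space-separated tag list to a DOM tree.
function removeNodesSketch(DOMDocument $doc, string $removeNodes): void
{
    $tags = preg_split('/\s+/', trim($removeNodes), -1, PREG_SPLIT_NO_EMPTY);

    foreach ($tags as $tag) {
        // Copy to an array first: the DOMNodeList is live and shrinks as nodes are removed.
        foreach (iterator_to_array($doc->getElementsByTagName($tag)) as $node) {
            $node->parentNode->removeChild($node);
        }
    }
}

$doc = new DOMDocument();
$doc->loadHTML('<p>Keep</p><script>alert(1)</script><style>p{}</style>');
removeNodesSketch($doc, 'script style');

echo trim($doc->documentElement->textContent), PHP_EOL;
```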

Advanced Usage

Custom Environment

For more control, you can create a custom environment:

use UbertechZa\HtmlToAsciidoc\Environment;
use UbertechZa\HtmlToAsciidoc\HtmlConverter;

$environment = Environment::createDefaultEnvironment([
    'custom_option' => 'value'
]);

$converter = new HtmlConverter($environment);

Adding Custom Converters

Create custom converters for specific HTML elements:

use UbertechZa\HtmlToAsciidoc\Converter\ConverterInterface;
use UbertechZa\HtmlToAsciidoc\ElementInterface;
use UbertechZa\HtmlToAsciidoc\Environment;
use UbertechZa\HtmlToAsciidoc\HtmlConverter;

class CustomConverter implements ConverterInterface
{
    public function convert(ElementInterface $element): string
    {
        return 'custom output for ' . $element->getValue();
    }

    public function getSupportedTags(): array
    {
        return ['custom-tag'];
    }
}

$environment = Environment::createDefaultEnvironment();
$environment->addConverter(new CustomConverter());

$converter = new HtmlConverter($environment);

Configuration-Aware Converters

Converters can access configuration options:

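use UbertechZa\HtmlToAsciidoc\Configuration;
use UbertechZa\HtmlToAsciidoc\ConfigurationAwareInterface;
use UbertechZa\HtmlToAsciidoc\Converter\ConverterInterface;
use UbertechZa\HtmlToAsciidoc\ElementInterface;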

class ConfigurableConverter implements ConverterInterface, ConfigurationAwareInterface
{
    private $config;

    public function setConfig(Configuration $config): void
    {
        $this->config = $config;
    }

    public function convert(ElementInterface $element): string
    {
        $option = $this->config->getOption('my_option', 'default');
        // Use option in conversion logic
        return $element->getValue();
    }

    public function getSupportedTags(): array
    {
        return ['configurable-tag'];
    }
}

Laravel Integration

Note: Laravel integration is completely optional. The core package works independently without any Laravel dependencies. Laravel features are only available when illuminate/support is installed.

Automatic Registration

If you're using Laravel, the service provider is automatically registered when Laravel is detected. You can publish the configuration:

php artisan vendor:publish --provider="UbertechZa\HtmlToAsciidoc\HtmlToAsciiDocServiceProvider"

This will publish a config/html-to-asciidoc.php configuration file where you can set default conversion options.

Facade Registration

Add the facade to your config/app.php file if you want to use it globally:

'aliases' => [
    // Other aliases...
    'HtmlToAsciidoc' => UbertechZa\HtmlToAsciidoc\Facades\HtmlToAsciidoc::class,
],

Then use it anywhere in your application:

$asciidoc = HtmlToAsciidoc::convert('<h1>Global Usage</h1>');

Dependency Injection

Use dependency injection in your controllers:

use UbertechZa\HtmlToAsciidoc\HtmlToAsciiDocConverter;

class DocumentController extends Controller
{
    public function convert(HtmlToAsciiDocConverter $converter)
    {
        $html = request('html');
        $asciidoc = $converter->convert($html);
        
        return response()->json(['asciidoc' => $asciidoc]);
    }
}

Facade Usage in Controllers

Using the facade in controllers:

use UbertechZa\HtmlToAsciidoc\Facades\HtmlToAsciidoc;

class DocumentController extends Controller
{
    public function convert()
    {
        $html = request('html');
        
        $asciidoc = HtmlToAsciidoc::setOptions([
                'hard_break' => true,
                'list_item_style' => '*'
            ])
            ->convert($html);
        
        return response()->json(['asciidoc' => $asciidoc]);
    }
}

Blade Templates

Use the facade in Blade templates:

@php
    $html = '<h1>Dynamic Content</h1><p>From your CMS</p>';
    $asciidoc = HtmlToAsciidoc::convert($html);
@endphp

<pre>{{ $asciidoc }}</pre>

Available Container Bindings

The service provider registers the following bindings:

  • UbertechZa\HtmlToAsciidoc\HtmlToAsciiDocConverter::class - Main converter instance
  • UbertechZa\HtmlToAsciidoc\HtmlConverter::class - Core HTML converter
  • 'html-to-asciidoc' - Facade accessor binding

All bindings are registered as singletons for optimal performance.

Error Handling

The converter handles malformed HTML gracefully:

try {
    $converter = new HtmlConverter();
    $result = $converter->convert('<p>Malformed HTML<div>nested improperly</p></div>');
    // Will still produce reasonable output
} catch (InvalidArgumentException $e) {
    // Only thrown for completely invalid HTML structure
    echo "Invalid HTML provided";
}
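For background on why malformed markup rarely throws, the sketch below shows how PHP's DOMDocument recovers from bad nesting while libxml collects its complaints instead of raising warnings. Whether the `suppress_errors` option is implemented this way is an assumption; `parseLenientlySketch` is a hypothetical name.

```php
<?php
// Sketch using only PHP's DOM and libxml functions; the package's internals may differ.
function parseLenientlySketch(string $html): array
{
    $previous = libxml_use_internal_errors(true); // Collect errors instead of emitting warnings.

    $doc = new DOMDocument();
    $doc->loadHTML($html);

    $errors = libxml_get_errors(); // Array of LibXMLError objects.
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // Restore the caller's setting.

    return [$doc, $errors];
}

[$doc, $errors] = parseLenientlySketch('<p>Malformed HTML<div>nested improperly</p></div>');

// The parser repairs the tree, so the text is still reachable.
echo trim($doc->documentElement->textContent), PHP_EOL;
echo count($errors), " parser message(s) collected", PHP_EOL;
```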

Performance Considerations

  • DOM Parsing: Uses PHP's native DOMDocument for reliable parsing
  • Memory Usage: Processes HTML in memory; consider chunking for very large documents
  • Caching: Consider caching converted output for frequently accessed content
  • Configuration: Create converter instances once and reuse them
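The caching suggestion above could look roughly like the following in-memory memoizer. It is a sketch, not part of the package: the class `CachingConverterSketch` is hypothetical, and a `strip_tags()` closure stands in for a real converter so the example runs on its own.

```php
<?php
// Hypothetical helper: memoizes any string-to-string converter callable,
// keyed by a hash of the input HTML.
final class CachingConverterSketch
{
    /** @var array<string, string> */
    private array $cache = [];

    /** @var callable(string): string */
    private $convert;

    public int $misses = 0; // Counts how often the wrapped converter actually ran.

    public function __construct(callable $convert)
    {
        $this->convert = $convert;
    }

    public function convert(string $html): string
    {
        $key = hash('sha256', $html);

        if (!isset($this->cache[$key])) {
            $this->misses++;
            $this->cache[$key] = ($this->convert)($html);
        }

        return $this->cache[$key];
    }
}

// Stand-in converter so the sketch is self-contained.
$cached = new CachingConverterSketch(fn (string $html): string => strip_tags($html));
$cached->convert('<p>Hello</p>');
$cached->convert('<p>Hello</p>'); // Served from the cache.
echo $cached->misses, PHP_EOL;
```

In a real project the closure would presumably delegate to a reused `HtmlConverter` instance, and a persistent store would replace the array if results must survive across requests.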

Testing

Run the test suite:

composer test

Run tests with coverage:

composer test-coverage

Run static analysis:

composer analyse

Contributing

Contributions are welcome! Please see our contributing guidelines for details.

  1. Fork the repository
  2. Create a feature branch
  3. Write tests for your changes
  4. Ensure all tests pass
  5. Submit a pull request

Changelog

Please see CHANGELOG.md for details on recent changes.

Security

If you discover any security-related issues, please email security@ubertech.co.za instead of using the issue tracker.

Credits

This package is architecturally based on and inspired by The PHP League's html-to-markdown package. We extend our gratitude to The PHP League for their excellent architectural foundation.

Architectural Attribution

This package borrows and adapts the following architectural patterns from thephpleague/html-to-markdown:

  • DOM-based HTML parsing approach
  • Pluggable converter system for HTML elements
  • Environment and configuration management
  • Element interface and converter interface patterns

The implementation has been adapted specifically for AsciiDoc output format while maintaining the robust parsing and extensibility patterns established by The PHP League.
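The patterns listed above can be pictured with a deliberately small sketch: a DOM tree is walked depth-first, and a registry maps tag names to converter callables. Everything here is illustrative; the names `renderNodeSketch` and `convertSketch` and the closure-based registry are assumptions, and the package itself uses converter classes and an environment rather than this simplified form.

```php
<?php
// Illustrative sketch only: these names are hypothetical and are not the package's API.

// Registry: tag name => callable(DOMElement $element, string $inner): string
$converters = [
    'h1'     => fn (DOMElement $el, string $inner): string => "= {$inner}\n\n",
    'p'      => fn (DOMElement $el, string $inner): string => "{$inner}\n\n",
    'strong' => fn (DOMElement $el, string $inner): string => "*{$inner}*",
    'em'     => fn (DOMElement $el, string $inner): string => "_{$inner}_",
    'a'      => fn (DOMElement $el, string $inner): string => $el->getAttribute('href') . "[{$inner}]",
];

function renderNodeSketch(DOMNode $node, array $converters): string
{
    if ($node instanceof DOMText) {
        return (string) $node->nodeValue;
    }

    // Convert children first so each converter receives already-converted inner text.
    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= renderNodeSketch($child, $converters);
    }

    if ($node instanceof DOMElement && isset($converters[$node->nodeName])) {
        return $converters[$node->nodeName]($node, $inner);
    }

    return $inner; // Tags without a converter fall through to their children.
}

function convertSketch(string $html, array $converters): string
{
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return trim(renderNodeSketch($doc->documentElement, $converters));
}

echo convertSketch(
    '<h1>Title</h1><p>A <strong>bold</strong> <a href="https://example.com">link</a>.</p>',
    $converters
), PHP_EOL;
```

Adding support for another tag in this sketch means adding one more entry to the registry, which is the extensibility property the converter pattern is meant to provide.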

License

This package is open-sourced software licensed under the MIT license.

Made by Uber Technologies cc