ubertech-za / html-to-asciidoc
HTML to AsciiDoc converter for PHP, inspired by and architecturally based on The PHP League's html-to-markdown
Requires
- php: ^8.2
Requires (Dev)
- laravel/pint: ^1.0
- orchestra/testbench: ^9.0
- pestphp/pest: ^3.0
- phpstan/phpstan: ^1.11
Suggests
- illuminate/support: Required for Laravel integration (^10.0|^11.0)
This package is auto-updated.
Last update: 2025-09-16 20:46:55 UTC
⚠️ BETA SOFTWARE NOTICE: This package is currently in beta and is being prepared for testing in upcoming projects. Please expect possible breaking changes in future releases. We do not recommend using this package in production environments without thorough testing.
This package converts HTML to AsciiDoc markup, inspired by and architecturally based on The PHP League's html-to-markdown package. Like its inspiration, this library uses a DOM-based approach with pluggable converters to ensure reliable and extensible HTML parsing and conversion. We extend our gratitude to The PHP League for their excellent architectural foundation. If the League would like to incorporate this AsciiDoc functionality into their ecosystem, they are most welcome to do so.
Features
- DOM-based parsing - Uses PHP's DOMDocument for reliable HTML parsing
- Extensible architecture - Add custom converters for specific elements
- Complete AsciiDoc support - Headers, emphasis, links, images, lists, tables, code blocks
- Configurable - Customize conversion behavior with options
- Well tested - Comprehensive test suite with 100+ assertions
- Laravel integration - Optional service provider for Laravel projects
- Framework agnostic - Works with any PHP project
Installation
Install the package via Composer:
composer require ubertech-za/html-to-asciidoc
Framework Independence: This package works standalone with any PHP project. Laravel integration is completely optional and only activated when Laravel is detected in your project.
Quick Start
    use UbertechZa\HtmlToAsciidoc\HtmlConverter;

    $converter = new HtmlConverter();

    $html = '
        <h1>Welcome to AsciiDoc</h1>
        <p>This is a <strong>bold</strong> statement with a <a href="https://asciidoc.org">link</a>.</p>
        <ul>
            <li>First item</li>
            <li>Second item</li>
        </ul>
    ';

    $asciidoc = $converter->convert($html);
    echo $asciidoc;

Output:

    = Welcome to AsciiDoc

    This is a *bold* statement with a https://asciidoc.org[link].

    * First item
    * Second item
Usage
Basic Usage
The simplest way to convert HTML to AsciiDoc:
    use UbertechZa\HtmlToAsciidoc\HtmlConverter;

    $converter = new HtmlConverter();
    $asciidoc = $converter->convert('<h1>Hello World</h1>');
    // Result: = Hello World
Using the Laravel Wrapper
If you're using the package in a Laravel project, you can use the provided converter wrapper:
    use UbertechZa\HtmlToAsciidoc\HtmlToAsciiDocConverter;

    $converter = new HtmlToAsciiDocConverter();
    $asciidoc = $converter->convert('<p>Hello <em>world</em>!</p>');
    // Result: Hello _world_!
Using the Laravel Facade
For even simpler usage in Laravel, you can use the provided facade:
    use UbertechZa\HtmlToAsciidoc\Facades\HtmlToAsciidoc;

    $asciidoc = HtmlToAsciidoc::convert('<h1>Easy Conversion</h1>');
    // Result: = Easy Conversion

    // Chain methods for configuration
    $asciidoc = HtmlToAsciidoc::setOptions(['hard_break' => true])
        ->convert('<p>Line 1<br>Line 2</p>');
Configuration Options
Customize the conversion behavior with configuration options:
    use UbertechZa\HtmlToAsciidoc\HtmlConverter;

    $converter = new HtmlConverter([
        'header_style'    => 'atx',          // Use = for headers (default)
        'hard_break'      => false,          // Use + for line breaks (default)
        'list_item_style' => '*',            // Use * for unordered lists (default)
        'remove_nodes'    => 'script style', // Remove these tags completely
        'strip_tags'      => false,          // Don't strip unknown tags (default)
        'suppress_errors' => true,           // Suppress HTML parsing errors (default)
    ]);

    $asciidoc = $converter->convert($html);
Method Chaining
You can chain methods for fluent configuration:
    $asciidoc = (new HtmlConverter())
        ->setOptions(['hard_break' => true])
        ->convert('<p>Line 1<br>Line 2</p>');
Supported HTML Elements
Headers
    <h1>Level 1</h1>
    <h2>Level 2</h2>
    <h3>Level 3</h3>

Converts to:

    = Level 1
    == Level 2
    === Level 3
Text Formatting
    <strong>Bold text</strong>
    <b>Also bold</b>
    <em>Italic text</em>
    <i>Also italic</i>
    <code>Inline code</code>

Converts to:

    *Bold text*
    *Also bold*
    _Italic text_
    _Also italic_
    `Inline code`
Links and Images
    <a href="https://example.com">Link text</a>
    <img src="/images/logo.png" alt="Company Logo">

Converts to:

    https://example.com[Link text]
    image::/images/logo.png[Company Logo]
Lists
    <ul>
        <li>Unordered item 1</li>
        <li>Unordered item 2</li>
    </ul>
    <ol>
        <li>Ordered item 1</li>
        <li>Ordered item 2</li>
    </ol>

Converts to:

    * Unordered item 1
    * Unordered item 2

    . Ordered item 1
    . Ordered item 2
Code Blocks
    <pre>
    function hello() {
        return "Hello World";
    }
    </pre>

Converts to:

    ----
    function hello() {
        return "Hello World";
    }
    ----
Blockquotes
    <blockquote>
        <p>This is a quoted paragraph.</p>
    </blockquote>

Converts to:

    > This is a quoted paragraph.
Tables
    <table>
        <tr>
            <th>Header 1</th>
            <th>Header 2</th>
        </tr>
        <tr>
            <td>Cell 1</td>
            <td>Cell 2</td>
        </tr>
    </table>

Converts to:

    |===
    |Header 1 |Header 2
    |Cell 1 |Cell 2
    |===
Other Elements
- <hr> → ''' (horizontal rule)
- <br> → + (line break), or \n with the hard_break option
- <p> → Paragraphs with proper spacing
- <div> → Content blocks with spacing
Configuration Reference
Option | Type | Default | Description |
---|---|---|---|
header_style | string | 'atx' | Header style (always uses = syntax for AsciiDoc) |
hard_break | boolean | false | Use \n instead of + for line breaks |
list_item_style | string | '*' | Character for unordered list items |
remove_nodes | string | '' | Space-separated list of HTML tags to remove |
strip_tags | boolean | false | Strip unknown HTML tags |
suppress_errors | boolean | true | Suppress HTML parsing errors |
preserve_comments | boolean | false | Preserve HTML comments |
Advanced Usage
Custom Environment
For more control, you can create a custom environment:
    use UbertechZa\HtmlToAsciidoc\Environment;
    use UbertechZa\HtmlToAsciidoc\HtmlConverter;

    $environment = Environment::createDefaultEnvironment([
        'custom_option' => 'value'
    ]);

    $converter = new HtmlConverter($environment);
Adding Custom Converters
Create custom converters for specific HTML elements:
    use UbertechZa\HtmlToAsciidoc\Converter\ConverterInterface;
    use UbertechZa\HtmlToAsciidoc\ElementInterface;
    use UbertechZa\HtmlToAsciidoc\Environment;
    use UbertechZa\HtmlToAsciidoc\HtmlConverter;

    class CustomConverter implements ConverterInterface
    {
        // Build the AsciiDoc output for a matched element
        public function convert(ElementInterface $element): string
        {
            return 'custom output for ' . $element->getValue();
        }

        // Tags this converter is responsible for
        public function getSupportedTags(): array
        {
            return ['custom-tag'];
        }
    }

    $environment = Environment::createDefaultEnvironment();
    $environment->addConverter(new CustomConverter());

    $converter = new HtmlConverter($environment);
Configuration-Aware Converters
Converters can access configuration options:
    use UbertechZa\HtmlToAsciidoc\Configuration;
    use UbertechZa\HtmlToAsciidoc\ConfigurationAwareInterface;
    use UbertechZa\HtmlToAsciidoc\Converter\ConverterInterface;
    use UbertechZa\HtmlToAsciidoc\ElementInterface;

    class ConfigurableConverter implements ConverterInterface, ConfigurationAwareInterface
    {
        private Configuration $config;

        // Receives the converter's configuration
        public function setConfig(Configuration $config): void
        {
            $this->config = $config;
        }

        public function convert(ElementInterface $element): string
        {
            $option = $this->config->getOption('my_option', 'default');
            // Use option in conversion logic
            return $element->getValue();
        }

        public function getSupportedTags(): array
        {
            return ['configurable-tag'];
        }
    }
Laravel Integration
Note: Laravel integration is completely optional. The core package works independently without any Laravel dependencies. Laravel features are only available when illuminate/support is installed.
Automatic Registration
If you're using Laravel, the service provider is automatically registered when Laravel is detected. You can publish the configuration:
php artisan vendor:publish --provider="UbertechZa\HtmlToAsciidoc\HtmlToAsciiDocServiceProvider"
This will publish a config/html-to-asciidoc.php configuration file where you can set default conversion options.
Facade Registration
Add the facade to your config/app.php file if you want to use it globally:

    'aliases' => [
        // Other aliases...
        'HtmlToAsciidoc' => UbertechZa\HtmlToAsciidoc\Facades\HtmlToAsciidoc::class,
    ],
Then use it anywhere in your application:
$asciidoc = HtmlToAsciidoc::convert('<h1>Global Usage</h1>');
Dependency Injection
Use dependency injection in your controllers:
    use UbertechZa\HtmlToAsciidoc\HtmlToAsciiDocConverter;

    class DocumentController extends Controller
    {
        public function convert(HtmlToAsciiDocConverter $converter)
        {
            $html = request('html');
            $asciidoc = $converter->convert($html);

            return response()->json(['asciidoc' => $asciidoc]);
        }
    }
Facade Usage in Controllers
Using the facade in controllers:
    use UbertechZa\HtmlToAsciidoc\Facades\HtmlToAsciidoc;

    class DocumentController extends Controller
    {
        public function convert()
        {
            $html = request('html');
            $asciidoc = HtmlToAsciidoc::setOptions([
                'hard_break' => true,
                'list_item_style' => '*'
            ])
                ->convert($html);

            return response()->json(['asciidoc' => $asciidoc]);
        }
    }
Blade Templates
Use the facade in Blade templates:
    @php
        $html = '<h1>Dynamic Content</h1><p>From your CMS</p>';
        $asciidoc = HtmlToAsciidoc::convert($html);
    @endphp

    <pre>{{ $asciidoc }}</pre>
Available Container Bindings
The service provider registers the following bindings:
- UbertechZa\HtmlToAsciidoc\HtmlToAsciiDocConverter::class - Main converter instance
- UbertechZa\HtmlToAsciidoc\HtmlConverter::class - Core HTML converter
- 'html-to-asciidoc' - Facade accessor binding
All bindings are registered as singletons for optimal performance.
Error Handling
The converter handles malformed HTML gracefully:
    try {
        $converter = new HtmlConverter();
        $result = $converter->convert('<p>Malformed HTML<div>nested improperly</p></div>');
        // Will still produce reasonable output
    } catch (InvalidArgumentException $e) {
        // Only thrown for completely invalid HTML structure
        echo "Invalid HTML provided";
    }
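The tolerance for malformed markup described above comes largely from DOMDocument itself. The following is a minimal sketch using only PHP's standard DOM and libxml functions, not this package's code. `parseLenient` is a hypothetical helper name, and the idea that the `suppress_errors` option works along these lines is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: parse possibly malformed HTML and return the DOM
 * together with whatever parser messages libxml collected.
 *
 * @return array{0: DOMDocument, 1: array}
 */
function parseLenient(string $html): array
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $errors = libxml_get_errors();
    libxml_clear_errors();

    // Restore the caller's error-handling mode.
    libxml_use_internal_errors($previous);

    return [$dom, $errors];
}

[$dom, $errors] = parseLenient('<p>Malformed HTML<div>nested improperly</p></div>');

// The text survives even though the nesting was invalid.
echo $dom->getElementsByTagName('body')->item(0)->textContent, "\n";
echo count($errors), " parser message(s) collected\n";
```

Collecting the messages rather than discarding them leaves room to log them when debugging unexpected output.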
Performance Considerations
- DOM Parsing: Uses PHP's native DOMDocument for reliable parsing
- Memory Usage: Processes HTML in memory; consider chunking for very large documents
- Caching: Consider caching converted output for frequently accessed content
- Configuration: Create converter instances once and reuse them
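The caching and reuse advice above can be sketched as a small in-memory wrapper. This is an illustrative sketch, not part of the package: `CachingConverter` is a hypothetical class, and the closure passed to it is a stand-in for a call such as `$converter->convert($html)` on a single reused converter instance.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: wraps any string-to-string converter callable
 * with an in-memory cache keyed by a hash of the input HTML.
 */
final class CachingConverter
{
    /** @var array<string, string> */
    private array $cache = [];

    /** @var callable(string): string */
    private $convert;

    public function __construct(callable $convert)
    {
        $this->convert = $convert;
    }

    public function convert(string $html): string
    {
        $key = hash('sha256', $html);

        // Only invoke the wrapped converter on a cache miss.
        if (!array_key_exists($key, $this->cache)) {
            $this->cache[$key] = ($this->convert)($html);
        }

        return $this->cache[$key];
    }
}

// Stand-in converter that counts how often it is actually invoked.
$calls = 0;
$cached = new CachingConverter(function (string $html) use (&$calls): string {
    $calls++;
    return strtoupper($html);
});

$cached->convert('<h1>Hello</h1>');
$cached->convert('<h1>Hello</h1>');

echo $calls, "\n"; // prints 1: the second call was served from the cache
```

For content shared across requests, the same idea would apply with a persistent cache store in place of the array.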
Testing
Run the test suite:
composer test
Run tests with coverage:
composer test-coverage
Run static analysis:
composer analyse
Contributing
Contributions are welcome! Please see our contributing guidelines for details.
- Fork the repository
- Create a feature branch
- Write tests for your changes
- Ensure all tests pass
- Submit a pull request
Changelog
Please see CHANGELOG.md for details on recent changes.
Security
If you discover any security-related issues, please email security@ubertech.co.za instead of using the issue tracker.
Credits
This package is architecturally based on and inspired by The PHP League's html-to-markdown package. We extend our gratitude to The PHP League for their excellent architectural foundation. Like its inspiration, this library uses a DOM-based approach with pluggable converters to ensure reliable and extensible HTML parsing and conversion.
- Original Architecture: The PHP League's html-to-markdown (MIT License)
- AsciiDoc Implementation: Uber Technologies cc
- Contributors: All contributors
Architectural Attribution
This package borrows and adapts the following architectural patterns from thephpleague/html-to-markdown:
- DOM-based HTML parsing approach
- Pluggable converter system for HTML elements
- Environment and configuration management
- Element interface and converter interface patterns
The implementation has been adapted specifically for AsciiDoc output format while maintaining the robust parsing and extensibility patterns established by The PHP League.
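To make the pattern concrete, here is a minimal sketch of DOM-based parsing combined with pluggable per-tag converters, written against PHP's standard DOM extension only. It is not the package's implementation: the `$converters` registry and the `convertNode` function are hypothetical names, and the real library uses converter classes and an environment rather than closures.

```php
<?php
declare(strict_types=1);

// Hypothetical registry: tag name => callable(string $inner, DOMElement $el): string
$converters = [
    'h1'     => fn (string $inner): string => "= {$inner}\n\n",
    'p'      => fn (string $inner): string => "{$inner}\n\n",
    'strong' => fn (string $inner): string => "*{$inner}*",
    'em'     => fn (string $inner): string => "_{$inner}_",
    'a'      => fn (string $inner, DOMElement $el): string
        => $el->getAttribute('href') . "[{$inner}]",
];

/**
 * Convert children first, then hand the result to the converter
 * registered for this element's tag, if there is one.
 */
function convertNode(DOMNode $node, array $converters): string
{
    if ($node instanceof DOMText) {
        return $node->textContent;
    }
    if (!$node instanceof DOMElement) {
        return ''; // comments, processing instructions, etc. are dropped
    }

    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= convertNode($child, $converters);
    }

    $convert = $converters[$node->tagName] ?? null;

    // Elements without a registered converter pass their content through.
    return $convert !== null ? $convert($inner, $node) : $inner;
}

$dom = new DOMDocument();
$dom->loadHTML(
    '<h1>Title</h1><p>Some <strong>bold</strong> text and a '
    . '<a href="https://asciidoc.org">link</a>.</p>'
);

$body = $dom->getElementsByTagName('body')->item(0);
echo rtrim(convertNode($body, $converters)), "\n";
// Prints:
// = Title
//
// Some *bold* text and a https://asciidoc.org[link].
```

Supporting another element in this sketch only means adding one entry to the registry, which is the extensibility benefit the pluggable converter pattern aims for.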
License
This package is open-sourced software licensed under the MIT license.
Related Packages
- thephpleague/html-to-markdown - Convert HTML to Markdown
- ubertech-za/tiptap-to-asciidoc - Convert Tiptap JSON to AsciiDoc
Made by Uber Technologies cc