yosina-lib / yosina
Japanese text transliteration library for PHP
Requires
- php: >=8.2
Requires (Dev)
- friendsofphp/php-cs-fixer: ^3.13
- phpdocumentor/phpdocumentor: ^3.4
- phpstan/phpstan: ^1.10
- phpunit/phpunit: ^10.0
Last update: 2025-08-19 18:27:18 UTC
A PHP port of the Yosina Japanese text transliteration library.
Overview
Yosina is a library for Japanese text transliteration that provides various text normalization and conversion features commonly needed when processing Japanese text.
Usage
<?php

use Yosina\TransliterationRecipe;
use Yosina\Yosina;

// Create a recipe with multiple transformations
$recipe = new TransliterationRecipe(
    replaceSpaces: true,
    replaceCircledOrSquaredCharacters: true,
    replaceCombinedCharacters: true,
    kanjiOldNew: true,
    toFullwidth: true
);

$transliterator = Yosina::makeTransliterator($recipe);

// Use it with various special characters
$input = "①②③　ⒶⒷⒸ　㍿㍑㌠㋿"; // circled numbers, letters, ideographic space, combined characters
$result = $transliterator($input);
echo $result; // "(1)(2)(3) (A)(B)(C) 株式会社リットルサンチーム令和"

// Convert old kanji to new
$oldKanji = "舊字體";
$result = $transliterator($oldKanji);
echo $result; // "旧字体"

// Convert half-width katakana to full-width
$halfWidth = "ﾃｽﾄﾓｼﾞﾚﾂ";
$result = $transliterator($halfWidth);
echo $result; // "テストモジレツ"
Advanced Configuration
<?php

use Yosina\Yosina;

// Chain multiple transliterators; each entry is [name, options]
$transliterator = Yosina::makeTransliterator([
    ['kanji-old-new', []],
    ['spaces', []],
    ['radicals', []],
]);

$inputText = "舊字體の變換"; // sample input
$result = $transliterator($inputText);
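The chaining idea can be sketched without the library: each step is a string-to-string callable, and the chain feeds the output of one step into the next. The helper `chainSteps` and the two toy steps below are hypothetical and written only for illustration; they are not the library's `ChainedTransliterator`, and the three-entry kanji table stands in for the real data.

```php
<?php
declare(strict_types=1);

/**
 * Compose several string-to-string steps into one callable.
 * Hypothetical helper for illustration; not part of the Yosina API.
 */
function chainSteps(callable ...$steps): callable
{
    return function (string $text) use ($steps): string {
        foreach ($steps as $step) {
            $text = $step($text); // output of one step feeds the next
        }
        return $text;
    };
}

// Step 1: a three-entry stand-in for an old-to-new kanji table.
$oldToNew = fn (string $s): string => strtr($s, ['舊' => '旧', '體' => '体', '變' => '変']);

// Step 2: replace every Unicode space separator (category Zs) with U+0020.
$normalizeSpaces = fn (string $s): string => preg_replace('/\p{Zs}/u', ' ', $s) ?? $s;

$pipeline = chainSteps($oldToNew, $normalizeSpaces);
echo $pipeline("舊字體\u{3000}變換"), PHP_EOL; // 旧字体 変換
```

Order matters in a chain like this: a later step only ever sees what the earlier steps produced.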
Requirements
- PHP 8.2 or higher
Installation
composer require yosina-lib/yosina
Available Transliterators
1. Circled or Squared (circled-or-squared)
Converts circled or squared characters to their plain equivalents.
- Options: templates (custom rendering), includeEmojis (include emoji characters)
- Example: ①②③ → (1)(2)(3), ㊙㊗ → (秘)(祝)
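As a rough sketch of what template-based rendering could look like, the function below rewrites the circled digits ① to ⑳ through a `sprintf` template. The function name and its `$template` parameter are assumptions made for this illustration; the actual format accepted by the `templates` option is not described here and may differ.

```php
<?php
declare(strict_types=1);

/**
 * Replace circled digits U+2460..U+2473 using a sprintf template.
 * $template is a hypothetical stand-in for the `templates` option.
 */
function uncircleDigits(string $text, string $template = '(%s)'): string
{
    return preg_replace_callback(
        '/[\x{2460}-\x{2473}]/u',
        function (array $m) use ($template): string {
            $c = $m[0]; // always a 3-byte UTF-8 sequence in this range
            // Decode the code point by hand so no extension is required.
            $codePoint = ((ord($c[0]) & 0x0F) << 12)
                | ((ord($c[1]) & 0x3F) << 6)
                | (ord($c[2]) & 0x3F);
            // U+2460 is the circled digit one, so the offset gives the number.
            return sprintf($template, (string) ($codePoint - 0x2460 + 1));
        },
        $text
    ) ?? $text;
}

echo uncircleDigits('①②③'), PHP_EOL;         // (1)(2)(3)
echo uncircleDigits('①②③', '[%s]'), PHP_EOL; // [1][2][3]
```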
2. Combined (combined)
Expands combined characters into their individual character sequences.
- Example: ㍻ (Heisei era) → 平成, ㈱ → (株)
3. Hiragana-Katakana Composition (hira-kata-composition)
Combines decomposed hiragana and katakana into their composed equivalents.
- Options: composeNonCombiningMarks (compose non-combining marks)
- Example: か + ゙ → が, ヘ + ゜ → ペ
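The composition step amounts to a table lookup on a base kana followed by a sound mark. The sketch below covers only the combining marks U+3099 and U+309A and uses a six-entry table chosen for illustration; the function name is hypothetical and this is not the library's implementation. Code points are written as escapes because hiragana へ and katakana ヘ are easy to confuse visually.

```php
<?php
declare(strict_types=1);

/**
 * Compose base kana + combining voiced (U+3099) or semi-voiced (U+309A)
 * sound mark into a single precomposed character.
 * Illustrative subset only; a full table covers every composable kana.
 */
function composeKana(string $text): string
{
    $table = [
        "\u{304B}\u{3099}" => "\u{304C}", // か + voiced mark      => が
        "\u{3055}\u{3099}" => "\u{3056}", // さ + voiced mark      => ざ
        "\u{306F}\u{3099}" => "\u{3070}", // は + voiced mark      => ば
        "\u{306F}\u{309A}" => "\u{3071}", // は + semi-voiced mark => ぱ
        "\u{30D8}\u{3099}" => "\u{30D9}", // ヘ + voiced mark      => ベ
        "\u{30D8}\u{309A}" => "\u{30DA}", // ヘ + semi-voiced mark => ペ
    ];
    // strtr matches whole keys, so a mark with no table entry is left alone.
    return strtr($text, $table);
}

echo composeKana("\u{304B}\u{3099}"), PHP_EOL; // が
echo composeKana("\u{30D8}\u{309A}"), PHP_EOL; // ペ
```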
4. Hiragana-Katakana (hira-kata)
Converts between hiragana and katakana scripts bidirectionally.
- Options: mode ("hira-to-kata" or "kata-to-hira")
- Example: ひらがな → ヒラガナ (hira-to-kata)
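The hira-to-kata direction can be illustrated with code point arithmetic: in Unicode, hiragana U+3041 to U+3096 sit exactly 0x60 below their katakana counterparts. The function below is a minimal sketch under that observation, with a hypothetical name; it is not the library's implementation and ignores edge cases such as characters that exist in only one of the two scripts.

```php
<?php
declare(strict_types=1);

/**
 * Convert hiragana U+3041..U+3096 to katakana by adding 0x60
 * to each code point. Other characters pass through unchanged.
 */
function hiraToKata(string $text): string
{
    return preg_replace_callback(
        '/[\x{3041}-\x{3096}]/u',
        function (array $m): string {
            $c = $m[0]; // a 3-byte UTF-8 sequence
            // Decode, shift into the katakana block, and re-encode by hand
            // so the sketch does not depend on the mbstring extension.
            $cp = (((ord($c[0]) & 0x0F) << 12)
                | ((ord($c[1]) & 0x3F) << 6)
                | (ord($c[2]) & 0x3F)) + 0x60;
            return chr(0xE0 | ($cp >> 12))
                . chr(0x80 | (($cp >> 6) & 0x3F))
                . chr(0x80 | ($cp & 0x3F));
        },
        $text
    ) ?? $text;
}

echo hiraToKata('ひらがな'), PHP_EOL; // ヒラガナ
```

The reverse direction would subtract 0x60 over the range U+30A1 to U+30F6.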
5. Hyphens (hyphens)
Replaces various dash/hyphen symbols with common ones used in Japanese.
- Options: precedence (mapping priority order)
- Available mappings: "ascii", "jisx0201", "jisx0208_90", "jisx0208_90_windows", "jisx0208_verbatim"
- Example: 2019—2020 (em dash) → 2019-2020
6. Ideographic Annotations (ideographic-annotations)
Replaces ideographic annotations used in traditional Chinese-to-Japanese translation.
- Example: ㆖㆘ → 上下
7. IVS-SVS Base (ivs-svs-base)
Handles Ideographic and Standardized Variation Selectors.
- Options: charset, mode ("ivs-or-svs" or "base"), preferSVS, dropSelectorsAltogether
- Example: 葛󠄀 (葛 + IVS) → 葛
8. Japanese Iteration Marks (japanese-iteration-marks)
Expands iteration marks by repeating the preceding character.
- Example: 時々 → 時時, いすゞ → いすず
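A minimal sketch of the expansion, assuming three marks only: 々 repeats a preceding kanji, ゝ repeats a preceding hiragana, and ゞ repeats a preceding hiragana in its voiced form. The function name and the five-entry voicing table are illustrative assumptions; the library's own rules (katakana marks, exceptions, and so on) are not reproduced here.

```php
<?php
declare(strict_types=1);

/**
 * Expand the iteration marks 々, ゝ and ゞ by repeating the previous character.
 * Illustrative only; the voicing table is a small subset.
 */
function expandIterationMarks(string $text): string
{
    $voiced = ['か' => 'が', 'さ' => 'ざ', 'す' => 'ず', 'た' => 'だ', 'は' => 'ば'];
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $out = [];
    foreach ($chars as $ch) {
        $prev = $out === [] ? null : $out[count($out) - 1];
        if ($ch === '々' && $prev !== null && preg_match('/^\p{Han}$/u', $prev) === 1) {
            $out[] = $prev;            // repeat the preceding kanji
        } elseif ($ch === 'ゝ' && $prev !== null && preg_match('/^\p{Hiragana}$/u', $prev) === 1) {
            $out[] = $prev;            // repeat the preceding hiragana
        } elseif ($ch === 'ゞ' && $prev !== null && isset($voiced[$prev])) {
            $out[] = $voiced[$prev];   // repeat it in voiced form
        } else {
            $out[] = $ch;              // anything else is copied unchanged
        }
    }
    return implode('', $out);
}

echo expandIterationMarks('時々'), PHP_EOL;   // 時時
echo expandIterationMarks('いすゞ'), PHP_EOL; // いすず
```

A mark with no suitable preceding character is left as it is rather than dropped.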
9. JIS X 0201 and Alike (jisx0201-and-alike)
Handles half-width/full-width character conversion.
- Options: fullwidthToHalfwidth, convertGL (alphanumerics/symbols), convertGR (katakana), u005cAsYenSign
- Example: ABC123 → ＡＢＣ１２３, ｶﾀｶﾅ → カタカナ
10. Kanji Old-New (kanji-old-new)
Converts old-style kanji (旧字体) to modern forms (新字体).
- Example: 舊字體の變換 → 旧字体の変換
11. Mathematical Alphanumerics (mathematical-alphanumerics)
Normalizes mathematical alphanumeric symbols to plain ASCII.
- Example: 𝐀𝐁𝐂 (mathematical bold) → ABC
12. Prolonged Sound Marks (prolonged-sound-marks)
Handles contextual conversion between hyphens and prolonged sound marks.
- Options: skipAlreadyTransliteratedChars, allowProlongedHatsuon, allowProlongedSokuon, replaceProlongedMarksFollowingAlnums
- Example: イ−ハト−ヴォ (with hyphen) → イーハトーヴォ (prolonged mark)
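"Contextual" here means that a hyphen is only rewritten when its neighbours make it look like a long-vowel mark. The sketch below captures one such rule with a lookbehind: a hyphen-like character becomes ー only when it directly follows a katakana character or another ー. The function name and the set of hyphen-like characters are assumptions for this illustration, and the options listed above are not modelled.

```php
<?php
declare(strict_types=1);

/**
 * Replace hyphen-like characters with the prolonged sound mark (U+30FC)
 * when they directly follow katakana or another prolonged sound mark.
 */
function hyphensToProlongedMarks(string $text): string
{
    // Hyphen-like set: ASCII hyphen-minus, U+2010, U+2015, U+2212, U+FF0D.
    return preg_replace(
        '/(?<=[\p{Katakana}\x{30FC}])[\-\x{2010}\x{2015}\x{2212}\x{FF0D}]/u',
        "\u{30FC}",
        $text
    ) ?? $text;
}

echo hyphensToProlongedMarks("イ\u{2212}ハト\u{2212}ヴォ"), PHP_EOL; // イーハトーヴォ
echo hyphensToProlongedMarks('2019-2020'), PHP_EOL;                  // 2019-2020
```

The second call shows why context matters: a hyphen between digits is a real hyphen and is left untouched.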
13. Radicals (radicals)
Converts CJK radical characters to their corresponding ideographs.
- Example: ⾔⾨⾷ (Kangxi radicals) → 言門食
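Radical characters look identical to ordinary ideographs but live in a separate Unicode block, so searches and comparisons fail silently until they are mapped. The sketch below shows the mapping as a plain lookup table with just the three radicals from the example; the function name is hypothetical and the generated transliterator's full table is not reproduced.

```php
<?php
declare(strict_types=1);

/**
 * Map Kangxi radical characters to the matching unified ideographs.
 * Three-entry illustrative table only.
 */
function radicalsToIdeographs(string $text): string
{
    $table = [
        "\u{2F94}" => '言', // KANGXI RADICAL SPEECH
        "\u{2FA8}" => '門', // KANGXI RADICAL GATE
        "\u{2FB7}" => '食', // KANGXI RADICAL EAT
    ];
    return strtr($text, $table);
}

$converted = radicalsToIdeographs("\u{2F94}\u{2FA8}\u{2FB7}");
echo $converted, PHP_EOL;                  // 言門食
var_dump($converted === '言門食');          // bool(true)
var_dump("\u{2F94}" === '言');              // bool(false): different code points
```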
14. Spaces (spaces)
Normalizes various Unicode space characters to standard ASCII space.
- Example: A　B (ideographic space) → A B
Development
Prerequisites
- PHP 8.2 or higher
- Composer (PHP dependency manager)
Setup
Install the development dependencies:
composer install
Code Generation
The transliterator implementations are generated from the shared data files:
php codegen/generate.php
This generates transliterator classes from the JSON data files in the ../data/ directory.
Testing
Run the basic tests:
php tests/BasicTest.php
Development Workflow
- Make changes to the code or data files
- If you modified data files, regenerate the transliterators:
php codegen/generate.php
- Run tests to ensure everything works:
composer test
Project Structure
php/
├── src/
│   ├── Char.php                           # Character data structure
│   ├── Chars.php                          # Character array utilities
│   ├── TransliteratorInterface.php        # Transliterator interface
│   ├── TransliteratorFactoryInterface.php # Factory interface
│   ├── ChainedTransliterator.php          # Chained transliterator
│   ├── TransliterationRecipe.php          # Recipe configuration
│   ├── TransliteratorRegistry.php         # Transliterator registry
│   ├── Yosina.php                         # Main API
│   └── Transliterators/                   # Generated transliterators
│       ├── SpacesTransliterator.php
│       ├── RadicalsTransliterator.php
│       └── ...
├── tests/
│   └── BasicTest.php                      # Basic functionality tests
├── codegen/
│   └── generate.php                       # Code generator
├── composer.json                          # Composer configuration
└── README.md                              # This file
License
MIT License. See the main project README for details.
Contributing
This is part of the larger Yosina project. Please ensure changes maintain compatibility across all language implementations.