A super fast, highly extensible markdown parser for PHP
=======================================================
What is this? <a name="what"></a>
-------------
A set of [PHP][] classes, each representing a [Markdown][] flavor, and a command line tool
for converting markdown files to HTML files.
The implementation focus is to be **fast** (see [benchmark][]) and **extensible**.
Parsing Markdown to HTML is as simple as calling a single method (see [Usage](#usage)). The implementation is solid
and gives the expected results even in non-trivial edge cases.
Extending the Markdown language with new elements is as simple as adding a new method to the class that converts the
markdown text to the expected output in HTML. This is possible without dealing with complex and error-prone regular expressions.
It is also possible to hook into the markdown structure and add elements or read meta information using the internal representation
of the Markdown text as an abstract syntax tree (see [Extending the language](#extend)).
Currently the following markdown flavors are supported:
- **Traditional Markdown** according to <http://daringfireball.net/projects/markdown/syntax> ([try it!](http://markdown.cebe.cc/try?flavor=default)).
- **Github flavored Markdown** according to <https://help.github.com/articles/github-flavored-markdown> ([try it!](http://markdown.cebe.cc/try?flavor=gfm)).
- **Markdown Extra** according to <http://michelf.ca/projects/php-markdown/extra/> (not yet fully supported, work in progress, see [#25][]; [try it!](http://markdown.cebe.cc/try?flavor=extra)).
- Any mixed Markdown flavor you like, because of its highly extensible structure (see the documentation below).
Future plans are to support:
- Smarty Pants <http://daringfireball.net/projects/smartypants/>
- ... (Feel free to [suggest](https://github.com/cebe/markdown/issues/new) further additions!)
[PHP]: http://php.net/ "PHP is a popular general-purpose scripting language that is especially suited to web development."
[Markdown]: http://en.wikipedia.org/wiki/Markdown "Markdown on Wikipedia"
[#25]: https://github.com/cebe/markdown/issues/25 "issue #25"
[benchmark]: https://github.com/kzykhys/Markbench#readme "kzykhys/Markbench on github"
### Who is using it?
- It powers the [API-docs and the definitive guide](http://www.yiiframework.com/doc-2.0/) for the [Yii Framework][] [2.0](https://github.com/yiisoft/yii2).
[Yii Framework]: http://www.yiiframework.com/ "The Yii PHP Framework"
Installation <a name="installation"></a>
------------
[PHP 5.4 or higher](http://www.php.net/downloads.php) is required to use it.
It will also run on Facebook's [hhvm](http://hhvm.com/).

Installation is recommended to be done via [composer][] by running:

    composer require cebe/markdown "~1.0.1"

Alternatively you can add the following to the `require` section in your `composer.json` manually:
```json
"cebe/markdown": "~1.0.1"
```
Run `composer update` afterwards.
[composer]: https://getcomposer.org/ "The PHP package manager"
Usage <a name="usage"></a>
-----
### In your PHP project
To parse your markdown you need only two lines of code. The first one is to choose the markdown flavor as
one of the following:
- Traditional Markdown: `$parser = new \cebe\markdown\Markdown();`
- Github Flavored Markdown: `$parser = new \cebe\markdown\GithubMarkdown();`
- Markdown Extra: `$parser = new \cebe\markdown\MarkdownExtra();`
The next step is to call the `parse()` method to parse the text using the full markdown language
or the `parseParagraph()` method to parse only inline elements.
Here are some examples:
```php
// traditional markdown and parse full text
$parser = new \cebe\markdown\Markdown();
echo $parser->parse($markdown);
// use github markdown
$parser = new \cebe\markdown\GithubMarkdown();
echo $parser->parse($markdown);
// use markdown extra
$parser = new \cebe\markdown\MarkdownExtra();
echo $parser->parse($markdown);
// parse only inline elements (useful for one-line descriptions)
$parser = new \cebe\markdown\GithubMarkdown();
echo $parser->parseParagraph($markdown);
```
You may optionally set one of the following options on the parser object:
For all Markdown Flavors:
- `$parser->html5 = true` to enable HTML5 output instead of HTML4.
- `$parser->keepListStartNumber = true` to enable keeping the numbers of ordered lists as specified in the markdown.
The default behavior is to always start from 1 and increment by one regardless of the number in markdown.
For GithubMarkdown:
- `$parser->enableNewlines = true` to convert all newlines to `<br/>`-tags. By default only newlines with two preceding spaces are converted to `<br/>`-tags.
It is recommended to use UTF-8 encoding for the input strings. Other encodings are currently not tested.
### The command line script
You can use it to render this readme:

    bin/markdown README.md > README.html

Using github flavored markdown:

    bin/markdown --flavor=gfm README.md > README.html

or convert the original markdown description to html using the unix pipe:

    curl http://daringfireball.net/projects/markdown/syntax.text | bin/markdown > md.html

Here is the full Help output you will see when running `bin/markdown --help`:

    PHP Markdown to HTML converter
    ------------------------------

    by Carsten Brandt <mail@cebe.cc>

    Usage:
        bin/markdown [--flavor=<flavor>] [--full] [file.md]

        --flavor  specifies the markdown flavor to use. If omitted the original markdown by John Gruber [1] will be used.
                  Available flavors:

                  gfm   - Github flavored markdown [2]
                  extra - Markdown Extra [3]

        --full    ouput a full HTML page with head and body. If not given, only the parsed markdown will be output.

        --help    shows this usage information.

        If no file is specified input will be read from STDIN.

    Examples:

        Render a file with original markdown:

            bin/markdown README.md > README.html

        Render a file using gihtub flavored markdown:

            bin/markdown --flavor=gfm README.md > README.html

        Convert the original markdown description to html using STDIN:

            curl http://daringfireball.net/projects/markdown/syntax.text | bin/markdown > md.html

    [1] http://daringfireball.net/projects/markdown/syntax
    [2] https://help.github.com/articles/github-flavored-markdown
    [3] http://michelf.ca/projects/php-markdown/extra/

Extensions
----------
Here are some extensions to this library:
- [Bogardo/markdown-codepen](https://github.com/Bogardo/markdown-codepen) - shortcode to embed codepens from http://codepen.io/ in markdown.
- [kartik-v/yii2-markdown](https://github.com/kartik-v/yii2-markdown) - Advanced Markdown editing and conversion utilities for Yii Framework 2.0.
- [cebe/markdown-latex](https://github.com/cebe/markdown-latex) - Convert Markdown to LaTeX and PDF
- [softark/creole](https://github.com/softark/creole) - A creole markup parser
- ... [add yours!](https://github.com/cebe/markdown/edit/master/README.md#L98)
Extending the language <a name="extend"></a>
----------------------
Markdown consists of two types of language elements. I'll call them block and inline elements, similar to what you have in
HTML with `<div>` and `<span>`. Block elements normally spread over several lines and are separated by blank lines.
The most basic block element is a paragraph (`<p>`).
Inline elements are elements that are added inside of block elements, i.e. inside of text.
This markdown parser allows you to extend the markdown language by changing existing elements behavior and also adding
new block and inline elements. You do this by extending from the parser class and adding/overriding class methods and
properties. For the different element types there are different ways to extend them as you will see in the following sections.
### Adding block elements
The markdown is parsed line by line to identify each non-empty line as one of the block element types.
To identify a line as the beginning of a block element the parser calls all protected class methods whose name begins with `identify`.
An identify function returns true if it has identified the block element it is responsible for or false if not.
In the following example we will implement support for [fenced code blocks][] which are part of the github flavored markdown.
[fenced code blocks]: https://help.github.com/articles/github-flavored-markdown#fenced-code-blocks
"Fenced code block feature of github flavored markdown"
```php
<?php

class MyMarkdown extends \cebe\markdown\Markdown
{
    protected function identifyFencedCode($line, $lines, $current)
    {
        // if a line starts with at least 3 backticks it is identified as a fenced code block
        if (strncmp($line, '```', 3) === 0) {
            return true;
        }
        return false;
    }

    // ...
}
```
In the above, `$line` is a string containing the content of the current line and is equal to `$lines[$current]`.
You may use `$lines` and `$current` to check other lines than the current line. In most cases you can ignore these parameters.
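
As a minimal sketch of an identify function that does look at another line, the following standalone function treats a line as an underlined headline when the line after it consists only of `=` characters. The function name `identifyUnderlinedSketch` and the sample lines are made up for illustration; it is written as a plain function so that it can be run without the library, and it is not how the library itself detects headlines.

```php
<?php

// Hypothetical helper for illustration, not part of the library.
// Same parameter order as the identify methods described above.
function identifyUnderlinedSketch($line, $lines, $current)
{
    // the current line must not be blank and the NEXT line must contain only "=" characters
    return trim($line) !== ''
        && isset($lines[$current + 1])
        && preg_match('/^=+\s*$/', $lines[$current + 1]) === 1;
}

$lines = ['Title', '=====', 'plain text'];

$first = identifyUnderlinedSketch($lines[0], $lines, 0); // next line is "====="
$last  = identifyUnderlinedSketch($lines[2], $lines, 2); // there is no next line

var_dump($first, $last);
```

The `isset()` check matters here: the current line may be the last one, so the next index does not always exist.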
Parsing of a block element is done in two steps:
1. "consuming" all the lines belonging to it. In most cases this is iterating over the lines starting from the identified
line until a blank line occurs. This step is implemented by a method named `consume{blockName}()` where `{blockName}`
   is the same name as used for the identify function above. The consume method also takes the lines array
   and the number of the current line. It returns two values: an array representing the block element in the abstract syntax tree
   of the markdown document and the line number to parse next. In the abstract syntax array the first element refers to the name of
   the element, all other array elements can be freely defined by yourself.
In our example we will implement it like this:
    ```php
    protected function consumeFencedCode($lines, $current)
    {
        // create block array
        $block = [
            'fencedCode',
            'content' => [],
        ];
        $line = rtrim($lines[$current]);

        // detect language and fence length (can be more than 3 backticks)
        $fence = substr($line, 0, $pos = strrpos($line, '`') + 1);
        $language = substr($line, $pos);
        if (!empty($language)) {
            $block['language'] = $language;
        }

        // consume all lines until ```
        for ($i = $current + 1, $count = count($lines); $i < $count; $i++) {
            if (rtrim($line = $lines[$i]) !== $fence) {
                $block['content'][] = $line;
            } else {
                // stop consuming when code block is over
                break;
            }
        }
        return [$block, $i];
    }
    ```
2. "rendering" the element. After all blocks have been consumed, they are rendered using the
   `render{elementName}()` method where `elementName` refers to the name of the element in the abstract syntax tree:
    ```php
    protected function renderFencedCode($block)
    {
        $class = isset($block['language']) ? ' class="language-' . $block['language'] . '"' : '';
        return "<pre><code$class>" . htmlspecialchars(implode("\n", $block['content']) . "\n", ENT_NOQUOTES, 'UTF-8') . '</code></pre>';
    }
    ```
You may also add code highlighting here. In general it would also be possible to render output in a language other than
HTML, for example LaTeX.
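
To make the shape of the abstract syntax array more concrete, here is a standalone sketch of the consume step that can be run without the library. The function `consumeFencedCodeSketch` is a hypothetical plain-function copy of the `consumeFencedCode()` logic shown above, and the sample lines are made up for illustration.

```php
<?php

// Hypothetical standalone copy of the consume logic, for illustration only.
function consumeFencedCodeSketch(array $lines, $current)
{
    $block = ['fencedCode', 'content' => []];
    $line = rtrim($lines[$current]);

    // everything up to the last backtick is the fence, the rest is the language
    $pos = strrpos($line, '`') + 1;
    $fence = substr($line, 0, $pos);
    $language = (string) substr($line, $pos);
    if ($language !== '') {
        $block['language'] = $language;
    }

    // collect lines until a line equal to the opening fence is found
    for ($i = $current + 1, $count = count($lines); $i < $count; $i++) {
        if (rtrim($lines[$i]) === $fence) {
            break;
        }
        $block['content'][] = $lines[$i];
    }
    return [$block, $i];
}

$lines = ['intro', '```php', 'echo 1;', 'echo 2;', '```', 'outro'];
list($block, $last) = consumeFencedCodeSketch($lines, 1);

print_r($block);
echo $last, "\n";
```

In this sketch `$block` holds the element name at index 0, the two code lines under `content` and `php` under `language`, while `$last` is the index of the closing fence line.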
### Adding inline elements
Adding inline elements is different from block elements as they are parsed using markers in the text.
An inline element is identified by a marker that marks the beginning of an inline element (e.g. `[` will mark a possible
beginning of a link or `` ` `` will mark inline code).
Parsing methods for inline elements are also protected and identified by the prefix `parse`. Additionally a `@marker` annotation
in PHPDoc is needed to register the parse function for one or multiple markers.
The method will then be called when a marker is found in the text. As an argument it takes the text starting at the position of the marker.
The parse method returns an array containing the element of the abstract syntax tree and an offset of the text it has
parsed from the input markdown. All text up to this offset is removed from the markdown before the next marker is searched for.
As an example, we will add support for the [strikethrough][] feature of github flavored markdown:
[strikethrough]: https://help.github.com/articles/github-flavored-markdown#strikethrough "Strikethrough feature of github flavored markdown"
```php
<?php

class MyMarkdown extends \cebe\markdown\Markdown
{
    /**
     * @marker ~~
     */
    protected function parseStrike($markdown)
    {
        // check whether the marker really represents a strikethrough (i.e. there is a closing ~~)
        if (preg_match('/^~~(.+?)~~/', $markdown, $matches)) {
            return [
                // return the parsed tag as an element of the abstract syntax tree and call `parseInline()` to allow
                // other inline markdown elements inside this tag
                ['strike', $this->parseInline($matches[1])],
                // return the offset of the parsed text
                strlen($matches[0])
            ];
        }
        // in case we did not find a closing ~~ we just return the marker and skip 2 characters
        return [['text', '~~'], 2];
    }

    // rendering is the same as for block elements, we turn the abstract syntax array into a string.
    protected function renderStrike($element)
    {
        return '<del>' . $this->renderAbsy($element[1]) . '</del>';
    }
}
```
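
The element/offset pair returned by a parse method can be tried out in isolation. The following is a sketch only: `parseStrikeSketch` is a hypothetical plain function that mirrors `parseStrike()` above but keeps the inner text as a string instead of calling `parseInline()`, so it runs without the library.

```php
<?php

// Hypothetical standalone stand-in for parseStrike(), for illustration only.
function parseStrikeSketch($markdown)
{
    if (preg_match('/^~~(.+?)~~/', $markdown, $matches)) {
        // element of the syntax tree plus the number of characters that were parsed
        return [['strike', $matches[1]], strlen($matches[0])];
    }
    // no closing ~~ found: emit the marker as plain text and skip its 2 characters
    return [['text', '~~'], 2];
}

$markdown = '~~old~~ new';
list($element, $offset) = parseStrikeSketch($markdown);
$remaining = substr($markdown, $offset); // text left for the next marker search

list($fallback, $fallbackOffset) = parseStrikeSketch('~~ unclosed');

var_dump($element, $offset, $remaining, $fallback, $fallbackOffset);
```

Here the first call consumes the seven characters of `~~old~~` and leaves ` new` for further parsing, while the second call falls back to a plain text element because there is no closing marker.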
### Composing your own Markdown flavor
This markdown library is composed of traits so it is very easy to create your own markdown flavor by adding and/or removing
the single feature traits.
Designing your Markdown flavor consists of four steps:
1. Select a base class
2. Select language feature traits
3. Define escapable characters
4. Optionally add custom rendering behavior
#### Select a base class
If you want to extend from a flavor and only add features you can use one of the existing classes
(`Markdown`, `GithubMarkdown` or `MarkdownExtra`) as your flavors base class.
If you want to define a subset of the markdown language, i.e. remove some of the features, you have to
extend your class from `Parser`.
#### Select language feature traits
The following shows the trait selection for traditional Markdown.
```php
class MyMarkdown extends Parser
{
    // include block element parsing using traits
    use block\CodeTrait;
    use block\HeadlineTrait;
    use block\HtmlTrait {
        parseInlineHtml as private;
    }
    use block\ListTrait {
        // Check Ul List before headline
        identifyUl as protected identifyBUl;
        consumeUl as protected consumeBUl;
    }
    use block\QuoteTrait;
    use block\RuleTrait {
        // Check Hr before checking lists
        identifyHr as protected identifyAHr;
        consumeHr as protected consumeAHr;
    }

    // include inline element parsing using traits
    use inline\CodeTrait;
    use inline\EmphStrongTrait;
    use inline\LinkTrait;

    /**
     * @var boolean whether to format markup according to HTML5 spec.
     * Defaults to `false` which means that markup is formatted as HTML4.
     */
    public $html5 = false;

    protected function prepare()
    {
        // reset references
        $this->references = [];
    }

    // ...
}
```
In general, just adding the trait with `use` is enough, however in some cases some fine tuning is desired
to get the expected parsing results. Elements are detected in alphabetical order of their identification
function. This means that if a line starting with `-` could be a list or a horizontal rule, the preference has to be set
by renaming the identification function. This is what is done with renaming `identifyHr` to `identifyAHr`
and `identifyUl` to `identifyBUl`. The consume function always has to have the same name as the identification function
so this has to be renamed too.
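
The effect of the renaming can be illustrated with a plain sort of method names. This is only a sketch: it assumes that the detection order corresponds to an ordinary case-sensitive string sort, and `identifyCode` and `identifyHeadline` are example names used here for illustration.

```php
<?php

// method names as they would be without renaming
$plain = ['identifyUl', 'identifyHr', 'identifyHeadline', 'identifyCode'];
sort($plain, SORT_STRING);

// method names after renaming Hr to AHr and Ul to BUl
$renamed = ['identifyBUl', 'identifyAHr', 'identifyHeadline', 'identifyCode'];
sort($renamed, SORT_STRING);

print_r($plain);
print_r($renamed);
```

Without renaming, the headline check sorts before both the rule and the list check. After renaming, the rule check comes first, the list check second and the headline check after them, which matches the comments in the trait selection above.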
There is also a conflict for parsing of the `<` character. This could either be a link/email enclosed in `<` and `>`
or an inline HTML tag. In order to resolve this conflict when adding the `LinkTrait`, we need to hide the `parseInlineHtml`
method of the `HtmlTrait`.
If you use any trait that uses the `$html5` property to adjust its output you also need to define this property.
If you use the link trait it may be useful to implement `prepare()` as shown above to reset references before
parsing to ensure you get a reusable object.
#### Define escapable characters
Depending on the language features you have chosen there is a different set of characters that can be escaped
using `\`. The following is the set of escapable characters for traditional markdown, you can copy it to your class
as is.
```php
    /**
     * @var array these are "escapable" characters. When using one of these prefixed with a
     * backslash, the character will be outputted without the backslash and is not interpreted
     * as markdown.
     */
    protected $escapeCharacters = [
        '\\', // backslash
        '`', // backtick
        '*', // asterisk
        '_', // underscore
        '{', '}', // curly braces
        '[', ']', // square brackets
        '(', ')', // parentheses
        '#', // hash mark
        '+', // plus sign
        '-', // minus sign (hyphen)
        '.', // dot
        '!', // exclamation mark
        '<', '>', // angle brackets
    ];
```
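
What such a list means for the output can be sketched with a small standalone function. `unescapeSketch` is a hypothetical helper written for this illustration and is not the library's implementation; it only shows that a backslash is dropped when it precedes a listed character and kept otherwise.

```php
<?php

// Hypothetical helper for illustration, not the library's implementation.
function unescapeSketch($text, array $escapeCharacters)
{
    $out = '';
    for ($i = 0, $len = strlen($text); $i < $len; $i++) {
        $isEscape = $text[$i] === '\\'
            && $i + 1 < $len
            && in_array($text[$i + 1], $escapeCharacters, true);
        if ($isEscape) {
            $i++; // skip the backslash and keep only the escaped character
        }
        $out .= $text[$i];
    }
    return $out;
}

// a shortened list, only for this example
$escapeCharacters = ['\\', '*', '_'];
$result = unescapeSketch('\*literal asterisks\* and \q', $escapeCharacters);

echo $result, "\n";
```

The backslash before `q` stays in the output because `q` is not in the list, while the backslashes before the asterisks are removed.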
#### Add custom rendering behavior
Optionally you may also want to adjust rendering behavior by overriding some methods.
For some inspiration you may refer to the `consumeParagraph()` method of the `Markdown` and `GithubMarkdown` classes,
which define different rules for which elements are allowed to interrupt a paragraph.
Acknowledgements <a name="ack"></a>
----------------
I'd like to thank [@erusev][] for creating [Parsedown][] which heavily influenced this work and provided
the idea of the line based parsing approach.
[@erusev]: https://github.com/erusev "Emanuil Rusev"
[Parsedown]: http://parsedown.org/ "The Parsedown PHP Markdown parser"
FAQ <a name="faq"></a>
---
### Why another markdown parser?
While reviewing PHP markdown parsers for choosing one to use bundled with the [Yii framework 2.0][]
I found that most of the implementations use regex to replace patterns instead
of doing real parsing. This way extending them with new language elements is quite hard
as you have to come up with a complex regex, that matches your addition but does not mess
with other elements. Such additions are very common as you see on github which supports referencing
issues, users and commits in the comments.
A [real parser][] should use context aware methods that walk through the text and
parse the tokens as they find them. The only implementation that I have found that uses
this approach is [Parsedown][], which also shows that this implementation is [much faster][benchmark]
than the regex way. Parsedown however is an implementation that focuses on speed and implements
its own flavor (mainly github flavored markdown) in one class and at the time of this writing was
not easily extensible.
Given the situation above I decided to start my own implementation using the parsing approach
from Parsedown and making it extensible creating a class for each markdown flavor that extend each
other in the way that also the markdown languages extend each other.
This allows you to choose between markdown language flavors and also provides a way to compose your
own flavor picking the best things from all.
I chose this approach as it is easier to implement and also a more intuitive approach compared
to using callbacks to inject functionality into the parser.
[real parser]: http://en.wikipedia.org/wiki/Parsing#Types_of_parser
[Parsedown]: http://parsedown.org/ "The Parsedown PHP Markdown parser"
### Where do I report bugs or rendering issues?
Just [open an issue][] on github, post your markdown code and describe the problem. You may also attach screenshots of the rendered HTML result to describe your problem.
[open an issue]: https://github.com/cebe/markdown/issues/new
### How can I contribute to this library?
Check the [CONTRIBUTING.md](CONTRIBUTING.md) file for more info.
### Am I free to use this?
This library is open source and licensed under the [MIT License][]. This means that you can do whatever you want
with it as long as you mention my name and include the [license file][license]. Check the [license][] for details.
[MIT License]: http://opensource.org/licenses/MIT
[license]: https://github.com/cebe/markdown/blob/master/LICENSE
Contact
-------
Feel free to contact me using [email](mailto:mail@cebe.cc) or [twitter](https://twitter.com/cebe_cc).