Swift - Markdown emphasis regex match


Raw string:

these * should * not \*be\* selected. this* neither! *should be. *neither should\* be* *this should* and*this*

Expected:

these * should * not *be* selected. this* neither! *should be. *neither should* be* <em>this should</em> ~~and<em>this</em>~~

Old regex:

"(^|[\\w_])(?:(?!\\1)|(?=^))(\\*|_)(?=\\s)((?:(?!\\2).)*?\\s)\\2(?!\\2)(?=[\\w_]|$)"

The old one is not enough to deal with this situation.

Could anyone help with a Swift regex?

You should be careful with a regex approach when parsing Markdown, since the data can contain escape sequences. That means you cannot simply use lookarounds to match a * only if it is not preceded by a backslash. You can try a regex that matches escape sequences coming before the Markdown into one group, and the Markdown parts into another.

"(?u)(\\\\.)|(\\*\\b(?:(?!\\\\[*]).)*?\\b\\*)" 

See the regex demo. Inside your code, you need to handle these 2 groups differently, per your specifications.

Pattern details:

  • (?u) - makes word boundaries Unicode-aware in the pattern
  • (\\\\.) - Group 1: an escape sequence (a backslash followed by any char other than a line break)
  • | - or
  • (\\*\\b(?:(?!\\\\[*]).)*?\\b\\*) - Group 2, matching:
    • \\*\\b - a * followed by a word char
    • (?:(?!\\\\[*]).)*? - any char that is not the starting char of a \* sequence, as few as possible
    • \\b\\* - a * preceded by a word char
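
Handling the two groups differently could look roughly like the sketch below. It uses NSRegularExpression from Foundation with the pattern above. The function name renderEmphasis and the choice to emit <em> tags and drop the backslash of an escape sequence are assumptions made for illustration, based on the expected output in the question.

```swift
import Foundation

// Hypothetical helper (not from the original answer): wraps emphasis
// spans in <em> tags and unescapes backslash sequences.
func renderEmphasis(_ input: String) throws -> String {
    // Same pattern as above: group 1 = escape sequence, group 2 = emphasis span.
    let pattern = "(?u)(\\\\.)|(\\*\\b(?:(?!\\\\[*]).)*?\\b\\*)"
    let regex = try NSRegularExpression(pattern: pattern)

    let ns = input as NSString
    var output = ""
    var cursor = 0  // UTF-16 offset of the first character not yet copied

    for match in regex.matches(in: input, range: NSRange(location: 0, length: ns.length)) {
        // Copy the unmatched text that precedes this match.
        output += ns.substring(with: NSRange(location: cursor,
                                             length: match.range.location - cursor))
        let escape = match.range(at: 1)
        let emphasis = match.range(at: 2)
        if escape.location != NSNotFound {
            // Group 1: drop the backslash, keep the escaped character.
            output += ns.substring(with: NSRange(location: escape.location + 1,
                                                 length: escape.length - 1))
        } else if emphasis.location != NSNotFound {
            // Group 2: strip the surrounding asterisks and wrap in <em>.
            let inner = ns.substring(with: NSRange(location: emphasis.location + 1,
                                                   length: emphasis.length - 2))
            output += "<em>\(inner)</em>"
        }
        cursor = match.range.location + match.range.length
    }
    // Copy whatever follows the last match.
    output += ns.substring(from: cursor)
    return output
}
```

Called on the raw string from the question, this should leave the spaced and escaped asterisks alone and wrap only "this should" and "this" in <em> tags. Whether the trailing and*this* case should be emphasized at all is a specification decision; the pattern as given does match it.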

A better option is to write custom parsing code.
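
As a rough idea of what such custom parsing code might look like, here is a minimal hand-written scanner. It is only a sketch: the function name, the rules (which mirror the regex above), and the isWord approximation of the regex word class are all assumptions, and a real Markdown parser would need many more cases.

```swift
// Hypothetical hand-written scanner; names and rules are illustrative assumptions.
func renderEmphasisByScanning(_ input: String) -> String {
    let chars = Array(input)

    // Approximation of the regex \w class (an assumption, not an exact equivalent).
    func isWord(_ c: Character) -> Bool { c.isLetter || c.isNumber || c == "_" }

    // Finds the closing "*" for an opener at `open`, or nil if there is none.
    func closingIndex(after open: Int) -> Int? {
        var j = open + 2
        while j < chars.count {
            if chars[j].isNewline { return nil }
            // An escaped asterisk ends the search, like the lookahead in the regex.
            if chars[j] == "\\", j + 1 < chars.count, chars[j + 1] == "*" { return nil }
            // A closer is an asterisk directly preceded by a word character.
            if chars[j] == "*", isWord(chars[j - 1]) { return j }
            j += 1
        }
        return nil
    }

    var output = ""
    var i = 0
    while i < chars.count {
        if chars[i] == "\\", i + 1 < chars.count {
            // Escape sequence: emit the escaped character without the backslash.
            output.append(chars[i + 1])
            i += 2
        } else if chars[i] == "*", i + 1 < chars.count, isWord(chars[i + 1]),
                  let close = closingIndex(after: i) {
            // Emphasis span: wrap the text between the asterisks in <em>.
            output += "<em>" + String(chars[(i + 1)..<close]) + "</em>"
            i = close + 1
        } else {
            // Anything else is copied through unchanged.
            output.append(chars[i])
            i += 1
        }
    }
    return output
}
```

The advantage of this style is that each rule (what counts as an opener, a closer, or an escape) is an explicit branch that can be adjusted independently, instead of being encoded in lookarounds.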

