javascript - How to make capture group "absorb" whitespace before/after it without capturing it?


I have this regex expression, found here. Try it out with the strings below. The problem I'm facing is that there's whitespace at the beginning of each captured group after the 1st one. I need the whitespace to be matched, but I don't need it captured.

Regex expression:

^(\/[a-zA-Z0-9]+)?(\s~[a-zA-Z]+)?([\w\s'()-]+)?((?:\s~[a-zA-Z]+){0,2})?$

Viewing it at the link above makes it simpler to comprehend.

These are the strings you can paste into the test string area, one by one:

/test ~example matches ~extra ~space
this has ~space ~matched
/like wise this
/and ~this

Take a look at the match groups area and notice that, after the 1st group, the one preceding whitespace between groups is captured.

What I want is this:

For the 1st and 2nd capture groups, I want them to detect the succeeding space and absorb it, not capture it, so that the 3rd capture group won't detect and capture that space. For the 4th capture group, I want it to detect the preceding space and absorb it, not capture it.

What I mean by absorb is that the space gets "removed", in the sense that the 3rd capture group won't realize it's there.

How can I do this?

Thanks.

This is the regex I came up with:

^(\/[a-zA-Z0-9]+)?(?:\s)?(~[a-zA-Z]+)?(?:\s)?([\w\'()\-\s]+)?(?:\s(~[a-zA-Z]+))?(?:\s(~[a-zA-Z]+))?$

Elaborating on the regex in 2 parts, as per your requirement:

"For the 1st and 2nd capture groups, I want them to detect the succeeding space and absorb it, not capture it, so that the 3rd capture group won't detect and capture that space."

Your regex for the 1st and 2nd groups:

(\/[a-zA-Z0-9]+)?(\s~[a-zA-Z]+)?

So, after each of the first and second capturing groups, I've added a non-capturing (?:\s)?. This consumes the space, so it ends up neither in the 2nd group nor at the start of the 3rd capturing group. The regex becomes:

(\/[a-zA-Z0-9]+)?(?:\s)?(~[a-zA-Z]+)?(?:\s)?
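To see the effect of the non-capturing (?:\s)? on its own, here is a small sketch that runs only the first part of each pattern against one of the test strings. The variable names (originalPrefix, revisedPrefix, sample) are mine, chosen for illustration.

```javascript
// Only the first two groups of each pattern; names are illustrative.
const originalPrefix = /^(\/[a-zA-Z0-9]+)?(\s~[a-zA-Z]+)?/;
const revisedPrefix = /^(\/[a-zA-Z0-9]+)?(?:\s)?(~[a-zA-Z]+)?(?:\s)?/;

const sample = "/test ~example matches";

const prefixBefore = sample.match(originalPrefix);
const prefixAfter = sample.match(revisedPrefix);

// Original: the space is part of group 2.
console.log(JSON.stringify(prefixBefore[2])); // " ~example"

// Revised: the space is matched by (?:\s)? but not stored in any group.
console.log(JSON.stringify(prefixAfter[2])); // "~example"

// The whole match still includes both spaces, so they were consumed.
console.log(JSON.stringify(prefixAfter[0])); // "/test ~example "
```

Because the second (?:\s)? has already consumed the space after "~example", whatever follows in the pattern starts matching at "matches".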

"For the 4th capture group, I want it to detect the preceding space and absorb it, not capture it."

Your regex:

((?:\s~[a-zA-Z]+){0,2})?

Here, the obvious solution is to capture the text part (~[a-zA-Z]+) and leave the \s part non-captured. Like this:

(?:(?:\s(~[a-zA-Z]+)){0,2})?
        ^^^^^^^^^^^^ capturing this.

But this is a repeated capturing group, which stores each new iteration on top of the old one. Basically, a repeated capturing group captures only the last iteration. So if you wanted to match

" ~space ~matched", it would capture only the last one, "~matched".

So one solution is this: since you are checking {0,2}, you can explicitly check 2 times, like this:

(?:\s(~[a-zA-Z]+))?(?:\s(~[a-zA-Z]+))?
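The difference between the repeated group and the two explicit groups can be checked with a short sketch. It covers only the trailing part of the pattern, and the names repeatedGroup, unrolledGroups and tail are made up for this example.

```javascript
// Illustrative patterns covering only the trailing " ~word" part.
const repeatedGroup = /^(?:\s(~[a-zA-Z]+)){0,2}$/;
const unrolledGroups = /^(?:\s(~[a-zA-Z]+))?(?:\s(~[a-zA-Z]+))?$/;

const tail = " ~space ~matched";

// One capturing group repeated twice: only the last iteration survives.
const repeatedMatch = tail.match(repeatedGroup);
console.log(repeatedMatch.slice(1)); // [ '~matched' ]

// Two separate capturing groups: each one keeps its own value.
const unrolledMatch = tail.match(unrolledGroups);
console.log(unrolledMatch.slice(1)); // [ '~space', '~matched' ]
```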

But if the {0,2} requirement later changes, then the best solution is to capture the items together with their preceding spaces in one group, and split that captured group on spaces separately.
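A minimal sketch of that capture-then-split idea, under my own assumptions: the last group keeps the whole run of " ~word" items, spaces included, and a helper splits it afterwards. The names tagTailPattern and extractTags are hypothetical and not part of the original regex.

```javascript
// Group 4 captures the whole run of " ~word" items, leading spaces included.
// The {0,2} limit can be changed here without adding more groups.
const tagTailPattern =
  /^(\/[a-zA-Z0-9]+)?(?:\s)?(~[a-zA-Z]+)?(?:\s)?([\w'()\-\s]+)?((?:\s~[a-zA-Z]+){0,2})$/;

// Hypothetical helper: returns the trailing "~word" items as an array,
// or null when the input does not match the pattern at all.
function extractTags(input) {
  const match = input.match(tagTailPattern);
  if (!match) return null;
  // Split the captured run on whitespace and drop the empty leading piece.
  return match[4].split(/\s+/).filter(Boolean);
}

console.log(extractTags("/test ~example matches ~extra ~space"));
console.log(extractTags("/and ~this"));
```

The spaces are still captured by the regex here, and they only disappear in the split step, so this trades a slightly less direct match result for a pattern that does not grow with the repetition count.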

Output when I run this regex on the given strings in JavaScript:

["/test ~example matches ~extra ~space", "/test", "~example", "matches", "~extra", "~space", index: 0, input: "/test ~example matches ~extra ~space"]
["this has ~space ~matched", undefined, undefined, "this has", "~space", "~matched", index: 0, input: "this has ~space ~matched"]
["/like wise this", "/like", undefined, "wise this", undefined, undefined, index: 0, input: "/like wise this"]
["/and ~this", "/and", "~this", undefined, undefined, undefined, index: 0, input: "/and ~this"]
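To reproduce a run like the one above, a sketch along these lines should work in a browser console or in Node. The names fullPattern, testInputs and groupResults are illustrative, and the four strings are the test strings from the question.

```javascript
// The full regex from this answer, written as a JavaScript literal.
const fullPattern =
  /^(\/[a-zA-Z0-9]+)?(?:\s)?(~[a-zA-Z]+)?(?:\s)?([\w'()\-\s]+)?(?:\s(~[a-zA-Z]+))?(?:\s(~[a-zA-Z]+))?$/;

const testInputs = [
  "/test ~example matches ~extra ~space",
  "this has ~space ~matched",
  "/like wise this",
  "/and ~this",
];

// Keep only the five capture groups for each input (null when no match).
const groupResults = testInputs.map((input) => {
  const match = input.match(fullPattern);
  return match ? match.slice(1) : null;
});

testInputs.forEach((input, i) => {
  console.log(JSON.stringify(input), "->", groupResults[i]);
});
```

Groups that did not take part in a match come back as undefined, and none of the captured values starts or ends with a space.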

Hope this helped.

