regex Archives - Welcome To Golang By Example
https://vikasboss.github.io/tag/regex/

Golang Regex: Match a floating-point number in Regular Expression
https://vikasboss.github.io/golang-regex-floating-point-number/
Wed, 07 Jul 2021 12:56:41 +0000

Overview

A floating-point number can have the following properties

  • It can have an optional negative or positive sign
  • The integer part is optional when the decimal part is present
  • The dot and the decimal part are optional when the integer part is present
  • It may or may not have an exponent

So below are valid floating-point numbers

1.2
.12
12
12.
+1.2
-1.2
1.2e3

Below are invalid floating-point numbers

  • An empty string
  • A + or - sign only
  • A single dot
  • An integer part with leading zeros, e.g. 00.1 or 001
  • A sign followed by only a dot, i.e. +. or -.
  • A dot just before the exponent, e.g. 1.e2
  • Any other character before or after the floating-point number, e.g. a1.3, a1.3b, or 1.3b

Below are examples of invalid floats

""
.
00.1
001
+
-
+.
-.
1.e2
a1.2
1.2b
a1.2b
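
These rules are stricter than what Go's own float parser accepts. As a rough comparison, which is an addition here and not part of the original walkthrough, the sketch below feeds a few of the inputs above to strconv.ParseFloat. The helper name parses is made up for illustration.

```go
package main

import (
	"fmt"
	"strconv"
)

// parses reports whether strconv.ParseFloat accepts s as a 64-bit float.
// The helper name is hypothetical; it is not part of any library.
func parses(s string) bool {
	_, err := strconv.ParseFloat(s, 64)
	return err == nil
}

func main() {
	// Inputs taken from the valid and invalid lists above.
	for _, s := range []string{"1.2", "12.", "00.1", "001", "1.e2", ".", "+", "a1.2"} {
		fmt.Printf("%q parses: %t\n", s, parses(s))
	}
}
```

On these inputs ParseFloat accepts 00.1, 001, and 1.e2, all of which the rules above treat as invalid, so a regex is the better fit when those forms must be rejected.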

Let’s first see a simple regex that matches only the integer, dot, and decimal parts.

^(?:(?:0|[1-9]\d*)(?:\.\d*)?|\.\d+)$

At a high level, the regex has two parts joined by an OR

  • (?:0|[1-9]\d*)(?:\.\d*)? – This matches the case where the integer part is always present and the decimal part is optional
  • \.\d+ – This matches the case where the integer part is not present and the decimal part is always present

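Let’s dissect this regex

  • ^ and $ – Anchor the match to the start and the end of the input, so no other character is allowed before or after the number
  • (?:0|[1-9]\d*) – The integer part: either a single 0 or a non-zero digit followed by any number of digits. This rejects leading zeros such as 00.1 or 001
  • (?:\.\d*)? – An optional dot followed by zero or more digits. This accepts both 12.5 and 12.
  • \.\d+ – A dot followed by one or more digits. This accepts .12 but rejects a single dot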
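
The basic pattern can be tried on its own before the sign and exponent are added. The following is a minimal sketch; the variable name basicFloat and the choice of inputs are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// basicFloat is the regex from the text: integer, dot, and decimal parts only.
var basicFloat = regexp.MustCompile(`^(?:(?:0|[1-9]\d*)(?:\.\d*)?|\.\d+)$`)

func main() {
	// The first four inputs use only the parts this regex knows about.
	// The last four use a lone dot, a leading zero, a sign, or an exponent.
	inputs := []string{"1.2", ".12", "12", "12.", ".", "00.1", "+1.2", "1.2e3"}
	for _, in := range inputs {
		fmt.Printf("For %q: %t\n", in, basicFloat.MatchString(in))
	}
}
```

At this stage +1.2 and 1.2e3 are still rejected, because the sign and the exponent have not been added to the pattern yet.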

Let’s make it more complex by having it accept a negative or a positive sign. Note that negative or positive sign is optional

^[+\-]?(?:(?:0|[1-9]\d*)(?:\.\d*)?|\.\d+)$

The regex is the same as the earlier one. We just added an optional positive or negative sign in front

  • [+\-] – Matches either a positive or a negative sign
  • ? – Makes the sign optional

Let’s also add an exponent part to the regex. Again note that the exponent part is optional. This regex is the same as the previous regex. We just added the exponent part at the end

^[+\-]?(?:(?:0|[1-9]\d*)(?:\.\d*)?|\.\d+)(?:\d[eE][+\-]?\d+)?$

Let’s dissect the exponent part

  • (?: – Opens a non-capturing group
  • \d – Matches one digit. This is to prevent numbers like 1.e2. The digit has to be left over from the integer or decimal part before it, so 1.2e3 and 12e3 match, but 1e3 and .5e3 do not
  • [eE] – Matches either lowercase e or uppercase E
  • [+\-]? – Matches an optional positive or negative sign
  • \d+ – Matches one or more digits
  • )? – Closes the group and makes the entire exponent part optional
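
Requiring a digit directly before [eE] affects a few exponent inputs that are easy to overlook. The sketch below probes them. The name articleFloat holds the regex from the text. The name altFloat and its pattern are assumptions, not something from the original article: one possible rewrite that drops the \d and allows an exponent only after an integer, or after a dot followed by digits.

```go
package main

import (
	"fmt"
	"regexp"
)

// articleFloat is the regex from the text, with \d required before the exponent.
var articleFloat = regexp.MustCompile(`^[+\-]?(?:(?:0|[1-9]\d*)(?:\.\d*)?|\.\d+)(?:\d[eE][+\-]?\d+)?$`)

// altFloat is a hypothetical alternative with three branches:
//   integer, optional ".digits", optional exponent
//   integer followed by a bare trailing dot (no exponent allowed)
//   ".digits", optional exponent
var altFloat = regexp.MustCompile(`^[+\-]?(?:(?:0|[1-9]\d*)(?:\.\d+)?(?:[eE][+\-]?\d+)?|(?:0|[1-9]\d*)\.|\.\d+(?:[eE][+\-]?\d+)?)$`)

func main() {
	for _, in := range []string{"1.2e3", "12e3", "1e3", ".5e3", "01e3", "1.e2", "12."} {
		fmt.Printf("%q article=%t alt=%t\n", in, articleFloat.MatchString(in), altFloat.MatchString(in))
	}
}
```

With the regex from the text, 1e3 and .5e3 are rejected because no digit is left over for \d, while 01e3 is accepted because the 0 is taken as the integer part and the 1 as the digit before the exponent. The alternative accepts the first two and rejects the third, and it still rejects 1.e2.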

Program

Now see an example of this regular expression in action

package main

import (
	"fmt"
	"regexp"
)

func main() {
	sampleRegex := regexp.MustCompile(`^[+\-]?(?:(?:0|[1-9]\d*)(?:\.\d*)?|\.\d+)(?:\d[eE][+\-]?\d+)?$`)

	validInputs := []string{"1.2", ".12", "12", "12.", "+1.2", "-1.2", "1.2e3"}
	invalidInputs := []string{".", "", "00.1", "001", "+", "-", "+.", "-.", "1.e2", ".e2", "a1.2", "1.2b", "a1.2b"}

	fmt.Println("Valid Inputs")
	for _, input := range validInputs {
		// %q quotes the input so that the empty string stays visible
		fmt.Printf("For %q: %t\n", input, sampleRegex.MatchString(input))
	}

	fmt.Println("\nInvalid Inputs")
	for _, input := range invalidInputs {
		fmt.Printf("For %q: %t\n", input, sampleRegex.MatchString(input))
	}
}

Output

Valid Inputs
For "1.2": true
For ".12": true
For "12": true
For "12.": true
For "+1.2": true
For "-1.2": true
For "1.2e3": true

Invalid Inputs
For ".": false
For "": false
For "00.1": false
For "001": false
For "+": false
For "-": false
For "+.": false
For "-.": false
For "1.e2": false
For ".e2": false
For "a1.2": false
For "1.2b": false
For "a1.2b": false

The program prints true for all the valid inputs discussed above and false for all the invalid ones.

Please try it out and post in the comments if you find a case where this regex doesn’t work.

The above regex validates that the whole string is a number. To check whether an input string contains a number as a substring instead, remove the anchors, that is, the caret (^) at the start and the dollar ($) at the end.

So the regex will be

[+\-]?(?:(?:0|[1-9]\d*)(?:\.\d*)?|\.\d+)(?:\d[eE][+\-]?\d+)?
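
As a minimal sketch of the unanchored form, the snippet below checks whether a string contains a number and pulls out the first match with FindString from the standard regexp package. The variable name containsFloat and the sample inputs are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// containsFloat is the unanchored regex from the text.
var containsFloat = regexp.MustCompile(`[+\-]?(?:(?:0|[1-9]\d*)(?:\.\d*)?|\.\d+)(?:\d[eE][+\-]?\d+)?`)

func main() {
	for _, in := range []string{"a1.2b", "price: -3.5", "abc", "00.1"} {
		// FindString returns the leftmost match, or "" when there is none.
		fmt.Printf("%q contains=%t first=%q\n", in, containsFloat.MatchString(in), containsFloat.FindString(in))
	}
}
```

Note that without the anchors the leading-zero rule no longer rejects an input: 00.1 reports a match, because the single 0 at the start is a number in its own right.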

This is all about matching floating point numbers through regex in golang. Hope you have liked this article. Please share feedback in the comments.

Also, check out our Golang advance tutorial Series – Golang Advance Tutorial

Golang Regex: Backreferences
https://vikasboss.github.io/golang-regex-backreferences/
Mon, 31 May 2021 21:18:04 +0000

Overview

Go’s standard regex package, regexp, follows the RE2 syntax, which doesn’t support backreferences. You can check the same here

https://github.com/google/re2/wiki/Syntax

The page lists backreferences as not supported.
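
This is easy to confirm with the standard library alone. In the sketch below, the helper name compiles is made up for illustration; it only reports whether regexp.Compile returns an error for a pattern.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether the standard regexp package accepts the pattern.
// The helper name is hypothetical.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	fmt.Println(compiles(`(\d)+`))   // a plain capturing group
	fmt.Println(compiles(`(\d)\1+`)) // the same group followed by a backreference

	// Print the error itself to see how the package reports the problem.
	_, err := regexp.Compile(`(\d)\1+`)
	fmt.Println(err)
}
```

The pattern with \1 fails at compile time, so the problem shows up as soon as the regex is built and not later during matching.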

However, there is another Go package that wraps libpcre++ (Perl-compatible regular expressions), and it supports backreferences.

https://github.com/glenn-brown/golang-pkg-pcre/tree/master/src/pkg/pcre

Program

So let’s see examples of backreferences in golang using this pcre package.

First Example

Let’s say we want to match the repetition of a digit. Valid inputs are

1111
88888888
444

Regex to match for the same would be

(\d)\1+

Let’s dissect this regex

  • (\d) – Matches a single digit. It is enclosed in parentheses, so it acts as a capturing group
  • \1 – A backreference to whatever the first capturing group matched, here the first digit
  • + – One or more occurrences of that backreferenced digit
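
When adding a package that wraps a C library is not an option, the digit case can also be handled with the standard library, because there are only ten digits to enumerate. This is a swapped-in technique and not a backreference: the helper repeatedDigit (a made-up name) builds the alternation 0{2,}|1{2,}|...|9{2,}, which is a sketch that covers ASCII digits only.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// repeatedDigit builds 0{2,}|1{2,}|...|9{2,}. For ASCII digits this matches
// the same runs as (\d)\1+ without needing a backreference.
func repeatedDigit() *regexp.Regexp {
	parts := make([]string, 0, 10)
	for d := '0'; d <= '9'; d++ {
		parts = append(parts, fmt.Sprintf("%c{2,}", d))
	}
	return regexp.MustCompile(strings.Join(parts, "|"))
}

func main() {
	re := repeatedDigit()
	for _, in := range []string{"1111", "88888888", "444", "123"} {
		fmt.Printf("For %s : %t\n", in, re.MatchString(in))
	}
}
```

The trick only works because the captured text comes from a small, fixed set. It does not generalize to patterns such as a repeated word.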

Program for the same

package main

import (
	"fmt"

	"github.com/glenn-brown/golang-pkg-pcre/src/pkg/pcre"
)

func main() {
	regex := pcre.MustCompile(`(\d)\1+`, 0)

	matches := regex.MatcherString("1111", 0).Matches()
	fmt.Println("For 1111 : ", matches)

	matches = regex.MatcherString("88888888", 0).Matches()
	fmt.Println("For 88888888 : ", matches)

	matches = regex.MatcherString("444", 0).Matches()
	fmt.Println("For 444 : ", matches)

	matches = regex.MatcherString("123", 0).Matches()
	fmt.Println("For 123 : ", matches)
}

Output

For 1111 :  true
For 88888888 :  true
For 444 :  true
For 123 :  false

As expected it gives a match for repetition of digits

1111
88888888
444

And it does not match for below as it is not a repetition

123

Second Example

Let’s say we want to match the repetition of a word separated by a colon. Valid inputs are

John:John
The names are Simon:Simon

Regex to match for the same would be

(\w+):\1

Let’s dissect this regex

  • (\w+) – Matches a word of one or more word characters. It is enclosed in parentheses, so it acts as a capturing group
  • \1 – A backreference to whatever the first capturing group matched, here the matched word
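
For comparison, a rough standard-library approximation is to capture the word on each side of the colon and compare the two captures in Go code. This is a sketch under stated assumptions, not an equivalent of the backreference: the names wordPair and hasRepeatedWord are made up, and the check compares whole words only.

```go
package main

import (
	"fmt"
	"regexp"
)

// wordPair captures the word on each side of a colon.
var wordPair = regexp.MustCompile(`(\w+):(\w+)`)

// hasRepeatedWord reports whether any "word:word" pair found in s has the
// same word on both sides. Hypothetical helper; it compares whole words only.
func hasRepeatedWord(s string) bool {
	for _, m := range wordPair.FindAllStringSubmatch(s, -1) {
		// m[1] is the word before the colon, m[2] the word after it.
		if m[1] == m[2] {
			return true
		}
	}
	return false
}

func main() {
	for _, in := range []string{"John:John", "The names are Simon:Simon", "John:Simon"} {
		fmt.Printf("For %s: %t\n", in, hasRepeatedWord(in))
	}
}
```

The backreference version also matches when the repeated text is only part of a longer word, which this comparison does not handle.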

Program for the same

package main

import (
	"fmt"

	"github.com/glenn-brown/golang-pkg-pcre/src/pkg/pcre"
)

func main() {
	regex := pcre.MustCompile(`(\w+):\1`, 0)

	matches := regex.MatcherString("John:John", 0).Matches()
	fmt.Println("For John:John: ", matches)

	matches = regex.MatcherString("The names are Simon:Simon", 0).Matches()
	fmt.Println("For The names are Simon:Simon: ", matches)

	matches = regex.MatcherString("John:Simon", 0).Matches()
	fmt.Println("For John:Simon: ", matches)

}

Output

For John:John:  true
For The names are Simon:Simon:  true
For John:Simon:  false

As expected it gives a match for a string that contains a substring having a repetition of a word

John:John
The names are Simon:Simon

And it does not match for below as it does not contain a repetition of a word

John:Simon

Replace Matched String

The pcre package also provides functionality to replace the matched string. Below is an example of the same.

package main

import (
	"fmt"

	"github.com/glenn-brown/golang-pkg-pcre/src/pkg/pcre"
)

func main() {
	regex := pcre.MustCompile(`(\d)\1+`, 0)

	input := "The number is 91-88888888"

	result := regex.ReplaceAll([]byte(input), []byte("redacted"), 0)
	fmt.Println("result: ", string(result))
}

Output

result:  The number is 91-redacted

In the above example, we have a regex with a backreference that matches a repetition of a digit. We then redact the repeated digits using the ReplaceAll method of the pcre package

result := regex.ReplaceAll([]byte(input), []byte("redacted"), 0)

And as expected from the output, the repetition of the digit is correctly redacted

result:  The number is 91-redacted
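
The same redaction can be sketched with the standard library by spelling out the repeated-digit pattern as an alternation and calling ReplaceAllString. This replaces the backreference with a different technique that works for ASCII digits only; the names repeatedDigit and redact are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// repeatedDigit spells out (\d)\1+ for ASCII digits as an alternation,
// since the standard regexp package has no backreferences.
var repeatedDigit = regexp.MustCompile(`0{2,}|1{2,}|2{2,}|3{2,}|4{2,}|5{2,}|6{2,}|7{2,}|8{2,}|9{2,}`)

// redact replaces every run of a repeated digit in s with "redacted".
func redact(s string) string {
	return repeatedDigit.ReplaceAllString(s, "redacted")
}

func main() {
	fmt.Println("result: ", redact("The number is 91-88888888"))
}
```

The 91 is left alone because its two digits differ, and only the run of 8s is replaced.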

Hope you have liked this tutorial. Please share the feedback in the comments
