Understanding Rune in Golang

admin — Thu, 23 Jan 2020 16:25:59 +0000

Overview

rune in Go is an alias for int32, which means it is an integer value. This integer value is meant to represent a Unicode Code Point. To understand rune you have to know what Unicode is. Below is a short description, but for more background you can refer to the well-known blog post on the subject:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)

What is Unicode

Unicode is a superset of ASCII that assigns a unique number to every character it defines. This unique number is called a Unicode Code Point. For example:

Digit 0 is represented as Unicode Code Point U+0030 (decimal value 48)

Lowercase b is represented as Unicode Code Point U+0062 (decimal value 98)

The pound symbol £ is represented as Unicode Code Point U+00A3 (decimal value 163)
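The mapping listed above can be checked directly in Go. This is a minimal sketch; the helper name codePoint is made up for illustration and is not part of any library.

```go
package main

import "fmt"

// codePoint is a hypothetical helper that formats a rune
// in the usual U+XXXX notation.
func codePoint(r rune) string {
    return fmt.Sprintf("%U", r)
}

func main() {
    for _, r := range []rune{'0', 'b', '£'} {
        // %c prints the character, %d the decimal value of the code point.
        fmt.Printf("%c -> %s (decimal %d)\n", r, codePoint(r), r)
    }
}
```

For the three characters above this prints U+0030, U+0062 and U+00A3 with decimal values 48, 98 and 163.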

Visit https://en.wikipedia.org/wiki/List_of_Unicode_characters to look up the Unicode Code Point of other characters. Unicode itself doesn't say how these code points are stored in memory. This is where UTF-8 comes into the picture.

UTF-8

UTF-8 stores every Unicode Code Point using 1, 2, 3 or 4 bytes. ASCII code points are stored using 1 byte. Go source code is encoded in UTF-8, so string literals are UTF-8 encoded as well. A rune, by contrast, holds the code point itself rather than its encoded bytes. The largest code point is U+10FFFF, which needs 21 bits, so a 32-bit integer is wide enough to hold any of them. That is why rune is an alias for int32.
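To see the variable-length encoding in practice, the sketch below asks the standard unicode/utf8 package how many bytes each of four sample characters needs. The sample characters and the helper name encodedLen are chosen here for illustration only.

```go
package main

import (
    "fmt"
    "unicode/utf8"
)

// encodedLen is a hypothetical helper: it reports how many bytes
// UTF-8 needs to encode the rune r.
func encodedLen(r rune) int {
    return utf8.RuneLen(r)
}

func main() {
    // One sample per encoded length: 1, 2, 3 and 4 bytes.
    samples := []rune{'b', '£', '\u20AC', '\U0001F600'}
    for _, r := range samples {
        // "% x" prints the encoded bytes in hex, separated by spaces.
        fmt.Printf("%U needs %d byte(s): % x\n", r, encodedLen(r), string(r))
    }
}
```

The rune value is always a 4-byte int32 in memory; only the UTF-8 form inside a string varies in length.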

Every rune is intended to refer to one Unicode Code Point. For example, if you convert a string to a rune slice and print it with %U, it will print the Unicode Code Point of each character. For the string "0b£" below, the output will be [U+0030 U+0062 U+00A3]

fmt.Printf("%U\n", []rune("0b£"))

When to Use

You should use a rune when the value is meant to hold a Unicode Code Point. A rune slice should be used when every element is meant to be a Unicode Code Point.

A rune is also used to represent a single character.
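One practical case where runes matter is counting or walking the characters of a string. The following is a rough sketch, with a made-up helper named sizes, that contrasts the byte length of a string with its rune count and then iterates with range, which decodes one rune per step.

```go
package main

import (
    "fmt"
    "unicode/utf8"
)

// sizes is a hypothetical helper returning the byte length
// and the number of code points in s.
func sizes(s string) (nBytes, nRunes int) {
    return len(s), utf8.RuneCountInString(s)
}

func main() {
    s := "0b£"

    nBytes, nRunes := sizes(s)
    // '£' takes two bytes in UTF-8, so the byte count is larger.
    fmt.Println("bytes:", nBytes) // 4
    fmt.Println("runes:", nRunes) // 3

    // range yields the byte offset and the decoded rune.
    for i, r := range s {
        fmt.Printf("offset %d: %c (%U)\n", i, r, r)
    }
}
```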

Declare Rune

A rune is declared using a character between single quotes. The line below declares a variable named rPound

rPound := '£'

After declaring a rune you can also do the following

Print Type – Output will be int32

fmt.Printf("Type: %s\n", reflect.TypeOf(rPound))

Print Unicode Code Point – Output will be U+00A3

fmt.Printf("Unicode CodePoint: %U\n", rPound)

Print Character – Output will be £

fmt.Printf("Character: %c\n", rPound)

Code:

Below is the code illustrating each point we discussed

package main

import (
    "fmt"
    "reflect"
    "unsafe"
)

func main() {
    r := 'a'
    
    //Print Size
    fmt.Printf("Size: %d\n", unsafe.Sizeof(r))
    
    //Print Type
    fmt.Printf("Type: %s\n", reflect.TypeOf(r))
    
    //Print Code Point
    fmt.Printf("Unicode CodePoint: %U\n", r)
    
    //Print Character
    fmt.Printf("Character: %c\n", r)
    s := "0b£"
    
    //This will print the Unicode Points
    fmt.Printf("%U\n", []rune(s))
    
    //This will print the decimal value of each Unicode Code Point
    fmt.Println([]rune(s))
}

Output:

Size: 4
Type: int32
Unicode CodePoint: U+0061
Character: a
[U+0030 U+0062 U+00A3]
[48 98 163]

Rune array to string and vice versa

Rune array to string

package main

import "fmt"

func main() {
    runeArray := []rune{'a', 'b', '£'}
    s := string(runeArray)
    fmt.Println(s)
}

Output:

ab£

String to Rune Array

package main

import "fmt"

func main() {
    s := "ab£"
    r := []rune(s)
    fmt.Printf("%U\n", r)
}

Output:

[U+0061 U+0062 U+00A3]
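A common reason to convert in both directions is to rearrange characters by position. As an illustration only, the hypothetical reverse function below converts to a rune slice, swaps elements, and converts back. It works per code point, so it does not account for combining marks that span several code points.

```go
package main

import "fmt"

// reverse is a hypothetical helper that reverses s code point by code point.
func reverse(s string) string {
    r := []rune(s) // string to rune slice
    for i, j := 0, len(r)-1; i < j; i, j = i+1, j-1 {
        r[i], r[j] = r[j], r[i]
    }
    return string(r) // rune slice back to string
}

func main() {
    fmt.Println(reverse("ab£")) // £ba
}
```

Reversing the raw bytes instead would split the two-byte encoding of £ and produce invalid UTF-8.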

The post Understanding Rune in Golang appeared first on Welcome To Golang By Example.

Character in Go (Golang)

admin — Mon, 06 Jan 2020 16:52:28 +0000

Overview

Go does not have a char data type. Instead:

byte is used to represent an ASCII character. byte is an alias for uint8, so it is 8 bits (1 byte) wide and holds values from 0 to 255, which covers all ASCII characters (0 to 127)

rune is used to represent any Unicode character. rune is an alias for int32 and is 4 bytes in size, which is wide enough for every Unicode Code Point.

A string of length one can also be used to represent a character implicitly. The number of bytes in a one-character string depends on the encoding of that character. In UTF-8 it is between 1 and 4 bytes

To declare either a byte or a rune we use single quotes. When declaring a byte we have to specify the type. If we don't specify the type, the default type is rune.

To declare a string, we use double quotes or backquotes. A double-quoted string honors escape sequences, while a backquoted string is a raw string literal and doesn't process any kind of escaping.
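The declaration rules above can be sketched as follows. The helper typeName is made up for this example; everything else is plain Go syntax.

```go
package main

import "fmt"

// typeName is a hypothetical helper returning the dynamic type of v as text.
func typeName(v interface{}) string {
    return fmt.Sprintf("%T", v)
}

func main() {
    var b byte = 'a' // explicit type gives a byte
    r := 'a'         // no type given, so this is a rune
    fmt.Println(typeName(b), typeName(r)) // uint8 int32

    interpreted := "a\tb" // \t is turned into a single tab character
    raw := `a\tb`         // backslash and t are kept as two characters
    fmt.Println(len(interpreted), len(raw)) // 3 4
}
```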

Code Example

See the program below. It shows

A byte representing the character ‘a‘

A rune representing the pound sign ‘£‘

A string having one character micro sign ‘µ’

package main

import (
    "fmt"
    "reflect"
    "unsafe"
)

func main() {
    //The type must be given explicitly; without it 'a' would default to rune
    var b byte = 'a'
    
    fmt.Println("Printing Byte:")
    //Print Size, Type and Character
    fmt.Printf("Size: %d\nType: %s\nCharacter: %c\n", unsafe.Sizeof(b), reflect.TypeOf(b), b)
    
    r := '£'
    
    fmt.Println("\nPrinting Rune:")
    //Print Size, Type, CodePoint and Character
    fmt.Printf("Size: %d\nType: %s\nUnicode CodePoint: %U\nCharacter: %c\n", unsafe.Sizeof(r), reflect.TypeOf(r), r, r)

    s := "µ" //Micro sign
    fmt.Println("\nPrinting String:")
    //unsafe.Sizeof reports the size of the string header (16 bytes on 64-bit platforms), not the length of the text
    fmt.Printf("Size: %d\nType: %s\nCharacter: %s\n", unsafe.Sizeof(s), reflect.TypeOf(s), s)
}

Output:

Printing Byte:
Size: 1
Type: uint8
Character: a

Printing Rune:
Size: 4
Type: int32
Unicode CodePoint: U+00A3
Character: £

Printing String:
Size: 16
Type: string
Character: µ

Caveats

Declaring a byte with a character whose code point is greater than 255 raises a compile error like the one below (the exact wording can differ between compiler versions). I tried with a character whose code point is 285

constant 285 overflows byte

Only a single character can be placed between single quotes when initializing a byte or a rune. Putting two characters between the single quotes produces the compile error below

invalid character literal (more than one character)
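The first caveat applies only to constants, which the compiler range-checks. As a rough sketch, the code below shows a non-ASCII character that still fits in a byte, and what happens when a rune variable holding 285 is converted at run time. The character ĝ (U+011D, decimal 285) and the helper toByte are picked here for illustration; the original text does not say which character was used.

```go
package main

import "fmt"

// toByte is a hypothetical helper. Converting a rune variable is not a
// constant expression, so the compiler accepts it and only the low
// 8 bits of the value are kept.
func toByte(r rune) byte {
    return byte(r)
}

func main() {
    var b byte = '£' // compiles: code point 163 fits in a byte
    fmt.Println(b)   // 163

    r := 'ĝ'               // code point 285, too large for a byte constant
    fmt.Println(r)         // 285
    fmt.Println(toByte(r)) // 29, because 285 - 256 = 29
}
```

The silent truncation in toByte is a reason to keep such characters in a rune rather than forcing them into a byte.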

The post Character in Go (Golang) appeared first on Welcome To Golang By Example.
