A Look At R’s stringr Package

Image Source: https://www.rstudio.com/wp-content/uploads/2014/06/RStudio-Ball.png

 

Hi there. I have been playing around with the stringr package in the statistical programming language R. The stringr package is a neat package for working with strings (text) and for data cleaning/formatting.

On this page I showcase some experiments I have done with the stringr package functions. I go over a large portion of the functions in stringr but not all of them. (There are a few functions that I have not tested or am not familiar with.)

My reference here is the R documentation on stringr and its functions.

Note: I am not that familiar with regular expressions, but I am learning about them through this page.


Sections

The stringr Package In R

Installing and Loading stringr

A Look At The stringr Functions

 


The stringr Package In R

There are cases where you may need to manage strings/text in R. The stringr package is useful for dealing with such strings.

In R/RStudio, you can type in ??stringr and then click on stringr::stringr from the help window to find out some information on stringr. Other resources on stringr include the R documentation here (.pdf) and this webpage.

 


Installing and Loading stringr

To install the stringr package, type this code into R/RStudio:
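The standard way to install from CRAN is install.packages("stringr"). The requireNamespace() guard around it below is an optional convenience, not something stringr requires: it only downloads the package when it is missing.

```r
# Install stringr from CRAN; this only needs to be done once per R installation.
# requireNamespace() returns FALSE when the package is not installed,
# so the download only happens in that case.
if (!requireNamespace("stringr", quietly = TRUE)) {
  install.packages("stringr")
}
```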

After installing the stringr package into R, you can load the stringr package by typing in:
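Loading uses the base library() call, which has to be repeated in every new R session:

```r
# Attach stringr so its str_*() functions can be called directly.
library(stringr)
```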

Once the stringr package is loaded, the functions in the stringr package can be used.

 


A Look At The stringr Functions

There are quite a few functions in the stringr package, so I am keeping details to a minimum. Each subsection covers a specific function (or a small group of related functions) in stringr. Examples are included to show how the functions operate.

 

Length Of Strings

The length or number of characters in a string can be determined with the str_length() function from stringr. Please refer to the examples below.
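A short sketch with made-up strings; the comments give the expected values:

```r
library(stringr)

# str_length() counts the characters in each element of a character vector.
str_length("stringr")            # 7
str_length("hello world")        # 11 (the space is counted too)
str_length(c("a", "hello", NA))  # 1 5 NA (a missing string stays NA)
```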

 

Converting To Uppercase, Lowercase, Titles

Converting strings to uppercase/lowercase letters or to a title format is not difficult. The functions involved are str_to_upper(), str_to_lower() and str_to_title().
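A quick sketch of the three functions on made-up text:

```r
library(stringr)

x <- "the quick brown fox"
str_to_upper(x)              # "THE QUICK BROWN FOX"
str_to_lower("HeLLo WoRLD")  # "hello world"
str_to_title(x)              # "The Quick Brown Fox" (first letter of each word capitalized)
```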

 

Combining Multiple Strings Into A Single String

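You can combine/concatenate multiple strings into a single string with the str_c() function. According to the R documentation, the function has the format str_c(..., sep = "", collapse = NULL): sep is placed between the inputs and, if collapse is supplied, it is used to join the resulting vector into a single string.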

Here are some examples.
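A sketch of sep and collapse with made-up inputs:

```r
library(stringr)

str_c("Hello", "World")                    # "HelloWorld" (sep = "" by default)
str_c("Hello", "World", sep = " ")         # "Hello World"

# Vectors are combined element by element
str_c(c("a", "b", "c"), c("1", "2", "3"))  # "a1" "b2" "c3"

# collapse joins the elements of the result into one string
str_c(c("a", "b", "c"), collapse = "-")    # "a-b-c"
```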

 

The Number Of Matches In A String

The str_count() function counts how many times a specific pattern occurs in a string. Examples are shown below.
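A sketch with a made-up fruit vector; the last line uses a regular expression character class:

```r
library(stringr)

fruit <- c("apple", "banana", "cherry")
str_count(fruit, "a")                # 1 3 0
str_count("banana", "an")            # 2
# The pattern is a regular expression: count the vowels
str_count("hello world", "[aeiou]")  # 3
```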

 

Detecting Matches In A String

In stringr, you can detect certain matches/patterns in a string with the str_detect() function. Knowledge of regular expressions is helpful depending on what you want to detect. Outputs from str_detect() are either TRUE or FALSE (or NA for a missing string).
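A sketch with made-up strings, including two regular expression anchors:

```r
library(stringr)

fruit <- c("apple", "banana", "cherry")
str_detect(fruit, "an")           # FALSE  TRUE FALSE
str_detect(fruit, "^c")           # FALSE FALSE  TRUE ("^" anchors to the start)
str_detect(fruit, "e$")           #  TRUE FALSE FALSE ("$" anchors to the end)
str_detect(c("apple", NA), "a")   #  TRUE NA
```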

 

Duplicating & Repeating Strings

You can duplicate a string, that is, repeat it a given number of times, with the str_dup() function.
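A sketch of str_dup(); the number of repetitions can itself be a vector:

```r
library(stringr)

str_dup("ab", 3)          # "ababab"
str_dup("na", 0:2)        # ""  "na"  "nana" (0 repetitions gives an empty string)
str_dup(c("x", "yz"), 2)  # "xx" "yzyz"
```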

 

Extract A Matching Pattern From A String

The str_extract() function from the stringr package extracts the part of a string that matches a pattern. You can use a single string or a vector of strings in str_extract(), and the pattern can be a regular expression. It returns the first match in each string (NA if there is none), while str_extract_all() returns every match. More examples can be found in the stringr documentation.
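A sketch that pulls digits out of made-up strings with the regular expression "[0-9]+" (one or more digits):

```r
library(stringr)

x <- c("apple 12", "banana 345", "cherry")
str_extract(x, "[0-9]+")                # "12" "345" NA
str_extract("a1b22c333", "[0-9]+")      # "1" (first match only)
str_extract_all("a1b22c333", "[0-9]+")  # a list holding c("1", "22", "333")
```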

 

Locating The Position Of A Pattern In A String

There are times when you would rather locate the position of a pattern in a string than extract that pattern. The str_locate() function takes a string and a pattern and outputs the start and end positions of the first match of that pattern.

The output is an integer matrix with one row per input string and two columns. The first column (start) is the starting position of the match and the second column (end) is its ending position. If the pattern is a single character, the start and end positions are the same.

If the pattern is not found in a string, both the start and end positions are NA.
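A sketch with a made-up fruit vector, showing the matrix output and the NA rows:

```r
library(stringr)

fruit <- c("apple", "banana", "cherry")
loc <- str_locate(fruit, "an")
loc                        # 3 x 2 matrix with columns "start" and "end"
loc[2, ]                   # start 2, end 3: "an" first appears at characters 2-3 of "banana"
loc[1, ]                   # NA NA: "apple" contains no "an"
str_locate("banana", "b")  # start and end are both 1 for a single character
```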

 

String Matching

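You can match certain patterns in a string using the str_match() function. This function is very similar to str_extract(), but it returns a character matrix: the first column holds the full match and each further column holds one capture group (a part of the pattern wrapped in parentheses).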
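A sketch with made-up date strings and three capture groups (year, month, day):

```r
library(stringr)

dates <- c("2021-03-15", "no date here")
m <- str_match(dates, "([0-9]{4})-([0-9]{2})-([0-9]{2})")
# Column 1 is the full match; columns 2 to 4 are the three capture groups
m[1, ]  # "2021-03-15" "2021" "03" "15"
m[2, ]  # NA NA NA NA (no match in the second string)
```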

 

Ordering/Sorting Strings

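Ordering and sorting strings is a common task. The functions used here are str_order() and str_sort(). Given a vector of strings, str_order() outputs the positions (indices) of the elements in ABC order (by default), while str_sort() outputs the elements themselves in ABC order.

The usage templates from the R documentation are str_order(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...) and str_sort(x, decreasing = FALSE, na_last = TRUE, locale = "en", numeric = FALSE, ...). The locale argument picks the language whose alphabetical rules are used for the comparisons; the default "en" is English.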

The first example uses five numbers.
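A sketch with five made-up single-digit strings, showing the difference between the two functions:

```r
library(stringr)

nums <- c("3", "1", "4", "5", "2")
str_order(nums)                    # 2 5 1 3 4 (positions of "1", "2", "3", "4", "5")
str_sort(nums)                     # "1" "2" "3" "4" "5"
str_sort(nums, decreasing = TRUE)  # "5" "4" "3" "2" "1"
nums[str_order(nums)]              # same result as str_sort(nums)
```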

 

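Note: By default, str_sort() and str_order() compare numbers stored as strings character by character rather than by their numeric value, so sorting numbered strings can give surprising results. Here is an example.

The output above does not have “10” as the first string. Instead, “10” is second last, just before “1”. The strings are compared character by character: alphabetically, “10” comes after “1” (which is a prefix of it) but before “2” (because its first character “1” comes before “2”), so it lands between them. Newer versions of stringr (1.4.0 onwards) have a numeric argument; setting numeric = TRUE compares the digits by their numeric value instead.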
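The character-by-character behaviour can be reproduced with the strings "1" to "10" (my own test vector). The last line needs a stringr version that has the numeric argument:

```r
library(stringr)

x <- as.character(1:10)         # "1" "2" ... "10"
str_sort(x)                     # "1" "10" "2" "3" "4" "5" "6" "7" "8" "9"
str_sort(x, decreasing = TRUE)  # "9" "8" "7" "6" "5" "4" "3" "2" "10" "1"
str_sort(x, numeric = TRUE)     # "1" "2" ... "10" (compared as numbers)
```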

These next examples are about sorting text alphabetically.
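A sketch with a made-up vector of lowercase words:

```r
library(stringr)

fruit <- c("kiwi", "apple", "mango", "banana")
str_sort(fruit)                     # "apple" "banana" "kiwi" "mango"
str_sort(fruit, decreasing = TRUE)  # "mango" "kiwi" "banana" "apple"
str_order(fruit)                    # 2 4 1 3
```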

 

Padding Strings

Padding a string involves adding characters to the left, right or both sides of it until it reaches a given width. The str_pad() function takes the string, the target width, the side ("left" by default, "right" or "both") and the padding character (a space by default).
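A sketch of the three sides with made-up strings and pad characters:

```r
library(stringr)

str_pad("7", width = 3, pad = "0")                    # "007" (side = "left" is the default)
str_pad("abc", width = 7, side = "right", pad = "-")  # "abc----"
str_pad("abc", width = 7, side = "both", pad = "*")   # "**abc**"
str_pad("already long", width = 3)                    # "already long" (padding never shortens)
```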

 

Replacing Patterns In Strings

You can replace patterns in strings with the str_replace() and str_replace_all() functions from stringr. The user has to specify the string (or a vector of strings), the pattern to be replaced and a replacement. str_replace() replaces only the first match in each string, while str_replace_all() replaces every match.
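A sketch contrasting the two functions on made-up strings:

```r
library(stringr)

x <- "banana"
str_replace(x, "a", "o")                      # "bonana" (first match only)
str_replace_all(x, "a", "o")                  # "bonono" (every match)
str_replace_all(c("a-b-c", "1-2"), "-", "_")  # "a_b_c" "1_2"
```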

 

Splitting Strings

Given a string or a vector of strings, you can split strings with the str_split() function, which outputs a list with one character vector per input string. There is also the str_split_fixed() function, which takes the number of pieces n and outputs a character matrix instead of a list. The code below shows the R documentation template code with some of my examples.
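A sketch with made-up comma-separated strings, showing the list output of str_split() and the matrix output of str_split_fixed():

```r
library(stringr)

x <- c("a,b,c", "d,e")

# str_split() returns a list: list(c("a", "b", "c"), c("d", "e"))
parts <- str_split(x, ",")

# str_split_fixed() returns a 2 x 3 character matrix;
# the missing third piece of "d,e" is filled with ""
mat <- str_split_fixed(x, ",", n = 3)
mat
```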

 

Substrings From Strings

Suppose you have a string (or strings) and you want a subset/portion of it. The str_sub() function lets the R user extract substrings from strings. You give str_sub() the string object along with start and end positions; negative positions count back from the end of the string. (You could also use str_locate() to find positions of patterns.)
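A sketch with made-up strings, including negative positions:

```r
library(stringr)

x <- "stringr package"
str_sub(x, 1, 7)                     # "stringr"
str_sub(x, -7, -1)                   # "package" (last seven characters)
str_sub(c("apple", "banana"), 2, 4)  # "ppl" "ana"
```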

 

 

Trimming Strings

Leading and trailing whitespace in strings can be trimmed/removed with the str_trim() function. The side argument chooses whether to trim from the left, right or both sides of the string (both is the default). Whitespace between words is left alone.
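A sketch on a made-up string padded with spaces on both ends:

```r
library(stringr)

x <- "   hello world   "
str_trim(x)                  # "hello world" (side = "both" is the default)
str_trim(x, side = "left")   # "hello world   "
str_trim(x, side = "right")  # "   hello world"
```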

 

Truncating Strings

Truncate can be seen as another word for trim or subset. In stringr, the str_trunc() function takes a string, a width and a side of either "right" (the default), "left" or "center". The ellipsis = "..." argument is there by default. A string longer than width is cut down to exactly width characters, with the ellipsis counting toward that width; strings that already fit are returned unchanged.

The code below includes the template code from the R documentation and some examples.
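A sketch on a 30-character made-up sentence, truncated to a width of 20:

```r
library(stringr)

x <- "This string is moderately long"
str_trunc(x, 20)                   # "This string is mo..."
str_trunc(x, 20, side = "left")    # "...s moderately long"
str_trunc(x, 20, side = "center")  # 20 characters, with "..." in the middle
str_trunc("short", 20)             # "short" (already fits, unchanged)
```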

 

Extracting Words From A Sentence

Given a string that is a sentence, the word() function from stringr can extract words. You specify a start and an end position (end defaults to start, so a single number gives one word). If you want to extract the first two words in a sentence you would use start = 1 and end = 2. You can somewhat view each word in a sentence as an element in an array/vector, and negative positions count back from the last word.
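A sketch on a made-up sentence:

```r
library(stringr)

s <- "The quick brown fox jumps"
word(s, 1)      # "The"
word(s, 1, 2)   # "The quick"
word(s, -1)     # "jumps" (negative positions count from the end)
word(s, 2, -1)  # "quick brown fox jumps"
```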

 
