1. Introduction
Data is everywhere. It is both an important input and an important outcome in many fields and organizations, and it comes in many forms. It is no surprise that the majority of the data collected and reported is unstructured, so it becomes imperative to use it to our advantage by extracting information and generating insights from it. Most unstructured data is in the form of text, which makes working with strings an essential skill. In R, several packages make the process of dealing with strings efficient and smooth.

2. Libraries
To work with strings, we use a few basic packages that make string manipulation in R much easier, the main one being stringr.

3. Stringr basics
To better understand how the stringr package works, let us first perform some basic operations on a string. A string stored in a variable can be printed in different ways: print() shows it as an R string literal, with surrounding quotes and escape sequences, while writeLines() prints the raw text that the string contains. Special characters are written with escape sequences, such as "\n" for a new line, and special non-English characters can be encoded separately, for example with a Unicode escape such as "\u00e9".

4. Stringr functions
The stringr package has many useful functions for working with different aspects of a string. All of them start with the prefix str_, which makes them easy to remember.
1. str_length: calculates the number of characters in each string.
2. str_c: combines several strings into one and lets us choose the separator placed between them.
3. str_sub: extracts a specific part of a string. We give the start and end positions of the characters we want (negative positions count from the end), much as we use indexing to extract specific rows and columns.
4. str_to_lower and str_to_upper: convert strings to lower case and to upper case. Changing case is more complicated than it might first appear, because different languages have different rules for it, so these functions accept a locale argument that applies the rules of a given language. For example, with locale = "tr" (Turkish), the upper-case form of "i" is "İ", a capital I with a dot on top, because Turkish distinguishes a dotted and a dotless i. In the same way, we can pick the locale of another language to perform case operations.
5. str_sort: sorts strings in alphabetical order. Strings are compared by their first letter and, when the first letters are the same, by their second letter, and so on. For example, in the English locale "gate" comes before "go": both start with "g", and the second letter of "gate", "a", comes before the second letter of "go", "o". str_sort also takes a locale argument, because alphabetical order can differ between languages.
6. str_view: searches strings for a pattern and shows where the pattern matches.
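The basics and the six functions described above can be tried together in one short script. This is a minimal sketch, assuming stringr 1.5.0 or later is installed; the example strings (such as fruits, "gate", and "go") are made up for illustration. In recent versions str_view() prints its result in the console with matches wrapped in angle brackets, while older versions rendered an HTML widget, so its output is not shown in the comments.

```r
library(stringr)

# print() shows the string literal (quotes and escapes); writeLines() shows raw text
print("Line one\nLine two")       # [1] "Line one\nLine two"
writeLines("Line one\nLine two")  # printed on two separate lines

# Escape sequences: \t is a tab, \u00e9 is the Unicode code for "é"
writeLines("Tab:\there")
writeLines("\u00e9")

# 1. str_length(): number of characters in each string (NA stays NA)
str_length(c("R", "stringr", NA))             # 1 7 NA

# 2. str_c(): sep goes between arguments, collapse joins a vector into one string
str_c("data", "science", sep = "-")           # "data-science"
str_c(c("a", "b", "c"), collapse = ", ")      # "a, b, c"

# 3. str_sub(): extract characters by start and end position
fruits <- c("Apple", "Banana", "Pear")
str_sub(fruits, 1, 3)                         # "App" "Ban" "Pea"
str_sub(fruits, -3, -1)                       # "ple" "ana" "ear" (counted from the end)

# 4. Case conversion; the locale argument selects language-specific rules
str_to_upper("stringr")                       # "STRINGR"
str_to_lower("STRINGR")                       # "stringr"
str_to_upper("i", locale = "en")              # "I"
str_to_upper("i", locale = "tr")              # "İ" (dotted capital I in Turkish)

# 5. str_sort(): "gate" and "go" tie on "g", so the second letters decide (a < o)
str_sort(c("go", "gate", "apple"), locale = "en")  # "apple" "gate" "go"

# 6. str_view(): display where the pattern "an" matches in each string
str_view(fruits, "an")
```

Passing the locale explicitly, even for English, makes the results reproducible across machines whose default settings differ.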