The Janitor Package in R for Cleaning and Examining Data

As a trainer, my role is to show my students better ways of working. If I am successful, they leave my class with the skills to get things done faster, more accurately and with less time spent on boring, repetitive tasks. And as someone who works with data, I am always looking for tools that will make my life easier. Working with the R programming language, there are always new discoveries to be made amongst the nearly 18,000 packages created by the user community.

My latest discovery is the package janitor. It contains easy-to-use and convenient functions for cleaning and examining data. Let's take a look at some of these functions.

1. clean_names()

This function is used to change and clean up names of columns in data frames. It can be used to ensure consistency. You can choose to change all names to snake case (all lower case words, separated by underscores), variations on camel case (internal capital letters between words), title case or other styles. It can also be used to remove parts of names and any special characters, including replacing % symbols with the word percent.

To demonstrate functionality from the janitor package, a dataset was created in Excel.

The data were imported into a data frame (df) using the RStudio GUI From Excel… option.

Applying the clean_names() function makes the names consistent: spaces are replaced with underscores and special characters are removed or replaced.
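As a minimal sketch of this step, the example below builds a small stand-in data frame rather than importing the Excel file. The column names and values are assumptions made up for illustration; they are not the columns of the original dataset.

```r
library(janitor)

# Hypothetical stand-in for the imported Excel data (column names are assumptions).
# check.names = FALSE keeps the messy names exactly as typed.
df <- data.frame(
  "Sample Date" = c("2021-03-01", "2021-03-02"),
  "Location"    = c("North", "South"),
  "Temp (C)"    = c(18.2, 19.5),
  "Humidity %"  = c(40, 55),
  check.names = FALSE
)

# Default behaviour: convert every column name to snake case
cleaned <- clean_names(df)
names(cleaned)

# Other styles are selected with the case argument, e.g. upper camel case
camel <- clean_names(df, case = "upper_camel")
names(camel)
```

With the default settings the spaces become underscores, the brackets are dropped and the % symbol is spelled out as percent.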

2. remove_empty()

The dataset contains empty rows and empty columns that can be removed with the remove_empty() function.
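A short sketch of how this might look, using a made-up data frame (the columns and values are assumptions, not the original dataset) with one entirely empty row and one entirely empty column:

```r
library(janitor)

# Hypothetical data: row 2 is entirely NA and the notes column is entirely NA
df <- data.frame(
  location = c("North", NA, "South"),
  reading  = c(1.2, NA, 3.4),
  notes    = c(NA, NA, NA)
)

# Drop rows and columns that contain nothing but NA values
trimmed <- remove_empty(df, which = c("rows", "cols"))
dim(trimmed)
```

The which argument can also be set to "rows" or "cols" alone if only one kind of clean-up is wanted.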

3. get_dupes()

This function retrieves any duplicates in the dataset so that they can be examined during data clean-up operations. The first argument accepts the name of the data frame, the second and subsequent arguments accept one or more column names. These columns are searched for duplicate values. The function returns a data frame which includes a dupe_count column containing the number of duplicates of that value.

We can search for duplicated measurements on certain dates, or for duplicated measurements on certain dates at certain locations.
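Both searches can be sketched as follows. The data frame and its column names (sample_date, location, reading) are hypothetical and stand in for the cleaned dataset.

```r
library(janitor)

# Hypothetical measurements (column names are assumptions)
df <- data.frame(
  sample_date = c("2021-03-01", "2021-03-01", "2021-03-01", "2021-03-02"),
  location    = c("North", "North", "South", "North"),
  reading     = c(18.2, 18.4, 19.5, 17.9)
)

# Rows that share a date with at least one other row
dupes_by_date <- get_dupes(df, sample_date)

# Rows that share both a date and a location with at least one other row
dupes_by_date_location <- get_dupes(df, sample_date, location)

dupes_by_date
dupes_by_date_location
```

In this made-up data, three rows share the date 2021-03-01, but only two of them were also taken at the same location, so the second call returns fewer rows.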

4. tabyl()

This function is used to produce frequency tables and contingency tables, i.e. counts of each category or combination of categories of data. Unlike the base R table() function, tabyl() returns a data frame which makes results easier to work with.

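Calling tabyl() on the location column creates a data frame showing the number of rows of data (n) for each location in the dataset. Also returned is a percent column, showing the share of rows containing data for that location. Despite its name, this column holds proportions between 0 and 1 rather than formatted percentages.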
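A minimal sketch of a one-way frequency table, assuming a hypothetical data frame with a location column:

```r
library(janitor)

# Hypothetical data (the location values are assumptions)
df <- data.frame(
  location = c("North", "North", "South", "East")
)

# One-way frequency table: one row per location, with n and percent columns
location_counts <- tabyl(df, location)
location_counts
```

Because the result is an ordinary data frame, it can be filtered, sorted or joined like any other.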

5. adorn_

Janitor also provides adorn_ functions for formatting tabulated data. adorn_pct_formatting() can be used to format the percentage output.
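For example, the proportions in a one-way table can be turned into readable percentages. This is a sketch on made-up data; the location column is an assumption.

```r
library(janitor)

# Hypothetical data (the location values are assumptions)
df <- data.frame(
  location = c("North", "North", "South", "East")
)

# Build the frequency table, then format the percent column as text
# with one decimal place and a trailing % sign
formatted <- adorn_pct_formatting(tabyl(df, location), digits = 1)
formatted
```

Note that the formatted column is character data, so this step is best left until the table is ready for display.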

We can also return the number of observations for each location on each date.
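Passing two column names to tabyl() produces such a contingency table. The sketch below uses hypothetical columns named location and sample_date.

```r
library(janitor)

# Hypothetical measurements (column names are assumptions)
df <- data.frame(
  sample_date = c("2021-03-01", "2021-03-01", "2021-03-01", "2021-03-02"),
  location    = c("North", "North", "South", "North")
)

# Two-way table: one row per location, one column per date, cells hold counts
location_by_date <- tabyl(df, location, sample_date)
location_by_date
```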

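By default the values in the contingency table are shown as counts. They can be converted to proportions of the row, column or overall total using adorn_percentages(), and then displayed as percentages with adorn_pct_formatting().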
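A sketch of that conversion on hypothetical data (the column names are assumptions). Here each count is divided by its row total, so every location's row sums to 1.

```r
library(janitor)

# Hypothetical measurements (column names are assumptions)
df <- data.frame(
  sample_date = c("2021-03-01", "2021-03-01", "2021-03-01", "2021-03-02"),
  location    = c("North", "North", "South", "North")
)

counts <- tabyl(df, location, sample_date)

# Convert counts to proportions of each row total
# (denominator can also be "col" or "all")
row_props <- adorn_percentages(counts, denominator = "row")

# Optionally format the proportions as percentage strings for display
row_pcts <- adorn_pct_formatting(row_props, digits = 0)
row_pcts
```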

Use janitor functions with tidyverse pipes

If you use tidyverse pipes, you can use janitor functions in your pipelines to streamline data frame clean-up.
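The steps covered earlier can be chained into a single pipeline. This is a sketch under stated assumptions: the raw data frame and its column names are made up, and dplyr is loaded here only to make the %>% pipe available.

```r
library(janitor)
library(dplyr)  # loaded for the %>% pipe

# Hypothetical raw data with messy names, an empty row and an empty column
raw <- data.frame(
  "Sample Date" = c("2021-03-01", NA, "2021-03-02"),
  "Location"    = c("North", NA, "South"),
  "Reading (C)" = c(18.2, NA, 19.5),
  "Notes"       = c(NA, NA, NA),
  check.names = FALSE
)

result <- raw %>%
  clean_names() %>%                           # tidy the column names
  remove_empty(which = c("rows", "cols")) %>% # drop all-NA rows and columns
  tabyl(location)                             # count rows per location

result
```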

Learn more about janitor at the CRAN site.

If you're new to R, check out our R training course and certifications. We recommend R programming basics for beginners!
