Janitor Package in R | Cleaning Data

Tamara Shatar | Jul 28, 2021

The Janitor Package in R for Cleaning and Examining Data

As a trainer, my role is to show my students better ways of working. If I am successful, they leave my class with the skills to get things done faster, more accurately and with less time spent on boring, repetitive tasks. And as someone who works with data, I am always looking for tools that will make my life easier. When you work with the R programming language, there are always new discoveries to be made among the nearly 18,000 packages created by the user community.

My latest discovery is the package janitor. It contains easy-to-use and convenient functions for cleaning and examining data. Let's take a look at some of these functions.

1. clean_names()

This function is used to change and clean up names of columns in data frames. It can be used to ensure consistency. You can choose to change all names to snake case (all lower case words, separated by underscores), variations on camel case (internal capital letters between words), title case or other styles. It can also be used to remove parts of names and any special characters, including replacing % symbols with the word percent.

To demonstrate functionality from the janitor package, a dataset was created in Excel.

The data were imported into a data frame (df) using the RStudio GUI From Excel… option.

Using the clean_names() function makes the names consistent and removes spaces and special characters from them.
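As a minimal sketch of this step, the snippet below builds a small data frame with messy column names and cleans them. The data frame and its column names are hypothetical stand-ins for the Excel import, not the original dataset.

```r
library(janitor)

# Hypothetical data with messy column names (stand-in for the Excel import).
# check.names = FALSE keeps the spaces and the % symbol in the names.
df <- data.frame(
  "Sample Date"   = c("2021-07-01", "2021-07-02"),
  "Location Name" = c("North", "South"),
  "% Moisture"    = c(12.5, 14.1),
  check.names = FALSE
)

# Snake case is the default style
cleaned <- clean_names(df)
names(cleaned)
# "sample_date" "location_name" "percent_moisture"

# Other styles are selected with the case argument
camel <- clean_names(df, case = "lower_camel")
names(camel)
```

Note that the % symbol is replaced with the word percent rather than simply dropped, so the meaning of the column survives the clean-up.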

2. remove_empty()

The dataset contains empty rows and empty columns that can be removed with the remove_empty() function.
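A short sketch of how this might look, assuming a hypothetical data frame with one entirely empty row and one entirely empty column (notes):

```r
library(janitor)

# Hypothetical data: row 2 and the notes column contain only NA values
df <- data.frame(
  date        = c("2021-07-01", NA, "2021-07-02"),
  location    = c("North", NA, "South"),
  notes       = c(NA, NA, NA),
  measurement = c(12.5, NA, 14.1)
)

# Drop rows and columns that are completely empty
trimmed <- remove_empty(df, which = c("rows", "cols"))
dim(trimmed)
```

The which argument can also be set to just "rows" or just "cols" if only one kind of clean-up is wanted.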

3. get_dupes()

This function retrieves any duplicates in the dataset so that they can be examined during data clean-up operations. The first argument accepts the name of the data frame; the second and subsequent arguments accept one or more column names. These columns are searched for duplicate values. The function returns a data frame which includes a dupe_count column containing the number of rows that share each duplicated value.

We can search for duplicated measurements on certain dates, or for duplicated measurements on certain dates at certain locations.
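Both searches can be sketched as follows. The data frame and its column names (date, location, measurement) are assumptions made for illustration.

```r
library(janitor)

# Hypothetical measurements: three share a date, two share a date and location
df <- data.frame(
  date        = c("2021-07-01", "2021-07-01", "2021-07-01", "2021-07-02"),
  location    = c("North", "North", "South", "North"),
  measurement = c(12.5, 12.9, 14.1, 13.0)
)

# Rows whose date appears more than once
dupes_by_date <- get_dupes(df, date)

# Rows whose combination of date and location appears more than once
dupes_by_date_location <- get_dupes(df, date, location)

dupes_by_date
dupes_by_date_location
```

Adding more columns narrows the definition of a duplicate, so the second call returns fewer rows than the first.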

4. tabyl()

This function is used to produce frequency tables and contingency tables, i.e. counts of each category or combination of categories of data. Unlike the base R table() function, tabyl() returns a data frame which makes results easier to work with.

Calling tabyl() on the location column creates a data frame showing the number of rows of data (n) for each location in the dataset. Also returned is a percent column, which holds the proportion of rows (a value between 0 and 1) containing data for that location.
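A minimal sketch of a one-way frequency table, assuming a hypothetical data frame with a location column:

```r
library(janitor)

# Hypothetical data: four measurements across three locations
df <- data.frame(
  location    = c("North", "North", "South", "East"),
  measurement = c(12.5, 12.9, 14.1, 13.0)
)

# One row per location, with a count (n) and a proportion (percent)
location_counts <- tabyl(df, location)
location_counts

# The result is a data frame, so ordinary data frame operations work on it
location_counts[location_counts$n > 1, ]
```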

5. adorn_

Janitor also provides adorn_ functions for formatting tabulated data. adorn_pct_formatting() can be used to format the percentage output.
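As an illustration, the sketch below formats the percent column of a one-way table. The data frame is hypothetical, and digits = 1 is just one possible choice.

```r
library(janitor)

# Hypothetical data: four measurements across three locations
df <- data.frame(
  location = c("North", "North", "South", "East")
)

location_counts <- tabyl(df, location)

# Convert the proportions to text such as "50.0%"
formatted <- adorn_pct_formatting(location_counts, digits = 1)
formatted
```

After formatting, the percent column holds character strings rather than numbers, so this step is best left until the table is ready for display.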

We can also return the number of observations for each location on each date.

By default the values in the contingency table are shown as counts. They can be converted to proportions using adorn_percentages(), and then displayed as percentages with adorn_pct_formatting().
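The contingency table and its percentage version might be produced as in the sketch below. The data frame is hypothetical, and the choice of row-wise percentages is an assumption; denominator can also be "col" or "all".

```r
library(janitor)

# Hypothetical measurements by date and location
df <- data.frame(
  date     = c("2021-07-01", "2021-07-01", "2021-07-01", "2021-07-02"),
  location = c("North", "North", "South", "North")
)

# Two-way table: one row per location, one column per date, cells are counts
counts <- tabyl(df, location, date)
counts

# Convert counts to proportions of each row total
props <- adorn_percentages(counts, denominator = "row")

# Format the proportions as percentages for display
row_pcts <- adorn_pct_formatting(props, digits = 0)
row_pcts
```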

Use janitor functions with tidyverse pipes

If you use tidyverse pipes, you can use janitor functions in your pipelines to streamline data frame clean-up.
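A pipeline along these lines could chain the steps covered above. This is a sketch under assumptions: the raw data frame and its column names are hypothetical, and dplyr is loaded only to make the %>% pipe available.

```r
library(dplyr)
library(janitor)

# Hypothetical raw import with messy names, an empty row and an empty column
raw <- data.frame(
  "Sample Date"   = c("2021-07-01", "2021-07-01", NA, "2021-07-02"),
  "Location Name" = c("North", "North", NA, "South"),
  "Notes"         = c(NA, NA, NA, NA),
  check.names = FALSE
)

summary_tbl <- raw %>%
  clean_names() %>%                           # sample_date, location_name, notes
  remove_empty(which = c("rows", "cols")) %>% # drop the empty row and column
  tabyl(location_name) %>%                    # count rows per location
  adorn_pct_formatting(digits = 1)            # show proportions as percentages

summary_tbl
```

Because each janitor function takes a data frame as its first argument and returns a data frame, the steps slot into a pipeline without any intermediate variables.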

Learn more about janitor at the CRAN site.

If you're new to R, check out our R training course and certifications. We recommend R programming basics for beginners!
