Import data into RStudio

Do you want to learn how to load datasets into RStudio?

RStudio lets you import data from other data analysis tools such as SAS, Stata, SPSS and Excel, as well as from plain text and comma-separated values (CSV) files.

All of this can be done in plain R, but RStudio's import dialogs remove the need to write the code yourself. On top of that, you can fine-tune the import, for example by changing column types or stripping comments from the input file, before the data is loaded.

If you don't have RStudio on your system yet, consider reading our guide on installing RStudio on Ubuntu.

Which file types can you import into RStudio?

Here's a list of the formats RStudio can import data from:

  • CSV
  • XLS
  • XLSX
  • SAV
  • DTA
  • POR
  • SAS
  • STATA

How to import datasets into RStudio

In the main RStudio window, find the Environment tab and click Import Dataset to begin importing data. The importers are grouped into three categories: text data, Excel data and statistical data.

Right next to Import Dataset there is also a broom icon, which you can use to remove every object from your working environment.
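
For reference, clicking the broom does roughly what the base R call below does. This is only a sketch: the two object names are made up for illustration.

```r
# Create two throwaway objects (hypothetical names, for illustration only)
demo_numbers <- 1:10
demo_label <- "hello"

# Remove every visible object from the current workspace,
# which is roughly what the broom icon does
rm(list = ls())

# List what is left in the workspace
print(ls())
```

Like the broom, this cannot be undone, so save anything you still need first.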

Import data from CSV files into RStudio

To import a CSV file, select "From Text (readr)" from the Import Dataset drop-down menu.

The first time you use this feature, you may be prompted to install readr. Just click OK and wait a few minutes for the automatic installation to complete.

This option uses the readr package to read comma-separated data or, more generally, any character-separated data. The text importer can also process the data to suit your needs, with support for:

  • Import from the file system or a URL
  • Change column data types
  • Skip or include only selected columns
  • Rename the data set
  • Skip the first N rows
  • Use the header row for column names
  • Trim spaces in names
  • Change the column delimiter
  • Select the encoding
  • Select quote, escape, comment and NA identifiers

To illustrate the import feature, we can use the "Population, surface area and density" dataset from United Nations data at http://data.un.org/_Docs/SYB/CSV/SYB62_1_201907_Population,%20Surface%20Area%20and%20Density.csv.

Head back to the Import Text Data dialog in RStudio and paste the CSV URL above into the File/URL field. Before actually importing the data, you can click Update to get a preview of it.

Here you have a few options for further processing. You can change the name of the dataset to a more memorable one, and you can mark the first row as the header of the file.

By default, the delimiter is a comma. You can change this by selecting another character from the Delimiter drop-down menu.

Feel free to explore the other advanced options for fine-tuning the data before clicking Import to begin the import.

CSV data imported to RStudio
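
If you are curious what these dialog options look like in code, here is a minimal sketch of a readr call that uses a few of them. The file contents, column names and dataset name are invented for illustration, and the code RStudio generates for your own file will differ.

```r
library(readr)

# Write a small CSV to a temporary file so the example is self-contained.
# The title line, column names and values are invented for illustration.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Exported on 2020-10-05",
  "region,year,population",
  "North,2019,1200",
  "South,2019,..."
), csv_path)

population <- read_csv(
  csv_path,
  skip = 1,                 # skip the title line above the header row
  na = c("", "NA", "..."),  # strings to treat as missing values
  col_types = cols(
    region = col_character(),
    year = col_integer(),
    population = col_double()
  )
)

print(population)
```

read_csv() also accepts a URL in place of a local path, and read_delim() takes a delim argument when the separator is not a comma.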

Import data from TXT files into RStudio

Data from TXT files can be imported into RStudio using base R, which gives you maximum compatibility with previous versions of RStudio.

To import a text file, select the "From Text (base)" option from the Import Dataset drop-down.

RStudio automatically scans the file for the likely encoding, rows, quote characters and separator, and shows you a preview of the data frame created from the input file.

RStudio also supports removing comments from the input file. To do that, select the character that marks a comment from the Comment drop-down menu. In most cases, this is the number sign (#).

Removing comments while importing txt files

You can change these settings in the panel on the left until you are satisfied with the output. Finally, click Import to begin the import.
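
The same kind of import can be sketched with base R's read.table(). The file below is invented for illustration; the separator and comment character are assumptions you would adjust to match your own file.

```r
# Write a small text file containing comment lines (invented data)
txt_path <- tempfile(fileext = ".txt")
writeLines(c(
  "# measurements taken in the lab",
  "id;height;weight",
  "1;172;68",
  "# second batch",
  "2;165;59"
), txt_path)

measurements <- read.table(
  txt_path,
  header = TRUE,            # first non-comment line holds the column names
  sep = ";",                # column separator used in this file
  comment.char = "#",       # ignore everything after a # on a line
  stringsAsFactors = FALSE  # keep text columns as character vectors
)

str(measurements)
```

Because the comment lines are dropped before parsing, the result has only the two data rows.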

Import data from Excel files into RStudio

Excel import in RStudio is provided by the readxl package. The Excel importer supports:

  • Import from local storage or a URL
  • Change column data types
  • Skip columns
  • Rename the data set
  • Select a specific Excel sheet to import from
  • Skip the first few rows
  • Detect NA identifiers

We are going to use the National School Lunch Assistance Program data from the Food and Nutrition Service as the demo dataset.

First, select "From Excel" in the Import Dataset drop-down.

The first time you use this feature, you may be prompted to install readxl. Just click OK and wait a few minutes for the automatic installation to complete.

In the File/URL field, enter https://fns-prod.azureedge.net/sites/default/files/resource-files/09sbmeals-7.xls. The Browse button now changes to Update; clicking it downloads the Excel file and shows its contents in the Data Preview panel.

You can see that our input file has a few unnecessary rows. We are going to remove the first 2 rows from the input by entering 2 in the Skip field.

At this point, the data is looking good. However, columns that contain numbers are being detected as "character", meaning they will be treated as strings.

To change that, click the header of each column to open its drop-down menu and select Numeric. If you want to drop a column, select Skip here instead of Include.

We also need to mark "--" as the string for empty values by entering "--" (without the quotes) into the NA field.

The final step is to click Import to run the code and import the data into RStudio. The data should now load cleanly.
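
The dialog settings used above map onto readxl roughly as follows. This is a sketch, not the exact code RStudio generates: the helper name read_lunch_data is made up, and the per-column Numeric choices are left out because they depend on the layout of the workbook.

```r
library(readxl)

# Hypothetical helper that applies the settings chosen in the dialog
read_lunch_data <- function(path) {
  read_excel(
    path,
    skip = 2,   # drop the two unnecessary rows at the top
    na = "--"   # treat "--" as a missing value
  )
}

# read_excel() reads local files, so the workbook is downloaded first.
# This part needs network access, so it only runs in an interactive session.
if (interactive()) {
  xls_url <- "https://fns-prod.azureedge.net/sites/default/files/resource-files/09sbmeals-7.xls"
  xls_path <- tempfile(fileext = ".xls")
  download.file(xls_url, xls_path, mode = "wb")
  lunch <- read_lunch_data(xls_path)
  print(head(lunch))
}
```

To force column types in code, read_excel() also has a col_types argument that takes values such as "numeric", "text" or "skip".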

Import data from other programs such as SPSS, SAS and Stata into RStudio

RStudio supports importing SPSS, SAS and Stata files, covering the sav, dta, por, sas and stata formats. A data file can be accompanied by a model file.

Importing data from other data analysis applications is done with the help of the haven package.

The first time you use this feature, you may be prompted to install haven. Just click OK and wait a few minutes for the automatic installation to complete.

To perform an import, select the appropriate option in the Import Dataset drop-down menu.

After that, either browse to your input file or enter its URL in the File/URL field.

In this example, we use a SAV file from https://github.com/rstudio/webinars/raw/master/23-Importing-Data-into-R/data/Child_Data.sav as the demo dataset.

Once the data has loaded and shows up in Data Preview, review it to make sure there are no problems. You can change the dataset name or select another format before clicking Import to run the code shown under Code Preview and perform the import.
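
To get a feel for what haven does behind the dialog, here is a small self-contained sketch. It writes an invented data frame to a temporary SAV file and reads it back, so it does not depend on the demo file being reachable.

```r
library(haven)

# Invented data, written out as an SPSS file so there is something to import
demo <- data.frame(id = 1:3, score = c(4.5, 3.0, 5.0))
sav_path <- tempfile(fileext = ".sav")
write_sav(demo, sav_path)

# Read the SPSS file back into R
imported <- read_sav(sav_path)

print(imported)
```

For your own data, pass the path to your file to read_sav(). haven provides read_dta(), read_por() and read_sas() for the other formats.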

FAQ

Which formats are supported by RStudio?

Currently RStudio supports the txt, csv, xls, xlsx, sav, dta, por, sas and stata formats. A data file can be accompanied by a model file.

Does RStudio support pre-processing input data?

Yes. RStudio offers a handful of data manipulation options that you can use without writing a single line of code.

Can I import datasets by writing R code instead of using RStudio?

Yes, but you have to consult the documentation to fine-tune the import and remove unnecessary details yourself. Using RStudio also eliminates the need to write the same code again and again every time you need to process some data.
