Cross-Platform File System Operations Based on ‘libuv’

Jim Hester

2018-03-05

fs


fs provides a cross-platform, uniform interface to file system operations. It shares the same back-end component as Node.js, the libuv C library, which brings the benefit of extensive real-world use and rigorous cross-platform testing. The name and parts of the interface are inspired by Rust’s fs module.

Installation

You can install the released version of fs from CRAN with:

install.packages("fs")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("r-lib/fs")

Comparison vs base equivalents

fs functions smooth over some of the idiosyncrasies of file handling with base R functions:
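As a rough illustration, assuming fs is installed, the sketch below contrasts two base R habits with fs. Base R's file.create() and file.remove() report outcomes as logical flags, and file.remove() only warns when it fails. fs's file_create() instead returns the path it created, and file_delete() signals an error when it cannot delete. The variable names are arbitrary.

```r
library(fs)

# Base R: file.create() reports success as a logical flag, not a path.
base_file <- tempfile()
created <- file.create(base_file)
created
#> [1] TRUE

# fs: file_create() returns the path it created (invisibly), so it can be reused.
fs_file <- file_create(file_temp())
file_exists(fs_file)

# Base R: removing a missing file only warns and returns FALSE.
removed <- suppressWarnings(file.remove(tempfile()))
removed
#> [1] FALSE

# fs: the same failure is signalled as an R error.
outcome <- tryCatch(file_delete(file_temp()), error = function(e) "error")
outcome
#> [1] "error"

# Clean up.
file_delete(fs_file)
unlink(base_file)
```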

Tidy paths

fs functions always return ‘tidy’ paths. Tidy paths:

  • Always use / to delimit directories
  • Never have multiple / or trailing /

Tidy paths are also coloured (if your terminal supports it) based on the file permissions and file type. This colouring can be customised or extended by setting the LS_COLORS environment variable, in the same output format as GNU dircolors.
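The two rules can be approximated in a few lines of base R. The helper tidy_path() below is hypothetical and only meant to make the rules concrete; it is not how fs implements tidying, and in real code fs::path_tidy() should be preferred.

```r
# Hypothetical helper: a base-R approximation of the tidy-path rules.
# For illustration only; this is not fs's implementation.
tidy_path <- function(x) {
  x <- gsub("\\", "/", x, fixed = TRUE)  # always use `/` as the delimiter
  x <- gsub("/+", "/", x)                # collapse repeated `/`
  # drop a trailing `/`, but keep a bare root "/" intact
  ifelse(x == "/", x, sub("/$", "", x))
}

# Base R's file.path() just pastes components together with "/",
# so it can produce exactly the untidy forms described above.
untidy <- c(file.path("data/", "raw"), "data\\raw\\", "data//raw//", "/")
untidy

# The first three all become "data/raw"; the root "/" is left as is.
tidy_path(untidy)
```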

Usage

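fs functions are divided into four main categories:

  • path_ for manipulating and constructing paths
  • file_ for files
  • dir_ for directories
  • link_ for links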

Directories and links are special types of files, so file_ functions will generally also work when applied to a directory or link.
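A minimal sketch of that point, assuming fs is installed: a directory passes the generic file_exists() and file_info() checks, while is_dir() and is_file() still tell directories and regular files apart.

```r
library(fs)

d <- dir_create(file_temp())

# Generic file_ functions also accept a directory.
exists_as_file <- file_exists(d)
entry_type <- as.character(file_info(d)$type)  # "directory"

# The more specific predicates still distinguish the two.
dir_check <- is_dir(d)    # TRUE
file_check <- is_file(d)  # FALSE

dir_delete(d)
```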

library(fs)

# list files in the current directory
dir_ls()
#> DESCRIPTION      LICENSE.md       NAMESPACE        NEWS.md          
#> R                README.Rmd       README.md        _pkgdown.yml     
#> appveyor.yml     codecov.yml      cran-comments.md docs             
#> fs.Rproj         inst             man              man-roxygen      
#> src              tests            vignettes

# create a new directory
tmp <- dir_create(file_temp())
tmp
#> /tmp/RtmpMLBzLW/file7c763d3ffc67

# create new files in that directory
file_create(path(tmp, "my-file.txt"))
dir_ls(tmp)
#> /tmp/RtmpMLBzLW/file7c763d3ffc67/my-file.txt

# remove files from the directory
file_delete(path(tmp, "my-file.txt"))
dir_ls(tmp)
#> character(0)

# remove the directory
dir_delete(tmp)
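The example above only uses path() to build paths. fs also has functions for taking paths apart; the sketch below assumes fs's path_file(), path_dir(), path_ext() and path_ext_set(), which work on the path strings alone and do not touch the disk.

```r
library(fs)

# path() joins components, plus an optional extension, into a tidy path.
p <- path("data", "raw", "iris", ext = "csv")
p  # data/raw/iris.csv

# Take the path apart again.
path_file(p)  # the file name, iris.csv
path_dir(p)   # the directory part, data/raw
path_ext(p)   # the extension, csv

# Swap the extension.
path_ext_set(p, "tsv")  # data/raw/iris.tsv
```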

fs is designed to work well with the pipe, though because it is a minimal-dependency infrastructure package it doesn’t provide the pipe itself. You will need to attach magrittr or similar.

library(magrittr)

paths <- file_temp() %>%
  dir_create() %>%
  path(letters[1:5]) %>%
  file_create()
paths
#> /tmp/RtmpMLBzLW/file7c762a71251a/a /tmp/RtmpMLBzLW/file7c762a71251a/b 
#> /tmp/RtmpMLBzLW/file7c762a71251a/c /tmp/RtmpMLBzLW/file7c762a71251a/d 
#> /tmp/RtmpMLBzLW/file7c762a71251a/e

paths %>% file_delete()
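Because fs functions return the paths they operate on, the pipe is a convenience rather than a requirement. A sketch of the same pipeline written as nested calls, without magrittr, assuming only fs:

```r
library(fs)

# Same steps as the pipeline above, read from the inside out:
# make a temporary directory, build five paths inside it, create the files.
paths <- file_create(path(dir_create(file_temp()), letters[1:5]))
all(file_exists(paths))
#> [1] TRUE

# Remove the temporary directory together with its files.
dir_delete(path_dir(paths[1]))
```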

fs functions also work well in conjunction with other tidyverse packages, like dplyr and purrr.

Some examples…

suppressMessages(library(tidyverse))

Filter files by type, permission and size

dir_info("src", recursive = FALSE) %>%
  filter(type == "file", permissions == "u+r", size > "10KB") %>%
  arrange(desc(size)) %>%
  select(path, permissions, size, modification_time)
#> # A tibble: 9 x 4
#>   path                permissions        size modification_time  
#>   <fs::path>          <fs::perms> <fs::bytes> <dttm>             
#> 1 src/RcppExports.o   rw-rw-r--         1.72M 2018-03-05 09:13:43
#> 2 src/dir.o           rw-rw-r--         1.02M 2018-03-05 09:13:37
#> 3 src/id.o            rw-rw-r--       707.09K 2018-03-05 09:13:23
#> 4 src/file.o          rw-rw-r--       573.33K 2018-03-05 09:13:25
#> 5 src/path.o          rw-rw-r--       476.63K 2018-03-05 09:13:31
#> 6 src/link.o          rw-rw-r--        415.2K 2018-03-05 09:13:28
#> 7 src/utils.o         rw-rw-r--       392.21K 2018-03-05 09:13:33
#> 8 src/error.o         rw-rw-r--        35.45K 2018-03-05 09:13:37
#> 9 src/RcppExports.cpp rw-rw-r--        11.01K 2018-03-05 09:08:25
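The size > "10KB" comparison works because dir_info() reports sizes as fs_bytes values, and comparing an fs_bytes value with a string converts the string to bytes first. A small sketch, assuming fs's as_fs_bytes() and its 1024-based units (so "10KB" is 10240 bytes):

```r
library(fs)

# Sizes as dir_info() would report them: byte counts with a readable print method.
sizes <- as_fs_bytes(c(512, 20480, 2097152))
sizes

# The string is converted to bytes before comparing,
# so each size is compared against 10240 bytes.
big <- sizes > "10KB"
big

as.numeric(as_fs_bytes("10KB"))
```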

Tabulate and display folder size.

dir_info("src", recursive = TRUE) %>%
  group_by(directory = path_dir(path)) %>%
  tally(wt = size, sort = TRUE)
#> # A tibble: 53 x 2
#>    directory                                        n
#>    <fs::path>                             <fs::bytes>
#>  1 src                                          5.33M
#>  2 src/libuv                                    2.47M
#>  3 src/libuv/test                             869.22K
#>  4 src/libuv/src/win                          683.14K
#>  5 src/libuv/src/unix                         518.71K
#>  6 src/unix                                   429.67K
#>  7 src/libuv/docs/src/static                  332.05K
#>  8 src/libuv/m4                               319.95K
#>  9 src/libuv/include                          192.33K
#> 10 src/libuv/docs/src/static/diagrams.key     191.74K
#> # ... with 43 more rows
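For readers less familiar with dplyr, tally(wt = size, sort = TRUE) sums size within each directory and sorts the totals in decreasing order. The sketch below does the same kind of summation in base R on a small temporary tree of files of known size. It counts only regular files, whereas dir_info() lists directories as well as files; the names root, sizes and by_dir are arbitrary.

```r
# Build a small tree with files of known size (base R only).
root <- tempfile("sizes-")
dir.create(file.path(root, "sub"), recursive = TRUE)
writeBin(raw(2048), file.path(root, "a.bin"))
writeBin(raw(1024), file.path(root, "b.bin"))
writeBin(raw(4096), file.path(root, "sub", "c.bin"))

# List regular files recursively and look up their sizes in bytes.
files <- list.files(root, recursive = TRUE, full.names = TRUE)
sizes <- file.size(files)

# Sum sizes per containing directory, largest total first.
by_dir <- sort(vapply(split(sizes, dirname(files)), sum, numeric(1)),
               decreasing = TRUE)
by_dir

unlink(root, recursive = TRUE)
```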

Read a collection of files into one data frame.

dir_ls() returns a named vector, so it can be passed directly to purrr::map_df(), which uses those names for its .id column.

# Create separate files for each species
iris %>%
  split(.$Species) %>%
  map(select, -Species) %>%
  iwalk(~ write_tsv(.x, paste0(.y, ".tsv")))

# Show the files
iris_files <- dir_ls(glob = "*.tsv")
iris_files
#> setosa.tsv     versicolor.tsv virginica.tsv

# Read the data into a single table, including the filenames
iris_files %>%
  map_df(read_tsv, .id = "file", col_types = cols(), n_max = 2)
#> # A tibble: 6 x 5
#>   file           Sepal.Length Sepal.Width Petal.Length Petal.Width
#>   <chr>                 <dbl>       <dbl>        <dbl>       <dbl>
#> 1 setosa.tsv             5.10        3.50         1.40       0.200
#> 2 setosa.tsv             4.90        3.00         1.40       0.200
#> 3 versicolor.tsv         7.00        3.20         4.70       1.40 
#> 4 versicolor.tsv         6.40        3.20         4.50       1.50 
#> 5 virginica.tsv          6.30        3.30         6.00       2.50 
#> 6 virginica.tsv          5.80        2.70         5.10       1.90

file_delete(iris_files)
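The named-vector idea is not specific to fs or purrr. As a base-R sketch of what map_df(.id = "file") does with those names, the block below uses list.files() in place of dir_ls() and lapply() plus rbind() in place of map_df(), writing to a temporary directory; tsv_dir, files, tables and combined are arbitrary names.

```r
# Write one TSV per species into a temporary directory (base R only).
tsv_dir <- tempfile("iris-")
dir.create(tsv_dir)
for (sp in levels(iris$Species)) {
  rows <- iris[iris$Species == sp, names(iris) != "Species"]
  write.table(rows, file.path(tsv_dir, paste0(sp, ".tsv")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

# A vector of paths named by file name; the names play the role of `.id`.
files <- list.files(tsv_dir, pattern = "\\.tsv$", full.names = TRUE)
names(files) <- basename(files)

# Read the first two rows of each file and record which file each row came from.
tables <- lapply(names(files), function(nm) {
  data.frame(file = nm, read.delim(files[[nm]], nrows = 2),
             stringsAsFactors = FALSE)
})
combined <- do.call(rbind, tables)
combined

unlink(tsv_dir, recursive = TRUE)
```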

Feedback wanted!

We hope fs is a useful tool for both analysis scripts and packages. Please open GitHub issues for any feature requests or bugs.

In particular, we have found non-ASCII filenames in non-English locales on Windows to be especially tricky to reproduce and handle correctly. Feedback from users who commonly encounter this situation is greatly appreciated.