An issue was opened https://github.com/r-lib/fs/issues/95 in the fs
package describing an unexpected error code (ENOENT
rather than EEXIST
)
when copying the same file to the same destination twice. Tracking down the
cause of this turned out to be one of the weirdest bugs I have seen.
The fs package provides filesystem operations for the R language, it is built on top of filesystem functions in libuv. Libuv in turn relies on the native system calls for each operating system it supports.
The first step in dealing with any bug is reproducing it. The original issue had a reprex (which is great!) and in this case I was able to simplify it to the following
fs::file_create("a")
fs::file_create("b")
fs::file_copy("a", "b")
#> Error: [ENOENT] Failed to copy 'a' to 'b': no such file or directory
The error being the error code ENOENT
, which corresponds to an errno
of
2
. The first interesting bit to this issue is that this incorrect error code
only occurred the first time the copy was attempted in an R session.
Subsequent attempts always returned the proper EEXIST
error code.
fs::file_create("a")
fs::file_create("b")
try(fs::file_copy("a", "b"))
#> Error : [ENOENT] Failed to copy 'a' to 'b': no such file or directory
try(fs::file_copy("a", "b"))
#> Error : [EEXIST] Failed to copy 'a' to 'b': file already exists
Usually if the behavior of code is different the first time it is run it is caused by failing to initialize an object. As always the first code to look at when debugging an error is your own, luckily in this case it had straightforward logic that looked correct.
file_copy <- function(path, new_path, overwrite = FALSE) {
# TODO: copy attributes, e.g. cp -p?
assert_no_missing(path)
assert_no_missing(new_path)
copyfile_(path_expand(path), path_expand(new_path), isTRUE(overwrite))
invisible(path_tidy(new_path))
}
void copyfile_(CharacterVector path, CharacterVector new_path, bool overwrite) {
for (R_xlen_t i = 0; i < Rf_xlength(path); ++i) {
uv_fs_t req;
const char* p = CHAR(STRING_ELT(path, i));
const char* n = CHAR(STRING_ELT(new_path, i));
uv_fs_copyfile(
uv_default_loop(),
&req,
p,
n,
!overwrite ? UV_FS_COPYFILE_EXCL : 0,
NULL);
stop_for_error2(req, "Failed to copy '%s' to '%s'", p, n);
uv_fs_req_cleanup(&req);
}
}
Essentially we iterate over the old and new paths in turn and pass them to
uv_fs_copyfile()
, specifying the UV_FS_COPYFILE_EXCL
option to indicate we
want to fail the copy if the destination file already exists.
My first thought was there was a race condition when creating the file and copying, however we observe the same behavior if the files exist prior to opening the R session, so that hypothesis was quickly ruled out.
The next idea was there was a bug in libuv, so lets go look at the
implementation for uv_fs_copyfile
. Luckily for us the
implementation it is also very short on macOS.
static ssize_t uv__fs_copyfile(uv_fs_t* req) {
#if defined(__APPLE__) && !TARGET_OS_IPHONE
/* On macOS, use the native copyfile(3). */
copyfile_flags_t flags;
flags = COPYFILE_ALL;
if (req->flags & UV_FS_COPYFILE_EXCL)
flags |= COPYFILE_EXCL;
return copyfile(req->path, req->new_path, NULL, flags);
So this is a very short shim over the system function copyfile(3) with
the COPYFILE_EXCL
flag.
Therefore we can rule out the issue being libuv by calling copyfile(3)
directly in our package.
void copyfile_(CharacterVector path, CharacterVector new_path, bool overwrite) {
for (R_xlen_t i = 0; i < Rf_xlength(path); ++i) {
uv_fs_t req;
const char* p = CHAR(STRING_ELT(path, i));
const char* n = CHAR(STRING_ELT(new_path, i));
int res = copyfile("a", "b", NULL, COPYFILE_ALL | COPYFILE_EXCL);
Rcpp::Rcout << res << ':' << errno << '\n';
}
}
Unfortunately we see the same behavior from this function the first time it is
called in a new R session, the errno
is set to 2
(ENOENT
) instead of 17
(EEXIST
) and is 17
as expected the rest of the time. This rules out libuv
as the culprit.
So the next idea, maybe it is a weird interaction with Rcpp or C++? To test
this we can create a new copyfile R package, that uses only C and
copyfile(3)
. Unfortunately after running the resultatnt
copyfile::copyfile()
in a new session the issue was still present. So this
must not be an issue with C++ or Rcpp.
Next, maybe it is an R issue, lets try reproduce it with a standalone C program.
#include <copyfile.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>
int main(int argc, char **argv) {
errno = 0;
int res = copyfile(argv[1], argv[2], NULL, COPYFILE_ALL | COPYFILE_EXCL);
strerror(errno);
if (res == -1) {
printf("%i:%i:%s", res, errno, strerror(errno));
}
return errno;
}
In this case it does not reproduce the error, we always get 17
from errno
regardless of which call it is.
So this leaves only a few possibilities. It is a genuine issue in R, an issue in the
implementation of copyfile(3)
, an issue elsewhere in the OS, or a compiler
bug.
R does not use copyfile()
itself
and does not seem to change
errno so
it seemed unlikely to be the culprit.
Apple releases the source code to some of its utilities and system functions,
copyfile(3)
is one of them, so we can look at the source copyfile-138.
One thing to note in the source is there is logging being used by the
copyfile(3)
function. So lets see if the system logs can tell us anything
more.
If we run log stream --predicate 'processImpagePath ENDSWITH "R"'
we can
capture any system logs from R processes. Then we can run our
copyfile::copyfile()
function in another window. Doing this gives us this
interesting output from the log.
> log stream --predicate 'processImagePath ENDSWITH "R"'
Filtering the log data using "processImagePath ENDSWITH "R""
Timestamp Thread Type Activity PID
2018-03-30 12:23:45.134161-0400 0x4925ce Default 0x0 75117 R: (libcopyfile.dylib) open on b: File exists
But this output from R.
copyfile::copyfile()
#> -1:2:No such file or directory
So the errno
is set correctly when it is logged (File exists
), but changed
(No such file or directory
) when copyfile(3)
returns. The other clue from
the log is the log message “open on b:”. If we search for “open on” in the
copyfile(3)
source code we find two instances.
if ((s->src_fd = open(s->src, O_RDONLY | osrc , 0)) < 0)
{
copyfile_warn("open on %s", s->src);
return -1;
}
Then a little further down
copyfile_warn("open on %s", s->dst);
return -1;
Because b
is the destination filename that means the latter one
must be where the log occurred. Because the function returns immediately after
the logging and the errno
is correct when the log is printed maybe the errno
is being
changed in the logging function????
If we look at the copyfile_warn()
definition we see it is
# define copyfile_warn(str, ...) syslog(LOG_WARNING, str ": %m", ## __VA_ARGS__)
Which seems innocuous enough. However exploring the available sources we can see
there are multiple versions of the copyfile source available, if we use a
higher version number copyfile-146 (aside: does anyone know what macOS versions
correspond to these numbers?) we see the following definition of
copyfile_warn()
.
// These macros preserve the value of errno.
#ifndef _COPYFILE_TEST
# define copyfile_warn(str, ...) \
do { \
errno_t _errsv = errno; \
syslog(LOG_WARNING, str ": %m", ## __VA_ARGS__); \
errno = _errsv; \
} while (0)```
So it seems as though the code has been changed at some point to preserve the
errno
; indicating that previous versions did not preserve this in all cases
exactly the behavior we are seeing!
We can then test this hypothesis by disabling logging for our R process and seeing if it fixes the bug.
First we need to get the R PID
Sys.getpid()
#> [1] 77310
Then we can disable logging on the command line for that PID
sudo log config --process=77310 --mode 'level:off'
Last we can run our test command
copyfile::copyfile()
#> -1:17:File exists
SUCCESS!!! With logging turned off we get the expected error number 17
and
proper error message File exists
.
So after this lengthy exploration this seems like a bug in either the
copyfile()
or syslog()
implementations, and may possibly be
fixed in newer versions of macOS than I am using (10.12.6 Sierra). This was
truly one of the weirder bugs I have encountered, which is why I took the time
to document it. I am still not entirely sure the exact cause, but we have
enough evidence to show the likely culprit is the system logging.