linux-imx/include/linux/namei.h
Linus Torvalds 42bd2af595
vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
"The definition of insanity is doing the same thing over and over
    again and expecting different results”

We've tried to do this before, most recently with commit bb2314b479
("fs: Allow unprivileged linkat(..., AT_EMPTY_PATH) aka flink") about a
decade ago.

But the effort goes back even further than that, eg this thread back
from 1998 that is so old that we don't even have it archived in lore:

    https://lkml.org/lkml/1998/3/10/108

which also points out some of the reasons why it's dangerous.

Or, how about then in 2003:

    https://lkml.org/lkml/2003/4/6/112

where we went through some of the same arguments, just wirh different
people involved.

In particular, having access to a file descriptor does not necessarily
mean that you have access to the path that was used for lookup, and
there may be very good reasons why you absolutely must not have access
to a path to said file.

For example, if we were passed a file descriptor from the outside into
some limited environment (think chroot, but also user namespaces etc) a
'flink()' system call could now make that file visible inside a context
where it's not supposed to be visible.

In the process the user may also be able to re-open it with permissions
that the original file descriptor did not have (eg a read-only file
descriptor may be associated with an underlying file that is writable).

Another variation on this is if somebody else (typically root) opens a
file in a directory that is not accessible to others, and passes the
file descriptor on as a read-only file.  Again, the access to the file
descriptor does not imply that you should have access to a path to the
file in the filesystem.

So while we have tried this several times in the past, it never works.

The last time we did this, that commit bb2314b479 quickly got reverted
again in commit f0cc6ffb8c (Revert "fs: Allow unprivileged linkat(...,
AT_EMPTY_PATH) aka flink"), with a note saying "We may re-do this once
the whole discussion about the interface is done".

Well, the discussion is long done, and didn't come to any resolution.
There's no question that 'flink()' would be a useful operation, but it's
a dangerous one.

However, it does turn out that since 2008 (commit d76b0d9b2d87: "CRED:
Use creds in file structs") we have had a fairly straightforward way to
check whether the file descriptor was opened by the same credentials as
the credentials of the flink().

That allows the most common patterns that people want to use, which tend
to be to either open the source carefully (ie using the openat2()
RESOLVE_xyz flags, and/or checking ownership with fstat() before
linking), or to use O_TMPFILE and fill in the file contents before it's
exposed to the world with linkat().

But it also means that if the file descriptor was opened by somebody
else, or we've gone through a credentials change since, the operation no
longer works (unless we have CAP_DAC_READ_SEARCH capabilities in the
opener's user namespace, as before).

Note that the credential equality check is done by using pointer
equality, which means that it's not enough that you have effectively the
same user - they have to be literally identical, since our credentials
are using copy-on-write semantics.

So you can't change your credentials to something else and try to change
it back to the same ones between the open() and the linkat().  This is
not meant to be some kind of generic permission check, this is literally
meant as a "the open and link calls are 'atomic' wrt user credentials"
check.

It also means that you can't just move things between namespaces,
because the credentials aren't just a list of uid's and gid's: they
includes the pointer to the user_ns that the capabilities are relative
to.

So let's try this one more time and see if maybe this approach ends up
being workable after all.

Cc: Andrew Lutomirski <luto@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Peter Anvin <hpa@zytor.com>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://lore.kernel.org/r/20240411001012.12513-1-torvalds@linux-foundation.org
[brauner: relax capability check to opener of the file]
Link: https://lore.kernel.org/all/20231113-undenkbar-gediegen-efde5f1c34bc@brauner
Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-04-13 11:33:58 +02:00

145 lines
5.6 KiB
C

/* SPDX-License-Identifier: GPL-2.0 */
#ifndef _LINUX_NAMEI_H
#define _LINUX_NAMEI_H
#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/path.h>
#include <linux/fcntl.h>
#include <linux/errno.h>
enum { MAX_NESTED_LINKS = 8 };
#define MAXSYMLINKS 40
/*
* Type of the last component on LOOKUP_PARENT
*/
enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT};
/* pathwalk mode */
#define LOOKUP_FOLLOW 0x0001 /* follow links at the end */
#define LOOKUP_DIRECTORY 0x0002 /* require a directory */
#define LOOKUP_AUTOMOUNT 0x0004 /* force terminal automount */
#define LOOKUP_EMPTY 0x4000 /* accept empty path [user_... only] */
#define LOOKUP_DOWN 0x8000 /* follow mounts in the starting point */
#define LOOKUP_MOUNTPOINT 0x0080 /* follow mounts in the end */
#define LOOKUP_REVAL 0x0020 /* tell ->d_revalidate() to trust no cache */
#define LOOKUP_RCU 0x0040 /* RCU pathwalk mode; semi-internal */
/* These tell filesystem methods that we are dealing with the final component... */
#define LOOKUP_OPEN 0x0100 /* ... in open */
#define LOOKUP_CREATE 0x0200 /* ... in object creation */
#define LOOKUP_EXCL 0x0400 /* ... in exclusive creation */
#define LOOKUP_RENAME_TARGET 0x0800 /* ... in destination of rename() */
/* internal use only */
#define LOOKUP_PARENT 0x0010
/* Scoping flags for lookup. */
#define LOOKUP_NO_SYMLINKS 0x010000 /* No symlink crossing. */
#define LOOKUP_NO_MAGICLINKS 0x020000 /* No nd_jump_link() crossing. */
#define LOOKUP_NO_XDEV 0x040000 /* No mountpoint crossing. */
#define LOOKUP_BENEATH 0x080000 /* No escaping from starting point. */
#define LOOKUP_IN_ROOT 0x100000 /* Treat dirfd as fs root. */
#define LOOKUP_CACHED 0x200000 /* Only do cached lookup */
#define LOOKUP_LINKAT_EMPTY 0x400000 /* Linkat request with empty path. */
/* LOOKUP_* flags which do scope-related checks based on the dirfd. */
#define LOOKUP_IS_SCOPED (LOOKUP_BENEATH | LOOKUP_IN_ROOT)
extern int path_pts(struct path *path);
extern int user_path_at_empty(int, const char __user *, unsigned, struct path *, int *empty);
static inline int user_path_at(int dfd, const char __user *name, unsigned flags,
struct path *path)
{
return user_path_at_empty(dfd, name, flags, path, NULL);
}
struct dentry *lookup_one_qstr_excl(const struct qstr *name,
struct dentry *base,
unsigned int flags);
extern int kern_path(const char *, unsigned, struct path *);
extern struct dentry *kern_path_create(int, const char *, struct path *, unsigned int);
extern struct dentry *user_path_create(int, const char __user *, struct path *, unsigned int);
extern void done_path_create(struct path *, struct dentry *);
extern struct dentry *kern_path_locked(const char *, struct path *);
extern struct dentry *user_path_locked_at(int , const char __user *, struct path *);
int vfs_path_parent_lookup(struct filename *filename, unsigned int flags,
struct path *parent, struct qstr *last, int *type,
const struct path *root);
int vfs_path_lookup(struct dentry *, struct vfsmount *, const char *,
unsigned int, struct path *);
extern struct dentry *try_lookup_one_len(const char *, struct dentry *, int);
extern struct dentry *lookup_one_len(const char *, struct dentry *, int);
extern struct dentry *lookup_one_len_unlocked(const char *, struct dentry *, int);
extern struct dentry *lookup_positive_unlocked(const char *, struct dentry *, int);
struct dentry *lookup_one(struct mnt_idmap *, const char *, struct dentry *, int);
struct dentry *lookup_one_unlocked(struct mnt_idmap *idmap,
const char *name, struct dentry *base,
int len);
struct dentry *lookup_one_positive_unlocked(struct mnt_idmap *idmap,
const char *name,
struct dentry *base, int len);
extern int follow_down_one(struct path *);
extern int follow_down(struct path *path, unsigned int flags);
extern int follow_up(struct path *);
extern struct dentry *lock_rename(struct dentry *, struct dentry *);
extern struct dentry *lock_rename_child(struct dentry *, struct dentry *);
extern void unlock_rename(struct dentry *, struct dentry *);
/**
* mode_strip_umask - handle vfs umask stripping
* @dir: parent directory of the new inode
* @mode: mode of the new inode to be created in @dir
*
* In most filesystems, umask stripping depends on whether or not the
* filesystem supports POSIX ACLs. If the filesystem doesn't support it umask
* stripping is done directly in here. If the filesystem does support POSIX
* ACLs umask stripping is deferred until the filesystem calls
* posix_acl_create().
*
* Some filesystems (like NFSv4) also want to avoid umask stripping by the
* VFS, but don't support POSIX ACLs. Those filesystems can set SB_I_NOUMASK
* to get this effect without declaring that they support POSIX ACLs.
*
* Returns: mode
*/
static inline umode_t __must_check mode_strip_umask(const struct inode *dir, umode_t mode)
{
if (!IS_POSIXACL(dir) && !(dir->i_sb->s_iflags & SB_I_NOUMASK))
mode &= ~current_umask();
return mode;
}
extern int __must_check nd_jump_link(const struct path *path);
static inline void nd_terminate_link(void *name, size_t len, size_t maxlen)
{
((char *) name)[min(len, maxlen)] = '\0';
}
/**
* retry_estale - determine whether the caller should retry an operation
* @error: the error that would currently be returned
* @flags: flags being used for next lookup attempt
*
* Check to see if the error code was -ESTALE, and then determine whether
* to retry the call based on whether "flags" already has LOOKUP_REVAL set.
*
* Returns true if the caller should try the operation again.
*/
static inline bool
retry_estale(const long error, const unsigned int flags)
{
return unlikely(error == -ESTALE && !(flags & LOOKUP_REVAL));
}
#endif /* _LINUX_NAMEI_H */