commit | c8af66ab8ad7cd78557f0f9f5ef6a52fd46ee6dd | [log] [tgz] |
---|---|---|
author | Jeff King <peff@peff.net> | Thu Jul 26 03:21:05 2018 -0400 |
committer | Junio C Hamano <gitster@pobox.com> | Thu Jul 26 10:12:09 2018 -0700 |
tree | 39d67aa65cd39c4cc3ddea2fb6c9cb89328a438d | |
parent | b20a3cbb887be494a93d54858052f0a4f8df17c1 [diff] |
automatically ban strcpy() There are a few standard C functions (like strcpy) which are easy to misuse. E.g.: char path[PATH_MAX]; strcpy(path, arg); may overflow the "path" buffer. Sometimes there's an earlier constraint on the size of "arg", but even in such a case it's hard to verify that the code is correct. If the size really is unbounded, you're better off using a dynamic helper like strbuf: struct strbuf path = STRBUF_INIT; strbuf_addstr(path, arg); or if it really is bounded, then use xsnprintf to show your expectation (and get a run-time assertion): char path[PATH_MAX]; xsnprintf(path, sizeof(path), "%s", arg); which makes further auditing easier. We'd usually catch undesirable code like this in a review, but there's no automated enforcement. Adding that enforcement can help us be more consistent and save effort (and a round-trip) during review. This patch teaches the compiler to report an error when it sees strcpy (and will become a model for banning a few other functions). This has a few advantages over a separate linting tool: 1. We know it's run as part of a build cycle, so it's hard to ignore. Whereas an external linter is an extra step the developer needs to remember to do. 2. Likewise, it's basically free since the compiler is parsing the code anyway. 3. We know it's robust against false positives (unlike a grep-based linter). The two big disadvantages are: 1. We'll only check code that is actually compiled, so it may miss code that isn't triggered on your particular system. But since presumably people don't add new code without compiling it (and if they do, the banned function list is the least of their worries), we really only care about failing to clean up old code when adding new functions to the list. And that's easy enough to address with a manual audit when adding a new function (which is what I did for the functions here). 2. If this ends up generating false positives, it's going to be harder to disable (as opposed to a separate linter, which may have mechanisms for overriding a particular case). But the intent is to only ban functions which are obviously bad, and for which we accept using an alternative even when this particular use isn't buggy (e.g., the xsnprintf alternative above). The implementation here is simple: we'll define a macro for the banned function which replaces it with a reference to a descriptively named but undeclared identifier. Replacing it with any invalid code would work (since we just want to break compilation). But ideally we'd meet these goals: - it should be portable; ideally this would trigger everywhere, and does not need to be part of a DEVELOPER=1 setup (because unlike warnings which may depend on the compiler or system, this is a clear indicator of something wrong in the code). - it should generate a readable error that gives the developer a clue what happened - it should avoid generating too much other cruft that makes it hard to see the actual error - it should mention the original callsite in the error The output with this patch looks like this (using gcc 7, on a checkout with 022d2ac1f3 reverted, which removed the final strcpy from blame.c): CC builtin/blame.o In file included from ./git-compat-util.h:1246, from ./cache.h:4, from builtin/blame.c:8: builtin/blame.c: In function ‘cmd_blame’: ./banned.h:11:22: error: ‘sorry_strcpy_is_a_banned_function’ undeclared (first use in this function) #define BANNED(func) sorry_##func##_is_a_banned_function ^~~~~~ ./banned.h:14:21: note: in expansion of macro ‘BANNED’ #define strcpy(x,y) BANNED(strcpy) ^~~~~~ builtin/blame.c:1074:4: note: in expansion of macro ‘strcpy’ strcpy(repeated_meta_color, GIT_COLOR_CYAN); ^~~~~~ ./banned.h:11:22: note: each undeclared identifier is reported only once for each function it appears in #define BANNED(func) sorry_##func##_is_a_banned_function ^~~~~~ ./banned.h:14:21: note: in expansion of macro ‘BANNED’ #define strcpy(x,y) BANNED(strcpy) ^~~~~~ builtin/blame.c:1074:4: note: in expansion of macro ‘strcpy’ strcpy(repeated_meta_color, GIT_COLOR_CYAN); ^~~~~~ This prominently shows the phrase "strcpy is a banned function", along with the original callsite in blame.c and the location of the ban code in banned.h. Which should be enough to get even a developer seeing this for the first time pointed in the right direction. This doesn't match our ideals perfectly, but it's a pretty good balance. A few alternatives I tried: 1. Instead of using an undeclared variable, using an undeclared function. This shortens the message, because the "each undeclared identifier" message is not needed (and as you can see above, it triggers a separate mention of each of the expansion points). But it doesn't actually stop compilation unless you use -Werror=implicit-function-declaration in your CFLAGS. This is the case for DEVELOPER=1, but not for a default build (on the other hand, we'd eventually produce a link error pointing to the correct source line with the descriptive name). 2. The linux kernel uses a similar mechanism in its BUILD_BUG_ON_MSG(), where they actually declare the function but do so with gcc's error attribute. But that's not portable to other compilers (and it also runs afoul of our error() macro). We could make a gcc-specific technique and fallback on other compilers, but it's probably not worth the complexity. It also isn't significantly shorter than the error message shown above. 3. We could drop the BANNED() macro, which would shorten the number of lines in the error. But curiously, removing it (and just expanding strcpy directly to the bogus identifier) causes gcc _not_ to report the original line of code. So this strategy seems to be an acceptable mix of information, portability, simplicity, and robustness, without _too_ much extra clutter. I also tested it with clang, and it looks as good (actually, slightly less cluttered than with gcc). Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git is a fast, scalable, distributed revision control system with an unusually rich command set that provides both high-level operations and full access to internals.
Git is an Open Source project covered by the GNU General Public License version 2 (some parts of it are under different licenses, compatible with the GPLv2). It was originally written by Linus Torvalds with help of a group of hackers around the net.
Please read the file INSTALL for installation instructions.
Many Git online resources are accessible from https://git-scm.com/ including full documentation and Git related tools.
See Documentation/gittutorial.txt to get started, then see Documentation/giteveryday.txt for a useful minimum set of commands, and Documentation/git-.txt for documentation of each command. If git has been correctly installed, then the tutorial can also be read with man gittutorial
or git help tutorial
, and the documentation of each command with man git-<commandname>
or git help <commandname>
.
CVS users may also want to read Documentation/gitcvs-migration.txt (man gitcvs-migration
or git help cvs-migration
if git is installed).
The user discussion and development of Git take place on the Git mailing list -- everyone is welcome to post bug reports, feature requests, comments and patches to git@vger.kernel.org (read Documentation/SubmittingPatches for instructions on patch submission). To subscribe to the list, send an email with just “subscribe git” in the body to majordomo@vger.kernel.org. The mailing list archives are available at https://public-inbox.org/git/, http://marc.info/?l=git and other archival sites.
Issues which are security relevant should be disclosed privately to the Git Security mailing list git-security@googlegroups.com.
The maintainer frequently sends the “What's cooking” reports that list the current status of various development topics to the mailing list. The discussion following them give a good reference for project status, development direction and remaining tasks.
The name “git” was given by Linus Torvalds when he wrote the very first version. He described the tool as “the stupid content tracker” and the name as (depending on your mood):