Overview
Several years ago, I wrote a number of Linux kernel “cleanup” scripts that scanned the kernel source tree to identify obvious candidates for cleanup or simplification. The first few scripts are below, and they are admittedly imperfect as there is no way to consider every possible variation of what they look for, so they will almost certainly display some false positives. The point here is that Linux kernel “newbies” are welcome to examine these scripts, tweak them, run them, and submit organized and well-documented patches if they want to become first-time contributors to the kernel and be able to brag, “Hey, I'm in the Linux kernel Git log.”
Over the next several days, I'll be adding more cleanup scripts one at a time after I check each one over and perhaps tidy them up a bit, but be warned that many of them are just brute force searching, so you're expected to check the output. If you have any comments or want to improve the scripts, drop me a note at rpjday@crashcourse.ca.
So far, the cleanup scripts below are for:
- calculating the length of an array
- testing for a power of 2
- identifying “bad” select directives in Kconfig files
- identifying “bad #if” preprocessor checks for non-existent Kconfig variables
- identifying allegedly “unused” header files
NOTE: Don't try to submit a mega-patch with as many similar patches as you possibly can; rather, submit patches on a subsystem by subsystem basis, to make the patches manageable so that they can be reviewed and approved by just the maintainers of that subsystem.
IMPORTANT: Do not accept the output from any of these or upcoming scripts verbatim. There is no question that these scripts cannot possibly take into account every conceivable variation being searched for, so treat the results with skepticism and do extra sanity checking to make sure your submitted improvements make sense. If you're unsure, check the Git log to see if there are previous commits that back up your interpretation of what you're seeing.
1. Calculating the length of an array
A lot of kernel code needs to calculate the length of an array, frequently to iterate through all of its elements. There are two standard ways to do this in C:
    sizeof(array) / (sizeof(array[0]))
    sizeof(array) / sizeof(*(array))
This shortcut is already defined in the kernel header file include/linux/array_size.h:
    /**
     * ARRAY_SIZE - get the number of elements in array @arr
     * @arr: array to be sized
     */
    #define ARRAY_SIZE(arr) (sizeof(arr) / sizeof((arr)[0]) + __must_be_array(arr))
Here is a script that looks for expressions that appear to match that pattern (note how the script accepts an optional argument as to which directory is the target of the search):
    #!/bin/sh
    DIR=${1-.}
    grep -Er "sizeof ?\(?([^\)]+)\)? ?/ ?sizeof ?\(?.*\1.*" ${DIR}
As a sample run, here's running the script on the kernel “arch/” directory:
    $ arraysize.sh arch
    arch/powerpc/boot/types.h:#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
    arch/powerpc/xmon/ppc-opc.c: sizeof (powerpc_opcodes) / sizeof (powerpc_opcodes[0]);
    arch/powerpc/xmon/ppc-opc.c: sizeof (vle_opcodes) / sizeof (vle_opcodes[0]);
    arch/powerpc/xmon/ppc-opc.c: sizeof (powerpc_macros) / sizeof (powerpc_macros[0]);
    arch/um/include/shared/user.h:#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
    arch/um/kernel/config.c.in: for (i = 0; i < sizeof(config)/sizeof(config[0]); i++)
    arch/um/kernel/skas/stub_exe.c: .len = sizeof(filter) / sizeof(filter[0]),
    arch/s390/tools/gen_opcode_table.c: for (i = 0; i < sizeof(insn_type_table) / sizeof(insn_type_table[0]); i++) {
    arch/s390/tools/gen_facilities.c: for (i = 0; i < sizeof(facility_defs) / sizeof(facility_defs[0]); i++)
    arch/s390/include/asm/fpu.h: __save_fp_regs(fprs, sizeof(freg_t) / sizeof(freg_t));
    arch/s390/include/asm/fpu.h: __load_fp_regs(fprs, sizeof(freg_t) / sizeof(freg_t));
    arch/mips/boot/tools/relocs.h:#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
    arch/mips/sgi-ip22/ip28-berr.c: for (i = 0; i < sizeof(hpc3)/sizeof(struct hpc3_stat); ++i) {
    arch/mips/sgi-ip22/ip28-berr.c: if (i < sizeof(hpc3)/sizeof(struct hpc3_stat)) {
    arch/x86/um/shared/sysdep/stub_32.h: for (int i = 0; i < sizeof(arch->tls) / sizeof(arch->tls[0]); i++) {
    arch/x86/boot/boot.h:#define ARRAY_SIZE(x) (sizeof(x) / sizeof(*(x)))
    arch/x86/tools/relocs.h:#define ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))
Note that there are some obvious examples of what the script is looking for, as well as some false positives. Note also how various files insist on re-inventing the test by defining their own macro that does what is already provided by that Linux header file.
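To trim the false positives, one option is a stricter pattern that only accepts a second sizeof applied to element zero, or to a dereference, of the same expression. The following is a minimal sketch rather than a replacement for the script above: the function name strict_arraysize is made up for illustration, and the pattern relies on backreferences in grep -E, just as the original script does. The trade-off is that it skips type-based forms such as sizeof(hpc3)/sizeof(struct hpc3_stat), which may still deserve a look.

```sh
#!/bin/sh
# strict_arraysize [DIR]: hypothetical helper, not one of the original scripts.
# Reports only sizeof(X) / sizeof(X[0]), sizeof((X)[0]), sizeof(*X) or
# sizeof(*(X)), where X is the same text in both places.
strict_arraysize() {
    grep -Ern 'sizeof ?\(([^()]+)\) ?/ ?sizeof ?\((\*\(?\1\)?|\(?\1\)?\[0\])\)' "${1-.}"
}
```

After sourcing the file that contains the function, it could be run as "strict_arraysize arch" and its output compared with that of the broader search.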
2. Check if something is a power of two
Quite a lot of kernel code needs to check if an integer value is an exact power of two; the general test for that is already defined in the header file include/linux/log2.h as follows:
    /**
     * is_power_of_2() - check if a value is a power of two
     * @n: the value to check
     *
     * Determine whether some value is a power of two, where zero is
     * *not* considered a power of two.
     * Return: true if @n is a power of 2, otherwise false.
     */
    static __always_inline __attribute__((const))
    bool is_power_of_2(unsigned long n)
    {
        return (n != 0 && ((n & (n - 1)) == 0));
    }
Because of the number of variations of that test involving parenthesization, the cleanup script just brute forces its way through a number of slight changes:
    #!/bin/sh
    DIR=${1-*}
    # printf is used for the headings because "\n" inside echo is only
    # interpreted by some shells.
    printf 'PATTERN: x & (x - 1):\n\n'
    grep -Ern "([^\(\)]+) ?\& ?\(\1 ?- ?1\)" ${DIR}
    printf 'PATTERN: x & ((x) - 1):\n\n'
    grep -Ern "([^\(\)]+) ?\& ?\(\(\1\) ?- ?1\)" ${DIR}
    printf 'PATTERN: (x) & (x - 1):\n\n'
    grep -Ern "\(([^\(\)]+)\) ?\& ?\(\1 ?- ?1\)" ${DIR}
    printf 'PATTERN: (x) & ((x) - 1):\n\n'
    grep -Ern "\(([^\(\)]+)\) ?\& ?\(\(\1\) ?- ?1\)" ${DIR}
As a simple example, check the lib/ directory; here's the snipped output:
    $ test_for_power_of_2.sh lib
    PATTERN: x & (x - 1):
    lib/zstd/compress/zstd_lazy.c:810: assert((align & (align - 1)) == 0);
    lib/zstd/compress/huf_compress.c:116: assert((align & (align - 1)) == 0); /* pow 2 */
    lib/zstd/compress/zstd_compress_internal.h:1189: assert((maxDist & (maxDist - 1)) == 0);
    lib/zstd/common/compiler.h:181: return (u & (u-1)) == 0;
Note well that if you search for that expression, many tests are actually checking whether the number in question is not a power of two, so make sure you notice the difference. Here's the output from running that script against the arch/powerpc directory:
    arch/powerpc/sysdev/fsl_rio.c:317: if ((size & (size - 1)) != 0 || size > 0x400000000ULL)
    arch/powerpc/mm/book3s32/mmu.c:370: if (n_hpteg & (n_hpteg - 1)) {
    arch/powerpc/boot/cuboot-pq2.c:173: if (mem->size[1] & (mem->size[1] - 1))
    arch/powerpc/boot/cuboot-pq2.c:175: if (io->size[1] & (io->size[1] - 1))
    arch/powerpc/include/asm/bitops.h:93: return !(x & (x - 1));
    arch/powerpc/platforms/44x/pci.c:166: if ((size & (size - 1)) != 0 ||
    arch/powerpc/lib/rheap.c:258: if ((alignment & (alignment - 1)) != 0)
    arch/powerpc/lib/rheap.c:307: if ((alignment & (alignment - 1)) != 0)
    arch/powerpc/lib/rheap.c:450: if (size <= 0 || (alignment & (alignment - 1)) != 0)
So you need to be careful about what you think any of that simplifies to.
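The zero case is the subtle part when rewriting these tests. A small sketch in shell arithmetic shows it; raw_test and is_power_of_2 here are illustrative shell functions, not kernel code. The bare idiom (x & (x - 1)) == 0 accepts 0, while is_power_of_2() rejects it, so the two are interchangeable only when x is known to be non-zero.

```sh
#!/bin/sh
# Illustrative shell stand-ins; the real is_power_of_2() is the C function
# in include/linux/log2.h.

# The bare idiom: true when (x & (x - 1)) == 0, which includes x == 0.
raw_test() {
    [ $(( $1 & ($1 - 1) )) -eq 0 ]
}

# The same test plus the x != 0 check that is_power_of_2() performs.
is_power_of_2() {
    [ "$1" -ne 0 ] && raw_test "$1"
}

# Compare the two tests on a few sample values.
for n in 0 1 6 8; do
    raw=no
    pow2=no
    if raw_test "$n"; then raw=yes; fi
    if is_power_of_2 "$n"; then pow2=yes; fi
    echo "n=$n raw=$raw is_power_of_2=$pow2"
done
```

Of the sample values, only 0 gets different answers from the two functions. By the same reasoning, a test such as (size & (size - 1)) != 0 corresponds to !is_power_of_2(size) only if size cannot be zero at that point in the code.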
3. Finding "bad" select directives in Kconfig files
Many kernel Kconfig files contain “select” directives, some of which are no longer relevant since the config entry they refer to was deleted previously, but the associated “select” directives were never removed. This is clearly not fatal, but it's still something that can be cleaned up.
Here is the current script that scans the tree (or a specified subdirectory) searching for select directives that don't seem to have any current value:
    #!/bin/sh
    DIR=${1-*}
    DIRKS=$(find ${DIR} -name "Kconfig*")
    ALLKS=$(find . -name "Kconfig*")
    # Kconfig directives are indented by a single whitespace character
    # (normally one tab), so match exactly one before "select".
    SELS=$(grep -h "^[[:space:]]select " ${DIRKS} | \
           sed -e 's|.*select[ ]*\([^ ]*\).*|\1|' | \
           sort -u)
    for sel in ${SELS} ; do
        grep -q "config[ ]*${sel}$" ${ALLKS} || {
            echo "===== ${sel}"
            grep -rwn ${sel} *
        }
    done
Running that script on the arch/ subdirectory produces the following potentially removable lines:
    $ find_bad_selects.sh arch
    ===== ARCH_HAS_HOLES_MEMORYMODEL
    arch/arm/mach-omap1/Kconfig:7: select ARCH_HAS_HOLES_MEMORYMODEL
    ===== ARM_ERRATA_794072
    arch/arm/mach-npcm/Kconfig:33: select ARM_ERRATA_794072
    ===== ARM_SMC_MBOX
    arch/arm64/Kconfig.platforms:315: select ARM_SMC_MBOX
    ===== HAVE_LEGACY_CLK
    arch/sh/boards/Kconfig:13: select HAVE_LEGACY_CLK
    arch/mips/Kconfig:334: select HAVE_LEGACY_CLK
    arch/mips/Kconfig:475: select HAVE_LEGACY_CLK
    arch/m68k/Kconfig.cpu:33: select HAVE_LEGACY_CLK
    drivers/clk/Kconfig:12:config HAVE_LEGACY_CLK # TODO: Remove once all legacy users are migrated
    drivers/clk/Kconfig:23: depends on !HAVE_LEGACY_CLK
    ===== PINCTRL_MILBEAUT
    arch/arm/mach-milbeaut/Kconfig:16: select PINCTRL_MILBEAUT
    ===== USB_OHCI_SH
    arch/sh/Kconfig:335: select USB_OHCI_SH if USB_OHCI_HCD
    arch/sh/Kconfig:345: select USB_OHCI_SH if USB_OHCI_HCD
    arch/sh/Kconfig:430: select USB_OHCI_SH if USB_OHCI_HCD
    arch/sh/Kconfig:456: select USB_OHCI_SH if USB_OHCI_HCD
Run the same script against the “drivers” directory:
    $ find_bad_selects.sh drivers
    ===== DRM_DEBUG_SELFTEST
    drivers/gpu/drm/i915/Kconfig.debug:53: select DRM_DEBUG_SELFTEST
    ===== DRM_KMS_DMA_HELPER
    drivers/gpu/drm/adp/Kconfig:9: select DRM_KMS_DMA_HELPER
    drivers/gpu/drm/logicvc/Kconfig:7: select DRM_KMS_DMA_HELPER
    ===== TEST_KUNIT_DEVICE_HELPERS
    drivers/iio/test/Kconfig:11: select TEST_KUNIT_DEVICE_HELPERS
    $
In other words, the above “select” directives look like they can be removed, but it's your responsibility to verify that.
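The HAVE_LEGACY_CLK entry in the arch/ output looks like a false positive: drivers/clk/Kconfig does define it, but the definition carries a trailing comment, so a pattern anchored at end of line does not see it. A hedged sketch of a more tolerant double-check follows. The name kconfig_defines is hypothetical, and the pattern assumes definitions take the form "config SYMBOL" or "menuconfig SYMBOL", optionally followed by whitespace or a # comment.

```sh
#!/bin/sh
# kconfig_defines SYMBOL [DIR]: succeed if some Kconfig* file under DIR
# (default: current directory) appears to define SYMBOL.
# Hypothetical helper for double-checking a single candidate.
kconfig_defines() {
    sym=$1
    root=${2-.}
    # grep -l prints the names of matching files; the final grep -q turns
    # "at least one file matched" into the function's exit status.
    find "$root" -type f -name 'Kconfig*' -exec \
        grep -El "^[[:space:]]*(menu)?config[[:space:]]+${sym}([[:space:]]|#|\$)" {} + |
        grep -q .
}
```

From the top of the source tree, something like "kconfig_defines HAVE_LEGACY_CLK || echo no definition found" would then serve as a quick second opinion before removing any select line.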
4. Find "badif" CONFIG symbols
This check refers to “CONFIG_”-prefixed preprocessor symbols that are tested in the kernel source somewhere, but are not defined in any Kconfig file; that usually means that the symbol was once dropped from a Kconfig file, but the (now pointless) preprocessor tests are still being done.
As one example, here is the output generated by the script for the string “ACORNSCSI_CONSTANTS”:
    >>>>> ACORNSCSI_CONSTANTS
    drivers/scsi/arm/acornscsi.c:92:#undef CONFIG_ACORNSCSI_CONSTANTS
    drivers/scsi/arm/acornscsi.c:393:#ifdef CONFIG_ACORNSCSI_CONSTANTS
    drivers/scsi/arm/acornscsi.c:471:#ifdef CONFIG_ACORNSCSI_CONSTANTS
The standard approach here would be to check carefully that there are no references to that string anywhere in the source tree, and check the Git log to see if/when that symbol was removed, and why, and clean it up.
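For the Git log part of that check, the pickaxe option of git log (-S) lists commits that changed the number of occurrences of a string, which is one way to find where a Kconfig entry was added or dropped. This is only a sketch: kconfig_symbol_history is a made-up helper name, it has to be run inside a kernel Git checkout, and it assumes the entry was written with a single space, as in "config SYMBOL".

```sh
#!/bin/sh
# kconfig_symbol_history SYMBOL: list commits that added or removed the text
# "config SYMBOL" in any file whose path contains "Kconfig".
# Hypothetical helper; run it from inside a Git working tree.
kconfig_symbol_history() {
    git log --oneline -S"config $1" -- '*Kconfig*'
}
```

If this prints nothing, the symbol may never have had a Kconfig entry in the available history, or the entry may have been spelled differently, so an empty result is a prompt for more digging rather than proof.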
Additional examples from the drivers/ directory:
    >>>>> CRYPTO_DEV_ASPEED_HACE_CRYPTO_DEBUG
    drivers/crypto/aspeed/aspeed-hace-crypto.c:19:#ifdef CONFIG_CRYPTO_DEV_ASPEED_HACE_CRYPTO_DEBUG
    >>>>> DRM_AMD_DC_DP2_0
    drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c:107:#if defined(CONFIG_DRM_AMD_DC_DP2_0)
    >>>>> DRM_XE_LMTT_2L_128GB
    drivers/gpu/drm/xe/xe_lmtt_2l.c:57:#if IS_ENABLED(CONFIG_DRM_XE_LMTT_2L_128GB)
    >>>>> FUSION_MAX_FC_SGE
    drivers/message/fusion/mptbase.h:180:#ifdef CONFIG_FUSION_MAX_FC_SGE
    drivers/message/fusion/mptbase.h:181:#if CONFIG_FUSION_MAX_FC_SGE < 16
    drivers/message/fusion/mptbase.h:183:#elif CONFIG_FUSION_MAX_FC_SGE > 256
    drivers/message/fusion/mptbase.h:186:#define MPT_SCSI_FC_SG_DEPTH CONFIG_FUSION_MAX_FC_SGE
However, there is a complication in that there are “CONFIG_”-prefixed variables that are not defined in any Kconfig file, but are defined in a Makefile instead:
    >>>>> NCR53C8XX_PREFETCH
    drivers/scsi/ncr53c8xx.c:1779:#ifdef CONFIG_NCR53C8XX_PREFETCH
    drivers/scsi/Makefile:180: := -DCONFIG_NCR53C8XX_PREFETCH -DSCSI_NCR_BIG_ENDIAN \
This messes up what should have been a simple search, and it also seems to fly in the face of an old coding standard that the macro prefix “CONFIG_” should be reserved exclusively for Kconfig entries, but clearly there are exceptions, which is why the janitor needs to look carefully at the script output. Here's the script:
    #!/bin/sh
    ################################################
    # Make sure you install autoconf for "ifnames".
    ################################################
    SCAN_DIR=${1-*}
    CVARS=$(find ${SCAN_DIR} -name "*.[ch]" | \
            grep -v "mach-types.h" | \
            xargs ifnames | \
            grep "^CONFIG_" | \
            cut -d' ' -f1 | \
            sed "s/^CONFIG_//" | \
            sed "s/_MODULE$//" | \
            sort -u)
    ALL_KC_FILES=$(find . -name "Kconfig*")
    #
    # Scan the entire tree, just to see what turns up.
    # NOTE the extra grep to see if the CONFIG_ symbol
    # is defined as perhaps part of cflags in a Makefile.
    # (grep -E replaces the deprecated egrep.)
    #
    for cv in ${CVARS} ; do
        grep -Eq "^[[:space:]]*config[[:space:]]+${cv}\b" ${ALL_KC_FILES} ||
        grep -q "^menuconfig *${cv}$" ${ALL_KC_FILES} ||
        grep -Eqr "^[[:space:]]*#[[:space:]]*define[[:space:]]+CONFIG_${cv}\b" * || {
            echo ">>>>> ${cv}"
            grep -rwn "CONFIG_${cv}" * | grep -v defconfig
            grep -rwn "${cv}" * | grep -v defconfig
            grep -rn -- "-DCONFIG_${cv}" *
        }
    done
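Because a hit in a Makefile changes the verdict, it can help to check that case in isolation before drafting a patch. The helper below is a sketch with an invented name (makefile_defines); it assumes such symbols are passed as -DCONFIG_... flags in files named Makefile* or Kbuild*, as in the NCR53C8XX_PREFETCH example above.

```sh
#!/bin/sh
# makefile_defines SYMBOL [DIR]: print every Makefile*/Kbuild* line under DIR
# (default: current directory) that passes -DCONFIG_SYMBOL to the compiler.
# Hypothetical helper; SYMBOL is given without the CONFIG_ prefix.
makefile_defines() {
    sym=$1
    root=${2-.}
    # The trailing group keeps CONFIG_FOO from matching CONFIG_FOO_BAR.
    find "$root" -type f \( -name 'Makefile*' -o -name 'Kbuild*' \) -exec \
        grep -EHn -e "-DCONFIG_${sym}([^A-Za-z0-9_]|\$)" {} +
}
```

Any output means the preprocessor test is probably still live, so the symbol is not a cleanup candidate on those grounds alone.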
5. Finding allegedly "unused" header files
“Unused” Linux kernel header files are simply header files that don't appear to be #included from anywhere in the kernel source tree. There could be all sorts of reasons for that.
One reason is that a source file was removed, but its associated supporting header file was overlooked and is still sitting there, now having no purpose in life. Another (quite common) reason is that many of those header files contain enums or macros for hex offsets for particular devices, so that even if nothing is including them at the moment, they still need to be preserved in case something needs all that content.
As a basic example of the current find_unused_headers.sh script, let's have it check under the directory drivers/usb:
    ===== phy-mv-usb.h =====
    ./drivers/usb/phy/phy-mv-usb.h
    ===== sisusb_tables.h =====
    ./drivers/usb/misc/sisusbvga/sisusb_tables.h
The above tells us simply that there are two header files under that directory that appear to not be included from anywhere in the Linux kernel source tree. Why that is would require taking a closer look, possibly checking the Git log regarding that header file, and so on; it does not mean you can simply submit a patch to delete that file.
Here is the admittedly brute force script find_unused_headers.sh:
    #!/bin/sh
    DIR=${1-*}
    LONGHDRS=$(find ${DIR} -name "*.h")
    HDRS=""
    for h in ${LONGHDRS} ; do
        HDRS="${HDRS} $(basename ${h})"
    done
    HDRS=$(for h in ${HDRS} ; do echo $h ; done | sort -u)
    # Test that each header file is included from *somewhere*.
    # (grep -E replaces the deprecated egrep.)
    for h in ${HDRS} ; do
        # echo "Testing $h ..."
        grep -Erq ".*#.*include.*${h}" * || {
            echo "===== ${h} ====="
            find . -name "${h}"
            grep -rwH ${h} *
        }
    done
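Because the script matches on the bare file name and lets the dot in that name act as a regex wildcard, a follow-up check on an individual header can be made tighter. The function below is a sketch under assumptions: header_is_included is an invented name, and it only recognizes #include lines that name the header directly, in quotes or angle brackets, optionally with a leading path.

```sh
#!/bin/sh
# header_is_included NAME [DIR]: succeed if any file under DIR (default: .)
# has an #include line naming NAME, with or without a directory prefix.
# Hypothetical helper for double-checking one header at a time.
header_is_included() {
    # Escape the regex metacharacters most likely to appear in a header name.
    name=$(printf '%s\n' "$1" | sed 's/[.+]/\\&/g')
    grep -Erq "^[[:space:]]*#[[:space:]]*include[[:space:]]*[<\"]([^\">]*/)?${name}[\">]" "${2-.}"
}
```

A header that fails this check as well is a stronger candidate, though includes generated by macros or build scripts would still be missed, so the Git log remains the deciding evidence.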
EXERCISE: Run find_unused_headers.sh on the drivers/gpu directory to get far more output.