linux_kernel_cleanup

Differences

This shows you the differences between two versions of the page.

linux_kernel_cleanup [2025/06/30 17:52] – [Overview] rpjday
linux_kernel_cleanup [2025/07/02 13:03] (current) – [5. Finding allegedly "unused" header files] rpjday
Line 1: Line 1:
 ==== Overview ====

-Several years ago, I wrote a number of Linux kernel "cleanup" scripts that scanned the kernel source tree to identify obvious candidates for cleanup or simplification. The scripts are below, and they are admittedly imperfect as there is no way to consider every possible variation of what they look for, so they will almost certainly display some false positives.
+Several years ago, I wrote a number of Linux kernel "cleanup" scripts that scanned the kernel source tree to identify obvious candidates for cleanup or simplification. The first few scripts are below, and they are admittedly imperfect as there is no way to consider every possible variation of what they look for, so they will almost certainly display some false positives. The point here is that Linux kernel "newbies" are welcome to examine these scripts, tweak them, run them, and submit organized and well-documented patches if they want to become first-time contributors to the kernel and be able to brag, "Hey, I'm in the Linux kernel Git log."

-The point here is that beginners to working with the Linux kernel are welcome to examine these scripts, tweak them, run them, and submit organized and well-documented patches if they want to become contributors to the kernel.
+Over the next several days, I'll be adding more cleanup scripts one at a time after I check each one over and perhaps tidy them up a bit, but be warned that many of them are just brute force searching, so you're expected to check the output. If you have any comments or want to improve the scripts, drop me a note at rpjday@crashcourse.ca.
-
+
-Over the next several days, I'll be adding the cleanup scripts one at a time after I check them over and perhaps tidy them up a bit, but be warned that many of them are just brute force searching, so you're expected to check the output. If you have any comments or want to improve the scripts, drop me a note at rpjday@crashcourse.ca.
+

 So far, the cleanup scripts below are for:
   * calculating the length of an array
-  * checking for testing for power of 2
+  * checking for testing for power of 2
   * identifying "bad" select directives in Kconfig files
 +  * identifying "bad #if" preprocessor checks for non-existent Kconfig variables
 +  * identifying allegedly "unused" header files

 NOTE: Don't try to submit a mega-patch with as many similar patches as you possibly can; rather, submit patches on a subsystem by subsystem basis, to make the patches manageable so that they can be reviewed and approved by just the maintainers of that subsystem.
Line 16: Line 16:
 **//IMPORTANT//**: Do not accept the output from any of these or upcoming scripts verbatim. There is no question that these scripts cannot possibly take into account every conceivable variation being searched for, so treat the results with skepticism and do extra sanity checking to make sure your submitted improvements make sense. If you're unsure, check the Git log to see if there are previous commits that back up your interpretation of what you're seeing.

-==== The cleanup scripts ====
+==== 1. Calculating the length of an array ====
-
-=== 1. Calculating the length of an array ===

 A lot of kernel code needs to calculate the length of an array, frequently to iterate through all of its elements. There are two standard ways to do this in C:
Line 72: Line 70:
 Note that there are some obvious examples of what the script is looking for, as well as some false positives. Note also how various files insist on re-inventing the test by defining a macro that does what is already in that Linux header file.

-=== 2. Check if something is a power of two ===
+==== 2. Check if something is a power of two ====

 Quite a lot of kernel code needs to check if an integer value is an exact power of two -- the general test for that is already defined in the header file ''include/linux/log2.h'' as follows:
Line 121: Line 119:
 </code>

-Note well that if you search for that expression, many tests are actually checking if the number in question is **not** a power of two, so make sure you notice the difference.
+Note well that if you search for that expression, many tests are actually checking if the number in question is **not** a power of two, so make sure you notice the difference. Here's the output running that script against the ''arch/powerpc'' directory:

-=== 3. Finding "bad" select directives in Kconfig files ===
+<code>
 +arch/powerpc/sysdev/fsl_rio.c:317: if ((size & (size - 1)) != 0 || size > 0x400000000ULL) 
 +arch/powerpc/mm/book3s32/mmu.c:370: if (n_hpteg & (n_hpteg - 1)) { 
 +arch/powerpc/boot/cuboot-pq2.c:173: if (mem->size[1] & (mem->size[1] - 1)) 
 +arch/powerpc/boot/cuboot-pq2.c:175: if (io->size[1] & (io->size[1] - 1)) 
 +arch/powerpc/include/asm/bitops.h:93: return !(x & (x - 1)); 
 +arch/powerpc/platforms/44x/pci.c:166: if ((size & (size - 1)) != 0  || 
 +arch/powerpc/lib/rheap.c:258: if ((alignment & (alignment - 1)) != 0) 
 +arch/powerpc/lib/rheap.c:307: if ((alignment & (alignment - 1)) != 0) 
 +arch/powerpc/lib/rheap.c:450: if (size <= 0 || (alignment & (alignment - 1)) != 0) 
 +</code> 
 + 
 +so you need to be careful as to what you think any of that simplifies to. 
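
To make the distinction concrete, here is a small userspace sketch in shell arithmetic (not kernel code; the function names ''raw_pow2_test'' and ''strict_pow2_test'' are made up for this illustration). It shows that the bare ''(n & (n - 1)) == 0'' idiom also accepts zero, whereas the kernel's ''is_power_of_2()'' additionally requires a non-zero value, so a negated hit such as ''if (size & (size - 1))'' is not automatically a drop-in for ''!is_power_of_2(size)'' if zero is a possible input.

```shell
#!/bin/sh
# Userspace sketch only; both function names are hypothetical.

# The raw idiom seen in most of the hits: succeeds when (n & (n - 1)) == 0.
# Note that this also succeeds for n = 0.
raw_pow2_test() {
        [ $(( $1 & ($1 - 1) )) -eq 0 ]
}

# Same test, but additionally rejecting zero, which is the behaviour
# of the kernel helper.
strict_pow2_test() {
        [ "$1" -ne 0 ] && raw_pow2_test "$1"
}

for n in 0 1 6 8; do
        raw=no
        strict=no
        if raw_pow2_test "$n"; then raw=yes; fi
        if strict_pow2_test "$n"; then strict=yes; fi
        echo "n=$n raw=$raw strict=$strict"
done
```

The two tests disagree only for ''n=0'', which is exactly the case worth checking by hand before converting an open-coded test.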
 + 
 +==== 3. Finding "bad" select directives in Kconfig files ====
  
 Many kernel Kconfig files contain "select" directives, some of which are no longer relevant since the config entry they refer to was deleted previously, but the associated "select" directives were never removed. This is clearly not fatal, but it's still something that can be cleaned up.
Line 174: Line 186:
 arch/sh/Kconfig:456: select USB_OHCI_SH if USB_OHCI_HCD
 </code>
 +
 +Run the same script against the "drivers" directory:
 +
 +<code>
 +$ find_bad_selects.sh drivers
 +===== DRM_DEBUG_SELFTEST
 +drivers/gpu/drm/i915/Kconfig.debug:53: select DRM_DEBUG_SELFTEST
 +===== DRM_KMS_DMA_HELPER
 +drivers/gpu/drm/adp/Kconfig:9: select DRM_KMS_DMA_HELPER
 +drivers/gpu/drm/logicvc/Kconfig:7: select DRM_KMS_DMA_HELPER
 +===== TEST_KUNIT_DEVICE_HELPERS
 +drivers/iio/test/Kconfig:11: select TEST_KUNIT_DEVICE_HELPERS
 +$
 +</code>
 +
 +In other words, the above "select" directives look like they can be removed, but it's your responsibility to verify that.
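
One way to do part of that verification by hand is to confirm that no Kconfig file anywhere in the tree still defines the selected symbol. The following is only a sketch: ''kconfig_defines'' is a hypothetical helper name, and it assumes GNU grep (for ''-r'' and ''--include''), as the scripts on this page already do.

```shell
#!/bin/sh
# kconfig_defines is a hypothetical helper name (not an existing tool).
# Usage: kconfig_defines SYMBOL [TREE]
# Succeeds if some Kconfig* file under TREE (default: current directory)
# contains a "config SYMBOL" or "menuconfig SYMBOL" entry.
kconfig_defines() {
        grep -rqE --include='Kconfig*' \
                "^[[:space:]]*(menu)?config[[:space:]]+${1}([[:space:]]|\$)" \
                "${2:-.}"
}
```

From the top of the kernel tree, ''kconfig_defines DRM_KMS_DMA_HELPER || echo "not defined anywhere"'' would be a typical use. If the symbol really is gone, something like ''git log --oneline -S'DRM_KMS_DMA_HELPER' -- '*Kconfig*''' should help locate the commit that removed it.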
 +
 +==== 4. Find "badif" CONFIG symbols ====
 +
 +This check refers to "CONFIG_"-prefixed preprocessor symbols that are tested in the kernel source somewhere, but are not defined in any Kconfig file; that usually means that the symbol was once dropped from a Kconfig file, but the (now pointless) preprocessor tests are still being done.
 +
 +As one example, here is the output generated by the script for the string "ACORNSCSI_CONSTANTS":
 +
 +<code>
 +>>>>> ACORNSCSI_CONSTANTS
 +drivers/scsi/arm/acornscsi.c:92:#undef CONFIG_ACORNSCSI_CONSTANTS
 +drivers/scsi/arm/acornscsi.c:393:#ifdef CONFIG_ACORNSCSI_CONSTANTS
 +drivers/scsi/arm/acornscsi.c:471:#ifdef CONFIG_ACORNSCSI_CONSTANTS
 +</code>
 +
 +The standard approach here would be to check carefully that there are no references to that string anywhere in the source tree, and check the Git log to see if/when that symbol was removed, and why, and clean it up.
 +
 +Additional examples from the drivers/ directory:
 +
 +<code>
 +>>>>> CRYPTO_DEV_ASPEED_HACE_CRYPTO_DEBUG
 +drivers/crypto/aspeed/aspeed-hace-crypto.c:19:#ifdef CONFIG_CRYPTO_DEV_ASPEED_HACE_CRYPTO_DEBUG
 +>>>>> DRM_AMD_DC_DP2_0
 +drivers/gpu/drm/amd/display/dc/hwss/dcn35/dcn35_hwseq.c:107:#if defined(CONFIG_DRM_AMD_DC_DP2_0)
 +>>>>> DRM_XE_LMTT_2L_128GB
 +drivers/gpu/drm/xe/xe_lmtt_2l.c:57:#if IS_ENABLED(CONFIG_DRM_XE_LMTT_2L_128GB)
 +>>>>> FUSION_MAX_FC_SGE
 +drivers/message/fusion/mptbase.h:180:#ifdef CONFIG_FUSION_MAX_FC_SGE
 +drivers/message/fusion/mptbase.h:181:#if CONFIG_FUSION_MAX_FC_SGE  < 16
 +drivers/message/fusion/mptbase.h:183:#elif CONFIG_FUSION_MAX_FC_SGE  > 256
 +drivers/message/fusion/mptbase.h:186:#define MPT_SCSI_FC_SG_DEPTH CONFIG_FUSION_MAX_FC_SGE
 +</code>
 +
 +However, there is a complication in that there are "CONFIG_"-prefixed variables that are not defined in any Kconfig file, but are defined in a Makefile instead:
 +
 +<code>
 +>>>>> NCR53C8XX_PREFETCH
 +drivers/scsi/ncr53c8xx.c:1779:#ifdef CONFIG_NCR53C8XX_PREFETCH
 +drivers/scsi/Makefile:180:              := -DCONFIG_NCR53C8XX_PREFETCH -DSCSI_NCR_BIG_ENDIAN \
 +</code>
 +
 +This messes up what should have been a simple search, and it also seems to fly in the face of an old coding standard that the macro prefix "CONFIG_" should be reserved exclusively for Kconfig entries, but clearly there are exceptions, which is why the janitor needs to look carefully at the script output. Here's the script:
 +
 +<code>
 +#!/bin/sh
 +
 +################################################
 +#  Make sure you install autoconf for "ifnames".
 +################################################
 +
 +SCAN_DIR=${1-*}
 +
 +CVARS=$(find ${SCAN_DIR} -name "*.[ch]" |    \
 +        grep -v "mach-types.h" |        \
 +        xargs ifnames |                 \
 +        grep "^CONFIG_" |               \
 +        cut -d' ' -f1 |                 \
 +        sed "s/^CONFIG_//" |            \
 +        sed "s/_MODULE$//" |            \
 +        sort -u)
 +
 +ALL_KC_FILES=$(find . -name "Kconfig*")
 +
 +#
 +#  Scan the entire tree, just to see what turns up.
 +#  NOTE the extra grep to see if the CONFIG_ symbol
 +#  is defined as perhaps part of cflags in a Makefile.
 +#
 +
 +for cv in ${CVARS} ; do
 +        grep -Eq "^[[:space:]]*config[[:space:]]+${cv}\b" ${ALL_KC_FILES} ||
 +        grep -q "^menuconfig *${cv}$" ${ALL_KC_FILES} ||
 +        grep -Eqr "^[[:space:]]*#[[:space:]]*define[[:space:]]+CONFIG_${cv}\b" * || {
 +                echo ">>>>> ${cv}"
 +                grep -rwn "CONFIG_${cv}" * | grep -v defconfig
 +                grep -rwn "${cv}" * | grep -v defconfig
 +                grep -rn -- "-DCONFIG_${cv}" *
 +        }
 +done
 +</code>
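
For the Makefile complication described above, it can be handy to ask directly whether a given symbol is defined by hand somewhere outside Kconfig. This is a sketch under assumptions: ''non_kconfig_definitions'' is a hypothetical helper name, GNU grep is assumed, and only ''-D'' flags in Makefile/Kbuild files and ''#define'' lines in C sources are considered.

```shell
#!/bin/sh
# non_kconfig_definitions is a hypothetical helper name.
# Usage: non_kconfig_definitions SYMBOL [TREE]   (SYMBOL without "CONFIG_")
# Prints file:line:text for every place CONFIG_SYMBOL appears to be defined
# by hand: a -DCONFIG_SYMBOL compiler flag or a "#define CONFIG_SYMBOL".
# Exit status is grep's: 0 if at least one such definition was found.
non_kconfig_definitions() {
        sym=$1
        tree=${2:-.}
        grep -rnE \
                --include='Makefile*' --include='Kbuild*' --include='*.[ch]' \
                -e "-DCONFIG_${sym}([^A-Za-z0-9_]|\$)" \
                -e "^[[:space:]]*#[[:space:]]*define[[:space:]]+CONFIG_${sym}([^A-Za-z0-9_]|\$)" \
                "$tree"
}
```

Run from the top of the tree as ''non_kconfig_definitions NCR53C8XX_PREFETCH'', this should print the ''drivers/scsi/Makefile'' line shown earlier; a symbol that produces no output here and has no Kconfig entry is a stronger cleanup candidate.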
 +
 +==== 5. Finding allegedly "unused" header files ====
 +
 +"Unused" Linux kernel header files are simply header files that don't appear to be #included from anywhere in the kernel source tree. There could be all sorts of reasons for that.
 +
 +One reason is that a source file was removed, but its associated supporting header file was overlooked and is still sitting there, now having no purpose in life. Another (quite common) reason is that many of those header files contain enums or macros for hex offsets for particular devices, so that even if nothing is including them at the moment, they still need to be preserved in case something needs all that content.
 +
 +As a basic example of the current ''find_unused_headers.sh'' script, let's have it check under the directory ''drivers/usb'':
 +
 +<code>
 +===== phy-mv-usb.h =====
 +./drivers/usb/phy/phy-mv-usb.h
 +===== sisusb_tables.h =====
 +./drivers/usb/misc/sisusbvga/sisusb_tables.h
 +</code>
 +
 +The above tells us simply that there are two header files under that directory that appear to not be included from //anywhere// in the Linux kernel source tree. Why that is would require taking a closer look, possibly checking the Git log regarding that header file, and so on; it does **not** mean you can simply submit a patch to delete that file.
 +
 +Here is the admittedly brute force script ''find_unused_headers.sh'':
 +
 +<code>
 +#!/bin/sh
 +
 +DIR=${1-*}
 +
 +LONGHDRS=$(find ${DIR} -name "*.h")
 +
 +HDRS=""
 +
 +for h in ${LONGHDRS} ; do
 +        HDRS="${HDRS} $(basename ${h})"
 +done
 +
 +HDRS=$(for h in ${HDRS} ; do echo $h ; done | sort -u)
 +
 +#  Test that each header file is included from *somewhere*.
 +
 +for h in ${HDRS} ; do
 +        # echo "Testing $h ..."
 +        grep -Erq ".*#.*include.*${h}" * || {
 +                echo "===== ${h} ====="
 +                find . -name "${h}"
 +                grep -rwH ${h} *
 +        }
 +done
 +</code>
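
Because the script above matches the header name loosely (the ''.'' in ''foo.h'' is a regex wildcard, and ''foo.h'' also matches inside ''barfoo.h''), a header can look "used" when it is not. A stricter per-header check might look like the following sketch; ''header_is_included'' is a hypothetical helper name, GNU grep is assumed, and includes generated through macros will not be seen by it.

```shell
#!/bin/sh
# header_is_included is a hypothetical helper name.
# Usage: header_is_included HEADER_BASENAME [TREE]
# Succeeds if some file under TREE (default: current directory) has an
# #include whose path ends in exactly that basename, for example
# <linux/foo.h> or "../foo.h" when asked about foo.h.
header_is_included() {
        # Backslash-escape anything that is not alphanumeric, "_" or "-",
        # so that "." in the file name is matched literally.
        name=$(printf '%s\n' "$1" | sed 's/[^A-Za-z0-9_-]/\\&/g')
        grep -rqE \
                "^[[:space:]]*#[[:space:]]*include[[:space:]]*[<\"]([^>\"]*/)?${name}[>\"]" \
                "${2:-.}"
}
```

For example, ''header_is_included phy-mv-usb.h || echo "no direct #include found"'' from the top of the tree. A negative result is still only a hint, for the reasons given earlier in this section.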
 +
 +**EXERCISE**: Run the script on the ''drivers/gpu'' directory to get far more output.
  • Last modified: 2025/06/30 17:52
  • by rpjday