GCC gOlogy: studying the impact of optimizations on debugging
-g-Ology, or gOlogy, stands for the study of how optimization levels
(selected by -O flags) affect the quality of debugging information
(enabled by -g flags). This report assesses the theoretical and
practical impact of various optimizations available in the GNU
Compiler Collection version 8 on the debugging experience of
applications compiled by it. The goal is to evaluate the quality of
the debug information generated by GCC with optimization enabled,
document the effects of individual optimization passes on it, and
identify problems and opportunities for improvement.
GCC offers various optimization levels, from -O0 to -O3, plus -Og,
-Os and -Ofast, and well over a hundred independently-controllable
optimization flags. Each of the optimization levels enables a subset
of the optimization flags; enabling debugging information generation,
on the other hand, is not supposed to have any effect whatsoever on
the executable code. This report focuses on flags that are enabled by
the -O* options, and their effects on (extended) DWARF debug
information generated by GCC.
This report is structured as follows. The introduction outlines how
GCC gets from source code to output assembly code and debug
information, the major internal representation forms used throughout
compilation, and several techniques used by GCC to keep track of the
mapping from internal representations and output executable code to
corresponding source code concepts. Then, the bulk of the report goes
through each of the -O flags, and in each of them, through the
optimization passes that are enabled or affected by the -O flag,
describing the general behavior of the pass and what effects it may
have on debug information. The final section highlights and
consolidates the most relevant findings.
Introduction
In GCC, language front ends parse a translation unit and deliver to
the so-called middle end a number of functions (procedures, methods,
subprograms) to compile in a form that, although language-independent,
closely resembles a parse tree. Each function then goes through a
number of passes, some of which are only executed when certain
optimization flags are enabled, or other conditions are met.
The tree form is turned into gimple form, in which each function
amounts to a set of basic blocks in a control flow graph, each
containing a sequence of stmts represented as tuples. A stmt may be a
label definition, a simple assignment, a function call, a conditional
or unconditional branch, an asm statement, debug binds or markers, or
other less common forms. Scalar variables are versioned and converted
to static single assignment (SSA) form, in which each reference to a
variable takes a version that links it back to a single definition of
that variable version. Additional definitions, called PHI nodes, may
be introduced at confluence basic blocks, indicating which version is
to be taken when arriving from each incoming block. This is the form
in which most of the optimization passes in GCC take place.
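As a minimal illustration (the versioned names and the PHI node in the
comments are schematic, not verbatim GCC dump output), consider:

    int
    f (int c)
    {
      int x = 1;    /* x_1 = 1                                 */
      if (c)
        x = 2;      /* x_2 = 2                                 */
      return x;     /* x_3 = PHI <x_1(entry), x_2(then)>;      */
    }               /* return x_3;                             */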
Each function is then expanded to the register transfer language (RTL)
form, in which basic blocks are now formed by a sequence of insns,
each one corresponding to a machine instruction defined in the target
back end, or other machine-independent forms such as debug binds and
markers, notes and other forms not relevant for this report. Each
insn may contain zero or more computations represented as SETs (one of
which may set PC to indicate a branch), a CALL, an ASM, and indicators
that additional registers or memory can be used or unpredictably
modified. Scalar variables are initially assigned to
pseudo-registers, and many RTL optimization passes operate in this
form. Register allocation will then map each remaining
pseudo-register to a hardware register (if optimizing) or a stack
slot, adding spills and reloads as needed to satisfy the requirements
of each hardware instruction.
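As a schematic illustration of the notation (simplified from actual
RTL dumps; the insn and pseudo-register numbers are made up):

    /* For a function like
         int add (int a, int b) { return a + b; }
       the addition might be represented by an insn containing a
       single SET, along the lines of:
         (insn 7 6 8 2 (set (reg:SI 85)
                 (plus:SI (reg:SI 86) (reg:SI 87))) "add.c":1)
       where 85-87 are pseudo-registers, later mapped to hardware
       registers or stack slots by register allocation.  */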
A few RTL passes run after register allocation, and at the end
assembly code is output for each insn, while outputting debug
information that is to be interspersed with the assembly code, and
gathering debug information that is consolidated and output
afterwards.
Preserving debug information
There was a time when debugging required disabling optimizations.
Debug information formats back then could only assign a single
location to each variable, and optimizing out the frame pointer would
remove the base reference for all stack-based variables.
GCC has long had the notion that enabling debug information should not
cause any changes to executable code. To that end, each stmt and insn
carries source location information, i.e., file and line (and, more
recently, column) numbers and lexical blocks, even when debug
information is not enabled. Without optimization, single-stepping in
a debugger thus follows the natural order of execution, and all
variables are assigned stable memory locations, so each variable has a
single location throughout its lifetime.
Optimizations introduce complications, combining, simplifying and
removing computations, modifying the order of execution, reusing
registers and stack slots, duplicating portions of code, introducing
alternate induction variables and modifying the iteration order in
loop nests. Compiler and debug information formats have evolved over
time so as to enable optimized programs to be represented and
debugged, with varying levels of success.
For example, automatic variables in optimized programs may live in a
register for some time, another register at another time, and a stack
slot at other times. DWARF debug information supports location lists,
that may indicate a different location for a variable for different,
possibly-overlapping executable code ranges. Memory references in
gimple and RTL forms carry symbolic expressions used for alias
analysis, and also to build location lists; SSA versions, RTL
pseudo-registers and hardware registers also carry symbolic references
to the variables they refer to. The variable tracking pass
identifies, using such symbolic references, situations in which the
location of a variable varies throughout its lifetime, and arranges
for location lists to be output accordingly.
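As an illustration, a variable that moves between locations might be
described by a location list along the following lines (a schematic
rendering with made-up offsets and registers, not actual tool output):

    /* DW_AT_location for variable i:
         [0x00, 0x1f): DW_OP_reg3          ; i lives in register 3
         [0x1f, 0x4a): DW_OP_fbreg -24     ; then in a stack slot
         [0x4a, 0x60): DW_OP_lit0, DW_OP_stack_value
                                           ; then known to be 0    */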
As location expressions gained the ability to represent value
expressions, it became possible to indicate that in a certain range a
variable holds a known constant value, or that its value is not
available directly, but it can be computed from other locations.
Variable tracking at assignments extended variable tracking,
introducing debug binds early in compilation that associate a scalar
source variable with the location in which its value is stored,
arranging for the location/value expressions to be adjusted throughout
the compilation (even if computations are removed or moved past the
binds, so that the bound value expressions remain accurate) while
preserving their natural execution order, and using such binds to
generate location lists.
Although each stmt and insn carries source location information, as
they're shuffled by optimization, single-stepping may go back to
earlier statements, and it becomes impossible to tell when the effects
of a statement are complete. Statement Frontier Notes (SFN) are
introduced as additional debug notes, emitted (so far only by C and
C++ parsers) in the stmt stream to mark the beginning of logical
statements, thus after any debug binds associated with previous
statements take effect. Their natural execution order is retained by
the compiler, so the markers can be used to output source location
information marked as recommended stop points (the is_stmt flag in
DWARF line number tables), avoiding bouncing and making for
predictable observability of side effects.
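The following sketch shows how markers and binds appear interspersed
in the stmt stream (modeled after GIMPLE dumps, not verbatim output):

    /* For a fragment like
         x = y + 1;
         z = x * 2;
       the gimple stream carries, roughly:
         # DEBUG BEGIN_STMT
         _1 = y_3 + 1;
         # DEBUG x => _1
         # DEBUG BEGIN_STMT
         _2 = _1 * 2;
         # DEBUG z => _2
       Markers delimit logical statements; binds record where each
       user variable's value can be found at that point.  */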
Given optimization, it is not uncommon for no executable code to
remain between inspection points for multiple neighbor statements.
This was a problem because, although multiple source locations can be
associated with a single address in the line number table, ranges in
location lists could only name addresses of executable instructions.
Location view (LVu) numbering was introduced to identify each of the
entries in the line number table that refer to the same code address,
so that they can then be referenced unambiguously in location lists.
The representation of such extended location lists requires extensions
proposed for DWARF v6, and at the time of this writing, there aren't
any debuggers that support such extended location lists. Still, since
GCC makes the information available and we expect debuggers to catch
up eventually, the analyses that follow assume the disambiguation
given by LVu is effective in masking the optimization effects it was
created to overcome.
Despite all this effort, it is not realistic to expect the debug
experience of a program without optimization to be the same as that of
a program optimized even by optimizations regarded as not affecting
debugging. For example, a variable assigned to an exclusive stack
slot will be available throughout a function, but optimization may
assign it to a register during its limited live range, and then it
won't be possible to inspect it elsewhere. Setting breakpoints based
on addresses of executable code may not work as effectively in
optimized programs, because the same spot of the program may have been
duplicated by optimization, and then the breakpoint may not hit where
expected. Having the value of a variable available in a given
location, say its stack slot, does not guarantee it is possible to
modify it: the value could have just been loaded into a register,
which may then be modified by the program and stored back in the stack
slot; this might happen even without optimization, but the windows for
this possibility are narrower. Furthermore, folding that logically follows
from reasoning about what is known about a variable at compile time
may no longer be applicable if the variable is modified in the
debugger; if a block was removed because the condition guarding it was
provably false at compile time, changing a variable so that the
condition would evaluate to true will not bring back the code that was
optimized out.
So, inspecting variables in optimized programs is more likely to yield
"optimized out" because optimizations may expose dead ranges that are
not noticed with -O0, and modifying them always risks conflicting with
optimizations. As for breakpoints, using source locations rather than
code addresses is less likely to yield surprising results.
Optimizations
In this section, each optimization level is detailed, enumerating the
flags incrementally enabled by it over the previous level, and
detailing the effects on debugging brought about by each of the
optimization levels and flags.
Optimization levels form a nearly-strict crescendo in terms of passes
they activate: -O0, -Og, -O1, -Os, -O2,
-O3, -Ofast.
Nevertheless, determining when a pass is run is an involved process.
Each pass has a gate function, that decides whether to run the pass
based on optimization levels and flags. The default_options_table
array in gcc/opts.c arranges for flags to be enabled depending on the
optimization level, but some flags are enabled by default through
their initializer in e.g. gcc/common.opt. Some are also forced
enabled or disabled depending on other conditions. However, even if
the gate condition of a pass is enabled, it might not run if any
enclosing pass group fails its own gate condition.
The following outline depicts the optimization passes GCC goes through
while compiling a function, in the order they might run; the
information is extracted from gcc/passes.def. Indentation indicates
grouping of the indented passes within the previous less-indented pass
group. Parameters for the pass are indicated between parentheses
after the pass name.
all_lowering_passes:
pass_warn_unused_result
pass_diagnose_omp_blocks
pass_diagnose_tm_blocks
pass_lower_omp see -O1
pass_lower_cf
pass_lower_tm
pass_refactor_eh
pass_lower_eh see -Og
pass_build_cfg
pass_warn_function_return
pass_expand_omp see -Og, and -O1
pass_sprintf_length(!fold_return_value)
pass_walloca(strict_mode)
pass_build_cgraph_edges
all_small_ipa_passes:
pass_ipa_free_lang_data
pass_ipa_function_and_variable_visibility
pass_ipa_chkp_versioning
pass_ipa_chkp_early_produce_thunks
pass_build_ssa_passes:
pass_fixup_cfg
pass_build_ssa
pass_warn_nonnull_compare
pass_ubsan
pass_early_warn_uninitialized
pass_nothrow
pass_rebuild_cgraph_edges
pass_chkp_instrumentation_passes:
pass_fixup_cfg
pass_chkp
pass_rebuild_cgraph_edges
pass_local_optimization_passes:
pass_fixup_cfg
pass_rebuild_cgraph_edges
pass_local_fn_summary
pass_early_inline
pass_all_early_optimizations:
pass_remove_cgraph_callee_edges
pass_object_sizes(insert_min_max)
pass_ccp(!nonzero) see also --tree-bit-ccp, and --ipa-bit-cp
pass_forwprop
pass_early_thread_jumps
pass_sra_early
pass_build_ealias
pass_fre
pass_early_vrp
pass_merge_phi
pass_dse
pass_cd_dce see also --tree-dce(aggressive)
pass_early_ipa_sra
pass_tail_recursion
pass_convert_switch
pass_cleanup_eh see -Og
pass_profile see --guess-branch-probability
pass_local_pure_const
pass_split_functions
pass_strip_predict_hints
pass_release_ssa_names
pass_rebuild_cgraph_edges
pass_local_fn_summary
pass_ipa_oacc:
pass_ipa_pta
pass_ipa_oacc_kernels:
pass_oacc_kernels:
pass_ch
pass_fre see above
pass_lim
pass_dominator(!may_peel_loop_headers)
pass_dce
pass_parallelize_loops(oacc_kernels)
pass_expand_omp_ssa see -Og, and -O1
pass_rebuild_cgraph_edges
pass_target_clone
pass_ipa_chkp_produce_thunks
pass_ipa_auto_profile
pass_ipa_tree_profile:
pass_feedback_split_functions
pass_ipa_free_fn_summary(small)
pass_ipa_increase_alignment
pass_ipa_tm
pass_ipa_lower_emutls
all_regular_ipa_passes:
pass_ipa_whole_program_visibility
pass_ipa_profile
pass_ipa_icf
pass_ipa_devirt see also --devirtualize-speculatively
pass_ipa_cp see also --ipa-bit-cp, --ipa-vrp, and --ipa-cp-clone
pass_ipa_cdtor_merge
pass_ipa_hsa
pass_ipa_fn_summary
pass_ipa_inline see -Og, --inline-functions-called-once, --inline-small-functions, --indirect-inlining, -Os, -O2, and -O3
pass_ipa_pure_const
pass_ipa_free_fn_summary(!small)
pass_ipa_reference
pass_ipa_comdats
all_late_ipa_passes:
pass_materialize_all_clones
pass_ipa_pta
pass_omp_simd_clone
all_passes:
pass_fixup_cfg
pass_lower_eh_dispatch
pass_oacc_device_lower
pass_omp_device_lower
pass_omp_target_link
pass_all_optimizations:
pass_remove_cgraph_callee_edges
pass_strip_predict_hints
pass_ccp(nonzero) see above
pass_post_ipa_warn
pass_complete_unrolli see also --tree-loop-ivcanon
pass_backprop
pass_phiprop
pass_forwprop see above
pass_object_sizes(!insert_min_max)
pass_build_alias
pass_return_slot
pass_fre see above
pass_merge_phi see above
pass_thread_jumps
pass_vrp(warn_array_bounds)
pass_chkp_opt
pass_dce see above
pass_stdarg
pass_call_cdce
pass_cselim
pass_copy_prop
pass_tree_ifcombine
pass_merge_phi see above
pass_phiopt see also --hoist-adjacent-loads
pass_tail_recursion see above
pass_ch see above
pass_lower_complex
pass_sra
pass_thread_jumps see above
pass_dominator(may_peel_loop_headers) see above
pass_isolate_erroneous_paths
pass_phi_only_cprop
pass_dse see above
pass_reassoc(insert_powi)
pass_dce see above
pass_forwprop see above
pass_phiopt see above
pass_ccp(nonzero) see above
pass_cse_sincos
pass_optimize_bswap
pass_laddress
pass_lim see above
pass_walloca(!strict_mode)
pass_pre see also --code-hoisting, --tree-tail-merge, and --tree-partial-pre
pass_sink_code
pass_sancov
pass_asan
pass_tsan
pass_dce see above
pass_fix_loops
pass_tree_loop:
pass_tree_loop_init
pass_tree_unswitch
pass_scev_cprop
pass_loop_split
pass_loop_jam
pass_cd_dce see above
pass_iv_canon
pass_loop_distribution see also --tree-loop-distribute-patterns
pass_linterchange
pass_copy_prop see above
pass_graphite:
pass_graphite_transforms
pass_lim see above
pass_copy_prop see above
pass_dce see above
pass_parallelize_loops(!oacc_kernels)
pass_expand_omp_ssa see above
pass_ch_vect see also --tree-loop-vectorize
pass_if_conversion
pass_vectorize: see also --vect-cost-model=cheap, and --vect-cost-model=dynamic
pass_dce see above
pass_predcom
pass_complete_unroll see also --tree-loop-ivcanon, and --peel-loops
pass_slp_vectorize see also --vect-cost-model=cheap, and --vect-cost-model=dynamic
pass_loop_prefetch
pass_iv_optimize
pass_lim see above
pass_tree_loop_done
pass_tree_no_loop:
pass_slp_vectorize see above
pass_simduid_cleanup
pass_lower_vector_ssa see -Og
pass_cse_reciprocals
pass_sprintf_length(fold_return_value)
pass_reassoc(!insert_powi) see above
pass_strength_reduction see also --expensive-optimizations
pass_split_paths
pass_tracer
pass_thread_jumps see above
pass_dominator(!may_peel_loop_headers) see above
pass_strlen
pass_thread_jumps see above
pass_vrp(!warn_array_bounds) see above
pass_warn_restrict
pass_phi_only_cprop see above
pass_dse see above
pass_cd_dce see above
pass_forwprop see above
pass_phiopt see above
pass_fold_builtins see -Og, and --inline-atomics
pass_optimize_widening_mul
pass_store_merging
pass_tail_calls
pass_dce see above
pass_split_crit_edges
pass_late_warn_uninitialized
pass_uncprop
pass_local_pure_const see above
pass_all_optimizations_g:
pass_remove_cgraph_callee_edges
pass_strip_predict_hints
pass_lower_complex
pass_lower_vector_ssa see above
pass_ccp(nonzero) see above
pass_post_ipa_warn
pass_object_sizes
pass_fold_builtins see above
pass_sprintf_length(fold_return_value)
pass_copy_prop see above
pass_dce see above
pass_sancov
pass_asan
pass_tsan
pass_split_crit_edges see above
pass_late_warn_uninitialized
pass_uncprop see above
pass_local_pure_const see above
pass_tm_init:
pass_tm_mark
pass_tm_memopt
pass_tm_edges
pass_simduid_cleanup
pass_vtable_verify
pass_lower_vaarg
pass_lower_vector see -Og
pass_lower_complex_O0
pass_sancov_O0
pass_lower_switch
pass_asan_O0
pass_tsan_O0
pass_sanopt
pass_cleanup_eh see above
pass_lower_resx
pass_nrv
pass_cleanup_cfg_post_optimizing
pass_warn_function_noreturn
pass_gen_hsail
pass_expand see -Og, --tree-coalesce-vars, --tree-ter, --defer-pop, and --expensive-optimizations
pass_rest_of_compilation:
pass_instantiate_virtual_regs
pass_into_cfg_layout_mode
pass_jump see -Og, and --thread-jumps
pass_lower_subreg
pass_df_initialize_opt see -Og
pass_cse see also --expensive-optimizations, --rerun-cse-after-loop, and --cse-follow-jumps
pass_rtl_fwprop
pass_rtl_cprop
pass_rtl_pre
pass_rtl_hoist
pass_rtl_cprop see above
pass_rtl_store_motion
pass_cse_after_global_opts see also --cse-follow-jumps
pass_rtl_ifcvt
pass_reginfo_init
pass_loop2:
pass_rtl_loop_init
pass_rtl_move_loop_invariants see also -Og
pass_rtl_unroll_loops
pass_rtl_doloop
pass_rtl_loop_done
pass_web
pass_rtl_cprop see above
pass_cse2 see also --cse-follow-jumps
pass_rtl_dse1
pass_rtl_fwprop_addr
pass_inc_dec
pass_initialize_regs
pass_ud_rtl_dce
pass_combine see also --expensive-optimizations
pass_if_after_combine
pass_partition_blocks
pass_outof_cfg_layout_mode
pass_split_all_insns
pass_lower_subreg2
pass_df_initialize_no_opt
pass_stack_ptr_mod
pass_mode_switching
pass_match_asm_constraints
pass_sms
pass_live_range_shrinkage
pass_sched
pass_early_remat
pass_ira see -Og, --ira-share-save-slots, --omit-frame-pointer, -Os, --expensive-optimizations, --caller-saves, --ipa-ra, and --lra-remat
pass_reload see -Og, and --expensive-optimizations
pass_postreload:
pass_postreload_cse
pass_gcse2
pass_split_after_reload
pass_ree
pass_compare_elim_after_reload
pass_branch_target_load_optimize1
pass_thread_prologue_and_epilogue see -Og, and --shrink-wrap
pass_rtl_dse2
pass_stack_adjustments
pass_jump2 see --crossjumping
pass_duplicate_computed_gotos
pass_sched_fusion
pass_peephole2
pass_if_after_reload
pass_regrename
pass_cprop_hardreg
pass_fast_rtl_dce see also -Og
pass_reorder_blocks see also --reorder-blocks-algorithm=stc
pass_branch_target_load_optimize2
pass_leaf_regs
pass_split_before_sched2
pass_sched2
pass_stack_regs:
pass_split_before_regstack
pass_stack_regs_run
pass_late_compilation:
pass_compute_alignments see --align-loops, --align-jumps, --align-labels, and --align-functions
pass_variable_tracking
pass_free_cfg
pass_machine_reorg
pass_cleanup_barriers
pass_delay_slots
pass_split_for_shorten_branches
pass_convert_to_eh_region_ranges
pass_shorten_branches see -Og
pass_set_nothrow_function_flags
pass_dwarf2_frame
pass_final see -Og, --peephole, and --ipa-ra
pass_df_finish
pass_clean_state
Before optimizations, the program is parsed so as to build a tree
representation, which is then gimplified.
Some optimization passes schedule such cleanup actions as
TODO_cleanup_cfg, TODO_rebuild_alias, and TODO_remove_unused_locals.
There are other flags that affect too many passes to mention, such as
--strict-aliasing, --merge-constants and --fast-math, or that
cannot be associated with any optimization pass, such as
--reorder-functions.
-O0: optimize=0
Disable optimization.
This flag sets the optimization level to 0. This is the base level,
the gold standard for the debugging experience, against which other
levels are compared. All automatic variables and parameters are
allocated to memory, being loaded and, if modified, stored back, at
every use. All branches and labels are preserved, and no blocks are
duplicated. Functions are not inlined, except for mandatory inlines,
e.g., functions marked with attribute always_inline. Source locations
from branches or returns that are preserved only in CFG edges are
materialized as NOPs.
-Og: optimize=1 + debug
Perform only very fast optimizations with low impact on debugging.
This flag sets the optimization level to 1, but limited by an option
for better debugging that disables a number of optimizations, even
some that would otherwise be enabled at optimization level 1.
Optimization enables the selection of the local dynamic TLS model to
access thread-local variables known to be defined in the dynamic
module being compiled. Without that, the global dynamic TLS model is
used instead, but this change has no effect on debugging.
Type conversion folding substitutes conversions to float of results
of standard calls that return double with calls to variants that
return float. Likewise, conversions to integral types of results of
standard calls that return double (e.g. round, logb) are converted to
calls that return integral types (lround, ilogb). These only affect
debugging inasmuch as the behavior of the substituted functions is to
be inspected.
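For illustration, the kinds of substitutions involved (whether each
one fires depends on the target and on math options):

    #include <math.h>

    float to_f (float x)  { return (float) sqrt (x); } /* -> sqrtf (x)  */
    long  to_l (double x) { return (long) round (x); } /* -> lround (x) */
    int   to_i (double x) { return (int) logb (x); }   /* -> ilogb (x)  */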
Optimization brings small changes to the processing of nested
functions, enabling frame structs and static chains to be optimized
away, without impact on debugging, and to the representation of
variable-length arrays in nested functions, which may lose some
details about the types.
Some OpenMP primitives may also be simplified when optimization is
enabled. These are internal implementation details, so they shouldn't
affect debugging.
Gimple EH lowering decisions change with optimization, but finally
regions may be duplicated either way, and with the same minor effects
on debugging: different code addresses for the same source code lines.
Critical edges are also split to ease optimizations, and later unsplit
if they remain.
Optimization affects slightly the way variables and parameters are
remapped when inlining, but these changes have their effects on debug
information masked away.
When optimizing, various passes run cleanups of the control flow
graph. This may delete unreachable blocks and trivially dead insns
like unused sets or copies to self. In gimple mode, the removal of
unreachable blocks may propagate SSA defs to uses, but it is hard to
imagine that any uses thereof will be reachable, so there should be no
impact on debugging. Removed blocks may be missed during debugging:
breakpoints can't be set in removed blocks. Cleanup may renumber
basic blocks, detect forwarder blocks, remove unused labels and
fallthrough forwarder blocks, merge blocks with unconditional
fallthrough, replace jumps to returns or jumps with copies of the
targets, simplify conditional jumps and remove single-destination
jumps. The removal of fallthrough forwarder blocks may discard debug
binds and markers, which could make single-stepping or breaking at the
source locations represented by the removed markers impossible. Binds
might also be lost, though at least in gimple there will often be
redundant binds at confluence points, shortly thereafter. A similar
negative effect arises when a jump is replaced with a return or
another jump, bypassing any debug markers and binds at the original
target's block.
When optimizing, NOPs that would materialize CFG edge source locations
are not inserted, and extra steps that preserve source locations
during gimplification of jumps and labels are not taken. If
corresponding debug markers are also dropped, this may remove the
possibility of stopping at some goto.
Optimization enables unused local variables and lexical blocks to be
released early; it may cause variables and scopes that cannot ever be
entered to be omitted altogether from debug information.
Optimization enables the named return value pass, that detects
functions that return aggregate types in memory, always returning the
same local variable, and unifies that variable with the result, using
the name and source location of the variable, and mapping all uses of
the variable to the result. This may have an effect on debugging if
the variable happens to be taken from an inlined function: in this
case, the source name and location mapping is skipped, because it
would introduce a name not present in the original function, but the
variable is still remapped to the return declaration, so the source
location of the variable's declaration is lost.
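A minimal sketch of the pattern this pass looks for:

    struct big { int a[16]; };

    struct big
    make (void)
    {
      struct big r;                  /* NRV unifies r with the hidden */
      for (int i = 0; i < 16; i++)   /* return slot, so the return    */
        r.a[i] = i;                  /* below copies nothing.         */
      return r;
    }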
Optimization enables a pass that combines calls to sin, cos and cexpi
with the same SSA operand into a single dominating cexpi call, taking
the real or imaginary part of the result at each former sin or cos
call. This pass also attempts to simplify pow, powi and cabs calls.
None of these affect debugging, aside from the ability to step into
any of the affected math function calls.
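For instance, a pair of calls like the following may be combined;
cexpi here stands for GCC's internal __builtin_cexpi, and the
rewritten form in the comments is schematic:

    #include <math.h>

    void
    polar (double phi, double *s, double *c)
    {
      *s = sin (phi);  /* combined into roughly:                       */
      *c = cos (phi);  /*   _Complex double t = __builtin_cexpi (phi); */
    }                  /*   *s = __imag__ t;  *c = __real__ t;         */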
With optimization, a pass runs that simplifies memcpy to memset if
the copied-from range is known to be all zeros, simplifies some stdarg
calls to simple pointer operations if va_list is a simple pointer
type, and performs other similar transformations that do not affect
debugging, aside from stepping into or breaking at simplified
functions.
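A minimal sketch, assuming the pass can prove the source is all zeros:

    #include <string.h>

    void
    clear (char *dst)
    {
      char zeros[64] = { 0 };    /* provably all zeros             */
      memcpy (dst, zeros, 64);   /* may become memset (dst, 0, 64) */
    }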
Optimization enables attempts to optimize divide and modulus
operations on vectors of integral types into combinations of vector
multiply, shift, and add. It also enables attempts to optimize
initialization of vectors to avoid piecewise initialization. None of
these affect debugging.
Enabling optimization changes defer_stack_allocation behavior, but its
effect on debugging is limited to narrowing the live ranges of dead
values.
It also enables reordering of operations in expand, so that those
requiring more operands are performed first. This reordering does not
involve memory-modifying operations, and debug binds cover affected
cases, so it does not affect debugging.
Expand also introduces plenty of pseudos when optimizing, which allows
replacement of common subexpressions and whatnot. Conversely,
gimplification introduces more temporaries when not optimizing, and it
attempts to reuse temporaries when optimizing. The effects on
debugging are limited to variations in variable location assignments.
The jump and pro_and_epilogue RTL passes run cleanup_cfg with
CLEANUP_EXPENSIVE, given optimize. This performs some more expensive
block merging, and simplification of conditional jumps around jumps.
The merging has no effect on debugging (indeed, it could reduce the
loss of debug markers and binds if done on forwarder blocks), whereas
the simplification might drop markers and binds along with the jumps,
with impact on debugging similar to that of the other jump
simplifications.
Several RTL optimization passes also use dataflow analysis to update
notes about unused register definitions, as well as death points of
registers. Debug binds that reference registers after their death
points or unused sets are detected during this analysis, and debug
temporaries are introduced next to the death points to preserve the
equivalent expressions for use in the debug binds. This generally
improves the debugging experience, enabling bind expressions to resort
to the equivalences to express the values bound to user variables even
if the register is reused for another purpose and no longer holds the
value.
The first CSE (common subexpression elimination) pass is enabled when
optimizing. The effects of this pass are described under
--rerun-cse-after-loop. A third CSE pass may be
activated with --rerun-cse-after-global-opts.
Depending on the selected register allocation model, optimization
changes register pressure cost estimates in the RTL loop analyzers,
but that's not something that changes the kinds of optimizations made
there, or the kinds of impacts on debugging they may have.
Optimization enables the init-regs pass, that adds zero-initialization
for pseudos before uninitialized uses, without effects on debugging.
Optimization enables combine, a pass that performs arithmetic
substitution of single-use pseudo-set insns into others. After
successful substitution, insns become useless and are removed, but if
their values are still used in debug binds, the binds are updated
accordingly, and markers ensure the bind effects are still visible.
Therefore, this pass has no effect on debugging.
It also changes the default register allocation region setting,
without effects on debugging.
Optimization enables reload inheritance and removal of redundant
reload stores, without effects on debugging.
Additional insn splitting passes are enabled after reload when
optimizing, without any effects on debugging; any impact would have
been brought about by later splitting passes anyway.
Several RTL optimization passes run a fast dead code elimination subpass,
at the end of the live registers dataflow analysis, as long as --dce
is enabled; see --dce(fast) for details.
Optimization enables variable tracking, debug binds and markers, to
try to mask the effects of optimizations on debugging. They are not
needed without optimization.
When optimizing, insn lengths are estimated with multiple passes that
grow lengths as needed, which may result in shorter variants, without
effects on debugging.
Final may discard redundant compares when optimizing. It also links
back single-use labels to jumps to them, for use in machine-specific
transformations such as SH's constant pool placement. These
transformations have no effect on debugging.
--tree-ccp: pass_ccp
Enable SSA-CCP optimization on trees.
Conditional constant propagation attempts to determine the value of
conditions that control conditional branches. It may simplify (fold)
some calls and assigns into constant assignments, and turn conditional
branches into unconditional ones, possibly dropping blocks that become
unreachable.
The most significant effect on the debugging experience is that
setting breakpoints at certain source code ranges may become
impossible as the blocks containing them are dropped. The extra
folding might make additional lines not be represented by any
instructions, but SFN provides markers to stand for them, and VTA and
LVu ensure the effects of the optimized-away code can be inspected
even without remaining instructions, so the overall impact of this
pass on the debugging information is likely negligible.
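A minimal example of the kind of folding involved:

    extern int g (void);

    int
    f (void)
    {
      int x = 4;
      if (x > 5)      /* provably false: the branch is resolved and  */
        return g ();  /* this block dropped; breakpoints on the call */
      return x * 2;   /* to g can no longer be set.  The return is   */
    }                 /* folded to the constant 8.                   */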
--tree-fre: pass_fre
Enable Full Redundancy Elimination (FRE) on trees.
This pass uses value numbering to identify and remove redundant SSA
computations, replacing them with previously-computed results, while
also propagating copies, removing dead computations, folding
computations, and resolving conditional branches and indirect calls.
Changes are only relevant for debugging sessions that would modify
variables to create situations that wouldn't normally arise at
runtime. The substitutions and folding have no effect on debugging,
unless variables are changed in the debugger so as to break the
equivalences. Stmt removals are masked by debug binds, markers and
views. Resolving conditional branches may remove entire blocks if
they aren't reachable to begin with, but the consequent inability to
set breakpoints on them could be surprising, especially if the
debugging session were to change variables so as to try to force the
execution of the unreachable block. Resolving indirect calls to
direct ones might also surprise attempts to modify pointers in a debug
session, attempting to cause a different function to be called.
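A minimal sketch of a redundancy this pass removes:

    extern int use (int, int);

    int
    f (int a, int b)
    {
      int x = a + b;
      int y = a + b;      /* redundant: reuses x's value; y's       */
      return use (x, y);  /* computation is removed, and its debug  */
    }                     /* bind can refer to the same value as x. */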
--tree-dse: pass_dse
Enable dead store elimination.
This pass removes stores and mem* calls that modify memory that is
overwritten without intervening reads. Addressable variables, that
might be modified by such removed stmts, are not tracked by debug
binds, so debugging sessions might be confusing as expected effects of
removed dead stores will not be observable.
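A minimal example of a dead store and its observable effect:

    extern char buf[16];

    void
    f (void)
    {
      buf[0] = 1;   /* dead: overwritten below with no intervening */
                    /* read; removed, so stopping between the two  */
      buf[0] = 2;   /* lines will not show buf[0] == 1.            */
    }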
--guess-branch-probability: pass_profile
Enable guessing of branch probabilities.
No effect on debugging per se.
--tree-ch: pass_ch
Enable loop header copying on trees.
This pass copies loop headers, turning the copies into entry tests.
Debug binds in the copied blocks are also copied to the post-loop
block, modeling the binds introduced after PHI nodes when entering
SSA. With those additional bindings, duplicating the header blocks
does not impact debugging significantly within the copied blocks or
after them. One possibly confusing consequence is that setting a
breakpoint at the current program counter, while single-stepping the
loop entry test, will not break at subsequent iterations, and
vice-versa. This is unlikely to be surprising, and setting
breakpoints by line overcomes this effect. User labels, that would
not be present in the copy, could make for further confusion, but if
they provide for additional edges into the loop header, they will
actually stop the transformation from taking place.
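Schematically, header copying rotates a while loop into a guarded
do-while:

    /* before:  L: if (!c) goto out;    after:  if (!c) goto out;
                   body;                     L: body;
                   goto L;                      if (c) goto L;
       out:                             out:

       The copied entry test runs once, and the loop proper becomes
       a do-while.  */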
When --tree-loop-vectorize is enabled, another
ch_vect pass is activated, that differs from the regular ch pass only
in deciding which loops are to undergo such header copying, so both
passes have essentially the same effects on debugging.
--tree-dce: pass_dce, pass_cd_dce
Enable SSA dead code elimination optimization on trees.
This may remove assignments, branches and even some calls that are
deemed unused/dead. Dead assignments are propagated into debug stmts
before removal, so the removal itself does not affect
debugging. Dead branches may cause entire blocks to be removed,
making any expectation of stepping through or setting breakpoints at
such blocks during debugging impossible to meet. Pure or const calls,
as well as malloc and free pairs that are deemed dead may be removed,
frustrating expectations of stepping into them during debugging.
--ipa-profile: pass_ipa_profile
Perform interprocedural profile propagation.
This pass propagates execution frequencies from callers to callees.
Also, upon identifying the target of an indirect call from execution
profiles, it introduces a speculative direct call that can then be
inlined or otherwise optimized. None of this affects debugging.
--ipa-pure-const: pass_ipa_pure_const
Discover pure and const functions.
This pass detects and marks functions according to whether or not they
have side effects, loop, or throw, and propagates the information to
decide about callers.
This, by itself, has no effect on debugging, but it may enable the
elision of calls that would return the same value, without any other
side effects, of functions that are not explicitly marked as pure or
const, and this elision may be slightly confusing for debugging, as
such functions may be called (and hit breakpoints) fewer times than
expected, and stepping into elided calls will not be possible.
--ipa-reference: pass_ipa_reference
Discover readonly and non addressable static variables.
This pass analyses how static variables are used by functions, and
propagates the gathered information to callers, so that it can be used
in later optimizations. There aren't any effects on debugging.
--tree-copy-prop: pass_copy_prop
Enable copy propagation on trees.
This pass identifies and simplifies expressions based on copy-related
SSA names. This may unify multiple variables into a single location,
in ranges in which they take up equivalent values, making it
impossible to modify them independently in the debugger. The
identification of such equivalences may also resolve conditional
branches to unconditional ones, removing entire basic blocks and the
possibility of overriding the conditions in the debugger.
--tree-sink: pass_sink_code
Enable SSA code sinking on trees.
This pass moves statements down the control flow, closer to uses
thereof, when it may be profitable, and removes them when they are
unused. As the DEF is removed from a position that dominates a debug
bind, the bind is adjusted, masking the effects on debugging, at least
as far as scalars are concerned. Addressable variables are not
subject to value tracking in debug binds, and so the delaying of
stores may actually be observable during debugging.
--tree-slsr: pass_strength_reduction
Perform straight-line strength reduction.
This pass replaces computations involving multiplies with ones
involving adds, in some cases introducing additional temporaries. In
the end, trackable variables end up getting the same values, just
computed in a different way, so this does not affect debugging.
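A minimal sketch of the kind of replacement performed:

    extern void use (int);

    void
    f (int n, int stride)
    {
      for (int i = 0; i < n; i++)
        use (i * stride);   /* the multiply is replaced by an       */
    }                       /* accumulator increased by stride on   */
                            /* each iteration.                      */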
--tree-coalesce-vars: pass_expand
Enable SSA coalescing of user variables.
This flag allows the compiler to assign to a single pseudo-register
SSA versions originally created for different user variables. With
the aid of debug binds, this has very little effect on debugging: the
impact is limited to early loss of values expected to be about to be
overwritten, e.g. when an earlier value of a variable is already dead,
and the location holding it is overwritten by a value computed for a
temporary or for another variable, before being copied to the former
variable. Between the computation point and the binding point,
attempting to inspect the variable may indicate it is optimized out at
that point, which is perfectly accurate, if undesirable from a
debugging perspective.
--tree-ter: pass_expand
Replace temporary expressions in the SSA->normal pass.
This substitutes singly-used SSA defs into their single (non-debug)
uses for expand to have larger expressions to select insns from.
Debug binds may end up with more complex expressions than needed,
bound before the actual computation of the larger expression takes
place, but this does not affect debugging.
--defer-pop: pass_expand
Defer popping functions args from stack until later.
No effect on debugging.
--split-wide-types: pass_lower_subreg
Split wide types into independent registers.
This flag enables two RTL lowering passes that explode wide-mode
pseudos into multiple word-mode ones. In many cases this modifies
insns in place, but it occasionally emits multiple insns to replace a
single one. In no such case does it affect debugging. Such splitting
may be performed on user variables, and although we can represent
variable locations with independent locations for different fragments,
such wide variables do not always get debug binds at assignments for
tracking throughout compilation. Location inference from DECLs
associated with REGs and MEMs is used for fragments of such variables
instead, which does correctly identify locations, but not necessarily
at points of the program that reflect the recommended inspection
points. This may cause debugging sessions to observe changes to such
variables too early or too late, which can make debugging confusing.
Adding debug binds for the fragments, and arranging for GCC to
aggregate them back, might get more accurate information, but since
this would be done at such a late stage, it is possible that the binds
would be introduced at points that do not satisfy the usual
expectation that side effects would take place between the markers
immediately before and after the assignment. There are also issues
with dismembered aggregates, mentioned under --tree-sra,
that would likely affect such split variables as well.
--forward-propagate: pass_rtl_fwprop, pass_rtl_fwprop_addr
Perform a forward propagation pass on RTL.
These RTL passes replace uses of a pseudo with its single reaching
definition. This in itself has no impact on debugging. If a pseudo
is propagated into all uses, it will become unused, but then it will
have been substituted into debug binds as well and, if not, the unused
def might end up preserved as a debug temp. There is a possibility
that, by propagating a pseudo, it becomes dead earlier, and then,
after register allocation, debug binds that referenced it while it was
still set end up finding the register reused for other purposes
earlier than without this transformation. Since the propagation found
the source of the definition was available all the way to the
propagation point, and the equivalence between the propagated pseudo
and its definition is noted by the variable tracking machinery at the
definition point, it is very likely that an alternate expression for
the register value will be found.
--dse: pass_rtl_dse1, pass_rtl_dse2
Use the RTL dead store elimination pass.
This flag is enabled by default, but it's only activated when
optimizing. The RTL passes enabled by it remove stores in memory that
are overwritten without intervening reads, that store the same value
as the previous store, or that write a value to the stack that is not
read before the function returns. Since it affects addressable
variables, global or local, debug binds do not apply, and so the
effects of removing these stores are going to be noticeable in
debugging, except for the redundant stores.
--auto-inc-dec: pass_inc_dec
Generate auto-inc/dec instructions.
The flag is enabled by default, but it's only activated when
optimizing, and when the target architecture supports auto inc or auto
dec addressing modes.
It detects insns that add or subtract a constant or pseudo from a
pseudo before or after the pseudo or a copy thereof is used in a
memory reference, and it attempts to turn the memory address into a
pre- or post-inc, -dec or -mod addressing mode. This may cause one of
the pseudos to change earlier or later than expected, and although
this is only done when the pseudo is not otherwise used between the
original and new positions of the modifying insn, debug binds between
them are not adjusted, so they will bind to the wrong value, and when
the
pseudo is modified even that incorrect location may be lost.
--ira-share-save-slots: pass_ira
Share slots for saving different hard registers.
The flag is enabled by default, but it's only activated when
optimizing. It allows registers whose lifetimes do not overlap to be
saved in the same slot across calls. This could shorten the apparent
live range of variables, making them unavailable at spots in which
they might be in the absence of this flag.
--omit-frame-pointer: pass_ira
When possible do not generate stack frames.
This flag attempts to avoid reserving and using a register as a frame
pointer, using stack pointer-relative addresses as needed. A frame
pointer register used to be essential for debugging, but call frame
information obviated it: it is now irrelevant for this purpose, and
this optimization has no effect on debugging.
--compare-elim: pass_compare_elim_after_reload
Perform comparison elimination after register allocation has finished.
This pass removes redundant compare insns, relying on insns that set
flags as side effects instead. It has no effect on debugging.
--shrink-wrap: pass_thread_prologue_and_epilogue
Emit function prologues only before parts of the function that need it,
rather than at the top of the function.
This pass attempts to insert the prologue sequence at a later point
than the entry point, which may involve duplicating some blocks and
moving non-prologue early insns down to other blocks. The moved insns
are simple enough that debug binds can be adjusted and mask the moves,
so it does not affect debugging. Block duplication has little to no
impact on debugging, though breakpoints set based on code addresses,
rather than on logical locations, may notice the difference. The
later prologue may confuse debuggers that assume the end of the
epilogue, noted in debug information, marks the beginning of user
code: such debuggers will likely be significantly affected by this
optimization.
--combine-stack-adjustments: pass_stack_adjustments
Looks for opportunities to reduce stack adjustments and stack references.
This flag consolidates consecutive stack allocations, consecutive
stack deallocations, or deallocations followed by allocations, within
single blocks, adjusting stack pointer-relative addresses as needed.
It has no effect on debugging.
--cprop-registers: pass_cprop_hardreg
Perform a register copy-propagation optimization pass.
This pass only replaces (pseudos assigned to) hard regs in SET_SRCs
with earlier-defined equivalent values, and removes noop moves.
Substitutions are made in debug bind insns too. So, aside from noop
moves that stood for source lines on their own in non-SFN settings,
this shouldn't affect the debugging experience in any way.
--dce: pass_fast_rtl_dce
Use the RTL dead code elimination pass.
This flag is enabled by default, but the fast rtl_dce pass is only
activated when optimizing. Insns are regarded as dead if they only
set registers and none of them are live. Dead sets used in debug
binds are preserved in debug temps, so this does not affect debugging.
--reorder-blocks: pass_reorder_blocks
Reorder basic blocks to improve code placement.
The reorder blocks pass attempts to increase the number of fallthrough
edges by moving basic blocks. This may remove the possibility of
breaking at explicit goto statements.
--delayed-branch: pass_delay_slots
Attempt to fill delay slots of branch instructions.
This pass moves insns about, attempting to fill delay slots on arches
that support them, most often of calls, branches, jumps and returns.
It runs after var-tracking, and it may move insns across debug bind
notes that would be affected by it, potentially confusing location
information. It may create opportunities for jumps to jumps to be
redirected to the ultimate jump target, which may invalidate
breakpoints that could have been set at the bypassed jumps. On a few
arches, calls followed by jumps may have their delay slots filled with
insns that modify the register holding the return address for the
call, which may confuse debuggers as to the point of the call,
including the recovery of entry-point values from the caller frame and
location information.
Conditional markers might enable CFG simplifications without
invalidating breakpoints, but failing that, it would probably be wise
to disable this and return address adjustments at -Og.
--peephole: pass_final
Enable machine specific peephole optimizations.
This flag is enabled by default, but it is only activated if
optimization is enabled, on machines that define peepholes, not to be
confused with the newer peephole2, handled by --peephole2.
Unlike peephole2, these older peepholes recognize sequences of insns
during the final pass and output assembly code directly. Any debug
notes between insns that are recognized as a peephole group are moved
before or after the peephole output, which keeps markers mostly
correct, but may corrupt binds.
--merge-constants: varasm
Attempt to merge identical constants across compilation units.
With this flag, constant pool entries and other constants that do not
amount to objects that may have their addresses taken and compared
(unless --merge-all-constants is given, requesting even such read-only
objects to be merged) are emitted in mergeable sections so that the linker
can detect and remove duplicates. This may affect debugging inasmuch
as the address/identity of the unified objects matters; since
so-unified objects are usually string literals and initializers,
rather than user-visible variables, this should seldom if ever affect
debugging.
-O1: optimize=1
Perform only very fast optimizations.
This option sets the optimization level to 1.
With -O0 or -Og, the maximum vectorization factor for OpenMP is
limited to 1. At -O1 or higher, target-specific vector sizes are used
instead.
Basic blocks containing only PHI nodes, debug binds and markers may be
dropped altogether by the mergephi pass. Dropping markers could make
some statements impossible to stop at when stepping, and dropping
binds makes their side effects not visible, so that earlier binds seem
to remain effective. It might be possible to move the binds and
markers into the destination block so as to keep them as conditionals.
Pairs of tests guarding conditional blocks in && or || arrangements
may be combined into a single test by the ifcombine pass. The block
holding the second test becomes unconditional, so any markers and
binds in it will take effect even when they shouldn't. Further
optimizations are enabled if the then block is a forwarder to the else
block, or vice-versa (a forwarder block is empty except for phi nodes,
debug binds and markers). These may further confuse debugging by
changing the situations in which the forwarder's binds and markers
take effect. Conditional binds and markers may alleviate these
problems.
The laddress pass lowers address-taking operations that are not
invariant, so as to expose the computations involving offsets and
array indexing to optimizers. It has no effect on debugging.
--tree-bit-ccp: pass_ccp
Enable SSA-BIT-CCP optimization on trees.
This flag modifies slightly the behavior of the SSA tree-ccp pass, so that it keeps track of individual bits in SSA
registers, rather than just entire registers. This allows some
further simplifications, especially of conditional branches based on
individual bits.
This does not introduce any new kind of impact on the debugging
experience, but it may make further blocks unreachable, and thus
unavailable for breakpointing, and may reduce further assignments to
reuses of constants without additional code.
--tree-forwprop: pass_forwprop
Enable forward propagation on trees.
This pass, enabled by default but activated only at -O1 or
higher, is run up to 3 times on each function. It substitutes
expressions assigned to SSA names into uses thereof, folding
statements in place. This doesn't affect debugging, but other
transformations made by these passes do. Loads of complex types whose
real or imaginary parts are used separately are broken up into
separate component loads, but debug binds referencing the complex
value loaded from memory are reset, degrading debug information: the
bind stmt might be adjusted instead. Stores of complex values are
also split up, without effect on debugging. Expressions taking the
address of variables, and possibly adding offsets to them, may be
substituted into indirections, enabling variables to become
non-addressable and turned into SSA form, as in --tree-phiprop. The conditions in conditional branches may be
folded to constants, which changes the control flow graph and can
render entire blocks unreachable. Likewise, simplifications in switch
expressions may rule out some case targets. It may combine memcpy and
memset calls to neighbor ranges into a single memcpy, which may affect
debugging if the pointer returned by the memset call is referenced in
debug binds. Additional specialized transformations involve bit
rotations, permutations, bitfield refs and vector constructors, but
none of these affect debugging.
--tree-sra: pass_sra, pass_sra_early
Perform scalar replacement of aggregates.
This flag enables passes that turn members of aggregates that would
normally live in memory into stand-alone scalars that can be optimized
like registers. The original aggregate object may in some cases be
fully taken apart, but when it is still used as a whole, the scalar is
"spilled" back in place and "reloaded" as needed.
After assignments to the scalar introduced by these passes, as well as
spills and reloads, debug binds are introduced so that var-tracking
can keep track of the fragments of the aggregate, so this pass should
be transparent as far as debug information is concerned.
Unfortunately, there are problems or limitations in the var-tracking
pass that cause us to not use the annotations for the scalarized
members, at least in cases in which the aggregate as a whole is small
enough to be regarded as an SSA register. Some investigation of
var-tracking is needed to determine how to use at least the
conflicting notes that apply to both the whole aggregate and the
scalarized member, but this may turn out to show significant
shortcomings in VTA (variable tracking at assignments) and require
some work to make use of the available annotations so as to bring
debug information quality of (fully- and?) partially-scalarized
aggregates in line with that of scalars.
Another notable limitation introduced by this pass is that dismembered
aggregates can no longer be used in inferior calls that expect
references or pointers.
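A minimal sketch of scalarization, assuming the aggregate is taken
apart:

    struct point { int x, y; };
    extern int use (int);

    int
    f (struct point p)
    {
      p.x += 1;                      /* p.x and p.y become stand-alone */
      p.y -= 1;                      /* scalars; debug binds are meant */
      return use (p.x) + use (p.y);  /* to track each fragment.        */
    }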
--tree-loop-im: pass_lim
Enable loop invariant motion on trees.
Although this flag is enabled by default, the pass is omitted from the
set of passes activated at -Og, so it is only run at -O1
or higher.
This pass moves invariants out of loops, and performs store motion.
Floating-point divides and shifts for bit tests may have invariant
divisors and shifted bits rearranged for hoisting, without impact on
debugging.
Access to memory at an invariant address may be turned into an SSA
scalar, with a load at the loop entry and a store at the loop exit;
such early loads and delayed stores may be confusing for debugging.
Invariant computations are moved to the edge into the loop from the
preheader, after being removed from their original position. The
removal triggers propagation into debug binds, which preserves bind
equivalences but drops the actual location, and becomes more fragile.
With a bit of additional effort, it would be possible to keep the
binds unchanged. Still, this movement should have little to no impact
on debugging.
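A minimal sketch of invariant motion:

    extern int g;

    void
    f (int *a, int n, int k)
    {
      for (int i = 0; i < n; i++)
        a[i] = g + k * 2;   /* the load of g and k * 2 are computed */
    }                       /* once, on the edge into the loop.     */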
--tree-dominator-opts: pass_dominator
Enable dominator optimizations.
Although this flag is enabled even at -Og, the passes controlled
by it are omitted from the set of passes activated at -Og, so
they are only run at -O1 or higher.
It propagates constants and copies into uses, folds expressions,
attempts to resolve conditionals, eliminates redundant computations
and redundant stores, replaces inequalities with equality tests,
propagates coalescible SSA names equivalent to PHI values incoming
from each edge, propagates and removes degenerate PHIs, and performs
jump threading.
The only transformation that has any significant effect on the debug
experience, given that VTA, SFN and LVu mask the effects of the
others, is jump threading. See the effects of (gimple) jump threading
under --tree-vrp.
--inline-functions-called-once: pass_ipa_inline
Integrate functions only required by their single caller.
This option works as an enabler for certain cases of inlining, in
that, if this option is disabled, or optimization is disabled, for a
function or for any of its callers, and no other flag or attribute
mandates or enables inlining, then the possibility of inlining into
all callers and not emitting an out-of-line copy will not even be
considered. Oddly, the "called once"/"single caller" bit seems to be
a left-over artifact of earlier implementations: there doesn't seem to
be any test involving the caller count in the inlining code paths
activated by this flag.
Inline substitution, per se, is not usually a significant source of
debug information degradation: any piece of debug information that
could be represented in the out of line function can be and is equally
represented for each inlined copy. Potential loss arises out of
debug-lossy optimizations, when performing transformations that are
enabled or strengthened by the additional information available when
analyzing both the caller and the callee in a single context. For
example, the inline expansion of a function within a loop that is
unrolled may face significant ambiguity as to how many inlined copies
of the function there are, how far scopes in each copy extend,
especially if instructions of different iterations are shuffled
together by e.g. modulo scheduling.
Another situation in which inlining may affect the debug experience
significantly is that of heavy use of abstraction calls. As large
numbers of nearly empty, abstraction-only functions are inlined, the
density of code vs debug annotations becomes low, and the risk of
hitting upper limits on debug annotations counts grows. When they are
hit, such annotations as debug markers and binds may be dropped,
removing the compiler's ability to mask the effects of optimizations
on debugging. The loss of markers removes the linearity of
single-stepping and the robustness of the relationship between source
locations in the program and observable effects that they bring. The
loss of debug binds takes with it much of the possibility of observing
variables not held in stable memory locations. Such degradation, that
takes debug information back to the days in which the debugging of
optimized programs was reasonably held to be unreasonably difficult,
may sometimes be avoided at the expense of significant compile time
and memory, using such parameters as "max-debug-marker-count",
"max-vartrack-size", "max-vartrack-expr-depth", and
"max-vartrack-reverse-op-size".
--ssa-backprop: pass_backprop
Enable backward propagation of use properties at the SSA level.
This flag is enabled by default, but the pass is only activated at -O1 or higher.
It detects numeric variables whose sign does not matter, and optimizes
away operations that affect only their sign. Debug binds referencing
modified SSA DEFs are adjusted when possible, but since some cases
involve function calls and those do not belong in debug binds, some
binds may be lost, and others, especially after PHI nodes, may be
bound to expressions that have their signs reversed, which may be
confusing.
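As a minimal sketch of the kind of code this pass targets:

    double f (double a)
    {
      double x = -a;   /* the sign of x is irrelevant below, so the   */
      return x * x;    /* negation may be optimized away; the debug   */
    }                  /* bind for x is then adjusted when possible,  */
                       /* or in the worst case lost or sign-reversed  */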
Enable hoisting loads from conditional pointers.
This pass, enabled by default but activated only at -O1 or
higher, replaces phi nodes whose incoming args all take the address of
a scalar value, and are later dereferenced, with phi nodes that take
the scalar values directly. The pass makes sure that the loaded
memory values cannot change between the load points, original and
optimized, but this transformation might affect debugging if it
involves modifying any of the affected memory variables, as the values
may have already been loaded. It may also cause a variable that was
addressable to become non-addressable and promoted to an SSA register.
Debug binds would only be assigned at the time of this promotion,
which may be too late to capture assignments that might have already
been moved or optimized out. As a result, such variables, promoted to
non-addressable, will have worse location tracking than scalar
variables that never have their address taken, but no worse than if
they had remained addressable all the way.
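A minimal sketch of the transformation, assuming the loads cannot trap
(the file-scope variables are hypothetical):

    int a, b;
    int f (int c)
    {
      int *p = c ? &a : &b;  /* a PHI whose args all take addresses   */
      return *p;             /* of scalars; rewritten, in effect, as  */
    }                        /* c ? a : b, loading before the join    */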
Perform function-local points-to analysis on trees.
This just computes more refined alias sets; it doesn't make any
transformations, so whatever effects it might have on the debugging
experience are indirect.
Optimize amount of stdarg registers saved to stack at start of function.
The code enabled by this flag estimates the maximum sizes of
general-purpose and floating-point registers areas used in a stdarg
variable argument list function, so as to limit the number of
registers that need to be saved. This does not affect debugging.
Enable conditional dead code elimination for builtin calls.
Although this flag is enabled even at -Og, the pass is omitted
from the set of passes activated at -Og, so it is only run at
-O1 or higher.
This pass replaces builtin calls with simpler operations, and/or
guards the operation by conditions that decide whether or not to
execute the call, replaced or not. This may be slightly confusing
when setting breakpoints at the omitted calls, or attempting to
single-step into them.
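As a sketch, for a math builtin whose result is unused and whose call
is kept only for its errno side effect:

    #include <math.h>
    void f (double x)
    {
      sqrt (x);   /* result unused: the call may be guarded by the    */
    }             /* condition under which it can set errno,          */
                  /* becoming, in effect: if (x < 0) sqrt (x);        */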
Transform condition stores into unconditional ones.
This flag is enabled by default when there is a conditional move
instruction, but the pass is only activated at -O1 or higher.
The pass moves gimple stores in conditional blocks to subsequent join
blocks, introducing PHI nodes to select the value to be stored.
Addressable variables rely on var-tracking (MEM annotations) rather
than var-tracking-at-assignments debug binds, so moving stores causes
observable changes in the debug experience: if a variable that should
be modified by a store is inspected after the expected store point,
but before the replacement store is executed, an outdated value will
be found.
I wonder if it might be possible to insert debug binds to temporarily
override the location of variables that live in memory most of their
lifetime, so that such deferred writes could be reflected in location
lists, and observed immediately through such a bind, in spite of the
deferred execution of the store.
As in --hoist-adjacent-loads, the moves
could leave the conditional blocks empty, which could make it
impossible to set breakpoints at lines within them or to single-step
into them, as SFNs get dropped along with the removed blocks. Unlike
the combined stores from if/then/else structures, sunk stores from
else-less then blocks (or from else blocks with empty then blocks)
retain their location information, so one might be able to stop at
them even when the conditional block to be executed does not include
that line. This can all get confusing, and it could be alleviated
with conditional binds and markers.
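A sketch of the store sinking, assuming *p can be safely loaded
unconditionally:

    void f (int c, int *p)
    {
      if (c)
        *p = 1;   /* may be sunk to the join block, in effect:        */
    }             /* *p = c ? 1 : *p; inspecting *p between the       */
                  /* original and replacement store points then       */
                  /* yields the outdated value                        */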
Optimize conditional patterns using SSA PHI nodes.
This pass performs various transformations (see --hoist-adjacent-loads for more) that may drop small or empty
conditional blocks, combining a test and a conditional assignment
(represented as a PHI node) into a flag-store, an abs, min, or max
expr. If a temporary is needed, it may be cloned from the phi result,
but that will then be placed in one of the operands of the original
PHI node, so any debug binds referencing the original result remain
correctly unchanged. The potential negative impact on the debug
experience of these transformations is limited to the removal of a
conditional block, with diminished ability to step into the block or
set breakpoints in it, and the potential of an early (temporary)
overwrite of the location of the variable that will eventually hold
the join value, which might make the variable impossible to inspect or
modify after such overwrite. The 3-way min-max cases do not change
this picture much, except for the possibility of loss of visibility of
the result of the intermediate assignment, as bind and marker are
removed along with the conditional block.
Another situation in which a conditional block may be eliminated is
that in which both edges out of the condition yield the same value for
the PHI (e.g. x != a ? a : x simplifies to a). Such simple cases of
value unification have just the usual impact of removing a conditional
block, but more elaborate cases, with multiple assignments computing
the result of the conditional block, have the assignments, but not
markers or binds, moved out of the conditional block, with the usual
consequences of difficulty of stepping into the removed block, or of
inspecting the results of computations whose debug binds were dropped,
before the debug binds at a subsequent join point, if any.
Yet another transformation is factoring a conversion out of a PHI
node. If both incoming edges perform the same conversion, or if one
is a constant and moving the conversion after the join is still found
potentially profitable for enabling other optimizations, a new PHI is
introduced with type and values prior to the conversion, the original
conversions are removed, a new conversion stmt is introduced at the
top of the join block, storing in the original PHI result, and finally
the original PHI def is removed. This transformation does not remove
any block, the original conversions can be propagated into any debug
binds, and the new conversion (without location information) is
inserted before the debug bind of the original PHI node. The final
removal of the original PHI node does not reset debug binds, because
we skip propagation into binds upon PHI node removal, and the
conversion assignment becomes the new definition. The moved
conversions can still be inspected, thanks to SFN and VTA, and the
converted value is bound to the variable that takes that value at the
join point too, so this transformation does not affect the debug
experience.
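For instance, a conditional assignment that the pass may collapse into
a MIN expression, dropping the conditional block:

    int min (int a, int b)
    {
      int x = b;
      if (a < b)
        x = a;    /* test and PHI may combine into a single MIN_EXPR  */
      return x;   /* or conditional move, leaving no conditional      */
    }             /* block to step into or break at                   */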
Enable reassociation on tree level.
Although this flag is enabled by default, the pass is omitted from the
set of passes activated at -Og, so it is only run at -O1
or higher.
This pass rearranges multiple stmts that perform the same operation,
say addition, ordering operands by rank and issuing multiple
operations in parallel when that's advantageous. This ends up
removing nearly all of the original stmts and issuing new ones, using
new SSA names. Debug binds retain the original operations, and
markers allow them to be inspected when single-stepping. The
reassociation might insert extraneous calls, however, e.g. turning
repeated multiplies into powi calls; this might be slightly confusing
if stepping into calls. Range tests in conditional branches may end
up simplified, making the branches unconditional, and rendering some
blocks unreachable, which prevents setting breakpoints in them.
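A minimal sketch of both effects (the powi replacement additionally
requires unsafe math optimizations to be enabled):

    int f (int a, int b, int c, int d)
    {
      return a + b + c + d;      /* may be reassociated as            */
    }                            /* (a + b) + (c + d) for parallelism */

    double g (double x)
    {
      return x * x * x * x * x;  /* repeated multiplies may become a  */
    }                            /* __builtin_powi (x, 5) call        */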
Enable loop optimizations on tree level.
This flag is enabled by default, but it is only activated when
optimization at -O1 or higher is enabled.
When activated, this flag enables a pass that detects loops and
gathers information about them. If the flag is activated and loops
are found in a function, then various loop passes are run over that
function; otherwise, only the pass enabled by --tree-slp-vectorize is.
Enable copy propagation of scalar-evolution information.
This flag is enabled by default, but it is only activated when
--tree-loop-optimize is activated.
If scalar evolution determines that a PHI node is invariant, uses
thereof, including those in debug binds, are replaced by the invariant. This
has no effect on debugging.
It also computes, through scalar evolution, the final value of
variables modified in loops, dropping the PHI node in favor of a
computation based on values known before the loop is entered. This
may affect debugging when the removal of the PHI node resets a debug
bind referencing it, but the bind could be preserved, since a new,
equivalent definition will be introduced.
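A minimal sketch of final value replacement:

    int f (int n)
    {
      int i, s = 0;
      for (i = 0; i < n; i++)
        s += 4;   /* the final value of s is computable from n, so    */
      return s;   /* the exit PHI may be replaced by that expression, */
    }             /* and the loop may then become dead                */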
Create canonical induction variables in loops.
This flag is enabled by default, but it is only activated when
--tree-loop-optimize is activated.
This pass estimates the number of iterations of each loop, identifies
exit edges and removes those whose conditions are never met, based on
gathered information about the maximum number of iterations. It
attempts complete loop unrolling, and is done if that
succeeds. Otherwise, if the loop meets certain conditions, a
countdown induction variable is introduced and the loop exit test is
replaced so as to compare this variable with zero.
The only transformations that minimally impact debugging are the
removal of loop exits, which may render some unreachable blocks
unavailable for setting breakpoints (that would never be hit), and
loop unrolling, which uses the same machinery and has the same effects
on debugging as loop peeling (see --peel-loops).
Optimize induction variables on trees.
This flag is enabled by default, but it is only activated when
--tree-loop-optimize is activated.
For each loop, after detecting base and general induction variables
and selecting the optimal set, any new, artificial induction variables
in that set are created and added to the loop. Then, uses of induction variables
not chosen for the optimal set are rewritten in terms of the optimal
set, adjusting their original assignments or inserting new assignments
instead of phi nodes. Finally, assignments to induction variables set
to be removed are propagated into debug binds, if needed, and then
discarded.
Alas, propagation into debug binds may lose plenty of useful
information: PHI nodes cannot be propagated into binds, and regular
assignments are not removed right away so that, say, if a definition of A is used
in a definition of B and both are to be removed, we get a chance to
propagate B and then A into debug binds that referenced only B. If we
happen to remove A first, uses of B in debug binds end up having to be
reset, losing relevant location information.
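A minimal sketch of the rewriting:

    void f (int *a, int n)
    {
      int i;
      for (i = 0; i < n; i++)  /* uses of i may be rewritten in terms */
        a[i] = 0;              /* of a pointer induction variable (in */
    }                          /* effect, *p++ = 0); debug binds for  */
                               /* i are then adjusted or, failing     */
                               /* that, reset                         */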
Inline __atomic operations when a lock free instruction sequence is available.
This flag is enabled by default, but the transformations described
herein, part of the fold builtins pass, are only activated at -O1 or higher.
Various atomic operations are turned into atomic bit test and set,
complement or reset. The transformation may invalidate debug binds of
user variables used only in compares with zero.
Perform conversion of conditional jumps to branchless equivalents.
Various situations in this RTL pass remove tests, conditional branches
and basic blocks. This can make for very surprising single-stepping
into the blocks guarded by the conditions, as lines that would not be
expected to run given the condition actually get to run, or
vice-versa. SFNs don't help: they just reinforce whatever block
execution is taken, or get dropped altogether.
Aside from the confusing single-stepping, the block removal might (but
likely doesn't) cause GCC to lose track of debug bindings. In theory,
at confluence points (when entering SSA), we introduce additional
debug binds that allow GCC to recover from the loss of bindings in the
separate branches. These should allow GCC to get back in sync with
the result of the if-converted assignments at the confluence point, so
at least after the confluence point, the bindings should have been
recovered: if-converted sets will be inserted before the
confluence-recovering debug bind.
These transformations usually apply to a single assignment in each
conditional block, but there is support for turning multiple
assignments in a then block into multiple assignments from
IF_THEN_ELSE (cond, then_value, orig_value) too. There aren't further
debugging complications in this case, but the blocks can be much
longer, breaking users' expectations of single stepping for longer.
SFN might make all of this worse, in that the statement markers in the
conditional blocks are actually dropped, so you don't get to step into
the blocks any more.
Support for conditional markers and binds could alleviate the effects
of these transformations.
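The canonical shape this applies to, sketched in source terms:

    int f (int c, int a, int b)
    {
      int x;
      if (c)
        x = a;   /* test and branches may be replaced by a branchless */
      else       /* conditional move, in effect x = c ? a : b;        */
        x = b;   /* single-stepping then no longer enters either      */
      return x;  /* conditional block as such                         */
    }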
Move loop invariant computations out of loops.
This pass identifies SET insns that are invariant within a loop, and
moves them to the loop preheader, possibly using a new pseudo to hold
the invariant, or replaces them with a copy from the pseudo holding an
equivalent invariant. Debug binds remain in place and need not be
adjusted, as the transformations ensure the values are available in
the original pseudos at the points right after the original SETs,
where the binds will tend to be.
The only risk I can see to debuggability is that moved insns, and
insns leading to equivalences that may end up dead and removed at
later passes, may leave lines of code without any insns standing for
them. The use of SFN and LVu information in debuggers, enabling them
to stop at and inspect the state even at such lines, removes this
potential problem.
Replace add, compare, branch with branch on count register.
This pass replaces the conditional branch at the end of a loop with a
single decrement-counter-and-conditionally-loop sequence, when the
loop iteration count can be computed. The original loop counter is
not removed by this pass, so this pass by itself does not affect debug
information. However, the original loop counter may become unused,
and then be optimized away, and then it is unlikely that the generic
adjustments to debug bind statements will be able to realize it can be
computed from the newly-introduced loop counter. There is room for
improvement, adjusting the debug binds of the original loop counter in
terms of the new related IV. This might require some additional
infrastructure that could likely be generalized and used for IVs in
general.
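A minimal sketch:

    void f (int *a, int n)
    {
      int i;
      for (i = 0; i < n; i++)  /* the exit test may be replaced by a  */
        a[i]++;                /* decrement-and-branch on a new       */
    }                          /* countdown register; if i then goes  */
                               /* unused, it may be optimized away    */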
Perform conversion of conditional jumps to conditional execution.
This pass turns insns in then and else blocks into COND_EXEC, enabled
by the if condition (then) or its negation (else), removing the
conditional branch and the branches at the end of the conditional
blocks, and bringing it all into a single basic block.
It does not modify or remove debug insns, so single-stepping will
enter and execute both blocks, though the side effects of insns whose
condition is not active will not be executed. In general, insns that
modify a variable will be followed by a debug insn that binds the
variable to the location holding its modified value.
Although debug insns don't have conditional binds, the location of a
variable often (but not always) remains the same across modification.
In the cases it doesn't, only the bind at the confluence of the
conditional blocks will get the variable location and value back in
sync.
In addition to the post-confluence point, a variable modified within a
block turned into conditionally-executed insns can also be correctly
inspected right after an (active) assignment to it, i.e., the
conditional assignment that would have been executed should the
conditional blocks have remained separate. SFN and LVu technology
help make sure there will be a usable inspection point with the
correct bindings at that point.
At other points in the combined block, variables potentially modified
in it may be regarded as bound to a stale or unused location holding
an unrelated or uninitialized value, corresponding to what would have
been assigned to the variable in the other block. This can get
confusing if one does not realize that the block that is apparently
being executed was not the one corresponding to the guarding
condition.
All of these caveats of conditional execution only apply in the
somewhat unusual cases in which the location of the variable actually
changes. Because of control flow confluence and variable value
unification at that point (regardless of the debug bind at the
confluence point), it will most often be the case that the variable
lives at the same register or memory location throughout the
conditionally executed blocks, so the degradation of the debugging
experience by this pass, although possible, should be rare.
Debug binds and markers cannot currently be marked as conditional;
making that possible could further alleviate the impact of this
transformation.
-Os: optimize=2 + size
Perform optimizations that tend to reduce the code size.
This option sets the optimization level to 2, in a mode that assigns
higher priority to reducing code size.
Optimization at level 2 or higher extends tests on whether memory
references may overlap with affine combinations analysis. This may
infer non-aliasing in cases lower optimization levels wouldn't,
enabling further optimizations, but nothing with effects on debugging
that couldn't be had in other more obvious cases of non-aliasing.
Optimization level 2 or higher enables a pass that completely unrolls
inner loops that iterate just a few times. Unrolling uses the same
machinery that performs loop peeling (see --peel-loops) and, by itself, does not affect debugging.
An early rematerialization pass runs at optimization level 2 or
higher. It rematerializes pseudos whose live ranges cross calls by
copying the reaching definition insns between calls and uses. The
pseudo may then be regarded as dead before the call, which might reset
binds after the new death points, even when they could be adjusted so
as to refer to the definition that will be used for rematerialization.
In some cases, however, the expression may be lost entirely, but even
when it is preserved, it might be too complex to be recognized as
unchanged when the pseudo is rematerialized, so locations or values
based on the pseudo might be lost.
Optimizing for size changes the default register allocation region
setting back to the one used when not optimizing.
--expensive-optimizations:
Perform a number of minor, expensive optimizations.
Gimple jump threading is one of the significant transformations
enabled by this flag; see the effects of jump threading on debugging
under --tree-vrp.
The bswap gimple pass, also enabled by expensive optimizations,
recognizes shifts and rotates equivalent to byte-swap transformations,
and replaces them with a byte-swap builtin. Any user-visible
intermediate computations should have debug bind statements that will
ultimately be adjusted and preserved even if the computations
themselves are dropped, but some stmt moving, replacing, and
inserting-then-removing, might actually mess up debug bind tracking of
the final value.
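A minimal sketch of a recognizable pattern:

    unsigned short swap16 (unsigned short x)
    {
      return (x >> 8) | (x << 8);  /* recognized as a byte swap and   */
    }                              /* replaced with a bswap builtin   */
                                   /* or instruction                  */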
Another expensive-optimizations pass is widening_mul. It recognizes
various opportunities for math optimizations, such as fusing multiply
and add, testing overflows on adds or subtracts, and combining divide
and modulus into a single operation. Final assignment stmts are
replaced and stmts performing no longer needed computations are
removed in a way that doesn't harm debugging.
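A minimal sketch of the multiply-add case:

    double f (double a, double b, double c)
    {
      return a * b + c;   /* may be contracted into a single fused    */
    }                     /* multiply-add (fma) operation             */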
Some of the changes brought about by this flag are additional
canonicalization of addresses when comparing base addresses in alias
analysis, searching for alternate base addresses in gimple strength
reduction, loop iteration count estimation even for loops with
multiple exits, taking conflict counts into account when ordering SSA
names for coalescing, combination of temporary slots for automatic
variables, reuse of wider-mode ANDs and MEMs for CSE, simplifications
and cheap extensions in combine, slightly more elaborate selection of
register class preferences and attempts to decrease the number of live
ranges in the integrated register allocator, removal of some unneeded
reloads, and additional post-reload combine and CSE subpasses. None
of these modifies passes in ways that impact debugging beyond what
could similarly happen without this flag.
Another of the expensive optimizations is the compgotos RTL pass, which
duplicates each small-enough block ending in computed jumps and merges
the copies with predecessors that have it as their single successor,
with no effects on debugging.
Assume strict aliasing rules apply.
This flag limits the cases in which pointer accesses may alias, but
that does not enable any kind of transformation with impact on
debugging that could be incurred otherwise, using pointers known not
to alias through other means.
Use the cheap cost model for vectorization.
This affects --tree-loop-vectorize and
--tree-slp-vectorize decisions, but not the
kinds of transformations they make.
Perform Value Range Propagation on trees.
This flag activates two different passes: early vrp and vrp proper.
Early vrp is simpler in that it is not iterative, going through basic
blocks once in dominance order rather than using the SSA propagation
engine.
Once the range assigned to an SSA name is narrowed down to a single
constant, subsequent statements referencing the name can be propagated
into and possibly folded, and the definition may be removed.
Conditional statements may be simplified, removing edges and basic
blocks. Expressions in other statements may also be simplified based
on ranges.
Such simplifications, in themselves, do not significantly affect the
debugging experience. Removed definitions, if mentioned in debug
binds, will be propagated into them and preserved there, with markers
and views enabling them to be single-stepped and inspected; otherwise
simplified statements remain in place with the same outputs, and don't
require any debug information changes. Simplified conditions may
cause entire blocks to become unreachable and be removed, which would
stop placing breakpoints at them, but such breakpoints wouldn't be
reached anyway.
At the end of VRP proper, (gimple) jump threading takes place, using
value ranges to simplify conditional stmts to tell whether outgoing
edges of threadable blocks can be determined from incoming edges.
Gimple jump threading duplicates a block when arriving at it through a
certain incoming edge implies exiting it through a certain outgoing
edge. This duplication, in itself, does not affect the debug
experience: the copied block carries as much debug information as the
original block. During threading, however, there are blocks that are
not copied, namely forwarding blocks. From a codegen perspective, all
they seem to do is to jump to another block. From a debug experience
perspective, however, they may contain plenty of bind statements and
markers, and those are not duplicated: binds are consolidated so that
only the latest bind to each variable is copied, and markers are
dropped entirely. This arrangement, intended to reinforce binds after
newly-introduced confluences, drops debug binds that would not be
observable before the introduction of markers and views. With markers
and views, dropping the blocks in favor of bind consolidation amounts
to significant loss. Effects need to be assessed, as forwarding
blocks and leading/trailing debug stmts may end up removed by CFG
cleanup. Better means to preserve them when consolidating forwarding
blocks guarded by optimized-out conditions may be needed: conditional
markers and binds are a possibility to explore.
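A minimal sketch of a range-based simplification:

    int f (int x)
    {
      if (x > 10)
        return x != 0;  /* the range of x here excludes zero, so the  */
      return 0;         /* test may fold to 1, simplifying away the   */
    }                   /* conditional                                */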
--tree-dce(aggressive): pass_cd_dce
See --tree-dce. At optimization level 2 or higher
(i.e., starting at -Os), the second tree dead code elimination
pass is run in aggressive mode, that takes control dependencies into
account, enabling additional conditional branches to be eliminated.
This does not, however, fundamentally change the kinds of effects
these passes have on debugging.
Perform interprocedural reduction of aggregates.
This pass modifies the argument list of a function that takes
aggregates as arguments, splitting them into scalars, and adjusting
the callers. The impact on debugging could possibly be no different
from that of --tree-sra, but the parameter transformations
do not retain any traces of the original parameters that could have
variable location information generated in a way that reconstructed
the original object, or even that tracked each replacement scalar
parameter separately. This would require infrastructure to somehow
retain the original parameters and describe how they map to the
replacement parameters.
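A sketch of the parameter transformation (the function must not be
otherwise visible, hence static):

    struct pt { int x, y; };
    static int sum (struct pt p)  /* the aggregate parameter may be   */
    {                             /* split so a clone takes two ints  */
      return p.x + p.y;           /* instead; nothing in the debug    */
    }                             /* info maps them back to p         */
    int f (struct pt p) { return sum (p); }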
Optimize sibling and tail recursive calls.
This enables two separate passes. One attempts to turn tail recursion
into loops, the other marks non-recursive tail calls as such, so that
the expander emits them as jumps rather than calls.
Neither transformation affects debugging within an activation of a
function, but they do affect debugging in that call stacks may be
missing expected frames, stepping over a tail call would require
additional logic in the debugger and the call would not return to the
expected caller, and setting a breakpoint at the entry point of a
recursively tail-called function may miss the recursive tail-calls.
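A minimal sketch of the tail recursion case:

    int fact (int n, int acc)
    {
      if (n <= 1)
        return acc;
      return fact (n - 1, n * acc);  /* tail recursion: may be turned */
    }                                /* into a loop, so the recursive */
                                     /* frames never appear in        */
                                     /* backtraces                    */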
Perform conversions of switch initializations.
This activates switch statement lowering alternatives that may be more
efficient than the jump tables or decision trees that are otherwise
used.
One of the lowering possibilities uses the switch value as a shift
count, and then uses bit tests instead of multiple equality tests. No
visible effects on the debug experience are expected from this.
Another turns a switch statement with all cases containing assignments
of constants to the same variables into arrays of the constants and
assignments to the variables from indexed elements of the arrays.
This collapses the code for all (in-range) cases into a single block,
losing any debug annotations they might contain. This ultimately
prevents stepping into the switch statement or breaking at any of the
cases. Optimized-out assignments that might have been preserved in
such annotations will be lost altogether. As for assignments that are
handled by this transformation, even though debug binds in the cases
are lost, binds introduced by VTA after the post-switch PHI nodes will
enable the variables to be inspected afterwards.
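A sketch of the array-conversion case:

    int f (int i)
    {
      int x;
      switch (i)
        {
        case 0:  x = 10; break;  /* every case assigns a constant to  */
        case 1:  x = 14; break;  /* the same variable, so the switch  */
        case 2:  x = 42; break;  /* may become a load from a constant */
        default: x = 0;  break;  /* array indexed by i, collapsing    */
        }                        /* all cases into a single block     */
      return x;
    }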
Perform partial inlining.
This flag enables splitting of functions, so that a part will be
inlined while another part remains as a separate out of line function.
In theory, this shouldn't be a problem for debugging: the inlined part
is represented as an inlined function, the part that remains out of
line (or that is further split) is represented as an out of line
function. Alas, it's not that simple: the out of line portion should
be recognized as a part of a function, with an enclosing context taken
from the inlined portion. There is no standardized representation
that could enable debuggers to recognize this relationship, so at the
very least there is going to be confusion as to stack frames, incoming
arguments, and available variables from split contexts.
If the partial function is output as an optimized version of the
original function (it is), a debugger might also set breakpoints at
its entry point as if it were an entry point for the entire function.
We have a debug info extension proposal to enable at least the entry
point of the out of line part to not be regarded as an entry point for
the entire function, which alleviates the breakpoint setting problem,
but we may still need more annotations to allow a debugger to
represent a single virtual call frame when the inlined portion
activates the out of line one, with the entire set of enclosing
variables and whatnot.
Without that, this flag can make debugging very difficult.
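A sketch of a splittable function ("expensive" is a hypothetical
helper, and the .part suffix is merely illustrative of the artificial
function's name):

    extern void expensive (void);
    void f (int c)
    {
      if (__builtin_expect (c, 0))  /* the cheap test may be inlined  */
        expensive ();               /* into callers, while this body  */
    }                               /* is split out into an           */
                                    /* artificial f.part.* function   */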
Perform Identical Code Folding for functions and read-only variables.
This pass identifies read-only variables with identical
representation, and functions with equivalent executable code, and
outputs only one copy of each. This is a disaster for debugging the
discarded functions: line number and variable location information is
dropped for all but the selected function in each equivalence group.
It is even more confusing because the wrong function seems to be
called when stepping into a dropped one, and unexpected breakpoint
hits may occur.
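A minimal sketch of foldable functions:

    int add1 (int x) { return x + 1; }
    int succ (int x) { return x + 1; }  /* identical code: only one   */
                                        /* copy may be emitted, so    */
                                        /* stepping into succ appears */
                                        /* to land in add1            */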
There is some room for improvement here, but it is hardly trivial. We
should generate debug information for all copies, but we don't want to
compile them all the way to the end and then attempt to unify labels
and whatnot to output location lists for each variant, and multiple
line number tables. Unifying the functions while combining and turning all
debug annotations, including source locations, into conditionals that
identify each of the unified copies could enable us to compile them
normally, and then emit a single line number table (augmented with
conditionals) and location information for each of the separate
copies. Debug information consumers may then be able to identify the
copies using return addresses and call-graph debug information, the
same machinery used to determine entry-values of parameters.
Try to convert virtual calls to direct ones.
It replaces indirect calls with direct calls, possibly enabling
folding, inlining and whatnot. The replacement of calls in itself
does not affect debugging, but the enabled transformations might.
Perform speculative devirtualization.
This is somewhat like --devirtualize, but the direct call is guarded
by a test that confirms the selected target of the call is the correct
one, and the indirect call remains as an alternative. Nothing there
would affect debugging.
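The guarded call shape, expressed in source terms (foo and fp are
hypothetical):

    extern void foo (void);
    void call (void (*fp) (void))
    {
      if (fp == foo)  /* speculated direct call: now foldable and     */
        foo ();       /* inlinable                                    */
      else
        fp ();        /* the indirect call remains as the fallback    */
    }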
Perform interprocedural constant propagation.
This pass collects plenty of information about opportunities for
propagating constants from callers to callees, cloning functions and
replacing parameters with the constants or other known properties.
This may make room for many other optimizations, including resolution
of indirect calls to direct ones.
Cloning and substitution do not impact significantly the debug
experience: the clones refer back to the original function as their
abstract origin, and the substituted parameters, even if eliminated
from the cloned function's ABI, are noted as bound to the constant in
the debug info for the concrete function.
One potentially confusing situation that arises out of cloning is to
set a breakpoint at a code address, and then be surprised that it is
not hit at other activations of the function that do not use the same
clone. Since this also comes up with such traditional transformations
as inlining and loop unrolling, it probably won't be too surprising.
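A minimal sketch of a cloning opportunity:

    static int scale (int x, int y)
    {
      return x * y;
    }
    int f (int a)        { return scale (a, 2); }  /* constant y: a   */
    int g (int a, int b) { return scale (a, b); }  /* clone of scale  */
                                                   /* with y replaced */
                                                   /* by 2 may serve  */
                                                   /* f, its debug    */
                                                   /* info binding y  */
                                                   /* to the constant */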
Perform interprocedural bitwise constant propagation.
This flag extends --ipa-cp so that it also gathers
information about which bits are known to be zero in values passed
from one function to another. This creates additional opportunities
for folding, --tree-ccp, etc.
Perform IPA Value Range Propagation.
This flag extends --ipa-cp so that it also gathers
range information in values passed from one function to another. This
creates additional opportunities for folding, --tree-vrp,
etc.
Integrate functions into their callers when code size is known not to grow.
Like --inline-functions-called-once, this flag is an enabler for
inlining, in that if it's not active, various cases of early inlining
(and splitting for --partial-inlining) are
suppressed.
Perform indirect inlining.
Like other inline flags, this flag is an enabler: if it's not active,
it stops the compiler short of attempting to resolve indirect edges
(e.g., indirect or virtual calls) to direct edges.
Integrate functions not declared "inline" into their callers when profitable.
Like other inline flags, this flag is an enabler: if it's not active,
it stops the compiler from considering inlining functions not
explicitly declared inline. See --inline-functions-called-once for an analysis of the impact of
inlining on debugging.
--hoist-adjacent-loads: pass_phiopt
Enable hoisting adjacent loads to encourage generating conditional move
instructions.
This flag modifies the ssa-phiopt pass, so as to move, to before a
conditional branch, loads of adjacent fields of the same struct into
(different SSA names joined into) the same variable, one load coming
from the then block and the other from the else block.
A debug bind will likely follow each of the original loads, so the
moves won't change the ability to inspect the destination variable
after each load. However, the early overwriting of the variable can
make its previous value unavailable sooner than expected.
The moves could leave the conditional blocks empty, especially if a
conditional move ends up being used, which could make it impossible to
set breakpoints at lines within them or to single-step into them, as
SFNs get dropped along with the removed blocks. The moved loads
retain their location information, however, so one might be able to
stop at them even when the conditional block to be executed does not
include that line. This can all get confusing, but I don't see ways
to improve that.
Turn undefined behavior into traps.
This pass detects dereferences of null pointers and replaces them with
trap statements. When the dereference involves a PHI node, the incoming
edge that carries the null value is redirected to a copy of the block,
and the copy gets the trap statement instead.
This affects debugging mostly in minor ways. A chunk of code that
follows an unconditional null dereference may become unavailable for
breakpoints, as the trap enables it to be completely optimized away.
When a block is copied for the case of conditional null dereferences,
references to the copied labels by name may not be resolved to the
corresponding locations in the copied blocks. In extreme cases, in
which all remaining incoming edges bring a null value, the original
block may end up unreachable and optimized away, potentially making
the label unavailable even while copies thereof remain.
When an indirect call is replaced with a trap, say because the callee
address is null, debugger users may be surprised at not being able
to step into the called function, even if they modify the pointer so
that it is not null, because the call was turned into a trap. Such
types of debugging sessions, involving debugging-time modification of
pointers that at compile-time could be determined to evaluate to null,
may become impossible to carry out after these transformations.
This flag, as well as --isolate-erroneous-paths-attribute and
-Wnull-dereference (though a warning flag should not enable
optimizations), enables turning divide by zero into a trap (unless
--non-call-exceptions is enabled), with the same logic and
consequences as the above, and turning returned addresses of local
automatic variables into NULL, with no effects on debugging.
The flag --isolate-erroneous-paths-attribute uses the same logic and
machinery as this option, but it recognizes cases in which a null
pointer is passed to a function in an argument marked (with an
attribute) as requiring a nonnull pointer, or returned from a function
that is marked as returning a nonnull pointer, and replaces the erroneous
call or return with a trap. The effects of these transformations on
debugging are of essentially the same kind.
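A minimal sketch of an isolated path:

    int f (int *p)
    {
      if (p == 0)
        return *p;  /* dereference on a path where p is known to be   */
      return 1;     /* null: replaced with a trap, and any code after */
    }               /* it becomes unreachable                         */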
Enable SSA-PRE optimization on trees.
When an expression is computed redundantly in a block and some of its
predecessors, make it fully redundant by inserting it in other
predecessors, and then remove the redundant computation.
In theory, the insertions have no effect on debugging, but SSA
coalescing may cause them to overwrite a variable earlier than
expected, making it unavailable for inspection until the expected
assignment point. The removals are preserved in debug binds, so as
long as the computations are not optimized out, they will be
representable, and with SFN and LVu, the binds will be available for
inspection at the expected spots.
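A minimal sketch of a partially redundant expression:

    int f (int c, int a, int b)
    {
      int x = 0;
      if (c)
        x = a + b;         /* a + b is partially redundant with the   */
      return x + (a + b);  /* use below; inserting it in the else arm */
    }                      /* makes it fully redundant, and one       */
                           /* computation is then removed             */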
--code-hoisting: pass_pre
Enable code hoisting.
When equivalent expressions are computed in multiple blocks, move them
to a dominating block, and then remove the redundant computations.
The considerations that apply to --tree-pre also apply
to this flag.
--tree-tail-merge: pass_pre
Enable tail merging on trees.
This option is conceptually similar to --crossjumping, but it works on the gimple SSA representation,
rather than on RTL, as a subpass at the end of SSA-PRE (see
--code-hoisting). Despite the name, it only merges entire basic
blocks that share a common successor or predecessor.
Considerations that are also similar apply: the combined blocks may
refer to different source fragments, they may have different debug
annotations that are correctly ignored when comparing blocks, but that
are dropped altogether from one of each pair of merged blocks.
I envision a possibility of preserving the annotations with the
introduction of conditionals, though, unlike the case of jump
threading, it is not immediately obvious how to identify a condition
that might be available at run time and that could be used to tell
which set of annotations to activate, so as to enable a debugger to
show one source fragment or another as active.
Merge adjacent stores.
This combines multiple stores to adjacent or overlapping memory
locations in a single basic block into fewer wider stores. This is
done in gimple, before automatic variables are assigned to specific
stack slots, so it is unlikely to combine effects in more than one
user variable: it might combine accesses into a single array or
structure, i.e., larger addressable objects committed to memory early
in compilation.
These are objects that are not tracked or affected by VTA, so debug
binds are unlikely to be affected. However, the postponement of
merged stores may affect values visible at inspection points derived
from statement boundaries (SFN).
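A minimal sketch of mergeable stores:

    struct s { char a, b, c, d; };
    void f (struct s *p)
    {
      p->a = 1;  /* four adjacent byte stores may be merged into one  */
      p->b = 2;  /* 32-bit store, deferring the point at which each   */
      p->c = 3;  /* field's new value becomes visible at inspection   */
      p->d = 4;  /* points                                            */
    }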
Perform (RTL) jump threading optimizations.
(Jump threading passes or subpasses in gimple/SSA are enabled by
--expensive-optimizations, by --tree-dominator-opts, and by --tree-vrp.)
If a block is found to have no side effects, and if its being entered
through a certain edge E1 implies it will always be left through an
edge E2, this cleanup pass redirects edge E1 to the destination of E2,
bypassing the block altogether. This removes from the expected flow
any of the markers and bindings that were to be found in the bypassed
block. This may be confusing not only when single-stepping a program,
for an unexpected jump over a reasonably large piece of code might
take place, but also after the bypassed block, as the skipped bindings
may not be integrated in the subsequent views.
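A sketch of a threadable shape (g and h are hypothetical helpers
standing for side effects):

    extern void g (void), h (void);
    void f (int a)
    {
      if (a)
        g ();
      if (a)   /* entering this test from the a != 0 path implies     */
        h ();  /* taking it again, so that edge may be redirected     */
    }          /* straight to the call, bypassing the test block and  */
               /* skipping its markers and binds                      */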
Perform global common subexpression elimination.
The PRE and hoist passes on RTL introduce new pseudos to hold
redundant/hoisted expressions, and new insns to compute them as needed
to make exprs fully redundant, and then replace the redundant set
insns with copies from the new pseudos. Since the values still end up
in the REGs, debug binds referencing them are unchanged and remain
valid. Register allocation might be able to optimize away these
copies, but with SFN and LVu, it should still be possible to stop
after assignments, and inspect the assigned values. The only expected
negative effect on the debugging experience is that of early
overwriting of variables, should the new pseudos be assigned to the
same location as the dead variables whose future values they hold.
Another pass enabled by this flag is a constant/copy propagation RTL
pass. As pseudos are replaced with constants or other pseudos, this
may simplify and remove conditional branches and get unreachable basic
blocks removed, which may then prevent breakpoints from being set at
the source code ranges corresponding to the removed blocks. Trapping
insns may also be turned into unconditional traps, making the
subsequent code unreachable with similar consequences. Insns may
become dead as the pseudos they set are replaced; this might cause
debug binds referencing them to be reset, if the setting expression
cannot be preserved by propagating into the debug bind or by creating
a debug temporary. This may result in loss of debug location/value
information.
With --gcse-lm, PRE may pull loads out of loops, replacing stores with
copies to the pseudo, immediately followed by newly-inserted stores of
the pseudo. This may impact debugging in that variables that live in
memory will not be loaded again within the loop, so if the debugger is
used to modify the value of the variable, that may fail to affect the
program.
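A minimal sketch of the load-motion case (g is a hypothetical global):

    int g;
    void f (int n)
    {
      int i;
      for (i = 0; i < n; i++)  /* the load of g may be pulled out of  */
        g += i;                /* the loop into a pseudo; modifying g */
    }                          /* from the debugger mid-loop may then */
                               /* fail to affect the program          */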
With --ira-hoist-pressure, hoist changes the weighting of decisions on
whether or not to hoist computations to dominating blocks, but that
doesn't cause different kinds of transformations to be done, so the
kinds of effects on the debugging experience remain unchanged.
Add a common subexpression elimination pass after loop optimizations.
We run an RTL common subexpression elimination pass when optimization
is enabled, and another after RTL global optimizations (--gcse: cprop, hoist and PRE; --gcse-sm: store motion,
never enabled implicitly), if they ran and made any changes; this flag
adds another such pass after RTL loop optimizations.
CSE scans blocks linearly, detecting equivalent expressions stored in
different pseudos, and replacing uses of later-set pseudos with uses
of the earlier-set equivalent ones. This may render the later sets
trivially dead, and they are ultimately removed if so.
The register replacements per se do not affect the debug experience;
the dead insn removal might, but debug binds will have been replaced
as well, so the main issues are the potential early overwrite making a
variable unavailable for inspection, and the removal of insns at
inspection points, which are made up for by SFN and LVu with debugger
support.
Register replacement might make it evident that a conditional branch
is always or never taken, turning it into an unconditional edge, and
then entire blocks might become unreachable. This might prevent
breakpoints from being set within such blocks, but since the condition
that led to them never held, they would never be reached anyway.
CSE can also combine condition code-setting insns when one block that
performs a compare flows into another that performs the same compare,
but this has no effect on the debug experience.
When running CSE, follow jumps to their targets.
This flag extends the CSE pass (see --rerun-cse-after-loop) so that registers set in one block can be used in
substitutions in subsequent blocks that have no other predecessors
than those in the path from the setting point. This does not change
the effects CSE may have on the debug experience, it just extends such
effects across separate blocks.
Use the RTL dead code elimination pass.
This flag is enabled by default, but the ud_dce pass described herein
is only activated when optimizing at level 2 or higher.
This pass relies on use-def chains to mark all defs of each use.
Then, it removes all unmarked insns, resetting debug binds that refer
to defs in any removed insns. It would be possible to preserve the
defs in debug temps for use in the binds, instead of resetting them,
and then the loss of debug locations would be avoided, but as it is,
this pass causes variables to lose their bindings.
Save registers around function calls.
Without this flag, pseudos that live across function calls will not be
assigned to call-clobbered registers. With it, they may end up in
such registers, and then they will be saved in a stack slot as needed
before calls, and restored as needed before other uses. In case a
debug bind references the register at a point in which the register
might be clobbered, it is adjusted to refer to the stack slot. Since
VTA notices the saves and restores and realizes the register and the
stack slot hold the same value, and regards call-clobbered registers
as such at calls, we end up with variable locations that reflect the
saving and restoring. This allows variables assigned to
call-clobbered registers to be inspected even while they live in stack
slots.
Modifying such variables in a debug session, however, is not
guaranteed to work: variable tracking does not find out which of the
copies GCC regards as the primary one, if there is one; it just
notices when a copy may no longer hold the current value and, at such
points, seeks alternate locations holding it. So debug information
may suggest modifying the memory slot will change the variable, even
though the variable has already been loaded into the register and
won't be reloaded from memory again, or vice-versa. The caller-save
implementation might be able to overcome this by issuing notes to be
used by variable-tracking to enforce the location changes.
Use caller save register across calls if possible.
This flag gathers information about which call-clobbered registers may
actually be modified in each function, and allows the register
allocator in their callers to select registers that it would otherwise
avoid, to hold values across calls known to not modify those
registers. This has no effect on the debugging experience.
Do CFG-sensitive rematerialization in LRA.
This pass recomputes the value of spilled registers, instead of
loading them back from memory. This makes for confusing debugging
sessions, if the spilled register holds a variable that is to be
modified by the debugger while it is only available in memory. The
expectation that the modified value would be used in subsequent uses
will not be met, and at some point after the rematerialization, the
variable will seem to magically take its original value back.
This situation is not entirely uncommon in optimized debugging,
considering that we only take note of one location for a variable at a
time, and we don't indicate whether or not that location is a
modifiable one, but it's particularly apparent and worth noting in
this case. Tracking all potential locations is remarkably expensive,
but we might be able to mark binding statements as modifiable
locations and clear that indication when a location expression is
modified. This would likely be quite useful to avoid misleading
behavior, but it might also limit severely the possibilities of
modifying variables in debug sessions one can try and get away with.
Perform cross-jumping optimization.
This pass identifies common trailing insns in predecessors of a block,
or leading insns in successors of a block, splitting one of the blocks
so that the other can have the equivalent insns replaced with a jump.
This transformation ignores debug locations, markers and binds, as
needed for -g to not affect codegen, but this makes it unify insn
sequences that refer to different portions of the source code, even
ones that affect different variables. Users of debuggers may find
themselves wondering how they ended up at a certain point of the
program without hitting an earlier breakpoint, or just when they
expected to be elsewhere. Markers and binds will reflect the apparent
source location, even if the code was reached from a different path
that had unrelated computations that happened to become the same
instructions; this may seem to be less confusing, unless one realizes
that the code sequence is just equivalent to that which should be
running after an unrelated path in the source program. With that
realization, confusion can be even more thorough, as the loss of binds
and markers will make expectations about what should happen in the
dropped path unlikely to be met.
All this said, the likelihood that completely unrelated computations
be unified by this pass is very low. Trailing compares and jumps,
perhaps preceded by code sequences performing identical computations,
to the point of storing results in the same registers, will likely not
be dissimilar enough as to make debugging impossible, aside from the
effect of seemingly finding oneself at the wrong part of the program.
Thus, even though very confusing transformations are theoretically
possible, odds are that the transformation results may be recognizably
similar to what would be expected, and the only real surprises will be the
unexpected jumps and the inability to set breakpoints.
Instead of dropping binds and markers from the range to be unified,
conditional binds and markers could be introduced and used to enable a
debugger to distinguish between the unified paths, and the side
effects expected from each path, as suggested for --ipa-icf.
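A sketch of unifiable tails (the helpers are hypothetical):

    extern void g (void), k (void), h (int);
    void f (int c)
    {
      if (c)
        { g (); h (1); }  /* the identical trailing calls h (1) may   */
      else                /* be unified into one block, so execution  */
        { k (); h (1); }  /* there reports just one of the two lines  */
    }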
Enable an RTL peephole pass before sched2.
The peephole passes run close to the end of compilation, looking for
sequences of insns that the backend recognizes for special treatment.
The peephole2 pass, enabled by this flag, turns a sequence of insns
into another sequence of insns, unlike peephole, which
outputs alternate assembly code for recognized sequences within final.
These passes run so late that debug insns have already been turned
into notes, and notes are skipped when recognizing sequences. Unlike
peephole, however, peephole2 discards notes that appear among
recognized insns, which may ultimately discard debug location and
marker notes, whereas peephole will move them before or after the
replacement insns sequence. Both can cause degradation of debug
information, leading to missed or incorrectly-placed bindings and
inspection points, so that unexpected values can be found when
inspecting affected variables.
Reschedule instructions after register allocation.
This pass computes dependencies between insns, and then reorders them
so as to better use hardware units, and so as to hide latencies.
The following assessment of impact is based on the standard insn
scheduler used by GCC, and on the extended basic block scheduler, as
opposed to the selective scheduler, which is largely incompatible with
the debug insn-based technologies introduced to improve debuggability
of optimized programs.
Debug insns, be they binds or statement markers, are retained in
order, and binds carry their preceding insn as a dependency, in
addition to any other dependencies from the bound value, but otherwise
debug insns are pulled ahead of nondebug ones. Nondebug insns,
however, are never regarded as dependent on debug ones, not even as
anti-dependencies, so a nondebug insn that modifies an input to a
debug bind resets the bind, which loses debug information. The bound
value might still be available in alternate locations, or through
other expressions, but no attempt is made to find out alternate
representations for the binding in this pass.
Another potentially lossy situation is that of moving an insn so that
it overwrites a variable before expected, which may cause the earlier
value to no longer be available for inspection.
Without SFN support in debuggers, insn scheduling is the most common
cause of the undesirable effect of jumping back and forth when
single-stepping optimized programs. With SFN, debuggers can advance
from one line to another according to the expected control flow, and,
with LVu, observe side effects noted in preceding debug binds, even if
insns that carry out those side effects are moved elsewhere.
Align the start of loops.
No effect on debugging.
Align labels which are only reached by jumping.
No effect on debugging.
Align all labels.
No effect on debugging.
Align the start of functions.
No effect on debugging.
--reorder-functions: varasm
Reorder functions to improve code placement.
Decides whether to emit (or start) functions in hot or cold sections.
No effect on debugging.
-O2: optimize=2
Perform optimizations that tend to make the program run faster.
This option sets the optimization level to 2, in a mode that assigns
higher priority to making the code run faster.
Although -O2 appears after -Os in the crescendo of
optimization levels, -Os and -O3 enable --inline-functions but -O2 doesn't.
Enable string length optimizations on trees.
This pass tracks string and memory calls, as well as char stores,
keeping track of string lengths, so as to optimize builtin calls
involving such lengths into constants or previously-computed values.
Besides resolving strlen(str) and strchr(str, 0) from tracked lengths,
it can optimize strcat to strcpy or even memcpy, and more. The transformations may
involve removing redundant computations, possibly after inserting
simpler call sequences, or replacing calls with assignments.
Ultimately, if the return value of a call was stored in some SSA
name, the transformation will also store in it. It is possible,
however, that in the specific case of folding strstr(s,t)[=!]=s to
strncmp(s,t,strlen(t))[=!]=0, if the result of the strstr call is
stored in a user variable used only for the compare, the
transformation will take place and invalidate the debug bind for that
variable. There doesn't seem to be any other case in which a result
that might have been stored in a user variable could be lost in these
transformations.
The other potential surprise for debug sessions is attempting to step
into any of these calls, since different functions may be called. For
the same reason, setting breakpoints on the functions, both the ones
that are explicitly called, and the ones that may end up called
instead, will yield surprising results.
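A minimal sketch of a foldable length:

    #include <string.h>
    size_t f (char *d)
    {
      strcpy (d, "abc");  /* the length of the string at d is now     */
      return strlen (d);  /* known, so this call may fold to the      */
    }                     /* constant 3                               */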
Reschedule instructions before register allocation.
See the analysis under --schedule-insns2. While
that pass runs after mapping pseudo registers to hardware registers or
stack slots, this one runs with a virtually infinite (pseudo) register
file. Pseudo registers are less likely than hardware ones to overlap
and conflict, so scheduling insns before register allocation resets
fewer debug binds than scheduling them after register allocation.
Furthermore, the earlier scheduling reduces the amount of scheduling
done later, which further helps preserve debug binds.
Set the used basic block reordering algorithm to STC.
The STC algorithm, unlike the default simple one, may duplicate blocks
and rotate loops, but still without any significant effect on the
debug experience.
-O3: optimize=3
Perform expensive optimizations, which might even make the program
larger and slower.
This option sets the optimization level to 3.
At optimization levels 3 or higher, loop peeling and complete
unrolling (see --peel-loops) are permitted
to grow code size, but this by itself does not affect debugging.
Computation of the iteration count and other loop properties may be
simplified using the evolutions of the loop invariants in outer loops,
enabling loop transformations that might not otherwise be performed in
specific cases, but whose effects on debugging are no different from
those of other transformations that could be performed regardless.
Enable loop vectorization on trees.
This flag is only activated when --tree-loop-optimize is activated.
This flag enables --tree-loop-if-convert.
Along with --tree-ch, it enables the ch_vect pass.
Along with --section-anchors, it enables the increase_alignment pass,
that increases (without any impact on debugging) the alignment of
global arrays so that loops over them can be vectorized.
This transformation, regardless of the selected cost model, combines
multiple iterations of a loop into one that uses vector operations to
perform the equivalent work of the combined iterations. This is
extremely confusing for debugging, not just because of the significant
control flow changes, but also because debug annotations used to
counter the effects of optimizations on debugging are discarded or
disabled. It might be possible to aggregate and unroll the debug
annotations of multiple iterations at the end of each vectorized
iteration, so as to make their effects progressively visible while
single-stepping over the markers.
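A minimal sketch of a vectorizable loop:

    void f (float *a, const float *b, int n)
    {
      int i;
      for (i = 0; i < n; i++)  /* several iterations may be fused     */
        a[i] += b[i];          /* into one vector operation; the      */
    }                          /* per-iteration markers and binds are */
                               /* discarded in the vectorized loop    */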
Use the dynamic cost model for vectorization.
This affects --tree-loop-vectorize and
--tree-slp-vectorize decisions, but not the
kinds of transformations they make.
Perform cloning to make Interprocedural constant propagation stronger.
This flag, when disabled, stops externally-visible functions from
being versioned for constant propagation into them, disabling all
transformations enabled by --ipa-cp for such functions.
Conversely, enabling it does not introduce any kind of effect that
isn't potentially observable when --ipa-cp is enabled, it just extends
such effects to externally-visible functions.
See --inline-functions under -Os. This
is the only flag that's not in a strict crescendo of optimization
flags, in that -Os and -O3 have it enabled, but -O2,
which otherwise sits between -Os and -O3, doesn't.
--tree-partial-pre: pass_pre
In SSA-PRE optimization on trees, enable partial-partial redundancy
elimination.
The considerations that apply to SSA-PRE also apply to
this flag and its effects on the SSA-PRE pass.
Perform loop unswitching.
This flag is only activated when --tree-loop-optimize is activated.
This pass hoists invariant conditionals within inner loops, using loop
versioning to create two versions of the loop, one for each value of
the conditional, deciding once which version of the loop to enter. It
may further hoist such conditionals out of outer loops, without
versioning, if the outer loops are simple enough.
One might expect the early execution of the conditional to be
confusing for interactive debugging sessions, but it is actually
transparent: the condition has to be so trivial to compute that it is
moved without the corresponding line number information, and it is
executed as if part of the loop preheader. What's more: the original
test is not removed from either version of the loop, it is rather
replaced with a test that trivially evaluates to true or false. Even
if that ends up optimized out, an SFN marker remains for the test in
both versions of the loop, so it will be possible to stop at the test
point and verify the condition, whatever path is taken from it. Since
each block in the original loop will remain in at least one of the
loop versions, it will be possible to set breakpoints at any of the
lines of the loop after this transformation, even if some of the lines
may be duplicated. Single-stepping will not be surprising: guards of
conditional blocks will be stopped at, and the blocks will be entered
just when expected. As such, the impact of this transformation on the
debug experience is extremely low.
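To make the transformation concrete, here is a hand-written,
source-level sketch with made-up function names (not actual GCC
output; the comments stand in for the trivially-folded tests whose
SFN markers remain):

  void f (float *a, int n, int scale, float k)
  {
    for (int i = 0; i < n; i++)
      if (scale)                    /* loop-invariant condition */
        a[i] *= k;
      else
        a[i] += k;
  }

  /* After unswitching: the condition is computed once, as if in the
     preheader, and each loop version keeps a folded copy of the
     original test.  */
  void f_unswitched (float *a, int n, int scale, float k)
  {
    if (scale)
      for (int i = 0; i < n; i++)
        {
          /* test folded to true */
          a[i] *= k;
        }
    else
      for (int i = 0; i < n; i++)
        {
          /* test folded to false */
          a[i] += k;
        }
  }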
Perform loop splitting.
This flag is only activated when --tree-loop-optimize is activated.
This turns a loop with conditional blocks and a controlling condition
that changes value once throughout the iteration space into two loops,
each with only one of the conditional blocks. It uses loop versioning
to create two copies of the loop, using the controlling condition to
decide which of the versions to run. Then, it connects the exit of
the first loop to the entry of the second, adjusts the exit condition
of the first loop to transition to the other loop at the point the
condition switches, and forces the controlling conditions in each
block to the known value, removing the unused conditional blocks in
each copy. None of these transformations has a significant impact on
debuggability.
The only actual issue I see, though probably of little significance,
is that the block-duplicating infrastructure does not copy bind
statements for label declarations that were optimized away, so, if
such a label is bound within the conditional block that is versioned
and then discarded from the original loop, the label will seem to be
completely gone, even though a block containing it will still be
reachable in one of the loops.
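A hand-written, source-level sketch of the effect, with hypothetical
function names:

  void g (float *a, int n, int mid)
  {
    for (int i = 0; i < n; i++)
      if (i < mid)          /* flips exactly once over the space */
        a[i] = 0.0f;
      else
        a[i] = 1.0f;
  }

  /* After splitting: two loops over complementary subranges, each
     with the condition forced to its known value.  */
  void g_split (float *a, int n, int mid)
  {
    int i;
    for (i = 0; i < n && i < mid; i++)
      a[i] = 0.0f;          /* condition known true */
    for (; i < n; i++)
      a[i] = 1.0f;          /* condition known false */
  }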
Perform unroll-and-jam on loops.
This flag is only activated when --tree-loop-optimize is activated.
This transformation unrolls an outer loop and jams the multiple
instances of the inner loop into a single loop. This changes the
iteration sequence e.g. from [(0,0), (0,1), ..., (0,n), (1,0),
... (1,n), ... (m,n)] to [(0,0), (1,0), (0,1), (1,1), ... (0,n),
(1,n), (2,0), (3,0), (2,1), ... (m,n)]. This can be extremely
disruptive to debugging, as this sort of transformation, that
effectively modifies the order in which major blocks of computation
are executed, cannot be made up for with the existing infrastructure
to retain debug information across optimizations.
Considering the limited kinds of computations that may be performed in
such loops for this sort of transformation to apply, it seems that
it might be possible to attempt to output debug information that would
enable a debugger to emulate the original loop nest, but it is not
evident that current debug information formats are sufficiently
expressive for that, nor that it would be worth the trouble.
It might be more useful to be able to somehow represent what kind of
loop transformation took place, so that users can understand what is
actually going on, rather than attempting to pretend we are still
running the original loop nest.
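For concreteness, a hand-written sketch with made-up names, assuming
an unroll factor of 2 and an even trip count m:

  void h (float (*a)[256], int m, int n)
  {
    for (int i = 0; i < m; i++)
      for (int j = 0; j < n; j++)
        a[i][j] += 1.0f;
  }

  /* After unroll-and-jam: the outer loop is unrolled and the two
     inner-loop copies are fused, changing the order in which the
     iteration space is visited as described above.  */
  void h_unroll_jam (float (*a)[256], int m, int n)
  {
    for (int i = 0; i < m; i += 2)
      for (int j = 0; j < n; j++)
        {
          a[i][j] += 1.0f;        /* iteration (i, j) */
          a[i + 1][j] += 1.0f;    /* iteration (i+1, j), jammed in */
        }
  }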
Enable loop distribution on trees.
Enable loop distribution for patterns transformed into a library call.
These flags are only activated when --tree-loop-optimize is activated.
Both enable the same pass, that partitions suitable inner loops each
into two loops over the same iteration space, copying the loop and
then removing stmts that should remain in only one of the loop bodies.
The multiple iterations over different statements of a loop can be
very confusing when debugging. Removed stmts cause debug binds that
reference them to be reset, which makes variables available in at most
one of the two iterations.
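A hand-written, source-level sketch with hypothetical names (with
--tree-loop-distribute-patterns, the first resulting loop would
become a memset call):

  void d (float *a, float *b, const float *c, int n)
  {
    for (int i = 0; i < n; i++)
      {
        a[i] = 0.0f;
        b[i] += c[i];
      }
  }

  /* After distribution: the same iteration space is walked twice,
     and binds referencing the stmts removed from each copy are
     reset.  */
  void d_distributed (float *a, float *b, const float *c, int n)
  {
    for (int i = 0; i < n; i++)
      a[i] = 0.0f;
    for (int i = 0; i < n; i++)
      b[i] += c[i];
  }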
Enable loop interchange on trees.
This flag is only activated when --tree-loop-optimize is activated.
This transformation rearranges a loop nest, attempting to swap the
induction variables for each pair of loops in a nest. This changes
the order in which the nest's iteration space is walked, which is
confusing for debugging, and as it swaps and replaces induction
variables, it resets binds to the original ones, so the iteration
variables will not be visible within the loops after the
transformation. This makes it very difficult to do any debugging of
such loops.
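A hand-written sketch of the transformation, with made-up names:

  void t (float (*a)[256], int m, int n)
  {
    for (int j = 0; j < n; j++)     /* column-major walk... */
      for (int i = 0; i < m; i++)   /* ...of a row-major array */
        a[i][j] *= 2.0f;
  }

  /* After interchange: the induction variables are swapped and the
     array is walked row-major; binds to the original induction
     variables are reset, so i and j become unavailable.  */
  void t_interchanged (float (*a)[256], int m, int n)
  {
    for (int i = 0; i < m; i++)
      for (int j = 0; j < n; j++)
        a[i][j] *= 2.0f;
  }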
This pass, controlled by --tree-loop-if-convert, is enabled by
default when --tree-loop-vectorize is enabled, but it is only
activated when --tree-loop-optimize is also activated.
It transforms multi-block loop bodies into a single basic block,
possibly after versioning the loop, turning statements in conditional
blocks into conditional statements. It makes debugging very hard, as
it resets all debug binds in the loop, and rearranges control flow so
that all conditional blocks become unconditionally executed.
Conditional binds and markers might alleviate this, enabling blocks
that wouldn't be executed without the optimization to be skipped
during debugging.
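A hand-written, source-level approximation with hypothetical names
(GCC actually operates on gimple, producing masked or merged
operations):

  void c (float *a, const float *b, int n)
  {
    for (int i = 0; i < n; i++)
      if (b[i] > 0.0f)              /* conditional block */
        a[i] = b[i];
  }

  /* After if-conversion: a single-block body in which the store
     happens unconditionally, with a select choosing the value.  */
  void c_ifconverted (float *a, const float *b, int n)
  {
    for (int i = 0; i < n; i++)
      a[i] = b[i] > 0.0f ? b[i] : a[i];
  }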
Run predictive commoning optimization.
This flag is only activated when --tree-loop-optimize is activated.
This pass optimizes loops by identifying and analyzing dependence
chains, and unrolling the loop the right number of times to reuse
loads and stored values across iterations and remove dead stores. The removal
of dead stores may confuse debugging sessions, because inspecting
arrays will not show the temporarily-stored values, while removal of
loads may confuse sessions that modify the array expecting modified
values to be loaded and used, an expectation that may not be met if
the value was already loaded from memory.
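A hand-written sketch of the load reuse, with made-up names and
without the unrolling GCC actually performs:

  void p (float *a, int n)
  {
    for (int i = 0; i < n - 1; i++)
      a[i] = a[i] + a[i + 1];       /* two loads per iteration */
  }

  /* With the load of a[i + 1] carried into the next iteration as its
     a[i], only one load per iteration remains; changing a[i] from the
     debugger just before its assignment has no effect, as the value
     was already loaded in the previous iteration.  */
  void p_predcom (float *a, int n)
  {
    if (n < 2)
      return;
    float next = a[0];
    for (int i = 0; i < n - 1; i++)
      {
        float cur = next;
        next = a[i + 1];
        a[i] = cur + next;
      }
  }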
Perform loop peeling.
This amounts to copying the blocks that make up the loop body so that
they can be run linearly before entering the remaining loop. Such
block duplication does not in itself cause any harm to the debugging
experience, but the linearization of initial iterations of the loop
can make room for other optimizations that could in turn make
debugging more difficult.
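A hand-written sketch, assuming the compiler expects the trip count
to rarely exceed 2 (names are made up):

  void s (float *a, int n)
  {
    for (int i = 0; i < n; i++)
      a[i] = 0.0f;
  }

  /* After peeling two iterations: the peeled copies run linearly
     before the remaining, rarely-entered loop.  */
  void s_peeled (float *a, int n)
  {
    if (n > 0)
      {
        a[0] = 0.0f;                /* peeled iteration 0 */
        if (n > 1)
          {
            a[1] = 0.0f;            /* peeled iteration 1 */
            for (int i = 2; i < n; i++)
              a[i] = 0.0f;          /* remaining loop */
          }
      }
  }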
Enable basic block vectorization (SLP) on trees.
This pass detects opportunities in straight-line code to use vector
operations in place of multiple operations on adjacent memory.
Although this pass, unlike the loop vectorizer, does not reset debug
binds, that hardly matters: the combined operations most often involve memory
references, and those do not involve debug binds. So, as they are
recombined, the timing of effects diverges from that implied by debug
markers, which makes debugging very confusing.
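For instance (a hand-written example with a made-up name):

  void q (float *a, float k)
  {
    /* SLP may combine these four multiplies and stores into single
       vector operations, so all four memory effects become visible
       at once, no matter which of the four lines the debugger
       reports as current.  */
    a[0] = k * 1.0f;
    a[1] = k * 2.0f;
    a[2] = k * 3.0f;
    a[3] = k * 4.0f;
  }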
Split paths leading to loop backedges.
This flag is only activated when --tree-loop-optimize is activated.
This pass duplicates a basic block that dominates the loop latch, if
it ends in a conditional that may exit the loop, and it is the block
that closes a simple diamond in the control flow graph. This has no
effect on debugging, aside from the need for breakpoints in the
duplicate block covering more than one code address.
--gcse-after-reload: pass_gcse2
Perform global common subexpression elimination after register
allocation has finished.
Although the implementation of this pass is not the same as that of
--gcse PRE or hoist, and this pass's focus is
exclusively on eliminating loads, the insertion and deletion of loads
uses the same logic and thus has the same effects on debugging. Since
pseudos cannot be introduced after reload, it has to reuse registers
for loads and copies. This is done without regard to debug binds,
but a register must not be live to be reused in this way, which
implies it couldn't appear in any live debug bind. The impact of
that should thus be limited to early unavailability of variables
that happened to still be available in such registers, or computable
with expressions involving them.
-Ofast: optimize=3 + fast
Perform expensive optimizations, and also unsafe math transformations
that could make standard-compliant programs misbehave.
This option sets the optimization level to 3, while also enabling the
--fast-math option.
This flag enables multiple options that disable various aspects of
floating-point strict correctness. Several of them may allow
simplifications that would otherwise not take place, from folding to
removal of exception handling regions that could only catch
floating-point exceptions. Such simplifications, though enabled by
this flag, are of kinds that could equally arise in the absence of
such flags. Its impact on the debugging experience is thus regarded
as very low.
Allow optimization for floating-point division which may change the
result of the operation due to rounding.
This optimization substitutes floating-point division by an SSA_NAME
with multiplication by the reciprocal. Squared divisors are also
detected and factored. The reciprocal of the SSA_NAME and of its
square, when needed, are inserted after the definition or before a
division. Divisions are turned into multiplications in place, so
there is no effect on debugging.
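A hand-written, source-level sketch with made-up names:

  void r (float *a, int n, float d)
  {
    for (int i = 0; i < n; i++)
      a[i] = a[i] / d;              /* repeated division by d */
  }

  /* After the transformation: the reciprocal is computed once, after
     d's definition, and each division becomes a multiplication in
     place, possibly changing results in the last bit due to
     rounding.  */
  void r_recip (float *a, int n, float d)
  {
    float rd = 1.0f / d;
    for (int i = 0; i < n; i++)
      a[i] = a[i] * rd;
  }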
Highlights
Analyzed optimizations are so diverse that it is hardly possible to
summarize the various forms of impact on debug information of passes
that have any. The good news is that the findings are probably not
surprising for anyone familiar with the internal behavior of the
passes, and of the techniques used to mask the effects of optimization
on debugging. There are, however, a few findings that I consider
surprising, in a positive or negative way. A number of highlighted
issues can be fixed without much effort; others require far more
elaborate work, while yet others may border on the unfixable.
I was surprised, throughout the analysis, by how seamless the
introduction of VTA turned out to be, especially in gimple. Very few
passes required additional logic to adjust debug binds: in nearly
every case, the decision was between disregarding debug binds or
adjusting them just like non-debug stmts or insns. This was favored
by logic that detected and coped with debug uses of dead pseudos in
RTL, and that dealt with adjustments to debug binds, sometimes
inserting debug temps, when moving or removing assignments in gimple
and RTL. Reviewing all these passes, I realized there may be room for
improvement when moving SSA defs to dominating blocks: some means to
signal, or detect internally, that such a move does not require
adjustments would avoid some unnecessary forward propagation or
introduction of debug temps, which both carry a risk of loss of debug
information. Cases in which SSA defs are removed before new,
equivalent defs are inserted at nearly the same point (e.g., replacing
a PHI node with an assign) can also be improved.
The option -Wnull-dereference enables the isolate-paths pass, that may have codegen effects
(e.g. changing returns of addresses of local automatic variables to
null), even if both --isolate-erroneous-paths-* flags, that are
supposed to enable codegen changes in this pass, are disabled.
Another case that is not too hard to fix is the lack of adjustment of
debug binds under --auto-inc-dec.
Although -Og is supposed to avoid harming debugging, it enables
--delayed-branch, which moves insns without regard to preserving the
correctness of previously-computed variable locations and has other
potentially harmful effects on branches and calls. It should
probably not be enabled at -Og.
Other very late optimization passes that may corrupt variable
locations are --peephole, also at -Og, and --peephole2, at -Os. They run after variable tracking, so
adjusting debug binds so as to recompute locations is not much of an
option. Adjusting notes might be possible, at significant effort, but
--peephole2 may actually drop notes that appear between peepholed
insns, and it is very hard to argue that doing something else would be
uniformly superior. These passes are limited to some target
architectures, but their effect on affected architectures could be
very significant.
Other passes that may break variable location information are those
that move or remove memory stores. Addressable variables are not
subject to debug binds, so such changes actually make their effects
observable at unexpected points, or not at all. Flags --tree-dse and --tree-sink enable such
optimizations, both implied by -Og. Flags --tree-loop-vectorize and --tree-slp-vectorize, both enabled at -O3, may bring about similar
effects on variables in memory, but there is hardly any expectation of
retaining significant debuggability after these.
Still, it might be worth exploring possibilities of extending VTA-like
tracking to non-scalar variables. Besides the above, and the late
tracking of addressable variables that become non-addressable and then
scalars due to optimizations, it might help mask optimization effects
of --split-wide-types, --tree-sra,
and --ipa-sra, that introduce scalars too late
to ensure debug binds are created at the correct points.
Furthermore, whatever support there is to track split-out components
separately, so as to be able to describe the aggregate location
member-wise, seems to not be up to the task. The effects of IPA SRA
on debugging are even worse, as dismembered params end up not
represented at all. It is not clear that there are means to express
such an apparently dropped parm as a composition of actual parms: some
extensions might be required to even start fixing IPA SRA.
Several optimizations that reorganize the control flow graph may drop
debug markers and binds. Gimple jump threading, for example, won't
duplicate forwarding blocks, discarding all debug stmts in them. In
some cases, it wouldn't be hard to retain them in predecessor or
successor blocks, but in others, some way to mark such stmts as
conditional might be the only way to preserve them. Conditional binds
can be handled with some effort in var-tracking and existing location
expressions and lists, but conditional markers would require some
extension to line number tables to enable debug information consumers
to decide e.g. whether or not a breakpoint at a line was hit when
reaching a conditional marker for that line. This could become a very
large project, but with significant expected benefits. Such an
extension could benefit many other passes: (RTL) --thread-jumps, --if-conversion,
--if-conversion2, --ssa-phiopt,
--crossjumping, --tree-tail-merge, and even such loop optimizations as
--tree-loop-if-convert.
I was a bit surprised to find out that a number of loop optimizations
did not harm debugging. I expected loop unrolling to be harmless,
but --split-loops, --unswitch-loops, and --peel-loops were
also found to not affect debug information, unlike transformations
that modify the order in which points in the iteration space are
visited, such as --loop-unroll-and-jam and
--tree-loop-vectorize.
Another somewhat surprising effect of induction variable optimizations
on loops, particularly --branch-count-reg, was the
risk of losing bindings for user-defined induction variables. Even if
they can be expressed in terms of remaining basic induction variables,
if the user-defined induction variable is no longer needed, there is
no effort to adjust debug bind insns accordingly. There is room for
improvement without much effort.
Partial inlining brings a significant challenge to debug information
representation: although a function fragment can be linked back to the
original abstract function and set some variables up to take locations
and values from the caller, expressing that the concrete
subprogram is a fragment that does not contain an entry point for the
function requires extensions. It would take further extensions to
express how inlined subroutines combine with this fragment to form the
entire abstract subprogram, and even to support multiple splits of the
same subprogram. Similar mechanisms could also represent OpenMP
function fragments.
Identical code folding (--ipa-icf) is another
challenging case for debugging: a single executable code sequence may
be used to represent multiple unrelated functions, each requiring a
separate set of debug annotations. One potential way to address this
is to combine debug notes (markers and binds) from all functions that
share the same executable code, making all the notes conditional on
DWARF procedures that can determine which of the combined functions is
active, from e.g. caller points, or some other means to tell them
apart. Ideally the symbolic information of each such function could
be kept separate and guarded by the same conditionals, so that only
scopes and variables of the activated function are considered
available. This will require further extensions to debug information
representation conventions.
ChangeLog
2018-10-11 v1.0
SFNs are only available in C and C++ so far. Improved wording and
fixed spelling. Moved --caller-saves to the right place. Split
--peephole out of --peephole2. Reorganized cse order and links.
Mentioned the -O* crescendo sooner. Named passes after options, and
before paragraphs. Added more anchors, reorganized earlier ones, and
added in-text links. Checked that all anchors are used, and that no
links are dangling.
2018-10-02 v0.9 DRAFT
Introduced section structure and section names, pass names next to
flags and a pass list as a TOC. Added some more info on how to tell
whether a pass is run. Highlighted the case of addressable variables
becoming scalars as benefitting from binds on non-scalars. Added
ChangeLog.
2018-09-04 v0.8 DRAFT
First published draft.
----
based on GCC 8.1.1 (gcc-8-branch@259831 68fc0ec2c57b0519bd7e1f9e013f37f112d65a3d)