
Data-flow analysis

Data-flow analysis is a technique for gathering information about the possible set of values calculated at various points in a computer program. A program's control-flow graph (CFG) is used to determine those parts of a program to which a particular value assigned to a variable might propagate. The information gathered is often used by compilers when optimizing a program. A canonical example of a data-flow analysis is reaching definitions.

A simple way to perform data-flow analysis of programs is to set up data-flow equations for each node of the control-flow graph and solve them by repeatedly calculating the output from the input locally at each node until the whole system stabilizes, i.e., it reaches a fixpoint. This general approach, also known as Kildall's method, was developed by Gary Kildall while teaching at the Naval Postgraduate School.[1][2][3][4]

Basic principles

Data-flow analysis is the process of collecting information about the way the variables are defined and used in the program. It attempts to obtain particular information at each point in a procedure. Usually, it is enough to obtain this information at the boundaries of basic blocks, since from that it is easy to compute the information at points in the basic block. In forward flow analysis, the exit state of a block is a function of the block's entry state. This function is the composition of the effects of the statements in the block. The entry state of a block is a function of the exit states of its predecessors. This yields a set of data-flow equations:

For each block b:

    out(b) = trans_b(in(b))
    in(b) = join_{p ∈ pred(b)} out(p)

In this, trans_b is the transfer function of the block b. It works on the entry state in(b), yielding the exit state out(b). The join operation combines the exit states of the predecessors p ∈ pred(b) of b, yielding the entry state of b.

After solving this set of equations, the entry and/or exit states of the blocks can be used to derive properties of the program at the block boundaries. The transfer function of each statement separately can be applied to get information at a point inside a basic block.

Each particular type of data-flow analysis has its own specific transfer function and join operation. Some data-flow problems require backward flow analysis. This follows the same plan, except that the transfer function is applied to the exit state yielding the entry state, and the join operation works on the entry states of the successors to yield the exit state.

The entry point (in forward flow) plays an important role: Since it has no predecessors, its entry state is well defined at the start of the analysis. For instance, the set of local variables with known values is empty. If the control-flow graph does not contain cycles (i.e., the procedure has no explicit or implicit loops), solving the equations is straightforward. The control-flow graph can then be topologically sorted; visiting the blocks in this order, the entry state of each block can be computed before the block is processed, since all its predecessors have already been processed, so their exit states are available. If the control-flow graph does contain cycles, a more advanced algorithm is required.

An iterative algorithm

The most common way of solving the data-flow equations is by using an iterative algorithm. It starts with an approximation of the in-state of each block. The out-states are then computed by applying the transfer functions on the in-states. From these, the in-states are updated by applying the join operations. The latter two steps are repeated until we reach the so-called fixpoint: the situation in which the in-states (and the out-states in consequence) do not change.

A basic algorithm for solving data-flow equations is the round-robin iterative algorithm:

for i ← 1 to N
    initialize node i
while (sets are still changing)
    for i ← 1 to N
        recompute sets at node i
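As a concrete illustration, the round-robin algorithm above can be sketched in Python for a forward gen/kill problem such as reaching definitions. The CFG, block names, and definition labels below are invented for the example; this is a sketch, not a production implementation:

```python
def round_robin(nodes, preds, gen, kill):
    """Round-robin iteration: recompute every node's sets until nothing changes."""
    in_s = {b: set() for b in nodes}    # initialize node i
    out_s = {b: set() for b in nodes}
    changed = True
    while changed:                      # while (sets are still changing)
        changed = False
        for b in nodes:                 # for i <- 1 to N
            in_s[b] = set().union(*(out_s[p] for p in preds[b])) if preds[b] else set()
            new_out = gen[b] | (in_s[b] - kill[b])    # transfer function
            if new_out != out_s[b]:
                out_s[b] = new_out
                changed = True
    return in_s, out_s

# Hypothetical CFG with a loop: b1 -> b2, b2 -> {b2, b3}.
# d1 and d2 label two definitions of the same variable, so each kills the other.
nodes = ["b1", "b2", "b3"]
preds = {"b1": [], "b2": ["b1", "b2"], "b3": ["b2"]}
gen   = {"b1": {"d1"}, "b2": {"d2"}, "b3": set()}
kill  = {"b1": {"d2"}, "b2": {"d1"}, "b3": set()}
ins, outs = round_robin(nodes, preds, gen, kill)
# Both definitions reach the top of b2 (d1 from b1, d2 around the loop),
# but only d2 reaches b3, because b2 kills d1.
```

Note that the loop edge b2 → b2 is exactly what forces a second sweep: the first pass computes in(b2) = {d1}, and only the next pass adds d2, which has by then flowed around the back edge.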

Convergence

To be usable, the iterative approach should actually reach a fixpoint. This can be guaranteed by imposing constraints on the combination of the value domain of the states, the transfer functions and the join operation.

The value domain should be a partial order with finite height (i.e., there are no infinite ascending chains x_1 < x_2 < ...). The combination of the transfer function and the join operation should be monotonic with respect to this partial order. Monotonicity ensures that on each iteration the value will either stay the same or will grow larger, while finite height ensures that it cannot grow indefinitely. Thus we will ultimately reach a situation where T(x) = x for all x, which is the fixpoint.

The work list approach

It is easy to improve on the algorithm above by noticing that the in-state of a block will not change if the out-states of its predecessors don't change. Therefore, we introduce a work list: a list of blocks that still need to be processed. Whenever the out-state of a block changes, we add its successors to the work list. In each iteration, a block is removed from the work list. Its out-state is computed. If the out-state changed, the block's successors are added to the work list. For efficiency, a block should not be in the work list more than once.

The algorithm is started by putting information-generating blocks in the work list. It terminates when the work list is empty.
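Under the same gen/kill model, a work-list version of the forward solver might look like the following Python sketch. The deque plus a membership set is one common way to keep a block in the list at most once; the CFG below is invented for the example:

```python
from collections import deque

def worklist_forward(nodes, preds, succs, transfer):
    """Forward work-list solver: reprocess a block only when a
    predecessor's out-state may have changed."""
    out_s = {b: set() for b in nodes}
    work = deque(nodes)            # start with every block queued
    queued = set(nodes)            # membership set: at most one copy in the list
    while work:                    # terminate when the work list is empty
        b = work.popleft()
        queued.discard(b)
        in_b = set().union(*(out_s[p] for p in preds[b])) if preds[b] else set()
        new_out = transfer(b, in_b)
        if new_out != out_s[b]:    # out-state changed: successors must be redone
            out_s[b] = new_out
            for s in succs[b]:
                if s not in queued:
                    work.append(s)
                    queued.add(s)
    return out_s

# Hypothetical loop CFG: b1 -> b2, b2 -> {b2, b3}; d1/d2 label two
# definitions of the same variable, so each kills the other.
nodes = ["b1", "b2", "b3"]
preds = {"b1": [], "b2": ["b1", "b2"], "b3": ["b2"]}
succs = {"b1": ["b2"], "b2": ["b2", "b3"], "b3": []}
gen   = {"b1": {"d1"}, "b2": {"d2"}, "b3": set()}
kill  = {"b1": {"d2"}, "b2": {"d1"}, "b3": set()}
outs = worklist_forward(nodes, preds, succs,
                        lambda b, s: gen[b] | (s - kill[b]))
```

Compared with the round-robin sweep, only blocks whose inputs may actually have changed are revisited; unaffected blocks are never touched again.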

Ordering

The efficiency of iteratively solving data-flow equations is influenced by the order in which the nodes are visited.[5] Furthermore, it depends on whether the data-flow equations are used for forward or backward data-flow analysis over the CFG. Intuitively, in a forward flow problem, it would be fastest if all predecessors of a block have been processed before the block itself, since then the iteration will use the latest information. In the absence of loops it is possible to order the blocks in such a way that the correct out-states are computed by processing each block only once.

In the following, a few iteration orders for solving data-flow equations are discussed (a concept related to the iteration order of a CFG is the traversal order of a tree).

  • Random order - This iteration order is not aware of whether the data-flow equations solve a forward or backward data-flow problem. Therefore, the performance is relatively poor compared to specialized iteration orders.
  • Postorder - This is a typical iteration order for backward data-flow problems. In postorder iteration, a node is visited after all its successor nodes have been visited. Typically, the postorder iteration is implemented with the depth-first strategy.
  • Reverse postorder - This is a typical iteration order for forward data-flow problems. In reverse-postorder iteration, a node is visited before any of its successor nodes has been visited, except when the successor is reached by a back edge. (Note that reverse postorder is not the same as preorder.)
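For reference, a reverse-postorder sequence can be computed with a depth-first traversal; the small diamond-shaped CFG below is made up for illustration:

```python
def reverse_postorder(succs, entry):
    """Depth-first postorder, then reversed: every node appears before its
    successors, except across back edges."""
    seen, postorder = set(), []
    def dfs(n):
        seen.add(n)
        for s in succs[n]:
            if s not in seen:
                dfs(s)
        postorder.append(n)      # appended only after all successors are done
    dfs(entry)
    return postorder[::-1]

# Diamond CFG: A branches to B and C, which both rejoin at D.
succs = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
rpo = reverse_postorder(succs, "A")
# A comes first and D last, so a forward analysis visiting blocks in rpo
# sees both of D's predecessors before D itself.
```

Reversing the same list instead of the CFG's edges yields a postorder, which is the usual starting point for backward problems.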

Initialization

The initial value of the in-states is important to obtain correct and accurate results. If the results are used for compiler optimizations, they should provide conservative information, i.e. when applying the information, the program should not change semantics. The iteration of the fixpoint algorithm will take the values in the direction of the maximum element. Initializing all blocks with the maximum element is therefore not useful. At least one block starts in a state with a value less than the maximum. The details depend on the data-flow problem. If the minimum element represents totally conservative information, the results can be used safely even during the data-flow iteration. If it represents the most accurate information, the fixpoint should be reached before the results can be applied.

Examples

The following are examples of properties of computer programs that can be calculated by data-flow analysis. Note that the properties calculated by data-flow analysis are typically only approximations of the real properties. This is because data-flow analysis operates on the syntactical structure of the CFG without simulating the exact control flow of the program. However, to still be useful in practice, a data-flow analysis algorithm is typically designed to calculate an upper or lower approximation, respectively, of the real program properties.

Forward analysis

The reaching definition analysis calculates for each program point the set of definitions that may potentially reach this program point.

1: if b == 4 then
2:     a = 5;
3: else
4:     a = 3;
5: endif
6:
7: if a < 4 then
8:     ...

The reaching definition of variable a at line 7 is the set of assignments a = 5 at line 2 and a = 3 at line 4.
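Because this snippet's CFG is acyclic, a single pass in topological order suffices. The block names and definition labels in the Python sketch below are invented; "a=5@2" and "a=3@4" stand for the definitions on lines 2 and 4:

```python
# Reaching definitions over the branch/merge CFG of the snippet above.
order = ["entry", "then", "else_", "merge"]      # a topological order
preds = {"entry": [], "then": ["entry"], "else_": ["entry"],
         "merge": ["then", "else_"]}
gen   = {"entry": set(), "then": {"a=5@2"}, "else_": {"a=3@4"}, "merge": set()}
kill  = {"entry": set(), "then": {"a=3@4"}, "else_": {"a=5@2"}, "merge": set()}

out = {b: set() for b in order}
for b in order:                                  # each block processed once
    in_b = set().union(*(out[p] for p in preds[b])) if preds[b] else set()
    out[b] = gen[b] | (in_b - kill[b])
# The join (union) at the merge block keeps both definitions of a,
# so both reach the test on line 7.
```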

Backward analysis

The live variable analysis calculates for each program point the variables that may be potentially read afterwards before their next write update. The result is typically used by dead code elimination to remove statements that assign to a variable whose value is not used afterwards.

The in-state of a block is the set of variables that are live at the start of the block. The transfer function of a statement is applied by removing the variables written by the statement from the set of live variables (killing them) and adding the variables it reads. The out-state of a block is the set of variables that are live at the end of the block and is computed as the union of the block's successors' in-states.

Initial code:

b1: a = 3;
    b = 5;
    d = 4;
    x = 100;
    if a > b then
b2:     c = a + b;
        d = 2;
b3: endif
    c = 4;
    return b * d + c;

Backward analysis:

// in: {}
b1: a = 3;
    b = 5;
    d = 4;
    x = 100;        // x is never read later, thus not in the out set {a, b, d}
    if a > b then
// out: {a, b, d}   // union of the in-states of b1's successors: b2 {a, b} and b3 {b, d}
// in: {a, b}
b2:     c = a + b;
        d = 2;
// out: {b, d}
// in: {b, d}
b3: endif
    c = 4;
    return b * d + c;
// out: {}

The in-state of b3 only contains b and d, since c has been written. The out-state of b1 is the union of the in-states of b2 and b3. The definition of c in b2 can be removed, since c is not live immediately after the statement.

Solving the data-flow equations starts with initializing all in-states and out-states to the empty set. The work list is initialized by inserting the exit point (b3) in the work list (typical for backward flow). Its computed in-state differs from the previous one, so its predecessors b1 and b2 are inserted and the process continues. The progress is summarized in the table below.

processing | out-state | old in-state | new in-state | work list
b3         | {}        | {}           | {b,d}        | (b1, b2)
b1         | {b,d}     | {}           | {}           | (b2)
b2         | {b,d}     | {}           | {a,b}        | (b1)
b1         | {a,b,d}   | {}           | {}           | ()

Note that b1 was entered into the list before b2, which forced processing b1 twice (b1 was re-entered as predecessor of b2). Inserting b2 before b1 would have allowed earlier completion.

Initializing with the empty set is an optimistic initialization: all variables start out as dead. Note that the out-states cannot shrink from one iteration to the next, although the out-state can be smaller than the in-state. This can be seen from the fact that after the first iteration the out-state can only change by a change of the in-state. Since the in-state starts as the empty set, it can only grow in further iterations.
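The fixpoint in the table can be reproduced with a small backward solver. This is a sketch; the use/def sets below are read off the b1, b2 and b3 blocks of the example (use = variables read before being written in the block, defs = variables written in the block):

```python
def liveness(nodes, succs, use, defs):
    """Backward liveness: out(b) = union of in(s) over successors s,
    in(b) = use(b) | (out(b) - defs(b))."""
    in_s = {b: set() for b in nodes}
    changed = True
    while changed:
        changed = False
        for b in nodes:
            out_b = set().union(*(in_s[s] for s in succs[b])) if succs[b] else set()
            new_in = use[b] | (out_b - defs[b])
            if new_in != in_s[b]:
                in_s[b] = new_in
                changed = True
    return in_s

succs = {"b1": ["b2", "b3"], "b2": ["b3"], "b3": []}
# b1 reads a and b only after defining them, so its use set is empty.
use  = {"b1": set(), "b2": {"a", "b"}, "b3": {"b", "d"}}
defs = {"b1": {"a", "b", "d", "x"}, "b2": {"c", "d"}, "b3": {"c"}}
live_in = liveness(["b1", "b2", "b3"], succs, use, defs)
# live_in matches the table's final in-states:
# b3: {b, d}, b2: {a, b}, b1: {} (everything b1 reads, it defines first)
```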

Other approaches

Several modern compilers use static single-assignment form as the method for analysis of variable dependencies.[6]

In 2002, Markus Mohnen described a new method of data-flow analysis that does not require the explicit construction of a data-flow graph,[7] instead relying on abstract interpretation of the program and keeping a working set of program counters. At each conditional branch, both targets are added to the working set. Each path is followed for as many instructions as possible (until end of program or until it has looped with no changes), and then removed from the set and the next program counter retrieved.

A combination of control flow analysis and data flow analysis has been shown to be useful and complementary in identifying cohesive source code regions implementing functionalities of a system (e.g., features, requirements or use cases).[8]

Special classes of problems

There are a variety of special classes of dataflow problems which have efficient or general solutions.

Bit vector problems

The examples above are problems in which the data-flow value is a set, e.g. the set of reaching definitions (using a bit for a definition position in the program), or the set of live variables. These sets can be represented efficiently as bit vectors, in which each bit represents set membership of one particular element. Using this representation, the join and transfer functions can be implemented as bitwise logical operations. The join operation is typically union or intersection, implemented by bitwise logical or and logical and. The transfer function for each block can be decomposed into so-called gen and kill sets.

As an example, in live-variable analysis, the join operation is union. The kill set is the set of variables that are written in a block, whereas the gen set is the set of variables that are read without being written first. The data-flow equations become

    out(b) = ∪_{s ∈ succ(b)} in(s)
    in(b) = (out(b) − kill(b)) ∪ gen(b)

In logical operations, this reads as

out(b) = 0
for s in succ(b)
    out(b) = out(b) or in(s)
in(b) = (out(b) and not kill(b)) or gen(b)
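Using Python integers as the bit vectors (one bit per variable, an encoding chosen for this sketch), the liveness equations of the three-block example above become plain bitwise operations:

```python
# One bit per variable; set union is |, set subtraction is & ~.
BIT = {"a": 1 << 0, "b": 1 << 1, "c": 1 << 2, "d": 1 << 3, "x": 1 << 4}

def bits(names):
    v = 0
    for n in names:
        v |= BIT[n]
    return v

# The b1/b2/b3 CFG from the backward-analysis example above.
succs = {"b1": ["b2", "b3"], "b2": ["b3"], "b3": []}
gen  = {"b1": bits([]), "b2": bits(["a", "b"]), "b3": bits(["b", "d"])}
kill = {"b1": bits(["a", "b", "d", "x"]), "b2": bits(["c", "d"]), "b3": bits(["c"])}

live_in = {b: 0 for b in succs}
changed = True
while changed:
    changed = False
    for b in succs:
        out = 0
        for s in succs[b]:                    # out(b) = out(b) or in(s)
            out |= live_in[s]
        new_in = (out & ~kill[b]) | gen[b]    # in(b) = (out and not kill) or gen
        if new_in != live_in[b]:
            live_in[b] = new_in
            changed = True
# live_in["b3"] ends up with exactly the b and d bits set.
```

On real machines the same idea lets one word of memory hold 32 or 64 set members, so each join or transfer step costs a handful of instructions regardless of how many variables it covers.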

Dataflow problems which have sets of data-flow values which can be represented as bit vectors are called bit vector problems, gen-kill problems, or locally separable problems.[9] Such problems have generic polynomial-time solutions.[10]

In addition to the reaching definitions and live variables problems mentioned above, the following problems are instances of bitvector problems:[10]

  • Available expressions
  • Very busy expressions
  • Use-definition chains

IFDS problems

Interprocedural, finite, distributive, subset problems or IFDS problems are another class of problem with a generic polynomial-time solution.[9][11] Solutions to these problems provide context-sensitive and flow-sensitive dataflow analyses.

There are several implementations of IFDS-based dataflow analyses for popular programming languages, e.g. in the Soot[12] and WALA[13] frameworks for Java analysis.

Every bitvector problem is also an IFDS problem, but there are several significant IFDS problems that are not bitvector problems, including truly-live variables and possibly-uninitialized variables.

Sensitivities

Data-flow analysis is typically path-insensitive, though it is possible to define data-flow equations that yield a path-sensitive analysis.

  • A flow-sensitive analysis takes into account the order of statements in a program. For example, a flow-insensitive pointer alias analysis may determine "variables x and y may refer to the same location", while a flow-sensitive analysis may determine "after statement 20, variables x and y may refer to the same location".
  • A path-sensitive analysis computes different pieces of analysis information dependent on the predicates at conditional branch instructions. For instance, if a branch contains a condition x>0, then on the fall-through path, the analysis would assume that x<=0 and on the target of the branch it would assume that indeed x>0 holds.
  • A context-sensitive analysis is an interprocedural analysis that considers the calling context when analyzing the target of a function call. In particular, using context information one can jump back to the original call site, whereas without that information, the analysis information has to be propagated back to all possible call sites, potentially losing precision.

List of data-flow analyses

  • Reaching definitions
  • Liveness analysis
  • Definite assignment analysis
  • Available expression
  • Constant propagation

See also

  • Abstract interpretation
  • Control flow analysis
  • XLT86

References

  1. ^ Kildall, Gary Arlen (May 1972). Global expression optimization during compilation (Ph.D. dissertation). Seattle, Washington, USA: University of Washington, Computer Science Group. Thesis No. 20506, Technical Report No. 72-06-02.
  2. ^ Kildall, Gary Arlen (1973-10-01). "A Unified Approach to Global Program Optimization" (PDF). Proceedings of the 1st Annual ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL). POPL '73. Boston, Massachusetts, USA: 194–206. doi:10.1145/512927.512945. hdl:10945/42162. S2CID 10219496. Archived (PDF) from the original on 2017-06-29. Retrieved 2006-11-20.
  3. ^ Rüthing, Oliver; Knoop, Jens; Steffen, Bernhard (2003-07-31) [1999]. "Optimization: Detecting Equalities of Variables, Combining Efficiency with Precision". In Cortesi, Agostino; Filé, Gilberto (eds.). Static Analysis: 6th International Symposium, SAS'99, Venice, Italy, September 22–24, 1999, Proceedings. Lecture Notes in Computer Science. Vol. 1694 (illustrated ed.). Springer. pp. 232–247 [233]. ISBN 9783540664598. ISSN 0302-9743.
  4. ^ Huitt, Robert; Eubanks, Gordon; Rolander, Thomas "Tom" Alan; Laws, David; Michel, Howard E.; Halla, Brian; Wharton, John Harrison; Berg, Brian; Su, Weilian; Kildall, Scott; Kampe, Bill (2014-04-25). Laws, David (ed.). "Legacy of Gary Kildall: The CP/M IEEE Milestone Dedication" (PDF) (video transcription). Pacific Grove, California, USA: Computer History Museum. CHM Reference number: X7170.2014. Retrieved 2020-01-19. […] Eubanks: […] Gary […] was an inventor, he was inventive, he did things. His Ph.D. thesis proved that global flow analysis converges. […] This is a fundamental idea in computer science. […] I took a […] summer course once from a guy named Dhamdhere […] they talked about optimization for like a week and then they put a slide up and said, "Kildall's Method," this is the real story. […] that's something that no one ever thinks about. […] (33 pages)
  5. ^ Cooper, Keith D.; Harvey, Timothy J.; Kennedy, Ken (2004-03-26) [November 2002]. "Iterative Data-Flow Analysis, Revisited" (PDF). PLDI 2003. ACM. TR04-432. Retrieved 2017-07-01.
  6. ^ "Static Single Assignment (with relevant examples)". GeeksforGeeks. 2021-10-02. Retrieved 2023-08-16.
  7. ^ Mohnen, Markus (2002). "A Graph—Free Approach to Data—Flow Analysis". Compiler Construction. Lecture Notes in Computer Science. Vol. 2304. pp. 185–213. doi:10.1007/3-540-45937-5_6. ISBN 978-3-540-43369-9.
  8. ^ Kuang, Hongyu; Mäder, Patrick; Hu, Hao; Ghabi, Achraf; Huang, LiGuo; Lü, Jian; Egyed, Alexander (2015-11-01). "Can method data dependencies support the assessment of traceability between requirements and source code?". Journal of Software: Evolution and Process. 27 (11): 838–866. doi:10.1002/smr.1736. ISSN 2047-7481. S2CID 39846438.
  9. ^ a b Reps, Thomas; Horwitz, Susan; Sagiv, Mooly (1995). "Precise interprocedural dataflow analysis via graph reachability". Proceedings of the 22nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages - POPL '95. New York, New York, USA: ACM Press. pp. 1, 49–61. doi:10.1145/199448.199462. ISBN 0-89791692-1. S2CID 5955667.
  10. ^ a b Knoop, Jens; Steffen, Bernhard; Vollmer, Jürgen (1996-05-01). "Parallelism for free: efficient and optimal bitvector analyses for parallel programs". ACM Transactions on Programming Languages and Systems. 18 (3): 268–299. doi:10.1145/229542.229545. ISSN 0164-0925. S2CID 14123780.
  11. ^ Naeem, Nomair A.; Lhoták, Ondřej; Rodriguez, Jonathan (2010), "Practical Extensions to the IFDS Algorithm", Compiler Construction, Lecture Notes in Computer Science, vol. 6011, Berlin / Heidelberg, Germany: Springer Verlag, pp. 124–144, doi:10.1007/978-3-642-11970-5_8, ISBN 978-3-64211969-9
  12. ^ Bodden, Eric (2012). "Inter-procedural data-flow analysis with IFDS/IDE and Soot". Proceedings of the ACM SIGPLAN International Workshop on State of the Art in Java Program analysis. New York, New York, USA: ACM Press. pp. 3–8. doi:10.1145/2259051.2259052. ISBN 978-1-45031490-9. S2CID 3020481.
  13. ^ Rapoport, Marianna; Lhoták, Ondřej; Tip, Frank (2015). Precise Data Flow Analysis in the Presence of Correlated Method Calls. International Static Analysis Symposium. Lecture Notes in Computer Science. Vol. 9291. Berlin / Heidelberg, Germany: Springer Verlag. pp. 54–71. doi:10.1007/978-3-662-48288-9_4. ISBN 978-3-66248287-2.

Further reading edit

data, flow, analysis, this, article, about, static, program, analysis, dynamic, program, analysis, dynamic, program, analysis, dynamic, data, flow, analysis, this, section, needs, additional, citations, verification, please, help, improve, this, article, addin. This article is about static program analysis For dynamic program analysis see Dynamic program analysis Dynamic data flow analysis This section needs additional citations for verification Please help improve this article by adding citations to reliable sources in this section Unsourced material may be challenged and removed February 2018 Learn how and when to remove this template message Data flow analysis is a technique for gathering information about the possible set of values calculated at various points in a computer program A program s control flow graph CFG is used to determine those parts of a program to which a particular value assigned to a variable might propagate The information gathered is often used by compilers when optimizing a program A canonical example of a data flow analysis is reaching definitions A simple way to perform data flow analysis of programs is to set up data flow equations for each node of the control flow graph and solve them by repeatedly calculating the output from the input locally at each node until the whole system stabilizes i e it reaches a fixpoint This general approach also known as Kildall s method was developed by Gary Kildall while teaching at the Naval Postgraduate School 1 2 3 4 Contents 1 Basic principles 2 An iterative algorithm 2 1 Convergence 2 2 The work list approach 2 3 Ordering 2 4 Initialization 3 Examples 3 1 Forward analysis 3 2 Backward analysis 4 Other approaches 5 Special classes of problems 5 1 Bit vector problems 5 2 IFDS problems 6 Sensitivities 7 List of data flow analyses 8 See also 9 References 10 Further readingBasic principles editData flow analysis is the process of collecting information about the way the variables are defined and used in 
the program It attempts to obtain particular information at each point in a procedure Usually it is enough to obtain this information at the boundaries of basic blocks since from that it is easy to compute the information at points in the basic block In forward flow analysis the exit state of a block is a function of the block s entry state This function is the composition of the effects of the statements in the block The entry state of a block is a function of the exit states of its predecessors This yields a set of data flow equations For each block b o u t b t r a n s b i n b displaystyle out b trans b in b nbsp i n b j o i n p p r e d b o u t p displaystyle in b join p in pred b out p nbsp In this t r a n s b displaystyle trans b nbsp is the transfer function of the block b displaystyle b nbsp It works on the entry state i n b displaystyle in b nbsp yielding the exit state o u t b displaystyle out b nbsp The join operation j o i n displaystyle join nbsp combines the exit states of the predecessors p p r e d b displaystyle p in pred b nbsp of b displaystyle b nbsp yielding the entry state of b displaystyle b nbsp After solving this set of equations the entry and or exit states of the blocks can be used to derive properties of the program at the block boundaries The transfer function of each statement separately can be applied to get information at a point inside a basic block Each particular type of data flow analysis has its own specific transfer function and join operation Some data flow problems require backward flow analysis This follows the same plan except that the transfer function is applied to the exit state yielding the entry state and the join operation works on the entry states of the successors to yield the exit state The entry point in forward flow plays an important role Since it has no predecessors its entry state is well defined at the start of the analysis For instance the set of local variables with known values is empty If the control flow 
graph does not contain cycles there were no explicit or implicit loops in the procedure solving the equations is straightforward The control flow graph can then be topologically sorted running in the order of this sort the entry states can be computed at the start of each block since all predecessors of that block have already been processed so their exit states are available If the control flow graph does contain cycles a more advanced algorithm is required An iterative algorithm editThe most common way of solving the data flow equations is by using an iterative algorithm It starts with an approximation of the in state of each block The out states are then computed by applying the transfer functions on the in states From these the in states are updated by applying the join operations The latter two steps are repeated until we reach the so called fixpoint the situation in which the in states and the out states in consequence do not change A basic algorithm for solving data flow equations is the round robin iterative algorithm for i 1 to Ninitialize node i dd while sets are still changing for i 1 to Nrecompute sets at node i dd dd Convergence edit To be usable the iterative approach should actually reach a fixpoint This can be guaranteed by imposing constraints on the combination of the value domain of the states the transfer functions and the join operation The value domain should be a partial order with finite height i e there are no infinite ascending chains x 1 displaystyle x 1 nbsp lt x 2 displaystyle x 2 nbsp lt The combination of the transfer function and the join operation should be monotonic with respect to this partial order Monotonicity ensures that on each iteration the value will either stay the same or will grow larger while finite height ensures that it cannot grow indefinitely Thus we will ultimately reach a situation where T x x for all x which is the fixpoint The work list approach edit It is easy to improve on the algorithm above by noticing that 
the in state of a block will not change if the out states of its predecessors don t change Therefore we introduce a work list a list of blocks that still need to be processed Whenever the out state of a block changes we add its successors to the work list In each iteration a block is removed from the work list Its out state is computed If the out state changed the block s successors are added to the work list For efficiency a block should not be in the work list more than once The algorithm is started by putting information generating blocks in the work list It terminates when the work list is empty Ordering edit The efficiency of iteratively solving data flow equations is influenced by the order at which local nodes are visited 5 Furthermore it depends on whether the data flow equations are used for forward or backward data flow analysis over the CFG Intuitively in a forward flow problem it would be fastest if all predecessors of a block have been processed before the block itself since then the iteration will use the latest information In the absence of loops it is possible to order the blocks in such a way that the correct out states are computed by processing each block only once In the following a few iteration orders for solving data flow equations are discussed a related concept to iteration order of a CFG is tree traversal of a tree Random order This iteration order is not aware whether the data flow equations solve a forward or backward data flow problem Therefore the performance is relatively poor compared to specialized iteration orders Postorder This is a typical iteration order for backward data flow problems In postorder iteration a node is visited after all its successor nodes have been visited Typically the postorder iteration is implemented with the depth first strategy Reverse postorder This is a typical iteration order for forward data flow problems In reverse postorder iteration a node is visited before any of its successor nodes has been 
visited except when the successor is reached by a back edge Note that reverse postorder is not the same as preorder Initialization edit The initial value of the in states is important to obtain correct and accurate results If the results are used for compiler optimizations they should provide conservative information i e when applying the information the program should not change semantics The iteration of the fixpoint algorithm will take the values in the direction of the maximum element Initializing all blocks with the maximum element is therefore not useful At least one block starts in a state with a value less than the maximum The details depend on the data flow problem If the minimum element represents totally conservative information the results can be used safely even during the data flow iteration If it represents the most accurate information fixpoint should be reached before the results can be applied Examples editThe following are examples of properties of computer programs that can be calculated by data flow analysis Note that the properties calculated by data flow analysis are typically only approximations of the real properties This is because data flow analysis operates on the syntactical structure of the CFG without simulating the exact control flow of the program However to be still useful in practice a data flow analysis algorithm is typically designed to calculate an upper respectively lower approximation of the real program properties Forward analysis edit The reaching definition analysis calculates for each program point the set of definitions that may potentially reach this program point if b 4 then a 5 else a 3 endif if a lt 4 then The reaching definition of variable a at line 7 is the set of assignments a 5 at line 2 and a 3 at line 4 Backward analysis edit The live variable analysis calculates for each program point the variables that may be potentially read afterwards before their next write update The result is typically used by dead code 
elimination to remove statements that assign to a variable whose value is not used afterwards The in state of a block is the set of variables that are live at the start of it It initially contains all variables live contained in the block before the transfer function is applied and the actual contained values are computed The transfer function of a statement is applied by killing the variables that are written within this block remove them from the set of live variables The out state of a block is the set of variables that are live at the end of the block and is computed by the union of the block s successors in states Initial code b1 a 3 b 5 d 4 x 100 if a gt b then b2 c a b d 2 b3 endif c 4 return b d c Backward analysis in b1 a 3 b 5 d 4 x 100 x is never being used later thus not in the out set a b d if a gt b then out a b d union of all in successors of b1 gt b2 a b and b3 b d in a b b2 c a b d 2 out b d in b d b3 endif c 4 return b d c out The in state of b3 only contains b and d since c has been written The out state of b1 is the union of the in states of b2 and b3 The definition of c in b2 can be removed since c is not live immediately after the statement Solving the data flow equations starts with initializing all in states and out states to the empty set The work list is initialized by inserting the exit point b3 in the work list typical for backward flow Its computed in state differs from the previous one so its predecessors b1 and b2 are inserted and the process continues The progress is summarized in the table below processing out state old in state new in state work listb3 b d b1 b2 b1 b d b2 b2 b d a b b1 b1 a b d Note that b1 was entered in the list before b2 which forced processing b1 twice b1 was re entered as predecessor of b2 Inserting b2 before b1 would have allowed earlier completion Initializing with the empty set is an optimistic initialization all variables start out as dead Note that the out states cannot shrink from one iteration to the 
next although the out state can be smaller than the in state This can be seen from the fact that after the first iteration the out state can only change by a change of the in state Since the in state starts as the empty set it can only grow in further iterations Other approaches editSeveral modern compilers use static single assignment form as the method for analysis of variable dependencies 6 In 2002 Markus Mohnen described a new method of data flow analysis that does not require the explicit construction of a data flow graph 7 instead relying on abstract interpretation of the program and keeping a working set of program counters At each conditional branch both targets are added to the working set Each path is followed for as many instructions as possible until end of program or until it has looped with no changes and then removed from the set and the next program counter retrieved A combination of control flow analysis and data flow analysis has shown to be useful and complementary in identifying cohesive source code regions implementing functionalities of a system e g features requirements or use cases 8 Special classes of problems editThere are a variety of special classes of dataflow problems which have efficient or general solutions Bit vector problems edit The examples above are problems in which the data flow value is a set e g the set of reaching definitions Using a bit for a definition position in the program or the set of live variables These sets can be represented efficiently as bit vectors in which each bit represents set membership of one particular element Using this representation the join and transfer functions can be implemented as bitwise logical operations The join operation is typically union or intersection implemented by bitwise logical or and logical and The transfer function for each block can be decomposed in so called gen and kill sets As an example in live variable analysis the join operation is union The kill set is the set of 
The kill set is the set of variables that are written in a block, whereas the gen set is the set of variables that are read without being written first. The data-flow equations become

    out(b) = ⋃_{s ∈ succ(b)} in(s)
    in(b) = (out(b) − kill(b)) ∪ gen(b)

In logical operations, this reads as

    out(b) = 0
    for s in succ(b)
        out(b) = out(b) or in(s)
    in(b) = (out(b) and not kill(b)) or gen(b)

Dataflow problems which have sets of data-flow values which can be represented as bit vectors are called bit-vector problems, gen-kill problems, or locally separable problems.[9] Such problems have generic polynomial-time solutions.[10]

In addition to the reaching definitions and live variables problems mentioned above, the following problems are instances of bitvector problems:[10]

Available expressions
Very busy expressions
Use-definition chains

IFDS problems

Interprocedural, finite, distributive, subset problems, or IFDS problems, are another class of problem with a generic polynomial-time solution.[9][11] Solutions to these problems provide context-sensitive and flow-sensitive dataflow analyses. There are several implementations of IFDS-based dataflow analyses for popular programming languages, e.g. in the Soot[12] and WALA[13] frameworks for Java analysis. Every bitvector problem is also an IFDS problem, but there are several significant IFDS problems that are not bitvector problems, including truly-live variables and possibly-uninitialized variables.

Sensitivities

Main article: Polyvariance

Data-flow analysis is typically path-insensitive, though it is possible to define data-flow equations that yield a path-sensitive analysis.

A flow-sensitive analysis takes into account the order of statements in a program. For example, a flow-insensitive pointer alias analysis may determine "variables x and y may refer to the same location", while a flow-sensitive analysis may determine "after statement 20, variables x and y may refer to the same location".
A path-sensitive analysis computes different pieces of analysis information dependent on the predicates at conditional branch instructions. For instance, if a branch contains a condition x > 0, then on the fall-through path the analysis would assume that x ≤ 0 and on the target of the branch it would assume that indeed x > 0 holds.

A context-sensitive analysis is an interprocedural analysis that considers the calling context when analyzing the target of a function call. In particular, using context information one can jump back to the original call site, whereas without that information, the analysis information has to be propagated back to all possible call sites, potentially losing precision.

List of data-flow analyses

Reaching definitions
Liveness analysis
Definite assignment analysis
Available expressions
Constant propagation

See also

Abstract interpretation
Control-flow analysis
XLT86

References

1. Kildall, Gary Arlen (May 1972). Global expression optimization during compilation (Ph.D. dissertation). Seattle, Washington, USA: University of Washington, Computer Science Group. Thesis No. 20506, Technical Report No. 72-06-02.
2. Kildall, Gary Arlen (1973-10-01). "A Unified Approach to Global Program Optimization". Proceedings of the 1st Annual ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (POPL '73). Boston, Massachusetts, USA: 194–206. doi:10.1145/512927.512945. hdl:10945/42162. S2CID 10219496. Archived from the original on 2017-06-29. Retrieved 2006-11-20.
3. Rüthing, Oliver; Knoop, Jens; Steffen, Bernhard (2003-07-31) [1999]. "Optimization: Detecting Equalities of Variables, Combining Efficiency with Precision". In Cortesi, Agostino; Filé, Gilberto (eds.). Static Analysis: 6th International Symposium, SAS '99, Venice, Italy, September 22–24, 1999, Proceedings. Lecture Notes in Computer Science. Vol. 1694 (illustrated ed.). Springer. pp. 232–247 [233]. ISBN 9783540664598. ISSN 0302-9743.
4. Huitt, Robert; Eubanks, Gordon; Rolander, Thomas "Tom" Alan; Laws, David; Michel, Howard E.; Halla, Brian; Wharton, John Harrison; Berg, Brian; Su, Weilian; Kildall, Scott;
Kampe, Bill (2014-04-25). Laws, David (ed.). "Legacy of Gary Kildall: The CP/M IEEE Milestone Dedication" (PDF) (video transcription). Pacific Grove, California, USA: Computer History Museum. CHM Reference number X7170.2014. Retrieved 2020-01-19. "Eubanks: Gary was an inventor, he was inventive, he did things. His Ph.D. thesis proved that global flow analysis converges. This is a fundamental idea in computer science. I took a summer course once from a guy named Dhamdhere; they talked about optimization for, like, a week, and then they put a slide up and said, 'Kildall's Method,' this is the real story. That's something that no one ever thinks about." (33 pages)
5. Cooper, Keith D.; Harvey, Timothy J.; Kennedy, Ken (2004-03-26) [November 2002]. "Iterative Data-Flow Analysis, Revisited" (PDF). PLDI 2003. ACM. TR04-432. Retrieved 2017-07-01.
6. "Static Single Assignment (with relevant examples)". GeeksforGeeks. 2021-10-02. Retrieved 2023-08-16.
7. Mohnen, Markus (2002). "A Graph-Free Approach to Data-Flow Analysis". Compiler Construction. Lecture Notes in Computer Science. Vol. 2304. pp. 185–213. doi:10.1007/3-540-45937-5_6. ISBN 978-3-540-43369-9.
8. Kuang, Hongyu; Mäder, Patrick; Hu, Hao; Ghabi, Achraf; Huang, LiGuo; Lu, Jian; Egyed, Alexander (2015-11-01). "Can method data dependencies support the assessment of traceability between requirements and source code?". Journal of Software: Evolution and Process. 27 (11): 838–866. doi:10.1002/smr.1736. ISSN 2047-7481. S2CID 39846438.
9. Reps, Thomas; Horwitz, Susan; Sagiv, Mooly (1995). "Precise interprocedural dataflow analysis via graph reachability". Proceedings of the 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '95). New York, New York, USA: ACM Press. pp. 49–61. doi:10.1145/199448.199462. ISBN 0-89791-692-1. S2CID 5955667.
10. Knoop, Jens; Steffen, Bernhard; Vollmer, Jürgen (1996-05-01). "Parallelism for free: efficient and optimal bitvector analyses for parallel programs". ACM Transactions on Programming Languages and Systems. 18 (3): 268–299. doi:10.1145/229542.229545. ISSN 0164-0925. S2CID 14123780.
11. Naeem, Nomair A.; Lhoták,
Ondřej; Rodriguez, Jonathan (2010). "Practical Extensions to the IFDS Algorithm". Compiler Construction. Lecture Notes in Computer Science. Vol. 6011. Berlin/Heidelberg, Germany: Springer Verlag. pp. 124–144. doi:10.1007/978-3-642-11970-5_8. ISBN 978-3-642-11969-9.
12. Bodden, Eric (2012). "Inter-procedural data-flow analysis with IFDS/IDE and Soot". Proceedings of the ACM SIGPLAN International Workshop on State of the Art in Java Program Analysis. New York, New York, USA: ACM Press. pp. 3–8. doi:10.1145/2259051.2259052. ISBN 978-1-4503-1490-9. S2CID 3020481.
13. Rapoport, Marianna; Lhoták, Ondřej; Tip, Frank (2015). "Precise Data Flow Analysis in the Presence of Correlated Method Calls". International Static Analysis Symposium. Lecture Notes in Computer Science. Vol. 9291. Berlin/Heidelberg, Germany: Springer Verlag. pp. 54–71. doi:10.1007/978-3-662-48288-9_4. ISBN 978-3-662-48287-2.

Further reading

Cooper, Keith D.; Torczon, Linda (2003) [2002-01-01]. Engineering a Compiler. Morgan Kaufmann. ISBN 978-1-55860-698-2.
Muchnick, Steven Stanley (1997). Advanced Compiler Design and Implementation. Morgan Kaufmann Publishers. ISBN 978-1-55860-320-2.
Hecht, Matthew S. (1977). Flow Analysis of Computer Programs. Programming Languages Series. Vol. 5. Elsevier North-Holland. ISBN 978-0-444-00210-5.
Khedker, Uday P.; Sanyal, Amitabha; Karkare, Bageshri (2009). Data Flow Analysis: Theory and Practice. CRC Press (Taylor & Francis Group).
Nielson, Flemming; Nielson, Hanne Riis; Hankin, Chris (2005). Principles of Program Analysis. Springer Science+Business Media.

Retrieved from "https://en.wikipedia.org/w/index.php?title=Data-flow_analysis&oldid=1185206509"
