
Static Analysis tool to detect table dependencies in PHP application

I'm working on a large legacy PHP project. The problem we are facing is that many of the PHP scripts are tightly coupled to the database, which makes managing and changing the application very difficult, since we don't have a clear idea of which changes will break what.

The code isn't really all that clever. Most of the DB calls look something like this:

$theQuery = "select * from $theDatabase.TABLEName";
$theResult = mysql_query($theQuery);
//Do some rendering

So most of the table dependencies should be detectable directly in the code, without analyzing its runtime behavior. Is there a tool (paid or free) that you can point at a PHP file, give a list of table names to look for, and have it search that file and all of its includes, then report the tables that are affected by that PHP script?

Also, it would be OK if it can't guarantee correctness. Any information at all would be better than where we are now!

Thanks!


There are two levels at which you can do this.

  • A full static analysis that traces the assembly of string fragments into SQL calls, examines the resulting strings as SQL text (parsing), and pulls out the information you want. Since your program may be talking to multiple databases, you also need to trace the database connection steps (the sources of the value of 'theDatabase') so that you can determine what the tables and columns are supposed to be. This is really hard: it requires a full PHP parser, control and data flow analysis (in the face of a dynamic language, ugh), and SQL parsing and extraction. Such a tool might exist, but it would surprise the h--- out of me. (My company builds custom tools, and we try to track this kind of thing.)

  • A heuristic that extracts all the string fragments from your code ("select * from " and "TABLEName") and tries to guess the tables and columns from them. For this you need something that will extract all the strings and tear them apart looking for evidence. The only evidence you have in your example is "select *" (meaning "all columns") and "TABLEName"; if you have the set of database schemas, you could match against the table name to determine the columns.
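As a rough illustration of the second approach, here is a minimal sketch that checks a list of already-extracted string fragments against a list of known table names. The function name `guessTables` and the sample table list are made up for this example, and the whole-word, case-insensitive match is an assumption; a table with a common name such as "name" would produce false positives.

```php
<?php
// Hypothetical helper: report which known table names appear as whole words
// in string fragments that were already pulled out of the PHP source.
function guessTables(array $fragments, array $knownTables): array
{
    $found = [];
    foreach ($knownTables as $table) {
        // \b stops "user" from matching inside "user_log" ("_" is a word character).
        // The "i" flag is an assumption: table-name case sensitivity depends on the server setup.
        $pattern = '/\b' . preg_quote($table, '/') . '\b/i';
        foreach ($fragments as $fragment) {
            if (preg_match($pattern, $fragment) === 1) {
                $found[] = $table;
                break; // one hit per table is enough
            }
        }
    }
    return $found;
}

// The two literal pieces of the query string from the question.
$fragments = ['select * from ', '.TABLEName'];
print_r(guessTables($fragments, ['TABLEName', 'orders']));
```

This only says that a table name occurs somewhere in a string; it does not prove the string is ever sent to the database.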

In either case you need something that will parse PHP to some degree: in the first case more thoroughly than the PHP interpreter itself does (you have to do flow analysis across all the files that might be involved), and in the second case only at the level of lexemes.
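At the lexeme level, the set of files involved can be roughly approximated by collecting include/require targets that are written as plain string literals. The sketch below uses PHP's built-in `token_get_all`; the function name `literalIncludes` is hypothetical, and paths built from variables or concatenation are deliberately skipped, so the result is incomplete by design.

```php
<?php
// Sketch: list include/require targets that are plain string literals.
function literalIncludes(string $source): array
{
    $includeIds = [T_INCLUDE, T_INCLUDE_ONCE, T_REQUIRE, T_REQUIRE_ONCE];
    $skipIds = [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT];

    // Drop whitespace and comments so neighbouring tokens can be compared directly.
    $tokens = array_values(array_filter(
        token_get_all($source),
        function ($t) use ($skipIds) {
            return !is_array($t) || !in_array($t[0], $skipIds, true);
        }
    ));

    $paths = [];
    foreach ($tokens as $i => $token) {
        if (!is_array($token) || !in_array($token[0], $includeIds, true)) {
            continue;
        }
        $j = $i + 1;
        if (($tokens[$j] ?? null) === '(') {
            $j++; // include('x.php') form
        }
        $arg = $tokens[$j] ?? null;
        $next = $tokens[$j + 1] ?? null;
        // Accept only a lone literal; anything followed by "." etc. is a computed path.
        if (is_array($arg) && $arg[0] === T_CONSTANT_ENCAPSED_STRING
            && ($next === ';' || $next === ')')) {
            $paths[] = substr($arg[1], 1, -1); // strip the surrounding quotes
        }
    }
    return $paths;
}

$source = '<?php require_once "db.php"; include(\'lib/util.php\'); require $base . "/x.php";';
print_r(literalIncludes($source));
```

Resolving these paths against the include path and recursing into each file is left out; in a legacy code base many includes are computed, which is exactly where the full flow analysis becomes necessary.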

Our DMS Software Reengineering Toolkit with its PHP Front End would be a starting place for the deep semantic analysis tool, but it would be a lot of work to implement.

Our Source Code Search Engine might be a good starting place for the heuristic tool. It can easily extract all the string fragments (accurately, even for PHP, which is harder than it looks) along with their locations. [Perhaps the PHP tokenizer does this well enough.] With that information, the additional code to extract the table names from the string fragments shouldn't be too hard.
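To show what the PHP tokenizer route might look like, here is a small sketch built on `token_get_all` that collects string fragments with their line numbers. The function name `extractStringFragments` is invented for this example, and it is a simplification: escape sequences are left undecoded, and it assumes the tool itself runs on PHP 7.1 or later, even if the code being scanned is older.

```php
<?php
// Sketch: collect string fragments and their line numbers from PHP source text.
function extractStringFragments(string $source): array
{
    $fragments = [];
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            continue; // single-character tokens such as ";" or the quote of an interpolated string
        }
        [$id, $text, $line] = $token;
        if ($id === T_CONSTANT_ENCAPSED_STRING) {
            // Complete literal with no interpolation; drop the surrounding quotes.
            $fragments[] = ['text' => substr($text, 1, -1), 'line' => $line];
        } elseif ($id === T_ENCAPSED_AND_WHITESPACE) {
            // Literal piece of an interpolated string or heredoc.
            $fragments[] = ['text' => $text, 'line' => $line];
        }
    }
    return $fragments;
}

// The query line from the question, passed in as source text.
$source = '<?php $theQuery = "select * from $theDatabase.TABLEName";';
foreach (extractStringFragments($source) as $fragment) {
    echo $fragment['line'], ': ', $fragment['text'], PHP_EOL;
}
```

Because the interpolated variable splits the string, the table name arrives as a separate fragment with a leading dot, so any matching step has to tolerate surrounding punctuation.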


The refactor/replace tools in the JetBrains PhpStorm IDE can do this.

See the docs for the rename dialog and the find usages dialog.

Let me know if you need clarification.
