THE GRAND P2C NOTES FILE:

This file contains notes to myself recording bugs, flaws, and suggested
improvements to p2c.  They have roughly been separated into "old", "older",
and "oldest" groups.  I can't guarantee I'll do any of these.  If you do,
please mail me the diffs so I can incorporate them into the next release.
Thanks!
                                                -- Dave Gillespie
                                                   daveg@csvax.caltech.edu

-----------------------------------------------------------------------------

   The Pascal declaration

      type foo = set of (one, two);

   does not ever write out an "enum { one, two }" declaration in C!

-----------------------------------------------------------------------------

   Handling of comments that trail the ELSE clause of an IF statement
   could use some work.

-----------------------------------------------------------------------------

   Technically speaking, "for byte := min to max do" is legal even if
   min > 255, i.e., the limits only need to be in range if the body of
   the loop executes.  Thus, FOR loops should really use a shadow
   parameter not just when max is the constant 255, but whenever min
   or max are not provably in byte range.

-----------------------------------------------------------------------------

   Have a "-M"-like mode in which FREE is suppressed altogether, useful
   in case p2c crashes because bugs have corrupted malloc's free list.

-----------------------------------------------------------------------------

   For expressions building small-sets whose maximum element is <= 15,
   use "1 << x" instead of "1L << x".

-----------------------------------------------------------------------------

   Handle VAX Pascal VARYING OF CHARs used as the arguments of WITH
   statements.

-----------------------------------------------------------------------------

   StringTruncLimit would be helped by expr.c:strmax() interpreting
   sprintf control strings.  For %s, use strmax of the corresponding
   argument.  For %d, use 11, etc.  For %.10s, use min(10, strmax(arg)).
   For %.*s, just use strmax(arg), I guess.

   Have a mode in which such assignments are automatically truncated.

   Perform truncation testing for non-VAR procedure arguments, too.

-----------------------------------------------------------------------------

   In cref.p, the "strappend(buf,#0)" statement translates into
   "strcpy(STR1,buf); strcpy(buf,STR1)" with a warning about a null
   character in a sprintf control string!

-----------------------------------------------------------------------------

   Still having problems where the opening comment of an imported
   module's interface text is copied into the program.

-----------------------------------------------------------------------------

   VAX Pascal features not yet handled:

      [UNSAFE] attribute is only implemented in a few situations.
      [UNBOUND] attribute on a procedure says it doesn't need a static link.
      [TRUNCATE] attribute on a parameter allows optional params w/no default.
      [LIST] attribute on a parameter is like &rest in Lisp.
      Support types like, e.g., [LONG] BOOLEAN.
      Can assign "structurally compatible" but different record types.
      File intrinsics need serious work, especially OPEN.
      If a copy param is [READONLY], don't need to copy it.
      If a procedure is [ASYNCHRONOUS], make all its variables volatile.
      If a procedure is [NOOPTIMIZE], make all its variables volatile.
      Provide a real implementation of BIN function and :BIN read format.
      BIT_OFFSET and CARD intrinsics are not supported.

-----------------------------------------------------------------------------

   Modula-2 features not yet handled:

      Local modules are faked up in a pretty superficial way.
      WORD is compatible with both pointers and CARDINALs.
      WORD parameters are compatible with any word-sized object.
      ARRAY OF WORD parameters are compatible with absolutely anything.
      Improve treatment of character strings.
      Find manuals for real implementations of Modula-2 and implement
         any common language extensions.
      Fix p2c to read system.m2 instead of system.imp automatically.

-----------------------------------------------------------------------------

   Oregon Software Pascal features not yet handled:

      procedure noioerror(var f:file);
         Built-in.  Sets a flag on an already-open file so that
         I/O errors record an error code rather than crashing.
         Each file has its own error code.

      function ioerror(var f:file) : boolean;
         True if an error has occurred in the file.

      function iostatus(var f:file) : integer;
         The error code, when ioerror was true.

      reset and rewrite ignore the third parameter, and allow a fourth
      param which is an integer variable that receives a status code.
      Without this param, open errors are fatal.  An optional param
      may be omitted as in reset(f,'foo',,v);

-----------------------------------------------------------------------------

   In p_search, if a file contains const/var/type/procedure/function
   declarations without any module declaration, surround the entire
   file with an implicit "module ; {PERMANENT}" ... "end.".
   This would help the Oregon Software dialect considerably.

-----------------------------------------------------------------------------

   Provide an explicit IncludeFrom syntax for "no include file".
   E.g., "IncludeFrom dos 0".

-----------------------------------------------------------------------------

   In docast, smallsets are converted to large sets of the requested
   type.  Wouldn't it be better to convert to a set of 0..31 of the
   base type?  This would keep foo([]), where the argument is
   "set of char", from allocating a full 256-bit array for the
   temporary.

-----------------------------------------------------------------------------

   When initializing a constant variant record or array of same in
   which non-first variants are initialized, create a function to do
   the initialization, plus, for modules w/o initializers, a note to
   call this function.  Another possibility:  Initialize the array as
   well as possible, but leave zeros in the variant parts.  Then the
   function has only to fix up the non-first variant fields.

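   A rough C sketch of the second possibility may make the idea concrete.
   Everything here is hypothetical and is not p2c output: the record type
   "shape", the table "shapes", and the fix-up function "init_shapes" are
   made-up names.  The static initializer fills the tag and, where
   possible, the first variant; the function then patches only the
   non-first variant fields.

```c
/* Hypothetical translation of a Pascal variant record (names invented). */
typedef struct shape {
    int kind;                                   /* tag: 0 = circle, 1 = rect */
    union {
        struct { int radius; } circle;          /* first variant */
        struct { int width, height; } rect;     /* non-first variant */
    } u;
} shape;

/* A traditional C initializer can only reach the first union member,
   so the rect entry keeps zeros in its variant part for now. */
shape shapes[2] = {
    { 0, { { 5 } } },
    { 1, { { 0 } } }
};

/* Fix-up function: patches only the non-first variant fields. */
void init_shapes(void)
{
    shapes[1].u.rect.width = 3;
    shapes[1].u.rect.height = 4;
}
```

   A module with its own initializer would call init_shapes() from it;
   otherwise the translator would have to emit a note asking the user to
   call it.
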
-----------------------------------------------------------------------------

   Figure out some way to initialize packed array constants, e.g., a
   short macro PACK4(x,y)=(((x)<<4)+(y)) which is used inside the C
   initializer.  Alternatively, implement initializer functions as
   above and use those.

-----------------------------------------------------------------------------

   How about declaring Volatile any variables local to a function
   which are used after the first RECOVER?  GNU also suggests writing
   the statement:  "&foo;" which will have no side effect except to
   make foo essentially volatile, without relying on ANSI features.

-----------------------------------------------------------------------------

   Test the macros for GET, PUT, etc.

-----------------------------------------------------------------------------

   Can the #if 0'd code for strinsert in funcs.c be changed to test
   strcpyleft?

-----------------------------------------------------------------------------

   Even in Ansi mode, p2c seems to be casting Anyptrs into other
   pointer types explicitly.  This is an automatic conversion in
   Ansi C.

-----------------------------------------------------------------------------

   A Turbo typed constant or VAX initialized variable with a VarMacro
   loses its initializer!

-----------------------------------------------------------------------------

   Test the ability of the parser to recover from common problems such
   as too many/few arguments to a procedure, missing/extra semicolon,
   etc.  One major problem has been with undeclared identifiers being
   used as type names.

-----------------------------------------------------------------------------

   Line breaker still needs considerable tuning!

-----------------------------------------------------------------------------

   How about indenting trailing comments analogously to the code:
   Try to indent to column C+(X-Y), where C=original column number,
   X=output indentation, Y=original input indentation.

   Even fancier would be to study all the comment indentations in the
   function or struct decl to discover if most comments are at the
   same absolute indentation; if so, compute the average or minimum
   amount of space preceding the comments and indent the C comments
   to an analogous position.

-----------------------------------------------------------------------------

   After "type foo = bar;" variables declared as type foo are
   translated as type bar.  Ought to assume the user has two names
   for a reason, and copy the distinction into the C code.

-----------------------------------------------------------------------------

   Warn if address is taken of an arithmetic expression like "v1+1".
   Allow user to declare certain bicalls as l-values, e.g., so that
   LSC's topLeft and botRight macros won't generate complaints.

-----------------------------------------------------------------------------

   Consider changing the "language" modes into a set of p2crc files
   which can be included to support the various modes.

-----------------------------------------------------------------------------

   If we exchange the THEN and ELSE parts of an IF statement, be sure
   to exchange their comments as well!

-----------------------------------------------------------------------------

   How about checking for a ".p2crc" file in the user's home directory.

-----------------------------------------------------------------------------

   Store comments in the following situations:

      On the first line of a record decl.
      On the default clause of a CASE statement
         (use same trick as for ELSE clauses).
      On the "end" of a CASE statement.
      On null statements.

   Use stealcomments for, e.g., decl_comments and others.

-----------------------------------------------------------------------------

   Think of other formatting options for format_gen to support.

-----------------------------------------------------------------------------

   Consider converting gratuitous BEGIN/END pairs into gratuitous
   { } pairs.

-----------------------------------------------------------------------------

   The construction "s := copy(s, 1, 3)" converts to a big mess that
   could be simplified to "s[3] = 0".

-----------------------------------------------------------------------------

   Have a mode (and make it the default!) in which declarations are
   mixed if and only if the original Pascal decls were mixed.  Simply
   store a flag in each meaning to mark "mixed-with-preceding-meaning".

-----------------------------------------------------------------------------

   Have a column number at which to put names in variable and typedef
   declarations.  Have another option to choose whether a '*' preceding
   a name should be left- or right-justified within the skipped space:

      int     *foo;     or     int *    foo;

-----------------------------------------------------------------------------

   Support the

      /*
       *
       */

   form for multi-line comments.

-----------------------------------------------------------------------------

   Have an indentation parameter for the word "else" by itself on a
   line.

-----------------------------------------------------------------------------

   Have an option to use C++'s "//" comments when possible.
   (0=never, 1=always, def=only for trailing comments.)

-----------------------------------------------------------------------------

   Allow real comments to come before top-of-file comments like
   {Language}.

-----------------------------------------------------------------------------

   Teach the line breaker to remove spaces around innermost operators
   if in a crunch.

-----------------------------------------------------------------------------

   Is it possible that the line breaker is losing counts?  A line that
   included lots of invisible parens converted to visible ones was
   allowed to be suspiciously long.

-----------------------------------------------------------------------------

   The notation t^ where t is a text file should convert \n's to spaces
   if necessary.

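   As an illustration of the t^ note, here is a minimal C sketch of a
   peek helper that applies the conversion.  The name "peek_text" is
   invented for this example and is not a p2c run-time routine.  It
   relies on the standard Pascal rule that the file buffer reads as a
   space at end of line.

```c
#include <stdio.h>

/* Hypothetical helper: return the next character of f without
   consuming it, the way Pascal's f^ would see it.  A newline is
   reported as a space; the stream itself is left unchanged. */
int peek_text(FILE *f)
{
    int ch = getc(f);

    if (ch == EOF)
        return EOF;
    ungetc(ch, f);                      /* put it back for the real read */
    return (ch == '\n') ? ' ' : ch;
}
```

   The conversion is done only on the peeked value, so a later eoln-style
   test on the stream still sees the real newline.
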
-----------------------------------------------------------------------------

   The assignment and type cast "f4 := tf4(i)" where type
   "tf4 = function (i, j : integer) : str255" generates something
   really weird.

-----------------------------------------------------------------------------

   The conditional expression

      strsub(s,1,4) = 'Spam'

   could be translated as

      strncmp(s, "Spam", 4)

-----------------------------------------------------------------------------

   Consider an option which generates a "file.p2c" or "module.p2c"
   file, that will in the future be read in by p2c as another p2crc
   type of file, both when the module is re-translated later and when
   it is imported.  This file would contain commands like
   "NoSideEffects" for functions which are found to have this
   property, etc.

-----------------------------------------------------------------------------

   Extend the "file.log" or "module.log" file to contain a more
   detailed account of the translation, including all notes and
   warnings which were even considered.  For example, ALL calls to
   na_lsl with non-constant shifts would be noted, even if regular
   notes in this case were not requested.  Also, funny transformations
   along the lines of "str[0] := chr(len)" and "ch >= #128" should be
   mentioned in the log.  How about a summary of non-default p2crc
   options and command-line args?

-----------------------------------------------------------------------------

   Create a TypeMacro analogous to FuncMacro, VarMacro, and ConstMacro.
   Should the definition be expressed in C or Pascal notation?
   (Yuck---not a C type parser!)

-----------------------------------------------------------------------------

   In argument type promotions, should "unsigned char" be promoted to
   "unsigned int"?

-----------------------------------------------------------------------------

   Turbo's FExpand translation is really weird.

-----------------------------------------------------------------------------

   Can we translate Erase(x) to unlink(x)?
   (This could just be a FuncMacro.)

-----------------------------------------------------------------------------

   There should be an option that causes a type to be explicitly named,
   even if it would not otherwise have had a typedef name.  Have a mode
   that does this for all pointer types.

-----------------------------------------------------------------------------

   Make sure that the construction:

      if blah then {comment} else other

   does not rewrite to

      if (!blah) other;

   i.e., a comment in this situation should generate an actual
   placeholder statement.  Or perhaps, a null statement written
   explicitly by the Pascal programmer should always produce a
   placeholder.

-----------------------------------------------------------------------------

   Allow the line breaker to treat a \003 as if it were a \010.  The
   penalty should be enough less than SameIndentPenalty that
   same-indent cases will cause the introduction of parentheses.

-----------------------------------------------------------------------------

   A comment of the form "{------}" where the whole comment is 78, 79
   or 80 columns wide, should be reduced by two to take the larger C
   comment brackets into account.  Also, "{*****}", etc.

-----------------------------------------------------------------------------

   There should be a mode that translates "halt" as "exit(0)", and
   another that translates it as "exit(1)".

-----------------------------------------------------------------------------

   There should be a mode in which strread's "j" parameter is
   completely ignored.  Also, in this mode, don't make a copy of the
   string being read.

-----------------------------------------------------------------------------

   Is there an option that generates an fflush(stdout) after every
   write (not writeln) statement?  It should be easy to do---the code
   is already there to support the prompt statement.

-----------------------------------------------------------------------------

   Check out the Size_T_Long option; size_t appears to be int on most
   machines, not long.

-----------------------------------------------------------------------------

   The type "size_t" should really be made into a separate type, with a
   function to cast to type "size_t".  This function would always do
   the cast unless sizeof(int) == sizeof(long), or unless the
   expression only involves constants and objects or functions of
   type "size_t".

-----------------------------------------------------------------------------

   Finish the Turbo Pascal features (in the file turbo.imp).

-----------------------------------------------------------------------------

   Are there any ways to take advantage of "x ?: y" in GCC?

   Is it worth using GCC constructor expressions for procedure
   variables?

   How about generating "volatile" and "const" for suitable functions?
   (doing this in the .h file would be very difficult...)

   Use the "asm" notation of 5.17 to implement var x ['y']
   declarations.

-----------------------------------------------------------------------------

   Recognize GCC extensions in pc_expr().  (By the way, remember to
   implement += and friends in pc_expr(), too!)

-----------------------------------------------------------------------------

   Lightspeed C can't handle "typedef char foo[];" which arises from a
   MAXINT-sized array type declaration.

-----------------------------------------------------------------------------

   "Return" and friends are only introduced once.  In code of the form:

      if (!done) {
         foo();
      }
      if (!done) {
         bar();
      }

   p2c should, after patching up bar(), check if the foo() branch is
   now also ripe for rearranging.

-----------------------------------------------------------------------------

   Have a global "paranoia" flag.

      Default=use current defaults for other options.
      1=conservative defaults for other options.
      0=sloppy defaults for other options.

-----------------------------------------------------------------------------

   Rather than just generating a note, have writes of attribute
   characters convert into calls to a "set attribute" procedure, such
   as nc_sethighlight.  Is there any way of generalizing this into
   something useful for non-HP-Pascal-workstation users?

-----------------------------------------------------------------------------

   Warn when character constants which are control codes are produced.
   (E.g., arrow keys, etc.)  Also, have an option which deletes all
   highlighting codes from strings being output.

-----------------------------------------------------------------------------

   Think how nice things would be if the arithmetic routines actually
   maintained the distinction between tp_int and tp_integer themselves,
   so that makeexpr_longcast didn't have to second-guess them.

-----------------------------------------------------------------------------

   Importing FS *still* copies its "file support" comment into the
   importing program!

-----------------------------------------------------------------------------

   Should parameterize those last few hard-wired names, such as
   "P_eoln", "LONG_MAX", ... ?

-----------------------------------------------------------------------------

   Check if we need to cache away any more options' values, as we did
   for VarStrings.  How about FoldConstants, SetBits, CopyStructs?

=============================================================================

   Support the "CSignif" option (by not generating C identifiers which
   would not be unique if only that many characters were significant).

-----------------------------------------------------------------------------

   What if a procedure accesses strmax of a var-string parameter of a
   parent procedure?  (Right now this generates a note.)

-----------------------------------------------------------------------------

   Handle full constructors for strings.
   Handle small-array constants.

-----------------------------------------------------------------------------

   Have an option that causes ANYVAR's to be translated to void *'s.
   In this mode, all uses of ANYVAR variables will need to be cast to
   the proper type, in the function body rather than in calls to the
   function.

-----------------------------------------------------------------------------

   Handle reading enums.
   Add full error checking for reading booleans.  (And integer
   subranges?)

-----------------------------------------------------------------------------

   Support the "BigSetConst" option by creating constant arrays just as
   the Pascal compiler does.

-----------------------------------------------------------------------------

   The 2^(N+1) - 2^M method for generating [M..N] is not safe if N
   is 31.  If the small-sets we are dealing with encompass the value 31
   (== setbits-1) then we must use the bitwise construction instead.
   (Currently, the translator just issues a note.)
   (If N is 31, 2^32 will most likely evaluate to 0 on most machines,
   which is the correct value.  So this is only a minor problem.)

-----------------------------------------------------------------------------

   Big-set constants right now are always folded.  Provide a mechanism
   for defined set constants, say by having a #define with an argument
   which is the name of the temporary variable to use for the set.

-----------------------------------------------------------------------------

   Should we convert NA_LONGWORD-type variants into C casts?

-----------------------------------------------------------------------------

   Are there implementations of strcpy that do not return their first
   argument?  If so, should have an option that says so.

-----------------------------------------------------------------------------

   Handle absolute-addressed variables better.  For
   var a[12345]:integer, create an initialized int *.
   For var a['foo']:integer, create an int * which is initialized to
   NULL and accessed by a macro which locates the symbol the first time
   the variable is used.

-----------------------------------------------------------------------------

   Handle the idiom, "reset(f, name); open(f);"

-----------------------------------------------------------------------------

   Should have an option that lowercases all file names used in
   "reset", "fixfname", etc.  This should be on by default.

-----------------------------------------------------------------------------

   Add more complete support for conformant arrays.  Specifically,
   support non-GNU compilers by converting variable-sized array
   declarations into pointer declarations with calls to alloca or
   malloc/free (what if the function uses free and contains return
   statements)?  Also convert variable-array references into explicit
   index arithmetic.

-----------------------------------------------------------------------------

   Have a mode in which the body of a TRY-RECOVER is moved out into a
   sub-procedure all its own, communicating with the parent through
   varstructs as usual, so that the ANSI C warning about what longjmp
   can do to the local variables is avoided.  Alternatively, have an
   option in which all necessary locals are declared volatile when
   setjmps are present.

-----------------------------------------------------------------------------

   If a sub-procedure refers to a parent's variable with the VAX Pascal
   [STATIC] attribute, that variable is declared "static" inside the
   varstruct!  Need to convert it into a varref to a static variable
   in the parent.

-----------------------------------------------------------------------------

   When comparing records and arrays a la UCSD Pascal, should expand
   into an "&&" expression comparing each field individually.  (What
   about variants?  Maybe just compare the first variant, or the tagged
   variant.)
   Probably best to write a struct-comparison function the first time a
   given type of struct is compared; that way, the function can include
   a complete if tree or switch statement in the case of tagged unions.

-----------------------------------------------------------------------------

   In the checkvarchanged code, take aliasing of VAR parameters into
   account.  For example, in
   "procedure p(s1 : string; var s2 : string)" p2c now avoids copying
   s1 if s1 is not changed within p, but probably should also require
   that s2 not change, or at least that s1 have been read and used
   before s2 is changed.

-----------------------------------------------------------------------------

   Provide an option that tells the code generator to provide helpful
   comments of its own when generated code may be obscure.

=============================================================================

   Compact the various data structures.  In particular, typical runs
   show the majority of memory is used by Meanings and Symbols for
   global and imported objects.

-----------------------------------------------------------------------------

   The program wastes memory.  Find ways to reduce memory usage, and to
   avoid leaving dead records on the heap.  (Garbage collection?
   Yuck!)  (Maybe GC between each function declaration would be okay.)

-----------------------------------------------------------------------------

   Assign better names to temporaries.  Also, could avoid making
   redundant temporaries by generating a unique temporary every time
   one is needed, then crunching them down at the end just before the
   declarations are written out.  Each temporary would maintain a list
   of all the other temporaries (of the same type) with which it would
   conflict.  (This would avoid the current method's waste when
   several temps are created, then most are cancelled.)  Also, note
   that char STR1[10], STR2[20] can be considered type-compatible and
   merged into STR[20].

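   One way the "crunching" step for string temporaries might look is
   sketched below.  This is not how p2c does or would do it; it is a
   plain greedy merge chosen for illustration, and the names "temp",
   "crunch_temps", and MAX_TEMPS are invented.  Each temporary carries a
   bit mask of the other temporaries it conflicts with (assumed
   symmetric), and temporaries that never conflict share one slot sized
   to the largest member.

```c
#include <stddef.h>

#define MAX_TEMPS 32            /* sketch limit: one mask bit per temporary */

/* Hypothetical record for one string temporary such as STR1. */
typedef struct temp {
    size_t size;                /* declared length, e.g. 10 for char STR1[10] */
    unsigned long conflicts;    /* bit j set: in use at the same time as temp j */
    int slot;                   /* merged variable chosen by crunch_temps() */
} temp;

/* Greedy merge: place each temporary in the first slot that holds no
   temporary it conflicts with, opening a new slot when none fits.
   Requires n <= MAX_TEMPS.  Returns the number of slots; slot_size[k]
   receives the largest size among the members of slot k. */
int crunch_temps(temp *t, int n, size_t *slot_size)
{
    unsigned long members[MAX_TEMPS];   /* which temps live in each slot */
    int nslots = 0, i, k;

    for (i = 0; i < n; i++) {
        for (k = 0; k < nslots; k++)
            if ((t[i].conflicts & members[k]) == 0)
                break;
        if (k == nslots) {              /* nothing compatible: new slot */
            members[nslots] = 0;
            slot_size[nslots] = 0;
            nslots++;
        }
        members[k] |= 1UL << i;
        if (t[i].size > slot_size[k])
            slot_size[k] = t[i].size;
        t[i].slot = k;
    }
    return nslots;
}
```

   With STR1[10] and STR2[20] marked as non-conflicting, both land in one
   slot of size 20, which matches the merge described in the note.
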
-----------------------------------------------------------------------------

   Don't generate _STR_xxx structure names if they aren't
   forward-referenced.

-----------------------------------------------------------------------------

   Can optimize, e.g., "strpos(a,b) = 0" to a call to strstr() or
   strchr(), even though these ANSI functions are not Pascal-like
   enough to use in the general case.

-----------------------------------------------------------------------------

   Complete the handling of "usecommas=0" mode.

-----------------------------------------------------------------------------

   Optimize "s := strltrim(s)", "s := strrtrim(s)", and both together.

-----------------------------------------------------------------------------

   Convert "(ch < 'a') or (ch > 'z')" to "!islower(ch)", and so on.
   Also do "!islower(ch) && !isupper(ch)" => "!isalpha(ch)", etc.

-----------------------------------------------------------------------------

   Find other cases in which to call mixassignments().

-----------------------------------------------------------------------------

   The sequence:

      sprintf(buf + strlen(buf), "...", ...);
      sprintf(buf + strlen(buf), "...", ...);

   could be changed to a single sprintf.

   Also, "sprintf(temp, "...%s", ..., buf); strcpy(buf, temp); (above);"
   could be crunched down.  (This comes from strinsert, then
   strappend.)

-----------------------------------------------------------------------------

   If there is only one assignment to a structured function's return
   variable, and that assignment is at the very end and assigns a local
   variable to the return variable, then merge the variables.
   (Example: name_list in netcmp's datastruct module.  RET_name_list
   should be renamed to namestr.)

-----------------------------------------------------------------------------

   Have an option that causes if-then-else's to be replaced by ? :'s in
   certain cases.
   If the branches of the if are either both returns or both
   assignments to the same variable, which has no side effects, and if
   the whole conditional will be simple enough to fit on one line when
   printed, then the ? : transformation is okay.

-----------------------------------------------------------------------------

   Have an option that makes limited use of variable initialization.
   If the first statement of a function is an assignment of a constant
   to a local variable, then the assignment is changed to an
   initialization.  (Non-constant initializers are probably too hard to
   check for safety.)  Should variables with initializers still be
   mixed?  (Example: valid_node_class in netcmp's datastructs module.)
   File variable initialization is an especially good application for
   this.

-----------------------------------------------------------------------------

   Have an option that finds cases of multiple assignment.  For example:

      a := x;  b := x;   =>   a = b = x;
      a := x;  b := a;   =>   a = b = x;

   (provided the objects in question have the same type).

-----------------------------------------------------------------------------

   Need an option that causes $if$'s to change to #if's, instead of
   being evaluated at translation-time.  (This is *really hard* in
   general; do it for some common cases, such as entire statements
   being commented out, or fields being commented out of records.)

-----------------------------------------------------------------------------

   Have an option that prevents generation and inclusion of .h files
   for modules; instead, each module would contain extern declarations
   for the things it imports.  (Only declare names that are actually
   used---this means the declarations will have to be emitted on a
   procedure-by-procedure basis.)

-----------------------------------------------------------------------------

   Extend the ExpandIncludes option to compile include files as
   separate stand-alone C modules.
   The hard part is warning when the new module uses a procedure which
   was declared static in an earlier C module.  Remember to re-emit all
   the proper #include's at the beginning.  Anything else?

-----------------------------------------------------------------------------

   Consider an option where the variables in a varStruct are made into
   globals instead, or into parameters if there are few of them.

-----------------------------------------------------------------------------

   Perform a flow analysis after everything else is done; use this to
   eliminate redundant checks, e.g., for nil pointers or non-open or
   already-open files.  Also eliminate redundant "j" variables for
   strwrite statements.

-----------------------------------------------------------------------------

   Need a method for simple pattern matching predicates in FuncMacros.
   For example, allow special translations of functions where a given
   argument is a known constant, or known to be positive, or two
   arguments are the same, etc.

-----------------------------------------------------------------------------

   Have some way to provide run-time templates for
   fixblock/fixexpr-like applications.  The user enters two C
   expressions A and B, possibly including Prolog-like logical
   variables.  If fixexpr sees an expression matching A, it rewrites it
   into the form of B.

-----------------------------------------------------------------------------

   Have an option to cause selected Pascal procedures or functions to
   be expanded in-line.  Do this either by generating the keyword
   "inline", or by doing the expansion in the translator.

-----------------------------------------------------------------------------

   Technically speaking, strcmp shouldn't be used to compute < and >
   for strings on a machine with signed chars.  Should we care?

-----------------------------------------------------------------------------

   Have an option for creating a "display" of LINK pointers local to a
   function.
   Should only create such pointers for static levels which are
   referred to in the function body.

-----------------------------------------------------------------------------