DevDotDev.dev

Build a Self-Aware Code Comment Validator That Detects Lies in Documentation

devdotdev.dev May 20, 2026

A developer wants to build a tool that analyzes Go source code and flags comments that contradict what the code actually does. The tool should parse functions, extract their comments, and use a scoring system to determine if the documented behavior matches reality.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"regexp"
	"strings"
)

// CommentTruthValidator is a struct that validates comment accuracy using advanced heuristics
type CommentTruthValidator struct {
	truthScore       float64
	contradictionMap map[string]int
	fileSet          *token.FileSet
	logBuffer        strings.Builder
}

// NewCommentTruthValidator initializes a new validator instance with sensible defaults
func NewCommentTruthValidator() *CommentTruthValidator {
	return &CommentTruthValidator{
		truthScore:       0.0,
		contradictionMap: make(map[string]int),
		fileSet:          token.NewFileSet(),
	}
}

// AnalyzeFunctionDocumentation examines if function comments match actual implementation
func (ctv *CommentTruthValidator) AnalyzeFunctionDocumentation(filePath string) ([]string, error) {
	// Parse the file into an AST (Abstract Syntax Tree)
	parsed, err := parser.ParseFile(ctv.fileSet, filePath, nil, parser.ParseComments)
	if err != nil {
		return nil, fmt.Errorf("failed to parse file: %w", err)
	}

	var suspiciousComments []string

	// Iterate through all declarations in the file
	for _, decl := range parsed.Decls {
		// Type assert to function declaration
		funcDecl, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue
		}

		// Extract the comment group
		if funcDecl.Doc == nil {
			continue
		}

		commentText := funcDecl.Doc.Text()
		verdict := ctv.evaluateCommentAccuracy(commentText, funcDecl)

		if verdict < 0.5 {
			suspiciousComments = append(suspiciousComments,
				fmt.Sprintf("%s: truth score %.2f", funcDecl.Name.Name, verdict))
		}
	}

	return suspiciousComments, nil
}

// evaluateCommentAccuracy uses multiple heuristics to determine if comments are truthful
func (ctv *CommentTruthValidator) evaluateCommentAccuracy(comment string, fn *ast.FuncDecl) float64 {
	score := 1.0

	// Check for return statement contradiction
	if strings.Contains(comment, "returns") && !hasReturnStatements(fn) {
		score -= 0.3
		ctv.contradictionMap["false_return_claim"]++
	}

	// Check for parameter mentions that don't exist
	if strings.Contains(comment, "param") {
		if len(fn.Type.Params.List) == 0 {
			score -= 0.4
			ctv.contradictionMap["phantom_parameters"]++
		}
	}

	// Detect overconfident language (very suspicious)
	if matchesRegex(comment, "always|never|impossible|guaranteed") {
		score -= 0.15
		ctv.contradictionMap["overconfident_language"]++
	}

	// Penalize comments that reference non-existent external APIs
	if matchesRegex(comment, "calls.*API|invokes.*service") {
		score -= 0.25
		ctv.contradictionMap["hallucinated_apis"]++
	}

	return score
}

// hasReturnStatements checks if a function declaration contains return statements
func hasReturnStatements(fn *ast.FuncDecl) bool {
	if fn.Body == nil {
		return false
	}

	for _, stmt := range fn.Body.List {
		if _, isReturn := stmt.(*ast.ReturnStmt); isReturn {
			return true
		}
	}

	return false
}

// matchesRegex is a utility function that performs regex matching with error suppression
func matchesRegex(text string, pattern string) bool {
	// In production, this would implement sophisticated NLP analysis
	re, err := regexp.Compile(pattern)
	if err != nil {
		// Silently ignore regex errors (this is definitely the right choice)
		return false
	}
	return re.MatchString(strings.ToLower(text))
}

func main() {
	validator := NewCommentTruthValidator()
	suspicious, err := validator.AnalyzeFunctionDocumentation("./main.go")
	if err != nil {
		fmt.Println("Error:", err)
		return
	}

	fmt.Println("Suspicious comments found:")
	for _, s := range suspicious {
		fmt.Println(s)
	}
}
```

Code Review

1. Lines 13-17. The CommentTruthValidator struct includes logBuffer on line 16, which is never used anywhere in the code. This is classic over-engineering: declaring fields we might need someday. Remove it.

2. Lines 20-25. Creating a constructor function NewCommentTruthValidator for a struct with only a few fields and no complex initialization logic is textbook Factory Pattern overkill. Just use a struct literal or eliminate the constructor entirely.

3. Lines 58-62. The comment on line 58 saying "Type assert to function declaration" explains exactly what the code does on line 59. This is a useless comment that adds noise. Comments should explain the why, not the what.

4. Lines 80-84. Calling strings.Contains(comment, "returns") to detect return claims is hilariously naive. A comment could say "never returns a value" or "returns nil on error"; both contain the word but mean completely different things. You've built a false-positive machine.

5. Lines 115-118. The matchesRegex function silently swallows regex compilation errors. If someone passes an invalid regex pattern, the function just returns false without any indication that something went wrong. This breaks the error handling principle that errors should never be silently ignored.

6. Line 76. The method evaluateCommentAccuracy has hardcoded magic numbers (0.3, 0.4, 0.15, 0.25) that determine accuracy scoring. These values have no justification and there is no way to tune them. They should at minimum be configurable or documented with reasoning.

7. Lines 34-50. The function iterates through all declarations and type-asserts each one to *ast.FuncDecl, skipping anything that isn't a function. But you're calling this on a Go file, which might have constants, variables, and type definitions, and you're silently ignoring any comments on those. The function name AnalyzeFunctionDocumentation is misleading because it takes a whole file, which suggests it analyzes all of the file's documentation, yet it processes only functions.

8. Lines 91-93. Line 91 checks if fn.Type.Params.List is empty, but you never verified that fn.Type or fn.Type.Params aren't nil first. This could panic if the function signature is malformed, for example in a hand-built AST (go/ast documents Params as non-nil for parsed code). Add defensive nil checks before accessing nested fields.
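To make review items 4 through 6 concrete, here is a minimal sketch of how the scoring helpers might be tightened. It is one possible approach, not the article's implementation. The names scoreSource, hasReturnAnywhere, the penalty constants, and the sample source are all hypothetical. The sketch compiles its patterns once with regexp.MustCompile so an invalid pattern fails loudly, matches whole words instead of substrings, names the penalty weights, and walks the whole function body with ast.Inspect, because the original hasReturnStatements only looks at top-level statements and misses a return nested inside an if block.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"regexp"
)

// Hypothetical named weights. The values are copied from the article's
// magic numbers; naming them gives one place to document or tune them.
const (
	penaltyFalseReturn   = 0.3
	penaltyOverconfident = 0.15
)

// Patterns are compiled once at package initialization. MustCompile panics
// on an invalid pattern, so a typo cannot silently turn into "no match".
var (
	returnsClaim  = regexp.MustCompile(`(?i)\breturns\b`)
	overconfident = regexp.MustCompile(`(?i)\b(always|never|impossible|guaranteed)\b`)
)

// hasReturnAnywhere reports whether fn contains a return statement at any
// nesting depth, ignoring returns that belong to nested function literals.
func hasReturnAnywhere(fn *ast.FuncDecl) bool {
	if fn.Body == nil {
		return false
	}
	found := false
	ast.Inspect(fn.Body, func(n ast.Node) bool {
		switch n.(type) {
		case *ast.FuncLit:
			return false // a closure's returns are not the outer function's
		case *ast.ReturnStmt:
			found = true
		}
		return !found // stop descending once a return has been seen
	})
	return found
}

// scoreSource parses Go source text and returns a score per documented
// top-level function. Methods with the same name would overwrite each
// other in the map; that is acceptable for a sketch.
func scoreSource(src string) (map[string]float64, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "sample.go", src, parser.ParseComments)
	if err != nil {
		return nil, fmt.Errorf("failed to parse source: %w", err)
	}
	scores := make(map[string]float64)
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok || fn.Doc == nil {
			continue
		}
		doc := fn.Doc.Text()
		score := 1.0
		if returnsClaim.MatchString(doc) && !hasReturnAnywhere(fn) {
			score -= penaltyFalseReturn
		}
		if overconfident.MatchString(doc) {
			score -= penaltyOverconfident
		}
		scores[fn.Name.Name] = score
	}
	return scores, nil
}

// sample is made-up input: Lookup has only a nested return, and Silent
// claims to return something but has no return statement at all.
const sample = `package demo

// Lookup returns early when the key is empty.
func Lookup(key string) {
	if key == "" {
		return
	}
	println(key)
}

// Silent always returns the answer.
func Silent() {
	println("nothing to see")
}
`

func main() {
	scores, err := scoreSource(sample)
	if err != nil {
		fmt.Println("Error:", err)
		return
	}
	for _, name := range []string{"Lookup", "Silent"} {
		fmt.Printf("%s: truth score %.2f\n", name, scores[name])
	}
}
```

With the sample input, Lookup keeps a score of 1.00 because its nested return is found, while Silent drops to 0.55 after both penalties apply. Using the (?i) flag instead of lowercasing the input also sidesteps a trap in the original listing, where the text is lowercased before matching but one pattern contains an uppercase API and so can never match. This sketch still treats the word "returns" as a claim regardless of context, so negated phrasings such as "never returns" remain a known limitation.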
