Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreihvou4ypbmyyzl65p5wcy4x7hvfxtsokvwru5hlstqnmvqsxyklma",
    "uri": "at://did:plc:pi6woz4d47bkuws673w2il2r/app.bsky.feed.post/3mlbiu3tcqaq2"
  },
  "path": "/t/how-to-parse-specific-syntax-elements-and-discard-the-rest/14047#post_1",
  "publishedAt": "2026-05-07T13:30:15.000Z",
  "site": "https://discourse.haskell.org",
  "tags": [
    "getOffset",
    "observing"
  ],
  "textContent": "Hello,\n\nI’m trying to write a tool to analyze Nix files, notably to rewrite and analyze their path literals. I would like to parse a Nix file into a list of path literals together with the locations where they appear.\nWith this data I can then check whether the targets of the path literals are valid given the location of a file, simplify the paths and rewrite them using the source locations, or generate a directed graph for fun.\n\nI’m having trouble parsing the path literals in Nix files. Specifically, I want to _only_ parse the path literals and discard the rest. I thought about using regex, but since I am more familiar with parser combinator libraries, I went with Megaparsec.\n\n* * *\n\nThe difficulty is that I want to fish out all the path literals in a Nix file, disregarding all other syntactic elements. Megaparsec provides the getOffset primitive. However, getOffset gives me the position of the _start_ of the failure, so I can’t jump forward using this information.\n\n\n    ghci> parseTest (liftA2 (,) (optional (\"foo\" :: Parser Text)) getOffset) \"foo\"\n    (Just \"foo\",3)\n    it :: ()\n    (0.02 secs, 80,576 bytes)\n    ghci> parseTest (liftA2 (,) (optional (\"foo\" :: Parser Text)) getOffset) \"bar\"\n    (Nothing,0)\n    it :: ()\n    (0.01 secs, 78,480 bytes)\n\n\nI also tried observing, but it too only reports the position at the start of the failure.\n\n\n    ghci> parseTest (observing (\"foo\" :: Parser Text)) \"bar\"\n    Left (TrivialError 0 (Just (Tokens ('b' :| \"ar\"))) (fromList [Tokens ('f' :| \"oo\")]))\n    it :: ()\n\n\n* * *\n\nWhat can I do to parse only the path literals efficiently and correctly while discarding the rest? Thanks a lot =D",
  "title": "How to parse specific syntax elements and discard the rest?"
}
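One common way to "fish out" matches with Megaparsec, rather than jumping forward from a failure offset, is to scan the input: at each position, try the path-literal parser (wrapped in `try` so a partial match backtracks), and if it fails, consume exactly one character with `anySingle` and try again, recording `getSourcePos` just before each successful match. The sketch below is illustrative only: `pathLiteral` and `pathLiterals` are hypothetical names, and `pathLiteral` is a deliberately simplified stand-in that accepts only `./` and `../` prefixes, not the full Nix path syntax.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import Data.Char (isAlphaNum)
import Data.Maybe (catMaybes)
import Data.Text (Text)
import qualified Data.Text as T
import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char (string)

type Parser = Parsec Void Text

-- Hypothetical, simplified path-literal parser: a "./" or "../" prefix
-- followed by one or more path characters. Real Nix paths are richer.
pathLiteral :: Parser Text
pathLiteral = do
  -- `string` does not consume input on failure, so "../" is still tried
  -- after "./" fails on its second character.
  prefix <- string "./" <|> string "../"
  rest <- takeWhile1P (Just "path character") isPathChar
  pure (prefix <> rest)
  where
    isPathChar c = isAlphaNum c || c `elem` ("._-+/" :: String)

-- Scan the whole input: at each position either parse a path literal
-- (tagged with its start position) or skip one character, until EOF.
pathLiterals :: Parser [(SourcePos, Text)]
pathLiterals = catMaybes <$> manyTill step eof
  where
    step = (Just <$> try located) <|> (Nothing <$ anySingle)
    located = (,) <$> getSourcePos <*> pathLiteral

main :: IO ()
main = do
  let src = "{ a = ./foo.nix; b = import ../lib/default.nix { }; }" :: Text
  case parse pathLiterals "example.nix" src of
    Left err -> putStr (errorBundlePretty err)
    Right found ->
      mapM_ (\(pos, p) -> putStrLn (sourcePosPretty pos ++ "  " ++ T.unpack p)) found
```

Because this scanner skips blindly one character at a time, it will also report path-like text inside strings and comments, and it can match in the middle of a token (for example the `./b` in `a./b`). Avoiding those false positives generally means lexing at least strings and comments as well, and skipping them as whole units instead of character by character.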