Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreigs3weym4uwboknfe3igl7zbqavambmbeubety52cvd6iqtulomzi",
    "uri": "at://did:plc:3pjw65epwlo3rzajhx6xg4br/app.bsky.feed.post/3mhvyodcd6ki2"
  },
  "path": "/posts/new-toy-first-steps-with-ai-on-linux/",
  "publishedAt": "2026-03-25T11:48:34.000Z",
  "site": "https://peter.czanik.hu",
  "tags": [
    "AI mini workstation from HP",
    "installing Ubuntu",
    "AI on the Ryzen AI Max+ 395",
    "ROCm",
    "Fedora Heterogeneous Computing Special Interest Group",
    "PyTorch",
    "toy"
  ],
  "textContent": "Ever since I bought my AI mini workstation from HP, my goal was to run hardware accelerated artificial intelligence workloads in a Linux environment. Read more to learn how things turned out on Ubuntu and Fedora!\n\nI have been using various AI tools for a while now. Generating pictures about some impossible situations, like a dinosaur climbing the Hungarian parliament building, finding information where a simple web search is useless, or explaining syslog-ng code to me. All these are nice, sometimes even useful, however I prefer to know what is behind the magic. Well, at least part of it :-) I want to get a bottom up view of various components and processes, and getting my hands dirty. Hopefully this miniature but powerful box will help me in getting known with AI better.\n\n#### AI in a miniature box :-)\n\n# Testing AI on Ubuntu\n\nAs mentioned in my installing Ubuntu blog, the 24.04 LTS installer did not work on this machine. I found a nice tutorial about AI on the Ryzen AI Max+ 395 which mentioned using 25.10, so I installed that version instead of the LTS. It installed without any troubles, 3D graphics worked out of the box.\n\nHowever, AI is a different story. ROCm, hardware acceleration for AI workloads on AMD chips, is only packaged for Ubuntu LTS releases. The workaround described in the tutorial was to use distrobox. Unfortunately, the steps described in the tutorial did not work. Containerization brought in various problems with permissions, software availability, and so on. Most likely an experienced distrobox user could resolve these. In my case, after reading the distrobox documentation for hours, I just gave up.\n\n# Getting started with hardware accelerated AI on Fedora\n\nNext, I turned to Fedora Linux 43. The wiki page of the Fedora Heterogeneous Computing Special Interest Group proved to be a good starting point. 
Fedora has ROCm packaged as part of the distro, and the wiki page gives clear instructions on how to get started.\n\nOnce I set up user rights and installed the necessary packages, I was able to get some info about my hardware. You can see the output of `rocminfo` and `rocm-clinfo` at the bottom of this blog. I did not want to shorten those, but given the many lines of output, I was not sure anyone would read the rest of my blog if I included them here :-)\n\n# Testing with llama\n\nOf course, seeing info about the hardware is nice, but it’s even better to see it in action. The Ubuntu ROCm tutorial mentioned `llama`, so I started with that one. Luckily Fedora includes it as a ready to install package, so I did not have to compile it from source. I also installed `huggingface-hub` from a package:\n\n\n    dnf install python3-huggingface-hub llama-cpp\n\n\nThis allowed me to download the model mentioned in the tutorial, and ask the downloaded LLM a few questions. For now I just used the sample command line, but based on the output, llama found the hardware and used it. Next up: learn more about the available models.\n\nYou can find the output of the following command at the end of this blog:\n\n\n    llama-cli   -m ~/models/llama-2-7b.Q4_K_M.gguf   --no-mmap   -ngl 99   -p \"Explain quantum computing in simple terms:\"   -n 256\n\n\n# Testing with PyTorch\n\nWhen I mentioned to a friend that hardware accelerated AI seems to work on my Linux box, he suggested that I try it with PyTorch. Luckily this was available as a ready to install package for Fedora as well:\n\n\n    dnf install python3-torch\n\n\nI was quite surprised, as the above command installed 8 GB worth of RPM packages (`texlive` accounting for a good part of it). I do not know much about PyTorch, but did a quick test anyway. 
Here is the really complex Python code I built based on the documentation:\n\n\n    import torch\n    x = torch.rand(5, 3)\n    print(x)\n    print('Is hw AI accel available')\n    print(torch.cuda.is_available())\n\n\nAnd here is the output from the above code:\n\n\n    tensor([[0.1034, 0.0183, 0.1233],\n            [0.1787, 0.0097, 0.8426],\n            [0.2872, 0.6351, 0.8468],\n            [0.8226, 0.2991, 0.8539],\n            [0.2061, 0.6422, 0.8146]])\n    Is hw AI accel available\n    True\n\n\nIt’s simple, but looks promising :-)\n\n# Outputs\n\n## Output of rocminfo and rocm-clinfo\n\n\n    czanik@fedora:~$ rocminfo\n    ROCk module is loaded\n    =====================\n    HSA System Attributes\n    =====================\n    Runtime Version:         1.1\n    Runtime Ext Version:     1.7\n    System Timestamp Freq.:  1000.000000MHz\n    Sig. Max Wait Duration:  18446744073709551615 (0xFFFFFFFFFFFFFFFF) (timestamp count)\n    Machine Model:           LARGE\n    System Endianness:       LITTLE\n    Mwaitx:                  DISABLED\n    XNACK enabled:           NO\n    DMAbuf Support:          YES\n    VMM Support:             YES\n\n    ==========\n    HSA Agents\n    ==========\n    *******\n    Agent 1\n    *******\n      Name:                    AMD RYZEN AI MAX+ PRO 395 w/ Radeon 8060S\n      Uuid:                    CPU-XX\n      Marketing Name:          AMD RYZEN AI MAX+ PRO 395 w/ Radeon 8060S\n      Vendor Name:             CPU\n      Feature:                 None specified\n      Profile:                 FULL_PROFILE\n      Float Round Mode:        NEAR\n      Max Queue Number:        0(0x0)\n      Queue Min Size:          0(0x0)\n      Queue Max Size:          0(0x0)\n      Queue Type:              MULTI\n      Node:                    0\n      Device Type:             CPU\n      Cache Info:\n        L1:                      49152(0xc000) KB\n      Chip ID:                 0(0x0)\n      ASIC Revision:           0(0x0)\n      Cacheline Size:      
    64(0x40)\n      Max Clock Freq. (MHz):   5187\n      BDFID:                   0\n      Internal Node ID:        0\n      Compute Unit:            32\n      SIMDs per CU:            0\n      Shader Engines:          0\n      Shader Arrs. per Eng.:   0\n      WatchPts on Addr. Ranges:1\n      Memory Properties:\n      Features:                None\n      Pool Info:\n        Pool 1\n          Segment:                 GLOBAL; FLAGS: FINE GRAINED\n          Size:                    131136832(0x7d0fd40) KB\n          Allocatable:             TRUE\n          Alloc Granule:           4KB\n          Alloc Recommended Granule:4KB\n          Alloc Alignment:         4KB\n          Accessible by all:       TRUE\n        Pool 2\n          Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED\n          Size:                    131136832(0x7d0fd40) KB\n          Allocatable:             TRUE\n          Alloc Granule:           4KB\n          Alloc Recommended Granule:4KB\n          Alloc Alignment:         4KB\n          Accessible by all:       TRUE\n        Pool 3\n          Segment:                 GLOBAL; FLAGS: KERNARG, FINE GRAINED\n          Size:                    131136832(0x7d0fd40) KB\n          Allocatable:             TRUE\n          Alloc Granule:           4KB\n          Alloc Recommended Granule:4KB\n          Alloc Alignment:         4KB\n          Accessible by all:       TRUE\n        Pool 4\n          Segment:                 GLOBAL; FLAGS: COARSE GRAINED\n          Size:                    131136832(0x7d0fd40) KB\n          Allocatable:             TRUE\n          Alloc Granule:           4KB\n          Alloc Recommended Granule:4KB\n          Alloc Alignment:         4KB\n          Accessible by all:       TRUE\n      ISA Info:\n    *******\n    Agent 2\n    *******\n      Name:                    gfx1151\n      Uuid:                    GPU-XX\n      Marketing Name:          Radeon 8060S Graphics\n      Vendor Name:             AMD\n      
Feature:                 KERNEL_DISPATCH\n      Profile:                 BASE_PROFILE\n      Float Round Mode:        NEAR\n      Max Queue Number:        128(0x80)\n      Queue Min Size:          64(0x40)\n      Queue Max Size:          131072(0x20000)\n      Queue Type:              MULTI\n      Node:                    1\n      Device Type:             GPU\n      Cache Info:\n        L1:                      32(0x20) KB\n        L2:                      2048(0x800) KB\n        L3:                      32768(0x8000) KB\n      Chip ID:                 5510(0x1586)\n      ASIC Revision:           0(0x0)\n      Cacheline Size:          128(0x80)\n      Max Clock Freq. (MHz):   2900\n      BDFID:                   50432\n      Internal Node ID:        1\n      Compute Unit:            40\n      SIMDs per CU:            2\n      Shader Engines:          2\n      Shader Arrs. per Eng.:   2\n      WatchPts on Addr. Ranges:4\n      Coherent Host Access:    FALSE\n      Memory Properties:       APU\n      Features:                KERNEL_DISPATCH\n      Fast F16 Operation:      TRUE\n      Wavefront Size:          32(0x20)\n      Workgroup Max Size:      1024(0x400)\n      Workgroup Max Size per Dimension:\n        x                        1024(0x400)\n        y                        1024(0x400)\n        z                        1024(0x400)\n      Max Waves Per CU:        32(0x20)\n      Max Work-item Per CU:    1024(0x400)\n      Grid Max Size:           4294967295(0xffffffff)\n      Grid Max Size per Dimension:\n        x                        4294967295(0xffffffff)\n        y                        4294967295(0xffffffff)\n        z                        4294967295(0xffffffff)\n      Max fbarriers/Workgrp:   32\n      Packet Processor uCode:: 34\n      SDMA engine uCode::      18\n      IOMMU Support::          None\n      Pool Info:\n        Pool 1\n          Segment:                 GLOBAL; FLAGS: COARSE GRAINED\n          Size:                    
65568416(0x3e87ea0) KB\n          Allocatable:             TRUE\n          Alloc Granule:           4KB\n          Alloc Recommended Granule:2048KB\n          Alloc Alignment:         4KB\n          Accessible by all:       FALSE\n        Pool 2\n          Segment:                 GLOBAL; FLAGS: EXTENDED FINE GRAINED\n          Size:                    65568416(0x3e87ea0) KB\n          Allocatable:             TRUE\n          Alloc Granule:           4KB\n          Alloc Recommended Granule:2048KB\n          Alloc Alignment:         4KB\n          Accessible by all:       FALSE\n        Pool 3\n          Segment:                 GROUP\n          Size:                    64(0x40) KB\n          Allocatable:             FALSE\n          Alloc Granule:           0KB\n          Alloc Recommended Granule:0KB\n          Alloc Alignment:         0KB\n          Accessible by all:       FALSE\n      ISA Info:\n        ISA 1\n          Name:                    amdgcn-amd-amdhsa--gfx1151\n          Machine Models:          HSA_MACHINE_MODEL_LARGE\n          Profiles:                HSA_PROFILE_BASE\n          Default Rounding Mode:   NEAR\n          Default Rounding Mode:   NEAR\n          Fast f16:                TRUE\n          Workgroup Max Size:      1024(0x400)\n          Workgroup Max Size per Dimension:\n            x                        1024(0x400)\n            y                        1024(0x400)\n            z                        1024(0x400)\n          Grid Max Size:           4294967295(0xffffffff)\n          Grid Max Size per Dimension:\n            x                        4294967295(0xffffffff)\n            y                        4294967295(0xffffffff)\n            z                        4294967295(0xffffffff)\n          FBarrier Max Size:       32\n        ISA 2\n          Name:                    amdgcn-amd-amdhsa--gfx11-generic\n          Machine Models:          HSA_MACHINE_MODEL_LARGE\n          Profiles:                HSA_PROFILE_BASE\n          
Default Rounding Mode:   NEAR\n          Default Rounding Mode:   NEAR\n          Fast f16:                TRUE\n          Workgroup Max Size:      1024(0x400)\n          Workgroup Max Size per Dimension:\n            x                        1024(0x400)\n            y                        1024(0x400)\n            z                        1024(0x400)\n          Grid Max Size:           4294967295(0xffffffff)\n          Grid Max Size per Dimension:\n            x                        4294967295(0xffffffff)\n            y                        4294967295(0xffffffff)\n            z                        4294967295(0xffffffff)\n          FBarrier Max Size:       32\n    *******\n    Agent 3\n    *******\n      Name:                    aie2\n      Uuid:                    AIE-XX\n      Marketing Name:          AIE-ML\n      Vendor Name:             AMD\n      Feature:                 AGENT_DISPATCH\n      Profile:                 BASE_PROFILE\n      Float Round Mode:        NEAR\n      Max Queue Number:        1(0x1)\n      Queue Min Size:          64(0x40)\n      Queue Max Size:          64(0x40)\n      Queue Type:              SINGLE\n      Node:                    0\n      Device Type:             DSP\n      Cache Info:\n        L2:                      2048(0x800) KB\n        L3:                      32768(0x8000) KB\n      Chip ID:                 0(0x0)\n      ASIC Revision:           0(0x0)\n      Cacheline Size:          0(0x0)\n      Max Clock Freq. (MHz):   0\n      BDFID:                   0\n      Internal Node ID:        0\n      Compute Unit:            0\n      SIMDs per CU:            0\n      Shader Engines:          0\n      Shader Arrs. per Eng.:   0\n      WatchPts on Addr. 
Ranges:0\n      Memory Properties:\n      Features:                AGENT_DISPATCH\n      Pool Info:\n        Pool 1\n          Segment:                 GLOBAL; FLAGS: KERNARG, COARSE GRAINED\n          Size:                    131136832(0x7d0fd40) KB\n          Allocatable:             TRUE\n          Alloc Granule:           4KB\n          Alloc Recommended Granule:4KB\n          Alloc Alignment:         4KB\n          Accessible by all:       TRUE\n        Pool 2\n          Segment:                 GLOBAL; FLAGS: COARSE GRAINED\n          Size:                    65536(0x10000) KB\n          Allocatable:             TRUE\n          Alloc Granule:           4KB\n          Alloc Recommended Granule:0KB\n          Alloc Alignment:         4KB\n          Accessible by all:       TRUE\n        Pool 3\n          Segment:                 GLOBAL; FLAGS: COARSE GRAINED\n          Size:                    131136832(0x7d0fd40) KB\n          Allocatable:             TRUE\n          Alloc Granule:           4KB\n          Alloc Recommended Granule:4KB\n          Alloc Alignment:         4KB\n          Accessible by all:       TRUE\n      ISA Info:\n    *** Done ***\n\n\nand\n\n\n    czanik@fedora:~$ rocm-clinfo\n    Number of platforms:\t\t\t\t 1\n      Platform Profile:\t\t\t\t FULL_PROFILE\n      Platform Version:\t\t\t\t OpenCL 2.1 AMD-APP (3649.0)\n      Platform Name:\t\t\t\t AMD Accelerated Parallel Processing\n      Platform Vendor:\t\t\t\t Advanced Micro Devices, Inc.\n      Platform Extensions:\t\t\t\t cl_khr_icd cl_amd_event_callback\n\n\n      Platform Name:\t\t\t\t AMD Accelerated Parallel Processing\n    Number of devices:\t\t\t\t 1\n      Device Type:\t\t\t\t\t CL_DEVICE_TYPE_GPU\n      Vendor ID:\t\t\t\t\t 1002h\n      Board name:\t\t\t\t\t Radeon 8060S Graphics\n      Device Topology:\t\t\t\t PCI[ B#197, D#0, F#0 ]\n      Max compute units:\t\t\t\t 20\n      Max work items dimensions:\t\t\t 3\n        Max work items[0]:\t\t\t\t 1024\n        Max work 
items[1]:\t\t\t\t 1024\n        Max work items[2]:\t\t\t\t 1024\n      Max work group size:\t\t\t\t 256\n      Preferred vector width char:\t\t\t 4\n      Preferred vector width short:\t\t\t 2\n      Preferred vector width int:\t\t\t 1\n      Preferred vector width long:\t\t\t 1\n      Preferred vector width float:\t\t\t 1\n      Preferred vector width double:\t\t 1\n      Native vector width char:\t\t\t 4\n      Native vector width short:\t\t\t 2\n      Native vector width int:\t\t\t 1\n      Native vector width long:\t\t\t 1\n      Native vector width float:\t\t\t 1\n      Native vector width double:\t\t\t 1\n      Max clock frequency:\t\t\t\t 2900Mhz\n      Address bits:\t\t\t\t\t 64\n      Max memory allocation:\t\t\t 57070749280\n      Image support:\t\t\t\t Yes\n      Max number of images read arguments:\t\t 128\n      Max number of images write arguments:\t\t 8\n      Max image 2D width:\t\t\t\t 16384\n      Max image 2D height:\t\t\t\t 16384\n      Max image 3D width:\t\t\t\t 16384\n      Max image 3D height:\t\t\t\t 16384\n      Max image 3D depth:\t\t\t\t 8192\n      Max samplers within kernel:\t\t\t 16\n      Max size of kernel argument:\t\t\t 1024\n      Alignment (bits) of base address:\t\t 2048\n      Minimum alignment (bytes) for any datatype:\t 128\n      Single precision floating point capability\n        Denorms:\t\t\t\t\t Yes\n        Quiet NaNs:\t\t\t\t\t Yes\n        Round to nearest even:\t\t\t Yes\n        Round to zero:\t\t\t\t Yes\n        Round to +ve and infinity:\t\t\t Yes\n        IEEE754-2008 fused multiply-add:\t\t Yes\n      Cache type:\t\t\t\t\t Read/Write\n      Cache line size:\t\t\t\t 128\n      Cache size:\t\t\t\t\t 32768\n      Global memory size:\t\t\t\t 67142057984\n      Constant buffer size:\t\t\t\t 57070749280\n      Max number of constant args:\t\t\t 8\n      Local memory type:\t\t\t\t Local\n      Local memory size:\t\t\t\t 65536\n      Max pipe arguments:\t\t\t\t 16\n      Max pipe active reservations:\t\t\t 16\n      
Max pipe packet size:\t\t\t\t 1236174432\n      Max global variable size:\t\t\t 57070749280\n      Max global variable preferred total size:\t 67142057984\n      Max read/write image args:\t\t\t 64\n      Max on device events:\t\t\t\t 1024\n      Queue on device max size:\t\t\t 8388608\n      Max on device queues:\t\t\t\t 1\n      Queue on device preferred size:\t\t 262144\n      SVM capabilities:\n        Coarse grain buffer:\t\t\t Yes\n        Fine grain buffer:\t\t\t\t Yes\n        Fine grain system:\t\t\t\t No\n        Atomics:\t\t\t\t\t No\n      Preferred platform atomic alignment:\t\t 0\n      Preferred global atomic alignment:\t\t 0\n      Preferred local atomic alignment:\t\t 0\n      Kernel Preferred work group size multiple:\t 32\n      Error correction support:\t\t\t 0\n      Unified memory for Host and Device:\t\t 1\n      Profiling timer resolution:\t\t\t 1\n      Device endianess:\t\t\t\t Little\n      Available:\t\t\t\t\t Yes\n      Compiler available:\t\t\t\t Yes\n      Execution capabilities:\n        Execute OpenCL kernels:\t\t\t Yes\n        Execute native function:\t\t\t No\n      Queue on Host properties:\n        Out-of-Order:\t\t\t\t No\n        Profiling :\t\t\t\t\t Yes\n      Queue on Device properties:\n        Out-of-Order:\t\t\t\t Yes\n        Profiling :\t\t\t\t\t Yes\n      Platform ID:\t\t\t\t\t 0x7ffb97d11d80\n      Name:\t\t\t\t\t\t gfx1151\n      Vendor:\t\t\t\t\t Advanced Micro Devices, Inc.\n      Device OpenCL C version:\t\t\t OpenCL C 2.0\n      Driver version:\t\t\t\t 3649.0 (HSA1.1,LC)\n      Profile:\t\t\t\t\t FULL_PROFILE\n      Version:\t\t\t\t\t OpenCL 2.0\n      Extensions:\t\t\t\t\t cl_khr_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_media_ops 
cl_amd_media_ops2 cl_khr_image2d_from_buffer cl_khr_subgroups cl_khr_depth_images cl_amd_copy_buffer_p2p cl_amd_assembly_program\n\n\n## Output from llama\n\n\n    root@fedora:~# llama-cli   -m ~/models/llama-2-7b.Q4_K_M.gguf   --no-mmap   -ngl 99   -p \"Explain quantum computing in simple terms:\"   -n 256\n    ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no\n    ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no\n    ggml_cuda_init: found 1 ROCm devices:\n      Device 0: Radeon 8060S Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32\n    build: 0 (unknown) with HIP version: 6.4.43484-9999 for x86_64-redhat-linux-gnu\n    main: llama backend init\n    main: load the model and apply lora adapter, if any\n    llama_model_load_from_file_impl: using device ROCm0 (Radeon 8060S Graphics) - 64031 MiB free\n    llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from /root/models/llama-2-7b.Q4_K_M.gguf (version GGUF V2)\n    llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.\n    llama_model_loader: - kv   0:                       general.architecture str              = llama\n    llama_model_loader: - kv   1:                               general.name str              = LLaMA v2\n    llama_model_loader: - kv   2:                       llama.context_length u32              = 4096\n    llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096\n    llama_model_loader: - kv   4:                          llama.block_count u32              = 32\n    llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008\n    llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128\n    llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32\n    llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 32\n    llama_model_loader: - kv   9:     
llama.attention.layer_norm_rms_epsilon f32              = 0.000010\n    llama_model_loader: - kv  10:                          general.file_type u32              = 15\n    llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = llama\n    llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr[str,32000]   = [\"<unk>\", \"<s>\", \"</s>\", \"<0x00>\", \"<...\n    llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...\n    llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...\n    llama_model_loader: - kv  15:                tokenizer.ggml.bos_token_id u32              = 1\n    llama_model_loader: - kv  16:                tokenizer.ggml.eos_token_id u32              = 2\n    llama_model_loader: - kv  17:            tokenizer.ggml.unknown_token_id u32              = 0\n    llama_model_loader: - kv  18:               general.quantization_version u32              = 2\n    llama_model_loader: - type  f32:   65 tensors\n    llama_model_loader: - type q4_K:  193 tensors\n    llama_model_loader: - type q6_K:   33 tensors\n    print_info: file format = GGUF V2\n    print_info: file type   = Q4_K - Medium\n    print_info: file size   = 3.80 GiB (4.84 BPW)\n    load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect\n    load: special tokens cache size = 3\n    load: token to piece cache size = 0.1684 MB\n    print_info: arch             = llama\n    print_info: vocab_only       = 0\n    print_info: n_ctx_train      = 4096\n    print_info: n_embd           = 4096\n    print_info: n_layer          = 32\n    print_info: n_head           = 32\n    print_info: n_head_kv        = 32\n    print_info: n_rot            = 128\n    print_info: n_swa            = 0\n    print_info: is_swa_any       = 0\n    print_info: n_embd_head_k    = 128\n    
print_info: n_embd_head_v    = 128\n    print_info: n_gqa            = 1\n    print_info: n_embd_k_gqa     = 4096\n    print_info: n_embd_v_gqa     = 4096\n    print_info: f_norm_eps       = 0.0e+00\n    print_info: f_norm_rms_eps   = 1.0e-05\n    print_info: f_clamp_kqv      = 0.0e+00\n    print_info: f_max_alibi_bias = 0.0e+00\n    print_info: f_logit_scale    = 0.0e+00\n    print_info: f_attn_scale     = 0.0e+00\n    print_info: n_ff             = 11008\n    print_info: n_expert         = 0\n    print_info: n_expert_used    = 0\n    print_info: causal attn      = 1\n    print_info: pooling type     = 0\n    print_info: rope type        = 0\n    print_info: rope scaling     = linear\n    print_info: freq_base_train  = 10000.0\n    print_info: freq_scale_train = 1\n    print_info: n_ctx_orig_yarn  = 4096\n    print_info: rope_finetuned   = unknown\n    print_info: model type       = 7B\n    print_info: model params     = 6.74 B\n    print_info: general.name     = LLaMA v2\n    print_info: vocab type       = SPM\n    print_info: n_vocab          = 32000\n    print_info: n_merges         = 0\n    print_info: BOS token        = 1 '<s>'\n    print_info: EOS token        = 2 '</s>'\n    print_info: UNK token        = 0 '<unk>'\n    print_info: LF token         = 13 '<0x0A>'\n    print_info: EOG token        = 2 '</s>'\n    print_info: max token length = 48\n    load_tensors: loading model tensors, this can take a while... 
(mmap = false)\n    load_tensors: offloading 32 repeating layers to GPU\n    load_tensors: offloading output layer to GPU\n    load_tensors: offloaded 33/33 layers to GPU\n    load_tensors:        ROCm0 model buffer size =  3820.94 MiB\n    load_tensors:          CPU model buffer size =    70.31 MiB\n    ..................................................................................................\n    llama_context: constructing llama_context\n    llama_context: n_seq_max     = 1\n    llama_context: n_ctx         = 4096\n    llama_context: n_ctx_per_seq = 4096\n    llama_context: n_batch       = 2048\n    llama_context: n_ubatch      = 512\n    llama_context: causal_attn   = 1\n    llama_context: flash_attn    = 0\n    llama_context: freq_base     = 10000.0\n    llama_context: freq_scale    = 1\n    llama_context:  ROCm_Host  output buffer size =     0.12 MiB\n    llama_kv_cache_unified:      ROCm0 KV buffer size =  2048.00 MiB\n    llama_kv_cache_unified: size = 2048.00 MiB (  4096 cells,  32 layers,  1 seqs), K (f16): 1024.00 MiB, V (f16): 1024.00 MiB\n    llama_kv_cache_unified: LLAMA_SET_ROWS=0, using old ggml_cpy() method for backwards compatibility\n    llama_context:      ROCm0 compute buffer size =   288.00 MiB\n    llama_context:  ROCm_Host compute buffer size =    16.01 MiB\n    llama_context: graph nodes  = 1158\n    llama_context: graph splits = 2\n    common_init_from_params: setting dry_penalty_last_n to ctx_size = 4096\n    common_init_from_params: warming up the model with an empty run - please wait ... 
(--no-warmup to disable)\n    main: llama threadpool init, n_threads = 16\n\n    system_info: n_threads = 16 (n_threads_batch = 16) / 32 | ROCm : NO_VMM = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : LLAMAFILE = 1 | REPACK = 1 |\n\n    sampler seed: 2232334333\n    sampler params:\n    \trepeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000\n    \tdry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 4096\n    \ttop_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.800\n    \tmirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000\n    sampler chain: logits -> logit-bias -> penalties -> dry -> top-n-sigma -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist\n    generate: n_ctx = 4096, n_batch = 2048, n_predict = 256, n_keep = 1\n\n     Explain quantum computing in simple terms: what is it, how does it work, and what are its potential benefits?\n    This is a difficult question to answer because quantum computing is not yet a well-defined field of study, and many of the potential applications are still being researched. However, we can say that quantum computing is a type of computation that relies on the principles of quantum mechanics (the branch of physics that describes the behaviour of particles such as electrons and photons).\n    These particles obey a set of rules that are different from those obeyed by classical computers, which rely on the principles of classical mechanics. Quantum computing uses a particle’s quantum state (such as its spin) to store information. This means that quantum computers can perform computations that are not possible on classical computers.\n    In the simplest terms, quantum computing is a type of computation that takes advantage of the unique properties of quantum mechanics. These properties include superposition, entanglement, and non-locality. 
Superposition is the ability of a quantum system to exist in multiple states simultaneously.\n    This means that a quantum system can be in two different places at the same time, or have two different properties at the same time. Entanglement is the ability of two quantum systems to be inter\n\n    llama_perf_sampler_print:    sampling time =       4.27 ms /   265 runs   (    0.02 ms per token, 62075.43 tokens per second)\n    llama_perf_context_print:        load time =     631.46 ms\n    llama_perf_context_print: prompt eval time =      63.57 ms /     9 tokens (    7.06 ms per token,   141.57 tokens per second)\n    llama_perf_context_print:        eval time =    7110.09 ms /   255 runs   (   27.88 ms per token,    35.86 tokens per second)\n    llama_perf_context_print:       total time =    7184.25 ms /   264 tokens\n\n\n# Closing words\n\nThese are just my first steps. Most of the time I was not even fully aware of what I was doing; I just reused some sample command lines and code. These experiments were good enough to see that AI works on Linux as well, not just on Windows.\n\nThis blog is part of a longer series about my adventures with my new machine and AI. You can reach me to discuss this blog through one of the contacts listed in the upper right corner. You can read the rest of the blogs under the toy tag.",
  "title": "My new toy: first steps with AI on Linux"
}