Penry's Secret Cabin

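Python - From Novice to Master in 100 Days

Recommended Site Directory for Everyday Study: A Full Collection of Productivity Tools (Submissions Welcome)

Lecture 2 - State Value and Bellman Equation

Complete Guide to Setting Up an Ubuntu Deep Learning Environment on RTX 50 Series Workstations

AutoLabor-ROS-Python Study Notes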


Deep Learning

Deep Learning Notes - Deriving the Softmax Backpropagation Gradient

Published 2025-12-15. Tags: Deep Learning, Backpropagation, Softmax

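This post mainly follows "深肚学习 - 超详细的softmax的反向传播梯度计算推导" and extends it to the n-class case. For multi-class problems in deep learning, the last layer of a neural network commonly uses softmax to model a multinomial distribution. Here we assume an n-class task and derive the forward- and backward-propagation formulas for the last layer. Forward propagation: for an n-class task, we set up the output layer as follows. Input: the input to the output-layer neurons is $Z = [z_1, z_2, \dots, z_n]$, and the ground-truth label is $Y = [y_1, y_2, \dots, y_n]$, where $Y$ is a one-hot label: only the true class has $y_k = 1$, and all other entries are 0. Output: $\hat{Y} = \text{softmax}(Z) = \left[\hat{y}_1, \hat{y}_2, \dots, \hat{y}_n\right]$, where $\hat{y}_i$ ...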
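The forward pass described above can be sketched in a few lines of plain Python. The excerpt is cut off before it names a loss function, so the cross-entropy loss used here is an assumption, as are the example logits and the helper names (`softmax`, `cross_entropy`, `grad_wrt_logits`, `numeric_grad`). Under that assumption, the gradient with respect to the logits reduces to $\hat{y}_i - y_i$, and the sketch compares that closed form against a finite-difference estimate.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(z, y):
    """Assumed loss: L = -sum_i y_i * log(y_hat_i), with y_hat = softmax(z)."""
    y_hat = softmax(z)
    return -sum(yi * math.log(pi) for yi, pi in zip(y, y_hat))


def grad_wrt_logits(z, y):
    """Closed-form gradient dL/dz_i = y_hat_i - y_i (holds when y sums to 1)."""
    return [pi - yi for pi, yi in zip(softmax(z), y)]


def numeric_grad(z, y, eps=1e-6):
    """Central finite-difference estimate of dL/dz, used only as a check."""
    grads = []
    for i in range(len(z)):
        z_plus = list(z)
        z_minus = list(z)
        z_plus[i] += eps
        z_minus[i] -= eps
        grads.append((cross_entropy(z_plus, y) - cross_entropy(z_minus, y)) / (2 * eps))
    return grads


# Hypothetical 3-class example: the true class is index 0 (one-hot label).
z = [2.0, 1.0, 0.1]
y = [1.0, 0.0, 0.0]
analytic = grad_wrt_logits(z, y)
numeric = numeric_grad(z, y)
```

The two gradient lists should agree to within the finite-difference error, which is a quick way to catch a sign or index mistake in a hand derivation.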

Tool Recommendations

Complete Guide to Setting Up an Ubuntu Deep Learning Environment on RTX 50 Series Workstations

Published 2025-12-09. Tags: Deep Learning, Ubuntu, GPU, NVIDIA

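Preface: I assumed that once the Ubuntu driver was installed everything would just work. Instead I ran into a chain of pitfalls: a black screen with hybrid graphics, an external HDMI monitor that would not light up, and PyTorch errors on large matrix operations. After a long investigation, I finally got everything working, from the driver layer up to the environment variables. To save others from the same mistakes, this post records the most stable setup I have found so far, with no performance loss. Environment: Hardware: RTX 50 series workstation GPU (Blackwell architecture, e.g. 5090/5080). OS: Ubuntu 22.04 / 24.04. Driver: NVIDIA Driver 580.x (proprietary, closed-source). Core problems: an external HDMI display does not light up in prime-select on-demand mode; PyTorch large-model / large-matrix multiplication fails with RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED; the GPU goes to sleep frequently, so code times out on cold start. Solution summary (Gold Standard): Step 1: switch the graphics mode (fixes HDMI and basic stability). On RTX 50 series high-performance laptops and mobile workstations, the HDMI port is usually wired directly to the discrete GPU. In Li ...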

LaTeX

Complete Guide to Setting Up a LaTeX Build Environment on Mac: From Scratch to Full VS Code Integration

Published 2025-11-13. Tags: LaTeX, VS Code Configuration, Tool Recommendations, Mac

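Preface: LaTeX is the tool of choice for academic writing and technical document typesetting, but setting up a LaTeX build environment on a Mac can be somewhat complicated for newcomers. Starting from scratch, this post explains in detail how to set up a complete LaTeX environment on a Mac and integrate it with VS Code. Highlights: Global configuration: configure once and every LaTeX project picks it up automatically, with no repeated setup. Full tool support: XeLaTeX, PDFLaTeX, LaTeXmk, BibTeX, Biber, and the other commonly used tools. Smart builds: multi-pass compilation, bibliographies, and dependencies are handled automatically. Automatic cleanup: temporary files are removed after each build, keeping the project tidy. Efficiency: custom keybindings speed up writing considerably. With this configuration, you get a consistent, efficient writing experience in any LaTeX project. Contents: MacTeX installation (three options); environment variable configuration; VS Code integration (global configuration recommended); full build tool and recipe configuration; custom keybindings (for faster writing); testing and verification; common problems and fixes. Choosing an option: there are three main ways to install LaTeX on a Mac. (Table columns: Option, Installer size, Features, Best for.) MacTeX ~4GB ...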

Tool Recommendations

Complete Guide to Migrating a Hexo Blog from Windows to Mac

Published 2025-11-12. Tags: Tool Recommendations, Mac, Hexo, Blog Migration, Windows

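✨ Preface: I recently migrated my Hexo blog from Windows to Mac. Hexo itself is cross-platform, but a few issues still needed attention during the migration. This post documents the whole process in detail, in the hope of helping readers with the same need complete their own migration.

📋 Before migrating: before starting, prepare the following. Back up important files: back up the entire blog folder (including source/, themes/, _config.yml, etc.) and make sure all posts and configuration files are saved. Check the system environment: Mac: macOS 15.5 (or another version). Node.js: an LTS version is recommended. npm: installed together with Node.js.

🚀 Migration steps. 1. Install Node.js and npm. There are several ways to install Node.js on a Mac; Homebrew is recommended:

```shell
# Install Node.js with Homebrew (npm is installed automatically)
brew install node

# Verify the installation
node --version
npm --version
```

Alternatively, download the installer from the official Node.js website and install it directly. 2. Reinstall the project ...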

Reinforcement Learning

Lecture 4 - Value Iteration and Policy Iteration

Published 2025-10-12. Tags: Python, Reinforcement Learning, PyTorch, Mathematical Principles

1-Value iteration algorithm 2-Policy iteration algorithm 3-Truncated policy iteration algorithm
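As a rough illustration of the first item in this outline, here is a minimal value iteration sketch in plain Python. The excerpt does not show the lecture's own environment, so the two-state deterministic MDP, the function name `value_iteration`, and the transition and reward tables `P` and `R` are all hypothetical. Each sweep applies the update $v(s) \leftarrow \max_a [r(s,a) + \gamma v(s')]$, and the greedy policy is read off once the values stop changing.

```python
def value_iteration(P, R, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Value iteration for a small deterministic MDP.

    P[s][a] is the next state and R[s][a] the immediate reward
    for taking action a in state s (hypothetical table layout).
    """
    n_states = len(P)
    v = [0.0] * n_states
    for _ in range(max_iter):
        # Value update: best one-step lookahead from every state.
        new_v = [
            max(R[s][a] + gamma * v[P[s][a]] for a in range(len(P[s])))
            for s in range(n_states)
        ]
        delta = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if delta < tol:
            break
    # Policy update: pick the greedy action under the final values.
    policy = [
        max(range(len(P[s])), key=lambda a: R[s][a] + gamma * v[P[s][a]])
        for s in range(n_states)
    ]
    return v, policy


# Hypothetical MDP: action 0 = stay, action 1 = move to the other state.
# Staying in state 1 or moving into it earns +1; everything else earns 0.
P = [[0, 1], [1, 0]]
R = [[0.0, 1.0], [1.0, 0.0]]
values, policy = value_iteration(P, R, gamma=0.9)
```

Policy iteration and truncated policy iteration differ mainly in how many evaluation sweeps run between greedy policy updates; this sketch covers only the value iteration case.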

Reinforcement Learning

Lecture 3 - Optimal Policy and Bellman Optimality Equation

Published 2025-10-11. Tags: Python, Reinforcement Learning, PyTorch, Mathematical Principles

Lecture 3: Optimal Policy and Bellman Optimality Equation 1-Outline In this lecture: Core concepts: optimal policy and optimal state value. A fundamental tool: Bellman optimality equation (BOE). 2-Motivating examples Exercise: write out the Bellman equation and solve the state values (set $\gamma = 0.9$). Bellman equations: $v_\pi(s_1) = -1 + \gamma v_\pi(s_2)$, $v_\pi(s_2) = +1 + \gamma v_\pi(s_4)$, ...
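The two Bellman equations visible above can be solved by back-substitution once $v_\pi(s_4)$ is known. The excerpt is cut off before the remaining equations, so the value of $v_\pi(s_4)$ below is an assumption: it supposes, hypothetically, that $s_4$ loops back to itself with reward $+1$, which would give $v_\pi(s_4) = 1/(1-\gamma)$. The function name is also illustrative.

```python
def solve_visible_equations(v_s4, gamma=0.9):
    """Back-substitute through the two equations shown in the excerpt.

    v_pi(s2) = +1 + gamma * v_pi(s4)
    v_pi(s1) = -1 + gamma * v_pi(s2)
    """
    v_s2 = 1.0 + gamma * v_s4
    v_s1 = -1.0 + gamma * v_s2
    return v_s1, v_s2


gamma = 0.9
# Assumption (not in the excerpt): s4 self-loops with reward +1.
v_s4 = 1.0 / (1.0 - gamma)
v_s1, v_s2 = solve_visible_equations(v_s4, gamma)
```

With a different assumption for $s_4$, only the `v_s4` line changes; the back-substitution order stays the same because each visible equation depends only on a state later in the chain.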

Reinforcement Learning

Lecture 2 - State Value and Bellman Equation

Published 2025-10-01. Tags: Python, Reinforcement Learning, PyTorch, Mathematical Principles

Lecture 2: State Value and Bellman Equation Outline In this lecture: A core concept: state value. A fundamental tool: Bellman equation. Motivating examples Motivating example 1: Why is return important? In summary, starting from $s_1$, $\text{return}_1 > \text{return}_3 > \text{return}_2$. The above inequality suggests that the first policy is the best and the second policy is the worst, which is exactly the same as our intui ...
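To make the comparison of returns concrete, here is a small sketch of a discounted return computed from a finite reward sequence. The reward sequences and the names `discounted_return`, `rewards_a`, and `rewards_b` are hypothetical; the lecture's actual trajectories are not shown in the excerpt.

```python
def discounted_return(rewards, gamma=0.9):
    """Return G = r_0 + gamma * r_1 + gamma^2 * r_2 + ... for a finite sequence."""
    g = 0.0
    # Accumulate from the last reward backwards: G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Two hypothetical trajectories from the same start state.
rewards_a = [0.0, 1.0, 1.0]   # avoids the penalty
rewards_b = [-1.0, 1.0, 1.0]  # takes a penalty on the first step
return_a = discounted_return(rewards_a)
return_b = discounted_return(rewards_b)
```

Ranking policies by such returns is the kind of comparison the inequality above expresses.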

Reinforcement Learning

Table of Contents for The Mathematical Principles of Reinforcement Learning

Published 2025-09-30. Tags: Python, Reinforcement Learning, PyTorch, Mathematical Principles

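Introduction: These notes systematically organize and summarize the core mathematical principles of reinforcement learning, drawing mainly on the Bilibili course 【强化学习的数学原理】课程：从零开始到透彻理解（完结）. The course is taught by Prof. Shiyu Zhao (赵世钰) of the School of Engineering at Westlake University and covers the basic concepts of reinforcement learning, Markov decision processes (MDPs), dynamic programming, Monte Carlo methods, and other key topics. It focuses on the principles behind RL algorithms and suits readers who want a deep understanding of how reinforcement learning works. Prof. Zhao's homepage is linked under his name: 赵世钰. Related resources and links: Course videos (Zhihu): https://www.zhihu.com/education/video-course/1574007679344930816?section_id=1574047391564390400 Course videos (Bilibili): https://space.bilibili.com/2044042934 English course videos (YouTube): https://www.youtube.com/watch?v=ZHMWHr9811U&list=PLEhdbSEZZbDaFWPX4gehhwB9vJZJ1DNm8&index=2 Book PDF and slides (GitHub): https://git ...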

Reinforcement Learning

Lecture 0 - Overview of Reinforcement Learning in 30 Minutes

Published 2025-09-30. Tags: Python, Reinforcement Learning, PyTorch, Mathematical Principles

Overview of Reinforcement Learning in 30 Minutes Not just a map of this course, but also of RL foundations. Fundamental tools + Algorithms and methods. Importance of different parts. Chapter 1: Basic Concepts Concepts: state, action, reward, return, episode, policy, ... Grid-world examples. Markov decision process (MDP). Fundamental concepts, widely used later. Chapter 2: Bellman Equations One concept: state value $v_{\pi}(s) = \mathbb{E}[G_t \mid S_t = s]$ ...

Reinforcement Learning

Lecture 1 - Basic Concepts in Reinforcement Learning

Published 2025-09-30. Tags: Python, Reinforcement Learning, PyTorch, Mathematical Principles

Lecture 1: Basic Concepts in Reinforcement Learning Contents First, introduce fundamental concepts in RL by examples. Second, formalize the concepts in the context of Markov decision processes. A grid-world example State State: The status of the agent with respect to the environment. In the grid-world example, a state is a location; the figure below shows 9 locations, which correspond to 9 states. State space: The set of all states, $\mathbb{S} = \{s_i\}_{i=1}^{9}$. Action Action: For each state, there ar ...