阅读周刊(第 8 期)

1. 文章

1.1 Sam Altman 与开发者的一小时:GPT-5 的坦白、招聘放缓、以及 2026 年最担心的事 | 宝玉的分享

1.2 LangChain Skills 模式实战:构建按需加载知识的 SQL 助手

1.3 让 AI 不忘事:LangChain 团队的上下文管理实践

1.4 Vibe PPT?这个开源AI PPT项目,Github 6.3K Star了! - 知乎

1.5 现在是 2026 年,直接用 Postgres 吧 | Tiger Data — It’s 2026, Just Use Postgres | Tiger Data

SQL
-- Full-text search with BM25
CREATE EXTENSION pg_textsearch;

-- Vector search for AI
CREATE EXTENSION vector;
CREATE EXTENSION vectorscale;

-- AI embeddings & RAG workflows
CREATE EXTENSION ai;

-- Time-series
CREATE EXTENSION timescaledb;

-- Message queues
CREATE EXTENSION pgmq;

-- Scheduled jobs
CREATE EXTENSION pg_cron;

-- Geospatial
CREATE EXTENSION postgis;
点击展开查看更多
SQL
-- Create table
CREATE TABLE articles (
  id SERIAL PRIMARY KEY,
  title TEXT,
  content TEXT
);

-- Create BM25 index
CREATE INDEX idx_articles_bm25 ON articles USING bm25(content)
  WITH (text_config = 'english');

-- Search with BM25 scoring
SELECT title, -(content <@> 'database optimization') as score
FROM articles
ORDER BY content <@> 'database optimization'
LIMIT 10;


-- 混合搜索
SELECT 
  title,
  -(content <@> 'database optimization') as bm25_score,
  embedding <=> query_embedding as vector_distance,
  0.7 * (-(content <@> 'database optimization')) + 
  0.3 * (1 - (embedding <=> query_embedding)) as hybrid_score
FROM articles
ORDER BY hybrid_score DESC
LIMIT 10;
点击展开查看更多
SQL
-- Enable extensions
CREATE EXTENSION vector;
CREATE EXTENSION vectorscale CASCADE;

-- Table with embeddings
CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  content TEXT,
  embedding vector(1536)
);

-- High-performance index (DiskANN)
CREATE INDEX idx_docs_embedding ON documents USING diskann(embedding);

-- Find similar documents
SELECT content, embedding <=> '[0.1, 0.2, ...]'::vector as distance
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;

-- 自动生成向量
SELECT ai.create_vectorizer(
  'documents'::regclass,
  loading => ai.loading_column(column_name=>'content'),
  embedding => ai.embedding_openai(model=>'text-embedding-3-small', dimensions=>'1536')
);
点击展开查看更多
SQL
-- Enable TimescaleDB
CREATE EXTENSION timescaledb;

-- Create table
CREATE TABLE metrics (
  time TIMESTAMPTZ NOT NULL,
  device_id TEXT,
  temperature DOUBLE PRECISION
);

-- Convert to hypertable
SELECT create_hypertable('metrics', 'time');

-- Query with time buckets
SELECT time_bucket('1 hour', time) as hour,
       AVG(temperature)
FROM metrics
WHERE time > NOW() - INTERVAL '24 hours'
GROUP BY hour;

-- Auto-delete old data
SELECT add_retention_policy('metrics', INTERVAL '30 days');

-- Compression (90% storage reduction)
ALTER TABLE metrics SET (timescaledb.compress);
SELECT add_compression_policy('metrics', INTERVAL '7 days');
点击展开查看更多
SQL
-- UNLOGGED = no WAL overhead, faster writes
CREATE UNLOGGED TABLE cache (
  key TEXT PRIMARY KEY,
  value JSONB,
  expires_at TIMESTAMPTZ
);

-- Set with expiration
INSERT INTO cache (key, value, expires_at)
VALUES ('user:123', '{"name": "Alice"}', NOW() + INTERVAL '1 hour')
ON CONFLICT (key) DO UPDATE SET value = EXCLUDED.value;

-- Get
SELECT value FROM cache WHERE key = 'user:123' AND expires_at > NOW();

-- Cleanup (schedule with pg_cron)
DELETE FROM cache WHERE expires_at < NOW();
点击展开查看更多
SQL
CREATE EXTENSION pgmq;
SELECT pgmq.create('my_queue');

-- Send
SELECT pgmq.send('my_queue', '{"event": "signup", "user_id": 123}');

-- Receive (with visibility timeout)
SELECT * FROM pgmq.read('my_queue', 30, 5);

-- Delete after processing
SELECT pgmq.delete('my_queue', msg_id);

-- 使用native SKIP LOCKED pattern
CREATE TABLE jobs (
  id SERIAL PRIMARY KEY,
  payload JSONB,
  status TEXT DEFAULT 'pending'
);

-- Worker claims job atomically
UPDATE jobs SET status = 'processing'
WHERE id = (
  SELECT id FROM jobs WHERE status = 'pending'
  FOR UPDATE SKIP LOCKED LIMIT 1
) RETURNING *;
点击展开查看更多
SQL
CREATE TABLE users (
  id SERIAL PRIMARY KEY,
  data JSONB
);

-- Insert nested document
INSERT INTO users (data) VALUES ('{
  "name": "Alice",
  "profile": {"bio": "Developer", "links": ["github.com/alice"]}
}');

-- Query nested fields
SELECT data->>'name', data->'profile'->>'bio'
FROM users
WHERE data->'profile'->>'bio' LIKE '%Developer%';

-- Index JSON fields
CREATE INDEX idx_users_email ON users ((data->>'email'));
点击展开查看更多
SQL
CREATE EXTENSION postgis;

CREATE TABLE stores (
  id SERIAL PRIMARY KEY,
  name TEXT,
  location GEOGRAPHY(POINT, 4326)
);

-- Find stores within 5km
SELECT name, ST_Distance(location, ST_MakePoint(-122.4, 37.78)::geography) as meters
FROM stores
WHERE ST_DWithin(location, ST_MakePoint(-122.4, 37.78)::geography, 5000);
点击展开查看更多
SQL
CREATE EXTENSION pg_cron;

-- Run every hour
SELECT cron.schedule('cleanup', '0 * * * *', 
  $$DELETE FROM cache WHERE expires_at < NOW()$$);

-- Nightly rollup
SELECT cron.schedule('rollup', '0 2 * * *',
  $$REFRESH MATERIALIZED VIEW CONCURRENTLY daily_stats$$);
点击展开查看更多
SQL
-- Reciprocal Rank Fusion: combine keyword + semantic search
WITH bm25 AS (
  SELECT id, ROW_NUMBER() OVER (ORDER BY content <@> $1) as rank
  FROM documents LIMIT 20
),
vectors AS (
  SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> $2) as rank  
  FROM documents LIMIT 20
)
SELECT d.*, 
  1.0/(60 + COALESCE(b.rank, 1000)) + 
  1.0/(60 + COALESCE(v.rank, 1000)) as score
FROM documents d
LEFT JOIN bm25 b ON d.id = b.id
LEFT JOIN vectors v ON d.id = v.id
WHERE b.id IS NOT NULL OR v.id IS NOT NULL
ORDER BY score DESC LIMIT 10;
点击展开查看更多
SQL
CREATE EXTENSION pg_trgm;

CREATE INDEX idx_name_trgm ON products USING GIN (name gin_trgm_ops);

-- Finds "PostgreSQL" even with typo
SELECT name FROM products
WHERE name % 'posgresql'
ORDER BY similarity(name, 'posgresql') DESC;
点击展开查看更多
SQL
-- Find all reports under a manager (org chart)
WITH RECURSIVE org_tree AS (
  SELECT id, name, manager_id, 1 as depth
  FROM employees WHERE id = 42
  
  UNION ALL
  
  SELECT e.id, e.name, e.manager_id, t.depth + 1
  FROM employees e
  JOIN org_tree t ON e.manager_id = t.id
  WHERE t.depth < 10
)
SELECT * FROM org_tree;
点击展开查看更多

1.6 ChatGPT - 如何执行Postgres的向量与全文搜索

我与ChatGPT聊的关于混合搜索的方案。

2. 项目

2.1 goodreasonai/ScrapeServ: A self-hosted API that takes a URL and returns a file with browser screenshots.

Avatar of goodreasonai

goodreasonai/ScrapeServ

A self-hosted API that takes a URL and returns a file with browser screenshots.

Stars: 1.1k
🍴 Forks: 83
💻 Language: Python
📄 License: MIT License

部署ScrapeServ作为Docker容器服务,然后向这个服务发送一个链接,将会收到网页上的数据文件和截图。

2.2 netease-youdao/LobsterAI: Your 24/7 all-scenario AI agent that gets work done for you.

Avatar of netease-youdao

netease-youdao/LobsterAI

Your 24/7 all-scenario AI agent that gets work done for you.

Stars: 5k
🍴 Forks: 768
💻 Language: TypeScript
📄 License: MIT License
LobsterAI - 有道 AI Agent 产品

LobsterAI 是由网易有道 开发的全场景个人助理 Agent。它 7×24 小时待命,能够帮你完成日常办公中的各类事务 —— 数据分析、制作 PPT、生成视频、撰写文档、搜索信息、收发邮件、定时任务,以及更多。

2.3 DayuanJiang/next-ai-draw-io: A next.js web application that integrates AI capabilities with draw.io diagrams. This app allows you to create, modify, and enhance diagrams through natural language commands and AI-assisted visualization.

Avatar of DayuanJiang

DayuanJiang/next-ai-draw-io

A next.js web application that integrates AI capabilities with draw.io diagrams. This app allows you to create, modify, and enhance diagrams through natural language commands and AI-assisted visualization.

Stars: 27.5k
🍴 Forks: 2.9k
💻 Language: TypeScript
📄 License: Apache License 2.0

基于 Next.js 的 Web 应用程序,它将 AI 功能与 draw.io 图表集成在一起。用户可以通过自然语言命令和 AI 辅助可视化来创建、修改和增强图表。

版权声明

作者: Yao

链接: https://yaoliu.site/posts/weekly-008/

许可证: CC BY-NC-SA 4.0

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Please attribute the source, use non-commercially, and maintain the same license.

评论

开始搜索

输入关键词搜索文章内容

↑↓
ESC
⌘K 快捷键