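MobileCLIP Complete Usage Guide

An introduction to Apple's MobileCLIP models, with complete code examples

📑 Contents

Model Overview and Selection
Installation and Environment Setup
Basic Usage Examples
Advanced Examples
Performance Optimization Tips
Troubleshooting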

Model Overview and Selection

📊 Comparison of the five models

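Model	Params (image + text)	Latency (image + text)	ImageNet zero-shot top-1	Avg. over 38 datasets	Est. FP32 size	Use case
S0	11.4M + 42.4M = 53.8M	1.5 + 1.6 = 3.1 ms	67.8%	58.1%	~215 MB	Ultra-lightweight, low-end phones
S1	21.5M + 63.4M = 84.9M	2.5 + 3.3 = 5.8 ms	72.6%	61.3%	~340 MB	Balance of speed and accuracy ⭐
S2	35.7M + 63.4M = 99.1M	3.6 + 3.3 = 6.9 ms	74.4%	63.7%	~396 MB	Higher accuracy requirements ⭐
B	86.3M + 63.4M = 149.7M	10.4 + 3.3 = 13.7 ms	76.8%	65.2%	~599 MB	High accuracy, mid-range phones
B (LT)	86.3M + 63.4M = 149.7M	10.4 + 3.3 = 13.7 ms	77.2%	65.8%	~599 MB	Highest accuracy

Parameter and latency figures are image encoder + text encoder, as reported for MobileCLIP (latency measured on an iPhone 12 Pro Max). For image-to-image search only the image encoder runs, so the first latency number is the relevant one. The size column is an estimate at 4 bytes per FP32 parameter; the downloaded .pt files may differ.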

⭐ Recommendation: for image-to-image search, start with S1 or S2.
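The totals in the table are sums of separate image- and text-encoder figures, and the size column is only an estimate. A minimal sketch that recomputes them from the per-encoder numbers listed in this guide; the dictionary layout and the helper name `summarize` are illustrative and not part of MobileCLIP.

```python
# Per-encoder figures copied from the table above: (params in millions, latency in ms).
# These are reference numbers, not measurements made by this script.
MODELS = {
    "S0": {"image": (11.4, 1.5), "text": (42.4, 1.6)},
    "S1": {"image": (21.5, 2.5), "text": (63.4, 3.3)},
    "S2": {"image": (35.7, 3.6), "text": (63.4, 3.3)},
    "B": {"image": (86.3, 10.4), "text": (63.4, 3.3)},
}


def summarize(name):
    """Return total params (M), total latency (ms), image-only latency (ms)
    and an estimated FP32 size in MB (1 MB = 1e6 bytes, 4 bytes per parameter)."""
    img_params, img_ms = MODELS[name]["image"]
    txt_params, txt_ms = MODELS[name]["text"]
    total_params = round(img_params + txt_params, 1)
    return {
        "params_m": total_params,
        "latency_ms": round(img_ms + txt_ms, 1),
        "image_only_ms": img_ms,  # the number that matters for image-to-image search
        "fp32_mb": round(total_params * 4),
    }


for name in MODELS:
    print(name, summarize(name))
```

B (LT) shares B's architecture, so its figures are identical to the "B" entry.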

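🔍 Model details

mobileclip_s0.pt - Ultra-lightweight

✓ Highlights: smallest and fastest
✓ Parameters: 11.4M (image) + 42.4M (text) = 53.8M
✓ Latency: 1.5ms (image) + 1.6ms (text) = 3.1ms
✓ Accuracy: 67.8% ImageNet zero-shot

Good for:
- Entry-level phones and IoT devices
- Real-time processing (camera apps)
- Applications that need very low latency

Comparison:
Similar zero-shot performance to OpenAI's ViT-B/16, while 4.8× faster and 2.8× smaller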

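mobileclip_s1.pt - Lightweight, balanced ⭐

✓ Highlights: the sweet spot between size and accuracy
✓ Parameters: 21.5M (image) + 63.4M (text) = 84.9M
✓ Latency: 2.5ms (image) + 3.3ms (text) = 5.8ms
✓ Accuracy: 72.6% ImageNet zero-shot

Good for:
- General phone apps
- Most image-to-image search scenarios
- Balancing performance and quality

Why we recommend it:
- The best choice for a quick proof of concept (POC)
- Relatively easy to port to Android
- Accurate enough for most scenarios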

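mobileclip_s2.pt - Mid-size ⭐

✓ Highlights: higher accuracy while staying lightweight
✓ Parameters: 35.7M (image) + 63.4M (text) = 99.1M
✓ Latency: 3.6ms (image) + 3.3ms (text) = 6.9ms
✓ Accuracy: 74.4% ImageNet zero-shot

Good for:
- Mid-range to high-end phones
- Applications that need higher recognition accuracy
- E-commerce, image search and similar scenarios

Comparison:
Better average zero-shot performance than SigLIP's ViT-B/16, while 2.3× faster and 2.1× smaller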

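mobileclip_b.pt - Standard large model

✓ Highlights: high-accuracy version
✓ Parameters: 86.3M (image) + 63.4M (text) = 149.7M
✓ Latency: 10.4ms (image) + 3.3ms (text) = 13.7ms
✓ Accuracy: 76.8% ImageNet zero-shot

Good for:
- Flagship phones and tablets
- Professional apps (design, creative tools)
- Scenarios with strict accuracy requirements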

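mobileclip_blt.pt - Long-training version

✓ Highlights: B trained for longer; the highest accuracy of the five
✓ Parameters: 149.7M (same as B)
✓ Latency: 13.7ms (same as B)
✓ Accuracy: 77.2% ImageNet zero-shot (highest)
✓ Training: a longer schedule ("LT" = long training); the official README lists 36B seen samples versus 13B for the other variants
✓ Loading: uses the mobileclip_b architecture config together with the mobileclip_blt.pt weights

Good for:
- Applications that need the best accuracy
- Server-side deployment (memory is not a constraint)
- Quality-first scenarios

Comparison:
Zero-shot ImageNet accuracy exceeds OpenAI's ViT-L/14@336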

🎯 Model selection decision tree

┌──────────────────────────────────┐
│  Need the fastest speed?         │
│  └─ Yes → S0                     │
└──────────────────────────────────┘
         │ No
         ↓
┌──────────────────────────────────┐
│  Need the highest accuracy?      │
│  └─ Yes → B (LT)                 │
└──────────────────────────────────┘
         │ No
         ↓
┌──────────────────────────────────┐
│  Somewhere in between?           │
│  ├─ Lean toward speed → S1 ⭐    │
│  ├─ Balanced → S2 ⭐             │
│  └─ Lean toward accuracy → B     │
└──────────────────────────────────┘
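The same decision tree can be written as a small function, for example to pick a default in a config script. `choose_model` and its `priority` labels are hypothetical names made up for this sketch; the returned strings are the checkpoint file stems used in the download section.

```python
def choose_model(priority: str) -> str:
    """Map a priority to a checkpoint stem, following the decision tree above.

    priority: one of "fastest", "most_accurate", "speed", "balanced", "accuracy"
    (hypothetical labels for this sketch).
    """
    tree = {
        "fastest": "mobileclip_s0",         # need the fastest speed -> S0
        "most_accurate": "mobileclip_blt",  # need the highest accuracy -> B (LT)
        "speed": "mobileclip_s1",           # in between, lean toward speed -> S1
        "balanced": "mobileclip_s2",        # in between, balanced -> S2
        "accuracy": "mobileclip_b",         # in between, lean toward accuracy -> B
    }
    if priority not in tree:
        raise ValueError(f"unknown priority {priority!r}; expected one of {sorted(tree)}")
    return tree[priority]


print(choose_model("balanced"))  # mobileclip_s2
```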

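📱 Recommended models by use case

Use case	Recommended model	Reason
Real-time camera recognition	S0	Speed first, low latency
Photo-album search on a phone	S1 or S2	Balanced experience
E-commerce image search	S2 or B	Accuracy matters
Professional image management	B (LT)	Quality first
IoT / edge devices	S0	Limited resources
Server-side API	B (LT)	No resource constraints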

Installation and Environment Setup

📦 Install dependencies

# Install the required packages
pip install torch torchvision pillow numpy tqdm matplotlib

# Install MobileCLIP
pip install git+https://github.com/apple/ml-mobileclip.git
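Before going further, a quick check that the packages resolve in the current environment can save time. A minimal standard-library sketch: `importlib.util.find_spec` reports whether a module can be found without importing it. `check_packages` is a hypothetical helper; note that Pillow is imported as `PIL`.

```python
import importlib.util


def check_packages(names):
    """Return {module_name: True/False} depending on whether each module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    status = check_packages(
        ["torch", "torchvision", "PIL", "numpy", "tqdm", "matplotlib", "mobileclip"]
    )
    for name, ok in status.items():
        print(f"{'✓' if ok else '✗'} {name}")
```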

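📥 Download the pretrained models

# Create the checkpoint folder
mkdir -p checkpoints
cd checkpoints

# Download the models (one or more)
# S0 - lightest (recommended as the first one to test)
wget https://docs-assets.developer.apple.com/ml-research/datasets/mobileclip/mobileclip_s0.pt

# S1 - balanced (recommended)
wget https://docs-assets.developer.apple.com/ml-research/datasets/mobileclip/mobileclip_s1.pt

# S2 - mid-size (recommended)
wget https://docs-assets.developer.apple.com/ml-research/datasets/mobileclip/mobileclip_s2.pt

# B - large model
wget https://docs-assets.developer.apple.com/ml-research/datasets/mobileclip/mobileclip_b.pt

# B (LT) - best accuracy
wget https://docs-assets.developer.apple.com/ml-research/datasets/mobileclip/mobileclip_blt.pt

# Go back to the project root: the examples below load checkpoints/<name>.pt
cd ..

Or use the official script, run from the root of a clone of the repository (it downloads the models into a checkpoints/ folder):

# Download all models
git clone https://github.com/apple/ml-mobileclip.git
cd ml-mobileclip
source get_pretrained_models.sh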
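Each checkpoint file has to be paired with the matching architecture config name when calling `mobileclip.create_model_and_transforms`. The official README loads the B (LT) weights with the `mobileclip_b` config (there appears to be no separate `mobileclip_blt` config). A small sketch of that mapping plus a check for missing files; `CHECKPOINTS` and `resolve_checkpoint` are hypothetical names, and the `checkpoints/` directory matches the download commands above.

```python
from pathlib import Path

# checkpoint file stem -> config name for create_model_and_transforms (assumed mapping)
CHECKPOINTS = {
    "mobileclip_s0": "mobileclip_s0",
    "mobileclip_s1": "mobileclip_s1",
    "mobileclip_s2": "mobileclip_s2",
    "mobileclip_b": "mobileclip_b",
    "mobileclip_blt": "mobileclip_b",  # B (LT): same architecture as B, longer training
}


def resolve_checkpoint(stem, ckpt_dir="checkpoints"):
    """Return (config_name, checkpoint_path) for a checkpoint stem.

    Raises KeyError for unknown stems and FileNotFoundError if the file is missing.
    """
    config_name = CHECKPOINTS[stem]
    path = Path(ckpt_dir) / f"{stem}.pt"
    if not path.is_file():
        raise FileNotFoundError(f"{path} not found; download it first")
    return config_name, str(path)


# Usage (requires the downloaded file and the mobileclip package):
# config_name, path = resolve_checkpoint("mobileclip_blt")
# model, _, preprocess = mobileclip.create_model_and_transforms(config_name, pretrained=path)
```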

Basic Usage Examples

🎯 Example 1: Extract features from a single image (the basics)

import torch
import mobileclip
from PIL import Image

# ========== 1. Load the model ==========
model, _, preprocess = mobileclip.create_model_and_transforms(
    'mobileclip_s1',  # config name: mobileclip_s0, mobileclip_s1, mobileclip_s2, mobileclip_b
    pretrained='checkpoints/mobileclip_s1.pt'  # path to the weights file
)

# Switch to evaluation mode
model.eval()

# Choose the device (GPU or CPU)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)

print(f"✓ Model loaded, using device: {device}")


# ========== 2. Convert an image file into a tensor ==========

# Read the image (jpg, png and other formats are supported)
image_path = "my_cat.jpg"
image = Image.open(image_path).convert('RGB')  # make sure it is RGB

print(f"✓ Original image size (W, H): {image.size}")

# Apply preprocess (resize, center crop, convert to tensor)
image_tensor = preprocess(image)  # output shape: (3, H, W)

print(f"✓ Tensor shape after preprocessing: {image_tensor.shape}")

# Add a batch dimension -> (1, 3, H, W)
image_tensor = image_tensor.unsqueeze(0)

print(f"✓ After adding the batch dimension: {image_tensor.shape}")

# Move to the model's device
image_tensor = image_tensor.to(device)


# ========== 3. Extract features ==========

with torch.no_grad():  # no gradients needed for inference
    image_features = model.encode_image(image_tensor)
    
    # L2 normalization (important for computing similarity!)
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)

print(f"✓ Feature vector shape: {image_features.shape}")  # (1, 512)
print(f"✓ Feature vector sample (first 10 dims): {image_features[0, :10]}")

# Convert to NumPy (if you need to store or further process it)
features_numpy = image_features.cpu().numpy()
print(f"✓ NumPy shape: {features_numpy.shape}")

Sample output:

✓ Model loaded, using device: cpu
✓ Original image size (W, H): (1920, 1080)
✓ Tensor shape after preprocessing: torch.Size([3, 256, 256])
✓ After adding the batch dimension: torch.Size([1, 3, 256, 256])
✓ Feature vector shape: torch.Size([1, 512])
✓ Feature vector sample (first 10 dims): tensor([ 0.0234, -0.1234,  0.0567, ...])
✓ NumPy shape: (1, 512)

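📐 Key concepts

The full conversion path to image_tensor

# Step 1: read the image file
image = Image.open("cat.jpg").convert('RGB')
# → a PIL.Image object; image.size is (width, height), e.g. (1920, 1080), with 3 RGB channels

# Step 2: preprocess (resize, center crop, to tensor)
image_tensor = preprocess(image)
# → torch.Tensor of shape (3, H, W), e.g. (3, 256, 256)
# → for MobileCLIP, values are in [0, 1] (ToTensor scaling, no mean/std normalization)

# Step 3: add a batch dimension
image_tensor = image_tensor.unsqueeze(0)
# → shape (1, 3, H, W), e.g. (1, 3, 256, 256)

# Step 4: move to the right device
image_tensor = image_tensor.to(device)
# → onto the GPU if there is one, otherwise it stays on the CPU

# Step 5: extract features (without gradient tracking)
with torch.no_grad():
    image_features = model.encode_image(image_tensor)
# → shape (1, 512): this is the feature vector you want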
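To see what ToTensor and unsqueeze(0) do to the data layout without loading a model, here is a NumPy-only sketch that mimics those two steps on a fake, already-cropped 256×256 RGB image. It reproduces only the shape and value-range changes, not the resize/crop, and the random array is a stand-in for a real photo.

```python
import numpy as np

# A fake, already-cropped RGB image as PIL would hand it over: (H, W, 3), uint8 in [0, 255]
hwc = np.random.randint(0, 256, size=(256, 256, 3), dtype=np.uint8)

# What ToTensor does: HWC -> CHW and scale to float32 in [0, 1]
chw = hwc.transpose(2, 0, 1).astype(np.float32) / 255.0

# What unsqueeze(0) does: add a leading batch dimension
bchw = chw[np.newaxis, ...]

print(hwc.shape, chw.shape, bchw.shape)  # (256, 256, 3) (3, 256, 256) (1, 3, 256, 256)
```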

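What does preprocess do?

preprocess is the transform returned by mobileclip.create_model_and_transforms. In the ml-mobileclip code it amounts to roughly the following, where image_size comes from the model config (256 for S0/S1/S2, 224 for B). Unlike many CLIP pipelines there is no ImageNet mean/std normalization, so pixel values stay in [0, 1]. When in doubt, use the returned preprocess rather than a hand-built copy:

from torchvision import transforms
from torchvision.transforms import InterpolationMode

image_size = 256  # mobileclip_s0/s1/s2; mobileclip_b uses 224

preprocess = transforms.Compose([
    transforms.Resize(image_size, interpolation=InterpolationMode.BILINEAR),  # shorter side -> image_size
    transforms.CenterCrop(image_size),   # center crop to image_size x image_size
    transforms.ToTensor(),               # HWC uint8 [0, 255] -> CHW float [0, 1]
])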
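The geometry of Resize(int) followed by CenterCrop determines which part of a wide photo the model actually sees. A pure-Python sketch of that arithmetic, assuming torchvision's behaviour for an integer size (the shorter side is scaled to `size` and the longer side is truncated to an int) and its centre-crop offsets (`round((dim - crop) / 2)`); `resize_and_crop_box` is a hypothetical helper.

```python
def resize_and_crop_box(width, height, size):
    """Return the resized (w, h) and the (left, top, right, bottom) crop box
    that Resize(size) + CenterCrop(size) would use, under the assumptions above."""
    short, long_ = (width, height) if width <= height else (height, width)
    new_short, new_long = size, int(size * long_ / short)
    new_w, new_h = (new_short, new_long) if width <= height else (new_long, new_short)
    left = int(round((new_w - size) / 2.0))
    top = int(round((new_h - size) / 2.0))
    return (new_w, new_h), (left, top, left + size, top + size)


print(resize_and_crop_box(1920, 1080, 256))  # ((455, 256), (100, 0, 356, 256))
```

For a 1920×1080 photo and size 256, about 100 pixels (roughly 22% of the resized width) are cut from each side, so objects near the left and right edges never reach the model.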

Advanced Examples

🚀 Example 2: Batch-process multiple images (faster)

import torch
import mobileclip
from PIL import Image
from pathlib import Path
from tqdm import tqdm

# Load the model
model, _, preprocess = mobileclip.create_model_and_transforms(
    'mobileclip_s1',
    pretrained='checkpoints/mobileclip_s1.pt'
)
model.eval()
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)


# ========== Batch-process multiple images ==========

def extract_features_batch(image_paths, batch_size=32):
    """
    Extract features for many images in batches
    
    Args:
        image_paths: list of image paths
        batch_size: batch size
    
    Returns:
        features: (N, 512) feature matrix
        valid_paths: list of paths that were processed successfully
    """
    all_features = []
    valid_paths = []
    
    # Process batch by batch
    for i in tqdm(range(0, len(image_paths), batch_size), desc="Extracting features"):
        batch_paths = image_paths[i:i+batch_size]
        
        # Load and preprocess this batch of images
        batch_images = []
        batch_valid_paths = []
        
        for path in batch_paths:
            try:
                img = Image.open(path).convert('RGB')
                img_tensor = preprocess(img)
                batch_images.append(img_tensor)
                batch_valid_paths.append(path)
            except Exception as e:
                print(f"⚠ Could not read {path}: {e}")
                continue
        
        if not batch_images:
            continue
        
        # Stack into a batch (B, 3, H, W)
        batch_tensor = torch.stack(batch_images).to(device)
        
        # Extract features
        with torch.no_grad():
            features = model.encode_image(batch_tensor)
            # L2 normalization
            features = features / features.norm(dim=-1, keepdim=True)
        
        all_features.append(features.cpu())
        valid_paths.extend(batch_valid_paths)
    
    # Concatenate all batches
    if all_features:
        all_features = torch.cat(all_features, dim=0)
    else:
        all_features = torch.empty(0, 512)
    
    return all_features, valid_paths


# ========== Usage ==========

# Scan the image folder
image_folder = "./my_photos"
image_paths = []

for ext in ['*.jpg', '*.jpeg', '*.png', '*.webp']:
    image_paths.extend(list(Path(image_folder).glob(ext)))

image_paths = [str(p) for p in image_paths]
print(f"Found {len(image_paths)} images")

# Extract features in batches
features, valid_paths = extract_features_batch(image_paths, batch_size=32)

print(f"✓ Feature matrix shape: {features.shape}")  # (N, 512)
print(f"✓ Processed {len(valid_paths)} images successfully")

# Save the features
import numpy as np
np.savez('image_features.npz', 
         features=features.numpy(),
         paths=valid_paths)
print("✓ Features saved to image_features.npz")
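When the saved features are reused later, it is worth checking that they load back consistently. A minimal NumPy sketch, assuming the file layout written above (`features` as an (N, 512) array, `paths` as a string array); `load_features` is a hypothetical helper name.

```python
import numpy as np


def load_features(npz_path, atol=1e-3):
    """Load an .npz written by the example above and sanity-check it.

    Returns (features, paths). Raises ValueError if the row count and path count
    disagree or if rows are not (approximately) L2-normalized.
    """
    with np.load(npz_path) as data:  # string arrays load without allow_pickle
        features = data["features"]
        paths = data["paths"].tolist()
    if features.shape[0] != len(paths):
        raise ValueError(f"{features.shape[0]} feature rows but {len(paths)} paths")
    norms = np.linalg.norm(features, axis=1)
    if features.shape[0] and not np.allclose(norms, 1.0, atol=atol):
        raise ValueError("feature rows are not L2-normalized")
    return features, paths


# Usage: features, paths = load_features("image_features.npz")
```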

🔍 Example 3: Image-to-image search (end to end)

Step 1: Build the image index

import torch
import mobileclip
from PIL import Image
import numpy as np
from pathlib import Path
from tqdm import tqdm

def build_image_index(image_folder, model_name='mobileclip_s1', output_file='index.npz'):
    """
    Build an image index
    
    Args:
        image_folder: path to the image folder
        model_name: config name (mobileclip_s0/s1/s2/b); checkpoints/<model_name>.pt must exist
        output_file: output index file
    
    Returns:
        features_matrix: (N, 512) feature matrix
        image_paths: list of successfully indexed image paths
    """
    
    # Load the model
    print("📦 Loading model...")
    model, _, preprocess = mobileclip.create_model_and_transforms(
        model_name,
        pretrained=f'checkpoints/{model_name}.pt'
    )
    model.eval()
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model.to(device)
    print(f"✓ Model loaded on {device}")
    
    # Scan all images
    print("\n📂 Scanning images...")
    image_paths = []
    for ext in ['*.jpg', '*.jpeg', '*.png', '*.webp', '*.bmp']:
        image_paths.extend(Path(image_folder).glob(ext))
    
    image_paths = [str(p) for p in image_paths]
    print(f"✓ Found {len(image_paths)} images")
    
    # Extract features in batches
    print("\n🎨 Extracting features...")
    all_features = []
    valid_paths = []
    batch_size = 32
    
    for i in tqdm(range(0, len(image_paths), batch_size)):
        batch_paths = image_paths[i:i+batch_size]
        batch_images = []
        batch_valid = []
        
        for img_path in batch_paths:
            try:
                img = Image.open(img_path).convert('RGB')
                img_tensor = preprocess(img)
                batch_images.append(img_tensor)
                batch_valid.append(img_path)
            except Exception as e:
                print(f"⚠ Skipping {img_path}: {e}")
        
        if not batch_images:
            continue
        
        # Batched inference
        batch_tensor = torch.stack(batch_images).to(device)
        
        with torch.no_grad():
            features = model.encode_image(batch_tensor)
            features = features / features.norm(dim=-1, keepdim=True)
        
        all_features.append(features.cpu().numpy())
        valid_paths.extend(batch_valid)
    
    # np.vstack fails on an empty list, so stop early if nothing was readable
    if not all_features:
        raise ValueError(f"No readable images found in {image_folder}")
    
    # Concatenate the features
    features_matrix = np.vstack(all_features)
    
    # Save the index
    print(f"\n💾 Saving index...")
    np.savez(output_file, 
             features=features_matrix,
             paths=valid_paths,
             model_name=model_name)
    
    print(f"✓ Index built: {output_file}")
    print(f"✓ Feature matrix shape: {features_matrix.shape}")
    print(f"✓ Number of images: {len(valid_paths)}")
    
    return features_matrix, valid_paths


# ========== Usage ==========
if __name__ == '__main__':
    features, paths = build_image_index(
        image_folder='./my_photos',
        model_name='mobileclip_s1',
        output_file='photo_index.npz'
    )
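`Path.glob('*.jpg')` is case-sensitive on Linux, so files such as `IMG_0001.JPG` are skipped, and subfolders are not searched. A standard-library sketch of a more forgiving scanner that could replace the glob loop; `scan_images`, its `recursive` flag and the extension set are assumptions of this sketch. Sorting the result also keeps the index order stable between runs.

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def scan_images(folder, recursive=False):
    """Return sorted image paths under folder, matching extensions case-insensitively."""
    pattern = "**/*" if recursive else "*"
    return sorted(
        str(p)
        for p in Path(folder).glob(pattern)
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    )


# Usage: image_paths = scan_images('./my_photos', recursive=True)
```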

Step 2: Search for similar images

import torch
import mobileclip
from PIL import Image
import numpy as np

def search_similar_images(query_image_path, 
                         index_file='index.npz', 
                         top_k=5,
                         model_name='mobileclip_s1'):
    """
    Search for similar images
    
    Args:
        query_image_path: path of the query image
        index_file: path of the index file
        top_k: number of results to return
        model_name: config name; must be the same model that built the index
    
    Returns:
        results: [(image_path, similarity_score), ...]
    """
    
    # Load the index
    print(f"📂 Loading index: {index_file}")
    data = np.load(index_file, allow_pickle=True)
    index_features = data['features']  # (N, 512)
    image_paths = data['paths'].tolist()
    
    print(f"✓ Loaded an index of {len(image_paths)} images")
    
    # Load the model
    print(f"\n📦 Loading model...")
    model, _, preprocess = mobileclip.create_model_and_transforms(
        model_name,
        pretrained=f'checkpoints/{model_name}.pt'
    )
    model.eval()
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model.to(device)
    
    # Extract the query image's features
    print(f"\n🔍 Extracting query image features...")
    query_img = Image.open(query_image_path).convert('RGB')
    query_tensor = preprocess(query_img).unsqueeze(0).to(device)
    
    with torch.no_grad():
        query_feat = model.encode_image(query_tensor)
        query_feat = query_feat / query_feat.norm(dim=-1, keepdim=True)
    
    query_feat = query_feat.cpu().numpy()  # (1, 512)
    
    # Cosine similarity (a matrix product, since both sides are L2-normalized)
    print(f"\n📊 Computing similarities...")
    similarities = np.dot(index_features, query_feat.T).ravel()  # (N,); ravel stays 1-D even when N == 1
    
    # Take the top-k
    top_indices = np.argsort(similarities)[::-1][:top_k]
    
    # Show the results
    print(f"\n{'='*60}")
    print(f"Query image: {query_image_path}")
    print(f"{'='*60}")
    print(f"\nTop-{top_k} most similar images:\n")
    
    results = []
    for i, idx in enumerate(top_indices):
        path = image_paths[idx]
        score = similarities[idx]
        print(f"{i+1}. {path}")
        print(f"   Cosine similarity: {score:.4f}\n")  # in [-1, 1], not a percentage
        results.append((path, float(score)))
    
    return results


# ========== Usage ==========
if __name__ == '__main__':
    results = search_similar_images(
        query_image_path='./query_cat.jpg',
        index_file='photo_index.npz',
        top_k=5,
        model_name='mobileclip_s1'
    )
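For large indexes, sorting all N scores with `np.argsort` does more work than needed. A NumPy sketch that selects the top-k with `np.argpartition` (linear time on average) and then sorts only those k; it flattens the scores so a one-image index still yields a 1-D array. `top_k_cosine` is a hypothetical helper and assumes both sides are already L2-normalized.

```python
import numpy as np


def top_k_cosine(index_features, query_feat, k=5):
    """Return (indices, scores) of the k most similar rows, best first.

    index_features: (N, D) L2-normalized; query_feat: (D,) or (1, D) L2-normalized.
    """
    scores = (index_features @ np.asarray(query_feat).reshape(-1)).ravel()  # (N,) cosine similarities
    k = min(k, scores.shape[0])
    if k == 0:
        return np.array([], dtype=int), np.array([], dtype=scores.dtype)
    candidates = np.argpartition(-scores, k - 1)[:k]     # the k largest scores, unordered
    order = candidates[np.argsort(-scores[candidates])]  # sort just those k, best first
    return order, scores[order]


# Usage: order, scores = top_k_cosine(index_features, query_feat, k=5)
```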

📊 Example 4: Visualize the search results

import matplotlib.pyplot as plt
from PIL import Image

def visualize_search_results(query_path, results, top_k=5, save_path='search_results.png'):
    """
    Visualize search results
    
    Args:
        query_path: path of the query image
        results: search results as [(image_path, score), ...]
        top_k: number of results to show
        save_path: where to save the figure
    """
    
    results = results[:top_k]
    n_panels = len(results) + 1  # the query plus however many results we actually have
    
    # Set up the figure (squeeze=False keeps axes 2-D even with a single panel)
    fig, axes = plt.subplots(1, n_panels, figsize=(3 * n_panels, 3), squeeze=False)
    axes = axes[0]
    
    # Show the query image
    query_img = Image.open(query_path).convert('RGB')
    axes[0].imshow(query_img)
    axes[0].set_title('Query Image', fontsize=12, fontweight='bold', color='red')
    axes[0].axis('off')
    
    # Show the search results
    for i, (img_path, score) in enumerate(results):
        try:
            img = Image.open(img_path).convert('RGB')
            axes[i+1].imshow(img)
            axes[i+1].set_title(
                f'#{i+1}\nScore: {score:.3f}', 
                fontsize=10,
                color='green' if score > 0.8 else 'blue'  # 0.8 is an arbitrary highlight threshold
            )
        except Exception as e:
            print(f"Could not load image {img_path}: {e}")
            axes[i+1].text(0.5, 0.5, 'Error', ha='center', va='center')
        axes[i+1].axis('off')
    
    plt.tight_layout()
    plt.savefig(save_path, dpi=150, bbox_inches='tight', facecolor='white')
    print(f"✓ Search results saved to {save_path}")
    plt.show()


# ========== Usage ==========
# search_similar_images() is the function from Example 3, step 2
if __name__ == '__main__':
    # Search first
    results = search_similar_images(
        query_image_path='./query_cat.jpg',
        index_file='photo_index.npz',
        top_k=5
    )
    
    # Then visualize
    visualize_search_results(
        query_path='./query_cat.jpg',
        results=results,
        top_k=5
    )

🎯 Example 5: A complete CLI tool

build_index.py - build the index

#!/usr/bin/env python3
"""
Build an image index
Usage: python build_index.py --images ./photos --output index.npz --model mobileclip_s1
"""

import argparse
import torch
import mobileclip
from PIL import Image
import numpy as np
from pathlib import Path
from tqdm import tqdm

def main():
    parser = argparse.ArgumentParser(description='Build an image index')
    parser.add_argument('--images', required=True, help='path to the image folder')
    parser.add_argument('--output', default='index.npz', help='output index file')
    parser.add_argument('--model', default='mobileclip_s1', 
                       choices=['mobileclip_s0', 'mobileclip_s1', 'mobileclip_s2', 
                               'mobileclip_b', 'mobileclip_blt'],
                       help='checkpoint to use (loaded from checkpoints/<model>.pt)')
    parser.add_argument('--batch-size', type=int, default=32, help='batch size')
    args = parser.parse_args()
    
    print("="*60)
    print("MobileCLIP image index builder")
    print("="*60)
    
    # Load the model. create_model_and_transforms expects the full config name
    # (e.g. 'mobileclip_s1'); B (LT) reuses the 'mobileclip_b' architecture.
    print(f"\n📦 Loading model: {args.model}")
    config_name = 'mobileclip_b' if args.model == 'mobileclip_blt' else args.model
    model, _, preprocess = mobileclip.create_model_and_transforms(
        config_name,
        pretrained=f'checkpoints/{args.model}.pt'
    )
    model.eval()
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model.to(device)
    print(f"✓ Using device: {device}")
    
    # Scan the images
    print(f"\n📂 Scanning image folder: {args.images}")
    image_paths = []
    for ext in ['*.jpg', '*.jpeg', '*.png', '*.webp', '*.bmp']:
        image_paths.extend(Path(args.images).glob(ext))
    image_paths = [str(p) for p in image_paths]
    print(f"✓ Found {len(image_paths)} images")
    
    if len(image_paths) == 0:
        print("❌ No images found; check the path")
        return
    
    # Extract features
    print(f"\n🎨 Extracting features (batch_size={args.batch_size})...")
    all_features = []
    valid_paths = []
    
    for i in tqdm(range(0, len(image_paths), args.batch_size)):
        batch_paths = image_paths[i:i+args.batch_size]
        batch_images = []
        batch_valid = []
        
        for img_path in batch_paths:
            try:
                img = Image.open(img_path).convert('RGB')
                img_tensor = preprocess(img)
                batch_images.append(img_tensor)
                batch_valid.append(img_path)
            except Exception as e:  # skip unreadable files (a bare except would also swallow Ctrl+C)
                print(f"⚠ Skipping {img_path}: {e}")
        
        if batch_images:
            batch_tensor = torch.stack(batch_images).to(device)
            with torch.no_grad():
                features = model.encode_image(batch_tensor)
                features = features / features.norm(dim=-1, keepdim=True)
            all_features.append(features.cpu().numpy())
            valid_paths.extend(batch_valid)
    
    if not all_features:
        print("❌ None of the images could be read")
        return
    
    # Concatenate and save
    features_matrix = np.vstack(all_features)
    
    print(f"\n💾 Saving index...")
    np.savez(args.output,
             features=features_matrix,
             paths=valid_paths,
             model_name=args.model)
    
    print(f"\n{'='*60}")
    print("✅ Index built!")
    print(f"{'='*60}")
    print(f"Output file: {args.output}")
    print(f"Feature matrix: {features_matrix.shape}")
    print(f"Processed: {len(valid_paths)} images")
    print(f"Failed: {len(image_paths) - len(valid_paths)} images")

if __name__ == '__main__':
    main()

search.py - search for similar images

#!/usr/bin/env python3
"""
Search for similar images
Usage: python search.py --query cat.jpg --index index.npz --top 5
"""

import argparse
import torch
import mobileclip
from PIL import Image
import numpy as np

def main():
    parser = argparse.ArgumentParser(description='Search for similar images')
    parser.add_argument('--query', required=True, help='path of the query image')
    parser.add_argument('--index', required=True, help='path of the index file')
    parser.add_argument('--top', type=int, default=5, help='number of results to return')
    parser.add_argument('--visualize', action='store_true', help='visualize the results')
    args = parser.parse_args()
    
    print("="*60)
    print("MobileCLIP image-to-image search")
    print("="*60)
    
    # Load the index
    print(f"\n📂 Loading index: {args.index}")
    data = np.load(args.index, allow_pickle=True)
    index_features = data['features']
    image_paths = data['paths'].tolist()
    model_name = str(data['model_name'])  # the checkpoint that built the index
    
    print(f"✓ Loaded {len(image_paths)} images")
    print(f"✓ Model: {model_name}")
    
    # Load the same model that built the index (B (LT) uses the 'mobileclip_b' config)
    print(f"\n📦 Loading model...")
    config_name = 'mobileclip_b' if model_name == 'mobileclip_blt' else model_name
    model, _, preprocess = mobileclip.create_model_and_transforms(
        config_name,
        pretrained=f'checkpoints/{model_name}.pt'
    )
    model.eval()
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model.to(device)
    
    # Extract the query features
    print(f"\n🔍 Analyzing query image: {args.query}")
    query_img = Image.open(args.query).convert('RGB')
    query_tensor = preprocess(query_img).unsqueeze(0).to(device)
    
    with torch.no_grad():
        query_feat = model.encode_image(query_tensor)
        query_feat = query_feat / query_feat.norm(dim=-1, keepdim=True)
    
    query_feat = query_feat.cpu().numpy()
    
    # Compute similarities
    print(f"\n📊 Computing similarities...")
    similarities = np.dot(index_features, query_feat.T).ravel()  # (N,), 1-D even when N == 1
    top_indices = np.argsort(similarities)[::-1][:args.top]
    
    # Show the results
    print(f"\n{'='*60}")
    print(f"Top-{args.top} most similar images:")
    print(f"{'='*60}\n")
    
    results = []
    for i, idx in enumerate(top_indices):
        path = image_paths[idx]
        score = similarities[idx]
        print(f"{i+1}. {path}")
        print(f"   Cosine similarity: {score:.4f}\n")
        results.append((path, float(score)))
    
    # Visualization (optional)
    if args.visualize:
        import matplotlib.pyplot as plt
        
        n_panels = len(results) + 1  # the query plus the results actually returned
        fig, axes = plt.subplots(1, n_panels, figsize=(3 * n_panels, 3), squeeze=False)
        axes = axes[0]
        
        # Query image
        axes[0].imshow(query_img)
        axes[0].set_title('Query', fontweight='bold')
        axes[0].axis('off')
        
        # Results
        for i, (path, score) in enumerate(results):
            img = Image.open(path).convert('RGB')
            axes[i+1].imshow(img)
            axes[i+1].set_title(f'#{i+1}: {score:.3f}')
            axes[i+1].axis('off')
        
        plt.tight_layout()
        plt.savefig('search_results.png', dpi=150, bbox_inches='tight')
        print(f"✓ Visualization saved: search_results.png")
        plt.show()

if __name__ == '__main__':
    main()

Usage

# Build the index
python build_index.py --images ./my_photos --output photos.npz --model mobileclip_s1

# Search for similar images
python search.py --query ./cat.jpg --index photos.npz --top 5

# Search and visualize
python search.py --query ./cat.jpg --index photos.npz --top 5 --visualize

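Performance Optimization Tips

⚡ 1. Use GPU acceleration

# Check for and use a GPU
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)

# Confirm which device is in use
print(f"Using device: {device}")
if torch.cuda.is_available():
    print(f"GPU name: {torch.cuda.get_device_name(0)}")
# On Apple silicon Macs, 'mps' (check torch.backends.mps.is_available()) is another option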

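⚡ 2. Batch processing (often several times faster than one image at a time, especially on a GPU; the exact gain depends on hardware and batch size)

# ❌ Slow: one image at a time
with torch.no_grad():
    for img_path in image_paths:
        tensor = preprocess(Image.open(img_path).convert('RGB')).unsqueeze(0).to(device)
        features = model.encode_image(tensor)

# ✅ Better: in batches
batch_size = 32
with torch.no_grad():
    for i in range(0, len(image_paths), batch_size):
        batch = [preprocess(Image.open(p).convert('RGB')) for p in image_paths[i:i+batch_size]]
        batch_tensor = torch.stack(batch).to(device)
        batch_features = model.encode_image(batch_tensor)  # one forward pass per batch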
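The batching loop above boils down to slicing a sequence into fixed-size chunks. A standard-library sketch of such a helper that also works on generators; Python 3.12 ships `itertools.batched` (which yields tuples), while this `batched` is our own function that yields lists and runs on older interpreters.

```python
from itertools import islice


def batched(iterable, n):
    """Yield lists of up to n items from iterable; the last batch may be shorter."""
    if n < 1:
        raise ValueError("n must be at least 1")
    it = iter(iterable)
    while batch := list(islice(it, n)):
        yield batch


# e.g. for paths in batched(image_paths, 32): preprocess, stack and encode each chunk
```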

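⚡ 3. Use mixed precision (FP16)

# Automatic mixed precision (AMP), CUDA GPUs only
# (older code uses torch.cuda.amp.autocast(), which recent PyTorch versions deprecate)
with torch.no_grad(), torch.autocast(device_type='cuda', dtype=torch.float16):
    image_features = model.encode_image(image_tensor)
image_features = image_features.float()  # back to FP32 before normalizing or storing

# On GPUs with Tensor Cores this is often noticeably faster and uses less activation memory;
# the actual gain depends on the GPU and the model, so measure it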

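⚡ 4. Disable gradient tracking

# Always use this for inference
with torch.no_grad():  # torch.inference_mode() also works and can be slightly faster
    image_features = model.encode_image(image_tensor)

# Saves memory and compute, because no autograd graph is built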

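⚡ 5. Precompute and cache the features

# Build the index once (extract_all_features is a hypothetical helper, e.g. built on extract_features_batch from Example 2)
features = extract_all_features(image_folder)
np.save('features_cache.npy', features)

# Afterwards, just load it
features = np.load('features_cache.npy')

# Avoids re-extracting features on every run (re-extract when the model or the images change)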
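If the cache grows large, the normalized features can be stored as float16 to halve its size; cosine scores computed from them differ only slightly from float32. A NumPy sketch of that trade-off on random unit vectors (synthetic data, not MobileCLIP features); whether the error is acceptable should be checked on your own index.

```python
import numpy as np

rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 512)).astype(np.float32)
feats /= np.linalg.norm(feats, axis=1, keepdims=True)  # L2-normalize like the examples

feats16 = feats.astype(np.float16)  # half the bytes on disk and in RAM
print(feats.nbytes, feats16.nbytes)  # 2048000 1024000

query = feats[0]
exact = feats @ query
approx = feats16.astype(np.float32) @ query  # upcast before the dot product
print(float(np.abs(exact - approx).max()))   # small: well below 1e-3 for unit vectors
```

To apply it to the cache above, save `features.astype(np.float16)` and call `.astype(np.float32)` after loading.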

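⚡ 6. Use a DataLoader (large datasets)

import torch
from PIL import Image
from torch.utils.data import Dataset, DataLoader

class ImageDataset(Dataset):
    def __init__(self, image_paths, transform):
        self.image_paths = image_paths
        self.transform = transform
    
    def __len__(self):
        return len(self.image_paths)
    
    def __getitem__(self, idx):
        # Decoding and preprocessing run in the worker processes
        img = Image.open(self.image_paths[idx]).convert('RGB')
        return self.transform(img), self.image_paths[idx]

# Use the DataLoader (num_workers loads images in parallel; pin_memory speeds up CPU→GPU copies)
dataset = ImageDataset(image_paths, preprocess)
dataloader = DataLoader(dataset, batch_size=32, num_workers=4, pin_memory=True)

all_features, all_paths = [], []
with torch.no_grad():
    for images, paths in dataloader:
        images = images.to(device)
        features = model.encode_image(images)
        features = features / features.norm(dim=-1, keepdim=True)
        all_features.append(features.cpu())
        all_paths.extend(paths)
features = torch.cat(all_features)  # (N, 512)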

Troubleshooting

🐛 Error 1: Forgetting the batch dimension

# ❌ Wrong
image_tensor = preprocess(image)  # shape: (3, H, W)
features = model.encode_image(image_tensor)  # fails or misbehaves: the model expects (B, 3, H, W)

# ✅ Correct
image_tensor = preprocess(image).unsqueeze(0)  # shape: (1, 3, H, W)
features = model.encode_image(image_tensor)

🐛 Error 2: Forgetting to convert to RGB

# ❌ Wrong (PNGs may be RGBA, grayscale images are mode L)
image = Image.open('image.png')
features = model.encode_image(preprocess(image).unsqueeze(0))  # may fail: 4 or 1 channels instead of 3

# ✅ Correct
image = Image.open('image.png').convert('RGB')  # force RGB
features = model.encode_image(preprocess(image).unsqueeze(0))

🐛 Error 3: Forgetting L2 normalization

# ❌ Wrong (a raw dot product, not cosine similarity)
features = model.encode_image(image_tensor)
similarity = features @ features.T

# ✅ Correct
features = model.encode_image(image_tensor)
features = features / features.norm(dim=-1, keepdim=True)  # L2 normalization
similarity = features @ features.T  # now a proper cosine similarity in [-1, 1]
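A tiny worked example of why the normalization matters: two vectors pointing in the same direction but with different lengths have a raw dot product that grows with their lengths, while their cosine similarity is 1. Plain Python, no MobileCLIP involved.

```python
import math


def l2_normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


a, b = [3.0, 4.0], [6.0, 8.0]  # same direction, b is twice as long
print(dot(a, b))                # 50.0: raw dot product, depends on vector length
print(dot(l2_normalize(a), l2_normalize(b)))  # cosine similarity, ≈ 1.0
```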

🐛 Error 4: Device mismatch

# ❌ Wrong
model.to('cuda')
image_tensor = preprocess(image).unsqueeze(0)  # on the CPU
features = model.encode_image(image_tensor)  # error: tensors are on different devices

# ✅ Correct
model.to(device)
image_tensor = preprocess(image).unsqueeze(0).to(device)  # move to the same device
features = model.encode_image(image_tensor)

🐛 Error 5: Out of memory (OOM)

# Fix 1: reduce the batch size
batch_size = 16  # was 32

# Fix 2: release cached GPU memory (only frees blocks PyTorch has cached but no longer uses)
torch.cuda.empty_cache()

# Fix 3: disable gradient tracking (no autograd activations are kept)
with torch.no_grad():
    features = model.encode_image(image_tensor)

# Fix 4: use the CPU
device = 'cpu'  # slower, but not limited by GPU memory

🐛 Issue 6: Image fails to load

# Robust image loading
from PIL import Image

def load_image_safely(image_path):
    try:
        img = Image.open(image_path).convert('RGB')
        return img
    except Exception as e:
        print(f"⚠ Could not read {image_path}: {e}")
        return None

# Usage (extract_features is a hypothetical helper wrapping preprocess + encode_image)
img = load_image_safely('image.jpg')
if img is not None:
    features = extract_features(img)

🔧 Debugging tips

# 1. Check tensor shapes
print(f"Image tensor shape: {image_tensor.shape}")  # should be (1, 3, H, W)
print(f"Features shape: {features.shape}")  # should be (1, 512)

# 2. Check that each feature vector is normalized
print(f"Feature norms: {features.norm(dim=-1)}")  # each should be close to 1.0

# 3. Check devices
print(f"Model device: {next(model.parameters()).device}")
print(f"Tensor device: {image_tensor.device}")

# 4. Visualize the similarity matrix (most useful with a batch of several images)
import matplotlib.pyplot as plt
similarity_matrix = features @ features.T
plt.imshow(similarity_matrix.detach().cpu().numpy())
plt.colorbar()
plt.title('Similarity Matrix')
plt.show()
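The checks above can be bundled into one helper that fails loudly instead of relying on reading printouts. A NumPy sketch; `check_features` is a hypothetical name, the expected dimension of 512 follows the shapes shown in this guide, and PyTorch tensors would be passed as `features.cpu().numpy()`.

```python
import numpy as np


def check_features(features, dim=512, atol=1e-3):
    """Raise ValueError unless features is an (N, dim) array with unit-norm rows."""
    features = np.asarray(features)
    if features.ndim != 2 or features.shape[1] != dim:
        raise ValueError(f"unexpected shape {features.shape}, expected (N, {dim})")
    norms = np.linalg.norm(features, axis=1)  # one norm per row, not one for the whole batch
    if not np.allclose(norms, 1.0, atol=atol):
        raise ValueError(f"row norms range from {norms.min():.4f} to {norms.max():.4f}")
    return True


# Usage: check_features(features.cpu().numpy())
```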

📝 Full test script

Save the following code as test_mobileclip.py:

#!/usr/bin/env python3
"""
MobileCLIP full test script
"""

import torch
import mobileclip
from PIL import Image
import numpy as np

def test_single_image():
    """Test single-image feature extraction"""
    print("\n" + "="*60)
    print("Test 1: single-image feature extraction")
    print("="*60)
    
    # Load the model
    model, _, preprocess = mobileclip.create_model_and_transforms(
        'mobileclip_s1',
        pretrained='checkpoints/mobileclip_s1.pt'
    )
    model.eval()
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model.to(device)
    
    print(f"✓ Model loaded on {device}")
    
    # Test image
    image_path = "test.jpg"  # replace with your own image
    image = Image.open(image_path).convert('RGB')
    
    print(f"✓ Image size: {image.size}")
    
    # Extract features
    image_tensor = preprocess(image).unsqueeze(0).to(device)
    
    with torch.no_grad():
        features = model.encode_image(image_tensor)
        features = features / features.norm(dim=-1, keepdim=True)
    
    print(f"✓ Feature shape: {features.shape}")
    print(f"✓ Feature norm: {torch.norm(features).item():.4f}")
    print(f"✓ First 5 dims: {features[0, :5]}")


def test_similarity():
    """Test similarity computation"""
    print("\n" + "="*60)
    print("Test 2: similarity computation")
    print("="*60)
    
    # Load the model
    model, _, preprocess = mobileclip.create_model_and_transforms(
        'mobileclip_s1',
        pretrained='checkpoints/mobileclip_s1.pt'
    )
    model.eval()
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model.to(device)
    
    # Two test images
    image1 = Image.open("test1.jpg").convert('RGB')
    image2 = Image.open("test2.jpg").convert('RGB')
    
    # Extract features
    tensor1 = preprocess(image1).unsqueeze(0).to(device)
    tensor2 = preprocess(image2).unsqueeze(0).to(device)
    
    with torch.no_grad():
        feat1 = model.encode_image(tensor1)
        feat2 = model.encode_image(tensor2)
        
        feat1 = feat1 / feat1.norm(dim=-1, keepdim=True)
        feat2 = feat2 / feat2.norm(dim=-1, keepdim=True)
    
    # Compute the similarity
    similarity = (feat1 @ feat2.T).item()
    
    print(f"✓ Image 1 features: {feat1.shape}")
    print(f"✓ Image 2 features: {feat2.shape}")
    print(f"✓ Cosine similarity: {similarity:.4f}")


def test_batch_processing():
    """Test batch processing"""
    print("\n" + "="*60)
    print("Test 3: batch processing")
    print("="*60)
    
    # Load the model
    model, _, preprocess = mobileclip.create_model_and_transforms(
        'mobileclip_s1',
        pretrained='checkpoints/mobileclip_s1.pt'
    )
    model.eval()
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model.to(device)
    
    # A batch of images
    image_paths = ["test1.jpg", "test2.jpg", "test3.jpg"]
    images = [Image.open(p).convert('RGB') for p in image_paths]
    
    # Batch processing
    batch_tensor = torch.stack([preprocess(img) for img in images]).to(device)
    
    print(f"✓ Batch shape: {batch_tensor.shape}")
    
    with torch.no_grad():
        batch_features = model.encode_image(batch_tensor)
        batch_features = batch_features / batch_features.norm(dim=-1, keepdim=True)
    
    print(f"✓ Batch features shape: {batch_features.shape}")
    print(f"✓ Per-image feature norms: {torch.norm(batch_features, dim=-1)}")


def main():
    print("\n" + "🚀"*30)
    print("MobileCLIP full test")
    print("🚀"*30)
    
    try:
        test_single_image()
    except Exception as e:
        print(f"❌ Test 1 failed: {e}")
    
    try:
        test_similarity()
    except Exception as e:
        print(f"❌ Test 2 failed: {e}")
    
    try:
        test_batch_processing()
    except Exception as e:
        print(f"❌ Test 3 failed: {e}")
    
    print("\n" + "✅"*30)
    print("Tests finished!")
    print("✅"*30 + "\n")


if __name__ == '__main__':
    main()

Run the tests:

python test_mobileclip.py

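🎓 Suggested learning path

Stage 1: Basics (1-2 days)

✅ Set up the environment and download the models
✅ Run Example 1 (single image)
✅ Understand what preprocess and unsqueeze do
✅ Try different models (S0, S1, S2)

Stage 2: Hands-on (3-5 days)

✅ Implement Example 2 (batch processing)
✅ Implement Example 3 (image-to-image search)
✅ Build an index of your own images
✅ Test the search

Stage 3: Optimization (2-3 days)

✅ Build the CLI tools
✅ Optimize performance (GPU, batching)
✅ Visualize the search results
✅ Error handling and robustness

Stage 4: Preparing for Android (3-5 days)

✅ Model conversion (TorchScript)
✅ Quantization tests (INT8)
✅ CPU performance tests
✅ Write the porting documentation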

📚 References

Official GitHub: https://github.com/apple/ml-mobileclip
Paper: MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training (CVPR 2024)
Hugging Face models: MobileCLIP Collection
PyTorch documentation: https://pytorch.org/docs/stable/index.html

💡 Quick reference

Core code snippets

import torch
import mobileclip
from PIL import Image

# Load the model
model, _, preprocess = mobileclip.create_model_and_transforms(
    'mobileclip_s1', pretrained='checkpoints/mobileclip_s1.pt'
)
model.eval()
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model.to(device)

# Single image
image = Image.open('cat.jpg').convert('RGB')
tensor = preprocess(image).unsqueeze(0).to(device)
with torch.no_grad():
    features = model.encode_image(tensor)
    features = features / features.norm(dim=-1, keepdim=True)

# Similarity (features1, features2: two L2-normalized (1, 512) feature tensors)
similarity = (features1 @ features2.T).item()

Good luck! Come back to this guide whenever you have questions 🚀


Jason's Notes