index

2024.08.24
in Tech

PostgreSQL: Clearing a Column's Data Made the Database Take Up More Disk Space

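A database that originally occupied 25.921 GB on disk grew to 33.478 GB after I cleared the contents of one column (SET foo='') across roughly 10 million rows. The update took 1703 seconds. The growth comes from PostgreSQL's MVCC design: an UPDATE writes a new version of each row and leaves the old version behind as a dead tuple until it is vacuumed.

PostgreSQL has a VACUUM command, so I tried it, and sure enough disk usage dropped to 24.8 GB (took 1106 seconds).

VACUUM FULL cleans up most thoroughly because it rewrites the table and returns the freed space to the operating system, whereas plain VACUUM mostly just marks dead space as reusable. It is slow, though, and it locks the table, so use it with care on a production site.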

VACUUM FULL
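
Why the files grow can be pictured with a toy model. The sketch below is not PostgreSQL's storage engine; `ToyHeap` and its methods are hypothetical names invented for illustration. It only mimics the idea that an UPDATE appends a new row version and leaves the old one behind as a dead tuple until a full rewrite discards it.

```python
class ToyHeap:
    """Toy table storage: every row version occupies one slot."""

    def __init__(self):
        # Each slot is [row_id, value, is_live].
        self.slots = []

    def insert(self, row_id, value):
        self.slots.append([row_id, value, True])

    def update_all(self, value):
        # Like an MVCC UPDATE: mark the old version dead, append a new one.
        live = [slot for slot in self.slots if slot[2]]
        for slot in live:
            slot[2] = False
            self.slots.append([slot[0], value, True])

    def vacuum_full(self):
        # Like VACUUM FULL: rewrite the table keeping only live versions.
        self.slots = [slot for slot in self.slots if slot[2]]

    def size(self):
        return len(self.slots)


heap = ToyHeap()
for row_id in range(3):
    heap.insert(row_id, "some text")

before = heap.size()
heap.update_all("")  # the toy equivalent of SET foo=''
after_update = heap.size()
heap.vacuum_full()
after_vacuum = heap.size()
print(before, after_update, after_vacuum)  # prints: 3 6 3
```

In the toy model the storage doubles because every row is rewritten. The real database grew less than that, presumably because some space could be reused, but the direction of the effect is the same.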

2024.08.16
in Tech

Making Thumbnails in Python (Pillow/PIL)

Pillow (PIL Fork) 10.4.0 documentation

Usage:

Read image and make thumbnails

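from PIL import Image

# img: source image path; i[1]: (max_width, max_height) tuple from the caller (not shown)
thumb = Image.open(img)
# thumbnail() resizes in place and preserves the aspect ratio
thumb.thumbnail(i[1], Image.LANCZOS)
# JPEG cannot store alpha; enable this for RGBA or palette images
# thumb = thumb.convert('RGB')
thumb.save(target_path, "JPEG")
thumb.close()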
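
Since `thumbnail()` takes a bounding box rather than an exact output size, it can help to see how the final dimensions come about. The sketch below is an approximation written for illustration, not Pillow's own code; `fit_within` is a hypothetical helper, and Pillow's internal rounding may differ by a pixel in edge cases.

```python
def fit_within(size, max_size):
    """Largest (width, height) with the same aspect ratio that fits
    inside max_size, never upscaling the original."""
    width, height = size
    max_width, max_height = max_size
    # The tighter of the two constraints decides the scale; cap at 1.0
    # so that small images are left untouched.
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within((4000, 3000), (300, 300)))  # landscape photo
print(fit_within((100, 50), (300, 300)))     # already small enough
```

The first call yields a 300 x 225 box and the second leaves the size at 100 x 50, which matches the behaviour of shrinking only and keeping proportions.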

For the resampling algorithms, see the comparison chart below. Compare Filters

screenshot via: Filters

Pillow-SIMD

Uploadcare provides a SIMD-accelerated build of Pillow: uploadcare/pillow-simd: The friendly PIL fork

Benchmarks: Pillow Performance

Linux

CPU: Intel Celeron N4505 2.0GHz

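Installing pillow-simd only succeeds after libjpeg-dev and zlib1g-dev are installed. Running python afterwards, however, fails with an "illegal hardware instruction" error, most likely because the build below is compiled with -mavx2 and the Celeron N4505 does not support AVX2.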

Linux

# install requirements
sudo apt install libjpeg-dev zlib1g-dev

# install pillow-simd
CC="cc -mavx2" pip install -U --force-reinstall pillow-simd
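
Before building with `-mavx2`, it may be worth checking whether the CPU advertises AVX2 at all. The sketch below assumes an x86 Linux machine where `/proc/cpuinfo` has a `flags` line; `cpu_flags` and `supports_avx2` are hypothetical helper names, not part of Pillow or pillow-simd.

```python
def cpu_flags(cpuinfo_text):
    """Collect the feature flags listed in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    return flags


def supports_avx2(cpuinfo_text):
    return "avx2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print("AVX2 supported:", supports_avx2(f.read()))
    except FileNotFoundError:
        print("/proc/cpuinfo not found (probably not Linux)")
```

If this reports no AVX2 support, installing pillow-simd without the `CC="cc -mavx2"` override is the safer choice.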

MacOS

MacBook Pro: 3.1GHz Intel Core i7

After installing jpeg with brew, pillow-simd installed successfully and runs without problems.

MacOS

# install requirements
brew install jpeg
# install pillow-simd
pip install pillow-simd

Summary

It does feel somewhat faster. I hope to find time to run a proper benchmark.
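
A benchmark could start from a small timing helper like the one below. This is only a sketch: `best_time` and `dummy_workload` are hypothetical names, and the workload is a stand-in so the code runs without image files. Since pillow-simd is a drop-in replacement that installs under the same `PIL` package name, the comparison would have to run the same script in two separate virtual environments.

```python
import time


def best_time(func, repeat=5):
    """Run func() `repeat` times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def dummy_workload():
    # Placeholder for the real work, e.g. making one thumbnail.
    sum(i * i for i in range(10_000))


elapsed = best_time(dummy_workload, repeat=3)
print(f"best of 3: {elapsed:.6f} s")
```

Taking the minimum over several runs reduces noise from other processes, which matters on a small CPU like the Celeron above.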

2024.06.25
in Tech

Using SQLite3 Full Text Search (fts5) in Biology Taxonomy

Full Text Search In SQLite
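
As a rough illustration of the idea, the sketch below indexes a few taxon names with FTS5 through Python's built-in `sqlite3` module. It assumes the SQLite library bundled with Python was compiled with FTS5 (common, but not guaranteed); the table name `taxon`, its columns, and the sample rows are made up for this example.

```python
import sqlite3

# Illustrative rows: (scientific name, common name).
SAMPLE_TAXA = [
    ("Quercus glauca", "ring-cupped oak"),
    ("Quercus variabilis", "Chinese cork oak"),
    ("Cinnamomum camphora", "camphor tree"),
]


def build_index(taxa):
    """Create an in-memory FTS5 table and load the taxon names into it."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE taxon USING fts5(scientific_name, common_name)"
    )
    conn.executemany("INSERT INTO taxon VALUES (?, ?)", taxa)
    return conn


def search(conn, query):
    """Return scientific names matching an FTS5 query, best match first."""
    rows = conn.execute(
        "SELECT scientific_name FROM taxon WHERE taxon MATCH ? ORDER BY rank",
        (query,),
    )
    return [row[0] for row in rows]


conn = build_index(SAMPLE_TAXA)
print(search(conn, "quercus*"))  # prefix search on the genus
print(search(conn, "camphor"))   # whole-word match in any column
```

The trailing `*` turns a term into a prefix query, which suits typing a partial genus or species name.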

2024.06.20
in Research

Collection Management Systems (TDWG 2020 Symposium)

TDWG 2020 Challenges of alignment of collection management sys. across globe & diff. domains - SYM04 - YouTube

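Some quick notes on the TDWG 2020 (online) discussion of natural history collection management systems. The discussion is already four years old and some of the systems no longer seem to be maintained, but it is still a useful reference.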

RECODE

Vince Smith

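The solution proposed by the Natural History Museum (NHM) in the UK: a complete data model with a strong emphasis on Linked Data. Very inspiring.

Multi-user collaboration? recode::community curation

NHM data workflow recode::collection object

NHM's links to the rest of the world recode::collection object

The key to the RECODE data model: CollectionObject recode::collection object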

Kotka

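Finland's natural history collection management system. It emphasizes being simple and flexible and does not use a relational database, much like a startup that fires while still on the move.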

Mikko Heikkinen

Collection Management System | Suomen Lajitietokeskus

focus on 80/20-rule, flexibility and simplicity
not focus on comprehensive data model (denormalized data)

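There was no good existing system and developers were available, so they built their own. kotka::Background

For use across organizations, the system has to stay simple while remaining flexible, because the non-technical problems are troublesome enough already. kotka::Simple and Flexible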

Symbiota

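A decentralized system is impressive, but it seems a lot of effort would go into synchronization between systems. I wonder whether this only works in a country like the US, with its large population, territory, and resources?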

Edward Gilbert

decentralized data network (isolated decentralized network of mini-aggregators)
live-managed Vs. snapshot

Specify

The new version (Specify 7) moved to the web, which is a major breaking change.

Community-Driven decision making

DINA

Not sure whether it has been discontinued; it does not seem very active.

(DIgital Information system for NAtural history data)

DINA: Open Source and Open Services - A Modern Approach for Sustainable Natural History Collection Management Systems

web-based modules; through the API, components can be modified or replaced by other components

Meeting In-Between: Moving beyond the buzz, bottlenecks, and bubble to collaboratively develop digitization tooling

A great summary, but I cannot absorb any more for now.

Matt Yoder

Digital Specimens in TaxonWorks

Name	Start	Maintainer	Status	Tech stack
RECODE	2022	NHM	good model concept	
Kotka	2012	Finnish Museum of Natural History Luomus	2020: 2.5 million specimens, 12 institutions	PHP, Zend
Symbiota	2008	Arizona State University	2020: 50-60 public portals	PHP
Specify		Specify Collections Consortium		
DINA	2014	RBGE?	not available (2024)	web

2024.06.16
in Exhibition

Battle City (戰鬥之城) - Chang Li-Ren (張立人), National Taipei University of Education (國北師)

戰鬥之城．終 | 北師美術館

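It was crowded and there was a queue; luckily I had a reservation.

The small models are cool, and the artist clearly knows what he wants to express, but it feels oversimplified and the concepts are not very new. The animation is impressive.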

2024.06.14
in Learn

Design Systems for Public Transportation

Spotted in the Smashing Newsletter: design systems for public transportation. Interesting.

Gestalte die Zukunft der BVG (design system for public transportation in Berlin)
Design System (Notion hub)
Documentation - BDS - Documentation (Figma)
All about SBB digital applications. (Swiss railway company)
Riktlinjer – SJs designsystem för digitala produkter & tjänster (Swedish railway design system)
Katarina Blind — HSL (Helsinki’s public transportation service)
Home – Transport for West Midlands Design System (Transport for West Midlands)
Ruter Design System (Journey Planner)

2024.06.08
in Research

Research of Digitizing Herbarium

Case1: Oklahoma State University Herbarium

Search portal (Symbiota)

Select dataset:

Filter by taxon:

Other Filters:

Filter results:

Species page:

Specimen page:

Digitizing Process

Imaging:

Transcription of label and image information by volunteers and student workers

upload images to Notes from Nature — Zooniverse for volunteers to help transcribe.

ref: Digitizing Herbariums for Future Historians - YouTube

Case2: University of Alaska Herbarium (ALA)

University of Alaska Herbarium (ALA): Documenting Alaska's flora at the crossroads of Beringia - YouTube

ArctosDB

2024.05.30
in Learn

HUNG-YI LEE (李宏毅)

start: April 2023 2024-05-30

https://www.youtube.com/watch?v=fegAeph9UaA&list=PLJV_el3uVTsPy9oCRY30oBPNLCo89yu49

ML Lecture 0: Intro

AI (the goal) → Machine Learning (the means) → Deep Learning (one approach to ML)
AI ⇒ innate instincts endowed by humans
Machine Learning ≈ Looking for a Function from Data
Framework
Step1: define a set of functions ⇒ Model
Step2: goodness of function
Step3: pick the best function
Learning Map
Supervised Learning
1. Regression [task]: the output of the target function f is a "scalar" (a numeric value)
2. Classification [task]
3. Linear Model [method]
4. Non-linear Model
  - Deep Learning [method]
  - SVM, decision tree, K-NN... [method]
5. Structured Learning
  - Beyond Classification
Semi-supervised Learning
- Labelled + Unlabeled data
Transfer Learning
- Labelled data + Data not related to the task considered (can be either labeled or unlabeled)
- ex: given unrelated images, how can they still help learning?
Unsupervised Learning
- learning without a teacher
Reinforcement Learning
- Supervised vs. Reinforcement Learning: learning from a teacher vs. learning from critics (closer to how humans learn)
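
The three-step framework in the notes can be sketched on a tiny regression task. Everything below is an illustrative assumption rather than code from the lecture: the data points are made up (they follow y = 2x + 1 exactly), the function set is a linear model, goodness is mean squared error, and the best function is picked with plain gradient descent.

```python
# Toy data, made up for illustration: y = 2x + 1.
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]


# Step 1: define a set of functions (the model): f(x) = w * x + b.
def model(w, b, x):
    return w * x + b


# Step 2: goodness of a function: mean squared error on the data.
def loss(w, b):
    return sum((model(w, b, x) - y) ** 2 for x, y in data) / len(data)


# Step 3: pick the best function, here by plain gradient descent.
def train(steps=5000, lr=0.05):
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (model(w, b, x) - y) * x for x, y in data) / n
        grad_b = sum(2 * (model(w, b, x) - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


w, b = train()
print(round(w, 3), round(b, 3), loss(w, b))
```

The learned parameters end up close to w = 2 and b = 1 with a loss near zero. Swapping the model in Step 1 for a richer function set while keeping Steps 2 and 3 is the move from linear models toward deep learning.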