
2024

Collection Management Systems (TDWG 2020 Symposium)

TDWG 2020 Challenges of alignment of collection management sys. across globe & diff. domains - SYM04 - YouTube

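Some quick notes on the TDWG 2020 (online) discussion of natural history collection management systems. The discussion is already four years old and some of the systems no longer seem to be maintained, but it is still a useful reference.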

RECODE

Vince Smith

The solution proposed by the Natural History Museum (NHM) in the UK: a complete data model with a strong emphasis on Linked Data. Very inspiring.

Multi-user collaboration? recode::community curation

NHM data workflow recode::collection object

NHM's connections to the rest of the world recode::collection object

The key to the RECODE data model: CollectionObject recode::collection object

Kotka

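Finland's natural history collection management system. It emphasizes being simple and flexible and does not use a relational database, much like a startup that shoots while moving (iterating as it goes).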

Mikko Heikkinen

Collection Management System | Suomen Lajitietokeskus

  • focus on 80/20-rule, flexibility and simplicity
  • no focus on a comprehensive data model (denormalized data)
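
To make "denormalized data" concrete, here is a minimal sketch in Python. It is not Kotka's actual schema; every field name and value below is a hypothetical illustration. The point is only that a whole specimen lives in one self-contained document, so adding a field does not require a schema migration.

```python
import json

# One denormalized specimen document.
# All field names and values are hypothetical, not Kotka's real schema.
specimen = {
    "id": "EXAMPLE-0001",
    "collection": "Example Herbarium",
    "taxon": "Betula pendula",
    "collector": "A. Collector",
    "locality": {"country": "Finland", "municipality": "Helsinki"},
    "events": [{"type": "determination", "by": "B. Botanist", "year": 2019}],
}


def add_field(record, key, value):
    """Return a copy of the record with one extra top-level field."""
    updated = dict(record)
    updated[key] = value
    return updated


# A new requirement ("track loans") becomes a new key, nothing more.
with_loan = add_field(specimen, "loan_status", "on loan")
print(json.dumps(with_loan, indent=2))
```

The trade-off is that consistency across records (for example, spelling of collector names) has to be handled by convention or tooling rather than by foreign keys.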

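There was no suitable existing system and developers were available, so they built their own. kotka::Background

The system has to be used across organizations, so it must stay simple and flexible; the non-technical problems are troublesome enough already. kotka::Simple and Flexible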

Symbiota

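The decentralized design is impressive, but it seems to take a lot of effort to keep the systems in sync. I wonder whether it only works in a country with the population, size and resources of the United States.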

Edward Gilbert

  • decentralized data network (isolated decentralized network of mini-aggregators)
  • live-managed vs. snapshot

Specify

The new version (Specify 7) moved to the web, which is a major breaking change.

  • Community-Driven decision making

DINA

Not sure whether it has been discontinued; it does not seem very active.

(DIgital Information system for NAtural history data)

DINA: Open Source and Open Services - A Modern Approach for Sustainable Natural History Collection Management Systems

  • web-based modules that communicate through APIs; components can be modified or replaced by other components
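
The "replaceable component" idea can be sketched as follows. This is not DINA's code or API; the `TaxonomyService` interface, both implementations, and `describe_specimen` are hypothetical names made up for illustration. The caller depends only on the interface, so one module can be swapped for another.

```python
from typing import Protocol


class TaxonomyService(Protocol):
    """Hypothetical interface that any taxonomy module must satisfy."""

    def lookup(self, name: str) -> dict: ...


class LocalTaxonomy:
    """Answers from an in-memory set of accepted names."""

    def __init__(self, accepted_names):
        self._accepted = set(accepted_names)

    def lookup(self, name: str) -> dict:
        return {"name": name, "accepted": name in self._accepted, "source": "local"}


class StubRemoteTaxonomy:
    """Stand-in for a module that would call a remote API (no network here)."""

    def lookup(self, name: str) -> dict:
        return {"name": name, "accepted": True, "source": "remote-stub"}


def describe_specimen(taxon_name: str, taxonomy: TaxonomyService) -> str:
    """The caller only relies on lookup(), not on a concrete module."""
    result = taxonomy.lookup(taxon_name)
    status = "accepted" if result["accepted"] else "unknown"
    return f"{result['name']} ({status}, via {result['source']})"


print(describe_specimen("Betula pendula", LocalTaxonomy(["Betula pendula"])))
print(describe_specimen("Betula pendula", StubRemoteTaxonomy()))
```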

Meeting In-Between: Moving beyond the buzz, bottlenecks, and bubble to collaboratively develop digitization tooling

A great summary, but I can't absorb any more for now.

Matt Yoder

  • Digital Specimens in TaxonWorks
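
| Name     | Start | Maintainer                                | Status               | Tech stack | Stats                                           |
|----------|-------|-------------------------------------------|----------------------|------------|-------------------------------------------------|
| RECODE   | 2022  | NHM                                       | good model concept   |            |                                                 |
| Kotka    | 2012  | Finnish Museum of Natural History Luomus  |                      | PHP, Zend  | 2020: 2.5 million specimens, 12 institutions    |
| Symbiota | 2008  | Arizona State University                  |                      | PHP        | 2020: 50-60 public portals                      |
| Specify  |       | Specify Collections Consortium            |                      |            |                                                 |
| DINA     | 2014  | RBGE?                                     | not available (2024) | web        |                                                 |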

Design System For Public Transportation

Spotted a design system for public transportation in the Smashing Newsletter. Interesting.

Research of Digitizing Herbarium

Case1: Oklahoma State University Herbarium

Search portal (Symbiota)

Select dataset: okla-symbiota-filter1.png

Filter by taxon: okla-symbiota-filter2.png

Other Filters: okla-symbiota-filter3.png

Filter results: okla-symbiota-filter-result.png

Species page: okla-symbiota-species.png

Specimen page: okla-symbiota-specimen.png

Digitizing Process

Imaging: okla-imaging.png

Label and image information is transcribed by volunteers and student workers.

Images are uploaded to Notes from Nature (Zooniverse) so that volunteers can help transcribe them.

ref: Digitizing Herbariums for Future Historians - YouTube

Case2: University of Alaska Herbarium (ALA)

ArctosDB

HUNG-YI LEE (李宏毅)

start: April 2023 2024-05-30

https://www.youtube.com/watch?v=fegAeph9UaA&list=PLJV_el3uVTsPy9oCRY30oBPNLCo89yu49

ML Lecture 0: Intro

  • AI (the goal) → Machine Learning (the means) → Deep Learning (one approach to ML)
  • AI ⇒ instincts endowed by humans
  • Machine Learning ≈ Looking for a Function from Data
  • Framework
    • Step1: define a set of functions ⇒ Model
    • Step2: goodness of function
    • Step3: pick the best function
  • Learning Map
  • Supervised Learning
    1. Regression [task]: the output of the target function f is a "scalar" (a numeric value)
    2. Classification [task]
    3. Linear Model [method]
    4. Non-linear Model
      • Deep Learning [method]
      • SVM, decision tree, K-NN... [method]
    5. Structured Learning
      • Beyond Classification
  • Semi-supervised Learning
    • Labelled + Unlabeled data
  • Transfer Learning
    • Labelled data + Data not related to the task considered (can be either labeled or unlabeled)
    • e.g., how can unrelated images still help learning?
  • Unsupervised Learning
    • learning without a teacher
  • Reinforcement Learning
    • Supervised vs. Reinforcement Learning: learning from a teacher vs. learning from critics (the latter is closer to how humans learn)
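
The three-step framework above can be sketched with the simplest possible case, one-variable linear regression. This is my own minimal illustration in plain Python, not code from the lecture; the toy data, learning rate, and step count are assumptions chosen so that the example converges.

```python
# Step 1: the model is the set of functions f(x) = w * x + b.
def predict(w, b, x):
    return w * x + b


# Step 2: goodness of a function = mean squared error on the training data.
def loss(w, b, data):
    return sum((y - predict(w, b, x)) ** 2 for x, y in data) / len(data)


# Step 3: pick the best function with plain gradient descent.
def fit(data, lr=0.05, steps=2000):
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_w = sum(-2 * x * (y - predict(w, b, x)) for x, y in data) / n
        grad_b = sum(-2 * (y - predict(w, b, x)) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (an assumption for illustration).
data = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]
w, b = fit(data)
print(f"w={w:.3f}, b={b:.3f}, loss={loss(w, b, data):.6f}")
```

Because the output is a number, this is a regression task in the learning map; swapping the model set in Step 1 for a neural network is what turns the same framework into deep learning.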

ML Lecture 1: Regression

Andrew Ng

Deep Learning Specialization [5 courses] (DeepLearning.AI) | Coursera

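Python programs packaged as Windows exe files get flagged as viruses

The answers on Stack Overflow all say that building the PyInstaller bootloader yourself solves this, so I followed: Building the Bootloader (PyInstaller 6.3.0 documentation)

Build PyInstaller bootloader

1. Prepare the build environment (GCC...)

I tried building MinGW myself and ran into a problem where the static zlib could not be found.

I then found that this package (WinLibs - GCC+MinGW-w64 compiler for Windows) works out of the box, which is much more convenient.

(MSYS2 might also work; I have not tried it yet.)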

2. Build from PyInstaller source code

git clone https://github.com/pyinstaller/pyinstaller

cd pyinstaller\bootloader (enters the bootloader directory)

3. Start Build

python3 ./waf distclean all

4. Install

Go back to the pyinstaller directory, then run

python3 setup.py install

If that fails, use

pip install .

After that, the pyinstaller command is ready to use.
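
The steps above can be collected into one small script. This is a sketch under assumptions, not an official PyInstaller tool: the function names are mine, it uses only the commands listed in the steps (with `pip install .` for the install step), and it defaults to a dry run that only prints the commands. A working GCC toolchain on PATH is still required for a real run.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(repo_dir):
    """Return (command, working_directory) pairs for the steps above."""
    repo = Path(repo_dir)
    return [
        (["git", "clone", "https://github.com/pyinstaller/pyinstaller", str(repo)], None),
        ([sys.executable, "./waf", "distclean", "all"], repo / "bootloader"),
        ([sys.executable, "-m", "pip", "install", "."], repo),
    ]


def run_all(repo_dir, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd, cwd in build_commands(repo_dir):
        where = f" (in {cwd})" if cwd else ""
        print("$ " + " ".join(cmd) + where)
        if not dry_run:
            subprocess.run(cmd, cwd=cwd, check=True)


if __name__ == "__main__":
    run_all("pyinstaller", dry_run=True)
```

Calling `run_all("pyinstaller", dry_run=False)` would actually run the clone, build, and install; `check=True` stops at the first failing step.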

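Test results

VirusTotal

Tested with the VirusTotal service.

The default PyInstaller (installed with pip install):

pyinstaller-buildin-bootloader

Another tool, Nuitka (it compiles Python to C first and is very capable), also gets false positives. python-nuitka

A custom-built PyInstaller bootloader looks much better, although it still needs more verification. pyinstaller-custom-bootloader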
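
When comparing builds, it helps to record each file's SHA-256, since VirusTotal reports can be looked up by file hash. A minimal sketch using only the standard library; the path `dist/app.exe` in the usage note is a hypothetical example, not a file from this post.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large executables do not need to fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Usage would look like `print(sha256_of_file("dist/app.exe"))`, run once for each build (default bootloader, Nuitka, custom bootloader) to keep the results apart.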

Whitelisting

Someone has kindly compiled the false-positive report pages of the various antivirus vendors:

hankhank10/false-positive-malware-reporting: Trying to release your software sucks, mostly because of antivirus false positives. I don't have an answer, but I do have a list of links to help get your code whitelisted.

Blog post: How to stop your Python programs being seen as malware | by Mark Hank | Medium

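PWA (Progressive Web App): it turns out it really is an app

The Floorp browser I currently use was updated, and the release notes mention support for PWAs (Progressive Web Apps) and SSBs (site-specific browsers) on Windows. Out of curiosity I read the Floorp blog's introduction to PWAs, Floorp のプログレッシブウェブアプリの機能と仕様 | ABlog. The post uses MDN, a site web developers constantly look things up on, as its example. It turns out that once a PWA is installed from the browser, a new icon really does appear among the system's applications, and launching it is like opening an application dedicated to that one site.

This caught my interest. I had always assumed a PWA was much the same as RWD (Responsive Web Design), just another tech buzzword of the kind that turns up every few years. It turns out a PWA really does run much like a native app. Given the web's overwhelming abundance of front-end tools and frameworks, building the user interface is far more convenient, and service workers provide offline support. I still think there are inherent limitations compared with a native app that cannot be overcome, but it is a big step beyond web pages and browser extensions. When I developed desktop apps before, I considered tools like Electron; now that I know about PWAs, I have another option.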
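
What makes a site installable is, among other things, a web app manifest: a JSON file linked from the page that tells the browser the app's name, icon, and how to display it. The sketch below generates one with Python only to stay in a single language for these notes. The member names (`name`, `short_name`, `start_url`, `display`, `icons`) come from the Web App Manifest specification; the app name and icon path are hypothetical, and the service worker that handles offline support is JavaScript and is not shown here.

```python
import json


def build_manifest(name, short_name, start_url="/"):
    """Build a minimal web app manifest as a Python dict (values are examples)."""
    return {
        "name": name,
        "short_name": short_name,
        "start_url": start_url,
        # "standalone" asks the browser to open the app in its own window,
        # without the usual browser UI.
        "display": "standalone",
        "icons": [
            {"src": "/icons/icon-192.png", "sizes": "192x192", "type": "image/png"}
        ],
    }


manifest = build_manifest("Example Notes", "Notes")
print(json.dumps(manifest, indent=2))
```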

Installing a PWA with Chromium on Linux

Floorp currently supports this only on Windows, so I tried Chromium on Linux (Debian 12 Bookworm), and installation works there too.

linux pwa

linux pwa

An icon now appears in the applications area

linux pwa

This is what it looks like when running

linux pwa

It can be uninstalled

linux pwa

Real-world PWA examples

Besides MDN mentioned above, Spotify, Uber, Pinterest and others also offer PWAs.