
[Unity Sentis] Analyzing the Object Detection (Yolo v7 tiny) Model - ONNX Execute

pnltoen 2024. 4. 15.

Sentis yolo v7 tiny

Unity Technologies

Introduction

In the previous post, Running an Object Detection Model (Yolo v7 tiny), we ran the Yolo v7 Tiny model and confirmed that it works on a video prepared in advance.

[Unity Sentis] Running an Object Detection Model (Yolo v7 tiny)


pnltoen.tistory.com

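However, this kind of video inference can just as easily be done in Python. If all you need is detection on a video file, running inference in Unity may feel harder and less familiar than the Python route. This post therefore looks at how the model can be used in your own Unity project.

This post looks at the inference (Execute) step of the Yolo v7 model and then, as in the earlier Style Transfer post, works toward taking the camera render as a target texture and displaying it in the UI.

[Unity Sentis Tutorial] Introducing and Installing Unity Sentis + the AdaIN Sample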


pnltoen.tistory.com

Main Content (Understanding the Yolo v7 tiny Model)

This section works through the Run Yolo.cs script that accompanies the Yolo v7 tiny model.

Rather than the whole script, only the relevant parts are summarized below.

Video Player

    const string videoName = "warehouse.mp4";

    void SetupInput()
    {
        video = gameObject.AddComponent<VideoPlayer>();
        video.renderMode = VideoRenderMode.APIOnly;
        video.source = VideoSource.Url;
        video.url = Application.streamingAssetsPath + "/" + videoName;
        video.isLooping = true;
        video.Play();
    }

The code above adds a VideoPlayer component to the GameObject and stores it in video. It then sets the video URL to the file named videoName under Application.streamingAssetsPath and plays it on loop.

SetupInput is called from void Start().

RenderTexture

Next, look at targetRT. When Yolo v7 runs inference on a video, recognition and detection happen frame by frame. Each frame is therefore stored as an image, and Sentis executes the model on that stored image.

Unity can store images in several forms. For a plain image, Texture and RenderTexture are the usual choices (the Sentis documentation describes converting a Texture2D or a RenderTexture; see Convert a texture to a tensor).

Convert a texture to a tensor | Sentis | 1.4.0-pre.3


docs.unity3d.com

The main difference is that a Texture holds static image data, whereas a RenderTexture holds the result of dynamic rendering. When storing each frame of a playing video, a RenderTexture, which can be modified and overwritten, is the more efficient choice. For example, a 10-second video at 60 frames per second has 600 frames: storing them as static Textures would take 600 Textures, while a single RenderTexture can be reused for every frame.

        targetRT = new RenderTexture(imageWidth, imageHeight, 0); // Create a RenderTexture of imageWidth x imageHeight
        
        Graphics.Blit(video.texture, targetRT, new Vector2(1f / aspect, 1), new Vector2(0, 0)); // Copy the current video frame's texture into targetRT
        displayImage.texture = targetRT; // Use targetRT as the RawImage's texture

With this code, what is actually seen on screen is the Canvas RawImage, and that RawImage's texture is copied from the video on every frame.

Running the Model (Sentis Execute)

    public void ExecuteML()
    {
        using var input = TextureConverter.ToTensor(targetRT, imageWidth, imageHeight, 3); // Convert the RenderTexture to a tensor
        engine.Execute(input);

        //Read output tensors
        var output = engine.PeekOutput() as TensorFloat;
        output.MakeReadable();

        float displayWidth = displayImage.rectTransform.rect.width;
        float displayHeight = displayImage.rectTransform.rect.height;

        float scaleX = displayWidth / imageWidth;
        float scaleY = displayHeight / imageHeight;

        //Draw the bounding boxes
        for (int n = 0; n < output.shape[0]; n++)
        {
            var box = new BoundingBox
            {
                centerX = ((output[n, 1] + output[n, 3])*scaleX - displayWidth) / 2,
                centerY = ((output[n, 2] + output[n, 4])*scaleY - displayHeight) / 2,
                width = (output[n, 3] - output[n, 1])*scaleX,
                height = (output[n, 4] - output[n, 2])*scaleY,
                label = labels[(int)output[n, 5]],
                confidence = Mathf.FloorToInt(output[n, 6] * 100 + 0.5f)
            };
            DrawBox(box, n);
        }
    }

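To get oriented quickly, open the Yolo v7 Tiny model in Netron. In the opened ONNX graph, the only things to remember are the Input and the Output.

For the Yolo v7 Tiny model, the image is converted to a 1x3x640x640 tensor and fed to the Input.

Running the model then outputs the class label, bounding box, and confidence score of each detection.

Let's check with a simple example.

The output has shape (n, 7), where n is the number of objects detected in the scene.

When two objects are detected, a bench and a person, the output has shape (2, 7).

Let's inspect the values with output.PrintDataPart(7).

Each row is [0, x1, y1, x2, y2, Class Label, Confidence Score]. Columns 1 to 4 are two corners of the box rather than a center and a size, which is why ExecuteML computes the center from their sum and the width and height from their difference. Besides printing the output directly, the format can also be checked in forums, papers, or code (example)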
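To make the 1x3x640x640 shape concrete, the sketch below packs an interleaved RGB pixel buffer into a channel-first (batch, channel, height, width) float array. It only illustrates the memory layout; in the project the conversion is done by TextureConverter.ToTensor. The names NchwLayout, Index, and Pack, the byte input, and the division by 255 are assumptions made for this illustration and are not part of Sentis.

```csharp
using System;

// Hypothetical helper (not part of Sentis): shows how a 1x3xHxW tensor is laid out in memory.
public static class NchwLayout
{
    // Flat index of element (channel c, row y, column x) in a 1xCxHxW tensor.
    public static int Index(int c, int y, int x, int height, int width)
    {
        return (c * height + y) * width + x;
    }

    // Packs interleaved RGB bytes (R,G,B,R,G,B,...) into channel-first floats in [0, 1].
    public static float[] Pack(byte[] rgb, int height, int width)
    {
        if (rgb.Length != height * width * 3)
            throw new ArgumentException("rgb must contain height * width * 3 bytes");

        var tensor = new float[3 * height * width];
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                for (int c = 0; c < 3; c++)
                    tensor[Index(c, y, x, height, width)] = rgb[(y * width + x) * 3 + c] / 255f;
        return tensor;
    }
}
```

For a 640x640 image, Pack returns 3 * 640 * 640 = 1,228,800 floats: all red values first, then all green, then all blue.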

How to convert bounding box (x1, y1, x2, y2) to YOLO Style (X, Y, W, H)


stackoverflow.com
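The box arithmetic in ExecuteML can be isolated as a small, Unity-free sketch. It treats columns 1 to 4 of an output row as two corners in model pixels, as the script's arithmetic does, and returns a box in display units with the origin at the center of the display. DisplayBox and BoxMath.FromCorners are hypothetical names introduced here; the script itself uses its own BoundingBox type.

```csharp
using System;

// Hypothetical stand-in for the script's BoundingBox; only the geometry fields are kept.
public struct DisplayBox
{
    public float CenterX, CenterY, Width, Height;
}

public static class BoxMath
{
    // row: one output row, read as [0, x1, y1, x2, y2, label, confidence] in model pixels.
    // Returns the box in display units, measured from the center of the display.
    public static DisplayBox FromCorners(float[] row, float imageWidth, float imageHeight,
                                         float displayWidth, float displayHeight)
    {
        float scaleX = displayWidth / imageWidth;
        float scaleY = displayHeight / imageHeight;
        return new DisplayBox
        {
            // Sum of the corners, scaled, shifted by the display size, then halved.
            CenterX = ((row[1] + row[3]) * scaleX - displayWidth) / 2f,
            CenterY = ((row[2] + row[4]) * scaleY - displayHeight) / 2f,
            // Difference of the corners gives the size.
            Width = (row[3] - row[1]) * scaleX,
            Height = (row[4] - row[2]) * scaleY
        };
    }
}
```

For instance, a row with corners (100, 120) and (200, 260) on a 640x640 input shown in a 1280x640 display maps to a box 200 wide and 140 high, centered at (-340, -130).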

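In other words, a condition on the class label lets us skip every detection whose label is not 0 (the first entry in the label asset, person), so that nothing other than people is drawn.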

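        //Draw the bounding boxes
        for (int n = 0; n < output.shape[0]; n++)
        {
            //Skip anything that is not label 0 (person); continue moves on to the next detection
            if (output[n, 5] != 0f)
            {
                continue;
            }

For example, adding just this one if statement to the Draw the bounding boxes loop is enough to show only people.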
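The same filter can be tried outside Unity on plain arrays. The sketch below keeps the indices of the rows whose label matches and uses continue, so that a non-matching row is skipped while the rows after it are still examined. DetectionFilter and IndicesWithLabel are hypothetical names for this illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: selects detections by class label from rows shaped like the model output.
public static class DetectionFilter
{
    // rows: detections read as [0, x1, y1, x2, y2, label, confidence].
    // Returns the indices of the rows whose label equals wantedLabel.
    public static List<int> IndicesWithLabel(float[][] rows, int wantedLabel)
    {
        var kept = new List<int>();
        for (int n = 0; n < rows.Length; n++)
        {
            if ((int)rows[n][5] != wantedLabel)
                continue; // skip this detection only; later rows are still checked

            kept.Add(n);
        }
        return kept;
    }
}
```

With a non-person row followed by a person row (label 0), IndicesWithLabel(rows, 0) returns only the index of the second row. Leaving the loop at the first non-matching row would instead miss that person.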

Conclusion

This post analyzed how an ONNX model is run with Sentis. Most of the actual model work is done by Execute. GpuType and Dispose were covered in the previous post, so they are not repeated here.

The next post will look at how the bounding boxes are drawn. After that, the part currently set up as a video player will be changed to a Target Texture.

License: Attribution, NonCommercial, NoDerivatives