Loading and QC'ing LAS files in R

Thu, Apr 1, 2021 4-minute read

LAS files

LAS, short for Log ASCII Standard, is a format widely used in the oil and gas industry. In this notebook we show a short workflow to load and QC multiple files. I'm using some files donated by Geolink to the geoscience community.


The libraries

I will use three libraries:

  • the tidyverse, to perform data wrangling and plotting,

  • petroreadr, from Ravenroadresources, to load the LAS files,

  • skimr, to get an excellent summary of the data.

This script can be found on my GitHub repo.


library(tidyverse)
library(petroreadr)
library(skimr)

Loading the files

The petroreadr documentation describes a few options. I normally:

  • load directly to a dataframe, which is my go-to mode,
  • set the verbose option to TRUE. This is particularly useful when working with a large number of wells/logs, for the peace of mind that the machine is still working and has not hung.

pathname <- "../../data/GEOLINK_Lithology and wells NORTH SEA/"

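# list the directory and keep only the names that end in ".las"
lasfiles <- list.files(pathname)
lasfiles <- lasfiles[grepl("\\.las$", lasfiles)]

# read every file and keep the combined data frame
df <- read_las(file.path(pathname, lasfiles), verbose = TRUE)$data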
##    + 34_10-12.las imported as <las> object
##    + 34_3-2 S.las imported as <las> object
##    + 35_3-2.las imported as <las> object
##    + 35_9-2.las imported as <las> object
##    + 35_9-9.las imported as <las> object
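The extension filter in the listing step is a regular expression, so it is worth seeing what an escaped and anchored pattern keeps. The snippet below is a minimal sketch using a throwaway directory and made-up file names (they are assumptions for illustration, not the Geolink files). It passes the pattern straight to base R's list.files(), which also returns full paths.

```r
# Build a temporary directory with a mix of file names (all hypothetical).
tmp_dir <- file.path(tempdir(), "las_demo")
dir.create(tmp_dir, showWarnings = FALSE)
file.create(file.path(
  tmp_dir,
  c("35_9-2.las", "35_9-9.LAS", "notes_las.txt", "atlas.csv")
))

# "\\.las$" means: a literal dot, then "las", at the end of the name.
# ignore.case = TRUE also accepts an upper-case ".LAS" extension.
las_paths <- list.files(
  tmp_dir,
  pattern = "\\.las$",
  ignore.case = TRUE,
  full.names = TRUE
)

basename(las_paths)
```

An unescaped ".las" would also match "notes_las.txt" and "atlas.csv", because the dot stands for any character and the pattern is not tied to the end of the name.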

The loaded data

We have obtained a dataframe with all logs, and a column with the well name that makes our life easier for plotting, filtering, etc.


skim(df)

Table: Data summary

Name df
Number of rows 73611
Number of columns 22
_______________________
Column type frequency:
character 1
numeric 21
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
WELL 0 1 6 8 0 5 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
DEPT 0 1.00 2410.31 969.88 215.22 1711.01 2425.34 3082.65 4407.62 ▂▆▇▆▃
Lithology_geolink 21036 0.71 7.07 3.81 1.00 5.00 6.00 7.00 18.00 ▂▇▁▃▁
CALI 3029 0.96 11.92 2.57 2.02 8.84 12.50 13.31 26.73 ▁▅▇▁▁
DRHO 3418 0.95 0.02 0.06 -2.57 0.00 0.01 0.04 1.27 ▁▁▁▇▁
NPHI 9314 0.87 0.30 0.11 -0.06 0.22 0.30 0.38 0.87 ▁▇▇▁▁
RHOB 3487 0.95 2.38 0.21 -1.71 2.26 2.43 2.53 3.26 ▁▁▁▃▇
GR 16 1.00 70.61 27.80 -197.12 48.90 70.44 90.75 866.92 ▁▇▁▁▁
DTC 9599 0.87 104.65 25.45 -16.58 85.75 98.96 121.83 265.00 ▁▇▇▁▁
RDEP 394 0.99 11.36 147.15 0.34 1.34 1.90 3.51 29270.71 ▇▁▁▁▁
SP 27351 0.63 83.83 46.84 -279.13 51.63 88.53 112.79 178.31 ▁▁▁▇▇
RSHA 37489 0.49 7.85 43.40 0.13 1.00 2.01 3.87 1770.00 ▇▁▁▁▁
RMED 391 0.99 9.68 76.45 0.13 1.37 2.14 4.06 9700.00 ▇▁▁▁▁
BS 34035 0.54 10.94 1.93 8.50 8.50 12.25 12.25 17.50 ▅▁▇▁▁
ROP 46350 0.37 93.21 92.07 5.20 47.13 79.69 125.87 1290.77 ▇▁▁▁▁
THOR 70499 0.04 11.17 4.59 1.13 7.42 11.27 14.53 33.71 ▅▇▅▁▁
PEF 47126 0.36 5.51 4.83 1.35 3.89 4.64 6.86 667.36 ▇▁▁▁▁
URAN 70499 0.04 2.26 1.44 -0.54 1.57 2.26 2.80 27.93 ▇▁▁▁▁
DTS 53766 0.27 172.42 34.20 87.69 141.12 173.26 201.25 270.41 ▂▇▇▆▁
DCAL 61296 0.17 0.77 1.39 -4.81 0.02 0.30 0.83 7.88 ▁▇▇▁▁
SGR 67154 0.09 101.56 32.54 33.15 81.72 106.35 116.22 892.06 ▇▁▁▁▁
RMIC 71425 0.03 14.44 38.00 0.72 4.74 7.36 12.76 556.37 ▇▁▁▁▁
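The summary above pools all wells, so a log with a low complete_rate may be fully present in one well and absent in another. A per-well completeness table is one way to check this. The sketch below uses a small made-up data frame named toy (an assumption standing in for df) and plain dplyr verbs.

```r
library(dplyr)

# Toy stand-in for the combined data frame (made-up values, not the Geolink data).
toy <- tibble(
  WELL = c("A", "A", "A", "B", "B", "B"),
  GR   = c(55, NA, 80, 60, 72, 91),
  DTS  = c(NA, NA, NA, 170, 182, NA)
)

# Fraction of non-missing samples per well and log (1 = complete, 0 = absent).
completeness <- toy %>%
  group_by(WELL) %>%
  summarise(across(where(is.numeric), ~ mean(!is.na(.x))), .groups = "drop")

completeness
```

In this toy case DTS is completely missing in well A but mostly present in well B, which a pooled summary would hide.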

Quick view of a log

To check whether a well log is present in all wells, and to compare some of its statistics between wells, the box plot is a powerful tool.


box_1 <-
  df %>%
  ggplot(aes(WELL, GR)) +
  geom_boxplot() +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))


box_1
## Warning: Removed 16 rows containing non-finite values (stat_boxplot).


A quick view shows some negative values in well 34/10-12 and some potential outliers in well 35/9-2. We can repeat the plot, cropped to a narrower range, to get a better view.
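There are two common ways to crop a box plot in ggplot2, and they are not equivalent. The sketch below uses made-up GR values for two hypothetical wells (toy is an assumption, not the Geolink data), and the 0 to 200 range is only an illustrative choice.

```r
library(ggplot2)

# Toy GR values for two hypothetical wells, including a negative and a spike.
toy <- data.frame(
  WELL = rep(c("A", "B"), each = 6),
  GR   = c(-190, 40, 55, 70, 85, 100, 45, 60, 75, 90, 110, 860)
)

# coord_cartesian() zooms the view only: the boxes are still computed from
# all rows, so medians and quartiles match the uncropped plot.
box_zoom <- ggplot(toy, aes(WELL, GR)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(0, 200)) +
  theme_bw()

# Filtering first drops the out-of-range rows, so the statistics change too.
toy_in_range <- subset(toy, GR > 0 & GR < 200)
box_filtered <- ggplot(toy_in_range, aes(WELL, GR)) +
  geom_boxplot() +
  theme_bw()

box_zoom
```

Which one to prefer depends on whether the out-of-range readings should count towards the box statistics or be treated as bad data.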



We can repeat the same for all logs at once to get a quick idea. However, there are at least three points to consider when doing this:

  • the size of the plot will need to be increased to properly show all logs,

  • if you want to zoom in on a particular log, you will need to apply a filter to the data before applying the pivot,

  • similarly, when dealing with logs that behave logarithmically, I don't know how to set a logarithmic scale for a particular log using faceting, so I apply a log transform before the pivoting step.


df %>%
  # log-transform the deep resistivity before pivoting, then drop the raw curve
  mutate(log10_RDEP = log10(RDEP)) %>%
  select(-RDEP) %>%
  # zoom on GR; note this drops whole rows, so every log loses those depths
  filter(GR > 0, GR < 200) %>%
  pivot_longer(!c("WELL"), names_to = "logs", values_to = "value") %>%
  drop_na(value) %>%
  ggplot() +
  geom_boxplot(aes(WELL, value)) +
  facet_wrap(~logs, scales = "free", ncol = 1) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
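On the first point, plot size: when saving a single-column faceted plot to a file, one option is to scale the height with the number of panels. The sketch below is an assumption-laden illustration: toy_long is made-up data, the 2.5 inches per panel is an arbitrary choice, and the output file name is hypothetical.

```r
library(ggplot2)

# Toy long-format data: three hypothetical logs for two wells.
toy_long <- data.frame(
  WELL  = rep(c("A", "B"), times = 15),
  logs  = rep(c("GR", "RHOB", "NPHI"), each = 10),
  value = c(
    seq(40, 130, length.out = 10),
    seq(2.0, 2.7, length.out = 10),
    seq(0.10, 0.45, length.out = 10)
  )
)

facet_plot <- ggplot(toy_long, aes(WELL, value)) +
  geom_boxplot() +
  facet_wrap(~logs, scales = "free", ncol = 1) +
  theme_bw()

# One panel per log, so the height grows with the number of logs.
n_panels <- length(unique(toy_long$logs))
out_file <- file.path(tempdir(), "all_logs_boxplot.pdf")

# limitsize = FALSE lifts ggsave()'s 50-inch guard, which a plot with
# around twenty panels at this height would exceed.
ggsave(out_file, facet_plot,
       width = 7, height = 2.5 * n_panels, limitsize = FALSE)
```

Inside an R Markdown notebook the equivalent knob is the chunk's figure height rather than ggsave().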


Density neutron plot

A quick density-neutron crossplot, colored by GR on a scale from 0 to 100. Rows with GR at or below 0 are filtered out before plotting, while values above 100 are kept and colored as if they were 100.


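df %>%
  filter(GR > 0) %>%
  ggplot(aes(RHOB, NPHI, color = GR)) +
  geom_point(size = 1) +
  facet_wrap(~WELL, ncol = 2) +
  # RHOB contains NAs, so na.rm = TRUE is needed to get a numeric upper limit
  xlim(1, max(df$RHOB, na.rm = TRUE)) +
  theme_bw() +
  # squish maps out-of-range GR values to the nearest limit of the color scale
  scale_color_gradient(low = "yellow", high = "brown", limits = c(0, 100),
                       oob = scales::squish)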
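To see what the oob argument does to values outside the color limits, the two helpers from the scales package can be called directly. This is a small sketch on a made-up vector (gr is an assumption, not taken from the wells).

```r
# Made-up GR readings: one below, two inside, and one above the 0-100 limits.
gr <- c(-20, 35, 100, 250)

# squish() clamps out-of-range values to the nearest limit.
squished <- scales::squish(gr, range = c(0, 100))

# censor() is the default for continuous scales and replaces
# out-of-range values with NA, which ggplot2 draws in the na.value color.
censored <- scales::censor(gr, range = c(0, 100))

squished
censored
```

With squish the high-GR points keep the color of the upper limit instead of turning grey, which is why values above 100 remain visible in the crossplot.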